c++ - Optimising data-structures so that they take advantage of virtual memory
I want to know how to optimise data-structures in OpenCV (the Mat type specifically) so that they are able to leverage the operating system's built-in memory/virtual-memory management.

For full context please read the question here. Otherwise, the situation can be summed up as: I have a large collection of Mats* that I'll need to access arbitrarily and rapidly. The main complication is that the full amount of data is above the amount of RAM available.

(*Conceptually the data is a recursively defined 3D array of 3D arrays, but let's not muddy the water with that confusion!)

Rather than building my own LRU cache and RAM-hungry, inefficient 'page' addressing strategies to access it, I'd rather let the OS do this for me.

I think I understand the concepts, but when it comes to the actual implementation I'm twiddling my thumbs:
- Is this a generic C++ consideration, or something I need to address at the OpenCV level?
- Is it as simple as making the granularity of the data close to (but not over) 4KB? (See the solution here for the 4KB motivation.)
- How would the Mat(s) be saved, accessed and represented on disk? (Is this how memory-mapping gets involved?)
Is this a generic C++ consideration, or something I need to address at the OpenCV level?
You can just allocate and use boatloads of memory. The whole point of paging / virtual memory is that it's transparent. It gets extremely slow, but it keeps working. You won't get ENOMEM until you're out of swap space + RAM.
On a normal Linux system, your default swap partition should be small (under 1GB), so you'll probably need to dd a swap file and run mkswap / swapon on it. Make sure the swap file has read-write permission for root only. Every major OS will have its own procedures.
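A minimal sketch of that Linux swap-file procedure (the path and size are arbitrary examples; these commands must run as root):

```shell
# Create a 4 GiB file of zeroes to use as swap (size is an example)
dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile        # read-write for root only
mkswap /swapfile           # write the swap signature
swapon /swapfile           # enable it
swapon --show              # verify it is active
```

The extra swap is only there so allocations beyond RAM keep working; anything that actually lands in it will be very slow to touch.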
Is it as simple as making the granularity of the data close to (but not over) 4KB? (See the solution here for the 4KB motivation.)
If you have pointers to other data, make sure to keep them together. You want all your small "hot" data to sit in only a few pages, so a decent OS LRU algorithm won't page it out.

If you have hot data mixed with cold data, it will get paged out, and that leads to an extra page-file round trip before the cache miss for the final data can even happen.
Like Yakk says, sequential access patterns are much better, because disk I/O does better with multi-block reads. (Even SSDs have better throughput with larger blocks.) It also allows prefetching, which allows one I/O request to start before the previous one's data arrives. Maxing out I/O throughput requires pipelining requests.
Try to design your algorithms for sequential accesses when possible. This is advantageous at all levels of memory, from paging all the way down to L1 cache. Sequential access even enables auto-vectorization with vector registers.
Cache-blocking (aka loop tiling) techniques are also applicable to page misses. Google for details, but the main idea is to do all the steps of your algorithm over a subset of the data, instead of touching all the data at each step. Then each piece of data only has to be loaded into cache once total, instead of once for each step of the algorithm.
Think of DRAM as a cache for your giant virtual address space.
How would the Mat(s) be saved, accessed and represented on disk? (Is this how memory-mapping gets involved?)
Swap space / the pagefile is the backing store for your process's address space. So yes, it's very similar to what you'd get if you allocated memory by mmaping a big file instead of making an anonymous allocation.