Linux Memory mapped files reserve lots of physical memory

Developer https://www.devze.com 2023-01-18 00:05 Source: Internet

I have a problem that has been described in several threads about memory mapping: growing memory consumption under Linux.

When I open a 1GB file under Linux or MacOS X and map it into memory using

me.data_begin = mmap(NULL, capacity(me), prot, MAP_SHARED, me.file.handle, 0);

and sequentially read the mapped memory, my program uses more and more physical memory although I used posix_madvise (even called it multiple times during the read process):

posix_madvise(me.data_begin, capacity(me), MMAP_SEQUENTIAL);

without success. :-(

I tried:

  • different flags MMAP_RANDOM, MMAP_DONTNEED, MMAP_NORMAL without success
  • posix_fadvise(me.file.handle, 0, capacity(me), POSIX_FADV_DONTNEED) before and after calling mmap -> no success

It works under Mac OS X !!! when I combine

posix_madvise(.. MMAP_SEQUENTIAL)

and

msync(me.data_begin, capacity(me), MS_INVALIDATE).

The resident memory stays below 16 MB (I periodically called msync, every 16 million steps).

But under Linux nothing works. Does anyone have an idea or a success story for my problem under Linux?

Cheers, David


Linux memory management is different from other systems. The key principle is that memory that is not being used is memory being wasted. In many ways, Linux tries to maximize memory usage, resulting (most of the time) in better performance.

It is not that "nothing works" in Linux, but that its behavior is a little different than you expect.

When memory pages are pulled in from the mmapped file, the operating system has to decide which physical memory pages it will release (or swap out) to make room for them. It looks for pages that are easier to evict (they don't require an immediate disk write) and are less likely to be used again.

The madvise() call (standardized in POSIX as posix_madvise()) tells the system how your application will use the pages. But as the name says, it is advice, given so that the operating system is better informed when making paging and swapping decisions. It is neither a policy nor an order.
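As a minimal sketch of what such a hint looks like in code, a file can be mapped read-only and marked for sequential access as shown here. The helper name map_sequential is hypothetical and error handling is reduced to returning NULL. Note that the standard constant is POSIX_MADV_SEQUENTIAL, and that a return value of 0 only means the advice was accepted, not that the kernel will act on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and hint sequential access.
 * Returns the mapping (and its length via *len), or NULL on failure. */
static void *map_sequential(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    /* Only a hint: a return value of 0 means the advice was accepted,
     * which says nothing about whether pages will be evicted early. */
    (void)posix_madvise(p, (size_t)st.st_size, POSIX_MADV_SEQUENTIAL);

    *len = (size_t)st.st_size;
    return p;
}
```

The caller reads through the returned pointer as usual and releases the mapping with munmap(p, len) when done.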

To demonstrate the effects of madvise() on Linux, I modified one of the exercises I give to my students. See the complete source code here. My system is 64-bit and has 2 GB of RAM, of which about 50% is in use now. I use the program to mmap a 2 GB file, read it sequentially, and discard everything. It reports RSS usage after every 200 MB read. The results without madvise():
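The linked program is not reproduced here, so the following is only a sketch of how such a measurement could be done, not the original madvtest source. It assumes Linux, where the second field of /proc/self/statm is the resident set size in pages. The helper names rss_mb and read_and_report are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, Linux-only: resident set size of this process in MB,
 * taken from the second field of /proc/self/statm (counted in pages).
 * Returns -1 if the file cannot be read. */
static long rss_mb(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    long total_pages, resident_pages;
    int n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024) / 1024;
}

/* Touch one byte per page of an already mapped region, printing the RSS
 * each time another `step` bytes have been covered. Returns a checksum so
 * the reads cannot be optimized away. */
static unsigned long read_and_report(const unsigned char *data, size_t len,
                                     size_t step)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;
    size_t next_report = 0;

    for (size_t off = 0; off < len; off += page) {
        if (off >= next_report) {
            printf("%6zu : %5ld MB\n", off / (1024 * 1024), rss_mb());
            next_report += step;
        }
        sum += data[off];
    }
    return sum;
}
```

Given a pointer p and length len returned by mmap(), a call such as read_and_report(p, len, (size_t)200 << 20) prints one line per 200 MB covered, in the same "MB read : RSS" shape as the output in this answer.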

<juliano@home> ~% ./madvtest file.dat n
     0 :     3 MB
   200 :   202 MB
   400 :   402 MB
   600 :   602 MB
   800 :   802 MB
  1000 :  1002 MB
  1200 :  1066 MB
  1400 :  1068 MB
  1600 :  1078 MB
  1800 :  1113 MB
  2000 :  1113 MB

Linux kept pushing other things out of memory until around 1 GB had been read. After that, it started pressuring the process itself (since the other 50% of memory was in active use by other processes), and RSS stabilized until the end of the file.

Now, with madvise():

<juliano@home> ~% ./madvtest file.dat y
     0 :     3 MB
   200 :   202 MB
   400 :   402 MB
   600 :   494 MB
   800 :   501 MB
  1000 :   518 MB
  1200 :   530 MB
  1400 :   530 MB
  1600 :   530 MB
  1800 :   595 MB
  2000 :   788 MB

Note that Linux decided to allocate pages to the process only until it reached around 500 MB, much sooner than without madvise(). This is because after that point, the pages already in memory seemed much more valuable than the pages this process had marked as sequential access. There is a threshold in the VMM that defines when to start dropping old pages from the process.

You may ask why Linux kept allocating pages up to around 500 MB and didn't stop much sooner, since they were marked as sequential access. The reason is that either the system had enough free memory pages anyway, or the other resident pages were too old to keep around. Between keeping ancient pages in memory that don't seem to be useful anymore and bringing in more pages to serve a program that is running now, Linux chooses the second option.

Even if they were marked as sequential access, that was just advice. The application may still want to go back to those pages and read them again, or another application in the system may want them. The madvise() call only says what the application itself is doing; Linux takes the bigger picture into consideration.
