postgres files in use not staying in linux file cache - Mailing list pgsql-performance

From Brio
Subject postgres files in use not staying in linux file cache
Date
Msg-id CAM+G8pRg9-Dvfr3dam9qG-etQGNGUQEO-cDPyhwR-_HTvWQABA@mail.gmail.com
Responses Re: postgres files in use not staying in linux file cache  (Shaun Thomas <sthomas@optionshouse.com>)
Re: postgres files in use not staying in linux file cache  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Hi, I'm trying to investigate a performance problem.

We have a large database (over 1TB) running on a server with 160GB of RAM and 32 cores (Xeon E5-2650). The database files are on a NetApp mount.

The software is Postgres 9.3.1 on Ubuntu 12.04, Linux 3.2.0-38-generic.

Generally, when a query is slow, it's because it's waiting for I/O. Since only about 10% of the database can be in RAM at any time, this is expected. I'm trying to analyze the working set in the cache to verify that the relevant tables and indexes are cached. I can map our database's objects to files using pg_class.relfilenode to get the filename(s), and then I use fincore from linux-ftools to inspect the Linux page cache.
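For reference, a sketch of that mapping (the PGDATA path and relation name are illustrative; assumes psql access and fincore from linux-ftools on the PATH):

```shell
#!/bin/sh
# Hypothetical sketch: find the file(s) behind a relation, then check
# page-cache residency. Adjust PGDATA and the relation name for your install.
PGDATA=/var/lib/postgresql/9.3/main
DBOID=$(psql -At -c "SELECT oid FROM pg_database WHERE datname = current_database()")
RELNODE=$(psql -At -c "SELECT relfilenode FROM pg_class WHERE relname = 'my_big_index'")

# Relations over 1GB are split into segments: <relfilenode>, <relfilenode>.1, ...
for seg in "$PGDATA/base/$DBOID/$RELNODE" "$PGDATA/base/$DBOID/$RELNODE".[0-9]*; do
    [ -e "$seg" ] && fincore "$seg"
done
```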

I have noticed a few times that an index scan may take a long time, with the query's backend process reading from disk at about 2 MB/s and spending 99% of its time waiting for I/O (according to iotop). This makes sense if it's scanning an index that is not in cache.

As this is happening, I expect bits of the index and table to be pulled into cache, so the index scan may speed up as it goes, or will at least finish with some portion of the index in cache, so the scan won't be so slow next time. I use fincore to see how much of the index and table are in cache (taking into account that large objects >1GB will be split into multiple files). To my surprise, the files are cached 0%!

Some research into the Linux page cache suggests that any file can essentially be forced into cache by cat-ting it to /dev/null. So I cat the file of one of these database objects, and I can see the cat process reading at about 100 MB/s, so it takes 10 seconds for a 1GB file. Then I check fincore again -- the file is still not cached. cat-ting the file again still takes 10 seconds.
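The sequence I'm running is roughly this (the segment path is illustrative; fincore is from linux-ftools):

```shell
#!/bin/sh
# Hypothetical segment path; substitute a real relfilenode segment.
f=/var/lib/postgresql/9.3/main/base/16384/16385

fincore "$f"                  # before: reports ~0 pages resident
time cat "$f" > /dev/null     # ~10s at ~100MB/s for a 1GB segment
fincore "$f"                  # after: still ~0 pages resident (the surprise)
```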

I cat several times in a row, and the file refuses to cache. Sometimes I see a bit of the file appear in the cache, only to be removed a few seconds later. I also tried the fadvise program from the ftools, which didn't help. The I/O on this machine is not all that high (a few MB/s for various postgres processes), and there are a few GB free (as shown in top). Most of the RAM (150GB+) is used by the page cache. No swap is in use.
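For reference, the numbers top is showing can be read straight from the kernel (Linux-generic, no extra tools needed):

```shell
# Free memory, page-cache size, and swap usage, straight from /proc.
# "Cached" is the page cache; SwapTotal - SwapFree is swap in use.
grep -E '^(MemFree|Cached|SwapTotal|SwapFree):' /proc/meminfo
```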

Eventually the query finishes (or I cancel it). Then I find that running cat on the file does leave it in cache! So, is having multiple readers confusing Linux, or is Postgres doing some madvise on the file? (I'd expect the answer to be no to both.)

So here's where I'm stuck. How can reading a file not leave it in the Linux cache? I'd expect it to enter the inactive list (which is about 80GB), so I'd expect another 80GB to be read before it would be its turn to be evicted -- which should take a long time if my maximum read speed is 100 MB/s.
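The active/inactive list sizes I'm referring to come from /proc/meminfo; a quick way to watch them while reproducing the problem:

```shell
# Page-cache LRU list sizes; Inactive(file) is where freshly read file
# pages should land before being promoted to the active list or evicted.
grep -E '^(Active|Inactive)(\(file\))?:' /proc/meminfo
```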

Since I don't understand how simply running cat can fail to cache the file, I wonder if this is an issue with Linux rather than Postgres. But the issue seems to happen only with files in use by Postgres, so maybe there's an interaction, which is why I thought I'd start here.

Any ideas how I can debug this further? Thanks!

Brian
