Thread: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Hello,

We have a web application running against a Postgres 8.1 database. Basically, every time I run a report after no other reports have been run for several hours, the report takes significantly longer (e.g. 30 seconds) than if I re-run it, or run it when the web application has been used recently (< 1 second). Keep in mind that our web app might issue 30 or more individual queries to return a given report, and that this behavior is not isolated to a single report type - it happens for any of the reports after the web app has been inactive. I can also trace it back to the timing of the underlying queries, which show this same behavior (i.e. it's not overhead in our web app).

So, it appears to be some sort of caching issue. I'm not 100% clear on how the shared buffer cache works, or what we might do to make sure we don't hit these periods where queries take a long time. Since our users' typical usage scenario is to leave the web app alone for a long time and then come back to it, reports that generally take a second taking 30 seconds is a real problem for us.

I have isolated a single example of one such query which is very slow when no other queries have been run, and then speeds up significantly on the second run.
First run, after a night of inactivity:

explain analyze SELECT average_size, end_time FROM 1min_events WHERE
file_id = '137271' AND end_time > now() - interval '2 minutes'
ORDER BY end_time DESC LIMIT 1;

 Limit  (cost=47.06..47.06 rows=1 width=24) (actual time=313.585..313.585 rows=1 loops=1)
   ->  Sort  (cost=47.06..47.06 rows=1 width=24) (actual time=313.584..313.584 rows=1 loops=1)
         Sort Key: end_time
         ->  Bitmap Heap Scan on 1min_events  (cost=44.03..47.05 rows=1 width=24) (actual time=313.562..313.568 rows=2 loops=1)
               Recheck Cond: ((end_time > (now() - '00:02:00'::interval)) AND (file_id = 137271))
               ->  BitmapAnd  (cost=44.03..44.03 rows=1 width=0) (actual time=313.551..313.551 rows=0 loops=1)
                     ->  Bitmap Index Scan on 1min_events_end_idx  (cost=0.00..5.93 rows=551 width=0) (actual time=0.076..0.076 rows=46 loops=1)
                           Index Cond: (end_time > (now() - '00:02:00'::interval))
                     ->  Bitmap Index Scan on 1min_events_file_id_begin_idx  (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 rows=11082 loops=1)
                           Index Cond: (file_id = 137271)
 Total runtime: 313.643 ms
(11 rows)

Second run, after that:

explain analyze SELECT average_size, end_time FROM 1min_events WHERE
file_id = '137271' AND end_time > now() - interval '2 minutes'
ORDER BY end_time DESC LIMIT 1;

 Limit  (cost=47.06..47.06 rows=1 width=24) (actual time=2.209..2.209 rows=1 loops=1)
   ->  Sort  (cost=47.06..47.06 rows=1 width=24) (actual time=2.208..2.208 rows=1 loops=1)
         Sort Key: end_time
         ->  Bitmap Heap Scan on 1min_events  (cost=44.03..47.05 rows=1 width=24) (actual time=2.192..2.194 rows=2 loops=1)
               Recheck Cond: ((end_time > (now() - '00:02:00'::interval)) AND (file_id = 137271))
               ->  BitmapAnd  (cost=44.03..44.03 rows=1 width=0) (actual time=2.186..2.186 rows=0 loops=1)
                     ->  Bitmap Index Scan on 1min_events_end_idx  (cost=0.00..5.93 rows=551 width=0) (actual time=0.076..0.076 rows=46 loops=1)
                           Index Cond: (end_time > (now() - '00:02:00'::interval))
                     ->  Bitmap Index Scan on 1min_events_file_id_begin_idx  (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 loops=1)
                           Index Cond: (file_id = 137271)
 Total runtime: 2.276 ms
(11 rows)

One of the things that is perplexing about the initial slowness of this query is that it accesses the most recent rows in the table (i.e. those from the last 2 minutes), so I would expect the OS cache to already hold those new rows.

Some general information about the server / db:

1) The database is 25 GB and has about 60 tables - some very small, but several with > 5 MM rows.
2) The table I am querying above (1min_events) has 5.5 MM rows, and is indexed on end_time as well as by a compound index on (file_id, begin_time).
3) The following are running on the server that holds the db:
   a) A program which is reading files and making several (5-10) database calls per minute (these calls tend to take < 100 ms each). These calls insert tens of rows into several of the tables.
   b) An Apache web server.
   c) The 8.1 Postgres DB.
   d) Periodic cron jobs (generally at 11pm, 1am and 3am) that truncate some of the older data.
   e) Autovacuum is on with a 60-second naptime and low scale factors (0.2), so analyzes and vacuums happen throughout the day - vacuums are also generally triggered by the truncate cron jobs.
4) Some of our config settings:
   shared_buffers = 8192
   work_mem = 8192
   Total RAM on the server is 1 GB.

Basically, any advice as to what to look at to avoid this situation would be greatly appreciated. Is this simply a matter of tuning the shared_buffers parameter? If not, is scheduling a set of queries to force the proper loading of the cache a logical solution?

Thanks in advance,
Mark
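[Editor's note: the "scheduled warm-up" idea Mark floats above could be sketched roughly as below. The table and index column names are taken from the thread; the cron schedule and the specific file_id are purely illustrative, and the table name is quoted because a Postgres identifier may not start with a digit.]

```sql
-- Hypothetical warmup.sql, run periodically from cron, e.g.:
--   */15 * * * *  psql -d reports_db -f warmup.sql
-- Each query touches the index and heap pages the reports depend on,
-- so the first real report after idle time does not pay the disk reads.

SELECT count(*) FROM "1min_events"
WHERE end_time > now() - interval '1 hour';

SELECT count(*) FROM "1min_events"
WHERE file_id = '137271';   -- repeat for each file_id the reports use
```

Note this only papers over the problem: the cache is still cold for any file_id the warm-up script does not anticipate.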
"Mark Liberman" <mliberman@mixedsignals.com> wrote > > First run, after a night of inactivity: > > -> Bitmap Index Scan on 1min_events_file_id_begin_idx > (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 > rows=11082 > loops=1) > Index Cond: (file_id = 137271) > Total runtime: 313.643 ms > > Second run, after that: > > -> Bitmap Index Scan on 1min_events_file_id_begin_idx > (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 > loops=1) > Index Cond: (file_id = 137271) > Total runtime: 2.276 ms It is clear that the first query takes longer time because of the IO time of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid currently there is no easy solution for this situation, unless you could predicate which part of relation/index your query will use, then you can preload or "warm-up" cache for it. Regards, Qingqing
On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman@mixedsignals.com> wrote
> > First run, after a night of inactivity:
> >
> >   ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
> >       (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 rows=11082 loops=1)
> >         Index Cond: (file_id = 137271)
> >  Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> >   ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
> >       (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 loops=1)
> >         Index Cond: (file_id = 137271)
> >  Total runtime: 2.276 ms
>
> It is clear that the first query takes longer time because of the IO time
> of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid
> currently there is no easy solution for this situation, unless you could
> predicate which part of relation/index your query will use, then you can
> preload or "warm-up" cache for it.
>
> Regards,
> Qingqing

Thanks Qingqing, this actually helped me determine that the compound index, 1min_events_file_id_begin_idx, is not the proper index to use: it is built on (file_id, begin_time), the latter of which is not involved in the WHERE clause. The planner is only using that index to "filter" on the listed file_id.

Now, my follow-up question / assumption. I am assuming the I/O time on that index is so long because the scan has to read all of the index entries for that file_id into memory (it cannot narrow the scan to a date range, because begin_time is not in the WHERE clause). But if I replaced that compound index with the proper compound index on (file_id, end_time), it should give performance similar to the scan on 1min_events_end_idx (which was < 1 ms); i.e. the most recently updated rows are more likely to be in the cache, and the scan is smart enough to read only the index entries it needs.
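[Editor's note: the index change Mark proposes would look something like the following. Column and existing index names are from the thread; the new index name is made up, and "1min_events" is quoted because a Postgres identifier may not start with a digit. In 8.1 there is no CREATE INDEX CONCURRENTLY, so the build takes a lock on the table.]

```sql
-- Build the index matching the WHERE clause (file_id equality plus an
-- end_time range), then drop the one that only helps on begin_time.
CREATE INDEX "1min_events_file_id_end_idx"
    ON "1min_events" (file_id, end_time);

DROP INDEX "1min_events_file_id_begin_idx";
```

With (file_id, end_time), a query on file_id = X AND end_time > Y descends straight to the small, recently written portion of the btree instead of walking every entry for that file_id.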
Alternatively, I could create a single index on file_id alone (and rely on the new bitmap-scan capabilities in 8.1). But I fear that, although this index will be smaller than the erroneous compound index on (file_id, begin_time), it will display the same behavior, in that it will still need to read all of the index entries for the appropriate file_id - and since the data goes back one row per minute for 60 days, that I/O might be large.

Obviously, I will be testing this - but it might take a few days, as I haven't figured out how to simulate the "period of inactivity" that gets the data flushed out of the cache ... so I can only run the test each morning. In the meantime, any confirmation of / corrections to my assumptions are greatly appreciated. I.e. is the compound index the way to go, or the solo index on file_id?

Thanks,
Mark
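[Editor's note: for comparison, the solo-index alternative discussed above would be something like the sketch below; the index name is hypothetical and "1min_events" is quoted as before. Whether the 8.1 planner then combines it with 1min_events_end_idx via a BitmapAnd, as in the plans earlier in the thread, is up to the planner.]

```sql
-- Single-column alternative: smaller keys than the compound index, but
-- a scan for one file_id still touches every entry for that file_id.
CREATE INDEX "1min_events_file_id_idx" ON "1min_events" (file_id);
```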
On Thu, 5 Jan 2006, Mark Liberman wrote:
> Obviously, I will be testing this - but it might take a few days, as I
> haven't figured out how to simulate the "period of inactivity" to get the
> data flushed out of the cache ... so I have to run this each morning.

cat large_file >/dev/null

will probably do a pretty good job of this (especially if large_file is noticeably larger than the amount of RAM you have).

David Lang
"Mark Liberman" <mliberman@mixedsignals.com> wrote > > Now, my follow-up question / assumption. I am assuming that the IO time > is > so long on that index because it has to read the entire index (for that > file_id) into memory > > any confirmation / corrections to my assumptions are greatly appreciated. > E.g. is > the compound index the way to go, or the solo index on file_id? > Only part of the index file is read. It is a btree index. Keep the index smaller but sufficient to guide your search is always good because even by the guidiance of the index, a heap visit to get the real data is not avoidable. Regards, Qingqing
On Thu, Jan 05, 2006 at 06:50:22PM -0800, David Lang wrote:
> On Thu, 5 Jan 2006, Mark Liberman wrote:
> > Obviously, I will be testing this - but it might take a few days, as I
> > haven't figured out how to simulate the "period of inactivity" to get
> > the data flushed out of the cache ... so I have to run this each morning.
>
> cat large_file >/dev/null
>
> will probably do a pretty good job of this (especially if large_file is
> noticeably larger than the amount of RAM you have)

The following C code is much faster...

/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to allocate a chunk of memory and zero it.  Useful for
 * flushing disk buffers.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    size_t mb = (argc > 1) ? (size_t) atoi(argv[1]) : 0;
    char *buf = calloc(mb, 1024 * 1024);

    if (buf == NULL) {
        printf("Error allocating memory.\n");
        return 1;
    }
    /* Touch every page: without this the kernel can map copy-on-write
     * zero pages lazily, and nothing actually gets evicted. */
    memset(buf, 1, mb * 1024 * 1024);
    return 0;
}

Compile it and then pass in the number of MB of memory to allocate on the command line.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461