Home > mailing lists

Running a query twice to ensure cached results. - Mailing list pgsql-hackers

From	Ron Mayer
Subject	Running a query twice to ensure cached results.
Date	June 8, 2006 18:49:55
Msg-id	44887145.5030901@cheapcomplexdevices.com Whole thread Raw
In response to	Re: That EXPLAIN ANALYZE patch still needs work (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Running a query twice to ensure cached results. (Tom Lane <tgl@sss.pgh.pa.us>) Re: Running a query twice to ensure cached results. ("Luke Lonergan" <llonergan@greenplum.com>)
List	pgsql-hackers

Tree view

Tom Lane wrote:
> -- do it again to ensure fully cached
> bench=# select count(*) from accounts;

Short summary:
 Does running a query only twice really insure that a result is cached? It seems not to be the case for seq-scans on
Linux.
 I think this may matters to the discussions about a readahead thread/process that come up on this list that come up
hereoccasionally.

Experimental results here suggest that for larger tables Linux seems
to detect a seq-scan and not bother caching.   It's very reproducible
for me here to do a reboot and not see the full speedup on a seq_scan
until the third time I run a query.su  An example shown below [1] shows
that the third run of a query is faster than the second run.  The
output of a 'vmstat 5' [2] while these queries was happening agrees
that significant I/O was still happening on the second run, but
no I/O happened the third time.   The table comfortably fits in
memory (700MB table on a 2GB ram machine) and the machine was
otherwise idle so noone else wiped out the cache between the
first and second runs.

Why do I think this is worth mentioning here?  * I think it impacts the occasional thread about wanting    to include
logicin postgresql for readahead [3] or for    the threads suggesting hinting to the the OS though madvise    or
similarto avoid caching seq-scans.   It seems that the    Linux is detecting and at least somewhat reacting    to seq
scanseven with no hinting.  Anything added    to postgresql might end up being a duplicated effort.    I think Bruce
suggestedthat Solaris does this free-behind    automatically [4], but this is the first I've noticed    that Linux
seemsto do similar.

  * I think it matters to people who post explain analyze    twice without running it so often they get stable results.
  (I note that this was not a problem for Tom since the    timing of his first and second runs were the same so    I
assumehe was just saying that he observed that the    query was cached rather than that the first run forced    the
secondrun to be cached.)

    Ron

=========================================================================
== [note 1] the repeated queries showing the speedup after 3 runs.
== Running the same select count(*) 4 times after a clean reboot.
== Seems the OS's caching logic decided that the first seq_scan
== wasn't 'interesting' enough
=========================================================================
fli=# select count(*) from facets_s;  count
---------- 15976558
(1 row)

Time: 29788.047 ms
fli=# select count(*) from facets_s;  count
---------- 15976558
(1 row)

Time: 19344.573 ms
fli=# select count(*) from facets_s;  count
---------- 15976558
(1 row)

Time: 13411.272 ms
fli=# select count(*) from facets_s;  count
---------- 15976558
(1 row)

Time: 13107.856 ms

################################################################################
# [note 2] vmstat 5 while the above queries were being run
################################################################################

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r  b   swpd   free   buff  cache   si
so   bi    bo   in    cs us sy id wa 1  1    140  62140  71256 713360    0    0    47    31   92    84  7  1 92  0

*** the first time 1  0    140  50860  31912 808304    0    2 25215    29 1147  2612 49 15  0 36 1  0    360  54420
11112855240    0    0 23934     7 1139  2553 47 14  0 39 0  1    360  54008  11100 878708    0    0 23704    25 1149
246746 12  0 41 0  1    360  52512  11140 896592    0    0 24062     6 1135  2460 47 11  0 41

*** the second time 0  0    360  52688  11172 906916    0    0 13357    19 1085  1989 31  7 38 24 1  0    360  53976
11076912540    0   44 14273    57 1113  2102 32  7 29 32 2  0    360  54788  10908 923788    0    0 24509    54 1171
247446 12  0 42 1  0    360  54944   3096 939948    0    0 11180    39 1093  1976 65 13  0 22

*** the third time 3  0    360  54280   3872 940508    0    0   264    14 1041  1560 85 15  0  0 1  0    360  53852
3904940940    0    0    88    29 1022  1505 53  9 36  2 2  0    360  51616   4052 943068    0    0   443    54 1037
155282 15  0  4 1  0    360  51488   4060 943180    0    0    22     2 1013  1522 84 16  0  0

#############
[3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01449.php
[4] http://archives.postgresql.org/pgsql-performance/2003-10/msg00188.php

pgsql-hackers by date:

From: Josh Berkus
Date: 08 June 2006, 18:48:25
Subject: Re: [PATCHES] drop if exists remainder

From: Gregory Stark
Date: 08 June 2006, 18:50:56
Subject: Re: Ending EXPLAIN ANALYZE early (was Re: That EXPLAIN ANALYZE patch still needs work)

Running a query twice to ensure cached results. - Mailing list pgsql-hackers

Previous

Next