On Wed, Feb 13, 2013 at 10:42 AM, Carlos Henrique Reimer
<carlos.reimer@opendb.com.br> wrote:
> Hi,
>
> I`m trying to figure out why a query runs in 755ms in the morning and
> 20054ms (26x) in the evening.
>
>
> Morning:
>
> Sort (cost=151845.90..152304.21 rows=183322 width=62) (actual
> time=706.676..728.080 rows=32828 loops=1)
( index scan plan )
>
> Evening:
>
> Sort (cost=321670.51..322111.45 rows=176377 width=62) (actual
> time=20010.616..20031.887 rows=32840 loops=1)
(sequential scan)
So it believes the evening plan (seq scan) to be twice as slow as the
morning plan (index scan), but uses it anyway. So it must think the
morning plan would be even slower than that yet if it were run in the
evening.
Do you change your database's cost settings between morning and
evening (for example, in preparation for night-time batch processing)?
It would interesting see what it thinks of the index plan at the time
which it is choosing the sequential scan plan.
In the evening, can you rerun the query to get the sequential plan,
then "set enable_seqscan=off" and run it again to get the index plan
under the same conditions?
>
> We initially suspected the reason could be that in the morning all data is
> in memory and in the evening not all is in memory but as database size is
> 40GB and memory 64GB I would eliminate this hypothesis .
I wouldn't eliminate it for that reason. Just because you have 64GB
doesn't mean the kernel is willing to use all of it for file caching,
we've seen complaints that some kernels under some settings are not
adept at using that much memory. The reason to eliminate that theory
is that you get two different execution plans. PostgreSQL must think
something is different in order to change plan, and empirical degree
of cachedness is not something PostgreSQL is cognizant of, so it must
be something else that is different.
> Another reason we
> rejected this hypothesis is that even if you run the query two times, both
> took almost the same time.
That doesn't mean much for the sequential scan plans. PostgreSQL has
a "ring buffer" mechanism that prevents sequential scans from shoving
everything else out of the cache. If the OS has some mechanism with a
similar goal in mind for the part of the cache it controls, then
sequential plans might not become well cached even after several
repeated executions.
> Another possibility is a CPU bottleneck but as there is no indication of
> this condition in the performance data collected by sar, top, vmstat we
> assume the problem has another origin.
Could you show us some of that info? sar averages over ten minutes,
so a 20 second query probably isn't going to show up in it, unless you
repeat it endlessly.
> How could we determine why this difference in the response time?
You could run "explain (analyze,buffers)", and also turn on track_io_timing,
Cheers,
Jeff