Re: intermittant performance problem - Mailing list pgsql-general

From Mike Charnoky
Subject Re: intermittant performance problem
Date
Msg-id 49BEA107.4000107@nextbus.com
Whole thread Raw
In response to intermittant performance problem  (Mike Charnoky <noky@nextbus.com>)
Responses Re: intermittant performance problem  (Mike Charnoky <noky@nextbus.com>)
List pgsql-general
Scott Marlowe wrote:
> On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky@nextbus.com> wrote:
>> The random sampling query is normally pretty snappy.  It usually takes on
>> the order of 1 second to sample a few thousand rows of data out of a few
>> million.  The sampling is consistently quick, too.  However, on some days,
>> the sampling starts off quick, then when the process starts sampling from a
>> different subset of data (different range of times for the same day), the
>> sampling query takes a couple minutes.
>
> Then definitely look at saving explain plans before execution to
> compare fast to slow runs.  This definitely sounds like ocassionally
> bad query plans to me so far.
>
>>
>> Tom Lane wrote:
>>> Mike Charnoky <noky@nextbus.com> writes:
>>>> The sampling query which runs really slow on some days looks something
>>>> like this:
>>>> INSERT INTO sampled_data
>>>>   (item_name, timestmp, ... )
>>>>   SELECT item_name, timestmp, ... )
>>>>   FROM raw_data
>>>>   WHERE timestmp >= ? and timestmp < ?
>>>>   AND item_name=?
>>>>   AND some_data_field NOTNULL
>>>>   ORDER BY random()
>>>>   LIMIT ?;
>>> Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
>>> There's no good way to optimize "ORDER BY random()".  However, it seems
>>> like the first thing you should do is modify the program so that it
>>> issues an EXPLAIN for that right before actually doing the query, and
>>> then you could see if the plan is different on the slow days.

The problem came up over the weekend so I took a look at the info from
EXPLAIN.  The query plans were quite different on the days when the
problem happened.  I began to suspect that autoanalyze was not happening
daily like the autovacuums were, and sure enough it was only running
about every other day.  In fact, I saw that autoanalyze happened once
during the sampling process, and the sampling happened much faster
afterward.

We're tuning the autoanalyze parameters so it runs more frequently.  Is
it OK to run ANALYZE manually before I begin the sampling process?  Or
is there a possibility this will collide with an autoanalyze and result
in problems?  I seem to remember this was a problem in the past, though
it may have been before PG8.3...


Mike

pgsql-general by date:

Previous
From: ataherster
Date:
Subject: Re: [ask] Return Query
Next
From: Scott Marlowe
Date:
Subject: Re: [ADMIN] deployment query