Home > mailing lists

Re: intermittant performance problem - Mailing list pgsql-general

From	Scott Marlowe
Subject	Re: intermittant performance problem
Date	March 10, 2009 00:36:01
Msg-id	dcc563d10903092035gf777ef4y893c2ba20fe8e9c8@mail.gmail.com Whole thread Raw
In response to	Re: intermittant performance problem (Mike Charnoky <noky@nextbus.com>)
List	pgsql-general

Tree view

On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky@nextbus.com> wrote:
> The random sampling query is normally pretty snappy.  It usually takes on
> the order of 1 second to sample a few thousand rows of data out of a few
> million.  The sampling is consistently quick, too.  However, on some days,
> the sampling starts off quick, then when the process starts sampling from a
> different subset of data (different range of times for the same day), the
> sampling query takes a couple minutes.

Then definitely look at saving explain plans before execution to
compare fast to slow runs.  This definitely sounds like ocassionally
bad query plans to me so far.

> Regarding the concurrent vacuuming, this is definitely not happening.  I
> always check pg_stat_activity whenever the sampling process starts to lag
> behind.  I have never seen a vacuum running during this time.

And if autovac is getting in the ways, try adjusting the various
autovac options. spefically autovacuum_vacuum_cost_delay set to 10 or
20 (mS).

>
> Interesting idea to issue the EXPLAIN first... I will see if I can
> instrument the sampling program to do this.
>
> Thanks for your help Tom.
>
>
> Mike
>
> Tom Lane wrote:
>>
>> Mike Charnoky <noky@nextbus.com> writes:
>>>
>>> The sampling query which runs really slow on some days looks something
>>> like this:
>>
>>> INSERT INTO sampled_data
>>>   (item_name, timestmp, ... )
>>>   SELECT item_name, timestmp, ... )
>>>   FROM raw_data
>>>   WHERE timestmp >= ? and timestmp < ?
>>>   AND item_name=?
>>>   AND some_data_field NOTNULL
>>>   ORDER BY random()
>>>   LIMIT ?;
>>
>> Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
>> There's no good way to optimize "ORDER BY random()".  However, it seems
>> like the first thing you should do is modify the program so that it
>> issues an EXPLAIN for that right before actually doing the query, and
>> then you could see if the plan is different on the slow days.
>>
>>> We have done a great deal of PG tuning, including the autovacuum for the
>>> "raw_data" table.  Autovacuum kicks like clockwork every day on that
>>> table after the sampling process finishes (after one day's worth of data
>>> is deleted from "raw_data" table, a roughly 7% change in size).
>>
>> Also, are you sure you have ruled out the possibility that the problem
>> comes from autovac kicking in *while* the update is running?
>>
>>                        regards, tom lane
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
When fascism comes to America, it will be the intolerant selling it as
diversity.

pgsql-general by date:

From: Tom Lane
Date: 10 March 2009, 00:20:10
Subject: Re: C++ User-defined functions

From: Josh Trutwin
Date: 10 March 2009, 01:23:26
Subject: 8.3.6 build error on Debian Lenny

Re: intermittant performance problem - Mailing list pgsql-general

Previous

Next