Mike Charnoky <noky@nextbus.com> writes:
> The sampling query which runs really slow on some days looks something
> like this:
> INSERT INTO sampled_data
> (item_name, timestmp, ... )
> SELECT item_name, timestmp, ... )
> FROM raw_data
> WHERE timestmp >= ? and timestmp < ?
> AND item_name=?
> AND some_data_field NOTNULL
> ORDER BY random()
> LIMIT ?;
Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
There's no good way to optimize "ORDER BY random()". However, it seems
like the first thing you should do is modify the program so that it
issues an EXPLAIN for that right before actually doing the query, and
then you could see if the plan is different on the slow days.
> We have done a great deal of PG tuning, including the autovacuum for the
> "raw_data" table. Autovacuum kicks like clockwork every day on that
> table after the sampling process finishes (after one day's worth of data
> is deleted from "raw_data" table, a roughly 7% change in size).
Also, are you sure you have ruled out the possibility that the problem
comes from autovac kicking in *while* the update is running?
regards, tom lane