Thread: optimizer behavior in the case of highly updated tables

From: Mark Rostron
Date:

I am looking for some specific information regarding optimizer behavior.

We recently experienced a situation where a query that was previously using a btree lookup (efficient) SWITCHED to using seqscan/hash lookup.

 

My questions would be:

 

- Under what circumstances is the optimizer likely to change from using a btree index lookup to using a seq scan/hash lookup?
- What are the critical decision factors that feed into the optimizer making such a change?
- Is it possible to measure any metrics in a way that would enable a prediction of such a change?
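
For instance, would the statistics that ANALYZE stores for the planner be the right thing to watch? A sketch of what I could sample (the table name is illustrative):

-- Row and page counts the planner believes, maintained by VACUUM/ANALYZE:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'my_table';

-- Per-column statistics; n_distinct and correlation presumably matter
-- most for the index-vs-seqscan choice:
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'my_table';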

 

Platform

 

- PostgreSQL 8.3.10 (64-bit) on RHEL 5
- Linux xxxxx 2.6.18-164.10.1.el5xen #1 SMP Thu Jan 7 20:28:30 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

 

Application

The table in question is:

- 30m rows, variable length (contains varchar fields), average row length about 120 bytes
- approx. 3m unique values in the indexed column

 

Activity on the table would be, per row:  “Insert, multiple updates, delete after 90 days”

 

We run VACUUM ANALYZE on this table once a week.

No partitions are used.

 

Our experience which prompts this question was as follows:

- If the table is not vacuum analyzed at least once a week, the query plans become unpredictable as to whether they will use the btree or the seqscan/hash lookup
- Until last week, VACUUM ANALYZE was sufficient
- On Friday evening of last week, the query plan for selected queries against this index changed again, and this time VACUUM ANALYZE was insufficient
- Rebuilding the index on the primary key and on the indexed column was also insufficient
- It was necessary to take a site outage and perform a VACUUM FULL ANALYZE on the table
- Following this, the query plan reverted to the more efficient btree lookup

 

Clearly, the garbage buildup resulting from transaction activity on the table is the villain here.

- Is it possible to calculate expected space usage given row count and average row size?
- At what point might the ratio of expected to actual space usage indicate the need for a VACUUM FULL or similar maintenance? (A rough sketch of what I mean is below.)
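
To make the second question concrete, here is the kind of calculation I have in mind. The 120-byte average row comes from above; the ~28 bytes of per-tuple overhead is my assumption, and per-page overhead and fill factor are ignored, so the ratio is only a rough guide:

-- Compare actual on-disk size against a naive expectation to get a
-- crude bloat ratio for the table (name is illustrative):
SELECT relname,
       pg_relation_size(oid)            AS actual_bytes,
       (reltuples * (120 + 28))::bigint AS expected_bytes,
       round(pg_relation_size(oid)::numeric
             / nullif((reltuples * (120 + 28))::numeric, 0), 2) AS ratio
FROM pg_class
WHERE relname = 'my_table';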

 

Any observations/comments that anyone would care to make are welcome.

Thanks in advance for your time

Mr

Re: optimizer behavior in the case of highly updated tables

From: Scott Marlowe
Date:
On Tue, Jun 8, 2010 at 8:58 PM, Mark Rostron <mrostron@ql2.com> wrote:
> I am looking for some specific information regarding optimizer behavior.
>
> We recently experienced a situation where a query that was previously using
> a btree lookup (efficient) SWITCHED to using seqscan/hash lookup.

OK, before we go any further, got any EXPLAIN ANALYZE output for the
fast and slow queries?
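
Something along these lines, with your real query substituted (the table, column, and value here are placeholders):

-- Plan plus actual row counts and timings for the current (slow) plan:
EXPLAIN ANALYZE
SELECT * FROM my_table WHERE indexed_col = 'some-value';

-- For comparison, discourage the seqscan in this session to see what
-- the planner thinks the old index-scan plan would cost (diagnostic
-- only; reset it afterwards):
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM my_table WHERE indexed_col = 'some-value';
RESET enable_seqscan;

Comparing estimated costs and actual times between the two runs usually shows which estimate the planner is getting wrong.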

Re: optimizer behavior in the case of highly updated tables

From: Greg Smith
Date:
Mark Rostron wrote:
>
> - It was necessary to take a site outage and perform a “vacuum full
> analyze” on the table
>
> - Following this, the query plan reverted to the more efficient btree
> lookup
>
> Clearly, the garbage buildup resulting from transaction activity on
> the table is the villain here.
>
> - Is it possible to calculate expected space usage given row count and
> average row size
>
> - At what point might the ratio of “expected”/”actual” space usage be
> able to indicate the need to perform “full vacuum”, or similar maintenance
>

I think you're right to focus on this part, because with your usage
pattern--deleting all old data constantly--you have to get this under
control in order for the query planner to do the right thing here.

Start by reading http://wiki.postgresql.org/wiki/VACUUM_FULL

What you probably should have done in order to return to sane query
plans was to run CLUSTER instead of VACUUM FULL. The site would have
been down for less time, and you actually made some future problems a
bit worse, because the old-style VACUUM FULL tends to bloat the
indexes; a sketch is below.
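
For reference, a minimal version of that, with placeholder names; use whichever index gives an ordering your queries like:

-- Rewrites the table in index order and rebuilds all of its indexes.
-- Takes an exclusive lock like VACUUM FULL, but is usually faster and
-- leaves the indexes compact instead of bloated.
CLUSTER my_table USING my_index;
ANALYZE my_table;

The CLUSTER table USING index form is the 8.3 syntax; older releases spell it CLUSTER index ON table.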

As for measuring what's going wrong here, what you want to do is run
this around once a week during a slow period and save the output for
analysis:

VACUUM VERBOSE;

This will give you a bunch of statistics about space usage, including a
summary at the end that will tell you if there's a serious problem you
might already be running into (running out of room in the free space
map). Combine that with a look at pg_stat_user_tables and you should
have a decent initial statistical picture of what's going on with your data.
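
For the pg_stat_user_tables side, something as simple as this gives a dead-row picture per table (all of these columns exist in 8.3):

-- Dead vs. live rows and when vacuum last ran, worst offenders first:
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

And if the VACUUM VERBOSE summary warns that max_fsm_pages is too low, raise it in postgresql.conf; on 8.3, space the free space map can't track is space vacuum can't reuse.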

There are two large scale things you may need to consider if volume on
your site expands in the future:

1) Delete the historical data in smaller chunks and do an immediate
VACUUM afterwards, perhaps as often as daily; a sketch is below. This
keeps the amount of dead space VACUUM has to clean up as small as
possible, at the expense of higher site overhead.
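
What that could look like as a daily job (table and column names are invented; adapt to your schema):

-- Run once a day, e.g. from cron via psql: only one day's worth of
-- rows expires at a time, so each run deletes a small chunk.
DELETE FROM events
WHERE created_at < now() - interval '90 days';

-- Vacuum immediately so the space just freed is reusable before the
-- day's inserts and updates arrive:
VACUUM ANALYZE events;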

2) Start partitioning your tables. This allows you to just DROP old
partitions rather than deleting rows from a single master table, and it
can make this whole class of problem go away; see the sketch below.
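
On 8.3 that means inheritance-based partitioning with constraint exclusion; a minimal sketch with invented names:

-- Parent stays empty; each child holds one month of data.
CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    varchar
);

CREATE TABLE events_2010_06 (
    CHECK (created_at >= '2010-06-01' AND created_at < '2010-07-01')
) INHERITS (events);

-- With constraint_exclusion = on, queries that filter on created_at
-- skip the irrelevant children. Expiring 90-day-old data becomes:
DROP TABLE events_2010_03;

You also need a trigger or application logic to route inserts into the correct child; the partitioning chapter in the docs covers the details.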

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: optimizer behavior in the case of highly updated tables

From: Scott Marlowe
Date:
On Wed, Jun 9, 2010 at 12:04 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Mark Rostron wrote:
>>
>> - It was necessary to take a site outage and perform a “vacuum full
>> analyze” on the table
>>
>> - Following this, the query plan reverted to the more efficient btree
>> lookup
>>
>> Clearly, the garbage buildup resulting from transaction activity on the
>> table is the villain here.
>>
>> - Is it possible to calculate expected space usage given row count and
>> average row size
>>
>> - At what point might the ratio of “expected”/”actual” space usage be able
>> to indicate the need to perform “full vacuum”, or similar maintenance
>>
>
> I think you're right to focus on this part, because with your usage
> pattern--deleting all old data constantly--you have to get this under
> control in order for the query planner to do the right thing here.

I think this is one of those places where you need to vacuum more
often and more aggressively to keep up.  If the usage pattern works
well with truncating partitions then do that.  But it's possible that
aggressive vacuuming alone can take care of this; some example
settings are below.
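
On 8.3 the global knobs for that look something like this (postgresql.conf excerpt; the values are illustrative starting points, not recommendations):

autovacuum = on
autovacuum_naptime = 1min               # how often workers look for work
autovacuum_vacuum_scale_factor = 0.05   # vacuum after ~5% of a table is dead
                                        # (the 0.2 default is too lazy here)
autovacuum_vacuum_cost_delay = 10ms     # lower delay = more aggressive vacuum

Per-table thresholds on 8.3 live in the pg_autovacuum catalog; the ALTER TABLE ... SET (autovacuum_...) options only arrive in 8.4.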

I also wonder whether VACUUM VERBOSE output from before and after the
performance shift would reveal anything useful about bloating.