Re: temporary tables, indexes, and query plans - Mailing list pgsql-performance

From Jon Nelson
Subject Re: temporary tables, indexes, and query plans
Date
Msg-id AANLkTinSVjip_UkKG69LsiuQ0=epDvA-daXC-Ec+76Q7@mail.gmail.com
Whole thread Raw
In response to Re: temporary tables, indexes, and query plans  (Jon Nelson <jnelson+pgsql@jamponi.net>)
Responses Re: temporary tables, indexes, and query plans  (Justin Pitts <justinpitts@gmail.com>)
List pgsql-performance
On Wed, Oct 27, 2010 at 1:52 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> On Wed, Oct 27, 2010 at 1:32 PM, Reid Thompson <Reid.Thompson@ateb.com> wrote:
>> On Wed, 2010-10-27 at 13:23 -0500, Jon Nelson wrote:
>>> set it to 500 and restarted postgres.
>>
>> did you re-analyze?
>
> Not recently. I tried that, initially, and there was no improvement.
> I'll try it again now that I've set the stats to 500.
> The most recent experiment shows me that, unless I create whatever
> indexes I would like to see used *before* the large (first) update,
> then they just don't get used. At all. Why would I need to ANALYZE the
> table immediately following index creation? Isn't that part of the
> index creation process?
>
> Currently executing is a test where I place an "ANALYZE foo" after the
> COPY, first UPDATE, and first index, but before the other (much
> smaller) updates.
>
> ..
>
> Nope. The ANALYZE made no difference. This is what I just ran:
>
> BEGIN;
> CREATE TEMPORARY TABLE foo
> COPY ...
> UPDATE ... -- 1/3 of table, approx
> CREATE INDEX foo_rowB_idx on foo (rowB);
> ANALYZE ...
> -- queries from here to 'killed' use WHERE rowB = 'someval'
> UPDATE ... -- 7 rows. seq scan!
> UPDATE ... -- 242 rows, seq scan!
> UPDATE .. -- 3700 rows, seq scan!
> UPDATE .. -- 3100 rows, seq scan!
> killed.
>

Even generating the index beforehand (sans ANALYZE) was no help.
If I generate *all* of the indexes ahead of time, before the COPY,
that's the only time index usage jives with my expectations.

Here is an example of the output from auto analyze (NOTE: the WHERE
clause in this statement specifies a single value in the same column
that has a UNIQUE index on it):

Seq Scan on foo_table  (cost=0.00..289897.04 rows=37589 width=486)

and yet the actual row count is exactly 1.

If I change the order so that the index creation *and* analyze happen
*before* the first (large) update, then things appear to proceed
normally and the indexes are used when expected, although in some
cases the stats are still way off:

        Bitmap Heap Scan on foo_table  (cost=40.96..7420.39 rows=1999 width=158)

and yet there are only 7 rows that match. The others seem closer (only
off by 2x rather than 250x).

It seems as though creating an index is not enough. It seems as though
ANALYZE after index creation is not enough, either. I am theorizing
that I have to touch (or just scan?) some percentage of the table in
order for the index to be used?  If that's true, then what is ANALYZE
for?  I've got the stats cranked up to 500. Should I try 1000?


Jason Pitts:
RE: changing default_statistics_target (or via ALTER TABLE SET STATS)
not taking effect until ANALYZE is performed.

I did already know that, but it's probably good to put into this
thread. However, you'll note that this is a temporary table created at
the beginning of a transaction.


--
Jon

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: AIX slow buffer reads
Next
From: Tom Lane
Date:
Subject: Re: temporary tables, indexes, and query plans