
Re: benchmarking the query planner - Mailing list pgsql-hackers

From	Ron Mayer
Subject	Re: benchmarking the query planner
Date	December 12, 2008 15:34:51
Msg-id	4942BCD2.9070306@cheapcomplexdevices.com
In response to	Re: benchmarking the query planner (Gregory Stark <stark@enterprisedb.com>)
List	pgsql-hackers


Gregory Stark wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> The amount of I/O could stay the same, just sample all rows on block. [....]
> 
> It will also introduce strange biases. For instance in a clustered table it'll
> think there are a lot more duplicates than there really are because it'll see
> lots of similar values.

But for ndistinct, it seems it could only help things.  If the
ndistinct guesser just picks
    max(the-current-one-row-per-block-guess,
        a-guess-based-on-all-the-rows-on-the-blocks)
it seems we'd be no worse off for clustered tables, and much
better off for randomly organized tables.
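
A minimal sketch of the max-of-two-guesses idea above, not the actual
ANALYZE code. It assumes each guess comes from the Haas-Stokes Duj1
formula n*d / (n - f1 + f1*n/N), and it stands in for the current
sample by taking the first row of each sampled block. The names
duj1_estimate and combined_ndistinct are hypothetical.

```python
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Estimate ndistinct from a sample with n*d / (n - f1 + f1*n/N).

    n  = sample size, N = total rows in the table,
    d  = distinct values seen in the sample,
    f1 = values seen exactly once in the sample.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    estimate = n * d / (n - f1 + f1 * n / total_rows)
    # Clamp to the sane range: at least what we saw, at most the row count.
    return min(max(estimate, d), total_rows)


def combined_ndistinct(blocks, total_rows):
    """Take the larger of the two guesses (hypothetical helper).

    blocks is a list of sampled blocks, each a list of column values.
    Assumption: "one row per block" is approximated by the first row.
    """
    one_row_per_block = [rows[0] for rows in blocks if rows]
    all_rows_on_blocks = [value for rows in blocks for value in rows]
    return max(
        duj1_estimate(one_row_per_block, total_rows),
        duj1_estimate(all_rows_on_blocks, total_rows),
    )


# Clustered table: every row on a block carries the same value, so the
# all-rows guess sees heavy duplication while the per-block guess does not.
clustered_blocks = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
print(combined_ndistinct(clustered_blocks, total_rows=1200))
```

By construction the result is never lower than the one-row-per-block
guess, so the duplicate-heavy view of a clustered table cannot drag the
estimate down.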

In some ways I fear *not* sampling all rows on the block also
introduces strange biases by largely overlooking the fact that
the table's clustered.

In my tables clustered on zip code, we don't notice info like
"state='AZ' is present in well under 1% of blocks in the table",
while if we did scan all rows on the sampled blocks, the guesser
might pick this up. But I guess a histogram of blocks would be an
additional stat rather than an improved one.
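
To illustrate the kind of per-block information meant here, a small
sketch that computes the fraction of sampled blocks containing a value.
This is a hypothetical stat, not something ANALYZE collects, and the
name block_fraction and the sample data are made up for illustration.

```python
def block_fraction(blocks, value):
    """Fraction of sampled blocks with at least one row equal to value.

    blocks is a list of sampled blocks, each a list of column values.
    """
    if not blocks:
        return 0.0
    hits = sum(1 for rows in blocks if value in rows)
    return hits / len(blocks)


# Both samples hold 'AZ' in 2 of 8 rows, but the block-level picture differs.
clustered = [["AZ", "AZ"], ["CA", "CA"], ["CA", "CA"], ["CA", "CA"]]
scattered = [["AZ", "CA"], ["AZ", "CA"], ["CA", "CA"], ["CA", "CA"]]

print(block_fraction(clustered, "AZ"))  # 0.25
print(block_fraction(scattered, "AZ"))  # 0.5
```

The row-level frequency is identical in both samples; only looking at
all rows on each block reveals that the clustered table packs the value
into fewer blocks.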
