Re: benchmarking the query planner - Mailing list pgsql-hackers

From Robert Haas
Subject Re: benchmarking the query planner
Date
Msg-id 603c8f070903111811h5241122aved226a40b87389d8@mail.gmail.com
Whole thread Raw
In response to Re: benchmarking the query planner  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: benchmarking the query planner  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List pgsql-hackers
On Fri, Dec 12, 2008 at 3:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Fri, 2008-12-12 at 13:20 -0500, Tom Lane wrote:
>>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>>> Solutions can also include
>>>> * manual mechanism for setting ndistinct that doesn't keep getting
>>>> overwritten by subsequent ANALYZEs
>>>
>>> Hmm, that might actually be the most practical answer for large,
>>> reasonably-static tables.  Especially if we expose the "negative
>>> stadistinct" feature to let people specify it as a fraction of table
>>> size.
>
>> Works for me. Especially if you want to think more about ANALYZE before
>> changing that.
>
> Well, it's something that would be sane to contemplate adding in 8.4.
> It's way too late for any of this other stuff to happen in this release.

I'm thinking about trying to implement this, unless someone else is
already planning to do it.  I'm not sure it's practical to think about
getting this into 8.4 at this point, but it's worth doing whether it
does or not.

The main problem I see here is that I don't know what the mechanism
should be.  My first thought was to use a reloption, but that won't
work because reloptions are per-table and this setting is per-column.
So my second thought is to add a column to pg_attribute and add a new
DDL commands to modify it, maybe something like:

ALTER TABLE name ALTER [COLUMN] column SET NDISTINCT 1672;
ALTER TABLE name ALTER [COLUMN] column DROP NDISTINCT;

Another option would be to invent a reloption-like syntax for columns:

ALTER TABLE name ALTER [COLUMN] column SET (ndistinct = 1672);
ALTER TABLE name ALTER [COLUMN] column RESET (ndistinct);

This sort of seems cleaner and I can imagine it being useful for other
purposes, but I'm not sure if I want to go to the trouble of building
a completely general column-level reloptions mechanism ATM. It's also
not entirely clear to me whether this option, wherever we put it,
should be directly examined by the selectivity functions before
consulting the statistics tuple, or whether it should merely override
the value that ANALYZE puts into the statistics tuple.  The latter
seems simpler and more performant to me, but does lead to the possibly
surprising result that changes don't take effect until the next
ANALYZE.

Thoughts?

...Robert


pgsql-hackers by date:

Previous
From: KaiGai Kohei
Date:
Subject: Re: Updates of SE-PostgreSQL 8.4devel patches (r1710)
Next
From: Tom Lane
Date:
Subject: Re: Should SET ROLE inherit config params?