Re: benchmarking the query planner - Mailing list pgsql-hackers

From Robert Haas
Subject Re: benchmarking the query planner
Msg-id 603c8f070904021926g92eb55sdfc68141133957c1@mail.gmail.com
In response to Re: benchmarking the query planner  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List pgsql-hackers
On Thu, Mar 19, 2009 at 4:04 AM, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> >> Works for me. Especially if you want to think more about ANALYZE before
>> >> changing that.
>> >
>> > Well, it's something that would be sane to contemplate adding in 8.4.
>> > It's way too late for any of this other stuff to happen in this release.
>>
>> I'm thinking about trying to implement this, unless someone else is
>> already planning to do it.  I'm not sure it's practical to think about
>> getting this into 8.4 at this point, but it's worth doing whether it
>> does or not.
>
> Can we use get_relation_stats_hook on 8.4? The pg_statistic catalog
> will be still modified by ANALYZEs, but we can rewrite the statistics
> just before it is used.
>
> your_relation_stats_hook(root, rte, attnum, vardata)
> {
>    Call default implementation;
>    if (rte->relid == YourRelation && attnum == YourColumn)
>        ((Form_pg_statistic) (vardata->statsTuple))->stadistinct = YourNDistinct;
> }

I don't know, can you run a query from inside the stats hook?  It
sounds like this could be made to work for a hard-coded relation and
column, but ideally you'd like to get this data out of a table
somewhere.

I started implementing this by adding attdistinct to pg_attribute and
making it a float8, with 0 meaning "don't override the results of the
normal stats computation" and any other value meaning "override the
results of the normal stats computation with this value".  I'm not
sure, however, whether I can count on the result of an equality test
against a floating-point zero to be reliable on every platform.  It
also seems like something of a waste of space, since the only positive
values that are useful are integers (and presumably less than 2^31-1)
and the only negative values that are useful are >= -1.  So I'm
thinking about making it an integer, to be interpreted as follows:

0 => compute ndistinct normally
positive value => use this value for ndistinct
negative value => use this value * 10^-6 for ndistinct
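
To make the encoding above concrete, here is a rough sketch of how the
integer might be decoded.  This is not PostgreSQL source: the helper
name decode_attdistinct and the ATTDISTINCT_SCALE constant are made up
for illustration, and it assumes the usual pg_statistic convention that
a negative ndistinct is a multiplier on the row count.

```c
/* Hypothetical scale factor: negative values are stored in millionths. */
#define ATTDISTINCT_SCALE 1000000.0

/*
 * Hypothetical helper, not part of PostgreSQL: map the proposed integer
 * attdistinct setting onto an ndistinct estimate.
 *
 * "computed" is whatever the normal stats computation produced; it is
 * returned unchanged when attdistinct is 0.
 */
static double
decode_attdistinct(int attdistinct, double computed)
{
    if (attdistinct == 0)
        return computed;                    /* no override requested */
    if (attdistinct > 0)
        return (double) attdistinct;        /* absolute distinct count */
    return attdistinct / ATTDISTINCT_SCALE; /* negative fraction of rows */
}
```

With this sketch, a setting of 1000 would yield an ndistinct of 1000,
-500000 would yield -0.5, and -1000000 would yield -1.  The test for
"no override" is an exact integer comparison, so the floating-point
equality question does not come up.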

Any thoughts?

...Robert

