Re: [HACKERS] longer-term optimizer musings - Mailing list pgsql-hackers

From Erik Riedel
Subject Re: [HACKERS] longer-term optimizer musings
Date
Msg-id sqy7Fd600anI01i2E0@andrew.cmu.edu
In response to Re: [HACKERS] longer-term optimizer musings  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
> Interesting.  The problem I see is that trying to do a char(20) column
> with min(A) and max(B) can have 256^19 possible unique values from A to
> B, so it kind of kills many general cases.  Floats have the same
> problem.
>  
Right, in most general cases, there isn't much you can do.

Although, if this seemed important, one could imagine extending 'vacuum
analyze' and pg_statistic to track the number of unique values while
finding the min and max.  Maybe track some fixed number (10?) of unique
attr values and stop searching once the count exceeds 10 (or some tiny
fraction of the tuples in the relation, whatever gives a decent balance
of memory and CPU at analyze time).  Basically, find out whether the
selectivity might be .01% instead of the 10% default you suggest below.
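To make the idea concrete, here is a minimal sketch of that cutoff
tracker -- NOT actual PostgreSQL code.  MAX_TRACKED, DistinctTracker,
tracker_init, and tracker_add are all hypothetical names; the idea is
that 'vacuum analyze' would feed each tuple's attr value through
tracker_add while it scans for min and max anyway.

```c
#include <string.h>

#define MAX_TRACKED 10          /* the fixed number (10?) suggested above */

typedef struct
{
    char seen[MAX_TRACKED][21]; /* room for char(20) values, NUL-terminated */
    int  count;                 /* distinct values so far; -1 = gave up */
} DistinctTracker;

void tracker_init(DistinctTracker *t)
{
    t->count = 0;
}

void tracker_add(DistinctTracker *t, const char *val)
{
    int i;

    if (t->count < 0)
        return;                 /* already exceeded the cutoff */
    for (i = 0; i < t->count; i++)
        if (strcmp(t->seen[i], val) == 0)
            return;             /* seen this value before */
    if (t->count == MAX_TRACKED)
        t->count = -1;          /* too many distinct values; stop tracking */
    else
    {
        strncpy(t->seen[t->count], val, 20);
        t->seen[t->count][20] = '\0';
        t->count++;
    }
}
```

If the scan finishes with count still non-negative, count divided by the
relation's tuple count gives a far better selectivity estimate than a
flat default; if count comes back -1, the optimizer just falls back to
the default.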

This would work for a database that tracks all the CDs owned by "Bob"
and "Alice" even with char(20) first names.  For floats, it wouldn't be
very good for prices at Tiffany's, but should work pretty well for the
Everything's $1 store.

> A nice general fix would be to assume GROUP BY/AGG returns only 10% of
> the existing rows.  I don't even know if an Aggregate without a group by
> knows it only returns one row.  Oops, I guess not:
>  
>         test=> explain select max(relpages) from pg_class;
>         NOTICE:  QUERY PLAN:
>         
>         Aggregate  (cost=2.58 size=0 width=0)
>           ->  Seq Scan on pg_class  (cost=2.58 size=48 width=4)
>         
Yup, this would be easy to add (both the 10% for GROUP BY and the
single row for non-grouped aggregates).  The current code just passes
along the cost and zeros the size and width in all Sort, Group, and
Aggregate nodes (this was the issue flagged as Problem 1 in my message,
where I tried to give line numbers for where it would have to be
fixed).  Note that cost_sort() seems to work reasonably enough, but it
takes a non-obvious "sortkeys" argument that it does nothing with.
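For what it's worth, the heuristic itself is nearly a one-liner.  A
hypothetical helper (the name agg_size_estimate is made up, and this is
not the actual planner code) might look like:

```c
/*
 * Sketch of the size heuristic discussed above: a plain aggregate
 * (no GROUP BY) returns exactly one row; a GROUP BY is assumed to
 * return 10% of its input rows.
 */
int agg_size_estimate(int input_size, int has_group_by)
{
    if (!has_group_by)
        return 1;               /* e.g. select max(relpages) from pg_class */
    /* flat 10% guess; round up so a non-empty input never estimates 0 */
    return (input_size + 9) / 10;
}
```

In the EXPLAIN output quoted above, the Seq Scan has size=48, so the
Aggregate node would then report size=1 rather than size=0.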

> So I guess I am saying that your ideas are good, but we need to walk
> before we can run with this optimizer.
>  
Understood.  I am not discouraged and will continue throwing these
things out as I see them and think I have a reasonable explanation.

Erik


