Re: Maximum statistics target - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: Maximum statistics target
Date
Msg-id 200803101158.42022.cedric.villemain@dalibo.com
In response to Re: Maximum statistics target  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
On Monday 10 March 2008, Peter Eisentraut wrote:
> On Friday, 7 March 2008, Tom Lane wrote:
> > I'm not wedded to the number 1000 in particular --- obviously that's
> > just a round number.  But it would be good to see some performance tests
> > with larger settings before deciding that we don't need a limit.
>
> Well, I'm not saying we should raise the default statistics target.  But
> setting an arbitrary limit on the grounds that larger values might slow the
> system is like limiting the size of tables because larger tables will cause
> slower queries.  Users should have the option of finding out the best
> balance for themselves.  If there are concerns with larger statistics
> targets, we should document them.  I find nothing about this in the
> documentation at the moment.

I find two things:
«Increasing the target causes a proportional increase in the time and space
needed to do ANALYZE. »
in http://www.postgresql.org/docs/current/static/sql-analyze.html
and
« ... at the price of consuming more space in pg_statistic and slightly more
time to compute the estimates»
in http://www.postgresql.org/docs/current/static/planner-stats.html

But they are probably not clear enough about the impact on query planning time.
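
For reference, this is how one would raise the target for a single column and
re-collect its statistics; a minimal sketch, where the table and column names
are just placeholders:

  -- per-column target, then gather the larger sample
  ALTER TABLE test1 ALTER COLUMN a SET STATISTICS 1000;
  ANALYZE VERBOSE test1;

  -- or change the default for the whole session before analyzing
  SET default_statistics_target = 1000;
  ANALYZE test1;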


>
> > IIRC, egjoinsel is one of the weak spots, so tests involving planning of
> > joins between two tables with large MCV lists would be a good place to
> > start.
>
> I have run tests with joining two and three tables with 10 million rows
> each, and the planning times seem to be virtually unaffected by the
> statistics target, for values between 10 and 800000.  They all look more or
> less like this:
>
> test=# explain select * from test1, test2 where test1.a = test2.b;
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=16)
>    Hash Cond: (test1.a = test2.b)
>    ->  Seq Scan on test1  (cost=0.00..144248.00 rows=10000000 width=8)
>    ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=8)
>          ->  Seq Scan on test2  (cost=0.00..144248.00 rows=10000000 width=8)
> (5 rows)
>
> Time: 132,350 ms
>
> and with indexes
>
> test=# explain select * from test1, test2 where test1.a = test2.b;
>                                          QUERY PLAN
> --------------------------------------------------------------------------------------------
>  Merge Join  (cost=210416.65..714072.26 rows=10000000 width=16)
>    Merge Cond: (test1.a = test2.b)
>    ->  Index Scan using test1_index1 on test1  (cost=0.00..282036.13 rows=10000000 width=8)
>    ->  Index Scan using test2_index1 on test2  (cost=0.00..282036.13 rows=10000000 width=8)
> (4 rows)
>
> Time: 168,455 ms
>
> The time to analyze is also quite constant, just before you run out of
> memory. :)  The MaxAllocSize is the limiting factor in all this.  In my
> example, statistics targets larger than about 800000 created pg_statistic
> rows that would have been larger than 1GB, so they couldn't be stored.
>
> I suggest that we get rid of the limit of 1000, adequately document
> whatever issues might exist with large values (possibly not many, see
> above), and add an error message more user-friendly than "invalid memory
> alloc request size" for the cases where the value is too large to be
> storable.
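
For what it's worth, a setup along the following lines should reproduce the
kind of test described above, assuming a build where the 1000 cap has been
lifted (the table layout and target value are my guesses, not Peter's actual
script):

  CREATE TABLE test1 (a integer);
  CREATE TABLE test2 (b integer);
  INSERT INTO test1 SELECT generate_series(1, 10000000);
  INSERT INTO test2 SELECT generate_series(1, 10000000);

  ALTER TABLE test1 ALTER COLUMN a SET STATISTICS 800000;
  ALTER TABLE test2 ALTER COLUMN b SET STATISTICS 800000;
  ANALYZE test1;
  ANALYZE test2;

  -- EXPLAIN does not run the query, so the time reported by \timing
  -- is dominated by planning
  \timing
  EXPLAIN SELECT * FROM test1, test2 WHERE test1.a = test2.b;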

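And a rough way to watch how close the collected statistics get to the 1 GB
MaxAllocSize ceiling mentioned above (only approximate, since the large arrays
in pg_statistic are stored TOASTed and possibly compressed):

  -- overall space consumed by the statistics catalog, TOAST included
  SELECT pg_size_pretty(pg_total_relation_size('pg_statistic'));

  -- stored size of the arrays in the first statistics slot (usually the MCV list)
  SELECT starelid::regclass AS tab, staattnum,
         pg_column_size(stavalues1)  AS stavalues1_bytes,
         pg_column_size(stanumbers1) AS stanumbers1_bytes
    FROM pg_statistic
   WHERE starelid IN ('test1'::regclass, 'test2'::regclass);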


--
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
