Re: Maximum statistics target - Mailing list pgsql-hackers
From | Cédric Villemain |
---|---|
Subject | Re: Maximum statistics target |
Date | |
Msg-id | 200803101158.42022.cedric.villemain@dalibo.com |
In response to | Re: Maximum statistics target (Peter Eisentraut <peter_e@gmx.net>) |
List | pgsql-hackers |
On Monday 10 March 2008, Peter Eisentraut wrote:
> On Friday, 7 March 2008, Tom Lane wrote:
> > I'm not wedded to the number 1000 in particular --- obviously that's
> > just a round number.  But it would be good to see some performance tests
> > with larger settings before deciding that we don't need a limit.
>
> Well, I'm not saying we should raise the default statistics target.  But
> setting an arbitrary limit on the grounds that larger values might slow the
> system is like limiting the size of tables because larger tables will cause
> slower queries.  Users should have the option of finding out the best
> balance for themselves.  If there are concerns with larger statistics
> targets, we should document them.  I find nothing about this in the
> documentation at the moment.

I find two things:

«Increasing the target causes a proportional increase in the time and space
needed to do ANALYZE.»
in http://www.postgresql.org/docs/current/static/sql-analyze.html

and

«... at the price of consuming more space in pg_statistic and slightly more
time to compute the estimates»
in http://www.postgresql.org/docs/current/static/planner-stats.html

But neither is probably clear enough about the impact on query planning time.

> > IIRC, eqjoinsel is one of the weak spots, so tests involving planning of
> > joins between two tables with large MCV lists would be a good place to
> > start.
>
> I have run tests with joining two and three tables with 10 million rows
> each, and the planning times seem to be virtually unaffected by the
> statistics target, for values between 10 and 800000.  They all look more or
> less like this:
>
> test=# explain select * from test1, test2 where test1.a = test2.b;
>                                   QUERY PLAN
> -----------------------------------------------------------------------------
>  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=16)
>    Hash Cond: (test1.a = test2.b)
>    ->  Seq Scan on test1  (cost=0.00..144248.00 rows=10000000 width=8)
>    ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=8)
>          ->  Seq Scan on test2  (cost=0.00..144248.00 rows=10000000 width=8)
> (5 rows)
>
> Time: 132,350 ms
>
> and with indexes
>
> test=# explain select * from test1, test2 where test1.a = test2.b;
>                                         QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Merge Join  (cost=210416.65..714072.26 rows=10000000 width=16)
>    Merge Cond: (test1.a = test2.b)
>    ->  Index Scan using test1_index1 on test1  (cost=0.00..282036.13 rows=10000000 width=8)
>    ->  Index Scan using test2_index1 on test2  (cost=0.00..282036.13 rows=10000000 width=8)
> (4 rows)
>
> Time: 168,455 ms
>
> The time to analyze is also quite constant, just before you run out of
> memory. :)  The MaxAllocSize is the limiting factor in all this.  In my
> example, statistics targets larger than about 800000 created pg_statistic
> rows that would have been larger than 1GB, so they couldn't be stored.
>
> I suggest that we get rid of the limit of 1000, adequately document
> whatever issues might exist with large values (possibly not many, see
> above), and add an error message more user-friendly than "invalid memory
> alloc request size" for the cases where the value is too large to be
> storable.

--
Cédric Villemain
Database Administrator
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
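For anyone wanting to reproduce this kind of measurement, here is a minimal
sketch using the table names from the plans above. The generate_series
population, the chosen target value, and the pg_statistic size query are
illustrative assumptions of mine, not details taken from Peter's setup; a
stock build rejects per-column targets above the current limit of 1000,
which is exactly the limit under discussion.

    -- Build two 10-million-row test tables (population method is an assumption).
    CREATE TABLE test1 AS SELECT generate_series(1, 10000000)::bigint AS a;
    CREATE TABLE test2 AS SELECT generate_series(1, 10000000)::bigint AS b;

    -- Raise the per-column statistics target before analyzing; 1000 is the
    -- compiled-in maximum in an unpatched build.
    ALTER TABLE test1 ALTER COLUMN a SET STATISTICS 1000;
    ALTER TABLE test2 ALTER COLUMN b SET STATISTICS 1000;
    ANALYZE test1;
    ANALYZE test2;

    -- With \timing enabled in psql, compare planning time across targets.
    EXPLAIN SELECT * FROM test1, test2 WHERE test1.a = test2.b;

    -- Rough check of how large the statistics rows became (reading
    -- pg_statistic requires superuser privileges).
    SELECT starelid::regclass, staattnum, pg_column_size(s.*) AS row_bytes
    FROM pg_statistic s
    WHERE starelid IN ('test1'::regclass, 'test2'::regclass);

Alternatively, default_statistics_target can be raised in postgresql.conf to
affect all columns at once instead of per-column ALTER TABLE commands.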