Re: Plan time Improvement - 64bit bitmapset - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Plan time Improvement - 64bit bitmapset
Date | 
Msg-id | 603c8f070906031807u73d6c29an53f86a172e3e3312@mail.gmail.com
In response to | Re: Plan time Improvement - 64bit bitmapset (Tom Lane &lt;tgl@sss.pgh.pa.us&gt;)
List | pgsql-hackers
On Wed, Jun 3, 2009 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jun 3, 2009 at 3:18 PM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>> I'm finding that even the ones that had a plan time in the range of
>>> 260 ms go down to 15 ms to 85 ms once the statistics are cached.
>
>> I had some performance results back when we were last looking at
>> default_statistics_target that indicated that the time to repeatedly
>> decompress a toasted statistics array contributed significantly to the
>> total planning time, but my suggestion to disable compression for
>> pg_statistic was summarily poo-poohed for reasons that still aren't
>> quite clear to me.
>
> Well, smaller is better.  Kevin's example demonstrates that it's good
> to have the stats sucked into cache.  If they were uncompressed then
> less of them would fit in however much cache space you have, and
> whatever CPU savings you get would be lost to more I/O to read in
> stats entries.

Smaller-but-compressed isn't better if the limiting factor is CPU rather
than I/O, which I suspect is nearly always the case for pg_statistic.  I
would be very surprised to find out that anyone has a database where
query planning regularly results in disk access.  Sure, there are
cold-cache effects at system startup, and maybe for rarely used tables
whose pg_statistic entries have been paged out to make room for more
heavily used data, but it doesn't make any sense to me to optimize for
those cases at the expense of the common case where you are constantly
executing new and different queries.  In that case, fast access to the
statistics tuples means "I don't need to decompress this", not "I don't
need to read this in from disk".

It wouldn't be so bad if memory-resident tuples, once decompressed,
stayed decompressed at least for the duration of the current query
planning cycle.  But it seemed to me when I looked at this a while back
that every call to eqjoinsel() and the various other functions that look
at statistics to do their thing appears to decompress the tuple all over
again.  That leads to a huge slowdown in query planning beginning, for
the example Greg Stark constructed, right around a statistics target of
90.

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00386.php

The exact point where you hit the slowdown will presumably vary with the
size of the relevant columns, but it's surely going to affect a lot more
people now that we've made the default target 10x larger.

> Of course, this all depends on total database size vs total RAM,
> but that's how I'd interpret the observation.  PG is still mostly
> optimized for databases bigger than RAM, so this decision still
> makes sense.
>
> (I think you could try marking the columns of pg_statistic as "don't
> compress" if you have a DB you want to optimize for all-in-memory
> behavior.)

If you set allow_system_table_mods...

...Robert
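[Editor's note: the experiment Tom alludes to and Robert's allow_system_table_mods reply can be made concrete with standard commands. The sketch below is not part of the original thread; it assumes a superuser session on an 8.4-era server started with system-catalog modifications allowed (allow_system_table_mods), and it touches only columns that actually exist in pg_statistic. It is an unsupported experiment, not a recommended configuration.]

```sql
-- Sketch only: requires superuser and allow_system_table_mods; modifying
-- system catalogs is unsupported and intended purely for benchmarking.
-- EXTERNAL storage keeps the large statistics arrays toastable but never
-- compressed, so the planner skips the repeated decompression step.
ALTER TABLE pg_statistic ALTER COLUMN stanumbers1 SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stanumbers2 SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stanumbers3 SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stavalues1  SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stavalues2  SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stavalues3  SET STORAGE EXTERNAL;

-- Rewrite the statistics arrays under the new storage setting, then compare
-- planning time for a representative query before and after.
ANALYZE;

-- Rough check of how large the stored statistics arrays are per table:
SELECT starelid::regclass, pg_column_size(stavalues1) AS stavalues1_bytes
FROM pg_statistic
ORDER BY 2 DESC NULLS LAST
LIMIT 10;
```

Whether this helps at all depends on the trade-off debated above: uncompressed arrays cost more cache space, but avoid per-planning-cycle decompression.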