Re: benchmarking the query planner - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: benchmarking the query planner
Date:
Msg-id: 1229094409.13078.283.camel@ebony.2ndQuadrant
In response to: Re: benchmarking the query planner (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Fri, 2008-12-12 at 09:35 -0500, Tom Lane wrote:

> AFAICS, marginal enlargements in the sample size aren't going to help
> much for ndistinct --- you really need to look at most or all of the
> table to be guaranteed anything about that.
> 
> But having said that, I have wondered whether we should consider
> allowing the sample to grow to fill maintenance_work_mem, rather than
> making it a predetermined number of rows.  One difficulty is that the
> random-sampling code assumes it has a predetermined rowcount target;
> I haven't looked at whether that'd be easy to change or whether we'd
> need a whole new sampling algorithm.
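
(For context: the predetermined-rowcount assumption is what you get with
reservoir-style sampling, which keeps a fixed-size reservoir while it scans.
A rough Python sketch of the idea, not the actual analyze.c code:

    import random

    def reservoir_sample(rows, k):
        # Keep a uniform random sample of exactly k rows from a stream of
        # unknown length.  The reservoir size k must be fixed before the
        # scan starts -- the predetermined-target assumption above.
        sample = []
        for i, row in enumerate(rows):
            if i < k:
                sample.append(row)
            else:
                # Replace an existing entry with probability k / (i + 1)
                j = random.randint(0, i)
                if j < k:
                    sample[j] = row
        return sample
)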

I think we need to do block sampling before we increase sample size. On
large tables we currently do roughly one I/O per sampled row, so the I/O
cost of ANALYZE would just increase linearly with the sample size.
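
To put a rough number on that (made-up table sizes, and the usual
with-replacement approximation of Yao's formula):

    def expected_pages_read(total_pages, rows_per_page, sample_rows):
        # Expected number of distinct heap pages touched when sample_rows
        # rows are picked uniformly at random from the whole table.
        total_rows = total_pages * rows_per_page
        p_page_missed = (1.0 - rows_per_page / total_rows) ** sample_rows
        return total_pages * (1.0 - p_page_missed)

    # A 1M-page table with 100 rows per page: a 30,000-row sample touches
    # about 29,500 pages (essentially one read per sampled row), and a 10x
    # larger sample touches about 259,000 -- reads grow with the sample.
    print(expected_pages_read(1_000_000, 100, 30_000))
    print(expected_pages_read(1_000_000, 100, 300_000))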

We need the increased sample size for ndistinct, not for other stats. So
I would suggest we harvest a larger sample, use that for ndistinct
estimation, but then sample-the-sample to minimise processing time for
other stats that aren't as sensitive as ndistinct.
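
Something along these lines (a sketch only; estimate_ndistinct() is just a
stand-in for whatever estimator ANALYZE would really use, and target_rows
stands in for the usual 300 * statistics-target rows):

    import random
    from collections import Counter

    def estimate_ndistinct(sample):
        # Stand-in: a real estimator would scale the distinct count seen
        # in the sample up to the whole table (e.g. a Haas/Stokes-style
        # formula).
        return len(set(sample))

    def analyze_column(large_sample, target_rows=30000):
        # Use the whole (larger) sample for ndistinct, which needs it...
        ndistinct = estimate_ndistinct(large_sample)

        # ...then sample the sample for the remaining stats, so their
        # processing cost stays roughly what it is today.
        if len(large_sample) > target_rows:
            small_sample = random.sample(large_sample, target_rows)
        else:
            small_sample = large_sample
        mcvs = Counter(small_sample).most_common(100)
        return ndistinct, mcvs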

Currently we fail badly on columns that have been CLUSTERed and we can
improve that significantly by looking at adjacent groups of rows, i.e.
block sampling.
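
A block-sampling pass could look something like this (read_block() is a
hypothetical stand-in for fetching one heap page):

    import random

    def block_sample(read_block, total_blocks, target_rows, rows_per_block):
        # Choose whole blocks at random and take every row from each one.
        # One read now yields rows_per_block sampled rows instead of one,
        # and the sample keeps the adjacent-row groupings that matter for
        # CLUSTERed data.
        blocks_needed = max(1, target_rows // rows_per_block)
        chosen = random.sample(range(total_blocks),
                               min(blocks_needed, total_blocks))
        sample = []
        for blkno in sorted(chosen):   # sorted keeps the reads sequential-ish
            sample.extend(read_block(blkno))
        return sample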

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support


