Home > mailing lists

Re: benchmarking the query planner - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: benchmarking the query planner
Date	December 12, 2008 13:34:39
Msg-id	1229103209.8673.62.camel@ebony.2ndQuadrant Whole thread
In response to	Re: benchmarking the query planner (Gregory Stark <stark@enterprisedb.com>)
Responses	Re: benchmarking the query planner
List	pgsql-hackers

Tree view

On Fri, 2008-12-12 at 17:05 +0000, Gregory Stark wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> 
> > The amount of I/O could stay the same, just sample all rows on block.
> > Lifting the sample size will help large tables. Will it be perfect? No.
> > But I'll take "better" over "not working at all".
> 
> That will just raise the table size at which the problems start. It'll still
> be a constant-sized sample.

Work with me here. I want to make the situation better. It still won't
be perfect, but is that an argument against any action at all?

> It will also introduce strange biases. For instance in a clustered table it'll
> think there are a lot more duplicates than there really are because it'll see
> lots of similar values.
> 
> Incidentally we *do* do block sampling. We pick random blocks and then pick
> random records within those blocks. This was new in, uh, 7.4? 8.0? Sometime
> around then. It dramatically reduced the i/o requirements but there were long
> discussions of how to do it without introducing biases.

No, we pick random rows. On bigger tables, they get further apart
typically and so we miss any clustering. I mean that we should pick a
random block and read all rows on it.

> > If we are going to quote literature we should believe all the
> > literature. We can't just listen to some people that did a few tests
> > with sample size, but then ignore the guy that designed the MS optimizer
> > and many others.
> 
> I'm not sure what you're talking about regarding "some people that did a few
> tests". I looked around for the paper I keep referencing and can't find it on
> my laptop. I'll look for it online. But it included a section which was a
> survey of past results from other papers and the best results required
> stupidly large sample sizes to get anything worthwhile. 

Even if you find it, we still need to know why we would listen to the
research in the absent paper yet ignore the conclusion in the paper by
the man in charge of the MS optimizer who said that block level sampling
is a good idea.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

pgsql-hackers by date:

From: "Robert Haas"
Date: 12 December 2008, 13:31:17
Subject: Re: Updates of SE-PostgreSQL 8.4devel patches (r1268)

From: Teodor Sigaev
Date: 12 December 2008, 13:37:19
Subject: Re: [PATCHES] GIN improvements

Re: benchmarking the query planner - Mailing list pgsql-hackers

Previous

Next