Re: [GENERAL] Performance of full outer join in 8.3 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [GENERAL] Performance of full outer join in 8.3
Date
Msg-id 1239858730.23905.125.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: [GENERAL] Performance of full outer join in 8.3  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [GENERAL] Performance of full outer join in 8.3  (Robert Haas <robertmhaas@gmail.com>)
Re: [GENERAL] Performance of full outer join in 8.3  (David Fetter <david@fetter.org>)
List pgsql-hackers
On Wed, 2009-04-15 at 20:58 -0400, Robert Haas wrote:
> On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > The output of EXPLAIN is nowhere near stable enough to use within the
> > current exact-match regression test framework.  I'm not sure it would
> > be stable even if we suppressed the rowcount and cost figures.  Those
> > figures vary across platforms (because of alignment effects and probably
> > other things) and are also sensitive to the timing of autovacuums.  It
> > is known that a nontrivial fraction of the existing regression test
> > cases do suffer from uninteresting plan changes across platforms or
> > as a result of various phase-of-the-moon effects; that's why we keep
> > having to add "ORDER BY" clauses now and then.
> 
> Interesting.  I suppose you could insulate yourself from this somewhat
> by populating pg_statistic with a particular set of values rather than
> relying on ANALYZE to gather them, but this would have the substantial
> downside of being way more work to maintain, especially if anyone ever
> changed pg_statistic.
> 
> On a more practical level, I do think we need to give real
> consideration to some kind of options syntax for EXPLAIN, maybe
> something as simple as:
> 
> EXPLAIN (option_name, ...) query
> 
> Or maybe:
> 
> EXPLAIN (option_name = value, ...) query
> 
> It may or may not be the case that generating a useful regression test
> suite for the planner is too much work for anyone to bother, but they
> certainly won't if the tools aren't available.  It seems we get at
> least one request a month for some kind of explain-output option:
> suppress row counts, suppress costs, gather I/O statistics, show
> outputs, show # of batches for a hash join, and on and on and on.  I
> think we should implement a very basic version that maybe does nothing
> more than let you optionally suppress some of the existing output, but
> which provides an extensible syntax for others to build on.

I think the way to do this is to introduce plan output in XML (that
matches the node structure of the plan). We can then filter away any
junk we don't want to see for regression tests, or better still augment
the exact-match framework with a fuzzy-match spec that allows us to
specify a range of values.

The skill would be in constructing a set of tests that was not sensitive
to minor changes. The OP's join for example had a huge cost range
difference that would have clearly shown up in a regression test.

This will only move forward if it adds value directly for Tom, so if
it's worth doing then he needs to specify it and ask for someone to do
it. There will be someone available if the task is well defined.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Unicode string literals versus the world
Next
From: Fujii Masao
Date:
Subject: Re: Why isn't stats_temp_directory automatically created?