Re: Multi-pass planner - Mailing list pgsql-hackers

From Mischa Sandberg
Subject Re: Multi-pass planner
Date
Msg-id CB0F73E165CFFA4496D12161D835562CF876ABC3@US-COL-EXCHMBX1.green.sophos
Whole thread Raw
In response to Re: Multi-pass planner  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
In a federated database engine I built in the mid-90's,
it more or less ran both plans in parallel, to implement fast-first and min-total cost.
The index query in general started returning rows whose oids went into a filter
that discarded them from the serial query once it started to crank things out.

'index' and 'serial' is not exactly what was going on;
the federated engine was joining multiple tables across multiple (sometimes hundreds)
of databases, with really variable transmission times, and tolerance for timed-out db servers.
It had no reliable way to get cost estimates in advance,
since it didn't have access to statistical metadata (no, Postgres wasn't one
of the databases I had to take as input, more's the pity).

'Parallel' is also not quite accurate. It retained plans (queries were heavily repeated)
and did mickey-mouse simulated annealing of past performance, so that if one of two plans
was the faster 90% of the time, then there was a 10% probability it would run both plans in parallel,
just to check whether something had changed.

The code was part of a proprietary product, and was used by Acxiom and Cisco.
But the concept was pretty simple and as described above.

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Greg Stark
> Sent: Thursday, August 20, 2009 10:32 AM
> To: Robert Haas
> Cc: Kevin Grittner; decibel; Pg Hackers
> Subject: Re: [HACKERS] Multi-pass planner
>
> On Thu, Aug 20, 2009 at 6:28 PM, Greg Stark<gsstark@mit.edu> wrote:
> > I don't think it's a bad idea, I just think you have to set your
> > expectations pretty low. If the estimates are bad there
> isn't really
> > any plan that will be guaranteed to run quickly.
>
> Actually this is usually Tom's point when this topic comes
> up. Say you're deciding between an index scan and a
> sequential scan. The sequential scan has a total cost of
> 1000..1000 but the index scan has an estimated total cost of
> 1..10000. If you pick the sequential scan you might be
> running 1000x slower than the index scan in the worst case.
> But if you pick the index scan you might be running 10x
> slower than the sequential scan in the worst case. If you
> don't trust the estimate where does that leave you? Making a
> mistake either way is fatal.
>
>
> --
> greg
> http://mit.edu/~gsstark/resume.pdf
>
> --
> Sent via pgsql-hackers mailing list
> (pgsql-hackers@postgresql.org) To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: converting between netmask formats
Next
From: Brendan Jurd
Date:
Subject: Re: WIP: generalized index constraints