Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Removing unneeded self joins
Date
Msg-id CA+TgmoZ_HbxXopVvwtrne7kh-hz+Smp1LwjrFzBrq_cGPXL_NA@mail.gmail.com
Whole thread Raw
In response to Re: Removing unneeded self joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, May 16, 2018 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 17 May 2018 at 08:44, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> What I was advocating was an approach that varies according to the
>>> query cost, so we don't waste time trying to tune the heck out of OLTP
>>> queries, but for larger queries we might take a more considered
>>> approach.
>
>> That's tricky. If we do this, it should be done before Path
>> generation, so not much is known about the costs in those case.
>
> Yeah.  It'd have to be a very heuristic thing that doesn't account
> for much beyond the number of relations in the query, and maybe their
> sizes --- although I don't think we even know the latter at the
> point where join removal would be desirable.  (And note that one of
> the desirable benefits of join removal is not having to find out the
> sizes of removed rels ... so just swapping that around doesn't appeal.)

As I've mentioned before, the problem we're talking about here is also
highly relevant to parallel query.  We only want to bother generating
partial paths for queries that are expensive enough to justify
considering parallelism, but we don't know how expensive they are
until we finishing planning.  The only way I could think of to tackle
that problem was to drive it off the relation sizes, but the presence
or absence of expensive functions in the target list can wildly change
the point at which parallelism potentially becomes useful.  So we end
up sometimes wasting effort generating partial paths that are totally
useless, and at other times failing to generate partial paths that
would have been useful.  (Really, I'd like to generate a lot more
partial paths than we do, trying out various numbers of workers, but
that would just make the existing problem worse.)

I have wondered about doing a preliminary pass over the tree where we
try to make a crude estimate of the amount of effort involved, and
then planning for real with that number in hand.  But it seems like
there's so little information you could get at that early stage that
it would be hard to decide anything useful on that basis.  You've got
to at least have relation sizes, and really you need some estimate of
the result cardinality as well.  It seems like we currently can't
figure out cardinality without also computing paths, and I've wondered
if we could split those into two separate phases.  But every time I
think about trying to do that I realize that my pain tolerance isn't
that high.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server
Next
From: Robert Haas
Date:
Subject: Re: Question about xmloption and pg_restore