Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Msg-id CAH2-WzkFm4KBWAkDBiN2nOaXLDJ7hLt=+dY3bc10fe266E+=1w@mail.gmail.com
In response to Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Aug 23, 2018 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rebased up to HEAD, per cfbot nagging.  Still no substantive change from
> v2.

I happened to have the opportunity to talk to Tom about this patch in
person. I expressed some very general concerns that are worth
repeating publicly.

This patch is one example of a broader class of optimizer enhancement
aimed primarily at giving star-schema queries more efficient plans, by
arranging to use several independent nested loop joins based on a
common pattern. Each nestloop join has
one particular dimension table on the outer side, and the fact table
on the inner side. The query plan is not so much a tree as it is a DAG
(directed acyclic graph), because the fact table is visited multiple
times. (There are already cases in Postgres in which the query plan is
technically a DAG, actually, but it could be taken much further.)
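
As a rough illustration of that plan shape, here is a minimal Python
sketch (not PostgreSQL code). The tables, column names, and the
star_or_query() helper are all made up for the example. Each arm puts one
dimension table on the outer side and probes a fact-table index on the
inner side, so the fact table is visited once per arm and the results
are combined at the end.

```python
from collections import defaultdict

# Hypothetical toy star schema: one fact table, two dimension tables.
fact = [
    {"id": 1, "cust_id": 10, "prod_id": 100},
    {"id": 2, "cust_id": 11, "prod_id": 101},
    {"id": 3, "cust_id": 12, "prod_id": 100},
    {"id": 4, "cust_id": 10, "prod_id": 102},
]
customer = [
    {"id": 10, "region": "EU"},
    {"id": 11, "region": "US"},
    {"id": 12, "region": "US"},
]
product = [
    {"id": 100, "category": "toys"},
    {"id": 101, "category": "books"},
    {"id": 102, "category": "books"},
]


def build_index(rows, key):
    """Stand-in for an index on one fact-table column."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


fact_by_cust = build_index(fact, "cust_id")
fact_by_prod = build_index(fact, "prod_id")


def nestloop(dim_rows, dim_pred, fact_index):
    """Dimension table on the outer side, fact-table index probe inside."""
    for dim_row in dim_rows:
        if dim_pred(dim_row):
            for fact_row in fact_index.get(dim_row["id"], ()):
                yield fact_row["id"]


def star_or_query():
    # Models: WHERE customer.region = 'EU' OR product.category = 'toys'
    arm1 = nestloop(customer, lambda c: c["region"] == "EU", fact_by_cust)
    arm2 = nestloop(product, lambda p: p["category"] == "toys", fact_by_prod)
    # The same fact row can come out of both arms, so deduplicate by id.
    return sorted(set(arm1) | set(arm2))
```

The fact table appears under both arms, which is what makes the plan a
DAG rather than a tree.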

Aside from being inherently more efficient, DAG-like star schema plans
are also *ideal* targets for parallel query. The executor can execute
each nested loop join in a parallel worker with minimal contention,
since the inner side of each nestloop join probes a different fact
table index from the others. It's almost like executing several different
simple queries concurrently, with some serial processing at the end.
Even that serial processing can sometimes be minimized by having some
of the parallel workers use a Bloom filter in shared memory.
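
The "several simple queries, then some serial processing" idea can be
sketched in Python as follows. This is only an illustration under
stated assumptions: threads stand in for parallel workers, the two
dictionaries stand in for fact-table indexes, and run_arm() and
parallel_star() are hypothetical names. The Bloom filter refinement is
not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact-table indexes: dimension key -> matching fact row ids.
FACT_BY_CUST = {10: [1, 4], 11: [2], 12: [3]}
FACT_BY_PROD = {100: [1, 3], 101: [2], 102: [4]}


def run_arm(dim_keys, fact_index):
    """One nestloop arm: probe a single index for each qualifying key."""
    found = set()
    for key in dim_keys:
        found.update(fact_index.get(key, ()))
    return found


def parallel_star(arms):
    """Run each arm in its own worker, then merge the results serially."""
    with ThreadPoolExecutor(max_workers=len(arms)) as pool:
        futures = [pool.submit(run_arm, keys, index) for keys, index in arms]
        partials = [future.result() for future in futures]
    # Serial step at the end: union the per-arm results, dropping duplicates.
    return sorted(set().union(*partials))


result = parallel_star([([10], FACT_BY_CUST), ([100], FACT_BY_PROD)])
```

Each worker touches a different index, so the arms share no state until
the final merge.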

Tom is already concerned that the optimization added by this patch may
be too much of a special case, which is understandable. It may be that
we're failing to identify some greater opportunity to add DAG-like
plans for star schema queries.

-- 
Peter Geoghegan

