Re: [HACKERS] Improve OR conditions on joined columns (common starschema problem) - Mailing list pgsql-hackers

From	Claudio Freire
Subject	Re: [HACKERS] Improve OR conditions on joined columns (common starschema problem)
Date	February 10, 2017 12:18:27
Msg-id	CAGTBQpZ0JYHqsLikPB=ZB+Qzvqu5pjRcpeY97Zmf+g8dMKEmug@mail.gmail.com
In response to	Re: [HACKERS] Improve OR conditions on joined columns (common starschema problem) (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List	pgsql-hackers

On Thu, Feb 9, 2017 at 9:50 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> WHERE t1 IN ('a','b') OR t2 IN ('c','d')
>
> into
>
> WHERE f1 IN (1,2) OR f2 IN (3,4)
>
> (assuming a,b,c,d maps to 1,2,3,4)
>
> BTW, there's an important caveat here: users generally do NOT want duplicate
> rows from the fact table if the dimension table results aren't unique. I
> thought my array solution was equivalent to what the JOINs would do in that
> case but that's actually wrong. The array solution does provide the behavior
> users generally want here though. JOIN is the easiest tool to pick up for
> this, so it's what people gravitate to, but I suspect most users would be
> happier with a construct that worked like the array trick does, but was
> easier to accomplish.
>
> I wonder if any other databases have come up with non-standard syntax to do
> this.

What I've been doing is applying those transforms (tn -> fn) in
application code. While it's a chore, the improvement in plans is
usually well worth the trouble.

If there's an FK between the fact and dimension tables, you can be
certain the transform will yield equivalent results, because you'll be
certain the joins don't duplicate rows.
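
A small sketch of that application-side transform, run against SQLite
purely to compare result sets (it says nothing about PostgreSQL plans).
The tables dim1, dim2 and fact, their contents, and the helper ids_for
are all hypothetical; only the column names t1, t2, f1, f2 come from the
example quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE dim1 (id INTEGER PRIMARY KEY, t1 TEXT);
    CREATE TABLE dim2 (id INTEGER PRIMARY KEY, t2 TEXT);
    CREATE TABLE fact (
        id INTEGER PRIMARY KEY,
        f1 INTEGER NOT NULL REFERENCES dim1(id),
        f2 INTEGER NOT NULL REFERENCES dim2(id)
    );
    INSERT INTO dim1 VALUES (1, 'a'), (2, 'b'), (5, 'e');
    INSERT INTO dim2 VALUES (3, 'c'), (4, 'd'), (6, 'f');
    INSERT INTO fact VALUES (10, 1, 6), (11, 5, 3), (12, 5, 6), (13, 2, 4);
""")


def ids_for(table, column, values):
    """Hypothetical helper: resolve dimension values (tn) to key values (fn)."""
    placeholders = ",".join("?" * len(values))
    sql = f"SELECT id FROM {table} WHERE {column} IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, values)]


# Original form: the OR spans columns of two joined dimension tables.
joined = [row[0] for row in conn.execute("""
    SELECT fact.id
    FROM fact
    JOIN dim1 ON fact.f1 = dim1.id
    JOIN dim2 ON fact.f2 = dim2.id
    WHERE dim1.t1 IN ('a', 'b') OR dim2.t2 IN ('c', 'd')
    ORDER BY fact.id
""")]

# Transformed form: look the keys up first, then filter the fact table alone.
f1_ids = ids_for("dim1", "t1", ["a", "b"])
f2_ids = ids_for("dim2", "t2", ["c", "d"])
f1_marks = ",".join("?" * len(f1_ids))
f2_marks = ",".join("?" * len(f2_ids))
transformed = [row[0] for row in conn.execute(
    f"SELECT id FROM fact WHERE f1 IN ({f1_marks}) OR f2 IN ({f2_marks}) ORDER BY id",
    f1_ids + f2_ids,
)]

print(joined, transformed)
```

Because each fact row references exactly one row in each dimension, the
joins neither drop nor duplicate fact rows, so both queries select the
same fact ids.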

So the transform should be rather general and useful.

If you have a join of the form:

a join b on a.f1 = b.id

where a.f1 has an FK referencing b.id, and a filter X of any form on b,
you can turn the plan into:

with b_ids as (select id from b where X)
...
a join b on a.f1 = b.id and a.f1 in (select id from b_ids)

For this to be useful, the expected row count from b_ids should be rather small.
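
To check that the rewritten form returns the same rows, here is a
minimal sketch using SQLite as a stand-in. The table contents, the label
column, and the concrete filter standing in for X are assumptions made
for illustration; whether the rewrite actually improves a PostgreSQL
plan is not something this demonstrates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE a (
        id INTEGER PRIMARY KEY,
        f1 INTEGER NOT NULL REFERENCES b(id)
    );
    INSERT INTO b VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO a VALUES (100, 1), (101, 2), (102, 2), (103, 3);
""")

# Plain join, with the filter X (here: label IN ('x', 'y')) applied to b.
plain_rows = conn.execute("""
    SELECT a.id, b.label
    FROM a JOIN b ON a.f1 = b.id
    WHERE b.label IN ('x', 'y')
    ORDER BY a.id
""").fetchall()

# Rewritten form: collect the matching b ids first, then restrict a.f1 to them.
# Since a.f1 = b.id, restricting a.f1 to b_ids already implies X on the joined b row.
rewritten_rows = conn.execute("""
    WITH b_ids AS (SELECT id FROM b WHERE label IN ('x', 'y'))
    SELECT a.id, b.label
    FROM a JOIN b ON a.f1 = b.id AND a.f1 IN (SELECT id FROM b_ids)
    ORDER BY a.id
""").fetchall()

print(plain_rows)
print(rewritten_rows)
```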
