[HACKERS] Re: Improve OR conditions on joined columns (common star schemaproblem) - Mailing list pgsql-hackers

From Jim Nasby
Subject [HACKERS] Re: Improve OR conditions on joined columns (common star schemaproblem)
Date
Msg-id 3605fcb2-6622-6520-b678-83171fe4c9d8@openscg.com
Whole thread Raw
In response to [HACKERS] Re: Improve OR conditions on joined columns (common star schemaproblem)  (David Steele <david@pgmasters.net>)
Responses Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 3/16/17 11:54 AM, David Steele wrote:
> On 2/14/17 4:03 PM, Tom Lane wrote:
>> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>>> On 2/14/17 1:18 PM, Tom Lane wrote:
>>>> One point that could use further review is whether the de-duplication
>>>> algorithm is actually correct.  I'm only about 95% convinced by the
>>>> argument I wrote in planunionor.c's header comment.
>>
>>> I'll put some thought into it and see if I can find any holes. Are you
>>> only worried about the removal of "useless" rels or is there more?
>>
>> Well, the key point is whether it's really OK to de-dup on the basis
>> of only the CTIDs that are not eliminated in any UNION arm.  I was
>> feeling fairly good about that until I thought of the full-join-to-
>> left-join-to-no-join conversion issue mentioned in the comment.
>> Now I'm wondering if there are other holes; or maybe I'm wrong about
>> that one and it's not necessary to be afraid of full joins.
>
> This patch applies cleanly (with offsets) and compiles at cccbdde.
>
> Jim, have you had time to think about this?  Any insights?

Having thought about it, I share Tom's concerns. And now I'm worried 
about what happens if there are multiple separate OR clauses. I guess 
those would be handled by separate UNIONs?

I'm also finding it a bit hard to follow the comment Tom mentioned. I'm 
pretty sure I understand what's going on until this part:

> The identical proof can be expected to apply
> +  * in other arms, except in an arm that references that rel in its version
> +  * of the OR clause.  But in such an arm, we have effectively added a
> +  * restriction clause to what is known in other arms, which means that the
> +  * set of rows output by that rel can't increase compared to other arms.

AIUI, this is describing a case something like this:

SELECT child.blah FROM child LEFT JOIN parent USING(parent_id)  WHERE child.foo AND (child.baz=1 or child.baz=2)

given that parent.parent_id is unique. Except for these concerns, there 
would need to be a complex OR somewhere in here that sometimes 
referenced parent and sometimes didn't, such as
  WHERE child.foo AND (child.baz=1 OR parent.foo=3)

But I'm not following the logic here (very possibly because I'm wrong 
about what I said above):

> +  * Therefore the situation in such an arm must be that including the rel
> +  * could result in either zero or one output row, rather than exactly one
> +  * output row as in other arms.  So we still don't need to consider it for
> +  * de-duplication.

I'm definitely not certain about the rest of it.

Tom, could you expand the description some, especially with some examples?
-- 
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG                 http://OpenSCG.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Composite IS NULL again, this time with plpgsql
Next
From: Jim Nasby
Date:
Subject: Re: [HACKERS] PinBuffer() no longer makes use of strategy