Re: Should Oracle outperform PostgreSQL on a complex - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: Should Oracle outperform PostgreSQL on a complex
Date
Msg-id 43A5DF23.5010504@paradise.net.nz
Whole thread Raw
In response to Re: Should Oracle outperform PostgreSQL on a complex  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Should Oracle outperform PostgreSQL on a complex  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-performance
Simon Riggs wrote:
> On Sat, 2005-12-17 at 13:13 -0500, Tom Lane wrote:
>
>>Simon Riggs <simon@2ndquadrant.com> writes:
>>
>>>On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote:
>>>
>>>>How are star joins different from what we do now?
>>
>>>Methods:
>>>1. join all N small tables together in a cartesian product, then join to
>>>main Large table once (rather than N times)
>>
>>Of course, the reason the current planner does not think of this is that
>>it does not consider clauseless joins unless there is no alternative.
>
>
> Understood
>
>
>> The plan you currently get for
>>this sort of scenario is typically a nest of hash joins:
>>
>>                               QUERY PLAN
>>------------------------------------------------------------------------
>> Hash Join  (cost=2.25..4652.25 rows=102400 width=16)
>>   Hash Cond: ("outer".f1 = "inner".f1)
>>   ->  Hash Join  (cost=1.12..3115.12 rows=102400 width=12)
>>         Hash Cond: ("outer".f2 = "inner".f1)
>>         ->  Seq Scan on fact  (cost=0.00..1578.00 rows=102400 width=8)
>>         ->  Hash  (cost=1.10..1.10 rows=10 width=4)
>>               ->  Seq Scan on d2  (cost=0.00..1.10 rows=10 width=4)
>>   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
>>         ->  Seq Scan on d1  (cost=0.00..1.10 rows=10 width=4)
>>(9 rows)
>>
>>This involves only one scan of the fact table.  As each row is pulled up
>>through the nest of hash joins, we hash one dimension key and join to
>>one small table at each level.
>
>
> Understood
>
>
>>This is at worst the same amount of work
>>as hashing all the keys at once and probing a single cartesian-product
>>hashtable, probably less work (fewer wasted key-comparisons).  And
>>definitely less memory.  You do have to keep your eye on the ball that
>>you don't waste a lot of overhead propagating the row up through
>>multiple join levels, but we got rid of most of the problem there in
>>8.1 via the introduction of "virtual tuple slots".  If this isn't fast
>>enough yet, it'd make more sense to invest effort in further cutting the
>>executor's join overhead (which of course benefits *all* plan types)
>>than in trying to make the planner choose a star join.
>
>
> That join type is used when an index-organised table is available, so
> that a SeqScan of the larger table can be avoided.
>
> I'd say the plan would make sense if the columns of the cartesian
> product match a multi-column index on the larger table that would not
> ever be used unless sufficient columns are restricted in each lookup.
> That way you are able to avoid the SeqScan that occurs for the multiple
> nested Hash Join case. (Clearly, normal selectivity rules apply on the
> use of the index in this way).
>
> So I think that plan type still can be effective in some circumstances.
> Mind you: building an N-way index on a large table isn't such a good
> idea, unless you can partition the tables and still use a join. Which is
> why I've not centred on this case as being important before now.
>
> My understanding: Teradata and DB2 use this.
>

FWIW - I think DB2 uses the successive fact RID buildup (i.e method 2),
unfortunately I haven't got a working copy of DB2 in front of me to test.

Cheers

Mark

pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Should Oracle outperform PostgreSQL on a complex
Next
From: Simon Riggs
Date:
Subject: Re: Should Oracle outperform PostgreSQL on a complex