Re: Merge David and Goliath tables efficiently - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Merge David and Goliath tables efficiently
Date
Msg-id 27cdb9c5-6dc7-87ff-303e-b0d0d20edbc4@enterprisedb.com
Whole thread Raw
In response to Re: Merge David and Goliath tables efficiently  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Merge David and Goliath tables efficiently  (nicolas paris <nicolas.paris@riseup.net>)
List pgsql-performance

On 6/19/23 09:46, Alvaro Herrera wrote:
> I came here to talk about partitionwise join, but then noticed you have
> already thought of that:
> 
> On 2023-Jun-18, nicolas paris wrote:
> 
>> Note that both plan acome from the same partitioned by hash table with
>> 100 parts, with a unique index on the list_id + hash_key. For strategy
>> 2.1, I turned on enable_partitionwise_join, since david table has the
>> same partitioning scheme as goliath including unique indexe. In both
>> case the query is:
> 
> Hmm, I suppose the reason partitionwise join isn't having any effect is
> that the presence of WHEN NOT MATCHED clauses force an outer join, which
> probably disarms partitionwise joining, since each join pair would
> require to match for nulls, so there would be two matching partitions at
> the other end.  A quick test for this hypothesis might be to try the
> MERGE without the WHEN NOT MATCHED clauses and see if partitionwise join
> works better.
> 
> Maybe Tom L's new outer-join infrastructure in 16 allows to improve on
> this, not sure.
> 

Not sure why would that disarm partitionwise join - attached is a simple
reproducer, generating two tables, loading 10000000 and 10000 rows into
them, and then doing explain on a simple merge.

IMHO the thing that breaks it is the ORDER BY in the merge, which likely
acts as an optimization fence and prevents all sorts of smart things
including the partitionwise join. I'd bet that if Nicolas replaces

  MERGE INTO "goliath" ca
  USING (SELECT * FROM "david" ORDER BY "list_id") AS t
  ..

with

  MERGE INTO "goliath" ca
  USING "david" AS t
  ...

it'll start doing the working much better.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Merge David and Goliath tables efficiently
Next
From: nicolas paris
Date:
Subject: Re: Merge David and Goliath tables efficiently