Re: Hash Right join and seq scan - Mailing list pgsql-performance

From David Rowley
Subject Re: Hash Right join and seq scan
Date
Msg-id CAApHDvo7SGgPoKW8KzBXVK_eY3dqanhDyJeizHDBb78uYxMrhA@mail.gmail.com
Whole thread Raw
In response to Re: Hash Right join and seq scan  (James Pang <jamespang886@gmail.com>)
Responses Re: Hash Right join and seq scan
List pgsql-performance
On Sat, 6 Jul 2024 at 02:43, James Pang <jamespang886@gmail.com> wrote:
>            for nest loop path, since the first one estimated only "8" rows , and they use partitionkeyid as joinkey
andall are hash partitions , is it better to estimate cost to  8 (loop times) * 1600 = 12800 (each one loop  map to
only1 hash partition bitmap scan ,avg one partition cost), that's much less than 398917.29 of all partitions ? 

I'm not really sure where you're getting the numbers from here. The
outer side of the deepest nested loop has an 8 row estimate, not the
nested loop itself.  I'm unsure where the 1600 is from. I only see
1669.

As of now, we don't do a great job of costing for partition pruning
that will happen during execution.  We won't be inventing anything to
fix that in existing releases of PostgreSQL, so you'll need to either
adjust the code yourself, or find a workaround.

You've not shown us your schema, but perhaps enable_partitionwise_join
= on might help you. Other things that might help are further lowering
random_page_cost or raising effective_cache_size artificially high.
It's hard to tell from here how much random I/O is being costed into
the index scans.  You could determine this by checking if the nested
loop plan costs change as a result of doing further increases to
effective_cache_size. You could maybe nudge it up enough for it to win
over the hash join plan. It is possible that this won't work, however.

>  for secondary Nest Loop Anti join could be rows 299118 rows *  15.78(avg index scan cost of one partition) =
4,720,082that still much less than  132168227.57 ? 
>          for Hash Right join, is it possible to estimate by  8 seq partition scan instead of all 32 hash partitions
sincethe first query estimated 8 rows only ? 
>          extend statistics may help estimate count(partitionkeyid) based on other columns bind variables, but looks
likethat did not help table join case. 

I can't quite follow this. You'll need to better explain where you're
getting these numbers for me to be able to understand.

David



pgsql-performance by date:

Previous
From: James Pang
Date:
Subject: Re: Hash Right join and seq scan
Next
From: Tamás PAPP
Date:
Subject: Low performance between datacenters