Home > mailing lists

Re: Hash Right join and seq scan - Mailing list pgsql-performance

From	James Pang
Subject	Re: Hash Right join and seq scan
Date	July 8 05:21:58
Msg-id	CAHgTRfdVJXARbUdU7KjhRuTEk8tr7=H1pOrWemJ8eOvSW3VExw@mail.gmail.com Whole thread Raw
In response to	Re: Hash Right join and seq scan (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-performance

Tree view

Sorry for confusion, it's from attached explain output of the SQL. please check attached. my questions is : for nestloop of two partition tables , they use same partition key and equal join on partition key, the cost could be "outer tables estimated rows" * (average index scan of only one partition of inner table) , instead of "outer tables estimated rows" * (index scans of all partitions), is it possible ? or it's still need running time partition pruning enhancement?

random_page_cost = 1.1, seq_page_cost=1.0, effective_cache_size=0.75*physical memory size. set random_page_cost=0.9 make optimizer to choose index scan instead of seq scan.

Thanks,

James

David Rowley <dgrowleyml@gmail.com> 於 2024年7月6日週六上午8:33寫道：

On Sat, 6 Jul 2024 at 02:43, James Pang <jamespang886@gmail.com> wrote:
> for nest loop path, since the first one estimated only "8" rows , and they use partitionkeyid as joinkey and all are hash partitions , is it better to estimate cost to 8 (loop times) * 1600 = 12800 (each one loop map to only 1 hash partition bitmap scan ,avg one partition cost), that's much less than 398917.29 of all partitions ?

I'm not really sure where you're getting the numbers from here. The
outer side of the deepest nested loop has an 8 row estimate, not the
nested loop itself. I'm unsure where the 1600 is from. I only see
1669.

As of now, we don't do a great job of costing for partition pruning
that will happen during execution. We won't be inventing anything to
fix that in existing releases of PostgreSQL, so you'll need to either
adjust the code yourself, or find a workaround.

You've not shown us your schema, but perhaps enable_partitionwise_join
= on might help you. Other things that might help are further lowering
random_page_cost or raising effective_cache_size artificially high.
It's hard to tell from here how much random I/O is being costed into
the index scans. You could determine this by checking if the nested
loop plan costs change as a result of doing further increases to
effective_cache_size. You could maybe nudge it up enough for it to win
over the hash join plan. It is possible that this won't work, however.

> for secondary Nest Loop Anti join could be rows 299118 rows * 15.78(avg index scan cost of one partition) = 4,720,082 that still much less than 132168227.57 ?
> for Hash Right join, is it possible to estimate by 8 seq partition scan instead of all 32 hash partitions since the first query estimated 8 rows only ?
> extend statistics may help estimate count(partitionkeyid) based on other columns bind variables, but looks like that did not help table join case.

I can't quite follow this. You'll need to better explain where you're
getting these numbers for me to be able to understand.

David

Attachment

sql_details.txt

pgsql-performance by date:

From: Vitalii Tymchyshyn
Date: 08 July, 03:42:27
Subject: Re: Low performance between datacenters

From: Justin Pryzby
Date: 08 July, 05:26:56
Subject: Re: Hash Right join and seq scan

Re: Hash Right join and seq scan - Mailing list pgsql-performance

Attachment

Previous

Next