Home > mailing lists

Performance of Query 2 in TPC-H - Mailing list pgsql-performance

From	Ba Jinsheng
Subject	Performance of Query 2 in TPC-H
Date	November 4, 2024 11:42:52
Msg-id	SEZPR06MB649480C104C2479D0F0D8ACD8A512@SEZPR06MB6494.apcprd06.prod.outlook.com Whole thread Raw
List	pgsql-performance

Tree view

Please see this case:

TPC-H query 2:

select

s_acctbal,

s_name,

n_name,

p_partkey,

p_mfgr,

s_address,

s_phone,

s_comment

from

PART,

SUPPLIER,

PARTSUPP,

NATION,

REGION

where

p_partkey = ps_partkey

and s_suppkey = ps_suppkey

and p_size = 30

and p_type like '%STEEL'

and s_nationkey = n_nationkey

and n_regionkey = r_regionkey

and r_name = 'ASIA'

and ps_supplycost = (

select

min(ps_supplycost)

from

PARTSUPP,

SUPPLIER,

NATION,

REGION

where

p_partkey = ps_partkey

and s_suppkey = ps_suppkey

and s_nationkey = n_nationkey

and n_regionkey = r_regionkey

and r_name = 'ASIA'

)

order by

s_acctbal desc,

n_name,

s_name,

p_partkey

limit

100;

Its query plan and execution time:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=66275.04..66275.05 rows=1 width=192) (actual time=268.349..268.418 rows=100 loops=1)

-> Sort (cost=66275.04..66275.05 rows=1 width=192) (actual time=268.348..268.411 rows=100 loops=1)

Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey

Sort Method: top-N heapsort Memory: 70kB

-> Hash Join (cost=37831.01..66275.03 rows=1 width=192) (actual time=230.386..268.130 rows=485 loops=1)

Hash Cond: ((part.p_partkey = partsupp.ps_partkey) AND ((SubPlan 1) = partsupp.ps_supplycost))

-> Gather (cost=1000.00..6425.40 rows=784 width=30) (actual time=0.586..0.753 rows=826 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on part (cost=0.00..5347.00 rows=327 width=30) (actual time=0.082..16.979 rows=275 loops=3)

Filter: (((p_type)::text ~~ '%STEEL'::text) AND (p_size = 30))

Rows Removed by Filter: 66391

-> Hash (cost=30524.01..30524.01 rows=160000 width=172) (actual time=228.502..228.506 rows=160240 loops=1)

Buckets: 65536 Batches: 8 Memory Usage: 4648kB

-> Hash Join (cost=408.01..30524.01 rows=160000 width=172) (actual time=4.820..165.744 rows=160240 loops=1)

Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)

-> Seq Scan on partsupp (cost=0.00..25516.00 rows=800000 width=14) (actual time=0.013..63.459 rows=800000 loops=1)

-> Hash (cost=383.01..383.01 rows=2000 width=166) (actual time=4.789..4.792 rows=2003 loops=1)

Buckets: 2048 Batches: 1 Memory Usage: 413kB

-> Hash Join (cost=2.51..383.01 rows=2000 width=166) (actual time=0.098..3.945 rows=2003 loops=1)

Hash Cond: (supplier.s_nationkey = nation.n_nationkey)

-> Seq Scan on supplier (cost=0.00..323.00 rows=10000 width=144) (actual time=0.013..2.060 rows=10000 loops=1)

-> Hash (cost=2.45..2.45 rows=5 width=30) (actual time=0.053..0.055 rows=5 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Hash Join (cost=1.07..2.45 rows=5 width=30) (actual time=0.043..0.049 rows=5 loops=1)

Hash Cond: (nation.n_regionkey = region.r_regionkey)

-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=34) (actual time=0.008..0.010 rows=25 loops=1)

-> Hash (cost=1.06..1.06 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on region (cost=0.00..1.06 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)

Filter: (r_name = 'ASIA'::bpchar)

Rows Removed by Filter: 4

SubPlan 1

-> Aggregate (cost=48.70..48.71 rows=1 width=32) (actual time=0.018..0.018 rows=1 loops=1311)

-> Nested Loop (cost=0.85..48.70 rows=1 width=6) (actual time=0.013..0.017 rows=1 loops=1311)

Join Filter: (region_1.r_regionkey = nation_1.n_regionkey)

Rows Removed by Join Filter: 3

-> Seq Scan on region region_1 (cost=0.00..1.06 rows=1 width=4) (actual time=0.000..0.001 rows=1 loops=1311)

Filter: (r_name = 'ASIA'::bpchar)

Rows Removed by Filter: 4

-> Nested Loop (cost=0.85..47.58 rows=4 width=10) (actual time=0.009..0.016 rows=4 loops=1311)

-> Nested Loop (cost=0.71..46.96 rows=4 width=10) (actual time=0.008..0.012 rows=4 loops=1311)

-> Index Scan using partsupp_pkey on partsupp partsupp_1 (cost=0.42..13.75 rows=4 width=10) (actual time=0.006..0.007 rows=4 loops=1311)

Index Cond: (ps_partkey = part.p_partkey)

-> Index Scan using supplier_pkey on supplier supplier_1 (cost=0.29..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5244)

Index Cond: (s_suppkey = partsupp_1.ps_suppkey)

-> Index Scan using nation_pkey on nation nation_1 (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5244)

Index Cond: (n_nationkey = supplier_1.s_nationkey)

Planning Time: 2.613 ms

Execution Time: 268.610 ms

(50 rows)

After applying this patch:

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c

index 5be8da9e09..0f11b1cbdf 100644

--- a/src/backend/optimizer/path/joinpath.c

+++ b/src/backend/optimizer/path/joinpath.c

@@ -449,7 +449,7 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,

*binary_mode = false;

/* Add join clauses from param_info to the hash key */

- if (param_info != NULL)

+ if (false)

{

List *clauses = param_info->ppi_clauses;

The query plan and execution time:

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=6713.81..6713.81 rows=1 width=192) (actual time=86.461..86.532 rows=100 loops=1)

-> Sort (cost=6713.81..6713.81 rows=1 width=192) (actual time=86.460..86.526 rows=100 loops=1)

Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey

Sort Method: top-N heapsort Memory: 69kB

-> Nested Loop (cost=1000.72..6713.80 rows=1 width=192) (actual time=0.855..86.150 rows=485 loops=1)

Join Filter: (region.r_regionkey = nation.n_regionkey)

-> Nested Loop (cost=1000.72..6712.72 rows=1 width=196) (actual time=0.852..85.448 rows=485 loops=1)

Join Filter: (nation.n_nationkey = supplier.s_nationkey)

Rows Removed by Join Filter: 6381

-> Nested Loop (cost=1000.72..6711.16 rows=1 width=170) (actual time=0.847..83.975 rows=485 loops=1)

-> Nested Loop (cost=1000.43..6710.86 rows=1 width=34) (actual time=0.828..83.116 rows=485 loops=1)

-> Gather (cost=1000.00..6425.40 rows=784 width=30) (actual time=0.579..7.037 rows=826 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on part (cost=0.00..5347.00 rows=327 width=30) (actual time=0.094..17.686 rows=275 loops=3)

Filter: (((p_type)::text ~~ '%STEEL'::text) AND (p_size = 30))

Rows Removed by Filter: 66391

-> Memoize (cost=0.43..133.03 rows=1 width=14) (actual time=0.076..0.092 rows=1 loops=826)

Cache Key:

Cache Mode: logical

Hits: 0 Misses: 826 Evictions: 825 Overflows: 0 Memory Usage: 1kB

-> Index Scan using partsupp_pkey on partsupp (cost=0.42..133.02 rows=1 width=14) (actual time=0.075..0.091 rows=1 loops=826)

Index Cond: (ps_partkey = part.p_partkey)

Filter: (ps_supplycost = (SubPlan 1))

Rows Removed by Filter: 3

SubPlan 1

-> Aggregate (cost=30.11..30.12 rows=1 width=32) (actual time=0.018..0.018 rows=1 loops=3304)

-> Nested Loop (cost=0.86..30.11 rows=1 width=6) (actual time=0.013..0.017 rows=1 loops=3304)

Join Filter: (region_1.r_regionkey = nation_1.n_regionkey)

Rows Removed by Join Filter: 3

-> Seq Scan on region region_1 (cost=0.00..1.06 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3304)

Filter: (r_name = 'ASIA'::bpchar)

Rows Removed by Filter: 4

-> Nested Loop (cost=0.86..29.00 rows=4 width=10) (actual time=0.005..0.016 rows=4 loops=3304)

-> Nested Loop (cost=0.72..28.37 rows=4 width=10) (actual time=0.004..0.011 rows=4 loops=3304)

-> Index Scan using partsupp_pkey on partsupp partsupp_1 (cost=0.42..13.75 rows=4 width=10) (actual time=0.002..0.002 rows=4 loops=3304)

Index Cond: (ps_partkey = part.p_partkey)

-> Memoize (cost=0.30..8.31 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=13216)

Cache Key:

Cache Mode: logical

Hits: 0 Misses: 13216 Evictions: 13215 Overflows: 0 Memory Usage: 1kB

-> Index Scan using supplier_pkey on supplier supplier_1 (cost=0.29..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=13216)

Index Cond: (s_suppkey = partsupp_1.ps_suppkey)

-> Index Scan using nation_pkey on nation nation_1 (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=13216)

Index Cond: (n_nationkey = supplier_1.s_nationkey)

-> Index Scan using supplier_pkey on supplier (cost=0.29..0.30 rows=1 width=144) (actual time=0.001..0.001 rows=1 loops=485)

Index Cond: (s_suppkey = partsupp.ps_suppkey)

-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=34) (actual time=0.000..0.001 rows=14 loops=485)

-> Seq Scan on region (cost=0.00..1.06 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=485)

Filter: (r_name = 'ASIA'::bpchar)

Rows Removed by Filter: 2

Planning Time: 2.669 ms

Execution Time: 86.712 ms

(53 rows)

The estimated cost is reduced by 90%, and the execution time is reduced by 68%. The second query plan includes the operation Memoize, while the first query plan does not. I am wondering if we can optimize the logic anywhere to enable the second query plan.

Environment:

For PostgreSQL, I used the default configuration file.

For the hardware, my disk is HDD.

For the benchmark, I used 1 GB data, and my entire data folder can be downloaded here: https://drive.google.com/file/d/1ZBLHanIRwxbaMQIhRUSPv4I7y8g_0AWi/view?usp=sharing

The connection string is: postgresql://ubuntu:ubuntu@127.0.0.1:5432/tpch"

tpch=# select version();

version

--------------------------------------------------------------------------------------------------

PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit

(1 row)

Best regards,

Jinsheng Ba

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.

pgsql-performance by date:

From: Stepan Yankevych
Date: 04 November 2024, 11:23:41
Subject: Re: Postgresql 14/15/16/17 partition pruning on dependent table during join

From: Andrei Lepikhov
Date: 04 November 2024, 15:05:44
Subject: Re: Postgresql 14/15/16/17 partition pruning on dependent table during join

Performance of Query 2 in TPC-H - Mailing list pgsql-performance

Previous

Next