Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
| From | Tomas Vondra |
| --- | --- |
| Subject | Re: POC: GROUP BY optimization |
| Msg-id | ba2eccf2-d623-c12c-f7c2-a58ac4917c83@enterprisedb.com |
| In response to | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
| Responses | Re: POC: GROUP BY optimization |
| List | pgsql-hackers |
On 7/20/21 3:12 PM, Tomas Vondra wrote:
> ...
>
> The other comments from the review still apply - I'm particularly
> concerned about the (1) point, i.e. plan changes in postgres_fdw. Those
> seem to be rather strange (LIMIT not being pushed down in queries
> without any grouping). I'd bet this is due to changes in sort costing
> and does not seem very desirable.
>

I did look at this a bit closer, and yes - this very much seems like a
costing issue in the patch. The first query in postgres_fdw whose plan
changes switches from a query with the LIMIT pushed to the remote side

                                QUERY PLAN
------------------------------------------------------------------------
 Foreign Scan  (cost=196.29..196.52 rows=10 width=14)
   Output: t1.c1, t2.c1, t1.c3
   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON ... LIMIT ... OFFSET ...
(4 rows)

to the LIMIT executed locally

                                QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=241.72..241.94 rows=10 width=14)
   Output: t1.c1, t2.c1, t1.c3
   ->  Foreign Scan  (cost=239.47..261.97 rows=1000 width=14)
         Output: t1.c1, t2.c1, t1.c3
         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
         Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN ...
(6 rows)

The FDW code runs EXPLAIN on both queries - with and without the LIMIT
pushed to the remote side - to get estimates. Without the patch it gets
this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=106.97..109.47 rows=1000 width=14)
   Sort Key: r1.c3, r1."C 1"
   ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
         Hash Cond: (r1."C 1" = r2."C 1")
         ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
               ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(7 rows)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=96.29..96.32 rows=10 width=14)
   ->  Sort  (cost=96.04..98.54 rows=1000 width=14)
         Sort Key: r1.c3, r1."C 1"
         ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
               Hash Cond: (r1."C 1" = r2."C 1")
               ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
               ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
                     ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(8 rows)

while with the patch it gets this:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=139.47..141.97 rows=1000 width=14)
   Sort Key: r1.c3, r1."C 1"
   ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
         Hash Cond: (r1."C 1" = r2."C 1")
         ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
               ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(7 rows)

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=145.75..145.77 rows=10 width=14)
   ->  Sort  (cost=145.50..148.00 rows=1000 width=14)
         Sort Key: r1.c3, r1."C 1"
         ->  Hash Join  (cost=33.50..57.14 rows=1000 width=14)
               Hash Cond: (r1."C 1" = r2."C 1")
               ->  Seq Scan on "T 1" r1  (cost=0.00..21.00 rows=1000 width=10)
               ->  Hash  (cost=21.00..21.00 rows=1000 width=4)
                     ->  Seq Scan on "T 1" r2  (cost=0.00..21.00 rows=1000 width=4)
(8 rows)

Notice that the costs get "inverted":

             master             patched
----------------------------------------------
 no limit    106.97..109.47     139.47..141.97
 limit        96.29..96.32      145.75..145.77

so the limit looks a bit more expensive - just enough so that it seems
cheaper to transfer more rows and execute the LIMIT locally.
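The Limit node itself does not seem to be the culprit - a rough sanity
check (assuming the Limit cost is simply the underlying Sort cost
interpolated by the fraction of rows fetched, and guessing OFFSET 100,
since only the LIMIT's rows=10 is visible in the plans) reproduces the
numbers above. That suggests the real change is in the Sort estimate: on
master the sort feeding the LIMIT is costed lower than the plain sort
(96.04..98.54 vs. 106.97..109.47), while with the patch it is costed
higher (145.50..148.00 vs. 139.47..141.97).

-- Rough check of the Limit costs above. Assumptions: the Limit node just
-- interpolates the Sort cost by the fraction of rows it fetches, and the
-- OFFSET is 100 (a guess; only LIMIT 10 is visible in the plans).
SELECT
  96.04  + (100 / 1000.0) * (98.54  - 96.04)  AS master_limit_startup,   -- ~96.29
  96.29  + ( 10 / 1000.0) * (98.54  - 96.04)  AS master_limit_total,     -- ~96.32
  145.50 + (100 / 1000.0) * (148.00 - 145.50) AS patched_limit_startup,  -- ~145.75
  145.75 + ( 10 / 1000.0) * (148.00 - 145.50) AS patched_limit_total;    -- ~145.78

Either way, the interpolation only scales what the Sort node already
claims, so the inversion has to originate in the sort costing itself.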
IMO this looks rather suspicious (or wrong), and compute_cpu_sort_cost
may need some fixes. I wonder why it differs from the old code so much ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company