Thread: Why is DEFAULT_FDW_TUPLE_COST so insanely low?
Over on [1] I was complaining that I thought DEFAULT_FDW_TUPLE_COST,
which is defined as 0.01, was unrealistically low.

For comparison, cpu_tuple_cost, something we probably expect to be in
a CPU cache, is also 0.01.  We've defined DEFAULT_PARALLEL_TUPLE_COST
to be 0.1, which is 10x cpu_tuple_cost, and that's coming from a shared
memory segment.  So why do we think DEFAULT_FDW_TUPLE_COST should be
the same as cpu_tuple_cost when that's probably pulling a tuple from
some remote server over some (possibly slow) network?

I did a little experiment in the attached .sql file and did some maths
to try to figure out what it's really likely to be costing us.  I tried
this with and without the attached hack, which stops the planner from
considering remote grouping, just to see how much slower pulling a
million tuples through the FDW would be.

I set up a loopback server on localhost (which has about the lowest
possible network latency) and found the patched query to the foreign
server took:

Execution Time: 530.000 ms

This is pulling all million tuples over and doing the aggregate locally.

Unpatched, the query took:

Execution Time: 35.334 ms

so about 15x faster.

If I take the seqscan cost for querying the local table, which is
14425.00, multiply that by 15 (the extra time it took to pull the 1
million tuples), then divide by 1 million to get the extra cost per
tuple, that comes to about 0.216.  So that says DEFAULT_FDW_TUPLE_COST
is about 21x lower than it should be.

I tried cranking DEFAULT_FDW_TUPLE_COST up to 0.5 to see what plans
would change in the postgres_fdw regression tests, and quite a number
changed.  Many seem to be pushing the sorts down to the remote server
where they were being done locally before.  A few others just seem
weird.  For example, the first one seems to be blindly adding a remote
sort when it does no good.  I think it would take quite a bit of study
with a debugger to figure out what's going on with many of these.
Does anyone have any ideas why DEFAULT_FDW_TUPLE_COST was set so low?

Does anyone object to it being set to something more realistic?

David

[1] https://www.postgresql.org/message-id/CAApHDvpXiXLxg4TsA8P_4etnuGQqAAbHWEOM4hGe=DCaXmi_jA@mail.gmail.com
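[Editor's note: as a quick sanity check on the arithmetic above, here is the calculation in a few lines of Python. This sketch is not part of the original mail; the variable names are mine.]

```python
# Scale the local seqscan cost by the observed slowdown of the FDW query,
# then spread the extra cost over the million tuples transferred.
seqscan_cost = 14425.00          # planner cost of the local Seq Scan
slowdown = 530.000 / 35.334      # patched (pull all tuples) vs unpatched runtime
rows = 1_000_000

extra_cost_per_tuple = seqscan_cost * slowdown / rows
print(round(extra_cost_per_tuple, 3))   # ~0.216, about 21x the 0.01 default
```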
Has anything been done about this issue?

---------------------------------------------------------------------------

On Wed, Aug 3, 2022 at 02:56:12AM +1200, David Rowley wrote:
> Over on [1] I was complaining that I thought DEFAULT_FDW_TUPLE_COST,
> which is defined as 0.01, was unrealistically low.
>
> For comparison, cpu_tuple_cost, something we probably expect to be in
> a CPU cache, is also 0.01.  We've defined DEFAULT_PARALLEL_TUPLE_COST
> to be 0.1, which is 10x cpu_tuple_cost.  That's coming from a shared
> memory segment.  So why do we think DEFAULT_FDW_TUPLE_COST should be
> the same as cpu_tuple_cost when that's probably pulling a tuple from
> some remote server over some (possibly slow) network?
>
> I did a little experiment in the attached .sql file and did some maths
> to try to figure out what it's really likely to be costing us.  I tried
> this with and without the attached hack to have the planner not
> consider remote grouping, just to see how much slower pulling a million
> tuples through the FDW would be.
>
> I set up a loopback server on localhost (which has about the lowest
> possible network latency) and found the patched query to the foreign
> server took:
>
> Execution Time: 530.000 ms
>
> This is pulling all million tuples over and doing the aggregate locally.
>
> Unpatched, the query took:
>
> Execution Time: 35.334 ms
>
> so about 15x faster.
>
> If I take the seqscan cost for querying the local table, which is
> 14425.00, multiply that by 15 (the extra time it took to pull the 1
> million tuples), then divide by 1 million to get the extra cost per
> tuple, then that comes to about 0.216.  So that says
> DEFAULT_FDW_TUPLE_COST is about 21x lower than it should be.
>
> I tried cranking DEFAULT_FDW_TUPLE_COST up to 0.5 to see what plans
> would change in the postgres_fdw regression tests and quite a number
> changed.  Many seem to be pushing the sorts down to the remote server
> where they were being done locally before.  A few others just seem
> weird.  For example, the first one seems to be blindly adding a remote
> sort when it does no good.  I think it would take quite a bit of study
> with a debugger to figure out what's going on with many of these.
>
> Does anyone have any ideas why DEFAULT_FDW_TUPLE_COST was set so low?
>
> Does anyone object to it being set to something more realistic?
>
> David
>
> [1] https://www.postgresql.org/message-id/CAApHDvpXiXLxg4TsA8P_4etnuGQqAAbHWEOM4hGe=DCaXmi_jA@mail.gmail.com
>
> diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> index 64632db73c..b4e3b91d7f 100644
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -3921,7 +3921,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
>  	 * If there is an FDW that's responsible for all baserels of the query,
>  	 * let it consider adding ForeignPaths.
>  	 */
> -	if (grouped_rel->fdwroutine &&
> +	if (0 && grouped_rel->fdwroutine &&
>  		grouped_rel->fdwroutine->GetForeignUpperPaths)
>  		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
>  													  input_rel, grouped_rel,
>
> ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
> SELECT pg_reload_conf();
>
> CREATE EXTENSION postgres_fdw;
> CREATE EXTENSION pg_prewarm;
>
> DO $d$
>     BEGIN
>         EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
>             OPTIONS (dbname '$$||current_database()||$$',
>                      port '$$||current_setting('port')||$$'
>             )$$;
>     END;
> $d$;
>
> CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
>
> CREATE TABLE public.t (a INT);
> INSERT INTO t SELECT x FROM generate_series(1,1000000) x;
> VACUUM FREEZE ANALYZE t;
> SELECT pg_prewarm('t');
>
> CREATE FOREIGN TABLE ft (
>     a INT
> ) SERVER loopback OPTIONS (schema_name 'public', table_name 't');
>
> EXPLAIN (ANALYZE) SELECT COUNT(*) FROM ft;
> EXPLAIN (ANALYZE) SELECT COUNT(*) FROM t;
>
> DROP FOREIGN TABLE ft;
> DROP TABLE t;
> DROP SERVER loopback CASCADE;
> ALTER SYSTEM RESET max_parallel_workers_per_gather;
> SELECT pg_reload_conf();
>
> --- "expected\\postgres_fdw.out"	2022-08-03 01:34:42.806967000 +1200
> +++ "results\\postgres_fdw.out"	2022-08-03 02:33:40.719712900 +1200
> @@ -2164,8 +2164,8 @@
>  -- unsafe conditions on one side (c8 has a UDT), not pushed down.
>  EXPLAIN (VERBOSE, COSTS OFF)
>  SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
> -                                  QUERY PLAN
> ------------------------------------------------------------------------------
> +                                                          QUERY PLAN
> +------------------------------------------------------------------------------------------------------------------------------
>   Limit
>     Output: t1.c1, t2.c1, t1.c3
>     ->  Sort
> @@ -2182,7 +2182,7 @@
>                 ->  Foreign Scan on public.ft1 t1
>                       Output: t1.c1, t1.c3
>                       Filter: (t1.c8 = 'foo'::user_enum)
> -                     Remote SQL: SELECT "C 1", c3, c8 FROM "S 1"."T 1"
> +                     Remote SQL: SELECT "C 1", c3, c8 FROM "S 1"."T 1" ORDER BY c3 ASC NULLS LAST, "C 1" ASC NULLS LAST
>  (17 rows)
>
>  SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
> @@ -2873,13 +2873,13 @@
>   Sort
>     Output: (sum(c1)), c2
>     Sort Key: (sum(ft1.c1))
> -   ->  HashAggregate
> +   ->  GroupAggregate
>           Output: sum(c1), c2
>           Group Key: ft1.c2
>           Filter: (avg((ft1.c1 * ((random() <= '1'::double precision))::integer)) > '100'::numeric)
>           ->  Foreign Scan on public.ft1
>                 Output: c1, c2
> -               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
> +               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" ORDER BY c2 ASC NULLS LAST
>  (10 rows)
>
>  -- Remote aggregate in combination with a local Param (for the output
> @@ -3123,12 +3123,12 @@
>   Sort
>     Output: (sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision))), c2
>     Sort Key: (sum(ft1.c1) FILTER (WHERE ((((ft1.c1 / ft1.c1))::double precision * random()) <= '1'::double precision)))
> -   ->  HashAggregate
> +   ->  GroupAggregate
>           Output: sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision)), c2
>           Group Key: ft1.c2
>           ->  Foreign Scan on public.ft1
>                 Output: c1, c2
> -               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
> +               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" ORDER BY c2 ASC NULLS LAST
>  (9 rows)
>
>  explain (verbose, costs off)
> @@ -3885,24 +3885,21 @@
>  -- subquery using stable function (can't be sent to remote)
>  PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
>  EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
> -                                                QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
> - Sort
> +                                                            QUERY PLAN
> +----------------------------------------------------------------------------------------------------------------------------------
> + Nested Loop Semi Join
>    Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -   Sort Key: t1.c1
> -   ->  Nested Loop Semi Join
> +   Join Filter: (t1.c3 = t2.c3)
> +   ->  Foreign Scan on public.ft1 t1
>          Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -         Join Filter: (t1.c3 = t2.c3)
> -         ->  Foreign Scan on public.ft1 t1
> -               Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
> -         ->  Materialize
> +         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC NULLS LAST
> +   ->  Materialize
> +         Output: t2.c3
> +         ->  Foreign Scan on public.ft2 t2
>                Output: t2.c3
> -         ->  Foreign Scan on public.ft2 t2
> -               Output: t2.c3
> -               Filter: (date(t2.c4) = '01-17-1970'::date)
> -               Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
> -(15 rows)
> +               Filter: (date(t2.c4) = '01-17-1970'::date)
> +               Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
> +(12 rows)
>
>  EXECUTE st2(10, 20);
>   c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
> @@ -9381,21 +9378,19 @@
>  -- test FOR UPDATE; partitionwise join does not apply
>  EXPLAIN (COSTS OFF)
>  SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
> -                          QUERY PLAN
> --------------------------------------------------------------
> +                       QUERY PLAN
> +--------------------------------------------------------
>   LockRows
> -   ->  Sort
> -         Sort Key: t1.a
> -         ->  Hash Join
> -               Hash Cond: (t2.b = t1.a)
> +   ->  Nested Loop
> +         Join Filter: (t1.a = t2.b)
> +         ->  Append
> +               ->  Foreign Scan on ftprt1_p1 t1_1
> +               ->  Foreign Scan on ftprt1_p2 t1_2
> +         ->  Materialize
>                ->  Append
>                      ->  Foreign Scan on ftprt2_p1 t2_1
>                      ->  Foreign Scan on ftprt2_p2 t2_2
> -         ->  Hash
> -               ->  Append
> -                     ->  Foreign Scan on ftprt1_p1 t1_1
> -                     ->  Foreign Scan on ftprt1_p2 t1_2
> -(12 rows)
> +(10 rows)
>
>  SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
>    a | b
> @@ -9430,18 +9425,16 @@
>  SET enable_partitionwise_aggregate TO false;
>  EXPLAIN (COSTS OFF)
>  SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
> -                        QUERY PLAN
> -----------------------------------------------------------
> - Sort
> -   Sort Key: pagg_tab.a
> -   ->  HashAggregate
> -         Group Key: pagg_tab.a
> -         Filter: (avg(pagg_tab.b) < '22'::numeric)
> -         ->  Append
> -               ->  Foreign Scan on fpagg_tab_p1 pagg_tab_1
> -               ->  Foreign Scan on fpagg_tab_p2 pagg_tab_2
> -               ->  Foreign Scan on fpagg_tab_p3 pagg_tab_3
> -(9 rows)
> +                     QUERY PLAN
> +-----------------------------------------------------
> + GroupAggregate
> +   Group Key: pagg_tab.a
> +   Filter: (avg(pagg_tab.b) < '22'::numeric)
> +   ->  Append
> +         ->  Foreign Scan on fpagg_tab_p1 pagg_tab_1
> +         ->  Foreign Scan on fpagg_tab_p2 pagg_tab_2
> +         ->  Foreign Scan on fpagg_tab_p3 pagg_tab_3
> +(7 rows)
>
>  -- Plan with partitionwise aggregates is enabled
>  SET enable_partitionwise_aggregate TO true;
> @@ -9475,34 +9468,32 @@
>  -- Should have all the columns in the target list for the given relation
>  EXPLAIN (VERBOSE, COSTS OFF)
>  SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
> -                              QUERY PLAN
> ------------------------------------------------------------------------
> - Sort
> -   Output: t1.a, (count(((t1.*)::pagg_tab)))
> +                                 QUERY PLAN
> +--------------------------------------------------------------------------------------------
> + Merge Append
>     Sort Key: t1.a
> -   ->  Append
> -         ->  HashAggregate
> -               Output: t1.a, count(((t1.*)::pagg_tab))
> -               Group Key: t1.a
> -               Filter: (avg(t1.b) < '22'::numeric)
> -               ->  Foreign Scan on public.fpagg_tab_p1 t1
> -                     Output: t1.a, t1.*, t1.b
> -                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1
> -         ->  HashAggregate
> -               Output: t1_1.a, count(((t1_1.*)::pagg_tab))
> -               Group Key: t1_1.a
> -               Filter: (avg(t1_1.b) < '22'::numeric)
> -               ->  Foreign Scan on public.fpagg_tab_p2 t1_1
> -                     Output: t1_1.a, t1_1.*, t1_1.b
> -                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2
> -         ->  HashAggregate
> -               Output: t1_2.a, count(((t1_2.*)::pagg_tab))
> -               Group Key: t1_2.a
> -               Filter: (avg(t1_2.b) < '22'::numeric)
> -               ->  Foreign Scan on public.fpagg_tab_p3 t1_2
> -                     Output: t1_2.a, t1_2.*, t1_2.b
> -                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3
> -(25 rows)
> +   ->  GroupAggregate
> +         Output: t1.a, count(((t1.*)::pagg_tab))
> +         Group Key: t1.a
> +         Filter: (avg(t1.b) < '22'::numeric)
> +         ->  Foreign Scan on public.fpagg_tab_p1 t1
> +               Output: t1.a, t1.*, t1.b
> +               Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1 ORDER BY a ASC NULLS LAST
> +   ->  GroupAggregate
> +         Output: t1_1.a, count(((t1_1.*)::pagg_tab))
> +         Group Key: t1_1.a
> +         Filter: (avg(t1_1.b) < '22'::numeric)
> +         ->  Foreign Scan on public.fpagg_tab_p2 t1_1
> +               Output: t1_1.a, t1_1.*, t1_1.b
> +               Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2 ORDER BY a ASC NULLS LAST
> +   ->  GroupAggregate
> +         Output: t1_2.a, count(((t1_2.*)::pagg_tab))
> +         Group Key: t1_2.a
> +         Filter: (avg(t1_2.b) < '22'::numeric)
> +         ->  Foreign Scan on public.fpagg_tab_p3 t1_2
> +               Output: t1_2.a, t1_2.*, t1_2.b
> +               Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3 ORDER BY a ASC NULLS LAST
> +(23 rows)
>
>  SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
>   a  | count
> @@ -9518,24 +9509,23 @@
>  -- When GROUP BY clause does not match with PARTITION KEY.
>  EXPLAIN (COSTS OFF)
>  SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
> -                            QUERY PLAN
> -----------------------------------------------------------------
> - Sort
> -   Sort Key: pagg_tab.b
> -   ->  Finalize HashAggregate
> -         Group Key: pagg_tab.b
> -         Filter: (sum(pagg_tab.a) < 700)
> -         ->  Append
> -               ->  Partial HashAggregate
> -                     Group Key: pagg_tab.b
> -                     ->  Foreign Scan on fpagg_tab_p1 pagg_tab
> -               ->  Partial HashAggregate
> -                     Group Key: pagg_tab_1.b
> -                     ->  Foreign Scan on fpagg_tab_p2 pagg_tab_1
> -               ->  Partial HashAggregate
> -                     Group Key: pagg_tab_2.b
> -                     ->  Foreign Scan on fpagg_tab_p3 pagg_tab_2
> -(15 rows)
> +                         QUERY PLAN
> +-----------------------------------------------------------
> + Finalize GroupAggregate
> +   Group Key: pagg_tab.b
> +   Filter: (sum(pagg_tab.a) < 700)
> +   ->  Merge Append
> +         Sort Key: pagg_tab.b
> +         ->  Partial GroupAggregate
> +               Group Key: pagg_tab.b
> +               ->  Foreign Scan on fpagg_tab_p1 pagg_tab
> +         ->  Partial GroupAggregate
> +               Group Key: pagg_tab_1.b
> +               ->  Foreign Scan on fpagg_tab_p2 pagg_tab_1
> +         ->  Partial GroupAggregate
> +               Group Key: pagg_tab_2.b
> +               ->  Foreign Scan on fpagg_tab_p3 pagg_tab_2
> +(14 rows)
>
>  -- ===================================================================
>  -- access rights and superuser

--
Bruce Momjian  <bruce@momjian.us>        https://momjian.us
EDB                                      https://enterprisedb.com

Only you can decide what is important to you.
On Sun, 29 Oct 2023 at 12:45, Bruce Momjian <bruce@momjian.us> wrote:
> Has anything been done about this issue?

Nothing has been done.  I was hoping to get the attention of a few
people who have dealt more with postgres_fdw in the past.

I've attached a patch which adjusts DEFAULT_FDW_TUPLE_COST, setting it
to 0.2.  I set it to this because my experiment in [1] showed that it
was about 21x lower than the actual costs (on my machine, with a
loopback FDW connecting to the same instance and database using my
example query).  Given that we have parallel_tuple_cost set to 0.1 by
default, a network cost of 0.2 per tuple from an FDW seems reasonable
to me.  Slightly higher is probably also reasonable, but given the
seeming lack of complaints, I think I'd rather err on the low side.

Changing it to 0.2, I see 4 plans change in postgres_fdw's regression
tests.  All of these changes are due to STD_FUZZ_FACTOR causing some
other plan to win in add_path().

For example, with the query:

EXPLAIN (VERBOSE, ANALYZE) SELECT a, sum(b), min(b), count(*)
FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;

the plan switches from a HashAggregate to a GroupAggregate.  This is
because, after increasing DEFAULT_FDW_TUPLE_COST to 0.2, the sorted
append child (fuzzily) costs the same as the unsorted seq scan path,
and the sorted path wins in add_path() due to having better pathkeys.
The seq scan path is then thrown away and we end up doing the Group
Aggregate using the sorted append children.

If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans
no longer change when I do:

alter server loopback options (add fdw_tuple_cost '0.01');
<run the query>
alter server loopback options (drop fdw_tuple_cost);
<run the query>

Ordinarily, I'd not care too much about that, but I did test the
performance of one of the plans and the new plan came out slower than
the old one.

I'm not exactly sure how best to proceed on this in the absence of any
feedback.
David

[1] https://postgr.es/m/CAApHDvopVjjfh5c1Ed2HRvDdfom2dEpMwwiu5-f1AnmYprJngA@mail.gmail.com
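[Editor's note: the fuzzy path comparison behaviour described above can be sketched in a few lines. This is my own simplification, not the PostgreSQL source: add_path() treats two paths whose costs are within STD_FUZZ_FACTOR (1.01) of each other as costing the same, and then lets secondary merits such as pathkeys decide the winner.]

```python
STD_FUZZ_FACTOR = 1.01  # PostgreSQL's fuzz factor for path cost comparisons

def fuzzy_winner(cost_a, has_pathkeys_a, cost_b, has_pathkeys_b):
    """Pick between two competing paths, fuzzily, roughly as add_path() does."""
    if cost_a > cost_b * STD_FUZZ_FACTOR:
        return "b"                      # a is clearly more expensive
    if cost_b > cost_a * STD_FUZZ_FACTOR:
        return "a"                      # b is clearly more expensive
    # Costs are "fuzzily the same": prefer the path with a useful sort order.
    if has_pathkeys_a != has_pathkeys_b:
        return "a" if has_pathkeys_a else "b"
    return "tie"

# The sorted path costs 0.5% more but provides pathkeys, so it wins —
# which is how the unsorted seq scan path gets thrown away above.
print(fuzzy_winner(1005.0, True, 1000.0, False))
```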
It looks like the value goes all the way back to
d0d75c402217421b691050857eb3d7af82d0c770.  The comment there describes
it as "above and beyond cpu_tuple_cost", so it was certainly expected
to be higher than cpu_tuple_cost.

I have no memory of this.  But looking at the surrounding code, I think
DEFAULT_FDW_STARTUP_COST takes care of network costs and bandwidth, so
DEFAULT_FDW_TUPLE_COST is just the cost of assembling a row from the
bytes received over the network.  That might have been equated with
assembling a row from a heap buffer.  But I think you are right: it
should be comparable to the parallel tuple cost, which at least goes
through IPC such as a socket.

This will also mean that operations which reduce the number of rows
will be favoured and pushed down, which is what is desired.

--
Best Wishes,
Ashutosh Bapat

On Mon, Oct 30, 2023 at 6:52 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sun, 29 Oct 2023 at 12:45, Bruce Momjian <bruce@momjian.us> wrote:
> > Has anything been done about this issue?
>
> Nothing has been done.  I was hoping to get the attention of a few
> people who have dealt more with postgres_fdw in the past.
>
> I've attached a patch which adjusts DEFAULT_FDW_TUPLE_COST, setting it
> to 0.2.  I set it to this because my experiment in [1] showed that it
> was about 21x lower than the actual costs (on my machine with a
> loopback fdw connecting to the same instance and database using my
> example query).  Given that we have parallel_tuple_cost set to 0.1 by
> default, the network cost of a tuple from an FDW of 0.2 seems
> reasonable to me.  Slightly higher is probably also reasonable, but
> given the seeming lack of complaints, I think I'd rather err on the
> low side.
>
> Changing it to 0.2, I see 4 plans change in postgres_fdw's regression
> tests.  All of these changes are due to STD_FUZZ_FACTOR causing some
> other plan to win in add_path().
>
> For example the query EXPLAIN (VERBOSE, ANALYZE) SELECT a, sum(b),
> min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY
> 1; the plan switches from a HashAggregate to a GroupAggregate.  This is
> because after increasing the DEFAULT_FDW_TUPLE_COST to 0.2 the sorted
> append child (fuzzily) costs the same as the unsorted seq scan path
> and the sorted path wins in add_path due to having better pathkeys.
> The seq scan path is then thrown away and we end up doing the Group
> Aggregate using the sorted append children.
>
> If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans
> no longer change when I do:
>
> alter server loopback options (add fdw_tuple_cost '0.01');
> <run the query>
> alter server loopback options (drop fdw_tuple_cost);
> <run the query>
>
> Ordinarily, I'd not care too much about that, but I did test the
> performance of one of the plans and the new plan came out slower than
> the old one.
>
> I'm not exactly sure how best to proceed on this in the absence of any feedback.
>
> David
>
> [1] https://postgr.es/m/CAApHDvopVjjfh5c1Ed2HRvDdfom2dEpMwwiu5-f1AnmYprJngA@mail.gmail.com
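[Editor's note: the split described above — a one-off startup charge for connection setup and latency, plus a per-row charge for transfer — can be sketched roughly as follows. This is my own simplification, not the actual postgres_fdw costing code; the function name and `local_run_cost` parameter are illustrative, while the 100.0 startup default and the 0.2 vs 0.01 tuple costs come from the thread.]

```python
# fdw_startup_cost: charged once per foreign scan (connection setup, latency).
# fdw_tuple_cost: charged for every row pulled across the network.
def foreign_scan_cost(rows, local_run_cost,
                      fdw_startup_cost=100.0,  # postgres_fdw default
                      fdw_tuple_cost=0.2):     # default after this change (was 0.01)
    startup_cost = fdw_startup_cost
    total_cost = startup_cost + local_run_cost + fdw_tuple_cost * rows
    return startup_cost, total_cost

# With the old 0.01 default, transferring a million rows added only 10,000
# cost units; at 0.2 it adds 200,000, so plans that reduce the row count
# before transfer (remote grouping, remote sorts) become far more attractive.
print(foreign_scan_cost(1_000_000, 14425.0))
print(foreign_scan_cost(1_000_000, 14425.0, fdw_tuple_cost=0.01))
```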
On Mon, Oct 30, 2023 at 02:22:08PM +1300, David Rowley wrote:
> If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans
> no longer change when I do:
>
> alter server loopback options (add fdw_tuple_cost '0.01');
> <run the query>
> alter server loopback options (drop fdw_tuple_cost);
> <run the query>
>
> Ordinarily, I'd not care too much about that, but I did test the
> performance of one of the plans and the new plan came out slower than
> the old one.
>
> I'm not exactly sure how best to proceed on this in the absence of any feedback.

I think you just go and change it.  Your number is better than what we
have, and if someone wants to suggest a better number, we can change it
later.

--
Bruce Momjian  <bruce@momjian.us>        https://momjian.us
EDB                                      https://enterprisedb.com

Only you can decide what is important to you.
On Tue, 31 Oct 2023 at 03:01, Bruce Momjian <bruce@momjian.us> wrote:
> I think you just go and change it.  Your number is better than what we
> have, and if someone wants to suggest a better number, we can change it
> later.

I did some more experimentation on the actual costs of getting a tuple
from a foreign server.  Using the attached setup, I did:

postgres=# explain (analyze, timing off) SELECT * FROM t;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..144248.48 rows=10000048 width=4) (actual rows=10000000 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 385.978 ms

postgres=# explain (analyze, timing off) SELECT * FROM ft;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Foreign Scan on ft  (cost=100.00..244348.96 rows=10000048 width=4) (actual rows=10000000 loops=1)
 Planning Time: 0.126 ms
 Execution Time: 8335.392 ms

So, let's take the first query and figure out the total cost per
millisecond of execution time.  We can then multiply that by the
execution time of the second query to calculate what we might expect
the costs to be for the foreign table scan, based on how long it took
compared to the local table scan.

postgres=# select 144248.48/385.978*8335.392;
          ?column?
-----------------------------
 3115119.5824740270171341280

So, the above number is what we expect the foreign table scan to cost,
on the assumption that the cost per millisecond is about right for the
local scan.  We can then work out how much we'll need to charge for a
foreign tuple by subtracting the foreign scan's total cost from our
calculated value, giving the extra we need to charge in total, then
dividing that by the number of tuples to get the actual foreign tuple
cost for this query.

postgres=# select (3115119.58-244348.96)/10000000;
        ?column?
------------------------
 0.28707706200000000000

This is on an AMD 3990x running a Linux 6.5 kernel.
I tried the same on an Apple M2 mini and got:

postgres=# select 144247.77/257.763*3052.084;
          ?column?
-----------------------------
 1707988.7759402241595200680

postgres=# select (1707988.78-244347.54)/10000000;
        ?column?
------------------------
 0.14636412400000000000

So the actual foreign tuple cost on the M2 seems to be about half of
what it is on the Zen 2 machine.

Based on this, I agree with my original analysis that setting
DEFAULT_FDW_TUPLE_COST to 0.2 is about right.  Of course, this is a
loopback connection onto localhost, so remote networks would likely
benefit from higher values; but based on this, 0.01 is far too low and
we should change it to at least 0.2.

I'd be happy if anyone else would like to try the same experiment to
see if there's some other value of DEFAULT_FDW_TUPLE_COST that might
suit better.

David
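[Editor's note: the extrapolation above generalises to a small helper — scale the local scan's cost-per-millisecond by the foreign scan's runtime, then spread the excess over the rows returned. This sketch and its names are mine, not part of the original mail; the numbers are the ones reported above.]

```python
def implied_fdw_tuple_cost(local_cost, local_ms, foreign_cost, foreign_ms, rows):
    """Estimate the per-tuple surcharge a foreign scan should carry,
    assuming the local scan's cost-per-millisecond is about right."""
    expected_foreign_cost = local_cost / local_ms * foreign_ms
    return (expected_foreign_cost - foreign_cost) / rows

# Zen 2 (AMD 3990x) numbers from the thread
zen2 = implied_fdw_tuple_cost(144248.48, 385.978, 244348.96, 8335.392, 10_000_000)
# Apple M2 numbers from the thread
m2 = implied_fdw_tuple_cost(144247.77, 257.763, 244347.54, 3052.084, 10_000_000)

print(round(zen2, 4))  # ~0.2871
print(round(m2, 4))    # ~0.1464
```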
On Tue, 31 Oct 2023 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
> I'd be happy if anyone else would like to try the same experiment to
> see if there's some other value of DEFAULT_FDW_TUPLE_COST that might
> suit better.

No takers on the additional testing, so I've pushed the patch that
increases it to 0.2.

David
On Thu, Nov 2, 2023 at 02:32:44PM +1300, David Rowley wrote:
> On Tue, 31 Oct 2023 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
> > I'd be happy if anyone else would like to try the same experiment to
> > see if there's some other value of DEFAULT_FDW_TUPLE_COST that might
> > suit better.
>
> No takers on the additional testing so I've pushed the patch that
> increases it to 0.2.

Great!  Thanks.

--
Bruce Momjian  <bruce@momjian.us>        https://momjian.us
EDB                                      https://enterprisedb.com

Only you can decide what is important to you.
On Thu, Nov 02, 2023 at 02:32:44PM +1300, David Rowley wrote:
> No takers on the additional testing so I've pushed the patch that
> increases it to 0.2.

The CI has been telling me that the plans of the tests affected by
this patch are not that stable when building with 32 bits.  See:

diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out	2023-11-02 05:25:47.290268511 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out	2023-11-02 05:30:45.242316423 +0000
@@ -4026,13 +4026,13 @@
  Sort
    Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
    Sort Key: t1.c1
-   ->  Nested Loop Semi Join
+   ->  Hash Semi Join
          Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         Join Filter: (t2.c3 = t1.c3)
+         Hash Cond: (t1.c3 = t2.c3)
          ->  Foreign Scan on public.ft1 t1
                Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
-         ->  Materialize
+         ->  Hash
                Output: t2.c3
                ->  Foreign Scan on public.ft2 t2
                      Output: t2.c3

--
Michael
On Thu, 2 Nov 2023 at 18:39, Michael Paquier <michael@paquier.xyz> wrote:
> The CI has been telling me that the plans of the tests affected by
> this patch are not that stable when building with 32 bits.  See:
>
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out	2023-11-02 05:25:47.290268511 +0000
> +++ /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out	2023-11-02 05:30:45.242316423 +0000
> @@ -4026,13 +4026,13 @@
>   Sort
>     Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
>     Sort Key: t1.c1
> -   ->  Nested Loop Semi Join
> +   ->  Hash Semi Join
>           Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -         Join Filter: (t2.c3 = t1.c3)
> +         Hash Cond: (t1.c3 = t2.c3)
>           ->  Foreign Scan on public.ft1 t1
>                 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
>                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
> -         ->  Materialize
> +         ->  Hash
>                 Output: t2.c3
>                 ->  Foreign Scan on public.ft2 t2
>                       Output: t2.c3

No tests were introduced.  Is this the only existing one that's
unstable, as far as you saw?

I'm not yet seeing any failures in the buildfarm, so I don't really
want to push a fix for this one if there are going to be a few more
unstable ones to fix.  I may just hold off a while to see.

Thanks for letting me know about this.

David
On Thu, Nov 02, 2023 at 08:19:35PM +1300, David Rowley wrote:
> No tests were introduced.  Is this the only existing one that's
> unstable, as far as you saw?

That seems to be the only one.

> I'm not yet seeing any failures in the buildfarm, so I don't really
> want to push a fix for this one if there are going to be a few more
> unstable ones to fix.  I may just hold off a while to see.

The CF bot also thinks that this is not really stable, and it is
impacting the tests of the patches:
https://cirrus-ci.com/task/6685074121293824
https://cirrus-ci.com/task/4739402799251456
https://cirrus-ci.com/task/5209803589419008

--
Michael
On Thu, Nov 2, 2023 at 3:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I'm not yet seeing any failures in the buildfarm, so don't really want
> to push a fix for this one if there are going to be a few more
> unstable ones to fix. I may just hold off a while to see.
It seems that the test is still not stable on 32-bit machines even after
4b14e18714. I see the following plan diff on cfbot [1].
--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 2023-11-02 11:35:12.016196978 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out 2023-11-02 11:42:09.092242808 +0000
@@ -4022,24 +4022,21 @@
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
- Sort
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Sort Key: t1.c1
- -> Nested Loop Semi Join
+ Join Filter: (t2.c3 = t1.c3)
+ -> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Join Filter: (t2.c3 = t1.c3)
- -> Foreign Scan on public.ft1 t1
- Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
- -> Materialize
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC NULLS LAST
+ -> Materialize
+ Output: t2.c3
+ -> Foreign Scan on public.ft2 t2
Output: t2.c3
- -> Foreign Scan on public.ft2 t2
- Output: t2.c3
- Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
-(15 rows)
+ Filter: (date(t2.c4) = '01-17-1970'::date)
+ Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+(12 rows)
[1] https://api.cirrus-ci.com/v1/artifact/task/5727898984775680/testrun/build-32/testrun/postgres_fdw/regress/regression.diffs
Thanks
Richard
On Fri, 3 Nov 2023 at 01:02, Richard Guo <guofenglinux@gmail.com> wrote:
> It seems that the test is still not stable on 32-bit machines even
> after 4b14e18714.  I see the following plan diff on cfbot [1].

I recreated that locally this time.  It seems there's still flexibility
to push the sort down or not, and the costs of each are close enough
that the choice differs between 32-bit and 64-bit builds.  The fix I
just pushed removes the flexibility of doing a local sort by turning
off enable_sort.

Thanks for the report.

David