Thread: Why is DEFAULT_FDW_TUPLE_COST so insanely low?
Over on [1] I was complaining that I thought DEFAULT_FDW_TUPLE_COST,
which is defined as 0.01, was unrealistically low.

For comparison, cpu_tuple_cost, something we probably expect to be in
a CPU cache, is also 0.01.  We've defined DEFAULT_PARALLEL_TUPLE_COST
to be 0.1, which is 10x cpu_tuple_cost, and that's coming from a shared
memory segment.  So why do we think DEFAULT_FDW_TUPLE_COST should be
the same as cpu_tuple_cost when that's probably pulling a tuple from
some remote server over some (possibly slow) network?

I did a little experiment in the attached .sql file and did some maths
to try to figure out what it's really likely to be costing us.  I tried
this with and without the attached hack, which stops the planner from
considering remote grouping, just to see how much slower pulling a
million tuples through the FDW would be.

I set up a loopback server on localhost (which has about the lowest
possible network latency) and found the patched query to the foreign
server took:

Execution Time: 530.000 ms

This is pulling all million tuples over and doing the aggregate locally.

Unpatched, the query took:

Execution Time: 35.334 ms

so about 15x faster.

If I take the seqscan cost for querying the local table, which is
14425.00, multiply that by 15 (the extra time it took to pull the 1
million tuples), then divide by 1 million to get the extra cost per
tuple, that comes to about 0.216.  So that says DEFAULT_FDW_TUPLE_COST
is about 21x lower than it should be.

I tried cranking DEFAULT_FDW_TUPLE_COST up to 0.5 to see what plans
would change in the postgres_fdw regression tests, and quite a number
changed.  Many seem to be pushing the sorts down to the remote server
where they were being done locally before.  A few others just seem
weird.  For example, the first one seems to be blindly adding a remote
sort when it does no good.  I think it would take quite a bit of study
with a debugger to figure out what's going on with many of these.
Does anyone have any ideas why DEFAULT_FDW_TUPLE_COST was set so low?

Does anyone object to it being set to something more realistic?

David

[1] https://www.postgresql.org/message-id/CAApHDvpXiXLxg4TsA8P_4etnuGQqAAbHWEOM4hGe=DCaXmi_jA@mail.gmail.com
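[Editor's note: as a quick sanity check on the arithmetic above, here is the calculation in a few lines of Python. This sketch is not part of the original mail; the variable names are mine.]

```python
# Scale the local seqscan cost by the observed slowdown of the FDW query,
# then spread the extra cost over the million tuples transferred.
seqscan_cost = 14425.00          # planner cost of the local Seq Scan
slowdown = 530.000 / 35.334      # patched (pull all tuples) vs unpatched runtime
rows = 1_000_000

extra_cost_per_tuple = seqscan_cost * slowdown / rows
print(round(extra_cost_per_tuple, 3))   # ~0.216, about 21x the 0.01 default
```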
Has anything been done about this issue?

---------------------------------------------------------------------------

On Wed, Aug 3, 2022 at 02:56:12AM +1200, David Rowley wrote:
> Over on [1] I was complaining that I thought DEFAULT_FDW_TUPLE_COST,
> which is defined as 0.01, was unrealistically low.
>
> For comparison, cpu_tuple_cost, something we probably expect to be in
> a CPU cache, is also 0.01.  We've defined DEFAULT_PARALLEL_TUPLE_COST
> to be 0.1, which is 10x cpu_tuple_cost.  That's coming from a shared
> memory segment.  So why do we think DEFAULT_FDW_TUPLE_COST should be
> the same as cpu_tuple_cost when that's probably pulling a tuple from
> some remote server over some (possibly slow) network?
>
> I did a little experiment in the attached .sql file and did some maths
> to try to figure out what it's really likely to be costing us.  I tried
> this with and without the attached hack to have the planner not
> consider remote grouping, just to see how much slower pulling a million
> tuples through the FDW would be.
>
> I set up a loopback server on localhost (which has about the lowest
> possible network latency) and found the patched query to the foreign
> server took:
>
> Execution Time: 530.000 ms
>
> This is pulling all million tuples over and doing the aggregate locally.
>
> Unpatched, the query took:
>
> Execution Time: 35.334 ms
>
> so about 15x faster.
>
> If I take the seqscan cost for querying the local table, which is
> 14425.00, multiply that by 15 (the extra time it took to pull the 1
> million tuples), then divide by 1 million to get the extra cost per
> tuple, then that comes to about 0.216.  So that says
> DEFAULT_FDW_TUPLE_COST is about 21x lower than it should be.
>
> I tried cranking DEFAULT_FDW_TUPLE_COST up to 0.5 to see what plans
> would change in the postgres_fdw regression tests and quite a number
> changed.  Many seem to be pushing the sorts down to the remote server
> where they were being done locally before.  A few others just seem
> weird.  For example, the first one seems to be blindly adding a remote
> sort when it does no good.  I think it would take quite a bit of study
> with a debugger to figure out what's going on with many of these.
>
> Does anyone have any ideas why DEFAULT_FDW_TUPLE_COST was set so low?
>
> Does anyone object to it being set to something more realistic?
>
> David
>
> [1] https://www.postgresql.org/message-id/CAApHDvpXiXLxg4TsA8P_4etnuGQqAAbHWEOM4hGe=DCaXmi_jA@mail.gmail.com
>
> diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> index 64632db73c..b4e3b91d7f 100644
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -3921,7 +3921,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
>  	 * If there is an FDW that's responsible for all baserels of the query,
>  	 * let it consider adding ForeignPaths.
>  	 */
> -	if (grouped_rel->fdwroutine &&
> +	if (0 && grouped_rel->fdwroutine &&
>  		grouped_rel->fdwroutine->GetForeignUpperPaths)
>  		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
>  													  input_rel, grouped_rel,
>
> ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
> SELECT pg_reload_conf();
>
> CREATE EXTENSION postgres_fdw;
> CREATE EXTENSION pg_prewarm;
>
> DO $d$
>     BEGIN
>         EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
>             OPTIONS (dbname '$$||current_database()||$$',
>                      port '$$||current_setting('port')||$$'
>             )$$;
>     END;
> $d$;
>
> CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
>
> CREATE TABLE public.t (a INT);
> INSERT INTO t SELECT x FROM generate_series(1,1000000) x;
> VACUUM FREEZE ANALYZE t;
> SELECT pg_prewarm('t');
>
> CREATE FOREIGN TABLE ft (
>     a INT
> ) SERVER loopback OPTIONS (schema_name 'public', table_name 't');
>
> EXPLAIN (ANALYZE) SELECT COUNT(*) FROM ft;
> EXPLAIN (ANALYZE) SELECT COUNT(*) FROM t;
>
> DROP FOREIGN TABLE ft;
> DROP TABLE t;
> DROP SERVER loopback CASCADE;
> ALTER SYSTEM RESET max_parallel_workers_per_gather;
> SELECT pg_reload_conf();
>
> --- "expected\\postgres_fdw.out"	2022-08-03 01:34:42.806967000 +1200
> +++ "results\\postgres_fdw.out"	2022-08-03 02:33:40.719712900 +1200
> @@ -2164,8 +2164,8 @@
>  -- unsafe conditions on one side (c8 has a UDT), not pushed down.
>  EXPLAIN (VERBOSE, COSTS OFF)
>  SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
> -                                  QUERY PLAN
> ------------------------------------------------------------------------------
> +                                                          QUERY PLAN
> +------------------------------------------------------------------------------------------------------------------------------
>   Limit
>     Output: t1.c1, t2.c1, t1.c3
>     ->  Sort
> @@ -2182,7 +2182,7 @@
>                 ->  Foreign Scan on public.ft1 t1
>                       Output: t1.c1, t1.c3
>                       Filter: (t1.c8 = 'foo'::user_enum)
> -                     Remote SQL: SELECT "C 1", c3, c8 FROM "S 1"."T 1"
> +                     Remote SQL: SELECT "C 1", c3, c8 FROM "S 1"."T 1" ORDER BY c3 ASC NULLS LAST, "C 1" ASC NULLS LAST
>  (17 rows)
>
>  SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
> @@ -2873,13 +2873,13 @@
>   Sort
>     Output: (sum(c1)), c2
>     Sort Key: (sum(ft1.c1))
> -   ->  HashAggregate
> +   ->  GroupAggregate
>           Output: sum(c1), c2
>           Group Key: ft1.c2
>           Filter: (avg((ft1.c1 * ((random() <= '1'::double precision))::integer)) > '100'::numeric)
>           ->  Foreign Scan on public.ft1
>                 Output: c1, c2
> -               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
> +               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" ORDER BY c2 ASC NULLS LAST
>  (10 rows)
>
>  -- Remote aggregate in combination with a local Param (for the output
> @@ -3123,12 +3123,12 @@
>   Sort
>     Output: (sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision))), c2
>     Sort Key: (sum(ft1.c1) FILTER (WHERE ((((ft1.c1 / ft1.c1))::double precision * random()) <= '1'::double precision)))
> -   ->  HashAggregate
> +   ->  GroupAggregate
>           Output: sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision)), c2
>           Group Key: ft1.c2
>           ->  Foreign Scan on public.ft1
>                 Output: c1, c2
> -               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
> +               Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" ORDER BY c2 ASC NULLS LAST
>  (9 rows)
>
>  explain (verbose, costs off)
> @@ -3885,24 +3885,21 @@
>  -- subquery using stable function (can't be sent to remote)
>  PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
>  EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
> -                                                QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
> - Sort
> +                                                            QUERY PLAN
> +----------------------------------------------------------------------------------------------------------------------------------
> + Nested Loop Semi Join
>    Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -   Sort Key: t1.c1
> -   ->  Nested Loop Semi Join
> +   Join Filter: (t1.c3 = t2.c3)
> +   ->  Foreign Scan on public.ft1 t1
>          Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -         Join Filter: (t1.c3 = t2.c3)
> -         ->  Foreign Scan on public.ft1 t1
> -               Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
> -         ->  Materialize
> +         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC NULLS LAST
> +   ->  Materialize
> +         Output: t2.c3
> +         ->  Foreign Scan on public.ft2 t2
>                Output: t2.c3
> -         ->  Foreign Scan on public.ft2 t2
> -               Output: t2.c3
> -               Filter: (date(t2.c4) = '01-17-1970'::date)
> -               Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
> -(15 rows)
> +               Filter: (date(t2.c4) = '01-17-1970'::date)
> +               Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
> +(12 rows)
>
>  EXECUTE st2(10, 20);
>   c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
> @@ -9381,21 +9378,19 @@
>  -- test FOR UPDATE; partitionwise join does not apply
>  EXPLAIN (COSTS OFF)
>  SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
> -                          QUERY PLAN
> --------------------------------------------------------------
> +                       QUERY PLAN
> +--------------------------------------------------------
>   LockRows
> -   ->  Sort
> -         Sort Key: t1.a
> -         ->  Hash Join
> -               Hash Cond: (t2.b = t1.a)
> +   ->  Nested Loop
> +         Join Filter: (t1.a = t2.b)
> +         ->  Append
> +               ->  Foreign Scan on ftprt1_p1 t1_1
> +               ->  Foreign Scan on ftprt1_p2 t1_2
> +         ->  Materialize
>                ->  Append
>                      ->  Foreign Scan on ftprt2_p1 t2_1
>                      ->  Foreign Scan on ftprt2_p2 t2_2
> -         ->  Hash
> -               ->  Append
> -                     ->  Foreign Scan on ftprt1_p1 t1_1
> -                     ->  Foreign Scan on ftprt1_p2 t1_2
> -(12 rows)
> +(10 rows)
>
>  SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
>    a | b
> @@ -9430,18 +9425,16 @@
>  SET enable_partitionwise_aggregate TO false;
>  EXPLAIN (COSTS OFF)
>  SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
> -                        QUERY PLAN
> -----------------------------------------------------------
> - Sort
> -   Sort Key: pagg_tab.a
> -   ->  HashAggregate
> -         Group Key: pagg_tab.a
> -         Filter: (avg(pagg_tab.b) < '22'::numeric)
> -         ->  Append
> -               ->  Foreign Scan on fpagg_tab_p1 pagg_tab_1
> -               ->  Foreign Scan on fpagg_tab_p2 pagg_tab_2
> -               ->  Foreign Scan on fpagg_tab_p3 pagg_tab_3
> -(9 rows)
> +                     QUERY PLAN
> +-----------------------------------------------------
> + GroupAggregate
> +   Group Key: pagg_tab.a
> +   Filter: (avg(pagg_tab.b) < '22'::numeric)
> +   ->  Append
> +         ->  Foreign Scan on fpagg_tab_p1 pagg_tab_1
> +         ->  Foreign Scan on fpagg_tab_p2 pagg_tab_2
> +         ->  Foreign Scan on fpagg_tab_p3 pagg_tab_3
> +(7 rows)
>
>  -- Plan with partitionwise aggregates is enabled
>  SET enable_partitionwise_aggregate TO true;
> @@ -9475,34 +9468,32 @@
>  -- Should have all the columns in the target list for the given relation
>  EXPLAIN (VERBOSE, COSTS OFF)
>  SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
> -                              QUERY PLAN
> ------------------------------------------------------------------------
> - Sort
> -   Output: t1.a, (count(((t1.*)::pagg_tab)))
> +                                 QUERY PLAN
> +--------------------------------------------------------------------------------------------
> + Merge Append
>     Sort Key: t1.a
> -   ->  Append
> -         ->  HashAggregate
> -               Output: t1.a, count(((t1.*)::pagg_tab))
> -               Group Key: t1.a
> -               Filter: (avg(t1.b) < '22'::numeric)
> -               ->  Foreign Scan on public.fpagg_tab_p1 t1
> -                     Output: t1.a, t1.*, t1.b
> -                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1
> -         ->  HashAggregate
> -               Output: t1_1.a, count(((t1_1.*)::pagg_tab))
> -               Group Key: t1_1.a
> -               Filter: (avg(t1_1.b) < '22'::numeric)
> -               ->  Foreign Scan on public.fpagg_tab_p2 t1_1
> -                     Output: t1_1.a, t1_1.*, t1_1.b
> -                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2
> -         ->  HashAggregate
> -               Output: t1_2.a, count(((t1_2.*)::pagg_tab))
> -               Group Key: t1_2.a
> -               Filter: (avg(t1_2.b) < '22'::numeric)
> -               ->  Foreign Scan on public.fpagg_tab_p3 t1_2
> -                     Output: t1_2.a, t1_2.*, t1_2.b
> -                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3
> -(25 rows)
> +   ->  GroupAggregate
> +         Output: t1.a, count(((t1.*)::pagg_tab))
> +         Group Key: t1.a
> +         Filter: (avg(t1.b) < '22'::numeric)
> +         ->  Foreign Scan on public.fpagg_tab_p1 t1
> +               Output: t1.a, t1.*, t1.b
> +               Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1 ORDER BY a ASC NULLS LAST
> +   ->  GroupAggregate
> +         Output: t1_1.a, count(((t1_1.*)::pagg_tab))
> +         Group Key: t1_1.a
> +         Filter: (avg(t1_1.b) < '22'::numeric)
> +         ->  Foreign Scan on public.fpagg_tab_p2 t1_1
> +               Output: t1_1.a, t1_1.*, t1_1.b
> +               Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2 ORDER BY a ASC NULLS LAST
> +   ->  GroupAggregate
> +         Output: t1_2.a, count(((t1_2.*)::pagg_tab))
> +         Group Key: t1_2.a
> +         Filter: (avg(t1_2.b) < '22'::numeric)
> +         ->  Foreign Scan on public.fpagg_tab_p3 t1_2
> +               Output: t1_2.a, t1_2.*, t1_2.b
> +               Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3 ORDER BY a ASC NULLS LAST
> +(23 rows)
>
>  SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
>   a  | count
> @@ -9518,24 +9509,23 @@
>  -- When GROUP BY clause does not match with PARTITION KEY.
>  EXPLAIN (COSTS OFF)
>  SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
> -                            QUERY PLAN
> -----------------------------------------------------------------
> - Sort
> -   Sort Key: pagg_tab.b
> -   ->  Finalize HashAggregate
> -         Group Key: pagg_tab.b
> -         Filter: (sum(pagg_tab.a) < 700)
> -         ->  Append
> -               ->  Partial HashAggregate
> -                     Group Key: pagg_tab.b
> -                     ->  Foreign Scan on fpagg_tab_p1 pagg_tab
> -               ->  Partial HashAggregate
> -                     Group Key: pagg_tab_1.b
> -                     ->  Foreign Scan on fpagg_tab_p2 pagg_tab_1
> -               ->  Partial HashAggregate
> -                     Group Key: pagg_tab_2.b
> -                     ->  Foreign Scan on fpagg_tab_p3 pagg_tab_2
> -(15 rows)
> +                         QUERY PLAN
> +-----------------------------------------------------------
> + Finalize GroupAggregate
> +   Group Key: pagg_tab.b
> +   Filter: (sum(pagg_tab.a) < 700)
> +   ->  Merge Append
> +         Sort Key: pagg_tab.b
> +         ->  Partial GroupAggregate
> +               Group Key: pagg_tab.b
> +               ->  Foreign Scan on fpagg_tab_p1 pagg_tab
> +         ->  Partial GroupAggregate
> +               Group Key: pagg_tab_1.b
> +               ->  Foreign Scan on fpagg_tab_p2 pagg_tab_1
> +         ->  Partial GroupAggregate
> +               Group Key: pagg_tab_2.b
> +               ->  Foreign Scan on fpagg_tab_p3 pagg_tab_2
> +(14 rows)
>
>  -- ===================================================================
>  -- access rights and superuser

--
Bruce Momjian  <bruce@momjian.us>        https://momjian.us
EDB                                      https://enterprisedb.com

Only you can decide what is important to you.
On Sun, 29 Oct 2023 at 12:45, Bruce Momjian <bruce@momjian.us> wrote:
> Has anything been done about this issue?

Nothing has been done.  I was hoping to get the attention of a few
people who have dealt more with postgres_fdw in the past.

I've attached a patch which adjusts DEFAULT_FDW_TUPLE_COST, setting it
to 0.2.  I set it to this because my experiment in [1] showed that it
was about 21x lower than the actual costs (on my machine, with a
loopback FDW connecting to the same instance and database using my
example query).  Given that we have parallel_tuple_cost set to 0.1 by
default, a network cost of 0.2 per tuple from an FDW seems reasonable
to me.  Slightly higher is probably also reasonable, but given the
seeming lack of complaints, I think I'd rather err on the low side.

Changing it to 0.2, I see 4 plans change in postgres_fdw's regression
tests.  All of these changes are due to STD_FUZZ_FACTOR causing some
other plan to win in add_path().

For example, with the query:

EXPLAIN (VERBOSE, ANALYZE) SELECT a, sum(b), min(b), count(*)
FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;

the plan switches from a HashAggregate to a GroupAggregate.  This is
because, after increasing DEFAULT_FDW_TUPLE_COST to 0.2, the sorted
append child (fuzzily) costs the same as the unsorted seq scan path,
and the sorted path wins in add_path() due to having better pathkeys.
The seq scan path is then thrown away and we end up doing the Group
Aggregate using the sorted append children.

If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans
no longer change when I do:

alter server loopback options (add fdw_tuple_cost '0.01');
<run the query>
alter server loopback options (drop fdw_tuple_cost);
<run the query>

Ordinarily, I'd not care too much about that, but I did test the
performance of one of the plans and the new plan came out slower than
the old one.

I'm not exactly sure how best to proceed on this in the absence of any
feedback.
David

[1] https://postgr.es/m/CAApHDvopVjjfh5c1Ed2HRvDdfom2dEpMwwiu5-f1AnmYprJngA@mail.gmail.com
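[Editor's note: the fuzzy path comparison behaviour described above can be sketched in a few lines. This is my own simplification, not the PostgreSQL source: add_path() treats two paths whose costs are within STD_FUZZ_FACTOR (1.01) of each other as costing the same, and then lets secondary merits such as pathkeys decide the winner.]

```python
STD_FUZZ_FACTOR = 1.01  # PostgreSQL's fuzz factor for path cost comparisons

def fuzzy_winner(cost_a, has_pathkeys_a, cost_b, has_pathkeys_b):
    """Pick between two competing paths, fuzzily, roughly as add_path() does."""
    if cost_a > cost_b * STD_FUZZ_FACTOR:
        return "b"                      # a is clearly more expensive
    if cost_b > cost_a * STD_FUZZ_FACTOR:
        return "a"                      # b is clearly more expensive
    # Costs are "fuzzily the same": prefer the path with a useful sort order.
    if has_pathkeys_a != has_pathkeys_b:
        return "a" if has_pathkeys_a else "b"
    return "tie"

# The sorted path costs 0.5% more but provides pathkeys, so it wins —
# which is how the unsorted seq scan path gets thrown away above.
print(fuzzy_winner(1005.0, True, 1000.0, False))
```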
It looks like the value goes all the way back to
d0d75c402217421b691050857eb3d7af82d0c770.  The comment there describes
it as "above and beyond cpu_tuple_cost", so it was certainly expected
to be higher than cpu_tuple_cost.

I have no memory of this.  But looking at the surrounding code, I think
DEFAULT_FDW_STARTUP_COST takes care of network costs and bandwidth, so
DEFAULT_FDW_TUPLE_COST is just the cost of assembling a row from the
bytes received over the network.  That might have been equated with
assembling a row from a heap buffer.  But I think you are right: it
should be comparable to the parallel tuple cost, which at least goes
through IPC such as a socket.

This will also mean that operations which reduce the number of rows
will be favoured and pushed down, which is what is desired.

--
Best Wishes,
Ashutosh Bapat

On Mon, Oct 30, 2023 at 6:52 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sun, 29 Oct 2023 at 12:45, Bruce Momjian <bruce@momjian.us> wrote:
> > Has anything been done about this issue?
>
> Nothing has been done.  I was hoping to get the attention of a few
> people who have dealt more with postgres_fdw in the past.
>
> I've attached a patch which adjusts DEFAULT_FDW_TUPLE_COST, setting it
> to 0.2.  I set it to this because my experiment in [1] showed that it
> was about 21x lower than the actual costs (on my machine with a
> loopback fdw connecting to the same instance and database using my
> example query).  Given that we have parallel_tuple_cost set to 0.1 by
> default, the network cost of a tuple from an FDW of 0.2 seems
> reasonable to me.  Slightly higher is probably also reasonable, but
> given the seeming lack of complaints, I think I'd rather err on the
> low side.
>
> Changing it to 0.2, I see 4 plans change in postgres_fdw's regression
> tests.  All of these changes are due to STD_FUZZ_FACTOR causing some
> other plan to win in add_path().
>
> For example the query EXPLAIN (VERBOSE, ANALYZE) SELECT a, sum(b),
> min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY
> 1; the plan switches from a HashAggregate to a GroupAggregate.  This is
> because after increasing the DEFAULT_FDW_TUPLE_COST to 0.2 the sorted
> append child (fuzzily) costs the same as the unsorted seq scan path
> and the sorted path wins in add_path due to having better pathkeys.
> The seq scan path is then thrown away and we end up doing the Group
> Aggregate using the sorted append children.
>
> If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans
> no longer change when I do:
>
> alter server loopback options (add fdw_tuple_cost '0.01');
> <run the query>
> alter server loopback options (drop fdw_tuple_cost);
> <run the query>
>
> Ordinarily, I'd not care too much about that, but I did test the
> performance of one of the plans and the new plan came out slower than
> the old one.
>
> I'm not exactly sure how best to proceed on this in the absence of any feedback.
>
> David
>
> [1] https://postgr.es/m/CAApHDvopVjjfh5c1Ed2HRvDdfom2dEpMwwiu5-f1AnmYprJngA@mail.gmail.com
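[Editor's note: the split described above — a one-off startup charge for connection setup and latency, plus a per-row charge for transfer — can be sketched roughly as follows. This is my own simplification, not the actual postgres_fdw costing code; the function name and `local_run_cost` parameter are illustrative, while the 100.0 startup default and the 0.2 vs 0.01 tuple costs come from the thread.]

```python
# fdw_startup_cost: charged once per foreign scan (connection setup, latency).
# fdw_tuple_cost: charged for every row pulled across the network.
def foreign_scan_cost(rows, local_run_cost,
                      fdw_startup_cost=100.0,  # postgres_fdw default
                      fdw_tuple_cost=0.2):     # default after this change (was 0.01)
    startup_cost = fdw_startup_cost
    total_cost = startup_cost + local_run_cost + fdw_tuple_cost * rows
    return startup_cost, total_cost

# With the old 0.01 default, transferring a million rows added only 10,000
# cost units; at 0.2 it adds 200,000, so plans that reduce the row count
# before transfer (remote grouping, remote sorts) become far more attractive.
print(foreign_scan_cost(1_000_000, 14425.0))
print(foreign_scan_cost(1_000_000, 14425.0, fdw_tuple_cost=0.01))
```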
On Mon, Oct 30, 2023 at 02:22:08PM +1300, David Rowley wrote:
> If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans
> no longer change when I do:
>
> alter server loopback options (add fdw_tuple_cost '0.01');
> <run the query>
> alter server loopback options (drop fdw_tuple_cost);
> <run the query>
>
> Ordinarily, I'd not care too much about that, but I did test the
> performance of one of the plans and the new plan came out slower than
> the old one.
>
> I'm not exactly sure how best to proceed on this in the absence of any feedback.

I think you just go and change it.  Your number is better than what we
have, and if someone wants to suggest a better number, we can change it
later.

--
Bruce Momjian  <bruce@momjian.us>        https://momjian.us
EDB                                      https://enterprisedb.com

Only you can decide what is important to you.
On Tue, 31 Oct 2023 at 03:01, Bruce Momjian <bruce@momjian.us> wrote:
> I think you just go and change it.  Your number is better than what we
> have, and if someone wants to suggest a better number, we can change it
> later.

I did some more experimentation on the actual costs of getting a tuple
from a foreign server.  Using the attached setup, I did:

postgres=# explain (analyze, timing off) SELECT * FROM t;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..144248.48 rows=10000048 width=4) (actual rows=10000000 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 385.978 ms

postgres=# explain (analyze, timing off) SELECT * FROM ft;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Foreign Scan on ft  (cost=100.00..244348.96 rows=10000048 width=4) (actual rows=10000000 loops=1)
 Planning Time: 0.126 ms
 Execution Time: 8335.392 ms

So, let's take the first query and figure out the total cost per
millisecond of execution time.  We can then multiply that by the
execution time of the second query to calculate what we might expect
the costs to be for the foreign table scan, based on how long it took
compared to the local table scan.

postgres=# select 144248.48/385.978*8335.392;
          ?column?
-----------------------------
 3115119.5824740270171341280

So, the above number is what we expect the foreign table scan to cost,
on the assumption that the cost per millisecond is about right for the
local scan.  We can then work out how much we'll need to charge for a
foreign tuple by subtracting the foreign scan's total cost from our
calculated value, giving the extra we need to charge in total, then
dividing that by the number of tuples to get the actual foreign tuple
cost for this query.

postgres=# select (3115119.58-244348.96)/10000000;
        ?column?
------------------------
 0.28707706200000000000

This is on an AMD 3990x running a Linux 6.5 kernel.
I tried the same on an Apple M2 mini and got:

postgres=# select 144247.77/257.763*3052.084;
          ?column?
-----------------------------
 1707988.7759402241595200680

postgres=# select (1707988.78-244347.54)/10000000;
        ?column?
------------------------
 0.14636412400000000000

So the actual foreign tuple cost on the M2 seems to be about half of
what it is on the Zen 2 machine.

Based on this, I agree with my original analysis that setting
DEFAULT_FDW_TUPLE_COST to 0.2 is about right.  Of course, this is a
loopback connection onto localhost, so remote networks would likely
benefit from higher values; but based on this, 0.01 is far too low and
we should change it to at least 0.2.

I'd be happy if anyone else would like to try the same experiment to
see if there's some other value of DEFAULT_FDW_TUPLE_COST that might
suit better.

David
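[Editor's note: the extrapolation above generalises to a small helper — scale the local scan's cost-per-millisecond by the foreign scan's runtime, then spread the excess over the rows returned. This sketch and its names are mine, not part of the original mail; the numbers are the ones reported above.]

```python
def implied_fdw_tuple_cost(local_cost, local_ms, foreign_cost, foreign_ms, rows):
    """Estimate the per-tuple surcharge a foreign scan should carry,
    assuming the local scan's cost-per-millisecond is about right."""
    expected_foreign_cost = local_cost / local_ms * foreign_ms
    return (expected_foreign_cost - foreign_cost) / rows

# Zen 2 (AMD 3990x) numbers from the thread
zen2 = implied_fdw_tuple_cost(144248.48, 385.978, 244348.96, 8335.392, 10_000_000)
# Apple M2 numbers from the thread
m2 = implied_fdw_tuple_cost(144247.77, 257.763, 244347.54, 3052.084, 10_000_000)

print(round(zen2, 4))  # ~0.2871
print(round(m2, 4))    # ~0.1464
```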
On Tue, 31 Oct 2023 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
> I'd be happy if anyone else would like to try the same experiment to
> see if there's some other value of DEFAULT_FDW_TUPLE_COST that might
> suit better.

No takers on the additional testing, so I've pushed the patch that
increases it to 0.2.

David
On Thu, Nov 2, 2023 at 02:32:44PM +1300, David Rowley wrote:
> On Tue, 31 Oct 2023 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote:
> > I'd be happy if anyone else would like to try the same experiment to
> > see if there's some other value of DEFAULT_FDW_TUPLE_COST that might
> > suit better.
>
> No takers on the additional testing so I've pushed the patch that
> increases it to 0.2.

Great!  Thanks.

--
Bruce Momjian  <bruce@momjian.us>        https://momjian.us
EDB                                      https://enterprisedb.com

Only you can decide what is important to you.
On Thu, Nov 02, 2023 at 02:32:44PM +1300, David Rowley wrote:
> No takers on the additional testing so I've pushed the patch that
> increases it to 0.2.

The CI has been telling me that the plans of the tests affected by
this patch are not that stable when building with 32 bits.  See:

diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out	2023-11-02 05:25:47.290268511 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out	2023-11-02 05:30:45.242316423 +0000
@@ -4026,13 +4026,13 @@
  Sort
    Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
    Sort Key: t1.c1
-   ->  Nested Loop Semi Join
+   ->  Hash Semi Join
          Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-         Join Filter: (t2.c3 = t1.c3)
+         Hash Cond: (t1.c3 = t2.c3)
          ->  Foreign Scan on public.ft1 t1
                Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
-         ->  Materialize
+         ->  Hash
                Output: t2.c3
                ->  Foreign Scan on public.ft2 t2
                      Output: t2.c3

--
Michael
On Thu, 2 Nov 2023 at 18:39, Michael Paquier <michael@paquier.xyz> wrote:
> The CI has been telling me that the plans of the tests affected by
> this patch are not that stable when building with 32 bits.  See:
>
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out	2023-11-02 05:25:47.290268511 +0000
> +++ /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out	2023-11-02 05:30:45.242316423 +0000
> @@ -4026,13 +4026,13 @@
>   Sort
>     Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
>     Sort Key: t1.c1
> -   ->  Nested Loop Semi Join
> +   ->  Hash Semi Join
>           Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
> -         Join Filter: (t2.c3 = t1.c3)
> +         Hash Cond: (t1.c3 = t2.c3)
>           ->  Foreign Scan on public.ft1 t1
>                 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
>                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
> -         ->  Materialize
> +         ->  Hash
>                 Output: t2.c3
>                 ->  Foreign Scan on public.ft2 t2
>                       Output: t2.c3

No tests were introduced.  Is this the only existing one that's
unstable, as far as you saw?

I'm not yet seeing any failures in the buildfarm, so I don't really
want to push a fix for this one if there are going to be a few more
unstable ones to fix.  I may just hold off a while to see.

Thanks for letting me know about this.

David
On Thu, Nov 02, 2023 at 08:19:35PM +1300, David Rowley wrote:
> No tests were introduced.  Is this the only existing one that's
> unstable, as far as you saw?

That seems to be the only one.

> I'm not yet seeing any failures in the buildfarm, so I don't really
> want to push a fix for this one if there are going to be a few more
> unstable ones to fix.  I may just hold off a while to see.

The CF bot also thinks that this is not really stable, and it is
impacting the tests of the patches:
https://cirrus-ci.com/task/6685074121293824
https://cirrus-ci.com/task/4739402799251456
https://cirrus-ci.com/task/5209803589419008

--
Michael
On Thu, Nov 2, 2023 at 3:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I'm not yet seeing any failures in the buildfarm, so don't really want
> to push a fix for this one if there are going to be a few more
> unstable ones to fix. I may just hold off a while to see.
It seems that the test is still not stable on 32-bit machines even after
4b14e18714. I see the following plan diff on cfbot [1].
--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 2023-11-02 11:35:12.016196978 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/postgres_fdw/regress/results/postgres_fdw.out 2023-11-02 11:42:09.092242808 +0000
@@ -4022,24 +4022,21 @@
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
- Sort
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Sort Key: t1.c1
- -> Nested Loop Semi Join
+ Join Filter: (t2.c3 = t1.c3)
+ -> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Join Filter: (t2.c3 = t1.c3)
- -> Foreign Scan on public.ft1 t1
- Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
- -> Materialize
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC NULLS LAST
+ -> Materialize
+ Output: t2.c3
+ -> Foreign Scan on public.ft2 t2
Output: t2.c3
- -> Foreign Scan on public.ft2 t2
- Output: t2.c3
- Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
-(15 rows)
+ Filter: (date(t2.c4) = '01-17-1970'::date)
+ Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+(12 rows)
[1] https://api.cirrus-ci.com/v1/artifact/task/5727898984775680/testrun/build-32/testrun/postgres_fdw/regress/regression.diffs
Thanks
Richard
On Fri, 3 Nov 2023 at 01:02, Richard Guo <guofenglinux@gmail.com> wrote:
> It seems that the test is still not stable on 32-bit machines even
> after 4b14e18714.  I see the following plan diff on cfbot [1].

I recreated that locally this time.  It seems there's still flexibility
to push the sort down or not, and the costs of each are close enough
that the choice differs between 32-bit and 64-bit builds.  The fix I
just pushed removes the flexibility of doing a local sort by turning
off enable_sort.

Thanks for the report.

David