Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Date
Msg-id ed041b8e-1440-a09a-3632-85b3985df598@enterprisedb.com
Whole thread Raw
In response to Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
List pgsql-hackers
Pushed, after clarifying the comments a bit.

I also looked into what would it take to consider incremental paths, and 
in principle it doesn't seem all that complicated. The attached PoC 
patch extends get_cheapest_fractional_path_for_pathkeys() to optionally 
build incremental sort on paths if needed. There are two GUCs that make 
experimenting simpler:

* enable_fractional_paths - disables fractional paths entirely, i.e. we 
get behavior without the part I already pushed

* enable_fractional_incremental_paths - disables the incremental sort 
part added by the attached patch

With this, I get this plan (see the test in partitionwise_join.sql)

test=# EXPLAIN (COSTS OFF)
test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2) 
ORDER BY id1 ASC, id2 ASC LIMIT 10;
                                   QUERY PLAN 

------------------------------------------------------------------------------
  Limit
    ->  Merge Left Join
          Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
          ->  Append
                ->  Index Only Scan using fract_t0_id1_id2_idx on
                                          fract_t0 x_1
                ->  Incremental Sort
                      Sort Key: x_2.id1, x_2.id2
                      Presorted Key: x_2.id1
                      ->  Index Scan using fract_t1_pkey on fract_t1 x_2
          ->  Materialize
                ->  Append
                      ->  Incremental Sort
                            Sort Key: y_1.id1, y_1.id2
                            Presorted Key: y_1.id1
                            ->  Index Scan using fract_t0_pkey on
                                                 fract_t0 y_1
                                  Index Cond: (id1 = id1)
                                  Filter: (id2 = id2)
                      ->  Incremental Sort
                            Sort Key: y_2.id1, y_2.id2
                            Presorted Key: y_2.id1
                            ->  Index Scan using fract_t1_pkey on
                                                 fract_t1 y_2
                                  Index Cond: (id1 = id1)
                                  Filter: (id2 = id2)
(23 rows)

instead of

                                   QUERY PLAN 

------------------------------------------------------------------------------
  Limit
    ->  Incremental Sort
          Sort Key: x.id1, x.id2
          Presorted Key: x.id1
          ->  Merge Left Join
                Merge Cond: (x.id1 = y.id1)
                Join Filter: (x.id2 = y.id2)
                ->  Append
                      ->  Index Scan using fract_t0_pkey on fract_t0 x_1
                      ->  Index Scan using fract_t1_pkey on fract_t1 x_2
                ->  Materialize
                      ->  Append
                            ->  Index Scan using fract_t0_pkey on
                                                 fract_t0 y_1
                            ->  Index Scan using fract_t1_pkey on
                                                 fract_t1 y_2
(14 rows)

i.e. the incremental sorts moved below the merge join (and the cost is 
lower, but that's not shown here).

So that seems reasonable, but there's a couple issues too:

1) Planning works (hence we can run explain), but execution fails 
because of segfault in CheckVarSlotCompatibility - it gets NULL slot for 
some reason. I haven't looked into the details, but I'd guess we need to 
pass a different rel to create_incrementalsort_path, not childrel.

2) enable_partitionwisejoin=on seems to cause some confusion, because it 
results in picking a different plan with higher cost. But that's clearly 
not because of this patch.

3) I'm still a bit skeptical about the cost of this implementation - it 
builds the incrementalsort path, just to throw most of the paths away. 
It'd be much better to just calculate the cost, which should be much 
cheaper, and only do the heavy work for the one "best" path.

4) One thing I did not realize before is what pathkeys we even consider 
here. Imagine you have two tables:

    CREATE TABLE t1 (a int, b int, primary key (a));
    CREATE TABLE t2 (a int, b int, primary key (a));

and query

    SELECT * FROM t1 JOIN t2 USING (a,b);

It seems reasonable to also consider pathkeys (a,b) because that's make 
e.g. mergejoin much cheaper, right? But no, we'll not do that - we only 
consider pathkeys that at least one child relation has, so in this case 
it's just (a). Which means we'll never consider incremental sort for 
this particular example.

It's a bit strange, because it's enough to create index on (a,b) for 
just one of the relations, and it'll suddenly consider building 
incremental sort on both sides.


I don't plan to pursue this further at this point, so I pushed the first 
part because that's a fairly simple improvement over what we have now. 
But it seems like a fairly promising area for improvements.

Also, the non-intuitive effects of enabling partition-wise joins (i.e. 
picking plans with higher estimated costs) is something worth exploring, 
I guess. But that's a separate issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: A qsort template
Next
From: Sergey Shinderuk
Date:
Subject: Re: Improve error handling of HMAC computations and SCRAM