Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] [PATCH] Incremental sort
Date
Msg-id 1025f78a-baa3-b159-5eb2-1bebf9d5a638@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH] Incremental sort  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: [HACKERS] [PATCH] Incremental sort
List pgsql-hackers
On 04/03/2018 11:09 AM, Alexander Korotkov wrote:
> On Sun, Apr 1, 2018 at 12:06 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     On 03/31/2018 10:43 PM, Tomas Vondra wrote:
>     > ...
>     > But I'm pretty sure it may lead to surprising behavior - for example if
>     > you disable incremental sorts (enable_incrementalsort=off), the plan
>     > will switch to plain sort without the additional costs. So you'll get a
>     > cheaper plan by disabling some operation. That's surprising.
>     >
> 
>     To illustrate this is a valid issue, consider this trivial example:
> 
>     create table t (a int, b int, c int);
> 
>     insert into t select 10*random(), 10*random(), 10*random()
>       from generate_series(1,1000000) s(i);
> 
>     analyze t;
> 
>     explain select * from (select * from t order by a,b) foo order by a,b,c;
> 
>                                    QUERY PLAN
>     ------------------------------------------------------------------------
>      Incremental Sort  (cost=133100.48..264139.27 rows=1000000 width=12)
>        Sort Key: t.a, t.b, t.c
>        Presorted Key: t.a, t.b
>        ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
>              Sort Key: t.a, t.b
>              ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
>     (6 rows)
> 
>     set enable_incrementalsort = off;
> 
>     explain select * from (select * from t order by a,b) foo order by a,b,c;
>                                    QUERY PLAN
>     ------------------------------------------------------------------------
>      Sort  (cost=261402.69..263902.69 rows=1000000 width=12)
>        Sort Key: t.a, t.b, t.c
>        ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
>              Sort Key: t.a, t.b
>              ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
>     (5 rows)
> 
>     So the cost with incremental sort was 264139, and after disabling the
>     incremental cost it dropped to 263902. Granted, the difference is
>     negligible in this case, but it's still surprising.
> 
>     Also, it can be made much more significant by reducing the number of
>     prefix groups in the data:
> 
>     truncate t;
> 
>     insert into t select 1,1,1 from generate_series(1,1000000) s(i);
> 
>     analyze t;
> 
>     set enable_incrementalsort = on;
> 
>     explain select * from (select * from t order by a,b) foo order by a,b,c;
> 
>                                    QUERY PLAN
>     ------------------------------------------------------------------------
>      Incremental Sort  (cost=324165.83..341665.85 rows=1000000 width=12)
>        Sort Key: t.a, t.b, t.c
>        Presorted Key: t.a, t.b
>        ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=12)
>              Sort Key: t.a, t.b
>              ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=12)
>     (6 rows)
> 
>     So that's 263902 vs. 341665, yet we still prefer the incremental mode.
> 
> 
> Problem is well-defined, thank you.
> I'll check what can be done in this field today.
> 

I think solving this may be fairly straight-forward. Essentially, until
now we only had one way to do the sort, so it was OK to make the sort
implicit by checking if the path is sorted

    if (input not sorted)
    {
        ... add a Sort node ...
    }

But now we have multiple possible ways to do the sort, with different
startup/total costs. So the places that create the sorts need to
actually generate the Sort paths for each sort alternative, and store
the information in the Sort node (instead of relying on pathkeys).

Ultimately, this should simplify the createplan.c places making all the
make_sort calls unnecessary (i.e. the input should be already sorted
when needed). Otherwise it'd mean the decision needs to be done locally,
but I don't think that should be needed.

But it's surely a fairly invasive change to the patch ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Dmitry Ivanov
Date:
Subject: Re: new function for tsquery creartion
Next
From: Greg Stark
Date:
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS