Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200311024355.5h2v56jpgzvd7yox@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Justin Pryzby <pryzby@telsasoft.com>)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
Hi,

I've been focusing on the planner part of the patch, particularly on
picking places need to consider incremental sort paths. As discussed in
the past, we might simply consider incremental sort everywhere we need
sorted path, but that's likely an overkill. So I used the set of queries
used for previous tests [1] and determined which places affect the most
plans (using a separate GUC for each place, per [2]).

The result is attached, on top of the incremental sort patch. Part 0004
tweak all places to cover all plan changes in the test queries, and part
0005 tweaks a couple additional places where I think we should consider
incremental sort (but I've been unable to construct a query for which
it'd make a difference).

I've also removed the various GUCs and all the places are now using the
same enable_incrementalsort GUC. Compared to [2] I've also modified the
code to consider full and incremental sort in the same loop, which does
allow reusing some of the work.

The next thing I work on is determining how expensive this can be, in
extreme cases with many indexes etc.


Now, a couple comments about parts 0001 - 0003 of the patch ...

1) I see a bunch of failures in the regression test, due to minor
differences in the explain output. All the differences are about minor
changes in memory usage, like this:

-               "Sort Space Used": 30,                             +
+               "Sort Space Used": 29,                             +

I'm not sure if it happens on my machine only, but maybe the test is not
entirely stable.


2) I think this bit in ExecReScanIncrementalSort is wrong:

     node->sort_Done = false;
     tuplesort_end(node->fullsort_state);
     node->prefixsort_state = NULL;
     tuplesort_end(node->fullsort_state);
     node->prefixsort_state = NULL;
     node->bound_Done = 0;

Notice both places reset fullsort_state and set prefixsort_state to
NULL. Another thing is that I'm not sure it's fine to pass NULL to
tuplesort_end (my guess is tuplesort_free will fail when it gets NULL).


3) Most of the execution plans look reasonable, except that some of the
plans look like this:


                          QUERY PLAN
   ---------------------------------------------------------
    Limit
      ->  GroupAggregate
            Group Key: t.a, t.b, t.c, t.d
            ->  Incremental Sort
                  Sort Key: t.a, t.b, t.c, t.d
                  Presorted Key: t.a, t.b, t.c
                  ->  Incremental Sort
                        Sort Key: t.a, t.b, t.c
                        Presorted Key: t.a, t.b
                        ->  Index Scan using t_a_b_idx on t
   (10 rows)

i.e. there are two incremental sorts on top of each other, with
different prefixes. But this this is not a new issue - it happens with
queries like this:

   SELECT a, b, c, d, count(*) FROM (
     SELECT * FROM t ORDER BY a, b, c
   ) foo GROUP BY a, b, c, d limit 1000;

i.e. there's a subquery with a subset of pathkeys. Without incremental
sort the plan looks like this:

                    QUERY PLAN
   ---------------------------------------------
    Limit
      ->  GroupAggregate
            Group Key: t.a, t.b, t.c, t.d
            ->  Sort
                  Sort Key: t.a, t.b, t.c, t.d
                  ->  Sort
                        Sort Key: t.a, t.b, t.c
                        ->  Seq Scan on t
   (8 rows)

so essentially the same plan shape. What bugs me though is that there
seems to be some sort of memory leak, so that this query consumes
gigabytes os RAM before it gets killed by OOM. But the memory seems not
to be allocated in any memory context (at least MemoryContextStats don't
show anything like that), so I'm not sure what's going on.

Reproducing it is fairly simple:

   CREATE TABLE t (a bigint, b bigint, c bigint, d bigint);
   INSERT INTO t SELECT
     1000*random(), 1000*random(), 1000*random(), 1000*random()
   FROM generate_series(1,10000000) s(i);
   CREATE INDEX idx ON t(a,b);
   ANALYZE t;

   EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
   FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
   LIMIT 100;


[1] https://github.com/tvondra/incremental-sort-tests-2

[2] https://github.com/tvondra/postgres/tree/incremental-sort-20200309


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Next
From: Masahiko Sawada
Date:
Subject: Re: Berserk Autovacuum (let's save next Mandrill)