Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe-KCy3tW6j2VLi++C6Z+d1wgi81sOxiKo0+HWVR2noNLg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > 3) Most of the execution plans look reasonable, except that some of the
> > plans look like this:
> >
> >
> >                           QUERY PLAN
> >    ---------------------------------------------------------
> >     Limit
> >       ->  GroupAggregate
> >             Group Key: t.a, t.b, t.c, t.d
> >             ->  Incremental Sort
> >                   Sort Key: t.a, t.b, t.c, t.d
> >                   Presorted Key: t.a, t.b, t.c
> >                   ->  Incremental Sort
> >                         Sort Key: t.a, t.b, t.c
> >                         Presorted Key: t.a, t.b
> >                         ->  Index Scan using t_a_b_idx on t
> >    (10 rows)
> >
> > i.e. there are two incremental sorts on top of each other, with
> > different prefixes. But this this is not a new issue - it happens with
> > queries like this:
> >
> >    SELECT a, b, c, d, count(*) FROM (
> >      SELECT * FROM t ORDER BY a, b, c
> >    ) foo GROUP BY a, b, c, d limit 1000;
> >
> > i.e. there's a subquery with a subset of pathkeys. Without incremental
> > sort the plan looks like this:
> >
> >                     QUERY PLAN
> >    ---------------------------------------------
> >     Limit
> >       ->  GroupAggregate
> >             Group Key: t.a, t.b, t.c, t.d
> >             ->  Sort
> >                   Sort Key: t.a, t.b, t.c, t.d
> >                   ->  Sort
> >                         Sort Key: t.a, t.b, t.c
> >                         ->  Seq Scan on t
> >    (8 rows)
> >
> > so essentially the same plan shape. What bugs me though is that there
> > seems to be some sort of memory leak, so that this query consumes
> > gigabytes os RAM before it gets killed by OOM. But the memory seems not
> > to be allocated in any memory context (at least MemoryContextStats don't
> > show anything like that), so I'm not sure what's going on.
> >
> > Reproducing it is fairly simple:
> >
> >    CREATE TABLE t (a bigint, b bigint, c bigint, d bigint);
> >    INSERT INTO t SELECT
> >      1000*random(), 1000*random(), 1000*random(), 1000*random()
> >    FROM generate_series(1,10000000) s(i);
> >    CREATE INDEX idx ON t(a,b);
> >    ANALYZE t;
> >
> >    EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
> >    FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
> >    LIMIT 100;
>
> While trying to reproduce this, instead of lots of memory usage, I got
> the attached assertion failure instead.

And, without the EXPLAIN ANALYZE was able to get this one, which will
probably be a lot more helpful.

James

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Robert Haas
Date:
Subject: Re: backup manifests