Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe9Y6Hc35N2+eRMhPvc1YJ1LhdZjzT8k7J1KGcx_JC+_Kw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Sat, Mar 14, 2020 at 12:07 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Fri, Mar 13, 2020 at 8:23 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Friday, March 13, 2020, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Fri, Mar 13, 2020 at 04:31:16PM -0400, James Coleman wrote:
> >>>
> >>> On Fri, Mar 13, 2020 at 2:23 PM James Coleman <jtc331@gmail.com> wrote:
> >>>>
> >>>>
> >>>> On Tue, Mar 10, 2020 at 10:44 PM Tomas Vondra
> >>>> <tomas.vondra@2ndquadrant.com> wrote:
> >>>> > 3) Most of the execution plans look reasonable, except that some of the
> >>>> > plans look like this:
> >>>> >
> >>>> >
> >>>> >                           QUERY PLAN
> >>>> >    ---------------------------------------------------------
> >>>> >     Limit
> >>>> >       ->  GroupAggregate
> >>>> >             Group Key: t.a, t.b, t.c, t.d
> >>>> >             ->  Incremental Sort
> >>>> >                   Sort Key: t.a, t.b, t.c, t.d
> >>>> >                   Presorted Key: t.a, t.b, t.c
> >>>> >                   ->  Incremental Sort
> >>>> >                         Sort Key: t.a, t.b, t.c
> >>>> >                         Presorted Key: t.a, t.b
> >>>> >                         ->  Index Scan using t_a_b_idx on t
> >>>> >    (10 rows)
> >>>> >
> >>>> > i.e. there are two incremental sorts on top of each other, with
> >>>> > different prefixes. But this this is not a new issue - it happens with
> >>>> > queries like this:
> >>>> >
> >>>> >    SELECT a, b, c, d, count(*) FROM (
> >>>> >      SELECT * FROM t ORDER BY a, b, c
> >>>> >    ) foo GROUP BY a, b, c, d limit 1000;
> >>>> >
> >>>> > i.e. there's a subquery with a subset of pathkeys. Without incremental
> >>>> > sort the plan looks like this:
> >>>> >
> >>>> >                     QUERY PLAN
> >>>> >    ---------------------------------------------
> >>>> >     Limit
> >>>> >       ->  GroupAggregate
> >>>> >             Group Key: t.a, t.b, t.c, t.d
> >>>> >             ->  Sort
> >>>> >                   Sort Key: t.a, t.b, t.c, t.d
> >>>> >                   ->  Sort
> >>>> >                         Sort Key: t.a, t.b, t.c
> >>>> >                         ->  Seq Scan on t
> >>>> >    (8 rows)
> >>>> >
> >>>> > so essentially the same plan shape. What bugs me though is that there
> >>>> > seems to be some sort of memory leak, so that this query consumes
> >>>> > gigabytes os RAM before it gets killed by OOM. But the memory seems not
> >>>> > to be allocated in any memory context (at least MemoryContextStats don't
> >>>> > show anything like that), so I'm not sure what's going on.
> >>>> >
> >>>> > Reproducing it is fairly simple:
> >>>> >
> >>>> >    CREATE TABLE t (a bigint, b bigint, c bigint, d bigint);
> >>>> >    INSERT INTO t SELECT
> >>>> >      1000*random(), 1000*random(), 1000*random(), 1000*random()
> >>>> >    FROM generate_series(1,10000000) s(i);
> >>>> >    CREATE INDEX idx ON t(a,b);
> >>>> >    ANALYZE t;
> >>>> >
> >>>> >    EXPLAIN ANALYZE SELECT a, b, c, d, count(*)
> >>>> >    FROM (SELECT * FROM t ORDER BY a, b, c) foo GROUP BY a, b, c, d
> >>>> >    LIMIT 100;
> >>>>
> >>>> While trying to reproduce this, instead of lots of memory usage, I got
> >>>> the attached assertion failure instead.
> >>>
> >>>
> >>> And, without the EXPLAIN ANALYZE was able to get this one, which will
> >>> probably be a lot more helpful.
> >>>
> >>
> >> Hmmm, I'll try reproducing it, but can you investigate the values in the
> >> Assert? I mean, it fails on this:
> >>
> >>   Assert(total_allocated == context->mem_allocated);
> >>
> >> so can you get a core or attach to the process using gdb, and see what's
> >> the expected / total value?
>
> I've reproduced this on multiple machines (though all are Ubuntu or
> Debian derivatives...I don't think that's likely to matter). A core
> dump is ~150MB, so I've uploaded to Dropbox [1].
>
> I didn't find an obvious first-level member of Tuplesortstate that was
> covered by either of the two blocks in the AllocSet (both are 8KB in
> size).
>
> James
>
> [1]: https://www.dropbox.com/s/jwndwp4634hzywk/aset_assertion_failure.core?dl=0

And...I think I might have found out the issue (though haven't proved
it 100% yet or fixed it):

The incremental sort node calls `tuplesort_puttupleslot`, which
switches the memory context to `sortcontext`. It then calls
`puttuple_common`. `puttuple_common` may then call `grow_memtuples`
which reallocs space for `sortstate->memtuples`, but `memtuples` is
elsewhere allocated in the memory context maincontext.

I had earlier in this debugging process noticed that `sortcontext` was
allocated in `maincontext`, which seemed conceptually odd if our goal
is to reuse the sort state, and I also found a comment that needed to
be changed relative to cleaning up the per-sort context (that talks
about it freeing the sort state itself), but the `memtuples` array was
in fact freed additionally at reset, so it seemed safe.

Given this issue though, I think I'm going to go ahead and rework so
that the `memtuples` array lies within the `sortcontext` instead.

James



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: resolve_generic_type() is over-complex and under-correct
Next
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)