Thread: Top-N sorts in EXPLAIN, row count estimates, and parallelism

Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: Andres Freund
Hi,

Right now we don't indicate that a top-n sort is going to be used in
EXPLAIN, just EXPLAIN ANALYZE. That's IMO suboptimal, because one might
quite legitimately want to know that before actually executing the query
(as it will make a huge difference to the query's actual resource
intensity).

postgres[28165][1]=# EXPLAIN (VERBOSE) SELECT * FROM hashes ORDER BY count DESC LIMIT 10;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              QUERY PLAN                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=12419057.53..12419058.70 rows=10 width=45)                                               │
│   Output: hash, count                                                                                 │
│   ->  Gather Merge  (cost=12419057.53..66041805.65 rows=459591466 width=45)                           │
│         Output: hash, count                                                                           │
│         Workers Planned: 2                                                                            │
│         ->  Sort  (cost=12418057.51..12992546.84 rows=229795733 width=45)                             │
│               Output: hash, count                                                                     │
│               Sort Key: hashes.count DESC                                                             │
│               ->  Parallel Seq Scan on public.hashes  (cost=0.00..7452254.33 rows=229795733 width=45) │
│                     Output: hash, count                                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)

postgres[28165][1]=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM hashes ORDER BY count DESC LIMIT 10;

                                                                          QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=12419057.53..12419058.70 rows=10 width=45) (actual time=115204.278..115205.024 rows=10 loops=1)
   Output: hash, count
   ->  Gather Merge  (cost=12419057.53..66041805.65 rows=459591466 width=45) (actual time=115204.276..115205.020 rows=10 loops=1)
         Output: hash, count
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=12418057.51..12992546.84 rows=229795733 width=45) (actual time=115192.189..115192.189 rows=7 loops=3)
               Output: hash, count
               Sort Key: hashes.count DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 0: actual time=115186.558..115186.559 rows=10 loops=1
               Worker 1: actual time=115186.540..115186.540 rows=10 loops=1
               ->  Parallel Seq Scan on public.hashes  (cost=0.00..7452254.33 rows=229795733 width=45) (actual time=0.080..90442.215 rows=183836589 loops=3)
                     Output: hash, count
                     Worker 0: actual time=0.111..90366.999 rows=183976442 loops=1
                     Worker 1: actual time=0.107..90461.921 rows=184707680 loops=1
 Planning Time: 0.121 ms
 Execution Time: 115205.053 ms
(20 rows)

It's also noticeable that we preposterously assume that the sort actually
will return exactly the number of rows in the table, despite being a
top-n style sort. That seems bad for costing of the parallel query,
because I think we'll assume that it costs tups * parallel_tuple_cost?
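A back-of-the-envelope check with the default parallel_tuple_cost of 0.1
(my own arithmetic, not the exact costing code) suggests that charge does
account for most of the Gather Merge's run cost:

-- Rough sanity check using numbers copied from the plan above.
SELECT 459591466 * 0.1           AS per_tuple_transfer_charge,  -- est. rows x parallel_tuple_cost
       66041805.65 - 12419057.53 AS gather_merge_run_cost;      -- total cost minus startup cost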

I'm also unclear as to why the Gather Merge ends up with twice as
many estimated rows as the Sort below it.
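Concretely, plugging in the numbers from the plan (arithmetic only, not an
explanation of where the planner gets it from):

-- 2 planned workers x per-worker Sort estimate = the Gather Merge estimate.
SELECT 229795733 * 2 AS workers_times_sort_estimate;  -- 459591466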

Greetings,

Andres Freund



Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> Right now we don't indicate that a top-n sort is going to be used in
> EXPLAIN, just EXPLAIN ANALYZE.

Given the way that's implemented, I doubt that we can report it
reliably in EXPLAIN.

> It's also noticeable that we preposterously assume that the sort actually
> will return exactly the number of rows in the table, despite being a
> top-n style sort.

In general, we report nodes below LIMIT with their execute-to-completion
cost and rowcount estimates.  Doing differently for a top-N sort would
be quite confusing, I should think.
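For instance (a generic sketch; "t" is a placeholder table, not one from
the plan upthread), the scan below the Limit shows the whole table's
estimated cost and row count, and only the Limit node itself reflects the
early stop:

-- Placeholder example: the Seq Scan on "t" is displayed with its
-- run-to-completion estimates even though execution stops after 10 rows.
EXPLAIN SELECT * FROM t LIMIT 10;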

> That seems bad for costing of the parallel query,
> because I think we'll assume that it costs tups * parallel_tuple_cost?

If the parallel query stuff doesn't understand about LIMIT, that's
a bug independently of top-N sorts.

            regards, tom lane



Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: Andres Freund
Hi,

On 2019-05-23 18:31:43 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It's also noticeable that we preposterously assume that the sort actually
> > will return exactly the number of rows in the table, despite being a
> > top-n style sort.
> 
> In general, we report nodes below LIMIT with their execute-to-completion
> cost and rowcount estimates.  Doing differently for a top-N sort would
> be quite confusing, I should think.

I'm not quite sure that's true. I mean, a top-N sort wouldn't actually
necessarily return all the input rows, even if run to completion. Isn't
that a somewhat fundamental difference?

Greetings,

Andres Freund



Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: Peter Geoghegan
On Thu, May 23, 2019 at 3:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Given the way that's implemented, I doubt that we can report it
> reliably in EXPLAIN.

Does it have to be totally reliable?

cost_sort() costs sorts as top-N heapsorts. While we cannot make an
iron-clad guarantee that it will work out that way from within
tuplesort.c, that doesn't seem like it closes off the possibility of
more informative EXPLAIN output. For example, can't we at least report that
the tuplesort will be "bounded" within EXPLAIN, indicating that we
intend to attempt to sort using a top-N heap sort (i.e. we'll
definitely do it that way if there is sufficient work_mem)?

-- 
Peter Geoghegan



Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: David Rowley
On Fri, 24 May 2019 at 10:44, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, May 23, 2019 at 3:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Given the way that's implemented, I doubt that we can report it
> > reliably in EXPLAIN.
>
> Does it have to be totally reliable?
>
> cost_sort() costs sorts as top-N heapsorts. While we cannot make an
> iron-clad guarantee that it will work out that way from within
> tuplesort.c, that doesn't seem like it closes off the possibility of
> more informative EXPLAIN output. For example, can't we at least report that
> the tuplesort will be "bounded" within EXPLAIN, indicating that we
> intend to attempt to sort using a top-N heap sort (i.e. we'll
> definitely do it that way if there is sufficient work_mem)?

I think this really needs more of a concrete proposal.  Remember
LIMIT/OFFSET don't need to be constants, they could be a Param or some
return value from a subquery, so the bound might not be known until
after executor startup, which EXPLAIN is never going to know about.
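For example (the second query uses made-up "settings"/"max_rows" names
purely for illustration), both of these produce a bound that only
materialises at run time:

-- LIMIT as a statement parameter ($1): the value is supplied at execute time.
PREPARE topn(int) AS
  SELECT * FROM hashes ORDER BY count DESC LIMIT $1;
EXECUTE topn(10);

-- LIMIT from a scalar subquery: the bound is computed during execution.
SELECT * FROM hashes ORDER BY count DESC
LIMIT (SELECT max_rows FROM settings WHERE name = 'topn');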

Perhaps something could be tagged onto the Sort path in grouping_planner
if preprocess_limit() managed to come up with a value.  double does not
seem like the perfect type for a bound to show in EXPLAIN, and int64
could wrap for very high LIMIT + OFFSET values.  Showing an approximate
value in EXPLAIN seems like it might be a source of future bug reports.
Perhaps if we did it, we could just set the value to -1 (unknown) if
LIMIT + OFFSET happened to wrap an int64.  Implementing that would seem
to require adding a new field for it in SortPath, Sort and SortState,
plus all the additional code for passing the value around.  We'd then
have to hope nobody used the field for anything important in the future.

After that, what would we do with it in EXPLAIN?  Always show "Bound:
<n>", if it's not -1?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: Peter Geoghegan
On Thu, May 23, 2019 at 7:48 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> > cost_sort() costs sorts as top-N heapsorts. While we cannot make an
> > iron-clad guarantee that it will work out that way from within
> > tuplesort.c, that doesn't seem like it closes off the possibility of
> > more informative EXPLAIN output. For example, can't we at least report that
> > the tuplesort will be "bounded" within EXPLAIN, indicating that we
> > intend to attempt to sort using a top-N heap sort (i.e. we'll
> > definitely do it that way if there is sufficient work_mem)?
>
> I think this really needs more of a concrete proposal.  Remember
> LIMIT/OFFSET don't need to be constants, they could be a Param or some
> return value from a subquery, so the bound might not be known until
> after executor startup, which EXPLAIN is never going to know about.

I was merely pointing out that it is clear when a sort *could* be a
top-n sort, which could be exposed by EXPLAIN without anyone feeling
misled.

> After that, what would we do with it in EXPLAIN?  Always show "Bound:
> <n>", if it's not -1?

I'm not sure.

The distinction between a top-n sort and any other sort is an
important one (it's certainly more important than the distinction
between an internal and external sort), so it's worth being flexible
in order to expose more information in EXPLAIN output. I would be
willing to accept some kind of qualified or hedged description in the
EXPLAIN output for a bounded sort node, even though that approach
doesn't seem desirable in general.

-- 
Peter Geoghegan