Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first - Mailing list pgsql-hackers

Magnus Hagander <magnus@hagander.net> writes:
> On Fri, May 04, 2007 at 12:38:18PM -0400, Tom Lane wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
>>> to see at runtime (for example as a hint that if you put in a bit more
>>> work_mem it might get used)
>> 
>> I don't see that this is any more interesting than whether the sort
>> spilled to disk or not; which is something we don't show in EXPLAIN
>> ANALYZE either.  trace_sort is the agreed API for examining that.

> Now that you mention it, that'd be nice to have as well - the point being
> making it available without recompile.

Well, trace_sort is available by default, but...

>> It's not exactly easy to do, because (a) none of this information
>> is exposed outside tuplesort.c, and (b) the tuplesortstate object
>> is probably gone by the time EXPLAIN ANALYZE runs, anyway.

> Hmm. Ok. Don't know enough about those parts of the code to comment on
> that, but I'll certainly take your word for it :-)

I take back point (b) --- the state object is released at ExecutorEnd,
and EXPLAIN ANALYZE examines the tree before doing that, so if we added
some kind of reporting function to tuplesort.c's API it'd be doable
easily enough.

What do you think the output should look like?  The first thought that
comes to mind is to add "method=memory" (or disk or top-N) to the
"actual" annotation:

regression=# explain analyze select * from tenk1 order by fivethous limit 100;
           QUERY PLAN
 

------------------------------------------------------------------------------------------------------------------------Limit
(cost=840.19..840.44 rows=100 width=244) (actual time=140.511..141.604 rows=100 loops=1)  ->  Sort
(cost=840.19..865.19rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N)
                                                                                   ^^^^^^^^^^^^        Sort Key:
fivethous       ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.074..51.849 rows=10000
loops=1)Totalruntime: 143.089 ms
 
(5 rows)

Another possibility, which could be wedged into explain.c slightly more
easily, is to append "Method: top-N" or some such to the Sort Key line,
but I'm not sure that that would look nice.

Comments, ideas?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Paul Ramsey
Date:
Subject: Re: Where to find kind code for STATISTIC_KIND GEOMETRY?
Next
From: Tom Lane
Date:
Subject: Re: Where to find kind code for STATISTIC_KIND GEOMETRY?