Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] CLUSTER command progress monitor
Date
Msg-id CA+TgmobNjKsg7t3PwDBjquYyrhBX1BeTsNEmAjfMKwEDz2ib3A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] CLUSTER command progress monitor  (Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp>)
Responses Re: [HACKERS] CLUSTER command progress monitor
List pgsql-hackers
On Tue, Sep 12, 2017 at 8:20 AM, Tatsuro Yamada
<yamada.tatsuro@lab.ntt.co.jp> wrote:
>>> I agree that progress reporting for sort is difficult. So it only reports
>>> the phase ("sorting tuples") in the current design of progress monitor of
>>> cluster.
>>> It doesn't report counter of sort.
>>
>> Doesn't that make it almost useless?  I would guess that scanning the
>> heap and writing the new heap would ordinarily account for most of the
>> runtime, or at least enough that you're going to want something more
>> than just knowing that's the phase you're in.
>
> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
> I know that external merge sort takes a time than quick sort.
> I'll try investigating how to get a counter from external merge sort
> processing.
> Is this the right way?

Progress reporting on sorts seems like a tricky problem to me, as I
said before.  In most cases, a sort is going to involve an initial
stage where it reads all the input tuples and writes out quicksorted
runs, and then a merge phase where it merges all the output tapes into
a sorted result.  There are some complexities; for example, if the
number of tapes is really large, then we might need multiple merge
phases, only the last of which will produce tuples.  On the other
hand, if work_mem is very large, the time taken for sorting each run
might itself be significant that we'd like to have insight into
progress.  If we ignore those complexities, though, a reasonable way
of reporting progress might be to report the following:

1. blocks read from the relation
2. # of tuples we've put into the tuplesort
3. # of tuples we've extracted from the tuplesort

During the first part of the sort, (1) and (2) will be growing, and
the user can measure progress by comparing (1) to the total size of
the relation.  During the final merge, (3) will be growing, eventually
becoming equal to (2), so the user can measure progress my comparing
(2) with (3).

This approach only works for a seqscan-and-sort, though.  I'm not sure
what to do about the index scan case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Parallel Append implementation
Next
From: Shubham Barai
Date:
Subject: Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index