Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] CLUSTER command progress monitor
Msg-id CA+Tgmob00ASAYZUvtCmMY45LfO3E2D-re59DEOHY7Lf1KLHXiw@mail.gmail.com
In response to Re: [HACKERS] CLUSTER command progress monitor  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <ah@cybertec.at> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> >   1. scanning heap
>> >   2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard.  In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data is
>> read.  Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes.  I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.

I think it is far too simplistic.  If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far as
compared to the total number that will be scanned would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, which we know for sure about, than rows, at which we
can only guess.  But that's the *scan* progress, not the *sort*
progress.
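
To make the distinction concrete, here is a minimal sketch in Python. It
is not PostgreSQL code: the function names and arguments are
hypothetical, chosen only for illustration. It assumes the total block
count is known exactly when the scan starts, while the total row count
is only a planner-style estimate.

```python
def scan_progress(blocks_scanned: int, total_blocks: int) -> float:
    """Fraction of a sequential scan completed, measured in blocks.

    total_blocks is assumed to be known exactly at scan start, so the
    result is an exact fraction rather than an estimate.
    """
    if total_blocks == 0:
        return 1.0  # empty relation: nothing left to scan
    if not 0 <= blocks_scanned <= total_blocks:
        raise ValueError("blocks_scanned must be within [0, total_blocks]")
    return blocks_scanned / total_blocks


def row_progress_estimate(rows_seen: int, estimated_total_rows: int) -> float:
    """Row-based progress, which is only as good as the row estimate.

    If the estimate is too low, the raw ratio exceeds 1.0, so it is
    clamped; the caller cannot tell how far off the figure really is.
    """
    if estimated_total_rows <= 0:
        return 1.0
    return min(rows_seen / estimated_total_rows, 1.0)
```

Both functions describe only how much input has been read. Neither says
anything about how much of the sort's own work remains.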

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

