Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: [HACKERS] CLUSTER command progress monitor
Date
Msg-id 15653.1511197525@localhost
In response to Re: [HACKERS] CLUSTER command progress monitor  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] CLUSTER command progress monitor
Re: [HACKERS] CLUSTER command progress monitor
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
> >   1. scanning heap
> >   2. sort tuples
>
> These two phases overlap, though. I believe progress reporting for
> sorts is really hard.  In the simple case where the data fits in
> work_mem, none of the work of the sort gets done until all the data is
> read.  Once you switch to an external sort, you're writing batch
> files, so a lot of the work is now being done during data loading.
> But as the number of batch files grows, the final merge at the end
> becomes an increasingly noticeable part of the cost, and eventually
> you end up needing multiple merge passes.  I think we need some smart
> way to report on sorts so that we can tell how much of the work has
> really been done, but I don't know how to do it.

Whatever complexity is hidden in the sort, cost_sort() should already have
taken it into account when called via plan_cluster_use_sort(). So I think that
once we have both the startup cost and the total cost, the current progress of
the sort stage can be estimated from the number of input rows consumed and
output rows returned so far. Please correct me if my proposal appears to be
too simplistic.
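
To make the idea concrete, here is a minimal sketch in Python (an illustration only, not PostgreSQL code). The function name and all of its parameters are hypothetical. It assumes that the startup cost is spent in proportion to the input rows consumed, and that the remaining run cost (total minus startup) is spent in proportion to the output rows returned; whether the planner's estimates track real sort work that closely is exactly the open question.

```python
def estimate_sort_progress(startup_cost, total_cost, est_rows, rows_in, rows_out):
    """Estimate the fraction of sort work done, as a value in [0, 1].

    Hypothetical model (an assumption, not the planner's definition):
      - startup_cost is consumed linearly while input rows are read
      - (total_cost - startup_cost) is consumed linearly while output
        rows are returned
    """
    if total_cost <= 0 or est_rows <= 0:
        return 0.0

    # Guard against estimates that are inconsistent with each other.
    startup_cost = min(max(startup_cost, 0.0), total_cost)
    run_cost = total_cost - startup_cost

    # The row estimate may be too low, so clamp both fractions at 1.
    in_frac = min(max(rows_in, 0) / est_rows, 1.0)
    out_frac = min(max(rows_out, 0) / est_rows, 1.0)

    done = startup_cost * in_frac + run_cost * out_frac
    return min(done / total_cost, 1.0)


if __name__ == "__main__":
    # Made-up costs: 80 of 100 cost units are startup, 1000 rows expected.
    print(estimate_sort_progress(80.0, 100.0, 1000, 500, 0))
    print(estimate_sort_progress(80.0, 100.0, 1000, 1000, 500))
```

With these made-up numbers, half of the input read counts as 40% of the work, and the output stage contributes only the last 20%. A bad row estimate skews the result directly, which is why the fractions are clamped.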

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

