
Re: [HACKERS] CLUSTER command progress monitor - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: [HACKERS] CLUSTER command progress monitor
Date	September 11, 2017 20:23:01
Msg-id	CAH2-WzkFr=buahK1LMriHti_RkA=DJnd6n1ACUQ5Z8zPM29bbQ@mail.gmail.com
In response to	Re: [HACKERS] CLUSTER command progress monitor (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <yamada.tatsuro@lab.ntt.co.jp> wrote:
>> Thanks for the comment.
>>
>> As you know, the CLUSTER command chooses between SEQ SCAN and INDEX SCAN as
>> its scan method based on cost estimation. In the case of SEQ SCAN, these two
>> phases do not overlap. However, with INDEX SCAN, they overlap. Therefore I
>> created a combined "scan heap and write new heap" phase for when INDEX SCAN
>> is selected.
>>
>> I agree that progress reporting for sort is difficult. So in the current
>> design of the CLUSTER progress monitor, it only reports the phase
>> ("sorting tuples").
>> It doesn't report a counter for the sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.
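
The phase split described in the quoted message can be sketched in C as follows. This is a minimal, hypothetical model assuming the design exactly as described above: the enum, `cluster_phase_plan()`, `phase_has_counter()`, and every phase label other than "sorting tuples" and "scan heap and write new heap" are illustrative names, not code from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase labels modelled on the thread; not a real PostgreSQL API. */
typedef enum ClusterPhase
{
    PHASE_SCAN_HEAP,           /* seq scan path: read old heap into the sort */
    PHASE_SORT_TUPLES,         /* seq scan path: "sorting tuples", no counter */
    PHASE_WRITE_NEW_HEAP,      /* seq scan path: drain sorted tuples to new heap */
    PHASE_SCAN_AND_WRITE_HEAP  /* index scan path: "scan heap and write new heap" */
} ClusterPhase;

/*
 * Fills out[] with the phase sequence for the chosen scan method and
 * returns the number of phases written (hypothetical helper).
 */
size_t
cluster_phase_plan(bool use_index_scan, ClusterPhase out[3])
{
    size_t n = 0;

    assert(out != NULL);

    if (use_index_scan)
    {
        /* Tuples already arrive in clustering order, so each one can be
         * written as soon as it is read: one overlapping phase. */
        out[n++] = PHASE_SCAN_AND_WRITE_HEAP;
    }
    else
    {
        /* A sort needs all of its input before it can emit output, so
         * scanning, sorting, and writing run strictly one after another. */
        out[n++] = PHASE_SCAN_HEAP;
        out[n++] = PHASE_SORT_TUPLES;
        out[n++] = PHASE_WRITE_NEW_HEAP;
    }
    return n;
}

/* Under the described design, only the sort phase reports no tuple counter. */
bool
phase_has_counter(ClusterPhase phase)
{
    return phase != PHASE_SORT_TUPLES;
}
```

In the seq scan path, the sort sits between reading and writing and is the one phase without a counter, which is the gap raised in the reply. In the index scan path, the single combined phase carries the write-heavy work.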

It's definitely my experience that CLUSTER is incredibly I/O bound.
You're shoveling the tuples through tuplesort.c, but the actual
sorting component isn't where the real costs are. Profiling shows that
writing out the new heap (including moderately complicated
bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER
didn't look attractive, even though it would be a fairly
straightforward matter to add that on top of the parallel CREATE INDEX
structure from the patch that I wrote to do that.

-- 
Peter Geoghegan