Re: pg_stat_progress_create_index vs. parallel index builds - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: pg_stat_progress_create_index vs. parallel index builds
Date
Msg-id CAEze2WhhGStdJ0=3nXGz9eUYNDVD0UEt0s04SDaumtOLYetstA@mail.gmail.com
Whole thread Raw
In response to pg_stat_progress_create_index vs. parallel index builds  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: pg_stat_progress_create_index vs. parallel index builds
List pgsql-hackers
On Wed, 2 Jun 2021 at 13:57, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> While experimenting with parallel index builds, I've noticed a somewhat
> strange behavior of pg_stat_progress_create_index when a btree index is
> built with parallel workers - some of the phases seem to be missing.
>
> In serial (no parallelism) mode, the progress is roughly this (it's
> always the first/last timestamp of each phase):
>
>               |   command    |             phase
> -------------+--------------+----------------------------------------
>   12:56:01 AM | CREATE INDEX | building index: scanning table
>           ...
>   01:06:22 AM | CREATE INDEX | building index: scanning table
>   01:06:23 AM | CREATE INDEX | building index: sorting live tuples
>           ...
>   01:13:10 AM | CREATE INDEX | building index: sorting live tuples
>   01:13:11 AM | CREATE INDEX | building index: loading tuples in tree
>           ...
>   01:24:02 AM | CREATE INDEX | building index: loading tuples in tree
>
> So it goes through three phases:
>
> 1) scanning tuples
> 2) sorting live tuples
> 3) loading tuples in tree
>
> But with parallel build index build, it changes to:
>
>               |   command    |             phase
> -------------+--------------+----------------------------------------
>   11:40:48 AM | CREATE INDEX | building index: scanning table
>           ...
>   11:47:24 AM | CREATE INDEX | building index: scanning table (scan
>                 complete)
>   11:56:22 AM | CREATE INDEX | building index: scanning table
>   11:56:23 AM | CREATE INDEX | building index: loading tuples in tree
>           ...
>   12:05:33 PM | CREATE INDEX | building index: loading tuples in tree
>
> That is, the "sorting live tuples" phase disappeared, and instead it
> seems to be counted in the "scanning table" one, as if there was an
> update of the phase missing.

> I've only tried this on master, but I assume it behaves like this in the
> older releases too. I wonder if this is intentional - it sure is a bit
> misleading.

This was a suprise to me as well. According to documentation in
sortsupport.h (line 125-129) the parallel workers produce pre-sorted
segments during the scanning phase, which are subsequently merged by
the leader. This might mean that the 'sorting' phase is already
finished during the 'scanning' phase by waiting for the parallel
workers; I haven't looked further if this is the case and whether it
could be changed to also produce the sorting metrics, but seeing as it
is part of the parallel workers API of tuplesort, I think fixing it in
current releases is going to be difficult.

With regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Race condition in recovery?
Next
From: Tomas Vondra
Date:
Subject: Re: pg_stat_progress_create_index vs. parallel index builds