Re: pg_stat_progress_create_index vs. parallel index builds - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: pg_stat_progress_create_index vs. parallel index builds
Date
Msg-id 202106092053.nvvkonovsy7u@alvherre.pgsql
Whole thread Raw
In response to Re: pg_stat_progress_create_index vs. parallel index builds  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
On 2021-Jun-04, Greg Nancarrow wrote:

> I tested with and without the patch, using the latest PG14 source as
> of today, and can confirm that without the patch applied, the "sorting
> live tuples" phase is not reported in the parallel-case, but with the
> patch applied it then does get reported in that case. I also confirmed
> that, as you said, the patch only addresses the usual case where the
> parallel leader participates in the parallel operation.
> What is slightly puzzling to me (and perhaps digging deeper will
> reveal it) is why this "sorting live tuples" phase seems so short in
> the serial case compared to the parallel case?
> For example, in my test I created an index on a column of a table
> having 10 million records, and it took about 40 seconds, during which
> the "sorting live tuples" phase seemed to take about 8 seconds. Yet
> for the serial case, index creation took about 75 seconds, during
> which the "sorting live tuples" phase seemed to take about 1 second.

I think the reason is that scanning the table is not just scanning the
table -- it is also feeding tuples to tuplesort, which internally is
already sorting them as it receives them.  So by the time you're done
scanning the relation, some (large) fraction of the sorting work is
already done, which is why the "sorting" phase is so short.


Tracing sort is not easy.  we discussed this earlier; see
https://postgr.es/m/20181218210159.xtkltzm7flrwsm55@alvherre.pgsql
for example.

-- 
Álvaro Herrera       Valdivia, Chile
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)



pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: Estimating HugePages Requirements?
Next
From: Alvaro Herrera
Date:
Subject: Re: unnesting multirange data types