Re: Parallel CREATE INDEX for GIN indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel CREATE INDEX for GIN indexes
Msg-id 06bd6a23-e317-4707-83d0-23c15809547e@vondra.me
In response to Re: Parallel CREATE INDEX for GIN indexes  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On 2/12/25 15:59, Matthias van de Meent wrote:
> On Tue, 7 Jan 2025 at 12:59, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> ...
>>
>> I haven't done anything about this, but I'm not sure adding the number
>> of GIN tuples to pg_stat_progress_create_index would be very useful. We
>> don't know the total number of entries, so it can't show the progress.
> 
> For btree scans, we update the number of to-be-inserted tuples
> together with the number of blocks scanned. Can we do something
> similar with GIN?
> 

I've been thinking about this, but I'm not quite sure how that should
work. The problem is that in btree we have a 1:1 mapping to heap tuples,
but in GIN it's not that simple. Not only do we generate multiple GIN
entries for each heap row, but we also combine / merge those tuples at
various levels.

But I think it might look like this:

1) Each worker counts the number of GinTuples written to the shared
tuplesort, after the in-worker merge phase (i.e. it'd not be the number
of GIN entries generated in ginBuildCallbackParallel).

2) The leader then counts the number of entries it loaded from the
tuplesort, before merging/writing them into the index.

I think this would work as a measure of progress, even though it does
not really match the number of index tuples.
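
To make the two steps above concrete, here is a minimal standalone sketch
of the bookkeeping. Everything in it is an assumption for illustration:
the struct and function names are hypothetical and do not exist in
PostgreSQL or in the patch, and it ignores the locking or atomics that
shared state would need in a real parallel build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical progress state; not a PostgreSQL structure. */
typedef struct GinProgressSketch
{
    int64_t tuples_written; /* GinTuples written by all workers, counted
                             * after the in-worker merge phase */
    int64_t tuples_loaded;  /* GinTuples the leader has read back so far */
} GinProgressSketch;

/*
 * Step 1: a worker reports how many GinTuples it wrote to the shared
 * tuplesort. A real implementation would need to synchronize this update.
 */
static void
sketch_worker_done(GinProgressSketch *p, int64_t ntuples_written)
{
    assert(ntuples_written >= 0);
    p->tuples_written += ntuples_written;
}

/*
 * Step 2: the leader counts each entry it loads from the tuplesort,
 * before merging/writing it into the index.
 */
static void
sketch_leader_loaded(GinProgressSketch *p)
{
    assert(p->tuples_loaded < p->tuples_written);
    p->tuples_loaded++;
}

/* Progress as an integer percentage; 0 if nothing was written yet. */
static int
sketch_progress_percent(const GinProgressSketch *p)
{
    if (p->tuples_written == 0)
        return 0;
    return (int) (p->tuples_loaded * 100 / p->tuples_written);
}
```

The total is only known once all workers have finished, so in this sketch
the percentage is meaningful only during the leader's phase. It measures
tuples passing through the tuplesort, not index tuples.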

One thing I'm not sure about is how this would work with the "single
tuplesort" patch. That patch moves the merging to the tuplesort code,
and there doesn't seem to be a nice way to pass the number of merged
tuples outside.

> Can we track data for pg_stat_progress_create_index?
> 

Which data? I think progress for the CREATE INDEX would be nice, ofc.


regards

-- 
Tomas Vondra



