Re: Progress report of CREATE INDEX for nested partitioned tables - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Progress report of CREATE INDEX for nested partitioned tables
Msg-id 20221217143002.GR1153@telsasoft.com
In response to Re: Progress report of CREATE INDEX for nested partitioned tables  (Ilya Gladyshev <ilya.v.gladyshev@gmail.com>)
Responses Re: Progress report of CREATE INDEX for nested partitioned tables  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Tue, Dec 13, 2022 at 10:18:58PM +0400, Ilya Gladyshev wrote:
> > > I actually think that the progress view would be better off without
> > > the total number of partitions, 
> > 
> > Just curious - why ?
> 
> We don't really know how many indexes we are going to create, unless we
> have some kind of preliminary "planning" stage where we accumulate all
> the relations that will need to have indexes created (rather than
> attached). And if someone wants the total, it can be calculated
> manually without this view, it's less user-friendly, but if we can't do
> it well, I would leave it up to the user.

Thanks.  One other reason is that the partitions (and sub-partitions)
may not be equally sized.  Also, I've said before that it's weird to
report macroscopic progress (the number of partitions finished) in
the same place as microscopic details (the number of blocks done in
the relation currently being processed).

> > I have another proposal: since the original patch 3.5 years ago didn't
> > consider or account for sub-partitions, let's not start counting them
> > now.  It was never defined whether they were included or not (and I
> > guess that they're not common) so we can take this opportunity to
> > clarify the definition.
> 
> I had this thought initially, but then I thought that it's not
> what I would want, if I was to track progress of multi-level
> partitioned tables (but yeah, I guess it's pretty uncommon). In this
> respect, I like your initial counter-proposal more, because it leaves
> us room to improve this in the future. Otherwise, if we commit to
> reporting only top-level partitions now, I'm not sure we will have the
> opportunity to change this.

We have the common problem of too many competing patches for this.

https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing the counter to be incremented by recursive calls
without the value being lost to the caller.
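To illustrate the global-variable approach, here is a minimal Python simulation (not PostgreSQL code). `Rel`, `define_index`, `create_index` and `count_leaves` are hypothetical names, and the sketch assumes that only leaf partitions are counted toward the total. The module-level counter plays the role of the global variable: recursive calls increment it, and the caller still sees the updated value afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    """Hypothetical stand-in for a relation; leaf partitions have no children."""
    name: str
    children: list["Rel"] = field(default_factory=list)


# Module-level state standing in for the global variable described above.
# It survives recursive calls, so nested calls can keep incrementing it.
parts_done = 0
reports: list[int] = []


def count_leaves(rel: Rel) -> int:
    """Total shown to the user: every leaf at any depth (an assumption)."""
    if not rel.children:
        return 1
    return sum(count_leaves(c) for c in rel.children)


def define_index(rel: Rel) -> None:
    global parts_done
    for child in rel.children:
        if child.children:
            define_index(child)        # recursion shares the global counter
        else:
            parts_done += 1            # "build" the leaf's index, then report
            reports.append(parts_done)


def create_index(root: Rel) -> tuple[int, list[int]]:
    """Top-level entry point: reset the global state, then recurse."""
    global parts_done
    parts_done = 0
    reports.clear()
    total = count_leaves(root)
    define_index(root)
    return total, list(reports)


root = Rel("parent", [
    Rel("p1"),
    Rel("p2", [Rel("p2a"), Rel("p2b", [Rel("p2b_x"), Rel("p2b_y")])]),
    Rel("p3"),
])
print(create_index(root))  # (5, [1, 2, 3, 4, 5])
```

The reset in `create_index` matters: because the counter is global, a second top-level command would otherwise start counting where the previous one stopped.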

https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent.  This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.
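The "jump" can be reproduced with a small Python simulation (again hypothetical names, not PostgreSQL code). It assumes a parent with 20 leaf partitions followed by one intermediate partition holding 10 leaves, and a counter that moves only when a direct child of the top-level parent finishes.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    """Hypothetical stand-in for a relation; leaf partitions have no children."""
    name: str
    children: list["Rel"] = field(default_factory=list)


def build(rel: Rel) -> int:
    """Build the index on rel's whole subtree and return how many leaves
    were done. Nothing is reported from inside the recursion."""
    if not rel.children:
        return 1
    return sum(build(c) for c in rel.children)


def create_index_parent_only(root: Rel) -> list[int]:
    """Report progress only in the top-level parent, after each direct child."""
    done = 0
    reports = []
    for child in root.children:
        done += build(child)   # an intermediate child adds all its leaves at once
        reports.append(done)
    return reports


leaves = [Rel(f"p{i}") for i in range(1, 21)]
sub = Rel("p21", [Rel(f"p21_{j}") for j in range(1, 11)])
root = Rel("parent", leaves + [sub])
reports = create_index_parent_only(root)
print(reports[-3:])  # [19, 20, 30]
```

The total is still correct (30), but a monitoring client never observes the values 21 to 29.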

https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
  children.  This is minimal, but discourages any future plan to track
  progress involving intermediate partitions with finer granularity.
- An alternative patch adds IndexStmt.nparts_done, and allows reporting
  fine-grained progress involving intermediate partitions.
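A rough Python sketch of the second alternative, where the counter lives on the statement object rather than in a global. `IndexStmtSketch` is a hypothetical stand-in that models only the proposed `nparts_done` field, and the sketch assumes the statement carrying the counter is visible to the caller after the recursive call returns (here the same object is simply passed down).

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    """Hypothetical stand-in for a relation; leaf partitions have no children."""
    name: str
    children: list["Rel"] = field(default_factory=list)


@dataclass
class IndexStmtSketch:
    """Hypothetical stand-in for IndexStmt; only the proposed counter is modelled."""
    relname: str
    nparts_done: int = 0


def define_index(rel: Rel, stmt: IndexStmtSketch, reports: list[int]) -> None:
    for child in rel.children:
        if child.children:
            # The same stmt is passed down, so increments made while
            # processing an intermediate partition are seen by the caller.
            define_index(child, stmt, reports)
        else:
            stmt.nparts_done += 1          # one leaf finished
            reports.append(stmt.nparts_done)


root = Rel("parent", [
    Rel("p1"),
    Rel("p2", [Rel("p2a"), Rel("p2b"), Rel("p2c")]),
    Rel("p3"),
])
stmt = IndexStmtSketch("parent")
reports: list[int] = []
define_index(root, stmt, reports)
print(stmt.nparts_done, reports)  # 5 [1, 2, 3, 4, 5]
```

Unlike the parent-only approach, every intermediate value is reported, which is the finer granularity the first bullet would rule out.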

https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children.  The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch).  An alternate patch does it by adding a field to IndexStmt.
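For comparison, the extra-argument variant can be sketched the same way. The `parts_done` parameter is hypothetical; since Python ints are immutable, the updated count is returned, which plays the role that a pointer out-parameter would presumably play in C. Changing the signature of an existing function is the part that makes this unsuitable for back branches, where callers compiled against the old signature would break.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    """Hypothetical stand-in for a relation; leaf partitions have no children."""
    name: str
    children: list["Rel"] = field(default_factory=list)


def define_index(rel: Rel, parts_done: int, reports: list[int]) -> int:
    """Hypothetical signature with an extra parts_done argument.

    The returned value is the updated count, standing in for a C
    out-parameter so that the caller sees what the recursion did."""
    for child in rel.children:
        if child.children:
            parts_done = define_index(child, parts_done, reports)
        else:
            parts_done += 1
            reports.append(parts_done)
    return parts_done


root = Rel("parent", [Rel("p1"), Rel("p2", [Rel("p2a"), Rel("p2b")]), Rel("p3")])
reports: list[int] = []
final = define_index(root, 0, reports)
print(final, reports)  # 4 [1, 2, 3, 4]
```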

@committers: Is it okay to add nparts_done to IndexStmt?

-- 
Justin