Thread: CLUSTER on partitioned index

CLUSTER on partitioned index

From
Justin Pryzby
Date:
Forking this thread, since the existing CFs have been closed.
https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd

On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote:
> > Honestly, I think you're over-thinking and over-engineering indisclustered.
> > 
> > If "clusteredness" was something we offered to maintain across DML, I think
> > that might be important to provide stronger guarantees.  As it is now, I don't
> > think this patch is worth changing the catalog definition.
> 
> Well, this use case is new because we are discussing the relationship
> of indisclustered across multiple transactions for multiple indexes,
> so I'd rather have this discussion than not, and I have learnt
> the hard way with REINDEX that we should care a lot about the
> consistency of partition trees at any step of the operation.

indisclustered is only used as a default for "CLUSTER" (without USING).  The
worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters
a table on the "old" clustered index (that it was already clustered on), which
is what would've happened before running some command which was interrupted.

> Let's
> imagine a simple example here, take this partition tree: p (parent),
> and two partitions p1 and p2.  p has two partitioned indexes i and j,
> indexes also present in p1 and p2 as i1, i2, j1 and j2.  Let's assume
> that the user has done a CLUSTER on p USING i that completes, meaning
> that i, i1 and i2 have indisclustered set.  Now let's assume that the
> user does a CLUSTER on p USING j this time, and that this command
> fails while processing p2, meaning that indisclustered is set for j1,
> i2, and perhaps i or j depending on what the patch does.

I think the state of "indisclustered" at that point is not critical.
The command failed, and the user can re-run it, or ALTER..SET CLUSTER.
Actually, I think the only inconsistent state is if two indexes are both marked
indisclustered.

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve
indisclustered, but I can't say that's an issue.

-- 
Justin

Attachment

Re: CLUSTER on partitioned index

From
Zhihong Yu
Date:
Hi,
For v7-0002-Implement-CLUSTER-of-partitioned-table.patch:

+        * We have to build the list in a different memory context so it will
+        * survive the cross-transaction processing
+        */
+       old_context = MemoryContextSwitchTo(cluster_context);

cluster_context is not modified within the loop. Can the memory context switching code be moved outside the loop ?

Cheers

On Sat, Feb 6, 2021 at 6:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:
> On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:
> > On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:
> > > On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:
> > > > On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:
> > > > > I'm attaching a counter-proposal to your catalog change, which preserves
> > > > > indisclustered on children of clustered, partitioned indexes, and invalidates
> > > > > indisclustered when attaching unclustered indexes.
> > > >
> > > > ..and now propagates CLUSTER ON to child indexes.
> > > >
> > > > I left this as separate patches to show what I mean and what's new while we
> > > > discuss it.
> > >
> > > This fixes some omissions in the previous patch and error in its test cases.
> > >
> > > CLUSTER ON recurses to children, since I think a clustered parent index means
> > > that all its child indexes are clustered.  "SET WITHOUT CLUSTER" doesn't have
> > > to recurse to children, but I did it like that for consistency and it avoids
> > > the need to special case InvalidOid.
> >
> > The previous patch failed pg_upgrade when restoring a clustered, parent index,
> > since it's marked INVALID until indexes have been built on all child tables, so
> > CLUSTER ON was rejected on invalid index.
> >
> > So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
> > the child index (thereby making the parent "valid") to happen before SET
> > CLUSTER on the parent index.
>
> Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

--
Justin

Re: CLUSTER on partitioned index

From
Zhihong Yu
Date:
Hi,
For v10-0002-Implement-CLUSTER-of-partitioned-table.patch :

or that an partitioned index was previously set clustered.

'an partitioned index' -> a partitioned index

+ * Return a List of tables and associated index, where each index is a

associated index -> associated indices

For cluster():
-       rel = table_open(tableOid, NoLock);
+       rel = table_open(tableOid, ShareUpdateExclusiveLock);

Considering the comment preceding cluster() (forced to acquire exclusive locks on all the tables), maybe add a comment explaining why it is safe to take ShareUpdateExclusiveLock.

+cluster_multiple_rels(List *rvs, int options)

I think the multiple in the method name is not needed since the relation is in plural.

Cheers

On Fri, Apr 2, 2021 at 1:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
@cfbot: rebased

Re: CLUSTER on partitioned index

From
Laurenz Albe
Date:
On Tue, 2021-07-20 at 20:27 -0400, Alvaro Herrera wrote:
> I have to wonder if there really *is* a use case for CLUSTER in the
> first place on regular tables, let alone on partitioned tables, which
> are likely to be large and thus take a lot of time.  What justifies
> spending so much time on this implementation?  My impression is that
> CLUSTER is pretty much a fringe command nowadays, because of the access
> exclusive lock required.
> 
> Does anybody actually use it?

I see is used in the field occasionally, as it can really drastically
improve performance.  But I admit is is not frequently used.

In a data warehouse, which is updated only occasionally, running
CLUSTER after an update can make a lot of sense.

I personally think that it is enough to be able to cluster the table
partiton by partition.

Yours,
Laurenz Albe




Re: CLUSTER on partitioned index

From
Matthias van de Meent
Date:
On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:
> > I have to wonder if there really *is* a use case for CLUSTER in the
> > first place on regular tables, let alone on partitioned tables, which
> > are likely to be large and thus take a lot of time.
>
> The cluster now is done one partition at a time, so it might take a long time,
> but doesn't lock the whole partition heirarchy.  Same as VACUUM (since v10) and
> (since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

> [ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]

> diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
> ...
> +ALTER TABLE clstrpart SET WITHOUT CLUSTER;
> +ERROR:  cannot mark index clustered in partitioned table

This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

>  ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
>  ERROR:  cannot mark index clustered in partitioned table

A "HINT: use the CLUSTER command to cluster partitioned tables" (or
equivalent) should be added if we decide to keep the clustering APIs
of ALTER TABLE disabled for partitioned tables, as CLUSTER is now
implemented for partitioned tables.

> -DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

Kind regards,

Matthias van de Meent



Re: CLUSTER on partitioned index

From
Justin Pryzby
Date:
On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:
> I have to wonder if there really *is* a use case for CLUSTER in the
> first place on regular tables, let alone on partitioned tables, which
> are likely to be large and thus take a lot of time.  What justifies
> spending so much time on this implementation?  My impression is that
> CLUSTER is pretty much a fringe command nowadays, because of the access
> exclusive lock required.
> 
> Does anybody actually use it?

I hope that Alvaro will comment on the simplified patches.  If multiple people
think the patch isn't worth it, feel free to close it.  But I don't see how
complexity could be the reason.

-- 
Justin



Re: CLUSTER on partitioned index

From
Alvaro Herrera
Date:
On 2022-Feb-23, Justin Pryzby wrote:

> I hope that Alvaro will comment on the simplified patches.  If multiple people
> think the patch isn't worth it, feel free to close it.  But I don't see how
> complexity could be the reason.

I gave your patch a look and it seems a reasonable thing to do.  Maybe
not terribly useful in most cases, but there may be some cases for which
it is.  I found some part of it a bit repetitive, so I moved things
around a bit.  What do think about this?

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php

Attachment

Re: CLUSTER on partitioned index

From
Alvaro Herrera
Date:
I realized after posting that we used to allow clustering toast tables,
but after my changes we no longer do.  (Justin's version had a
RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I
don't know why we allowed it and I don't know of anyone who has ever
used that feature and we don't have any test coverage for it, but I
don't have any reason to explicitly disallow it either.  So I propose to
continue to allow it:

>From 05ba6124422fb7c2fd19575e905e444ba3eef1e5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 31 Mar 2022 12:49:57 +0200
Subject: [PATCH] allow to cluster toast tables

---
 src/backend/commands/cluster.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8417cbdb67..b391d7c434 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -451,7 +451,8 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
     }
 
     Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION ||
-           OldHeap->rd_rel->relkind == RELKIND_MATVIEW);
+           OldHeap->rd_rel->relkind == RELKIND_MATVIEW ||
+           OldHeap->rd_rel->relkind == RELKIND_TOASTVALUE);
 
     /*
      * All predicate locks on the tuples or pages are about to be made
-- 
2.30.2

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Java is clearly an example of money oriented programming"  (A. Stepanov)



Re: CLUSTER on partitioned index

From
Robert Haas
Date:
On Thu, Mar 31, 2022 at 6:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I realized after posting that we used to allow clustering toast tables,
> but after my changes we no longer do.  (Justin's version had a
> RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I
> don't know why we allowed it and I don't know of anyone who has ever
> used that feature and we don't have any test coverage for it, but I
> don't have any reason to explicitly disallow it either.  So I propose to
> continue to allow it:

I think that's probably a good decision. It's certainly useful to have
a way to force a rewrite of a TOAST table, although a lot of people
who would benefit from that operation probably don't know that they
need it, or don't know that they need just that, and end up rewriting
both the main table and the TOAST table. Whether it's useful to be
able to run CLUSTER specifically rather than VACUUM FULL on the TOAST
table is less clear, but I don't think we're likely to save anything
by forbidding it. Maybe we should consider adding a test, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: CLUSTER on partitioned index

From
Justin Pryzby
Date:
On Wed, Mar 30, 2022 at 10:51:43PM +0200, Alvaro Herrera wrote:
> On 2022-Feb-23, Justin Pryzby wrote:
> 
> > I hope that Alvaro will comment on the simplified patches.  If multiple people
> > think the patch isn't worth it, feel free to close it.  But I don't see how
> > complexity could be the reason.
> 
> I gave your patch a look and it seems a reasonable thing to do.  Maybe
> not terribly useful in most cases, but there may be some cases for which
> it is.  I found some part of it a bit repetitive, so I moved things
> around a bit.  What do think about this?

Thanks for looking at it.

The changes to finish_heap_swap() and get_tables_to_cluster() are superfluous,
right ?

I think this comment is worth preserving (it'd be okay if it lived in the
commit message).
-                        * Expand partitioned relations for CLUSTER (the corresponding
-                        * thing for VACUUM FULL happens in and around expand_vacuum_rel()

+       if (rel != NULL) In this case, maybe it should Assert() that it's
relkind=p (mostly for purposes of self-documentation).

+    partition of the specified partitioned index (which must be specified).
This is my own language, but now seems repetitive.  I think the parenthetical
part should be a separate sentance: "For partitioned indexes, the index may not
be omitted.".

Otherwise looks ok.

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index b3463ae5c46..fbc090cd0b0 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -199,7 +199,8 @@ CLUSTER [VERBOSE]
 
    <para>
     Clustering a partitioned table clusters each of its partitions using the
-    partition of the specified partitioned index (which must be specified).
+    partition of the specified partitioned index.  When clustering a
+    partitioned table, the index may not be omitted.
    </para>
 
  </refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8417cbdb67f..412147f05bc 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -231,6 +231,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
     params.options |= CLUOPT_RECHECK;
     if (rel != NULL)
     {
+        Assert (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
         check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
         rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
 
@@ -451,6 +452,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
     }
 
     Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION ||
+           OldHeap->rd_rel->relkind == RELKIND_TOASTVALUE ||
            OldHeap->rd_rel->relkind == RELKIND_MATVIEW);
 
     /*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 3f2758d13f6..6cf18c8d321 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -305,6 +305,8 @@ WHERE pg_class.oid=indexrelid
 ---------
 (0 rows)
 
+-- Verify that toast is clusterable
+CLUSTER pg_toast.pg_toast_826 USING pg_toast_826_index;
 -- Verify that clustering all tables does in fact cluster the right ones
 CREATE USER regress_clstr_user;
 CREATE TABLE clstr_1 (a INT PRIMARY KEY);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 74118993a82..ae27c35f65d 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -104,6 +104,9 @@ WHERE pg_class.oid=indexrelid
     AND pg_class_2.relname = 'clstr_tst'
     AND indisclustered;
 
+-- Verify that toast is clusterable
+CLUSTER pg_toast.pg_toast_826 USING pg_toast_826_index;
+
 -- Verify that clustering all tables does in fact cluster the right ones
 CREATE USER regress_clstr_user;
 CREATE TABLE clstr_1 (a INT PRIMARY KEY);



Re: CLUSTER on partitioned index

From
Justin Pryzby
Date:
On Thu, Mar 31, 2022 at 12:54:36PM +0200, Alvaro Herrera wrote:
> I realized after posting that we used to allow clustering toast tables,
> but after my changes we no longer do.  (Justin's version had a
> RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I
> don't know why we allowed it and I don't know of anyone who has ever
> used that feature and we don't have any test coverage for it, but I
> don't have any reason to explicitly disallow it either.  So I propose to
> continue to allow it:

Good catch.

My daily vacuum script would've discovered that they're no longer supported, as
it tests for (among other things) c.relkind IN ('r','t').  That clusters tables
that have an indisclustered set and vacuums various others.  (BTW, it's the
same script that discovered in 2019 that clustering on expressional indexes had
been broken by the heapam changes).

I think the response should be to add a test case, which could be 0001 or
00099.



Re: CLUSTER on partitioned index

From
Alvaro Herrera
Date:
Thanks, pushed.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)



Re: CLUSTER on partitioned index

From
Alvaro Herrera
Date:
Small things here.

1. in VACUUM FULL we only process partitions that are owned by the
invoking user.  We don't have this test in the new code.  I'm not sure
why do we do that there; is it worth doing the same here?

2. We should silently skip a partition that's a foreign table, I
suppose.

3. We do mark the index on the partitions as indisclustered AFAICS (we
claim that the partitioned table's index is not marked, which is
accurate).  So users doing unadorned CLUSTER afterwards will get the
partitions clustered too, once they cluster the partitioned table.  If
they don't want this, they would have to ALTER TABLE to remove the
marking.  How likely is that this will be a problem?  Maybe documenting
this point is enough.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)



Re: CLUSTER on partitioned index

From
Justin Pryzby
Date:
On Sat, Apr 02, 2022 at 07:11:47PM +0200, Alvaro Herrera wrote:
> Thanks, pushed.

Thank you for revisiting it, and thanks to Zhihong Yu for earlier review.

I'll look into your outstanding questions later this week.

-- 
Justin



Re: CLUSTER on partitioned index

From
Justin Pryzby
Date:
On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:
> Small things here.

> 1. in VACUUM FULL we only process partitions that are owned by the
> invoking user.  We don't have this test in the new code.  I'm not sure
> why do we do that there; is it worth doing the same here?

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
itself).  The reason was to avoid blocking if an unprivileged user runs VACUUM
FULL which would try to lock things (including shared catalogs) before checking
if they have permission to vacuum them.  That commit also initially checks the
owner of the partitioned table, and then re-checking owner of partitions later
on.

A similar issue exists here.  But 1) catalog tables are not partitioned, and,
2) ownership of a partitioned table is checked immediately.  So the problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session.  Fixing
this is probably a good idea, but seems improbable that it would avoid a DOS.

> 2. We should silently skip a partition that's a foreign table, I
> suppose.

I think it's not needed, since the loop over index children doesn't see a child
index on the foreign table.  ?

> 3. We do mark the index on the partitions as indisclustered AFAICS (we
> claim that the partitioned table's index is not marked, which is
> accurate).  So users doing unadorned CLUSTER afterwards will get the
> partitions clustered too, once they cluster the partitioned table.  If
> they don't want this, they would have to ALTER TABLE to remove the
> marking.  How likely is that this will be a problem?  Maybe documenting
> this point is enough.

It seems at least as likely that someone would *want* the partitions to be
marked clustered as that someone would want them to be unchanged.

The cluster mark accurately reflects having been clustered.  It seems unlikely
that a user would want something else to be clustered later by "cluster;".
Since clustering on a partitioned table wasn't supported before, nothing weird
will happen to someone who upgrades to v15 unless they elect to use the new
feature.  As this seems to be POLA, it doesn't even need to be documented.  ?

Attachment

Re: CLUSTER on partitioned index

From
Zhihong Yu
Date:


On Mon, Apr 11, 2022 at 7:06 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:
> Small things here.

> 1. in VACUUM FULL we only process partitions that are owned by the
> invoking user.  We don't have this test in the new code.  I'm not sure
> why do we do that there; is it worth doing the same here?

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
itself).  The reason was to avoid blocking if an unprivileged user runs VACUUM
FULL which would try to lock things (including shared catalogs) before checking
if they have permission to vacuum them.  That commit also initially checks the
owner of the partitioned table, and then re-checking owner of partitions later
on.

A similar issue exists here.  But 1) catalog tables are not partitioned, and,
2) ownership of a partitioned table is checked immediately.  So the problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session.  Fixing
this is probably a good idea, but seems improbable that it would avoid a DOS.

> 2. We should silently skip a partition that's a foreign table, I
> suppose.

I think it's not needed, since the loop over index children doesn't see a child
index on the foreign table.  ?

> 3. We do mark the index on the partitions as indisclustered AFAICS (we
> claim that the partitioned table's index is not marked, which is
> accurate).  So users doing unadorned CLUSTER afterwards will get the
> partitions clustered too, once they cluster the partitioned table.  If
> they don't want this, they would have to ALTER TABLE to remove the
> marking.  How likely is that this will be a problem?  Maybe documenting
> this point is enough.

It seems at least as likely that someone would *want* the partitions to be
marked clustered as that someone would want them to be unchanged.

The cluster mark accurately reflects having been clustered.  It seems unlikely
that a user would want something else to be clustered later by "cluster;".
Since clustering on a partitioned table wasn't supported before, nothing weird
will happen to someone who upgrades to v15 unless they elect to use the new
feature.  As this seems to be POLA, it doesn't even need to be documented.  ?
Hi,
For v13-0002-cluster-early-ownership-check-of-partitions.patch :

only for it to fails ownership check anyway 

to fails -> to fail

Cheers

Re: CLUSTER on partitioned index

From
Michael Paquier
Date:
On Mon, Apr 11, 2022 at 09:06:09AM -0500, Justin Pryzby wrote:
> On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:
>> 1. in VACUUM FULL we only process partitions that are owned by the
>> invoking user.  We don't have this test in the new code.  I'm not sure
>> why do we do that there; is it worth doing the same here?

I think that adding a test is a good idea for such things.  Perhaps we
could have an isolation test, but what Justin is proposing seems good
enough to me for this goal.

> That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
> itself).  The reason was to avoid blocking if an unprivileged user runs VACUUM
> FULL which would try to lock things (including shared catalogs) before checking
> if they have permission to vacuum them.  That commit also initially checks the
> owner of the partitioned table, and then re-checking owner of partitions later
> on.
>
> A similar issue exists here.  But 1) catalog tables are not partitioned, and,
> 2) ownership of a partitioned table is checked immediately.  So the problem can
> only occur if a user who owns a partitioned table but doesn't own all its
> partitions tries to cluster it, and it blocks behind another session.  Fixing
> this is probably a good idea, but seems improbable that it would avoid a DOS.

Catalogs are out of the picture as you say and I would not worry about
them becoming somewhat partitioned even in the far future.  Are you
saying that it is possible for a user kicking a CLUSTER command on a
partitioned table who has no ownership on some of the partitions to
do some blocking table_open() calls if the permission check is not
done in get_tables_to_cluster_partitioned()?  Hence, this user could
block the access to such partitions?  I am not sure that we need to
add any new ownership checks here as CLUOPT_RECHECK gets added to the
parameters in cluster() before calling cluster_multiple_rels(), then
we do a mix of try_relation_open() with a skip when we are not the
owner anymore.  So this logic looks sound to me.  In short, you don't
need this extra check, and the test proposed in 0002 keeps the same
behavior.

>> 2. We should silently skip a partition that's a foreign table, I
>> suppose.
>
> I think it's not needed, since the loop over index children doesn't see a child
> index on the foreign table?

Hmm.  That may be a sign to add an assertion, at least, or something
based on RELKIND_HAS_STORAGE().

I was wondering what 0001 was doing here as that's a separate issue,
but it looked fine so I have applied it.

+       /* Use a permanent memory context for the result list */
+       old_context = MemoryContextSwitchTo(cluster_context);
+
        rtc = (RelToCluster *) palloc(sizeof(RelToCluster));

Independently of the extra ownership check, the memory context
manipulation has to be fixed and the code shoudl switch to
RelToCluster only when saving an item.

+CREATE ROLE ptnowner;
Roles that are created in the regression tests need to be prefixed
with "regress_", or some buildfarm members will complain.  FWIW, I
enforce -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS in all my dev
builds.

I have added an open item for now, but the whole looks
straight-forward to me.
--
Michael

Attachment

Re: CLUSTER on partitioned index

From
Justin Pryzby
Date:
On Wed, Apr 13, 2022 at 03:50:15PM +0900, Michael Paquier wrote:
> 
> > That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
> > itself).  The reason was to avoid blocking if an unprivileged user runs VACUUM
> > FULL which would try to lock things (including shared catalogs) before checking
> > if they have permission to vacuum them.  That commit also initially checks the
> > owner of the partitioned table, and then re-checking owner of partitions later
> > on.
> > 
> > A similar issue exists here.  But 1) catalog tables are not partitioned, and,
> > 2) ownership of a partitioned table is checked immediately.  So the problem can
> > only occur if a user who owns a partitioned table but doesn't own all its
> > partitions tries to cluster it, and it blocks behind another session.  Fixing
> > this is probably a good idea, but seems improbable that it would avoid a DOS.
> 
> Catalogs are out of the picture as you say and I would not worry about
> them becoming somewhat partitioned even in the far future.  Are you
> saying that it is possible for a user kicking a CLUSTER command on a
> partitioned table who has no ownership on some of the partitions to
> do some blocking table_open() calls if the permission check is not
> done in get_tables_to_cluster_partitioned()?  Hence, this user could
> block the access to such partitions?  I am not sure that we need to
> add any new ownership checks here as CLUOPT_RECHECK gets added to the
> parameters in cluster() before calling cluster_multiple_rels(), then
> we do a mix of try_relation_open() with a skip when we are not the
> owner anymore.  So this logic looks sound to me.  In short, you don't
> need this extra check, and the test proposed in 0002 keeps the same
> behavior.

Are you sure?  The ownership re-check in cluster_rel() occurs after acquiring
locks.

s1:
postgres=# CREATE TABLE p(i int) PARTITION BY LIST (i);
postgres=# CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
postgres=# CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
postgres=# CREATE INDEX ON p (i);
postgres=# CREATE ROLE po WITH LOGIN;
postgres=# ALTER TABLE p OWNER TO po;
postgres=# begin; SELECT FROM p1;

s2:
postgres=> SET client_min_messages =debug;
postgres=> CLUSTER VERBOSE p USING p_i_idx ;
LOG:  process 26058 still waiting for AccessExclusiveLock on relation 39577 of database 5 after 1000.105 ms
postgres=> SELECT 39577::regclass;
regclass | p1



Re: CLUSTER on partitioned index

From
Michael Paquier
Date:
On Wed, Apr 13, 2022 at 05:52:14AM -0500, Justin Pryzby wrote:
> Are you sure?  The ownership re-check in cluster_rel() occurs after acquiring
> locks.

Yep, you are right.  However, the SQL test does not check for this
blocking scenario.  In short, removing the new ACL check in
get_tables_to_cluster_partitioned() makes the test behave the same
way.  Could you implement an isolation check to make sure that the
difference is visible?  The SQL check looks useful in itself, either
way.
--
Michael

Attachment

Re: CLUSTER on partitioned index

From
Alvaro Herrera
Date:
Thanks for the patch -- I have pushed it now, with some wording changes
and renaming the role to regress_* to avoid buildfarm's ire.

Michaël in addition proposes an isolation test.  I'm not sure; is it
worth the additional test run time?  It doesn't seem a critical issue.
But if anybody feels like contributing one, step right ahead.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: CLUSTER on partitioned index

From
Michael Paquier
Date:
On Thu, Apr 14, 2022 at 10:37:06PM +0200, Alvaro Herrera wrote:
> Thanks for the patch -- I have pushed it now, with some wording changes
> and renaming the role to regress_* to avoid buildfarm's ire.

Cool, thanks.

> Michaël in addition proposes an isolation test.  I'm not sure; is it
> worth the additional test run time?  It doesn't seem a critical issue.
> But if anybody feels like contributing one, step right ahead.

Well, I am a bit annoyed that we don't actually check that a CLUSTER
command does not block when doing a CLUSTER on a partitioned table
while a lock is held on one of its partitions.  So, attached is a
proposal of patch to improve the test coverage in this area.  While on
it, I have added a test with a normal table.  You can see the
difference once you remove the ACL check added recently in
get_tables_to_cluster_partitioned().  What do you think?
--
Michael

Attachment

Re: CLUSTER on partitioned index

From
Michael Paquier
Date:
On Sat, Apr 16, 2022 at 08:58:50PM +0900, Michael Paquier wrote:
> Well, I am a bit annoyed that we don't actually check that a CLUSTER
> command does not block when doing a CLUSTER on a partitioned table
> while a lock is held on one of its partitions.  So, attached is a
> proposal of patch to improve the test coverage in this area.  While on
> it, I have added a test with a normal table.  You can see the
> difference once you remove the ACL check added recently in
> get_tables_to_cluster_partitioned().  What do you think?

This was the last reason why this was listed as an open item, so,
hearing nothing, I have applied this patch to add those extra tests,
and switched the item as fixed.
--
Michael

Attachment

Re: CLUSTER on partitioned index

From
Alvaro Herrera
Date:
On 2022-Apr-26, Michael Paquier wrote:

> On Sat, Apr 16, 2022 at 08:58:50PM +0900, Michael Paquier wrote:
> > Well, I am a bit annoyed that we don't actually check that a CLUSTER
> > command does not block when doing a CLUSTER on a partitioned table
> > while a lock is held on one of its partitions.  So, attached is a
> > proposal of patch to improve the test coverage in this area.
> 
> This was the last reason why this was listed as an open item, so,
> hearing nothing, I have applied this patch to add those extra tests,
> and switched the item as fixed.

Thank you!

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/