Thread: pg_dump --if-exists --clean when drop index that is partition of a partitioned index

hi.

CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
CREATE TABLE tp_1(c int, a int, b int);
ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
CREATE INDEX t_a_idx ON tp_1(a);
CREATE INDEX tp_a_idx ON tp(a);

pg_dump  --schema=public --if-exists --clean --no-statistics
--no-owner --no-data --table-and-children=tp > 1.sql
pg_dump output file 1.sql excerpts:
----
DROP INDEX IF EXISTS public.t_a_idx;
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_1;
DROP TABLE IF EXISTS public.tp;
----
if you execute the
DROP INDEX IF EXISTS public.t_a_idx;

ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
HINT:  You can drop index tp_a_idx instead.

Is this pg_dump output what we expected?


SELECT * FROM pg_partition_tree('tp_a_idx');
  relid   | parentrelid | isleaf | level
----------+-------------+--------+-------
 tp_a_idx |             | f      |     0
 t_a_idx  | tp_a_idx    | t      |     1
(2 rows)



Hi

po 14. 4. 2025 v 7:54 odesílatel jian he <jian.universality@gmail.com> napsal:
hi.

CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
CREATE TABLE tp_1(c int, a int, b int);
ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
CREATE INDEX t_a_idx ON tp_1(a);
CREATE INDEX tp_a_idx ON tp(a);

pg_dump  --schema=public --if-exists --clean --no-statistics
--no-owner --no-data --table-and-children=tp > 1.sql
pg_dump output file 1.sql excerpts:
----
DROP INDEX IF EXISTS public.t_a_idx;
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_1;
DROP TABLE IF EXISTS public.tp;
----
if you execute the
DROP INDEX IF EXISTS public.t_a_idx;

ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
HINT:  You can drop index tp_a_idx instead.

Is this pg_dump output what we expected?


It is a bug, I think, the implementation of these parts of code is older than partitioning support, and doesn't do necessary detach.

regards

Pavel

 

SELECT * FROM pg_partition_tree('tp_a_idx');
  relid   | parentrelid | isleaf | level
----------+-------------+--------+-------
 tp_a_idx |             | f      |     0
 t_a_idx  | tp_a_idx    | t      |     1
(2 rows)


On Mon, Apr 14, 2025 at 2:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> po 14. 4. 2025 v 7:54 odesílatel jian he <jian.universality@gmail.com> napsal:
>>
>> hi.
>>
>> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
>> CREATE TABLE tp_1(c int, a int, b int);
>> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
>> CREATE INDEX t_a_idx ON tp_1(a);
>> CREATE INDEX tp_a_idx ON tp(a);
>>
>> pg_dump  --schema=public --if-exists --clean --no-statistics
>> --no-owner --no-data --table-and-children=tp > 1.sql
>> pg_dump output file 1.sql excerpts:
>> ----
>> DROP INDEX IF EXISTS public.t_a_idx;
>> DROP INDEX IF EXISTS public.tp_a_idx;
>> DROP TABLE IF EXISTS public.tp_1;
>> DROP TABLE IF EXISTS public.tp;
>> ----
>> if you execute the
>> DROP INDEX IF EXISTS public.t_a_idx;
>>
>> ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
>> HINT:  You can drop index tp_a_idx instead.
>>
>> Is this pg_dump output what we expected?
>>
>
> It is a bug, I think, the implementation of these parts of code is older than partitioning support, and doesn't do
necessarydetach. 
>


seems pretty easy to fix.
we only need dropStmt when IndxInfo->parentidx oid is invalid.

+        if (!OidIsValid(indxinfo->parentidx))
+            appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);

I have tested the above changes on PG11, master.

Attachment


út 15. 4. 2025 v 7:33 odesílatel jian he <jian.universality@gmail.com> napsal:
On Mon, Apr 14, 2025 at 2:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> po 14. 4. 2025 v 7:54 odesílatel jian he <jian.universality@gmail.com> napsal:
>>
>> hi.
>>
>> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
>> CREATE TABLE tp_1(c int, a int, b int);
>> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
>> CREATE INDEX t_a_idx ON tp_1(a);
>> CREATE INDEX tp_a_idx ON tp(a);
>>
>> pg_dump  --schema=public --if-exists --clean --no-statistics
>> --no-owner --no-data --table-and-children=tp > 1.sql
>> pg_dump output file 1.sql excerpts:
>> ----
>> DROP INDEX IF EXISTS public.t_a_idx;
>> DROP INDEX IF EXISTS public.tp_a_idx;
>> DROP TABLE IF EXISTS public.tp_1;
>> DROP TABLE IF EXISTS public.tp;
>> ----
>> if you execute the
>> DROP INDEX IF EXISTS public.t_a_idx;
>>
>> ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
>> HINT:  You can drop index tp_a_idx instead.
>>
>> Is this pg_dump output what we expected?
>>
>
> It is a bug, I think, the implementation of these parts of code is older than partitioning support, and doesn't do necessary detach.
>


seems pretty easy to fix.
we only need dropStmt when IndxInfo->parentidx oid is invalid.

+        if (!OidIsValid(indxinfo->parentidx))
+            appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);

I don't think it is the correct fix.

because then fails CREATE INDEX qqindxname

The motivation for usage --clean and --if-exists option is possibility to call restore idempotently. It should not fail when I do restore in an empty database, and it shouldn't fail if I do restore in an existing database.

Regards

Pavel
 
 

I have tested the above changes on PG11, master.
On Tue, Apr 15, 2025 at 1:45 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> seems pretty easy to fix.
>> we only need dropStmt when IndxInfo->parentidx oid is invalid.
>>
>> +        if (!OidIsValid(indxinfo->parentidx))
>> +            appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);
>
>
> I don't think it is the correct fix.
>
> because then fails CREATE INDEX qqindxname
>
> The motivation for usage --clean and --if-exists option is possibility to call restore idempotently. It should not
failwhen I do restore in an empty database, and it shouldn't fail if I do restore in an existing database. 
>


I am not sure what you mean fails
``CREATE INDEX qqindxname``
?

for example:

>> >> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
>> >> CREATE TABLE tp_1(c int, a int, b int);
>> >> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
>> >> CREATE INDEX t_a_idx ON tp_1(a);
>> >> CREATE INDEX tp_a_idx ON tp(a);

with the patch, pg_dump output will be
```
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_11;
DROP TABLE IF EXISTS public.tp;
....
CREATE INDEX tp_a_idx ON ONLY public.tp USING btree (a);
CREATE INDEX tp_11_a_idx ON public.tp_11 USING btree (a);
ALTER INDEX public.tp_a_idx ATTACH PARTITION public.tp_11_a_idx;
```

What's your expectation?





út 15. 4. 2025 v 8:52 odesílatel jian he <jian.universality@gmail.com> napsal:
On Tue, Apr 15, 2025 at 1:45 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> seems pretty easy to fix.
>> we only need dropStmt when IndxInfo->parentidx oid is invalid.
>>
>> +        if (!OidIsValid(indxinfo->parentidx))
>> +            appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);
>
>
> I don't think it is the correct fix.
>
> because then fails CREATE INDEX qqindxname
>
> The motivation for usage --clean and --if-exists option is possibility to call restore idempotently. It should not fail when I do restore in an empty database, and it shouldn't fail if I do restore in an existing database.
>


I am not sure what you mean fails
``CREATE INDEX qqindxname``
?

for example:

>> >> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
>> >> CREATE TABLE tp_1(c int, a int, b int);
>> >> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
>> >> CREATE INDEX t_a_idx ON tp_1(a);
>> >> CREATE INDEX tp_a_idx ON tp(a);

with the patch, pg_dump output will be
```
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_11;
DROP TABLE IF EXISTS public.tp;
....
CREATE INDEX tp_a_idx ON ONLY public.tp USING btree (a);
CREATE INDEX tp_11_a_idx ON public.tp_11 USING btree (a);
ALTER INDEX public.tp_a_idx ATTACH PARTITION public.tp_11_a_idx;
```

What's your expectation?

I am not sure at this moment. 

There is a question why dump exports DROP INDEX, although the index will be dropped implicitly when table is dropped.

So your fix can work, because all indexes will be dropped by DROP TABLE, but on second hand it is not consistent with the current behavior of pg_dump, where indexes are dropped explicitly.

I have not knowledge, why pg_dump exports DROP INDEX explicitly, although it is redundant.

Regards

Pavel
 
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I have not knowledge, why pg_dump exports DROP INDEX explicitly, although
> it is redundant.

I concur that the proposed patch is probably not the right fix.
It just looks wrong and arbitrary.

My first thought about the right way to fix it was that the
INDEX ATTACH TOC entry should have a dropStmt that is "ALTER ...
DETACH PARTITION", since the way --clean works is to issue the
dropStmts in reverse order.  However, there's no ALTER INDEX
DETACH PARTITION command.  Also, looking at dumpIndexAttach,
there's a comment saying very explicitly not to:

        /*
         * There is no point in creating a drop query as the drop is done by
         * index drop.  (If you think to change this, see also
         * _printTocEntry().)  Although this object doesn't really have
         * ownership as such, set the owner field anyway to ensure that the
         * command is run by the correct role at restore time.
         */

My guess is that that was correct when written and has been falsified
by later changes.  So I think the next thing to do is run down exactly
what broke it and then decide if that change was buggy or if the
comment is just obsolete.  If it's obsolete then we have to figure
out what to do next.

            regards, tom lane



After thinking about this a bit more: the problem is basically one
of things being done in the wrong order.  And the way to ensure
that things get done in the right order is to set up the object
dependencies correctly.  So what we need to do is to switch the
dependency direction between the child and parent index objects,
as attached.

The effect of this is to switch both the creation order and the
drop order of the child and parent indexes.  Normally, flipping
the creation order would be problematic.  But for partitioned
indexes it doesn't matter because there's not actually any
connection between the child and parent indexes until we issue
INDEX ATTACH.  The INDEX ATTACH object has dependencies on both,
so it doesn't come out till after both, so all that still works.

However ... this is still only a partial fix.  It fixes the
problem as-presented, where you used both --clean and --if-exists.
But if you just say --clean, then the DROP command for the child
index fails because it's already gone thanks to dropping the
parent index first.  Possibly the least messy way to deal with
that is to always put IF EXISTS into the DROP command for a
child index.  This isn't totally problem-free, because then we
would have to teach RestoreArchive to not insert IF EXISTS if
one is already there, and that probably makes the fix not
back-patchable for fear of compatibility problems with unpatched
copies of pg_restore.

Or we could do what Jian suggested and just not emit any dropStmt
for child indexes.  I continue to fear that that will have
undesirable side-effects, but I have to admit that I'm not sure
what.  The fact that the backend will automatically drop the
child index if we drop either its parent index or its table
seems to cover a lot of the edge cases ... but does it cover
all of them?

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c6e6d3b2b86..961b8f05eef 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -19630,13 +19630,25 @@ getDependencies(Archive *fout)
          * Ordinarily, table rowtypes have implicit dependencies on their
          * tables.  However, for a composite type the implicit dependency goes
          * the other way in pg_depend; which is the right thing for DROP but
-         * it doesn't produce the dependency ordering we need. So in that one
+         * it doesn't produce the dependency ordering we need. So in that
          * case, we reverse the direction of the dependency.
+         *
+         * Similarly, the server's idea of the dependency direction for
+         * partitioned indexes is backwards for our purposes, so reverse it.
+         * This swap has no real effect during the object creation sequence,
+         * because the leaf and partitioned indexes are not connected until we
+         * issue INDEX ATTACH.  What it does for us is to ensure that when
+         * --clean mode is used, the DROP INDEX commands come out in an order
+         * that the server will accept.
          */
         if (deptype == 'i' &&
             dobj->objType == DO_TABLE &&
             refdobj->objType == DO_TYPE)
             addObjectDependency(refdobj, dobj->dumpId);
+        else if (deptype == 'P' &&
+                 dobj->objType == DO_INDEX &&
+                 refdobj->objType == DO_INDEX)
+            addObjectDependency(refdobj, dobj->dumpId);
         else
             /* normal case */
             addObjectDependency(dobj, refdobj->dumpId);

On Wed, Apr 16, 2025 at 6:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Or we could do what Jian suggested and just not emit any dropStmt
> for child indexes.  I continue to fear that that will have
> undesirable side-effects, but I have to admit that I'm not sure
> what.  The fact that the backend will automatically drop the
> child index if we drop either its parent index or its table
> seems to cover a lot of the edge cases ... but does it cover
> all of them?
>

If pg_dump not produce "DROP INDEX IF EXISTS child_index;"
then we are actually tests drop parent index will cascade to child index.
If it fails, then execute pg_dump output, CREATE INDEX on child table will fail,
error message be like:
ERROR:  relation "t_a_idx" already exists


CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
CREATE TABLE tp_1(c int, a int, b int);
ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
CREATE INDEX t_a_idx ON tp_1(a);
CREATE INDEX tp_a_idx ON tp(a);

in this case, the partition index and partitioned index dependency
relationship is
DEPENDENCY_PARTITION_PRI.

per https://www.postgresql.org/docs/current/catalog-pg-depend.html
"
Example: a child partitioned index is made partition-dependent on both the
partition table it is on and the parent partitioned index, so that it goes away
if either of those is dropped, but not otherwise. The dependency on the parent
index is primary, so that if the user tries to drop the child partitioned index,
the error message will suggest dropping the parent index instead (not the
table).
"
from the doc, drop the partitioned index will cascade and also drop
the partition index.



On Tue, Apr 15, 2025 at 3:08 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> There is a question why dump exports DROP INDEX, although the index will be dropped implicitly when table is dropped.
>
> So your fix can work, because all indexes will be dropped by DROP TABLE, but on second hand it is not consistent with
thecurrent behavior of pg_dump, where indexes are dropped explicitly. 
>
> I have not knowledge, why pg_dump exports DROP INDEX explicitly, although it is redundant.
>
> Regards
>
> Pavel
>
if you specify ``--section=post-data``, then only the index command
will be generated.
so we actually do need separate dump exports DROP INDEX.