Thread: FK Constraint sort order with pg_dump

FK Constraint sort order with pg_dump

From
Christian Barthel
Date:
Hello,

The sorting order of FK constraints with the same name is based on the
OID (because it lands in the “Usually shouldn’t get here” OID comparison
block at [1]).  Wouldn’t it be better if the order of those constraints
were based on the table name?

Details:

The above schema is identical except in the order how the constraints
were added (the constraint name is the same on those two tables):

--8<---------------cut here---------------start------------->8---
  -- --- Schema Version 1:
  CREATE TABLE a (id int unique);

  CREATE TABLE b (id int);
  ALTER TABLE b ADD CONSTRAINT x_fkey FOREIGN KEY (id) REFERENCES a(id);
  CREATE TABLE c (id int);
  ALTER TABLE c ADD CONSTRAINT x_fkey FOREIGN KEY (id) REFERENCES a(id);

  -- --- Schema Version 2:

  CREATE TABLE a (id int unique);

  CREATE TABLE c (id int);
  ALTER TABLE c ADD CONSTRAINT x_fkey FOREIGN KEY (id) REFERENCES a(id);
  CREATE TABLE b (id int);
  ALTER TABLE b ADD CONSTRAINT x_fkey FOREIGN KEY (id) REFERENCES a(id);
--8<---------------cut here---------------end--------------->8---

Doing a pg_dump on Version 1 and Version 2 leads to two different dumps
despite being the same schema: (*)

--8<---------------cut here---------------start------------->8---
--- version1    2022-07-21 19:16:31.369010843 +0200
+++ version2    2022-07-21 19:16:26.688976178 +0200
@@ -86,18 +86,18 @@


 --
--- Name: b x_fkey; Type: FK CONSTRAINT; Schema: public; Owner: bch
+-- Name: c x_fkey; Type: FK CONSTRAINT; Schema: public; Owner: bch
 --

-ALTER TABLE ONLY public.b
+ALTER TABLE ONLY public.c
     ADD CONSTRAINT x_fkey FOREIGN KEY (id) REFERENCES public.a(id);


 --
--- Name: c x_fkey; Type: FK CONSTRAINT; Schema: public; Owner: bch
+-- Name: b x_fkey; Type: FK CONSTRAINT; Schema: public; Owner: bch
 --

-ALTER TABLE ONLY public.c
+ALTER TABLE ONLY public.b
     ADD CONSTRAINT x_fkey FOREIGN KEY (id) REFERENCES public.a(id);
--8<---------------cut here---------------end--------------->8---

Attached is a patch file that adds a string comparison function call to
sort FK constraints (based on the table if it exists).  Any thoughts on
that?

[1]

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_dump/pg_dump_sort.c;h=80641cd79a2e6ce0a10bd55218b10d22ac369ed5;hb=7c850320d8cfa5503ecec61c2559661b924f7595#l212

(*) Tested on 14.4
--
Christian Barthel


Attachment

Re: FK Constraint sort order with pg_dump

From
Adrian Klaver
Date:
On 7/21/22 10:25, Christian Barthel wrote:
> Hello,
> 
> The sorting order of FK constraints with the same name is based on the
> OID (because it lands in the “Usually shouldn’t get here” OID comparison
> block at [1]).  Wouldn’t it be better if the order of those constraints
> were based on the table name?
> 

Why does it matter?

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: FK Constraint sort order with pg_dump

From
"David G. Johnston"
Date:
On Thu, Jul 21, 2022 at 10:49 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 7/21/22 10:25, Christian Barthel wrote:
> Hello,
>
> The sorting order of FK constraints with the same name is based on the
> OID (because it lands in the “Usually shouldn’t get here” OID comparison
> block at [1]).  Wouldn’t it be better if the order of those constraints
> were based on the table name?
>

Why does it matter?


As the code comment says:

/* To have a stable sort order, break ties for some object types */

This seems like it is simply a missed case.

David J.

Re: FK Constraint sort order with pg_dump

From
Christian Barthel
Date:
On Thursday, July 21, 2022, Adrian Klaver wrote:

> On 7/21/22 10:25, Christian Barthel wrote:
>> Hello, The sorting order of FK constraints with the same name is
>> based on the OID (because it lands in the “Usually shouldn’t get
>> here” OID comparison block at [1]).  Wouldn’t it be better if the
>> order of those constraints were based on the table name?
>>
>
> Why does it matter?

As the comment in pg_dump.c states, logically identical schemas should
produce identical dumps:

| * We rely on dependency information to help us determine a safe order,
| so * the initial sort is mostly for cosmetic purposes: we sort by name
| to * ensure that logically identical schemas will dump identically.
  <https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_dump/pg_dump.c#l883>

This is done for most objects (tables, functions etc).  Why not for FK
constraints?

It makes comparing schemas on different postgres instances simpler
(i.e. when you’re working with testing, staging, live systems etc).

--
Christian Barthel



Re: FK Constraint sort order with pg_dump

From
Adrian Klaver
Date:
On 7/21/22 10:59, Christian Barthel wrote:
> On Thursday, July 21, 2022, Adrian Klaver wrote:

>> Why does it matter?
> 
> As the comment in pg_dump.c states, logically identical schemas should
> produce identical dumps:
> 
> | * We rely on dependency information to help us determine a safe order,
> | so * the initial sort is mostly for cosmetic purposes: we sort by name
> | to * ensure that logically identical schemas will dump identically.
>    <https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_dump/pg_dump.c#l883>
> 
> This is done for most objects (tables, functions etc).  Why not for FK
> constraints?
> 
> It makes comparing schemas on different postgres instances simpler
> (i.e. when you’re working with testing, staging, live systems etc).
> 

Alright that I can see.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: FK Constraint sort order with pg_dump

From
Tom Lane
Date:
Christian Barthel <bch@online.de> writes:
> On Thursday, July 21, 2022, Adrian Klaver wrote:
>> Why does it matter?

> As the comment in pg_dump.c states, logically identical schemas should
> produce identical dumps:

Agreed, but this is far from the only deficiency in DOTypeNameCompare.
If we're going to try to fill in the gaps, we should be systematic
about it.  I took a quick stab at implementing the cases it omits,
as attached.  There are still a few gaps:

* DO_OPCLASS and DO_OPFAMILY ought to have a subsidiary sort on the
access method name, since their names are only unique per-access-method.
The trouble here is that OpclassInfo/OpfamilyInfo don't provide any
way to get to the access method.  That could be fixed with more fields,
and maybe it's worth doing, but I didn't do that here.

* For casts, the cast name is consed up as the concatenation of the source
and target type names, which isn't enough to guarantee uniqueness.  We
could add the types' schema names, perhaps ... is it worth the trouble?
(Not to mention possible breakage of scripts that expect the current
naming convention.)

* Likewise for transforms, we'd need to add the type's schema name if
we want the transform name to be unique.

Not sure whether it's worth venturing into such nonlocal fixes.

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 5de3241eb4..2282c002ae 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -280,6 +280,34 @@ DOTypeNameCompare(const void *p1, const void *p2)
         if (cmpval != 0)
             return cmpval;
     }
+    else if (obj1->objType == DO_CONSTRAINT ||
+             obj1->objType == DO_FK_CONSTRAINT)
+    {
+        ConstraintInfo *cobj1 = *(ConstraintInfo *const *) p1;
+        ConstraintInfo *cobj2 = *(ConstraintInfo *const *) p2;
+        const char *objname1;
+        const char *objname2;
+
+        /* Sort by owning object name (namespace was considered already) */
+        if (cobj1->contable)
+            objname1 = cobj1->contable->dobj.name;
+        else if (cobj1->condomain)
+            objname1 = cobj1->condomain->dobj.name;
+        else
+            objname1 = NULL;
+        if (cobj2->contable)
+            objname2 = cobj2->contable->dobj.name;
+        else if (cobj2->condomain)
+            objname2 = cobj2->condomain->dobj.name;
+        else
+            objname2 = NULL;
+        if (objname1 && objname2)
+        {
+            cmpval = strcmp(objname1, objname2);
+            if (cmpval != 0)
+                return cmpval;
+        }
+    }
     else if (obj1->objType == DO_POLICY)
     {
         PolicyInfo *pobj1 = *(PolicyInfo *const *) p1;
@@ -291,6 +319,17 @@ DOTypeNameCompare(const void *p1, const void *p2)
         if (cmpval != 0)
             return cmpval;
     }
+    else if (obj1->objType == DO_RULE)
+    {
+        RuleInfo   *robj1 = *(RuleInfo *const *) p1;
+        RuleInfo   *robj2 = *(RuleInfo *const *) p2;
+
+        /* Sort by table name (table namespace was considered already) */
+        cmpval = strcmp(robj1->ruletable->dobj.name,
+                        robj2->ruletable->dobj.name);
+        if (cmpval != 0)
+            return cmpval;
+    }
     else if (obj1->objType == DO_TRIGGER)
     {
         TriggerInfo *tobj1 = *(TriggerInfo *const *) p1;
@@ -302,6 +341,39 @@ DOTypeNameCompare(const void *p1, const void *p2)
         if (cmpval != 0)
             return cmpval;
     }
+    else if (obj1->objType == DO_DEFAULT_ACL)
+    {
+        DefaultACLInfo *daclobj1 = *(DefaultACLInfo *const *) p1;
+        DefaultACLInfo *daclobj2 = *(DefaultACLInfo *const *) p2;
+
+        /* Sort by role name (objtype and namespace were considered already) */
+        cmpval = strcmp(daclobj1->defaclrole,
+                        daclobj2->defaclrole);
+        if (cmpval != 0)
+            return cmpval;
+    }
+    else if (obj1->objType == DO_PUBLICATION_REL)
+    {
+        PublicationRelInfo *probj1 = *(PublicationRelInfo *const *) p1;
+        PublicationRelInfo *probj2 = *(PublicationRelInfo *const *) p2;
+
+        /* Sort by publication name (table name/nsp was considered already) */
+        cmpval = strcmp(probj1->publication->dobj.name,
+                        probj2->publication->dobj.name);
+        if (cmpval != 0)
+            return cmpval;
+    }
+    else if (obj1->objType == DO_PUBLICATION_TABLE_IN_SCHEMA)
+    {
+        PublicationSchemaInfo *psobj1 = *(PublicationSchemaInfo *const *) p1;
+        PublicationSchemaInfo *psobj2 = *(PublicationSchemaInfo *const *) p2;
+
+        /* Sort by publication name (schema name was considered already) */
+        cmpval = strcmp(psobj1->publication->dobj.name,
+                        psobj2->publication->dobj.name);
+        if (cmpval != 0)
+            return cmpval;
+    }
 
     /* Usually shouldn't get here, but if we do, sort by OID */
     return oidcmp(obj1->catId.oid, obj2->catId.oid);