Thread: Add primary keys to system catalogs

Add primary keys to system catalogs

From
Peter Eisentraut
Date:
I saw someone ask once for a schema diagram of the system catalogs. 
Things like that have occasionally been produced manually, but they are 
not regularly updated.  That made me wonder, why can't we add primary 
and foreign keys to system catalogs and then have existing tools produce 
such a schema diagram automatically?

Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key 
for an existing index.  So this doesn't have to affect the low-level 
early bootstrapping.  The attached patch adds those commands manually. 
Another option might be to have the catalog generation Perl tooling 
create those declarations automatically from some marker in the catalog 
files.  That would also allow declaring unique constraints for the 
unique indexes that don't end up being the primary key.

I'm not dealing here with how we might do foreign keys between system 
catalogs, but I think the idea would be to have some way to declare a 
foreign key "on paper" without any actual triggers.

Any thoughts on this direction?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add primary keys to system catalogs

From
Craig Ringer
Date:


On Sat, 3 Oct 2020, 14:40 Peter Eisentraut, <peter.eisentraut@2ndquadrant.com> wrote:
I saw someone ask once for a schema diagram of the system catalogs.
Things like that have occasionally been produced manually, but they are
not regularly updated.  That made me wonder, why can't we add primary
and foreign keys to system catalogs and then have existing tools produce
such a schema diagram automatically?

Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key
for an existing index.  So this doesn't have to affect the low-level
early bootstrapping.  The attached patch adds those commands manually.
Another option might be to have the catalog generation Perl tooling
create those declarations automatically from some marker in the catalog
files.  That would also allow declaring unique constraints for the
unique indexes that don't end up being the primary key.


Any thoughts on this direction?

I like the general idea a lot. I've used Pg for a long time and I still struggle to march up relationships sometimes. Especially given that some things use relation oid oid keys and some use named cols as keys. 

So a big +1 from me for the idea. Especially if we ensure psql recognises when the relation 'oid' attribute has a declared PK and includes it in the column list. 

I don't object to simply declaring them without any implementing triggers. You aren't supposed to mess with the catalogs anyway. I'd actually like it to be defended against more actively by default. So the FKs are implicitly always valid, because the implementation says so. Much the same way trigger based FKs are unless you break the rules and mess with the triggers.

Frankly I think we really need a way to mark FKs to be DISABLED or NOT CHECKED or something and a way to mark them as NOT VALID. Rsther than expecting uses to fiddle with the implementation triggers. But I don't think FKs on system catalogs require that, it's just be cosmetic there.

Re: Add primary keys to system catalogs

From
Bruce Momjian
Date:
On Sat, Oct  3, 2020 at 09:27:02PM +0800, Craig Ringer wrote:
> 
> 
> On Sat, 3 Oct 2020, 14:40 Peter Eisentraut, <peter.eisentraut@2ndquadrant.com>
> wrote:
> 
>     I saw someone ask once for a schema diagram of the system catalogs.
>     Things like that have occasionally been produced manually, but they are
>     not regularly updated.  That made me wonder, why can't we add primary
>     and foreign keys to system catalogs and then have existing tools produce
>     such a schema diagram automatically?
> 
>     Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key
>     for an existing index.  So this doesn't have to affect the low-level
>     early bootstrapping.  The attached patch adds those commands manually.
>     Another option might be to have the catalog generation Perl tooling
>     create those declarations automatically from some marker in the catalog
>     files.  That would also allow declaring unique constraints for the
>     unique indexes that don't end up being the primary key.
> 
> 
>     Any thoughts on this direction?
> 
> 
> I like the general idea a lot. I've used Pg for a long time and I still
> struggle to march up relationships sometimes. Especially given that some things
> use relation oid oid keys and some use named cols as keys. 
> 
> So a big +1 from me for the idea. Especially if we ensure psql recognises when
> the relation 'oid' attribute has a declared PK and includes it in the column
> list. 

+1

> I don't object to simply declaring them without any implementing triggers. You
> aren't supposed to mess with the catalogs anyway. I'd actually like it to be
> defended against more actively by default. So the FKs are implicitly always
> valid, because the implementation says so. Much the same way trigger based FKs
> are unless you break the rules and mess with the triggers.
> 
> Frankly I think we really need a way to mark FKs to be DISABLED or NOT CHECKED
> or something and a way to mark them as NOT VALID. Rsther than expecting uses to
> fiddle with the implementation triggers. But I don't think FKs on system
> catalogs require that, it's just be cosmetic there.

+1

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Add primary keys to system catalogs

From
John Naylor
Date:

On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> So a big +1 from me for the idea. Especially if we ensure psql recognises when the relation 'oid' attribute has a declared PK and includes it in the column list.

Oid has been in the normal column list since v12 when it stopped being a system column. Or did you mean something else?

Re: Add primary keys to system catalogs

From
Craig Ringer
Date:


On Sun, 4 Oct 2020, 01:32 John Naylor, <john.naylor@enterprisedb.com> wrote:

On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> So a big +1 from me for the idea. Especially if we ensure psql recognises when the relation 'oid' attribute has a declared PK and includes it in the column list.

Oid has been in the normal column list since v12 when it stopped being a system column. Or did you mean something else?

It means I've spent way too long working on extensions running mainly on pg11 lately and way too little reading -hackers. Ignore they bit.  Thanks..

Re: Add primary keys to system catalogs

From
Andres Freund
Date:
Hi,

On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote:
> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for
> an existing index.  So this doesn't have to affect the low-level early
> bootstrapping.  The attached patch adds those commands manually. Another
> option might be to have the catalog generation Perl tooling create those
> declarations automatically from some marker in the catalog files.  That
> would also allow declaring unique constraints for the unique indexes that
> don't end up being the primary key.

Hm. What prevents us from declaring the pkey during bootstrap? I don't
at all like adding yet another place that needs manual editing when
doing DDL changes.

Greetings,

Andres Freund



Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote:
>> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for
>> an existing index.  So this doesn't have to affect the low-level early
>> bootstrapping.  The attached patch adds those commands manually. Another
>> option might be to have the catalog generation Perl tooling create those
>> declarations automatically from some marker in the catalog files.  That
>> would also allow declaring unique constraints for the unique indexes that
>> don't end up being the primary key.

> Hm. What prevents us from declaring the pkey during bootstrap? I don't
> at all like adding yet another place that needs manual editing when
> doing DDL changes.

We don't add new catalogs often, so the cost-benefit ratio of automating
this looks pretty poor.  It's not like you'll be able to forget to do it,
given the proposed regression test check for catalogs without pkeys.

One thing I'm wondering about though is whether Peter's implementation
ends up with desirable pg_depend contents for the pkey constraints.
Probably we want them to just be pinned and not have any explicit
dependencies on the associated indexes, but I haven't thought hard
about it.

            regards, tom lane



Re: Add primary keys to system catalogs

From
Andres Freund
Date:
Hi,

On 2020-10-06 15:31:16 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote:
> >> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for
> >> an existing index.  So this doesn't have to affect the low-level early
> >> bootstrapping.  The attached patch adds those commands manually. Another
> >> option might be to have the catalog generation Perl tooling create those
> >> declarations automatically from some marker in the catalog files.  That
> >> would also allow declaring unique constraints for the unique indexes that
> >> don't end up being the primary key.
> 
> > Hm. What prevents us from declaring the pkey during bootstrap? I don't
> > at all like adding yet another place that needs manual editing when
> > doing DDL changes.
> 
> We don't add new catalogs often, so the cost-benefit ratio of automating
> this looks pretty poor.

True, we don't create new ones that often. Still think that distributing
such setup over fewer places is good. And it's not like there's only a
handful of pkeys to start with. To me it makes more sense to add a
DECLARE_PRIMARY_KEY in indexing.h, replacing the relevant
DECLARE_UNIQUE_INDEX stanzas. Or, possibly preferrable, do it as part of
the CATALOG() directly - which'd avoid needing the index statement in
the first place.

The Catalog.pm part is trivial. It's abit harder to implement actually
creating the constraint - but even the hard route of adding a new field
to Boot_DeclareIndexStmt or such wouldn't be particularly hard.


> One thing I'm wondering about though is whether Peter's implementation
> ends up with desirable pg_depend contents for the pkey constraints.
> Probably we want them to just be pinned and not have any explicit
> dependencies on the associated indexes, but I haven't thought hard
> about it.

That sounds right to me.


Wonder whether it's not time to move the DECLARE bits from indexing.h to
the individual catalogs, independent of what we're discussing here. With
todays Catalog.pm there's really not much point in keeping them
separate, imo.

Greetings,

Andres Freund



Re: Add primary keys to system catalogs

From
John Naylor
Date:

On Tue, Oct 6, 2020 at 4:16 PM Andres Freund <andres@anarazel.de> wrote:
True, we don't create new ones that often. Still think that distributing
such setup over fewer places is good. And it's not like there's only a
handful of pkeys to start with. To me it makes more sense to add a
DECLARE_PRIMARY_KEY in indexing.h, replacing the relevant
DECLARE_UNIQUE_INDEX stanzas.

That seems logical.
 
Or, possibly preferrable, do it as part of
the CATALOG() directly - which'd avoid needing the index statement in
the first place.

That line is already messy in places. AFAICT, you'd still need info from the index statement (at least my shortened version below), leading to two syntaxes and two places for the same thing. Maybe I'm missing something?
 
Wonder whether it's not time to move the DECLARE bits from indexing.h to
the individual catalogs, independent of what we're discussing here. With
todays Catalog.pm there's really not much point in keeping them
separate, imo.

That would look nicer, at least. A declaration could go from

DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subscription_rel using btree(srrelid oid_ops, srsubid oid_ops));
#define SubscriptionRelSrrelidSrsubidIndexId 6117

to something like

DECLARE_UNIQUE_INDEX(btree(srrelid oid_ops, srsubid oid_ops), 6117, SubscriptionRelSrrelidSrsubidIndexId));

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company 

Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2020-10-03 08:40, Peter Eisentraut wrote:
> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key
> for an existing index.  So this doesn't have to affect the low-level
> early bootstrapping.  The attached patch adds those commands manually.
> Another option might be to have the catalog generation Perl tooling
> create those declarations automatically from some marker in the catalog
> files.  That would also allow declaring unique constraints for the
> unique indexes that don't end up being the primary key.

I have reworked my patch like this.  Now, the primary key is identified 
by using DECLARE_UNIQUE_INDEX_PKEY() instead of DECLARE_UNIQUE_INDEX() 
in the catalog pg_*.h files.  This automatically generates the required 
ALTER TABLE commands in a new file system_constraints.sql.  initdb is 
ordered so that that file is run before dependency processing, so the 
system constraints also end up pinned (as was requested during previous 
discussion).

Note, this needs the get_constraint_index() patch I just posted 
separately.  The old implementation of get_constraint_index() does not 
handle pinned constraints.

There is something strange going on in the misc_sanity test, as seen by 
the test output change

-NOTICE:  pg_constraint contains unpinned initdb-created object(s)

This previously detected some constraints from the information schema, 
so it was correct to point those out.  But since it looks for the 
min(oid) of a catalog, this will now find one of the new pinned 
constraints, so it won't detect this case anymore.  I think that test is 
not written correctly.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/

Attachment

Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> [ v2-0001-Add-primary-keys-and-unique-constraints-to-system.patch ]

I've reviewed this patch.  It looks pretty solid to me, with a couple
trivial nits as mentioned below, and one bigger thing that's perhaps
in the category of bikeshedding.  Namely, do we really want to prefer
using the OID indexes as the primary keys?  In most cases there's some
other index that seems to me to be what a user would think of as the
pkey, for example pg_class_relname_nsp_index for pg_class or
pg_proc_proname_args_nsp_index for pg_proc.  Preferring OID where it
exists is a nice simple rule, which has some attractiveness, but the
OID indexes seem to me like a lookup aid rather than the "real" object
identity.

> There is something strange going on in the misc_sanity test, as seen by 
> the test output change
> -NOTICE:  pg_constraint contains unpinned initdb-created object(s)

Formerly the lowest OID in pg_constraint was that of the
information_schema.cardinal_number_domain_check CHECK constraint,
which is intentionally not pinned.  Now the lowest OID is that of
pg_proc_oid_index, which is pinned, hence the output change.  I think
that's fine.  The idea here is just to notice whether any catalogs have
been missed by setup_depend, and we have now proven that pg_constraint
wasn't missed, so it's cool.

The other catalog that is listed in the expected output, but is not
one of the ones intentionally excluded by setup_depend, is pg_rewrite.
That's because its lowest OID is a rewrite rule for a system view, which
we've intentionally made non-pinned.  So that's also fine, but maybe
it'd be worth explaining it in the comment above the DO block.

Anyway, on to the minor nits:

system_constraints.sql should be removed by the maintainer-clean target
in src/backend/catalog/Makefile; perhaps also mention it in the comment
for the clean target.  Also I suppose src/tools/msvc/clean.bat needs to
remove it, like it does postgres.bki.

The contents of system_constraints.sql seem pretty randomly ordered,
and I bet the order isn't stable across machines.  It would be wise
to sort the entries by name to ensure we don't get inconsistencies
between different builds.  (If nothing else, that could complicate
validating tarballs.)

I don't follow why the pg_seclabel, pg_shseclabel indexes aren't made
into pkeys?  There's a comment claiming they "use a nondefault operator
class", but that's untrue AFAICS.

            regards, tom lane



Re: Add primary keys to system catalogs

From
Robert Haas
Date:
On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
> Frankly I think we really need a way to mark FKs to be DISABLED or NOT CHECKED or something and a way to mark them as
NOTVALID. Rsther than expecting uses to fiddle with the implementation triggers. But I don't think FKs on system
catalogsrequire that, it's just be cosmetic there. 

Not really. I think the idea that users don't or shouldn't ever do
manual DDL on system catalogs is not very plausible, considering that
we suggest such steps in our own release notes.

I don't have any complaint about labelling some of the unique indexes
as primary keys, but I think installing foreign keys that don't really
enforce anything may lead to confusion.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I don't have any complaint about labelling some of the unique indexes
> as primary keys, but I think installing foreign keys that don't really
> enforce anything may lead to confusion.

I'm not sure if I buy the "confusion" argument, but after thinking
about this more I realized that there's a stronger roadblock for
treating catalog interrelationships as SQL foreign keys.  Namely,
that we always represent no-reference situations with a zero OID,
whereas it'd have to be NULL to look like a valid foreign-key case.
Changing to real NULLs in those columns would of course break a ton
of C code, so I don't think that's gonna happen.

We could overlay an FK onto OID columns that should never have zero
entries, but that would miss enough of the interesting cases that
I don't find it a very attractive proposal.

So I think we should finish up the project of making real SQL
constraints for the catalog indexes, but the idea of making FK
constraints too looks like it'll end in failure.

The reason I got interested in this right now is the nearby
discussion [1] about why findoidjoins misses some catalog
relationships and whether we should fix that and/or make its results
more readily accessible.  I'd thought perhaps FK constraint entries
could supersede the need for that tool altogether, but now it seems
like not.  I still like the idea of marking OID relationships in the
catalog headers though.  Perhaps we should take Joel's suggestion
of a new system catalog more seriously, and have genbki.pl populate
such a catalog from info in the catalog header files.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/d1f413ff-0e50-413d-910c-bdd9ee9752c1%40www.fastmail.com



Re: Add primary keys to system catalogs

From
Tom Lane
Date:
I wrote:
> ... I still like the idea of marking OID relationships in the
> catalog headers though.  Perhaps we should take Joel's suggestion
> of a new system catalog more seriously, and have genbki.pl populate
> such a catalog from info in the catalog header files.

On second thought, a catalog is overkill; it'd only be useful if the data
could change after initdb, which this data surely cannot.  The right way
to expose such info to SQL is with a set-returning function reading a
constant table in the C code, a la pg_get_keywords().  That way takes a
whole lot less infrastructure and is just as useful to SQL code.

[ wanders away wondering about a debug mode in which we actually validate
OIDs inserted into such columns... ]

            regards, tom lane



Re: Add primary keys to system catalogs

From
"Joel Jacobson"
Date:
On Mon, Jan 18, 2021, at 19:33, Tom Lane wrote:
> On second thought, a catalog is overkill; it'd only be useful if the data
> could change after initdb, which this data surely cannot.  The right way
> to expose such info to SQL is with a set-returning function reading a
> constant table in the C code, a la pg_get_keywords().  That way takes a
> whole lot less infrastructure and is just as useful to SQL code.

+1

Re: Add primary keys to system catalogs

From
Laurenz Albe
Date:
On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > [ v2-0001-Add-primary-keys-and-unique-constraints-to-system.patch ]
> 
> [...] do we really want to prefer
> using the OID indexes as the primary keys?  In most cases there's some
> other index that seems to me to be what a user would think of as the
> pkey, for example pg_class_relname_nsp_index for pg_class or
> pg_proc_proname_args_nsp_index for pg_proc.  Preferring OID where it
> exists is a nice simple rule, which has some attractiveness, but the
> OID indexes seem to me like a lookup aid rather than the "real" object
> identity.

I disagree.  The OID is the real, immutable identity of an object.
The "relname" of a "pg_class" encatalogtry can change any time.
Since there are no foreign keys that reference catalogs, that won't cause
problems, but I still think that primary keys should change as little as
possible.

Yours,
Laurenz Albe




Re: Add primary keys to system catalogs

From
"Joel Jacobson"
Date:
On Mon, Jan 18, 2021, at 18:23, Tom Lane wrote:
> I realized that there's a stronger roadblock for
> treating catalog interrelationships as SQL foreign keys.  Namely,
> that we always represent no-reference situations with a zero OID,
> whereas it'd have to be NULL to look like a valid foreign-key case.

Another roadblock is perhaps the lack of foreign keys on arrays,
which would be needed by the following references:

SELECT * FROM oidjoins WHERE column_type ~ '(vector|\[\])$' ORDER BY 1,2,3;
      table_name      |  column_name   | column_type | ref_table_name | ref_column_name
----------------------+----------------+-------------+----------------+-----------------
pg_constraint        | conexclop      | oid[]       | pg_operator    | oid
pg_constraint        | conffeqop      | oid[]       | pg_operator    | oid
pg_constraint        | confkey        | int2[]      | pg_attribute   | attnum
pg_constraint        | conkey         | int2[]      | pg_attribute   | attnum
pg_constraint        | conpfeqop      | oid[]       | pg_operator    | oid
pg_constraint        | conppeqop      | oid[]       | pg_operator    | oid
pg_extension         | extconfig      | oid[]       | pg_class       | oid
pg_index             | indclass       | oidvector   | pg_opclass     | oid
pg_index             | indcollation   | oidvector   | pg_collation   | oid
pg_index             | indkey         | int2vector  | pg_attribute   | attnum
pg_partitioned_table | partattrs      | int2vector  | pg_attribute   | attnum
pg_partitioned_table | partclass      | oidvector   | pg_opclass     | oid
pg_partitioned_table | partcollation  | oidvector   | pg_collation   | oid
pg_policy            | polroles       | oid[]       | pg_authid      | oid
pg_proc              | proallargtypes | oid[]       | pg_type        | oid
pg_proc              | proargtypes    | oidvector   | pg_type        | oid
pg_statistic_ext     | stxkeys        | int2vector  | pg_attribute   | attnum
pg_trigger           | tgattr         | int2vector  | pg_attribute   | attnum
(18 rows)

I see there is an old thread with work in this area,  "Re: GSoC 2017: Foreign Key Arrays":


The last message in the thread is from 2018-10-02:
>On Tue, 2 Oct, 2018 at 05:13:26AM +0200, Michael Paquier wrote:
>>On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:
>> I am still having problems rebasing this patch. I can not figure it out on
>> my own.
>Okay, it's been a couple of months since this last email, and nothing
>has happened, so I am marking it as returned with feedback.
>--
>Michael

Personally, I would absolutely *love* FKs on array columns.
I always feel shameful when adding array columns to tables,
when the elements are PKs in some other table.
It's convenient and often the best design,
but it feels dirty knowing there are no FKs to help detect application bugs.

Let's hope the current FKs-on-catalog-discussion can ignite the old Foreign Key Arrays thread.

/Joel

Re: Add primary keys to system catalogs

From
Vik Fearing
Date:
On 1/19/21 11:46 AM, Laurenz Albe wrote:
> On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote:
>> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>>> [ v2-0001-Add-primary-keys-and-unique-constraints-to-system.patch ]
>>
>> [...] do we really want to prefer
>> using the OID indexes as the primary keys?  In most cases there's some
>> other index that seems to me to be what a user would think of as the
>> pkey, for example pg_class_relname_nsp_index for pg_class or
>> pg_proc_proname_args_nsp_index for pg_proc.  Preferring OID where it
>> exists is a nice simple rule, which has some attractiveness, but the
>> OID indexes seem to me like a lookup aid rather than the "real" object
>> identity.
> 
> I disagree.  The OID is the real, immutable identity of an object.
> The "relname" of a "pg_class" encatalogtry can change any time.
> Since there are no foreign keys that reference catalogs, that won't cause
> problems, but I still think that primary keys should change as little as
> possible.

This might be good advice for systems where the primary key defines the
physical layout of the table, but for postgres's heap there is no reason
not to prefer the natural key of the table.

My vote is with Tom on this one.
-- 
Vik Fearing



Re: Add primary keys to system catalogs

From
Mark Rofail
Date:
Dear Joel,

I would love to start working on this again, I tried to revive the patch again in 2019, however, I faced the same problems as last time (detailed in the thread) and I didn't get the support I needed to continue with this patch.

Are you willing to help rebase and finally publish this patch?

Best Regards,
Mark Rofail

On Tue, 19 Jan 2021 at 2:22 PM Joel Jacobson <joel@compiler.org> wrote:
On Mon, Jan 18, 2021, at 18:23, Tom Lane wrote:
> I realized that there's a stronger roadblock for
> treating catalog interrelationships as SQL foreign keys.  Namely,
> that we always represent no-reference situations with a zero OID,
> whereas it'd have to be NULL to look like a valid foreign-key case.

Another roadblock is perhaps the lack of foreign keys on arrays,
which would be needed by the following references:

SELECT * FROM oidjoins WHERE column_type ~ '(vector|\[\])$' ORDER BY 1,2,3;
      table_name      |  column_name   | column_type | ref_table_name | ref_column_name
----------------------+----------------+-------------+----------------+-----------------
pg_constraint        | conexclop      | oid[]       | pg_operator    | oid
pg_constraint        | conffeqop      | oid[]       | pg_operator    | oid
pg_constraint        | confkey        | int2[]      | pg_attribute   | attnum
pg_constraint        | conkey         | int2[]      | pg_attribute   | attnum
pg_constraint        | conpfeqop      | oid[]       | pg_operator    | oid
pg_constraint        | conppeqop      | oid[]       | pg_operator    | oid
pg_extension         | extconfig      | oid[]       | pg_class       | oid
pg_index             | indclass       | oidvector   | pg_opclass     | oid
pg_index             | indcollation   | oidvector   | pg_collation   | oid
pg_index             | indkey         | int2vector  | pg_attribute   | attnum
pg_partitioned_table | partattrs      | int2vector  | pg_attribute   | attnum
pg_partitioned_table | partclass      | oidvector   | pg_opclass     | oid
pg_partitioned_table | partcollation  | oidvector   | pg_collation   | oid
pg_policy            | polroles       | oid[]       | pg_authid      | oid
pg_proc              | proallargtypes | oid[]       | pg_type        | oid
pg_proc              | proargtypes    | oidvector   | pg_type        | oid
pg_statistic_ext     | stxkeys        | int2vector  | pg_attribute   | attnum
pg_trigger           | tgattr         | int2vector  | pg_attribute   | attnum
(18 rows)

I see there is an old thread with work in this area,  "Re: GSoC 2017: Foreign Key Arrays":


The last message in the thread is from 2018-10-02:
>On Tue, 2 Oct, 2018 at 05:13:26AM +0200, Michael Paquier wrote:
>>On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:
>> I am still having problems rebasing this patch. I can not figure it out on
>> my own.
>Okay, it's been a couple of months since this last email, and nothing
>has happened, so I am marking it as returned with feedback.
>--
>Michael

Personally, I would absolutely *love* FKs on array columns.
I always feel shameful when adding array columns to tables,
when the elements are PKs in some other table.
It's convenient and often the best design,
but it feels dirty knowing there are no FKs to help detect application bugs.

Let's hope the current FKs-on-catalog-discussion can ignite the old Foreign Key Arrays thread.


/Joel

Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote:
>> [...] do we really want to prefer
>> using the OID indexes as the primary keys?  In most cases there's some
>> other index that seems to me to be what a user would think of as the
>> pkey, for example pg_class_relname_nsp_index for pg_class or
>> pg_proc_proname_args_nsp_index for pg_proc.  Preferring OID where it
>> exists is a nice simple rule, which has some attractiveness, but the
>> OID indexes seem to me like a lookup aid rather than the "real" object
>> identity.

> I disagree.  The OID is the real, immutable identity of an object.
> The "relname" of a "pg_class" encatalogtry can change any time.
> Since there are no foreign keys that reference catalogs, that won't cause
> problems, but I still think that primary keys should change as little as
> possible.

Yeah, there's also the point that the OID is what we use for "foreign
key" references from other catalogs.  I don't deny that there are
reasons to think of OID as the pkey.  But as long as these constraints
are basically cosmetic, it seemed to me that we should consider the
other approach.  I'm not dead set on that, just wanted to discuss it.

            regards, tom lane



Re: Add primary keys to system catalogs

From
"Joel Jacobson"
Date:
On Tue, Jan 19, 2021, at 18:25, Mark Rofail wrote:
>Dear Joel,
>
>I would love to start working on this again, I tried to revive the patch again in 2019, however, I faced the same problems as >last time (detailed in the thread) and I didn't get the support I needed to continue with this patch.
>
>Are you willing to help rebase and finally publish this patch?

Willing, sure, not perhaps not able,
if there are complex problems that need a deep understanding of code base,
I'm not sure I will be of much help, but I can for sure do testing and reviewing.

/Joel

Re: Add primary keys to system catalogs

From
Mark Rofail
Date:
I'll post tomorrow the latest rebased patch with a summary of the issues. Let's continue this in the foreign key array thread, as to not clutter this thread.

Regards, Mark Rofail.

On Tue, Jan 19, 2021, 11:00 PM Joel Jacobson <joel@compiler.org> wrote:
On Tue, Jan 19, 2021, at 18:25, Mark Rofail wrote:
>Dear Joel,
>
>I would love to start working on this again, I tried to revive the patch again in 2019, however, I faced the same problems as >last time (detailed in the thread) and I didn't get the support I needed to continue with this patch.
>
>Are you willing to help rebase and finally publish this patch?

Willing, sure, not perhaps not able,
if there are complex problems that need a deep understanding of code base,
I'm not sure I will be of much help, but I can for sure do testing and reviewing.

/Joel

Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-01-18 00:35, Robert Haas wrote:
> I don't have any complaint about labelling some of the unique indexes
> as primary keys, but I think installing foreign keys that don't really
> enforce anything may lead to confusion.

FWIW, "not enforced" constraints (such as foreign keys) is a feature 
that is in the SQL standard and in other SQL implementations.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/



Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-01-18 18:23, Tom Lane wrote:
> The reason I got interested in this right now is the nearby
> discussion [1] about why findoidjoins misses some catalog
> relationships and whether we should fix that and/or make its results
> more readily accessible.  I'd thought perhaps FK constraint entries
> could supersede the need for that tool altogether, but now it seems
> like not.

Yes, catalog consistency checking was also something I had in mind. 
There are clearly a number of complications still to resolve.  But in 
general the idea of representing relationships between tables in the 
DDL/schema sounds like a sensible idea.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/



Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-01-17 23:07, Tom Lane wrote:
> I've reviewed this patch.  It looks pretty solid to me, with a couple
> trivial nits as mentioned below, and one bigger thing that's perhaps
> in the category of bikeshedding.  Namely, do we really want to prefer
> using the OID indexes as the primary keys?  In most cases there's some
> other index that seems to me to be what a user would think of as the
> pkey, for example pg_class_relname_nsp_index for pg_class or
> pg_proc_proname_args_nsp_index for pg_proc.  Preferring OID where it
> exists is a nice simple rule, which has some attractiveness, but the
> OID indexes seem to me like a lookup aid rather than the "real" object
> identity.

I chose this because the notional foreign keys point to the OID.

If you design some basic business database with customer IDs, product 
IDs, etc., you'd also usually make the ID the primary key, even if you 
have, say, a unique constraint on the product name.  But this is of 
course a matter of taste to some degree.

> system_constraints.sql should be removed by the maintainer-clean target
> in src/backend/catalog/Makefile; perhaps also mention it in the comment
> for the clean target.  Also I suppose src/tools/msvc/clean.bat needs to
> remove it, like it does postgres.bki.

done

> The contents of system_constraints.sql seem pretty randomly ordered,
> and I bet the order isn't stable across machines.  It would be wise
> to sort the entries by name to ensure we don't get inconsistencies
> between different builds.  (If nothing else, that could complicate
> validating tarballs.)

They follow the order in which the catalogs are processed byt genbki.pl. 
  This is the same order in which the catalog data and indexes are 
created in postgres.bki.  The Perl code to handle each of these is 
conceptually the same, so that seems solid.  We could order them 
differently, but there is also value in keeping the catalog processing 
order globally consistent.

> I don't follow why the pg_seclabel, pg_shseclabel indexes aren't made
> into pkeys?  There's a comment claiming they "use a nondefault operator
> class", but that's untrue AFAICS.

When I started the patch, it was text_pattern_ops, but that was a long 
time ago. ;-)  Fixed now.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/

Attachment

Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2021-01-17 23:07, Tom Lane wrote:
>> I've reviewed this patch.  It looks pretty solid to me, with a couple
>> trivial nits as mentioned below, and one bigger thing that's perhaps
>> in the category of bikeshedding.  Namely, do we really want to prefer
>> using the OID indexes as the primary keys?

> I chose this because the notional foreign keys point to the OID.
> If you design some basic business database with customer IDs, product 
> IDs, etc., you'd also usually make the ID the primary key, even if you 
> have, say, a unique constraint on the product name.  But this is of 
> course a matter of taste to some degree.

Fair enough.  As I said upthread, I just wanted to be sure we'd considered
the alternative.  I'm content to use the OIDs as pkeys, although I think
that decision should be explicitly recorded somewhere (cf attachment).

>> The contents of system_constraints.sql seem pretty randomly ordered,
>> and I bet the order isn't stable across machines.

> They follow the order in which the catalogs are processed byt genbki.pl. 

Looking closer, I see the data structure is an array not a hash, so
I withdraw the concern about instability.

After reading the patch again, I have a couple more nits about comments,
which I'll just present as a proposed delta patch.  Otherwise it's good.
I'll mark it RFC.

            regards, tom lane

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 6747996ce3..b68c1752c0 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -663,6 +663,8 @@ die
   "genbki OID counter reached $GenbkiNextOid, overrunning FirstBootstrapObjectId\n"
   if $GenbkiNextOid > $FirstBootstrapObjectId;

+# Now generate system_constraints.sql
+
 foreach my $c (@system_constraints)
 {
     print $constraints $c, "\n";
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index 296765d987..5d05fafb5d 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -55,12 +55,15 @@
 #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable

 /*
- * These lines processed by genbki.pl to create the statements
+ * These lines are processed by genbki.pl to create the statements
  * the bootstrap parser will turn into DefineIndex calls.
  *
- * The keyword is DECLARE_INDEX or DECLARE_UNIQUE_INDEX or DECLARE_UNIQUE_INDEX_PKEY.  The first two
- * arguments are the index name and OID, the rest is much like a standard
- * 'create index' SQL command.
+ * The keyword is DECLARE_INDEX or DECLARE_UNIQUE_INDEX or
+ * DECLARE_UNIQUE_INDEX_PKEY.  ("PKEY" marks the index as being the catalog's
+ * primary key; currently this is only cosmetically different from a regular
+ * unique index.  By convention, we usually make a catalog's OID column its
+ * pkey, if it has one.)  The first two arguments are the index's name and
+ * OID, the rest is much like a standard 'create index' SQL command.
  *
  * For each index, we also provide a #define for its OID.  References to
  * the index in the C code should always use these #defines, not the actual
diff --git a/src/test/regress/expected/misc_sanity.out b/src/test/regress/expected/misc_sanity.out
index ae85817c61..9ebe28a78d 100644
--- a/src/test/regress/expected/misc_sanity.out
+++ b/src/test/regress/expected/misc_sanity.out
@@ -43,6 +43,9 @@ WHERE refclassid = 0 OR refobjid = 0 OR
 -- whatever OID the test is complaining about must have been added later
 -- in initdb, where it intentionally isn't pinned.  Legitimate exceptions
 -- to that rule are listed in the comments in setup_depend().
+-- Currently, pg_rewrite is also listed by this check, even though it is
+-- covered by setup_depend().  That happens because there are no rules in
+-- the pinned data, but initdb creates some intentionally-not-pinned views.
 do $$
 declare relnm text;
   reloid oid;
diff --git a/src/test/regress/sql/misc_sanity.sql b/src/test/regress/sql/misc_sanity.sql
index ea80cf1f10..9699f5cc3b 100644
--- a/src/test/regress/sql/misc_sanity.sql
+++ b/src/test/regress/sql/misc_sanity.sql
@@ -44,6 +44,9 @@ WHERE refclassid = 0 OR refobjid = 0 OR
 -- whatever OID the test is complaining about must have been added later
 -- in initdb, where it intentionally isn't pinned.  Legitimate exceptions
 -- to that rule are listed in the comments in setup_depend().
+-- Currently, pg_rewrite is also listed by this check, even though it is
+-- covered by setup_depend().  That happens because there are no rules in
+-- the pinned data, but initdb creates some intentionally-not-pinned views.

 do $$
 declare relnm text;

Re: Add primary keys to system catalogs

From
"Joel Jacobson"
Date:
Many thanks for excellent work!

I've tested the patch successfully.

I ran this query (on a patched database) to see if there are still any catalog tables without primary keys:

SELECT
  table_name
FROM information_schema.tables
WHERE table_schema = 'pg_catalog'
AND table_type = 'BASE TABLE'
EXCEPT
SELECT
  table_constraints.table_name
FROM information_schema.table_constraints
JOIN information_schema.key_column_usage
  ON key_column_usage.constraint_name = table_constraints.constraint_name
WHERE table_constraints.table_schema = 'pg_catalog'
AND table_constraints.constraint_type = 'PRIMARY KEY'

table_name
-------------
pg_depend
pg_shdepend
(2 rows)

Wouldn't it be possible to add primary keys to these two as well?

It would need to be multi-column primary keys, but should be ok
since we have that for other catalogs?

/Joel

On Thu, Jan 21, 2021, at 18:15, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2021-01-17 23:07, Tom Lane wrote:
>> I've reviewed this patch.  It looks pretty solid to me, with a couple
>> trivial nits as mentioned below, and one bigger thing that's perhaps
>> in the category of bikeshedding.  Namely, do we really want to prefer
>> using the OID indexes as the primary keys?

> I chose this because the notional foreign keys point to the OID.
> If you design some basic business database with customer IDs, product 
> IDs, etc., you'd also usually make the ID the primary key, even if you 
> have, say, a unique constraint on the product name.  But this is of 
> course a matter of taste to some degree.

Fair enough.  As I said upthread, I just wanted to be sure we'd considered
the alternative.  I'm content to use the OIDs as pkeys, although I think
that decision should be explicitly recorded somewhere (cf attachment).

>> The contents of system_constraints.sql seem pretty randomly ordered,
>> and I bet the order isn't stable across machines.

> They follow the order in which the catalogs are processed byt genbki.pl. 

Looking closer, I see the data structure is an array not a hash, so
I withdraw the concern about instability.

After reading the patch again, I have a couple more nits about comments,
which I'll just present as a proposed delta patch.  Otherwise it's good.
I'll mark it RFC.

regards, tom lane

Re: Add primary keys to system catalogs

From
Tom Lane
Date:
"Joel Jacobson" <joel@compiler.org> writes:
> I ran this query (on a patched database) to see if there are still any catalog tables without primary keys:
> ...
> pg_depend
> pg_shdepend

Yeah, this is noted in the patch's own regression tests.

> Wouldn't it be possible to add primary keys to these two as well?

Neither of the existing indexes is suitable, not being unique.

We could imagine adding a unique index across the whole column set,
but that would be an awfully large price to pay for neatnik-ism.
Also, at least for pg_depend (less sure about pg_shdepend), some code
cleanup would be required, because I don't think that we try very
hard to avoid making duplicate dependency entries.  On the whole
I feel this'd be counterproductive.

            regards, tom lane



Re: Add primary keys to system catalogs

From
"Joel Jacobson"
Date:
On Fri, Jan 22, 2021, at 16:42, Tom Lane wrote:
>"Joel Jacobson" <joel@compiler.org> writes:
>> I ran this query (on a patched database) to see if there are still any catalog tables without primary keys:
>> ...
>> pg_depend
>> pg_shdepend
>
>Yeah, this is noted in the patch's own regression tests.

Thanks. Looks good.

>> Wouldn't it be possible to add primary keys to these two as well?
>
>Neither of the existing indexes is suitable, not being unique.
>
>We could imagine adding a unique index across the whole column set,
>but that would be an awfully large price to pay for neatnik-ism.
>Also, at least for pg_depend (less sure about pg_shdepend), some code
>cleanup would be required, because I don't think that we try very
>hard to avoid making duplicate dependency entries.  On the whole
>I feel this'd be counterproductive.
>
>regards, tom lane

I see, and I agree with you.

I'm very happy with this patch.

I've already found great use of it in a tool I'm working on:

Many thanks to all of you for all the great work!
Can't wait to use this functionality in production.

/Joel

Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-01-21 18:15, Tom Lane wrote:
> After reading the patch again, I have a couple more nits about comments,
> which I'll just present as a proposed delta patch.  Otherwise it's good.
> I'll mark it RFC.

Committed with your update, thanks.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/



Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Committed with your update, thanks.

Hmm, shouldn't there have been a catversion bump in there?

            regards, tom lane



Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-01-30 22:56, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Committed with your update, thanks.
> 
> Hmm, shouldn't there have been a catversion bump in there?

I suppose yes on the grounds that it introduces something new in a 
freshly initdb-ed database.  But I thought it wasn't necessary because 
there is no dependency between the binaries and the on-disk state.

There has already been another catversion change since, so it's no 
longer relevant.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/



Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2021-01-30 22:56, Tom Lane wrote:
>> Hmm, shouldn't there have been a catversion bump in there?

> I suppose yes on the grounds that it introduces something new in a 
> freshly initdb-ed database.  But I thought it wasn't necessary because 
> there is no dependency between the binaries and the on-disk state.

I've generally worked on the theory that a catversion bump is indicated
if you need to initdb in order to pass the updated regression tests.
Which one did in this case.  However ...

> There has already been another catversion change since, so it's no 
> longer relevant.

... yeah, it's moot now.

            regards, tom lane



Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-02-01 15:24, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 2021-01-30 22:56, Tom Lane wrote:
>>> Hmm, shouldn't there have been a catversion bump in there?
> 
>> I suppose yes on the grounds that it introduces something new in a
>> freshly initdb-ed database.  But I thought it wasn't necessary because
>> there is no dependency between the binaries and the on-disk state.
> 
> I've generally worked on the theory that a catversion bump is indicated
> if you need to initdb in order to pass the updated regression tests.
> Which one did in this case.

Yeah, that's a good way of looking at it.  I'll keep that in mind.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/



Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 2021-01-22 16:42, Tom Lane wrote:
>> pg_depend
>> pg_shdepend
> Yeah, this is noted in the patch's own regression tests.
> 
>> Wouldn't it be possible to add primary keys to these two as well?
> Neither of the existing indexes is suitable, not being unique.
> 
> We could imagine adding a unique index across the whole column set,
> but that would be an awfully large price to pay for neatnik-ism.
> Also, at least for pg_depend (less sure about pg_shdepend), some code
> cleanup would be required, because I don't think that we try very
> hard to avoid making duplicate dependency entries.  On the whole
> I feel this'd be counterproductive.

I did attempt to make a six- or seven-column unique constraint on 
pg_depend a while ago.  This fails pretty much immediately during 
initdb.  None of the code that adds dependencies, in particular 
recordMultipleDependencies(), checks if the dependency is already there.

I do wonder, however, under what circumstances code would be put into a 
situation where it would add the exact same dependency again, and also 
under what circumstances it would add a dependency between the same 
objects but a different deptype, and how that would work during 
recursive deletion.  Now that we have the refobjversion column, the 
presence of duplicate dependencies might be even more dubious.  I think 
that could be worth analyzing.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/



Re: Add primary keys to system catalogs

From
Julien Rouhaud
Date:
On Tue, Feb 2, 2021 at 6:49 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> I do wonder, however, under what circumstances code would be put into a
> situation where it would add the exact same dependency again, and also
> under what circumstances it would add a dependency between the same
> objects but a different deptype, and how that would work during
> recursive deletion.  Now that we have the refobjversion column, the
> presence of duplicate dependencies might be even more dubious.  I think
> that could be worth analyzing.

There's at least dependencies from indexes to pg_class (underlying
table) and pg_collation that can be entirely duplicated (if the same
column and/or collation is used in both key and predicate for
instance), and this was the case before refobjversion was introduced.



Re: Add primary keys to system catalogs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I do wonder, however, under what circumstances code would be put into a 
> situation where it would add the exact same dependency again, and also 
> under what circumstances it would add a dependency between the same 
> objects but a different deptype, and how that would work during 
> recursive deletion.  Now that we have the refobjversion column, the 
> presence of duplicate dependencies might be even more dubious.  I think 
> that could be worth analyzing.

The duplicate-deps case occurs if, say, you use the same function more
than once in a view.  While we could probably get rid of dups generated
within the same recordMultipleDependencies call (and maybe already do?),
it's harder to deal with dups made by retail calls.  For example,
makeOperatorDependencies doesn't trouble to avoid making duplicate
dependencies when the operator's left and right input types are the
same, or the same as the result type.

The case with almost-dup-but-not-same-deptype *does* occur, for instance
it's very possible to have both normal and automatic dependencies.
As I recall, the policy is to perform the deletion if any of the deps
would allow it.

Dunno about refobjversion; I have my doubts that putting that info in
pg_depend was a sane design choice at all.  But from what I understand
of it, wouldn't all deps on a given object necessarily have the same
version?

            regards, tom lane



Re: Add primary keys to system catalogs

From
Julien Rouhaud
Date:
On Tue, Feb 2, 2021 at 11:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dunno about refobjversion; I have my doubts that putting that info in
> pg_depend was a sane design choice at all.  But from what I understand
> of it, wouldn't all deps on a given object necessarily have the same
> version?

Correct, assuming of course that the dependencies are from the same object.



Re: Add primary keys to system catalogs

From
Peter Eisentraut
Date:
On 03.10.20 08:40, Peter Eisentraut wrote:
> I saw someone ask once for a schema diagram of the system catalogs. 
> Things like that have occasionally been produced manually, but they are 
> not regularly updated.  That made me wonder, why can't we add primary 
> and foreign keys to system catalogs and then have existing tools produce 
> such a schema diagram automatically?

Given that these prerequisites have been accomplished in PostgreSQL 14, 
I went back and made a little tool to convert the foreign-key 
relationship information into a graphviz representation.  The result 
turns out to be pretty unwieldly and probably not something easily 
suitable for the documentation; maybe someone wants to fine-tune the 
graphviz settings to get a better view.  My tool is at 
<https://github.com/petere/pgcatviz>.  Attached is a sample output.

Attachment