Thread: Add primary keys to system catalogs
I saw someone ask once for a schema diagram of the system catalogs. Things like that have occasionally been produced manually, but they are not regularly updated. That made me wonder, why can't we add primary and foreign keys to system catalogs and then have existing tools produce such a schema diagram automatically? Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for an existing index. So this doesn't have to affect the low-level early bootstrapping. The attached patch adds those commands manually. Another option might be to have the catalog generation Perl tooling create those declarations automatically from some marker in the catalog files. That would also allow declaring unique constraints for the unique indexes that don't end up being the primary key. I'm not dealing here with how we might do foreign keys between system catalogs, but I think the idea would be to have some way to declare a foreign key "on paper" without any actual triggers. Any thoughts on this direction? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, 3 Oct 2020, 14:40 Peter Eisentraut, <peter.eisentraut@2ndquadrant.com> wrote:
I saw someone ask once for a schema diagram of the system catalogs.
Things like that have occasionally been produced manually, but they are
not regularly updated. That made me wonder, why can't we add primary
and foreign keys to system catalogs and then have existing tools produce
such a schema diagram automatically?
Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key
for an existing index. So this doesn't have to affect the low-level
early bootstrapping. The attached patch adds those commands manually.
Another option might be to have the catalog generation Perl tooling
create those declarations automatically from some marker in the catalog
files. That would also allow declaring unique constraints for the
unique indexes that don't end up being the primary key.
Any thoughts on this direction?
I like the general idea a lot. I've used Pg for a long time and I still struggle to march up relationships sometimes. Especially given that some things use relation oid oid keys and some use named cols as keys.
So a big +1 from me for the idea. Especially if we ensure psql recognises when the relation 'oid' attribute has a declared PK and includes it in the column list.
I don't object to simply declaring them without any implementing triggers. You aren't supposed to mess with the catalogs anyway. I'd actually like it to be defended against more actively by default. So the FKs are implicitly always valid, because the implementation says so. Much the same way trigger based FKs are unless you break the rules and mess with the triggers.
Frankly I think we really need a way to mark FKs to be DISABLED or NOT CHECKED or something and a way to mark them as NOT VALID. Rsther than expecting uses to fiddle with the implementation triggers. But I don't think FKs on system catalogs require that, it's just be cosmetic there.
On Sat, Oct 3, 2020 at 09:27:02PM +0800, Craig Ringer wrote: > > > On Sat, 3 Oct 2020, 14:40 Peter Eisentraut, <peter.eisentraut@2ndquadrant.com> > wrote: > > I saw someone ask once for a schema diagram of the system catalogs. > Things like that have occasionally been produced manually, but they are > not regularly updated. That made me wonder, why can't we add primary > and foreign keys to system catalogs and then have existing tools produce > such a schema diagram automatically? > > Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key > for an existing index. So this doesn't have to affect the low-level > early bootstrapping. The attached patch adds those commands manually. > Another option might be to have the catalog generation Perl tooling > create those declarations automatically from some marker in the catalog > files. That would also allow declaring unique constraints for the > unique indexes that don't end up being the primary key. > > > Any thoughts on this direction? > > > I like the general idea a lot. I've used Pg for a long time and I still > struggle to march up relationships sometimes. Especially given that some things > use relation oid oid keys and some use named cols as keys. > > So a big +1 from me for the idea. Especially if we ensure psql recognises when > the relation 'oid' attribute has a declared PK and includes it in the column > list. +1 > I don't object to simply declaring them without any implementing triggers. You > aren't supposed to mess with the catalogs anyway. I'd actually like it to be > defended against more actively by default. So the FKs are implicitly always > valid, because the implementation says so. Much the same way trigger based FKs > are unless you break the rules and mess with the triggers. > > Frankly I think we really need a way to mark FKs to be DISABLED or NOT CHECKED > or something and a way to mark them as NOT VALID. Rsther than expecting uses to > fiddle with the implementation triggers. But I don't think FKs on system > catalogs require that, it's just be cosmetic there. +1 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> So a big +1 from me for the idea. Especially if we ensure psql recognises when the relation 'oid' attribute has a declared PK and includes it in the column list.
Oid has been in the normal column list since v12 when it stopped being a system column. Or did you mean something else?
On Sun, 4 Oct 2020, 01:32 John Naylor, <john.naylor@enterprisedb.com> wrote:
On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
> So a big +1 from me for the idea. Especially if we ensure psql recognises when the relation 'oid' attribute has a declared PK and includes it in the column list.Oid has been in the normal column list since v12 when it stopped being a system column. Or did you mean something else?
It means I've spent way too long working on extensions running mainly on pg11 lately and way too little reading -hackers. Ignore they bit. Thanks..
Hi, On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote: > Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for > an existing index. So this doesn't have to affect the low-level early > bootstrapping. The attached patch adds those commands manually. Another > option might be to have the catalog generation Perl tooling create those > declarations automatically from some marker in the catalog files. That > would also allow declaring unique constraints for the unique indexes that > don't end up being the primary key. Hm. What prevents us from declaring the pkey during bootstrap? I don't at all like adding yet another place that needs manual editing when doing DDL changes. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote: >> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for >> an existing index. So this doesn't have to affect the low-level early >> bootstrapping. The attached patch adds those commands manually. Another >> option might be to have the catalog generation Perl tooling create those >> declarations automatically from some marker in the catalog files. That >> would also allow declaring unique constraints for the unique indexes that >> don't end up being the primary key. > Hm. What prevents us from declaring the pkey during bootstrap? I don't > at all like adding yet another place that needs manual editing when > doing DDL changes. We don't add new catalogs often, so the cost-benefit ratio of automating this looks pretty poor. It's not like you'll be able to forget to do it, given the proposed regression test check for catalogs without pkeys. One thing I'm wondering about though is whether Peter's implementation ends up with desirable pg_depend contents for the pkey constraints. Probably we want them to just be pinned and not have any explicit dependencies on the associated indexes, but I haven't thought hard about it. regards, tom lane
Hi, On 2020-10-06 15:31:16 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote: > >> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for > >> an existing index. So this doesn't have to affect the low-level early > >> bootstrapping. The attached patch adds those commands manually. Another > >> option might be to have the catalog generation Perl tooling create those > >> declarations automatically from some marker in the catalog files. That > >> would also allow declaring unique constraints for the unique indexes that > >> don't end up being the primary key. > > > Hm. What prevents us from declaring the pkey during bootstrap? I don't > > at all like adding yet another place that needs manual editing when > > doing DDL changes. > > We don't add new catalogs often, so the cost-benefit ratio of automating > this looks pretty poor. True, we don't create new ones that often. Still think that distributing such setup over fewer places is good. And it's not like there's only a handful of pkeys to start with. To me it makes more sense to add a DECLARE_PRIMARY_KEY in indexing.h, replacing the relevant DECLARE_UNIQUE_INDEX stanzas. Or, possibly preferrable, do it as part of the CATALOG() directly - which'd avoid needing the index statement in the first place. The Catalog.pm part is trivial. It's abit harder to implement actually creating the constraint - but even the hard route of adding a new field to Boot_DeclareIndexStmt or such wouldn't be particularly hard. > One thing I'm wondering about though is whether Peter's implementation > ends up with desirable pg_depend contents for the pkey constraints. > Probably we want them to just be pinned and not have any explicit > dependencies on the associated indexes, but I haven't thought hard > about it. That sounds right to me. Wonder whether it's not time to move the DECLARE bits from indexing.h to the individual catalogs, independent of what we're discussing here. With todays Catalog.pm there's really not much point in keeping them separate, imo. Greetings, Andres Freund
On Tue, Oct 6, 2020 at 4:16 PM Andres Freund <andres@anarazel.de> wrote:
True, we don't create new ones that often. Still think that distributing
such setup over fewer places is good. And it's not like there's only a
handful of pkeys to start with. To me it makes more sense to add a
DECLARE_PRIMARY_KEY in indexing.h, replacing the relevant
DECLARE_UNIQUE_INDEX stanzas.
That seems logical.
Or, possibly preferrable, do it as part of
the CATALOG() directly - which'd avoid needing the index statement in
the first place.
That line is already messy in places. AFAICT, you'd still need info from the index statement (at least my shortened version below), leading to two syntaxes and two places for the same thing. Maybe I'm missing something?
Wonder whether it's not time to move the DECLARE bits from indexing.h to
the individual catalogs, independent of what we're discussing here. With
todays Catalog.pm there's really not much point in keeping them
separate, imo.
That would look nicer, at least. A declaration could go from
DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subscription_rel using btree(srrelid oid_ops, srsubid oid_ops));
#define SubscriptionRelSrrelidSrsubidIndexId 6117
#define SubscriptionRelSrrelidSrsubidIndexId 6117
to something like
DECLARE_UNIQUE_INDEX(btree(srrelid oid_ops, srsubid oid_ops), 6117, SubscriptionRelSrrelidSrsubidIndexId));
On 2020-10-03 08:40, Peter Eisentraut wrote: > Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key > for an existing index. So this doesn't have to affect the low-level > early bootstrapping. The attached patch adds those commands manually. > Another option might be to have the catalog generation Perl tooling > create those declarations automatically from some marker in the catalog > files. That would also allow declaring unique constraints for the > unique indexes that don't end up being the primary key. I have reworked my patch like this. Now, the primary key is identified by using DECLARE_UNIQUE_INDEX_PKEY() instead of DECLARE_UNIQUE_INDEX() in the catalog pg_*.h files. This automatically generates the required ALTER TABLE commands in a new file system_constraints.sql. initdb is ordered so that that file is run before dependency processing, so the system constraints also end up pinned (as was requested during previous discussion). Note, this needs the get_constraint_index() patch I just posted separately. The old implementation of get_constraint_index() does not handle pinned constraints. There is something strange going on in the misc_sanity test, as seen by the test output change -NOTICE: pg_constraint contains unpinned initdb-created object(s) This previously detected some constraints from the information schema, so it was correct to point those out. But since it looks for the min(oid) of a catalog, this will now find one of the new pinned constraints, so it won't detect this case anymore. I think that test is not written correctly. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > [ v2-0001-Add-primary-keys-and-unique-constraints-to-system.patch ] I've reviewed this patch. It looks pretty solid to me, with a couple trivial nits as mentioned below, and one bigger thing that's perhaps in the category of bikeshedding. Namely, do we really want to prefer using the OID indexes as the primary keys? In most cases there's some other index that seems to me to be what a user would think of as the pkey, for example pg_class_relname_nsp_index for pg_class or pg_proc_proname_args_nsp_index for pg_proc. Preferring OID where it exists is a nice simple rule, which has some attractiveness, but the OID indexes seem to me like a lookup aid rather than the "real" object identity. > There is something strange going on in the misc_sanity test, as seen by > the test output change > -NOTICE: pg_constraint contains unpinned initdb-created object(s) Formerly the lowest OID in pg_constraint was that of the information_schema.cardinal_number_domain_check CHECK constraint, which is intentionally not pinned. Now the lowest OID is that of pg_proc_oid_index, which is pinned, hence the output change. I think that's fine. The idea here is just to notice whether any catalogs have been missed by setup_depend, and we have now proven that pg_constraint wasn't missed, so it's cool. The other catalog that is listed in the expected output, but is not one of the ones intentionally excluded by setup_depend, is pg_rewrite. That's because its lowest OID is a rewrite rule for a system view, which we've intentionally made non-pinned. So that's also fine, but maybe it'd be worth explaining it in the comment above the DO block. Anyway, on to the minor nits: system_constraints.sql should be removed by the maintainer-clean target in src/backend/catalog/Makefile; perhaps also mention it in the comment for the clean target. Also I suppose src/tools/msvc/clean.bat needs to remove it, like it does postgres.bki. The contents of system_constraints.sql seem pretty randomly ordered, and I bet the order isn't stable across machines. It would be wise to sort the entries by name to ensure we don't get inconsistencies between different builds. (If nothing else, that could complicate validating tarballs.) I don't follow why the pg_seclabel, pg_shseclabel indexes aren't made into pkeys? There's a comment claiming they "use a nondefault operator class", but that's untrue AFAICS. regards, tom lane
On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote: > Frankly I think we really need a way to mark FKs to be DISABLED or NOT CHECKED or something and a way to mark them as NOTVALID. Rsther than expecting uses to fiddle with the implementation triggers. But I don't think FKs on system catalogsrequire that, it's just be cosmetic there. Not really. I think the idea that users don't or shouldn't ever do manual DDL on system catalogs is not very plausible, considering that we suggest such steps in our own release notes. I don't have any complaint about labelling some of the unique indexes as primary keys, but I think installing foreign keys that don't really enforce anything may lead to confusion. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I don't have any complaint about labelling some of the unique indexes > as primary keys, but I think installing foreign keys that don't really > enforce anything may lead to confusion. I'm not sure if I buy the "confusion" argument, but after thinking about this more I realized that there's a stronger roadblock for treating catalog interrelationships as SQL foreign keys. Namely, that we always represent no-reference situations with a zero OID, whereas it'd have to be NULL to look like a valid foreign-key case. Changing to real NULLs in those columns would of course break a ton of C code, so I don't think that's gonna happen. We could overlay an FK onto OID columns that should never have zero entries, but that would miss enough of the interesting cases that I don't find it a very attractive proposal. So I think we should finish up the project of making real SQL constraints for the catalog indexes, but the idea of making FK constraints too looks like it'll end in failure. The reason I got interested in this right now is the nearby discussion [1] about why findoidjoins misses some catalog relationships and whether we should fix that and/or make its results more readily accessible. I'd thought perhaps FK constraint entries could supersede the need for that tool altogether, but now it seems like not. I still like the idea of marking OID relationships in the catalog headers though. Perhaps we should take Joel's suggestion of a new system catalog more seriously, and have genbki.pl populate such a catalog from info in the catalog header files. regards, tom lane [1] https://www.postgresql.org/message-id/flat/d1f413ff-0e50-413d-910c-bdd9ee9752c1%40www.fastmail.com
I wrote: > ... I still like the idea of marking OID relationships in the > catalog headers though. Perhaps we should take Joel's suggestion > of a new system catalog more seriously, and have genbki.pl populate > such a catalog from info in the catalog header files. On second thought, a catalog is overkill; it'd only be useful if the data could change after initdb, which this data surely cannot. The right way to expose such info to SQL is with a set-returning function reading a constant table in the C code, a la pg_get_keywords(). That way takes a whole lot less infrastructure and is just as useful to SQL code. [ wanders away wondering about a debug mode in which we actually validate OIDs inserted into such columns... ] regards, tom lane
On Mon, Jan 18, 2021, at 19:33, Tom Lane wrote:
> On second thought, a catalog is overkill; it'd only be useful if the data
> could change after initdb, which this data surely cannot. The right way
> to expose such info to SQL is with a set-returning function reading a
> constant table in the C code, a la pg_get_keywords(). That way takes a
> whole lot less infrastructure and is just as useful to SQL code.
+1
On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > > [ v2-0001-Add-primary-keys-and-unique-constraints-to-system.patch ] > > [...] do we really want to prefer > using the OID indexes as the primary keys? In most cases there's some > other index that seems to me to be what a user would think of as the > pkey, for example pg_class_relname_nsp_index for pg_class or > pg_proc_proname_args_nsp_index for pg_proc. Preferring OID where it > exists is a nice simple rule, which has some attractiveness, but the > OID indexes seem to me like a lookup aid rather than the "real" object > identity. I disagree. The OID is the real, immutable identity of an object. The "relname" of a "pg_class" encatalogtry can change any time. Since there are no foreign keys that reference catalogs, that won't cause problems, but I still think that primary keys should change as little as possible. Yours, Laurenz Albe
On Mon, Jan 18, 2021, at 18:23, Tom Lane wrote:
> I realized that there's a stronger roadblock for
> treating catalog interrelationships as SQL foreign keys. Namely,
> that we always represent no-reference situations with a zero OID,
> whereas it'd have to be NULL to look like a valid foreign-key case.
Another roadblock is perhaps the lack of foreign keys on arrays,
which would be needed by the following references:
SELECT * FROM oidjoins WHERE column_type ~ '(vector|\[\])$' ORDER BY 1,2,3;
table_name | column_name | column_type | ref_table_name | ref_column_name
----------------------+----------------+-------------+----------------+-----------------
pg_constraint | conexclop | oid[] | pg_operator | oid
pg_constraint | conffeqop | oid[] | pg_operator | oid
pg_constraint | confkey | int2[] | pg_attribute | attnum
pg_constraint | conkey | int2[] | pg_attribute | attnum
pg_constraint | conpfeqop | oid[] | pg_operator | oid
pg_constraint | conppeqop | oid[] | pg_operator | oid
pg_extension | extconfig | oid[] | pg_class | oid
pg_index | indclass | oidvector | pg_opclass | oid
pg_index | indcollation | oidvector | pg_collation | oid
pg_index | indkey | int2vector | pg_attribute | attnum
pg_partitioned_table | partattrs | int2vector | pg_attribute | attnum
pg_partitioned_table | partclass | oidvector | pg_opclass | oid
pg_partitioned_table | partcollation | oidvector | pg_collation | oid
pg_policy | polroles | oid[] | pg_authid | oid
pg_proc | proallargtypes | oid[] | pg_type | oid
pg_proc | proargtypes | oidvector | pg_type | oid
pg_statistic_ext | stxkeys | int2vector | pg_attribute | attnum
pg_trigger | tgattr | int2vector | pg_attribute | attnum
(18 rows)
I see there is an old thread with work in this area, "Re: GSoC 2017: Foreign Key Arrays":
The last message in the thread is from 2018-10-02:
>On Tue, 2 Oct, 2018 at 05:13:26AM +0200, Michael Paquier wrote:
>>On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:
>> I am still having problems rebasing this patch. I can not figure it out on
>> my own.
>Okay, it's been a couple of months since this last email, and nothing
>has happened, so I am marking it as returned with feedback.
>--
>Michael
Personally, I would absolutely *love* FKs on array columns.
I always feel shameful when adding array columns to tables,
when the elements are PKs in some other table.
It's convenient and often the best design,
but it feels dirty knowing there are no FKs to help detect application bugs.
Let's hope the current FKs-on-catalog-discussion can ignite the old Foreign Key Arrays thread.
/Joel
On 1/19/21 11:46 AM, Laurenz Albe wrote: > On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote: >> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >>> [ v2-0001-Add-primary-keys-and-unique-constraints-to-system.patch ] >> >> [...] do we really want to prefer >> using the OID indexes as the primary keys? In most cases there's some >> other index that seems to me to be what a user would think of as the >> pkey, for example pg_class_relname_nsp_index for pg_class or >> pg_proc_proname_args_nsp_index for pg_proc. Preferring OID where it >> exists is a nice simple rule, which has some attractiveness, but the >> OID indexes seem to me like a lookup aid rather than the "real" object >> identity. > > I disagree. The OID is the real, immutable identity of an object. > The "relname" of a "pg_class" encatalogtry can change any time. > Since there are no foreign keys that reference catalogs, that won't cause > problems, but I still think that primary keys should change as little as > possible. This might be good advice for systems where the primary key defines the physical layout of the table, but for postgres's heap there is no reason not to prefer the natural key of the table. My vote is with Tom on this one. -- Vik Fearing
Dear Joel,
I would love to start working on this again, I tried to revive the patch again in 2019, however, I faced the same problems as last time (detailed in the thread) and I didn't get the support I needed to continue with this patch.
Are you willing to help rebase and finally publish this patch?
Best Regards,
Mark Rofail
On Tue, 19 Jan 2021 at 2:22 PM Joel Jacobson <joel@compiler.org> wrote:
On Mon, Jan 18, 2021, at 18:23, Tom Lane wrote:> I realized that there's a stronger roadblock for> treating catalog interrelationships as SQL foreign keys. Namely,> that we always represent no-reference situations with a zero OID,> whereas it'd have to be NULL to look like a valid foreign-key case.Another roadblock is perhaps the lack of foreign keys on arrays,which would be needed by the following references:SELECT * FROM oidjoins WHERE column_type ~ '(vector|\[\])$' ORDER BY 1,2,3;table_name | column_name | column_type | ref_table_name | ref_column_name----------------------+----------------+-------------+----------------+-----------------pg_constraint | conexclop | oid[] | pg_operator | oidpg_constraint | conffeqop | oid[] | pg_operator | oidpg_constraint | confkey | int2[] | pg_attribute | attnumpg_constraint | conkey | int2[] | pg_attribute | attnumpg_constraint | conpfeqop | oid[] | pg_operator | oidpg_constraint | conppeqop | oid[] | pg_operator | oidpg_extension | extconfig | oid[] | pg_class | oidpg_index | indclass | oidvector | pg_opclass | oidpg_index | indcollation | oidvector | pg_collation | oidpg_index | indkey | int2vector | pg_attribute | attnumpg_partitioned_table | partattrs | int2vector | pg_attribute | attnumpg_partitioned_table | partclass | oidvector | pg_opclass | oidpg_partitioned_table | partcollation | oidvector | pg_collation | oidpg_policy | polroles | oid[] | pg_authid | oidpg_proc | proallargtypes | oid[] | pg_type | oidpg_proc | proargtypes | oidvector | pg_type | oidpg_statistic_ext | stxkeys | int2vector | pg_attribute | attnumpg_trigger | tgattr | int2vector | pg_attribute | attnum(18 rows)I see there is an old thread with work in this area, "Re: GSoC 2017: Foreign Key Arrays":The last message in the thread is from 2018-10-02:>On Tue, 2 Oct, 2018 at 05:13:26AM +0200, Michael Paquier wrote:>>On Sat, Aug 11, 2018 at 05:20:57AM +0200, Mark Rofail wrote:>> I am still having problems rebasing this patch. I can not figure it out on>> my own.>Okay, it's been a couple of months since this last email, and nothing>has happened, so I am marking it as returned with feedback.>-->MichaelPersonally, I would absolutely *love* FKs on array columns.I always feel shameful when adding array columns to tables,when the elements are PKs in some other table.It's convenient and often the best design,but it feels dirty knowing there are no FKs to help detect application bugs.Let's hope the current FKs-on-catalog-discussion can ignite the old Foreign Key Arrays thread./Joel
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Sun, 2021-01-17 at 17:07 -0500, Tom Lane wrote: >> [...] do we really want to prefer >> using the OID indexes as the primary keys? In most cases there's some >> other index that seems to me to be what a user would think of as the >> pkey, for example pg_class_relname_nsp_index for pg_class or >> pg_proc_proname_args_nsp_index for pg_proc. Preferring OID where it >> exists is a nice simple rule, which has some attractiveness, but the >> OID indexes seem to me like a lookup aid rather than the "real" object >> identity. > I disagree. The OID is the real, immutable identity of an object. > The "relname" of a "pg_class" encatalogtry can change any time. > Since there are no foreign keys that reference catalogs, that won't cause > problems, but I still think that primary keys should change as little as > possible. Yeah, there's also the point that the OID is what we use for "foreign key" references from other catalogs. I don't deny that there are reasons to think of OID as the pkey. But as long as these constraints are basically cosmetic, it seemed to me that we should consider the other approach. I'm not dead set on that, just wanted to discuss it. regards, tom lane
On Tue, Jan 19, 2021, at 18:25, Mark Rofail wrote:
>Dear Joel,
>
>I would love to start working on this again, I tried to revive the patch again in 2019, however, I faced the same problems as >last time (detailed in the thread) and I didn't get the support I needed to continue with this patch.
>
>Are you willing to help rebase and finally publish this patch?
Willing, sure, not perhaps not able,
if there are complex problems that need a deep understanding of code base,
I'm not sure I will be of much help, but I can for sure do testing and reviewing.
/Joel
I'll post tomorrow the latest rebased patch with a summary of the issues. Let's continue this in the foreign key array thread, as to not clutter this thread.
Regards, Mark Rofail.
On Tue, Jan 19, 2021, 11:00 PM Joel Jacobson <joel@compiler.org> wrote:
On Tue, Jan 19, 2021, at 18:25, Mark Rofail wrote:>Dear Joel,>>I would love to start working on this again, I tried to revive the patch again in 2019, however, I faced the same problems as >last time (detailed in the thread) and I didn't get the support I needed to continue with this patch.>>Are you willing to help rebase and finally publish this patch?Willing, sure, not perhaps not able,if there are complex problems that need a deep understanding of code base,I'm not sure I will be of much help, but I can for sure do testing and reviewing./Joel
On 2021-01-18 00:35, Robert Haas wrote: > I don't have any complaint about labelling some of the unique indexes > as primary keys, but I think installing foreign keys that don't really > enforce anything may lead to confusion. FWIW, "not enforced" constraints (such as foreign keys) is a feature that is in the SQL standard and in other SQL implementations. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
On 2021-01-18 18:23, Tom Lane wrote: > The reason I got interested in this right now is the nearby > discussion [1] about why findoidjoins misses some catalog > relationships and whether we should fix that and/or make its results > more readily accessible. I'd thought perhaps FK constraint entries > could supersede the need for that tool altogether, but now it seems > like not. Yes, catalog consistency checking was also something I had in mind. There are clearly a number of complications still to resolve. But in general the idea of representing relationships between tables in the DDL/schema sounds like a sensible idea. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
On 2021-01-17 23:07, Tom Lane wrote: > I've reviewed this patch. It looks pretty solid to me, with a couple > trivial nits as mentioned below, and one bigger thing that's perhaps > in the category of bikeshedding. Namely, do we really want to prefer > using the OID indexes as the primary keys? In most cases there's some > other index that seems to me to be what a user would think of as the > pkey, for example pg_class_relname_nsp_index for pg_class or > pg_proc_proname_args_nsp_index for pg_proc. Preferring OID where it > exists is a nice simple rule, which has some attractiveness, but the > OID indexes seem to me like a lookup aid rather than the "real" object > identity. I chose this because the notional foreign keys point to the OID. If you design some basic business database with customer IDs, product IDs, etc., you'd also usually make the ID the primary key, even if you have, say, a unique constraint on the product name. But this is of course a matter of taste to some degree. > system_constraints.sql should be removed by the maintainer-clean target > in src/backend/catalog/Makefile; perhaps also mention it in the comment > for the clean target. Also I suppose src/tools/msvc/clean.bat needs to > remove it, like it does postgres.bki. done > The contents of system_constraints.sql seem pretty randomly ordered, > and I bet the order isn't stable across machines. It would be wise > to sort the entries by name to ensure we don't get inconsistencies > between different builds. (If nothing else, that could complicate > validating tarballs.) They follow the order in which the catalogs are processed byt genbki.pl. This is the same order in which the catalog data and indexes are created in postgres.bki. The Perl code to handle each of these is conceptually the same, so that seems solid. We could order them differently, but there is also value in keeping the catalog processing order globally consistent. > I don't follow why the pg_seclabel, pg_shseclabel indexes aren't made > into pkeys? There's a comment claiming they "use a nondefault operator > class", but that's untrue AFAICS. When I started the patch, it was text_pattern_ops, but that was a long time ago. ;-) Fixed now. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2021-01-17 23:07, Tom Lane wrote: >> I've reviewed this patch. It looks pretty solid to me, with a couple >> trivial nits as mentioned below, and one bigger thing that's perhaps >> in the category of bikeshedding. Namely, do we really want to prefer >> using the OID indexes as the primary keys? > I chose this because the notional foreign keys point to the OID. > If you design some basic business database with customer IDs, product > IDs, etc., you'd also usually make the ID the primary key, even if you > have, say, a unique constraint on the product name. But this is of > course a matter of taste to some degree. Fair enough. As I said upthread, I just wanted to be sure we'd considered the alternative. I'm content to use the OIDs as pkeys, although I think that decision should be explicitly recorded somewhere (cf attachment). >> The contents of system_constraints.sql seem pretty randomly ordered, >> and I bet the order isn't stable across machines. > They follow the order in which the catalogs are processed byt genbki.pl. Looking closer, I see the data structure is an array not a hash, so I withdraw the concern about instability. After reading the patch again, I have a couple more nits about comments, which I'll just present as a proposed delta patch. Otherwise it's good. I'll mark it RFC. regards, tom lane diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 6747996ce3..b68c1752c0 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -663,6 +663,8 @@ die "genbki OID counter reached $GenbkiNextOid, overrunning FirstBootstrapObjectId\n" if $GenbkiNextOid > $FirstBootstrapObjectId; +# Now generate system_constraints.sql + foreach my $c (@system_constraints) { print $constraints $c, "\n"; diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h index 296765d987..5d05fafb5d 100644 --- a/src/include/catalog/genbki.h +++ b/src/include/catalog/genbki.h @@ -55,12 +55,15 @@ #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable /* - * These lines processed by genbki.pl to create the statements + * These lines are processed by genbki.pl to create the statements * the bootstrap parser will turn into DefineIndex calls. * - * The keyword is DECLARE_INDEX or DECLARE_UNIQUE_INDEX or DECLARE_UNIQUE_INDEX_PKEY. The first two - * arguments are the index name and OID, the rest is much like a standard - * 'create index' SQL command. + * The keyword is DECLARE_INDEX or DECLARE_UNIQUE_INDEX or + * DECLARE_UNIQUE_INDEX_PKEY. ("PKEY" marks the index as being the catalog's + * primary key; currently this is only cosmetically different from a regular + * unique index. By convention, we usually make a catalog's OID column its + * pkey, if it has one.) The first two arguments are the index's name and + * OID, the rest is much like a standard 'create index' SQL command. * * For each index, we also provide a #define for its OID. References to * the index in the C code should always use these #defines, not the actual diff --git a/src/test/regress/expected/misc_sanity.out b/src/test/regress/expected/misc_sanity.out index ae85817c61..9ebe28a78d 100644 --- a/src/test/regress/expected/misc_sanity.out +++ b/src/test/regress/expected/misc_sanity.out @@ -43,6 +43,9 @@ WHERE refclassid = 0 OR refobjid = 0 OR -- whatever OID the test is complaining about must have been added later -- in initdb, where it intentionally isn't pinned. Legitimate exceptions -- to that rule are listed in the comments in setup_depend(). +-- Currently, pg_rewrite is also listed by this check, even though it is +-- covered by setup_depend(). That happens because there are no rules in +-- the pinned data, but initdb creates some intentionally-not-pinned views. do $$ declare relnm text; reloid oid; diff --git a/src/test/regress/sql/misc_sanity.sql b/src/test/regress/sql/misc_sanity.sql index ea80cf1f10..9699f5cc3b 100644 --- a/src/test/regress/sql/misc_sanity.sql +++ b/src/test/regress/sql/misc_sanity.sql @@ -44,6 +44,9 @@ WHERE refclassid = 0 OR refobjid = 0 OR -- whatever OID the test is complaining about must have been added later -- in initdb, where it intentionally isn't pinned. Legitimate exceptions -- to that rule are listed in the comments in setup_depend(). +-- Currently, pg_rewrite is also listed by this check, even though it is +-- covered by setup_depend(). That happens because there are no rules in +-- the pinned data, but initdb creates some intentionally-not-pinned views. do $$ declare relnm text;
Many thanks for excellent work!
I've tested the patch successfully.
I ran this query (on a patched database) to see if there are still any catalog tables without primary keys:
SELECT
table_name
FROM information_schema.tables
WHERE table_schema = 'pg_catalog'
AND table_type = 'BASE TABLE'
EXCEPT
SELECT
table_constraints.table_name
FROM information_schema.table_constraints
JOIN information_schema.key_column_usage
ON key_column_usage.constraint_name = table_constraints.constraint_name
WHERE table_constraints.table_schema = 'pg_catalog'
AND table_constraints.constraint_type = 'PRIMARY KEY'
table_name
-------------
pg_depend
pg_shdepend
(2 rows)
Wouldn't it be possible to add primary keys to these two as well?
It would need to be multi-column primary keys, but should be ok
since we have that for other catalogs?
/Joel
On Thu, Jan 21, 2021, at 18:15, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:> On 2021-01-17 23:07, Tom Lane wrote:>> I've reviewed this patch. It looks pretty solid to me, with a couple>> trivial nits as mentioned below, and one bigger thing that's perhaps>> in the category of bikeshedding. Namely, do we really want to prefer>> using the OID indexes as the primary keys?> I chose this because the notional foreign keys point to the OID.> If you design some basic business database with customer IDs, product> IDs, etc., you'd also usually make the ID the primary key, even if you> have, say, a unique constraint on the product name. But this is of> course a matter of taste to some degree.Fair enough. As I said upthread, I just wanted to be sure we'd consideredthe alternative. I'm content to use the OIDs as pkeys, although I thinkthat decision should be explicitly recorded somewhere (cf attachment).>> The contents of system_constraints.sql seem pretty randomly ordered,>> and I bet the order isn't stable across machines.> They follow the order in which the catalogs are processed byt genbki.pl.Looking closer, I see the data structure is an array not a hash, soI withdraw the concern about instability.After reading the patch again, I have a couple more nits about comments,which I'll just present as a proposed delta patch. Otherwise it's good.I'll mark it RFC.regards, tom lane
"Joel Jacobson" <joel@compiler.org> writes: > I ran this query (on a patched database) to see if there are still any catalog tables without primary keys: > ... > pg_depend > pg_shdepend Yeah, this is noted in the patch's own regression tests. > Wouldn't it be possible to add primary keys to these two as well? Neither of the existing indexes is suitable, not being unique. We could imagine adding a unique index across the whole column set, but that would be an awfully large price to pay for neatnik-ism. Also, at least for pg_depend (less sure about pg_shdepend), some code cleanup would be required, because I don't think that we try very hard to avoid making duplicate dependency entries. On the whole I feel this'd be counterproductive. regards, tom lane
On Fri, Jan 22, 2021, at 16:42, Tom Lane wrote:
>"Joel Jacobson" <joel@compiler.org> writes:
>> I ran this query (on a patched database) to see if there are still any catalog tables without primary keys:
>> ...
>> pg_depend
>> pg_shdepend
>
>Yeah, this is noted in the patch's own regression tests.
Thanks. Looks good.
>> Wouldn't it be possible to add primary keys to these two as well?
>
>Neither of the existing indexes is suitable, not being unique.
>
>We could imagine adding a unique index across the whole column set,
>but that would be an awfully large price to pay for neatnik-ism.
>Also, at least for pg_depend (less sure about pg_shdepend), some code
>cleanup would be required, because I don't think that we try very
>hard to avoid making duplicate dependency entries. On the whole
>I feel this'd be counterproductive.
>
>regards, tom lane
I see, and I agree with you.
I'm very happy with this patch.
I've already found great use of it in a tool I'm working on:
Many thanks to all of you for all the great work!
Can't wait to use this functionality in production.
/Joel
On 2021-01-21 18:15, Tom Lane wrote: > After reading the patch again, I have a couple more nits about comments, > which I'll just present as a proposed delta patch. Otherwise it's good. > I'll mark it RFC. Committed with your update, thanks. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > Committed with your update, thanks. Hmm, shouldn't there have been a catversion bump in there? regards, tom lane
On 2021-01-30 22:56, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> Committed with your update, thanks. > > Hmm, shouldn't there have been a catversion bump in there? I suppose yes on the grounds that it introduces something new in a freshly initdb-ed database. But I thought it wasn't necessary because there is no dependency between the binaries and the on-disk state. There has already been another catversion change since, so it's no longer relevant. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2021-01-30 22:56, Tom Lane wrote: >> Hmm, shouldn't there have been a catversion bump in there? > I suppose yes on the grounds that it introduces something new in a > freshly initdb-ed database. But I thought it wasn't necessary because > there is no dependency between the binaries and the on-disk state. I've generally worked on the theory that a catversion bump is indicated if you need to initdb in order to pass the updated regression tests. Which one did in this case. However ... > There has already been another catversion change since, so it's no > longer relevant. ... yeah, it's moot now. regards, tom lane
On 2021-02-01 15:24, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> On 2021-01-30 22:56, Tom Lane wrote: >>> Hmm, shouldn't there have been a catversion bump in there? > >> I suppose yes on the grounds that it introduces something new in a >> freshly initdb-ed database. But I thought it wasn't necessary because >> there is no dependency between the binaries and the on-disk state. > > I've generally worked on the theory that a catversion bump is indicated > if you need to initdb in order to pass the updated regression tests. > Which one did in this case. Yeah, that's a good way of looking at it. I'll keep that in mind. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
On 2021-01-22 16:42, Tom Lane wrote: >> pg_depend >> pg_shdepend > Yeah, this is noted in the patch's own regression tests. > >> Wouldn't it be possible to add primary keys to these two as well? > Neither of the existing indexes is suitable, not being unique. > > We could imagine adding a unique index across the whole column set, > but that would be an awfully large price to pay for neatnik-ism. > Also, at least for pg_depend (less sure about pg_shdepend), some code > cleanup would be required, because I don't think that we try very > hard to avoid making duplicate dependency entries. On the whole > I feel this'd be counterproductive. I did attempt to make a six- or seven-column unique constraint on pg_depend a while ago. This fails pretty much immediately during initdb. None of the code that adds dependencies, in particular recordMultipleDependencies(), checks if the dependency is already there. I do wonder, however, under what circumstances code would be put into a situation where it would add the exact same dependency again, and also under what circumstances it would add a dependency between the same objects but a different deptype, and how that would work during recursive deletion. Now that we have the refobjversion column, the presence of duplicate dependencies might be even more dubious. I think that could be worth analyzing. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/
On Tue, Feb 2, 2021 at 6:49 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > I do wonder, however, under what circumstances code would be put into a > situation where it would add the exact same dependency again, and also > under what circumstances it would add a dependency between the same > objects but a different deptype, and how that would work during > recursive deletion. Now that we have the refobjversion column, the > presence of duplicate dependencies might be even more dubious. I think > that could be worth analyzing. There's at least dependencies from indexes to pg_class (underlying table) and pg_collation that can be entirely duplicated (if the same column and/or collation is used in both key and predicate for instance), and this was the case before refobjversion was introduced.
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I do wonder, however, under what circumstances code would be put into a > situation where it would add the exact same dependency again, and also > under what circumstances it would add a dependency between the same > objects but a different deptype, and how that would work during > recursive deletion. Now that we have the refobjversion column, the > presence of duplicate dependencies might be even more dubious. I think > that could be worth analyzing. The duplicate-deps case occurs if, say, you use the same function more than once in a view. While we could probably get rid of dups generated within the same recordMultipleDependencies call (and maybe already do?), it's harder to deal with dups made by retail calls. For example, makeOperatorDependencies doesn't trouble to avoid making duplicate dependencies when the operator's left and right input types are the same, or the same as the result type. The case with almost-dup-but-not-same-deptype *does* occur, for instance it's very possible to have both normal and automatic dependencies. As I recall, the policy is to perform the deletion if any of the deps would allow it. Dunno about refobjversion; I have my doubts that putting that info in pg_depend was a sane design choice at all. But from what I understand of it, wouldn't all deps on a given object necessarily have the same version? regards, tom lane
On Tue, Feb 2, 2021 at 11:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dunno about refobjversion; I have my doubts that putting that info in > pg_depend was a sane design choice at all. But from what I understand > of it, wouldn't all deps on a given object necessarily have the same > version? Correct, assuming of course that the dependencies are from the same object.
On 03.10.20 08:40, Peter Eisentraut wrote: > I saw someone ask once for a schema diagram of the system catalogs. > Things like that have occasionally been produced manually, but they are > not regularly updated. That made me wonder, why can't we add primary > and foreign keys to system catalogs and then have existing tools produce > such a schema diagram automatically? Given that these prerequisites have been accomplished in PostgreSQL 14, I went back and made a little tool to convert the foreign-key relationship information into a graphviz representation. The result turns out to be pretty unwieldly and probably not something easily suitable for the documentation; maybe someone wants to fine-tune the graphviz settings to get a better view. My tool is at <https://github.com/petere/pgcatviz>. Attached is a sample output.