Thread: BUG #18244: Corruption in indexes involving whole-row expressions
The following bug has been logged on the website: Bug reference: 18244 Logged by: Nikolay Samokhvalov Email address: nik@postgres.ai PostgreSQL version: 16.1 Operating system: any Description: nik=# create index on t1 (hash_record(t1)); CREATE INDEX Such an index can easily be corrupted, e.g.: nik=# alter table t1 add column yo int default -1; ALTER TABLE Or if we just drop a column: nik=# alter table t_n drop column c2; ALTER TABLE nik=# \d t_n Table "public.t_n" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | | Indexes: "t_n_hash_record_idx" btree (hash_record(t_n.*)) nik=# select * from t_n where hash_record(t_n) = hash_record(row(1, -1)); c1 ---- 1 (1 row) nik=# select * from t_n where hash_record(t_n) = hash_record(row(1)); c1 ---- (0 rows) Or if index is unique: nik=# select * from t_n; c4 ---- -1 -1 -1 (3 rows) nik=# \d t_n Table "public.t_n" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------------- c4 | integer | | | '-1'::integer Indexes: "t_n_hash_record_idx" UNIQUE, btree (hash_record(t_n.*)) nik=# reindex index t_n_hash_record_idx; ERROR: could not create unique index "t_n_hash_record_idx" DETAIL: Key (hash_record(t_n.*))=(385747274) is duplicated. Proposal: prohibit the use of whole-row expression – as it is already done for generated columns and produce a similar error ("cannot use whole-row variable in column generation expression")
On Tue, 2023-12-12 at 19:28 +0000, PG Bug reporting form wrote: > nik=# create index on t1 (hash_record(t1)); > > Such an index can easily be corrupted, e.g.: > > nik=# alter table t1 add column yo int default -1; > > Proposal: prohibit the use of whole-row expression – as it is already done > for generated columns and produce a similar error ("cannot use whole-row > variable in column generation expression") I reported that bug before: https://www.postgresql.org/message-id/flat/e48a5d9a2d3d72985d61ee254314f5f5f5444a55.camel%40cybertec.at I think that your proposal is good, it seems that nobody cared enough to implement it. If we forbid that, one thing to consider is pg_upgrade: it needs a new check. Yours, Laurenz Albe
On Wed, Dec 13, 2023 at 11:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > On Tue, 2023-12-12 at 19:28 +0000, PG Bug reporting form wrote: > > nik=# create index on t1 (hash_record(t1)); > > > > Such an index can easily be corrupted, e.g.: > > > > nik=# alter table t1 add column yo int default -1; > > > > Proposal: prohibit the use of whole-row expression – as it is already done > > for generated columns and produce a similar error ("cannot use whole-row > > variable in column generation expression") > > I reported that bug before: > https://www.postgresql.org/message-id/flat/e48a5d9a2d3d72985d61ee254314f5f5f5444a55.camel%40cybertec.at FTR I also posted a repro for another variation of that problem. I think it's slightly more general that, it not just whole-row expressions (eg index on <table_name>), it's row types generally: https://www.postgresql.org/message-id/CA%2BhUKGKb4SB%2BqQ-vAVomxAvJY6um%2B5URyq2D0vv10g7mbYZ1Ww%40mail.gmail.com
On Wed, 2023-12-13 at 14:38 +1300, Thomas Munro wrote: > On Wed, Dec 13, 2023 at 11:53 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2023-12-12 at 19:28 +0000, PG Bug reporting form wrote: > > > nik=# create index on t1 (hash_record(t1)); > > > > > > Such an index can easily be corrupted, e.g.: > > > > > > nik=# alter table t1 add column yo int default -1; > > > > > > Proposal: prohibit the use of whole-row expression – as it is already done > > > for generated columns and produce a similar error ("cannot use whole-row > > > variable in column generation expression") > > > > I reported that bug before: > > https://www.postgresql.org/message-id/flat/e48a5d9a2d3d72985d61ee254314f5f5f5444a55.camel%40cybertec.at > > FTR I also posted a repro for another variation of that problem. I > think it's slightly more general that, it not just whole-row > expressions (eg index on <table_name>), it's row types generally: > > https://www.postgresql.org/message-id/CA%2BhUKGKb4SB%2BqQ-vAVomxAvJY6um%2B5URyq2D0vv10g7mbYZ1Ww%40mail.gmail.com Yes, and that makes it difficult to decide on the correct path. Should we check all places where the type is used before allowing ALTER TYPE? Sounds complicated and error-prone. Should we forbid composite types in index declarations? Sounds posssible, but very restrictive. We might as well forbid composite types in table definitions. That would be a good thing in my opinion, but it would cause a compatibility headache. Perhaps the best thing would be to add a warning to chapter 8.16.4 that it is "best practice" (and conforming to the first normal form) not to use composite types in column definitions, and that altering a composite type used in a table definition can lead to consistency problems. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > Should we forbid composite types in index declarations? Sounds posssible, > but very restrictive. That would make a lot of people very sad, I fear. I think a minimum answer could be to document that you might need to REINDEX such indexes after a change in the composite type. We could perhaps try to be more proactive, say by marking such indexes invalid. I think though that that would have deadlock problems as well as race conditions. Perhaps best to leave it at "reindex when necessary", especially in view of the tiny number of reports to date. regards, tom lane
On Tue, 2023-12-12 at 23:04 -0500, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: > > Should we forbid composite types in index declarations? Sounds posssible, > > but very restrictive. > > That would make a lot of people very sad, I fear. > > I think a minimum answer could be to document that you might need to > REINDEX such indexes after a change in the composite type. We could > perhaps try to be more proactive, say by marking such indexes invalid. > I think though that that would have deadlock problems as well as race > conditions. Perhaps best to leave it at "reindex when necessary", > especially in view of the tiny number of reports to date. I agree. How about the attached documentation patch? Yours, Laurenz Albe
Attachment
On Thu, Dec 14, 2023 at 1:43 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Tue, 2023-12-12 at 23:04 -0500, Tom Lane wrote: > > Laurenz Albe <laurenz.albe@cybertec.at> writes: > > > Should we forbid composite types in index declarations? Sounds posssible, > > > but very restrictive. > > > > That would make a lot of people very sad, I fear. > > > > I think a minimum answer could be to document that you might need to > > REINDEX such indexes after a change in the composite type. We could > > perhaps try to be more proactive, say by marking such indexes invalid. > > I think though that that would have deadlock problems as well as race > > conditions. Perhaps best to leave it at "reindex when necessary", > > especially in view of the tiny number of reports to date. > > I agree. How about the attached documentation patch? > + While you can use composite types in column definitions, there is usually no + benefit in doing so. Is this true? If yes, we'd better update this to https://wiki.postgresql.org/wiki/Don%27t_Do_This. > Yours, > Laurenz Albe -- Regards Junwang Zhao
On Thu, 2023-12-14 at 16:27 +0800, Junwang Zhao wrote: > + While you can use composite types in column definitions, there is usually no > + benefit in doing so. > > Is this true? If yes, we'd better update this to > https://wiki.postgresql.org/wiki/Don%27t_Do_This. I cannot think of a use case for composite type columns that wouldn't be better served by directly declaring the columns of the composite key in the table definition. And it makes accessing the values more complicated. So yes, I'd be on board with a "don't do this" entry. Yours, Laurenz Albe