Thread: Database-level collation version tracking

Database-level collation version tracking

From

Peter Eisentraut

Date:

01 February 2022, 15:20:14

This patch adds to database objects the same version tracking that 
collation objects have.  There is a new pg_database column 
datcollversion that stores the version, a new function 
pg_database_collation_actual_version() to get the version from the 
operating system, and a new subcommand ALTER DATABASE ... REFRESH 
COLLATION VERSION.

This was not originally added together with pg_collation.collversion, 
since originally version tracking was only supported for ICU, and ICU on 
a database-level is not currently supported.  But we now have version 
tracking for glibc (since PG13), FreeBSD (since PG14), and Windows 
(since PG13), so this is useful to have now.  And of course ICU on 
database-level is being worked on at the moment as well.

This patch is pretty much complete AFAICT.  One bonus thing would be to 
add a query to the ALTER DATABASE ref page similar to the one on the 
ALTER COLLATION ref page that identifies the objects that are affected 
by outdated collations.  The database version of that might just show 
all collation-using objects or something like that.  Suggestions welcome.

Also, one curious behavior is that if you get to a situation where the 
collation version is mismatched, every time an autovacuum worker 
launches you get the collation version mismatch warning in the log. 
Maybe that's actually correct, but maybe we want to dial it down a bit 
for non-interactive sessions.

Attachment

0001-Database-level-collation-version-tracking.patch

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

07 February 2022, 10:29:47

On Tue, Feb 01, 2022 at 04:20:14PM +0100, Peter Eisentraut wrote:
> This patch adds to database objects the same version tracking that collation
> objects have.

This version conflicts with 87669de72c2 (Some cleanup for change of collate and
ctype fields to type text), so I'm attaching a simple rebase of your patch to
make the cfbot happy, no other changes.

> There is a new pg_database column datcollversion that stores
> the version, a new function pg_database_collation_actual_version() to get
> the version from the operating system, and a new subcommand ALTER DATABASE
> ... REFRESH COLLATION VERSION.
> 
> This was not originally added together with pg_collation.collversion, since
> originally version tracking was only supported for ICU, and ICU on a
> database-level is not currently supported.  But we now have version tracking
> for glibc (since PG13), FreeBSD (since PG14), and Windows (since PG13), so
> this is useful to have now.  And of course ICU on database-level is being
> worked on at the moment as well.
> This patch is pretty much complete AFAICT.

Agreed.  Here's a review of the patch:

- there should be a mention to the need for a catversion bump in the message
  comment
- there is no test
- it's missing some updates in create_database.sgml, and psql tab completion
  for CREATE DATABASE with the new collation_version defelem.

- that's not really something new with this patch, but should we output the
  collation version info or mismatch info in \l / \dO?

+       if (!actual_versionstr)
+           ereport(ERROR,
+                   (errmsg("database \"%s\" has no actual collation version, but a version was specified",
+                           name)));-

this means you can't connect on such a database anymore.  The level is probably
ok for collation version check, but for db isn't that too much?

+Oid
+AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
+{
+   Relation    rel;
+   Oid         dboid;
+   HeapTuple   tup;
+   Datum       datum;
+   bool        isnull;
+   char       *oldversion;
+   char       *newversion;
+
+   rel = table_open(DatabaseRelationId, RowExclusiveLock);
+   dboid = get_database_oid(stmt->dbname, false);
+
+   if (!pg_database_ownercheck(dboid, GetUserId()))
+       aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
+                      stmt->dbname);
+
+   tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
+   if (!HeapTupleIsValid(tup))
+       elog(ERROR, "cache lookup failed for database %u", dboid);

Is that ok to not obtain a lock on the database when refreshing the collation?
I guess it's not worth bothering as it can only lead to spurious messages for
connection done concurrently, but there should be a comment to clarify it.
Also, it means that someone can drop the database concurrently. So it's
should be a "database does not exist" rather than a cache lookup failed error
message.

+   /*
+    * Check collation version.  See similar code in
+    * pg_newlocale_from_collation().
+    */
+   datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
+                           &isnull);
+   if (!isnull)
+   {

This (and pg_newlocale_from_collation()) reports a problem if a recorded
collation version is found but there's no reported collation version.
Shouldn't it also complain if it's the opposite?  It's otherwise a backdoor to
make sure there won't be any check about the version anymore, and while it can
probably happen if you mess with the catalogs it still doesn't look great.

+       /*
+        * template0 shouldn't have any collation-dependent objects, so unset
+        * the collation version.  This avoids warnings when making a new
+        * database from it.
+        */
+       "UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n",

I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
in the source database (whether template or not)?

# update pg_database set datcollversion = 'meh' where datname in ('postgres', 'template1');
UPDATE 2

# create database test1 template postgres;
CREATE DATABASE

# create database test2 template template1;
CREATE DATABASE

# \c test2
WARNING:  database "test2" has a collation version mismatch
DETAIL:  The database was created using collation version meh, but the operating system provides version 2.34.
HINT:  Rebuild all objects affected by collation in this database and run ALTER DATABASE test2 REFRESH COLLATION
VERSION,or build PostgreSQL with the right library version.

You are now connected to database "test2" as user "rjuju".

The rest of the patch looks good to me.  There's notably pg_dump and pg_upgrade
support so it indeed looks complete.

> One bonus thing would be to add
> a query to the ALTER DATABASE ref page similar to the one on the ALTER
> COLLATION ref page that identifies the objects that are affected by outdated
> collations.  The database version of that might just show all
> collation-using objects or something like that.  Suggestions welcome.

That would be nice, but that's something quite hard to do and the resulting
query would be somewhat unreadable.
Also, you need to look at custom data types, expression and quals at least, so
I'm not even sure that you can actually do it in pure SQL without additional
infrastructure.

> Also, one curious behavior is that if you get to a situation where the
> collation version is mismatched, every time an autovacuum worker launches
> you get the collation version mismatch warning in the log. Maybe that's
> actually correct, but maybe we want to dial it down a bit for
> non-interactive sessions.

I'm +0.5 to keep it that way.  Index corruption is a real danger, so if you
have enough autovacuum worker launched to have a real problem with that, you
clearly should take care of the problem even faster.

Attachment

v2-0001-Database-level-collation-version-tracking.patch

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

07 February 2022, 15:44:24

On 07.02.22 11:29, Julien Rouhaud wrote:
> - there should be a mention to the need for a catversion bump in the message
>    comment

done

> - there is no test

Suggestions where to put it?  We don't really have tests for the 
collation-level versioning either, do we?

> - it's missing some updates in create_database.sgml, and psql tab completion
>    for CREATE DATABASE with the new collation_version defelem.

Added to create_database.sgml, but not to psql.  We don't have 
completion for the collation option either, since it's only meant to be 
used by pg_upgrade, not interactively.

> - that's not really something new with this patch, but should we output the
>    collation version info or mismatch info in \l / \dO?

It's a possibility.  Perhaps there is a question of performance if we 
show it in \l and people have tons of databases and they have to make a 
locale call for each one.  As you say, it's more an independent feature, 
but it's worth looking into.

> +       if (!actual_versionstr)
> +           ereport(ERROR,
> +                   (errmsg("database \"%s\" has no actual collation version, but a version was specified",
> +                           name)));-
> 
> this means you can't connect on such a database anymore.  The level is probably
> ok for collation version check, but for db isn't that too much?

Right, changed to warning.

> +Oid
> +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> +{
> +   Relation    rel;
> +   Oid         dboid;
> +   HeapTuple   tup;
> +   Datum       datum;
> +   bool        isnull;
> +   char       *oldversion;
> +   char       *newversion;
> +
> +   rel = table_open(DatabaseRelationId, RowExclusiveLock);
> +   dboid = get_database_oid(stmt->dbname, false);
> +
> +   if (!pg_database_ownercheck(dboid, GetUserId()))
> +       aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
> +                      stmt->dbname);
> +
> +   tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
> +   if (!HeapTupleIsValid(tup))
> +       elog(ERROR, "cache lookup failed for database %u", dboid);
> 
> Is that ok to not obtain a lock on the database when refreshing the collation?

That code takes a RowExclusiveLock on pg_database.  Did you have 
something else in mind?

> +   /*
> +    * Check collation version.  See similar code in
> +    * pg_newlocale_from_collation().
> +    */
> +   datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
> +                           &isnull);
> +   if (!isnull)
> +   {
> 
> This (and pg_newlocale_from_collation()) reports a problem if a recorded
> collation version is found but there's no reported collation version.
> Shouldn't it also complain if it's the opposite?  It's otherwise a backdoor to
> make sure there won't be any check about the version anymore, and while it can
> probably happen if you mess with the catalogs it still doesn't look great.

get_collation_actual_version() always returns either null or not null 
for a given installation.  So the situation that the stored version is 
null and the actual version is not null can only happen as part of a 
software upgrade.  In that case, all uses of collations after an upgrade 
would immediately start complaining about missing versions, which seems 
like a bad experience.  Users can explicitly opt in to version tracking 
by running REFRESH VERSION once.

> +       /*
> +        * template0 shouldn't have any collation-dependent objects, so unset
> +        * the collation version.  This avoids warnings when making a new
> +        * database from it.
> +        */
> +       "UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n",
> 
> I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
> in the source database (whether template or not)?
> 
> # update pg_database set datcollversion = 'meh' where datname in ('postgres', 'template1');
> UPDATE 2
> 
> # create database test1 template postgres;
> CREATE DATABASE
> 
> # create database test2 template template1;
> CREATE DATABASE
> 
> # \c test2
> WARNING:  database "test2" has a collation version mismatch

I don't understand what the complaint is here.  It seems to work ok?

Attachment

v3-0001-Database-level-collation-version-tracking.patch

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

07 February 2022, 16:08:14

On Mon, Feb 07, 2022 at 04:44:24PM +0100, Peter Eisentraut wrote:
> On 07.02.22 11:29, Julien Rouhaud wrote:
> 
> > - there is no test
> 
> Suggestions where to put it?  We don't really have tests for the
> collation-level versioning either, do we?

There's so limited testing in collate.* regression tests, so I thought it would
be ok to add it there.  At least some ALTER DATABASE ... REFRESH VERSION would
be good, similarly to collation-level versioning.

> > - it's missing some updates in create_database.sgml, and psql tab completion
> >    for CREATE DATABASE with the new collation_version defelem.
> 
> Added to create_database.sgml, but not to psql.  We don't have completion
> for the collation option either, since it's only meant to be used by
> pg_upgrade, not interactively.

Ok.

> 
> > - that's not really something new with this patch, but should we output the
> >    collation version info or mismatch info in \l / \dO?
> 
> It's a possibility.  Perhaps there is a question of performance if we show
> it in \l and people have tons of databases and they have to make a locale
> call for each one.  As you say, it's more an independent feature, but it's
> worth looking into.

Agreed.

> > +Oid
> > +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> > +{
> > +   Relation    rel;
> > +   Oid         dboid;
> > +   HeapTuple   tup;
> > +   Datum       datum;
> > +   bool        isnull;
> > +   char       *oldversion;
> > +   char       *newversion;
> > +
> > +   rel = table_open(DatabaseRelationId, RowExclusiveLock);
> > +   dboid = get_database_oid(stmt->dbname, false);
> > +
> > +   if (!pg_database_ownercheck(dboid, GetUserId()))
> > +       aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
> > +                      stmt->dbname);
> > +
> > +   tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
> > +   if (!HeapTupleIsValid(tup))
> > +       elog(ERROR, "cache lookup failed for database %u", dboid);
> > 
> > Is that ok to not obtain a lock on the database when refreshing the collation?
> 
> That code takes a RowExclusiveLock on pg_database.  Did you have something
> else in mind?

But that lock won't prevent a concurrent DROP DATABASE, so it's totally
expected to hit that cache lookup failed error.  There should either be a
shdepLockAndCheckObject(), or changing the error message to some errmsg("data
xxx does not exist").

> > +   /*
> > +    * Check collation version.  See similar code in
> > +    * pg_newlocale_from_collation().
> > +    */
> > +   datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
> > +                           &isnull);
> > +   if (!isnull)
> > +   {
> > 
> > This (and pg_newlocale_from_collation()) reports a problem if a recorded
> > collation version is found but there's no reported collation version.
> > Shouldn't it also complain if it's the opposite?  It's otherwise a backdoor to
> > make sure there won't be any check about the version anymore, and while it can
> > probably happen if you mess with the catalogs it still doesn't look great.
> 
> get_collation_actual_version() always returns either null or not null for a
> given installation.  So the situation that the stored version is null and
> the actual version is not null can only happen as part of a software
> upgrade.  In that case, all uses of collations after an upgrade would
> immediately start complaining about missing versions, which seems like a bad
> experience.  Users can explicitly opt in to version tracking by running
> REFRESH VERSION once.

Ah right, I do remember that point which was also discussed in the collation
version tracking.  Sorry about the noise.

> > +       /*
> > +        * template0 shouldn't have any collation-dependent objects, so unset
> > +        * the collation version.  This avoids warnings when making a new
> > +        * database from it.
> > +        */
> > +       "UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n",
> > 
> > I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
> > in the source database (whether template or not)?
> > 
> > # update pg_database set datcollversion = 'meh' where datname in ('postgres', 'template1');
> > UPDATE 2
> > 
> > # create database test1 template postgres;
> > CREATE DATABASE
> > 
> > # create database test2 template template1;
> > CREATE DATABASE
> > 
> > # \c test2
> > WARNING:  database "test2" has a collation version mismatch
> 
> I don't understand what the complaint is here.  It seems to work ok?

The comment says that you explicitly set a NULL collation version to avoid
warning when creating a database using template0 as a template, presumably
after a collation lib upgrade.

But as far as I can see the source database collation version is not checked
when creating a new database, so it seems to me that either the comment is
wrong, or we need another check for that.  The latter seems preferable to me.

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

08 February 2022, 11:14:02

On 07.02.22 17:08, Julien Rouhaud wrote:
> There's so limited testing in collate.* regression tests, so I thought it would
> be ok to add it there.  At least some ALTER DATABASE ... REFRESH VERSION would
> be good, similarly to collation-level versioning.

I don't think you can run ALTER DATABASE from the regression test 
scripts, since the database name is not fixed.  You'd have to paste the 
command together using psql tricks or something.  I guess it could be 
done, but maybe there is a better solution.  We could put it into the 
createdb test suite, or write a new TAP test suite somewhere.  There is 
no good precedent for this.

>> That code takes a RowExclusiveLock on pg_database.  Did you have something
>> else in mind?
> 
> But that lock won't prevent a concurrent DROP DATABASE, so it's totally
> expected to hit that cache lookup failed error.  There should either be a
> shdepLockAndCheckObject(), or changing the error message to some errmsg("data
> xxx does not exist").

I was not familiar with that function.  AFAICT, it is only used for 
database and role settings (AlterDatabaseSet(), AlterRoleSet()), but not 
when just updating the role or database catalog (e.g., AlterRole(), 
RenameRole(), RenameDatabase()).  So I don't think it is needed here. 
Maybe I'm missing something.

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

08 February 2022, 12:55:18

On Tue, Feb 08, 2022 at 12:14:02PM +0100, Peter Eisentraut wrote:
> On 07.02.22 17:08, Julien Rouhaud wrote:
> > There's so limited testing in collate.* regression tests, so I thought it would
> > be ok to add it there.  At least some ALTER DATABASE ... REFRESH VERSION would
> > be good, similarly to collation-level versioning.
> 
> I don't think you can run ALTER DATABASE from the regression test scripts,
> since the database name is not fixed.  You'd have to paste the command
> together using psql tricks or something.  I guess it could be done, but
> maybe there is a better solution.  We could put it into the createdb test
> suite, or write a new TAP test suite somewhere.  There is no good precedent
> for this.

I was thinking using a simple DO command, as there are already some other usage
for that for non deterministic things (like TOAST tables).  If it's too
problematic I'm fine with not having tests for that for now.

> > > That code takes a RowExclusiveLock on pg_database.  Did you have something
> > > else in mind?
> > 
> > But that lock won't prevent a concurrent DROP DATABASE, so it's totally
> > expected to hit that cache lookup failed error.  There should either be a
> > shdepLockAndCheckObject(), or changing the error message to some errmsg("data
> > xxx does not exist").
> 
> I was not familiar with that function.  AFAICT, it is only used for database
> and role settings (AlterDatabaseSet(), AlterRoleSet()), but not when just
> updating the role or database catalog (e.g., AlterRole(), RenameRole(),
> RenameDatabase()).  So I don't think it is needed here. Maybe I'm missing
> something.

I'm just saying that without such a lock you can easily trigger the "cache
lookup" error, and that's something that's supposed to happen with normal
usage I think.  So it should be a better message saying that the database has
been concurrently dropped, or actually simply does not exist like it's done in
AlterDatabaseOwner() for the same pattern:

[...]
    tuple = systable_getnext(scan);
    if (!HeapTupleIsValid(tuple))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_DATABASE),
                 errmsg("database \"%s\" does not exist", dbname)));
[...]

Apart from that I still think that we should check the collation version of the
source database when creating a new database.  It won't cost much but will give
the DBA a chance to recreate the indexes before risking invalid index usage.

Re: Database-level collation version tracking

From

Michael Banck

Date:

08 February 2022, 14:08:43

On Mon, Feb 07, 2022 at 04:44:24PM +0100, Peter Eisentraut wrote:
> On 07.02.22 11:29, Julien Rouhaud wrote:
> > - that's not really something new with this patch, but should we output the
> >    collation version info or mismatch info in \l / \dO?
> 
> It's a possibility.  Perhaps there is a question of performance if we show
> it in \l and people have tons of databases and they have to make a locale
> call for each one.  As you say, it's more an independent feature, but it's
> worth looking into.

Ok, but \l+ shows among others the database size, so I guess at that
level also showing this should be fine? (or is that already the case?)


Michael

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

09 February 2022, 11:48:35

On 08.02.22 13:55, Julien Rouhaud wrote:
> I'm just saying that without such a lock you can easily trigger the "cache
> lookup" error, and that's something that's supposed to happen with normal
> usage I think.  So it should be a better message saying that the database has
> been concurrently dropped, or actually simply does not exist like it's done in
> AlterDatabaseOwner() for the same pattern:
> 
> [...]
>     tuple = systable_getnext(scan);
>     if (!HeapTupleIsValid(tuple))
>         ereport(ERROR,
>                 (errcode(ERRCODE_UNDEFINED_DATABASE),
>                  errmsg("database \"%s\" does not exist", dbname)));
> [...]

In my code, the existence of the database is checked by

     dboid = get_database_oid(stmt->dbname, false);

This also issues an appropriate user-facing error message if the 
database does not exist.

The flow in AlterDatabaseOwner() is a bit different, it looks up the 
pg_database tuple directly from the name.  I think both are correct.  My 
code has been copied from the analogous code in AlterCollation().

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

09 February 2022, 13:31:18

On Wed, Feb 09, 2022 at 12:48:35PM +0100, Peter Eisentraut wrote:
> On 08.02.22 13:55, Julien Rouhaud wrote:
> > I'm just saying that without such a lock you can easily trigger the "cache
> > lookup" error, and that's something that's supposed to happen with normal
> > usage I think.  So it should be a better message saying that the database has
> > been concurrently dropped, or actually simply does not exist like it's done in
> > AlterDatabaseOwner() for the same pattern:
> > 
> > [...]
> >     tuple = systable_getnext(scan);
> >     if (!HeapTupleIsValid(tuple))
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_UNDEFINED_DATABASE),
> >                  errmsg("database \"%s\" does not exist", dbname)));
> > [...]
> 
> In my code, the existence of the database is checked by
> 
>     dboid = get_database_oid(stmt->dbname, false);
> 
> This also issues an appropriate user-facing error message if the database
> does not exist.

Yes but if you run a DROP DATABASE concurrently  you will either get a
"database does not exist" or "cache lookup failed" depending on whether the
DROP is processed before or after the get_database_oid().

I agree that it's not worth trying to make it concurrent-drop safe, but I also
thought that throwing plain elog(ERROR) should be avoided when reasonably
doable.  And in that situation we know it can happen in normal situation, so
having a real error message looks like a cost-free improvement.  Now if it's
better to have a cache lookup error even in that situation just for safety or
something ok, it's not like trying to refresh a db collation and having someone
else dropping it at the same time is going to be a common practice anway.

> The flow in AlterDatabaseOwner() is a bit different, it looks up the
> pg_database tuple directly from the name.  I think both are correct.  My
> code has been copied from the analogous code in AlterCollation().

I also think it would be better to have a "collation does not exist" in the
syscache failure message, but same here dropping collation is probably even
less frequent than dropping database, let alone while refreshing the collation
version.

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

09 February 2022, 16:12:41

On 08.02.22 13:55, Julien Rouhaud wrote:
> Apart from that I still think that we should check the collation version of the
> source database when creating a new database.  It won't cost much but will give
> the DBA a chance to recreate the indexes before risking invalid index usage.

A question on this:  In essence, this would be putting code into 
createdb() similar to the code in postinit.c:CheckMyDatabase().  But 
what should we make it do and say exactly?

After thinking about this for a bit, I suggest: If the actual collation 
version of the newly created database does not match the recorded 
collation version of the template database, we should error out and make 
the user fix the template database (by reindexing there etc.).

The alternative is to warn, as it does now in postinit.c.  But then the 
phrasing of the message becomes complicated: Should we make the user fix 
the new database or the template database or both?  And if they don't 
fix the template database, they will have the same problem again.  So 
making it a hard failure seems better to me.

Re: Database-level collation version tracking

From

Alvaro Herrera

Date:

09 February 2022, 16:25:08

On 2022-Feb-08, Julien Rouhaud wrote:

> On Tue, Feb 08, 2022 at 12:14:02PM +0100, Peter Eisentraut wrote:

> > I don't think you can run ALTER DATABASE from the regression test scripts,
> > since the database name is not fixed.  You'd have to paste the command
> > together using psql tricks or something.  I guess it could be done, but
> > maybe there is a better solution.  We could put it into the createdb test
> > suite, or write a new TAP test suite somewhere.  There is no good precedent
> > for this.
> 
> I was thinking using a simple DO command, as there are already some other usage
> for that for non deterministic things (like TOAST tables).  If it's too
> problematic I'm fine with not having tests for that for now.

You can do this:

select current_database() as datname \gset
alter database :"datname" owner to foo;

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

10 February 2022, 02:54:30

On Wed, Feb 09, 2022 at 05:12:41PM +0100, Peter Eisentraut wrote:
> On 08.02.22 13:55, Julien Rouhaud wrote:
> > Apart from that I still think that we should check the collation version of the
> > source database when creating a new database.  It won't cost much but will give
> > the DBA a chance to recreate the indexes before risking invalid index usage.
> 
> A question on this:  In essence, this would be putting code into createdb()
> similar to the code in postinit.c:CheckMyDatabase().  But what should we
> make it do and say exactly?
> 
> After thinking about this for a bit, I suggest: If the actual collation
> version of the newly created database does not match the recorded collation
> version of the template database, we should error out and make the user fix
> the template database (by reindexing there etc.).

I'm not sure what you really mean by "actual collation version of the newly
created database".  Is it really a check after having done all the work or just
checking a discrepancy when computing the to-be-created database version from
the source database, ie.  something like

   if (dbcollversion == NULL)
+   {
       dbcollversion = src_collversion;
+      if src_collversion != get_collation_actual_version(the source db)
+          // raise error or warning

> The alternative is to warn, as it does now in postinit.c.  But then the
> phrasing of the message becomes complicated: Should we make the user fix the
> new database or the template database or both?  And if they don't fix the
> template database, they will have the same problem again.  So making it a
> hard failure seems better to me.

Agreed, I'm in favor of a hard error.  Maybe a message like:

errmsg(cannot create database %s)
errdetail(the template database %s was created using collation version %s, but
the operating system provides version %s)

Also, that check shouldn't be done when using the COLLATION_VERSION option of
create database, since it's there for pg_upgrade usage?

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

10 February 2022, 08:57:59

New patch that fixes all reported issues, I think:

- Added test for ALTER DATABASE / REFRESH COLLATION VERSION

- Rewrote AlterDatabaseRefreshCollVersion() with better locking

- Added version checking in createdb()

Attachment

v4-0001-Database-level-collation-version-tracking.patch

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

10 February 2022, 11:08:02

On Thu, Feb 10, 2022 at 09:57:59AM +0100, Peter Eisentraut wrote:
> New patch that fixes all reported issues, I think:
> 
> - Added test for ALTER DATABASE / REFRESH COLLATION VERSION
> 
> - Rewrote AlterDatabaseRefreshCollVersion() with better locking
> 
> - Added version checking in createdb()

Thanks!  All issues are indeed fixed.  I just have a few additional comments:



> From 290ebb9ca743a2272181f435d5ea76d8a7280a0a Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter@eisentraut.org>
> Date: Thu, 10 Feb 2022 09:44:20 +0100
> Subject: [PATCH v4] Database-level collation version tracking



> +     * collation version was specified explicitly as an statement option; that

typo, should be "as a statement"

> +        actual_versionstr = get_collation_actual_version(COLLPROVIDER_LIBC, dbcollate);
> +        if (!actual_versionstr)
> +            ereport(ERROR,
> +                    (errmsg("template database \"%s\" has a collation version, but no actual collation version could
bedetermined",
 
> +                            dbtemplate)));
> +
> +        if (strcmp(actual_versionstr, src_collversion) != 0)
> +            ereport(ERROR,
> +                    (errmsg("template database \"%s\" has a collation version mismatch",
> +                            dbtemplate),
> +                     errdetail("The template database was created using collation version %s, "
> +                               "but the operating system provides version %s.",
> +                               src_collversion, actual_versionstr),
> +                     errhint("Rebuild all objects affected by collation in the template database and run "
> +                             "ALTER DATABASE %s REFRESH COLLATION VERSION, "
> +                             "or build PostgreSQL with the right library version.",
> +                             quote_identifier(dbtemplate))));

After a second read I think the messages are slightly ambiguous.  What do you
think about specifying the problematic collation name and provider?

For now we only support libc default collation so users will probably have to
reindex almost everything on that database (not sure if the versioning is more
fine grained on Windows), but we should probably still specify "affected by
libc collation" in the errhint so they have a chance to avoid unnecessary
reindex.

And this will hopefully become more important to have those information, when
ICU default collations will land :)

> +/*
> + * ALTER DATABASE name REFRESH COLLATION VERSION
> + */
> +ObjectAddress
> +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)

I'm wondering why you changed this function to return an ObjectAddress rather
than an Oid?  There's no event trigger support for ALTER DATABASE, and the rest
of similar utility commands also returns Oid.

Other than that it all looks good to me!

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

11 February 2022, 11:07:02

On 10.02.22 12:08, Julien Rouhaud wrote:
>> +                     errhint("Rebuild all objects affected by collation in the template database and run "
>> +                             "ALTER DATABASE %s REFRESH COLLATION VERSION, "
>> +                             "or build PostgreSQL with the right library version.",
>> +                             quote_identifier(dbtemplate))));
> 
> After a second read I think the messages are slightly ambiguous.  What do you
> think about specifying the problematic collation name and provider?
> 
> For now we only support libc default collation so users will probably have to
> reindex almost everything on that database (not sure if the versioning is more
> fine grained on Windows), but we should probably still specify "affected by
> libc collation" in the errhint so they have a chance to avoid unnecessary
> reindex.

I think accurate would be something like "objects using the default 
collation", since objects using a specific collation are not meant, even 
if they use the same provider.

>> +/*
>> + * ALTER DATABASE name REFRESH COLLATION VERSION
>> + */
>> +ObjectAddress
>> +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> 
> I'm wondering why you changed this function to return an ObjectAddress rather
> than an Oid?  There's no event trigger support for ALTER DATABASE, and the rest
> of similar utility commands also returns Oid.

Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which 
return ObjectAddress.

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

11 February 2022, 12:51:10

On Fri, Feb 11, 2022 at 12:07:02PM +0100, Peter Eisentraut wrote:
> On 10.02.22 12:08, Julien Rouhaud wrote:
> > > +                     errhint("Rebuild all objects affected by collation in the template database and run "
> > > +                             "ALTER DATABASE %s REFRESH COLLATION VERSION, "
> > > +                             "or build PostgreSQL with the right library version.",
> > > +                             quote_identifier(dbtemplate))));
> > 
> > After a second read I think the messages are slightly ambiguous.  What do you
> > think about specifying the problematic collation name and provider?
> > 
> > For now we only support libc default collation so users will probably have to
> > reindex almost everything on that database (not sure if the versioning is more
> > fine grained on Windows), but we should probably still specify "affected by
> > libc collation" in the errhint so they have a chance to avoid unnecessary
> > reindex.
> 
> I think accurate would be something like "objects using the default
> collation", since objects using a specific collation are not meant, even if
> they use the same provider.

Technically is the objects explicitly use the same collation as the default
collation they should be impacted the same way, but agreed.

> > > +/*
> > > + * ALTER DATABASE name REFRESH COLLATION VERSION
> > > + */
> > > +ObjectAddress
> > > +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> > 
> > I'm wondering why you changed this function to return an ObjectAddress rather
> > than an Oid?  There's no event trigger support for ALTER DATABASE, and the rest
> > of similar utility commands also returns Oid.
> 
> Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which
> return ObjectAddress.

Apparently I managed to only check AlterDatabase and AlterDatabaseSet, which
both return an Oid.  Maybe we could also update those two to also return an
ObjectAddress, for consistency?

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

14 February 2022, 08:55:19

On 11.02.22 13:51, Julien Rouhaud wrote:
>>> I'm wondering why you changed this function to return an ObjectAddress rather
>>> than an Oid?  There's no event trigger support for ALTER DATABASE, and the rest
>>> of similar utility commands also returns Oid.
>>
>> Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which
>> return ObjectAddress.
> 
> Apparently I managed to only check AlterDatabase and AlterDatabaseSet, which
> both return an Oid.  Maybe we could also update those two to also return an
> ObjectAddress, for consistency?

I have committed this patch.

I didn't address the above issue.  I looked at it a bit, but I also 
found other (non-database) object types that had a mix of different 
return types.  It's not clear to me what this is all supposed to mean. 
If no one is checking the return, they should really all be turned into 
void, IMO.  Maybe this should be a separate discussion.

Re: Database-level collation version tracking

From

Julien Rouhaud

Date:

14 February 2022, 09:14:35

Hi,

On Mon, Feb 14, 2022 at 09:55:19AM +0100, Peter Eisentraut wrote:
> I have committed this patch.

Great!  Do you plan to send a rebased version of the ICU default collation
soon or should I start looking at the current v4?

> I didn't address the above issue.  I looked at it a bit, but I also found
> other (non-database) object types that had a mix of different return types.
> It's not clear to me what this is all supposed to mean. If no one is
> checking the return, they should really all be turned into void, IMO.  Maybe
> this should be a separate discussion.

Agreed.

Re: Database-level collation version tracking

From

Peter Eisentraut

Date:

14 February 2022, 15:50:03

On 14.02.22 10:14, Julien Rouhaud wrote:
> Do you plan to send a rebased version of the ICU default collation
> soon or should I start looking at the current v4?

I will send an updated patch in the next few days.

Re: Database-level collation version tracking

From

Alvaro Herrera

Date:

31 March 2022, 09:14:29

On 2022-Feb-14, Peter Eisentraut wrote:

> On 11.02.22 13:51, Julien Rouhaud wrote:
> > > > I'm wondering why you changed this function to return an ObjectAddress rather
> > > > than an Oid?  There's no event trigger support for ALTER DATABASE, and the rest
> > > > of similar utility commands also returns Oid.
> > > 
> > > Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which
> > > return ObjectAddress.
> > 
> > Apparently I managed to only check AlterDatabase and AlterDatabaseSet, which
> > both return an Oid.  Maybe we could also update those two to also return an
> > ObjectAddress, for consistency?

> I didn't address the above issue.  I looked at it a bit, but I also found
> other (non-database) object types that had a mix of different return types.
> It's not clear to me what this is all supposed to mean. If no one is
> checking the return, they should really all be turned into void, IMO.  Maybe
> this should be a separate discussion.

IIRC we changed the return types of all DDL back when we were doing the
event triggers work (first to OIDs and then to ObjectAddress), but we
didn't realize at the time that shared objects such as databases etc
were not going to be supported by event triggers.  So those particular
changes were for naught, but we never reverted them.

Maybe it's OK to have all the functions supporting databases and
tablespaces return void.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"