Re: Database-level collation version tracking - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Database-level collation version tracking |
Date | |
Msg-id | 20220207102947.6npnjyilcy4cgkxt@jrouhaud Whole thread Raw |
In response to | Database-level collation version tracking (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses |
Re: Database-level collation version tracking
|
List | pgsql-hackers |
On Tue, Feb 01, 2022 at 04:20:14PM +0100, Peter Eisentraut wrote: > This patch adds to database objects the same version tracking that collation > objects have. This version conflicts with 87669de72c2 (Some cleanup for change of collate and ctype fields to type text), so I'm attaching a simple rebase of your patch to make the cfbot happy, no other changes. > There is a new pg_database column datcollversion that stores > the version, a new function pg_database_collation_actual_version() to get > the version from the operating system, and a new subcommand ALTER DATABASE > ... REFRESH COLLATION VERSION. > > This was not originally added together with pg_collation.collversion, since > originally version tracking was only supported for ICU, and ICU on a > database-level is not currently supported. But we now have version tracking > for glibc (since PG13), FreeBSD (since PG14), and Windows (since PG13), so > this is useful to have now. And of course ICU on database-level is being > worked on at the moment as well. > This patch is pretty much complete AFAICT. Agreed. Here's a review of the patch: - there should be a mention to the need for a catversion bump in the message comment - there is no test - it's missing some updates in create_database.sgml, and psql tab completion for CREATE DATABASE with the new collation_version defelem. - that's not really something new with this patch, but should we output the collation version info or mismatch info in \l / \dO? + if (!actual_versionstr) + ereport(ERROR, + (errmsg("database \"%s\" has no actual collation version, but a version was specified", + name)));- this means you can't connect on such a database anymore. The level is probably ok for collation version check, but for db isn't that too much? +Oid +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt) +{ + Relation rel; + Oid dboid; + HeapTuple tup; + Datum datum; + bool isnull; + char *oldversion; + char *newversion; + + rel = table_open(DatabaseRelationId, RowExclusiveLock); + dboid = get_database_oid(stmt->dbname, false); + + if (!pg_database_ownercheck(dboid, GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE, + stmt->dbname); + + tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid)); + if (!HeapTupleIsValid(tup)) + elog(ERROR, "cache lookup failed for database %u", dboid); Is that ok to not obtain a lock on the database when refreshing the collation? I guess it's not worth bothering as it can only lead to spurious messages for connection done concurrently, but there should be a comment to clarify it. Also, it means that someone can drop the database concurrently. So it's should be a "database does not exist" rather than a cache lookup failed error message. + /* + * Check collation version. See similar code in + * pg_newlocale_from_collation(). + */ + datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion, + &isnull); + if (!isnull) + { This (and pg_newlocale_from_collation()) reports a problem if a recorded collation version is found but there's no reported collation version. Shouldn't it also complain if it's the opposite? It's otherwise a backdoor to make sure there won't be any check about the version anymore, and while it can probably happen if you mess with the catalogs it still doesn't look great. + /* + * template0 shouldn't have any collation-dependent objects, so unset + * the collation version. This avoids warnings when making a new + * database from it. + */ + "UPDATE pg_database SET datcollversion = NULL WHERE datname = 'template0';\n\n", I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy in the source database (whether template or not)? # update pg_database set datcollversion = 'meh' where datname in ('postgres', 'template1'); UPDATE 2 # create database test1 template postgres; CREATE DATABASE # create database test2 template template1; CREATE DATABASE # \c test2 WARNING: database "test2" has a collation version mismatch DETAIL: The database was created using collation version meh, but the operating system provides version 2.34. HINT: Rebuild all objects affected by collation in this database and run ALTER DATABASE test2 REFRESH COLLATION VERSION,or build PostgreSQL with the right library version. You are now connected to database "test2" as user "rjuju". The rest of the patch looks good to me. There's notably pg_dump and pg_upgrade support so it indeed looks complete. > One bonus thing would be to add > a query to the ALTER DATABASE ref page similar to the one on the ALTER > COLLATION ref page that identifies the objects that are affected by outdated > collations. The database version of that might just show all > collation-using objects or something like that. Suggestions welcome. That would be nice, but that's something quite hard to do and the resulting query would be somewhat unreadable. Also, you need to look at custom data types, expression and quals at least, so I'm not even sure that you can actually do it in pure SQL without additional infrastructure. > Also, one curious behavior is that if you get to a situation where the > collation version is mismatched, every time an autovacuum worker launches > you get the collation version mismatch warning in the log. Maybe that's > actually correct, but maybe we want to dial it down a bit for > non-interactive sessions. I'm +0.5 to keep it that way. Index corruption is a real danger, so if you have enough autovacuum worker launched to have a real problem with that, you clearly should take care of the problem even faster.
Attachment
pgsql-hackers by date: