Re: pg14 psql broke \d datname.nspname.relname - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: pg14 psql broke \d datname.nspname.relname |
Date | |
Msg-id | 300422FF-60E3-4EF1-BE8B-16D302B2CE6C@enterprisedb.com Whole thread Raw |
In response to | Re: pg14 psql broke \d datname.nspname.relname (Mark Dilger <mark.dilger@enterprisedb.com>) |
Responses |
Re: pg14 psql broke \d datname.nspname.relname
Re: pg14 psql broke \d datname.nspname.relname |
List | pgsql-hackers |
> On Oct 13, 2021, at 1:43 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider what happens with: > > pg_dump -t production.critical.secrets > secrets.dump > dropdb production > > In v13, if your default database is "testing", and database "testing" has the same schemas and tables (but not data) asproduction, you are unhappy. You just dumped a copy of your test data and blew away the production data. > > You could end up unhappy in v14, if database "testing" has a schema named "production" and a table that matches the pattern/^critical.secrets$/, but otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching tables were found". Neither behavior seems correct. With the attached patch, this scenario results in a "cross-database references are not implemented" error. > The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql. Allthree callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own. Ipropose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise insertthe database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>", where<database> is a properly escaped representation of the database portion. Maybe someday we can change that to OPERATOR(pg_catalog.~),but for now we lack the sufficient logic for handling multiple matching database names. (The situationis different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a database,and we'd still be fine matching that against a pattern.) I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as rejecting the database name as part of thequery complicates the calling convention for no apparent benefit. I had been concerned about database names that werecollation-wise equal but byte-wise unequal, but it seems we already treat those as distinct database names, so my concernwas unnecessary. We already use strcmp on database names from frontend clients (fe_utils/parallel_slots.c, psql/prompt.c,pg_amcheck.c, pg_dump.c, pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend (commands/dbcommands.c,init/postinit.c). I tried testing how this plays out by handing `createdb` the name é (U+00E9 "LATIN SMALL LETTER E WITH ACCUTE") and thenagain the name é (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".) That results in two distinctdatabases, not an error about a duplicate database name: # select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é'); oid | datname | datdba | encoding | datcollate | datctype -------+---------+--------+----------+-------------+------------- 37852 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 37855 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 (2 rows) But that doesn't seem to prove much, as other tools in my locale don't treat those as equal either. (Testing with perl's"eq" operator, they compare as distinct.) I expected to find regression tests providing better coverage for this somewhere,but did not. Anybody know more about it? > For psql and pg_dump, I'm tempted to restrict the database portion (if not quoted) to neither contain shell glob charactersnor POSIX regex characters, and return an error code if any are found, so that the clients can raise an appropriateerror to the user. With the patch, using pattern characters in an unquoted database portion results in a "database name must be literal" error. Using them in a quoted database name is allowed, but unless you are connected to a database that literally equalsthat name, you will get a "cross-database references are not implemented" error. > In psql, this proposal would result in no tables matching \d wrongdb.schema.table, which would differ from v13's behavior. You wouldn't get an error about having specified the wrong database. You'd just get no matching relations. \d??db??.schema.table would complain about the db portion being a pattern. \d "??db??".schema.table would work, assumingyou're connected to a database literally named ??db?? With the patch, psql will treat \d wrongdb.schema.table as a "cross-database references are not implemented" error. > In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than simplytrying to exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would do. --exclude-database=db?? would work to exclude four character database names beginning in "db". The patch implements this. > In pg_dump, the -t wrongdb.schema.table would match nothing and give the familiar error "pg_dump: error: no matching tableswere found". With the patch, pg_dump instead gives a "cross-database references are not implemented" error. > pg_dump -t too.many.dotted.names would give a different error about too many parts. With the patch, pg_dump instead gives a "improper qualified name (too many dotted names)" error. > pg_dump -t db??.foo.bar would give an error about the database needing to be a literal name rather than a pattern. With the patch, pg_dump gives a "database name must be literal" error. This is the only new error message in the patch,which puts a burden on translators, but I didn't see any existing message that would serve. Suggestions welcome. > I don't like your proposal to use a strcmp() rather than a pg_catalog.= match, because it diverges from how the rest ofthe pattern is treated, including in how encoding settings might interact with the name, needing to be executed on theclient side rather than in the server where the rest of the name resolution is happening. Recanted, as discussed above. The patch only changes the behavior of pg_amcheck in that it now rejects patterns with too many parts. Using database patternswas and remains legal for this tool. The patch changes nothing about reindexdb. That's a debatable design choice, but reindexdb doesn't use string_utils's processSQLNamePattern()function as the other tools do, nor does its documentation reference psql's #APP-PSQL-PATTERNS documentation. It's --schema option only takes literal names. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: