Re: pg14 psql broke \d datname.nspname.relname - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: pg14 psql broke \d datname.nspname.relname
Date
Msg-id 300422FF-60E3-4EF1-BE8B-16D302B2CE6C@enterprisedb.com
Whole thread Raw
In response to Re: pg14 psql broke \d datname.nspname.relname  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: pg14 psql broke \d datname.nspname.relname  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pg14 psql broke \d datname.nspname.relname  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
> On Oct 13, 2021, at 1:43 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> The issue of name parsing impacts pg_dump and pg_dumpall, also.  Consider what happens with:
>
> pg_dump -t production.critical.secrets > secrets.dump
> dropdb production
>
> In v13, if your default database is "testing", and database "testing" has the same schemas and tables (but not data)
asproduction, you are unhappy.  You just dumped a copy of your test data and blew away the production data. 
>
> You could end up unhappy in v14, if database "testing" has a schema named "production" and a table that matches the
pattern/^critical.secrets$/, but otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching tables were
found". Neither behavior seems correct. 

With the attached patch, this scenario results in a "cross-database references are not implemented" error.

> The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql.
Allthree callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own.
Ipropose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise
insertthe database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>",
where<database> is a properly escaped representation of the database portion.  Maybe someday we can change that to
OPERATOR(pg_catalog.~),but for now we lack the sufficient logic for handling multiple matching database names.  (The
situationis different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a
database,and we'd still be fine matching that against a pattern.) 

I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as rejecting the database name as part of
thequery complicates the calling convention for no apparent benefit.  I had been concerned about database names that
werecollation-wise equal but byte-wise unequal, but it seems we already treat those as distinct database names, so my
concernwas unnecessary.  We already use strcmp on database names from frontend clients (fe_utils/parallel_slots.c,
psql/prompt.c,pg_amcheck.c, pg_dump.c, pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend
(commands/dbcommands.c,init/postinit.c).   

I tried testing how this plays out by handing `createdb` the name é (U+00E9 "LATIN SMALL LETTER E WITH ACCUTE") and
thenagain the name é (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".)  That results in two
distinctdatabases, not an error about a duplicate database name: 

# select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é');
  oid  | datname | datdba | encoding | datcollate  |  datctype
-------+---------+--------+----------+-------------+-------------
 37852 | é       |     10 |        6 | en_US.UTF-8 | en_US.UTF-8
 37855 | é       |     10 |        6 | en_US.UTF-8 | en_US.UTF-8
(2 rows)

But that doesn't seem to prove much, as other tools in my locale don't treat those as equal either.  (Testing with
perl's"eq" operator, they compare as distinct.)  I expected to find regression tests providing better coverage for this
somewhere,but did not.  Anybody know more about it? 

> For psql and pg_dump, I'm tempted to restrict the database portion (if not quoted) to neither contain shell glob
charactersnor POSIX regex characters, and return an error code if any are found, so that the clients can raise an
appropriateerror to the user. 

With the patch, using pattern characters in an unquoted database portion results in a "database name must be literal"
error. Using them in a quoted database name is allowed, but unless you are connected to a database that literally
equalsthat name, you will get a "cross-database references are not implemented" error. 

> In psql, this proposal would result in no tables matching \d wrongdb.schema.table, which would differ from v13's
behavior. You wouldn't get an error about having specified the wrong database.  You'd just get no matching relations.
\d??db??.schema.table would complain about the db portion being a pattern.  \d "??db??".schema.table would work,
assumingyou're connected to a database literally named ??db?? 

With the patch, psql will treat \d wrongdb.schema.table as a "cross-database references are not implemented" error.

> In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than
simplytrying to exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would
do. --exclude-database=db?? would work to exclude four character database names beginning in "db". 

The patch implements this.

> In pg_dump, the -t wrongdb.schema.table would match nothing and give the familiar error "pg_dump: error: no matching
tableswere found". 

With the patch, pg_dump instead gives a "cross-database references are not implemented" error.

>   pg_dump -t too.many.dotted.names would give a different error about too many parts.

With the patch, pg_dump instead gives a "improper qualified name (too many dotted names)" error.

> pg_dump -t db??.foo.bar would give an error about the database needing to be a literal name rather than a pattern.

With the patch, pg_dump gives a "database name must be literal" error.  This is the only new error message in the
patch,which puts a burden on translators, but I didn't see any existing message that would serve.  Suggestions welcome. 

> I don't like your proposal to use a strcmp() rather than a pg_catalog.= match, because it diverges from how the rest
ofthe pattern is treated, including in how encoding settings might interact with the name, needing to be executed on
theclient side rather than in the server where the rest of the name resolution is happening.  

Recanted, as discussed above.


The patch only changes the behavior of pg_amcheck in that it now rejects patterns with too many parts.  Using database
patternswas and remains legal for this tool. 

The patch changes nothing about reindexdb.  That's a debatable design choice, but reindexdb doesn't use string_utils's
processSQLNamePattern()function as the other tools do, nor does its documentation reference psql's #APP-PSQL-PATTERNS
documentation. It's --schema option only takes literal names. 




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: cursor use vs pg_stat_statements
Next
From: Anna Akenteva
Date:
Subject: Some questions about schema privileges