Thread: pg14 psql broke \d datname.nspname.relname
This commit broke psql \d datname.nspname.relname commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868 Author: Robert Haas <rhaas@postgresql.org> Date: Wed Feb 3 13:19:41 2021 -0500 Factor pattern-construction logic out of processSQLNamePattern. ... patternToSQLRegex is a little more general than what is required by processSQLNamePattern. That function is only interested in patterns that can have up to 2 parts, a schema and a relation; but patternToSQLRegex can limit the maximum number of parts to between 1 and 3, so that patterns can look like either "database.schema.relation", "schema.relation", or "relation" depending on how it's invoked and what the user specifies. processSQLNamePattern only passes two buffers, so works exactly the same as before, always interpreting the pattern as either a "schema.relation" pattern or a "relation" pattern. But, future callers can use this function in other ways. |$ LD_LIBRARY_PATH=tmp_install/usr/local/pgsql/lib/ src/bin/psql/psql -h /tmp regression |psql (15devel) |Type "help" for help. |regression=# \d regresion.public.bit_defaults |Did not find any relation named "regresion.public.bit_defaults". |regression=# \d public.bit_defaults | Table "public.bit_defaults" |... This worked before v14 (even though the commit message says otherwise). |$ /usr/lib/postgresql/13/bin/psql -h /tmp regression |psql (13.2 (Debian 13.2-1.pgdg100+1), server 15devel) |... |regression=# \d regresion.public.bit_defaults | Table "public.bit_defaults" |... -- Justin
> On Oct 11, 2021, at 2:24 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > This commit broke psql \d datname.nspname.relname > > commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868 > Author: Robert Haas <rhaas@postgresql.org> > Date: Wed Feb 3 13:19:41 2021 -0500 > > Factor pattern-construction logic out of processSQLNamePattern. > ... > patternToSQLRegex is a little more general than what is required > by processSQLNamePattern. That function is only interested in > patterns that can have up to 2 parts, a schema and a relation; > but patternToSQLRegex can limit the maximum number of parts to > between 1 and 3, so that patterns can look like either > "database.schema.relation", "schema.relation", or "relation" > depending on how it's invoked and what the user specifies. > > processSQLNamePattern only passes two buffers, so works exactly > the same as before, always interpreting the pattern as either > a "schema.relation" pattern or a "relation" pattern. But, > future callers can use this function in other ways. > > |$ LD_LIBRARY_PATH=tmp_install/usr/local/pgsql/lib/ src/bin/psql/psql -h /tmp regression > |psql (15devel) > |Type "help" for help. > |regression=# \d regresion.public.bit_defaults > |Did not find any relation named "regresion.public.bit_defaults". > |regression=# \d public.bit_defaults > | Table "public.bit_defaults" > |... > > This worked before v14 (even though the commit message says otherwise). > > |$ /usr/lib/postgresql/13/bin/psql -h /tmp regression > |psql (13.2 (Debian 13.2-1.pgdg100+1), server 15devel) > |... > |regression=# \d regresion.public.bit_defaults > | Table "public.bit_defaults" > |... I can only assume that you are intentionally misspelling "regression" as "regresion" (with only one "s") as part of the test. I have not checked if that worked before v14, but if it ignored the misspelled database name before v14, and it rejectsit now, I'm not sure that counts as a bug. Am I misunderstanding your bug report? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes: > I can only assume that you are intentionally misspelling "regression" as "regresion" (with only one "s") as part of thetest. I have not checked if that worked before v14, but if it ignored the misspelled database name before v14, and itrejects it now, I'm not sure that counts as a bug. Doesn't work with the correct DB name, either: regression=# \d public.bit_defaults Table "public.bit_defaults" Column | Type | Collation | Nullable | Default --------+----------------+-----------+----------+--------------------- b1 | bit(4) | | | '1001'::"bit" b2 | bit(4) | | | '0101'::"bit" b3 | bit varying(5) | | | '1001'::bit varying b4 | bit varying(5) | | | '0101'::"bit" regression=# \d regression.public.bit_defaults Did not find any relation named "regression.public.bit_defaults". regards, tom lane
> On Oct 11, 2021, at 3:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Doesn't work with the correct DB name, either: > > regression=# \d public.bit_defaults > Table "public.bit_defaults" > Column | Type | Collation | Nullable | Default > --------+----------------+-----------+----------+--------------------- > b1 | bit(4) | | | '1001'::"bit" > b2 | bit(4) | | | '0101'::"bit" > b3 | bit varying(5) | | | '1001'::bit varying > b4 | bit varying(5) | | | '0101'::"bit" > > regression=# \d regression.public.bit_defaults > Did not find any relation named "regression.public.bit_defaults". REL_13_STABLE appears to accept any amount of nonsense you like: foo=# \d nonesuch.foo.a.b.c.d.bar.baz Table "bar.baz" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- i | integer | | | Is this something we're intentionally supporting? There is no regression test covering this, else we'd have seen breakagein the build-farm. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 11, 2021 at 02:47:59PM -0700, Mark Dilger wrote: > > |$ LD_LIBRARY_PATH=tmp_install/usr/local/pgsql/lib/ src/bin/psql/psql -h /tmp regression > > |psql (15devel) > > |Type "help" for help. > > |regression=# \d regresion.public.bit_defaults > > |Did not find any relation named "regresion.public.bit_defaults". > > |regression=# \d public.bit_defaults > > | Table "public.bit_defaults" > > |... > > > > This worked before v14 (even though the commit message says otherwise). > > > > |$ /usr/lib/postgresql/13/bin/psql -h /tmp regression > > |psql (13.2 (Debian 13.2-1.pgdg100+1), server 15devel) > > |... > > |regression=# \d regresion.public.bit_defaults > > | Table "public.bit_defaults" > > |... > > I can only assume that you are intentionally misspelling "regression" as "regresion" (with only one "s") as part of thetest. I have not checked if that worked before v14, but if it ignored the misspelled database name before v14, and itrejects it now, I'm not sure that counts as a bug. > > Am I misunderstanding your bug report? It's not intentional but certainly confusing to put a typo there. Sorry for that (and good eyes, BTW). In v15/master: regression=# \d regression.public.bit_defaults Did not find any relation named "regression.public.bit_defaults". After reverting that commit and recompiling psql: regression=# \d regression.public.bit_defaults Table "public.bit_defaults" ... In v13 psql: regression=# \d regression.public.bit_defaults Table "public.bit_defaults" ... It looks like before v13 any "datname" prefix was ignored. But now it fails to show the table because it does: WHERE c.relname OPERATOR(pg_catalog.~) '^(public.bit_defaults)$' COLLATE pg_catalog.default AND n.nspname OPERATOR(pg_catalog.~) '^(regression)$' COLLATE pg_catalog.default -- Justin
> On Oct 11, 2021, at 3:26 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > It looks like before v13 any "datname" prefix was ignored. The evidence so far suggests that something is broken in v14, but it is less clear to me what the appropriate behavior is. The v14 psql is rejecting even a correctly named database.schema.table, but v13 psql accepted lots.of.nonsense.schema.table,and neither of those seems at first glance to be correct. But perhaps there are good reasonsfor ignoring the nonsense prefixes? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes: >> On Oct 11, 2021, at 3:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Doesn't work with the correct DB name, either: >> regression=# \d regression.public.bit_defaults >> Did not find any relation named "regression.public.bit_defaults". > REL_13_STABLE appears to accept any amount of nonsense you like: Yeah, I'm pretty sure that the old rule was to just ignore whatever appeared in the database-name position. While we could tighten that up to insist that it match the current DB's name, I'm not sure that I see the point. There's no near-term prospect of doing anything useful with some other DB's name there, so being more restrictive seems like it'll probably break peoples' scripts to little purpose. regards, tom lane
> On Oct 11, 2021, at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> REL_13_STABLE appears to accept any amount of nonsense you like: > > Yeah, I'm pretty sure that the old rule was to just ignore whatever > appeared in the database-name position. While we could tighten that > up to insist that it match the current DB's name, I'm not sure that > I see the point. There's no near-term prospect of doing anything > useful with some other DB's name there, so being more restrictive > seems like it'll probably break peoples' scripts to little purpose. You appear correct about the old behavior. It's unclear how intentional it was. There was a schema buffer and a name buffer,and while parsing the name, if a dot was encountered, the contents just parsed were copied into the schema buffer. If multiple dots were encountered, that had the consequence of blowing away the earlier ones. But since we allow tables and schemas with dotted names in them, I'm uncertain what \d foo.bar.baz is really asking. Thatcould be "foo.bar"."baz", or "foo"."bar"."baz", or "foo"."bar.baz", or even "public"."foo.bar.baz". The old behaviorseems a bit dangerous. There may be tables with all those names, and the user may not have meant the one that wegave them. The v14 code is no better. It just assumes that is "foo"."bar.baz". So (with debugging statements included): foo=# create table "foo.bar.baz" (i integer); CREATE TABLE foo=# \d public.foo.bar.baz Converting "public.foo.bar.baz" GOT "^(public)$" . "^(foo.bar.baz)$" Table "public.foo.bar.baz" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- i | integer | | | I expect I'll have to submit a patch restoring the old behavior, but I wonder if that's the best direction to go. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 11 Oct 2021 at 19:35, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
But since we allow tables and schemas with dotted names in them, I'm uncertain what \d foo.bar.baz is really asking.
FWIW, it’s absolutely clear to me that "." is a special character which has to be quoted in order to be in an identifier. In other words, a.b.c is three identifiers separated by two period punctuation marks; what exactly those periods mean is another question. If somebody uses periods in their names, they have to quote those names just as if they used capital letters etc.
But that's just my impression. I comment at all because I remember looking at something to do with the grammar (I think I wanted to implement ALTER … RENAME TO newschema.newname) and noticed that a database name could be given in the syntax.
Mark Dilger <mark.dilger@enterprisedb.com> writes: > But since we allow tables and schemas with dotted names in them, I'm uncertain what \d foo.bar.baz is really asking. That could be "foo.bar"."baz", or "foo"."bar"."baz", or "foo"."bar.baz", or even "public"."foo.bar.baz". The old behaviorseems a bit dangerous. There may be tables with all those names, and the user may not have meant the one that wegave them. You are attacking a straw man here. To use a period in an identifier, you have to double-quote it; that's the same in SQL or \d. regression=# create table "foo.bar" (f1 int); CREATE TABLE regression=# \d foo.bar Did not find any relation named "foo.bar". regression=# \d "foo.bar" Table "public.foo.bar" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- f1 | integer | | | According to a quick test, you did not manage to break that in v14. > I expect I'll have to submit a patch restoring the old behavior, but I wonder if that's the best direction to go. I do not understand why you're even questioning that. The old behavior had stood for a decade or two without complaints. regards, tom lane
> On Oct 11, 2021, at 4:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > You are attacking a straw man here. To use a period in an identifier, > you have to double-quote it; that's the same in SQL or \d. That's a strange argument. If somebody gives an invalid identifier, we shouldn't assume they know the proper use of quotations. Somebody asking for a.b.c.d.e is clearly in the dark about something. Maybe it's the need to quote the "a.b"part separately from the "c.d.e" part, or maybe it's something else. There are lots of reasonable guesses about whatthey meant, and for backward compatibility reasons we define using the suffix d.e and ignoring the prefix a.b.c as thecorrect answer. That's a pretty arbitrary thing to do, but it has the advantage of being backwards compatible. >> I expect I'll have to submit a patch restoring the old behavior, but I wonder if that's the best direction to go. > > I do not understand why you're even questioning that. The old > behavior had stood for a decade or two without complaints. I find the backward compatibility argument appealing, but since we have clients that understand the full database.schema.relationformat without ignoring the database portion, our client behavior is getting inconsistent. I'd liketo leave the door open for someday supporting server.database.schema.relation format, too. I was just wondering whenit might be time to stop being lenient in psql and instead reject malformed identifiers. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 11, 2021 at 7:09 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > I was just wondering when it might be time to stop being lenient in psql and instead reject malformed identifiers. I suppose that I probably wouldn't have chosen this behavior in a green field situation. But Hyrum's law tells us that there are bound to be some number of users relying on it. I don't think that it's worth inconveniencing those people without getting a clear benefit in return. Being lenient here just doesn't have much downside in practice, as evidenced by the total lack of complaints about that lenience. -- Peter Geoghegan
On Mon, Oct 11, 2021 at 10:33 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, Oct 11, 2021 at 7:09 PM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: > > I was just wondering when it might be time to stop being lenient in psql and instead reject malformed identifiers. > > I suppose that I probably wouldn't have chosen this behavior in a > green field situation. But Hyrum's law tells us that there are bound > to be some number of users relying on it. I don't think that it's > worth inconveniencing those people without getting a clear benefit in > return. > > Being lenient here just doesn't have much downside in practice, as > evidenced by the total lack of complaints about that lenience. I find it kind of surprising to find everyone agreeing with this argument. I mean, PostgreSQL users are often quick to criticize MySQL for accepting 0000-00-00 as a date, because it isn't, and you shouldn't accept garbage and do stuff with it as if it were valid data. But by the same argument, accepting a database name that we know is not correct as a request to show data in the current database seems wrong to me. I completely agree that somebody might be relying on the fact that \d thisdb.someschema.sometable does something sensible when logged into thisdb, but surely no user is relying on \d jgldslghksdghjsgkhsdgjhskg.someschema.sometable is going to just ignore the leading gibberish. Nor do I understand why we'd want to ignore the leading gibberish. Saying, as Tom did, that nobody has complained about that behavior is just another way of saying that nobody tested it. Surely if someone had, it wouldn't be like this. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Oct 11, 2021 at 10:33 PM Peter Geoghegan <pg@bowt.ie> wrote: >> Being lenient here just doesn't have much downside in practice, as >> evidenced by the total lack of complaints about that lenience. > I find it kind of surprising to find everyone agreeing with this > argument. If the behavior v14 had implemented were "throw an error if the first word doesn't match the current database name", perhaps nobody would have questioned it. But that's not what we have. It's fairly clear that neither you nor Mark thought very much about this case, let alone tested it. Given that, I am not very pleased that you are retroactively trying to justify breaking it by claiming that it was already broken. It's been that way since 7.3 implemented schemas, more or less, and nobody's complained about it. Therefore I see little argument for changing that behavior. Changing it in an already-released branch is especially suspect. regards, tom lane
> On Oct 12, 2021, at 7:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > If the behavior v14 had implemented were "throw an error if the > first word doesn't match the current database name", perhaps nobody > would have questioned it. But that's not what we have. It's fairly > clear that neither you nor Mark thought very much about this case, > let alone tested it. Given that, I am not very pleased that you > are retroactively trying to justify breaking it by claiming that > it was already broken. It's been that way since 7.3 implemented > schemas, more or less, and nobody's complained about it. Therefore > I see little argument for changing that behavior. Changing it in > an already-released branch is especially suspect. I completely agree that we need to fix this. My question was only whether "fix" means to make it accept database.schema.tableor whether it means to accept any.prefix.at.all.schema.table. It sounds like more people like the latter,so I'll go with that unless this debate rages on and a different conclusion is reached. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 12, 2021 at 10:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If the behavior v14 had implemented were "throw an error if the > first word doesn't match the current database name", perhaps nobody > would have questioned it. But that's not what we have. It's fairly > clear that neither you nor Mark thought very much about this case, > let alone tested it. Given that, I am not very pleased that you > are retroactively trying to justify breaking it by claiming that > it was already broken. It's been that way since 7.3 implemented > schemas, more or less, and nobody's complained about it. Therefore > I see little argument for changing that behavior. Changing it in > an already-released branch is especially suspect. Oh, give me a break. The previous behavior obviously hasn't been tested either, and is broken on its face. If someone *had* complained about it, I imagine you would have promptly fixed it and likely back-patched the fix, probably in under 24 hours from the time of the report. I find it difficult to take seriously the contention that anyone is expecting \d dlsgjdsghj.sdhg.l.dsg.jkhsdg.foo.bar to work like \d foo.bar, or that they would even prefer that behavior over an error message. You're carefully avoiding addressing that question in favor of having a discussion of backward compatibility, but a better term for what we're talking about here would be bug-compatibility. -- Robert Haas EDB: http://www.enterprisedb.com
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Oct 12, 2021 at 10:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > If the behavior v14 had implemented were "throw an error if the > > first word doesn't match the current database name", perhaps nobody > > would have questioned it. But that's not what we have. It's fairly > > clear that neither you nor Mark thought very much about this case, > > let alone tested it. Given that, I am not very pleased that you > > are retroactively trying to justify breaking it by claiming that > > it was already broken. It's been that way since 7.3 implemented > > schemas, more or less, and nobody's complained about it. Therefore > > I see little argument for changing that behavior. Changing it in > > an already-released branch is especially suspect. > > Oh, give me a break. The previous behavior obviously hasn't been > tested either, and is broken on its face. If someone *had* complained > about it, I imagine you would have promptly fixed it and likely > back-patched the fix, probably in under 24 hours from the time of the > report. I find it difficult to take seriously the contention that > anyone is expecting \d dlsgjdsghj.sdhg.l.dsg.jkhsdg.foo.bar to work > like \d foo.bar, or that they would even prefer that behavior over an > error message. You're carefully avoiding addressing that question in > favor of having a discussion of backward compatibility, but a better > term for what we're talking about here would be bug-compatibility. I tend to agree with Robert on this particular case. Accepting random nonsense there isn't a feature or something which really needs to be preserved. For my 2c, I would hope that one day we will be able to accept other database names there and if that happens, what then? We'd "break" these cases anyway. Better to be clear that such nonsense isn't intended to be accepted and clean that up. I do think it'd be good to accept the current database name there as that's reasonable. Thanks, Stephen
Attachment
On Tue, Oct 12, 2021 at 7:41 AM Robert Haas <robertmhaas@gmail.com> wrote: > Oh, give me a break. The previous behavior obviously hasn't been > tested either, and is broken on its face. If someone *had* complained > about it, I imagine you would have promptly fixed it and likely > back-patched the fix, probably in under 24 hours from the time of the > report. You're asking us to imagine a counterfactual. But this counterfactual bug report would have to describe a real practical problem. The details would matter. It's reasonable to suppose that we haven't seen such a bug report for a reason. I can't speak for Tom. My position on this is that it's better to leave it alone at this time, given the history, and the lack of complaints from users. > I find it difficult to take seriously the contention that > anyone is expecting \d dlsgjdsghj.sdhg.l.dsg.jkhsdg.foo.bar to work > like \d foo.bar, or that they would even prefer that behavior over an > error message. You're carefully avoiding addressing that question in > favor of having a discussion of backward compatibility, but a better > term for what we're talking about here would be bug-compatibility. Let's assume that it is bug compatibility. Is that intrinsically a bad thing? -- Peter Geoghegan
On 10/12/21 5:19 PM, Stephen Frost wrote: > Greetings, > > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Tue, Oct 12, 2021 at 10:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> If the behavior v14 had implemented were "throw an error if the >>> first word doesn't match the current database name", perhaps nobody >>> would have questioned it. But that's not what we have. It's fairly >>> clear that neither you nor Mark thought very much about this case, >>> let alone tested it. Given that, I am not very pleased that you >>> are retroactively trying to justify breaking it by claiming that >>> it was already broken. It's been that way since 7.3 implemented >>> schemas, more or less, and nobody's complained about it. Therefore >>> I see little argument for changing that behavior. Changing it in >>> an already-released branch is especially suspect. >> >> Oh, give me a break. The previous behavior obviously hasn't been >> tested either, and is broken on its face. If someone *had* complained >> about it, I imagine you would have promptly fixed it and likely >> back-patched the fix, probably in under 24 hours from the time of the >> report. I find it difficult to take seriously the contention that >> anyone is expecting \d dlsgjdsghj.sdhg.l.dsg.jkhsdg.foo.bar to work >> like \d foo.bar, or that they would even prefer that behavior over an >> error message. You're carefully avoiding addressing that question in >> favor of having a discussion of backward compatibility, but a better >> term for what we're talking about here would be bug-compatibility. > > I tend to agree with Robert on this particular case. Accepting random > nonsense there isn't a feature or something which really needs to be > preserved. For my 2c, I would hope that one day we will be able to > accept other database names there and if that happens, what then? We'd > "break" these cases anyway. Better to be clear that such nonsense isn't > intended to be accepted and clean that up. I do think it'd be good to > accept the current database name there as that's reasonable. I am going to throw my hat in with Robert and Stephen, too. At least for 15 if we don't want to change this behavior in back branches. -- Vik Fearing
I understand Tom's position to be that the behavior should be changed back, since it was 1) unintentional; and 2) breaks legitimate use (when the datname matches current_database). I think there's an easy answer here that would satisfy everyone; two patches: 0001 to fix the unintentional behavior change; 0002 to reject garbage input: anything with more than 3 dot-separated components, or with 3 components where the first doesn't match current_database. 0001 would be backpatched to v14. If it turns out there's no consensus on 0002, or if it were really hard for some reason, or (more likely) nobody went to the bother to implement it this year, then that's okay. I would prefer if it errored if the datname didn't match the current database. After all, it would've helped me to avoid making a confusing problem report. -- Justin
On Tue, Oct 12, 2021 at 12:44 PM Peter Geoghegan <pg@bowt.ie> wrote: > You're asking us to imagine a counterfactual. But this counterfactual > bug report would have to describe a real practical problem. Yes. And I think this one should be held to the same standard: \d mydb.myschema.mytable not working is potentially a real, practical problem. \d sdlgkjdss.dsgkjsk.sdgskldjgds.myschema.mytable not working isn't. > Let's assume that it is bug compatibility. Is that intrinsically a bad thing? Well my view is that having the same bugs is better than having different ones, but fixing the bugs is superior to either. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Oct 12, 2021 at 12:57 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I think there's an easy answer here that would satisfy everyone; two patches: > 0001 to fix the unintentional behavior change; > 0002 to reject garbage input: anything with more than 3 dot-separated > components, or with 3 components where the first doesn't match > current_database. > > 0001 would be backpatched to v14. > > If it turns out there's no consensus on 0002, or if it were really hard for > some reason, or (more likely) nobody went to the bother to implement it this > year, then that's okay. This might work, but I fear that 0001 would end up being substantially more complicated than a combined patch that solves both problems together. -- Robert Haas EDB: http://www.enterprisedb.com
> On Oct 12, 2021, at 10:03 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Oct 12, 2021 at 12:57 PM Justin Pryzby <pryzby@telsasoft.com> wrote: >> I think there's an easy answer here that would satisfy everyone; two patches: >> 0001 to fix the unintentional behavior change; >> 0002 to reject garbage input: anything with more than 3 dot-separated >> components, or with 3 components where the first doesn't match >> current_database. >> >> 0001 would be backpatched to v14. >> >> If it turns out there's no consensus on 0002, or if it were really hard for >> some reason, or (more likely) nobody went to the bother to implement it this >> year, then that's okay. > > This might work, but I fear that 0001 would end up being substantially > more complicated than a combined patch that solves both problems > together. Here is a WIP patch that restores the old behavior, just so you can eyeball how large it is. (It passes check-world andI've read it over once, but I'm not ready to stand by this as correct quite yet.) I need to add a regression test tomake sure this behavior is not accidentally changed in the future, and will repost after doing so. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
> On Oct 12, 2021, at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Oct 12, 2021 at 12:44 PM Peter Geoghegan <pg@bowt.ie> wrote: >> You're asking us to imagine a counterfactual. But this counterfactual >> bug report would have to describe a real practical problem. > > Yes. And I think this one should be held to the same standard: \d > mydb.myschema.mytable not working is potentially a real, practical > problem. \d sdlgkjdss.dsgkjsk.sdgskldjgds.myschema.mytable not working > isn't. I favor restoring the v13 behavior, but I don't think \d mydb.myschema.mytable was ever legitimate. You got exactly thesame results with \d nosuchdb.myschema.mytable, meaning the user was given a false sense of security that the databasename was being used to fetch the definition from the database they specified. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 12, 2021 at 1:18 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > Here is a WIP patch that restores the old behavior, just so you can eyeball how large it is. I guess that's not that bad. Why did we end up with the behavior that the current comment describes this way? "(Additional dots in the name portion are not treated as special.)" I thought there was some reason why it needed to work that way. -- Robert Haas EDB: http://www.enterprisedb.com
> On Oct 12, 2021, at 10:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Oct 12, 2021 at 1:18 PM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: >> Here is a WIP patch that restores the old behavior, just so you can eyeball how large it is. > > I guess that's not that bad. Why did we end up with the behavior that > the current comment describes this way? > > "(Additional dots in the name portion are not treated as special.)" > > I thought there was some reason why it needed to work that way. We're not talking about the parsing of string literals, but rather about the parsing of shell-style patterns. The primarycaller of this logic is processSQLNamePattern(), which expects only a relname or a (schema,relname) pair, not databasenames nor anything else. The pattern myschema.my.*table is not a three-part pattern, but a two part pattern, with a literal schema name and a relationname pattern. In v14 it can be seen to work as follows: \d pg_toast.pg_.oast_2619 TOAST table "pg_toast.pg_toast_2619" Column | Type ------------+--------- chunk_id | oid chunk_seq | integer chunk_data | bytea Owning table: "pg_catalog.pg_statistic" Indexes: "pg_toast_2619_index" PRIMARY KEY, btree (chunk_id, chunk_seq) \d pg_toast.pg_.*_2619 TOAST table "pg_toast.pg_toast_2619" Column | Type ------------+--------- chunk_id | oid chunk_seq | integer chunk_data | bytea Owning table: "pg_catalog.pg_statistic" Indexes: "pg_toast_2619_index" PRIMARY KEY, btree (chunk_id, chunk_seq) In v13, neither of those matched anything (which is defensible, I guess) but the following did match, which is really nuts: +CREATE SCHEMA g_; +CREATE TABLE g_.oast_2619 (i integer); +\d pg_toast..g_.oast_2619 + Table "g_.oast_2619" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + i | integer | | | The behavior Justin reported in the original complaint was \d regresion.public.bit_defaults, which gets handled as schema=~ /^(regresion)$/ and relname =~ /^(public.bit_defaults)$/. That gives no results for him, but I tend to think noresults is defensible. Apparently, this behavior breaks an old bug, and we need to restore the old bug and then debate this behavioral change forv15. I'd rather people had engaged in the discussion about this feature during the v14 cycle, since this patch was postedand reviewed on -hackers, and I don't recall anybody complaining about it. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Oct 12, 2021, at 10:18 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > Here is a WIP patch that restores the old behavior, just so you can eyeball how large it is. (It passes check-world andI've read it over once, but I'm not ready to stand by this as correct quite yet.) I need to add a regression test tomake sure this behavior is not accidentally changed in the future, and will repost after doing so. I wasn't thinking critically enough about how psql handles \d when I accepted Justin's initial characterization of the bug. The psql client has never thought about the stuff to the left of the schema name as a database name, even if some usersthought about it that way. It also doesn't think about the pattern as a literal string. The psql client's interpretation of the pattern is a bit of a chimera, following shell glob patterns for some things andPOSIX regex rules for others. The reason for that is shell glob stuff gets transliterated into the corresponding POSIXsyntax, but non-shell-glob stuff is left in tact, with the one outlier being dots, which have a very special interpretation. The interpretation of a dot as meaning "match one character" is not a shell glob rule but a regex one, andone that psql never supported because it split the pattern on all dots and threw away stuff to the left. There was thereforenever an opportunity for an unquoted dot to make it through to the POSIX regular expression for processing. Forother regex type stuff, it happily passed it through to the POSIX regex, so that the following examples work even thoughthey contain non-shell-glob regex stuff: v13=# create table ababab (i integer); CREATE TABLE v13=# \dt (ab){3} List of relations Schema | Name | Type | Owner --------+--------+-------+------------- public | ababab | table | mark.dilger (1 row) v13=# \dt pg_catalog.pg_clas{1,2} List of relations Schema | Name | Type | Owner ------------+----------+-------+------------- pg_catalog | pg_class | table | mark.dilger v13=# \dt pg_catalog.pg_[am]{1,3} List of relations Schema | Name | Type | Owner ------------+-------+-------+------------- pg_catalog | pg_am | table | mark.dilger (1 row) Splitting the pattern on all the dots and throwing away any additional leftmost fields is a bug, and when you stop doingthat, passing additional dots through to the POSIX regular expression for processing is the most natural thing to do. This is, in fact, how v14 works. It is a bit debatable whether treating the first dot as a separator and the additionaldots as stuff to be passed through is the right thing, so we could call the v14 behavior a mis-feature, but it'snot as clearcut as the discussion upthread suggested. Reverting to v13 behavior seems wrong, but I'm now uncertain howto proceed. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 12, 2021 at 5:21 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > I wasn't thinking critically enough about how psql handles \d when I accepted Justin's initial characterization of thebug. The psql client has never thought about the stuff to the left of the schema name as a database name, even if someusers thought about it that way. It also doesn't think about the pattern as a literal string. I agree. > The psql client's interpretation of the pattern is a bit of a chimera, following shell glob patterns for some things andPOSIX regex rules for others. Yes. And that's pretty weird, but it's long-established precedent so we have to deal with it. > Splitting the pattern on all the dots and throwing away any additional leftmost fields is a bug, ... I also agree with you right up to here. > and when you stop doing that, passing additional dots through to the POSIX regular expression for processing is the mostnatural thing to do. This is, in fact, how v14 works. It is a bit debatable whether treating the first dot as a separatorand the additional dots as stuff to be passed through is the right thing, so we could call the v14 behavior a mis-feature,but it's not as clearcut as the discussion upthread suggested. Reverting to v13 behavior seems wrong, but I'mnow uncertain how to proceed. But not this part, or at least not entirely. If we pass the dots through to the POSIX regular expression, we can only do that either for the table name or the schema name, not both - either the first or last dot must mark the boundary between the two. That means that you can't use all the same regexy things for one as you can for the other, which is a strange system. I knew that your patch made it do that, and I committed it that way because I didn't think it really mattered, and also because the whole system is already pretty strange, so what's one more bit of strangeness? I think there are at least 3 defensible behaviors here: 1. Leave it like it is. If there is more than one dot, the extra ones are part of one of the regex-glob thingies. 2. If there is more than one dot, error! Tell the user they messed up. 3. If there are exactly two dots, treat it as db-schema-user. Accept it if the dbname matches the current db, and otherwise say we can't access the named db. If there are more than two dots, then (a) it's an error as in (2) or (b) the extra ones become part of the regex-glob thingies as in (2). The thing that's unprincipled about (3) is that we can't support a regexp-glob thingy there -- we can only test for a literal string match. And I already said what I thought was wrong with (1). But none of these are horrible, and I don't think it really matters which one we adopt. I don't even know if I can really rank the choices I just listed against each other. Before I was arguing for (3a) but I'm not sure I actually like that one particularly better. -- Robert Haas EDB: http://www.enterprisedb.com
> On Oct 13, 2021, at 6:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> and when you stop doing that, passing additional dots through to the POSIX regular expression for processing is the mostnatural thing to do. This is, in fact, how v14 works. It is a bit debatable whether treating the first dot as a separatorand the additional dots as stuff to be passed through is the right thing, so we could call the v14 behavior a mis-feature,but it's not as clearcut as the discussion upthread suggested. Reverting to v13 behavior seems wrong, but I'mnow uncertain how to proceed. > > But not this part, or at least not entirely. > > If we pass the dots through to the POSIX regular expression, we can > only do that either for the table name or the schema name, not both - Agreed. > either the first or last dot must mark the boundary between the two. > That means that you can't use all the same regexy things for one as > you can for the other, which is a strange system. The closest analogy is how regular expressions consider \1 \2 .. \9 as backreferences, but \10 \11 ... are dependent on context: "A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression(i.e., the number is in the legal range for a back reference), and otherwise is taken as octal." Taking a dotas a separator if it can be taken that way, and as a regex character otherwise, is not totally out of line with existingprecedent. On the other hand, the backreference vs. octal precedent is not one I particularly like. > I knew that your > patch made it do that, and I committed it that way because I didn't > think it really mattered, and also because the whole system is already > pretty strange, so what's one more bit of strangeness? > > I think there are at least 3 defensible behaviors here: > > 1. Leave it like it is. If there is more than one dot, the extra ones > are part of one of the regex-glob thingies. > > 2. If there is more than one dot, error! Tell the user they messed up. I don't like the backward compatibility issues with this one. Justin's use of database.schema.relname will work up untilv14 (by throwing away the database part), then draw an error in v14, then (assuming we support the database portionin v15 onward) start working again. > 3. If there are exactly two dots, treat it as db-schema-user. Accept > it if the dbname matches the current db, and otherwise say we can't > access the named db. If there are more than two dots, then (a) it's an > error as in (2) or (b) the extra ones become part of the regex-glob > thingies as in (2). 3a is a bit strange, when considered in the context of patterns. If db1, db2, and db3 all exist and each have a table foo.bar,and psql is connected to db1, how should the command \d db?.foo.bar behave? We have no problem with db1.foo.bar,but we do have problems with the other two. If the answer is to complain about the databases that are unconnected,consider what happens if the user writes this in a script when only db1 exists, and later the script stops workingbecause somebody created database db2. Maybe that's not completely horrible, but surely it is less than ideal. 3b is what pg_amcheck does. It accepts database.schema.relname, and it will complain if no matching database/schema/relationcan be found (unless --no-strict-names was given.) — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 13, 2021 at 10:40 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > 3a is a bit strange, when considered in the context of patterns. If db1, db2, and db3 all exist and each have a tablefoo.bar, and psql is connected to db1, how should the command \d db?.foo.bar behave? We have no problem with db1.foo.bar,but we do have problems with the other two. If the answer is to complain about the databases that are unconnected,consider what happens if the user writes this in a script when only db1 exists, and later the script stops workingbecause somebody created database db2. Maybe that's not completely horrible, but surely it is less than ideal. > > 3b is what pg_amcheck does. It accepts database.schema.relname, and it will complain if no matching database/schema/relationcan be found (unless --no-strict-names was given.) Well, like I said, we can't treat a part that's purportedly a DB name as a pattern, so when connected to db1, I presume the command \d db?.foo.bar would have to behave just like \d dskjlglsghdksgdjkshg.foo.bar. I suppose technically I'm wrong: db? could be matched against the list of database names as a pattern, and then we could complain only if it doesn't match exactly and only the current DB. But I don't like adding a bunch of extra code to accomplish nothing useful, so if we're going to match it all I think it should just strcmp(). But I'm still not sure what the best thing to do overall is here. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Oct 12, 2021 at 12:57 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I would prefer if it errored if the datname didn't match the current database. > After all, it would've helped me to avoid making a confusing problem report. How would you have felt if it had said something like: error: argument to \d should be of the form [schema-name-pattern.]relation-name-pattern Would that have been better or worse for you than accepting a third part of the pattern as a database name if and only if it matched the current database name exactly? -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Oct 13, 2021 at 12:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > It seems unfortunate if names from log messages qualified with datname were now > rejected. Like this one: > > | automatic analyze of table "ts.child.cdrs_2021_10_12"... That's a good argument, IMHO. -- Robert Haas EDB: http://www.enterprisedb.com
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Oct 13, 2021 at 12:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > It seems unfortunate if names from log messages qualified with datname were now > > rejected. Like this one: > > > > | automatic analyze of table "ts.child.cdrs_2021_10_12"... > > That's a good argument, IMHO. Agreed. Thanks, Stephen
Attachment
> On Oct 13, 2021, at 8:43 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Oct 13, 2021 at 10:40 AM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: >> 3a is a bit strange, when considered in the context of patterns. If db1, db2, and db3 all exist and each have a tablefoo.bar, and psql is connected to db1, how should the command \d db?.foo.bar behave? We have no problem with db1.foo.bar,but we do have problems with the other two. If the answer is to complain about the databases that are unconnected,consider what happens if the user writes this in a script when only db1 exists, and later the script stops workingbecause somebody created database db2. Maybe that's not completely horrible, but surely it is less than ideal. >> >> 3b is what pg_amcheck does. It accepts database.schema.relname, and it will complain if no matching database/schema/relationcan be found (unless --no-strict-names was given.) > > Well, like I said, we can't treat a part that's purportedly a DB name > as a pattern, so when connected to db1, I presume the command \d > db?.foo.bar would have to behave just like \d > dskjlglsghdksgdjkshg.foo.bar. I suppose technically I'm wrong: db? > could be matched against the list of database names as a pattern, and > then we could complain only if it doesn't match exactly and only the > current DB. But I don't like adding a bunch of extra code to > accomplish nothing useful, so if we're going to match it all I think > it should just strcmp(). > > But I'm still not sure what the best thing to do overall is here. The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider what happens with: pg_dump -t production.critical.secrets > secrets.dump dropdb production In v13, if your default database is "testing", and database "testing" has the same schemas and tables (but not data) as production,you are unhappy. You just dumped a copy of your test data and blew away the production data. You could end up unhappy in v14, if database "testing" has a schema named "production" and a table that matches the pattern/^critical.secrets$/, but otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching tables were found". Neither behavior seems correct. The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql. Allthree callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own. Ipropose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise insertthe database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>", where<database> is a properly escaped representation of the database portion. Maybe someday we can change that to OPERATOR(pg_catalog.~),but for now we lack the sufficient logic for handling multiple matching database names. (The situationis different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a database,and we'd still be fine matching that against a pattern.) For psql and pg_dump, I'm tempted to restrict the database portion (if not quoted) to neither contain shell glob charactersnor POSIX regex characters, and return an error code if any are found, so that the clients can raise an appropriateerror to the user. In psql, this proposal would result in no tables matching \d wrongdb.schema.table, which would differ from v13's behavior. You wouldn't get an error about having specified the wrong database. You'd just get no matching relations. \d??db??.schema.table would complain about the db portion being a pattern. \d "??db??".schema.table would work, assumingyou're connected to a database literally named ??db?? In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than simply tryingto exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would do. --exclude-database=db??would work to exclude four character database names beginning in "db". In pg_dump, the -t wrongdb.schema.table would match nothing and give the familiar error "pg_dump: error: no matching tableswere found". pg_dump -t too.many.dotted.names would give a different error about too many parts. pg_dump -t db??.foo.barwould give an error about the database needing to be a literal name rather than a pattern. I don't like your proposal to use a strcmp() rather than a pg_catalog.= match, because it diverges from how the rest of thepattern is treated, including in how encoding settings might interact with the name, needing to be executed on the clientside rather than in the server where the rest of the name resolution is happening. Does this sound like a workable proposal? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 13, 2021 at 4:43 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql. Allthree callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own. Ipropose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise insertthe database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>", where<database> is a properly escaped representation of the database portion. Maybe someday we can change that to OPERATOR(pg_catalog.~),but for now we lack the sufficient logic for handling multiple matching database names. (The situationis different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a database,and we'd still be fine matching that against a pattern.) I agree with matching using OPERATOR(pg_catalog.=) but I think it should be an error, not a silently-return-nothing case. > In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than simplytrying to exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would do. --exclude-database=db?? would work to exclude four character database names beginning in "db". Those things sound good. -- Robert Haas EDB: http://www.enterprisedb.com
> On Oct 13, 2021, at 1:43 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider what happens with: > > pg_dump -t production.critical.secrets > secrets.dump > dropdb production > > In v13, if your default database is "testing", and database "testing" has the same schemas and tables (but not data) asproduction, you are unhappy. You just dumped a copy of your test data and blew away the production data. > > You could end up unhappy in v14, if database "testing" has a schema named "production" and a table that matches the pattern/^critical.secrets$/, but otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching tables were found". Neither behavior seems correct. With the attached patch, this scenario results in a "cross-database references are not implemented" error. > The function where the processing occurs is processSQLNamePattern, which is called by pg_dump, pg_dumpall, and psql. Allthree callers expect processSQLNamePattern to append where-clauses to a buffer, not to execute any sql of its own. Ipropose that processSQLNamePattern return an error code if the pattern contains more than three parts, but otherwise insertthe database portion into the buffer as a "pg_catalog.current_database() OPERATOR(pg_catalog.=) <database>", where<database> is a properly escaped representation of the database portion. Maybe someday we can change that to OPERATOR(pg_catalog.~),but for now we lack the sufficient logic for handling multiple matching database names. (The situationis different for pg_dumpall, as it's using the normal logic for matching a relation name, not for matching a database,and we'd still be fine matching that against a pattern.) I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as rejecting the database name as part of thequery complicates the calling convention for no apparent benefit. I had been concerned about database names that werecollation-wise equal but byte-wise unequal, but it seems we already treat those as distinct database names, so my concernwas unnecessary. We already use strcmp on database names from frontend clients (fe_utils/parallel_slots.c, psql/prompt.c,pg_amcheck.c, pg_dump.c, pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend (commands/dbcommands.c,init/postinit.c). I tried testing how this plays out by handing `createdb` the name é (U+00E9 "LATIN SMALL LETTER E WITH ACCUTE") and thenagain the name é (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".) That results in two distinctdatabases, not an error about a duplicate database name: # select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é'); oid | datname | datdba | encoding | datcollate | datctype -------+---------+--------+----------+-------------+------------- 37852 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 37855 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 (2 rows) But that doesn't seem to prove much, as other tools in my locale don't treat those as equal either. (Testing with perl's"eq" operator, they compare as distinct.) I expected to find regression tests providing better coverage for this somewhere,but did not. Anybody know more about it? > For psql and pg_dump, I'm tempted to restrict the database portion (if not quoted) to neither contain shell glob charactersnor POSIX regex characters, and return an error code if any are found, so that the clients can raise an appropriateerror to the user. With the patch, using pattern characters in an unquoted database portion results in a "database name must be literal" error. Using them in a quoted database name is allowed, but unless you are connected to a database that literally equalsthat name, you will get a "cross-database references are not implemented" error. > In psql, this proposal would result in no tables matching \d wrongdb.schema.table, which would differ from v13's behavior. You wouldn't get an error about having specified the wrong database. You'd just get no matching relations. \d??db??.schema.table would complain about the db portion being a pattern. \d "??db??".schema.table would work, assumingyou're connected to a database literally named ??db?? With the patch, psql will treat \d wrongdb.schema.table as a "cross-database references are not implemented" error. > In pg_dumpall, --exclude-database=more.than.one.part would give an error about too many dotted parts rather than simplytrying to exclude the last "part" and silently ignoring the prefix, which I think is what v13's pg_dumpall would do. --exclude-database=db?? would work to exclude four character database names beginning in "db". The patch implements this. > In pg_dump, the -t wrongdb.schema.table would match nothing and give the familiar error "pg_dump: error: no matching tableswere found". With the patch, pg_dump instead gives a "cross-database references are not implemented" error. > pg_dump -t too.many.dotted.names would give a different error about too many parts. With the patch, pg_dump instead gives a "improper qualified name (too many dotted names)" error. > pg_dump -t db??.foo.bar would give an error about the database needing to be a literal name rather than a pattern. With the patch, pg_dump gives a "database name must be literal" error. This is the only new error message in the patch,which puts a burden on translators, but I didn't see any existing message that would serve. Suggestions welcome. > I don't like your proposal to use a strcmp() rather than a pg_catalog.= match, because it diverges from how the rest ofthe pattern is treated, including in how encoding settings might interact with the name, needing to be executed on theclient side rather than in the server where the rest of the name resolution is happening. Recanted, as discussed above. The patch only changes the behavior of pg_amcheck in that it now rejects patterns with too many parts. Using database patternswas and remains legal for this tool. The patch changes nothing about reindexdb. That's a debatable design choice, but reindexdb doesn't use string_utils's processSQLNamePattern()function as the other tools do, nor does its documentation reference psql's #APP-PSQL-PATTERNS documentation. It's --schema option only takes literal names. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Mark Dilger <mark.dilger@enterprisedb.com> writes: > [ v1-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch ] This needs a rebase over the recent renaming of our Perl test modules. (Per the cfbot, so do several of your other pending patches.) regards, tom lane
> On Nov 3, 2021, at 12:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Mark Dilger <mark.dilger@enterprisedb.com> writes: >> [ v1-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch ] > > This needs a rebase over the recent renaming of our Perl test modules. > (Per the cfbot, so do several of your other pending patches.) > > regards, tom lane Thanks for calling my attention to it. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Wed, Oct 13, 2021 at 09:24:53AM -0400, Robert Haas wrote: > > Splitting the pattern on all the dots and throwing away any additional > > leftmost fields is a bug, ... > > I also agree with you right up to here. > > > and when you stop doing that, passing additional dots through to the POSIX > > regular expression for processing is the most natural thing to do. This > > is, in fact, how v14 works. It is a bit debatable whether treating the > > first dot as a separator and the additional dots as stuff to be passed > > through is the right thing, so we could call the v14 behavior a > > mis-feature, but it's not as clearcut as the discussion upthread suggested. > > Reverting to v13 behavior seems wrong, but I'm now uncertain how to > > proceed. > > But not this part, or at least not entirely. > > If we pass the dots through to the POSIX regular expression, we can > only do that either for the table name or the schema name, not both - > either the first or last dot must mark the boundary between the two. > That means that you can't use all the same regexy things for one as > you can for the other, which is a strange system. I knew that your > patch made it do that, and I committed it that way because I didn't > think it really mattered, and also because the whole system is already > pretty strange, so what's one more bit of strangeness? Rather than trying to guess at the meaning of each '.' based on the total string. I wonder, if we could for v15 require '.' to be spelled in longer way if it needs to be treated as part of the regex. Perhaps requiring something like '(.)' be used rather than a bare '.' might be good enough and documenting otherwise it's really a separator? I suppose we could also invent a non-standard class as a stand in like '[::any::]', but that seems kinda weird. I think it might be possible to give better error messages long term if we knew what '.' should mean without looking at the whole thing. Garick
> On Nov 4, 2021, at 6:37 AM, Hamlin, Garick L <ghamlin@isc.upenn.edu> wrote: > >> If we pass the dots through to the POSIX regular expression, we can >> only do that either for the table name or the schema name, not both - >> either the first or last dot must mark the boundary between the two. >> That means that you can't use all the same regexy things for one as >> you can for the other, which is a strange system. I knew that your >> patch made it do that, and I committed it that way because I didn't >> think it really mattered, and also because the whole system is already >> pretty strange, so what's one more bit of strangeness? > > Rather than trying to guess at the meaning of each '.' based on the total > string. I wonder, if we could for v15 require '.' to be spelled in longer way > if it needs to be treated as part of the regex. We're trying to fix an edge case, not change how the basic case works. Most users are accustomed to using patterns fromwithin psql like: \dt myschema.mytable Whatever patch we accept must not break these totally normal and currently working cases. > Perhaps requiring something like '(.)' be used rather than a bare '.' > might be good enough and documenting otherwise it's really a separator? > I suppose we could also invent a non-standard class as a stand in like > '[::any::]', but that seems kinda weird. If I understand you, that would require the above example to be written as: \dt myschema(.)mytable which nobody expects to have to do, and which would be a very significant breaking change in v15. I can't see anything likethat being accepted. > I think it might be possible to give better error messages long term > if we knew what '.' should mean without looking at the whole thing. You quote a portion of an email from Robert. After that email, there were several more, and a new patch. The commit messageof the new patch explains what it does. I wonder if you'd review that message, quoted here, or even better, reviewthe entire patch. Does this seem like an ok fix to you? Subject: [PATCH v2] Reject patterns with too many parts or wrong db Object name patterns used by pg_dump and psql potentially contain multiple parts (dotted names), and nothing prevents users from specifying a name with too many parts, nor specifying a database-qualified name for a database other than the currently connected database. Prior to PostgreSQL version 14, pg_dump, pg_dumpall and psql quietly discarded extra parts of the name on the left. For example, `pg_dump -t` only expected a possibly schema qualified table name, not a database name, and the following command pg_dump -t production.marketing.customers quietly ignored the "production" database name with neither warning nor error. Commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868 changed the behavior of name parsing. Where names contain more than the maximum expected number of dots, the extra dots on the right were interpreted as part of the name, such that the above example was interpreted as schema=production, relation=marketing.customers. This turns out to be highly unintuitive to users. We've had reports that users sometimes copy-and-paste database- and schema-qualified relation names from the logs. https://www.postgresql.org/message-id/20211013165426.GD27491%40telsasoft.com There is no support for cross database references, but allowing a database qualified pattern when the database portion matches the current database, as in the above report, seems more friendly than rejecting it, so do that. We don't allow the database portion itself to be a pattern, because if it matched more than one database (including the current one), there would be confusion about which database(s) were processed. Consistent with how we allow db.schemapat.relpat in pg_dump and psql, also allow db.schemapat for specifying schemas, as: \dn mydb.myschema in psql and pg_dump --schema=mydb.myschema Fix the pre-v14 behavior of ignoring leading portions of patterns containing too many dotted names, and the v14.0 misfeature of combining trailing portions of such patterns, and instead reject such patterns in all cases by raising an error. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2021-Oct-20, Mark Dilger wrote: > I tried testing how this plays out by handing `createdb` the name é > (U+00E9 "LATIN SMALL LETTER E WITH ACCUTE") and then again the name é > (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE > ACCENT".) That results in two distinct databases, not an error about > a duplicate database name: > > # select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é'); > oid | datname | datdba | encoding | datcollate | datctype > -------+---------+--------+----------+-------------+------------- > 37852 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 > 37855 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 > (2 rows) > > But that doesn't seem to prove much, as other tools in my locale don't > treat those as equal either. (Testing with perl's "eq" operator, they > compare as distinct.) I expected to find regression tests providing > better coverage for this somewhere, but did not. Anybody know more > about it? I think it would appropriate to normalize identifiers that are going to be stored in catalogs. As presented, this is a bit ridiculous and I see no reason to continue to support it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ed is the standard text editor." http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I think it would appropriate to normalize identifiers that are going to > be stored in catalogs. As presented, this is a bit ridiculous and I see > no reason to continue to support it. If we had any sort of convention about the encoding of identifiers stored in shared catalogs, maybe we could do something about that. But we don't, so any change is inevitably going to break someone's use-case. In any case, that seems quite orthogonal to the question of how to treat names with too many dots in them. Considering we are three days out from freezing 14.1, I think it is time to stop the meandering discussion and fix it. And by "fix", I mean revert to the pre-14 behavior. regards, tom lane
On Fri, Nov 5, 2021 at 9:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > In any case, that seems quite orthogonal to the question of how to treat > names with too many dots in them. Considering we are three days out from > freezing 14.1, I think it is time to stop the meandering discussion and > fix it. And by "fix", I mean revert to the pre-14 behavior. I do not think that there is consensus on that proposal. And FWIW, I still oppose it. It's debatable whether this even qualifies as a bug in the first place, and even more debatable whether accepting and ignoring arbitrary garbage is the right solution. -- Robert Haas EDB: http://www.enterprisedb.com
> On Nov 5, 2021, at 6:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> I think it would appropriate to normalize identifiers that are going to >> be stored in catalogs. As presented, this is a bit ridiculous and I see >> no reason to continue to support it. > > If we had any sort of convention about the encoding of identifiers stored > in shared catalogs, maybe we could do something about that. But we don't, > so any change is inevitably going to break someone's use-case. I only started the discussion about normalization to demonstrate that existing behavior does not require it. > In any case, that seems quite orthogonal to the question of how to treat > names with too many dots in them. Agreed. > Considering we are three days out from > freezing 14.1, I think it is time to stop the meandering discussion and > fix it. Agreed. > And by "fix", I mean revert to the pre-14 behavior. That's one solution. The patch I posted on October 20, and rebased two days ago, has not received any negative feedback. If you want to revert to pre-14 behavior for 14.1, do you oppose the patch going in for v15? (I'm not taking aposition here, just asking what you'd prefer.) — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On Tue, Dec 21, 2021 at 10:58:39AM -0800, Mark Dilger wrote: > > Rebased patch attached: This version doesn't apply anymore: http://cfbot.cputube.org/patch_36_3367.log === Applying patches on top of PostgreSQL commit ID 5513dc6a304d8bda114004a3b906cc6fde5d6274 === === applying patch ./v3-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch [...] 1 out of 52 hunks FAILED -- saving rejects to file src/bin/psql/describe.c.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author.
> On Jan 15, 2022, at 12:28 AM, Julien Rouhaud <rjuju123@gmail.com> wrote: > > Could you send a rebased version? Yes. Here it is: — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, Jan 17, 2022 at 1:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > On Jan 15, 2022, at 12:28 AM, Julien Rouhaud <rjuju123@gmail.com> wrote: > > Could you send a rebased version? > Yes. Here it is: This is not a full review, but I just noticed that: + * dotcnt: how many separators were parsed from the pattern, by reference. + * Can be NULL. But then: + Assert(dotcnt != NULL); On a related note, it's unclear why you've added three new arguments to processSQLNamePattern() but only one of them gets a mention in the function header comment. It's also pretty clear that the behavior of patternToSQLRegex() is changing, but the function header comments are not. -- Robert Haas EDB: http://www.enterprisedb.com
> On Jan 17, 2022, at 1:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > + * dotcnt: how many separators were parsed from the pattern, by reference. > + * Can be NULL. > > But then: > > + Assert(dotcnt != NULL); Removed the "Can be NULL" part, as that use case doesn't make sense. The caller should always care whether the number ofdots was greater than they are prepared to handle. > On a related note, it's unclear why you've added three new arguments > to processSQLNamePattern() but only one of them gets a mention in the > function header comment. Updated the header comments to include all parameters. > It's also pretty clear that the behavior of patternToSQLRegex() is > changing, but the function header comments are not. Updated the header comments for this, too. Also, rebased as necessary: — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Continuing my pass through the "bug fixes" section of the CommitFest, I came upon this patch, which is contested. Here is my attempt to summarize where things stand. As I understand it: - Tom wants to revert to the previous behavior of accepting arbitrary garbage, so that \d slkgjskld.jgdsjhgjklsdhg.saklasgh.foo.bar means \d foo.bar. - I want \d mydb.foo.bar to mean \d foo.bar if the dbname is mydb and report an error otherwise; anything with dots>2 is also an error in my view. - Peter Geoghegan agrees with Tom. - Stephen Frost agrees with me. - Vik Fearing also agrees with me. - Justin Pryzby, who originally discovered the problem, prefers the same behavior that I prefer long-term, but thinks Tom's behavior is better than doing nothing. - Mark Dilger, Isaac Moreland, Garick Hamlin, Alvaro Herrera, and Julien Rouhaud have commented on the thread but have not endorsed either of these dueling proposals. By my count, that's probably a vote of 4-2 in view of the preferred solution, but it depends on whether you could Justin's vote as +1 for my preferred solution or maybe +0.75 or +0.50 or something. At any rate, it's close. If anyone else would like to take a position, please do so in the next few days. If there are no more votes, I'm going to proceed with trying to fix up Mark's patch implementing my preferred solution and getting it committed. Thanks, ...Robert
> On Mar 15, 2022, at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > - Justin Pryzby, who originally discovered the problem, prefers the > same behavior that I prefer long-term, but thinks Tom's behavior is > better than doing nothing. > - Mark Dilger, Isaac Moreland, Garick Hamlin, Alvaro Herrera, and > Julien Rouhaud have commented on the thread but have not endorsed > either of these dueling proposals. I vote in favor of committing the patch, though I'd also say it's not super important to me. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 13, 2021 at 11:54:26AM -0500, Justin Pryzby wrote: > It seems unfortunate if names from log messages qualified with datname were now > rejected. Like this one: > > | automatic analyze of table "ts.child.cdrs_2021_10_12"... Mark mentioned this "log message" use case in his proposed commit message, but I wanted to mention what seems like a more important parallel: postgres=# SELECT 'postgres.public.postgres_log'::regclass; regclass | postgres_log postgres=# SELECT 'not.postgres.public.postgres_log'::regclass; ERROR: improper relation name (too many dotted names): not.postgres.public.postgres_log ^ postgres=# SELECT 'not.public.postgres_log'::regclass; ERROR: cross-database references are not implemented: "not.public.postgres_log" I think Mark used this as the model behavior for \d for this patch, which sounds right. Since the "two dot" case wasn't fixed in 14.1 nor 2, it seems better to implement the ultimate, intended behavior now, rather than trying to exactly match what old versions did. I'm of the understanding that's what Mark's patch does, so +1 from me. I don't know how someone upgrading from an old version would know about the change, though (rejecting junk prefixes rather than ignoring them). *If* it were important, it seems like it'd need to be added to the 14.0 release notes.
On Tue, Mar 15, 2022 at 12:31 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> On Mar 15, 2022, at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> - Justin Pryzby, who originally discovered the problem, prefers the
> same behavior that I prefer long-term, but thinks Tom's behavior is
> better than doing nothing.
> - Mark Dilger, Isaac Moreland, Garick Hamlin, Alvaro Herrera, and
> Julien Rouhaud have commented on the thread but have not endorsed
> either of these dueling proposals.
I vote in favor of committing the patch, though I'd also say it's not super important to me.
I'm on board with leaving the v14 change in place - fixing the bug so that a matching database name is accepted (the whole copy-from-logs argument is quite compelling). I'm not too concerned about psql, since \d is mainly used interactively, and since the change will result in errors in pg_dump/pg_restore the usual due diligence for upgrading should handle the necessary tweaks should the case arise where bogus/ignore stuff is present.
David J.
On 2022-01-26 09:04:15 -0800, Mark Dilger wrote: > Also, rebased as necessary: Needs another one: http://cfbot.cputube.org/patch_37_3367.log Marked as waiting-on-author. Greetings, Andres Freund
> On Mar 21, 2022, at 6:12 PM, Andres Freund <andres@anarazel.de> wrote: > > Needs another one: http://cfbot.cputube.org/patch_37_3367.log > > Marked as waiting-on-author. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, Mar 21, 2022 at 9:32 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > [ new patch version ] This patch adds three new arguments to processSQLNamePattern() and documents one of them. It adds three new parameters to patternToSQLRegex() as well, and documents none of them. I think that the text of the comment might need some updating too, in particular the sentence "Additional dots in the name portion are not treated as special." There are no comments explaining the left_is_literal stuff. It appears that your intention here is that if the pattern string supplied by the user contains any of *?|+()[]{}.^\ not surrounded by double-quotes, we signal the caller. Some callers then use this to issue a complaint that the database name must be a literal. To me, this behavior doesn't really make sense. If something is a literal, that means we're not going to interpret the special characters that it contains. Here, we are interpreting the special characters just so we can complain that they exist. It seems to me that a simpler solution would be to not interpret them at all. I attach a patch showing what I mean by that. It just rips out the dbname_is_literal stuff in favor of doing nothing at all. To put the whole thing another way, if the user types "\d }.public.ft", your code wants to complain about the fact that the user is trying to use regular expression characters in a place where they are not allowed to do that. I argue that we should instead just be comparing "}" against the database name and see whether it happens to match. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
> On Mar 22, 2022, at 11:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > This patch adds three new arguments to processSQLNamePattern() and > documents one of them. It adds three new parameters to > patternToSQLRegex() as well, and documents none of them. This next patch adds the missing comments. > I think that > the text of the comment might need some updating too, in particular > the sentence "Additional dots in the name portion are not treated as > special." Changed. > There are no comments explaining the left_is_literal stuff. It appears > that your intention here is that if the pattern string supplied by the > user contains any of *?|+()[]{}.^\ not surrounded by double-quotes, we > signal the caller. Some callers then use this to issue a complaint > that the database name must be a literal. To me, this behavior doesn't > really make sense. If something is a literal, that means we're not > going to interpret the special characters that it contains. Here, we > are interpreting the special characters just so we can complain that > they exist. It seems to me that a simpler solution would be to not > interpret them at all. I attach a patch showing what I mean by that. > It just rips out the dbname_is_literal stuff in favor of doing nothing > at all. To put the whole thing another way, if the user types "\d > }.public.ft", your code wants to complain about the fact that the user > is trying to use regular expression characters in a place where they > are not allowed to do that. I argue that we should instead just be > comparing "}" against the database name and see whether it happens to > match. I think your change is fine, so I've rolled it into this next patch. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, Mar 25, 2022 at 3:42 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > I think your change is fine, so I've rolled it into this next patch. OK, cool. Here are some more comments. In describe.c, why are the various describeWhatever functions returning true when validateSQLNamePattern returns false? It seems to me that they should return false. That would cause exec_command_d() to set status = PSQL_CMD_ERROR, which seems appropriate. I wondered whether we should return PSQL_CMD_ERROR only for database errors, but that doesn't seem to be the case. For example, exec_command_a() sets PSQL_CMD_ERROR for a failure in do_pset(). pg_dump's prohibit_crossdb_refs() has a special case for you are not connected to a database, but psql's validateSQLNamePattern() treats it as an invalid cross-database reference. Maybe that should be consistent, or just the other way around. After all, I would expect pg_dump to just bail out if we lose the database connection, but psql may continue, because we can reconnect. Putting more code into the tool where reconnecting doesn't really make sense seems odd. processSQLNamePattern() documents that dotcnt can be NULL, and then asserts that it isn't. processSQLNamePattern() introduces new local variables schema and name, which account for most of the notational churn in that function. I can't see a reason why those changes are needed. You do test whether the new variables are NULL in a couple of places, but you could equally well test schemavar/namevar/altnamevar directly. Actually, I don't really understand why this function needs any changes other than passing dbnamebuf and dotcnt through to patternToSQLRegex(). Is there a reason? patternToSQLRegex() restructures the system of buffers as well, and I don't understand the purpose of that either. It sort of looks like the idea might be to relax the rule against dbname.relname patterns, but why would we want to do that? If we don't want to do that, why remove the assertion? It is not very nice that patternToSQLRegex() ends up repeating the locution "if (left && want_literal_dbname) appendPQExpBufferChar(&left_literal, '"')" a whole bunch of times. Suppose we remove all that. Then, in the if (!inquotes && ch == '.') case, if left = true, we copy "cp - pattern" bytes starting at "pattern" into the buffer. Wouldn't that accomplish the same thing with less code? -- Robert Haas EDB: http://www.enterprisedb.com
> On Mar 29, 2022, at 8:20 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > In describe.c, why are the various describeWhatever functions > returning true when validateSQLNamePattern returns false? It seems to > me that they should return false. That would cause exec_command_d() to > set status = PSQL_CMD_ERROR, which seems appropriate. I wondered > whether we should return PSQL_CMD_ERROR only for database errors, but > that doesn't seem to be the case. For example, exec_command_a() sets > PSQL_CMD_ERROR for a failure in do_pset(). Yes, I believe you are right. For scripting, the following should echo, but doesn't under the version 7 patch. Fixed inversion 8. % psql -c "\d a.b.c.d" || echo 'error' improper qualified name (too many dotted names): a.b.c.d > pg_dump's prohibit_crossdb_refs() has a special case for you are not > connected to a database, but psql's validateSQLNamePattern() treats it > as an invalid cross-database reference. Maybe that should be > consistent, or just the other way around. After all, I would expect > pg_dump to just bail out if we lose the database connection, but psql > may continue, because we can reconnect. Putting more code into the > tool where reconnecting doesn't really make sense seems odd. Fixed psql in version 8 to issue the appropriate error message, either "You are currently not connected to a database." or"cross-database references are not implemented: %s". That matches the output for pg_dump. > processSQLNamePattern() documents that dotcnt can be NULL, and then > asserts that it isn't. That's ugly. Fixed the documentation in version 8. > processSQLNamePattern() introduces new local variables schema and > name, which account for most of the notational churn in that function. > I can't see a reason why those changes are needed. You do test whether > the new variables are NULL in a couple of places, but you could > equally well test schemavar/namevar/altnamevar directly. Actually, I > don't really understand why this function needs any changes other than > passing dbnamebuf and dotcnt through to patternToSQLRegex(). Is there > a reason? It looks like overeager optimization to me, to avoid passing buffers to patternToSQLRegex that aren't really wanted, consequentlyasking that function to parse things that the caller doesn't care about. But I don't think the optimizationis worth the git history churn. Removed in version 8. > patternToSQLRegex() restructures the system of buffers as well, and I > don't understand the purpose of that either. It sort of looks like the > idea might be to relax the rule against dbname.relname patterns, but > why would we want to do that? If we don't want to do that, why remove > the assertion? This took a while to answer. I don't remember exactly what I was trying to do here, but it looks like I wanted callers who only want a (possibly database-qualified)schema name to pass that in the (dbnamebuf and) schemabuf, rather than using the (schemabuf and ) namebuf. I obviously didn't finish that conversion, because the clients never got the message. What remained was some rearrangementin patternToSQLRegex which worked but served no purpose. I've reverted the useless refactoring. > It is not very nice that patternToSQLRegex() ends up repeating the > locution "if (left && want_literal_dbname) > appendPQExpBufferChar(&left_literal, '"')" a whole bunch of times. > Suppose we remove all that. Then, in the if (!inquotes && ch == '.') > case, if left = true, we copy "cp - pattern" bytes starting at > "pattern" into the buffer. Wouldn't that accomplish the same thing > with less code? We don't *quite* want the literal left string. If it is quoted, we still want the quotes removed. For example: \d "robert.haas".accounts.acme needs to return robert.haas (without the quotes) as the database name. Likewise, for embedded quotes: \d "robert""haas".accounts.acme needs to return robert"haas, and so forth. I was able to clean up the "if (left && want_literal_dbname)" stuff, though. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Wed, Apr 6, 2022 at 12:07 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > I was able to clean up the "if (left && want_literal_dbname)" stuff, though. I still have a vague feeling that there's probably some way of doing this better, but had more or less resolved to commit this patch as is anyway and had that all queued up. But then I had to go to a meeting and when I came out I discovered that Tom had done this: --- a/src/fe_utils/string_utils.c +++ b/src/fe_utils/string_utils.c @@ -918,8 +918,12 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern, * Convert shell-style 'pattern' into the regular expression(s) we want to * execute. Quoting/escaping into SQL literal format will be done below * using appendStringLiteralConn(). + * + * If the caller provided a schemavar, we want to split the pattern on + * ".", otherwise not. */ - patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf, + patternToSQLRegex(PQclientEncoding(conn), NULL, + (schemavar ? &schemabuf : NULL), &namebuf, pattern, force_escape); /* I don't know whether that's a bug fix for the existing code or some new bit of functionality that \dconfig requires and nothing else needs. A related point that I had noticed during review is that these existing tests look pretty bogus: if (namebuf.len > 2) if (schemabuf.len > 2) In the v13 code, these tests occur at a point where we've definitely added ^( to the buffer, but possibly nothing else. But starting in v14 that's no longer the case. So probably this test should be changed somehow. The proposed patch changes these to something like this: + if (schemavar && schemabuf.len > 2) But that doesn't really seem like it's fixing the problem I'm talking about. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I still have a vague feeling that there's probably some way of doing > this better, but had more or less resolved to commit this patch as is > anyway and had that all queued up. But then I had to go to a meeting > and when I came out I discovered that Tom had done this: Sorry, it didn't occur to me that that would impinge on what you were doing over here ... though in retrospect I should have thought of it. > I don't know whether that's a bug fix for the existing code or some > new bit of functionality that \dconfig requires and nothing else > needs. Well, \dconfig needs it because it would like foo.bar to get processed as just a name. But I think it's a bug fix because as things stood, if the caller doesn't provide a schemavar and the pattern contains a dot, the code just silently throws away the dot and all to the left. That doesn't seem very sane, even if it is a longstanding behavior. regards, tom lane
> On Apr 7, 2022, at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> >> I don't know whether that's a bug fix for the existing code or some >> new bit of functionality that \dconfig requires and nothing else >> needs. > > Well, \dconfig needs it because it would like foo.bar to get processed > as just a name. But I think it's a bug fix because as things stood, > if the caller doesn't provide a schemavar and the pattern contains a > dot, the code just silently throws away the dot and all to the left. > That doesn't seem very sane, even if it is a longstanding behavior. The patch submitted changes processSQLNamePattern() to return a dot count by reference. It's up to the caller to decidewhether to raise an error. If you pass in no schemavar, and you get back dotcnt=2, you know it parsed it as a twopart pattern, and you can pg_fatal(...) or ereport(ERROR, ...) or whatever. It looks like I'll need to post a new version of the patch with an argument telling the function to ignore dots, but I'mnot prepared to say that for sure. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes: > The patch submitted changes processSQLNamePattern() to return a dot count by reference. It's up to the caller to decidewhether to raise an error. If you pass in no schemavar, and you get back dotcnt=2, you know it parsed it as a twopart pattern, and you can pg_fatal(...) or ereport(ERROR, ...) or whatever. Well, I'm not telling Robert what to do, but I wouldn't accept that API. It requires duplicative error-handling code at every call site and is an open invitation to omitting necessary error checks. Possibly a better idea is to add an enum argument telling the function what to do (parse the whole thing as one name regardless of dots, parse as two names if there's a dot, throw error if there's a dot, etc etc as needed by existing call sites). Perhaps some of the existing arguments could be merged into such an enum, too. regards, tom lane
On Thu, Apr 7, 2022 at 7:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Mark Dilger <mark.dilger@enterprisedb.com> writes: > > The patch submitted changes processSQLNamePattern() to return a dot count by reference. It's up to the caller to decidewhether to raise an error. If you pass in no schemavar, and you get back dotcnt=2, you know it parsed it as a twopart pattern, and you can pg_fatal(...) or ereport(ERROR, ...) or whatever. > > Well, I'm not telling Robert what to do, but I wouldn't accept that > API. It requires duplicative error-handling code at every call site > and is an open invitation to omitting necessary error checks. > > Possibly a better idea is to add an enum argument telling the function > what to do (parse the whole thing as one name regardless of dots, > parse as two names if there's a dot, throw error if there's a dot, > etc etc as needed by existing call sites). Perhaps some of the > existing arguments could be merged into such an enum, too. I hadn't considered that approach, but I don't think it works very well, because front-end error handling is so inconsistent. From the patch: + pg_log_error("improper relation name (too many dotted names): %s", pattern); + exit(2); + fatal("improper qualified name (too many dotted names): %s", + cell->val); + pg_log_error("improper qualified name (too many dotted names): %s", + cell->val); + PQfinish(conn); + exit_nicely(1); + pg_log_error("improper qualified name (too many dotted names): %s", + pattern); + termPQExpBuffer(&dbbuf); + return false; Come to think of it, maybe the error text there could stand some bikeshedding, but AFAICS there's not much to be done about the fact that one caller wants pg_log_error + exit(2), another wants fatal(), a third PQfinish(conn) and exit_nicely(), and the last termPQExpBuffer() and return false. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Apr 07, 2022 at 10:26:18PM -0400, Robert Haas wrote: > + pg_log_error("improper relation name (too many dotted names): %s", pattern); > > Come to think of it, maybe the error text there could stand some > bikeshedding, but AFAICS AFAICT the error text deliberately matches this, which I mentioned seems to me the strongest argument for supporting \d datname.nspname.relname ts=# SELECT 'a.a.a.a'::regclass; ERROR: 42601: improper relation name (too many dotted names): a.a.a.a LINE 1: SELECT 'a.a.a.a'::regclass; ^ LOCATION: makeRangeVarFromNameList, namespace.c:3129
On Thu, 7 Apr 2022 at 22:32, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Apr 7, 2022 at 7:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Possibly a better idea is to add an enum argument telling the function > > what to do (parse the whole thing as one name regardless of dots, > > parse as two names if there's a dot, throw error if there's a dot, > > etc etc as needed by existing call sites). Perhaps some of the > > existing arguments could be merged into such an enum, too. > > AFAICS there's not much to be done about the fact > that one caller wants pg_log_error + exit(2), another wants fatal(), a > third PQfinish(conn) and exit_nicely(), and the last termPQExpBuffer() > and return false. That doesn't seem to be entirely inconsistent with what Tom describes. Instead of "throw an error" the function would return an error and possibly some extra info which the caller would use to handle the error appropriately. It still has the nice property that the decision that it is in fact an error would be made inside the parsing function based on the enum declaring what's intended. And it wouldn't return a possibly bogus parsing with information the caller might use to infer it isn't what was desired (or fail to). -- greg
On Thu, Apr 7, 2022 at 11:40 PM Greg Stark <stark@mit.edu> wrote: > That doesn't seem to be entirely inconsistent with what Tom describes. > Instead of "throw an error" the function would return an error and > possibly some extra info which the caller would use to handle the > error appropriately. I don't personally see how we're going to come out ahead with that approach, but if you or Tom or someone else want to put something together, that's fine with me. I'm not stuck on this approach, I just don't see how we come out ahead with the type of thing you're talking about. I mean we could return the error text, but it's only to a handful of places, so it just doesn't really seem like a win over what the patch is already doing. -- Robert Haas EDB: http://www.enterprisedb.com
> On Apr 8, 2022, at 4:11 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > I don't personally see how we're going to come out ahead with that > approach, but if you or Tom or someone else want to put something > together, that's fine with me. I'm not stuck on this approach, I just > don't see how we come out ahead with the type of thing you're talking > about. I mean we could return the error text, but it's only to a > handful of places, so it just doesn't really seem like a win over what > the patch is already doing. Since there hasn't been any agreement on that point, I've just rebased the patch to apply cleanly against the current master: — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, Apr 18, 2022 at 3:39 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > Since there hasn't been any agreement on that point, I've just rebased the patch to apply cleanly against the current master: This looks OK to me. There may be better ways to do some of it, but there's no rule against further improving the code later. Also, since the issue was introduced in v14, we probably shouldn't wait forever to do something about it. However, there is a procedural issue here now that we are past feature freeze. I think someone could defensibly take any of the following positions: (A) This is a new feature. Wait for v16. (B) This is a bug fix. Commit it now and back-patch to v14. (C) This is a cleanup that is OK to put into v15 even after feature freeze but since it is a behavior change we shouldn't back-patch it. I vote for (C). What do other people think? -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Apr 19, 2022 at 7:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 18, 2022 at 3:39 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Since there hasn't been any agreement on that point, I've just rebased the patch to apply cleanly against the current master:
This looks OK to me. There may be better ways to do some of it, but
there's no rule against further improving the code later. Also, since
the issue was introduced in v14, we probably shouldn't wait forever to
do something about it. However, there is a procedural issue here now
that we are past feature freeze. I think someone could defensibly take
any of the following positions:
(A) This is a new feature. Wait for v16.
(B) This is a bug fix. Commit it now and back-patch to v14.
(C) This is a cleanup that is OK to put into v15 even after feature
freeze but since it is a behavior change we shouldn't back-patch it.
I vote for (C). What do other people think?
I vote for (B). The behavioral change for v14 turns working usage patterns into errors where it should not have. It is a design bug and POLA violation that should be corrected.
"""
such that the above example was
interpreted as schema=production, relation=marketing.customers.
This turns out to be highly unintuitive to users.
interpreted as schema=production, relation=marketing.customers.
This turns out to be highly unintuitive to users.
"""
My concern here about a behavior affecting bug fix - which we allow - is reduced by the fact this feature is almost exclusively an interactive one. Which supports not having only v14, and maybe v15, behave differently than v13 and v16 when it comes to using it for expected usage patterns:
"""
We've had reports that users sometimes copy-and-paste database- and
schema-qualified relation names from the logs.
schema-qualified relation names from the logs.
"""
David J.
On Tue, Apr 19, 2022 at 10:00:01AM -0400, Robert Haas wrote: > (A) This is a new feature. Wait for v16. > (B) This is a bug fix. Commit it now and back-patch to v14. > (C) This is a cleanup that is OK to put into v15 even after feature > freeze but since it is a behavior change we shouldn't back-patch it. > > I vote for (C). What do other people think? I thought the plan was to backpatch to v14. v14 psql had an unintentional behavior change, rejecting \d datname.nspname.relname. This patch is meant to relax that change by allowing datname, but only if it matches the name of the current database ... without returning to the v13 behavior, which allowed arbitrary leading junk. -- Justin
Justin Pryzby <pryzby@telsasoft.com> writes: > On Tue, Apr 19, 2022 at 10:00:01AM -0400, Robert Haas wrote: >> (A) This is a new feature. Wait for v16. >> (B) This is a bug fix. Commit it now and back-patch to v14. >> (C) This is a cleanup that is OK to put into v15 even after feature >> freeze but since it is a behavior change we shouldn't back-patch it. >> I vote for (C). What do other people think? > I thought the plan was to backpatch to v14. > v14 psql had an unintentional behavior change, rejecting \d > datname.nspname.relname. I agree that the v14 behavior is a bug, so ordinarily I'd vote for back-patching. A possible objection to doing that is that the patch changes the APIs of processSQLNamePattern and patternToSQLRegex. We would avoid making such a change in core-backend APIs in a minor release, but I'm not certain whether there are equivalent stability concerns for src/fe_utils/. On the whole I'd vote for (B), with (C) as second choice. regards, tom lane
On 4/19/22 16:00, Robert Haas wrote: > On Mon, Apr 18, 2022 at 3:39 PM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: >> Since there hasn't been any agreement on that point, I've just rebased the patch to apply cleanly against the currentmaster: > > This looks OK to me. There may be better ways to do some of it, but > there's no rule against further improving the code later. Also, since > the issue was introduced in v14, we probably shouldn't wait forever to > do something about it. However, there is a procedural issue here now > that we are past feature freeze. I think someone could defensibly take > any of the following positions: > > (A) This is a new feature. Wait for v16. > (B) This is a bug fix. Commit it now and back-patch to v14. > (C) This is a cleanup that is OK to put into v15 even after feature > freeze but since it is a behavior change we shouldn't back-patch it. > > I vote for (C). What do other people think? I vote for (B). -- Vik Fearing
> On Apr 19, 2022, at 7:00 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > (A) This is a new feature. Wait for v16. > (B) This is a bug fix. Commit it now and back-patch to v14. > (C) This is a cleanup that is OK to put into v15 even after feature > freeze but since it is a behavior change we shouldn't back-patch it. > > I vote for (C). What do other people think? Looks like most people voted for (B). In support of that option, here are patches for master and REL_14_STABLE. Note thatI extended the tests compared to v9, which found a problem that is fixed for v10: — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, Apr 19, 2022 at 10:20 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > Looks like most people voted for (B). In support of that option, here are patches for master and REL_14_STABLE. Notethat I extended the tests compared to v9, which found a problem that is fixed for v10: OK, I committed these. I am not totally sure we've got all the problems sorted here, but I don't think that continuing to not commit anything is going to be better, so here we go. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Apr 21, 2022 at 3:55 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Apr 19, 2022 at 10:20 PM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: > > Looks like most people voted for (B). In support of that option, here are patches for master and REL_14_STABLE. Notethat I extended the tests compared to v9, which found a problem that is fixed for v10: > > OK, I committed these. I am not totally sure we've got all the > problems sorted here, but I don't think that continuing to not commit > anything is going to be better, so here we go. Looks like this somehow broke on a Windows box: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19 [14:05:49.729](0.001s) not ok 16 - pg_dumpall: option --exclude-database rejects multipart pattern ".*": matches [14:05:49.730](0.000s) [14:05:49.730](0.000s) # Failed test 'pg_dumpall: option --exclude-database rejects multipart pattern ".*": matches' # at t/002_pg_dump.pl line 3985. [14:05:49.730](0.000s) # 'pg_dumpall: error: improper qualified name (too many dotted names): .gitignore # ' # doesn't match '(?^:pg_dumpall: error: improper qualified name \\(too many dotted names\\): \\.\\*)'
On Wed, Apr 20, 2022 at 3:08 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Looks like this somehow broke on a Windows box: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19 So the issue here is that we are running this command: pg_dumpall --exclude-database .* And on that Windows machine, .* is being expanded to .gitignore, so pg_dumpall prints: pg_dumpall: error: improper qualified name (too many dotted names): .gitignore Instead of: pg_dumpall: error: improper qualified name (too many dotted names): .* I don't know why that glob-expansion only happens on jacana, and I don't know how to fix it, either. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Apr 21, 2022 at 7:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Apr 20, 2022 at 3:08 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Looks like this somehow broke on a Windows box: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19 > > So the issue here is that we are running this command: > > pg_dumpall --exclude-database .* > > And on that Windows machine, .* is being expanded to .gitignore, so > pg_dumpall prints: > > pg_dumpall: error: improper qualified name (too many dotted names): .gitignore > > Instead of: > > pg_dumpall: error: improper qualified name (too many dotted names): .* > > I don't know why that glob-expansion only happens on jacana, and I > don't know how to fix it, either. Perhaps bowerbird and jacana have different versions of IPC::Run? I see some recent-ish changes to escaping logic in here from Noah: https://github.com/toddr/IPC-Run/commits/master/lib/IPC/Run/Win32Helper.pm Looks like the older version looks for meta characters not including '*', and the later one uses Win32::ShellQuote::quote_native.
On Thu, Apr 21, 2022 at 8:38 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Thu, Apr 21, 2022 at 7:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Apr 20, 2022 at 3:08 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > Looks like this somehow broke on a Windows box: > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-04-20%2016%3A34%3A19 > > > > So the issue here is that we are running this command: > > > > pg_dumpall --exclude-database .* > > > > And on that Windows machine, .* is being expanded to .gitignore, so > > pg_dumpall prints: > > > > pg_dumpall: error: improper qualified name (too many dotted names): .gitignore > > > > Instead of: > > > > pg_dumpall: error: improper qualified name (too many dotted names): .* > > > > I don't know why that glob-expansion only happens on jacana, and I > > don't know how to fix it, either. > > Perhaps bowerbird and jacana have different versions of IPC::Run? I > see some recent-ish changes to escaping logic in here from Noah: > > https://github.com/toddr/IPC-Run/commits/master/lib/IPC/Run/Win32Helper.pm > > Looks like the older version looks for meta characters not including > '*', and the later one uses Win32::ShellQuote::quote_native. This time with Andrew in CC.