Thread: ProcessStartupPacket(): database_name and user_name truncation
Hi hackers,

Please find attached a patch to truncate (in ProcessStartupPacket()) the
port->database_name and port->user_name in such a way as to not break a
multibyte character boundary.

Indeed, currently, one could create a database this way:

postgres=# create database ääääääääääääääääääääääääääääääää;
NOTICE:  identifier "ääääääääääääääääääääääääääääääää" will be truncated to "äääääääääääääääääääääääääääääää"
CREATE DATABASE

The database name has been truncated from 64 bytes to 62 bytes thanks to
pg_mbcliplen(), which ensures that a multibyte character boundary is not broken.

postgres=# select datname, OCTET_LENGTH(datname),encoding from pg_database;
             datname             | octet_length | encoding
---------------------------------+--------------+----------
 äääääääääääääääääääääääääääääää |           62 |        6

Trying to connect with the 64 bytes name:

$ psql -d ääääääääääääääääääääääääääääääää
psql: error: connection to server on socket "/tmp/.s.PGSQL.55448" failed: FATAL:  database "äääääääääääääääääääääääääääääää" does not exist

It fails because the truncation done in ProcessStartupPacket():

"
    if (strlen(port->database_name) >= NAMEDATALEN)
        port->database_name[NAMEDATALEN - 1] = '\0';
"

does not take multibyte character boundaries into account.

On the other hand, it works when no multibyte character is involved:

postgres=# create database abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke;
NOTICE:  identifier "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke" will be truncated to "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk"
CREATE DATABASE

postgres=# select datname, OCTET_LENGTH(datname),encoding from pg_database;
                             datname                             | octet_length | encoding
-----------------------------------------------------------------+--------------+----------
 abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk |           63 |        6

The database name is truncated to 63 bytes, and then using the 64 bytes name
works:

$ psql -d abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke
psql (16beta1)
Type "help" for help.

The comment in ProcessStartupPacket() states:

"
    /*
     * Truncate given database and user names to length of a Postgres name.
     * This avoids lookup failures when overlength names are given.
     */
"

The last sentence is not right in the case of multibyte characters (as seen in
the first example).

About the patch:

As the database encoding is not known yet in ProcessStartupPacket() (and we are
not even sure that the database provided exists), the proposed patch does not
rely on pg_mbcliplen() but on pg_encoding_mbcliplen().

The proposed patch uses the client encoding, which it retrieves this way:

- use the one requested in the startup packet (if we come across it)
- use the one from the locale (if we did not find a client encoding request in
  the startup packet)
- use PG_SQL_ASCII (if none of the above has been satisfied)

Happy to discuss any other thoughts or suggestions if any.

With the proposed patch in place, using the first example above (and the 64
bytes name) we would get:

$ PGCLIENTENCODING=LATIN1 psql -d ääääääääääääääääääääääääääääääää
psql: error: connection to server on socket "/tmp/.s.PGSQL.55448" failed: FATAL:  database "äääääääääääääääääääääääääääääää" does not exist

but this one would allow us to connect:

$ PGCLIENTENCODING=UTF8 psql -d ääääääääääääääääääääääääääääääää
psql (16beta1)
Type "help" for help.

The patch does not provide a documentation update or a related TAP test (but
these could be added if we feel the need).

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
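For readers following along, the difference between the byte-oriented truncation quoted in this message and a boundary-aware clip can be sketched in standalone C. This is only an illustration and not the attached patch: utf8_seq_len(), utf8_cliplen(), truncate_bytes() and truncate_utf8() are hypothetical helpers written for this sketch, it assumes the name is UTF-8, and it does not call pg_mbcliplen() or pg_encoding_mbcliplen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64          /* default build value; a name holds at most 63 bytes */

/* Hypothetical helper: byte length of the UTF-8 sequence that starts with c. */
static size_t
utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: treat it as one byte */
}

/*
 * Hypothetical stand-in for a multibyte-aware clip: returns the longest
 * prefix length <= limit that ends on a UTF-8 character boundary.
 */
static size_t
utf8_cliplen(const char *s, size_t len, size_t limit)
{
    size_t      clipped = 0;

    assert(s != NULL);
    while (clipped < len)
    {
        size_t      l = utf8_seq_len((unsigned char) s[clipped]);

        if (clipped + l > limit || clipped + l > len)
            break;
        clipped += l;
    }
    return clipped;
}

/* Byte-oriented truncation, mirroring the snippet quoted in the message. */
static void
truncate_bytes(char *name)
{
    if (strlen(name) >= NAMEDATALEN)
        name[NAMEDATALEN - 1] = '\0';
}

/* Boundary-aware truncation built on the hypothetical helper above. */
static void
truncate_utf8(char *name)
{
    size_t      len = strlen(name);

    if (len >= NAMEDATALEN)
        name[utf8_cliplen(name, len, NAMEDATALEN - 1)] = '\0';
}
```

With a name made of 32 copies of "ä" (2 bytes each in UTF-8, 64 bytes in total), truncate_bytes() keeps 63 bytes and leaves a dangling lead byte at the end, whereas truncate_utf8() keeps 62 bytes, which matches the octet_length reported for the stored database name.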
At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
> Trying to connect with the 64 bytes name:
>
> $ psql -d ääääääääääääääääääääääääääääääää
> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
> exist

IMHO, I'm not sure we should allow connections without the exact name
being provided. In that sense, I think we might want to consider
outright rejecting the establishment of a connection when the given
database name doesn't fit the startup packet, since the database with
the exact given name cannot be found.

While it is somewhat off-topic, I cannot establish a connection if the
console encoding differs from the template database even if I provide
the identical database name. (I don't mean I want that behavior to be
"fix"ed.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>> Trying to connect with the 64 bytes name:
>> $ psql -d ääääääääääääääääääääääääääääääää
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
>> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
>> exist

> IMHO, I'm not sure we should allow connections without the exact name
> being provided. In that sense, I think we might want to consider
> outright rejecting the establishment of a connection when the given
> database name doesn't fit the startup packet, since the database with
> the exact given name cannot be found.

I think I agree. I don't like the proposed patch at all, because it's
making completely unsupportable assumptions about what encoding the
names are given in. Simply failing to match when a name is overlength
sounds safer.

(Our whole story about what is the encoding of names in shared catalogs
is a mess. But this particular point doesn't seem like the place to
start if you want to clean that up.)

regards, tom lane
Hi,

On 6/21/23 3:43 PM, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>> At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>>> Trying to connect with the 64 bytes name:
>>> $ psql -d ääääääääääääääääääääääääääääääää
>>> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
>>> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
>>> exist
>
>> IMHO, I'm not sure we should allow connections without the exact name
>> being provided. In that sense, I think we might want to consider
>> outright rejecting the establishment of a connection when the given
>> database name doesn't fit the startup packet, since the database with
>> the exact given name cannot be found.
>
> I think I agree. I don't like the proposed patch at all, because it's
> making completely unsupportable assumptions about what encoding the
> names are given in. Simply failing to match when a name is overlength
> sounds safer.
>

Yeah, that's another and "cleaner" option.

I'll propose a patch to make it fail even for the non-multibyte case then (so
that the multibyte and non-multibyte cases behave the same, i.e. fail when an
overlength name is detected).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jun 21, 2023 at 09:43:38AM -0400, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>> IMHO, I'm not sure we should allow connections without the exact name
>> being provided. In that sense, I think we might want to consider
>> outright rejecting the establishment of a connection when the given
>> database name doesn't fit the startup packet, since the database with
>> the exact given name cannot be found.
>
> I think I agree. I don't like the proposed patch at all, because it's
> making completely unsupportable assumptions about what encoding the
> names are given in. Simply failing to match when a name is overlength
> sounds safer.

+1. Even if these assumptions were supportable, IMHO it's probably not
worth the added complexity to keep the truncation consistent with CREATE
ROLE/DATABASE.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On 6/21/23 4:22 PM, Drouvot, Bertrand wrote:
> Hi,
>
> On 6/21/23 3:43 PM, Tom Lane wrote:
>> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>>> At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>>>> Trying to connect with the 64 bytes name:
>>>> $ psql -d ääääääääääääääääääääääääääääääää
>>>> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
>>>> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
>>>> exist
>>
>>> IMHO, I'm not sure we should allow connections without the exact name
>>> being provided. In that sense, I think we might want to consider
>>> outright rejecting the establishment of a connection when the given
>>> database name doesn't fit the startup packet, since the database with
>>> the exact given name cannot be found.
>>
>> I think I agree. I don't like the proposed patch at all, because it's
>> making completely unsupportable assumptions about what encoding the
>> names are given in. Simply failing to match when a name is overlength
>> sounds safer.
>>
>
> Yeah, that's another and "cleaner" option.
>
> I'll propose a patch to make it failing even for the non multibyte case then (
> so that multibyte and non multibyte behaves the same aka failing in case of overlength
> name is detected).

Please find attached a patch doing so (which is basically a revert of d18c1d1f51).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jun 21, 2023 at 09:02:49PM +0200, Drouvot, Bertrand wrote:
> Please find attached a patch doing so (which is basically a revert of d18c1d1f51).

LGTM. I think this can wait for v17 since the current behavior has been
around since 2001 and AFAIK this is the first report. While it's arguably
a bug fix, the patch also breaks some cases that work today.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Jun 21, 2023 at 12:55:15PM -0700, Nathan Bossart wrote:
> LGTM. I think this can wait for v17 since the current behavior has been
> around since 2001 and AFAIK this is the first report. While it's arguably
> a bug fix, the patch also breaks some cases that work today.

Agreed that anything discussed on this thread does not warrant a
backpatch.

--
Michael
Hi,

On 6/22/23 1:37 AM, Michael Paquier wrote:
> On Wed, Jun 21, 2023 at 12:55:15PM -0700, Nathan Bossart wrote:
>> LGTM. I think this can wait for v17 since the current behavior has been
>> around since 2001 and AFAIK this is the first report. While it's arguably
>> a bug fix, the patch also breaks some cases that work today.
>
> Agreed that anything discussed on this thread does not warrant a
> backpatch.

Fully agree, the CF entry [1] has been tagged as "Target Version 17".

[1] https://commitfest.postgresql.org/43/4383/

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
After taking another look at this, I wonder if it'd be better to fail as
soon as we see the database or user name is too long instead of lugging
them around when authentication is destined to fail.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
> After taking another look at this, I wonder if it'd be better to fail as
> soon as we see the database or user name is too long instead of lugging
> them around when authentication is destined to fail.

If we're agreed that we aren't going to truncate these identifiers,
that seems like a reasonable way to handle it.

regards, tom lane
Hi,

On 6/30/23 5:54 PM, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> After taking another look at this, I wonder if it'd be better to fail as
>> soon as we see the database or user name is too long instead of lugging
>> them around when authentication is destined to fail.
>
> If we're agreed that we aren't going to truncate these identifiers,
> that seems like a reasonable way to handle it.
>

Yeah agree, thanks Nathan for the idea.
I'll work on a new patch version proposal.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,

On 6/30/23 7:32 PM, Drouvot, Bertrand wrote:
> Hi,
>
> On 6/30/23 5:54 PM, Tom Lane wrote:
>> Nathan Bossart <nathandbossart@gmail.com> writes:
>>> After taking another look at this, I wonder if it'd be better to fail as
>>> soon as we see the database or user name is too long instead of lugging
>>> them around when authentication is destined to fail.
>>
>> If we're agreed that we aren't going to truncate these identifiers,
>> that seems like a reasonable way to handle it.
>>
>
> Yeah agree, thanks Nathan for the idea.
> I'll work on a new patch version proposal.
>

Please find V2 attached, which fails as soon as the database name or the user
name is detected as overlength.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
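As a rough illustration of the "fail as soon as the name is overlength" idea, here is a minimal standalone C sketch. It is not the V2 patch, whose contents are not shown in this thread: overlength_field() is a hypothetical helper invented for the example, and it only reports which field is too long, where server code would presumably raise a FATAL error instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64          /* default build value; a name holds at most 63 bytes */

/*
 * Hypothetical check: returns NULL when both names fit in a Postgres name,
 * otherwise a label for the first field that is too long.  No truncation is
 * attempted, so multibyte and single-byte names are treated the same way.
 */
static const char *
overlength_field(const char *database_name, const char *user_name)
{
    assert(database_name != NULL && user_name != NULL);

    if (strlen(database_name) >= NAMEDATALEN)
        return "database";
    if (strlen(user_name) >= NAMEDATALEN)
        return "user";
    return NULL;
}
```

Because the check is purely on byte length, it needs no knowledge of the encoding the client used for the name, which is the concern raised earlier in the thread about the first patch.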
At Fri, 30 Jun 2023 19:32:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
> Hi,
>
> On 6/30/23 5:54 PM, Tom Lane wrote:
> > Nathan Bossart <nathandbossart@gmail.com> writes:
> >> After taking another look at this, I wonder if it'd be better to fail
> >> as soon as we see the database or user name is too long instead of
> >> lugging them around when authentication is destined to fail.

For the record, if I understand Nathan correctly, it is what I
suggested in my initial post. If this is correct, +1 for the suggestion.

me> I think we might want to consider outright rejecting the
me> establishment of a connection when the given database name doesn't
me> fit the startup packet

> > If we're agreed that we aren't going to truncate these identifiers,
> > that seems like a reasonable way to handle it.
> >
>
> Yeah agree, thanks Nathan for the idea.
> I'll work on a new patch version proposal.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
At Mon, 03 Jul 2023 10:50:45 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> For the record, if I understand Nathan correctly, it is what I
> suggested in my initial post. If this is correct, +1 for the suggestion.
>
> me> I think we might want to consider outright rejecting the
> me> establishment of a connection when the given database name doesn't
> me> fit the startup packet

Mmm. That's a bit wrong. "doesn't fit the startup packet" should read
"is too long as a database name".

At Sat, 1 Jul 2023 16:02:06 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
> Please find V2 attached where it's failing as soon as the database
> name or user name are detected as overlength.

I found another errcode, "ERRCODE_INVALID_ROLE_SPECIFICATION". I don't
see a clear distinction between the usages of the two, but I think
.._ROLE_.. might be a better fit.

ERRCODE_INVALID_ROLE_SPECIFICATION:
  auth.c:1507: "could not translate name"
  auth.c:1526: "could not translate name"
  auth.c:1539: "realm name too long"
  auth.c:1554: "translated account name too long"

ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION:
  postmaster.c:2268: "no PostgreSQL user name specified in startup packet"
  miscinit.c:756: "role \"%s\" does not exist"
  miscinit.c:764: "role with OID %u does not exist"
  miscinit.c:794: "role \"%s\" is not permitted to log in"
  auth.c:420: "connection requires a valid client certificate"
  auth.c:461,468,528,536: "pg_hba.conf rejects ..."
  auth.c:878: "MD5 authentication is not supported when \"db_user_namespace\" is enabled"
  auth-scram.c:1016: "SCRAM channel binding negotiation error"
  auth-scram.c:1349: "SCRAM channel binding check failed"

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Sat, Jul 01, 2023 at 04:02:06PM +0200, Drouvot, Bertrand wrote:
> Please find V2 attached where it's failing as soon as the database name or
> user name are detected as overlength.

Thanks, Bertrand. I chickened out and ended up committing v1 for now
(i.e., simply removing the truncation code). I didn't like the idea of
trying to keep the new error messages consistent with code in faraway
files, and the startup packet length limit is already pretty aggressive, so
I'm a little less concerned about lugging around long names. Plus, I think
v2 had some subtle interactions with db_user_namespace (maybe for the
better), but I didn't spend too much time looking at that since
db_user_namespace will likely be removed soon.

If anyone disagrees and wants to see the FATALs emitted from
ProcessStartupPacket() directly, please let me know and we can work on
adding them in a follow-up patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
> Thanks, Bertrand. I chickened out and ended up committing v1 for now
> (i.e., simply removing the truncation code).

WFM.

> If anyone disagrees and wants to see the FATALs emitted from
> ProcessStartupPacket() directly, please let me know and we can work on
> adding them in a follow-up patch.

I think the new behavior is fine.

regards, tom lane
Hi,

On 7/3/23 10:34 PM, Nathan Bossart wrote:
> On Sat, Jul 01, 2023 at 04:02:06PM +0200, Drouvot, Bertrand wrote:
>> Please find V2 attached where it's failing as soon as the database name or
>> user name are detected as overlength.
>
> Thanks, Bertrand. I chickened out and ended up committing v1 for now
> (i.e., simply removing the truncation code). I didn't like the idea of
> trying to keep the new error messages consistent with code in faraway
> files, and the startup packet length limit is already pretty aggressive, so
> I'm a little less concerned about lugging around long names. Plus, I think
> v2 had some subtle interactions with db_user_namespace (maybe for the
> better), but I didn't spend too much time looking at that since
> db_user_namespace will likely be removed soon.

Thanks Nathan for the feedback and explanations, I think that makes full sense.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com