Thread: ProcessStartupPacket(): database_name and user_name truncation

ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

21 June 2023, 07:43:50

Hi hackers,

Please find attached a patch to truncate (in ProcessStartupPacket())
the port->database_name and port->user_name in a way that does not break
a multibyte character boundary.

Indeed, currently, one could create a database that way:

postgres=# create database ääääääääääääääääääääääääääääääää;
NOTICE:  identifier "ääääääääääääääääääääääääääääääää" will be truncated to "äääääääääääääääääääääääääääääää"
CREATE DATABASE

The database name has been truncated from 64 bytes to 62 bytes by pg_mbcliplen(),
which ensures that the truncation does not break a multibyte character boundary.

postgres=# select datname, OCTET_LENGTH(datname),encoding from pg_database;
              datname             | octet_length | encoding
---------------------------------+--------------+----------
  äääääääääääääääääääääääääääääää |           62 |        6
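To make the 64 -> 62 byte arithmetic concrete, here is a minimal sketch of boundary-aware clipping. It is not pg_mbcliplen(): utf8_clip_len() is a hypothetical helper that assumes the input is valid UTF-8, whereas the real function works from the database encoding.

```c
#include <assert.h>
#include <stddef.h>

#define NAMEDATALEN 64			/* default build value; names hold at most 63 bytes */

/*
 * Hypothetical helper (not part of PostgreSQL): return the largest length
 * <= limit that ends on a UTF-8 character boundary.  Assumes s is valid
 * UTF-8 and holds at least len bytes.
 */
size_t
utf8_clip_len(const char *s, size_t len, size_t limit)
{
	size_t		clip;

	assert(s != NULL);

	if (len <= limit)
		return len;

	clip = limit;
	/* Back up while the first dropped byte is a continuation byte (10xxxxxx). */
	while (clip > 0 && ((unsigned char) s[clip] & 0xC0) == 0x80)
		clip--;

	return clip;
}
```

With 32 "ä" characters (two bytes each in UTF-8) and a limit of NAMEDATALEN - 1, byte 63 is the second half of the last character, so the sketch backs up to 62 bytes. With a 64-byte ASCII name it returns 63.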

Trying to connect with the 64 bytes name:

$ psql -d ääääääääääääääääääääääääääääääää
psql: error: connection to server on socket "/tmp/.s.PGSQL.55448" failed: FATAL:  database
"äääääääääääääääääääääääääääääää" does not exist

It fails because the truncation done in ProcessStartupPacket():

"
if (strlen(port->database_name) >= NAMEDATALEN)
    port->database_name[NAMEDATALEN - 1] = '\0';
"

does not take multibyte character boundaries into account.
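The effect of that byte-based truncation can be reproduced outside the server. The sketch below is only an illustration: both function names are hypothetical, and the dangling-byte check is a simplification that only looks at the last byte of a UTF-8 string.

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 64			/* default build value */

/*
 * Hypothetical stand-in for the truncation quoted above, applied to a
 * caller-supplied buffer.  Returns the resulting length in bytes.
 */
size_t
truncate_like_startup_packet(char *name)
{
	assert(name != NULL);

	if (strlen(name) >= NAMEDATALEN)
		name[NAMEDATALEN - 1] = '\0';

	return strlen(name);
}

/*
 * Simplified check: returns 1 if the last byte is a UTF-8 lead byte
 * (11xxxxxx), meaning its continuation bytes were cut off.
 */
int
ends_with_dangling_lead_byte(const char *name)
{
	size_t		len = strlen(name);

	return len > 0 && ((unsigned char) name[len - 1] & 0xC0) == 0xC0;
}
```

For the 64-byte name made of 32 "ä" characters, this leaves 63 bytes ending in a lone 0xC3 lead byte, which cannot equal the 62-byte name stored in pg_database. For a 64-byte ASCII name it leaves the same 63 bytes that CREATE DATABASE stored.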

On the other hand, it works when no multibyte characters are involved:

postgres=# create database abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke;
NOTICE:  identifier "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke" will be truncated to
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk"
CREATE DATABASE

postgres=# select datname, OCTET_LENGTH(datname),encoding from pg_database;
                              datname                             | octet_length | encoding
-----------------------------------------------------------------+--------------+----------
  abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk |           63 |        6

The database name is truncated to 63 bytes, and connecting with the 64-byte name then works:

$ psql -d abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijke
psql (16beta1)
Type "help" for help.

The comment in ProcessStartupPacket() states:

"
     /*
      * Truncate given database and user names to length of a Postgres name.
      * This avoids lookup failures when overlength names are given.
      */
"

The last sentence does not hold for multibyte characters (as seen
in the first example).

About the patch:

As the database encoding is not known yet in ProcessStartupPacket() (and
we are not even sure that the database provided exists), the proposed
patch does not rely on pg_mbcliplen() but on pg_encoding_mbcliplen().

The proposed patch uses the client encoding, which it determines as follows:

- use the one requested in the startup packet (if we come across it)
- use the one from the locale (if we did not find a client encoding request
in the startup packet)
- use PG_SQL_ASCII (if none of the above have been satisfied)
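
The fallback order above can be sketched as follows. This is an illustration only, not code from the attached patch: choose_clip_encoding() is a hypothetical name, and encodings are represented as plain strings here rather than the server's encoding identifiers.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the fallback order: startup packet first, then
 * the locale, then SQL_ASCII.  NULL or empty means "not available".
 */
const char *
choose_clip_encoding(const char *startup_packet_encoding,
					 const char *locale_encoding)
{
	if (startup_packet_encoding != NULL && startup_packet_encoding[0] != '\0')
		return startup_packet_encoding;

	if (locale_encoding != NULL && locale_encoding[0] != '\0')
		return locale_encoding;

	return "SQL_ASCII";
}
```

Note that whichever encoding is picked this way is only a guess about how the name bytes were produced by the client.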

Happy to discuss any other thoughts or suggestions if any.

With the proposed patch in place, using the first example above (and the
64 bytes name) we would get:

$ PGCLIENTENCODING=LATIN1 psql -d ääääääääääääääääääääääääääääääää
psql: error: connection to server on socket "/tmp/.s.PGSQL.55448" failed: FATAL:  database
"äääääääääääääääääääääääääääääää" does not exist

but this one would allow us to connect:

$ PGCLIENTENCODING=UTF8 psql -d ääääääääääääääääääääääääääääääää
psql (16beta1)
Type "help" for help.

The patch does not include a documentation update or a related TAP test (these could be
added if we feel the need).

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-multibyte-truncation-for-database-and-user-name.patch

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Kyotaro Horiguchi

Date:

21 June 2023, 08:54:59

At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
> Trying to connect with the 64 bytes name:
>
> $ psql -d ääääääääääääääääääääääääääääääää
> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
> exist

IMHO, I'm not sure we should allow connections without the exact name
being provided. In that sense, I think we might want to consider
outright rejecting the establishment of a connection when the given
database name doesn't fit the startup packet, since the database with
the exact given name cannot be found.

While it is somewhat off-topic, I cannot establish a connection if the
console encoding differs from the template database even if I provide
the identical database name. (I don't mean I want that behavior to be
"fix"ed.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Tom Lane

Date:

21 June 2023, 13:43:38

Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>> Trying to connect with the 64 bytes name:
>> $ psql -d ääääääääääääääääääääääääääääääää
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
>> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
>> exist

> IMHO, I'm not sure we should allow connections without the exact name
> being provided. In that sense, I think we might want to consider
> outright rejecting the establishment of a connection when the given
> database name doesn't fit the startup packet, since the database with
> the exact given name cannot be found.

I think I agree.  I don't like the proposed patch at all, because it's
making completely unsupportable assumptions about what encoding the
names are given in.  Simply failing to match when a name is overlength
sounds safer.

(Our whole story about what is the encoding of names in shared catalogs
is a mess.  But this particular point doesn't seem like the place to
start if you want to clean that up.)

            regards, tom lane

Re: ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

21 June 2023, 14:22:47

Hi,

On 6/21/23 3:43 PM, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>> At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>>> Trying to connect with the 64 bytes name:
>>> $ psql -d ääääääääääääääääääääääääääääääää
>>> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
>>> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
>>> exist
> 
>> IMHO, I'm not sure we should allow connections without the exact name
>> being provided. In that sense, I think we might want to consider
>> outright rejecting the establishment of a connection when the given
>> database name doesn't fit the startup packet, since the database with
>> the exact given name cannot be found.
> 
> I think I agree.  I don't like the proposed patch at all, because it's
> making completely unsupportable assumptions about what encoding the
> names are given in.  Simply failing to match when a name is overlength
> sounds safer.
> 

Yeah, that's another and "cleaner" option.

I'll propose a patch to make it fail even in the non-multibyte case then (so that
multibyte and non-multibyte names behave the same, i.e. fail when an overlength
name is detected).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Nathan Bossart

Date:

21 June 2023, 15:04:02

On Wed, Jun 21, 2023 at 09:43:38AM -0400, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>> IMHO, I'm not sure we should allow connections without the exact name
>> being provided. In that sense, I think we might want to consider
>> outright rejecting the establishment of a connection when the given
>> database name doesn't fit the startup packet, since the database with
>> the exact given name cannot be found.
> 
> I think I agree.  I don't like the proposed patch at all, because it's
> making completely unsupportable assumptions about what encoding the
> names are given in.  Simply failing to match when a name is overlength
> sounds safer.

+1.  Even if these assumptions were supportable, IMHO it's probably not
worth the added complexity to keep the truncation consistent with CREATE
ROLE/DATABASE.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

21 June 2023, 19:02:49


On 6/21/23 4:22 PM, Drouvot, Bertrand wrote:
> Hi,
> 
> On 6/21/23 3:43 PM, Tom Lane wrote:
>> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>>> At Wed, 21 Jun 2023 09:43:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in
>>>> Trying to connect with the 64 bytes name:
>>>> $ psql -d ääääääääääääääääääääääääääääääää
>>>> psql: error: connection to server on socket "/tmp/.s.PGSQL.55448"
>>>> failed: FATAL: database "äääääääääääääääääääääääääääääää" does not
>>>> exist
>>
>>> IMHO, I'm not sure we should allow connections without the exact name
>>> being provided. In that sense, I think we might want to consider
>>> outright rejecting the establishment of a connection when the given
>>> database name doesn't fit the startup packet, since the database with
>>> the exact given name cannot be found.
>>
>> I think I agree.  I don't like the proposed patch at all, because it's
>> making completely unsupportable assumptions about what encoding the
>> names are given in.  Simply failing to match when a name is overlength
>> sounds safer.
>>
> 
> Yeah, that's another and "cleaner" option.
> 
> I'll propose a patch to make it fail even in the non-multibyte case then (so that
> multibyte and non-multibyte names behave the same, i.e. fail when an overlength
> name is detected).

Please find attached a patch doing so (which is basically a revert of d18c1d1f51).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-Reject-incoming-username-and-database-name-in-cas.patch

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Nathan Bossart

Date:

21 June 2023, 19:55:15

On Wed, Jun 21, 2023 at 09:02:49PM +0200, Drouvot, Bertrand wrote:
> Please find attached a patch doing so (which is basically a revert of d18c1d1f51).

LGTM.  I think this can wait for v17 since the current behavior has been
around since 2001 and AFAIK this is the first report.  While it's arguably
a bug fix, the patch also breaks some cases that work today.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Michael Paquier

Date:

21 June 2023, 23:37:26

On Wed, Jun 21, 2023 at 12:55:15PM -0700, Nathan Bossart wrote:
> LGTM.  I think this can wait for v17 since the current behavior has been
> around since 2001 and AFAIK this is the first report.  While it's arguably
> a bug fix, the patch also breaks some cases that work today.

Agreed that anything discussed on this thread does not warrant a
backpatch.
--
Michael

Re: ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

22 June 2023, 06:10:30

Hi,

On 6/22/23 1:37 AM, Michael Paquier wrote:
> On Wed, Jun 21, 2023 at 12:55:15PM -0700, Nathan Bossart wrote:
>> LGTM.  I think this can wait for v17 since the current behavior has been
>> around since 2001 and AFAIK this is the first report.  While it's arguably
>> a bug fix, the patch also breaks some cases that work today.
> 
> Agreed that anything discussed on this thread does not warrant a
> backpatch.

Fully agree, the CF entry [1] has been tagged as "Target Version 17".

[1] https://commitfest.postgresql.org/43/4383/

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Nathan Bossart

Date:

30 June 2023, 15:42:18

After taking another look at this, I wonder if it'd be better to fail as
soon as we see the database or user name is too long instead of lugging
them around when authentication is destined to fail.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Tom Lane

Date:

30 June 2023, 15:54:12

Nathan Bossart <nathandbossart@gmail.com> writes:
> After taking another look at this, I wonder if it'd be better to fail as
> soon as we see the database or user name is too long instead of lugging
> them around when authentication is destined to fail.

If we're agreed that we aren't going to truncate these identifiers,
that seems like a reasonable way to handle it.

            regards, tom lane

Re: ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

30 June 2023, 17:32:50

Hi,

On 6/30/23 5:54 PM, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> After taking another look at this, I wonder if it'd be better to fail as
>> soon as we see the database or user name is too long instead of lugging
>> them around when authentication is destined to fail.
> 
> If we're agreed that we aren't going to truncate these identifiers,
> that seems like a reasonable way to handle it.
> 

Yeah agree, thanks Nathan for the idea.
I'll work on a new patch version proposal.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

01 July 2023, 14:02:06

Hi,

On 6/30/23 7:32 PM, Drouvot, Bertrand wrote:
> Hi,
> 
> On 6/30/23 5:54 PM, Tom Lane wrote:
>> Nathan Bossart <nathandbossart@gmail.com> writes:
>>> After taking another look at this, I wonder if it'd be better to fail as
>>> soon as we see the database or user name is too long instead of lugging
>>> them around when authentication is destined to fail.
>>
>> If we're agreed that we aren't going to truncate these identifiers,
>> that seems like a reasonable way to handle it.
>>
> 
> Yeah agree, thanks Nathan for the idea.
> I'll work on a new patch version proposal.
> 

Please find V2 attached where it's failing as soon as the database name or
user name are detected as overlength.
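
The early-rejection idea boils down to a length check done before any lookup. The sketch below shows only that check, under the assumption of the default NAMEDATALEN of 64. name_fits() is a hypothetical helper, and the error reporting the server would do on failure is deliberately left out.

```c
#include <stdbool.h>
#include <string.h>

#define NAMEDATALEN 64			/* default build value */

/*
 * Hypothetical check: a database or user name is acceptable only if it
 * fits in NAMEDATALEN - 1 bytes.  No truncation is attempted.
 */
bool
name_fits(const char *name)
{
	return name != NULL && strlen(name) < NAMEDATALEN;
}
```

Because the check counts bytes and never cuts the string, it behaves the same for multibyte and non-multibyte names and needs no knowledge of the encoding.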

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v2-0001-Reject-incoming-username-and-database-name-in-cas.patch

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Kyotaro Horiguchi

Date:

03 July 2023, 01:50:45

At Fri, 30 Jun 2023 19:32:50 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in 
> Hi,
> 
> On 6/30/23 5:54 PM, Tom Lane wrote:
> > Nathan Bossart <nathandbossart@gmail.com> writes:
> >> After taking another look at this, I wonder if it'd be better to
> >> fail as soon as we see the database or user name is too long
> >> instead of lugging them around when authentication is destined to
> >> fail.

For the record, if I understand Nathan correctly, it is what I
suggested in my initial post. If this is correct, +1 for the suggestion.

me> I think we might want to consider outright rejecting the
me> establishment of a connection when the given database name doesn't
me> fit the startup packet

> > If we're agreed that we aren't going to truncate these identifiers,
> > that seems like a reasonable way to handle it.
> > 
> 
> Yeah agree, thanks Nathan for the idea.
> I'll work on a new patch version proposal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Kyotaro Horiguchi

Date:

03 July 2023, 02:09:58

At Mon, 03 Jul 2023 10:50:45 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> For the record, if I understand Nathan correctly, it is what I
> suggested in my initial post. If this is correct, +1 for the suggestion.
> 
> me> I think we might want to consider outright rejecting the
> me> establishment of a connection when the given database name doesn't
> me> fit the startup packet

Mmm. That's a bit wrong. "doesn't fit the startup packet" should be "is
too long as a database name".


At Sat, 1 Jul 2023 16:02:06 +0200, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in 
> Please find V2 attached where it's failing as soon as the database
> name or user name are detected as overlength.

I found another errcode, "ERRCODE_INVALID_ROLE_SPECIFICATION". I don't
see a clear distinction between the usages of the two, but I think
.._ROLE_.. might be a better fit.


ERRCODE_INVALID_ROLE_SPECIFICATION:
  auth.c:1507:  "could not translate name"
  auth.c:1526:  "could not translate name"
  auth.c:1539:  "realm name too long"
  auth.c:1554:  "translated account name too long"

ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION:
postmaster.c:2268:  "no PostgreSQL user name specified in startup packet"
miscinit.c:756:     "role \"%s\" does not exist"
miscinit.c:764:     "role with OID %u does not exist"
miscinit.c:794:     "role \"%s\" is not permitted to log in"
auth.c:420:         "connection requires a valid client certificate"
auth.c:461,468,528,536:  "pg_hba.conf rejects ..."
auth.c:878:         "MD5 authentication is not supported when \"db_user_namespace\" is enabled"
auth-scram.c:1016:  "SCRAM channel binding negotiation error"
auth-scram.c:1349:  "SCRAM channel binding check failed"

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Nathan Bossart

Date:

03 July 2023, 20:34:08

On Sat, Jul 01, 2023 at 04:02:06PM +0200, Drouvot, Bertrand wrote:
> Please find V2 attached where it's failing as soon as the database name or
> user name are detected as overlength.

Thanks, Bertrand.  I chickened out and ended up committing v1 for now
(i.e., simply removing the truncation code).  I didn't like the idea of
trying to keep the new error messages consistent with code in faraway
files, and the startup packet length limit is already pretty aggressive, so
I'm a little less concerned about lugging around long names.  Plus, I think
v2 had some subtle interactions with db_user_namespace (maybe for the
better), but I didn't spend too much time looking at that since
db_user_namespace will likely be removed soon.

If anyone disagrees and wants to see the FATALs emitted from
ProcessStartupPacket() directly, please let me know and we can work on
adding them in a follow-up patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: ProcessStartupPacket(): database_name and user_name truncation

From

Tom Lane

Date:

03 July 2023, 22:33:46

Nathan Bossart <nathandbossart@gmail.com> writes:
> Thanks, Bertrand.  I chickened out and ended up committing v1 for now
> (i.e., simply removing the truncation code).

WFM.

> If anyone disagrees and wants to see the FATALs emitted from
> ProcessStartupPacket() directly, please let me know and we can work on
> adding them in a follow-up patch.

I think the new behavior is fine.

            regards, tom lane

Re: ProcessStartupPacket(): database_name and user_name truncation

From

"Drouvot, Bertrand"

Date:

04 July 2023, 06:06:37

Hi,

On 7/3/23 10:34 PM, Nathan Bossart wrote:
> On Sat, Jul 01, 2023 at 04:02:06PM +0200, Drouvot, Bertrand wrote:
>> Please find V2 attached where it's failing as soon as the database name or
>> user name are detected as overlength.
> 
> Thanks, Bertrand.  I chickened out and ended up committing v1 for now
> (i.e., simply removing the truncation code).  I didn't like the idea of
> trying to keep the new error messages consistent with code in faraway
> files, and the startup packet length limit is already pretty aggressive, so
> I'm a little less concerned about lugging around long names.  Plus, I think
> v2 had some subtle interactions with db_user_namespace (maybe for the
> better), but I didn't spend too much time looking at that since
> db_user_namespace will likely be removed soon.

Thanks Nathan for the feedback and explanations, I think that makes complete sense.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com