Thread: Patch proposal: make use of regular expressions for the username in pg_hba.conf

Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

19 August 2022, 08:12:57

Hi hackers,

Attached is a patch proposal to allow the use of regular expressions for 
the username in pg_hba.conf.

Using regular expressions for the username in the pg_hba.conf file is 
convenient in situations where an organization has a large number of 
users and needs an expressive way to map them.

For example, if an organization wants to allow gss connections only for 
users having their principal, e.g. @BDTFOREST.LOCAL, they could make use 
of an entry in pg_hba.conf such as:

host all /^.*@BDTFOREST.LOCAL$ 0.0.0.0/0 gss

Without this patch, I can think of three alternatives with existing 
functionality, which all of tradeoffs. This includes:

1) Create an entry per user: this is challenging for organizations 
managing large numbers of users (e.g. 1000s). This is also not dynamic, 
i.e. the HBA file would need to be updated when users are added or removed.

2) Use a mapping in pg_ident.conf, for example:

Here is an entry in pg_hba.conf that uses a map:

host all all 0.0.0.0/0 gss map=mygssmap

and by defining this mapping in pg_ident.conf:

mygssmap /^(.*)@BDTFOREST\.LOCAL$ \1@BDTFOREST.LOCAL

That works for filtering the username.

LOG:  connection authenticated: identity="bertrand@BDTFOREST.LOCAL" 
method=gss (/pg_installed/data/pg_hba.conf:95)
$ grep -n mygssmap /pg_installed/data/pg_hba.conf
95:host    all             all  0.0.0.0/0  gss map=mygssmap

However, the behavior is not the same for the ones that don’t match the 
mapping in pg_ident.conf: indeed the connection attempt stop here and 
the next HBA line won’t be evaluated.

FATAL:  GSSAPI authentication failed for user "bdt"
DETAIL:  Connection matched pg_hba.conf line 95: "host all             
all  0.0.0.0/0  gss map=mygssmap"

3) Make use of a role in pg_hba.conf, e.g. “+BDTONLY”. That would work 
too, and also allow the evaluation of the next HBA line for the ones 
that are not part of the role.

However:

     - That’s not as dynamic as the regular expression, as new users 
would need to be granted the role and some users who are moving in the 
company may need to have the role revoked.
     - Looking at the regular expression in the HBA file makes it clear 
what filtering needs to be done. This is not obvious when looking at the 
role, even if it has a meaningful name. This can generate “incorrect 
filtering” should one user be granted the role by mistake, or make it 
more difficult to debug why a user is not being matched to a particular 
line in the HBA file.

This is why I think username filtering with regular expressions would 
provide its own advantages.

Thoughts? Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-hba_with_regexp.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Jacob Champion

Date:

09 September 2022, 00:02:00

On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> This is why I think username filtering with regular expressions would
> provide its own advantages.
>
> Thoughts? Looking forward to your feedback,

I think your motivation for the feature is solid. It is killing me a
bit that this is making it easier to switch authentication methods
based on the role name, when I suspect what someone might really want
is to switch authentication methods based on the ID the user is trying
to authenticate with. But that's not your fault or problem to fix,
because the startup packet doesn't currently have that information.
(It does make me wonder whether I withdrew my PGAUTHUSER proposal [1]
a month too early. And man, do I wish that pg_ident and pg_hba were
one file.)

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Speaking of partial matches, should this feature allow them? Maybe
rules should have to match the entire username instead, and sidestep
the inevitable "I forgot to anchor my regex" problems?

Thanks,
--Jacob

[1] https://commitfest.postgresql.org/38/3314/

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Tom Lane

Date:

09 September 2022, 00:46:40

Jacob Champion <jchampion@timescale.com> writes:
> On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> This is why I think username filtering with regular expressions would
>> provide its own advantages.

> I think your motivation for the feature is solid.

Yeah.  I'm not sure that I buy the argument that this is more useful
than writing a role name and controlling things with GRANT ROLE, but
it's a plausible alternative with properties that might win in some
use-cases.  So I see little reason not to allow it.

I'd actually ask why stop here?  In particular, why not do the same
with the database-name column, especially since that does *not*
have the ability to use roles as a substitute for a wildcard entry?

> I think you're going to have to address backwards compatibility
> concerns. Today, I can create a role named "/a", and I can put that
> into the HBA without quoting it. I'd be unamused if, after an upgrade,
> my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me.  I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible.  Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

> Speaking of partial matches, should this feature allow them? Maybe
> rules should have to match the entire username instead, and sidestep
> the inevitable "I forgot to anchor my regex" problems?

I think the pg_ident.conf precedent is binding on us here.  If we
make this one work differently, nobody's going to thank us for it,
they're just going to wonder "did the left hand not know what the
right hand already did?"

            regards, tom lane

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

09 September 2022, 10:31:08

Hi,

On 9/9/22 2:46 AM, Tom Lane wrote:

Jacob Champion <jchampion@timescale.com> writes:

On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

This is why I think username filtering with regular expressions would
provide its own advantages.

I think your motivation for the feature is solid.

Yeah.  I'm not sure that I buy the argument that this is more useful
than writing a role name and controlling things with GRANT ROLE, but
it's a plausible alternative with properties that might win in some
use-cases.  So I see little reason not to allow it.

Thank you both for your feedback.

I'd actually ask why stop here?  In particular, why not do the same
with the database-name column, especially since that does *not*
have the ability to use roles as a substitute for a wildcard entry?

I think that's a fair point, I'll look at it.

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me.  I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible.  Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

Agree that it seems unlikely but maybe we could add a new GUC to turn the regex usage on the hba file on/off (and use off as the default)?
Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Tom Lane

Date:

09 September 2022, 14:19:35

"Drouvot, Bertrand" <bdrouvot@amazon.com> writes:
> Agree that it seems unlikely but maybe we could add a new GUC to turn 
> the regex usage on the hba file on/off (and use off as the default)?

I think that will just add useless complication.

            regards, tom lane

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Jacob Champion

Date:

09 September 2022, 22:05:18

On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jacob Champion <jchampion@timescale.com> writes:
> > I think you're going to have to address backwards compatibility
> > concerns. Today, I can create a role named "/a", and I can put that
> > into the HBA without quoting it. I'd be unamused if, after an upgrade,
> > my rule suddenly matched any role name containing an 'a'.
>
> Meh ... that concern seems overblown to me.  I guess it's possible
> that somebody has an HBA entry that looks like that, but it doesn't
> seem very plausible.  Note that we made this exact same change in
> pg_ident.conf years ago, and AFAIR we got zero complaints.

What percentage of users actually use pg_ident maps? My assumption
would be that a change to pg_hba would affect many more people, but
then I don't have any proof that there are users with role names that
look like that to begin with. I won't pound the table with it.

> > Speaking of partial matches, should this feature allow them? Maybe
> > rules should have to match the entire username instead, and sidestep
> > the inevitable "I forgot to anchor my regex" problems?
>
> I think the pg_ident.conf precedent is binding on us here.  If we
> make this one work differently, nobody's going to thank us for it,
> they're just going to wonder "did the left hand not know what the
> right hand already did?"

Hmm... yeah, I suppose. From the other direction, it'd be bad to train
users that unanchored regexes are safe in pg_hba only to take those
guardrails off in pg_ident. I will tuck that away as a potential
behavior change, for a different thread.

Thanks,
--Jacob

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Jacob Champion

Date:

09 September 2022, 23:21:32

On 8/19/22 01:12, Drouvot, Bertrand wrote:
> +           wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));     
> +           wlen = pg_mb2wchar_with_len(tok->string + 1,                         
> +                                       wstr, strlen(tok->string + 1));

The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.

> +# Testing with regular expression for username                                  
> +reset_pg_hba($node, '/^.*md.*$', 'password');                                   
> +test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
> +                      

IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.

Other than that, and Tom's note on potentially expanding this to other
areas, I think this is a pretty straightforward patch.

Thanks,
--Jacob

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

12 September 2022, 07:55:25

Hi,

On 9/10/22 1:21 AM, Jacob Champion wrote:

On 8/19/22 01:12, Drouvot, Bertrand wrote:

+           wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
+           wlen = pg_mb2wchar_with_len(tok->string + 1,
+                                       wstr, strlen(tok->string + 1));

The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.

Thanks for the feedback.

Yeah fully agree. I'll provide a new version that follow the same logic as the pg_ident code.

+# Testing with regular expression for username
+reset_pg_hba($node, '/^.*md.*$', 'password');
+test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
+

IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.

Agree, will do.


Other than that, and Tom's note on potentially expanding this to other
areas,

I'll add regexp usage for the database column and also the for the address one when non CIDR is provided (so host name(s)) (I think it also makes sense specially as we don't allow multiple values for this column).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

16 September 2022, 16:24:07

Hi,

On 9/12/22 9:55 AM, Drouvot, Bertrand wrote:
> Hi,
> 
> On 9/10/22 1:21 AM, Jacob Champion wrote:
>> On 8/19/22 01:12, Drouvot, Bertrand wrote:
>>> +           wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
>>> +           wlen = pg_mb2wchar_with_len(tok->string + 1,
>>> +                                       wstr, strlen(tok->string + 1));
>> The (tok->string + 1) construction comes up often enough that I think it
>> should be put in a `regex` variable or similar. That would help my eyes
>> with the (strlen(tok->string + 1) + 1) construction, especially.
>>
>> I noticed that for pg_ident, we precompile the regexes per-line and
>> reuse those in child processes. Whereas here we're compiling, using, and
>> then discarding the regex for each check. I think the example set by the
>> pg_ident code is probably the one to follow, unless you have a good
>> reason not to.
> 
> Thanks for the feedback.
> 
> Yeah fully agree. I'll provide a new version that follow the same logic 
> as the pg_ident code.
> 
>>> +# Testing with regular expression for username
>>> +reset_pg_hba($node, '/^.*md.*$', 'password');
>>> +test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0);
>>> +
>> IMO the coverage for this patch needs to be filled out. Negative test
>> cases are more important than positive ones for security-related code.
> 
> Agree, will do.
> 
>> Other than that, and Tom's note on potentially expanding this to other
>> areas,
> 
> I'll add regexp usage for the database column and also the for the 
> address one when non CIDR is provided (so host name(s)) (I think it also 
> makes sense specially as we don't allow multiple values for this column).
> 

Please find attached v2 addressing the comments mentioned above.

v2 also provides regular expression usage for the database and the 
address columns (when a host name is being used).

Remark:

The CF bot is failing for Windows (all other tests are green) and only 
for the new tap test related to the regular expression on the host name 
(the ones on database and role are fine).

The issue is not related to the patch. The issue is that the Windows 
Cirrus test does not like when a host name is provided for a "host" 
entry in pg_hba.conf (while it works fine when a CIDR is provided).

You can see an example in [1] where the only change is to replace the 
CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are 
failing on Windows only (its log file is here [2]).

I'll look at this "Windows" related issue but would appreciate any 
guidance/help if someone has experience in this area on windows.

[1]: https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr”

[2]: 
https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v2-0001-hba_with_regexp.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

17 September 2022, 06:53:35

On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:
> The CF bot is failing for Windows (all other tests are green) and only for
> the new tap test related to the regular expression on the host name (the
> ones on database and role are fine).
>
> The issue is not related to the patch. The issue is that the Windows Cirrus
> test does not like when a host name is provided for a "host" entry in
> pg_hba.conf (while it works fine when a CIDR is provided).
>
> You can see an example in [1] where the only change is to replace the CIDR
> by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
> on Windows only (its log file is here [2]).
>
> I'll look at this "Windows" related issue but would appreciate any
> guidance/help if someone has experience in this area on windows.

I recall that being able to do a reverse lookup of a hostname on
Windows for localhost requires a few extra setup steps as that's not
guaranteed to be set in all environments by default, which is why we
go at great length to use 127.0.0.1 in the TAP test setup for example
(see Cluster.pm).  Looking at your patch, the goal is to test the
mapping of regular expression for host names, user names and database
names.  If the first case is not guaranteed, my guess is that it is
fine to skip this portion of the tests on Windows.

While reading the patch, I am a bit confused about token_regcomp() and
token_regexec().  It would help the review a lot if these were
documented with proper comments, even if these act roughly as wrappers
for pg_regexec() and pg_regcomp().
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

19 September 2022, 07:36:10

Hi,

On 9/17/22 8:53 AM, Michael Paquier wrote:
> On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:
>> The CF bot is failing for Windows (all other tests are green) and only for
>> the new tap test related to the regular expression on the host name (the
>> ones on database and role are fine).
>>
>> The issue is not related to the patch. The issue is that the Windows Cirrus
>> test does not like when a host name is provided for a "host" entry in
>> pg_hba.conf (while it works fine when a CIDR is provided).
>>
>> You can see an example in [1] where the only change is to replace the CIDR
>> by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
>> on Windows only (its log file is here [2]).
>>
>> I'll look at this "Windows" related issue but would appreciate any
>> guidance/help if someone has experience in this area on windows.
> 
> I recall that being able to do a reverse lookup of a hostname on
> Windows for localhost requires a few extra setup steps as that's not
> guaranteed to be set in all environments by default, which is why we
> go at great length to use 127.0.0.1 in the TAP test setup for example
> (see Cluster.pm).  Looking at your patch, the goal is to test the
> mapping of regular expression for host names, user names and database
> names.  If the first case is not guaranteed, my guess is that it is
> fine to skip this portion of the tests on Windows.

Thanks for looking at it!

That sounds reasonable, v3 attached is skipping the regular expression 
tests for the hostname on Windows.


> 
> While reading the patch, I am a bit confused about token_regcomp() and
> token_regexec().  It would help the review a lot if these were
> documented with proper comments, even if these act roughly as wrappers
> for pg_regexec() and pg_regcomp().

Fully agree, comments were missing. They've been added in v3 attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v3-0001-hba_with_regexp.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

20 September 2022, 03:55:25

On Fri, Sep 09, 2022 at 03:05:18PM -0700, Jacob Champion wrote:
> On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jacob Champion <jchampion@timescale.com> writes:
>> > I think you're going to have to address backwards compatibility
>> > concerns. Today, I can create a role named "/a", and I can put that
>> > into the HBA without quoting it. I'd be unamused if, after an upgrade,
>> > my rule suddenly matched any role name containing an 'a'.
>>
>> Meh ... that concern seems overblown to me.  I guess it's possible
>> that somebody has an HBA entry that looks like that, but it doesn't
>> seem very plausible.  Note that we made this exact same change in
>> pg_ident.conf years ago, and AFAIR we got zero complaints.
>
> What percentage of users actually use pg_ident maps? My assumption
> would be that a change to pg_hba would affect many more people, but
> then I don't have any proof that there are users with role names that
> look like that to begin with. I won't pound the table with it.

This concern does not sound overblown to me.  A change in pg_hba.conf
impacts everybody.  I was just looking at this patch, and the logic
with usernames and databases is changed so as we would *always* treat
*any* entries beginning with a slash as a regular expression, skipping
them if they don't match with an error fed to the logs and
pg_hba_file_rules.  This could lead to silent security issues as
stricter HBA policies need to be located first in pg_hba.conf and
these could suddenly fail to load.  It would be much safer to me if we
had in place some restrictions to avoid such problems to happen,
meaning some extra checks in the DDL code paths for such object names
and perhaps even something in pg_upgrade with a scan at pg_database
and pg_authid.

On the bright side, I have been looking at some of the RFCs covering
the set of characters allowed in DNS names and slashes are not
authorized in hostnames, making this change rather safe AFAIK.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Tom Lane

Date:

20 September 2022, 04:09:33

Michael Paquier <michael@paquier.xyz> writes:
>> On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Meh ... that concern seems overblown to me.  I guess it's possible
>>> that somebody has an HBA entry that looks like that, but it doesn't
>>> seem very plausible.  Note that we made this exact same change in
>>> pg_ident.conf years ago, and AFAIR we got zero complaints.

> This concern does not sound overblown to me.

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry.  OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.

On the contrary side, if we make this work differently from the
pg_ident.conf precedent, or install weird rules to try to prevent
accidental misinterpretations, that could also lead to security
problems because things don't work as someone would expect.  I see
no a-priori reason to believe that this risk is negligible compared
to the other one.

            regards, tom lane

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

20 September 2022, 04:30:50

On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:
> You have to assume that somebody (a) has a role or DB name starting
> with slash, (b) has an explicit reference to that name in their
> pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
> notice that things are misbehaving until after some hacker manages
> to break into their installation on the strength of the misbehaving
> entry.  OK, I'll grant that the probability of (c) is depressingly
> close to unity; but each of the other steps seems quite low probability.
> All four of them happening in one installation is something I doubt
> will happen.

It is the kind of things that could blow up as a CVE and some bad PR
for the project, so I cannot get excited about enforcing this new rule
in an authentication file (aka before a role is authenticated) while
we are talking about 3~4 code paths (?) that would need an extra check
to make sure that no instances have such object names.

> On the contrary side, if we make this work differently from the
> pg_ident.conf precedent, or install weird rules to try to prevent
> accidental misinterpretations, that could also lead to security
> problems because things don't work as someone would expect.  I see
> no a-priori reason to believe that this risk is negligible compared
> to the other one.

I also do like a lot the idea of making things consistent across all
the auth configuration files for all the fields where this can be
applied.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

20 September 2022, 11:33:09

Hi,

On 9/20/22 6:30 AM, Michael Paquier wrote:
> On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:
>> You have to assume that somebody (a) has a role or DB name starting
>> with slash, (b) has an explicit reference to that name in their
>> pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
>> notice that things are misbehaving until after some hacker manages
>> to break into their installation on the strength of the misbehaving
>> entry.  OK, I'll grant that the probability of (c) is depressingly
>> close to unity; but each of the other steps seems quite low probability.
>> All four of them happening in one installation is something I doubt
>> will happen.
> 
> It is the kind of things that could blow up as a CVE and some bad PR
> for the project, so I cannot get excited about enforcing this new rule
> in an authentication file (aka before a role is authenticated) while
> we are talking about 3~4 code paths (?) that would need an extra check
> to make sure that no instances have such object names.

I also have the feeling that having (a), (b) and (d) is low probability.

That said, If the user "/john" already exists and has a hba entry then 
this entry will still match with the patch. Problem is that all the 
users that contain "john" would also now match.

But things get worst if say /a is an existing user and hba entry as the 
entry would match any users that contains "a" with the patch.

I assume (maybe i should not) that if objects starting with / already 
exist there is very good reason(s) behind. Then I don't think that 
preventing their creation in the DDL would help (quite the contrary for 
the ones that really need them).

It looks to me that adding a GUC (off by default) to enable/disable the 
regexp usage in the hba could be a fair compromise. It won't block any 
creation starting with a / and won't open more doors (if such objects 
exist) by default.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Jacob Champion

Date:

20 September 2022, 17:15:14

On Mon, Sep 19, 2022 at 9:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You have to assume that somebody (a) has a role or DB name starting
> with slash, (b) has an explicit reference to that name in their
> pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
> notice that things are misbehaving until after some hacker manages
> to break into their installation on the strength of the misbehaving
> entry.  OK, I'll grant that the probability of (c) is depressingly
> close to unity; but each of the other steps seems quite low probability.
> All four of them happening in one installation is something I doubt
> will happen.

I can't argue with (a) or (b), but (d) seems decently likely to me. If
your normal user base consists of people who are authorized to access
your system, what clues would you have that your HBA is silently
failing open?

--Jacob

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

05 October 2022, 07:24:43

On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:
> I assume (maybe i should not) that if objects starting with / already exist
> there is very good reason(s) behind. Then I don't think that preventing
> their creation in the DDL would help (quite the contrary for the ones that
> really need them).

I have been pondering on this point for the last few weeks, and I'd
like to change my opinion and side with Tom on this one as per the
very unlikeliness of this being a problem in the wild.  I have studied
the places that would require restrictions but that was just feeling
adding a bit more bloat into the CREATE/ALTER ROLE paths for what's
aimed at providing a consistent experience for the user across
pg_hba.conf and pg_ident.conf.

> It looks to me that adding a GUC (off by default) to enable/disable the
> regexp usage in the hba could be a fair compromise. It won't block any
> creation starting with a / and won't open more doors (if such objects exist)
> by default.

Enforcing a behavior change in HBA policies with a GUC does not strike
me as a good thing in the long term.  I am ready to bet that it would
just sit around for nothing like the compatibility GUCs.

Anyway, I have looked at the patch.

+   List       *roles_re;
+   List       *databases_re;
+   regex_t    hostname_re;
I am surprised by the approach of using separate lists for the regular
expressions and the raw names.  Wouldn't it be better to store
everything in a single list but assign an entry type?  In this case it
would be either regex or plain string.  This would minimize the
footprint of the changes (no extra arguments *_re in the routines
checking for a match on the roles, databases or hosts).  And it seems
to me that this would make unnecessary the use of re_num here and
there.  The hostname is different, of course, requiring only an extra
field for its type, or something like that.

Perhaps the documentation would gain in clarity if there were more
examples, like a set of comma-separated examples (mix of regex and raw
strings for example, for all the field types that gain support for
regexes)?

-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+    'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
Hmm.  I think that we may need to reconsider the location of the tests
for the regexes with the host name, as the "safe" regression tests
should not switch listen_addresses.  One location where we already do
that is src/test/ssl/, so these could be moved there.  Keeping the
database and user name parts in src/test/authentication/ is fine.

Something that stood out on a first review is the refactoring of
001_password.pl that can be done independently of the main patch:
- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts.  The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently, reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).
--
Michael

Attachment

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

05 October 2022, 13:32:20

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:
> Something that stood out on a first review is the refactoring of
> 001_password.pl that can be done independently of the main patch:

Good idea, thanks for the proposal.

> - test_role() -> test_conn() to be able to pass down a database name.
> - reset_pg_hba() to control the host, db and user parts.  The host
> part does not really apply after moving the hosts checks to a more
> secure location, so I guess that this had better be extended just for
> the user and database, keeping host=local all the time.
> I am planning to apply 0001 attached independently, 

0001 looks good to me.

> reducing the
> footprint of 0002, which is your previous patch left untouched
> (mostly!).

Thanks! I'll look at it and the comments you just made up-thread.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

06 October 2022, 00:53:15

On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:
> On 10/5/22 9:24 AM, Michael Paquier wrote:
>> - test_role() -> test_conn() to be able to pass down a database name.
>> - reset_pg_hba() to control the host, db and user parts.  The host
>> part does not really apply after moving the hosts checks to a more
>> secure location, so I guess that this had better be extended just for
>> the user and database, keeping host=local all the time.
>> I am planning to apply 0001 attached independently,
>
> 0001 looks good to me.

Thanks.  I have applied this refactoring, leaving the host part out of
the equation as we should rely only on local connections for this
part of the test.  The best fit I can think about for the checks on
the hostname patterns would be either the ssl, ldap or krb5 tests.
SSL is more widely tested than the two others.

>> reducing the
>> footprint of 0002, which is your previous patch left untouched
>> (mostly!).
>
> Thanks! I'll look at it and the comments you just made up-thread.

Cool, thanks.  One thing that matters a lot IMO (that I forgot to
mention previously) is to preserve the order of the items parsed from
the configuration files.

Also, I am wondering whether we'd better have some regression tests
where a regex includes a comma and a role name itself has a comma,
actually, just to stress more the parsing of individual items in the
HBA file.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

10 October 2022, 07:00:06

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:
> On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:
> Anyway, I have looked at the patch.
> 
> +   List       *roles_re;
> +   List       *databases_re;
> +   regex_t    hostname_re;
> I am surprised by the approach of using separate lists for the regular
> expressions and the raw names.  Wouldn't it be better to store
> everything in a single list but assign an entry type?  In this case it
> would be either regex or plain string.  This would minimize the
> footprint of the changes (no extra arguments *_re in the routines
> checking for a match on the roles, databases or hosts).  And it seems
> to me that this would make unnecessary the use of re_num here and
> there. 

Please find attached v5 addressing this. I started with an union but it 
turns out that we still need the plain string when a regex is used. This 
is not needed for the authentication per say but for fill_hba_line(). So 
I ended up creating a new struct without union in v5.

> The hostname is different, of course, requiring only an extra
> field for its type, or something like that.

I'm using the same new struct as described above for the hostname.

> 
> Perhaps the documentation would gain in clarity if there were more
> examples, like a set of comma-separated examples (mix of regex and raw
> strings for example, for all the field types that gain support for
> regexes)?
> 

Right, I added more examples in v5.

> -$node->append_conf('postgresql.conf', "log_connections = on\n");
> +$node->append_conf(
> +    'postgresql.conf', qq{
> +listen_addresses = '127.0.0.1'
> +log_connections = on
> +});
> Hmm.  I think that we may need to reconsider the location of the tests
> for the regexes with the host name, as the "safe" regression tests
> should not switch listen_addresses.  One location where we already do
> that is src/test/ssl/, so these could be moved there. 

Good point, I moved the hostname related tests in src/test/ssl.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v5-0001-hba_with_regexp.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

10 October 2022, 07:04:16

Hi,

On 10/6/22 2:53 AM, Michael Paquier wrote:
> On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:
>> On 10/5/22 9:24 AM, Michael Paquier wrote:
>>> - test_role() -> test_conn() to be able to pass down a database name.
>>> - reset_pg_hba() to control the host, db and user parts.  The host
>>> part does not really apply after moving the hosts checks to a more
>>> secure location, so I guess that this had better be extended just for
>>> the user and database, keeping host=local all the time.
>>> I am planning to apply 0001 attached independently,
>>
>> 0001 looks good to me.
> 
> Thanks.  I have applied this refactoring, leaving the host part out of
> the equation as we should rely only on local connections for this
> part of the test. 

Thanks!

>> Thanks! I'll look at it and the comments you just made up-thread.
> 
> Cool, thanks.  One thing that matters a lot IMO (that I forgot to
> mention previously) is to preserve the order of the items parsed from
> the configuration files.

Fully agree, all the versions that have been submitted in this thread 
preserves the ordering.

> 
> Also, I am wondering whether we'd better have some regression tests
> where a regex includes a comma and a role name itself has a comma,
> actually, just to stress more the parsing of individual items in the
> HBA file.

Good idea, it has been added in v5 just shared up-thread.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

11 October 2022, 06:29:19

On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:
>      foreach(cell, tokens)
>      {
> [...]
> +        tokreg = lfirst(cell);
> +        if (!token_is_regexp(tokreg))
>          {
> -            if (strcmp(dbname, role) == 0)
> +            if (am_walsender && !am_db_walsender)
> +            {
> +                /*
> +                 * physical replication walsender connections can only match
> +                 * replication keyword
> +                 */
> +                if (token_is_keyword(tokreg->authtoken, "replication"))
> +                    return true;
> +            }
> +            else if (token_is_keyword(tokreg->authtoken, "all"))
>                  return true;

When checking the list of databases in check_db(), physical WAL
senders (aka am_walsender && !am_db_walsender) would be able to accept
regexps, but these should only accept "replication" and never a
regexp, no?  The second check on "replication" placed in the branch
for token_is_regexp() in your patch would be correctly placed, though.
This is kind of special in the HBA logic, coming back to 9.0 where
physical replication and this special role property have been
introduced.  WAL senders have gained an actual database property later
on in 9.4 with logical decoding, keeping "replication" for
compatibility (connection strings can use replication=database to
connect as a non-physical WAL sender and connect to a specific
database).

> +typedef struct AuthToken
> +{
> +    char       *string;
> +    bool        quoted;
> +} AuthToken;
> +
> +/*
> + * Distinguish the case a token has to be treated as a regular
> + * expression or not.
> + */
> +typedef struct AuthTokenOrRegex
> +{
> +    bool        is_regex;
> +
> +    /*
> +     * Not an union as we still need the token string for fill_hba_line().
> +     */
> +    AuthToken  *authtoken;
> +    regex_t    *regex;
> +} AuthTokenOrRegex;

Hmm.  With is_regex to check if a regex_t exists, both structures may
not be necessary.  I have not put my hands on that directly, but if
I guess that I would shape things to have only AuthToken with
(enforcing regex_t in priority if set in the list of elements to check
for a match):
- the string
- quoted
- regex_t
A list member should never have (regex_t != NULL && quoted), right?
Hostnames would never be quoted, as well.

> +# test with a comma in the regular expression
> +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
> +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
> +    0);

So, we check here that the role includes "5," in its name.  This is
getting fun to parse ;)

>  elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
>  {
> -    plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
> +    plan skip_all =>
> +      'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
>  }

Unrelated noise from perltidy.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

12 October 2022, 06:17:14

Hi,

On 10/11/22 8:29 AM, Michael Paquier wrote:
> On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:
>>       foreach(cell, tokens)
>>       {
>> [...]
>> +        tokreg = lfirst(cell);
>> +        if (!token_is_regexp(tokreg))
>>           {
>> -            if (strcmp(dbname, role) == 0)
>> +            if (am_walsender && !am_db_walsender)
>> +            {
>> +                /*
>> +                 * physical replication walsender connections can only match
>> +                 * replication keyword
>> +                 */
>> +                if (token_is_keyword(tokreg->authtoken, "replication"))
>> +                    return true;
>> +            }
>> +            else if (token_is_keyword(tokreg->authtoken, "all"))
>>                   return true;
> 
> When checking the list of databases in check_db(), physical WAL
> senders (aka am_walsender && !am_db_walsender) would be able to accept
> regexps, but these should only accept "replication" and never a
> regexp, no?  

Oh right, good catch, thanks! Please find attached v6 fixing it.


> This is kind of special in the HBA logic, coming back to 9.0 where
> physical replication and this special role property have been
> introduced.  WAL senders have gained an actual database property later
> on in 9.4 with logical decoding, keeping "replication" for
> compatibility (connection strings can use replication=database to
> connect as a non-physical WAL sender and connect to a specific
> database).
> 

Thanks for the explanation!

>> +typedef struct AuthToken
>> +{
>> +    char       *string;
>> +    bool        quoted;
>> +} AuthToken;
>> +
>> +/*
>> + * Distinguish the case a token has to be treated as a regular
>> + * expression or not.
>> + */
>> +typedef struct AuthTokenOrRegex
>> +{
>> +    bool        is_regex;
>> +
>> +    /*
>> +     * Not an union as we still need the token string for fill_hba_line().
>> +     */
>> +    AuthToken  *authtoken;
>> +    regex_t    *regex;
>> +} AuthTokenOrRegex;
> 
> Hmm.  With is_regex to check if a regex_t exists, both structures may
> not be necessary. 

Agree that both struct are not necessary. In v6, AuthTokenOrRegex has 
been removed and the regex has been moved to AuthToken. There is no 
is_regex bool anymore, as it's enough to test whether regex is NULL or not.

> I have not put my hands on that directly, but if
> I guess that I would shape things to have only AuthToken with
> (enforcing regex_t in priority if set in the list of elements to check
> for a match):
> - the string
> - quoted
> - regex_t
> A list member should never have (regex_t != NULL && quoted), right?

The patch does allow that. For example it happens for the test where we 
add a comma in the role name. As we don't rely on a dedicated char to 
mark the end of a reg exp (we only rely on / to mark its start) then 
allowing (regex_t != NULL && quoted) seems reasonable to me.

>> +# test with a comma in the regular expression
>> +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
>> +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
>> +    0);
> 
> So, we check here that the role includes "5," in its name.  This is
> getting fun to parse ;)
>

Indeed, ;-)


>>   elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
>>   {
>> -    plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
>> +    plan skip_all =>
>> +      'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
>>   }
> 
> Unrelated noise from perltidy.

Right.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v6-0001-hba_with_regexp.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

14 October 2022, 05:30:25

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
> Indeed, ;-)

So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code, and it looks like
that a few things could be consolidated.

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing.  The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf.  Things going first: shouldn't we
combine ident_user and "re" together in the same structure?  Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf.  While looking closely at the patch, we would expand
the use of AuthToken outside its original context.  I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes.  This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).  So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure.  We could call that an
AuthItem (string, its compiled regex) perhaps?  It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp().  Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex.  We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.

The logic in charge of compiling the regular expressions could be
consolidated more.  The patch refactors the logic with
token_regcomp(), uses it for the user names (ident_user in
parse_ident_line() from pg_ident.conf), then extended to the hostnames
(single item) and the role/database names (list possible in these
cases).  This approach looks right to me.  Once we plug in an AuthItem
to IdentLine, token_regcomp could be changed so as it takes an
AuthToken in input, saving directly the compiled regex_t in the input
structure.

At the end, the global structure of the code should, I guess, respect
the following rules:
- The number of places where we check if a string is a regex should be
minimal (aka string beginning by '/').
- Only one code path of hba.c should call pg_regcomp() (the patch does
that), and only one code path should call pg_regexec() (two code paths
of hba.c do that with the patch, as of the need to store matching
expression).  This should minimize the areas where we call
pg_mb2wchar_with_len(), for one.

About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec be extended with two arguments as of an array
of regmatch_t and the number of elements in the array?  This would
save a bit some of the logic around pg_mb2wchar_with_len(), for
example.  To make all that work, token_regexec() should return an int,
coming from pg_regexec, but no specific error strings as we don't want
to spam the logs when checking hosts, roles and databases in
pg_hba.conf.

       /* Check if it has a CIDR suffix and if so isolate it */
-      cidr_slash = strchr(str, '/');
-      if (cidr_slash)
-          *cidr_slash = '\0';
+      if (!is_regexp)
+      {
+          cidr_slash = strchr(str, '/');
+          if (cidr_slash)
+              *cidr_slash = '\0';
+      }
[...]
            /* Get the netmask */
-           if (cidr_slash)
+           if (cidr_slash && !is_regexp)
            {
Some of the code handling regexes for hostnames itches me a bit, like
this one.  Perhaps it would be better to evaluate this interaction
with regular expressions separately.  The database and role names
don't have this need, so their cases are much simpler to think about.

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

14 October 2022, 06:18:52

On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:
> First, as of HEAD, AuthToken is only used for elements in a list of
> role and database names in hba.conf before filling in each HbaLine,
> hence we limit its usage to the initial parsing.  The patch assigns an
> optional regex_t to it, then extends the use of AuthToken for single
> hostname entries in pg_hba.conf.  Things going first: shouldn't we
> combine ident_user and "re" together in the same structure?  Even if
> we finish by not using AuthToken to store the computed regex, it seems
> to me that we'd better use the same base structure for pg_ident.conf
> and pg_hba.conf.  While looking closely at the patch, we would expand
> the use of AuthToken outside its original context.  I have also looked
> at make_auth_token(), and wondered if it could be possible to have this
> routine compile the regexes.  This approach would not stick with
> pg_ident.conf though, as we validate the fields in each line when we
> put our hands on ident_user and after the base validation of a line
> (number of fields, etc.).  So with all that in mind, it feels right to
> not use AuthToken at all when building each HbaLine and each
> IdentLine, but a new, separate, structure.  We could call that an
> AuthItem (string, its compiled regex) perhaps?  It could have its own
> make() routine, taking in input an AuthToken and process
> pg_regcomp().  Better ideas for this new structure would be welcome,
> and the idea is that we'd store the post-parsing state of an
> AuthToken to something that has a compiled regex.  We could finish by
> using AuthToken at the end and expand its use, but it does not feel
> completely right either to have a make() routine but not be able to
> compile its regular expression when creating the AuthToken.

I have have sent this part too quickly.  As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it.  Changing IdentLine to
use a AuthToken makes the "quoted" part useless in this case, still it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

14 October 2022, 12:47:04

Hi,

On 10/14/22 8:18 AM, Michael Paquier wrote:
> On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:
>> First, as of HEAD, AuthToken is only used for elements in a list of
>> role and database names in hba.conf before filling in each HbaLine,
>> hence we limit its usage to the initial parsing.  The patch assigns an
>> optional regex_t to it, then extends the use of AuthToken for single
>> hostname entries in pg_hba.conf.  Things going first: shouldn't we
>> combine ident_user and "re" together in the same structure?  Even if
>> we finish by not using AuthToken to store the computed regex, it seems
>> to me that we'd better use the same base structure for pg_ident.conf
>> and pg_hba.conf.  While looking closely at the patch, we would expand
>> the use of AuthToken outside its original context.  I have also looked
>> at make_auth_token(), and wondered if it could be possible to have this
>> routine compile the regexes.  This approach would not stick with
>> pg_ident.conf though, as we validate the fields in each line when we
>> put our hands on ident_user and after the base validation of a line
>> (number of fields, etc.).  So with all that in mind, it feels right to
>> not use AuthToken at all when building each HbaLine and each
>> IdentLine, but a new, separate, structure.  We could call that an
>> AuthItem (string, its compiled regex) perhaps?  It could have its own
>> make() routine, taking in input an AuthToken and process
>> pg_regcomp().  Better ideas for this new structure would be welcome,
>> and the idea is that we'd store the post-parsing state of an
>> AuthToken to something that has a compiled regex.  We could finish by
>> using AuthToken at the end and expand its use, but it does not feel
>> completely right either to have a make() routine but not be able to
>> compile its regular expression when creating the AuthToken.
> 
> I have have sent this part too quickly.  As AuthTokens are used in
> check_db() and check_role() when matching entries, it is more
> intuitive to store the regex_t directly in it. 

Yeah, I also think this is the right place for it.

> Changing IdentLine to
> use a AuthToken makes the "quoted" part useless in this case, still it
> could be used in Assert()s to make sure that the data is shaped as
> expected at check-time, enforced at false when creating it in
> parse_ident_line()?

I agree, that makes sense. I'll work on that.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

14 October 2022, 13:04:34

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:
> On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
>> Indeed, ;-)
> 
> So, I have spent the last two days looking at all that, studying the
> structure of the patch and the existing HEAD code,

Thanks!

> The code could be split to tackle things step-by-step:
> - One refactoring patch to introduce token_regcomp() and
> token_regexec(), with the introduction of a new structure that
> includes the compiled regexes.  (Feel free to counterargue about the
> use of AuthToken for this purpose, of course!)
> - Plug in the refactored logic for the lists of role names and
> database names in pg_hba.conf.
> - Handle the case of single host entries in pg_hba.conf.
> --

I agree to work step-by-step.

While looking at it again now, I discovered that the new TAP test for 
the regexp on the hostname in ssl/002_scram.pl is failing on some of my 
tests environment (and not all..).

So, I agree with the dedicated steps you are proposing and that the 
"host case" needs a dedicated attention.

I'm not ignoring all the remarks you've just done up-thread, I'll 
address them and/or provide my feedback on them when I'll come back with 
the step-by-step sub patches.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

17 October 2022, 17:56:02

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:
> On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
>> Indeed, ;-)
> 
> 
> I have also looked
> at make_auth_token(), and wondered if it could be possible to have this
> routine compile the regexes. 

I think that it makes sense.

> This approach would not stick with
> pg_ident.conf though, as we validate the fields in each line when we
> put our hands on ident_user and after the base validation of a line
> (number of fields, etc.).

I'm not sure to get the issue here with the proposed approach and 
pg_ident.conf.

The new attached patch proposal is making use of make_auth_token() 
(through copy_auth_token()) in parse_ident_line(), do you see any issue?

> 
> The logic in charge of compiling the regular expressions could be
> consolidated more.  The patch refactors the logic with
> token_regcomp(), uses it for the user names (ident_user in
> parse_ident_line() from pg_ident.conf), then extended to the hostnames
> (single item) and the role/database names (list possible in these
> cases).  This approach looks right to me.  Once we plug in an AuthItem
> to IdentLine, token_regcomp could be changed so as it takes an
> AuthToken in input

Right, did it that way in the attached.


> - Only one code path of hba.c should call pg_regcomp() (the patch does
> that), and only one code path should call pg_regexec() (two code paths
> of hba.c do that with the patch, as of the need to store matching
> expression).  This should minimize the areas where we call
> pg_mb2wchar_with_len(), for one.

Right.

> About this last point, token_regexec() does not include
> check_ident_usermap() in its logic, and it seems to me that it should.
> The difference is with the expected regmatch_t structures, so
> shouldn't token_regexec be extended with two arguments as of an array
> of regmatch_t and the number of elements in the array? 

You are right, not using token_regexec() in check_ident_usermap() in the 
previous patch versions was not right. That's fixed in the attached, 
though the substitution (if any) is still outside of token_regexec(), do 
you think it should be part of it? (I think that makes sense to keep it 
outside of it as we wont use the substitution logic for roles, databases 
and hostname)

> 
> The code could be split to tackle things step-by-step:
> - One refactoring patch to introduce token_regcomp() and
> token_regexec()

Agree. Please find attached v1-0001-token_reg-functions.patch for this 
first step.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-token_reg-functions.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

18 October 2022, 05:51:56

On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote:
> On 10/14/22 7:30 AM, Michael Paquier wrote:
>> This approach would not stick with
>> pg_ident.conf though, as we validate the fields in each line when we
>> put our hands on ident_user and after the base validation of a line
>> (number of fields, etc.).
>
> I'm not sure to get the issue here with the proposed approach and
> pg_ident.conf.

My point is about parse_ident_line(), where we need to be careful in
the order of the operations.  The macros IDENT_MULTI_VALUE() and
IDENT_FIELD_ABSENT() need to be applied on all the fields first, and
the regex computation needs to be last.  Your patch had a subtile
issue here, as users may get errors on the computed regex before the
ordering of the fields as the computation was used *before* the "Get
the PG rolename token" part of the logic.

>> About this last point, token_regexec() does not include
>> check_ident_usermap() in its logic, and it seems to me that it should.
>> The difference is with the expected regmatch_t structures, so
>> shouldn't token_regexec be extended with two arguments as of an array
>> of regmatch_t and the number of elements in the array?
>
> You are right, not using token_regexec() in check_ident_usermap() in the
> previous patch versions was not right. That's fixed in the attached, though
> the substitution (if any) is still outside of token_regexec(), do you think
> it should be part of it? (I think that makes sense to keep it outside of it
> as we wont use the substitution logic for roles, databases and hostname)

Keeping the substition done with the IdentLine's Authtokens outside of
the internal execution routine is fine by me.

While putting my hands on that, I was also wondering whether we should
have the error string generated after compilation within the internal
regcomp() routine, but that would require more arguments to
pg_regcomp() (as of file name, line number, **err_string), and that
looks more invasive than necessary.  Perhaps the follow-up steps will
prove me wrong, though :)

A last thing is the introduction of a free() routine for AuthTokens,
to minimize the number of places where we haev pg_regfree().  The gain
is minimal, but that looks more consistent with the execution and
compilation paths.
--
Michael

Attachment

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

18 October 2022, 07:14:21

Hi,

On 10/18/22 7:51 AM, Michael Paquier wrote:
> On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote:
>> On 10/14/22 7:30 AM, Michael Paquier wrote:
>>> This approach would not stick with
>>> pg_ident.conf though, as we validate the fields in each line when we
>>> put our hands on ident_user and after the base validation of a line
>>> (number of fields, etc.).
>>
>> I'm not sure to get the issue here with the proposed approach and
>> pg_ident.conf.
> 
> My point is about parse_ident_line(), where we need to be careful in
> the order of the operations.  The macros IDENT_MULTI_VALUE() and
> IDENT_FIELD_ABSENT() need to be applied on all the fields first, and
> the regex computation needs to be last.  Your patch had a subtile
> issue here, as users may get errors on the computed regex before the
> ordering of the fields as the computation was used *before* the "Get
> the PG rolename token" part of the logic.

Gotcha, thanks! I was wondering if we shouldn't add a comment about that 
and I see that you've added one in v2, thanks!

BTW, what about adding a new TAP test (dedicated patch) to test the 
behavior in case of errors during the regexes compilation in 
pg_ident.conf and pg_hba.conf (not done yet)? (we could add it once this 
  patch series is done).

> While putting my hands on that, I was also wondering whether we should
> have the error string generated after compilation within the internal
> regcomp() routine, but that would require more arguments to
> pg_regcomp() (as of file name, line number, **err_string), and that
> looks more invasive than necessary.  Perhaps the follow-up steps will
> prove me wrong, though :)

I've had the same thought (and that was what the previous global patch 
was doing). I'm tempted to think that the follow-steps will prove you 
right ;-) (specially if at the end those will be the same error messages 
for databases and roles).

> 
> A last thing is the introduction of a free() routine for AuthTokens,
> to minimize the number of places where we haev pg_regfree().  The gain
> is minimal, but that looks more consistent with the execution and
> compilation paths.

Agree, that looks better.

I had a look at your v2, did a few tests and it looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

19 October 2022, 01:18:42

On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote:
> BTW, what about adding a new TAP test (dedicated patch) to test the behavior
> in case of errors during the regexes compilation in pg_ident.conf and
> pg_hba.conf (not done yet)? (we could add it once this  patch series is
> done).

Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for
cases where no fork() implementation is available, aka Windows).  But
a postmaster restart failure would generate logs that could be picked
for a pattern check?

>> While putting my hands on that, I was also wondering whether we should
>> have the error string generated after compilation within the internal
>> regcomp() routine, but that would require more arguments to
>> pg_regcomp() (as of file name, line number, **err_string), and that
>> looks more invasive than necessary.  Perhaps the follow-up steps will
>> prove me wrong, though :)
>
> I've had the same thought (and that was what the previous global patch was
> doing). I'm tempted to think that the follow-steps will prove you right ;-)
> (specially if at the end those will be the same error messages for databases
> and roles).

Avoiding three times the same error message seems like a good thing in
the long run, but let's think about this part later as needed.  All
these routines are static to hba.c so even if we finish by not
finishing the whole job for this development cycle we can still be
very flexible.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

19 October 2022, 07:50:45

Hi,

On 10/19/22 3:18 AM, Michael Paquier wrote:
> On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote:
>> BTW, what about adding a new TAP test (dedicated patch) to test the behavior
>> in case of errors during the regexes compilation in pg_ident.conf and
>> pg_hba.conf (not done yet)? (we could add it once this  patch series is
>> done).
> 
> Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for
> cases where no fork() implementation is available, aka Windows).  But
> a postmaster restart failure would generate logs that could be picked
> for a pattern check?

Right, that's how I'd see it. I'll give it a look.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

19 October 2022, 08:45:44

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:
> On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
>> Indeed, ;-)
> 
> 
> The code could be split to tackle things step-by-step:
> - One refactoring patch to introduce token_regcomp() and
> token_regexec(), with the introduction of a new structure that
> includes the compiled regexes.  (Feel free to counterargue about the
> use of AuthToken for this purpose, of course!)
> - Plug in the refactored logic for the lists of role names and
> database names in pg_hba.conf.

Please find attached 
v1-0001-regex-handling-for-db-and-roles-in-hba.patch to implement 
regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.

It relies on the refactoring made in fc579e11c6 (but changes the 
regcomp_auth_token() parameters so that it is now responsible for 
emitting the compilation error message (if any), to avoid code 
duplication in parse_hba_line() and parse_ident_line() for roles, 
databases and user name mapping).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-regex-handling-for-db-and-roles-in-hba.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

21 October 2022, 00:58:03

On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:
> Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
> implement regexes for databases and roles in hba.
>
> It does also contain new regexes related TAP tests and doc updates.

Thanks for the updated version.  This is really easy to look at now.

> It relies on the refactoring made in fc579e11c6 (but changes the
> regcomp_auth_token() parameters so that it is now responsible for emitting
> the compilation error message (if any), to avoid code duplication in
> parse_hba_line() and parse_ident_line() for roles, databases and user name
> mapping).

@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-       if (!tok->quoted && tok->string[0] == '+')
+       if (!token_has_regexp(tok))
        {
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
    //do
else if (token_is_keyword(tok, "all"))
    //do
else if (token_has_regexp(tok))
    //do regex compilation, handling failures
else if (token_matches(tok, role))
    //do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, and
the case of physical walsenders (in short specific process types)
requiringx specific conditions is also something to take into account.

    foreach(tokencell, tokens)
    {
-       parsedline->roles = lappend(parsedline->roles,
-                                   copy_auth_token(lfirst(tokencell)));
+       AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+       /*
+        * Compile a regex from the role token, if necessary.
+        */
+       if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+           return NULL;
+
+       parsedline->roles = lappend(parsedline->roles, tok);
    }

Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup.  In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.
My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths.  Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.

I am wondering whether we should link the regexp code to not use
malloc(), actually..  This would require a separate analysis, though,
and I suspect that palloc() would be very expensive for this job.

For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

21 October 2022, 12:10:37

Hi,

On 10/21/22 2:58 AM, Michael Paquier wrote:
> On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:
>> Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
>> implement regexes for databases and roles in hba.
>>
>> It does also contain new regexes related TAP tests and doc updates.
> 
> Thanks for the updated version.  This is really easy to look at now.
> 
>> It relies on the refactoring made in fc579e11c6 (but changes the
>> regcomp_auth_token() parameters so that it is now responsible for emitting
>> the compilation error message (if any), to avoid code duplication in
>> parse_hba_line() and parse_ident_line() for roles, databases and user name
>> mapping).
> 
> @@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
> [...]
> -       if (!tok->quoted && tok->string[0] == '+')
> +       if (!token_has_regexp(tok))
>          {
> Hmm.  Do we need token_has_regexp() here for all the cases?  We know
> that the string can begin with a '+', hence it is no regex.  The same
> applies for the case of "all".  The remaining case is the one where
> the user name matches exactly the AuthToken string, which should be
> last as we want to treat anything beginning with a '/' as a regex.  It
> seems like we could do an order like that?  Say:
> if (!tok->quoted && tok->string[0] == '+')
>      //do
> else if (token_is_keyword(tok, "all"))
>      //do
> else if (token_has_regexp(tok))
>      //do regex compilation, handling failures
> else if (token_matches(tok, role))
>      //do exact match
> 
> The same approach with keywords first, regex, and exact match could be
> applied as well for the databases?  Perhaps it is just mainly a matter
> of taste, 

Yeah, I think it is.

> and it depends on how much you want to prioritize the place
> of the regex over the rest but that could make the code easier to
> understand in the long-run and this is a very sensitive code area, 

And agree that your proposal tastes better ;-): it is easier to 
understand, v2 attached has been done that way.

> Compiling the expressions for the user and database lists one-by-one
> in parse_hba_line() as you do is correct.  However there is a gotcha
> that you are forgetting here: the memory allocations done for the
> regexp compilations are not linked to the memory context where each
> line is processed (named hbacxt in load_hba()) and need a separate
> cleanup. 

Oops, right, thanks for the call out!

> In the same fashion as load_ident(), it seems to me that we
> need two extra things for this patch:
> - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
> through new_parsed_lines and free for each line the AuthTokens for the
> database and user lists.
> - if ok and new_parsed_lines != NIL, the same cleanup needs to
> happen.

Right, but I think that should be "parsed_hba_lines != NIL".

> My guess is that you could do both the same way as load_ident() does,
> keeping some symmetry between the two code paths. 

Right. To avoid code duplication in the !ok/ok cases, the function 
free_hba_line() has been added in v2: it goes through the list of 
databases and roles tokens and call free_auth_token() for each of them.

> Unifying both into
> a common routine would be sort of annoying as HbaLines uses lists
> within the lists of parsed lines, and IdentLine can have one at most
> in each line.

I agree, and v2 is not attempting to unify them.

> For now, I have made your last patch a bit shorter by applying the
> refactoring of regcomp_auth_token() separately with a few tweaks to
> the comments.

Thanks! v2 attached does apply on top of that.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v2-0001-regex-handling-for-db-and-roles-in-hba.patch

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

Michael Paquier

Date:

24 October 2022, 03:34:32

On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:
> On 10/21/22 2:58 AM, Michael Paquier wrote:
>> The same approach with keywords first, regex, and exact match could be
>> applied as well for the databases?  Perhaps it is just mainly a matter
>> of taste,
>
> Yeah, I think it is.

;)

Still it looks that this makes for less confusion with a minimal
footprint once the new additions are in place.

>> In the same fashion as load_ident(), it seems to me that we
>> need two extra things for this patch:
>> - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
>> through new_parsed_lines and free for each line the AuthTokens for the
>> database and user lists.
>> - if ok and new_parsed_lines != NIL, the same cleanup needs to
>> happen.
>
> Right, but I think that should be "parsed_hba_lines != NIL".

For the second case, where we need to free the past contents after a
success, yes.

> Right. To avoid code duplication in the !ok/ok cases, the function
> free_hba_line() has been added in v2: it goes through the list of databases
> and roles tokens and call free_auth_token() for each of them.

Having a small-ish routine for that is fine.

I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists.  The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes).  The CI and all my machines were green, and
the test coverage looked sufficient.  So, applied.  I'll keep an eye
on the buildfarm.
--
Michael

Attachment

signature.asc

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

From

"Drouvot, Bertrand"

Date:

24 October 2022, 10:17:47

Hi,

On 10/24/22 5:34 AM, Michael Paquier wrote:
> On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:
>> On 10/21/22 2:58 AM, Michael Paquier wrote:
> 
> I have spent a couple of hours doing a pass over v2, playing manually
> with regex patterns, reloads, the system views and item lists.  The
> logic was fine, but I have adjusted a few things related to the
> comments and the documentation (particularly with the examples,
> removing one example and updating one with a regex that has a comma,
> needing double quotes).  The CI and all my machines were green, and
> the test coverage looked sufficient.  So, applied. 
Thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com