Thread: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi hackers, Attached is a patch proposal to allow the use of regular expressions for the username in pg_hba.conf. Using regular expressions for the username in the pg_hba.conf file is convenient in situations where an organization has a large number of users and needs an expressive way to map them. For example, if an organization wants to allow gss connections only for users having their principal, e.g. @BDTFOREST.LOCAL, they could make use of an entry in pg_hba.conf such as: host all /^.*@BDTFOREST.LOCAL$ 0.0.0.0/0 gss Without this patch, I can think of three alternatives with existing functionality, which all of tradeoffs. This includes: 1) Create an entry per user: this is challenging for organizations managing large numbers of users (e.g. 1000s). This is also not dynamic, i.e. the HBA file would need to be updated when users are added or removed. 2) Use a mapping in pg_ident.conf, for example: Here is an entry in pg_hba.conf that uses a map: host all all 0.0.0.0/0 gss map=mygssmap and by defining this mapping in pg_ident.conf: mygssmap /^(.*)@BDTFOREST\.LOCAL$ \1@BDTFOREST.LOCAL That works for filtering the username. LOG: connection authenticated: identity="bertrand@BDTFOREST.LOCAL" method=gss (/pg_installed/data/pg_hba.conf:95) $ grep -n mygssmap /pg_installed/data/pg_hba.conf 95:host all all 0.0.0.0/0 gss map=mygssmap However, the behavior is not the same for the ones that don’t match the mapping in pg_ident.conf: indeed the connection attempt stop here and the next HBA line won’t be evaluated. FATAL: GSSAPI authentication failed for user "bdt" DETAIL: Connection matched pg_hba.conf line 95: "host all all 0.0.0.0/0 gss map=mygssmap" 3) Make use of a role in pg_hba.conf, e.g. “+BDTONLY”. That would work too, and also allow the evaluation of the next HBA line for the ones that are not part of the role. However: - That’s not as dynamic as the regular expression, as new users would need to be granted the role and some users who are moving in the company may need to have the role revoked. - Looking at the regular expression in the HBA file makes it clear what filtering needs to be done. This is not obvious when looking at the role, even if it has a meaningful name. This can generate “incorrect filtering” should one user be granted the role by mistake, or make it more difficult to debug why a user is not being matched to a particular line in the HBA file. This is why I think username filtering with regular expressions would provide its own advantages. Thoughts? Looking forward to your feedback, Regards, -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Jacob Champion
Date:
On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: > This is why I think username filtering with regular expressions would > provide its own advantages. > > Thoughts? Looking forward to your feedback, I think your motivation for the feature is solid. It is killing me a bit that this is making it easier to switch authentication methods based on the role name, when I suspect what someone might really want is to switch authentication methods based on the ID the user is trying to authenticate with. But that's not your fault or problem to fix, because the startup packet doesn't currently have that information. (It does make me wonder whether I withdrew my PGAUTHUSER proposal [1] a month too early. And man, do I wish that pg_ident and pg_hba were one file.) I think you're going to have to address backwards compatibility concerns. Today, I can create a role named "/a", and I can put that into the HBA without quoting it. I'd be unamused if, after an upgrade, my rule suddenly matched any role name containing an 'a'. Speaking of partial matches, should this feature allow them? Maybe rules should have to match the entire username instead, and sidestep the inevitable "I forgot to anchor my regex" problems? Thanks, --Jacob [1] https://commitfest.postgresql.org/38/3314/
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Tom Lane
Date:
Jacob Champion <jchampion@timescale.com> writes: > On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >> This is why I think username filtering with regular expressions would >> provide its own advantages. > I think your motivation for the feature is solid. Yeah. I'm not sure that I buy the argument that this is more useful than writing a role name and controlling things with GRANT ROLE, but it's a plausible alternative with properties that might win in some use-cases. So I see little reason not to allow it. I'd actually ask why stop here? In particular, why not do the same with the database-name column, especially since that does *not* have the ability to use roles as a substitute for a wildcard entry? > I think you're going to have to address backwards compatibility > concerns. Today, I can create a role named "/a", and I can put that > into the HBA without quoting it. I'd be unamused if, after an upgrade, > my rule suddenly matched any role name containing an 'a'. Meh ... that concern seems overblown to me. I guess it's possible that somebody has an HBA entry that looks like that, but it doesn't seem very plausible. Note that we made this exact same change in pg_ident.conf years ago, and AFAIR we got zero complaints. > Speaking of partial matches, should this feature allow them? Maybe > rules should have to match the entire username instead, and sidestep > the inevitable "I forgot to anchor my regex" problems? I think the pg_ident.conf precedent is binding on us here. If we make this one work differently, nobody's going to thank us for it, they're just going to wonder "did the left hand not know what the right hand already did?" regards, tom lane
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi,
On 9/9/22 2:46 AM, Tom Lane wrote:
Jacob Champion <jchampion@timescale.com> writes:On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:This is why I think username filtering with regular expressions would provide its own advantages.I think your motivation for the feature is solid.Yeah. I'm not sure that I buy the argument that this is more useful than writing a role name and controlling things with GRANT ROLE, but it's a plausible alternative with properties that might win in some use-cases. So I see little reason not to allow it.
Thank you both for your feedback.
I'd actually ask why stop here? In particular, why not do the same with the database-name column, especially since that does *not* have the ability to use roles as a substitute for a wildcard entry?
I think that's a fair point, I'll look at it.
I think you're going to have to address backwards compatibility concerns. Today, I can create a role named "/a", and I can put that into the HBA without quoting it. I'd be unamused if, after an upgrade, my rule suddenly matched any role name containing an 'a'.Meh ... that concern seems overblown to me. I guess it's possible that somebody has an HBA entry that looks like that, but it doesn't seem very plausible. Note that we made this exact same change in pg_ident.conf years ago, and AFAIR we got zero complaints.
Agree that it seems unlikely but maybe we could add a new GUC to turn the regex usage on the hba file on/off (and use off as the default)?
Regards,
-- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Tom Lane
Date:
"Drouvot, Bertrand" <bdrouvot@amazon.com> writes: > Agree that it seems unlikely but maybe we could add a new GUC to turn > the regex usage on the hba file on/off (and use off as the default)? I think that will just add useless complication. regards, tom lane
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Jacob Champion
Date:
On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jacob Champion <jchampion@timescale.com> writes: > > I think you're going to have to address backwards compatibility > > concerns. Today, I can create a role named "/a", and I can put that > > into the HBA without quoting it. I'd be unamused if, after an upgrade, > > my rule suddenly matched any role name containing an 'a'. > > Meh ... that concern seems overblown to me. I guess it's possible > that somebody has an HBA entry that looks like that, but it doesn't > seem very plausible. Note that we made this exact same change in > pg_ident.conf years ago, and AFAIR we got zero complaints. What percentage of users actually use pg_ident maps? My assumption would be that a change to pg_hba would affect many more people, but then I don't have any proof that there are users with role names that look like that to begin with. I won't pound the table with it. > > Speaking of partial matches, should this feature allow them? Maybe > > rules should have to match the entire username instead, and sidestep > > the inevitable "I forgot to anchor my regex" problems? > > I think the pg_ident.conf precedent is binding on us here. If we > make this one work differently, nobody's going to thank us for it, > they're just going to wonder "did the left hand not know what the > right hand already did?" Hmm... yeah, I suppose. From the other direction, it'd be bad to train users that unanchored regexes are safe in pg_hba only to take those guardrails off in pg_ident. I will tuck that away as a potential behavior change, for a different thread. Thanks, --Jacob
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Jacob Champion
Date:
On 8/19/22 01:12, Drouvot, Bertrand wrote: > + wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar)); > + wlen = pg_mb2wchar_with_len(tok->string + 1, > + wstr, strlen(tok->string + 1)); The (tok->string + 1) construction comes up often enough that I think it should be put in a `regex` variable or similar. That would help my eyes with the (strlen(tok->string + 1) + 1) construction, especially. I noticed that for pg_ident, we precompile the regexes per-line and reuse those in child processes. Whereas here we're compiling, using, and then discarding the regex for each check. I think the example set by the pg_ident code is probably the one to follow, unless you have a good reason not to. > +# Testing with regular expression for username > +reset_pg_hba($node, '/^.*md.*$', 'password'); > +test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0); > + IMO the coverage for this patch needs to be filled out. Negative test cases are more important than positive ones for security-related code. Other than that, and Tom's note on potentially expanding this to other areas, I think this is a pretty straightforward patch. Thanks, --Jacob
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi,
On 9/10/22 1:21 AM, Jacob Champion wrote:
On 8/19/22 01:12, Drouvot, Bertrand wrote:+ wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar)); + wlen = pg_mb2wchar_with_len(tok->string + 1, + wstr, strlen(tok->string + 1));The (tok->string + 1) construction comes up often enough that I think it should be put in a `regex` variable or similar. That would help my eyes with the (strlen(tok->string + 1) + 1) construction, especially. I noticed that for pg_ident, we precompile the regexes per-line and reuse those in child processes. Whereas here we're compiling, using, and then discarding the regex for each check. I think the example set by the pg_ident code is probably the one to follow, unless you have a good reason not to.
Thanks for the feedback.
Yeah fully agree. I'll provide a new version that follow the same logic as the pg_ident code.
+# Testing with regular expression for username +reset_pg_hba($node, '/^.*md.*$', 'password'); +test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0); +IMO the coverage for this patch needs to be filled out. Negative test cases are more important than positive ones for security-related code.
Agree, will do.
Other than that, and Tom's note on potentially expanding this to other areas,
I'll add regexp usage for the database column and also the for the address one when non CIDR is provided (so host name(s)) (I think it also makes sense specially as we don't allow multiple values for this column).
Regards,
-- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 9/12/22 9:55 AM, Drouvot, Bertrand wrote: > Hi, > > On 9/10/22 1:21 AM, Jacob Champion wrote: >> On 8/19/22 01:12, Drouvot, Bertrand wrote: >>> + wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar)); >>> + wlen = pg_mb2wchar_with_len(tok->string + 1, >>> + wstr, strlen(tok->string + 1)); >> The (tok->string + 1) construction comes up often enough that I think it >> should be put in a `regex` variable or similar. That would help my eyes >> with the (strlen(tok->string + 1) + 1) construction, especially. >> >> I noticed that for pg_ident, we precompile the regexes per-line and >> reuse those in child processes. Whereas here we're compiling, using, and >> then discarding the regex for each check. I think the example set by the >> pg_ident code is probably the one to follow, unless you have a good >> reason not to. > > Thanks for the feedback. > > Yeah fully agree. I'll provide a new version that follow the same logic > as the pg_ident code. > >>> +# Testing with regular expression for username >>> +reset_pg_hba($node, '/^.*md.*$', 'password'); >>> +test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0); >>> + >> IMO the coverage for this patch needs to be filled out. Negative test >> cases are more important than positive ones for security-related code. > > Agree, will do. > >> Other than that, and Tom's note on potentially expanding this to other >> areas, > > I'll add regexp usage for the database column and also the for the > address one when non CIDR is provided (so host name(s)) (I think it also > makes sense specially as we don't allow multiple values for this column). > Please find attached v2 addressing the comments mentioned above. v2 also provides regular expression usage for the database and the address columns (when a host name is being used). Remark: The CF bot is failing for Windows (all other tests are green) and only for the new tap test related to the regular expression on the host name (the ones on database and role are fine). The issue is not related to the patch. The issue is that the Windows Cirrus test does not like when a host name is provided for a "host" entry in pg_hba.conf (while it works fine when a CIDR is provided). You can see an example in [1] where the only change is to replace the CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing on Windows only (its log file is here [2]). I'll look at this "Windows" related issue but would appreciate any guidance/help if someone has experience in this area on windows. [1]: https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr” [2]: https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote: > The CF bot is failing for Windows (all other tests are green) and only for > the new tap test related to the regular expression on the host name (the > ones on database and role are fine). > > The issue is not related to the patch. The issue is that the Windows Cirrus > test does not like when a host name is provided for a "host" entry in > pg_hba.conf (while it works fine when a CIDR is provided). > > You can see an example in [1] where the only change is to replace the CIDR > by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing > on Windows only (its log file is here [2]). > > I'll look at this "Windows" related issue but would appreciate any > guidance/help if someone has experience in this area on windows. I recall that being able to do a reverse lookup of a hostname on Windows for localhost requires a few extra setup steps as that's not guaranteed to be set in all environments by default, which is why we go at great length to use 127.0.0.1 in the TAP test setup for example (see Cluster.pm). Looking at your patch, the goal is to test the mapping of regular expression for host names, user names and database names. If the first case is not guaranteed, my guess is that it is fine to skip this portion of the tests on Windows. While reading the patch, I am a bit confused about token_regcomp() and token_regexec(). It would help the review a lot if these were documented with proper comments, even if these act roughly as wrappers for pg_regexec() and pg_regcomp(). -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 9/17/22 8:53 AM, Michael Paquier wrote: > On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote: >> The CF bot is failing for Windows (all other tests are green) and only for >> the new tap test related to the regular expression on the host name (the >> ones on database and role are fine). >> >> The issue is not related to the patch. The issue is that the Windows Cirrus >> test does not like when a host name is provided for a "host" entry in >> pg_hba.conf (while it works fine when a CIDR is provided). >> >> You can see an example in [1] where the only change is to replace the CIDR >> by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing >> on Windows only (its log file is here [2]). >> >> I'll look at this "Windows" related issue but would appreciate any >> guidance/help if someone has experience in this area on windows. > > I recall that being able to do a reverse lookup of a hostname on > Windows for localhost requires a few extra setup steps as that's not > guaranteed to be set in all environments by default, which is why we > go at great length to use 127.0.0.1 in the TAP test setup for example > (see Cluster.pm). Looking at your patch, the goal is to test the > mapping of regular expression for host names, user names and database > names. If the first case is not guaranteed, my guess is that it is > fine to skip this portion of the tests on Windows. Thanks for looking at it! That sounds reasonable, v3 attached is skipping the regular expression tests for the hostname on Windows. > > While reading the patch, I am a bit confused about token_regcomp() and > token_regexec(). It would help the review a lot if these were > documented with proper comments, even if these act roughly as wrappers > for pg_regexec() and pg_regcomp(). Fully agree, comments were missing. They've been added in v3 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Fri, Sep 09, 2022 at 03:05:18PM -0700, Jacob Champion wrote: > On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Jacob Champion <jchampion@timescale.com> writes: >> > I think you're going to have to address backwards compatibility >> > concerns. Today, I can create a role named "/a", and I can put that >> > into the HBA without quoting it. I'd be unamused if, after an upgrade, >> > my rule suddenly matched any role name containing an 'a'. >> >> Meh ... that concern seems overblown to me. I guess it's possible >> that somebody has an HBA entry that looks like that, but it doesn't >> seem very plausible. Note that we made this exact same change in >> pg_ident.conf years ago, and AFAIR we got zero complaints. > > What percentage of users actually use pg_ident maps? My assumption > would be that a change to pg_hba would affect many more people, but > then I don't have any proof that there are users with role names that > look like that to begin with. I won't pound the table with it. This concern does not sound overblown to me. A change in pg_hba.conf impacts everybody. I was just looking at this patch, and the logic with usernames and databases is changed so as we would *always* treat *any* entries beginning with a slash as a regular expression, skipping them if they don't match with an error fed to the logs and pg_hba_file_rules. This could lead to silent security issues as stricter HBA policies need to be located first in pg_hba.conf and these could suddenly fail to load. It would be much safer to me if we had in place some restrictions to avoid such problems to happen, meaning some extra checks in the DDL code paths for such object names and perhaps even something in pg_upgrade with a scan at pg_database and pg_authid. On the bright side, I have been looking at some of the RFCs covering the set of characters allowed in DNS names and slashes are not authorized in hostnames, making this change rather safe AFAIK. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes: >> On Thu, Sep 8, 2022 at 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Meh ... that concern seems overblown to me. I guess it's possible >>> that somebody has an HBA entry that looks like that, but it doesn't >>> seem very plausible. Note that we made this exact same change in >>> pg_ident.conf years ago, and AFAIR we got zero complaints. > This concern does not sound overblown to me. You have to assume that somebody (a) has a role or DB name starting with slash, (b) has an explicit reference to that name in their pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't notice that things are misbehaving until after some hacker manages to break into their installation on the strength of the misbehaving entry. OK, I'll grant that the probability of (c) is depressingly close to unity; but each of the other steps seems quite low probability. All four of them happening in one installation is something I doubt will happen. On the contrary side, if we make this work differently from the pg_ident.conf precedent, or install weird rules to try to prevent accidental misinterpretations, that could also lead to security problems because things don't work as someone would expect. I see no a-priori reason to believe that this risk is negligible compared to the other one. regards, tom lane
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote: > You have to assume that somebody (a) has a role or DB name starting > with slash, (b) has an explicit reference to that name in their > pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't > notice that things are misbehaving until after some hacker manages > to break into their installation on the strength of the misbehaving > entry. OK, I'll grant that the probability of (c) is depressingly > close to unity; but each of the other steps seems quite low probability. > All four of them happening in one installation is something I doubt > will happen. It is the kind of things that could blow up as a CVE and some bad PR for the project, so I cannot get excited about enforcing this new rule in an authentication file (aka before a role is authenticated) while we are talking about 3~4 code paths (?) that would need an extra check to make sure that no instances have such object names. > On the contrary side, if we make this work differently from the > pg_ident.conf precedent, or install weird rules to try to prevent > accidental misinterpretations, that could also lead to security > problems because things don't work as someone would expect. I see > no a-priori reason to believe that this risk is negligible compared > to the other one. I also do like a lot the idea of making things consistent across all the auth configuration files for all the fields where this can be applied. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 9/20/22 6:30 AM, Michael Paquier wrote: > On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote: >> You have to assume that somebody (a) has a role or DB name starting >> with slash, (b) has an explicit reference to that name in their >> pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't >> notice that things are misbehaving until after some hacker manages >> to break into their installation on the strength of the misbehaving >> entry. OK, I'll grant that the probability of (c) is depressingly >> close to unity; but each of the other steps seems quite low probability. >> All four of them happening in one installation is something I doubt >> will happen. > > It is the kind of things that could blow up as a CVE and some bad PR > for the project, so I cannot get excited about enforcing this new rule > in an authentication file (aka before a role is authenticated) while > we are talking about 3~4 code paths (?) that would need an extra check > to make sure that no instances have such object names. I also have the feeling that having (a), (b) and (d) is low probability. That said, If the user "/john" already exists and has a hba entry then this entry will still match with the patch. Problem is that all the users that contain "john" would also now match. But things get worst if say /a is an existing user and hba entry as the entry would match any users that contains "a" with the patch. I assume (maybe i should not) that if objects starting with / already exist there is very good reason(s) behind. Then I don't think that preventing their creation in the DDL would help (quite the contrary for the ones that really need them). It looks to me that adding a GUC (off by default) to enable/disable the regexp usage in the hba could be a fair compromise. It won't block any creation starting with a / and won't open more doors (if such objects exist) by default. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Jacob Champion
Date:
On Mon, Sep 19, 2022 at 9:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > You have to assume that somebody (a) has a role or DB name starting > with slash, (b) has an explicit reference to that name in their > pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't > notice that things are misbehaving until after some hacker manages > to break into their installation on the strength of the misbehaving > entry. OK, I'll grant that the probability of (c) is depressingly > close to unity; but each of the other steps seems quite low probability. > All four of them happening in one installation is something I doubt > will happen. I can't argue with (a) or (b), but (d) seems decently likely to me. If your normal user base consists of people who are authorized to access your system, what clues would you have that your HBA is silently failing open? --Jacob
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote: > I assume (maybe i should not) that if objects starting with / already exist > there is very good reason(s) behind. Then I don't think that preventing > their creation in the DDL would help (quite the contrary for the ones that > really need them). I have been pondering on this point for the last few weeks, and I'd like to change my opinion and side with Tom on this one as per the very unlikeliness of this being a problem in the wild. I have studied the places that would require restrictions but that was just feeling adding a bit more bloat into the CREATE/ALTER ROLE paths for what's aimed at providing a consistent experience for the user across pg_hba.conf and pg_ident.conf. > It looks to me that adding a GUC (off by default) to enable/disable the > regexp usage in the hba could be a fair compromise. It won't block any > creation starting with a / and won't open more doors (if such objects exist) > by default. Enforcing a behavior change in HBA policies with a GUC does not strike me as a good thing in the long term. I am ready to bet that it would just sit around for nothing like the compatibility GUCs. Anyway, I have looked at the patch. + List *roles_re; + List *databases_re; + regex_t hostname_re; I am surprised by the approach of using separate lists for the regular expressions and the raw names. Wouldn't it be better to store everything in a single list but assign an entry type? In this case it would be either regex or plain string. This would minimize the footprint of the changes (no extra arguments *_re in the routines checking for a match on the roles, databases or hosts). And it seems to me that this would make unnecessary the use of re_num here and there. The hostname is different, of course, requiring only an extra field for its type, or something like that. Perhaps the documentation would gain in clarity if there were more examples, like a set of comma-separated examples (mix of regex and raw strings for example, for all the field types that gain support for regexes)? -$node->append_conf('postgresql.conf', "log_connections = on\n"); +$node->append_conf( + 'postgresql.conf', qq{ +listen_addresses = '127.0.0.1' +log_connections = on +}); Hmm. I think that we may need to reconsider the location of the tests for the regexes with the host name, as the "safe" regression tests should not switch listen_addresses. One location where we already do that is src/test/ssl/, so these could be moved there. Keeping the database and user name parts in src/test/authentication/ is fine. Something that stood out on a first review is the refactoring of 001_password.pl that can be done independently of the main patch: - test_role() -> test_conn() to be able to pass down a database name. - reset_pg_hba() to control the host, db and user parts. The host part does not really apply after moving the hosts checks to a more secure location, so I guess that this had better be extended just for the user and database, keeping host=local all the time. I am planning to apply 0001 attached independently, reducing the footprint of 0002, which is your previous patch left untouched (mostly!). -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/5/22 9:24 AM, Michael Paquier wrote: > Something that stood out on a first review is the refactoring of > 001_password.pl that can be done independently of the main patch: Good idea, thanks for the proposal. > - test_role() -> test_conn() to be able to pass down a database name. > - reset_pg_hba() to control the host, db and user parts. The host > part does not really apply after moving the hosts checks to a more > secure location, so I guess that this had better be extended just for > the user and database, keeping host=local all the time. > I am planning to apply 0001 attached independently, 0001 looks good to me. > reducing the > footprint of 0002, which is your previous patch left untouched > (mostly!). Thanks! I'll look at it and the comments you just made up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote: > On 10/5/22 9:24 AM, Michael Paquier wrote: >> - test_role() -> test_conn() to be able to pass down a database name. >> - reset_pg_hba() to control the host, db and user parts. The host >> part does not really apply after moving the hosts checks to a more >> secure location, so I guess that this had better be extended just for >> the user and database, keeping host=local all the time. >> I am planning to apply 0001 attached independently, > > 0001 looks good to me. Thanks. I have applied this refactoring, leaving the host part out of the equation as we should rely only on local connections for this part of the test. The best fit I can think about for the checks on the hostname patterns would be either the ssl, ldap or krb5 tests. SSL is more widely tested than the two others. >> reducing the >> footprint of 0002, which is your previous patch left untouched >> (mostly!). > > Thanks! I'll look at it and the comments you just made up-thread. Cool, thanks. One thing that matters a lot IMO (that I forgot to mention previously) is to preserve the order of the items parsed from the configuration files. Also, I am wondering whether we'd better have some regression tests where a regex includes a comma and a role name itself has a comma, actually, just to stress more the parsing of individual items in the HBA file. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/5/22 9:24 AM, Michael Paquier wrote: > On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote: > Anyway, I have looked at the patch. > > + List *roles_re; > + List *databases_re; > + regex_t hostname_re; > I am surprised by the approach of using separate lists for the regular > expressions and the raw names. Wouldn't it be better to store > everything in a single list but assign an entry type? In this case it > would be either regex or plain string. This would minimize the > footprint of the changes (no extra arguments *_re in the routines > checking for a match on the roles, databases or hosts). And it seems > to me that this would make unnecessary the use of re_num here and > there. Please find attached v5 addressing this. I started with an union but it turns out that we still need the plain string when a regex is used. This is not needed for the authentication per say but for fill_hba_line(). So I ended up creating a new struct without union in v5. > The hostname is different, of course, requiring only an extra > field for its type, or something like that. I'm using the same new struct as described above for the hostname. > > Perhaps the documentation would gain in clarity if there were more > examples, like a set of comma-separated examples (mix of regex and raw > strings for example, for all the field types that gain support for > regexes)? > Right, I added more examples in v5. > -$node->append_conf('postgresql.conf', "log_connections = on\n"); > +$node->append_conf( > + 'postgresql.conf', qq{ > +listen_addresses = '127.0.0.1' > +log_connections = on > +}); > Hmm. I think that we may need to reconsider the location of the tests > for the regexes with the host name, as the "safe" regression tests > should not switch listen_addresses. One location where we already do > that is src/test/ssl/, so these could be moved there. Good point, I moved the hostname related tests in src/test/ssl. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/6/22 2:53 AM, Michael Paquier wrote: > On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote: >> On 10/5/22 9:24 AM, Michael Paquier wrote: >>> - test_role() -> test_conn() to be able to pass down a database name. >>> - reset_pg_hba() to control the host, db and user parts. The host >>> part does not really apply after moving the hosts checks to a more >>> secure location, so I guess that this had better be extended just for >>> the user and database, keeping host=local all the time. >>> I am planning to apply 0001 attached independently, >> >> 0001 looks good to me. > > Thanks. I have applied this refactoring, leaving the host part out of > the equation as we should rely only on local connections for this > part of the test. Thanks! >> Thanks! I'll look at it and the comments you just made up-thread. > > Cool, thanks. One thing that matters a lot IMO (that I forgot to > mention previously) is to preserve the order of the items parsed from > the configuration files. Fully agree, all the versions that have been submitted in this thread preserves the ordering. > > Also, I am wondering whether we'd better have some regression tests > where a regex includes a comma and a role name itself has a comma, > actually, just to stress more the parsing of individual items in the > HBA file. Good idea, it has been added in v5 just shared up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote: > foreach(cell, tokens) > { > [...] > + tokreg = lfirst(cell); > + if (!token_is_regexp(tokreg)) > { > - if (strcmp(dbname, role) == 0) > + if (am_walsender && !am_db_walsender) > + { > + /* > + * physical replication walsender connections can only match > + * replication keyword > + */ > + if (token_is_keyword(tokreg->authtoken, "replication")) > + return true; > + } > + else if (token_is_keyword(tokreg->authtoken, "all")) > return true; When checking the list of databases in check_db(), physical WAL senders (aka am_walsender && !am_db_walsender) would be able to accept regexps, but these should only accept "replication" and never a regexp, no? The second check on "replication" placed in the branch for token_is_regexp() in your patch would be correctly placed, though. This is kind of special in the HBA logic, coming back to 9.0 where physical replication and this special role property have been introduced. WAL senders have gained an actual database property later on in 9.4 with logical decoding, keeping "replication" for compatibility (connection strings can use replication=database to connect as a non-physical WAL sender and connect to a specific database). > +typedef struct AuthToken > +{ > + char *string; > + bool quoted; > +} AuthToken; > + > +/* > + * Distinguish the case a token has to be treated as a regular > + * expression or not. > + */ > +typedef struct AuthTokenOrRegex > +{ > + bool is_regex; > + > + /* > + * Not an union as we still need the token string for fill_hba_line(). > + */ > + AuthToken *authtoken; > + regex_t *regex; > +} AuthTokenOrRegex; Hmm. With is_regex to check if a regex_t exists, both structures may not be necessary. I have not put my hands on that directly, but if I guess that I would shape things to have only AuthToken with (enforcing regex_t in priority if set in the list of elements to check for a match): - the string - quoted - regex_t A list member should never have (regex_t != NULL && quoted), right? Hostnames would never be quoted, as well. > +# test with a comma in the regular expression > +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password'); > +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username', > + 0); So, we check here that the role includes "5," in its name. This is getting fun to parse ;) > elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/) > { > - plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA'; > + plan skip_all => > + 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA'; > } Unrelated noise from perltidy. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/11/22 8:29 AM, Michael Paquier wrote: > On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote: >> foreach(cell, tokens) >> { >> [...] >> + tokreg = lfirst(cell); >> + if (!token_is_regexp(tokreg)) >> { >> - if (strcmp(dbname, role) == 0) >> + if (am_walsender && !am_db_walsender) >> + { >> + /* >> + * physical replication walsender connections can only match >> + * replication keyword >> + */ >> + if (token_is_keyword(tokreg->authtoken, "replication")) >> + return true; >> + } >> + else if (token_is_keyword(tokreg->authtoken, "all")) >> return true; > > When checking the list of databases in check_db(), physical WAL > senders (aka am_walsender && !am_db_walsender) would be able to accept > regexps, but these should only accept "replication" and never a > regexp, no? Oh right, good catch, thanks! Please find attached v6 fixing it. > This is kind of special in the HBA logic, coming back to 9.0 where > physical replication and this special role property have been > introduced. WAL senders have gained an actual database property later > on in 9.4 with logical decoding, keeping "replication" for > compatibility (connection strings can use replication=database to > connect as a non-physical WAL sender and connect to a specific > database). > Thanks for the explanation! >> +typedef struct AuthToken >> +{ >> + char *string; >> + bool quoted; >> +} AuthToken; >> + >> +/* >> + * Distinguish the case a token has to be treated as a regular >> + * expression or not. >> + */ >> +typedef struct AuthTokenOrRegex >> +{ >> + bool is_regex; >> + >> + /* >> + * Not an union as we still need the token string for fill_hba_line(). >> + */ >> + AuthToken *authtoken; >> + regex_t *regex; >> +} AuthTokenOrRegex; > > Hmm. With is_regex to check if a regex_t exists, both structures may > not be necessary. Agree that both struct are not necessary. In v6, AuthTokenOrRegex has been removed and the regex has been moved to AuthToken. There is no is_regex bool anymore, as it's enough to test whether regex is NULL or not. > I have not put my hands on that directly, but if > I guess that I would shape things to have only AuthToken with > (enforcing regex_t in priority if set in the list of elements to check > for a match): > - the string > - quoted > - regex_t > A list member should never have (regex_t != NULL && quoted), right? The patch does allow that. For example it happens for the test where we add a comma in the role name. As we don't rely on a dedicated char to mark the end of a reg exp (we only rely on / to mark its start) then allowing (regex_t != NULL && quoted) seems reasonable to me. >> +# test with a comma in the regular expression >> +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password'); >> +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username', >> + 0); > > So, we check here that the role includes "5," in its name. This is > getting fun to parse ;) > Indeed, ;-) >> elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/) >> { >> - plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA'; >> + plan skip_all => >> + 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA'; >> } > > Unrelated noise from perltidy. Right. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote: > Indeed, ;-) So, I have spent the last two days looking at all that, studying the structure of the patch and the existing HEAD code, and it looks like that a few things could be consolidated. First, as of HEAD, AuthToken is only used for elements in a list of role and database names in hba.conf before filling in each HbaLine, hence we limit its usage to the initial parsing. The patch assigns an optional regex_t to it, then extends the use of AuthToken for single hostname entries in pg_hba.conf. Things going first: shouldn't we combine ident_user and "re" together in the same structure? Even if we finish by not using AuthToken to store the computed regex, it seems to me that we'd better use the same base structure for pg_ident.conf and pg_hba.conf. While looking closely at the patch, we would expand the use of AuthToken outside its original context. I have also looked at make_auth_token(), and wondered if it could be possible to have this routine compile the regexes. This approach would not stick with pg_ident.conf though, as we validate the fields in each line when we put our hands on ident_user and after the base validation of a line (number of fields, etc.). So with all that in mind, it feels right to not use AuthToken at all when building each HbaLine and each IdentLine, but a new, separate, structure. We could call that an AuthItem (string, its compiled regex) perhaps? It could have its own make() routine, taking in input an AuthToken and process pg_regcomp(). Better ideas for this new structure would be welcome, and the idea is that we'd store the post-parsing state of an AuthToken to something that has a compiled regex. We could finish by using AuthToken at the end and expand its use, but it does not feel completely right either to have a make() routine but not be able to compile its regular expression when creating the AuthToken. The logic in charge of compiling the regular expressions could be consolidated more. The patch refactors the logic with token_regcomp(), uses it for the user names (ident_user in parse_ident_line() from pg_ident.conf), then extended to the hostnames (single item) and the role/database names (list possible in these cases). This approach looks right to me. Once we plug in an AuthItem to IdentLine, token_regcomp could be changed so as it takes an AuthToken in input, saving directly the compiled regex_t in the input structure. At the end, the global structure of the code should, I guess, respect the following rules: - The number of places where we check if a string is a regex should be minimal (aka string beginning by '/'). - Only one code path of hba.c should call pg_regcomp() (the patch does that), and only one code path should call pg_regexec() (two code paths of hba.c do that with the patch, as of the need to store matching expression). This should minimize the areas where we call pg_mb2wchar_with_len(), for one. About this last point, token_regexec() does not include check_ident_usermap() in its logic, and it seems to me that it should. The difference is with the expected regmatch_t structures, so shouldn't token_regexec be extended with two arguments as of an array of regmatch_t and the number of elements in the array? This would save a bit some of the logic around pg_mb2wchar_with_len(), for example. To make all that work, token_regexec() should return an int, coming from pg_regexec, but no specific error strings as we don't want to spam the logs when checking hosts, roles and databases in pg_hba.conf. /* Check if it has a CIDR suffix and if so isolate it */ - cidr_slash = strchr(str, '/'); - if (cidr_slash) - *cidr_slash = '\0'; + if (!is_regexp) + { + cidr_slash = strchr(str, '/'); + if (cidr_slash) + *cidr_slash = '\0'; + } [...] /* Get the netmask */ - if (cidr_slash) + if (cidr_slash && !is_regexp) { Some of the code handling regexes for hostnames itches me a bit, like this one. Perhaps it would be better to evaluate this interaction with regular expressions separately. The database and role names don't have this need, so their cases are much simpler to think about. The code could be split to tackle things step-by-step: - One refactoring patch to introduce token_regcomp() and token_regexec(), with the introduction of a new structure that includes the compiled regexes. (Feel free to counterargue about the use of AuthToken for this purpose, of course!) - Plug in the refactored logic for the lists of role names and database names in pg_hba.conf. - Handle the case of single host entries in pg_hba.conf. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote: > First, as of HEAD, AuthToken is only used for elements in a list of > role and database names in hba.conf before filling in each HbaLine, > hence we limit its usage to the initial parsing. The patch assigns an > optional regex_t to it, then extends the use of AuthToken for single > hostname entries in pg_hba.conf. Things going first: shouldn't we > combine ident_user and "re" together in the same structure? Even if > we finish by not using AuthToken to store the computed regex, it seems > to me that we'd better use the same base structure for pg_ident.conf > and pg_hba.conf. While looking closely at the patch, we would expand > the use of AuthToken outside its original context. I have also looked > at make_auth_token(), and wondered if it could be possible to have this > routine compile the regexes. This approach would not stick with > pg_ident.conf though, as we validate the fields in each line when we > put our hands on ident_user and after the base validation of a line > (number of fields, etc.). So with all that in mind, it feels right to > not use AuthToken at all when building each HbaLine and each > IdentLine, but a new, separate, structure. We could call that an > AuthItem (string, its compiled regex) perhaps? It could have its own > make() routine, taking in input an AuthToken and process > pg_regcomp(). Better ideas for this new structure would be welcome, > and the idea is that we'd store the post-parsing state of an > AuthToken to something that has a compiled regex. We could finish by > using AuthToken at the end and expand its use, but it does not feel > completely right either to have a make() routine but not be able to > compile its regular expression when creating the AuthToken. I have have sent this part too quickly. As AuthTokens are used in check_db() and check_role() when matching entries, it is more intuitive to store the regex_t directly in it. Changing IdentLine to use a AuthToken makes the "quoted" part useless in this case, still it could be used in Assert()s to make sure that the data is shaped as expected at check-time, enforced at false when creating it in parse_ident_line()? -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/14/22 8:18 AM, Michael Paquier wrote: > On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote: >> First, as of HEAD, AuthToken is only used for elements in a list of >> role and database names in hba.conf before filling in each HbaLine, >> hence we limit its usage to the initial parsing. The patch assigns an >> optional regex_t to it, then extends the use of AuthToken for single >> hostname entries in pg_hba.conf. Things going first: shouldn't we >> combine ident_user and "re" together in the same structure? Even if >> we finish by not using AuthToken to store the computed regex, it seems >> to me that we'd better use the same base structure for pg_ident.conf >> and pg_hba.conf. While looking closely at the patch, we would expand >> the use of AuthToken outside its original context. I have also looked >> at make_auth_token(), and wondered if it could be possible to have this >> routine compile the regexes. This approach would not stick with >> pg_ident.conf though, as we validate the fields in each line when we >> put our hands on ident_user and after the base validation of a line >> (number of fields, etc.). So with all that in mind, it feels right to >> not use AuthToken at all when building each HbaLine and each >> IdentLine, but a new, separate, structure. We could call that an >> AuthItem (string, its compiled regex) perhaps? It could have its own >> make() routine, taking in input an AuthToken and process >> pg_regcomp(). Better ideas for this new structure would be welcome, >> and the idea is that we'd store the post-parsing state of an >> AuthToken to something that has a compiled regex. We could finish by >> using AuthToken at the end and expand its use, but it does not feel >> completely right either to have a make() routine but not be able to >> compile its regular expression when creating the AuthToken. > > I have have sent this part too quickly. As AuthTokens are used in > check_db() and check_role() when matching entries, it is more > intuitive to store the regex_t directly in it. Yeah, I also think this is the right place for it. > Changing IdentLine to > use a AuthToken makes the "quoted" part useless in this case, still it > could be used in Assert()s to make sure that the data is shaped as > expected at check-time, enforced at false when creating it in > parse_ident_line()? I agree, that makes sense. I'll work on that. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/14/22 7:30 AM, Michael Paquier wrote: > On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote: >> Indeed, ;-) > > So, I have spent the last two days looking at all that, studying the > structure of the patch and the existing HEAD code, Thanks! > The code could be split to tackle things step-by-step: > - One refactoring patch to introduce token_regcomp() and > token_regexec(), with the introduction of a new structure that > includes the compiled regexes. (Feel free to counterargue about the > use of AuthToken for this purpose, of course!) > - Plug in the refactored logic for the lists of role names and > database names in pg_hba.conf. > - Handle the case of single host entries in pg_hba.conf. > -- I agree to work step-by-step. While looking at it again now, I discovered that the new TAP test for the regexp on the hostname in ssl/002_scram.pl is failing on some of my tests environment (and not all..). So, I agree with the dedicated steps you are proposing and that the "host case" needs a dedicated attention. I'm not ignoring all the remarks you've just done up-thread, I'll address them and/or provide my feedback on them when I'll come back with the step-by-step sub patches. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/14/22 7:30 AM, Michael Paquier wrote: > On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote: >> Indeed, ;-) > > > I have also looked > at make_auth_token(), and wondered if it could be possible to have this > routine compile the regexes. I think that it makes sense. > This approach would not stick with > pg_ident.conf though, as we validate the fields in each line when we > put our hands on ident_user and after the base validation of a line > (number of fields, etc.). I'm not sure to get the issue here with the proposed approach and pg_ident.conf. The new attached patch proposal is making use of make_auth_token() (through copy_auth_token()) in parse_ident_line(), do you see any issue? > > The logic in charge of compiling the regular expressions could be > consolidated more. The patch refactors the logic with > token_regcomp(), uses it for the user names (ident_user in > parse_ident_line() from pg_ident.conf), then extended to the hostnames > (single item) and the role/database names (list possible in these > cases). This approach looks right to me. Once we plug in an AuthItem > to IdentLine, token_regcomp could be changed so as it takes an > AuthToken in input Right, did it that way in the attached. > - Only one code path of hba.c should call pg_regcomp() (the patch does > that), and only one code path should call pg_regexec() (two code paths > of hba.c do that with the patch, as of the need to store matching > expression). This should minimize the areas where we call > pg_mb2wchar_with_len(), for one. Right. > About this last point, token_regexec() does not include > check_ident_usermap() in its logic, and it seems to me that it should. > The difference is with the expected regmatch_t structures, so > shouldn't token_regexec be extended with two arguments as of an array > of regmatch_t and the number of elements in the array? You are right, not using token_regexec() in check_ident_usermap() in the previous patch versions was not right. That's fixed in the attached, though the substitution (if any) is still outside of token_regexec(), do you think it should be part of it? (I think that makes sense to keep it outside of it as we wont use the substitution logic for roles, databases and hostname) > > The code could be split to tackle things step-by-step: > - One refactoring patch to introduce token_regcomp() and > token_regexec() Agree. Please find attached v1-0001-token_reg-functions.patch for this first step. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote: > On 10/14/22 7:30 AM, Michael Paquier wrote: >> This approach would not stick with >> pg_ident.conf though, as we validate the fields in each line when we >> put our hands on ident_user and after the base validation of a line >> (number of fields, etc.). > > I'm not sure to get the issue here with the proposed approach and > pg_ident.conf. My point is about parse_ident_line(), where we need to be careful in the order of the operations. The macros IDENT_MULTI_VALUE() and IDENT_FIELD_ABSENT() need to be applied on all the fields first, and the regex computation needs to be last. Your patch had a subtile issue here, as users may get errors on the computed regex before the ordering of the fields as the computation was used *before* the "Get the PG rolename token" part of the logic. >> About this last point, token_regexec() does not include >> check_ident_usermap() in its logic, and it seems to me that it should. >> The difference is with the expected regmatch_t structures, so >> shouldn't token_regexec be extended with two arguments as of an array >> of regmatch_t and the number of elements in the array? > > You are right, not using token_regexec() in check_ident_usermap() in the > previous patch versions was not right. That's fixed in the attached, though > the substitution (if any) is still outside of token_regexec(), do you think > it should be part of it? (I think that makes sense to keep it outside of it > as we wont use the substitution logic for roles, databases and hostname) Keeping the substition done with the IdentLine's Authtokens outside of the internal execution routine is fine by me. While putting my hands on that, I was also wondering whether we should have the error string generated after compilation within the internal regcomp() routine, but that would require more arguments to pg_regcomp() (as of file name, line number, **err_string), and that looks more invasive than necessary. Perhaps the follow-up steps will prove me wrong, though :) A last thing is the introduction of a free() routine for AuthTokens, to minimize the number of places where we haev pg_regfree(). The gain is minimal, but that looks more consistent with the execution and compilation paths. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/18/22 7:51 AM, Michael Paquier wrote: > On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote: >> On 10/14/22 7:30 AM, Michael Paquier wrote: >>> This approach would not stick with >>> pg_ident.conf though, as we validate the fields in each line when we >>> put our hands on ident_user and after the base validation of a line >>> (number of fields, etc.). >> >> I'm not sure to get the issue here with the proposed approach and >> pg_ident.conf. > > My point is about parse_ident_line(), where we need to be careful in > the order of the operations. The macros IDENT_MULTI_VALUE() and > IDENT_FIELD_ABSENT() need to be applied on all the fields first, and > the regex computation needs to be last. Your patch had a subtile > issue here, as users may get errors on the computed regex before the > ordering of the fields as the computation was used *before* the "Get > the PG rolename token" part of the logic. Gotcha, thanks! I was wondering if we shouldn't add a comment about that and I see that you've added one in v2, thanks! BTW, what about adding a new TAP test (dedicated patch) to test the behavior in case of errors during the regexes compilation in pg_ident.conf and pg_hba.conf (not done yet)? (we could add it once this patch series is done). > While putting my hands on that, I was also wondering whether we should > have the error string generated after compilation within the internal > regcomp() routine, but that would require more arguments to > pg_regcomp() (as of file name, line number, **err_string), and that > looks more invasive than necessary. Perhaps the follow-up steps will > prove me wrong, though :) I've had the same thought (and that was what the previous global patch was doing). I'm tempted to think that the follow-steps will prove you right ;-) (specially if at the end those will be the same error messages for databases and roles). > > A last thing is the introduction of a free() routine for AuthTokens, > to minimize the number of places where we haev pg_regfree(). The gain > is minimal, but that looks more consistent with the execution and > compilation paths. Agree, that looks better. I had a look at your v2, did a few tests and it looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote: > BTW, what about adding a new TAP test (dedicated patch) to test the behavior > in case of errors during the regexes compilation in pg_ident.conf and > pg_hba.conf (not done yet)? (we could add it once this patch series is > done). Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for cases where no fork() implementation is available, aka Windows). But a postmaster restart failure would generate logs that could be picked for a pattern check? >> While putting my hands on that, I was also wondering whether we should >> have the error string generated after compilation within the internal >> regcomp() routine, but that would require more arguments to >> pg_regcomp() (as of file name, line number, **err_string), and that >> looks more invasive than necessary. Perhaps the follow-up steps will >> prove me wrong, though :) > > I've had the same thought (and that was what the previous global patch was > doing). I'm tempted to think that the follow-steps will prove you right ;-) > (specially if at the end those will be the same error messages for databases > and roles). Avoiding three times the same error message seems like a good thing in the long run, but let's think about this part later as needed. All these routines are static to hba.c so even if we finish by not finishing the whole job for this development cycle we can still be very flexible. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/19/22 3:18 AM, Michael Paquier wrote: > On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote: >> BTW, what about adding a new TAP test (dedicated patch) to test the behavior >> in case of errors during the regexes compilation in pg_ident.conf and >> pg_hba.conf (not done yet)? (we could add it once this patch series is >> done). > > Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for > cases where no fork() implementation is available, aka Windows). But > a postmaster restart failure would generate logs that could be picked > for a pattern check? Right, that's how I'd see it. I'll give it a look. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/14/22 7:30 AM, Michael Paquier wrote: > On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote: >> Indeed, ;-) > > > The code could be split to tackle things step-by-step: > - One refactoring patch to introduce token_regcomp() and > token_regexec(), with the introduction of a new structure that > includes the compiled regexes. (Feel free to counterargue about the > use of AuthToken for this purpose, of course!) > - Plug in the refactored logic for the lists of role names and > database names in pg_hba.conf. Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to implement regexes for databases and roles in hba. It does also contain new regexes related TAP tests and doc updates. It relies on the refactoring made in fc579e11c6 (but changes the regcomp_auth_token() parameters so that it is now responsible for emitting the compilation error message (if any), to avoid code duplication in parse_hba_line() and parse_ident_line() for roles, databases and user name mapping). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote: > Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to > implement regexes for databases and roles in hba. > > It does also contain new regexes related TAP tests and doc updates. Thanks for the updated version. This is really easy to look at now. > It relies on the refactoring made in fc579e11c6 (but changes the > regcomp_auth_token() parameters so that it is now responsible for emitting > the compilation error message (if any), to avoid code duplication in > parse_hba_line() and parse_ident_line() for roles, databases and user name > mapping). @@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens) [...] - if (!tok->quoted && tok->string[0] == '+') + if (!token_has_regexp(tok)) { Hmm. Do we need token_has_regexp() here for all the cases? We know that the string can begin with a '+', hence it is no regex. The same applies for the case of "all". The remaining case is the one where the user name matches exactly the AuthToken string, which should be last as we want to treat anything beginning with a '/' as a regex. It seems like we could do an order like that? Say: if (!tok->quoted && tok->string[0] == '+') //do else if (token_is_keyword(tok, "all")) //do else if (token_has_regexp(tok)) //do regex compilation, handling failures else if (token_matches(tok, role)) //do exact match The same approach with keywords first, regex, and exact match could be applied as well for the databases? Perhaps it is just mainly a matter of taste, and it depends on how much you want to prioritize the place of the regex over the rest but that could make the code easier to understand in the long-run and this is a very sensitive code area, and the case of physical walsenders (in short specific process types) requiringx specific conditions is also something to take into account. foreach(tokencell, tokens) { - parsedline->roles = lappend(parsedline->roles, - copy_auth_token(lfirst(tokencell))); + AuthToken *tok = copy_auth_token(lfirst(tokencell)); + + /* + * Compile a regex from the role token, if necessary. + */ + if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel)) + return NULL; + + parsedline->roles = lappend(parsedline->roles, tok); } Compiling the expressions for the user and database lists one-by-one in parse_hba_line() as you do is correct. However there is a gotcha that you are forgetting here: the memory allocations done for the regexp compilations are not linked to the memory context where each line is processed (named hbacxt in load_hba()) and need a separate cleanup. In the same fashion as load_ident(), it seems to me that we need two extra things for this patch: - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go through new_parsed_lines and free for each line the AuthTokens for the database and user lists. - if ok and new_parsed_lines != NIL, the same cleanup needs to happen. My guess is that you could do both the same way as load_ident() does, keeping some symmetry between the two code paths. Unifying both into a common routine would be sort of annoying as HbaLines uses lists within the lists of parsed lines, and IdentLine can have one at most in each line. I am wondering whether we should link the regexp code to not use malloc(), actually.. This would require a separate analysis, though, and I suspect that palloc() would be very expensive for this job. For now, I have made your last patch a bit shorter by applying the refactoring of regcomp_auth_token() separately with a few tweaks to the comments. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/21/22 2:58 AM, Michael Paquier wrote: > On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote: >> Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to >> implement regexes for databases and roles in hba. >> >> It does also contain new regexes related TAP tests and doc updates. > > Thanks for the updated version. This is really easy to look at now. > >> It relies on the refactoring made in fc579e11c6 (but changes the >> regcomp_auth_token() parameters so that it is now responsible for emitting >> the compilation error message (if any), to avoid code duplication in >> parse_hba_line() and parse_ident_line() for roles, databases and user name >> mapping). > > @@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens) > [...] > - if (!tok->quoted && tok->string[0] == '+') > + if (!token_has_regexp(tok)) > { > Hmm. Do we need token_has_regexp() here for all the cases? We know > that the string can begin with a '+', hence it is no regex. The same > applies for the case of "all". The remaining case is the one where > the user name matches exactly the AuthToken string, which should be > last as we want to treat anything beginning with a '/' as a regex. It > seems like we could do an order like that? Say: > if (!tok->quoted && tok->string[0] == '+') > //do > else if (token_is_keyword(tok, "all")) > //do > else if (token_has_regexp(tok)) > //do regex compilation, handling failures > else if (token_matches(tok, role)) > //do exact match > > The same approach with keywords first, regex, and exact match could be > applied as well for the databases? Perhaps it is just mainly a matter > of taste, Yeah, I think it is. > and it depends on how much you want to prioritize the place > of the regex over the rest but that could make the code easier to > understand in the long-run and this is a very sensitive code area, And agree that your proposal tastes better ;-): it is easier to understand, v2 attached has been done that way. > Compiling the expressions for the user and database lists one-by-one > in parse_hba_line() as you do is correct. However there is a gotcha > that you are forgetting here: the memory allocations done for the > regexp compilations are not linked to the memory context where each > line is processed (named hbacxt in load_hba()) and need a separate > cleanup. Oops, right, thanks for the call out! > In the same fashion as load_ident(), it seems to me that we > need two extra things for this patch: > - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go > through new_parsed_lines and free for each line the AuthTokens for the > database and user lists. > - if ok and new_parsed_lines != NIL, the same cleanup needs to > happen. Right, but I think that should be "parsed_hba_lines != NIL". > My guess is that you could do both the same way as load_ident() does, > keeping some symmetry between the two code paths. Right. To avoid code duplication in the !ok/ok cases, the function free_hba_line() has been added in v2: it goes through the list of databases and roles tokens and call free_auth_token() for each of them. > Unifying both into > a common routine would be sort of annoying as HbaLines uses lists > within the lists of parsed lines, and IdentLine can have one at most > in each line. I agree, and v2 is not attempting to unify them. > For now, I have made your last patch a bit shorter by applying the > refactoring of regcomp_auth_token() separately with a few tweaks to > the comments. Thanks! v2 attached does apply on top of that. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
Michael Paquier
Date:
On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote: > On 10/21/22 2:58 AM, Michael Paquier wrote: >> The same approach with keywords first, regex, and exact match could be >> applied as well for the databases? Perhaps it is just mainly a matter >> of taste, > > Yeah, I think it is. ;) Still it looks that this makes for less confusion with a minimal footprint once the new additions are in place. >> In the same fashion as load_ident(), it seems to me that we >> need two extra things for this patch: >> - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go >> through new_parsed_lines and free for each line the AuthTokens for the >> database and user lists. >> - if ok and new_parsed_lines != NIL, the same cleanup needs to >> happen. > > Right, but I think that should be "parsed_hba_lines != NIL". For the second case, where we need to free the past contents after a success, yes. > Right. To avoid code duplication in the !ok/ok cases, the function > free_hba_line() has been added in v2: it goes through the list of databases > and roles tokens and call free_auth_token() for each of them. Having a small-ish routine for that is fine. I have spent a couple of hours doing a pass over v2, playing manually with regex patterns, reloads, the system views and item lists. The logic was fine, but I have adjusted a few things related to the comments and the documentation (particularly with the examples, removing one example and updating one with a regex that has a comma, needing double quotes). The CI and all my machines were green, and the test coverage looked sufficient. So, applied. I'll keep an eye on the buildfarm. -- Michael
Attachment
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
From
"Drouvot, Bertrand"
Date:
Hi, On 10/24/22 5:34 AM, Michael Paquier wrote: > On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote: >> On 10/21/22 2:58 AM, Michael Paquier wrote: > > I have spent a couple of hours doing a pass over v2, playing manually > with regex patterns, reloads, the system views and item lists. The > logic was fine, but I have adjusted a few things related to the > comments and the documentation (particularly with the examples, > removing one example and updating one with a regex that has a comma, > needing double quotes). The CI and all my machines were green, and > the test coverage looked sufficient. So, applied. Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com