Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Date
Msg-id Y1HumyLtAqvQ1lOz@paquier.xyz
Whole thread Raw
In response to Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:
> Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
> implement regexes for databases and roles in hba.
>
> It does also contain new regexes related TAP tests and doc updates.

Thanks for the updated version.  This is really easy to look at now.

> It relies on the refactoring made in fc579e11c6 (but changes the
> regcomp_auth_token() parameters so that it is now responsible for emitting
> the compilation error message (if any), to avoid code duplication in
> parse_hba_line() and parse_ident_line() for roles, databases and user name
> mapping).

@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-       if (!tok->quoted && tok->string[0] == '+')
+       if (!token_has_regexp(tok))
        {
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
    //do
else if (token_is_keyword(tok, "all"))
    //do
else if (token_has_regexp(tok))
    //do regex compilation, handling failures
else if (token_matches(tok, role))
    //do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, and
the case of physical walsenders (in short specific process types)
requiringx specific conditions is also something to take into account.

    foreach(tokencell, tokens)
    {
-       parsedline->roles = lappend(parsedline->roles,
-                                   copy_auth_token(lfirst(tokencell)));
+       AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+       /*
+        * Compile a regex from the role token, if necessary.
+        */
+       if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+           return NULL;
+
+       parsedline->roles = lappend(parsedline->roles, tok);
    }

Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup.  In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.
My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths.  Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.

I am wondering whether we should link the regexp code to not use
malloc(), actually..  This would require a separate analysis, though,
and I suspect that palloc() would be very expensive for this job.

For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)
Next
From: Peter Smith
Date:
Subject: Re: fix stats_fetch_consistency value in postgresql.conf.sample