Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf |
Date | |
Msg-id | Y0jz8QeyeEsnXKlm@paquier.xyz Whole thread Raw |
In response to | Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf |
List | pgsql-hackers |
On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote: > Indeed, ;-) So, I have spent the last two days looking at all that, studying the structure of the patch and the existing HEAD code, and it looks like that a few things could be consolidated. First, as of HEAD, AuthToken is only used for elements in a list of role and database names in hba.conf before filling in each HbaLine, hence we limit its usage to the initial parsing. The patch assigns an optional regex_t to it, then extends the use of AuthToken for single hostname entries in pg_hba.conf. Things going first: shouldn't we combine ident_user and "re" together in the same structure? Even if we finish by not using AuthToken to store the computed regex, it seems to me that we'd better use the same base structure for pg_ident.conf and pg_hba.conf. While looking closely at the patch, we would expand the use of AuthToken outside its original context. I have also looked at make_auth_token(), and wondered if it could be possible to have this routine compile the regexes. This approach would not stick with pg_ident.conf though, as we validate the fields in each line when we put our hands on ident_user and after the base validation of a line (number of fields, etc.). So with all that in mind, it feels right to not use AuthToken at all when building each HbaLine and each IdentLine, but a new, separate, structure. We could call that an AuthItem (string, its compiled regex) perhaps? It could have its own make() routine, taking in input an AuthToken and process pg_regcomp(). Better ideas for this new structure would be welcome, and the idea is that we'd store the post-parsing state of an AuthToken to something that has a compiled regex. We could finish by using AuthToken at the end and expand its use, but it does not feel completely right either to have a make() routine but not be able to compile its regular expression when creating the AuthToken. The logic in charge of compiling the regular expressions could be consolidated more. The patch refactors the logic with token_regcomp(), uses it for the user names (ident_user in parse_ident_line() from pg_ident.conf), then extended to the hostnames (single item) and the role/database names (list possible in these cases). This approach looks right to me. Once we plug in an AuthItem to IdentLine, token_regcomp could be changed so as it takes an AuthToken in input, saving directly the compiled regex_t in the input structure. At the end, the global structure of the code should, I guess, respect the following rules: - The number of places where we check if a string is a regex should be minimal (aka string beginning by '/'). - Only one code path of hba.c should call pg_regcomp() (the patch does that), and only one code path should call pg_regexec() (two code paths of hba.c do that with the patch, as of the need to store matching expression). This should minimize the areas where we call pg_mb2wchar_with_len(), for one. About this last point, token_regexec() does not include check_ident_usermap() in its logic, and it seems to me that it should. The difference is with the expected regmatch_t structures, so shouldn't token_regexec be extended with two arguments as of an array of regmatch_t and the number of elements in the array? This would save a bit some of the logic around pg_mb2wchar_with_len(), for example. To make all that work, token_regexec() should return an int, coming from pg_regexec, but no specific error strings as we don't want to spam the logs when checking hosts, roles and databases in pg_hba.conf. /* Check if it has a CIDR suffix and if so isolate it */ - cidr_slash = strchr(str, '/'); - if (cidr_slash) - *cidr_slash = '\0'; + if (!is_regexp) + { + cidr_slash = strchr(str, '/'); + if (cidr_slash) + *cidr_slash = '\0'; + } [...] /* Get the netmask */ - if (cidr_slash) + if (cidr_slash && !is_regexp) { Some of the code handling regexes for hostnames itches me a bit, like this one. Perhaps it would be better to evaluate this interaction with regular expressions separately. The database and role names don't have this need, so their cases are much simpler to think about. The code could be split to tackle things step-by-step: - One refactoring patch to introduce token_regcomp() and token_regexec(), with the introduction of a new structure that includes the compiled regexes. (Feel free to counterargue about the use of AuthToken for this purpose, of course!) - Plug in the refactored logic for the lists of role names and database names in pg_hba.conf. - Handle the case of single host entries in pg_hba.conf. -- Michael
Attachment
pgsql-hackers by date: