Re: Allow file inclusion in pg_hba and pg_ident files - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Allow file inclusion in pg_hba and pg_ident files
Date
Msg-id 20221114074727.s4g6rcsh3rlt5gc3@jrouhaud
In response to Re: Allow file inclusion in pg_hba and pg_ident files  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On Mon, Nov 14, 2022 at 02:40:37PM +0900, Michael Paquier wrote:
> On Sat, Nov 12, 2022 at 04:13:53PM +0800, Julien Rouhaud wrote:
> > It looks good to me.  I agree that the file name and line number should be
> > enough to diagnose any unexpected error.
>
> Thanks for checking.  I have looked at 0001 and 0002 again with a
> fresh mind, and applied both of them this morning.

Thanks a lot!

> This makes the remaining bits of the patch much easier to follow in
> hba.c.  Here are more comments after a closer review of the whole for
> the C logic.

Agreed.

> -MemoryContext
> -tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
> -                  int elevel, int depth)
> +static void
> +tokenize_file_with_context(MemoryContext linecxt, const char *filename,
>
> I really tend to prefer having one routine rather than two for the
> tokenization entry point.  Switching to the line context after setting
> up the callback is better, and tokenize_file_with_context() does so.
> Anyway, what do you think about having one API that gains a
> "MemoryContext *" argument, as of the following:
> void tokenize_auth_file(const char *filename, FILE *file,
>                         List **tok_lines,
>                         int depth, int elevel, MemoryContext *linectx)
>
> If the caller passes NULL for *linectx as the initial line context,
> just create it as we do now.  If *linectx is not NULL, just reuse it.
> That may be cleaner than returning the created MemoryContext as
> returned result from tokenize_auth_file().

I originally used two functions since it's a common pattern (e.g. ReadBuffer /
ReadBufferExtended, or the *_internal variants of various functions) and to
avoid an unnecessary branch, but I agree that the extra branch is unlikely to
make any measurable difference here.  It would only matter with -DEXEC_BACKEND /
win32, and in that case the overhead of spawning a new backend dwarfs it, so +1
for keeping things simple here.

> +    /* Cumulate errors if any. */
> +    if (err_msg)
> +    {
> +        if (err_buf.len > 0)
> +            appendStringInfoChar(&err_buf, '\n');
> +        appendStringInfoString(&err_buf, err_msg);
> +    }
>
> This aggregates all the error messages for all the files included in a
> given repository.  As the patch stands, it seems to me that we would
> get errors related to an include_dir clause for two cases:
> - The specified path does not exist, in which case we have only one
> err_msg to consume and report back.
> - Multiple failures in opening and/or reading included files.
> In the second case, aggregating the reports would provide a full set
> of information, but that's not something a user would be able to act
> on directly as this is system-related.  Or is there a case for knowing the
> full list of files when multiple files cannot be read because of
> permission issues?  We may be fine with just the first
> system error here.  Note that in the case where files can be read and
> opened, these would have their own TokenizedAuthLines for each line
> parsed, meaning one line in the SQL views once translated to an
> HbaLine or an IdentLine.

If you have an include_dir directive and multiple files have wrong permissions
(or maybe a broken symlink or something like that), you will get multiple
errors when trying to process that single directive.  I think it's friendlier
to report as much detail as we can, so users can fix everything at once rather
than discovering the problems one error at a time.  That's especially helpful
if the fix is done in some external tooling (puppet or whatever) rather than
directly on the server.

>
> This line of thoughts brings an interesting point, actually: there is
> an inconsistency between "include_if_exists" and "include" compared to
> the GUC processing.  As of the patch, if we use "include" on a file
> that does not exist, the tokenization logic would jump over it and
> continue processing the follow-up entries anyway.  This is a different
> policy than the GUCs, where we would immediately stop looking at
> parameters after an "include" if it fails because its file does not
> exist, working as an immediate stop in the processing.  The difference
> that the patch brings between "include_if_exists" and "include" is
> that we report an error in one case but not the other, still skip the
> files in both cases and move on with the rest.  Hence my question,
> shouldn't we do like the GUC processing for the hba and ident files,
> aka stop immediately when we fail to find a file on an "include"
> clause?  This would be equivalent to doing a "break" in
> tokenize_file_with_context() after failing an include file.

I think the problem is that we have the same interface for processing the
files at startup/reload time and for filling the views, so if we want the views
to be helpful and report all errors, we have to let a bogus "include" continue
in the reload case too.  The same problem doesn't exist for GUCs, so a slightly
different behavior there might be acceptable.
