Re: Allow file inclusion in pg_hba and pg_ident files - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Allow file inclusion in pg_hba and pg_ident files
Date
Msg-id 20220523114308.qxnbjnxox23i5gsv@jrouhaud
Whole thread Raw
In response to Re: Allow file inclusion in pg_hba and pg_ident files  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Allow file inclusion in pg_hba and pg_ident files
List pgsql-hackers
Hi,

On Mon, May 23, 2022 at 03:53:06PM +0900, Michael Paquier wrote:
>
> +       switch (kind)
> +       {
> +           case SecondaryAuthFile:
> +               msglog = "could not open secondary authentication file \"@%s\" as \"%s\": %m";
> +               msgview = "could not open secondary authentication file \"@%s\" as \"%s\": %s";
> +               break;
> +           case IncludedAuthFile:
> +               msglog = "could not open included authentication file \"%s\" as \"%s\": %m";
> +               msgview = "could not open included authentication file \"%s\" as \"%s\": %s";
> +               break;
> +           default:
> +               elog(ERROR, "unknown HbaIncludeKind: %d", kind);
> +               break;
> +       }
> I don't think that HbaIncludeKind is necessary, considering that we
> could rely on the file name to provide enough context about the type
> involved in the error string generated.

I'm not sure it would always be the case.  For instance, why wouldn't someone
use something like

peer all @app_users ident

rather than

@include app_users

or a mix of both?

That being said, I'd gladly drop that enum and only handle a single error
message, as the rest of the error context (including the owning file name and
line) should provide enough information to users.

If so, should I use "included authentication file" everywhere, or use something
else?

> The "default" clause in the
> switch could just be removed, btw, to generate warnings if a new value
> is added in the kind enum.

Right.

> +       /* relative path is relative to dir of calling file */
> There are many relatives here.  It seems to me that this is the kind
> of behavior we should document precisely

Agreed.  I will try to mimic how it's done for
https://www.postgresql.org/docs/current/config-setting.html#CONFIG-INCLUDES.

> (and test!).

Sure.  As I said previously I'm waiting for some agreement on the feature (and
the syntax), before adding tests.  After that I will definitely include full
test coverage!

> I am
> understanding the following for cascading configurations:
> - pg_hba.conf has "include_dir path1".
> - path1/ has file1.conf that has "include file2.conf".  This means
> that file2.conf has to be also included in path1/.

Right, same rules as for postgresql.conf's include_dir.
>
> postmaster.c and postinit.c do that:
>     /*
>      * Load configuration files for client authentication.
>      */
>     if (!load_hba())
>     {
>         /*
>          * It makes no sense to continue if we fail to load the HBA file,
>          * since there is no way to connect to the database in this case.
>          */
>         ereport(FATAL,
>                 (errmsg("could not load pg_hba.conf")));
>     }
> This could be confusing as a different file may fail to load, while
> pg_hba.conf was able to do its work outside an include clause.

Well, even if included they are still part of the configuration and lead to a
failure to load the main file.

The detail should be provided in the logs so it should disambiguate the
situation.  While you're mentioning this message, AFAICS it can already be
entirely wrong as-is, as it doesn't take into account the hba_file
configuration:

ALTER SYSTEM SET hba_file = '/tmp/myfile';

After a restart:
2022-05-23 18:52:08.081 CST [41890] LOG:  could not open configuration file
"/tmp/myfile": No such file or directory
2022-05-23 18:52:08.081 CST [41890] FATAL:  could not load pg_hba.conf

> How does include_dir work with the ordering of the HBA entries across
> multiple files?

Bugs apart, same as for postgresql.conf's include_dir.

> The set of HBA rules are very sensitive to their
> ordering, and ReadDir() depends on readdir(), which provides a list
> of files depending on its FS implementation.  That does not seem like
> a sane concept to me.  I can this this order (tracked by rule_number)
> being strictly enforced when it comes to the loading of files, though,
> so I would recommend to focus on the implementation of "include"
> rather than "include_dir".

But the same problem already exists for the postgresql.conf's include_dir.

Having an hba rule masking another isn't really better than having a GUC
value overloaded by another one.  Usually people rely on a sane naming (like
01_file.conf and so on) to get a predictable behavior, and we even document
that very thoroughly for the postgresql.conf part.  Improving the
documentation, as you already pointed out a bit above, should be enough?

> +      <para>
> +       Rule number, in priority order, of this rule if the rule is valid,
> +       otherwise null
> +      </para></entry>
> This is a very important field.  I think that this explanation should
> be extended and explain why relying on this number counts (aka this is
> the order of the rules loaded across files).  Btw, this could be added
> as a separate patch, even if this maps to the line number when it
> comes to the ordering.

Agreed, I will improve the documentation to outline the importance of that
information.

Do you mean a separate patch to ease review or to eventually commit both
separately?  FWIW I don't think that adding any form of inclusion in the hba
files should be done without a way for users to check the results.  Any test
facility would also probably rely on this field.
>
> +# include       FILE
> +# include_dir   FILE
> You mean s/FILE/DIRECTORY/ for include_dir, I guess?

Oops yes.
>
> +                   /*
> +                    * Only parse files with names ending in ".conf".
> +                    * Explicitly reject files starting with ".". This
> +                    * excludes things like "." and "..", as well as typical
> +                    * hidden files, backup files, and editor debris.
> +                    */
> I don't think that there is any need to restrict that to files ending
> with .conf.  We don't do that for postgresql.conf's include, for one.

I'm confused.  Are you talking about postgresql.conf's include or include_dir
option?  I'm pretty sure that I borrowed this logic from
src/backend/utils/misc/guc-file.l / ParseConfigDirectory(), which implements
the include_dir logic for the postgresql.conf file.

> In 0002, pg_hba_matches() had better have some documentation,
> explaining for which purpose this function can be used with a short
> example (aka for an address and a role, find the matching set of HBA
> rules and report their line and file)?
>
> I am not sure to be a huge fan of this implementation, actually.  The
> function is shaped so as the user provides in input the arguments to
> fill hbaPort with, passing it down to check_hba().  This could bite
> easily in the future if some of the internal fields filled in by the
> HBA load and used by the HBA check change over time, particularly if
> this stuff has no tests to provide some validation, though we
> discussed that a couple of months ago.  Perhaps we should think
> harder on this point.

First, it's for now only a POC to demonstrate a way to diagnose problems in
those files (thus no doc and obviously no tests).  I added some details in my
initial email:

> Finally I also added 0003, which is a POC for a new pg_hba_matches()
> function, that can help DBA to understand why their configuration isn't
> working as they expect.  This only to start the discussion on that topic, the
> code is for now really hackish, as I don't know how much this is wanted
> and/or if some other behavior would be better, and there's also no
> documentation or test.  The function for now only takes an optional inet
> (null means unix socket), the target role and an optional ssl flag and
> returns the file, line and raw line matching if any, or null.  For instance:
>
> =# select * from pg_hba_matches('127.0.0.1'::inet, 'postgres');
> -[ RECORD 1  ]-----------------------------------------------------------------
> file_name | /tmp/pgbuild/toto.conf
> line_num  | 5
> raw_line  | host    all            all            127.0.0.1/32           trust

I will put more details and this example in the commit message if you want, but
if no one is interested in that feature I'm ok with discarding it.

I can of course change the implementation, but unless I'm missing something
users will always have to provide all the relevant info to check the behavior
for a specific connection origin?

> +       char        tmp[sizeof("xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:255.255.255.255/128")];
> Okay.  This is the same way of doing things as network.c or
> inet_cidr_ntop.c.  Shouldn't we centralize the definition of such a
> maximum size instead?
>
> +   if (!load_hba())
> +       ereport(ERROR,
> +               (errcode(ERRCODE_CONFIG_FILE_ERROR),
> +                errmsg("Invalidation auth configuration file")));
> This error message sounds wrong to me.  It would be more consistent to
> write that as "could not load authentication file" or such.
> postinit.c and postmaster.c do that (these error strings become
> partially confusing with the possibility to include extra auth files,
> actually, on a separate note).

Oops, I meant "Invalid".  I'll Cleanup the error messages and refactor stuff if
this goes beyond the POC stage.

> +   if (PG_ARGISNULL(1))
> +       ereport(ERROR,
> +               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                errmsg("parameter role is mandatory")));
> +   port->user_name = text_to_cstring(PG_GETARG_TEXT_PP(1));
> This could be moved at the beginning of the function, before
> processing the first argument of the function (address) in details.

I thought it'd be better to handle each argument sequentially for readability,
but I could move all the sanity checks first if you prefer.



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: create_help.pl treats as replaceable
Next
From: Amit Kapila
Date:
Subject: Re: Support logical replication of DDLs