Thread: pipe_read_line for reading arbitrary strings

pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
When skimming through pg_rewind during a small review I noticed the use of
pipe_read_line for reading arbitrary data from a pipe, the mechanics of which
seemed odd.

Commit 5b2f4afffe6 refactored find_other_exec() and broke out pipe_read_line()
as a static convenience routine for reading a single line of output to catch a
version number.  Many years later, commit a7e8ece41 exposed it externally in
order to read a GUC from postgresql.conf using "postgres -C ..".  f06b1c598
also make use of it for reading a version string much like find_other_exec().
Funnily enough, while now used for arbitrary string reading the variable is
still "pgver".

Since the function requires passing a buffer/size, and at most size - 1 bytes
will be read via fgets(), there is a truncation risk when using this for
reading GUCs (like how pg_rewind does, though the risk there is slim to none).

If we are going to continue using this for reading $stuff from pipes, maybe we
should think about presenting a nicer API which removes that risk?  Returning
an allocated buffer which contains all the output along the lines of the recent
pg_get_line work seems a lot nicer and safer IMO.

The attached POC diff replace fgets() with pg_get_line(), which may not be an
Ok way to cross the streams (it's clearly not a great fit), but as a POC it
provided a neater interface for reading one-off lines from a pipe IMO.  Does
anyone else think this is worth fixing before too many callsites use it, or is
this another case of my fear of silent subtle truncation bugs?  =)

--
Daniel Gustafsson


Attachment

Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 7 Mar 2023, at 23:05, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> When skimming through pg_rewind during a small review I noticed the use of
> pipe_read_line for reading arbitrary data from a pipe, the mechanics of which
> seemed odd.

A rebase of this for the CFBot since I realized I had forgotten to add this to
the July CF.

--
Daniel Gustafsson


Attachment

Re: pipe_read_line for reading arbitrary strings

From
Heikki Linnakangas
Date:
On 08/03/2023 00:05, Daniel Gustafsson wrote:
> When skimming through pg_rewind during a small review I noticed the use of
> pipe_read_line for reading arbitrary data from a pipe, the mechanics of which
> seemed odd.
> 
> Commit 5b2f4afffe6 refactored find_other_exec() and broke out pipe_read_line()
> as a static convenience routine for reading a single line of output to catch a
> version number.  Many years later, commit a7e8ece41 exposed it externally in
> order to read a GUC from postgresql.conf using "postgres -C ..".  f06b1c598
> also make use of it for reading a version string much like find_other_exec().
> Funnily enough, while now used for arbitrary string reading the variable is
> still "pgver".
> 
> Since the function requires passing a buffer/size, and at most size - 1 bytes
> will be read via fgets(), there is a truncation risk when using this for
> reading GUCs (like how pg_rewind does, though the risk there is slim to none).

Good point.

> If we are going to continue using this for reading $stuff from pipes, maybe we
> should think about presenting a nicer API which removes that risk?  Returning
> an allocated buffer which contains all the output along the lines of the recent
> pg_get_line work seems a lot nicer and safer IMO.

+1

> /*
>  * Execute a command in a pipe and read the first line from it. The returned
>  * string is allocated, the caller is responsible for freeing.
>  */
> char *
> pipe_read_line(char *cmd)

I think it's worth being explicit here that it's palloc'd, or malloc'd 
in frontend programs, rather than just "allocated". Like in pg_get_line.

Other than that, LGTM.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 4 Jul 2023, at 13:59, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 08/03/2023 00:05, Daniel Gustafsson wrote:

>> If we are going to continue using this for reading $stuff from pipes, maybe we
>> should think about presenting a nicer API which removes that risk?  Returning
>> an allocated buffer which contains all the output along the lines of the recent
>> pg_get_line work seems a lot nicer and safer IMO.
>
> +1

Thanks for review!

>> /*
>> * Execute a command in a pipe and read the first line from it. The returned
>> * string is allocated, the caller is responsible for freeing.
>> */
>> char *
>> pipe_read_line(char *cmd)
>
> I think it's worth being explicit here that it's palloc'd, or malloc'd in frontend programs, rather than just
"allocated".Like in pg_get_line. 

Good point, I'll make that happen before committing this.

--
Daniel Gustafsson




Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 4 Jul 2023, at 14:50, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 4 Jul 2023, at 13:59, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 08/03/2023 00:05, Daniel Gustafsson wrote:
>
>>> If we are going to continue using this for reading $stuff from pipes, maybe we
>>> should think about presenting a nicer API which removes that risk?  Returning
>>> an allocated buffer which contains all the output along the lines of the recent
>>> pg_get_line work seems a lot nicer and safer IMO.
>>
>> +1
>
> Thanks for review!
>
>>> /*
>>> * Execute a command in a pipe and read the first line from it. The returned
>>> * string is allocated, the caller is responsible for freeing.
>>> */
>>> char *
>>> pipe_read_line(char *cmd)
>>
>> I think it's worth being explicit here that it's palloc'd, or malloc'd in frontend programs, rather than just
"allocated".Like in pg_get_line. 
>
> Good point, I'll make that happen before committing this.

Fixed, along with commit message wordsmithing in the attached.  Unless objected
to I'll go ahead with this version.

--
Daniel Gustafsson


Attachment

Re: pipe_read_line for reading arbitrary strings

From
John Naylor
Date:
On Mon, Sep 25, 2023 at 2:55 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed, along with commit message wordsmithing in the attached.  Unless objected
> to I'll go ahead with this version.

+1



Re: pipe_read_line for reading arbitrary strings

From
Alvaro Herrera
Date:
On 2023-Mar-07, Daniel Gustafsson wrote:

> The attached POC diff replace fgets() with pg_get_line(), which may not be an
> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
> provided a neater interface for reading one-off lines from a pipe IMO.  Does
> anyone else think this is worth fixing before too many callsites use it, or is
> this another case of my fear of silent subtle truncation bugs?  =)

I think this is generally a good change.

I think pipe_read_line should have a "%m" in the "no data returned"
error message.  pg_read_line is careful to retain errno (and it was
already zero at start), so this should be okay ... or should we set
errno again to zero after popen(), even if it works?

(I'm not sure I buy pg_read_line's use of perror in the backend case.
Maybe this is only okay because the backend doesn't use this code?)

pg_get_line says caller can distinguish error from no-input-before-EOF
with ferror(), but pipe_read_line does no such thing.  I wonder what
happens if an NFS mount containing a file being read disappears in the
middle of reading it, for example ... will we think that we completed
reading the file, rather than erroring out?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."                        (Paul Thomas)



Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 22 Nov 2023, at 13:47, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Mar-07, Daniel Gustafsson wrote:
>
>> The attached POC diff replace fgets() with pg_get_line(), which may not be an
>> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
>> provided a neater interface for reading one-off lines from a pipe IMO.  Does
>> anyone else think this is worth fixing before too many callsites use it, or is
>> this another case of my fear of silent subtle truncation bugs?  =)
>
> I think this is generally a good change.

Thanks for review!

> I think pipe_read_line should have a "%m" in the "no data returned"
> error message.

Good point.

> pg_read_line is careful to retain errno (and it was
> already zero at start), so this should be okay ... or should we set
> errno again to zero after popen(), even if it works?

While it shouldn't be needed, reading manpages from a variety of systems
indicates that popen() isn't entirely reliable when it comes to errno so I've
added an explicit errno=0 just to be certain.

> (I'm not sure I buy pg_read_line's use of perror in the backend case.
> Maybe this is only okay because the backend doesn't use this code?)

In EXEC_BACKEND builds the postmaster will use find_other_exec which in turn
calls pipe_read_line, so there is a possibility.  I agree it's proabably not a
good idea, I'll have a look at it separately and will raise on a new thread.

> pg_get_line says caller can distinguish error from no-input-before-EOF
> with ferror(), but pipe_read_line does no such thing.  I wonder what
> happens if an NFS mount containing a file being read disappears in the
> middle of reading it, for example ... will we think that we completed
> reading the file, rather than erroring out?

Interesting, that's an omission which should be fixed.  I notice there are a
number of callsites using pg_get_line which skip validating with ferror(), I'll
have a look at those next (posting findings to a new thread).

--
Daniel Gustafsson


Attachment

Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
The attached v5 is a rebase with no new changes just to get a fresh run in the
CFBot before pushing.  All review comments have been addressed and the patch
has been Ready for Committer for some time, just didn't have time to get to it
in the last CF.

--
Daniel Gustafsson


Attachment

Re: pipe_read_line for reading arbitrary strings

From
Peter Eisentraut
Date:
On 22.11.23 13:47, Alvaro Herrera wrote:
> On 2023-Mar-07, Daniel Gustafsson wrote:
> 
>> The attached POC diff replace fgets() with pg_get_line(), which may not be an
>> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
>> provided a neater interface for reading one-off lines from a pipe IMO.  Does
>> anyone else think this is worth fixing before too many callsites use it, or is
>> this another case of my fear of silent subtle truncation bugs?  =)
> 
> I think this is generally a good change.
> 
> I think pipe_read_line should have a "%m" in the "no data returned"
> error message.  pg_read_line is careful to retain errno (and it was
> already zero at start), so this should be okay ... or should we set
> errno again to zero after popen(), even if it works?

Is this correct? The code now looks like this:

     line = pg_get_line(pipe_cmd, NULL);

     if (line == NULL)
     {
         if (ferror(pipe_cmd))
             log_error(errcode_for_file_access(),
                       _("could not read from command \"%s\": %m"), cmd);
         else
             log_error(errcode_for_file_access(),
                       _("no data was returned by command \"%s\": %m"), 
cmd);
     }

We already handle the case where an error happened in the first branch, 
so there cannot be an error set in the second branch (unless something 
nonobvious is going on?).

It seems to me that if the command being run just happens to print 
nothing but is otherwise successful, this would print a bogus error code 
(or "Success")?




Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 6 Mar 2024, at 10:07, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 22.11.23 13:47, Alvaro Herrera wrote:
>> On 2023-Mar-07, Daniel Gustafsson wrote:
>>> The attached POC diff replace fgets() with pg_get_line(), which may not be an
>>> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
>>> provided a neater interface for reading one-off lines from a pipe IMO.  Does
>>> anyone else think this is worth fixing before too many callsites use it, or is
>>> this another case of my fear of silent subtle truncation bugs?  =)
>> I think this is generally a good change.
>> I think pipe_read_line should have a "%m" in the "no data returned"
>> error message.  pg_read_line is careful to retain errno (and it was
>> already zero at start), so this should be okay ... or should we set
>> errno again to zero after popen(), even if it works?
>
> Is this correct? The code now looks like this:
>
>    line = pg_get_line(pipe_cmd, NULL);
>
>    if (line == NULL)
>    {
>        if (ferror(pipe_cmd))
>            log_error(errcode_for_file_access(),
>                      _("could not read from command \"%s\": %m"), cmd);
>        else
>            log_error(errcode_for_file_access(),
>                      _("no data was returned by command \"%s\": %m"), cmd);
>    }
>
> We already handle the case where an error happened in the first branch, so there cannot be an error set in the second
branch(unless something nonobvious is going on?). 
>
> It seems to me that if the command being run just happens to print nothing but is otherwise successful, this would
printa bogus error code (or "Success")? 

Good catch, that's an incorrect copy/paste, it should use ERRCODE_NO_DATA.  I'm
not convinced that a function to read from a pipe should consider not reading
anything successful by default, output is sort expected here.  We could add a
flag parameter to use for signalling that no data is fine though as per the
attached (as of yet untested) diff?

--
Daniel Gustafsson


Attachment

Re: pipe_read_line for reading arbitrary strings

From
Alvaro Herrera
Date:
On 2024-Mar-06, Daniel Gustafsson wrote:

> Good catch, that's an incorrect copy/paste, it should use ERRCODE_NO_DATA.  I'm
> not convinced that a function to read from a pipe should consider not reading
> anything successful by default, output is sort expected here.  We could add a
> flag parameter to use for signalling that no data is fine though as per the
> attached (as of yet untested) diff?

I think adding dead code is not a great plan, particularly if it's hairy
enough that we need to very carefully dissect what happens in error
cases.  IMO if and when somebody has a need for an empty return string
being acceptable, they can add it then.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 6 Mar 2024, at 11:46, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Mar-06, Daniel Gustafsson wrote:
>
>> Good catch, that's an incorrect copy/paste, it should use ERRCODE_NO_DATA.  I'm
>> not convinced that a function to read from a pipe should consider not reading
>> anything successful by default, output is sort expected here.  We could add a
>> flag parameter to use for signalling that no data is fine though as per the
>> attached (as of yet untested) diff?
>
> I think adding dead code is not a great plan, particularly if it's hairy
> enough that we need to very carefully dissect what happens in error
> cases.  IMO if and when somebody has a need for an empty return string
> being acceptable, they can add it then.

I agree with that, there are no callers today and I can't imagine one off the
cuff.  The change to use the appropriate errcode still applies though.

--
Daniel Gustafsson




Re: pipe_read_line for reading arbitrary strings

From
Peter Eisentraut
Date:
On 06.03.24 10:54, Daniel Gustafsson wrote:
>> On 6 Mar 2024, at 10:07, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 22.11.23 13:47, Alvaro Herrera wrote:
>>> On 2023-Mar-07, Daniel Gustafsson wrote:
>>>> The attached POC diff replace fgets() with pg_get_line(), which may not be an
>>>> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
>>>> provided a neater interface for reading one-off lines from a pipe IMO.  Does
>>>> anyone else think this is worth fixing before too many callsites use it, or is
>>>> this another case of my fear of silent subtle truncation bugs?  =)
>>> I think this is generally a good change.
>>> I think pipe_read_line should have a "%m" in the "no data returned"
>>> error message.  pg_read_line is careful to retain errno (and it was
>>> already zero at start), so this should be okay ... or should we set
>>> errno again to zero after popen(), even if it works?
>>
>> Is this correct? The code now looks like this:
>>
>>     line = pg_get_line(pipe_cmd, NULL);
>>
>>     if (line == NULL)
>>     {
>>         if (ferror(pipe_cmd))
>>             log_error(errcode_for_file_access(),
>>                       _("could not read from command \"%s\": %m"), cmd);
>>         else
>>             log_error(errcode_for_file_access(),
>>                       _("no data was returned by command \"%s\": %m"), cmd);
>>     }
>>
>> We already handle the case where an error happened in the first branch, so there cannot be an error set in the
secondbranch (unless something nonobvious is going on?).
 
>>
>> It seems to me that if the command being run just happens to print nothing but is otherwise successful, this would
printa bogus error code (or "Success")?
 
> 
> Good catch, that's an incorrect copy/paste, it should use ERRCODE_NO_DATA.

Also it shouldn't print %m, was my point.




Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 8 Mar 2024, at 18:13, Peter Eisentraut <peter@eisentraut.org> wrote:

>> Good catch, that's an incorrect copy/paste, it should use ERRCODE_NO_DATA.
> 
> Also it shouldn't print %m, was my point.


Absolutely, I removed that in the patch upthread, it was clearly wrong.

--
Daniel Gustafsson




Re: pipe_read_line for reading arbitrary strings

From
Daniel Gustafsson
Date:
> On 8 Mar 2024, at 19:38, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 8 Mar 2024, at 18:13, Peter Eisentraut <peter@eisentraut.org> wrote:
>
>>> Good catch, that's an incorrect copy/paste, it should use ERRCODE_NO_DATA.
>>
>> Also it shouldn't print %m, was my point.
>
> Absolutely, I removed that in the patch upthread, it was clearly wrong.

Pushed the fix for the incorrect logline and errcode.

--
Daniel Gustafsson