Thread: libpqrcv_connect() leaks PGconn

libpqrcv_connect() leaks PGconn

From
Andres Freund
Date:
Hi,

We have code like this in libpqrcv_connect():

    conn = palloc0(sizeof(WalReceiverConn));
    conn->streamConn = PQconnectStartParams(keys, vals,
                                             /* expand_dbname = */ true);
    if (PQstatus(conn->streamConn) == CONNECTION_BAD)
    {
        *err = pchomp(PQerrorMessage(conn->streamConn));
        return NULL;
    }

        [try to establish connection]

    if (PQstatus(conn->streamConn) != CONNECTION_OK)
    {
        *err = pchomp(PQerrorMessage(conn->streamConn));
        return NULL;
    }


Am I missing something, or are we leaking the libpq connection in case of
errors?

It doesn't matter really for walreceiver, since it will exit anyway, but we
also use libpqwalreceiver for logical replication, where it might?


Seems pretty clear that we should do a PQfinish() before returning NULL? I
lean towards thinking that this isn't worth backpatching given the current
uses of libpq, but I could easily be convinced otherwise.


Noticed while taking another look through [1].

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de



Re: libpqrcv_connect() leaks PGconn

From
Andres Freund
Date:
Hi,

On 2023-01-20 17:12:37 -0800, Andres Freund wrote:
> We have code like this in libpqrcv_connect():
> 
>     conn = palloc0(sizeof(WalReceiverConn));
>     conn->streamConn = PQconnectStartParams(keys, vals,
>                                              /* expand_dbname = */ true);
>     if (PQstatus(conn->streamConn) == CONNECTION_BAD)
>     {
>         *err = pchomp(PQerrorMessage(conn->streamConn));
>         return NULL;
>     }
> 
>         [try to establish connection]
> 
>     if (PQstatus(conn->streamConn) != CONNECTION_OK)
>     {
>         *err = pchomp(PQerrorMessage(conn->streamConn));
>         return NULL;
>     }
> 
> 
> Am I missing something, or are we leaking the libpq connection in case of
> errors?
> 
> It doesn't matter really for walreceiver, since it will exit anyway, but we
> also use libpqwalreceiver for logical replication, where it might?
> 
> 
> Seems pretty clear that we should do a PQfinish() before returning NULL? I
> lean towards thinking that this isn't worth backpatching given the current
> uses of libpq, but I could easily be convinced otherwise.
> 

It's bit worse than I earlier thought: We use walrv_connect() during CREATE
SUBSCRIPTION. One can easily exhaust file descriptors right now.  So I think
we need to fix this.

I also noticed the following in libpqrcv_connect, added in 11da97024abb:

    if (logical)
    {
        PGresult   *res;

        res = libpqrcv_PQexec(conn->streamConn,
                              ALWAYS_SECURE_SEARCH_PATH_SQL);
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            PQclear(res);
            ereport(ERROR,
                    (errmsg("could not clear search path: %s",
                            pchomp(PQerrorMessage(conn->streamConn)))));
        }
        PQclear(res);
    }


Which doesn't seem quite right? The comment for the function says:

 * Returns NULL on error and fills the err with palloc'ed error message.

which this doesn't do. Of course we don't expect this to fail, but network
issues etc could still lead us to hit this case.  In this case we'll actually
have an open libpq connection around that we'll leak.


The attached patch fixes both issues.



I seems we don't have any tests for creating a subscription that fails during
connection establishment? That doesn't seem optimal - I guess there may have
been concern around portability of the error messages? I think we can control
for that in a tap test, by failing to connect due to a non-existant database,
then the error is under our control. Whereas e.g. an invalid hostname would
contain an error from gai_strerror().


Greetings,

Andres Freund

Attachment

Re: libpqrcv_connect() leaks PGconn

From
Noah Misch
Date:
On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> On 2023-01-20 17:12:37 -0800, Andres Freund wrote:
> > We have code like this in libpqrcv_connect():
> > 
> >     conn = palloc0(sizeof(WalReceiverConn));
> >     conn->streamConn = PQconnectStartParams(keys, vals,
> >                                              /* expand_dbname = */ true);
> >     if (PQstatus(conn->streamConn) == CONNECTION_BAD)
> >     {
> >         *err = pchomp(PQerrorMessage(conn->streamConn));
> >         return NULL;
> >     }
> > 
> >         [try to establish connection]
> > 
> >     if (PQstatus(conn->streamConn) != CONNECTION_OK)
> >     {
> >         *err = pchomp(PQerrorMessage(conn->streamConn));
> >         return NULL;
> >     }
> > 
> > 
> > Am I missing something, or are we leaking the libpq connection in case of
> > errors?
> > 
> > It doesn't matter really for walreceiver, since it will exit anyway, but we
> > also use libpqwalreceiver for logical replication, where it might?
> > 
> > 
> > Seems pretty clear that we should do a PQfinish() before returning NULL? I
> > lean towards thinking that this isn't worth backpatching given the current
> > uses of libpq, but I could easily be convinced otherwise.
> > 
> 
> It's bit worse than I earlier thought: We use walrv_connect() during CREATE
> SUBSCRIPTION. One can easily exhaust file descriptors right now.  So I think
> we need to fix this.
> 
> I also noticed the following in libpqrcv_connect, added in 11da97024abb:

> The attached patch fixes both issues.

Looks good.  I'm not worried about a superuser hosing their own session via
CREATE SUBSCRIPTION failures in a loop.  At the same time, this fix is plenty
safe to back-patch.

> I seems we don't have any tests for creating a subscription that fails during
> connection establishment? That doesn't seem optimal - I guess there may have
> been concern around portability of the error messages?

Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
next level of error swallowing.

> I think we can control
> for that in a tap test, by failing to connect due to a non-existant database,
> then the error is under our control. Whereas e.g. an invalid hostname would
> contain an error from gai_strerror().

That sounds fine.



Re: libpqrcv_connect() leaks PGconn

From
Andres Freund
Date:
Hi,

On 2023-01-21 08:16:42 -0800, Noah Misch wrote:
> On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> > On 2023-01-20 17:12:37 -0800, Andres Freund wrote:
> > > We have code like this in libpqrcv_connect():
> > It's bit worse than I earlier thought: We use walrv_connect() during CREATE
> > SUBSCRIPTION. One can easily exhaust file descriptors right now.  So I think
> > we need to fix this.
> > 
> > I also noticed the following in libpqrcv_connect, added in 11da97024abb:
> 
> > The attached patch fixes both issues.
> 
> Looks good.  I'm not worried about a superuser hosing their own session via
> CREATE SUBSCRIPTION failures in a loop.  At the same time, this fix is plenty
> safe to back-patch.

Yea, I'm not worried about it from a security perspective and more from a
usability perspective (but even there not terribly). File descriptors that
leaked, particularly when not reserved (AcquireExternalFD() etc), can lead to
weird problems down the line. And I think it's not that rare to need a few
attempts at getting the connection string, permissions, etc right.

Thanks for looking at the fix!


> > I seems we don't have any tests for creating a subscription that fails during
> > connection establishment? That doesn't seem optimal - I guess there may have
> > been concern around portability of the error messages?
> 
> Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
> sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
> next level of error swallowing.

That's a good trick I need to remember. And the errcode for an invalid
connection string luckily differs from the one for a not working one.


I think found an even easier way - port=-1 is rejected during PQconnectPoll()
and will never even open a socket. That'd make it reasonable for the test to
happen in subscription.sql, instead of a tap test, I think (faster, easier to
maintain). It may be that we'll one day move that error into the
PQconninfoParse() phase, but I don't think we need to worry about it now.

Any reason not to go for that?

If not, I'll add a test for an invalid conninfo and a non-working connection
string to subscription.sql.

Greetings,

Andres Freund



Re: libpqrcv_connect() leaks PGconn

From
Noah Misch
Date:
On Sat, Jan 21, 2023 at 12:04:53PM -0800, Andres Freund wrote:
> On 2023-01-21 08:16:42 -0800, Noah Misch wrote:
> > On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> > > I seems we don't have any tests for creating a subscription that fails during
> > > connection establishment? That doesn't seem optimal - I guess there may have
> > > been concern around portability of the error messages?
> > 
> > Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
> > sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
> > next level of error swallowing.
> 
> That's a good trick I need to remember. And the errcode for an invalid
> connection string luckily differs from the one for a not working one.
> 
> 
> I think found an even easier way - port=-1 is rejected during PQconnectPoll()
> and will never even open a socket. That'd make it reasonable for the test to
> happen in subscription.sql, instead of a tap test, I think (faster, easier to
> maintain). It may be that we'll one day move that error into the
> PQconninfoParse() phase, but I don't think we need to worry about it now.
> 
> Any reason not to go for that?

No, a port=-1 test in subscription.sql sounds ideal.

> If not, I'll add a test for an invalid conninfo and a non-working connection
> string to subscription.sql.



Re: libpqrcv_connect() leaks PGconn

From
Andres Freund
Date:
On 2023-01-21 23:14:08 -0800, Noah Misch wrote:
> On Sat, Jan 21, 2023 at 12:04:53PM -0800, Andres Freund wrote:
> > On 2023-01-21 08:16:42 -0800, Noah Misch wrote:
> > > On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> > > > I seems we don't have any tests for creating a subscription that fails during
> > > > connection establishment? That doesn't seem optimal - I guess there may have
> > > > been concern around portability of the error messages?
> > > 
> > > Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
> > > sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
> > > next level of error swallowing.
> > 
> > That's a good trick I need to remember. And the errcode for an invalid
> > connection string luckily differs from the one for a not working one.
> > 
> > 
> > I think found an even easier way - port=-1 is rejected during PQconnectPoll()
> > and will never even open a socket. That'd make it reasonable for the test to
> > happen in subscription.sql, instead of a tap test, I think (faster, easier to
> > maintain). It may be that we'll one day move that error into the
> > PQconninfoParse() phase, but I don't think we need to worry about it now.
> > 
> > Any reason not to go for that?
> 
> No, a port=-1 test in subscription.sql sounds ideal.

Cool. Thanks for the review - pushed that way.