Thread: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Zeus Kronion
Date:
Parallel workers were failing to connect to the database when running pg_dump with a connection string. The first of the following two commands runs without errors, while the second one fails:
pg_dump "postgres://my-user:my-password@my.hostname.com:5432/my-db" -Fd -f my-dump
pg_dump "postgres://my-user:my-password@my.hostname.com:5432/my-db" -Fd --jobs=9 -f my-dump

The error message:
pg_dump: [parallel archiver] connection to database "my-db" failed: fe_sendauth: no password supplied

The password is not being stored correctly in the PGconn object when connecting with a connection string.

This is my first time contributing to Postgres, so I tried to stick to the instructions from the "Submitting a Patch" wiki. This submission is for discussion because I haven't figured out how to write regression tests for this patch yet (and I would appreciate guidance).

Target branch: master
Compiles and tests successfully: true
Platform-specific items: none
Regression tests: still needed
Documentation: N/A
Performance implications: none
Attachment

Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Zeus Kronion
Date:
I'm still unclear on how to write regression tests for a connectivity bug. Are they necessary in this case?

On Sun, Oct 25, 2015 at 5:55 PM, Zeus Kronion <zkronion@gmail.com> wrote:
Parallel workers were failing to connect to the database when running pg_dump with a connection string. The first of the following two commands runs without errors, while the second one fails:
pg_dump "postgres://my-user:my-password@my.hostname.com:5432/my-db" -Fd -f my-dump
pg_dump "postgres://my-user:my-password@my.hostname.com:5432/my-db" -Fd --jobs=9 -f my-dump

The error message:
pg_dump: [parallel archiver] connection to database "my-db" failed: fe_sendauth: no password supplied

The password is not being stored correctly in the PGconn object when connecting with a connection string.

This is my first time contributing to Postgres, so I tried to stick to the instructions from the "Submitting a Patch" wiki. This submission is for discussion because I haven't figured out how to write regression tests for this patch yet (and I would appreciate guidance).

Target branch: master
Compiles and tests successfully: true
Platform-specific items: none
Regression tests: still needed
Documentation: N/A
Performance implications: none

Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Euler Taveira
Date:
On 30-10-2015 10:04, Zeus Kronion wrote:
> I'm still unclear on how to write regression tests for a connectivity
> bug. Are they necessary in this case?
>
There aren't regression tests for pg_dump. However, your instructions 
are sufficient to demonstrate the bug.

You could continue the thread in -bugs because the discussion started 
there. Sometimes people track -bugs ML to make sure that some bugs 
aren't forgotten. Add your patch to the next CF [1].


[1] https://commitfest.postgresql.org/7/


--    Euler Taveira                   Timbira - http://www.timbira.com.br/   PostgreSQL: Consultoria, Desenvolvimento,
Suporte24x7 e Treinamento
 



Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Michael Paquier
Date:
On Fri, Oct 30, 2015 at 2:42 PM, Euler Taveira <euler@timbira.com.br> wrote:
> On 30-10-2015 10:04, Zeus Kronion wrote:
>>
>> I'm still unclear on how to write regression tests for a connectivity
>> bug. Are they necessary in this case?
>>
> There aren't regression tests for pg_dump. However, your instructions are
> sufficient to demonstrate the bug.

Well, we could have something in pg_dump/t/, though the instance set
by standard_initdb would require some update in pg_hba.conf to switch
to md5 before running the dump.

> You could continue the thread in -bugs because the discussion started there.
> Sometimes people track -bugs ML to make sure that some bugs aren't
> forgotten. Add your patch to the next CF [1].

Yep. Things get easily lost.
-- 
Michael



Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Marko Tiikkaja
Date:
On 10/25/15 10:55 PM, Zeus Kronion wrote:
> Parallel workers were failing to connect to the database when running
> pg_dump with a connection string. The first of the following two commands
> runs without errors, while the second one fails:
> pg_dump "postgres://my-user:my-password@my.hostname.com:5432/my-db" -Fd -f
> my-dump
> pg_dump "postgres://my-user:my-password@my.hostname.com:5432/my-db" -Fd
> --jobs=9 -f my-dump
>
> The error message:
> pg_dump: [parallel archiver] connection to database "my-db" failed:
> fe_sendauth: no password supplied
>
> The password is not being stored correctly in the PGconn object when
> connecting with a connection string.

Yeah, the current code is definitely broken for this case.  However, I 
don't feel like this patch is quite there yet, either.  _connectDB has 
similar logic in it which might be hit in case e.g. a a user's HBA is 
changed from a non-password-requiring method to a password-requiring one 
after the one or more connections has been initiated.  That one needs 
changing as well.

However, I don't quite like the way the password cache is kept up to 
date in the old *or* the new code.  It seems to me that it should 
instead look like:
   if (PQconnectionUsedPassword(AH->connection))       AH->savedPassword = PQpass(AH->connection);

What do you think?


.m



Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Zeus Kronion
Date:
<p dir="ltr"><br /> On Nov 1, 2015 5:04 PM, "Marko Tiikkaja" <<a href="mailto:marko@joh.to">marko@joh.to</a>>
wrote:<br/> ><br /> > On 10/25/15 10:55 PM, Zeus Kronion wrote:<br /> >><br /> >> Parallel workers
werefailing to connect to the database when running<br /> >> pg_dump with a connection string. The first of the
followingtwo commands<br /> >> runs without errors, while the second one fails:<br /> >> pg_dump
"postgres://<a
href="http://my-user:my-password@my.hostname.com:5432/my-db">my-user:my-password@my.hostname.com:5432/my-db</a>"-Fd
-f<br/> >> my-dump<br /> >> pg_dump "postgres://<a
href="http://my-user:my-password@my.hostname.com:5432/my-db">my-user:my-password@my.hostname.com:5432/my-db</a>"-Fd<br
/>>> --jobs=9 -f my-dump<br /> >><br /> >> The error message:<br /> >> pg_dump: [parallel
archiver]connection to database "my-db" failed:<br /> >> fe_sendauth: no password supplied<br /> >><br />
>>The password is not being stored correctly in the PGconn object when<br /> >> connecting with a
connectionstring.<br /> ><br /> ><br /> > Yeah, the current code is definitely broken for this case.  However,
Idon't feel like this patch is quite there yet, either.  _connectDB has similar logic in it which might be hit in case
e.g.a a user's HBA is changed from a non-password-requiring method to a password-requiring one after the one or more
connectionshas been initiated.  That one needs changing as well.<p dir="ltr">I wasn't aware of that case. Should be an
easyfix to make this weekend.<p dir="ltr">> However, I don't quite like the way the password cache is kept up to
datein the old *or* the new code.  It seems to me that it should instead look like:<br /> ><br /> >    if
(PQconnectionUsedPassword(AH->connection))<br/> >        AH->savedPassword = PQpass(AH->connection);<br />
><br/> > What do you think?<br /><br /> I don't understand why this logic is preferable. Is your concern that
AH->savedPasswordmay contain a password even when none is needed? Or is the change simply meant to give the reader a
bettersense of what is actually going on?<p dir="ltr">-CS<br /> 

Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Marko Tiikkaja
Date:
On 11/5/15 4:11 PM, Zeus Kronion wrote:
> On Nov 1, 2015 5:04 PM, "Marko Tiikkaja" <marko@joh.to> wrote:
>> However, I don't quite like the way the password cache is kept up to date
> in the old *or* the new code.  It seems to me that it should instead look
> like:
>>
>>     if (PQconnectionUsedPassword(AH->connection))
>>         AH->savedPassword = PQpass(AH->connection);
>>
>> What do you think?
>
> I don't understand why this logic is preferable. Is your concern that
> AH->savedPassword may contain a password even when none is needed?

The opposite, sort of.  If the first connection uses a password, the 
second one doesn't, and the third one does again, you need to ask for a 
password again because you emptied the cache on the second attempt since 
it didn't use a password.  Granted, this situation is quite unlikely to 
occur in practice, but I find the "correct" code *also* more readable. 
To me it reads like "if the what we're caching was applied during the 
connection attempt, update the cache; otherwise keep the previous value 
in case it's useful in the future".


.m



Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Michael Paquier
Date:
On Fri, Nov 6, 2015 at 12:23 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On 11/5/15 4:11 PM, Zeus Kronion wrote:
>>
>> On Nov 1, 2015 5:04 PM, "Marko Tiikkaja" <marko@joh.to> wrote:
>>>
>>> However, I don't quite like the way the password cache is kept up to date
>>
>> in the old *or* the new code.  It seems to me that it should instead look
>> like:
>>>
>>>
>>>     if (PQconnectionUsedPassword(AH->connection))
>>>         AH->savedPassword = PQpass(AH->connection);
>>>
>>> What do you think?
>>
>>
>> I don't understand why this logic is preferable. Is your concern that
>> AH->savedPassword may contain a password even when none is needed?
>
>
> The opposite, sort of.  If the first connection uses a password, the second
> one doesn't, and the third one does again, you need to ask for a password
> again because you emptied the cache on the second attempt since it didn't
> use a password.  Granted, this situation is quite unlikely to occur in
> practice, but I find the "correct" code *also* more readable. To me it reads
> like "if the what we're caching was applied during the connection attempt,
> update the cache; otherwise keep the previous value in case it's useful in
> the future".

OK, I think that we had better address this bug for this CF. I am
attaching an updated patch following Marko's suggestion, and marking
this patch as ready for committer. I have noticed that the fix was
actually not complete: ConnectDatabase needs a similar fix, this is a
code path when a connection string like "dbname=db user=foo" is used.
And this was failing as well when the password is passed directly in
the connection string for multiple jobs.
--
Michael

Attachment

Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> OK, I think that we had better address this bug for this CF. I am
> attaching an updated patch following Marko's suggestion, and marking
> this patch as ready for committer. I have noticed that the fix was
> actually not complete: ConnectDatabase needs a similar fix, this is a
> code path when a connection string like "dbname=db user=foo" is used.
> And this was failing as well when the password is passed directly in
> the connection string for multiple jobs.

As written, this would leak password strings, and it even seems possible
that it would leave savedPassword pointing at dangling memory, since the
free(password) inside the loop might free what savedPassword is pointing
at, and then in principle we might fail to overwrite savedPassword
afterwards.  This probably can't happen in practice because it'd require
successive connection attempts to come to different conclusions about
PQconnectionNeedsPassword/PQconnectionUsedPassword.  But it seems pretty
fragile in the face of future changes to this code.  I modified it further
so that "password" and "savedPassword" never share storage, and pushed it.

A larger concern is that I suspect we need to abandon this whole approach
of passing a different representation of the connection parameters than
we were given the first time through.  If dbname is a connection URI
(or keyword=value connection string), it might well contain more
information than just host + port + username + password.  We're losing
any such details during the workers' reconnections.  But that looks like
it would be a rather wide-ranging rewrite, so I just committed what we
had for now; at least we fixed the reported bug symptom.
        regards, tom lane



Re: WIP: Fix parallel workers connection bug in pg_dump (Bug #13727)

From
Michael Paquier
Date:
On Thu, Dec 24, 2015 at 4:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As written, this would leak password strings, and it even seems possible
> that it would leave savedPassword pointing at dangling memory, since the
> free(password) inside the loop might free what savedPassword is pointing
> at, and then in principle we might fail to overwrite savedPassword
> afterwards.  This probably can't happen in practice because it'd require
> successive connection attempts to come to different conclusions about
> PQconnectionNeedsPassword/PQconnectionUsedPassword.

Yes, that's what I was assuming.

> But it seems pretty
> fragile in the face of future changes to this code.  I modified it further
> so that "password" and "savedPassword" never share storage, and pushed it.

OK, thanks for fixing the issue!

> A larger concern is that I suspect we need to abandon this whole approach
> of passing a different representation of the connection parameters than
> we were given the first time through.  If dbname is a connection URI
> (or keyword=value connection string), it might well contain more
> information than just host + port + username + password.  We're losing
> any such details during the workers' reconnections.  But that looks like
> it would be a rather wide-ranging rewrite, so I just committed what we
> had for now; at least we fixed the reported bug symptom.

vacuumdb suffers the same symptoms I think...
-- 
Michael