Thread: BUG #14329: libpq doesn't send complete client certificate chain on first SSL connection

VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz
aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDMyOQpMb2dnZWQgYnk6ICAg
ICAgICAgIEthY3BlciBadWsKRW1haWwgYWRkcmVzczogICAgICBrenVrQGFr
YW1haS5jb20KUG9zdGdyZVNRTCB2ZXJzaW9uOiA5LjUuNApPcGVyYXRpbmcg
c3lzdGVtOiAgIEdOVS9MaW51eCA0LjEuMjAKRGVzY3JpcHRpb246ICAgICAg
ICAKCkkgaGF2ZSBQb3N0Z3JlU1FMIDkuNS40IGNvbXBpbGVkIHdpdGggT3Bl
blNTTCAxLjAuMmguIFNlcnZlciBpcyBjb25maWd1cmVkCnRvIHJlcXVpcmUg
U1NMIGNlcnRpZmljYXRlIGZvciBjbGllbnQuIFNlcnZlcidzIHJvb3QuY3J0
IG9ubHkgY29udGFpbnMKdG9wLWxldmVsIENBIGNlcnRpZmljYXRlIGFuZCBw
Z19oYmEuY29uZiBjb250YWluczoNCg0KaG9zdHNzbCAgIGRibmFtZSAgIGRi
dXNlciAgIDEyNy4wLjAuMS8zMiAgIGNlcnQgICBtYXA9dXNlcm1hcA0KDQpJ
J20gdHJ5aW5nIHRvIGNvbm5lY3QgdXNpbmcgcHNxbDoNCg0KcHNxbCAiZGJu
YW1lPWRibmFtZSB1c2VyPWRidXNlciBob3N0PTEyNy4wLjAuMSBzc2xtb2Rl
PXZlcmlmeS1jYQpzc2xrZXk9Y2VydHMvY2xpZW50LmtleSBzc2xyb290Y2Vy
dD1jZXJ0cy9yb290LmNydApzc2xjZXJ0PWNlcnRzL2NsaWVudC1pbnQtY2hh
aW4uY3J0Ig0KDQpjbGllbnQtaW50LWNoYWluLmNydCBjb250YWlucyBjbGll
bnQgY2VydGlmaWNhdGUgc2lnbmVkIGJ5IGludGVybWVkaWF0ZQpjZXJ0aWZp
Y2F0ZSBhbmQgaW50ZXJtZWRpYXRlIGNlcnRpZmljYXRlIHNpZ25lZCBieSBv
dXIgdG9wLWxldmVsIENBCmNlcnRpZmljYXRlICh0aGUgc2FtZSBhcyBpbiBz
ZXJ2ZXIncyByb290LmNydCkuDQoNClNlcnZlciByZWplY3RzIHRoaXMgY29u
bmVjdGlvbi4gcHNxbCByZXBvcnRzIGVycm9yICJTU0wgZXJyb3I6IHRsc3Yx
IGFsZXJ0CnVua25vd24gY2EiLiBTZXJ2ZXIgcmVwb3J0cyAiY291bGQgbm90
IGFjY2VwdCBTU0wgY29ubmVjdGlvbjogY2VydGlmaWNhdGUKdmVyaWZ5IGZh
aWxlZCIgaW4gbG9ncy4NCg0KdGNwZHVtcCBvZiB0cmFmZmljIGNvbmZpcm1z
IHRoYXQgcHNxbCBvbmx5IHByZXNlbnRzIGNsaWVudCBjZXJ0aWZpY2F0ZSwK
d2l0aG91dCBpbnRlcm1lZGlhdGUuDQoNCldoZW4gdXNpbmcgcHN5Y29wZzIg
aW4gUHl0aG9uIChwc3ljb3BnMiB1c2VzIGxpYnBxIGFuZCBJJ20gdGVzdGlu
ZyB3aXRoIHRoZQpzYW1lIHNldHRpbmdzIGFuZCBjZXJ0aWZpY2F0ZXMgYXMg
d2l0aCBwc3FsKSwgdGhlIGZpcnN0IGNvbm5lY3Rpb24gZmFpbHMKd2l0aCB0
aGUgc2FtZSBlcnJvciwgYnV0IGl0IHN1Y2NlZWRzIGlmIGl0J3MgcmV0cmll
ZC4gdGNwZHVtcCBvZiB0cmFmZmljCmNvbmZpcm1zLCB0aGF0IG9uIHRoZSBz
ZWNvbmQgdHJ5IGJvdGggY2xpZW50IGFuZCBpbnRlcm1lZGlhdGUgY2VydGlm
aWNhdGVzCmFyZSBzZW50Lg0KDQpJdCB3b3JrcyBjb3JyZWN0bHkgKHBzcWwg
Y29ubmVjdHMgc3VjY2Vzc2Z1bGx5IGFuZCBwc3ljb3BnMiBjb25uZWN0cyBv
bgpmaXJzdCB0cnkpIHdoZW4gUG9zdGdyZVNRTCBpcyBjb21waWxlZCB3aXRo
IE9wZW5TU0wgMS4wLjEuIEl0IGFsc28gd29ya3MKY29ycmVjdGx5IGlmIHNl
cnZlciBpcyBjb21waWxlZCB3aXRoIE9wZW5TU0wgMS4wLjIsIGJ1dCBjbGll
bnQgd2l0aCBPcGVuU1NMCjEuMC4xLg0KDQpNeSBlZHVjYXRlZCBndWVzcyBp
cyB0aGF0IGluIGZlLXNlY3VyZS1vcGVuc3NsLmMgaW4gaW5pdGlhbGl6ZV9T
U0wgZnVuY3Rpb24Kd2hvbGUgY2VydGlmaWNhdGUgY2hhaW4gaXMgbG9hZGVk
IGludG8gU1NMX2NvbnRleHQsIGJ1dCBvbmx5IGNsaWVudApjZXJ0aWZpY2F0
ZSBpcyBsb2FkZWQgdG8gU1NMIG9iamVjdC4gU1NMIG9iamVjdCBpcyBjcmVh
dGVkIGJlZm9yZSBsb2FkaW5nCmNlcnRpZmljYXRlIGNoYWluIGludG8gU1NM
X2NvbnRleHQsIHNvIGl0IGRvZXNuJ3Qgc2VlIHRoaXMgdXBkYXRlLiBPbmx5
IHRoZQpuZXh0IGNvbm5lY3Rpb24sIHdpdGggbmV3IFNTTCBvYmplY3QsIHBp
Y2tzIHVwIHRoZSBjZXJ0aWZpY2F0ZSBjaGFpbiBmcm9tClNTTF9jb250ZXh0
LiBJdCBkb2Vzbid0IGV4cGxhaW4gd2h5IGl0IHdvcmtzIHdpdGggT3BlblNT
TCAxLjAuMSB0aG91Z2gsIHNvCnRoYXQgbWF5IGJlIGEgZmFsc2UgdHJhaWwu
Cgo=
On 09/20/2016 01:10 PM, kzuk@akamai.com wrote:
> My educated guess is that in fe-secure-openssl.c in initialize_SSL function
> whole certificate chain is loaded into SSL_context, but only client
> certificate is loaded to SSL object. SSL object is created before loading
> certificate chain into SSL_context, so it doesn't see this update. Only the
> next connection, with new SSL object, picks up the certificate chain from
> SSL_context. It doesn't explain why it works with OpenSSL 1.0.1 though, so
> that may be a false trail.

Yeah, that's probably what's happening. The OpenSSL man page for
SSL_CTX_use_certificate() says:

> The SSL_CTX_* class of functions loads the certificates and keys into
> the SSL_CTX object ctx. The information is passed to SSL objects ssl
> created from ctx with SSL_new by copying, so that changes applied to
> ctx do not propagate to already existing SSL objects.

It says the same in both 1.0.1 and 1.0.2 versions, though. I guess we
have been relying on a bug that was fixed in 1.0.2, in that the
intermediate CA certs were actually propagated from the context to the
existing SSL object, contrary to what the man page says. I don't
immediately see any relevant change in the OpenSSL commit logs between
1.0.1 and 1.0.2, though.

I think we need to rearrange the code so that we call
SSL_CTX_use_certificate() first, and SSL_new() after that. I wonder if
that's going to break 1.0.1, though.

Setting up a test environment with the required certificates and CAs is
a bit tedious. Would you be interested in adding a test case for this in
the SSL test suite, in src/test/ssl, and posting a patch for that? I can
take a stab at fixing this, but having a test case ready would give me a
head start.

- Heikki
Heikki -

Would also be happy to set up a test case for this.... Impacts us directly.

Need a couple of days to do so, though. Please let me know your timeline.

Lou Picciano

----- Original Message -----

From: "Heikki Linnakangas" <hlinnaka@iki.fi>
To: kzuk@akamai.com, pgsql-bugs@postgresql.org
Sent: Wednesday, September 21, 2016 4:06:33 AM
Subject: Re: [BUGS] BUG #14329: libpq doesn't send complete client certificate chain on first SSL connection

On 09/20/2016 01:10 PM, kzuk@akamai.com wrote:
> My educated guess is that in fe-secure-openssl.c in initialize_SSL function
> whole certificate chain is loaded into SSL_context, but only client
> certificate is loaded to SSL object. SSL object is created before loading
> certificate chain into SSL_context, so it doesn't see this update. Only the
> next connection, with new SSL object, picks up the certificate chain from
> SSL_context. It doesn't explain why it works with OpenSSL 1.0.1 though, so
> that may be a false trail.

Yeah, that's probably what's happening. The OpenSSL man page for
SSL_CTX_use_certificate() says:

> The SSL_CTX_* class of functions loads the certificates and keys into
> the SSL_CTX object ctx. The information is passed to SSL objects ssl
> created from ctx with SSL_new by copying, so that changes applied to
> ctx do not propagate to already existing SSL objects.

It says the same in both 1.0.1 and 1.0.2 versions, though. I guess we
have been relying on a bug that was fixed in 1.0.2, in that the
intermediate CA certs were actually propagated from the context to the
existing SSL object, contrary to what the man page says. I don't
immediately see any relevant change in the OpenSSL commit logs between
1.0.1 and 1.0.2, though.

I think we need to rearrange the code so that we call
SSL_CTX_use_certificate() first, and SSL_new() after that. I wonder if
that's going to break 1.0.1, though.

Setting up a test environment with the required certificates and CAs is
a bit tedious. Would you be interested in adding a test case for this in
the SSL test suite, in src/test/ssl, and posting a patch for that? I can
take a stab at fixing this, but having a test case ready would give me a
head start.

- Heikki



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 09/21/2016 02:16 PM, Lou Picciano wrote:
> Heikki -
>
> Would also be happy to set up a test case for this.... Impacts us directly.

Great, thanks!

> Need a couple of days to do so, though. Please let me know your timeline.

Sure, I'm not in a hurry :-).

- Heikki
T24gOS8yMS8xNiwgMToxNiBQTSwgTG91IFBpY2NpYW5vIHdyb3RlOg0KPiBXb3VsZCBhbHNvIGJl
IGhhcHB5IHRvIHNldCB1cCBhIHRlc3QgY2FzZSBmb3IgdGhpcy4uLi4gSW1wYWN0cyB1cyBkaXJl
Y3RseS4NCg0KSeKAmWxsIHdvcmsgb24gYSB0ZXN0IGNhc2UgYXQgdGhlIGJlZ2lubmluZyBvZiB0
aGUgbmV4dCB3ZWVrIGlmIExvdSBkb2VzbuKAmXQgZmluaXNoIGl0IGVhcmxpZXIuDQoNClRoYW5r
cywNCkthY3Blcg0KDQo=
On 9/21/16, 10:06 AM, "Heikki Linnakangas" <hlinnaka@gmail.com on behalf of hlinnaka@iki.fi> wrote:
> Setting up a test environment with the required certificates and CAs is 
> a bit tedious. Would you be interested in adding a test case for this in 
> the SSL test suite, in src/test/ssl, and posting a patch for that? I can 
> take a stab at fixing this, but having a test case ready would give me a 
> head start.
    
I’ve attached the patch that adds a test case. Please let me know if I should send it to pgsql-hackers instead.
Rememberto run make sslfiles, as I didn’t include generated certificate in patch.
 

Thanks
Kacper


Attachment
On 10/04/2016 02:08 PM, Zuk, Kacper wrote:
> On 9/21/16, 10:06 AM, "Heikki Linnakangas" <hlinnaka@gmail.com on behalf of hlinnaka@iki.fi> wrote:
>> Setting up a test environment with the required certificates and CAs is
>> a bit tedious. Would you be interested in adding a test case for this in
>> the SSL test suite, in src/test/ssl, and posting a patch for that? I can
>> take a stab at fixing this, but having a test case ready would give me a
>> head start.
>
> I’ve attached the patch that adds a test case. Please let me know if I should send it to pgsql-hackers instead.
Rememberto run make sslfiles, as I didn’t include generated certificate in patch. 

Thanks!

Now that I look closer into this, I can see more trouble:

1. Since the SSL context is shared between all connections and all
threads, there's a race condition if two threads try to open a
connection at the same time. The SSL_CTX_use_certificate_chain_file()
call, and SSL_CTX_load_verify_locations() later in the function, are
protected by the global mutex, but we only hold the mutex for the
duration of those calls. It doesn't protect from the fact that while one
thread is opening a connection with one set of options, another thread
opens a connection with different options. They might get mixed up.
That's a problem if the connections use different sslcert or sslrootcert
settings.

2. Even with a single thread, using different sslrootcert options in
different PQconnectdb() calls fails outright, with error:

SSL error: block type is not 01

I got that with a test program that simply calls:

PQconnectdb(<connection string 1>);
PQfinish();
PQconnectdb(<connection string 2>);
PQfinish();

when the two connection strings point to different servers, with
different certificates signed by different root CA, and the connection
strings have different (correct) sslrootcert options for the servers.
Either PQconnectdb() call works separately, but put them in the same
program, you get that error.

I don't understand why that second error happens. I don't think we're
doing anything illegal by reusing the same SSL_CTX object for two
different connections.


I'm starting to feel that using the same SSL_CTX object for multiple
connections is just too fragile. Perhaps we could share one SSL_CTX
object for all the connections with no sslcert and no sslrootcert, but
I'm not sure if even that is worth it.

In quick testing, calling SSL_CTX_new() for each connection adds about
3% of overhead to establishing a new connection, with the default
OpenSSL settings (seems to use ECDHE-RSA-AES256-GCM-SHA384 cipher here).
I also tested memory usage with a program that opens 10000 connections,
and it used about 15% more memory, when SSL_CTX_new() is called for each
connection. I think that's acceptable. Barring objections, I'm going to
write a patch to use a separate SSL context for every connection.

- Heikki
On Tue, 4 Oct 2016 21:55:52 +0300
Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> 1. Since the SSL context is shared between all connections and all
> threads, there's a race condition if two threads try to open a
> connection at the same time. The SSL_CTX_use_certificate_chain_file()
> call, and SSL_CTX_load_verify_locations() later in the function, are
> protected by the global mutex, but we only hold the mutex for the
> duration of those calls. It doesn't protect from the fact that while one
> thread is opening a connection with one set of options, another thread
> opens a connection with different options. They might get mixed up.
> That's a problem if the connections use different sslcert or sslrootcert
> settings.
>
> 2. Even with a single thread, using different sslrootcert options in
> different PQconnectdb() calls fails outright, with error:
>
> SSL error: block type is not 01
>
> I got that with a test program that simply calls:
>
> PQconnectdb(<connection string 1>);
> PQfinish();
> PQconnectdb(<connection string 2>);
> PQfinish();

> I'm starting to feel that using the same SSL_CTX object for multiple
> connections is just too fragile. Perhaps we could share one SSL_CTX
> object for all the connections with no sslcert and no sslrootcert, but
> I'm not sure if even that is worth it.
>
> In quick testing, calling SSL_CTX_new() for each connection adds about
> 3% of overhead to establishing a new connection, with the default
> OpenSSL settings (seems to use ECDHE-RSA-AES256-GCM-SHA384 cipher here).
> I also tested memory usage with a program that opens 10000 connections,
> and it used about 15% more memory, when SSL_CTX_new() is called for each
> connection. I think that's acceptable. Barring objections, I'm going to
> write a patch to use a separate SSL context for every connection.

What about keeping a table of connection strings and SSL contexts so that all
connections using the same connection string could share the SSL context? It
seems likely that most applications reuse the same connection string and
could avoid the penalty.

-dg

--
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
On 10/04/2016 10:53 PM, David Gould wrote:
> On Tue, 4 Oct 2016 21:55:52 +0300
> Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
>> I'm starting to feel that using the same SSL_CTX object for multiple
>> connections is just too fragile. Perhaps we could share one SSL_CTX
>> object for all the connections with no sslcert and no sslrootcert, but
>> I'm not sure if even that is worth it.
>>
>> In quick testing, calling SSL_CTX_new() for each connection adds about
>> 3% of overhead to establishing a new connection, with the default
>> OpenSSL settings (seems to use ECDHE-RSA-AES256-GCM-SHA384 cipher here).
>> I also tested memory usage with a program that opens 10000 connections,
>> and it used about 15% more memory, when SSL_CTX_new() is called for each
>> connection. I think that's acceptable. Barring objections, I'm going to
>> write a patch to use a separate SSL context for every connection.
>
> What about keeping a table of connection strings and SSL contexts so that all
> connections using the same connection string could share the SSL context? It
> seems likely that most applications reuse the same connection string and
> could avoid the penalty.

Yeah, could do that. Then again, I don't think it's worth the trouble.

- Heikki
On 10/04/2016 09:55 PM, Heikki Linnakangas wrote:
> I'm starting to feel that using the same SSL_CTX object for multiple
> connections is just too fragile. Perhaps we could share one SSL_CTX
> object for all the connections with no sslcert and no sslrootcert, but
> I'm not sure if even that is worth it.
>
> In quick testing, calling SSL_CTX_new() for each connection adds about
> 3% of overhead to establishing a new connection, with the default
> OpenSSL settings (seems to use ECDHE-RSA-AES256-GCM-SHA384 cipher here).
> I also tested memory usage with a program that opens 10000 connections,
> and it used about 15% more memory, when SSL_CTX_new() is called for each
> connection. I think that's acceptable. Barring objections, I'm going to
> write a patch to use a separate SSL context for every connection.

I came up with the attached patch for this. As threatened, it uses a
separate SSL context for each connection. That simplifies the code
somewhat, and fixes the bugs. Kacper's test case is included in this.
(This is for git master, stable branches will need small tweaking to
make the patch apply.)

Did some more testing with "pgbench -C". The overhead on establishing a
connection is a bit higher than I saw initially, about 6%, when
sslmode=verify-ca is used. Might be more with more complex certificate
chains. I think that's still acceptable. If you have an application that
establishes SSL connections so frequently that that matters, you should
reconsider your design.

- Heikki


Attachment
On 10/5/2016 2:10 AM, Heikki Linnakangas wrote:
> If you have an application that establishes SSL connections so
> frequently that that matters, you should reconsider your design.

Hah!


(totally agree).



--
john r pierce, recycling bits in santa cruz
On 10/05/2016 12:10 PM, Heikki Linnakangas wrote:
> I came up with the attached patch for this. As threatened, it uses a
> separate SSL context for each connection. That simplifies the code
> somewhat, and fixes the bugs. Kacper's test case is included in this.
> (This is for git master, stable branches will need small tweaking to
> make the patch apply.)

I hear no objections, so committed and back-patched that.

Thanks for the report and analysis, Kacper!

- Heikki
T24gMTAvNy8xNiwgMTE6MjYgQU0sICJIZWlra2kgTGlubmFrYW5nYXMiIDxobGlubmFrYUBnbWFp
bC5jb20gb24gYmVoYWxmIG9mIGhsaW5uYWthQGlraS5maT4gd3JvdGU6DQo+IE9uIDEwLzA1LzIw
MTYgMTI6MTAgUE0sIEhlaWtraSBMaW5uYWthbmdhcyB3cm90ZToNCj4gPiBJIGNhbWUgdXAgd2l0
aCB0aGUgYXR0YWNoZWQgcGF0Y2ggZm9yIHRoaXMuDQo+IA0KPiBJIGhlYXIgbm8gb2JqZWN0aW9u
cywgc28gY29tbWl0dGVkIGFuZCBiYWNrLXBhdGNoZWQgdGhhdC4NCj4gDQo+IFRoYW5rcyBmb3Ig
dGhlIHJlcG9ydCBhbmQgYW5hbHlzaXMsIEthY3BlciENCg0KVGhhbmsgeW91IGZvciBhIHF1aWNr
IGZpeCA6KQ0KDQpLYWNwZXINCg0K