Thread: pgcrypto: PGP armor headers
Hi, Currently there's no way to generate or extract armor headers from the PGP armored format in pgcrypto. I've written a patch to add the support. For example: local:marko=#* select armor('zooka', array['Version', 'Comment'], array['Created by pgcrypto', 'PostgreSQL, the database']); armor ----------------------------------- -----BEGIN PGP MESSAGE----- + Version: Created by pgcrypto + Comment: PostgreSQL, the database+ + em9va2E= + =D5cR + -----END PGP MESSAGE----- + local:marko=#* select pgp_armor_header(armor('zooka', array['Version', 'Comment'], array['Created by pgcrypto', 'PostgreSQL, the database']), 'Comment'); pgp_armor_header -------------------------- PostgreSQL, the database (1 row) .marko
Attachment
Hi, On 8/8/14 3:18 PM, I wrote: > Currently there's no way to generate or extract armor headers from the > PGP armored format in pgcrypto. I've written a patch to add the > support. Latest version of the patch here, having fixed some small coding issues. .marko
Attachment
On Fri, Aug 15, 2014 at 1:55 AM, Marko Tiikkaja <marko@joh.to> wrote:
I didn't do a detailed code review, and I am not a security expert, so I will leave it to the signed-up reviewer to change the status once he takes a look.
Hi,Latest version of the patch here, having fixed some small coding issues.
On 8/8/14 3:18 PM, I wrote:Currently there's no way to generate or extract armor headers from the
PGP armored format in pgcrypto. I've written a patch to add the
support.
I've built this and tested the installation of the extension, the upgrade from earlier versions, and the basic functions, with and without --enable-cassert
I did occasionally get some failures with 'make check -C contrib/pgcrypto', but I can't reproduce it. I might have accidentally had some mixture of binaries some with cassert and some without.
No other problems to report.
One quibble in the documentation, "an error is returned". Errors get raised, not returned.
This patch will conflict with the pgp signature patch due to the pgcrypto--1.2.sql and kin.
Cheers,
Jeff
Marko, et al, This is a review of the pgcrypto PGP Armor Headers patch: http://www.postgresql.org/message-id/53EDCAE8.20004@joh.to Contents & Purpose ================== This patch add functions to create and extract OpenPGP Armor Headers. from OpenPGP messages. Included in the patch are updated regression test cases and documentation. Initial Run =========== The patch applies cleanly to HEAD. The 144 regression tests all pass successfully against the new patch. Conclusion ========== Since I'm using these functions in the BankAPI project, https://github.com/trustly/bankapi, I have tested them by actually using them in production, in addition to the provided regression tests, which is a good sign they are working not just in theory. +1 for committer review. On Fri, Aug 15, 2014 at 10:55 AM, Marko Tiikkaja <marko@joh.to> wrote: > Hi, > > > On 8/8/14 3:18 PM, I wrote: >> >> Currently there's no way to generate or extract armor headers from the >> PGP armored format in pgcrypto. I've written a patch to add the >> support. > > > Latest version of the patch here, having fixed some small coding issues. > > > .marko > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On 08/15/2014 11:55 AM, Marko Tiikkaja wrote: > Hi, > > On 8/8/14 3:18 PM, I wrote: >> Currently there's no way to generate or extract armor headers from the >> PGP armored format in pgcrypto. I've written a patch to add the >> support. > > Latest version of the patch here, having fixed some small coding issues. This coding style gives me the willies: > guess_len = pgp_armor_enc_len(data_len, num_headers, keys, values); > res = palloc(VARHDRSZ + guess_len); > > res_len = pgp_armor_encode((uint8 *) VARDATA(data), data_len, > (uint8 *) VARDATA(res), > num_headers, keys, values); > if (res_len > guess_len) > ereport(ERROR, > (errcode(ERRCODE_EXTERNAL_ROUTINE_INVOCATION_EXCEPTION), > errmsg("Overflow - encode estimate too small"))); That was OK before this patch, as the length calculation was simple enough to verify (although if I were writing it from scratch, I would've written it differently). But with this patch, it gets a lot more complicated, and I can't easily convince myself that it's correct. pgp_armor_enc_len might be vulnerable to integer overflow. Consider 1GB worth of keys, 1GB worth of values, and 1GB worth of data. I'm not sure if you can quite make it overflow a 32-bit unsigned integer, but at least you can get nervously close, e.g if you use use max-sized key/value arrays, with a single byte in each key and value. Oh, and if you use a single-byte server encoding and characters that get expanded to multiple bytes in UTF-8, you can go higher. So I think this (and the corresponding dearmor code too) should be refactored to use a StringInfo that gets enlarged as needed, instead of hoping to guess the size correctly beforehand. To ease review, might make sense to do that as a separate patch over current sources, and the main patch on top of that. BTW, I'm surprised that there is no function to get all the armor headers. You can only probe for a particular one with pgp_armor_headder, but there is no way to list them all, if you don't know what you're looking for. - Heikki
On 9/9/14 10:54 AM, Heikki Linnakangas wrote: > So I think this (and the corresponding dearmor code too) should be > refactored to use a StringInfo that gets enlarged as needed, instead of > hoping to guess the size correctly beforehand. To ease review, might > make sense to do that as a separate patch over current sources, and the > main patch on top of that. Yeah, I thought the same thing while writing that code. Thanks, I'll do it that way. > BTW, I'm surprised that there is no function to get all the armor > headers. You can only probe for a particular one with pgp_armor_headder, > but there is no way to list them all, if you don't know what you're > looking for. Right. The assumption was that due to the nature of how the headers are used, the user would always be looking for a certain, predetermined set of values. But I don't oppose to adding a pgp_armor_header_keys() function to get the set of all keys, for example. .marko
On 9/9/14 11:01 AM, I wrote: > On 9/9/14 10:54 AM, Heikki Linnakangas wrote: >> So I think this (and the corresponding dearmor code too) should be >> refactored to use a StringInfo that gets enlarged as needed, instead of >> hoping to guess the size correctly beforehand. To ease review, might >> make sense to do that as a separate patch over current sources, and the >> main patch on top of that. > > Yeah, I thought the same thing while writing that code. Thanks, I'll do > it that way. Attached is what I have right now. I started working on the decoding part, but it has this piece of code: /* decode crc */ if (b64_decode(p + 1, 4, buf) != 3) goto out; which makes the approach a bit uglier. If I did this the same way, I would have to create and destroy a StringInfo just for this operation, which seems ugly. So I wonder if I shouldn't try and instead keep the code closer to what it is in HEAD right now; I could call enlargeStringInfo() first, then hand out a pointer to b64_encode (or b64_decode()) and finally increment StringInfoData.len by how much was actually written. That would keep the code changes a lot smaller, too. Is either of these approaches anywhere near what you had in mind? I'm also not sure why we need to keep a copy of the base64 encoding/decoding logic instead of exporting it in utils/adt/encode.c. .marko
Attachment
On 09/10/2014 02:26 PM, Marko Tiikkaja wrote: > On 9/9/14 11:01 AM, I wrote: >> On 9/9/14 10:54 AM, Heikki Linnakangas wrote: >>> So I think this (and the corresponding dearmor code too) should be >>> refactored to use a StringInfo that gets enlarged as needed, instead of >>> hoping to guess the size correctly beforehand. To ease review, might >>> make sense to do that as a separate patch over current sources, and the >>> main patch on top of that. >> >> Yeah, I thought the same thing while writing that code. Thanks, I'll do >> it that way. > > Attached is what I have right now. I started working on the decoding > part, but it has this piece of code: > > /* decode crc */ > if (b64_decode(p + 1, 4, buf) != 3) > goto out; > > which makes the approach a bit uglier. If I did this the same way, I > would have to create and destroy a StringInfo just for this operation, > which seems ugly. So I wonder if I shouldn't try and instead keep the > code closer to what it is in HEAD right now; I could call > enlargeStringInfo() first, then hand out a pointer to b64_encode (or > b64_decode()) and finally increment StringInfoData.len by how much was > actually written. That would keep the code changes a lot smaller, too. Yeah, that sounds reasonable. > I'm also not sure why we need to keep a copy of the base64 > encoding/decoding logic instead of exporting it in utils/adt/encode.c. Good question... - Heikki
On 9/10/14 1:38 PM, Heikki Linnakangas wrote: > On 09/10/2014 02:26 PM, Marko Tiikkaja wrote: >> So I wonder if I shouldn't try and instead keep the >> code closer to what it is in HEAD right now; I could call >> enlargeStringInfo() first, then hand out a pointer to b64_encode (or >> b64_decode()) and finally increment StringInfoData.len by how much was >> actually written. That would keep the code changes a lot smaller, too. > > Yeah, that sounds reasonable. OK, I've attemped to do that in the attached. I'm pretty sure I didn't get all of the overflows right, so someone should probably take a really good look at it. (I'm not too confident the original code got them right either, but whatever). Speaking of good looks, should I add it to the next commitfest as a new patch, or should we try and get someone to review it like this? >> I'm also not sure why we need to keep a copy of the base64 >> encoding/decoding logic instead of exporting it in utils/adt/encode.c. > > Good question... I've left this question unanswered for now. We can fix it later, independently of this patch. .marko
Attachment
On 09/10/2014 04:35 PM, Marko Tiikkaja wrote: > Speaking of good looks, should I add it to the next commitfest as a new > patch, or should we try and get someone to review it like this? Let's handle this in this commitfest. - Heikki
On 09/10/2014 04:35 PM, Marko Tiikkaja wrote: > On 9/10/14 1:38 PM, Heikki Linnakangas wrote: >> On 09/10/2014 02:26 PM, Marko Tiikkaja wrote: >>> So I wonder if I shouldn't try and instead keep the >>> code closer to what it is in HEAD right now; I could call >>> enlargeStringInfo() first, then hand out a pointer to b64_encode (or >>> b64_decode()) and finally increment StringInfoData.len by how much was >>> actually written. That would keep the code changes a lot smaller, too. >> >> Yeah, that sounds reasonable. > > OK, I've attemped to do that in the attached. I'm pretty sure I didn't > get all of the overflows right, so someone should probably take a really > good look at it. (I'm not too confident the original code got them > right either, but whatever). Looks good, committed. It might've been a tad more efficient to return the StringInfo buffer directly from pgp_armor/dearmor, and avoid the extra palloc and memcpy, but this isn't performance critical enough for it to really matter. Are you planning to post the main patch rebased on top of this soon? As in the next day or two? Otherwise I'll mark this as "Returned with feedback" for this commitfest. - Heikki
On 9/25/14 3:50 PM, Heikki Linnakangas wrote: > On 09/10/2014 04:35 PM, Marko Tiikkaja wrote: >> OK, I've attemped to do that in the attached. I'm pretty sure I didn't >> get all of the overflows right, so someone should probably take a really >> good look at it. (I'm not too confident the original code got them >> right either, but whatever). > > Looks good, committed. Thanks! > It might've been a tad more efficient to return > the StringInfo buffer directly from pgp_armor/dearmor, and avoid the > extra palloc and memcpy, but this isn't performance critical enough for > it to really matter. I couldn't see any way of doing that without breaking the VARDATA abstraction. I even went looking for similar cases in the source code, but couldn't find any. If you can come up with a way, feel free to change that -- I'd like to learn myself. But like you, I didn't consider it too important. > Are you planning to post the main patch rebased on top of this soon? As > in the next day or two? Otherwise I'll mark this as "Returned with > feedback" for this commitfest. Yes. With good luck I'll get you a rebased one today, otherwise it'll have to wait until tomorrow. .marko
On 09/25/2014 04:56 PM, Marko Tiikkaja wrote: > On 9/25/14 3:50 PM, Heikki Linnakangas wrote: >> On 09/10/2014 04:35 PM, Marko Tiikkaja wrote: >> It might've been a tad more efficient to return >> the StringInfo buffer directly from pgp_armor/dearmor, and avoid the >> extra palloc and memcpy, but this isn't performance critical enough for >> it to really matter. > > I couldn't see any way of doing that without breaking the VARDATA > abstraction. I even went looking for similar cases in the source code, > but couldn't find any. If you can come up with a way, feel free to > change that -- I'd like to learn myself. You could first append VARHDRSZ zeros to the StringInfo, then append the base64-encoded data, and last replace the zeros with the real length, using SET_VARSIZE. >> Are you planning to post the main patch rebased on top of this soon? As >> in the next day or two? Otherwise I'll mark this as "Returned with >> feedback" for this commitfest. > > Yes. With good luck I'll get you a rebased one today, otherwise it'll > have to wait until tomorrow. Ok, I'll leave this in Waiting on Author state then. - Heikki
On 9/25/14 4:08 PM, Heikki Linnakangas wrote: > On 09/25/2014 04:56 PM, Marko Tiikkaja wrote: >> On 9/25/14 3:50 PM, Heikki Linnakangas wrote: >>> On 09/10/2014 04:35 PM, Marko Tiikkaja wrote: >>> It might've been a tad more efficient to return >>> the StringInfo buffer directly from pgp_armor/dearmor, and avoid the >>> extra palloc and memcpy, but this isn't performance critical enough for >>> it to really matter. >> >> I couldn't see any way of doing that without breaking the VARDATA >> abstraction. I even went looking for similar cases in the source code, >> but couldn't find any. If you can come up with a way, feel free to >> change that -- I'd like to learn myself. > > You could first append VARHDRSZ zeros to the StringInfo, then append the > base64-encoded data, and last replace the zeros with the real length, > using SET_VARSIZE. That's assuming that VARDATA() is at exactly VARHDRSZ bytes. I couldn't find any callers making that assumption. .marko
Hi, On 9/25/14, 3:56 PM, I wrote: > On 9/25/14 3:50 PM, Heikki Linnakangas wrote: >> Are you planning to post the main patch rebased on top of this soon? As >> in the next day or two? Otherwise I'll mark this as "Returned with >> feedback" for this commitfest. > > Yes. With good luck I'll get you a rebased one today, otherwise it'll > have to wait until tomorrow. Missed that promise by a day since something unexpected came up yesterday. Attached is v3 of the patch. The changes are: - Rebased on top of the current master - Added a function pgp_armor_header_keys() to list all keys present in an armor - Changed pgp_armor_header() to use a StringInfo instead of an mbuf - Fixed the "error is returned" problem in the documentation pointed out earlier .marko
Attachment
On 09/27/2014 11:50 PM, Marko Tiikkaja wrote: > Hi, > > On 9/25/14, 3:56 PM, I wrote: >> On 9/25/14 3:50 PM, Heikki Linnakangas wrote: >>> Are you planning to post the main patch rebased on top of this soon? As >>> in the next day or two? Otherwise I'll mark this as "Returned with >>> feedback" for this commitfest. >> >> Yes. With good luck I'll get you a rebased one today, otherwise it'll >> have to wait until tomorrow. > > Missed that promise by a day since something unexpected came up > yesterday. Attached is v3 of the patch. The changes are: > > - Rebased on top of the current master > - Added a function pgp_armor_header_keys() to list all keys present > in an armor > - Changed pgp_armor_header() to use a StringInfo instead of an mbuf > - Fixed the "error is returned" problem in the documentation pointed > out earlier Thanks! I found the pgp_extract_armor_headers()'s signature quite weird, so I simplified that by making it always return arrays of keys and values. The callers is now responsible for returning all the keys (pgp_armor_header_keys) or finding the single key (pgp_armor_header). I also partially rewrote the implementation of pgp_extract_armor_headers(), making it more readable I hope. If an armor header line ends in CR+LF, pgp_armor_header() returned the CR as part of the value, with your patch. I don't think that's right, the line ending should be considered part of the armoring, so I changed that. Is there any real life examples or tools out there to generate armors with headers with duplicate keys? RFC 4880 says: > Note that some transport methods are sensitive to line length. While > there is a limit of 76 characters for the Radix-64 data (Section > 6.3), there is no limit to the length of Armor Headers. Care should > be taken that the Armor Headers are short enough to survive > transport. One way to do this is to repeat an Armor Header key > multiple times with different values for each so that no one line is > overly long. Does anyone do that in practice? Is there any precedence for concatenating the values in other tools that read armor headers? I wonder if it would make sense to have pgp_armor_header_keys() return both the keys and values. That would make it easier to use, and it might then make sense for it to not remove the duplicates or concatenate values, but just them as is. The caller could then deal with the duplicates any way he wants. We could keep the function for extracting the value for a single key, with the concatenating behavior, for convenience. - Heikki
Attachment
On 9/29/14 3:02 PM, Heikki Linnakangas wrote: > Thanks! I found the pgp_extract_armor_headers()'s signature quite weird, > so I simplified that by making it always return arrays of keys and > values. The callers is now responsible for returning all the keys > (pgp_armor_header_keys) or finding the single key (pgp_armor_header). I > also partially rewrote the implementation of > pgp_extract_armor_headers(), making it more readable I hope. OK. Looks good to me. > If an armor header line ends in CR+LF, pgp_armor_header() returned the > CR as part of the value, with your patch. I don't think that's right, > the line ending should be considered part of the armoring, so I changed > that. Nice catch. You are correct. > Is there any real life examples or tools out there to generate armors > with headers with duplicate keys? RFC 4880 says: > >> Note that some transport methods are sensitive to line length. While >> there is a limit of 76 characters for the Radix-64 data (Section >> 6.3), there is no limit to the length of Armor Headers. Care should >> be taken that the Armor Headers are short enough to survive >> transport. One way to do this is to repeat an Armor Header key >> multiple times with different values for each so that no one line is >> overly long. > > Does anyone do that in practice? Is there any precedence for > concatenating the values in other tools that read armor headers? Maybe I just suck at $SEARCH_ENGINE, but extracting armor headers programmatically doesn't seem to be very popular. I could only find one example, which returned the last instance of the key. But that seemed to be more an accident than anything else; it wasn't documented and the source code didn't say anything about it. I also think that's the worst behaviour. If we can't agree on concatenation, I'd rather see an error. > I wonder if it would make sense to have pgp_armor_header_keys() return > both the keys and values. That would make it easier to use, and it might > then make sense for it to not remove the duplicates or concatenate > values, but just them as is. The caller could then deal with the > duplicates any way he wants. We could keep the function for extracting > the value for a single key, with the concatenating behavior, for > convenience. I'd also suggest renaming it to pgp_armor_headers() in that case. But otherwise it seems like a reasonable plan to me. Want me to do that change or are you going to? .marko
On 09/29/2014 05:38 PM, Marko Tiikkaja wrote: > On 9/29/14 3:02 PM, Heikki Linnakangas wrote: >> Is there any real life examples or tools out there to generate armors >> with headers with duplicate keys? RFC 4880 says: >> >>> Note that some transport methods are sensitive to line length. While >>> there is a limit of 76 characters for the Radix-64 data (Section >>> 6.3), there is no limit to the length of Armor Headers. Care should >>> be taken that the Armor Headers are short enough to survive >>> transport. One way to do this is to repeat an Armor Header key >>> multiple times with different values for each so that no one line is >>> overly long. >> >> Does anyone do that in practice? Is there any precedence for >> concatenating the values in other tools that read armor headers? > > Maybe I just suck at $SEARCH_ENGINE, but extracting armor headers > programmatically doesn't seem to be very popular. I could only find one > example, which returned the last instance of the key. But that seemed > to be more an accident than anything else; it wasn't documented and the > source code didn't say anything about it. I also think that's the worst > behaviour. If we can't agree on concatenation, I'd rather see an error. May I ask you why you wrote this patch? What are you doing with the headers? - Heikki
On 9/30/14 4:37 PM, Heikki Linnakangas wrote: > On 09/29/2014 05:38 PM, Marko Tiikkaja wrote: >> Maybe I just suck at $SEARCH_ENGINE, but extracting armor headers >> programmatically doesn't seem to be very popular. I could only find one >> example, which returned the last instance of the key. But that seemed >> to be more an accident than anything else; it wasn't documented and the >> source code didn't say anything about it. I also think that's the worst >> behaviour. If we can't agree on concatenation, I'd rather see an error. > > May I ask you why you wrote this patch? What are you doing with the headers? We're sending arbitrary messages between systems over HTTP(S), and a special header is used to tell the recipient system what type of message it is. The message types are specific to the domain, but you can think of them to be roughly equivalent to MIME types. If what you're trying to get a sense of is why I'd prefer to see concatenation, I can't really help you. For our use case (and perhaps for everyone else as well) it would actually make more sense to throw an error if pgp_armor_header() is used on a key which appears more than once. The concatenation behaviour was an attempt at a "one size fits all" interface, but now that we're going to also have a pgp_armor_headers() function for users to implement the behaviour they want themselves, there's no real reason to try and guess what everyone wants. I think I'd prefer to see an ERROR in this case now. .marko
Heikki Linnakangas wrote: Happened to land on the middle of the regression test by accident and noticed: > +select armor('zooka', array['Version', 'Comment'], array['Created by pgcrypto', 'PostgreSQL, the world''s most most advancedopen source database']); > + armor > +-------------------------------------------------------------------------- > + -----BEGIN PGP MESSAGE----- + > + Version: Created by pgcrypto + > + Comment: PostgreSQL, the world's most most advanced open source database+ > + + > + em9va2E= + > + =D5cR + > + -----END PGP MESSAGE----- + > + > +(1 row) "most most"? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 09/30/2014 05:45 PM, Marko Tiikkaja wrote: > On 9/30/14 4:37 PM, Heikki Linnakangas wrote: >> On 09/29/2014 05:38 PM, Marko Tiikkaja wrote: >>> Maybe I just suck at $SEARCH_ENGINE, but extracting armor headers >>> programmatically doesn't seem to be very popular. I could only find one >>> example, which returned the last instance of the key. But that seemed >>> to be more an accident than anything else; it wasn't documented and the >>> source code didn't say anything about it. I also think that's the worst >>> behaviour. If we can't agree on concatenation, I'd rather see an error. >> >> May I ask you why you wrote this patch? What are you doing with the headers? > > We're sending arbitrary messages between systems over HTTP(S), and a > special header is used to tell the recipient system what type of message > it is. The message types are specific to the domain, but you can think > of them to be roughly equivalent to MIME types. Ok. How quaint. :-) > If what you're trying to get a sense of is why I'd prefer to see > concatenation, I can't really help you. For our use case (and perhaps > for everyone else as well) it would actually make more sense to throw an > error if pgp_armor_header() is used on a key which appears more than > once. The concatenation behaviour was an attempt at a "one size fits > all" interface, but now that we're going to also have a > pgp_armor_headers() function for users to implement the behaviour they > want themselves, there's no real reason to try and guess what everyone > wants. I think I'd prefer to see an ERROR in this case now. I'm actually now leaning towards providing just a single function, pgp_armor_headers(text, key OUT text, value OUT text), which returns all the keys and values. That gives maximum flexibility, and leaves it up to the user to decide what to do with duplicate keys. It's pretty easy to use that to extract just a single header, too: postgres=# select * FROM pgp_armor_headers(' -----BEGIN PGP MESSAGE----- foo: baar foo: more foo singlekey: fsdfsd em9va2E= =ZZZZ -----END PGP MESSAGE----- ') where key = 'singlekey'; key | value -----------+-------- singlekey | fsdfsd (1 row) And if you want to concatenate possible duplicates: postgres=# select string_agg(value, ' ') FROM pgp_armor_headers(' -----BEGIN PGP MESSAGE----- foo: baar foo: more foo singlekey: fsdfsd em9va2E= =ZZZZ -----END PGP MESSAGE----- ') where key = 'foo'; string_agg --------------- baar more foo (1 row) What do you think? Attached patch implements that, but the docs and regression tests now need adjustment. - Heikki
On 9/30/14 5:17 PM, Heikki Linnakangas wrote: > I'm actually now leaning towards providing just a single function, > pgp_armor_headers(text, key OUT text, value OUT text), which returns all > the keys and values. That gives maximum flexibility, and leaves it up to > the user to decide what to do with duplicate keys. It's pretty easy to > use that to extract just a single header, too: > > <snip> > > What do you think? Attached patch implements that, but the docs and > regression tests now need adjustment. (You forgot the patch, but I can imagine what it would have been.) I'm not exactly sure to be honest. I would personally like to see a simple function for extracting a single header value in a scalar context without having to deal with all the pain of SRFs, multiple matches and no matches. Sure, that got a lot better in 9.3 with LATERAL but it's still way inferior to pgp_armor_header(). I can also see why someone would argue that I should just create the function myself and that it doesn't have to be shipped with postgres. But on the other hand, this is already an extension one has to explicitly go and CREATE, and that considered one more function probably wouldn't hurt too much. But either way I won't be unhappy, so it's up to you (and/or other members of the community) to decide this one. .marko
On 09/30/2014 06:39 PM, Marko Tiikkaja wrote: > On 9/30/14 5:17 PM, Heikki Linnakangas wrote: >> I'm actually now leaning towards providing just a single function, >> pgp_armor_headers(text, key OUT text, value OUT text), which returns all >> the keys and values. That gives maximum flexibility, and leaves it up to >> the user to decide what to do with duplicate keys. It's pretty easy to >> use that to extract just a single header, too: >> >> <snip> >> >> What do you think? Attached patch implements that, but the docs and >> regression tests now need adjustment. > > (You forgot the patch, but I can imagine what it would have been.) Oops. > I'm not exactly sure to be honest. I would personally like to see a > simple function for extracting a single header value in a scalar context > without having to deal with all the pain of SRFs, multiple matches and > no matches. Sure, that got a lot better in 9.3 with LATERAL but it's > still way inferior to pgp_armor_header(). > > I can also see why someone would argue that I should just create the > function myself and that it doesn't have to be shipped with postgres. > But on the other hand, this is already an extension one has to > explicitly go and CREATE, and that considered one more function probably > wouldn't hurt too much. Yeah, building the function to extract a single value is pretty simple once you have the set-returning function: create function pgp_armor_header(armor text, key text) returns text language sql as $$ select string_agg(value, ' ') from pgp_armor_headers($1) where key = $2 $$; I spent a little time cleaning up the regression tests and docs, and ended up with the attached. But then I realized that there's a problem with UTF-8 conversion in the armor() function. It returns the armored blob as text, but forcibly converts the keys and values to UTF-8. That's not cool, because you will get invalidly encoded strings into the database, if you use the function while connected to a database that uses some other encoding than UTF-8. RFC4880 says that the headers are in UTF-8, but armor() cannot safely return UTF-8 encoded text unless the database's encoding is also UTF-8. It also rightly says that using anything else than plain ASCII, even though nominally it's UTF-8, is a bad idea, because the whole point of ASCII-armoring is to make the format safe from encoding conversions. We have two options: 1. Throw an error if there are any non-ASCII characters in the key/value arrays. 2. Don't convert them to UTF-8, but use the current database encoding. Both seem sane to me. If we use the current database encoding, then we have to also decide what to do with the input, in pgp_armor_headers(). If armor() uses the database encoding, but pgp_armor_headers() treats the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?)) won't work. - Heikki
Attachment
On 10/1/14, 9:11 AM, Heikki Linnakangas wrote: > I spent a little time cleaning up the regression tests and docs, and > ended up with the attached. But then I realized that there's a problem > with UTF-8 conversion in the armor() function. It returns the armored > blob as text, but forcibly converts the keys and values to UTF-8. That's > not cool, because you will get invalidly encoded strings into the > database, if you use the function while connected to a database that > uses some other encoding than UTF-8. > > RFC4880 says that the headers are in UTF-8, but armor() cannot safely > return UTF-8 encoded text unless the database's encoding is also UTF-8. > It also rightly says that using anything else than plain ASCII, even > though nominally it's UTF-8, is a bad idea, because the whole point of > ASCII-armoring is to make the format safe from encoding conversions. Ugh. Right. > We have two options: > > 1. Throw an error if there are any non-ASCII characters in the key/value > arrays. > 2. Don't convert them to UTF-8, but use the current database encoding. > > Both seem sane to me. If we use the current database encoding, then we > have to also decide what to do with the input, in pgp_armor_headers(). > If armor() uses the database encoding, but pgp_armor_headers() treats > the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?)) > won't work. Yeah. Both options seem fine to me. Throwing an error perhaps slightly more so. .marko
On 10/01/2014 11:58 AM, Marko Tiikkaja wrote: > On 10/1/14, 9:11 AM, Heikki Linnakangas wrote: >> We have two options: >> >> 1. Throw an error if there are any non-ASCII characters in the key/value >> arrays. >> 2. Don't convert them to UTF-8, but use the current database encoding. >> >> Both seem sane to me. If we use the current database encoding, then we >> have to also decide what to do with the input, in pgp_armor_headers(). >> If armor() uses the database encoding, but pgp_armor_headers() treats >> the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?)) >> won't work. > > Yeah. Both options seem fine to me. Throwing an error perhaps slightly > more so. I went with 1, throw an error. I also added checks that the key or value doesn't contain any embedded newlines, and that the key doesn't contain an embedded ": ". Those would cause the armor to be invalid. I think this is now ready for commit, but since I've changed it quite significantly from what you originally submitted, please take a moment to review this. - Heikki
Attachment
On 10/1/14 1:01 PM, Heikki Linnakangas wrote: > On 10/01/2014 11:58 AM, Marko Tiikkaja wrote: >> On 10/1/14, 9:11 AM, Heikki Linnakangas wrote: >>> We have two options: >>> >>> 1. Throw an error if there are any non-ASCII characters in the key/value >>> arrays. >>> 2. Don't convert them to UTF-8, but use the current database encoding. >>> >>> Both seem sane to me. If we use the current database encoding, then we >>> have to also decide what to do with the input, in pgp_armor_headers(). >>> If armor() uses the database encoding, but pgp_armor_headers() treats >>> the input as UTF-8, then a round-trip with pgp_armor_headers(armor(?)) >>> won't work. >> >> Yeah. Both options seem fine to me. Throwing an error perhaps slightly >> more so. > > I went with 1, throw an error. I also added checks that the key or value > doesn't contain any embedded newlines, and that the key doesn't contain > an embedded ": ". Those would cause the armor to be invalid. Great. > I think this is now ready for commit, but since I've changed it quite > significantly from what you originally submitted, please take a moment > to review this. 1) I see this compiler warning: pgp-pgsql.c: In function ‘pg_armor’: pgp-pgsql.c:960:18: warning: ‘values’ may be used uninitialized in this function [-Wmaybe-uninitialized] pgp_armor_encode((uint8 *) VARDATA(data), data_len, &buf, It's bogus, but worth silencing anyway. 2) There's what looks like an extra whitespace change in pgp_armor_encode(), but maybe that's intentional? 3) Also, I think the attached two corner cases deserve their own tests. Other than the above, the patch looks good to me. Huge thanks for your work on this one! .marko
Attachment
On 10/01/2014 02:47 PM, Marko Tiikkaja wrote: > On 10/1/14 1:01 PM, Heikki Linnakangas wrote: >> I think this is now ready for commit, but since I've changed it quite >> significantly from what you originally submitted, please take a moment >> to review this. > > 1) I see this compiler warning: > > pgp-pgsql.c: In function ‘pg_armor’: > pgp-pgsql.c:960:18: warning: ‘values’ may be used uninitialized in > this function [-Wmaybe-uninitialized] > pgp_armor_encode((uint8 *) VARDATA(data), data_len, &buf, > > It's bogus, but worth silencing anyway. > > 2) There's what looks like an extra whitespace change in > pgp_armor_encode(), but maybe that's intentional? > > 3) Also, I think the attached two corner cases deserve their own tests. > > Other than the above, the patch looks good to me. Huge thanks for your > work on this one! Ok, committed with those fixes. - Heikki