Re: Shorter archive URLs - Mailing list pgsql-www

From Magnus Hagander
Subject Re: Shorter archive URLs
Date
Msg-id CABUevEwu3RMgQKqO1WFaaaMt+WtU52g=VsjhwM-72vGRxU0dBg@mail.gmail.com
In response to Re: Shorter archive URLs  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-www


On Tue, Jul 16, 2019 at 5:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Sun, Jul 14, 2019 at 12:52:46PM +0200, Magnus Hagander wrote:
> >> This means that instead of being:
> >> https://www.postgresql.org/message-id/
> >> CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com
> >>
> >> The url would be:
> >> https://www.postgresql.org/message-id/Z0oaTfo56bV4tke6-r_PKJstHF8=
>
> > It would be nice if I could easily compute the hash if I know the
> > message-id --- I assume I can just run it through sha1.  This would
> > allow me to shorten commit URLs, which would be a win for GMail.
>
> Now that I look closer, Magnus' example shows that this proposal
> is underspecified: exactly how would the message-ID be rendered
> before being fed into sha1?  In particular it's not clear from
> this whether "@" should be spelled "@" or "%40".  The existing
> archive website is quite forgiving about that, you can write
> either --- but the sha1 transform would be utterly unforgiving.
> Instead of opaque hash X you'd get opaque hash Y, and there'd
> be no way even to see what caused the mismatch.

It should always be @. The %40 is only a side effect of the @ being percent-encoded when the message-id is put into a URL.
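
A minimal sketch of why the spelling matters, assuming the hash is the URL-safe base64 of the binary SHA-1 digest as in the Python snippet later in this message. The helper names `normalize` and `short_id` are hypothetical and not part of the archive code; `normalize` just percent-decodes its input so that both spellings end up as the same string before hashing.

```python
import base64
import hashlib
from urllib.parse import unquote


def normalize(message_id: str) -> str:
    """Percent-decode a message-id so that '%40' and '@' become identical."""
    # unquote() leaves '+' alone, which matters because message-ids often contain it.
    return unquote(message_id)


def short_id(message_id: str) -> str:
    """URL-safe base64 of the binary SHA-1 digest of the normalized message-id."""
    digest = hashlib.sha1(normalize(message_id).encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


raw = "CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com"
escaped = "CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com"

# Hashing the two spellings as-is gives unrelated digests ...
assert hashlib.sha1(raw.encode()).digest() != hashlib.sha1(escaped.encode()).digest()
# ... but after normalization both map to the same short id.
assert short_id(raw) == short_id(escaped)
```

One caveat with this sketch: a message-id that contains a literal "%" followed by two hex digits would also be decoded, so anything real would have to decide where exactly the decoding happens.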

 

> (BTW, after some experimentation I'm totally unable to reproduce
> Magnus' example using sha1sum(1) and base64(1), so that is not
> the only underspecified point here.)

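The problem is that sha1sum prints the sum as hex text, not the raw binary digest, so piping its output into base64 encodes the hex characters rather than the 20 digest bytes. You also need to be careful about newlines: a trailing newline (which a plain echo adds) becomes part of the hashed input and changes the result.
How I've done it is simply (in Python):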

>>> import hashlib, base64
>>> base64.urlsafe_b64encode(hashlib.sha1(b'CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com').digest())
b'Z0oaTfo56bV4tke6-r_PKJstHF8='
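
To illustrate the two pitfalls, here is a rough sketch comparing the intended computation with what a shell pipeline along the lines of `echo ... | sha1sum | base64` would effectively encode. The exact commands Tom tried are not given in the thread, so the two "wrong" variants below are only assumptions about likely mistakes.

```python
import base64
import hashlib

msgid = b"CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com"

# Intended: base64 of the 20 raw digest bytes.
good = base64.urlsafe_b64encode(hashlib.sha1(msgid).digest())

# Pitfall 1: a trailing newline (as added by a plain `echo`) changes the hash.
with_newline = base64.urlsafe_b64encode(hashlib.sha1(msgid + b"\n").digest())

# Pitfall 2: base64-encoding the 40-character hex text instead of the raw bytes.
hex_text = hashlib.sha1(msgid).hexdigest().encode("ascii")
from_hex = base64.urlsafe_b64encode(hex_text)

print(len(good), len(from_hex))                    # lengths of the two encodings
print(good != with_newline, good != from_hex)      # both variants differ from the intended one
```

Encoding the hex text gives a 56-character string instead of the 28-character one, which is one quick way to tell that the wrong thing was fed to base64.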


We could use a hex digest instead of base64, of course, but that would make the URLs longer (40 characters for the hash instead of 28).
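
For a sense of the difference, a small sketch that builds the three URL variants and compares their lengths. The hex form is purely hypothetical here, and the percent-encoding via `quote(..., safe="")` is an assumption about how the long form is spelled, chosen to match the %40 example quoted above.

```python
import base64
import hashlib
from urllib.parse import quote

BASE = "https://www.postgresql.org/message-id/"
msgid = "CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com"
sha = hashlib.sha1(msgid.encode("ascii"))

urls = {
    # Current style: the percent-encoded message-id itself.
    "message-id": BASE + quote(msgid, safe=""),
    # Hypothetical alternative: hex digest (40 characters).
    "hex": BASE + sha.hexdigest(),
    # Proposed: URL-safe base64 of the binary digest (28 characters).
    "base64": BASE + base64.urlsafe_b64encode(sha.digest()).decode("ascii"),
}

for name, url in urls.items():
    print(f"{name:10} {len(url):3} {url}")
```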

(FWIW, I'm not wedded to making this change -- that's why I posted here first -- this is just explaining how it was actually calculated)
 
-- 
