
Re: Shorter archive URLs - Mailing list pgsql-www

From	Magnus Hagander
Subject	Re: Shorter archive URLs
Date	July 16, 2019 11:49:41
Msg-id	CABUevEwu3RMgQKqO1WFaaaMt+WtU52g=VsjhwM-72vGRxU0dBg@mail.gmail.com
In response to	Re: Shorter archive URLs (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-www


On Tue, Jul 16, 2019 at 5:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Bruce Momjian <bruce@momjian.us> writes:
> > On Sun, Jul 14, 2019 at 12:52:46PM +0200, Magnus Hagander wrote:
> >> This means that instead of being:
> >> https://www.postgresql.org/message-id/
> >> CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com
> >>
> >> The url would be:
> >> https://www.postgresql.org/message-id/Z0oaTfo56bV4tke6-r_PKJstHF8=
>
> > It would be nice if I could easily compute the hash if I know the
> > message-id --- I assume I can just run it through sha1. This would
> > allow me to shorten commit URLs, which would be a win for GMail.
>
> Now that I look closer, Magnus' example shows that this proposal
> is underspecified: exactly how would the message-ID be rendered
> before being fed into sha1? In particular it's not clear from
> this whether "@" should be spelled "@" or "%40". The existing
> archive website is quite forgiving about that, you can write
> either --- but the sha1 transform would be utterly unforgiving.
> Instead of opaque hash X you'd get opaque hash Y, and there'd
> be no way even to see what caused the mismatch.

It should always be @. The %40 is a side effect of @ not being allowed in a URL.
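
As a rough sketch of that rule, a client could percent-decode whatever it copied from a URL before hashing, so that both spellings end up as the same input. The helper name `short_id` is hypothetical, and the sketch assumes the message-id itself contains no literal "%" that decoding would alter:

```python
import base64
import hashlib
from urllib.parse import unquote


def short_id(message_id: str) -> str:
    """Hypothetical helper: hash a message-id after percent-decoding it."""
    # Turn "%40" (and other percent-escapes) back into literal characters,
    # and drop stray surrounding whitespace such as a trailing newline.
    canonical = unquote(message_id).strip()
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


with_at = short_id("CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com")
with_escape = short_id("CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com")
print(with_at == with_escape)  # True: both spellings hash the same bytes
```

Note that `unquote` leaves "+" alone, which matters because message-ids can contain it.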

> (BTW, after some experimentation I'm totally unable to reproduce
> Magnus' example using sha1sum(1) and base64(1), so that is not
> the only underspecified point here.)

The problem is that sha1sum generates a hex version of the sum, not the binary version. You also need to be careful about the newlines.

How I've done it is simply (in Python):

>>> import hashlib, base64
>>> base64.urlsafe_b64encode(hashlib.sha1(b'CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com').digest())
b'Z0oaTfo56bV4tke6-r_PKJstHF8='
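
To illustrate the two pitfalls mentioned above (hex versus binary digest, and a trailing newline), here is a small sketch that computes all three variants side by side. The variable names are only for illustration; the point is that base64 of the hex text, or a hash of the id plus "\n", gives a different string than the one in the example URL:

```python
import base64
import hashlib

msgid = b"CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com"

# The calculation used for the short URL: base64 of the 20 raw digest bytes.
raw_digest = hashlib.sha1(msgid).digest()
from_raw = base64.urlsafe_b64encode(raw_digest)

# Pitfall 1: base64 of the 40-character hex text instead of the raw bytes.
hex_text = hashlib.sha1(msgid).hexdigest().encode("ascii")
from_hex = base64.urlsafe_b64encode(hex_text)

# Pitfall 2: a trailing newline (as a plain `echo` would add) is hashed too.
from_newline = base64.urlsafe_b64encode(hashlib.sha1(msgid + b"\n").digest())

print(len(from_raw), len(from_hex))   # 28 56
print(from_raw == from_newline)       # False
```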

We could use a hex digest instead of a base64 of course, but that would make the URLs longer.
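
For a sense of the difference in length, a quick check (a sketch only, not part of any proposed implementation):

```python
import base64
import hashlib

msgid = b"CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg@mail.gmail.com"

hex_form = hashlib.sha1(msgid).hexdigest()
b64_form = base64.urlsafe_b64encode(hashlib.sha1(msgid).digest()).decode("ascii")

# A SHA-1 digest is 20 bytes: 40 hex characters, or 28 base64 characters
# including the single "=" of padding.
print(len(hex_form), len(b64_form))  # 40 28
```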

(FWIW, I'm not wedded to making this change -- that's why I posted here first -- this is just explaining how it was actually calculated)

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
