Thread: downloading list archive mboxes

downloading list archive mboxes

From: Justin Pryzby
I used to be able to wget the "download mbox" link.
https://www.postgresql.org/list/pgsql-hackers/

After I did this once or twice, I added a .netrc entry to make this easy:

machine www.postgresql.org
login archives
password antispam

This is no longer working - instead, wget retrieves an HTML file that says:
|The website you are trying to log in to (List archives) is using the
|postgresql.org community login system. In this system you create a
|central account that is used to log into most postgresql.org services.
|Once you are logged into this account, you will automatically be
|logged in to the associated postgresql.org services.

I hope that can be fixed, unless it was deliberate, which would be unfortunate.

-- 
Justin



Re: downloading list archive mboxes

From: "David G. Johnston"
On Sat, Jun 25, 2022 at 11:40 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I used to be able to wget the "download mbox" link.
> https://www.postgresql.org/list/pgsql-hackers/
>
> I hope that can be fixed, unless it was deliberate, which would be unfortunate.


It was intentional.


David J.

Re: downloading list archive mboxes

From: Magnus Hagander
On Sat, Jun 25, 2022 at 8:47 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Sat, Jun 25, 2022 at 11:40 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I used to be able to wget the "download mbox" link.
> > https://www.postgresql.org/list/pgsql-hackers/
> >
> > I hope that can be fixed, unless it was deliberate, which would be unfortunate.
>
> It was intentional.



It was indeed deliberate. Are there any locations where it still *tells* you to use the archives/antispam method?

FWIW, the old method should still work fine as long as you provide the username/password in basic auth up front. Can you explain exactly the scenario in which it fails? Like, what command did you actually use?


Re: downloading list archive mboxes

From: Justin Pryzby
On Sat, Jun 25, 2022 at 09:01:07PM +0200, Magnus Hagander wrote:
> On Sat, Jun 25, 2022 at 8:47 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> 
> > On Sat, Jun 25, 2022 at 11:40 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> >> I used to be able to wget the "download mbox" link.
> >> https://www.postgresql.org/list/pgsql-hackers/
> >>
> >> I hope that can be fixed, unless it was deliberate, which would be
> >> unfortunate.
> >>
> > It was intentional.

Oh, it's unfortunate then.  I'm used to using wget to retrieve a mailbox on a
remote host over SSH.  In the immediate case, the "resend email" link is not
working (maybe it did work, but I cannot find the mail and I gave up waiting).

When I finally saved it in my web browser, it took me 5 minutes to realize that
I'd saved it locally and that I'd need to scp the mailbox to the remote side.

I've also retrieved the mailbox to a remote server to retrieve someone's
patchset to compile on the remote side.

> It was indeed deliberate. Are there any locations where it still *tells*
> you to use the archives/antispam method?

Not to my knowledge - just muscle memory.

> FWIW, the old method should still work fine as long as you provide the
> username/password in basic auth up front. 

I tried but haven't gotten this to work yet.

curl -u 'archives:antispam' -L -v http://www.postgresql.org/message-id/flat/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru

> Can you explain exactly the scenario in which it fails? Like, what command
> did you actually use?

I used this command:
wget https://www.postgresql.org/message-id/flat/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru

and the web server happily responded with a 200 OK, so wget didn't retry, as it
used to.

If it had responded with HTTP 403, wget would've retried with basic
authentication (I think it's deliberate and even suggested by RFC to initially
attempt without sending a password, even if one is available).

-- 
Justin



Re: downloading list archive mboxes

From: Magnus Hagander
On Sat, Jun 25, 2022 at 9:12 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sat, Jun 25, 2022 at 09:01:07PM +0200, Magnus Hagander wrote:
> > On Sat, Jun 25, 2022 at 8:47 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> >
> > > On Sat, Jun 25, 2022 at 11:40 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > >> I used to be able to wget the "download mbox" link.
> > >> https://www.postgresql.org/list/pgsql-hackers/
> > >>
> > >> I hope that can be fixed, unless it was deliberate, which would be
> > >> unfortunate.
> > >>
> > > It was intentional.
>
> Oh, it's unfortunate then.  I'm used to using wget to retrieve a mailbox on a
> remote host over SSH.  In the immediate case, the "resend email" link is not
> working (maybe it did work, but I cannot find the mail and I gave up waiting).
>
> When I finally saved it in my web browser, it took me 5 minutes to realize that
> I'd saved it locally and that I'd need to scp the mailbox to the remote side.
>
> I've also retrieved the mailbox to a remote server to retrieve someone's
> patchset to compile on the remote side.
>
> > It was indeed deliberate. Are there any locations where it still *tells*
> > you to use the archives/antispam method?
>
> Not to my knowledge - just muscle memory.
>
> > FWIW, the old method should still work fine as long as you provide the
> > username/password in basic auth up front.
>
> I tried but haven't gotten this to work yet.
>
> curl -u 'archives:antispam' -L -v http://www.postgresql.org/message-id/flat/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru


Um, that's not trying the mbox though? If I use that very command but put in /mbox/ instead of /flat/ it works for me.



> > Can you explain exactly the scenario in which it fails? Like, what command
> > did you actually use?
>
> I used this command:
> wget https://www.postgresql.org/message-id/flat/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru
>
> and the web server happily responded with a 200 OK, so wget didn't retry, as it
> used to.

Well, again that URL is actually not for the mbox, so it just returns the thread.

But yes, if you again replace it with the /mbox/ part, it will give you a 200 OK and ask for community auth. To fetch that mbox with wget, you need to use:

wget --auth-no-challenge https://archives:antispam@www.postgresql.org/message-id/mbox/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru



> If it had responded with HTTP 403, wget would've retried with basic
> authentication (I think it's deliberate and even suggested by RFC to initially
> attempt without sending a password, even if one is available).

Yes, but AFAIK it would now no longer work at all for community authentication because it would pop up a basic auth prompt there, no? 
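
The two points above (use the /mbox/ path rather than /flat/, and send the credentials without waiting for a challenge) can be combined in a small wrapper. This is only a sketch, assuming a POSIX shell with sed; mbox_url is a hypothetical helper name, and the wget call is left commented out so the snippet runs without network access:

```shell
#!/bin/sh
# mbox_url is a hypothetical helper: turn a /message-id/flat/ thread URL
# into the corresponding /message-id/mbox/ URL. Other URLs pass through unchanged.
mbox_url() {
    printf '%s\n' "$1" | sed 's|/message-id/flat/|/message-id/mbox/|'
}

url=$(mbox_url 'https://www.postgresql.org/message-id/flat/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru')
echo "$url"

# Fetch the mbox, sending the shared archive credentials on the first request
# instead of waiting for a challenge that never comes:
# wget --auth-no-challenge --user=archives --password=antispam "$url"
```

Passing --user and --password keeps the credentials out of the URL itself; embedding them as archives:antispam@ in the URL, as in the command above, is equivalent for wget.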


Re: downloading list archive mboxes

From: Justin Pryzby
On Sat, Jun 25, 2022 at 09:30:48PM +0200, Magnus Hagander wrote:
> Um, that's not trying the mbox though? If I use that very command but put
> in /mbox/ instead of /flat/ it works for me.

You're right - evidently I got confused while adding in user:pass to re-test.

> wget --auth-no-challenge https://archives:antispam@www.postgresql.org/message-id/mbox/126b4480-359c-b745-a713-336ae96d1936%40inbox.ru

Thanks.

> > If it had responded with HTTP 403, wget would've retried with basic
> > authentication (I think it's deliberate and even suggested by RFC to
> > initially attempt without sending a password, even if one is available).
> 
> Yes, but AFAIK it would now no longer work at all for community
> authentication because it would pop up a basic auth prompt there, no?

Yeah, I wasn't suggesting to change it (back), just pointing out the mechanism
behind the behavior.  Which we seem to agree on.

Thanks,
-- 
Justin