On Wed, Aug 28, 2019 at 7:45 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-08-28 19:09:40 +0200, Magnus Hagander wrote:
> It blocks /list/ which has the subjects only.
Yea. But there's no way to actually get to all the individual messages without /list/? Sure, some will be linked to from somewhere else, but without the content below /list/, most won't be reached?
That is indeed a good point. But it has been that way for many years, so something must've changed. We last modified this in 2013....
Maybe Google used to load the pages under /list/ and crawl them for links, but just not include the actual pages in the index, or something.
I wonder if we can inject these into Google using a sitemap. I think that should work -- it will need some investigation into exactly how to do it, as each sitemap file is also limited in the number of URLs it can hold, and we do have quite a few messages.
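For illustration only, here is a rough sketch of how the per-file limit could be handled: split the message URLs into files of at most 50,000 entries (the limit in the sitemaps.org protocol) and tie them together with a sitemap index. The function names, file names and base URL are made up for the example, and this is not how the archives code actually does anything.

```python
from xml.sax.saxutils import escape

# Per-file URL limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap(urls):
    """Render one <urlset> document for a list of page URLs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % SITEMAP_NS]
    lines += ["  <url><loc>%s</loc></url>" % escape(u) for u in urls]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def render_index(sitemap_urls):
    """Render a <sitemapindex> document pointing at the sitemap files."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="%s">' % SITEMAP_NS]
    lines += ["  <sitemap><loc>%s</loc></sitemap>" % escape(u)
              for u in sitemap_urls]
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


def build_sitemaps(message_urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return {filename: xml} for all sitemap files plus the index.

    The file naming scheme is hypothetical.
    """
    files = {}
    locations = []
    for number, part in enumerate(chunk(message_urls, size), start=1):
        name = "sitemap-messages-%d.xml" % number
        files[name] = render_sitemap(part)
        locations.append(base_url + name)
    files["sitemap-index.xml"] = render_index(locations)
    return files
```

The index file would then be the one submitted to the search engine, or referenced from robots.txt with a Sitemap: line. Generating the list of message URLs is left out, since that depends on how the archives are stored.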
Why is that /list/ exclusion there in the first place?
Because there is basically an infinite number of pages in that space, since you can pick an arbitrary point in time to view from.
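To make the effect of such an exclusion concrete, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content, the host name and the example paths are assumptions made up for the illustration, not a copy of what the site actually serves.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: block everything under /list/ for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, host="https://lists.example.org"):
    """Return True if a generic crawler may fetch `path` (hypothetical host)."""
    return parser.can_fetch("*", host + path)


# A time-based listing page (one of the effectively unlimited URLs) is blocked,
# while an individual message page outside /list/ is not.
listing_ok = allowed("/list/some-list/since/201908280000/")
message_ok = allowed("/message-id/some-message-id")
```

With these rules the message page itself may be crawled, but a crawler that honours robots.txt never fetches the listing pages that link to it, so it only finds messages that are linked from somewhere else.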
> Nothing has been changed around that for many years from *our* side.
Any chance that there previously still was an archives.postgresql.org view or such that made it possible to reach the individual messages without being blocked by robots.txt?
That one had a robots.txt blocking this going back even further in time.