> Maybe Google used to load the pages under /list/ and crawl them for links
> but just not include the actual pages in the index or something
>
> I wonder if we can inject these into Google using a sitemap. I think that
> should work -- will need some investigation on exactly how to do it, as
> sitemaps also have individual restrictions on the number of urls per file,
> and we do have quite a few messages.
>
> > Why is that /list/ exclusion there in the first place?
>
> Because there is basically an infinite number of pages in that space, due
> to the fact that you can pick an arbitrary point in time to view from.
Maybe we can create new pages specifically for crawlers, listing every email exactly once. Say (unimaginatively) /list_crawlers/2019-08/, containing links to all emails on all public lists during August 2019.
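A minimal sketch of what such crawler pages could look like: group messages by month and emit one /list_crawlers/YYYY-MM/ page linking each message exactly once. The message tuples and URL shapes here are hypothetical, not the archive's actual data model.

```python
# Hypothetical sketch: build per-month crawler listing pages, each message
# linked exactly once, keyed by paths like /list_crawlers/2019-08/.
from collections import defaultdict
from datetime import date
from html import escape

def crawler_pages(messages):
    """messages: iterable of (date, message_url, subject) tuples.
    Returns {"/list_crawlers/YYYY-MM/": html} with each URL listed once."""
    by_month = defaultdict(dict)  # month -> {url: subject}; dict dedupes URLs
    for d, url, subject in messages:
        by_month[f"{d.year:04d}-{d.month:02d}"].setdefault(url, subject)
    pages = {}
    for month, links in sorted(by_month.items()):
        items = "\n".join(
            f'<li><a href="{escape(url)}">{escape(subj)}</a></li>'
            for url, subj in sorted(links.items()))
        pages[f"/list_crawlers/{month}/"] = f"<ul>\n{items}\n</ul>"
    return pages
```

Deduplicating per month keeps the page count finite, which is exactly what the open-ended /list/ time-slice URLs lack.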
That's pretty much what I'm suggesting, but using a sitemap so it's injected directly.
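The per-file restriction mentioned above is 50,000 URLs per sitemap in the sitemaps.org protocol, with a sitemap index file tying the pieces together. A rough sketch, assuming hypothetical message URLs and a hypothetical sitemap-N.xml naming scheme:

```python
# Sketch: split message URLs into sitemap files of at most 50,000 URLs
# (the sitemaps.org per-file limit) plus a sitemap index referencing them.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000

def build_sitemaps(message_urls, base_url="https://example.org"):
    """Return (index_xml, [sitemap_xml, ...]) as serialized strings."""
    chunks = [message_urls[i:i + MAX_URLS_PER_FILE]
              for i in range(0, len(message_urls), MAX_URLS_PER_FILE)]
    sitemaps = []
    for chunk in chunks:
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in chunk:
            loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
            loc.text = url
        sitemaps.append(ET.tostring(urlset, encoding="unicode"))
    # Index file pointing Google at each chunk.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(len(chunks)):
        loc = ET.SubElement(ET.SubElement(index, "sitemap"), "loc")
        loc.text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="unicode"), sitemaps
```

Even with quite a few messages this scales: one index can reference up to 50,000 sitemap files, so the chunking alone covers billions of URLs.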