Re: Fixing Google Search on the docs (redux) - Mailing list pgsql-www

From Andres Freund
Subject Re: Fixing Google Search on the docs (redux)
Msg-id 20201121194534.bthxe2nrc7pvjelo@alap3.anarazel.de
In response to Re: Fixing Google Search on the docs (redux)  (Magnus Hagander <magnus@hagander.net>)
List pgsql-www
Hi,

On 2020-11-21 15:57:28 +0100, Magnus Hagander wrote:
> On Thu, Nov 19, 2020 at 8:50 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2020-11-18 18:28:49 +0100, Magnus Hagander wrote:
> > > We've discussed this many times before, and I think so far they've all
> > > bogged down at "google suck" :) The problem is that they don't even
> > > consider the case like we have where the pages *aren't* identical, but
> > > yet related.
> >
> > Is any search engine better at this? I don't think so?
> 
> I doubt it, most tend to copy Google. And in either case it doesn't
> matter that much -- the *vast* majority of our inbound search traffic
> is google vs the other searches. By such a margin that it's not even a
> point in considering the others.

I was more wondering whether it's "search engines suck" or "google
sucks" - obviously Google search is dominant...


> > > The problem it usually comes down to is that if we do that, then you
> > > will no longer be able to say search for something in the old docs *at
> > > all*.
> >
> > I think that'd still be better than the current situation. But I hope we
> > can do better:
> >
> > > A good example right now might be that recovery.conf stuff goes
> > > away. Even if you explicitly search for "postgresql recovery.conf 11".
> > > And I'd guess the majority of people are actually looking for things
> > > in versions that are NOT the latest (though an even bigger majority of
> > > people will be looking for things in versions that are not 9.1).
> >
> > E.g. not applying canonical when there's no newer version.
> 
> That we can definitely do. So for recovery.conf it would still work,
> but anything that goes on a page where the page still exists, I don't
> see how we could separate that out and not do a canonical for that...

Compute a similarity metric ;). No, I'm not serious.
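
As a rough sketch of the simpler rule discussed above (no canonical when
the page has no newer version): the helper name, the set of page names
and the /docs/current/ URL layout below are assumptions for illustration,
not how pgweb actually does it.

```python
# Hypothetical sketch: emit a canonical URL only if the page still exists
# in the current docs; otherwise leave the old page indexable on its own.
BASE = "https://www.postgresql.org/docs"  # assumed URL layout


def canonical_for(page, pages_in_current):
    """Return the canonical URL for `page`, or None if no newer version exists."""
    if page in pages_in_current:
        return f"{BASE}/current/{page}"
    return None


# Illustrative data: recovery-config.html is assumed to be gone from current.
pages_in_current = {"runtime-config.html", "sql-select.html"}

print(canonical_for("runtime-config.html", pages_in_current))
print(canonical_for("recovery-config.html", pages_in_current))
```

This keeps pages like recovery.conf searchable, but as noted it does
nothing for pages that still exist yet changed substantially.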


I wonder if it's worth adding some more metadata to our pages for
google's benefit. Perhaps it'd be *slightly* less annoying to navigate
to the right version of the docs if we added breadcrumb annotations:
https://developers.google.com/search/docs/data-types/breadcrumb#json-ld_1
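
To make the breadcrumb idea concrete, here is a minimal sketch that
builds a schema.org BreadcrumbList as JSON-LD. The function name, the
crumb names and the URLs are illustrative assumptions; only the
BreadcrumbList / ListItem structure comes from the schema.org vocabulary.

```python
import json


def breadcrumb_jsonld(crumbs):
    """Build a schema.org BreadcrumbList dict from (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }


# Hypothetical trail for one page of the version 11 docs.
crumbs = [
    ("Documentation", "https://www.postgresql.org/docs/"),
    ("PostgreSQL 11", "https://www.postgresql.org/docs/11/"),
    ("Recovery Configuration",
     "https://www.postgresql.org/docs/11/recovery-config.html"),
]

# The result would be embedded in the page head as a JSON-LD script tag.
print('<script type="application/ld+json">')
print(json.dumps(breadcrumb_jsonld(crumbs), indent=2))
print("</script>")
```

Putting the version in its own crumb is the point: it would give the
search result a visible hint about which release the page belongs to.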

I can imagine - but have nothing but intuition to back that up - that we
also make google's job harder by having a very recent timestamp for each
version of the docs. Perhaps we ought to add datePublished /
dateModified annotations, and freeze datePublished to the release date?

And probably also not update dateModified when the page didn't change,
but I think you were discussing that elsewhere.
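
A sketch of how that could work, assuming the docs loader keeps a small
record per page between runs: datePublished is pinned to the release
date, and dateModified only moves when a hash of the page content
changes. The function, the record layout and the choice of SHA-256 are
all assumptions for illustration.

```python
import hashlib
from datetime import date


def date_annotations(html, release_date, previous=None, today=None):
    """Return a record with a frozen datePublished and a dateModified
    that changes only when the page content changes."""
    today = today or date.today()
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    if previous is not None and previous["digest"] == digest:
        modified = previous["dateModified"]  # unchanged page: keep old date
    else:
        modified = today.isoformat()  # new or changed page
    return {
        "digest": digest,
        "datePublished": release_date.isoformat(),
        "dateModified": modified,
    }


# Hypothetical release date and load dates.
release = date(2018, 10, 18)
first = date_annotations("<p>old text</p>", release, today=date(2020, 11, 21))
same = date_annotations("<p>old text</p>", release, previous=first,
                        today=date(2020, 12, 1))
changed = date_annotations("<p>new text</p>", release, previous=first,
                           today=date(2020, 12, 1))
print(same["dateModified"], changed["dateModified"])
```

The two date fields would then go into the page's JSON-LD next to any
other annotations.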

Greetings,

Andres Freund


