Re: once more: documentation search indexing - Mailing list pgsql-www

From Jonathan S. Katz
Subject Re: once more: documentation search indexing
Msg-id eab6d326-cf24-e03f-5917-fe126ba6c03a@postgresql.org
In response to Re: once more: documentation search indexing  (Andres Freund <andres@anarazel.de>)
Responses Re: once more: documentation search indexing  (Magnus Hagander <magnus@hagander.net>)
List pgsql-www
On 6/12/21 5:37 PM, Andres Freund wrote:

>>> - add a docs/current link to https://www.postgresql.org/docs/. Often
>>>   enough that's what a user wants anyway, and it's not useful to add
>>>   additional steps for users and search engines to navigate to
>>>   docs/current/.
>>
>> We do that at the very top: that is the first link in the main body.
>> This was done back in Nov 2020[1]
>
> Oh - I had not realized that at all. I think the similarity to the news
> bar made me completely blend the "view the manual" element out.

Yeah, there may be a UX tweak we can do there. However, given the length
of time we've had a "current" doc link at the top of the content bar,
I'm skeptical that it's helped the SEO problem.

>> However, I'm also not opposed to putting a (Current) link next to the
>> current version in the table. I think that'd at least be helpful from a
>> user perspective, if they don't click the big button up top.
>
> Yea, I think that'd be good.

Cool. This is a trivial add.

>>> - put version in page titles where it makes sense. E.g. change
>>>   "PostgreSQL: Documentation: 10: 6.1. Inserting Data" to
>>>   "PostgreSQL 10 Documentation: 6.1. Inserting Data"
>>>
>>>   The current ordering doesn't seem like it has much going for it, and
>>>   it can't help search engines to have the version number people might
>>>   search for removed from the product name.
>>>
>>>   Right now this seems to contribute to less than helpful titles in
>>>   search engine results. Searching anonymously for "postgres alter
>>>   table" I get the less than helpful "Documentation: 12: ALTER TABLE -
>>>   PostgreSQL" on google.
>>>
>>>   It might also be worth going a bit further and putting the documentation
>>>   version *after* the page title, given that it's most likely already
>>>   clear to the reader that this is about postgres. I.e. something like
>>>   "ALTER TABLE - Documentation for PostgreSQL 14"
>>
>> I think having "PostgreSQL $MAJOR_VERSION" together would help both for
>> some of the indexing issues + readability in the search engine. The
>> question is around how the content is ordered in the title.
>>
>> Doing "PostgreSQL $MAJOR_VERSION: Documentation: $page_title" might be
>> the way to go. The other thing I see done for SEO is what you suggest, but
>> just hyphenated i.e. "ALTER TABLE - Documentation - PostgreSQL 14"
>>
>> Anyway, I'm generally in favor of combining at least "PostgreSQL
>> $MAJOR_VERSION."
>
> Yea, let's do that separately then.
>
> WRT ordering, I do think I prefer the versions with the actual subject
> of the page first - to distinguish between different PG doc pages
> "PostgreSQL 14 Documentation" is really not helpful. I often have
> multiple doc pages open in different tabs, and there's right now no way
> to distinguish them, because there's never enough space for even just
> "PostgreSQL 13: Documentation:", not to speak of an actual title.

...but that's why you can hover your mouse over the tab, and the full
title appears! ;)

That said, there are SEO ramifications to the ordering of the content in
the <title> tag :) This one is a tricky balance.

I'd suggest starting with

    "PostgreSQL $MAJOR_VERSION: Documentation: $page_title"

and see how we do with that.
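
To make the two orderings under discussion concrete, here is a rough
Python sketch. The function names are made up for illustration and are
not taken from pgweb; only the title formats themselves come from this
thread.

```python
def doc_title(major_version: str, page_title: str) -> str:
    """Product and version first, as suggested above as a starting point."""
    return f"PostgreSQL {major_version}: Documentation: {page_title}"


def doc_title_subject_first(major_version: str, page_title: str) -> str:
    """Hyphenated alternative with the page subject first."""
    return f"{page_title} - Documentation - PostgreSQL {major_version}"


if __name__ == "__main__":
    print(doc_title("14", "ALTER TABLE"))
    print(doc_title_subject_first("14", "ALTER TABLE"))
```

Either way, "PostgreSQL" and the major version stay adjacent, which is
the part there seems to be agreement on; only the position of the page
subject differs.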

>> That all said, as stated and cited in some of those previous threads, I
>> think the biggest lift is around making our documentation URLs
>> canonical. After discussing with Magnus a bit, there are a few things
>> that we need to consider in it:
>>
>> 1. Whether or not the documentation page is in "current"
>> 2. If it's not in "current", which is the last version the page is a
>> part of? We make that the canonical URL.
>
> Yea, I know that's a potentially significant improvement. I just didn't
> feel it's useful to wade into the topic because it's been discussed for
> about a decade by now. And that there's things we could make easier
> progress on...

I think we're at the point where "all else has failed, so let's do this."

>> I've attached a patch that does this. The one part I'm not sure I like
>> is how we treat something that is solely in "devel" -- knowing that
>> eventually something in devel could end up in current. Perhaps if
>> something is only in "devel", we exclude it from being part of the
>> canonical tree?
>
> Right now all of docs/devel is prevented from being indexed via
> robots.txt:
> Disallow: /docs/devel/
>
> So it won't really matter for SEO purposes.

Thanks for pointing that out, I hadn't checked robots.txt in a while.

So this simplifies the patch a bit, i.e. we will not show a canonical
URL on the devel pages.

Updated patch to account for that. I also included a change to the docs
index page to show which one is "current" in the table.
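
For anyone who wants the rules without reading the diff, the selection
logic can be sketched roughly as below. This is an illustration of the
rules described in this thread, not the code in the attached patch: the
function name, its parameters, and the way versions are compared are
all assumptions, and the file names in the usage example are
hypothetical.

```python
from typing import Optional, Sequence

DOCS_BASE = "https://www.postgresql.org/docs"


def canonical_url(filename: str,
                  versions_with_page: Sequence[str],
                  current_version: str,
                  viewed_version: str) -> Optional[str]:
    """Pick the canonical URL for a docs page, or None to emit no canonical.

    versions_with_page lists every version whose docs contain this file,
    e.g. ["9.6", "10", "13", "devel"] (assumed representation).
    """
    # devel is disallowed in robots.txt, so devel pages get no canonical URL.
    if viewed_version == "devel":
        return None

    # Pages that exist only in devel are left out of the canonical tree.
    released = [v for v in versions_with_page if v != "devel"]
    if not released:
        return None

    # Rule 1: if the page is in the current version, point at docs/current/.
    if current_version in released:
        return f"{DOCS_BASE}/current/{filename}"

    # Rule 2: otherwise point at the last released version containing the page.
    # Comparing as floats is an assumption that covers "9.6" and "13" alike.
    last_version = max(released, key=float)
    return f"{DOCS_BASE}/{last_version}/{filename}"


if __name__ == "__main__":
    print(canonical_url("sql-altertable.html", ["12", "13"], "13", "12"))
    print(canonical_url("removed-page.html", ["9.6", "10"], "13", "9.6"))
```

The result would then go into a <link rel="canonical"> element in the
page head, and be omitted when the function returns None.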

Jonathan
