Re: once more: documentation search indexing - Mailing list pgsql-www

From Magnus Hagander
Subject Re: once more: documentation search indexing
Msg-id CABUevEyGwaZE8KJg=-K4f7moUo=UbV3AFbnmtTB-c31ojNn2Vg@mail.gmail.com
In response to Re: once more: documentation search indexing  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Responses Re: once more: documentation search indexing  (Michael Christofides <michael@pgmustard.com>)
List pgsql-www
On Sun, Jun 13, 2021 at 2:41 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> On 6/12/21 5:37 PM, Andres Freund wrote:
>
> >>> - add a docs/current link to https://www.postgresql.org/docs/. Often
> >>>   enough that's what a user wants anyway, and it's not useful to add
> >>>   additional steps for users and search engines to navigate to
> >>>   docs/current/.
> >>
> >> We do that at the very top: that is the first link in the main body.
> >> This was done back in Nov 2020[1]
> >
> > Oh - I had not realized that at all. I think the similarity to the news
> > bar made me completely blend the "view the manual" element out.
>
> Yeah, there may be a UX tweak we can do there. However, given the length
> of time we've had a "current" doc link at the top of the content bar,

I agree with the UX tweak being needed -- I've missed that button at
least a couple of times myself, and I am the one who put it there :)


> I'm skeptical that it's helped the SEO problem.

Agreed, on this particular one (obviously not on the general SEO
problem, that part is pretty obvious)



> >> However, I'm also not opposed to putting a (Current) link next to the
> >> current version in the table. I think that'd at least be helpful from a
> >> user perspective, if they don't click the big button up top.
> >
> > Yea, I think that'd be good.
>
> Cool. This is a trivial add.

WFM. +1.


> >>> - put version in page titles where it makes sense. E.g. change
> >>>   "PostgreSQL: Documentation: 10: 6.1. Inserting Data" to
> >>>   "PostgreSQL 10 Documentation: 6.1. Inserting Data"
> >>>
> >>>   The current ordering doesn't seem like it has much going for it, and
> >>>   it can't help search engines to have the version number people might
> >>>   search for removed from the product name.
> >>>
> >>>   Right now this seems to contribute to less than helpful titles in
> >>>   search engine results. Searching anonymously for "postgres alter
> >>>   table" I get the less than helpful "Documentation: 12: ALTER TABLE -
> >>>   PostgreSQL" on google.
> >>>
> >>>   It might also be worth going a bit further and putting the documentation
> >>>   version *after* the page title, given that it's most likely already
> >>>   clear to the reader that this is about postgres. I.e. something like
> >>>   "ALTER TABLE - Documentation for PostgreSQL 14"
> >>
> >> I think having "PostgreSQL $MAJOR_VERSION" together would help both for
> >> some of the indexing issues + readability in the search engine. The
> >> question is around how the content is ordered in the title.
> >>
> >> Doing "PostgreSQL $MAJOR_VERSION: Documentation: $page_title" might be
> >> the way to go. The other thing I see done for SEO is what you suggest, but
> >> just hyphenated i.e. "ALTER TABLE - Documentation - PostgreSQL 14"
> >>
> >> Anyway, I'm generally in favor of combining at least "PostgreSQL
> >> $MAJOR_VERSION."
> >
> > Yea, let's do that separately then.
> >
> > WRT ordering, I do think I prefer the versions with the actual subject
> > of the page first - to distinguish between different PG doc pages
> > "PostgreSQL 14 Documentation" is really not helpful. I often have
> > multiple doc pages open in different tabs, and there's right now no way
> > to distinguish them, because there's never enough space for even just
> > "PostgreSQL 13: Documentation:", not to speak of an actual title.

So we actually have a ticket in that ticket system that nobody uses,
that's been there since 2012, so it even carried over from the old
layout :)

That one deals with the generic case of our pages being "PostgreSQL:
<title>", which leads to every tab you open basically saying
"PostgreSQL" and nothing more. The docs are even more affected by
that, but we should consider this across the whole site, and not just
the docs.

As for the suggested titles above, I'd vote for "ALTER TABLE -
Documentation - PostgreSQL 14" or "ALTER TABLE - PostgreSQL 14
Documentation". In a way "documentation" feels redundant there while
you're on the site, but it is what goes in the links on search hits
and I think it's definitely valuable to see specifically that it's
documentation there.
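
To make the candidates easier to compare side by side, here is a minimal
sketch that builds each of the title orderings discussed in this thread.
The helper name doc_title and its style keywords are made up for
illustration; nothing like this exists in pgweb as far as this mail is
concerned.

```python
def doc_title(page_title: str, major_version: str, style: str = "subject_first") -> str:
    """Build a <title> string for a docs page (hypothetical helper)."""
    product = f"PostgreSQL {major_version}"
    if style == "subject_first":
        # "ALTER TABLE - PostgreSQL 14 Documentation"
        return f"{page_title} - {product} Documentation"
    if style == "hyphenated":
        # "ALTER TABLE - Documentation - PostgreSQL 14"
        return f"{page_title} - Documentation - {product}"
    if style == "product_first":
        # "PostgreSQL 14: Documentation: ALTER TABLE"
        return f"{product}: Documentation: {page_title}"
    raise ValueError(f"unknown title style: {style!r}")


for style in ("subject_first", "hyphenated", "product_first"):
    print(doc_title("ALTER TABLE", "14", style))
```

With the subject first, the part that differs between tabs is the part
that survives truncation.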


> ...but that's why you can hover your mouse over the tab, and the full
> title appears! ;)

Yeah, but that's a workaround of course :)


> That said, there are SEO ramifications to the ordering of the content in
> the <title> tag :) This one is a tricky balance.

This one I cannot comment on, I know nothing about it. Other than that
it's important :)


> I'd suggest starting with
>
>     "PostgreSQL $MAJOR_VERSION: Documentation: $page_title"
>
> and see how we do with that.
>
> >> That all said, as stated and cited in some of those previous threads, I
> >> think the biggest lift is around making our documentation URLs
> >> canonical. After discussing with Magnus a bit, there are a few things
> >> that we need to consider in it:
> >>
> >> 1. Whether or not the documentation page is in "current"
> >> 2. If it's not in "current", which is the last version the page is a
> >> part of? We make that the canonical
> >
> > Yea, I know that's a potentially significant improvement. I just didn't
> > feel it's useful to wade into the topic because it's been discussed for
> > about a decade by now. And that there's things we could make easier
> > progress on...
>
> I think we're at the point where "all else has failed, so let's do this."


Yes.

It can get worse. But not all that much worse...


> >> I've attached a patch that does this. The one part I'm not sure I like
> >> is how we treat something that is solely in "devel" -- knowing that
> >> eventually something in devel could end up in current. Perhaps if
> >> something is only in "devel", we exclude it from being part of the
> >> canonical tree?
> >
> > Right now all of docs/devel is prevented from being indexed via
> > robots.txt:
> > Disallow: /docs/devel/
> >
> > So it won't really matter for SEO purposes.
>
> Thanks for pointing that out, I hadn't checked robots.txt in awhile.
>
> So this simplifies the patch a bit, i.e. we will not show a canonical
> URL on the devel pages.
>
> Updated patch to account for that. I also included a change to the docs
> index page to show which one is "current" in the table.
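
For reference, the effect of that robots.txt rule can be checked offline
with Python's urllib.robotparser. This is only a sketch: the quoted mail
shows just the Disallow line, so the "User-agent: *" group is an
assumption, and sql-altertable.html is simply an example page.

```python
from urllib.robotparser import RobotFileParser

# Only the Disallow line comes from the thread; the User-agent group is assumed.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /docs/devel/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse from memory, no network access needed

devel_ok = parser.can_fetch(
    "*", "https://www.postgresql.org/docs/devel/sql-altertable.html"
)
current_ok = parser.can_fetch(
    "*", "https://www.postgresql.org/docs/current/sql-altertable.html"
)
print("devel crawlable:", devel_ok)
print("current crawlable:", current_ok)
```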

Absolute nitpick, but isn't:
+    if len(list(filter(lambda v: v.version.current, versions))):
cleaner written as:
if any(filter(lambda v: v.version.current, versions)):
?

And the loop thereafter I think can just be:
version_max = max(versions, key=lambda v: v.tree)
?
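
A quick self-contained check that the shorter forms behave the same as
the original. The objects below are stand-ins built with SimpleNamespace,
not the real pgweb models; only the attribute names (version.current,
tree) are taken from the snippets above.

```python
from types import SimpleNamespace


def page(tree, current=False):
    # Hypothetical stand-in for one element of `versions`.
    return SimpleNamespace(tree=tree, version=SimpleNamespace(current=current))


versions = [page(12), page(13, current=True), page(11)]

# Form used in the patch: materialize the filtered list and test its length.
has_current_len = bool(len(list(filter(lambda v: v.version.current, versions))))

# Suggested form: stops at the first match, but relies on the matching
# objects themselves being truthy.
has_current_any = any(filter(lambda v: v.version.current, versions))

# Generator form: tests the flag directly, so object truthiness is irrelevant.
has_current_gen = any(v.version.current for v in versions)

# Replacement for the explicit loop that tracks the highest tree value.
version_max = max(versions, key=lambda v: v.tree)

print(has_current_len, has_current_any, has_current_gen, version_max.tree)
```

One caveat: max() raises ValueError on an empty sequence, so this only
works if versions is known to be non-empty at that point (or if a
default= is passed).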


I do think the bigger question is if we want the actual /current/ URL
to be the canonical one, rather than the /<version number of current
version>/?

I would've guessed that's better? But again I don't really know, that's
a guess, so if it was considered and rejected for good reasons then
ignore that comment :)
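
To make the two options concrete, here is a rough sketch of the rule as I
understand it from this thread: pages present in the current version get
either /current/ or the version number, pages that have been removed point
at the last version that had them, and pages with no released version
(devel only) get no canonical URL at all. The function name, its
parameters and the example filenames are made up for illustration; this is
not the code from the patch.

```python
BASE = "https://www.postgresql.org/docs"


def canonical_url(filename, page_versions, current_version, use_current_alias=True):
    """Return the canonical URL for a docs page, or None (hypothetical helper).

    page_versions: released major versions (numbers) that contain the page.
    current_version: the major version that /docs/current/ points at.
    """
    released = list(page_versions)
    if not released:
        # Page exists only in devel: emit no canonical link.
        return None
    if current_version in released:
        segment = "current" if use_current_alias else str(current_version)
    else:
        # Page was removed at some point: use the last version that had it.
        segment = str(max(released))
    return f"{BASE}/{segment}/{filename}"


print(canonical_url("sql-altertable.html", [12, 13], 13))
print(canonical_url("sql-altertable.html", [12, 13], 13, use_current_alias=False))
print(canonical_url("removed-page.html", [9.5, 9.6], 13))
```

The only difference between the two options is the use_current_alias
switch, so trying either one should be cheap.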

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


