Thread: [PATCH] Fix docs to use canonical links

From
"Joel Jacobson"
Date:
Hello hackers,

During work in the separate thread [1], I discovered more cases
where the link in docs wasn't the canonical link [2].

[1] https://postgr.es/m/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
[2] https://en.wikipedia.org/wiki/Canonical_link_element

The script below doesn't parse SGML, for example, and is broken in some other
ways too, but it's probably good enough to suggest changes that can then be
carefully verified by hand.

```
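#!/bin/bash
# Suggest canonical replacements for https:// links in *.sgml files.
# Each suggestion is logged to changes.log and still needs manual review.
output_file="changes.log"
: > "$output_file"

# Print the canonical URL declared by the page at $1, or $1 itself if none
# (or the same URL) is declared; log each change as a "-old"/"+new" pair.
# Naive: expects rel="canonical" before href, on one line of the HTML.
extract_canonical() {
  local url=$1 canonical
  # -L follows 30X redirects, so the final page's canonical tag is read.
  canonical=$(curl -sL "$url" |
    sed -n 's/.*<link rel="canonical" href="\([^"]*\)".*/\1/p' | head -n 1)
  if [[ -n "$canonical" && "$canonical" != "$url" ]]; then
    printf '%s\n' "-$url" "+$canonical" >> "$output_file"
    printf '%s\n' "$canonical"
  else
    printf '%s\n' "$url"
  fi
}

# Escape strings for literal use in sed's regex / replacement parts.
# Assumes URLs never contain "|", which is used as the sed delimiter.
escape_regex() { printf '%s' "$1" | sed 's/[[\.*^$]/\\&/g'; }
escape_repl() { printf '%s' "$1" | sed 's/[\&]/\\&/g'; }

find . -type f -name '*.sgml' | while read -r file; do
  # Naive extraction: only the last https:// URL on each line is found.
  urls=$(sed -n 's/.*\(https:\/\/[^"]*\).*/\1/p' "$file")
  for url in $urls; do
    canonical_url=$(extract_canonical "$url")
    if [[ "$canonical_url" != "$url" ]]; then
      # Replace the URL only where it is followed by the closing quote, so a
      # URL that is a prefix of a longer one is not rewritten inside it.
      # -i.bak works with both GNU and BSD sed, unlike -i ''.
      sed -i.bak "s|$(escape_regex "$url")\"|$(escape_repl "$canonical_url")\"|g" "$file"
      rm -f "$file.bak"
    fi
  done
done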
```
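The `sed` expression in `extract_canonical` only matches when `rel="canonical"` comes before `href` on the same line. A slightly more tolerant sketch is below; `canonical_from_html` is a hypothetical helper, not part of the original script, and it is still regex-based rather than a real HTML parser (tags split across lines are missed).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the href of
# the first <link ... rel="canonical" ...> tag in the HTML read on stdin.
# Steps: isolate each <link> tag, keep the canonical ones, extract href.
# Accepts either attribute order and several tags per line, but tags split
# across lines are missed; it is a regex sketch, not an HTML parser.
canonical_from_html() {
  grep -o '<link[^>]*>' |
    grep -i 'rel="canonical"' |
    sed -n 's/.*href="\([^"]*\)".*/\1/p' |
    head -n 1
}

# Example use (needs network access):
# curl -sL "https://en.wikipedia.org/wiki/Canonical_link_element" | canonical_from_html
```

Splitting out the `<link>` tags first means the attribute order inside a tag no longer matters, at the cost of one extra `grep` per page.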

Most of what it found was indeed correct, but I had to undo some mistakes it made.

All the changes in the attached patch have been manually verified by clicking
the original link and observing the URL shown in the browser.
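To make that manual pass a little easier, the `-old`/`+new` pairs the script appends to `changes.log` can be turned into one line per suggested change. A minimal sketch, assuming exactly that log format; `review_changes` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read the "-old" / "+new" line pairs written by the
# script to changes.log (or the file given as $1) and print one
# "old -> new" line per suggested change, ready for checking by hand.
review_changes() {
  local line old=""
  while IFS= read -r line; do
    case $line in
      -*) old=${line#-} ;;
      +*) printf '%s -> %s\n' "$old" "${line#+}"; old="" ;;
    esac
  done < "${1:-changes.log}"
}

# Example use:
# review_changes changes.log
```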

/Joel
Attachment

Re: [PATCH] Fix docs to use canonical links

From
Michael Paquier
Date:
On Thu, Jun 27, 2024 at 11:27:45AM +0200, Joel Jacobson wrote:
> During work in the separate thread [1], I discovered more cases
> where the link in docs wasn't the canonical link [2].
>
> [1] https://postgr.es/m/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
> [2] https://en.wikipedia.org/wiki/Canonical_link_element
>
> The script below doesn't parse SGML, for example, and is broken in some other
> ways too, but it's probably good enough to suggest changes that can then be
> carefully verified by hand.

The 19 links you are updating here avoid redirects on Wikipedia and
the Postgres wiki.  It's always a bit of a chicken-and-egg game in
this area, because links keep changing, but I don't mind the change.

Any opinions from others?
--
Michael

Attachment

Re: [PATCH] Fix docs to use canonical links

From
Daniel Gustafsson
Date:
> On 1 Jul 2024, at 08:06, Michael Paquier <michael@paquier.xyz> wrote:
>
> The 19 links you are updating here avoid redirects on Wikipedia and
> the Postgres wiki.  It's always a bit of a chicken-and-egg game in
> this area, because links keep changing, but I don't mind the change.

Avoiding redirects is generally a good thing; not everyone is on lightning-fast
internet.  Wikipedia, however, doesn't issue any 30X redirects, so it's not
really an issue for those links: they all return 200.
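Whether a given link answers with a 30X redirect or a 200 can also be checked from its response headers rather than by watching HTTPS traffic. A minimal sketch: `redirect_info` is a hypothetical helper that reads headers on stdin, for example from `curl -sI` (a HEAD request that does not follow redirects), and prints the status code, plus the `Location` target for 30X responses. A 200 says nothing about whether the URL is the canonical one; that still needs the page's canonical link.

```shell
#!/bin/bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# status code of the last response seen, followed by " -> <Location>"
# when that response is a 30X redirect.
redirect_info() {
  tr -d '\r' | awk '
    /^HTTP\// { code = $2; loc = "" }          # new status line resets state
    tolower($1) == "location:" { loc = $2 }    # header names are case-insensitive
    END {
      if (code ~ /^3/) print code " -> " loc
      else print code
    }'
}

# Example use (needs network access):
# curl -sI "https://en.wikipedia.org/wiki/Canonical_link_element" | redirect_info
```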

--
Daniel Gustafsson

Re: [PATCH] Fix docs to use canonical links

From
"Joel Jacobson"
Date:
On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
> Avoiding redirects is generally a good thing; not everyone is on lightning-fast
> internet.  Wikipedia, however, doesn't issue any 30X redirects, so it's not
> really an issue for those links: they all return 200.

Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
except that it's a bit annoying that the address bar suddenly changes.

However, I think David J had another good argument:

"If we are making wikipedia our authority we might as well use their standard for naming."

/Joel

Re: [PATCH] Fix docs to use canonical links

From
Daniel Gustafsson
Date:
> On 1 Jul 2024, at 13:09, Joel Jacobson <joel@compiler.org> wrote:
>
> On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
>> Avoiding redirects is generally a good thing; not everyone is on lightning-fast
>> internet.  Wikipedia, however, doesn't issue any 30X redirects, so it's not
>> really an issue for those links: they all return 200.
>
> Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
> except that it's a bit annoying that the address bar suddenly changes.

Right, I was unclear: I'm not arguing against the change.  It won't move the
needle the way avoiding 30X redirects does, but it won't hurt either.

> However, I think David J had another good argument:
>
> "If we are making wikipedia our authority we might as well use their standard for naming."

It's a moving target, but so are most, if not all, links.

--
Daniel Gustafsson

Re: [PATCH] Fix docs to use canonical links

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 1 Jul 2024, at 13:09, Joel Jacobson <joel@compiler.org> wrote:
>> However, I think David J had another good argument:
>> "If we are making wikipedia our authority we might as well use their standard for naming."

> It's a moving target, but so are most, if not all, links.

I see nothing wrong with this patch, so pushed.

            regards, tom lane