Thread: [PATCH] Fix docs to use canonical links
Hello hackers,

During work in the separate thread [1], I discovered more cases where the link in docs wasn't the canonical link [2].

[1] https://postgr.es/m/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
[2] https://en.wikipedia.org/wiki/Canonical_link_element

The below script e.g. doesn't parse SGML, and is broken in some other ways also, but probably good enough to suggest changes that can then be manually carefully verified.

```
#!/bin/bash

output_file="changes.log"
> $output_file

extract_canonical() {
    local url=$1
    canonical=$(curl -s "$url" | sed -n 's/.*<link rel="canonical" href="\([^"]*\)".*/\1/p')
    if [[ -n "$canonical" && "$canonical" != "$url" ]]; then
        echo "-$url" >> $output_file
        echo "+$canonical" >> $output_file
        echo $canonical
    else
        echo $url
    fi
}

find . -type f -name '*.sgml' | while read -r file; do
    urls=$(sed -n 's/.*\(https:\/\/[^"]*\).*/\1/p' "$file")
    for url in $urls; do
        canonical_url=$(extract_canonical "$url")
        if [[ "$canonical_url" != "$url" ]]; then
            # Replace the original URL with the canonical URL in the file
            # (the empty '' argument after -i is BSD/macOS sed syntax)
            sed -i '' "s|$url|$canonical_url|g" "$file"
        fi
    done
done
```

Most of what it found was indeed correct, but I had to undo some mistakes it made. All the changes in the attached patch have been manually verified, by clicking the original link, and observing the URL seen in the browser.

/Joel
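Since the script above is meant only to suggest changes for manual verification, the same idea can be sketched as a dry run that prints suggestions and never edits files. This is a minimal sketch under stated assumptions, not the script used for the patch: the helper names `canonical_from_html` and `suggest_change` are hypothetical, and the pattern only matches a `<link>` tag written in the exact attribute order `rel="canonical" href="..."`, like the sed expression in the original script.

```shell
# Hypothetical helper: read an HTML page on stdin and print the URL declared
# by <link rel="canonical" href="...">, or nothing if no such tag is found.
# Assumes the attributes appear in exactly this order.
canonical_from_html() {
    sed -n 's/.*<link rel="canonical" href="\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: print a diff-style suggestion when the canonical URL
# is non-empty and differs from the URL found in the docs; otherwise print
# nothing. No file is modified.
suggest_change() {
    url=$1
    canonical=$2
    if [ -n "$canonical" ] && [ "$canonical" != "$url" ]; then
        printf '%s\n%s\n' "-$url" "+$canonical"
    fi
}

# Example use against a live page (requires network access):
#   suggest_change "$url" "$(curl -s "$url" | canonical_from_html)"
```

Keeping the extraction separate from the network fetch makes it possible to try the pattern on saved HTML before running it against live pages.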
On Thu, Jun 27, 2024 at 11:27:45AM +0200, Joel Jacobson wrote:
> During work in the separate thread [1], I discovered more cases
> where the link in docs wasn't the canonical link [2].
>
> [1] https://postgr.es/m/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
> [2] https://en.wikipedia.org/wiki/Canonical_link_element
>
> The below script e.g. doesn't parse SGML, and is broken in some other ways
> also, but probably good enough to suggest changes that can then be manually
> carefully verified.

The 19 links you are updating here avoid redirections in Wikipedia and the Postgres wiki. It's always a bit of a chicken-and-egg game in this area, because links always change, still I don't mind the change.

Any opinions from others?
--
Michael
> On 1 Jul 2024, at 08:06, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 27, 2024 at 11:27:45AM +0200, Joel Jacobson wrote:
>> During work in the separate thread [1], I discovered more cases
>> where the link in docs wasn't the canonical link [2].
>>
>> [1] https://postgr.es/m/CAKFQuwYEX9Pj9G0ZHJeWSmSbnqUyGH+FYcW-66eZjfVG4KOjiQ@mail.gmail.com
>> [2] https://en.wikipedia.org/wiki/Canonical_link_element
>>
>> The below script e.g. doesn't parse SGML, and is broken in some other ways
>> also, but probably good enough to suggest changes that can then be manually
>> carefully verified.
>
> The 19 links you are updating here avoid redirections in Wikipedia and
> the Postgres wiki. It's always a bit of a chicken-and-egg game in
> this area, because links always change, still I don't mind the change.

Avoiding redirects is generally a good thing, not everyone is on lightning fast internet. Wikipedia is however not doing any 30X redirects so it's not really an issue for those links, it's all 200 requests.

--
Daniel Gustafsson
On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
> Avoiding redirects is generally a good thing, not everyone is on lightning fast
> internet. Wikipedia is however not doing any 30X redirects so it's not really
> an issue for those links, it's all 200 requests.

Yes, I noticed that too when observing the HTTPS traffic, so no issue there, except that it's a bit annoying that the address bar suddenly changes.

However, I think David J had another good argument:

"If we are making wikipedia our authority we might as well use their standard for naming."

/Joel
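The 30X versus 200 distinction discussed above can be checked per link from the command line. The following is a minimal sketch, assuming curl is available; the helper names `first_status` and `is_redirect_status` are hypothetical and were not part of the patch or the script earlier in the thread.

```shell
# Hypothetical helper: print the HTTP status code of the first response for
# a URL. curl does not follow redirects unless -L is given, so a redirecting
# link reports its 30X code here rather than the final 200.
first_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: succeed only if the given status code is a 3xx
# redirect.
is_redirect_status() {
    case $1 in
        3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (requires network access):
#   if is_redirect_status "$(first_status "$url")"; then
#       echo "redirects: $url"
#   fi
```

A link that answers 200 but declares a different canonical URL in its HTML, as described above for Wikipedia, would not be flagged by this check; it only detects HTTP-level redirects.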
> On 1 Jul 2024, at 13:09, Joel Jacobson <joel@compiler.org> wrote:
>
> On Mon, Jul 1, 2024, at 09:35, Daniel Gustafsson wrote:
>> Avoiding redirects is generally a good thing, not everyone is on lightning fast
>> internet. Wikipedia is however not doing any 30X redirects so it's not really
>> an issue for those links, it's all 200 requests.
>
> Yes, I noticed that too when observing the HTTPS traffic, so no issue there,
> except that it's a bit annoying that the address bar suddenly changes.

Right, I was unclear, I'm not advocating against changing. It won't move the needle compared to 30X redirects but it also won't hurt.

> However, I think David J had another good argument:
>
> "If we are making wikipedia our authority we might as well use their standard for naming."

It's a moving target, but so are most if not all links.

--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
> On 1 Jul 2024, at 13:09, Joel Jacobson <joel@compiler.org> wrote:
>> However, I think David J had another good argument:
>> "If we are making wikipedia our authority we might as well use their standard for naming."

> It's a moving target, but so are most if not all links.

I see nothing wrong with this patch, so pushed.

regards, tom lane