Thread: Broken linkparsing in archives

Broken linkparsing in archives

From

Daniel Gustafsson

Date:

02 November 2022, 15:31:08

Looking at past announcements I noticed that Markdown links were parsed and/or
rendered incorrectly in the archives.  The example email that I noticed it on
was this:

https://www.postgresql.org/message-id/163724833494.26187.1931723451787420391@wrigleys.postgresql.org

..but it happens on all it seems, a more recent example:

https://www.postgresql.org/message-id/166472941958.662.2706300812023074847%40wrigleys.postgresql.org

The rendered links follow the same pattern, the last word in the markdown text
block is prepended to the url block and all of it added as the href:

[call for papers](https://2022.nordicpgday.org/cfp/)

becomes:

[call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)"
rel="nofollow">papers](https://2022.nordicpgday.org/cfp/)</a>

Is this a known issue?

--
Daniel Gustafsson        https://vmware.com/

Re: Broken linkparsing in archives

From

Magnus Hagander

Date:

02 November 2022, 15:39:38

On Wed, Nov 2, 2022 at 1:31 PM Daniel Gustafsson <daniel@yesql.se> wrote:

> Looking at past announcements I noticed that Markdown links were parsed and/or
> rendered incorrectly in the archives. The example email that I noticed it on
> was this:
>
> https://www.postgresql.org/message-id/163724833494.26187.1931723451787420391@wrigleys.postgresql.org
>
> ..but it happens on all it seems, a more recent example:
>
> https://www.postgresql.org/message-id/166472941958.662.2706300812023074847%40wrigleys.postgresql.org
>
> The rendered links follow the same pattern, the last word in the markdown text
> block is prepended to the url block and all of it added as the href:
>
> [call for papers](https://2022.nordicpgday.org/cfp/)
>
> becomes:
>
> [call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)" rel="nofollow">papers](https://2022.nordicpgday.org/cfp/)</a>
>
> Is this a known issue?

Well, there is no markdown support at all :) What you see is the result of trying to extract links from plain text. That, in turn, is handled by the Django urlize filter: https://docs.djangoproject.com/en/3.2/ref/templates/builtins/#urlize

Thus:

>>> from django.utils.html import urlize
>>> urlize('[call for papers](https://2022.nordicpgday.org/cfp/)')
'[call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)">papers](https://2022.nordicpgday.org/cfp/)</a>'

And I'm not sure they *should* be considered, since the mime type of the body isn't markdown...

//Magnus
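
To make the mechanism above concrete, here is a minimal sketch of how a plain-text linkifier can end up with this result. It is a simplified stand-in and not Django's actual code: the function name naive_linkify and the BARE_DOMAIN pattern are assumptions made for illustration, and only the "bare domain gets an http:// prefix" case is modelled.

```python
import re

# Simplified stand-in for a plain-text linkifier; NOT Django's real urlize.
# Assumption: the text is split on spaces, and any token that does not start
# with "http" but contains a well-known TLD (optionally followed by a path)
# is treated as a bare domain and linked with an "http://" prefix.
BARE_DOMAIN = re.compile(r"^(?!http)\w[^@]+\.(com|net|org)($|/.*)$")


def naive_linkify(text):
    """Wrap tokens that look like bare domains in an <a> tag."""
    out = []
    for token in text.split(" "):
        if BARE_DOMAIN.match(token):
            # The whole token becomes both the href and the link text.
            out.append('<a href="http://%s">%s</a>' % (token, token))
        else:
            out.append(token)
    return " ".join(out)


print(naive_linkify("[call for papers](https://2022.nordicpgday.org/cfp/)"))
```

The Markdown link contains no space between "papers]" and "(https://...)", so the last word of the link text and the URL form a single token. That token starts with a word character and contains ".org/", which is enough for this kind of heuristic to treat it as a domain name.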

Re: Broken linkparsing in archives

From

Daniel Gustafsson

Date:

02 November 2022, 15:52:40

> On 2 Nov 2022, at 13:39, Magnus Hagander <magnus@hagander.net> wrote:

> And I'm not sure they *should* be considered, since the mime type of the body isn't markdown...

For emails sent as text to -announce, sure.  But.  Since we support markdown
formatting in news postings that go out to -announce, it seems a bit unhelpful
to generate broken links for all those posts.

If I can come up with a filter that converts a broken link from urlize for the
known case of markdown links, would that be an accepted solution?

--
Daniel Gustafsson        https://vmware.com/
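
One possible shape for such a filter, as a rough sketch only: it post-processes the HTML string that urlize produced and rewrites the specific broken pattern shown earlier in the thread. The names BROKEN_MD_LINK and repair_markdown_links are hypothetical, the choice to render the result as a normal anchor carrying the full link text is an assumption, and nothing here has been checked against the real archive filter chain.

```python
import re

# Matches the broken output for a Markdown link, for example:
#   [call for <a href="http://papers](URL)" rel="nofollow">papers](URL)</a>
# Hypothetical pattern; it only covers the known case from this thread.
BROKEN_MD_LINK = re.compile(
    r'\[(?P<lead>[^\[\]<>]*?)'                 # link text before the last word
    r'<a href="http://(?P<word>[^\]\s"]+)'     # last word of the link text
    r'\]\((?P<url>[^)\s"]+)\)"'                # the real Markdown URL
    r'(?P<attrs>[^>]*)>'                       # e.g. rel="nofollow"
    r'(?P=word)\]\((?P=url)\)</a>'             # same word and URL as link text
)


def repair_markdown_links(html):
    """Rewrite broken Markdown-link anchors into ordinary anchors."""
    def fix(match):
        return '<a href="%s"%s>%s%s</a>' % (
            match.group("url"),
            match.group("attrs"),
            match.group("lead"),
            match.group("word"),
        )
    return BROKEN_MD_LINK.sub(fix, html)


broken = ('[call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)"'
          ' rel="nofollow">papers](https://2022.nordicpgday.org/cfp/)</a>')
print(repair_markdown_links(broken))
```

Because the pattern requires the same word and URL to appear both in the href and in the link text, ordinary anchors are left untouched. How it interacts with HTML escaping and with any other filters applied to the message body is not covered by this sketch.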

Re: Broken linkparsing in archives

From

Magnus Hagander

Date:

03 November 2022, 16:17:43

On Wed, Nov 2, 2022 at 1:52 PM Daniel Gustafsson <daniel@yesql.se> wrote:

> > On 2 Nov 2022, at 13:39, Magnus Hagander <magnus@hagander.net> wrote:
>
> > And I'm not sure they *should* be considered, since the mime type of the body isn't markdown...
>
> For emails sent as text to -announce, sure. But. Since we support markdown
> formatting in news postings that go out to -announce, it seems a bit unhelpful
> to generate broken links for all those posts.

I agree with the principle, but the question is how reliable we can make it. (One could also argue we *should* post those as text/markdown, but I fear that will break even more MUAs.)

> If I can come up with a filter that converts a broken link from urlize for the
> known case of markdown links, would that be an accepted solution?

If it can be made reliable, I think that would be acceptable. It needs to be validated that it works in the full chain that we use on the site (we also include the silly obfuscation of email addresses in the filter chain), but as long as that's done I think we can and should do it.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/