Hi Daniel,
06.12.2024 16:46, Daniel Gustafsson wrote:
>> On 6 Dec 2024, at 13:59, Alexander Borisov <lex.borisov@gmail.com> wrote:
>
>> As I've written before, there is a difference between parsing URLs
>> according to the RFC 3986 specification and WHATWG URLs. This is
>> especially true for host. Here are a couple more examples.
>
> As someone who wears another open-source hat which is heavily involved in
> parsing URLs I cannot stress enough how much I think postgres should avoid
> this. The example url http://http://http://@http://http://?http://#http:// is
> a valid url, but is rejected by a number of implementations and parsed
> differently by most that accept it.
Your example is valid, yes; it looks scary and might catch someone off
guard. At the same time, your URL is parsed correctly by both RFC 3986
and WHATWG URL.
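For illustration, here is a minimal sketch of how the generic regular
expression from RFC 3986, Appendix B, splits that URL. It covers only
the RFC 3986 side; it does not validate the components and says nothing
about what a WHATWG parser produces. The helper name split_rfc3986 is
made up for this example.

```python
import re

# Regular expression from RFC 3986, Appendix B. It only splits a URI
# reference into its five generic components; it does not validate them.
RFC3986_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_rfc3986(uri):
    """Hypothetical helper: return the five generic components of a URI."""
    m = RFC3986_SPLIT.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_rfc3986("http://http://http://@http://http://?http://#http://")
for name, value in parts.items():
    print(f"{name:9} = {value!r}")
# scheme    = 'http'
# authority = 'http:'
# path      = '//http://@http://http://'
# query     = 'http://'
# fragment  = 'http://'
```

The authority ends at the first "/", so everything up to the "?" lands
in the path, which is why the URL splits without ambiguity.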
There are many examples of “scary” URLs where it is hard to even see
how they would be parsed. You can write a URL with any intimidating
host, path, or scheme, but that's not what I mean.
There are generally accepted standards for URL/URI parsing: RFC 3986 and
WHATWG URL. We are not talking about home-grown implementations
(written without relying on any specification) or those that made a
mistake while implementing one of the standards.
I propose implementing support for one of these standards, the one that
looks promising. Contrary to your concern, everything here is quite
clear: all we need to do is state that we have a URL data type, provided
as an extension, that follows the WHATWG specification. I would even say
that by creating a new type we will contribute to the standardization of
this zoo.
To be clear, this is about creating a new URL data type according to
the WHATWG specification and including it in contrib as an extension.
--
Alexander Borisov