Thread: Re: Proposal to add a new URL data type.

Re: Proposal to add a new URL data type.

From: Matthias van de Meent

On Thu, 5 Dec 2024 at 15:02, Alexander Borisov <lex.borisov@gmail.com> wrote:
> What is the main difference between WHATWG and RFC 3986?
[snip]
> [host]
> Source: https://exаmple.com/ (а — U+0430)
> RFC 3986: https://exаmple.com/.
> WHATWG: https://xn--exmple-4nf.com/.
[snip]
> [path]
> Source: https://example.com/a/./b/../c
> RFC 3986: https://example.com/a/./b/../c.
> WHATWG: https://example.com/a/c.
[snip]
> Proposal
>
> I propose to add a new data type for PostgreSQL as an extension, in
> contrib.  Name the new type URL and use the WHATWG URL specification to
> implement the new type.

I'd be extremely annoyed if URLs I wrote into the database didn't
come back in identical form when fetched from the database. See also
how numeric has different representations of the same value: 2.0 and
2.00 are equivalent for sorting purposes, but they aren't the same, and
we cannot just truncate those zeroes. Note that a path of "/%2e/" could
well be interpreted differently from "/./" or "/" by a server.
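
To make the round-trip concern concrete, here is a minimal Python sketch. It is an illustration only, not code from the proposal. The helper remove_dot_segments is a hypothetical name; it follows the algorithm in RFC 3986 section 5.2.4, which an RFC 3986 consumer may apply as an optional normalization step. Note that it leaves "%2e" alone, whereas the WHATWG URL Standard also treats "%2e" as a single-dot segment.

```python
from decimal import Decimal
from urllib.parse import urlsplit


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


# Plain parsing keeps the path exactly as written.
as_written = urlsplit("https://example.com/a/./b/../c").path

# Normalization changes the stored representation.
normalized = remove_dot_segments(as_written)

# Percent-encoded dots are not dot segments under RFC 3986.
encoded = remove_dot_segments("/%2e/")

# The numeric analogy: equal values, different representations.
same_value = Decimal("2.0") == Decimal("2.00")
text_form = str(Decimal("2.00"))
```

Here as_written stays "/a/./b/../c" while normalized becomes "/a/c", so a type that normalizes on input cannot return what was written.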

> The choice of URL parsing specification is
> justified by the following factors:
> 1. Live specification, adapts to modern realities.

I don't think choosing to defer to a living standard is a good idea
for contrib extensions, which are expected to be supported and stable
with the major PostgreSQL release they're bundled with. If (when) that
living standard gets updated, as tends to happen to such standards,
we'd suddenly lose compatibility with the standard we said we
supported, which isn't a nice outlook. Compare that to RFCs, which
AFAIK don't change in specification once released.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Proposal to add a new URL data type.

From: Alexander Borisov

06.12.2024 21:04, Matthias van de Meent:
> On Thu, 5 Dec 2024 at 15:02, Alexander Borisov <lex.borisov@gmail.com> wrote:
[..]
> 
> I'd be extremely annoyed if URLs I wrote into the database didn't
> come back in identical form when fetched from the database. See also
> how numeric has different representations of the same value: 2.0 and
> 2.00 are equivalent for sorting purposes, but they aren't the same, and
> we cannot just truncate those zeroes. Note that a path of "/%2e/" could
> well be interpreted differently from "/./" or "/" by a server.

That's why data types were invented.  Most likely, you will not be able
to write an invalid UTF-8 byte sequence into a field of the text type,
because the incoming data will not pass validation.  The user chooses
the data type for his needs, knowing how it works.
I mean that data in the database should be stored validated, and
someone who chooses the URL type to store URLs should not be surprised
that an incoming URL is parsed and has to pass validation.
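
The text-type analogy can be sketched in Python, whose strict UTF-8 decoder rejects malformed input in a similar spirit. This is only an analogy, not PostgreSQL's own validation code, and accept_text is a hypothetical name.

```python
def accept_text(raw: bytes) -> str:
    """Accept bytes only if they are well-formed UTF-8."""
    # bytes.decode is strict by default and raises UnicodeDecodeError
    # on an invalid byte sequence instead of storing it.
    return raw.decode("utf-8")


valid = accept_text(b"caf\xc3\xa9")

try:
    accept_text(b"\xff")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```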

Also, no one is stopping you from storing the URL as text and
converting it to the new type on the fly.
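
A rough sketch of that "store as text, convert on the fly" pattern follows. It uses Python's sqlite3 module purely as a stand-in for PostgreSQL, since the proposed type does not exist yet. The normalize_url function is a hypothetical and deliberately simplistic placeholder for the conversion (it only lowercases the authority and defaults an empty path to "/"); it is not the WHATWG algorithm.

```python
import sqlite3
from urllib.parse import urlsplit, urlunsplit


def normalize_url(raw: str) -> str:
    """Hypothetical placeholder for converting text to a URL type."""
    parts = urlsplit(raw)  # urlsplit already lowercases the scheme
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/",
         parts.query, parts.fragment)
    )


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("normalize_url", 1, normalize_url)

conn.execute("CREATE TABLE links (raw TEXT)")
conn.execute("INSERT INTO links VALUES (?)", ("HTTPS://Example.COM/a/./b/../c",))

# The stored text is returned untouched; the conversion happens at query time.
row = conn.execute("SELECT raw, normalize_url(raw) FROM links").fetchone()
conn.close()
```

The raw column round-trips byte for byte, and the normalized form is derived only when asked for.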

> 
> I don't think choosing to defer to a living standard is a good idea
> for contrib extensions, which are expected to be supported and stable
> with the major PostgreSQL release they're bundled with. If (when) that
> living standard gets updated, as tends to happen to such standards,
> we'd suddenly lose compatibility with the standard we said we
> supported, which isn't a nice outlook. Compare that to RFCs, which
> AFAIK don't change in specification once released.

WHATWG:
"The standard can generally not be changed in backwards-incompatible
ways without extreme care, and with implementer commitments leading
the way."

You can read more about what "Living Standard" means at
https://whatwg.org/faq#living-standard


--
Alexander Borisov