Re: Proposal to add a new URL data type. - Mailing list pgsql-hackers

From Alexander Borisov
Subject Re: Proposal to add a new URL data type.
Date
Msg-id 70b01d6b-d6d4-43bd-977a-8a51cf5a0b8a@gmail.com
Whole thread Raw
Responses Re: Proposal to add a new URL data type.
List pgsql-hackers
10.12.2024 13:59, Victor Yegorov пишет:
> чт, 5 дек. 2024 г. в 17:02, Alexander Borisov <lex.borisov@gmail.com 
> <mailto:lex.borisov@gmail.com>>:
> 

[..]

> 
> Hey, I had a look at this patch and found its functionality mature and 
> performant.
> 
> As Peter mentioned pguri, I used it to compare with the proposed 
> extension. This brought up
> the following differences:
> - pguri (uriparser 0.9.8) doesn't support Chinese symbols in the host 
> part of URI (uri_test1.sh):
> 
>      ERROR:  invalid input syntax for type uri at or near "事 
> 例.com#comments <http://xn--3kq3x.com#comments>"
> 
>    Therefore, I avoided Chinese or Cyrillic symbols in the pguri test 
> script.
> - There are no SET functions in the pguri, changing specific portions of 
> URI is troublesome. I used
>    replace() in the test, but this is an error prone approach.
> - It's even more troublesome to set parts of the URI that are not 
> initially there. Probably, a full decomposition
>    into parts and the following wrap up is needed
> Suggested extension has no such limitations. Additionally, pguri 
> extracts userinfo as a whole,
> while suggested extension can get/set user and password individually.
> 
> 
> Running tests (attached) I got the following numbers:
> $ ./url_test.sh
> NOTICE:  extension "url" already exists, skipping
> tps = 13068.287423 (without initial connection time)
> tps = 12888.937747 (without initial connection time)
> tps = 12830.642558 (without initial connection time)
> tps = 12846.341411 (without initial connection time)
> tps = 13187.955601 (without initial connection time)
> 
> $ ./uri_test2.sh
> NOTICE:  extension "uri" already exists, skipping
> tps = 2441.934308 (without initial connection time)
> tps = 2513.277660 (without initial connection time)
> tps = 2484.641673 (without initial connection time)
> tps = 2519.312395 (without initial connection time)
> tps = 2512.364492 (without initial connection time)
> 
> So it's 12.9k vs 2.5k, or 6x faster for a case where we replace 5 parts 
> of the original URL.
> 
> Given its performance and functionality, I find the suggested URL 
> extension better than pguri.
> 

Thanks for the constructive comments and the testing you have done.


> 
> Now to the review part.
> 
> 1. Applying patch causes indentation warning, please, bring spacing to 
> the project's policy
> 
> $ git apply ~/0001-Add-url-data-type-to-contrib.patch
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:837: indent with 
> spaces.
>      return lexbor_calloc(1, sizeof(lexbor_array_t));
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:843: indent with 
> spaces.
>      if (array == NULL) {
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:844: indent with 
> spaces.
>          return LXB_STATUS_ERROR_OBJECT_IS_NULL;
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:845: indent with 
> spaces.
>      }
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:847: indent with 
> spaces.
>      if (size == 0) {
> warning: squelched 6350 whitespace errors
> warning: 6355 lines add whitespace errors.

This will be fixed when the main URL parser code is rewritten to fit
the Postgres style/infrastructure.


> 2. There's a lexbor/ library that contains core and url parts. Feels 
> like some commentary about what's
>    inside is required.

Yeah, that's a good point.


> 3. Do you think it's possible to adjust your code to use existing 
> postgres infrastructure instead? I don't
>    think having its own infrastructure for a single extension is a good 
> thing. Also, this might be a source
>    for performance improvements in the core.

To implement (rewrite/modify) a URL parser using exclusively Postgres
infrastructure is one of the main goals.

> 
> 4. There's no user visible documentation, please, add one.

That's a good point.


> I've created a commitfest entry for the patch: https:// 
> commitfest.postgresql.org/51/5432/ <https:// 
> commitfest.postgresql.org/51/5432/>
> I was not able to find you, please, register a community account and set 
> yourself as an author for the patch.

Done.

> 
> -- 
> Victor Yegorov


--
Alexander Borisov



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Suggestion to standardize comment format in pg_dump
Next
From: Andres Freund
Date:
Subject: Re: FileFallocate misbehaving on XFS