Thread: Re: Proposal to add a new URL data type.
10.12.2024 13:59, Victor Yegorov пишет: > чт, 5 дек. 2024 г. в 17:02, Alexander Borisov <lex.borisov@gmail.com > <mailto:lex.borisov@gmail.com>>: > [..] > > Hey, I had a look at this patch and found its functionality mature and > performant. > > As Peter mentioned pguri, I used it to compare with the proposed > extension. This brought up > the following differences: > - pguri (uriparser 0.9.8) doesn't support Chinese symbols in the host > part of URI (uri_test1.sh): > > ERROR: invalid input syntax for type uri at or near "事 > 例.com#comments <http://xn--3kq3x.com#comments>" > > Therefore, I avoided Chinese or Cyrillic symbols in the pguri test > script. > - There are no SET functions in the pguri, changing specific portions of > URI is troublesome. I used > replace() in the test, but this is an error prone approach. > - It's even more troublesome to set parts of the URI that are not > initially there. Probably, a full decomposition > into parts and the following wrap up is needed > Suggested extension has no such limitations. Additionally, pguri > extracts userinfo as a whole, > while suggested extension can get/set user and password individually. > > > Running tests (attached) I got the following numbers: > $ ./url_test.sh > NOTICE: extension "url" already exists, skipping > tps = 13068.287423 (without initial connection time) > tps = 12888.937747 (without initial connection time) > tps = 12830.642558 (without initial connection time) > tps = 12846.341411 (without initial connection time) > tps = 13187.955601 (without initial connection time) > > $ ./uri_test2.sh > NOTICE: extension "uri" already exists, skipping > tps = 2441.934308 (without initial connection time) > tps = 2513.277660 (without initial connection time) > tps = 2484.641673 (without initial connection time) > tps = 2519.312395 (without initial connection time) > tps = 2512.364492 (without initial connection time) > > So it's 12.9k vs 2.5k, or 6x faster for a case where we replace 5 parts > of the original URL. > > Given its performance and functionality, I find the suggested URL > extension better than pguri. > Thanks for the constructive comments and the testing you have done. > > Now to the review part. > > 1. Applying patch causes indentation warning, please, bring spacing to > the project's policy > > $ git apply ~/0001-Add-url-data-type-to-contrib.patch > /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:837: indent with > spaces. > return lexbor_calloc(1, sizeof(lexbor_array_t)); > /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:843: indent with > spaces. > if (array == NULL) { > /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:844: indent with > spaces. > return LXB_STATUS_ERROR_OBJECT_IS_NULL; > /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:845: indent with > spaces. > } > /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:847: indent with > spaces. > if (size == 0) { > warning: squelched 6350 whitespace errors > warning: 6355 lines add whitespace errors. This will be fixed when the main URL parser code is rewritten to fit the Postgres style/infrastructure. > 2. There's a lexbor/ library that contains core and url parts. Feels > like some commentary about what's > inside is required. Yeah, that's a good point. > 3. Do you think it's possible to adjust your code to use existing > postgres infrastructure instead? I don't > think having its own infrastructure for a single extension is a good > thing. Also, this might be a source > for performance improvements in the core. To implement (rewrite/modify) a URL parser using exclusively Postgres infrastructure is one of the main goals. > > 4. There's no user visible documentation, please, add one. That's a good point. > I've created a commitfest entry for the patch: https:// > commitfest.postgresql.org/51/5432/ <https:// > commitfest.postgresql.org/51/5432/> > I was not able to find you, please, register a community account and set > yourself as an author for the patch. Done. > > -- > Victor Yegorov -- Alexander Borisov
ср, 11 дек. 2024 г. в 19:04, Alexander Borisov <lex.borisov@gmail.com>:
> I've created a commitfest entry for the patch: https://
> commitfest.postgresql.org/51/5432/ <https://
> commitfest.postgresql.org/51/5432/>
> I was not able to find you, please, register a community account and set
> yourself as an author for the patch.
Done.
I've marked this patch as Rejected, per discussion.
Still, I find this functionality nice to have, I'd be happy if you could create an extension on github (or similar platform).
Victor Yegorov
On Thu, Dec 19, 2024 at 7:52 AM Victor Yegorov <vyegorov@gmail.com> wrote: > ср, 11 дек. 2024 г. в 19:04, Alexander Borisov <lex.borisov@gmail.com>: >> >> > I've created a commitfest entry for the patch: https:// >> > commitfest.postgresql.org/51/5432/ <https:// >> > commitfest.postgresql.org/51/5432/> >> > I was not able to find you, please, register a community account and set >> > yourself as an author for the patch. >> >> Done. > > > I've marked this patch as Rejected, per discussion. > +1 > Still, I find this functionality nice to have, I'd be happy if you could create an extension on github (or similar platform). > +1 Robert Treat https://xzilla.net