Re: daitch_mokotoff module - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: daitch_mokotoff module
Date
Msg-id 0cac2d60-5b85-59c6-1ac1-77092b8688b7@enterprisedb.com
Whole thread Raw
In response to Re: daitch_mokotoff module  (Dag Lem <dag@nimrod.no>)
Responses Re: daitch_mokotoff module
Re: daitch_mokotoff module
List pgsql-hackers
On 12/13/21 14:38, Dag Lem wrote:
> Please find attached an updated patch, with the following fixes:
> 
> * Replaced remaining malloc/free with palloc/pfree.
> * Made "make check" pass.
> * Updated notes on other implementations.
> 

Thanks, looks interesting. A couple generic comments, based on a quick 
code review.

1) Can the extension be marked as trusted, just like fuzzystrmatch?

2) The docs really need an explanation of what the extension is for, not 
just a link to fuzzystrmatch. Also, a couple examples would be helpful, 
I guess - similarly to fuzzystrmatch. The last line in the docs is 
annoyingly long.

3) What's daitch_mokotov_header.pl for? I mean, it generates the header, 
but when do we need to run it?

4) It seems to require perl-open, which is a module we did not need 
until now. Not sure how well supported it is, but maybe we can use a 
more standard module?

5) Do we need to keep DM_MAIN? It seems to be meant for some kind of 
testing, but our regression tests certainly don't need it (or the palloc 
mockup). I suggest to get rid of it.

6) I really don't understand some of the comments in daitch_mokotov.sql, 
like for example:

-- testEncodeBasic
-- Tests covered above are omitted.

Also, comments with names of Java methods seem pretty confusing. It'd be 
better to actually explain what rules are the tests checking.

7) There are almost no comments in the .c file (ignoring the comment on 
top). Short functions like initialize_node are probably fine without 
one, but e.g. update_node would deserve one.

8) Some of the lines are pretty long (e.g. the update_node signature is 
almost 170 chars). That should be wrapped. Maybe try running pgindent on 
the code, that'll show which parts need better formatting (so as not to 
get broken later).

9) I'm sure there's better way to get the number of valid chars than this:

   for (i = 0, ilen = 0; (c = read_char(&str[i], &ilen)) && (c < 'A' || 
c > ']'); i += ilen)
   {
   }

Say, a while loop or something?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: WIP: WAL prefetch (another approach)
Next
From: Dilip Kumar
Date:
Subject: Re: Add sub-transaction overflow status in pg_stat_activity