Re: request a new feature in fuzzystrmatch - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: request a new feature in fuzzystrmatch
Date
Msg-id CAB7nPqRatR2oqe6OpubvVyHnKgxLt2CbF4V5Hf3phPavgEcx7w@mail.gmail.com
Whole thread Raw
In response to Re: request a new feature in fuzzystrmatch  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: request a new feature in fuzzystrmatch
List pgsql-hackers
Hi Liming,

Here is a more formal review of this patch.

First, when submitting a patch, please follow the following guidelines:
http://wiki.postgresql.org/wiki/Submitting_a_Patch
http://wiki.postgresql.org/wiki/Creating_Clean_Patches

Particularly, when you develop a new feature, people will expect that
you submit your patches based on the master branch in git. This
extremely facilitates the review and test work that can be done based
on your work. So do not create a new dedicated project on github like
that => https://github.com/liminghu/fuzzystrmatch, but for example
fork the postgres repository, and work directly on it using for
example custom branches, then generate patches between your custom
branches and postgres master branch.

Also, new features need to be submitted to a commit fest
(https://commitfest.postgresql.org/). The next commit fest is in
September, so you have plenty of time to update your patch based on
the comments of my review (and perhaps comments of others), then
register a patch directly there when you are done.

In order to perform my review, I took your github project and
generated a diff patch. The patch is attached, and applies to master
branch, so you could use it for your future work as a base.

OK, now for the review, here are some comments about the structure of
the patch, which is incorrect based on the structure extensions need
to have.
1) This patch lacks documentation, you shouldn't add a README.md file.
So remove it and updated the dedicated sgml documentation. In your
case, the file to be updated with more details about
Damerau–Levenshtein is doc/src/sgml/fuzzystrmatch.sgml.
2) Remove fuzzystrmatch--1.0.sql, this is not necessary anymore
because this module is upgraded to 1.1
3) Remove fuzzystrmatch--unpackaged--1.1.sql, it is not needed
4) Add fuzzystrmatch--1.0--1.1.sql, which is a file that can be used
to upgrade an existing fuzzystrmatch 1.0 to 1.1. This needs to contain
all the modifications allowing to do a correct upgrade: alter existing
objects to their new shape, add new objects, and remove unnecessary
objects. In your case, this file should have this shape:
/* contrib/fuzzystrmatch/fuzzystrmatch--1.0--1.1.sql */

-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION fuzzystrmatch UPDATE" to load this file. \quit

CREATE FUNCTION damerau_levenstein()...
etc.
5) Your code copies a function from TOMOYO Linux, which is under GPL2
license, so I believe that this cannot be integrated to Postgres which
is under PostgreSQL license (more permissive). Just based on that some
portions of this code cannot be included in Postgres, I am not sure
but this gives a sufficient reason to reject this patch.

This is all I have about the shape of the patch...

Also, I am not (yet) very familiar with Damerau–Levenshtein itself and
I need to read more about that before giving a precise opinion... but
here are some comments anyway based on what I can read from the code:
1) There is a lot of duplicated code among levenshtein.c,
dameraulevenshtein.c and dameraulevenshtein_new.c, why is that?
levenshtein.c refers even to a variable called trans_c which is even
used nowehere.
2) By doing a simple diff between for example levenshtein.c and
dameraulevenshtein.c, the only new thing is the function called
dameraulevenshtein_internal_noncompatible copied from tomoyo linux...
And this function is used nowhere... The new functions for
Damerau–Levenshtein equivalent to levenshtein[_less_equal], called
damaraulevenshtein[_less_equal], are exactly the same things

So, in short, even if I may not be able to give a precise opinion
about Damerau–Levenshtein, what this patch tries to achieve is unclear
to me... The only new feature I can see is
dameraulevenshtein_internal_noncompatible but this is taken from code
licensed as GPL.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: in-catalog Extension Scripts and Control parameters (templates?)
Next
From: Hari Babu
Date:
Subject: Re: Review: Patch to compute Max LSN of Data Pages