Re: Permute underscore separated components of columns before fuzzy matching - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Permute underscore separated components of columns before fuzzy matching
Msg-id 799067.1700256184@sss.pgh.pa.us
In response to Re: Permute underscore separated components of columns before fuzzy matching  (Mikhail Gribkov <youzhick@gmail.com>)
List pgsql-hackers
Mikhail Gribkov <youzhick@gmail.com> writes:
> Honestly I'm not entirely sure fixing only two switched words is worth the
> effort, but the declared goal is clearly achieved.

> I think the patch is good to go, although you need to fix code formatting.

I took a brief look at this.  I concur that we shouldn't need to be
hugely concerned about the speed of this code path.  However, we *do*
need to be concerned about its maintainability, and I think the patch
falls down badly there: it adds a chunk of very opaque and essentially
undocumented code, that people will need to reverse-engineer anytime
they are studying this function.  That could be alleviated perhaps
with more work on comments, but I have to wonder whether it's worth
carrying this logic at all.  It's a rather strange behavior to add,
and I wonder if many users will want it.

One thing that struck me is that no care is being taken for adjacent
underscores (that is, "foo__bar" and similar cases).  It seems
unlikely that treating the zero-length substring between the
underscores as a word to permute is helpful; moreover, it adds
an edge case that the string-moving logic could easily get wrong.
I wonder if the code should treat any number of consecutive
underscores as a single separator.  (Somewhat related: I think it
will behave oddly when the first or last character is '_', since the
outer loop ignores those positions.)
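
To illustrate the suggestion above, here is a minimal sketch of a word
splitter that treats any run of underscores as one separator and produces
no empty words for leading or trailing underscores.  It is not taken from
the patch: the function name split_underscore_words and its signature are
hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not code from the patch): record the
 * start and length of each non-empty word in "name", treating any run of
 * '_' characters as a single separator.  Returns the number of words
 * stored, which is at most max_words.
 */
int
split_underscore_words(const char *name, const char **starts,
                       size_t *lens, int max_words)
{
    int         nwords = 0;
    const char *p = name;

    while (*p != '\0' && nwords < max_words)
    {
        size_t      len;

        /* Skip a run of underscores, including any leading ones. */
        while (*p == '_')
            p++;
        if (*p == '\0')
            break;              /* trailing underscores yield no word */

        /* The word extends to the next underscore or end of string. */
        len = strcspn(p, "_");
        starts[nwords] = p;
        lens[nwords] = len;
        nwords++;
        p += len;
    }

    return nwords;
}
```

With this approach "foo__bar" yields the two words "foo" and "bar", and
"_foo_" yields the single word "foo", so the permutation step never has to
move a zero-length substring.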

> And it would be much more convenient to work with your patch if every next
> version file will have a unique name (maybe something like "_v2", "_v3"
> etc. suffixes)

Please.  It's very confusing when there are multiple identically-named
patches in a thread.

            regards, tom lane


