Re: english parser in text search: support for multiple words in the same position - Mailing list pgsql-hackers

From Tom Lane
Subject Re: english parser in text search: support for multiple words in the same position
Date
Msg-id 9373.1284931797@sss.pgh.pa.us
Whole thread Raw
In response to Re: english parser in text search: support for multiple words in the same position  (Sushant Sinha <sushant354@gmail.com>)
Responses Re: english parser in text search: support for multiple words in the same position
List pgsql-hackers
Sushant Sinha <sushant354@gmail.com> writes:
> For the headline generation to work properly, email/file/url/host need
> to become skip tokens. Updating the patch with that change.

I looked at this patch a bit.  I'm fairly unhappy that it seems to be
inventing a brand new mechanism to do something the ts parser can
already do.  Why didn't you code the url-part mechanism using the
existing support for compound words?  The changes made to parsetext()
seem particularly scary: it's not clear at all that that's not breaking
unrelated behaviors.  In fact, the changes in the regression test
results suggest strongly to me that it *is* breaking things.  Why are
there so many diffs in examples that include no URLs at all?

An issue that's nearly as bad is the 100% lack of documentation,
which makes the patch difficult to review because it's hard to tell
what it intends to accomplish or whether it's met the intent.
The patch is not committable without documentation anyway, but right
now I'm not sure it's even usefully reviewable.

In line with the lack of documentation, I would say that the choice of
the name "parttoken" for the new token type is not helpful.  Part of
what?  And none of the other token type names include the word "token",
so that's not a good decision either.  Possibly "url_part" would be a
suitable name.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Devrim GÜNDÜZ
Date:
Subject: pgxs docdir question
Next
From: Tom Lane
Date:
Subject: Re: pgxs docdir question