Re: Preliminary patch for tsearch example dictionaries/parsers in contrib - Mailing list pgsql-patches

From karpov@sao.ru (Sergey V. Karpov)
Subject Re: Preliminary patch for tsearch example dictionaries/parsers in contrib
Date
Msg-id 87odf838tg.fsf@tigris.sai.msu.ru
In response to Preliminary patch for tsearch example dictionaries/parsers in contrib  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Preliminary patch for tsearch example dictionaries/parsers in contrib  (Magnus Hagander <magnus@hagander.net>)
Re: Preliminary patch for tsearch example dictionaries/parsers in contrib  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-patches
Hi Tom,

Thank you for starting the discussion.

> Given all the flap about txid, this surely mustn't go in without public
> review first ;-).  So, here is a submission from Sergey Karpov to fill
> in the lack of any working code examples for user-written tsearch
> parsers and dictionaries.
>
> I will be mostly off-line for the next day or so and don't have time to
> work on this more now, but here are a few comments:
>
> * It seems a bit odd to put multiple independent contrib modules under a
> single subfolder.  I'd be inclined to drop the ts_pack layer and just
> make the dictionaries and parser be top-level contrib modules.

Yes, I understand your position, as well as Magnus' complaints. However,
making every piece of code its own contrib module is not the best
solution, as most of it is nothing more than examples. dict_regex, on
the contrary, is an add-on that is very useful in some situations (and
we actually use it in our projects). Also, its requirements differ from
those of the other dictionaries, see below.

So, what about the following layout:

 - contrib/ts_examples - single module which contains all the example
   stuff in a single folder, to be built together

 - contrib/dict_regex - separate contrib

> * Depending on PCRE, when we have an at-least-equally-good regex engine
> built in, is silly.  It's an unnecessary dependency and to the (minor)
> extent that the regex syntax is different, we'd have to document the
> discrepancies.

The built-in regex engine does not seem to support the one feature
critical to dict_regex operation: it cannot report a "partial match",
that is, the case when matching fails solely because the input string
ended prematurely (i.e. when the match might still succeed after more
data is appended to the string).

If it is possible to achieve this behaviour with the built-in engine,
please point me in the right direction.
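
To make the requirement concrete, here is a minimal sketch of the
three-way answer the dictionary needs from its matcher. It is only an
illustration, not the dict_regex code: it matches a literal phrase
instead of a regular expression so that it stays self-contained, and
the names MatchResult and match_literal, as well as the sample phrase,
are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-state result: the point is the PARTIAL case. */
typedef enum
{
    MATCH_NONE,     /* cannot match, no matter what is appended */
    MATCH_PARTIAL,  /* input ended before the pattern did */
    MATCH_FULL      /* whole pattern matched at the start of input */
} MatchResult;

/*
 * Anchored match of a literal pattern against the input collected so
 * far.  A real dictionary would call a regex engine here; a literal
 * keeps the sketch short.
 */
MatchResult
match_literal(const char *pattern, const char *input)
{
    size_t plen;
    size_t ilen;

    assert(pattern != NULL && input != NULL);
    plen = strlen(pattern);
    ilen = strlen(input);

    if (ilen >= plen)
        return strncmp(pattern, input, plen) == 0 ? MATCH_FULL : MATCH_NONE;

    /* Input is shorter than the pattern: is it a prefix of the pattern? */
    return strncmp(pattern, input, ilen) == 0 ? MATCH_PARTIAL : MATCH_NONE;
}
```

With the pattern "supernova remnant", the input "supernova" yields
MATCH_PARTIAL, so the dictionary keeps the token and waits for the next
one, while "supernova star" yields MATCH_NONE and the collected tokens
can be released at once. An engine that only answers "matched" or "did
not match" cannot tell these two cases apart.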

> * dict_regex is not nearly up to speed on encoding or locale issues.
> I didn't look at the other ones too closely, they may or may not need
> similar adjustments.
>
> * Allowing config files to be read from anywhere is not acceptable.
> We have dealt with this in the core code and the contrib examples
> *must* follow the same rules.

Is it necessary to require this behaviour from each contrib module? They
are not core code, and usually solve application-level tasks - is it
optimal to store application config files in the postgres tree?

Also, these dictionaries need some example config files at regression
test time, and these configs are of no use to anyone else - is it good
to pollute the system tree with them?

On the other hand, to prevent reading arbitrary files we could require a
specific header line that identifies these dictionary configs.
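
A minimal sketch of such a check is below. It is an assumption about
how this could look, not code from the patch: the magic line text and
the function name has_dict_config_header are invented here, and it uses
plain stdio only to stay self-contained (inside the backend the
server's own file-handling routines would be the right choice).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical magic line; the real marker is still to be decided. */
#define DICT_CONFIG_MAGIC "# tsearch dictionary config"

/*
 * Return 1 if the first line of the file at "path" is exactly the
 * magic line, 0 otherwise (including when the file cannot be opened).
 */
int
has_dict_config_header(const char *path)
{
    char  line[128];
    FILE *fp;
    int   ok = 0;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;

    if (fgets(line, sizeof(line), fp) != NULL)
    {
        /* Strip the trailing newline (and CR, if any) before comparing. */
        line[strcspn(line, "\r\n")] = '\0';
        ok = (strcmp(line, DICT_CONFIG_MAGIC) == 0);
    }

    fclose(fp);
    return ok;
}
```

The dictionary would refuse to load any file for which this returns 0.
Note that this only guards against reading files of the wrong kind; it
does not restrict where the file lives.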

> * The whole "utils" part of dict_regex should probably go away; it
> is reinventing wheels that already exist in the Postgres backend
> environment.  Since these are meant to be code examples, they should
> show the best ways of doing things within Postgres.

Yes, you are right. I'll rewrite it using StringInfo (the "official"
string-handling layer, right?).

Sincerely yours,

Sergey Karpov

