Thread: Fall back to alternative tsearch dictionary directory

Fall back to alternative tsearch dictionary directory

From
Martin Pitt
Date:
Hello all,

as recently mentioned on pg-general@, I am currently working on making
installed myspell/hunspell dictionary packages (which install
themselves in /usr/share/myspell/dicts, mostly LATIN encoded)
available to PostgreSQL's tsearch/word stemming in Debian/Ubuntu.

So far I wrote the postgresql-common infrastructure to mangle these
dictionary/affix files to become palatable for PostgreSQL (recoding to
UTF-8, renaming to lowercase, changing file suffix) and install them
into /var/cache/postgresql/dicts/ whenever a {hun,my}spell-* package
is installed or updated.
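
The conversion steps above (recode to UTF-8, lowercase the name, change the suffix) can be sketched roughly as follows. This is only an illustration, not the actual postgresql-common code: the function name convert_dictionary, the SUFFIX_MAP table, and the default source encoding are assumptions made for the example. The .dic/.aff to .dict/.affix mapping reflects the file names PostgreSQL's ispell dictionaries look for, but the thread itself only says "changing file suffix".

```python
from pathlib import Path

# Assumed suffix mapping: myspell/hunspell ship .dic/.aff files, while
# PostgreSQL's ispell template looks for .dict/.affix in tsearch_data.
SUFFIX_MAP = {".dic": ".dict", ".aff": ".affix"}


def convert_dictionary(src: Path, dest_dir: Path, src_encoding: str = "latin-1") -> Path:
    """Write a UTF-8 copy of src into dest_dir under a lowercase name.

    src_encoding is an assumption; a real tool would have to determine
    the encoding of each dictionary package instead of hardcoding it.
    """
    new_suffix = SUFFIX_MAP[src.suffix.lower()]
    dest = dest_dir / (src.stem.lower() + new_suffix)
    # Decode with the source encoding, then re-encode as UTF-8.
    text = src.read_bytes().decode(src_encoding)
    dest.write_bytes(text.encode("utf-8"))
    return dest
```

For instance, under these assumptions a file named es_ES.aff would end up as es_es.affix in the target directory.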

The remaining bit is teaching postgresql to actually look into
/var/cache/postgresql/dicts/ if it does not find a matching
dictionary/affix file in ${sharepath}/tsearch_data/.

The reasons why I'm not using ${sharepath}/tsearch_data/ in the first
place are that

 - it's autogenerated data, as opposed to files statically shipped in
   a package

 - I do not want to conflict with/overwrite files which the admin
   manually put there.

I created an initial demo patch which provides this fallback. It works
great, it passes my test cases (which set up tsearch full text search
and stemming handling) and is pretty simple, too.

However, the path is hardcoded so far, which is of course bad for
upstream inclusion. So this should either become a ./configure option
--with-tsearch-dict-fallback=path (or similar), or even a new optional
configuration parameter for postgresql.conf.

However, before I work on that, I'd like to collect some opinions
about the general idea, and whether you prefer an autoconf option or
postgresql.conf, or whether you wouldn't accept it at all?

Thanks a lot in advance!

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

Attachment

Re: Fall back to alternative tsearch dictionary directory

From
Tom Lane
Date:
Martin Pitt <martin@piware.de> writes:
> So far I wrote the postgresql-common infrastructure to mangle these
> dictionary/affix files to become palatable for PostgreSQL (recoding to
> UTF-8, renaming to lowercase, changing file suffix) and install them
> into /var/cache/postgresql/dicts/ whenever a {hun,my}spell-* package
> is installed or updated.

> The remaining bit is teaching postgresql to actually look into
> /var/cache/postgresql/dicts/ if it does not find a matching
> dictionary/affix file in ${sharepath}/tsearch_data/.

I can't see any reason whatever to not put them into
${sharepath}/tsearch_data/.  It's not like you're expecting to be
able to share them with other applications.

> The reasons why I'm not using ${sharepath}/tsearch_data/ in the first
> place are that
>  - it's autogenerated data, as opposed to files statically shipped in
>    a package
>  - I do not want to conflict with/overwrite files which the admin
>    manually put there.

Seems like it'd be quite sufficient to choose a specialized naming
policy within tsearch_data, say es_ES.aff -> system_es_es.aff.  I don't
think moving stuff into a different subdirectory makes conflicts a
non-problem; it just means that half the world will be unhappy with the
search order you chose.

            regards, tom lane

Re: Fall back to alternative tsearch dictionary directory

From
Martin Pitt
Date:
Hi Tom,

Tom Lane [2008-12-01 19:51 -0500]:
> I can't see any reason whatever to not put them into
> ${sharepath}/tsearch_data/.  It's not like you're expecting to be
> able to share them with other applications.

No, not for sharing. I just don't like them to be in /usr, but that's
by and large a stylistic preference, and I won't dwell on it.

> Seems like it'd be quite sufficient to choose a specialized naming
> policy within tsearch_data, say es_ES.aff -> system_es_es.aff.

Works for me, too.

> I don't think moving stuff into a different subdirectory makes
> conflicts a non-problem; it just means that half the world will be
> unhappy with the search order you chose.

IMHO there is really just one sensible ordering here. Always prefer
the ones installed by hand, and only if they are not present, fall
back to the system defaults. The other way around would mean that the
admin couldn't do local overriding any more.

Thanks,

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

Re: Fall back to alternative tsearch dictionary directory

From
Martin Pitt
Date:
Tom Lane [2008-12-01 19:51 -0500]:
> I can't see any reason whatever to not put them into
> ${sharepath}/tsearch_data/.  It's not like you're expecting to be
> able to share them with other applications.

Oh, forgot yesterday, there is one case: the data can be shared
between the 8.3, 8.4, and any future version. (In Debian/Ubuntu you
can install different 8.x versions in parallel)

But that can easily be achieved in the distro packaging by adding
symlinks, so if you prefer just looking for
${sharedir}/tsearch_data/system_ll_cc.affix, that would still work for
me.

Thanks!

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

Re: Fall back to alternative tsearch dictionary directory

From
Martin Pitt
Date:
Martin Pitt [2008-12-02  5:29 -0800]:
> Tom Lane [2008-12-01 19:51 -0500]:
> > I can't see any reason whatever to not put them into
> > ${sharepath}/tsearch_data/.  It's not like you're expecting to be
> > able to share them with other applications.
>
> Oh, forgot yesterday, there is one case: the data can be shared
> between the 8.3, 8.4, and any future version. (In Debian/Ubuntu you
> can install different 8.x versions in parallel)
>
> But that can easily be achieved in the distro packaging by adding
> symlinks, so if you prefer just looking for
> ${sharedir}/tsearch_data/system_ll_cc.affix, that would still work for
> me.

Right, so I changed the patch accordingly.

Thanks,

Martin
--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

Attachment

Re: Fall back to alternative tsearch dictionary directory

From
Bruce Momjian
Date:
Uh, would someone eyeball and apply this?  Thanks.

---------------------------------------------------------------------------

Martin Pitt wrote:
> Martin Pitt [2008-12-02  5:29 -0800]:
> > Tom Lane [2008-12-01 19:51 -0500]:
> > > I can't see any reason whatever to not put them into
> > > ${sharepath}/tsearch_data/.  It's not like you're expecting to be
> > > able to share them with other applications.
> >
> > Oh, forgot yesterday, there is one case: the data can be shared
> > between the 8.3, 8.4, and any future version. (In Debian/Ubuntu you
> > can install different 8.x versions in parallel)
> >
> > But that can easily be achieved in the distro packaging by adding
> > symlinks, so if you prefer just looking for
> > ${sharedir}/tsearch_data/system_ll_cc.affix, that would still work for
> > me.
>
> Right, so I changed the patch accordingly.
>
> Thanks,
>
> Martin
> --
> Martin Pitt                        | http://www.piware.de
> Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

[ Attachment, skipping... ]

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Fall back to alternative tsearch dictionary directory

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Uh, would someone eyeball and apply this?  Thanks.

I thought we had come to the conclusion that no patch was needed
because there's no convincing reason to look anyplace except
${sharepath}/tsearch_data/.

            regards, tom lane

Re: Fall back to alternative tsearch dictionary directory

From
Martin Pitt
Date:
Hi Tom,

Tom Lane [2009-01-14 20:56 -0500]:
> Bruce Momjian <bruce@momjian.us> writes:
> > Uh, would someone eyeball and apply this?  Thanks.
>
> I thought we had come to the conclusion that no patch was needed
> because there's no convincing reason to look anyplace except
> ${sharepath}/tsearch_data/.

That's what the current patch does now: It falls back to
system_basename.extension if there is no basename.extension. This
avoids overwriting the admin's own installed dictionaries with
automatically generated ones, and allows telling apart the ones that
the system can update automatically (system_) from the ones that we
should not touch (without system_ prefix).
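
The lookup order described above can be sketched like this. It is a minimal illustration of the logic only, not the patch itself (which lives in the PostgreSQL C sources); the function name find_tsearch_file and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_tsearch_file(tsearch_dir: Path, basename: str, extension: str) -> Optional[Path]:
    """Prefer basename.extension; fall back to system_basename.extension.

    Returns None if neither file exists in tsearch_dir.
    """
    candidates = (
        f"{basename}.{extension}",         # installed by the admin, never touched
        f"system_{basename}.{extension}",  # generated automatically by the system
    )
    for name in candidates:
        path = tsearch_dir / name
        if path.is_file():
            return path
    return None
```

With this ordering a manually installed file always wins, so the admin can still override a system-provided dictionary locally.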

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)