Thread: Fall back to alternative tsearch dictionary directory
Hello all, as recently mentioned on pg-general@, I am currently working on making installed myspell/hunspell dictionary packages (which install themselves in /usr/share/myspell/dicts, mostly LATIN encoded) available to PostgreSQL's tsearch/word stemming in Debian/Ubuntu. So far I wrote the postgresql-common infrastructure to mangle these dictionary/affix files to become palatable for PostgreSQL (recoding to UTF-8, renaming to lowercase, changing file suffix) and install them into /var/cache/postgresql/dicts/ whenever a {hun,my}spell-* package is installed or updated. The remaining bit is teaching PostgreSQL to actually look into /var/cache/postgresql/dicts/ if it does not find a matching dictionary/affix file in ${sharepath}/tsearch_data/. The reasons why I'm not using ${sharepath}/tsearch_data/ in the first place are that (1) it's autogenerated data, as opposed to files statically shipped in a package, and (2) I do not want to conflict with or overwrite files which the admin manually put there. I created an initial demo patch which provides this fallback. It works great, it passes my test cases (which set up tsearch full text search and stemming handling) and is pretty simple, too. However, the path is hardcoded so far, which is of course bad for upstream inclusion. So this should either become a ./configure option --with-tsearch-dict-fallback=path (or similar), or even a new optional configuration parameter for postgresql.conf. However, before I work on that, I'd like to collect some opinions about the general idea, and whether you prefer an autoconf option or postgresql.conf, or whether you wouldn't accept it at all? Thanks a lot in advance! Martin -- Martin Pitt | http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Attachment
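[Editor's sketch, not part of the attached patch] The conversion step described in the first message (recode to UTF-8, lowercase the name, change the suffix) could look roughly like the following. This is only a minimal illustration under stated assumptions: the function name convert_dictionary, the default source encoding, and the suffix mapping (.dic to .dict, .aff to .affix) are hypothetical choices made here, not taken from the postgresql-common code.

```python
from pathlib import Path

# Assumed suffix mapping from myspell/hunspell names to the names
# PostgreSQL is expected to look for; not taken from the thread.
SUFFIX_MAP = {".dic": ".dict", ".aff": ".affix"}


def convert_dictionary(src: Path, dest_dir: Path, src_encoding: str = "latin-1") -> Path:
    """Recode one dictionary/affix file to UTF-8 under a lowercase name."""
    new_suffix = SUFFIX_MAP[src.suffix.lower()]       # KeyError for unknown suffixes
    dest = dest_dir / (src.stem.lower() + new_suffix)  # es_ES.aff -> es_es.affix
    text = src.read_bytes().decode(src_encoding)       # decode from the source encoding
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(text.encode("utf-8"))             # write back as UTF-8
    return dest
```

A packaging hook would call this once per file in /usr/share/myspell/dicts and write into the cache directory; the real source encoding has to be determined per dictionary rather than assumed.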
Martin Pitt <martin@piware.de> writes: > So far I wrote the postgresql-common infrastructure to mangle these > dictionary/affix files to become palatable for PostgreSQL (recoding to > UTF-8, renaming to lowercase, changing file suffix) and install them > into /var/cache/postgresql/dicts/ whenever a {hun,my}spell-* package > is installed or updated. > The remaining bit is teaching postgresql to actually look into > /var/cache/postgresql/dicts/ if it does not find a matching > dictionary/affix file in ${sharepath}/tsearch_data/. I can't see any reason whatever to not put them into ${sharepath}/tsearch_data/. It's not like you're expecting to be able to share them with other applications. > The reasons why I'm not using ${sharepath}/tsearch_data/ in the first > place are that > - it's autogenerated data, as opposed to files statically shipped in > a package > - I do not want to conflict to/overwrite files which the admin > manually put there. Seems like it'd be quite sufficient to choose a specialized naming policy within tsearch_data, say es_ES.aff -> system_es_es.aff. I don't think moving stuff into a different subdirectory makes conflicts a non-problem; it just means that half the world will be unhappy with the search order you chose. regards, tom lane
Hi Tom, Tom Lane [2008-12-01 19:51 -0500]: > I can't see any reason whatever to not put them into > ${sharepath}/tsearch_data/. It's not like you're expecting to be > able to share them with other applications. No, not for sharing. I just don't like them to be in /usr, but that's by and large a stylistic preference, and I won't dwell on it. > Seems like it'd be quite sufficient to choose a specialized naming > policy within tsearch_data, say es_ES.aff -> system_es_es.aff. Works for me, too. > I don't think moving stuff into a different subdirectory makes > conflicts a non-problem; it just means that half the world will be > unhappy with the search order you chose. IMHO there is really just one sensible ordering here. Always prefer the ones installed by hand, and only if they are not present, fall back to the system defaults. The other way around would mean that the admin couldn't do local overriding any more. Thanks, Martin -- Martin Pitt | http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Tom Lane [2008-12-01 19:51 -0500]: > I can't see any reason whatever to not put them into > ${sharepath}/tsearch_data/. It's not like you're expecting to be > able to share them with other applications. Oh, forgot yesterday, there is one case: the data can be shared between the 8.3, 8.4, and any future version. (In Debian/Ubuntu you can install different 8.x versions in parallel) But that can easily be achieved in the distro packaging by adding symlinks, so if you prefer just looking for ${sharedir}/tsearch_data/system_ll_cc.affix, that would still work for me. Thanks! Martin -- Martin Pitt | http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Martin Pitt [2008-12-02 5:29 -0800]: > Tom Lane [2008-12-01 19:51 -0500]: > > I can't see any reason whatever to not put them into > > ${sharepath}/tsearch_data/. It's not like you're expecting to be > > able to share them with other applications. > > Oh, forgot yesterday, there is one case: the data can be shared > between the 8.3, 8.4, and any future version. (In Debian/Ubuntu you > can install different 8.x versions in parallel) > > But that can easily be achieved in the distro packaging by adding > symlinks, so if you prefer just looking for > ${sharedir}/tsearch_data/system_ll_cc.affix, that would still work for > me. Right, so I changed the patch accordingly. Thanks, Martin -- Martin Pitt | http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Attachment
Uh, would someone eyeball and apply this? Thanks. --------------------------------------------------------------------------- Martin Pitt wrote: > Martin Pitt [2008-12-02 5:29 -0800]: > > Tom Lane [2008-12-01 19:51 -0500]: > > > I can't see any reason whatever to not put them into > > > ${sharepath}/tsearch_data/. It's not like you're expecting to be > > > able to share them with other applications. > > > > Oh, forgot yesterday, there is one case: the data can be shared > > between the 8.3, 8.4, and any future version. (In Debian/Ubuntu you > > can install different 8.x versions in parallel) > > > > But that can easily be achieved in the distro packaging by adding > > symlinks, so if you prefer just looking for > > ${sharedir}/tsearch_data/system_ll_cc.affix, that would still work for > > me. > > Right, so I changed the patch accordingly. > > Thanks, > > Martin > -- > Martin Pitt | http://www.piware.de > Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Uh, would someone eyeball and apply this? Thanks. I thought we had come to the conclusion that no patch was needed because there's no convincing reason to look anyplace except ${sharepath}/tsearch_data/. regards, tom lane
Hi Tom, Tom Lane [2009-01-14 20:56 -0500]: > Bruce Momjian <bruce@momjian.us> writes: > > Uh, would someone eyeball and apply this? Thanks. > > I thought we had come to the conclusion that no patch was needed > because there's no convincing reason to look anyplace except > ${sharepath}/tsearch_data/. That's what the current patch does now: It falls back to system_basename.extension if there is no basename.extension. This avoids overwriting the admin's own installed dictionaries with automatically generated ones, and allows telling apart the ones that the system can update automatically (system_) from the ones that we should not touch (without system_ prefix). Martin -- Martin Pitt | http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
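[Editor's sketch, not the patch itself] The lookup order described in the last message (use basename.extension if it exists, otherwise fall back to system_basename.extension in the same tsearch_data directory) can be illustrated as below. The actual patch is against the PostgreSQL server code; this standalone version only mirrors the described behaviour, and the function name find_tsearch_file and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_tsearch_file(tsearch_dir: Path, basename: str, extension: str) -> Optional[Path]:
    """Prefer the admin-installed file, then the system_-prefixed one."""
    candidates = (
        f"{basename}.{extension}",         # installed by hand, always wins
        f"system_{basename}.{extension}",  # auto-generated by the packaging
    )
    for name in candidates:
        path = tsearch_dir / name
        if path.is_file():
            return path
    return None  # neither variant exists
```

Because the hand-installed name is checked first, an admin can still override a system-provided dictionary locally, which is the ordering argued for earlier in the thread.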