Thread: Has there been any discussion of custom dictionaries being defined in the database?

From: Morris de Oryx
I've been experimenting with the FTS features in Postgres over the past few days. Mind blown.

We're deployed on RDS, which does not give you any file system to access. I'd love to be able to create a custom thesaurus dictionary for our situation, which seems like it is impossible in a setup like ours.

Has there been any discussion of making dictionary configuration files accessible via a dictionary table instead of a physical, structured disk file? Or, alternatively, something that could be accessed remotely/externally as a URL or FDW?

Thanks for any comments.
Morris de Oryx <morrisdeoryx@gmail.com> writes:
> We're deployed on RDS, which does not give you any file system to access.
> I'd love to be able to create a custom thesaurus dictionary for our
> situation, which seems like it is impossible in a setup like ours.

> Has there been any discussion of making dictionary configuration files
> accessible via a dictionary table instead of a physical, structured disk
> file? Or, alternatively, something that could be accessed
> remotely/externally as a URL or FDW?

Nope.  TBH, I don't find this case terribly compelling.  You should be
beating up RDS for not letting you configure your DB the way you want.

            regards, tom lane



Fair.

Given that Amazon is bragging this week about turning off Oracle, it seems like they could kick some resources towards contributing something to the Postgres project. With that in mind, is the idea of defining dictionaries within a table somehow meritless, or unexpectedly difficult? 
Morris de Oryx <morrisdeoryx@gmail.com> writes:
> Given that Amazon is bragging this week about turning off Oracle, it seems
> like they could kick some resources towards contributing something to the
> Postgres project. With that in mind, is the idea of defining dictionaries
> within a table somehow meritless, or unexpectedly difficult?

Well, it'd just be totally different.  I don't think anybody cares to
provide two separate definitions of common dictionaries (which'd have to
somehow be kept in sync).

As for why we did it with external text files in the first place ---
for at least some of the dictionary types, the point is that you can
drop in data files that are available from upstream sources, without any
modification.  Getting the same info into a table would require some
nonzero amount of data transformation.

Having said that ... in the end a dictionary is really just a set of
functions implementing the dictionary API; where they get their data
from is their business.  So in theory you could roll your own
dictionary that gets its data out of a table.  But the dictionary API
would be pretty hard to implement except in C, and I bet RDS doesn't
let you install your own C functions either :-(

            regards, tom lane
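
To give a sense of what that "nonzero amount of data transformation" could look like for a thesaurus dictionary, here is a minimal Python sketch. It assumes the documented thesaurus file layout (one "sample word(s) : indexed word(s)" entry per line, with "#" starting a comment line). The names ThesaurusRow and parse_thesaurus are hypothetical, not part of PostgreSQL, and loading the resulting rows into an actual table is left out.

```python
from typing import Iterable, Iterator, NamedTuple


class ThesaurusRow(NamedTuple):
    """One thesaurus entry in row form (hypothetical table layout)."""
    phrase: str      # sample word(s) to match
    substitute: str  # indexed word(s) to emit instead


def parse_thesaurus(lines: Iterable[str]) -> Iterator[ThesaurusRow]:
    """Yield one row per 'sample word(s) : indexed word(s)' entry."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        phrase, sep, substitute = line.partition(":")
        # Normalize internal whitespace on both sides of the colon.
        phrase = " ".join(phrase.split())
        substitute = " ".join(substitute.split())
        if not sep or not phrase or not substitute:
            raise ValueError(f"malformed thesaurus entry: {raw!r}")
        yield ThesaurusRow(phrase, substitute)


# Sample input in the thesaurus file format.
sample = """\
# astronomy terms
supernovae stars : sn
crab nebulae : crab
"""
rows = list(parse_thesaurus(sample.splitlines()))
```

Each resulting row could then be inserted into a two-column table. Other dictionary types (ispell, for instance) use different file formats and would need their own conversion.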



Nope, no custom C installs. RDS is super convenient in many ways, but also limited. You can't, for example, run TimescaleDB, install RUM indexes (if those still work), or add any novel plugins. And you can't do anything at all requiring a file reference. The backup features are outstanding. But, yeah, sometimes frustrating.
On Thu, Oct 17, 2019 at 11:52:39AM +0200, Tom Lane wrote:

> Morris de Oryx <morrisdeoryx@gmail.com> writes:
> > Given that Amazon is bragging this week about turning off Oracle, it seems
> > like they could kick some resources towards contributing something to the
> > Postgres project. With that in mind, is the idea of defining dictionaries
> > within a table somehow meritless, or unexpectedly difficult?
>
> Well, it'd just be totally different.  I don't think anybody cares to
> provide two separate definitions of common dictionaries (which'd have to
> somehow be kept in sync).

Might crafty use of server side

    COPY ... TO PROGRAM ...

enable OP to drop in dictionary data files as needed ?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



On Thu, Oct 17, 2019 at 11:52:39AM +0200, Tom Lane wrote:
>Morris de Oryx <morrisdeoryx@gmail.com> writes:
>> Given that Amazon is bragging this week about turning off Oracle, it seems
>> like they could kick some resources towards contributing something to the
>> Postgres project. With that in mind, is the idea of defining dictionaries
>> within a table somehow meritless, or unexpectedly difficult?
>
>Well, it'd just be totally different.  I don't think anybody cares to
>provide two separate definitions of common dictionaries (which'd have to
>somehow be kept in sync).
>
>As for why we did it with external text files in the first place ---
>for at least some of the dictionary types, the point is that you can
>drop in data files that are available from upstream sources, without any
>modification.  Getting the same info into a table would require some
>nonzero amount of data transformation.
>

IMHO being able to load dictionaries from a table would be quite
useful, and not just because of RDS. For example, it's not entirely true
we're just using the upstream dictionaries verbatim - it's quite common
to add new words, particularly in specialized fields. That's way easier
when you can do that through a table and not through a file.

>Having said that ... in the end a dictionary is really just a set of
>functions implementing the dictionary API; where they get their data
>from is their business.  So in theory you could roll your own
>dictionary that gets its data out of a table.  But the dictionary API
>would be pretty hard to implement except in C, and I bet RDS doesn't
>let you install your own C functions either :-(
>

Not sure. Of course, if we expect the dictionary to work just like the
ispell one, with preprocessing the dictionary into shmem, then that
requires C. I don't think that's entirely necessary, though - we could
use the table directly. Yes, that would be slower, but maybe it'd be
sufficient.

But I think the idea is ultimately that we'd implement a new dict type
in core, and people would just specify which table to load data from.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
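
As a rough illustration of the "use the table directly" idea, the following Python sketch models a dictionary lookup that queries a table on every call instead of preloading a file. It is only a toy: sqlite3 stands in for a Postgres table so the example runs on its own, and the dict_entries table, its columns, and the lexize function are hypothetical names. The return convention mirrors the documented contract for a dictionary's lexize function: nothing (None here) for an unrecognized token, an empty list for a stop word, and a list of lexemes otherwise.

```python
import sqlite3
from typing import List, Optional

# In-memory database as a stand-in for a real Postgres table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dict_entries ("
    " token TEXT NOT NULL,"
    " lexeme TEXT,"                       # NULL for stop words
    " is_stop INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO dict_entries (token, lexeme, is_stop) VALUES (?, ?, ?)",
    [
        ("postgres", "postgresql", 0),
        ("pgsql", "postgresql", 0),
        ("the", None, 1),
    ],
)


def lexize(db: sqlite3.Connection, token: str) -> Optional[List[str]]:
    """Look a token up in the table; one query per call, no preloading."""
    rows = db.execute(
        "SELECT lexeme, is_stop FROM dict_entries"
        " WHERE token = ? ORDER BY lexeme",
        (token.lower(),),
    ).fetchall()
    if not rows:
        return None   # token not recognized by this dictionary
    if any(is_stop for _, is_stop in rows):
        return []     # known token, but a stop word
    return [lexeme for lexeme, _ in rows]


# Adding a specialized word is a plain INSERT, with no file to edit.
conn.execute(
    "INSERT INTO dict_entries (token, lexeme, is_stop) VALUES (?, ?, ?)",
    ("pg", "postgresql", 0),
)
```

The per-call query is the cost mentioned above: it is slower than a dictionary preprocessed into memory, but new entries become visible as soon as they are inserted.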