Re: Has there been any discussion of custom dictionaries being defined in the database? - Mailing list pgsql-general

From Tomas Vondra
Subject Re: Has there been any discussion of custom dictionaries being defined in the database?
Date
Msg-id 20191019130826.usuxx5k7rhwmmnr5@development
In response to Re: Has there been any discussion of custom dictionaries being defined in the database?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Thu, Oct 17, 2019 at 11:52:39AM +0200, Tom Lane wrote:
>Morris de Oryx <morrisdeoryx@gmail.com> writes:
>> Given that Amazon is bragging this week about turning off Oracle, it seems
>> like they could kick some resources towards contributing something to the
>> Postgres project. With that in mind, is the idea of defining dictionaries
>> within a table somehow meritless, or unexpectedly difficult?
>
>Well, it'd just be totally different.  I don't think anybody cares to
>provide two separate definitions of common dictionaries (which'd have to
>somehow be kept in sync).
>
>As for why we did it with external text files in the first place ---
>for at least some of the dictionary types, the point is that you can
>drop in data files that are available from upstream sources, without any
>modification.  Getting the same info into a table would require some
>nonzero amount of data transformation.
>

IMHO being able to load dictionaries from a table would be quite
useful, and not just because of RDS. For example, it's not entirely true
we're just using the upstream dictionaries verbatim - it's quite common
to add new words, particularly in specialized fields. That's way easier
when you can do that through a table and not through a file.

>Having said that ... in the end a dictionary is really just a set of
>functions implementing the dictionary API; where they get their data
>from is their business.  So in theory you could roll your own
>dictionary that gets its data out of a table.  But the dictionary API
>would be pretty hard to implement except in C, and I bet RDS doesn't
>let you install your own C functions either :-(
>

Not sure. Of course, if we expect the dictionary to work just like the
ispell one, with preprocessing the dictionary into shmem, then that
requires C. I don't think that's entirely necessary, though - we could
use the table directly. Yes, that would be slower, but maybe it'd be
sufficient.
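
To make the "use the table directly" idea a bit more concrete, here is a
minimal sketch of a lookup that queries a table on every call instead of
preprocessing the dictionary into memory. It is only an illustration under
stated assumptions: sqlite3 stands in for PostgreSQL, the table dict_words
and the function lexize are hypothetical names, and the return convention
(None for an unknown token, an empty list for a stop word, a list of
lexemes otherwise) is modeled loosely on how text search dictionaries
report their results.

```python
import sqlite3

# Assumption: sqlite3 is a stand-in for PostgreSQL so the sketch is runnable.
# The table name, column names, and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict_words (word TEXT PRIMARY KEY, lexeme TEXT)")
conn.executemany(
    "INSERT INTO dict_words VALUES (?, ?)",
    [
        ("databases", "database"),
        ("indexes", "index"),
        ("the", None),  # NULL lexeme marks a stop word in this sketch
    ],
)


def lexize(token):
    """Look the token up in the table on every call (no preprocessing).

    Returns None if the token is unknown, [] if it is a stop word,
    and [lexeme] otherwise.
    """
    row = conn.execute(
        "SELECT lexeme FROM dict_words WHERE word = ?", (token.lower(),)
    ).fetchone()
    if row is None:
        return None
    return [] if row[0] is None else [row[0]]


# Adding a specialized term is a plain INSERT; no file to edit or reload.
conn.execute("INSERT INTO dict_words VALUES (?, ?)", ("tsvectors", "tsvector"))
```

Every call costs one indexed lookup, which is the "slower, but maybe
sufficient" trade-off: nothing has to be rebuilt when a word is added.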

But I think the idea is ultimately that we'd implement a new dict type
in core, and people would just specify which table to load data from.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


