Re: patch: preload dictionary new version - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: patch: preload dictionary new version
Msg-id AANLkTimgw4N_rNFpJANboidG9O5oCdkzGqdKHO0O2jCG@mail.gmail.com
In response to Re: patch: preload dictionary new version  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
2010/7/8 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2010/7/8 Robert Haas <robertmhaas@gmail.com>:
>>> A precompiler can give you all the same memory management benefits.
>
>> I use mmap(), and with mmap a precompiler is not necessary.
>> The dictionary is loaded only once, in the original ispell format. I
>> think that is much simpler for administration: just copy the ispell
>> files. There are no potential problems with binary
>> incompatibility, you don't need to solve serialization and
>> deserialization, and you don't need to copy the TSearch ispell parser
>> code into a client application; we would probably still have to support
>> uncompiled ispell dictionaries anyway. Using a precompiler raises new
>> questions for upgrades!
>
> You're inventing a bunch of straw men to attack.  There's no reason that
> a precompiler approach would have to put any new requirements on the
> user.  For example, the dictionary-load code could automatically execute
> the precompile step if it observed that the precompiled copy of the
> dictionary was missing or had an older file timestamp than the source.

Uff. Just activating the precompiler safely needs a lot of low-level
code. Maybe I see it wrongly, since I don't work directly with files
inside pg, but I can't see it as a simple solution.
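Tom's automatic-precompile rule (rebuild when the precompiled copy is missing or has an older timestamp than the source) reduces to a pair of stat() calls. The sketch below assumes a POSIX system; `precompiled_is_stale` and its return convention are hypothetical, not existing PostgreSQL code.

```c
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: decide whether the precompiled dictionary must be
 * (re)built from the ispell source files.
 *
 * Returns  1 if the precompiled copy is missing or older than the source,
 *          0 if the precompiled copy is up to date,
 *         -1 if the source (or the copy, for reasons other than ENOENT)
 *            cannot be stat'ed.
 */
int
precompiled_is_stale(const char *source_path, const char *compiled_path)
{
    struct stat src;
    struct stat bin;

    if (stat(source_path, &src) != 0)
        return -1;              /* no source: nothing to compile from */

    if (stat(compiled_path, &bin) != 0)
        return (errno == ENOENT) ? 1 : -1;  /* missing copy: precompile */

    /* A copy with an older timestamp than the source is out of date. */
    return difftime(bin.st_mtime, src.st_mtime) < 0 ? 1 : 0;
}
```

It deliberately ignores what happens when two backends find a stale copy at the same moment; handling that (for example by compiling into a temporary file and rename()-ing it into place) is the safe-activation problem raised above.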

>
> I like the idea of a precompiler step mainly because it still gives you
> most of the benefits of the patch on platforms without mmap.  (Instead
> of mmap'ing, just open and read() the precompiled file.)  In particular,
> you would still have a creditable improvement for Windows users without
> writing any Windows-specific code.
>

Loading about 10 MB takes about 30 ms on my computer. That is better
than 90 ms, but it isn't a real win.
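The fallback Tom suggests for platforms without mmap, reading the precompiled file with open() and read(), can share one loader with the mmap path. A minimal POSIX sketch follows; `LoadedFile`, `load_precompiled`, and `unload_precompiled` are hypothetical names, and the runtime `try_mmap` flag stands in for what would more likely be a compile-time check.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a loaded precompiled dictionary file. */
typedef struct
{
    void       *data;
    size_t      len;
    int         mapped;     /* 1 if data came from mmap(), 0 if malloc()+read() */
} LoadedFile;

/* Load the whole file read-only; try mmap() first if requested. */
int
load_precompiled(const char *path, LoadedFile *out, int try_mmap)
{
    struct stat st;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
    {
        close(fd);
        return -1;              /* mmap() of zero bytes is invalid anyway */
    }
    out->len = (size_t) st.st_size;

    if (try_mmap)
    {
        void       *p = mmap(NULL, out->len, PROT_READ, MAP_SHARED, fd, 0);

        if (p != MAP_FAILED)
        {
            close(fd);          /* the mapping stays valid after close() */
            out->data = p;
            out->mapped = 1;
            return 0;
        }
        /* mmap failed: fall through to the plain read() path */
    }

    out->data = malloc(out->len);
    if (out->data == NULL)
    {
        close(fd);
        return -1;
    }
    for (size_t done = 0; done < out->len;)
    {
        ssize_t     n = read(fd, (char *) out->data + done, out->len - done);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
        {
            free(out->data);
            close(fd);
            return -1;
        }
        done += (size_t) n;
    }
    close(fd);
    out->mapped = 0;
    return 0;
}

/* Release the buffer with the call that matches how it was obtained. */
void
unload_precompiled(LoadedFile *f)
{
    if (f->mapped)
        munmap(f->data, f->len);
    else
        free(f->data);
}
```

Either way the caller gets a read-only buffer of the whole file. Only the mmap branch lets backends share the file's pages through the OS page cache; the read() path copies the whole file into each backend's own memory.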


>> I think we can divide this problem into three parts:
>
>> a) a simple allocator: it could be used for more than just TSearch dictionaries.
>
> I think that's a waste of time, frankly.  There aren't enough potential
> use cases.
>
>> b) sharing data: it is important for large dictionaries
>
> Useful but not really essential.
>
>> c) preloading: it decreases the load time of the first TSearch query
>
> This is the part that is the make-or-break benefit of the patch.
> You need a solution that cuts load time even when mmap isn't
> available.
>

I am not sure whether such a solution exists, or whether it is
necessary. The main problem is probably with the Czech language; we
have a few specialities. In the Czech environment, UNIX and Windows are
the most important platforms. I have no information about Postgres and
full-text search being used on other platforms here. So the solution
probably doesn't need to be in core. I am now thinking about a
pgfoundry project, something like an ispell dictionary preload.

I can send just a simplified version without preloading and sharing,
solving only the memory issue; I think there is no disagreement about
that part.
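Part (a), a simple allocator, could look something like the bump-pointer arena below: it hands out aligned chunks from large blocks and releases everything at once, which suits a dictionary that is built once and kept for the session. This is an illustrative sketch, not the allocator from the patch; `Arena`, `arena_alloc`, `arena_free_all`, and the block size and alignment values are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BLOCK_SIZE (64 * 1024)    /* hypothetical block size */
#define ARENA_ALIGN      8              /* hypothetical; must be a power of two */

typedef struct ArenaBlock
{
    struct ArenaBlock *next;    /* previously allocated block */
    size_t      used;           /* bytes handed out from this block */
    size_t      size;           /* usable bytes after the header */
    /* chunk data follows the (aligned) header */
} ArenaBlock;

typedef struct
{
    ArenaBlock *head;           /* block currently being carved up */
} Arena;

static size_t
align_up(size_t n)
{
    return (n + ARENA_ALIGN - 1) & ~((size_t) ARENA_ALIGN - 1);
}

/* Return an aligned chunk; there is no way to free it individually. */
void *
arena_alloc(Arena *a, size_t size)
{
    size_t      header = align_up(sizeof(ArenaBlock));
    ArenaBlock *b = a->head;

    size = align_up(size);
    if (b == NULL || b->size - b->used < size)
    {
        /* Start a new block; oversized requests get a block of their own. */
        size_t      cap = size > ARENA_BLOCK_SIZE ? size : ARENA_BLOCK_SIZE;

        b = malloc(header + cap);
        if (b == NULL)
            return NULL;
        b->next = a->head;
        b->used = 0;
        b->size = cap;
        a->head = b;
    }
    void       *p = (char *) b + header + b->used;

    b->used += size;
    return p;
}

/* Free every block (and therefore every chunk) in one pass. */
void
arena_free_all(Arena *a)
{
    ArenaBlock *b = a->head;

    while (b != NULL)
    {
        ArenaBlock *next = b->next;

        free(b);
        b = next;
    }
    a->head = NULL;
}
```

Because it keeps no per-chunk header and never frees single chunks, many small allocations cost only their aligned size; the trade-off is that memory comes back only when the whole dictionary is discarded.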

best regards

Pavel Stehule

>                        regards, tom lane
>

