Re: WIP: shared ispell dictionary - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: WIP: shared ispell dictionary
Date	March 18, 2010 12:08:56
Msg-id	162867791003180808p49a047cfj72d1d89ce5121d9e@mail.gmail.com
In response to	Re: WIP: shared ispell dictionary (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: WIP: shared ispell dictionary
List	pgsql-hackers
2010/3/18 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I know that Tom worries about the use of shared memory.
>
> You're right, and if I have any say in the matter no patch like this
> will ever go in.
>
> What I would suggest looking into is some way of preprocessing the raw
> text dictionary file into a format that can be slurped into memory
> quickly.  The main problem compared to the way things are done now
> is that the current internal format relies heavily on pointers.
> Maybe you could replace those by offsets?
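
The offsets idea above can be sketched in C. This is only an illustration under stated assumptions, not the ispell code: FlatHeader, FlatNode and the flat_* functions are hypothetical names, and a plain binary search tree of words stands in for the real dictionary structures. Every link is a byte offset from the start of the image, so the image stays valid wherever it is loaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical image layout: a header at offset 0, then nodes and words.
 * Offset 0 therefore never names a node and is used to mean "none". */
typedef struct FlatHeader
{
    uint32_t root;   /* offset of the root node, 0 if the tree is empty */
    uint32_t used;   /* bytes in use, including this header */
} FlatHeader;

typedef struct FlatNode
{
    uint32_t left;   /* offset of the left child, 0 if none */
    uint32_t right;  /* offset of the right child, 0 if none */
    uint32_t word;   /* offset of the NUL-terminated word */
} FlatNode;

/* Create an empty image of cap bytes; returns NULL on failure. */
char *
flat_create(uint32_t cap)
{
    char *base;

    if (cap < sizeof(FlatHeader))
        return NULL;
    base = calloc(1, cap);
    if (base != NULL)
        ((FlatHeader *) base)->used = sizeof(FlatHeader);
    return base;
}

/* Reserve nbytes (rounded up to 4) in the image; returns 0 if it is full. */
uint32_t
flat_alloc(char *base, uint32_t cap, uint32_t nbytes)
{
    FlatHeader *hdr = (FlatHeader *) base;
    uint32_t    off = hdr->used;

    assert(off <= cap);
    nbytes = (nbytes + 3u) & ~3u;
    if (nbytes > cap - off)
        return 0;
    hdr->used = off + nbytes;
    return off;
}

/* Insert a word; returns 1 on success or if already present, 0 if full. */
int
flat_insert(char *base, uint32_t cap, const char *word)
{
    uint32_t   *link = &((FlatHeader *) base)->root;
    size_t      len = strlen(word) + 1;
    uint32_t    node_off, word_off;
    FlatNode   *node;

    if (len > cap)
        return 0;
    /* Walk down to the empty link where the new node belongs. */
    while (*link != 0)
    {
        int cmp;

        node = (FlatNode *) (base + *link);
        cmp = strcmp(word, base + node->word);
        if (cmp == 0)
            return 1;
        link = (cmp < 0) ? &node->left : &node->right;
    }
    /* The buffer never moves, so link stays valid across flat_alloc. */
    node_off = flat_alloc(base, cap, sizeof(FlatNode));
    word_off = (node_off != 0) ? flat_alloc(base, cap, (uint32_t) len) : 0;
    if (word_off == 0)
        return 0;
    node = (FlatNode *) (base + node_off);
    node->left = node->right = 0;
    node->word = word_off;
    memcpy(base + word_off, word, len);
    *link = node_off;
    return 1;
}

/* Look a word up using only the image base and the stored offsets. */
int
flat_contains(const char *base, const char *word)
{
    uint32_t off = ((const FlatHeader *) base)->root;

    while (off != 0)
    {
        const FlatNode *node = (const FlatNode *) (base + off);
        int cmp = strcmp(word, base + node->word);

        if (cmp == 0)
            return 1;
        off = (cmp < 0) ? node->left : node->right;
    }
    return 0;
}
```

Because nothing in the image is an absolute address, the first `used` bytes could be written to a file once and later read or mapped at any address; flat_contains works on the copy unchanged.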

Then you have to maintain a new application :( That can bring a new kind of bug.

I am playing with the preload solution now, and I found a new issue.

I don't know why, but when I preload a library that uses a lot of
memory, like ispell, all subsequent operations become much slower :(
In the session below, select 10 goes from about 0,3 ms to about 12,6 ms.

[pavel@nemesis tsearch]$ psql-dev3 postgres
Timing is on.
psql-dev3 (9.0devel)
Type "help" for help.

postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,611 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,277 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,266 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,348 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |     lexemes
-----------+-------------------+---------+---------------------------+------------------+-----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)

Time: 24,495 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |     lexemes
-----------+-------------------+---------+---------------------------+------------------+-----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)

Time: 18,426 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 12,700 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 12,465 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 12,603 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 12,901 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 12,642 ms

When I reduce memory usage with a simple allocator, the issue goes
away, but it is strange.
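
For readers unfamiliar with the term, one common shape of a "simple allocator" is a bump (arena) allocator: it carves many small objects out of a few large blocks, so each object pays no per-chunk header. The sketch below is an assumption about what such an allocator might look like, not the code used in this patch; Arena, ArenaBlock, ARENA_BLOCK_SIZE and the arena_* functions are hypothetical names, and plain malloc stands in for the backend's memory contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BLOCK_SIZE ((size_t) 64 * 1024)   /* assumed default block size */

/* Block header; the union pads it so the payload after it is max-aligned. */
typedef union ArenaBlock
{
    struct
    {
        union ArenaBlock *next;  /* previously filled block, or NULL */
        size_t            used;  /* payload bytes handed out so far */
        size_t            cap;   /* payload bytes available in total */
    } h;
    max_align_t align;
} ArenaBlock;

typedef struct Arena
{
    ArenaBlock *head;            /* block currently being filled */
} Arena;

/* Hand out size bytes; individual objects are never freed on their own. */
void *
arena_alloc(Arena *arena, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    ArenaBlock  *blk;
    void        *p;

    assert(arena != NULL);
    if (size > SIZE_MAX / 2)
        return NULL;
    /* Round up so the next object is aligned too (align is a power of 2). */
    size = (size + align - 1) & ~(align - 1);

    blk = arena->head;
    if (blk == NULL || size > blk->h.cap - blk->h.used)
    {
        /* Start a new block; oversized requests get a block of their own. */
        size_t cap = (size > ARENA_BLOCK_SIZE) ? size : ARENA_BLOCK_SIZE;

        blk = malloc(sizeof(ArenaBlock) + cap);
        if (blk == NULL)
            return NULL;
        blk->h.next = arena->head;
        blk->h.used = 0;
        blk->h.cap = cap;
        arena->head = blk;
    }
    p = (char *) (blk + 1) + blk->h.used;
    blk->h.used += size;
    return p;
}

/* Release every block at once. */
void
arena_free_all(Arena *arena)
{
    assert(arena != NULL);
    while (arena->head != NULL)
    {
        ArenaBlock *next = arena->head->h.next;

        free(arena->head);
        arena->head = next;
    }
}
```

The trade-off is that nothing can be freed individually, which fits data that is loaded once and kept for the life of the process.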

Pavel


>
>                        regards, tom lane
>