Re: big data - slow select (speech search) - Mailing list pgsql-performance

From Michal Fapso
Subject Re: big data - slow select (speech search)
Date
Msg-id AANLkTimq2ap3WEQq1U8lMR0Sx8vfI8cXY1D69ouYrqno@mail.gmail.com
Whole thread Raw
In response to Re: big data - slow select (speech search)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: big data - slow select (speech search)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Hi Robert,

thank you for your help. I tried to cluster the table on
hyps_wordid_index and the query execution time dropped from 4.43 to
0.19 seconds which is not that far from Lucene's performance of 0.10
second.

Thanks a lot!
Miso Fapso

On 6 July 2010 02:25, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso <michal.fapso@gmail.com> wrote:
>> It took about 4.5 seconds. If I rerun it, it takes
>> less than 2 miliseconds, but it is because of the cache. I need to
>> optimize the first-run.
>>
>> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>>
>> EXPLAIN ANALYZE SELECT h1.docid
>> FROM hyps AS h1
>> WHERE h1.wordid=65658;
>>
>>  Bitmap Heap Scan on hyps h1  (cost=10.97..677.09 rows=171 width=4)
>> (actual time=62.106..4416.864 rows=343 loops=1)
>>   Recheck Cond: (wordid = 65658)
>>   ->  Bitmap Index Scan on hyps_wordid_index  (cost=0.00..10.92
>> rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>>         Index Cond: (wordid = 65658)
>>  Total runtime: 4432.015 ms
>>
>> If I run the same query in Lucene search engine, it takes 0.105
>> seconds on the same data which is quite a huge difference.
>
> So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12
> ms/row.  I'm not an expert on seek times, but that might not really be
> that unreasonable, considering that those rows may be scattered all
> over the index and thus it may be basically random I/O.  Have you
> tried clustering hyps on hyps_wordid_index?  If you had a more
> sophisticated disk subsystem you could try increasing
> effective_io_concurrency but that's not going to help with only one
> spindle.
>
> If you run the same query in Lucene and it takes only 0.105 s, then
> Lucene is obviously doing a lot less I/O.  I doubt that any amount of
> tuning of your existing schema is going to produce that kind of result
> on PostgreSQL.  Using the full-text search stuff, or a gin index of
> some kind, might get you closer, but it's hard to beat a
> special-purpose engine that implements exactly the right algorithm for
> your use case.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.
Next
From: damien hostin
Date:
Subject: Re: Slow query with planner row strange estimation