Re: big data - slow select (speech search) - Mailing list pgsql-performance

From Robert Haas
Subject Re: big data - slow select (speech search)
Date
Msg-id AANLkTilwM_d9b_7jvYs2c9TrM5FJjxqk3LR8Vlfve4xj@mail.gmail.com
Whole thread Raw
In response to big data - slow select (speech search)  (Michal Fapso <michal.fapso@gmail.com>)
Responses Re: big data - slow select (speech search)  (Michal Fapso <michal.fapso@gmail.com>)
List pgsql-performance
On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso <michal.fapso@gmail.com> wrote:
> It took about 4.5 seconds. If I rerun it, it takes
> less than 2 miliseconds, but it is because of the cache. I need to
> optimize the first-run.
>
> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>
> EXPLAIN ANALYZE SELECT h1.docid
> FROM hyps AS h1
> WHERE h1.wordid=65658;
>
>  Bitmap Heap Scan on hyps h1  (cost=10.97..677.09 rows=171 width=4)
> (actual time=62.106..4416.864 rows=343 loops=1)
>   Recheck Cond: (wordid = 65658)
>   ->  Bitmap Index Scan on hyps_wordid_index  (cost=0.00..10.92
> rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>         Index Cond: (wordid = 65658)
>  Total runtime: 4432.015 ms
>
> If I run the same query in Lucene search engine, it takes 0.105
> seconds on the same data which is quite a huge difference.

So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12
ms/row.  I'm not an expert on seek times, but that might not really be
that unreasonable, considering that those rows may be scattered all
over the index and thus it may be basically random I/O.  Have you
tried clustering hyps on hyps_wordid_index?  If you had a more
sophisticated disk subsystem you could try increasing
effective_io_concurrency but that's not going to help with only one
spindle.

If you run the same query in Lucene and it takes only 0.105 s, then
Lucene is obviously doing a lot less I/O.  I doubt that any amount of
tuning of your existing schema is going to produce that kind of result
on PostgreSQL.  Using the full-text search stuff, or a gin index of
some kind, might get you closer, but it's hard to beat a
special-purpose engine that implements exactly the right algorithm for
your use case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

pgsql-performance by date:

Previous
From: MUHAMMAD ASIF
Date:
Subject: Re: using dbt2 postgresql 8.4 - rampup time issue
Next
From: Robert Haas
Date:
Subject: Re: Performance issues with postgresql-8.4.0: Query gets stuck sometimes