Thread: GIN index not used

GIN index not used

From
Mark
Date:
I have problem with GIN index. Queries over it takes a lot of time. Some
informations:

I've got a table with tsvector- textvector:
CREATE TABLE mediawiki.pagecontent
(
  old_id integer NOT NULL DEFAULT
nextval('mediawiki.text_old_id_seq'::regclass),
  old_text text,
  old_flags text,
  textvector tsvector,
  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
)
The table has about 311 000 rows.
I've created GIN index over textvector:
CREATE INDEX gin_index2 ON mediawiki.pagecontent USING gin (textvector);

At first all was ok, but last days I find out that the queries take a lot of
time (10sec and more). When I start EXPLAIN ANALIZE of the query
SELECT old_id FROM mediawiki.pagecontent WHERE (textvector @@
(to_tsquery('den')))
the result is this:
"Bitmap Heap Scan on pagecontent  (cost=8677.26..26663.01 rows=9037 width=4)
(actual time=8.258..8131.677 rows=9093 loops=1)"
"  Recheck Cond: (textvector @@ to_tsquery('den'::text))"
"  ->  Bitmap Index Scan on gin_index2  (cost=0.00..8675.00 rows=9037
width=0) (actual time=6.002..6.002 rows=9093 loops=1)"
"        Index Cond: (textvector @@ to_tsquery('den'::text))"
"Total runtime: 8150.949 ms"

It seems that the GIN index was not used. At first I though, that there is
some mess in the db, cause I provided some experiments(delete and import
data and GIN index), so I used vacuum, afterwards I tried pg_dump and
pg_restore, but it did not help.
Could you please point me in the right direction, where could be the
problem?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/GIN-index-not-used-tp4344822p4344822.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: GIN index not used

From
Alban Hertroys
Date:
On 27 Apr 2011, at 22:55, Mark wrote:

> I've got a table with tsvector- textvector:
> CREATE TABLE mediawiki.pagecontent
> (
>  old_id integer NOT NULL DEFAULT
> nextval('mediawiki.text_old_id_seq'::regclass),
>  old_text text,
>  old_flags text,
>  textvector tsvector,
>  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
> )
> The table has about 311 000 rows.
> I've created GIN index over textvector:
> CREATE INDEX gin_index2 ON mediawiki.pagecontent USING gin (textvector);

> When I start EXPLAIN ANALIZE of the query

> SELECT old_id FROM mediawiki.pagecontent WHERE (textvector @@
> (to_tsquery('den')))
> the result is this:
> "Bitmap Heap Scan on pagecontent  (cost=8677.26..26663.01 rows=9037 width=4)
> (actual time=8.258..8131.677 rows=9093 loops=1)"
> "  Recheck Cond: (textvector @@ to_tsquery('den'::text))"
> "  ->  Bitmap Index Scan on gin_index2  (cost=0.00..8675.00 rows=9037
> width=0) (actual time=6.002..6.002 rows=9093 loops=1)"
> "        Index Cond: (textvector @@ to_tsquery('den'::text))"
> "Total runtime: 8150.949 ms"
>
> It seems that the GIN index was not used.

But it _is_ being used. It takes almost 1ms per row it finds though, I wonder why that is. Maybe you're using slow
hardwareor you didn't provide PG with enough resources to work with efficiently? 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4db8fc3811671483513299!



Re: GIN index not used

From
Mark
Date:
Alban thanks for your quick reply.
It is true that I use for this only 2,5GB RAM on Intel Core i5 CPU 2.67GHz
and resources I didn't changed from instalation of postgres:
max_connections = 100
shared_buffers = 32MB
(other parameters are commented)
, but that would not be the reason I think.

I was maybe wrong to explain. As I said at first all was ok. That means that
after creating the index one query took less time.
For example query over:
tsquery(liquid & water) - 38ms, 219 results

but now it is like this:
tsquery(liquid & water) - 2859ms, 219 results

I though that it was caused by some "magic" over this table (I have deleted
and inserted data and index again).
That is why I tried mentioned vacuum and pg_dump with pg_restore.
Could it be caused by this?


--
View this message in context: http://postgresql.1045698.n5.nabble.com/GIN-index-not-used-tp4344826p4345976.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: GIN index not used

From
Alban Hertroys
Date:
On 28 Apr 2011, at 10:07, Mark wrote:

> Alban thanks for your quick reply.
> It is true that I use for this only 2,5GB RAM on Intel Core i5 CPU 2.67GHz
> and resources I didn't changed from instalation of postgres:
> max_connections = 100
> shared_buffers = 32MB
> (other parameters are commented)
> , but that would not be the reason I think.

It probably is, the default Postgres settings are quite modest and GIN indexes are memory hungry.
I think you need to increase shared_buffers. With 2.5GB of memory (such a strange number) the docs suggest about 250MB.
See http://www.postgresql.org/docs/current/static/runtime-config-resource.html for details.

> I was maybe wrong to explain. As I said at first all was ok. That means that
> after creating the index one query took less time.
> For example query over:
> tsquery(liquid & water) - 38ms, 219 results
>
> but now it is like this:
> tsquery(liquid & water) - 2859ms, 219 results

Presumably the index did fit in your available shared buffers before your delete/insert actions, but now it doesn't
anymore(which possibly means there are still dead rows referenced by the index). I'm also not sure why you're getting a
bitmapindex scan instead of a normal index scan, maybe that's normal for GIN indexes (considering what I know about how
theywork internally, that seems plausible). 

What version of Postgres is this?
If it's 9.0, you could try VACUUM FULL (before you increase shared_buffers) and see whether your performance is back
whereit was, which would indicate that dead rows are the cause. In older versions you could as well, but you'll want to
REINDEXafter that. That's not a solution though, only a test ;) 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4db987e912122982898556!



Re: GIN index not used

From
Alban Hertroys
Date:
On 28 Apr 2011, at 17:29, Alban Hertroys wrote:
> With 2.5GB of memory (such a strange number) the docs suggest about 250MB.

Correction, 25% of 2.5GB isn't 250MB of course. It would be somewhat over 500MB, although it's really just a
rule-of-thumb(no point in calculating exact numbers). Anyway, much more than you've currently configured, so even 250MB
wouldprobably show a significant improvement. 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4db98c8412122209668288!



Re: GIN index not used

From
Mark
Date:
Alban thank for your ideas
> It probably is, the default Postgres settings are quite modest and GIN
> indexes are memory hungry.
> I think you need to increase shared_buffers. With 2.5GB of memory (such a
> strange number) the docs        > suggest about 250MB.
> See
> http://www.postgresql.org/docs/current/static/runtime-config-resource.html
> for details.
Well I did not mentioned that it all runs on Virtual Machine so thats why
2.5GB. I could not assigned more.
Today I have tried the VM machine on PC where I assigned 9GB of memory to
this VM. There I set the shared_buffers on 900MB, but there was not any
improvement.

> What version of Postgres is this?
I use 9.0 and I also tried Vacuum FULL, but it did not help.
It is very strange. I have tried everythink, but nothing helped :-(
Any other ideas?
--
View this message in context: http://postgresql.1045698.n5.nabble.com/GIN-index-not-used-tp4344826p4359235.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: GIN index not used

From
Mark
Date:
Alban thank for your ideas
> It probably is, the default Postgres settings are quite modest and GIN
> indexes are memory hungry.
> I think you need to increase shared_buffers. With 2.5GB of memory (such a
> strange number) the docs        > suggest about 250MB.
> See
> http://www.postgresql.org/docs/current/static/runtime-config-resource.html
> for details.
Well I did not mentioned that it all runs on Virtual Machine so that's why
2.5GB. I could not assigned more.
Today I have tried the VM machine on PC where I assigned 9GB of memory to
this VM. There I set the shared_buffers on 900MB. You were right it wa's
definitely caused by the size of memory.
I do not understand why before it was such quick even, if I had less memory.

Thanks very much for your help :-)--
View this message in context: http://postgresql.1045698.n5.nabble.com/GIN-index-not-used-tp4344826p4361529.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.