Thread: GiST or GIN, I feel like I am doing something wrong
We are testing full text searching on a small chunk of our data. We have created an INDEX to make searching faster. From the PostgreSQL 8.3 docs, we are running 8.3.7, it seems we should be running GIN indexes. The reason GIN on paper seems like the right INDEX: * we have static data * we have over 241071 unique words (lexemes) * GIN index lookups are about three times faster and we are 99.9% searching The problem is that we have been testing with both INDEX types and GiST is killing GIN. I believe it has to do with the size of our GiST index. SELECT * from relation_size where relation like '%full%'; relation | size --------------------------------------------+-------- public.profile_images_fulltext_gin | 437 MB public.profile_images_fulltext | 161 MB public.profile_images_fulltext_gist | 66 MB public.profile_images_fulltext_pif_key_key | 18 MB (4 rows) So my questions... Why is the GiST index so large? Would the large size likely effect performance? Am I doing something fundamentally wrong? Yes I was sure to ANALYZE public.profile_images_fulltext between all my INDEX DROP and CREATE More details: owl=# \d profile_images_fulltext Table "public.profile_images_fulltext" Column | Type | Modifiers ------------------+-----------------------------+--------------- pif_key | bigint | not null content | tsvector | datetime_created | timestamp without time zone | default now() raw | text | owl=# SELECT count (pif_key) from public.profile_images_fulltext; count -------- 630699 (1 row) owl=# SELECT count(word) FROM ts_stat('SELECT content FROM profile_images_fulltext'); count -------- 241071 (1 row) CREATE INDEX profile_images_fulltext_gin ON profile_images_fulltext USING gin(content); CREATE INDEX profile_images_fulltext_gist ON profile_images_fulltext USING gist(content); Any and all thoughts would be greatly appreciated, Aaron Thul http://www.chasingnuts.com Life is complex: it has real and imaginary components.
Aaron <aaron@chasingnuts.com> writes: > CREATE INDEX profile_images_fulltext_gin ON profile_images_fulltext > USING gin(content); > CREATE INDEX profile_images_fulltext_gist ON profile_images_fulltext > USING gist(content); What did you have maintenance_work_mem set to while you did this? GIST doesn't care, but GIN likes to have lots of workspace while building an index. I'm not entirely sure if small workspace only affects build time, or if it could result in a bloated/inefficient index ... but if the latter is true it might explain your results. regards, tom lane
Tom, Our maintenance_work_mem is 1024MB so there should have been plenty of memory for INDEX creation. I happened to be watching top when we created the GiN INDEX and the process used about 500MB of non-shared memory. Aaron Thul http://www.chasingnuts.com On Wed, Jun 17, 2009 at 11:30 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Aaron <aaron@chasingnuts.com> writes: >> CREATE INDEX profile_images_fulltext_gin ON profile_images_fulltext >> USING gin(content); >> CREATE INDEX profile_images_fulltext_gist ON profile_images_fulltext >> USING gist(content); > > What did you have maintenance_work_mem set to while you did this? > GIST doesn't care, but GIN likes to have lots of workspace while > building an index. I'm not entirely sure if small workspace only > affects build time, or if it could result in a bloated/inefficient > index ... but if the latter is true it might explain your results. > > regards, tom lane >
Aaron, did you actually check performance of search in both cases ? GiST index can be small but very inefficient, since top-level signatures can be degenerated, so we just remove them. It's easy to see that looking in explain analyze - see difference between number of rows found by index and actual number after recheck. And remember, recheck needs access to the heap, which is slow and can kill performance. GIN is big, but it should be fine for your setup. Also, see explain analyze. Oleg On Wed, 17 Jun 2009, Aaron wrote: > We are testing full text searching on a small chunk of our data. We > have created an INDEX to make searching faster. From the PostgreSQL > 8.3 docs, we are running 8.3.7, it seems we should be running GIN > indexes. The reason GIN on paper seems like the right INDEX: > * we have static data > * we have over 241071 unique words (lexemes) > * GIN index lookups are about three times faster and we are 99.9% searching > > The problem is that we have been testing with both INDEX types and > GiST is killing GIN. I believe it has to do with the size of our GiST > index. > SELECT * from relation_size where relation like '%full%'; > relation | size > --------------------------------------------+-------- > public.profile_images_fulltext_gin | 437 MB > public.profile_images_fulltext | 161 MB > public.profile_images_fulltext_gist | 66 MB > public.profile_images_fulltext_pif_key_key | 18 MB > (4 rows) > > So my questions... > Why is the GiST index so large? > Would the large size likely effect performance? > Am I doing something fundamentally wrong? > Yes I was sure to ANALYZE public.profile_images_fulltext between all > my INDEX DROP and CREATE > > More details: > owl=# \d profile_images_fulltext > Table "public.profile_images_fulltext" > Column | Type | Modifiers > ------------------+-----------------------------+--------------- > pif_key | bigint | not null > content | tsvector | > datetime_created | timestamp without time zone | default now() > raw | text | > > owl=# SELECT count (pif_key) from public.profile_images_fulltext; > count > -------- > 630699 > (1 row) > > owl=# SELECT count(word) FROM ts_stat('SELECT content FROM > profile_images_fulltext'); > count > -------- > 241071 > (1 row) > > CREATE INDEX profile_images_fulltext_gin ON profile_images_fulltext > USING gin(content); > CREATE INDEX profile_images_fulltext_gist ON profile_images_fulltext > USING gist(content); > > > > Any and all thoughts would be greatly appreciated, > Aaron Thul > http://www.chasingnuts.com > Life is complex: it has real and imaginary components. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83