Peculiar performance observation.... - Mailing list pgsql-general

From Net Virtual Mailing Lists
Subject Peculiar performance observation....
Date
Msg-id 20050312105407.593@mail.net-virtual.com
Whole thread Raw
Responses Re: Peculiar performance observation....
List pgsql-general
I have a rather peculiar performance observation and would welcome any
feedback on this.....

First off, the main table (well, part of it.. it is quite large..):


                                             Table "table1"
       Column       |           Type           |
   Modifiers
--------------------+--------------------------
+-----------------------------------------------------------------
 id                 | integer                  | not null default
nextval('master.id_seq'::text)
 user_id            | integer                  |
 ... (skipping about 20 columns)
 category           | ltree[]                  |
 somedata           | text                     | not null



Indexes:
    "table1_pkey" primary key, btree (id)
    "table1_category_full_gist_idx" gist (category)
    "table1_id_idx" btree (id)
    "table1_fti_idx" gist (fti) WHERE ((status)::text = 'open'::text)
    "table1_user_id_idx" btree (user_id)


database=> explain analyze select id from table1 where category <@ 'a.b';
                                                                    QUERY
PLAN
-------------------------------------
-------------------------------------
-------------------------------------------------------------------------
 Index Scan using table1_category_full_gist_idx on jobdata
(cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
rows=1943 loops=1)
   Index Cond: (category <@ 'a.b'::ltree)
   Filter: (category <@ 'a.b'::ltree)
 Total runtime: 12222.258 ms


If I do this:

create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs=> explain analyze select id from table1 where id in (select id from
yuck where category <@ 'a.b');
                                                              QUERY PLAN

-------------------------------------
-------------------------------------
-------------------------------------------------------------
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual
time=654.645..1245.212 rows=1943 loops=1)
   ->  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual
time=654.202..690.709 rows=1943 loops=1)
         ->  Index Scan using category_idx on yuck  (cost=0.00..108.57
rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
               Index Cond: (category <@ 'a.b'::ltree)
               Filter: (category <@ 'a.b'::ltree)
   ->  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
width=52) (actual time=0.219..0.235 rows=1 loops=1943)
         Index Cond: (table1.id = "outer".id)
 Total runtime: 1261.551 ms
(8 rows)


In the first query, my hard disk trashes audibly the entire 12 seconds
(this is actually the best run I could get, it is usually closer to 20
seconds), the second query runs almost effortlessly..  I've tried
reindexing, even dropping the index and recreating it but nothing I do
helps at all.

Now keep in mind that I do all of my development on painfully slow
hardware in order to make any performance issues really stand out.  But,
I've done this on production servers too with an equal performance
improvement noticed.

I just can't figure out why this second query is so much faster, I feel
like I must have done something very wrong in my schema design or
something to be suffering this sort of a performance loss.   Any idea
what I can do about this?

Thanks as always!

- Greg


pgsql-general by date:

Previous
From: "ILove TheSpam"
Date:
Subject: Re: Unique Indexes
Next
From: "Net Virtual Mailing Lists"
Date:
Subject: Re: Peculiar performance observation....