Re: Peculiar performance observation.... - Mailing list pgsql-general

From Net Virtual Mailing Lists
Subject Re: Peculiar performance observation....
Date
Msg-id 20050315031431.1274@mail.net-virtual.com
In response to Re: Peculiar performance observation....  (Scott Marlowe <smarlowe@g2switchworks.com>)
Responses Re: Peculiar performance observation....
List pgsql-general
>On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote:
>> Hello,
>>
>>
>> I am sorry to bring this up again.... Does anyone have any idea what
>> might be going on here?...   I'm very worried about this situation.. ;-(
>
>It looks to me like either you're not analyzing often enough, or your
>statistics target is too low to get a good sample.  Note your estimated
>versus real rows are off by a factor of 70 (28 est. versus 1943 actual
>rows). That's a pretty big difference, and where you should be looking.
>
>> >   ->  Seq Scan on jobdata  (cost=0.00..7567.88 rows=28 width=52) (actual time=11.498..4800.907 rows=1943 loops=1)
>
>Yes, this is because PostgreSQL is using an index to approximate a
>sequential scan, which is not a good thing since PostgreSQL can't get
>all the information it needs from just an index, but has to visit the
>table to check visibility.
>


All of these were after a vacuum full analyze, which I actually do
nightly on the database.
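
In case the statistics target really is the issue, it can also be raised
for just this column and the table re-analyzed; a minimal sketch,
assuming the column is "category" on "table1" as in the query below:

alter table table1 alter column category set statistics 1000;
analyze table1;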

I probably confused the issue with all of my posts; this is the query
that has me concerned.  When running it on my system here, the disk
thrashes (and I mean *THRASHES*) for the entire 12-20 seconds it takes
to run.  When running on our production servers, I can't hear the disk,
but I see an equally troubling performance loss when using the index.

database=> explain analyze select id from table1 where category <@ 'a.b';
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using table1_category_full_gist_idx on jobdata  (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1)
   Index Cond: (category <@ 'a.b'::ltree)
   Filter: (category <@ 'a.b'::ltree)
 Total runtime: 12222.258 ms


I can do this to speed things up (it results in very little disk
activity, certainly none of the thrashing the original query caused):


-- narrow side table holding just the two columns the query needs
create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
-- same kind of GiST index as before, but on the narrow table
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs=> explain analyze select id from table1 where id in (select id from yuck where category <@ 'a.b');
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1)
   ->  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1)
         ->  Index Scan using category_idx on yuck  (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
               Index Cond: (category <@ 'a.b'::ltree)
               Filter: (category <@ 'a.b'::ltree)
   ->  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943)
         Index Cond: (table1.id = "outer".id)
 Total runtime: 1261.551 ms
(8 rows)
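
A quick way to see why the narrow copy behaves so differently is to
compare the on-disk footprint of the two tables; a sketch, assuming both
have been analyzed (the original table shows up as "jobdata" in the
plans):

select relname, relpages, reltuples
from pg_class
where relname in ('jobdata', 'yuck');

The index scan has to visit the heap for every matching row, so the
wider and more scattered the table, the more random I/O those 1943
visits turn into.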



If I drop the index "table1_category_full_gist_idx", the query speeds up
dramatically (10-15 times faster on both development and production
systems).
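
(For anyone reproducing this: the same comparison can be made without
actually dropping the index, by steering the planner away from it for
one session:

set enable_indexscan = off;
explain analyze select id from table1 where category <@ 'a.b';
reset enable_indexscan;

This should produce the sequential-scan plan quoted at the top of this
message.)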

So my concern, in short: why is the query so much slower when it
actually uses the index, and why does it make mincemeat out of my hard
drive?


- Greg

