Re: Peculiar performance observation.... - Mailing list pgsql-general
From | Net Virtual Mailing Lists |
---|---|
Subject | Re: Peculiar performance observation.... |
Date | |
Msg-id | 20050315031431.1274@mail.net-virtual.com |
In response to | Re: Peculiar performance observation.... (Scott Marlowe <smarlowe@g2switchworks.com>) |
Responses | Re: Peculiar performance observation.... |
List | pgsql-general |
>On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote:
>> Hello,
>>
>> I am sorry to bring this up again.... Does anyone have any idea what
>> might be going on here?... I'm very worried about this situation.. ;-(
>
>It looks to me like either you're not analyzing often enough, or your
>statistics target is too low to get a good sample.  Note your estimated
>versus real rows are off by a factor of 70 (28 est. versus 1943 actual
>rows).  That's a pretty big difference, and where you should be looking.
>
>> > ->  Seq Scan on jobdata  (cost=0.00..7567.88 rows=28 width=52) (actual
>> > time=11.498..4800.907 rows=1943 loops=1)
>
>Yes, this is because PostgreSQL is using an index to approximate a
>sequential scan, which is not a good thing since PostgreSQL can't get
>all the information it needs from just an index, but has to visit the
>table to check visibility.

All of these were after a vacuum full analyze, which I actually do
nightly on the database.  I probably confused the issue with all of my
posts; this is the query which has me concerned.  When running it on my
system here, the disk thrashes (and I mean *THRASHES*) the entire 12-20
seconds it takes to run.  When running it on our production servers, I
can't hear the disk, but I see an equally troubling performance loss when
using the index.
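Following up on the statistics-target suggestion quoted above, a minimal sketch of raising the per-column statistics target and re-analyzing (the table and column names are the ones from this thread; the target value 100 is just an illustrative choice, not a tuned one):

```sql
-- Raise the sample size the planner uses for this column
-- (the default target is 10); 100 here is only an example.
ALTER TABLE table1 ALTER COLUMN category SET STATISTICS 100;

-- Re-collect statistics so the new target takes effect.
ANALYZE table1;
```

With a larger sample, the 28-estimated vs. 1943-actual row gap should narrow, which changes the cost comparison between the index scan and a sequential scan.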
database=> explain analyze select id from table1 where category <@ 'a.b';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using table1_category_full_gist_idx on jobdata  (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528 rows=1943 loops=1)
   Index Cond: (category <@ 'a.b'::ltree)
   Filter: (category <@ 'a.b'::ltree)
 Total runtime: 12222.258 ms

I can do this to speed things up (this results in very little disk
activity, certainly not the thrashing the original query did):

create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;

jobs=> explain analyze select id from table1 where id in (select id from yuck where category <@ 'a.b');
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual time=654.645..1245.212 rows=1943 loops=1)
   ->  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual time=654.202..690.709 rows=1943 loops=1)
         ->  Index Scan using category_idx on yuck  (cost=0.00..108.57 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
               Index Cond: (category <@ 'a.b'::ltree)
               Filter: (category <@ 'a.b'::ltree)
   ->  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1 width=52) (actual time=0.219..0.235 rows=1 loops=1943)
         Index Cond: (table1.id = "outer".id)
 Total runtime: 1261.551 ms
(8 rows)

If I drop the index "table1_category_full_gist_idx", the query speeds up
dramatically (10-15 times faster on both development and production
systems).  So my concern, in short: why is it so much slower when
actually using the index, and why is it trying to make mincemeat out of
my hard drive?

- Greg
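One way to check whether the slow index scan is paying for scattered heap visits (my guess at the cause of the thrashing, not something verified in this thread) is to look at the physical-order correlation PostgreSQL has recorded for the table's columns. A correlation near 0 means index order and heap order disagree, so each of the 1943 matches can cost a random page read:

```sql
-- pg_stats is a standard system view; 'table1' is the table from this thread.
-- correlation near +/-1 : heap roughly follows this column's order
-- correlation near 0    : index scans on it hit random heap pages
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'table1';
```

The narrow "yuck" table helps for the same reason: its heap is small enough that even random visits stay cheap, while the wide jobdata rows do not.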