Query performance PLEASE HELP - Mailing list pgsql-general

From Dmitry Tkach
Subject Query performance PLEASE HELP
Date
Msg-id 3E3ACEE6.7060603@openratings.com
Whole thread Raw
Responses Re: Query performance PLEASE HELP
List pgsql-general
Hi, everybody!

I have spent the last few days trying to figure out why one (or rather some) of my queries takes so long time, but I am
aboutto give up :-( 

Something seems to be definitely wrong here, but I am out of ideas :-(
I was wonderring if somebody could please take a look at this and, perhaps, give me some clue what is going on? I would
greatlyappreciate that... 

There are two tables involved:

create table tradestyle
(
    duns int,
    type int,
    name text not null
);
create index tradestyle_idx on tradestyle (name);
create unique index tradestyle_duns_idx on tradestyle (duns,type);

create table managed_supplier
(
    id serial primary key,
    duns int not null,
    subscriber int not null
);
create unique index managed_supplier_idx on managed_supplier (subscriber, duns);

tradestyle table is huge (about 50 million rows), managed_supplier is much smaller (under 100K)

The query I am fighting with is something like:


select * from tradestyle ts, managed_supplier ms where ts.duns=ms.duns and ts.name like 'POST%' and ms.subscriber=74
orderby ts.name limit 10; 

This takes close to 20 minutes (!) to run for the first time (it returns almost instanteneouly if I run it again, but
willtake about 20 minutes 
again if I modify the name parameter).

Explain analyze says:

Limit  (cost=61.91..61.91 rows=1 width=192) (actual time=439435.47..439435.50 rows=10 loops=1)
   ->  Sort  (cost=61.91..61.91 rows=1 width=192) (actual time=439435.47..439435.48 rows=11 loops=1)
         ->  Nested Loop  (cost=0.00..61.90 rows=1 width=192) (actual time=7589.68..439423.75 rows=110 loops=1)
               ->  Index Scan using managed_supplier_idx on managed_supplier ms  (cost=0.00..22.02 rows=5 width=157)
(actualtime=6.72..3009.90 rows=14365 loops=1) 
               ->  Index Scan using tradestyle_duns_idx on tradestyle ts  (cost=0.00..6.97 rows=1 width=35) (actual
time=30.34..30.37rows=0 loops=14365) 
Total runtime: 439436.45 msec

Well... this time it was just 7 minutes, but I've seen it beeing 20 (with a different param for the name) too, and, 7
minutesis still kinda a lot! 

I figure, what's killing me here, is having to read the large number of rows from disk for the nested loop... But here
iswhat I get from the stats: 
(The total number of blocks read times block size over 439 seconds it took to run):

select (sum(heap_blks_read) + sum(idx_blks_read))*8192/439  from pg_statio_user_tables where relname in ('tradestyle',
'managed_supplier');
-[ RECORD 1 ]---------------
?column? | 574391.6902050114

500K per second???? This is slower than my DSL line!!!

I do believe, it is really the physical read that takes most of the time
(I sampled it by stopping in the debugger periodically - and every time it was sitting in __libc_read (),
and also, if I run the same exact query again, it returns instanteneously), but I can't understand why is it
taking THIS long????

It is also NOT a faulty drive problem - I have this same database installed on a few different boxes, and I have the
sameproblem everywhere. 

If you guys have any idea what can possibly be going on here, PLEASE HELP!!!!

Thanks a lot!

Dima.

P.S. Here is one more example, with a different name param, it chooses to use a different plan (that actually makes
moresense to me, but whatever), 
and took just under 5 minutes to run (but, the "improvement" is just an illusion, because I ran this thing after the
previousone - so, most of the stuff was cached): 

Limit  (cost=0.00..16.14 rows=1 width=192) (actual time=6926.37..297527.99 rows=10 loops=1)
   ->  Nested Loop  (cost=0.00..16.14 rows=1 width=192) (actual time=6926.36..297527.94 rows=11 loops=1)
         ->  Index Scan using tradestyle_name_idx on tradestyle ts  (cost=0.00..7.98 rows=1 width=35) (actual
time=51.99..295646.78rows=41020 loops=1) 
         ->  Index Scan using managed_supplier_idx on managed_supplier ms  (cost=0.00..5.82 rows=1 width=157) (actual
time=0.04..0.04rows=0 loops=41020) 
Total runtime: 297528.31 msec

I'll greatly appreciate any ideas you could come up with.












pgsql-general by date:

Previous
From: Hervé Piedvache
Date:
Subject: ALTER STATISTIC and dump ?
Next
From: "Gregory Wood"
Date:
Subject: Re: [PERFORM] One large v. many small