Re: Simple queries take forever to run - Mailing list pgsql-performance

From Michael Guerin
Subject Re: Simple queries take forever to run
Date
Msg-id 3F4E6112.2060807@rentec.com
Whole thread Raw
In response to Re: Simple queries take forever to run  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List pgsql-performance
Stephan Szabo wrote:

>On Thu, 28 Aug 2003, Michael Guerin wrote:
>
>
>
>>Stephan Szabo wrote:
>>
>>
>>
>>>On Thu, 28 Aug 2003, Michael Guerin wrote:
>>>
>>>
>>>
>>>
>>>
>>>>Stephan Szabo wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>On Wed, 27 Aug 2003, Michael Guerin wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I'm running into some performance problems trying to execute simple
>>>>>>queries.
>>>>>>
>>>>>>postgresql version 7.3.3
>>>>>>.conf params changed from defaults.
>>>>>>shared_buffers = 64000
>>>>>>sort_mem = 64000
>>>>>>fsync = false
>>>>>>effective_cache_size = 400000
>>>>>>
>>>>>>ex. query: select * from x where id in (select id from y);
>>>>>>
>>>>>>There's an index on each table for id.  SQL Server takes <1s to return,
>>>>>>postgresql doesn't return at all, neither does explain analyze.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
>>>>>generally much better (for reasonably sized subqueries) but in earlier
>>>>>versions you'll probably want to convert into an EXISTS or join form.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>Something else seems to be going on, even switching to an exists clause
>>>>gives much better but poor performance.
>>>>count(*) where exists clause: Postgresql 19s, SQL Server <1s
>>>>count(*) where not exists:  23.3s SQL Server 1.5s
>>>>
>>>>
>>>>
>>>>
>>>What does explain analyze show for the two queries?
>>>
>>>
>>>
>>>
>>>
>>>
>>explain analyze  select count(*) from tbltimeseries where exists(select
>>uniqid  from tblobjectname where timeseriesid = uniqid);
>>Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
>>time=22756.64..22756.64 rows=1 loops=1)
>>   ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
>>width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
>>         Filter: (NOT (subplan))
>>         SubPlan
>>           ->  Index Scan using idx_objectname on tblobjectname
>>(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
>>loops=1200673)
>>                 Index Cond: ($0 = uniqid)
>> Total runtime: 22756.83 msec
>>(7 rows)
>>
>>
>
>Hmm... I'd thought that it had options for a better plan than that.
>
>What do things like:
>
>explain analyze select count(distinct timeseriesid) from tbltimeseries,
> tblobjectname where timeseriesid=uniquid;
>
>and
>
>explain analyze select count(distinct timeseriesid) from
> tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
> where uniqid is null;
>
>give you?
>
>
>
>
much better performance:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
 tblobjectname where timeseriesid=uniquid;

 Aggregate  (cost=7384.03..7384.03 rows=1 width=8) (actual time=668.15..668.15 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..7380.83 rows=1282 width=8) (actual time=333.31..666.13 rows=561 loops=1)
         ->  Seq Scan on tblobjectname  (cost=0.00..33.82 rows=1282 width=4) (actual time=0.05..4.98 rows=1282 loops=1)
         ->  Index Scan using xx on tbltimeseries  (cost=0.00..5.72 rows=1 width=4) (actual time=0.51..0.51 rows=0
loops=1282)
               Index Cond: (tbltimeseries.timeseriesid = "outer".uniqid)
 Total runtime: 669.61 msec
(6 rows)

explain analyze select count(distinct timeseriesid) from
 tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
 where uniqid is null;

 Aggregate  (cost=59144.19..59144.19 rows=1 width=8) (actual time=12699.47..12699.47 rows=1 loops=1)
   ->  Hash Join  (cost=37.02..56142.51 rows=1200673 width=8) (actual time=7.41..6376.12 rows=1200113 loops=1)
         Hash Cond: ("outer".timeseriesid = "inner".uniqid)
         Filter: ("inner".uniqid IS NULL)
         ->  Seq Scan on tbltimeseries  (cost=0.00..44082.73 rows=1200673 width=4) (actual time=0.01..3561.61
rows=1200673loops=1) 
         ->  Hash  (cost=33.82..33.82 rows=1282 width=4) (actual time=4.84..4.84 rows=0 loops=1)
               ->  Seq Scan on tblobjectname  (cost=0.00..33.82 rows=1282 width=4) (actual time=0.04..2.84 rows=1282
loops=1)
 Total runtime: 12699.76 msec
(8 rows)







pgsql-performance by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: The results of my PostgreSQL/filesystem performance
Next
From: "scott.marlowe"
Date:
Subject: Re: Queries sometimes take 1000 times the normal time