Thread: Simple queries take forever to run

Simple queries take forever to run

From
Michael Guerin
Date:
I'm running into some performance problems trying to execute simple
queries.

postgresql version 7.3.3
.conf params changed from defaults.
shared_buffers = 64000
sort_mem = 64000
fsync = false
effective_cache_size = 400000

ex. query: select * from x where id in (select id from y);

There's an index on each table for id.  SQL Server takes <1s to return,
postgresql doesn't return at all, neither does explain analyze.
x has 1200673 rows
y has 1282 rows

It seems like its ignoring the index and not using enough memory.. any
ideas?


Re: Simple queries take forever to run

From
Stephan Szabo
Date:
On Wed, 27 Aug 2003, Michael Guerin wrote:

> I'm running into some performance problems trying to execute simple
> queries.
>
> postgresql version 7.3.3
> .conf params changed from defaults.
> shared_buffers = 64000
> sort_mem = 64000
> fsync = false
> effective_cache_size = 400000
>
> ex. query: select * from x where id in (select id from y);
>
> There's an index on each table for id.  SQL Server takes <1s to return,
> postgresql doesn't return at all, neither does explain analyze.

IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
generally much better (for reasonably sized subqueries) but in earlier
versions you'll probably want to convert into an EXISTS or join form.


Re: Simple queries take forever to run

From
Neil Conway
Date:
On Wed, Aug 27, 2003 at 05:40:05PM -0400, Michael Guerin wrote:
> ex. query: select * from x where id in (select id from y);
>
> There's an index on each table for id.  SQL Server takes <1s to return,
> postgresql doesn't return at all, neither does explain analyze.

This particular form of query is a known performance problem for PostgreSQL
7.3 and earlier -- the problem should hopefully be fixed in 7.4 (currently
in beta). Check the archives for more discussion on this topic.

-Neil


Re: Simple queries take forever to run

From
"Christopher Kings-Lynne"
Date:
> postgresql version 7.3.3
> .conf params changed from defaults.
> shared_buffers = 64000
> sort_mem = 64000
> fsync = false
> effective_cache_size = 400000
>
> ex. query: select * from x where id in (select id from y);
>
> There's an index on each table for id.  SQL Server takes <1s to return,
> postgresql doesn't return at all, neither does explain analyze.
> x has 1200673 rows
> y has 1282 rows
>
> It seems like its ignoring the index and not using enough memory.. any
> ideas?

This is a known problem in 7.3, it is much faster in 7.4b1.  This should be
very, very fast though, and do exactly the same thing:

select * from x where exists (select id from y where y.id=x.id);

Chris


Re: Simple queries take forever to run

From
Michael Guerin
Date:
Stephan Szabo wrote:

>On Wed, 27 Aug 2003, Michael Guerin wrote:
>
>
>
>>I'm running into some performance problems trying to execute simple
>>queries.
>>
>>postgresql version 7.3.3
>>.conf params changed from defaults.
>>shared_buffers = 64000
>>sort_mem = 64000
>>fsync = false
>>effective_cache_size = 400000
>>
>>ex. query: select * from x where id in (select id from y);
>>
>>There's an index on each table for id.  SQL Server takes <1s to return,
>>postgresql doesn't return at all, neither does explain analyze.
>>
>>
>
>IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
>generally much better (for reasonably sized subqueries) but in earlier
>versions you'll probably want to convert into an EXISTS or join form.
>
>
>
>
Something else seems to be going on, even switching to an exists clause
gives much better but poor performance.
count(*) where exists clause: Postgresql 19s, SQL Server <1s
count(*) where not exists:  23.3s SQL Server 1.5s

SQL Server runs on a dual 1.4 with 4gigs, win2k
Postgresql runs on a quad 900 with 8 gigs, sunos 5.8




Re: Simple queries take forever to run

From
"Shridhar Daithankar"
Date:
On 28 Aug 2003 at 10:38, Michael Guerin wrote:
> >IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
> >generally much better (for reasonably sized subqueries) but in earlier
> >versions you'll probably want to convert into an EXISTS or join form.
> Something else seems to be going on, even switching to an exists clause
> gives much better but poor performance.
> count(*) where exists clause: Postgresql 19s, SQL Server <1s
> count(*) where not exists:  23.3s SQL Server 1.5s

This was with 7.4? Can you try downloading 7.4CVS and try?

>
> SQL Server runs on a dual 1.4 with 4gigs, win2k
> Postgresql runs on a quad 900 with 8 gigs, sunos 5.8

SunOS...Not the impala out there but anyways I would refrain from slipping in
that..

Parden me if this is a repeatation,  have you set your effective cache size?

Bye
 Shridhar

--
Nouvelle cuisine, n.:    French for "not enough food".Continental breakfast, n.:
English for "not enough food".Tapas, n.:    Spanish for "not enough food".Dim Sum,
n.:    Chinese for more food than you've ever seen in your entire life.


Re: Simple queries take forever to run

From
Stephan Szabo
Date:
On Thu, 28 Aug 2003, Michael Guerin wrote:

> Stephan Szabo wrote:
>
> >On Wed, 27 Aug 2003, Michael Guerin wrote:
> >
> >
> >
> >>I'm running into some performance problems trying to execute simple
> >>queries.
> >>
> >>postgresql version 7.3.3
> >>.conf params changed from defaults.
> >>shared_buffers = 64000
> >>sort_mem = 64000
> >>fsync = false
> >>effective_cache_size = 400000
> >>
> >>ex. query: select * from x where id in (select id from y);
> >>
> >>There's an index on each table for id.  SQL Server takes <1s to return,
> >>postgresql doesn't return at all, neither does explain analyze.
> >>
> >>
> >
> >IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
> >generally much better (for reasonably sized subqueries) but in earlier
> >versions you'll probably want to convert into an EXISTS or join form.
> >
> >
> >
> >
> Something else seems to be going on, even switching to an exists clause
> gives much better but poor performance.
> count(*) where exists clause: Postgresql 19s, SQL Server <1s
> count(*) where not exists:  23.3s SQL Server 1.5s

What does explain analyze show for the two queries?


Re: Simple queries take forever to run

From
Michael Guerin
Date:
Stephan Szabo wrote:

>On Thu, 28 Aug 2003, Michael Guerin wrote:
>
>
>
>>Stephan Szabo wrote:
>>
>>
>>
>>>On Wed, 27 Aug 2003, Michael Guerin wrote:
>>>
>>>
>>>
>>>
>>>
>>>>I'm running into some performance problems trying to execute simple
>>>>queries.
>>>>
>>>>postgresql version 7.3.3
>>>>.conf params changed from defaults.
>>>>shared_buffers = 64000
>>>>sort_mem = 64000
>>>>fsync = false
>>>>effective_cache_size = 400000
>>>>
>>>>ex. query: select * from x where id in (select id from y);
>>>>
>>>>There's an index on each table for id.  SQL Server takes <1s to return,
>>>>postgresql doesn't return at all, neither does explain analyze.
>>>>
>>>>
>>>>
>>>>
>>>IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
>>>generally much better (for reasonably sized subqueries) but in earlier
>>>versions you'll probably want to convert into an EXISTS or join form.
>>>
>>>
>>>
>>>
>>>
>>>
>>Something else seems to be going on, even switching to an exists clause
>>gives much better but poor performance.
>>count(*) where exists clause: Postgresql 19s, SQL Server <1s
>>count(*) where not exists:  23.3s SQL Server 1.5s
>>
>>
>
>What does explain analyze show for the two queries?
>
>
>
>
explain analyze  select count(*) from tbltimeseries where exists(select
uniqid  from tblobjectname where timeseriesid = uniqid);
Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
time=22756.64..22756.64 rows=1 loops=1)
   ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using idx_objectname on tblobjectname
(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
loops=1200673)
                 Index Cond: ($0 = uniqid)
 Total runtime: 22756.83 msec
(7 rows)

fiasco=# explain analyze  select count(*) from tbltimeseries where
exists(select uniqid  from tblobjectname where timeseriesid = uniqid);
                                                                   QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
explain analyze  select count(*) from tbltimeseries where exists(select
uniqid  from tblobjectname where timeseriesid = uniqid);
 Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
time=19558.77..19558.77 rows=1 loops=1)
   ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
width=0) (actual time=0.21..19557.73 rows=560 loops=1)
         Filter: (subplan)
         SubPlan
           ->  Index Scan using idx_objectname on tblobjectname
(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
loops=1200673)
                 Index Cond: ($0 = uniqid)
 Total runtime: 19559.04 msec
(7 rows)





Re: Simple queries take forever to run

From
Stephan Szabo
Date:
On Thu, 28 Aug 2003, Michael Guerin wrote:

> Stephan Szabo wrote:
>
> >On Thu, 28 Aug 2003, Michael Guerin wrote:
> >
> >
> >
> >>Stephan Szabo wrote:
> >>
> >>
> >>
> >>>On Wed, 27 Aug 2003, Michael Guerin wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>I'm running into some performance problems trying to execute simple
> >>>>queries.
> >>>>
> >>>>postgresql version 7.3.3
> >>>>.conf params changed from defaults.
> >>>>shared_buffers = 64000
> >>>>sort_mem = 64000
> >>>>fsync = false
> >>>>effective_cache_size = 400000
> >>>>
> >>>>ex. query: select * from x where id in (select id from y);
> >>>>
> >>>>There's an index on each table for id.  SQL Server takes <1s to return,
> >>>>postgresql doesn't return at all, neither does explain analyze.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
> >>>generally much better (for reasonably sized subqueries) but in earlier
> >>>versions you'll probably want to convert into an EXISTS or join form.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>Something else seems to be going on, even switching to an exists clause
> >>gives much better but poor performance.
> >>count(*) where exists clause: Postgresql 19s, SQL Server <1s
> >>count(*) where not exists:  23.3s SQL Server 1.5s
> >>
> >>
> >
> >What does explain analyze show for the two queries?
> >
> >
> >
> >
> explain analyze  select count(*) from tbltimeseries where exists(select
> uniqid  from tblobjectname where timeseriesid = uniqid);
> Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
> time=22756.64..22756.64 rows=1 loops=1)
>    ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
> width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
>          Filter: (NOT (subplan))
>          SubPlan
>            ->  Index Scan using idx_objectname on tblobjectname
> (cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
> loops=1200673)
>                  Index Cond: ($0 = uniqid)
>  Total runtime: 22756.83 msec
> (7 rows)

Hmm... I'd thought that it had options for a better plan than that.

What do things like:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
 tblobjectname where timeseriesid=uniquid;

and

explain analyze select count(distinct timeseriesid) from
 tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
 where uniqid is null;

give you?


Re: Simple queries take forever to run

From
Michael Guerin
Date:
Stephan Szabo wrote:

>On Thu, 28 Aug 2003, Michael Guerin wrote:
>
>
>
>>Stephan Szabo wrote:
>>
>>
>>
>>>On Thu, 28 Aug 2003, Michael Guerin wrote:
>>>
>>>
>>>
>>>
>>>
>>>>Stephan Szabo wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>On Wed, 27 Aug 2003, Michael Guerin wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I'm running into some performance problems trying to execute simple
>>>>>>queries.
>>>>>>
>>>>>>postgresql version 7.3.3
>>>>>>.conf params changed from defaults.
>>>>>>shared_buffers = 64000
>>>>>>sort_mem = 64000
>>>>>>fsync = false
>>>>>>effective_cache_size = 400000
>>>>>>
>>>>>>ex. query: select * from x where id in (select id from y);
>>>>>>
>>>>>>There's an index on each table for id.  SQL Server takes <1s to return,
>>>>>>postgresql doesn't return at all, neither does explain analyze.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>IN(subquery) is known to run poorly in 7.3.x and earlier.  7.4 is
>>>>>generally much better (for reasonably sized subqueries) but in earlier
>>>>>versions you'll probably want to convert into an EXISTS or join form.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>Something else seems to be going on, even switching to an exists clause
>>>>gives much better but poor performance.
>>>>count(*) where exists clause: Postgresql 19s, SQL Server <1s
>>>>count(*) where not exists:  23.3s SQL Server 1.5s
>>>>
>>>>
>>>>
>>>>
>>>What does explain analyze show for the two queries?
>>>
>>>
>>>
>>>
>>>
>>>
>>explain analyze  select count(*) from tbltimeseries where exists(select
>>uniqid  from tblobjectname where timeseriesid = uniqid);
>>Aggregate  (cost=5681552.18..5681552.18 rows=1 width=0) (actual
>>time=22756.64..22756.64 rows=1 loops=1)
>>   ->  Seq Scan on tbltimeseries  (cost=0.00..5680051.34 rows=600336
>>width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
>>         Filter: (NOT (subplan))
>>         SubPlan
>>           ->  Index Scan using idx_objectname on tblobjectname
>>(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
>>loops=1200673)
>>                 Index Cond: ($0 = uniqid)
>> Total runtime: 22756.83 msec
>>(7 rows)
>>
>>
>
>Hmm... I'd thought that it had options for a better plan than that.
>
>What do things like:
>
>explain analyze select count(distinct timeseriesid) from tbltimeseries,
> tblobjectname where timeseriesid=uniquid;
>
>and
>
>explain analyze select count(distinct timeseriesid) from
> tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
> where uniqid is null;
>
>give you?
>
>
>
>
much better performance:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
 tblobjectname where timeseriesid=uniquid;

 Aggregate  (cost=7384.03..7384.03 rows=1 width=8) (actual time=668.15..668.15 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..7380.83 rows=1282 width=8) (actual time=333.31..666.13 rows=561 loops=1)
         ->  Seq Scan on tblobjectname  (cost=0.00..33.82 rows=1282 width=4) (actual time=0.05..4.98 rows=1282 loops=1)
         ->  Index Scan using xx on tbltimeseries  (cost=0.00..5.72 rows=1 width=4) (actual time=0.51..0.51 rows=0
loops=1282)
               Index Cond: (tbltimeseries.timeseriesid = "outer".uniqid)
 Total runtime: 669.61 msec
(6 rows)

explain analyze select count(distinct timeseriesid) from
 tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
 where uniqid is null;

 Aggregate  (cost=59144.19..59144.19 rows=1 width=8) (actual time=12699.47..12699.47 rows=1 loops=1)
   ->  Hash Join  (cost=37.02..56142.51 rows=1200673 width=8) (actual time=7.41..6376.12 rows=1200113 loops=1)
         Hash Cond: ("outer".timeseriesid = "inner".uniqid)
         Filter: ("inner".uniqid IS NULL)
         ->  Seq Scan on tbltimeseries  (cost=0.00..44082.73 rows=1200673 width=4) (actual time=0.01..3561.61
rows=1200673loops=1) 
         ->  Hash  (cost=33.82..33.82 rows=1282 width=4) (actual time=4.84..4.84 rows=0 loops=1)
               ->  Seq Scan on tblobjectname  (cost=0.00..33.82 rows=1282 width=4) (actual time=0.04..2.84 rows=1282
loops=1)
 Total runtime: 12699.76 msec
(8 rows)