Thread: Simple queries take forever to run
I'm running into some performance problems trying to execute simple queries. postgresql version 7.3.3 .conf params changed from defaults. shared_buffers = 64000 sort_mem = 64000 fsync = false effective_cache_size = 400000 ex. query: select * from x where id in (select id from y); There's an index on each table for id. SQL Server takes <1s to return, postgresql doesn't return at all, neither does explain analyze. x has 1200673 rows y has 1282 rows It seems like its ignoring the index and not using enough memory.. any ideas?
On Wed, 27 Aug 2003, Michael Guerin wrote: > I'm running into some performance problems trying to execute simple > queries. > > postgresql version 7.3.3 > .conf params changed from defaults. > shared_buffers = 64000 > sort_mem = 64000 > fsync = false > effective_cache_size = 400000 > > ex. query: select * from x where id in (select id from y); > > There's an index on each table for id. SQL Server takes <1s to return, > postgresql doesn't return at all, neither does explain analyze. IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is generally much better (for reasonably sized subqueries) but in earlier versions you'll probably want to convert into an EXISTS or join form.
On Wed, Aug 27, 2003 at 05:40:05PM -0400, Michael Guerin wrote: > ex. query: select * from x where id in (select id from y); > > There's an index on each table for id. SQL Server takes <1s to return, > postgresql doesn't return at all, neither does explain analyze. This particular form of query is a known performance problem for PostgreSQL 7.3 and earlier -- the problem should hopefully be fixed in 7.4 (currently in beta). Check the archives for more discussion on this topic. -Neil
> postgresql version 7.3.3 > .conf params changed from defaults. > shared_buffers = 64000 > sort_mem = 64000 > fsync = false > effective_cache_size = 400000 > > ex. query: select * from x where id in (select id from y); > > There's an index on each table for id. SQL Server takes <1s to return, > postgresql doesn't return at all, neither does explain analyze. > x has 1200673 rows > y has 1282 rows > > It seems like its ignoring the index and not using enough memory.. any > ideas? This is a known problem in 7.3, it is much faster in 7.4b1. This should be very, very fast though, and do exactly the same thing: select * from x where exists (select id from y where y.id=x.id); Chris
Stephan Szabo wrote: >On Wed, 27 Aug 2003, Michael Guerin wrote: > > > >>I'm running into some performance problems trying to execute simple >>queries. >> >>postgresql version 7.3.3 >>.conf params changed from defaults. >>shared_buffers = 64000 >>sort_mem = 64000 >>fsync = false >>effective_cache_size = 400000 >> >>ex. query: select * from x where id in (select id from y); >> >>There's an index on each table for id. SQL Server takes <1s to return, >>postgresql doesn't return at all, neither does explain analyze. >> >> > >IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is >generally much better (for reasonably sized subqueries) but in earlier >versions you'll probably want to convert into an EXISTS or join form. > > > > Something else seems to be going on, even switching to an exists clause gives much better but poor performance. count(*) where exists clause: Postgresql 19s, SQL Server <1s count(*) where not exists: 23.3s SQL Server 1.5s SQL Server runs on a dual 1.4 with 4gigs, win2k Postgresql runs on a quad 900 with 8 gigs, sunos 5.8
On 28 Aug 2003 at 10:38, Michael Guerin wrote: > >IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is > >generally much better (for reasonably sized subqueries) but in earlier > >versions you'll probably want to convert into an EXISTS or join form. > Something else seems to be going on, even switching to an exists clause > gives much better but poor performance. > count(*) where exists clause: Postgresql 19s, SQL Server <1s > count(*) where not exists: 23.3s SQL Server 1.5s This was with 7.4? Can you try downloading 7.4CVS and try? > > SQL Server runs on a dual 1.4 with 4gigs, win2k > Postgresql runs on a quad 900 with 8 gigs, sunos 5.8 SunOS...Not the impala out there but anyways I would refrain from slipping in that.. Parden me if this is a repeatation, have you set your effective cache size? Bye Shridhar -- Nouvelle cuisine, n.: French for "not enough food".Continental breakfast, n.: English for "not enough food".Tapas, n.: Spanish for "not enough food".Dim Sum, n.: Chinese for more food than you've ever seen in your entire life.
On Thu, 28 Aug 2003, Michael Guerin wrote: > Stephan Szabo wrote: > > >On Wed, 27 Aug 2003, Michael Guerin wrote: > > > > > > > >>I'm running into some performance problems trying to execute simple > >>queries. > >> > >>postgresql version 7.3.3 > >>.conf params changed from defaults. > >>shared_buffers = 64000 > >>sort_mem = 64000 > >>fsync = false > >>effective_cache_size = 400000 > >> > >>ex. query: select * from x where id in (select id from y); > >> > >>There's an index on each table for id. SQL Server takes <1s to return, > >>postgresql doesn't return at all, neither does explain analyze. > >> > >> > > > >IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is > >generally much better (for reasonably sized subqueries) but in earlier > >versions you'll probably want to convert into an EXISTS or join form. > > > > > > > > > Something else seems to be going on, even switching to an exists clause > gives much better but poor performance. > count(*) where exists clause: Postgresql 19s, SQL Server <1s > count(*) where not exists: 23.3s SQL Server 1.5s What does explain analyze show for the two queries?
Stephan Szabo wrote: >On Thu, 28 Aug 2003, Michael Guerin wrote: > > > >>Stephan Szabo wrote: >> >> >> >>>On Wed, 27 Aug 2003, Michael Guerin wrote: >>> >>> >>> >>> >>> >>>>I'm running into some performance problems trying to execute simple >>>>queries. >>>> >>>>postgresql version 7.3.3 >>>>.conf params changed from defaults. >>>>shared_buffers = 64000 >>>>sort_mem = 64000 >>>>fsync = false >>>>effective_cache_size = 400000 >>>> >>>>ex. query: select * from x where id in (select id from y); >>>> >>>>There's an index on each table for id. SQL Server takes <1s to return, >>>>postgresql doesn't return at all, neither does explain analyze. >>>> >>>> >>>> >>>> >>>IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is >>>generally much better (for reasonably sized subqueries) but in earlier >>>versions you'll probably want to convert into an EXISTS or join form. >>> >>> >>> >>> >>> >>> >>Something else seems to be going on, even switching to an exists clause >>gives much better but poor performance. >>count(*) where exists clause: Postgresql 19s, SQL Server <1s >>count(*) where not exists: 23.3s SQL Server 1.5s >> >> > >What does explain analyze show for the two queries? > > > > explain analyze select count(*) from tbltimeseries where exists(select uniqid from tblobjectname where timeseriesid = uniqid); Aggregate (cost=5681552.18..5681552.18 rows=1 width=0) (actual time=22756.64..22756.64 rows=1 loops=1) -> Seq Scan on tbltimeseries (cost=0.00..5680051.34 rows=600336 width=0) (actual time=22.06..21686.78 rows=1200113 loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using idx_objectname on tblobjectname (cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0 loops=1200673) Index Cond: ($0 = uniqid) Total runtime: 22756.83 msec (7 rows) fiasco=# explain analyze select count(*) from tbltimeseries where exists(select uniqid from tblobjectname where timeseriesid = uniqid); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ explain analyze select count(*) from tbltimeseries where exists(select uniqid from tblobjectname where timeseriesid = uniqid); Aggregate (cost=5681552.18..5681552.18 rows=1 width=0) (actual time=19558.77..19558.77 rows=1 loops=1) -> Seq Scan on tbltimeseries (cost=0.00..5680051.34 rows=600336 width=0) (actual time=0.21..19557.73 rows=560 loops=1) Filter: (subplan) SubPlan -> Index Scan using idx_objectname on tblobjectname (cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0 loops=1200673) Index Cond: ($0 = uniqid) Total runtime: 19559.04 msec (7 rows)
On Thu, 28 Aug 2003, Michael Guerin wrote: > Stephan Szabo wrote: > > >On Thu, 28 Aug 2003, Michael Guerin wrote: > > > > > > > >>Stephan Szabo wrote: > >> > >> > >> > >>>On Wed, 27 Aug 2003, Michael Guerin wrote: > >>> > >>> > >>> > >>> > >>> > >>>>I'm running into some performance problems trying to execute simple > >>>>queries. > >>>> > >>>>postgresql version 7.3.3 > >>>>.conf params changed from defaults. > >>>>shared_buffers = 64000 > >>>>sort_mem = 64000 > >>>>fsync = false > >>>>effective_cache_size = 400000 > >>>> > >>>>ex. query: select * from x where id in (select id from y); > >>>> > >>>>There's an index on each table for id. SQL Server takes <1s to return, > >>>>postgresql doesn't return at all, neither does explain analyze. > >>>> > >>>> > >>>> > >>>> > >>>IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is > >>>generally much better (for reasonably sized subqueries) but in earlier > >>>versions you'll probably want to convert into an EXISTS or join form. > >>> > >>> > >>> > >>> > >>> > >>> > >>Something else seems to be going on, even switching to an exists clause > >>gives much better but poor performance. > >>count(*) where exists clause: Postgresql 19s, SQL Server <1s > >>count(*) where not exists: 23.3s SQL Server 1.5s > >> > >> > > > >What does explain analyze show for the two queries? > > > > > > > > > explain analyze select count(*) from tbltimeseries where exists(select > uniqid from tblobjectname where timeseriesid = uniqid); > Aggregate (cost=5681552.18..5681552.18 rows=1 width=0) (actual > time=22756.64..22756.64 rows=1 loops=1) > -> Seq Scan on tbltimeseries (cost=0.00..5680051.34 rows=600336 > width=0) (actual time=22.06..21686.78 rows=1200113 loops=1) > Filter: (NOT (subplan)) > SubPlan > -> Index Scan using idx_objectname on tblobjectname > (cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0 > loops=1200673) > Index Cond: ($0 = uniqid) > Total runtime: 22756.83 msec > (7 rows) Hmm... I'd thought that it had options for a better plan than that. What do things like: explain analyze select count(distinct timeseriesid) from tbltimeseries, tblobjectname where timeseriesid=uniquid; and explain analyze select count(distinct timeseriesid) from tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid) where uniqid is null; give you?
Stephan Szabo wrote: >On Thu, 28 Aug 2003, Michael Guerin wrote: > > > >>Stephan Szabo wrote: >> >> >> >>>On Thu, 28 Aug 2003, Michael Guerin wrote: >>> >>> >>> >>> >>> >>>>Stephan Szabo wrote: >>>> >>>> >>>> >>>> >>>> >>>>>On Wed, 27 Aug 2003, Michael Guerin wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>I'm running into some performance problems trying to execute simple >>>>>>queries. >>>>>> >>>>>>postgresql version 7.3.3 >>>>>>.conf params changed from defaults. >>>>>>shared_buffers = 64000 >>>>>>sort_mem = 64000 >>>>>>fsync = false >>>>>>effective_cache_size = 400000 >>>>>> >>>>>>ex. query: select * from x where id in (select id from y); >>>>>> >>>>>>There's an index on each table for id. SQL Server takes <1s to return, >>>>>>postgresql doesn't return at all, neither does explain analyze. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is >>>>>generally much better (for reasonably sized subqueries) but in earlier >>>>>versions you'll probably want to convert into an EXISTS or join form. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>Something else seems to be going on, even switching to an exists clause >>>>gives much better but poor performance. >>>>count(*) where exists clause: Postgresql 19s, SQL Server <1s >>>>count(*) where not exists: 23.3s SQL Server 1.5s >>>> >>>> >>>> >>>> >>>What does explain analyze show for the two queries? >>> >>> >>> >>> >>> >>> >>explain analyze select count(*) from tbltimeseries where exists(select >>uniqid from tblobjectname where timeseriesid = uniqid); >>Aggregate (cost=5681552.18..5681552.18 rows=1 width=0) (actual >>time=22756.64..22756.64 rows=1 loops=1) >> -> Seq Scan on tbltimeseries (cost=0.00..5680051.34 rows=600336 >>width=0) (actual time=22.06..21686.78 rows=1200113 loops=1) >> Filter: (NOT (subplan)) >> SubPlan >> -> Index Scan using idx_objectname on tblobjectname >>(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0 >>loops=1200673) >> Index Cond: ($0 = uniqid) >> Total runtime: 22756.83 msec >>(7 rows) >> >> > >Hmm... I'd thought that it had options for a better plan than that. > >What do things like: > >explain analyze select count(distinct timeseriesid) from tbltimeseries, > tblobjectname where timeseriesid=uniquid; > >and > >explain analyze select count(distinct timeseriesid) from > tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid) > where uniqid is null; > >give you? > > > > much better performance: explain analyze select count(distinct timeseriesid) from tbltimeseries, tblobjectname where timeseriesid=uniquid; Aggregate (cost=7384.03..7384.03 rows=1 width=8) (actual time=668.15..668.15 rows=1 loops=1) -> Nested Loop (cost=0.00..7380.83 rows=1282 width=8) (actual time=333.31..666.13 rows=561 loops=1) -> Seq Scan on tblobjectname (cost=0.00..33.82 rows=1282 width=4) (actual time=0.05..4.98 rows=1282 loops=1) -> Index Scan using xx on tbltimeseries (cost=0.00..5.72 rows=1 width=4) (actual time=0.51..0.51 rows=0 loops=1282) Index Cond: (tbltimeseries.timeseriesid = "outer".uniqid) Total runtime: 669.61 msec (6 rows) explain analyze select count(distinct timeseriesid) from tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid) where uniqid is null; Aggregate (cost=59144.19..59144.19 rows=1 width=8) (actual time=12699.47..12699.47 rows=1 loops=1) -> Hash Join (cost=37.02..56142.51 rows=1200673 width=8) (actual time=7.41..6376.12 rows=1200113 loops=1) Hash Cond: ("outer".timeseriesid = "inner".uniqid) Filter: ("inner".uniqid IS NULL) -> Seq Scan on tbltimeseries (cost=0.00..44082.73 rows=1200673 width=4) (actual time=0.01..3561.61 rows=1200673loops=1) -> Hash (cost=33.82..33.82 rows=1282 width=4) (actual time=4.84..4.84 rows=0 loops=1) -> Seq Scan on tblobjectname (cost=0.00..33.82 rows=1282 width=4) (actual time=0.04..2.84 rows=1282 loops=1) Total runtime: 12699.76 msec (8 rows)