Thread: Slow query problem
Hi, We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM, Redhat 9) to do some basic comparisons between postgresql and firebird 1.0.3 and 1.5rc8. Mostly the results are comparable, with one significant exception. QUERY select invheadref, invprodref, sum(units) from invtran group by invheadref, invprodref RESULTS pg 7.3.4 - 5.5 min pg 7.4.0 - 10 min fb 1.0.3 - 64 sec fb 1.5 - 44 sec * The invtran table has about 2.5 million records, invheadref and invprodref are both char(10) and indexed. * shared_buffers = 12000 and sort_mem = 8192 are the only changes I've made to postgresql.conf, with relevant changes to shmall and shmmax. This is an explain analyse plan from postgresql 7.4: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ GroupAggregate (cost=572484.23..601701.15 rows=1614140 width=39) (actual time=500091.171..554203.189 rows=147621 loops=1) -> Sort (cost=572484.23..578779.62 rows=2518157 width=39) (actual time=500090.939..527500.940 rows=2521530 loops=1) Sort Key: invheadref, invprodref -> Seq Scan on invtran (cost=0.00..112014.57 rows=2518157 width=39) (actual time=16.002..25516.917 rows=2521530 loops=1) Total runtime: 554826.827 ms (5 rows) Am I correct in interpreting that most time was spent doing the sorting? Explain confuses the heck out of me and any help on how I could make this run faster would be gratefully received. Cheers, Bradley.
On Thu, 08 Jan 2004 16:52:05 +1100 Bradley Tate <btate@objectmastery.com> wrote: > Am I correct in interpreting that most time was spent doing the > sorting? looks so. your table is about 70MB total size, and its getting loaded completely into memory (you have 12000 * 8k = 96M available). 26s to load 70MB from disk seems reasonable. The rest of the time is used for sorting. > Explain confuses the heck out of me and any help on how I could make > this run faster would be gratefully received. > You should bump sort_mem as high as you can stand. with only 8MB sort memory available, you're swapping intermediate sort pages to disk -- a lot. Try the query with sort_mem set to 75MB (to do the entire sort in memory). -mike > Cheers, > > Bradley. > > > ---------------------------(end of > broadcast)--------------------------- TIP 8: explain analyze is your > friend -- Mike Glover Key ID BFD19F2C <mpg4@duluoz.net>
Attachment
On Thu, Jan 08, 2004 at 19:27:16 -0800, Mike Glover <mpg4@duluoz.net> wrote: > > You should bump sort_mem as high as you can stand. with only 8MB sort > memory available, you're swapping intermediate sort pages to disk -- > a lot. Try the query with sort_mem set to 75MB (to do the entire sort in > memory). Postgres also might be able to switch to a hash aggregate instead of using a sort if sortmem is made large enough to hold the results for all of the (estimated) groups.
Mike Glover <mpg4@duluoz.net> writes: > You should bump sort_mem as high as you can stand. with only 8MB sort > memory available, you're swapping intermediate sort pages to disk -- > a lot. Try the query with sort_mem set to 75MB (to do the entire sort in > memory). 7.4 will probably flip over to a hash-based aggregation method, and not sort at all, once you make sort_mem large enough that it thinks the hash table will fit in sort_mem. regards, tom lane
On Thu, 8 Jan 2004, Bradley Tate wrote: > We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM, > Redhat 9) to do some basic comparisons between postgresql and firebird > 1.0.3 and 1.5rc8. Mostly the results are comparable, with one > significant exception. > > QUERY > select invheadref, invprodref, sum(units) > from invtran > group by invheadref, invprodref > > RESULTS > pg 7.3.4 - 5.5 min > pg 7.4.0 - 10 min > fb 1.0.3 - 64 sec > fb 1.5 - 44 sec > > * The invtran table has about 2.5 million records, invheadref and > invprodref are both char(10) and indexed. For the above query, shouldn't you have one index for both columns (invheadref, invprodref). Then it should not need to sort at all to do the grouping and it should all be fast. -- /Dennis Björklund
On Friday 09 January 2004 07:29, Dennis Björklund wrote: > On Thu, 8 Jan 2004, Bradley Tate wrote: > > > > select invheadref, invprodref, sum(units) > > from invtran > > group by invheadref, invprodref > For the above query, shouldn't you have one index for both columns > (invheadref, invprodref). Then it should not need to sort at all to do the > grouping and it should all be fast. Not sure if that would make a difference here, since the whole table is being read. -- Richard Huxton Archonet Ltd
On Fri, 9 Jan 2004, Richard Huxton wrote: > > > select invheadref, invprodref, sum(units) > > > from invtran > > > group by invheadref, invprodref > > > For the above query, shouldn't you have one index for both columns > > (invheadref, invprodref). Then it should not need to sort at all to do the > > grouping and it should all be fast. > > Not sure if that would make a difference here, since the whole table is being > read. The goal was to avoid the sorting which should not be needed with that index (I hope). So I still think that it would help in this case. -- /Dennis Björklund
On Friday 09 January 2004 08:57, Dennis Björklund wrote: > On Fri, 9 Jan 2004, Richard Huxton wrote: > > > > select invheadref, invprodref, sum(units) > > > > from invtran > > > > group by invheadref, invprodref > > > > > > For the above query, shouldn't you have one index for both columns > > > (invheadref, invprodref). Then it should not need to sort at all to do > > > the grouping and it should all be fast. > > > > Not sure if that would make a difference here, since the whole table is > > being read. > > The goal was to avoid the sorting which should not be needed with that > index (I hope). So I still think that it would help in this case. Sorry - not being clear. I can see how it _might_ help, but will the planner take into account the fact that even though: index-cost > seqscan-cost that (index-cost + no-sorting) < (seqscan-cost + sort-cost) assuming of course, that the costs turn out that way. -- Richard Huxton Archonet Ltd
Dennis Björklund wrote: >On Fri, 9 Jan 2004, Richard Huxton wrote: > > > >>>>select invheadref, invprodref, sum(units) >>>>from invtran >>>>group by invheadref, invprodref >>>> >>>> >>>For the above query, shouldn't you have one index for both columns >>>(invheadref, invprodref). Then it should not need to sort at all to do the >>>grouping and it should all be fast. >>> >>> >>Not sure if that would make a difference here, since the whole table is being >>read. >> >> > >The goal was to avoid the sorting which should not be needed with that >index (I hope). So I still think that it would help in this case. > > > Thanks for the advice. I tried creating a compound index along with clustering the invtran table on it, adding another 512MB RAM, increasing shared_buffers to 60000 and increasing sort_mem to 100MB, playing with effective cache size in postgresql.conf. This cut the execution time down to 4 minutes, which was helpful but still way behind firebird. There was still an awful lot of disk activity while it was happening which seems to imply lots of sorting going on (?) Invtran is a big table but it is clustered and static i.e. no updates, select statements only. Mostly my performance problems are with sorts - group by, order by. I was hoping for better results than I've been getting so far. Thanks. p.s. Can someone confirm whether this should work from pgadmin3? i.e. will the size of the sort_mem be changed for the duration of the query or session? set sort_mem to 100000; select ....etc....;
Richard Huxton <dev@archonet.com> writes: >> The goal was to avoid the sorting which should not be needed with that >> index (I hope). So I still think that it would help in this case. > Sorry - not being clear. I can see how it _might_ help, but will the planner > take into account the fact that even though: > index-cost > seqscan-cost > that > (index-cost + no-sorting) < (seqscan-cost + sort-cost) Yes, it would. > assuming of course, that the costs turn out that way. That I'm less sure about. A sort frequently looks cheaper than a full indexscan, unless the table is pretty well clustered on that index, or you knock random_page_cost way down. With no stats at all, CVS tip has these preferences: regression=# create table fooey (f1 int, f2 int, unique(f1,f2)); NOTICE: CREATE TABLE / UNIQUE will create implicit index "fooey_f1_key" for table "fooey" CREATE TABLE regression=# explain select * from fooey group by f1,f2; QUERY PLAN --------------------------------------------------------------- HashAggregate (cost=25.00..25.00 rows=1000 width=8) -> Seq Scan on fooey (cost=0.00..20.00 rows=1000 width=8) (2 rows) regression=# set enable_hashagg TO 0; SET regression=# explain select * from fooey group by f1,f2; QUERY PLAN ------------------------------------------------------------------------------------ Group (cost=0.00..57.00 rows=1000 width=8) -> Index Scan using fooey_f1_key on fooey (cost=0.00..52.00 rows=1000 width=8) (2 rows) regression=# set enable_indexscan TO 0; SET regression=# explain select * from fooey group by f1,f2; QUERY PLAN --------------------------------------------------------------------- Group (cost=69.83..77.33 rows=1000 width=8) -> Sort (cost=69.83..72.33 rows=1000 width=8) Sort Key: f1, f2 -> Seq Scan on fooey (cost=0.00..20.00 rows=1000 width=8) (4 rows) but remember this is for a relatively small (estimated size of) table. regards, tom lane
On Fri, 9 Jan 2004, Richard Huxton wrote: > On Friday 09 January 2004 08:57, Dennis Bj�rklund wrote: > > On Fri, 9 Jan 2004, Richard Huxton wrote: > > > > > select invheadref, invprodref, sum(units) > > > > > from invtran > > > > > group by invheadref, invprodref > > > > > > > > For the above query, shouldn't you have one index for both columns > > > > (invheadref, invprodref). Then it should not need to sort at all to do > > > > the grouping and it should all be fast. > > > > > > Not sure if that would make a difference here, since the whole table is > > > being read. > > > > The goal was to avoid the sorting which should not be needed with that > > index (I hope). So I still think that it would help in this case. > > Sorry - not being clear. I can see how it _might_ help, but will the planner > take into account the fact that even though: > index-cost > seqscan-cost > that > (index-cost + no-sorting) < (seqscan-cost + sort-cost) > assuming of course, that the costs turn out that way. AFAICS, yes it does take that effect into account (as best it can with the estimates).