Thread: Slow query problem

From: Bradley Tate

Hi,

We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM,
Redhat 9) to do some basic comparisons between postgresql and firebird
1.0.3 and 1.5rc8. Mostly the results are comparable, with one
significant exception.

QUERY
select invheadref, invprodref, sum(units)
from invtran
group by invheadref, invprodref

RESULTS
pg 7.3.4  -  5.5 min
pg 7.4.0  -  10 min
fb 1.0.3   -  64 sec
fb 1.5     -   44 sec

* The invtran table has about 2.5 million records, invheadref and
invprodref are both char(10) and indexed.
* shared_buffers = 12000 and sort_mem = 8192 are the only changes I've
made to postgresql.conf, with relevant changes to shmall and shmmax.

This is an explain analyse plan from postgresql 7.4:

                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=572484.23..601701.15 rows=1614140 width=39) (actual time=500091.171..554203.189 rows=147621 loops=1)
   ->  Sort  (cost=572484.23..578779.62 rows=2518157 width=39) (actual time=500090.939..527500.940 rows=2521530 loops=1)
         Sort Key: invheadref, invprodref
         ->  Seq Scan on invtran  (cost=0.00..112014.57 rows=2518157 width=39) (actual time=16.002..25516.917 rows=2521530 loops=1)
 Total runtime: 554826.827 ms
(5 rows)

Am I correct in interpreting that most time was spent doing the sorting?
Explain confuses the heck out of me and any help on how I could make
this run faster would be gratefully received.

Cheers,

Bradley.


Re: Slow query problem

From: Mike Glover

On Thu, 08 Jan 2004 16:52:05 +1100
Bradley Tate <btate@objectmastery.com> wrote:
> Am I correct in interpreting that most time was spent doing the
> sorting?

Looks so.  Your table is about 70MB total size, and it's getting loaded
completely into memory (you have 12000 * 8k = 96MB available).  26s to
load 70MB from disk seems reasonable.  The rest of the time is spent
sorting.

> Explain confuses the heck out of me and any help on how I could make
> this run faster would be gratefully received.
>

You should bump sort_mem as high as you can stand.  With only 8MB of sort
memory available, you're spilling intermediate sort pages to disk --
a lot. Try the query with sort_mem set to 75MB (to do the entire sort in
memory).
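
To put that in concrete terms, the setting can be changed per session
before running the query. A minimal sketch (sort_mem is measured in
kilobytes, so 75MB is 76800; the exact figure is a guess at what the
sort needs):

```sql
-- sort_mem is measured in KB; 76800 KB = 75MB.
-- SET only affects the current session, not the server-wide default.
SET sort_mem TO 76800;

SELECT invheadref, invprodref, sum(units)
FROM invtran
GROUP BY invheadref, invprodref;

-- return to the postgresql.conf value afterwards
RESET sort_mem;
```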

-mike

> Cheers,
>
> Bradley.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


--
Mike Glover
Key ID BFD19F2C <mpg4@duluoz.net>


Re: Slow query problem

From: Bruno Wolff III

On Thu, Jan 08, 2004 at 19:27:16 -0800,
  Mike Glover <mpg4@duluoz.net> wrote:
>
> You should bump sort_mem as high as you can stand.  with only 8MB sort
> memory available, you're swapping intermediate sort pages to disk --
> a lot. Try the query with sort_mem set to 75MB (to do the entire sort in
> memory).

Postgres also might be able to switch to a hash aggregate instead of
using a sort if sort_mem is made large enough to hold the results for
all of the (estimated) groups.

Re: Slow query problem

From: Tom Lane

Mike Glover <mpg4@duluoz.net> writes:
> You should bump sort_mem as high as you can stand.  with only 8MB sort
> memory available, you're swapping intermediate sort pages to disk --
> a lot. Try the query with sort_mem set to 75MB (to do the entire sort in
> memory).

7.4 will probably flip over to a hash-based aggregation method, and not
sort at all, once you make sort_mem large enough that it thinks the hash
table will fit in sort_mem.
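
One way to check whether that flip happens is to re-run plain EXPLAIN
after raising sort_mem and look at the top plan node. A sketch (the
100MB figure is purely illustrative):

```sql
SET sort_mem TO 102400;  -- 100MB, illustrative
EXPLAIN
SELECT invheadref, invprodref, sum(units)
FROM invtran
GROUP BY invheadref, invprodref;
-- A HashAggregate node at the top (instead of GroupAggregate over a
-- Sort) means the hash-based method was chosen.
```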

            regards, tom lane

Re: Slow query problem

From: Dennis Björklund

On Thu, 8 Jan 2004, Bradley Tate wrote:

> We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM,
> Redhat 9) to do some basic comparisons between postgresql and firebird
> 1.0.3 and 1.5rc8. Mostly the results are comparable, with one
> significant exception.
>
> QUERY
> select invheadref, invprodref, sum(units)
> from invtran
> group by invheadref, invprodref
>
> RESULTS
> pg 7.3.4  -  5.5 min
> pg 7.4.0  -  10 min
> fb 1.0.3   -  64 sec
> fb 1.5     -   44 sec
>
> * The invtran table has about 2.5 million records, invheadref and
> invprodref are both char(10) and indexed.

For the above query, shouldn't you have a single index on both columns
(invheadref, invprodref)? Then it should not need to sort at all to do the
grouping, and it should all be fast.
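
For instance (the index name is made up for illustration):

```sql
-- one composite index covering both grouping columns
CREATE INDEX invtran_head_prod_idx ON invtran (invheadref, invprodref);
ANALYZE invtran;  -- refresh planner statistics for the table
```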

--
/Dennis Björklund


Re: Slow query problem

From: Richard Huxton

On Friday 09 January 2004 07:29, Dennis Björklund wrote:
> On Thu, 8 Jan 2004, Bradley Tate wrote:
> >
> > select invheadref, invprodref, sum(units)
> > from invtran
> > group by invheadref, invprodref

> For the above query, shouldn't you have one index for both columns
> (invheadref, invprodref). Then it should not need to sort at all to do the
> grouping and it should all be fast.

Not sure if that would make a difference here, since the whole table is being
read.

--
  Richard Huxton
  Archonet Ltd

Re: Slow query problem

From: Dennis Björklund

On Fri, 9 Jan 2004, Richard Huxton wrote:

> > > select invheadref, invprodref, sum(units)
> > > from invtran
> > > group by invheadref, invprodref
>
> > For the above query, shouldn't you have one index for both columns
> > (invheadref, invprodref). Then it should not need to sort at all to do the
> > grouping and it should all be fast.
>
> Not sure if that would make a difference here, since the whole table is being
> read.

The goal was to avoid the sorting, which should not be needed with that
index (I hope). So I still think that it would help in this case.

--
/Dennis Björklund


Re: Slow query problem

From: Richard Huxton

On Friday 09 January 2004 08:57, Dennis Björklund wrote:
> On Fri, 9 Jan 2004, Richard Huxton wrote:
> > > > select invheadref, invprodref, sum(units)
> > > > from invtran
> > > > group by invheadref, invprodref
> > >
> > > For the above query, shouldn't you have one index for both columns
> > > (invheadref, invprodref). Then it should not need to sort at all to do
> > > the grouping and it should all be fast.
> >
> > Not sure if that would make a difference here, since the whole table is
> > being read.
>
> The goal was to avoid the sorting which should not be needed with that
> index (I hope). So I still think that it would help in this case.

Sorry - I wasn't being clear. I can see how it _might_ help, but will the
planner take into account the fact that even though:
  index-cost > seqscan-cost
it holds that
  (index-cost + no-sorting) < (seqscan-cost + sort-cost)
assuming, of course, that the costs turn out that way.

--
  Richard Huxton
  Archonet Ltd

Re: Slow query problem

From: Bradley Tate

Dennis Björklund wrote:

>On Fri, 9 Jan 2004, Richard Huxton wrote:
>
>>>>select invheadref, invprodref, sum(units)
>>>>from invtran
>>>>group by invheadref, invprodref
>>>
>>>For the above query, shouldn't you have one index for both columns
>>>(invheadref, invprodref). Then it should not need to sort at all to do the
>>>grouping and it should all be fast.
>>
>>Not sure if that would make a difference here, since the whole table is being
>>read.
>
>The goal was to avoid the sorting which should not be needed with that
>index (I hope). So I still think that it would help in this case.
>
Thanks for the advice. I tried creating a compound index along with
clustering the invtran table on it, adding another 512MB of RAM, increasing
shared_buffers to 60000, increasing sort_mem to 100MB, and playing with
effective_cache_size in postgresql.conf. This cut the execution time
down to 4 minutes, which was helpful but still way behind Firebird.
There was still an awful lot of disk activity while it was happening,
which seems to imply lots of sorting going on (?)
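
(For reference, the compound index and clustering described above would
have been something like the following; the index name is illustrative,
and CLUSTER rewrites the table in index order while holding an
exclusive lock:)

```sql
CREATE INDEX invtran_head_prod_idx ON invtran (invheadref, invprodref);
-- 7.x syntax: CLUSTER indexname ON tablename
CLUSTER invtran_head_prod_idx ON invtran;
ANALYZE invtran;
```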

Invtran is a big table but it is clustered and static i.e. no updates,
select statements only.

Mostly my performance problems are with sorts - group by, order by. I
was hoping for better results than I've been getting so far.

Thanks.

P.S.
Can someone confirm whether this should work from pgadmin3? i.e. will
sort_mem be changed for the duration of the query or session?

set sort_mem to 100000;
select ....etc....;
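
(As far as I know, SET takes effect for the rest of the current
connection, so this should work provided both statements go over the
same connection; SHOW gives a quick way to check:)

```sql
set sort_mem to 100000;
show sort_mem;  -- should report 100000 if the SET applied to this connection
```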




Re: Slow query problem

From: Tom Lane

Richard Huxton <dev@archonet.com> writes:
>> The goal was to avoid the sorting which should not be needed with that
>> index (I hope). So I still think that it would help in this case.

> Sorry - not being clear. I can see how it _might_ help, but will the planner
> take into account the fact that even though:
>   index-cost > seqscan-cost
> that
>   (index-cost + no-sorting) < (seqscan-cost + sort-cost)

Yes, it would.

> assuming of course, that the costs turn out that way.

That I'm less sure about.  A sort frequently looks cheaper than a full
indexscan, unless the table is pretty well clustered on that index,
or you knock random_page_cost way down.

With no stats at all, CVS tip has these preferences:

regression=# create table fooey (f1 int, f2 int, unique(f1,f2));
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "fooey_f1_key" for table "fooey"
CREATE TABLE
regression=# explain select * from fooey group by f1,f2;
                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=25.00..25.00 rows=1000 width=8)
   ->  Seq Scan on fooey  (cost=0.00..20.00 rows=1000 width=8)
(2 rows)

regression=# set enable_hashagg TO 0;
SET
regression=# explain select * from fooey group by f1,f2;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Group  (cost=0.00..57.00 rows=1000 width=8)
   ->  Index Scan using fooey_f1_key on fooey  (cost=0.00..52.00 rows=1000 width=8)
(2 rows)

regression=# set enable_indexscan TO 0;
SET
regression=# explain select * from fooey group by f1,f2;
                             QUERY PLAN
---------------------------------------------------------------------
 Group  (cost=69.83..77.33 rows=1000 width=8)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
         Sort Key: f1, f2
         ->  Seq Scan on fooey  (cost=0.00..20.00 rows=1000 width=8)
(4 rows)

but remember this is for a relatively small (estimated size of) table.

            regards, tom lane

Re: Slow query problem

From: Stephan Szabo

On Fri, 9 Jan 2004, Richard Huxton wrote:

> On Friday 09 January 2004 08:57, Dennis Björklund wrote:
> > On Fri, 9 Jan 2004, Richard Huxton wrote:
> > > > > select invheadref, invprodref, sum(units)
> > > > > from invtran
> > > > > group by invheadref, invprodref
> > > >
> > > > For the above query, shouldn't you have one index for both columns
> > > > (invheadref, invprodref). Then it should not need to sort at all to do
> > > > the grouping and it should all be fast.
> > >
> > > Not sure if that would make a difference here, since the whole table is
> > > being read.
> >
> > The goal was to avoid the sorting which should not be needed with that
> > index (I hope). So I still think that it would help in this case.
>
> Sorry - not being clear. I can see how it _might_ help, but will the planner
> take into account the fact that even though:
>   index-cost > seqscan-cost
> that
>   (index-cost + no-sorting) < (seqscan-cost + sort-cost)
> assuming of course, that the costs turn out that way.

AFAICS, yes it does take that effect into account (as best
it can with the estimates).