Re: Sort performance on large tables - Mailing list pgsql-performance

From: Charlie Savage
Subject: Re: Sort performance on large tables
Date:
Msg-id: dkr7ga$18a4$1@news.hub.org
In response to: Re: Sort performance on large tables  (Richard Huxton <dev@archonet.com>)
Responses: Re: Sort performance on large tables  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: Sort performance on large tables  (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
List: pgsql-performance
Thanks everyone for the feedback.

I tried increasing work_mem:

set work_mem to 300000;

select tlid, min(ogc_fid)
from completechain
group by tlid;

The results are:

"GroupAggregate  (cost=9041602.80..10003036.88 rows=48071704 width=8)
(actual time=4371749.523..5106162.256 rows=47599910 loops=1)"
"  ->  Sort  (cost=9041602.80..9161782.06 rows=48071704 width=8) (actual
time=4371690.894..4758660.433 rows=48199165 loops=1)"
"        Sort Key: tlid"
"        ->  Seq Scan on completechain  (cost=0.00..2228584.04
rows=48071704 width=8) (actual time=49.518..805234.970 rows=48199165
loops=1)"
"Total runtime: 5279988.127 ms"

Thus the time decreased from 8486 seconds to 5279 seconds, which is a
nice improvement.  However, that still leaves PostgreSQL about 9 times
slower.

I tried increasing work_mem up to 500000, but at that point the machine
started using its swap partition and performance degraded back to the
original values.
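
For anyone repeating the experiment, a sketch of the per-session approach
is below; the 262144 figure is purely illustrative (work_mem is specified
in kilobytes here) and should stay well under free physical RAM to avoid
the swapping described above:

set work_mem to 262144;   -- 256 MB, illustrative only; size to fit free RAM

explain analyze
select tlid, min(ogc_fid)
from completechain
group by tlid;

reset work_mem;           -- back to the server default for this session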

Charlie


Richard Huxton wrote:
> Charlie Savage wrote:
>> Hi everyone,
>>
>> I have a question about the performance of sort.
>
>> Note it takes over 10 times longer to do the sort than the full
>> sequential scan.
>>
>> Should I expect results like this?  I realize that the computer is
>> quite low-end and is very IO bound for this query, but I'm still
>> surprised that the sort operation takes so long.
>
> The sort will be spilling to disk, which will grind your I/O to a halt.
>
>> work_mem =  16384                        # in Kb
>
> Try upping this. You should be able to issue "set work_mem = 100000"
> before running your query IIRC. That should let PG do its sorting in
> larger chunks.
>
> Also, if your most common access pattern is ordered via tlid look into
> clustering the table on that.
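
For reference, the clustering suggestion above would look roughly like the
sketch below (the index name completechain_tlid_idx is invented for
illustration; the CLUSTER syntax shown is the pre-8.3 form, with the newer
equivalent in the comment):

create index completechain_tlid_idx on completechain (tlid);
cluster completechain_tlid_idx on completechain;   -- pre-8.3 syntax
-- on 8.3 and later: cluster completechain using completechain_tlid_idx;
analyze completechain;    -- refresh planner statistics after the reorder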
