Thread: Sort performance

Sort performance

From
"Subbiah Stalin-XCGF84"
Date:
Hi All,
 
I'm in the process of tuning a query that does a sort on a huge dataset. With work_mem set to 2M, i see the sort operation spilling to disk writing upto 430MB and then return the first 500 rows. Our query is of the sort
 
select co1, col2... from table where col1 like 'aa%' order col1 limit 500; It took 561Secs to complete. Looking at the execution plan 95% of the time is spent on sort vs seq scan on the table.
 
Now if set the work_mem to 500MB (i did this in a psql session without making it global) and ran the same query. One would think the sort operations would happen in memory and not spill to disk but i still see 430MB written to disk however, the query complete time dropped down to 351Secs. So work_mem did have an impact but wondering why its still writing to disk when it can all do it memory.
 
I appreciate if anyone can shed some light on this.
 
Thanks,
Stalin
 
Env: Sol 10, Pg 827 64bit.

Re: Sort performance

From
Robert Haas
Date:
On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
<SSubbiah@motorola.com> wrote:
> I'm in the process of tuning a query that does a sort on a huge dataset.
> With work_mem set to 2M, i see the sort operation spilling to disk writing
> upto 430MB and then return the first 500 rows. Our query is of the sort
>
> select co1, col2... from table where col1 like 'aa%' order col1 limit 500;
> It took 561Secs to complete. Looking at the execution plan 95% of the time
> is spent on sort vs seq scan on the table.
>
> Now if set the work_mem to 500MB (i did this in a psql session without
> making it global) and ran the same query. One would think the sort
> operations would happen in memory and not spill to disk but i still see
> 430MB written to disk however, the query complete time dropped down to
> 351Secs. So work_mem did have an impact but wondering why its still writing
> to disk when it can all do it memory.
>
> I appreciate if anyone can shed some light on this.

Can you send the EXPLAIN ANALYZE output?

What happens if you set work_mem to something REALLY big, like 5GB?

...Robert

Re: Sort performance

From
Gregory Stark
Date:
Robert Haas <robertmhaas@gmail.com> writes:

> On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
> <SSubbiah@motorola.com> wrote:
>>
>> i see the sort operation spilling to disk writing upto 430MB and then
>> return the first 500 rows. Our query is of the sort
>>
>> Now if set the work_mem to 500MB (i did this in a psql session without
>> making it global) and ran the same query. One would think the sort
>> operations would happen in memory and not spill to disk but i still see
>> 430MB written to disk however, the query complete time dropped down to
>> 351Secs. So work_mem did have an impact but wondering why its still writing
>> to disk when it can all do it memory.

The on-disk storage is more compact than the in-memory storage so you actually
need a larger value than the space reported for on-disk storage to avoid the
disk sort entirely. The accounting also isn't perfect; the on-disk sort still
uses some ram, for example.

> What happens if you set work_mem to something REALLY big, like 5GB?

Don't set it larger than the available RAM though -- or you'll quite possibly
get an out-of-error error.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: Sort performance

From
"Subbiah Stalin-XCGF84"
Date:
Here you go.

 Limit  (cost=502843.44..502844.69 rows=501 width=618) (actual
time=561397.940..561429.242 rows=501 loops=1)
   ->  Sort  (cost=502843.44..503923.48 rows=432014 width=618) (actual
time=561397.934..561429.062 rows=501 loops=1)
         Sort Key: name
         ->  Seq Scan on objects  (cost=0.00..99157.88 rows=432014
width=618) (actual time=0.172..22267.727 rows=649999 loops=1)
               Filter: (((domainid)::text = ANY
(('{111,SmWCGiRp}'::character varying[])::text[])) AND (("type")::text =
'cpe'::text) AND (upper((name)::text) ~~ 'CPE1%'::text) AND
(upper((name)::text) >= 'CPE1'::text) AND (upper((name)::text) <
'CPE2'::text))
 Total runtime: 561429.915 ms
(6 rows)

I haven't tried setting that high number. I came up with 500M by
monitoring pgsql_tmp when sort operations were performed. It never went
beyond 450M. Once it reaches 450M it spends some cycles before I see the
output. I guess some sort of merge operation happens to get the first
500 records out.

Thanks,
Stalin

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, January 29, 2009 3:21 PM
To: Subbiah Stalin-XCGF84
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sort performance

On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
<SSubbiah@motorola.com> wrote:
> I'm in the process of tuning a query that does a sort on a huge
dataset.
> With work_mem set to 2M, i see the sort operation spilling to disk
> writing upto 430MB and then return the first 500 rows. Our query is of

> the sort
>
> select co1, col2... from table where col1 like 'aa%' order col1 limit
> 500; It took 561Secs to complete. Looking at the execution plan 95% of

> the time is spent on sort vs seq scan on the table.
>
> Now if set the work_mem to 500MB (i did this in a psql session without

> making it global) and ran the same query. One would think the sort
> operations would happen in memory and not spill to disk but i still
> see 430MB written to disk however, the query complete time dropped
> down to 351Secs. So work_mem did have an impact but wondering why its
> still writing to disk when it can all do it memory.
>
> I appreciate if anyone can shed some light on this.

Can you send the EXPLAIN ANALYZE output?

What happens if you set work_mem to something REALLY big, like 5GB?

...Robert

Re: Sort performance

From
"Subbiah Stalin-XCGF84"
Date:
Thanks Greg. You were right. If I set my sort_mem to 1G (yes I have
loads of memory, only for testing purpose), then I don't see any thing
written to disk. So in-memory require more memory than reported on-disk
storage.

Stalin

-----Original Message-----
From: Greg Stark [mailto:greg.stark@enterprisedb.com] On Behalf Of
Gregory Stark
Sent: Thursday, January 29, 2009 3:36 PM
To: Robert Haas
Cc: Subbiah Stalin-XCGF84; pgsql-performance@postgresql.org
Subject: Re: Sort performance

Robert Haas <robertmhaas@gmail.com> writes:

> On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
> <SSubbiah@motorola.com> wrote:
>>
>> i see the sort operation spilling to disk writing upto 430MB and then

>> return the first 500 rows. Our query is of the sort
>>
>> Now if set the work_mem to 500MB (i did this in a psql session
>> without making it global) and ran the same query. One would think the

>> sort operations would happen in memory and not spill to disk but i
>> still see 430MB written to disk however, the query complete time
>> dropped down to 351Secs. So work_mem did have an impact but wondering

>> why its still writing to disk when it can all do it memory.

The on-disk storage is more compact than the in-memory storage so you
actually need a larger value than the space reported for on-disk storage
to avoid the disk sort entirely. The accounting also isn't perfect; the
on-disk sort still uses some ram, for example.

> What happens if you set work_mem to something REALLY big, like 5GB?

Don't set it larger than the available RAM though -- or you'll quite
possibly get an out-of-error error.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!