Re: Increasing GROUP BY CHAR columns speed - Mailing list pgsql-performance

From: Scott Carey
Subject: Re: Increasing GROUP BY CHAR columns speed
Date:
Msg-id: BDFBB77C9E07BE4A984DAAE981D19F961ACA17D9CE@EXVMBX018-1.exch018.msoutlookonline.net
In response to: Re: Increasing GROUP BY CHAR columns speed  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Responses:
  Re: Increasing GROUP BY CHAR columns speed  ("Andrus" <kobruleht2@hot.ee>)
  Re: Increasing GROUP BY CHAR columns speed  ("Andrus" <kobruleht2@hot.ee>)
  Re: Increasing GROUP BY CHAR columns speed  ("Scott Marlowe" <scott.marlowe@gmail.com>)
  Re: Increasing GROUP BY CHAR columns speed  ("Andrus" <kobruleht2@hot.ee>)
List: pgsql-performance
I'm not sure that postgres allocates the whole work_mem each time, and in any event the OS won't try to allocate RAM until the page is actually used.  My experience is that an oversized work_mem doesn't hurt until it is actually used.  Otherwise, my configuration with 1000 concurrent connections and work_mem = 1GB would have blown up; I don't have that much RAM + swap * overcommit.  Of the 1000 connections, only 8 run queries that would ever need more than 2 or 3 MB of space to execute.  Of course, you have to be certain of what the client connections do before oversizing work_mem like that, so I would not recommend the above in general.
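
For what it's worth, a large setting doesn't have to be global: it can be confined to the few sessions that actually need it.  A minimal sketch, assuming a dedicated reporting role (the role name and values here are made up for illustration):

ALTER ROLE reporting SET work_mem = '1GB';  -- only this role gets the big setting

-- or raise it per session, just around the big query:
SET work_mem = '512MB';
-- ... run the GROUP BY query ...
RESET work_mem;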

----------
Back to this query:

In the query case shown, the explain analyze shows:
"GroupAggregate  (cost=43403.38..52521.63 rows=41923 width=838) (actual
time=8083.171..8620.908 rows=577 loops=1)"

Thus, the planner thought that it needed ~40K rows for results of ~800 bytes in size, hence an approximation of the required hash space is 80M.  However, it returns only 577 rows, so the actual needed hash space is much smaller.  This is a common problem I've seen -- the query planner has very conservative estimates for result row counts from any non-trivial filter condition / aggregate, which leads to poor query plans.
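
One thing that sometimes helps (table and column names below are placeholders, since the actual schema isn't in front of me) is raising the statistics target on the grouped columns so the planner's distinct-value estimate improves, then re-analyzing and comparing estimated vs. actual rows:

ALTER TABLE sales ALTER COLUMN dept_code SET STATISTICS 1000;
ANALYZE sales;

EXPLAIN ANALYZE
SELECT dept_code, count(*)
FROM sales
GROUP BY dept_code;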

I'd be surprised if this query used more than 1MB of total work_mem in reality for that last step if it used a hash.  As it stands, sorting will actually use much more.

I'm still not sure why the planner chose to sort rather than hash with oversized work_mem (is there an implied order in the query results I missed?).  My guess is that this query can still get much faster if a hash is possible on the last part.  It looks like the gain so far has more to do with sorting purely in memory, which reduced the number of compares required.  But that is just a guess.
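
If someone wants to test the hash path, one crude experiment (again with placeholder table/column names) is to disable sorting for a single session and compare the plans -- not something to leave enabled, just a diagnostic:

SET enable_sort = off;
EXPLAIN ANALYZE
SELECT dept_code, count(*)
FROM sales
GROUP BY dept_code;
RESET enable_sort;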

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Friday, November 28, 2008 10:24 AM
To: Andrus
Cc: Scott Carey; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Increasing GROUP BY CHAR columns speed

On Fri, Nov 28, 2008 at 10:58 AM, Andrus <kobruleht2@hot.ee> wrote:
>
> SET work_mem = 2097151  (this is max allowed value) or SET work_mem = 97151
> decreases query time from 12 seconds to 9 seconds.

Setting work_mem so high that it allocates memory that isn't there
WILL slow down your queries, because the OS will then wind up swapping
out one part of memory to make room for another.  There are
values between 100M and 2G.  Run it with increasing work_mem from 100
to 500 or so Meg and see if that helps.  Keep an eye on vmstat 1 or
something to see if your machine starts swapping out while running the
query.  If it does you've overallocated memory and things will start
to slow down a LOT.
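
For anyone following along, the stepping approach above might look roughly like this in psql (query and values are placeholders; run "vmstat 1" in another terminal while each statement executes):

SET work_mem = '100MB';
EXPLAIN ANALYZE SELECT dept_code, count(*) FROM sales GROUP BY dept_code;

SET work_mem = '250MB';
EXPLAIN ANALYZE SELECT dept_code, count(*) FROM sales GROUP BY dept_code;

SET work_mem = '500MB';
EXPLAIN ANALYZE SELECT dept_code, count(*) FROM sales GROUP BY dept_code;

RESET work_mem;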
