Re: Bug? Query plans / EXPLAIN using gigabytes of memory - Mailing list pgsql-general

From Toby Corkindale
Subject Re: Bug? Query plans / EXPLAIN using gigabytes of memory
Date
Msg-id c4d89858-13a7-461d-9ead-021b58a4f608@dmz03.strategicdata.internal
Whole thread Raw
In response to Re: Bug? Query plans / EXPLAIN using gigabytes of memory  (Willy-Bas Loos <willybas@gmail.com>)
Responses Re: Bug? Query plans / EXPLAIN using gigabytes of memory
List pgsql-general
Hi Willy-Bas,
Thanks for your reply.

I realise that stacking the views up like this complicates matters, but the actual views are fairly simple queries, and
eachone individually is only looking at a few dozen rows. (Eg. selecting min, max or average value from a small set,
groupedby one column) 
From the point of view of creating reporting queries, it's a nice and logical way to build up a query, and we didn't
thinkit would present any problems.. and even on a well-populated database, the query runs very fast. It's just the
astoundingamount of memory used that presents difficulties. 

Looking at the postgresql.conf for non-default settings, the notable ones are:

max_connections = 200
ssl = false
shared_buffers = 256MB
max_prepared_transactions = 16
# although they aren't used for the group of queries in question
maintenance_work_mem = 128MB
# work_mem is left at default of 1MB
effective_io_concurrency = 2
random_page_cost = 3.0
effective_cache_size = 512MB
geqo = on
geqo_threshold = 12
geqo_effort = 7

Some other things are non-default, like checkpoints, streaming-replication stuff, but those shouldn't have any effect.

The memory settings (shared buffers, effective cache) might seem to be set quite conservatively at the moment, given
thememory available in the machine -- but since we can exhaust that memory with just a few connections, it seems fair. 

Cheers,
Toby

----- Original Message -----
From: "Willy-Bas Loos" <willybas@gmail.com>
To: "Toby Corkindale" <toby.corkindale@strategicdata.com.au>
Cc: "pgsql-general" <pgsql-general@postgresql.org>
Sent: Wednesday, 25 April, 2012 6:05:37 PM
Subject: Re: [GENERAL] Bug? Query plans / EXPLAIN using gigabytes of memory


Stacking views is a bad practice. It usually means that you are making the db do a lot of unnecessary work, scanning
tablesmore than once when you don't even need them.  
According to your description, you have 3 layers of views on partitioned tables.
I can imagine that that leaves the planner with a lot of possible query plans, a lot of interaction and a lot of
statisticsto read.  

do you have any special settings for the statistics on these tables?
and could you please post the non-default settings in your postgresql.conf file?
$ grep ^[^#] /etc/ postgresql /9.1/main/ postgresql . conf | grep -e ^[^[:space:]]

Would be helpful to see if you have any statistics or planner stuff altered.

Cheers,

WBL



On Wed, Apr 25, 2012 at 8:18 AM, Toby Corkindale < toby.corkindale@strategicdata.com.au > wrote:


Hi,
I'm hitting some peculiar behaviour. I'm currently on Pg 9.1.3 on a 64bit Debian system.

I have a database which is moderately large - 20 GByte or so - and contains that data split up over dozens of tables,
whichare themselves partitioned.  
Queries are usually only run against fairly small, partitioned, sets of data.

These queries generally run fairly fast. Performance is not a problem.

However Postgres is chewing up huge amounts of memory just to create the query plan!

For example, even if I just run
EXPLAIN SELECT a_column FROM a_view
WHERE partition_id = 1;

Then the postgres backend process takes several seconds to return, and in the worst example here, is hogging more than
3Gbyteonce it comes back. (It doesn't free that up until you close the connection)  

The query plan that comes back does seem quite convoluted, but then, the view is a query run over about eight other
views,each of which is pulling data from a few other views. The actual underlying data being touched is only *a few
dozen*small rows.  

As I said, the query runs fast enough.. however we only need a handful of these queries to get run in separate
connections,and the database server will be exhausted of memory. Especially since the memory isn't returned until the
endof the connection, yet these connections typically stay up for a while.  

I wondered if there's anything I can do to reduce this memory usage? And, is this a bug?

I've posted the output of the query plan here: https://gist.github.com/2487097


Thanks in advance,
Toby

--
Sent via pgsql-general mailing list ( pgsql-general@postgresql.org )
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth


pgsql-general by date:

Previous
From: Willy-Bas Loos
Date:
Subject: Re: how robust are custom dumps?
Next
From: Abbas
Date:
Subject: Fwd: FW: Really heart touching.........