Re: PostgreSQL Query Speed Issues - Mailing list pgsql-novice

From Joseph Pravato
Subject Re: PostgreSQL Query Speed Issues
Date
Msg-id 5127ED08.3020408@nomagic.com
In response to Re: PostgreSQL Query Speed Issues  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: PostgreSQL Query Speed Issues  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-novice
On 2/22/2013 8:58 AM, Joseph Pravato wrote:
----- Forwarded Message -----
From: "Kevin Grittner" <kgrittn@ymail.com>
To: "Tom Lisjac" <netdxr@gmail.com>, pgsql-novice@postgresql.org
Sent: Friday, February 22, 2013 7:29:58 AM
Subject: Re: [NOVICE] PostgreSQL Query Speed Issues

Tom Lisjac <netdxr@gmail.com> wrote:

version 9.2.1 running on Centos6/64
There are performance problems in the 9.2 branch which are fixed in
9.2.3.  If you care about performance, or for that matter running
with known bugs fixed, upgrade.

http://www.postgresql.org/support/versioning/
Our server manager is going to update to 9.2.3 this weekend to see if we
get any improvements.

The database is running in a Xen VM with 12GB of RAM, 4 virtual
CPUs and fast, dedicated physical disks rather than shared
network storage.  The problem queries take one core to saturation
and keep it there with very little disk I/O.
max_connections = 200
You may want to consider connection pooling:

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
On our server, we are using connection pooling with a maximum of 50. For
the moment, we've moved it down to max_connections = 100. Originally,
the 200 was because we were using a multi-threaded migration routine to
load data into Postgres. However, we've set up a second database
dedicated for migration and are using pg_dump & pg_restore.
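
As a quick sanity check that the pool really is keeping us under the new
limit, a query along these lines can be run (a minimal sketch; it simply
counts current backends in the standard pg_stat_activity view):

    show max_connections;
    select count(*) as open_connections from pg_stat_activity;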


shared_buffers = 2GB
The usual advice is to start at 25% of machine RAM, up to 8GB, and
check performance with incremental adjustments from there.  That would
suggest a 3GB starting point.

effective_cache_size = 1024MB
This should normally be around 75% of machine RAM, so 9GB.  This
does not actually allocate any RAM, but is used to estimate how
much of the indexes may be in RAM on repeated access.  A larger
number allows more index usage.
We actually only have 11.37GB on our server, so we've used 3072MB for
shared_buffers & 8400MB for effective_cache_size.


In addition, with 12GB and apparently not more than 2 million rows
per table, you seem very likely to have the active portion of your
database fully cached.  So these settings are likely to help:

seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03
Wow, the random_page_cost setting made a huge difference. The query we
mentioned yesterday that took 30 minutes is now down to about 30 seconds.
That's a nice improvement and a reasonable speed, though it could still
be better. However, I'm a bit worried about changing the setting so
drastically. Could this change cause any issues down the road? We have 4
or 5 tables that will continue to grow fairly rapidly (roughly a 1
million row increase every 2-3 years).
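
For anyone following along, these can be tried per-session before committing
them to postgresql.conf, so plans can be compared side by side (a sketch; the
trailing SELECT is only a placeholder for the slow query):

    set seq_page_cost = 0.1;
    set random_page_cost = 0.1;
    set cpu_tuple_cost = 0.03;
    explain analyze select 1;   -- substitute the real query to compare plans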


I normally don't set work_mem above RAM * 0.25 / max_connections,
so I'd say 10MB to 15MB would probably be good with your
max_connections setting.  If you can lower that with a connection
pool you might want to go to 20MB or 40MB.
For the moment, it is set to 30MB based on your formula and the new
100-connection limit.
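
Working the formula through with roughly 12GB of RAM:

    12GB * 0.25 / 200 connections ≈ 15MB
    12GB * 0.25 / 100 connections ≈ 30MB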


You probably want to boost maintenance_work_mem to something like
512MB.

If you haven't already done so, run VACUUM ANALYZE at the database
level.  If most of your data was loaded at about the same time, run
VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass
of your entire database at peak OLTP load.  Vacuum and analyze are
routine maintenance that are necessary for the database to perform
well.  Autovacuum can often handle everything for you, but if you
have an "off-hours" period when load is lower it is often a good
idea to run a database VACUUM ANALYZE on a daily or weekly basis to
shift maintenance load to time when it has the least impact.  Never
leave autovacuum disabled beyond a short maintenance or load
window.
The vacuuming didn't seem to change performance, and it didn't take
very long to run. Autovacuum may have already taken care of most
potential issues here.
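
For anyone repeating this, the database-level run described above is simply
(a sketch, run from psql by a user allowed to vacuum the relevant tables;
with no table named, VACUUM covers the whole database):

    vacuum freeze analyze;   -- after a bulk load
    vacuum analyze;          -- routine off-hours maintenance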


Regarding your specific problem...

You claim this is how contact_address_map is defined:

create table contact_address_map (
    contact_id int8 not null,
    address_id int8 not null,
    unique (address_id)
);

... but your query is looking for rows in that table where
address_id is null, and finding them.  Those can't both be true.
We are migrating a really old db to Postgres and cleaning up all of the
data. In the old system, only one address was allowed per user, so in
the migrated data there will only be one entry in the map for each user.
We were looking for which users didn't have an address at all, and the
query was an optimization based on that assumption. We are aware that
the query is not ideal, but it works for the just-after-migration
scenario of validating parts of the migration routine. Do note that we
were left joining the two tables so that all contacts are guaranteed to
return, with a check on address_id to see which of them didn't have any
addresses.
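
For context, the query in question is roughly of this shape (a
simplification; the actual contact table and its id column may be named
differently):

    select c.*
    from contact c
    left join contact_address_map m on m.contact_id = c.id
    where m.address_id is null;   -- contacts with no row in the map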


Your plan shows an index scan with no index conditions to pass
every row in a table, which is much slower than a seqscan.  Did you
turn off enable_seqscan and fail to mention that?  Doing so will
definitely result in sub-optimal performance on queries like the
one shown, where every row in one of the tables must be read.
No, enable_seqscan is still turned on.
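
For completeness, that was verified with the setting itself; re-running the
slow query under EXPLAIN (ANALYZE, BUFFERS) shows which scan the planner
actually picks:

    show enable_seqscan;   -- reports 'on'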


When you don't provide accurate information, any advice you get may
be off-target.
