Thread: PostgreSQL Query Speed Issues
Hi! We are running into an interesting issue with PostgreSQL 9.2. Some of the queries return in 5-30 minutes while others never return at all. The same queries return in several seconds on other databases we have tested with the same data. Here is a sample query that runs into this issue; hopefully some insight into it will help us solve a similar problem with the rest of the queries.

NOTE: All database tests were done without changing or updating any settings after install.

# rows in contact: 574939
# rows in contact_address_map: 574924

Original: NEVER RETURNS (tested to over an hour)

select * from contact where id not in (select contact_id from contact_address_map)

Result from an 'explain analyze': Since the original query never returned, this was not run.

Optimized for Postgres: RETURNS

select c.*
from contact c
left join CONTACT_ADDRESS_MAP cam on cam.CONTACT_ID = c.id
where cam.ADDRESS_ID is null

Result from an 'explain analyze':

QUERY PLAN
Merge Left Join  (cost=83512.87..158059.12 rows=1 width=952) (actual time=5224.171..5224.270 rows=15 loops=1)
  Merge Cond: (c.id = cam.contact_id)
  Filter: (cam.address_id IS NULL)
  Rows Removed by Filter: 574924
  ->  Index Scan using contact_pkey on contact c  (cost=0.00..63048.48 rows=574917 width=952) (actual time=0.009..852.708 rows=574939 loops=1)
  ->  Materialize  (cost=83512.59..86387.17 rows=574917 width=16) (actual time=1407.387..3107.246 rows=574924 loops=1)
        ->  Sort  (cost=83512.59..84949.88 rows=574917 width=16) (actual time=1407.379..2027.946 rows=574924 loops=1)
              Sort Key: cam.contact_id
              Sort Method: external sort  Disk: 14616kB
              ->  Seq Scan on contact_address_map cam  (cost=0.00..8857.17 rows=574917 width=16) (actual time=0.018..578.348 rows=574924 loops=1)
Total runtime: 5228.459 ms

We have done some investigation online, and it looks like many of these speed issues require hand-optimizing each query specifically for PostgreSQL, whereas other databases seem to do it automatically. Is it possible to change a configuration option on the database, or does PostgreSQL require hand optimization of all queries to run optimally? Any help would be greatly appreciated.
On Thu, Feb 21, 2013 at 3:56 PM, Joseph Pravato <joseph.pravato@nomagic.com> wrote:
> NOTE: All database tests were done without changing or updating any settings after install.

The defaults are sub-optimal. You really do need to tune them to the server in question.

> # rows in contact: 574939
> # rows in contact_address_map: 574924

After loading this data, did you run an ANALYZE?

> select c.*
> from contact c
> left join CONTACT_ADDRESS_MAP cam on cam.CONTACT_ID = c.id
> where cam.ADDRESS_ID is null

Table definitions would probably help. You might be missing indexes.

> Result from an 'explain analyze':
>
> QUERY PLAN
> Merge Left Join  (cost=83512.87..158059.12 rows=1 width=952) (actual time=5224.171..5224.270 rows=15 loops=1)
>   Merge Cond: (c.id = cam.contact_id)
>   Filter: (cam.address_id IS NULL)
>   Rows Removed by Filter: 574924
>   ->  Index Scan using contact_pkey on contact c  (cost=0.00..63048.48 rows=574917 width=952) (actual time=0.009..852.708 rows=574939 loops=1)
>   ->  Materialize  (cost=83512.59..86387.17 rows=574917 width=16) (actual time=1407.387..3107.246 rows=574924 loops=1)
>         ->  Sort  (cost=83512.59..84949.88 rows=574917 width=16) (actual time=1407.379..2027.946 rows=574924 loops=1)
>               Sort Key: cam.contact_id
>               Sort Method: external sort  Disk: 14616kB

Disk sorts imply work_mem isn't big enough.

>   ->  Seq Scan on contact_address_map cam  (cost=0.00..8857.17 rows=574917 width=16) (actual time=0.018..578.348 rows=574924 loops=1)

This should probably be using an index.

--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd
Web: douglasjhunley.com
G+: http://goo.gl/sajR3
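For reference, the index and ANALYZE steps being suggested here look roughly like the following. The column names come from the query above; the index name itself is only illustrative, and the index may or may not end up being used once statistics are in place.

create index contact_address_map_contact_id_idx
    on contact_address_map (contact_id);

-- refresh planner statistics after the bulk load
analyze contact;
analyze contact_address_map;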
On 2/21/2013 3:00 PM, Douglas J Hunley wrote:
> On Thu, Feb 21, 2013 at 3:56 PM, Joseph Pravato
> <joseph.pravato@nomagic.com> wrote:
>> NOTE: All database tests were done without changing or updating any settings after install.
> The defaults are sub-optimal. You really do need to tune them to the
> server in question.

Do you have any recommendations optimized for a database with 600,000 user records and tables with up to 2,000,000 records?

>> # rows in contact: 574939
>> # rows in contact_address_map: 574924
> After loading this data, did you run an ANALYZE?

No we did not; why would this be useful?

>> select c.*
>> from contact c
>> left join CONTACT_ADDRESS_MAP cam on cam.CONTACT_ID = c.id
>> where cam.ADDRESS_ID is null
> Table definitions would probably help. You might be missing indexes.

Table definitions link (http://pastebin.com/GyCsYpBn). See index comment below.

>> Result from an 'explain analyze':
>>
>> QUERY PLAN
>> Merge Left Join  (cost=83512.87..158059.12 rows=1 width=952) (actual time=5224.171..5224.270 rows=15 loops=1)
>>   Merge Cond: (c.id = cam.contact_id)
>>   Filter: (cam.address_id IS NULL)
>>   Rows Removed by Filter: 574924
>>   ->  Index Scan using contact_pkey on contact c  (cost=0.00..63048.48 rows=574917 width=952) (actual time=0.009..852.708 rows=574939 loops=1)
>>   ->  Materialize  (cost=83512.59..86387.17 rows=574917 width=16) (actual time=1407.387..3107.246 rows=574924 loops=1)
>>         ->  Sort  (cost=83512.59..84949.88 rows=574917 width=16) (actual time=1407.379..2027.946 rows=574924 loops=1)
>>               Sort Key: cam.contact_id
>>               Sort Method: external sort  Disk: 14616kB
> Disk sorts imply work_mem isn't big enough.

I was surprised that the work_mem default was so low. We changed it from 1MB to 50MB, and this is the 'explain analyze' for:

select * from contact where id not in (select contact_id from contact_address_map)

QUERY PLAN
Seq Scan on contact  (cost=10294.55..31256.01 rows=287458 width=952) (actual time=1555.473..1582.885 rows=16 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 574924
  SubPlan 1
    ->  Seq Scan on contact_address_map  (cost=0.00..8857.24 rows=574924 width=8) (actual time=0.013..586.107 rows=574924 loops=1)
Total runtime: 1597.773 ms

Increasing work_mem did not improve speed for an unrelated query; it ended up returning in 26 minutes. We had set work_mem to 1000MB for that test. We will try to get a simplified version of the query that does not expose company data to you tomorrow.

>>   ->  Seq Scan on contact_address_map cam  (cost=0.00..8857.17 rows=574917 width=16) (actual time=0.018..578.348 rows=574924 loops=1)
> This should probably be using an index

We added indexes to both columns of contact_address_map, but they made no difference in speed for either query. Any additional assistance is appreciated.
--As of February 21, 2013 4:55:03 PM -0600, Joseph Pravato is alleged to have said:

>>> # rows in contact: 574939
>>> # rows in contact_address_map: 574924
>> After loading this data, did you run an ANALYZE?
> No we did not; why would this be useful?

--As for the rest, it is mine.

Because it would let the database collect information on how the data is stored and distributed, thereby giving it a better chance to choose the best query plan. (Things like: What are common values, what are ranges of values, etc. If half the database has the same zip code, using an index on the zip code isn't useful for that zip code. If it's an even distribution, it is. ANALYZE lets the database figure that out.)

Recent versions of PostgreSQL will periodically do this in the background, but doing it explicitly in the foreground after major uploads of data is almost always a good idea.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
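As a rough illustration of what ANALYZE gathers, the per-column statistics it stores can be inspected afterwards through the pg_stats view; the table name below is the one from this thread.

select attname, null_frac, n_distinct, most_common_vals
from pg_stats
where tablename = 'contact_address_map';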
Hi all,

On 02/21/2013 03:55 PM, Joseph Pravato wrote:
> Do you have any recommendations optimized for a database with
> 600,000 user records and tables with up to 2,000,000 records?

Joey and I are both working on this, and it's related to an issue I posted a few weeks ago about a "hang". Since then I learned that a better problem description is "a query that takes longer to complete than anyone can tolerate". :)

Here is our postgresql.conf for version 9.2.1 running on Centos6/64:

listen_addresses = '*'
max_connections = 200
shared_buffers = 2GB
effective_cache_size = 1024MB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_timezone = 'US/Central'
datestyle = 'iso, mdy'
timezone = 'US/Central'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'

The database is running in a Xen VM with 12GB of RAM, 4 virtual CPUs and fast, dedicated physical disks rather than shared network storage. The problem queries take one core to saturation and keep it there with very little disk I/O.

Any suggestions or insights would be greatly appreciated.

Thanks,
-Tom
Tom Lisjac <netdxr@gmail.com> wrote:

> version 9.2.1 running on Centos6/64

There are performance problems in the 9.2 branch which are fixed in 9.2.3. If you care about performance, or for that matter running with known bugs fixed, upgrade.

http://www.postgresql.org/support/versioning/

> The database is running in a Xen VM with 12GB of ram, 4 virtual
> CPU's and fast, dedicated physical disks rather than shared
> network storage. The problem queries take one core to saturation
> and keep it there with very little disk I/O.

> max_connections = 200

You may want to consider connection pooling:

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

> shared_buffers = 2GB

The usual advice is to start at 25% of machine RAM, up to 8GB, and check performance with incremental updates from there. That would suggest a 4GB starting point.

> effective_cache_size = 1024MB

This should normally be around 75% of machine RAM, so 9GB. This does not actually allocate any RAM, but is used to estimate how much of the indexes may be in RAM on repeated access. A larger number allows more index usage.

In addition, with 12GB and apparently not more than 2 million rows per table, you seem very likely to have the active portion of your database fully cached. So these settings are likely to help:

seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03

I normally don't set work_mem above RAM * 0.25 / max_connections, so I'd say 10MB to 15MB would probably be good with your max_connections setting. If you can lower that with a connection pool you might want to go to 20MB or 40MB.

You probably want to boost maintenance_work_mem to something like 512MB.

If you haven't already done so, run VACUUM ANALYZE at the database level. If most of your data was loaded at about the same time, run VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass of your entire database at peak OLTP load. Vacuum and analyze are routine maintenance that are necessary for the database to perform well. Autovacuum can often handle everything for you, but if you have an "off-hours" period when load is lower it is often a good idea to run a database VACUUM ANALYZE on a daily or weekly basis to shift maintenance load to a time when it has the least impact. Never leave autovacuum disabled beyond a short maintenance or load window.

Regarding your specific problem...

You claim this is how contact_address_map is defined:

create table contact_address_map (
    contact_id int8 not null,
    address_id int8 not null,
    unique (address_id)
);

... but your query is looking for rows in that table where address_id is null, and finding them. Those can't both be true.

Your plan shows an index scan with no index conditions to pass every row in a table, which is much slower than a seqscan. Did you turn off enable_seqscan and fail to mention that? Doing so will definitely result in sub-optimal performance on queries like the one shown, where every row in one of the tables must be read.

When you don't provide accurate information, any advice you get may be off-target.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
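Taken together, the suggested starting points above could be tried per session along these lines before committing them to postgresql.conf (shared_buffers itself can only be changed in postgresql.conf and needs a restart). The values are simply the ones discussed in this message, not tested recommendations.

set effective_cache_size = '9GB';
set seq_page_cost = 0.1;
set random_page_cost = 0.1;
set cpu_tuple_cost = 0.03;
set maintenance_work_mem = '512MB';

-- then re-check the plan for the problem query
explain analyze
select * from contact where id not in (select contact_id from contact_address_map);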
On 2/22/2013 8:58 AM, Joseph Pravato wrote:
----- Forwarded Message -----
From: "Kevin Grittner" <kgrittn@ymail.com>
To: "Tom Lisjac" <netdxr@gmail.com>, pgsql-novice@postgresql.org
Sent: Friday, February 22, 2013 7:29:58 AM
Subject: Re: [NOVICE] PostgreSQL Query Speed Issues

> Tom Lisjac <netdxr@gmail.com> wrote:
>> version 9.2.1 running on Centos6/64
> There are performance problems in the 9.2 branch which are fixed in
> 9.2.3. If you care about performance, or for that matter running
> with known bugs fixed, upgrade.
>
> http://www.postgresql.org/support/versioning/
Our server manager is going to update to 9.2.3 this weekend to see if we get any improvements.
>> The database is running in a Xen VM with 12GB of ram, 4 virtual
>> CPU's and fast, dedicated physical disks rather than shared
>> network storage. The problem queries take one core to saturation
>> and keep it there with very little disk I/O.
>>
>> max_connections = 200
>
> You may want to consider connection pooling:
>
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
On our server, we are using connection pooling with a maximum of 50. For the moment, we've moved it down to max_connections = 100. Originally, the 200 was because we were using a multi-threaded migration routine to load data into Postgres. However, we've set up a second database dedicated for migration and are using pg_dump & pg_restore.
>> shared_buffers = 2GB
>
> The usual advice is to start at 25% of machine RAM, up to 8GB, and
> check performance with incremental updates from there. That would
> suggest a 4GB starting point.
>
>> effective_cache_size = 1024MB
>
> This should normally be around 75% of machine RAM, so 9GB. This
> does not actually allocate any RAM, but is used to estimate how
> much of the indexes may be in RAM on repeated access. A larger
> number allows more index usage.
We actually only have 11.37GB on our server, so we've used 3072MB for shared_buffers & 8400MB for effective_cache_size.
> In addition, with 12GB and apparently not more than 2 million rows
> per table, you seem very likely to have the active portion of your
> database fully cached. So these settings are likely to help:
>
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.03
Wow, the random_page_cost setting made a huge difference. The query we mentioned yesterday that took 30 minutes is down to about 30 seconds. That's a nice improvement and a reasonable speed, though it could still be better. However, I'm a bit worried about changing it so drastically. Can there be any future issues from this change? We have 4 or 5 tables that will continue to grow fairly rapidly (a 1 million row increase every 2-3 years).
> I normally don't set work_mem above RAM * 0.25 / max_connections,
> so I'd say 10MB to 15MB would probably be good with your
> max_connections setting. If you can lower that with a connection
> pool you might want to go to 20MB or 40MB.
For the moment, it is set to 30MB based on your formula and the new 100 connection limit.
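For the record, the arithmetic behind that number, using the figures from this thread, works out roughly as follows.

-- RAM * 0.25 / max_connections  =  11.37GB * 0.25 / 100  ~  29MB, rounded up to 30MB
set work_mem = '30MB';
show work_mem;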
> You probably want to boost maintenance_work_mem to something like
> 512MB.
>
> If you haven't already done so, run VACUUM ANALYZE at the database
> level. If most of your data was loaded at about the same time, run
> VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass
> of your entire database at peak OLTP load. Vacuum and analyze are
> routine maintenance that are necessary for the database to perform
> well. Autovacuum can often handle everything for you, but if you
> have an "off-hours" period when load is lower it is often a good
> idea to run a database VACUUM ANALYZE on a daily or weekly basis to
> shift maintenance load to a time when it has the least impact.
> Never leave autovacuum disabled beyond a short maintenance or load
> window.
The vacuuming didn't seem to change performance, and it didn't take very long to run. Autovacuum may have already taken care of most potential issues from this.
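A sketch of the off-peak maintenance run described above, plus a quick way to confirm what autovacuum has already done for the table (the table name is the one from this thread):

-- run during an off-peak window; VACUUM cannot run inside a transaction block
vacuum freeze analyze;

-- see when vacuum/analyze last ran on the table
select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
from pg_stat_user_tables
where relname = 'contact';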
> Regarding your specific problem...
>
> You claim this is how contact_address_map is defined:
>
> create table contact_address_map (
>     contact_id int8 not null,
>     address_id int8 not null,
>     unique (address_id)
> );
>
> ... but your query is looking for rows in that table where
> address_id is null, and finding them. Those can't both be true.
We are migrating a really old db to Postgres and cleaning up all of the data. In the old system, only one address was allowed per user, and therefore with the migrated data there is only going to be 1 entry in the map for each user. We were looking for which users didn't have an address at all, and the query was an optimization with this assumption. We are aware that the query is not ideal, but it works for the just-after-migrating scenario for validating parts of the migration routine. Do note that we were left-joining the two tables so that all contacts are guaranteed to return, only with a check on address_id to see if they didn't have any addresses.
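For what it's worth, the same "contacts with no address row at all" check can also be written as an anti-join with NOT EXISTS, which sidesteps both the NULL handling of NOT IN and the need to test address_id; the column names are as used earlier in the thread.

select c.*
from contact c
where not exists (
    select 1
    from contact_address_map cam
    where cam.contact_id = c.id
);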
> Your plan shows an index scan with no index conditions to pass
> every row in a table, which is much slower than a seqscan. Did you
> turn off enable_seqscan and fail to mention that? Doing so will
> definitely result in sub-optimal performance on queries like the
> one shown, where every row in one of the tables must be read.
No, enable_seqscan is still turned on.
> When you don't provide accurate information, any advice you get may
> be off-target.
Joseph Pravato <joseph.pravato@nomagic.com> wrote:
> From: "Kevin Grittner" <kgrittn@ymail.com>

> We actually only have 11.37GB on our server, so we've used 3072MB
> for shared_buffers & 8400MB for effective_cache_size.

Sounds reasonable. Keep in mind that the actual "sweet spot" for a lot of this configuration depends not only on the hardware, but also the database and workload. Once you have a performance baseline with actual workload using these settings, trying incremental changes and monitoring the results will help zero in on ideal settings for your situation. Many settings can take effect with just a reload, and don't disrupt ongoing workload, although shared_buffers is one which requires a restart. You can check that in the docs or the context column in the pg_settings view.

>> In addition, with 12GB and apparently not more than 2 million
>> rows per table, you seem very likely to have the active portion
>> of your database fully cached. So these settings are likely to
>> help:
>>
>> seq_page_cost = 0.1
>> random_page_cost = 0.1
>> cpu_tuple_cost = 0.03

> Wow, the random_page_cost setting made a huge difference. The
> query we mentioned yesterday that took 30 minutes is down to
> about 30 seconds. That's a nice improvement and a reasonable
> speed, though it could still be better. However, I'm a bit
> worried about changing it so drastically. Can there be any future
> issues from this change? We have 4 or 5 tables that will continue
> to grow fairly rapidly (a 1 million row increase every 2-3 years).

PostgreSQL uses a cost-based optimizer (rather than rules-based) and it goes through the OS and its filesystems, so caching effects need to be considered in setting the cost factors in order to get the best plan for any query. The default configuration is intended to allow PostgreSQL to run on a low-end laptop, so that people don't have the unfortunate experience of installing it and not being able to get it to even start. Tuning is required for serious production work. The original configuration assumed fairly heavy disk access when reading data, so if that's not the case for your environment, you need to make adjustments like the above to more accurately model the costs of each possible plan. If the cache hit rate falls dramatically, you will want to either add RAM to keep data cached, or adjust the cost factors so that it will try to minimize slow disk access.

>> If you haven't already done so, run VACUUM ANALYZE at the
>> database level. If most of your data was loaded at about the
>> same time, run VACUUM FREEZE ANALYZE instead, to prevent a
>> read-and-rewrite pass of your entire database at peak OLTP load.

> The vacuuming didn't seem to change performance, and it didn't
> take very long to run. Autovacuum may have already taken care of
> most potential issues from this.

That's good, but if most of the data was loaded at about the same time, you may want to run VACUUM FREEZE ANALYZE during an off-peak period. Without that, there will be an autovacuum run at some point which will be more aggressive than usual, and may affect performance enough to notice. A manual run will let you pick when to do this more aggressive maintenance.

>> Your plan shows an index scan with no index conditions to pass
>> every row in a table, which is much slower than a seqscan. Did
>> you turn off enable_seqscan [...]?

> No, enable_seqscan is still turned on.

That plan choice strikes me as very odd, and not likely to be optimal. The only other things that I can think of which might cause this plan choice would be if seq_page_cost is higher than random_page_cost, or if the table has a lot of dead space in it. Could you show EXPLAIN ANALYZE output for the current settings, along with the output of running this?:

SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

SELECT oid, relname, relpages, reltuples FROM pg_class
WHERE relname = 'contact';

SELECT * FROM pg_stat_user_tables WHERE relname = 'contact';

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2/22/2013 7:29 AM, Kevin Grittner wrote:
> If you haven't already done so, run VACUUM ANALYZE at the database
> level. If most of your data was loaded at about the same time, run
> VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass
> of your entire database at peak OLTP load. Vacuum and analyze are
> routine maintenance that are necessary for the database to perform
> well. Autovacuum can often handle everything for you, but if you
> have an "off-hours" period when load is lower it is often a good
> idea to run a database VACUUM ANALYZE on a daily or weekly basis to
> shift maintenance load to a time when it has the least impact.
> Never leave autovacuum disabled beyond a short maintenance or load
> window.

We use SquirrelSQL to talk to our databases with the Postgres 9.2-1002 JDBC driver. When I tried to run `VACUUM FREEZE ANALYZE', it came back with this error:
Error: ERROR: VACUUM cannot run inside a transaction block
SQLState: 25001
ErrorCode: 0
This is the error we have had that Tom posted about before. (http://www.postgresql.org/message-id/5115B188.6090308@gmail.com) I don't think Tom fully understood the issue, and neither did we. Over the past few weeks, our team has been looking into this issue where our application would just stop and hang. For the past couple of months we've been dealing with this; our solution was to go to everybody's SquirrelSQL sessions and hit rollback. This fixes the issue every time. Usually, the user whose session caused the hang couldn't remember what they were doing, so we still don't know exactly what is causing the issue.
We did however figure out what the root cause is, as you can see above. Every statement in the JDBC driver is run inside of a transaction. This is not a SquirrelSQL problem, since a simple test program that only uses the JDBC driver gives the same error. Yet when I run it on the command line (psql) it has no issues.
What gets particularly confusing with this issue is that when your query is wrong in any way, you are required to roll back the transaction. Worse yet, you have to re-run any prior statements in the transaction if you have any errors. The easiest solution is to turn auto-commit on and hope that you never seriously screw up data. But we are not going to do that.
Did I miss anything, maybe a setting somewhere in the JDBC driver that can fix this issue? It makes it extremely difficult to run ad-hoc queries, since at any time there is a possibility that some queries stop responding.
Joseph Pravato wrote on 26.02.2013 18:19:
> We use SquirrelSQL to talk to our databases with the Postgres
> 9.2-1002 JDBC driver. When I tried to run `VACUUM FREEZE ANALYZE',
> it came back with this error:
>
> Error: ERROR: VACUUM cannot run inside a transaction block
> SQLState: 25001
> ErrorCode: 0

Simply turn on "Autocommit" in Squirrel to get around this.

> What gets particularly confusing with this issue is that when your
> query is wrong in any way, you are required to roll back the
> transaction. Worse yet, you have to re-run any prior statements in
> the transaction if you have any errors. The easiest solution is to
> turn auto-commit on and hope that you never seriously screw up
> data. But we are not going to do that.

That can be solved by using savepoints. For my SQL tool (SQL Workbench/J) I have implemented this so that each statement that is run interactively is guarded with a savepoint. Upon an error, the tool automatically rolls back to the savepoint, leaving the transaction "intact" and letting you continue without the manual need to rollback (possibly losing changes you did before that).

Without autocommit you cannot, however, run vacuum. The workaround for that would be to turn on autocommit temporarily inside your SQL client. I don't know if that is possible in Squirrel though (again, I added that ability to SQL Workbench/J using "set autocommit on/off", which is not a standard Postgres statement). There, you could run the script:

set autocommit on;
VACUUM FREEZE ANALYZE;
set autocommit off;

If you want to look at my tool, here it is: http://www.sql-workbench.net

Regards
Thomas
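The savepoint pattern described above can also be used by hand in any client that stays inside a transaction; a minimal sketch:

begin;
savepoint before_risky_statement;
-- a statement that fails only aborts back to the savepoint
select 1/0;                              -- error: division by zero
rollback to savepoint before_risky_statement;
-- the transaction is still usable and earlier work is not lost
select count(*) from contact;
commit;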
On 2/23/2013 10:53 AM, Kevin Grittner wrote:
> That plan choice strikes me as very odd, and not likely to be
> optimal. The only other things that I can think of which might
> cause this plan choice would be if seq_page_cost is higher than
> random_page_cost, or if the table has a lot of dead space in it.
> Could you show EXPLAIN ANALYZE output for the current settings,
> along with the output of running this?:

Sorry for the delay in responding. We thank you for all your assistance and time, it is very appreciated!

Here is the explain analyze for the query:

select * from contact where id not in (select contact_id from contact_address_map)

Seq Scan on contact  (cost=18995.86..39058.98 rows=287471 width=948) (actual time=1231.398..1259.205 rows=17 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 574928
  SubPlan 1
    ->  Seq Scan on contact_address_map  (cost=0.00..17558.55 rows=574925 width=8) (actual time=0.018..454.653 rows=574928 loops=1)
Total runtime: 1259.281 ms

After your suggestions this query sped up dramatically; it now returns in less than a second.

This query that we have been talking about is just a sample that we used to get a start on performance improvements. The original performance-related issue we had was with a large view that we use for our customer & sales information; it accesses 3 additional views and joins a total of 23 tables. Before the suggestions you gave, it returned in 7 - 10 minutes, and it now returns in less than 10 seconds. However, we have a copy of our data on another database where it runs in less than 0.5 seconds. We think, based on the previous messages in this thread, that it is still choosing a sub-optimal query plan for the views.

This is the explain analyze for our customer data view.
http://pastebin.com/kSfb2dqy

> SELECT name, current_setting(name), source
> FROM pg_settings
> WHERE source NOT IN ('default', 'override');
> SELECT oid, relname, relpages, reltuples FROM pg_class
> WHERE relname = 'contact';
> SELECT * FROM pg_stat_user_tables WHERE relname = 'contact';

Here is the output for the queries you provided.
http://pastebin.com/Yp80HCpe

> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Joseph Pravato <joseph.pravato@nomagic.com> wrote:

> This query that we have been talking about is just a sample that
> we used to get a start on performance improvements. The original
> performance-related issue we had was with a large view that we
> use for our customer & sales information; it accesses 3
> additional views and joins a total of 23 tables.

Ah, well that gets us into a whole new ballgame. With that many tables, planning time can become an issue in itself, so there are various things that PostgreSQL does to try to keep planning time from ballooning to a point where it takes longer than the time saved by plan improvement. The first thing I would try is boosting these settings to more than the number of table references:

from_collapse_limit
join_collapse_limit
geqo_threshold

Try an EXPLAIN ANALYZE and see whether the "actual time" at the top level node of the plan looks good, and see how it compares to "Total runtime" at the bottom. The difference is primarily planning time, so you can see how good a plan you got versus how expensive it was to find that plan. If you see that there are good plans, but it is too expensive to find them, you might want to let the "genetic query optimizer" have a shot at planning, by adjusting the above values. The concept of this alternative planner is that it tries to get a plan which is "good enough" with bounded planning time.

See the docs for details.

> Before the suggestions you gave, it returned in 7 - 10 minutes,
> and it now returns in less than 10 seconds. However, we have a
> copy of our data on another database where it runs in less than
> 0.5 seconds. We think, based on the previous messages in this
> thread, that it is still choosing a sub-optimal query plan for
> the views.
>
> This is the explain analyze for our customer data view.
> http://pastebin.com/kSfb2dqy
> Here is the output for the queries you provided.
> http://pastebin.com/Yp80HCpe

The plan seems to be reluctant to use index scans, which might be related to the ratio between these values:

random_page_cost | 1
seq_page_cost    | 0.1

Do you get a better plan if these are equal? If random is only twice the sequential cost?

The other possible issue is that depending on how the views are used, it sometimes creates an optimization barrier. In general, the planner will see more options if the views are joined than if they are used in subqueries or CTEs.

You might also want to try pasting your plan into:

http://explain.depesz.com/

This formats the plan and highlights portions you might want to pay special attention to.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
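A per-session sketch of raising the settings mentioned above, picking 25 simply because it exceeds the 23 table references discussed; the view name here is a placeholder, not the real one.

set from_collapse_limit = 25;
set join_collapse_limit = 25;
set geqo_threshold = 25;

explain analyze
select * from customer_sales_view;      -- placeholder name for the large view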
Hi Kevin,

Again, many thanks! The team is implementing your suggestions.

From a sysadmin perspective, something odd also seems to be the lack of OS disk caching. After some very heavy use over many hours, 12 of the 16GB total RAM is still free rather than being allocated to disk buffering:

Mem:  16013788k total,  4013608k used, 12000180k free,   108108k buffers
Swap:  2097148k total,        0k used,  2097148k free,  3267868k cached

When a query hangs, we also see a core become and stay saturated:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27135 postgres  20   0 3510m 471m 315m R 100.0  3.0  71:06.82 postmaster

Without that core doing any disk I/O:

Cpu1 : 52.8%us, 47.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

I get the feeling that we've missed a basic setting change that's required to move from that default, low-end laptop you described in an earlier post to the enterprise environment we'd like to operate in. :)

Best regards,
-Tom

On 02/27/2013 10:44 AM, Kevin Grittner wrote:
> Joseph Pravato <joseph.pravato@nomagic.com> wrote:
>
>> This query that we have been talking about is just a sample that
>> we used to get a start on performance improvements. The original
>> performance-related issue we had was with a large view that we
>> use for our customer & sales information; it accesses 3
>> additional views and joins a total of 23 tables.
> Ah, well that gets us into a whole new ballgame. With that many
> tables, planning time can become an issue in itself, so there are
> various things that PostgreSQL does to try to keep planning time
> from ballooning to a point where it takes longer than the time
> saved by plan improvement. The first thing I would try is boosting
> these settings to more than the number of table references:
>
> from_collapse_limit
> join_collapse_limit
> geqo_threshold
>
> Try an EXPLAIN ANALYZE and see whether the "actual time" at the top
> level node of the plan looks good, and see how it compares to
> "Total runtime" at the bottom. The difference is primarily
> planning time, so you can see how good a plan you got versus how
> expensive it was to find that plan. If you see that there are good
> plans, but it is too expensive to find them, you might want to let
> the "genetic query optimizer" have a shot at planning, by adjusting
> the above values. The concept of this alternative planner is that
> it tries to get a plan which is "good enough" with bounded planning
> time.
>
> See the docs for details.
>
>> Before the suggestions you gave, it returned in 7 - 10 minutes,
>> and it now returns in less than 10 seconds. However, we have a
>> copy of our data on another database where it runs in less than
>> 0.5 seconds. We think, based on the previous messages in this
>> thread, that it is still choosing a sub-optimal query plan for
>> the views.
>>
>> This is the explain analyze for our customer data view.
>> http://pastebin.com/kSfb2dqy
>> Here is the output for the queries you provided.
>> http://pastebin.com/Yp80HCpe
> The plan seems to be reluctant to use index scans, which might be
> related to the ratio between these values:
>
> random_page_cost | 1
> seq_page_cost    | 0.1
>
> Do you get a better plan if these are equal? If random is only
> twice the sequential cost?
>
> The other possible issue is that depending on how the views are
> used, it sometimes creates an optimization barrier. In general,
> the planner will see more options if the views are joined than if
> they are used in subqueries or CTEs.
>
> You might also want to try pasting your plan into:
>
> http://explain.depesz.com/
>
> This formats the plan and highlights portions you might want to pay
> special attention to.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Kevin Grittner wrote:
> Ah, well that gets us into a whole new ballgame. With that many
> tables, planning time can become an issue in itself, so there are
> various things that PostgreSQL does to try to keep planning time
> from ballooning to a point where it takes longer than the time
> saved by plan improvement. The first thing I would try is boosting
> these settings to more than the number of table references:
>
> from_collapse_limit
> join_collapse_limit
> geqo_threshold
>
> Try an EXPLAIN ANALYZE and see whether the "actual time" at the top
> level node of the plan looks good, and see how it compares to
> "Total runtime" at the bottom. The difference is primarily
> planning time, so you can see how good a plan you got versus how
> expensive it was to find that plan. If you see that there are good
> plans, but it is too expensive to find them, you might want to let
> the "genetic query optimizer" have a shot at planning, by adjusting
> the above values. The concept of this alternative planner is that
> it tries to get a plan which is "good enough" with bounded planning
> time. See the docs for details.
We've changed both 'from_collapse_limit' and 'join_collapse_limit' from 8 to 15. We tried multiple combinations of both values from 8, 12-17, & 20, and both at 15 seemed to be a sweet spot for this view. geqo_threshold didn't really make any difference once we changed the other two.

We've started to test more of our queries that involve the view, and we believe we may have come across a possible reason for the poor performance in our views. The view is pulling data from many tables as said before; one of them has a varchar(255) column (ColA) that is copied over into the view. However, in the view column definition, it is a varchar(2147483647). In the query we tested, we are running an equivalence check against this column, along with another varchar column of size 1 (ColB). When we remove the check against ColA, the query returns in 2.5 seconds. When it is included, it takes 200 seconds. There is almost no difference when the ColB check is removed (2.5s vs 2.3s).

It is our belief that this varchar(2147483647) could be causing performance problems. ColA might be defaulting to 2147483647 because it is being union-ed with the same column a couple of times in different circumstances. So we are wondering: is there a way to set the column's varchar size in a view definition?
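On that last question: the type a view column ends up with can be pinned by casting explicitly in each branch of the UNION in the view's defining query. The view, table, and column names below are made up purely for illustration.

create view contact_report as           -- hypothetical names throughout
select cast(c.some_text as varchar(255)) as col_a
from contact c
union all
select cast(a.some_text as varchar(255)) as col_a
from archived_contact a;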
> The plan seems to be reluctant to use index scans, which might be
> related to the ratio between these values:
>
> random_page_cost | 1
> seq_page_cost    | 0.1
>
> Do you get a better plan if these are equal? If random is only
> twice the sequential cost?
We have gone ahead and made these settings equal and saw no significant performance increase. We're unsure what would constitute a better plan; please see the next section we commented on.
> The other possible issue is that depending on how the views are
> used, it sometimes creates an optimization barrier. In general,
> the planner will see more options if the views are joined than if
> they are used in subqueries or CTEs.
>
> You might also want to try pasting your plan into:
>
> http://explain.depesz.com/
Side comment: I like this website, thanks for the link. However, we are not exactly DBAs and are new to databases & Postgres, so we are still doing some research on what the site outputs since the information presented is new to us. Do you know of any good sites that show good vs. bad query plans? Given the sheer amount of information in the view's query plan, we aren't sure what's going on.

Thank you again for all of your assistance!
Thanks again to everyone that has been assisting us with our query speed issue. We have possibly found a reason why our views are running slow; does anyone know if that could be our problem?
Please read below.
> We've started to test more of our queries that involve the view,
> and we believe we may have come across a possible reason for the
> poor performance in our views. The view is pulling data from many
> tables as said before; one of them has a varchar(255) column
> (ColA) that is copied over into the view. However, in the view
> column definition, it is a varchar(2147483647).
>
> In the query we tested, we are running an equivalence check
> against this column, along with another varchar column of size 1
> (ColB). When we remove the check against ColA, the query returns
> in 2.5 seconds. When it is included, it takes 200 seconds. There
> is almost no difference when the ColB check is removed (2.5s vs
> 2.3s).
>
> It is our belief that this varchar(2147483647) could be causing
> performance problems. ColA might be defaulting to 2147483647
> because it is being union-ed with the same column a couple of
> times in different circumstances. So we are wondering: is there a
> way to set the column's varchar size in a view definition?
Thanks!
Joseph Pravato <joseph.pravato@nomagic.com> writes:
> Thanks again to everyone that has been assisting us with our query
> speed issue. We have possibly found a reason why our views are
> running slow; does anyone know if that could be our problem?
> Please read below.

>> We've started to test more of our queries that involve the view,
>> and we believe we may have come across a possible reason for the
>> poor performance in our views. The view is pulling data from many
>> tables as said before; one of them has a varchar(255) column
>> (ColA) that is copied over into the view. However, in the view
>> column definition, it is a varchar(2147483647).
>> In the query we tested, we are running an equivalence check
>> against this column, along with another varchar column of size 1
>> (ColB). When we remove the check against ColA, the query returns
>> in 2.5 seconds. When it is included, it takes 200 seconds. There
>> is almost no difference when the ColB check is removed (2.5s vs
>> 2.3s).
>>
>> It is our belief that this varchar(2147483647) could be causing
>> performance problems. ColA might be defaulting to 2147483647
>> because it is being union-ed with the same column a couple of
>> times in different circumstances. So we are wondering: is there a
>> way to set the column's varchar size in a view definition?

In principle that should not make any difference. (We have had bugs in the query planner in the past that were triggered by such things ... but not recently.) You would be more likely to get useful answers if you posted the table schemas and EXPLAIN ANALYZE results on pgsql-performance. There are some tips on asking good questions here:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

            regards, tom lane