Thread: PostgreSQL Query Speed Issues
Hi! We are running into an interesting issue with PostgreSQL 9.2. Some of the queries return in 5-30 minutes while others never return at all. The same queries return in several seconds on other databases we have tested with the same data. Here is a sample query that runs into this issue; hopefully some insight into it will help us solve a similar problem with the rest of the queries.

NOTE: All database tests were done without changing or updating any settings after install.

# rows in contact: 574939
# rows in contact_address_map: 574924

Original: NEVER RETURNS (tested to over an hour)

select * from contact where id not in (select contact_id from contact_address_map)

Result from an 'explain analyze': Since the original query never returned, this was not run.

Optimized for Postgres: RETURNS

select c.*
from contact c
left join CONTACT_ADDRESS_MAP cam on cam.CONTACT_ID = c.id
where cam.ADDRESS_ID is null

Result from an 'explain analyze':

QUERY PLAN
Merge Left Join  (cost=83512.87..158059.12 rows=1 width=952) (actual time=5224.171..5224.270 rows=15 loops=1)
  Merge Cond: (c.id = cam.contact_id)
  Filter: (cam.address_id IS NULL)
  Rows Removed by Filter: 574924
  ->  Index Scan using contact_pkey on contact c  (cost=0.00..63048.48 rows=574917 width=952) (actual time=0.009..852.708 rows=574939 loops=1)
  ->  Materialize  (cost=83512.59..86387.17 rows=574917 width=16) (actual time=1407.387..3107.246 rows=574924 loops=1)
        ->  Sort  (cost=83512.59..84949.88 rows=574917 width=16) (actual time=1407.379..2027.946 rows=574924 loops=1)
              Sort Key: cam.contact_id
              Sort Method: external sort  Disk: 14616kB
              ->  Seq Scan on contact_address_map cam  (cost=0.00..8857.17 rows=574917 width=16) (actual time=0.018..578.348 rows=574924 loops=1)
Total runtime: 5228.459 ms

We have done some investigation online, and it looks like many of these speed issues require hand-optimizing each query specifically for PostgreSQL, whereas other databases seem to do it automatically. Is it possible to change a configuration option on the database, or does PostgreSQL require hand optimization of all queries to run optimally? Any help would be greatly appreciated.
On Thu, Feb 21, 2013 at 3:56 PM, Joseph Pravato <joseph.pravato@nomagic.com> wrote:
> NOTE: All database tests were done without changing or updating any settings after install.

The defaults are sub-optimal. You really do need to tune them to the server in question.

> # rows in contact: 574939
> # rows in contact_address_map: 574924

After loading this data, did you run an ANALYZE?

> select c.*
> from contact c
> left join CONTACT_ADDRESS_MAP cam on cam.CONTACT_ID = c.id
> where cam.ADDRESS_ID is null

Table definitions would probably help. You might be missing indexes.

> Result from an 'explain analyze':
>
> QUERY PLAN
> Merge Left Join  (cost=83512.87..158059.12 rows=1 width=952) (actual time=5224.171..5224.270 rows=15 loops=1)
>   Merge Cond: (c.id = cam.contact_id)
>   Filter: (cam.address_id IS NULL)
>   Rows Removed by Filter: 574924
>   ->  Index Scan using contact_pkey on contact c  (cost=0.00..63048.48 rows=574917 width=952) (actual time=0.009..852.708 rows=574939 loops=1)
>   ->  Materialize  (cost=83512.59..86387.17 rows=574917 width=16) (actual time=1407.387..3107.246 rows=574924 loops=1)
>         ->  Sort  (cost=83512.59..84949.88 rows=574917 width=16) (actual time=1407.379..2027.946 rows=574924 loops=1)
>               Sort Key: cam.contact_id
>               Sort Method: external sort  Disk: 14616kB

Disk sorts imply work_mem isn't big enough.

>   ->  Seq Scan on contact_address_map cam  (cost=0.00..8857.17 rows=574917 width=16) (actual time=0.018..578.348 rows=574924 loops=1)

This should probably be using an index.

--
Douglas J Hunley (doug.hunley@gmail.com)
Twitter: @hunleyd
Web: douglasjhunley.com
G+: http://goo.gl/sajR3
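For reference, the index and ANALYZE steps being suggested here look roughly like the following. The column names come from the query above; the index name itself is only illustrative, and the index may or may not end up being used once statistics are in place.

create index contact_address_map_contact_id_idx
    on contact_address_map (contact_id);

-- refresh planner statistics after the bulk load
analyze contact;
analyze contact_address_map;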
On 2/21/2013 3:00 PM, Douglas J Hunley wrote:
> On Thu, Feb 21, 2013 at 3:56 PM, Joseph Pravato
> <joseph.pravato@nomagic.com> wrote:
>> NOTE: All database tests were done without changing or updating any settings after install.
> The defaults are sub-optimal. You really do need to tune them to the
> server in question.

Do you have any recommendations optimized for a database with 600,000 user records and tables with up to 2,000,000 records?

>> # rows in contact: 574939
>> # rows in contact_address_map: 574924
> After loading this data, did you run an ANALYZE?

No we did not; why would this be useful?

>> select c.*
>> from contact c
>> left join CONTACT_ADDRESS_MAP cam on cam.CONTACT_ID = c.id
>> where cam.ADDRESS_ID is null
> Table definitions would probably help. You might be missing indexes.

Table definitions link (http://pastebin.com/GyCsYpBn). See index comment below.

>> Result from an 'explain analyze':
>>
>> QUERY PLAN
>> Merge Left Join  (cost=83512.87..158059.12 rows=1 width=952) (actual time=5224.171..5224.270 rows=15 loops=1)
>>   Merge Cond: (c.id = cam.contact_id)
>>   Filter: (cam.address_id IS NULL)
>>   Rows Removed by Filter: 574924
>>   ->  Index Scan using contact_pkey on contact c  (cost=0.00..63048.48 rows=574917 width=952) (actual time=0.009..852.708 rows=574939 loops=1)
>>   ->  Materialize  (cost=83512.59..86387.17 rows=574917 width=16) (actual time=1407.387..3107.246 rows=574924 loops=1)
>>         ->  Sort  (cost=83512.59..84949.88 rows=574917 width=16) (actual time=1407.379..2027.946 rows=574924 loops=1)
>>               Sort Key: cam.contact_id
>>               Sort Method: external sort  Disk: 14616kB
> Disk sorts imply work_mem isn't big enough.

I was surprised that the work_mem default was so low. We changed it from 1MB to 50MB, and this is the 'explain analyze' for:

select * from contact where id not in (select contact_id from contact_address_map)

QUERY PLAN
Seq Scan on contact  (cost=10294.55..31256.01 rows=287458 width=952) (actual time=1555.473..1582.885 rows=16 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 574924
  SubPlan 1
    ->  Seq Scan on contact_address_map  (cost=0.00..8857.24 rows=574924 width=8) (actual time=0.013..586.107 rows=574924 loops=1)
Total runtime: 1597.773 ms

Increasing work_mem did not improve speed for an unrelated query; it ended up returning in 26 minutes. We had set work_mem to 1000MB for that test. We will try to get a simplified version of the query that does not expose company data to you tomorrow.

>>   ->  Seq Scan on contact_address_map cam  (cost=0.00..8857.17 rows=574917 width=16) (actual time=0.018..578.348 rows=574924 loops=1)
> This should probably be using an index

We added indexes to both columns of contact_address_map, but they made no difference in speed for either query. Any additional assistance is appreciated.
--As of February 21, 2013 4:55:03 PM -0600, Joseph Pravato is alleged to have said:

>>> # rows in contact: 574939
>>> # rows in contact_address_map: 574924
>> After loading this data, did you run an ANALYZE?
> No we did not; why would this be useful?

--As for the rest, it is mine.

Because it would let the database collect information on how the data is stored and distributed, thereby giving it a better chance to choose the best query plan. (Things like: What are common values, what are ranges of values, etc. If half the database has the same zip code, using an index on the zip code isn't useful for that zip code. If it's an even distribution, it is. ANALYZE lets the database figure that out.)

Recent versions of PostgreSQL will periodically do this in the background, but doing it explicitly in the foreground after major uploads of data is almost always a good idea.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
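As a rough illustration of what ANALYZE gathers, the per-column statistics it stores can be inspected afterwards through the pg_stats view; the table name below is the one from this thread.

select attname, null_frac, n_distinct, most_common_vals
from pg_stats
where tablename = 'contact_address_map';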
Hi all,

On 02/21/2013 03:55 PM, Joseph Pravato wrote:
> Do you have any recommendations optimized for a database with
> 600,000 user records and tables with up to 2,000,000 records?

Joey and I are both working on this, and it's related to an issue I posted a few weeks ago about a "hang". Since then I learned that a better problem description is "a query that takes longer to complete than anyone can tolerate". :)

Here is our postgresql.conf for version 9.2.1 running on Centos6/64:

listen_addresses = '*'
max_connections = 200
shared_buffers = 2GB
effective_cache_size = 1024MB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_timezone = 'US/Central'
datestyle = 'iso, mdy'
timezone = 'US/Central'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'

The database is running in a Xen VM with 12GB of RAM, 4 virtual CPUs and fast, dedicated physical disks rather than shared network storage. The problem queries take one core to saturation and keep it there with very little disk I/O.

Any suggestions or insights would be greatly appreciated.

Thanks,
-Tom
Tom Lisjac <netdxr@gmail.com> wrote:

> version 9.2.1 running on Centos6/64

There are performance problems in the 9.2 branch which are fixed in 9.2.3. If you care about performance, or for that matter running with known bugs fixed, upgrade.

http://www.postgresql.org/support/versioning/

> The database is running in a Xen VM with 12GB of ram, 4 virtual
> CPU's and fast, dedicated physical disks rather than shared
> network storage. The problem queries take one core to saturation
> and keep it there with very little disk I/O.

> max_connections = 200

You may want to consider connection pooling:

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

> shared_buffers = 2GB

The usual advice is to start at 25% of machine RAM, up to 8GB, and check performance with incremental updates from there. That would suggest a 4GB starting point.

> effective_cache_size = 1024MB

This should normally be around 75% of machine RAM, so 9GB. This does not actually allocate any RAM, but is used to estimate how much of the indexes may be in RAM on repeated access. A larger number allows more index usage.

In addition, with 12GB and apparently not more than 2 million rows per table, you seem very likely to have the active portion of your database fully cached. So these settings are likely to help:

seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03

I normally don't set work_mem above RAM * 0.25 / max_connections, so I'd say 10MB to 15MB would probably be good with your max_connections setting. If you can lower that with a connection pool you might want to go to 20MB or 40MB.

You probably want to boost maintenance_work_mem to something like 512MB.

If you haven't already done so, run VACUUM ANALYZE at the database level. If most of your data was loaded at about the same time, run VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass of your entire database at peak OLTP load. Vacuum and analyze are routine maintenance that are necessary for the database to perform well. Autovacuum can often handle everything for you, but if you have an "off-hours" period when load is lower it is often a good idea to run a database VACUUM ANALYZE on a daily or weekly basis to shift maintenance load to a time when it has the least impact. Never leave autovacuum disabled beyond a short maintenance or load window.

Regarding your specific problem...

You claim this is how contact_address_map is defined:

create table contact_address_map (
    contact_id int8 not null,
    address_id int8 not null,
    unique (address_id)
);

... but your query is looking for rows in that table where address_id is null, and finding them. Those can't both be true.

Your plan shows an index scan with no index conditions to pass every row in a table, which is much slower than a seqscan. Did you turn off enable_seqscan and fail to mention that? Doing so will definitely result in sub-optimal performance on queries like the one shown, where every row in one of the tables must be read.

When you don't provide accurate information, any advice you get may be off-target.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
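Taken together, the suggested starting points above could be tried per session along these lines before committing them to postgresql.conf (shared_buffers itself can only be changed in postgresql.conf and needs a restart). The values are simply the ones discussed in this message, not tested recommendations.

set effective_cache_size = '9GB';
set seq_page_cost = 0.1;
set random_page_cost = 0.1;
set cpu_tuple_cost = 0.03;
set maintenance_work_mem = '512MB';

-- then re-check the plan for the problem query
explain analyze
select * from contact where id not in (select contact_id from contact_address_map);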
On 2/22/2013 8:58 AM, Joseph Pravato wrote:
----- Forwarded Message -----
From: "Kevin Grittner" <kgrittn@ymail.com>
To: "Tom Lisjac" <netdxr@gmail.com>, pgsql-novice@postgresql.org
Sent: Friday, February 22, 2013 7:29:58 AM
Subject: Re: [NOVICE] PostgreSQL Query Speed Issues

> Tom Lisjac <netdxr@gmail.com> wrote:
>> version 9.2.1 running on Centos6/64
> There are performance problems in the 9.2 branch which are fixed in
> 9.2.3. If you care about performance, or for that matter running
> with known bugs fixed, upgrade.
>
> http://www.postgresql.org/support/versioning/
Our server manager is going to update to 9.2.3 this weekend to see if we get any improvements.
>> The database is running in a Xen VM with 12GB of ram, 4 virtual
>> CPU's and fast, dedicated physical disks rather than shared
>> network storage. The problem queries take one core to saturation
>> and keep it there with very little disk I/O.
>>
>> max_connections = 200
>
> You may want to consider connection pooling:
>
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
On our server, we are using connection pooling with a maximum of 50. For the moment, we've moved it down to max_connections = 100. Originally, the 200 was because we were using a multi-threaded migration routine to load data into Postgres. However, we've set up a second database dedicated for migration and are using pg_dump & pg_restore.
>> shared_buffers = 2GB
>
> The usual advice is to start at 25% of machine RAM, up to 8GB, and
> check performance with incremental updates from there. That would
> suggest a 4GB starting point.
>
>> effective_cache_size = 1024MB
>
> This should normally be around 75% of machine RAM, so 9GB. This
> does not actually allocate any RAM, but is used to estimate how
> much of the indexes may be in RAM on repeated access. A larger
> number allows more index usage.
We actually only have 11.37GB on our server, so we've used 3072MB for shared_buffers & 8400MB for effective_cache_size.
> In addition, with 12GB and apparently not more than 2 million rows
> per table, you seem very likely to have the active portion of your
> database fully cached. So these settings are likely to help:
>
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.03
Wow, the random_page_cost setting made a huge difference. The query we mentioned yesterday that took 30 minutes is down to about 30 seconds. That's a nice improvement and a reasonable speed, though it could still be better. However, I'm a bit worried about changing it so drastically. Can there be any future issues from this change? We have 4 or 5 tables that will continue to grow fairly rapidly (a 1 million row increase every 2-3 years).
> I normally don't set work_mem above RAM * 0.25 / max_connections,
> so I'd say 10MB to 15MB would probably be good with your
> max_connections setting. If you can lower that with a connection
> pool you might want to go to 20MB or 40MB.
For the moment, it is set to 30MB based on your formula and the new 100 connection limit.
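For the record, the arithmetic behind that number, using the figures from this thread, works out roughly as follows.

-- RAM * 0.25 / max_connections  =  11.37GB * 0.25 / 100  ~  29MB, rounded up to 30MB
set work_mem = '30MB';
show work_mem;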
> You probably want to boost maintenance_work_mem to something like
> 512MB.
>
> If you haven't already done so, run VACUUM ANALYZE at the database
> level. If most of your data was loaded at about the same time, run
> VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass
> of your entire database at peak OLTP load. Vacuum and analyze are
> routine maintenance that are necessary for the database to perform
> well. Autovacuum can often handle everything for you, but if you
> have an "off-hours" period when load is lower it is often a good
> idea to run a database VACUUM ANALYZE on a daily or weekly basis to
> shift maintenance load to a time when it has the least impact.
> Never leave autovacuum disabled beyond a short maintenance or load
> window.
The vacuuming didn't seem to change performance, and it didn't take very long to run. Autovacuum may have already taken care of most potential issues from this.
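A sketch of the off-peak maintenance run described above, plus a quick way to confirm what autovacuum has already done for the table (the table name is the one from this thread):

-- run during an off-peak window; VACUUM cannot run inside a transaction block
vacuum freeze analyze;

-- see when vacuum/analyze last ran on the table
select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
from pg_stat_user_tables
where relname = 'contact';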
> Regarding your specific problem...
>
> You claim this is how contact_address_map is defined:
>
> create table contact_address_map (
>     contact_id int8 not null,
>     address_id int8 not null,
>     unique (address_id)
> );
>
> ... but your query is looking for rows in that table where
> address_id is null, and finding them. Those can't both be true.
We are migrating a really old db to Postgres and cleaning up all of the data. In the old system, only one address was allowed per user, and therefore with the migrated data there is only going to be 1 entry in the map for each user. We were looking for which users didn't have an address at all, and the query was an optimization with this assumption. We are aware that the query is not ideal, but it works for the just-after-migrating scenario for validating parts of the migration routine. Do note that we were left-joining the two tables so that all contacts are guaranteed to return, only with a check on address_id to see if they didn't have any addresses.
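For what it's worth, the same "contacts with no address row at all" check can also be written as an anti-join with NOT EXISTS, which sidesteps both the NULL handling of NOT IN and the need to test address_id; the column names are as used earlier in the thread.

select c.*
from contact c
where not exists (
    select 1
    from contact_address_map cam
    where cam.contact_id = c.id
);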
> Your plan shows an index scan with no index conditions to pass
> every row in a table, which is much slower than a seqscan. Did you
> turn off enable_seqscan and fail to mention that? Doing so will
> definitely result in sub-optimal performance on queries like the
> one shown, where every row in one of the tables must be read.
No, enable_seqscan is still turned on.
> When you don't provide accurate information, any advice you get may
> be off-target.
Joseph Pravato <joseph.pravato@nomagic.com> wrote:
> From: "Kevin Grittner" <kgrittn@ymail.com>

> We actually only have 11.37GB on our server, so we've used 3072MB
> for shared_buffers & 8400MB for effective_cache_size.

Sounds reasonable. Keep in mind that the actual "sweet spot" for a lot of this configuration depends not only on the hardware, but also the database and workload. Once you have a performance baseline with actual workload using these settings, trying incremental changes and monitoring the results will help zero in on ideal settings for your situation. Many settings can take effect with just a reload, and don't disrupt ongoing workload, although shared_buffers is one which requires a restart. You can check that in the docs or the context column in the pg_settings view.

>> In addition, with 12GB and apparently not more than 2 million
>> rows per table, you seem very likely to have the active portion
>> of your database fully cached. So these settings are likely to
>> help:
>>
>> seq_page_cost = 0.1
>> random_page_cost = 0.1
>> cpu_tuple_cost = 0.03

> Wow, the random_page_cost setting made a huge difference. The
> query we mentioned yesterday that took 30 minutes is down to
> about 30 seconds. That's a nice improvement and a reasonable
> speed, though it could still be better. However, I'm a bit
> worried about changing it so drastically. Can there be any future
> issues from this change? We have 4 or 5 tables that will continue
> to grow fairly rapidly (a 1 million row increase every 2-3 years).

PostgreSQL uses a cost-based optimizer (rather than rules-based) and it goes through the OS and its filesystems, so caching effects need to be considered in setting the cost factors in order to get the best plan for any query. The default configuration is intended to allow PostgreSQL to run on a low-end laptop, so that people don't have the unfortunate experience of installing it and not being able to get it to even start. Tuning is required for serious production work. The original configuration assumed fairly heavy disk access when reading data, so if that's not the case for your environment, you need to make adjustments like the above to more accurately model the costs of each possible plan. If the cache hit rate falls dramatically, you will want to either add RAM to keep data cached, or adjust the cost factors so that it will try to minimize slow disk access.

>> If you haven't already done so, run VACUUM ANALYZE at the
>> database level. If most of your data was loaded at about the
>> same time, run VACUUM FREEZE ANALYZE instead, to prevent a
>> read-and-rewrite pass of your entire database at peak OLTP load.

> The vacuuming didn't seem to change performance, and it didn't
> take very long to run. Autovacuum may have already taken care of
> most potential issues from this.

That's good, but if most of the data was loaded at about the same time, you may want to run VACUUM FREEZE ANALYZE during an off-peak period. Without that, there will be an autovacuum run at some point which will be more aggressive than usual, and may affect performance enough to notice. A manual run will let you pick when to do this more aggressive maintenance.

>> Your plan shows an index scan with no index conditions to pass
>> every row in a table, which is much slower than a seqscan. Did
>> you turn off enable_seqscan [...]?

> No, enable_seqscan is still turned on.

That plan choice strikes me as very odd, and not likely to be optimal. The only other things that I can think of which might cause this plan choice would be if seq_page_cost is higher than random_page_cost, or if the table has a lot of dead space in it. Could you show EXPLAIN ANALYZE output for the current settings, along with the output of running this?:

SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

SELECT oid, relname, relpages, reltuples FROM pg_class
WHERE relname = 'contact';

SELECT * FROM pg_stat_user_tables WHERE relname = 'contact';

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2/22/2013 7:29 AM, Kevin Grittner wrote:
> If you haven't already done so, run VACUUM ANALYZE at the database
> level. If most of your data was loaded at about the same time, run
> VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass
> of your entire database at peak OLTP load. Vacuum and analyze are
> routine maintenance that are necessary for the database to perform
> well. Autovacuum can often handle everything for you, but if you
> have an "off-hours" period when load is lower it is often a good
> idea to run a database VACUUM ANALYZE on a daily or weekly basis to
> shift maintenance load to a time when it has the least impact.
> Never leave autovacuum disabled beyond a short maintenance or load
> window.

We use SquirrelSQL to talk to our databases with the Postgres 9.2-1002 JDBC driver. When I tried to run `VACUUM FREEZE ANALYZE', it came back with this error:
Error: ERROR: VACUUM cannot run inside a transaction block
SQLState: 25001
ErrorCode: 0
This is the error we have had that Tom posted about before. (http://www.postgresql.org/message-id/5115B188.6090308@gmail.com) I don't think Tom fully understood the issue, and neither did we. Over the past few weeks, our team has been looking into this issue where our application would just stop and hang. For the past couple of months we've been dealing with this; our solution was to go to everybody's SquirrelSQL sessions and hit rollback. This fixes the issue every time. Usually, the user whose session caused the hang couldn't remember what they were doing, so we still don't know exactly what is causing the issue.
We did however figure out what the root cause is, as you can see above. Every statement in the JDBC driver is run inside of a transaction. This is not a SquirrelSQL problem, since a simple test program that only uses the JDBC driver gives the same error. Yet when I run it on the command line (psql) it has no issues.
What gets particularly confusing with this issue is that when your query is wrong in any way, you are required to roll back the transaction. Worse yet, you have to re-run any prior statements in the transaction if you have any errors. The easiest solution is to turn auto-commit on and hope that you never seriously screw up data. But we are not going to do that.
Did I miss anything, maybe a setting somewhere in the JDBC driver that can fix this issue? It makes it extremely difficult to run ad-hoc queries, since at any time there is a possibility that some queries stop responding.
Joseph Pravato wrote on 26.02.2013 18:19:
> We use SquirrelSQL to talk to our databases with the Postgres
> 9.2-1002 JDBC driver. When I tried to run `VACUUM FREEZE ANALYZE',
> it came back with this error:
>
> Error: ERROR: VACUUM cannot run inside a transaction block
> SQLState: 25001
> ErrorCode: 0

Simply turn on "Autocommit" in Squirrel to get around this.

> What gets particularly confusing with this issue is that when your
> query is wrong in any way, you are required to roll back the
> transaction. Worse yet, you have to re-run any prior statements in
> the transaction if you have any errors. The easiest solution is to
> turn auto-commit on and hope that you never seriously screw up
> data. But we are not going to do that.

That can be solved by using savepoints. For my SQL tool (SQL Workbench/J) I have implemented this so that each statement that is run interactively is guarded with a savepoint. Upon an error, the tool automatically rolls back to the savepoint, leaving the transaction "intact" and letting you continue without the manual need to rollback (possibly losing changes you did before that).

Without autocommit you cannot, however, run vacuum. The workaround for that would be to turn on autocommit temporarily inside your SQL client. I don't know if that is possible in Squirrel though (again, I added that ability to SQL Workbench/J using "set autocommit on/off", which is not a standard Postgres statement). There, you could run the script:

set autocommit on;
VACUUM FREEZE ANALYZE;
set autocommit off;

If you want to look at my tool, here it is: http://www.sql-workbench.net

Regards
Thomas
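The savepoint pattern described above can also be used by hand in any client that stays inside a transaction; a minimal sketch:

begin;
savepoint before_risky_statement;
-- a statement that fails only aborts back to the savepoint
select 1/0;                              -- error: division by zero
rollback to savepoint before_risky_statement;
-- the transaction is still usable and earlier work is not lost
select count(*) from contact;
commit;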
On 2/23/2013 10:53 AM, Kevin Grittner wrote:
> That plan choice strikes me as very odd, and not likely to be
> optimal. The only other things that I can think of which might
> cause this plan choice would be if seq_page_cost is higher than
> random_page_cost, or if the table has a lot of dead space in it.
> Could you show EXPLAIN ANALYZE output for the current settings,
> along with the output of running this?:

Sorry for the delay in responding. We thank you for all your assistance and time, it is very appreciated!

Here is the explain analyze for the query:

select * from contact where id not in (select contact_id from contact_address_map)

Seq Scan on contact  (cost=18995.86..39058.98 rows=287471 width=948) (actual time=1231.398..1259.205 rows=17 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 574928
  SubPlan 1
    ->  Seq Scan on contact_address_map  (cost=0.00..17558.55 rows=574925 width=8) (actual time=0.018..454.653 rows=574928 loops=1)
Total runtime: 1259.281 ms

After your suggestions this query sped up dramatically; it now returns in less than a second.

This query that we have been talking about is just a sample that we used to get a start on performance improvements. The original performance-related issue we had was with a large view that we use for our customer & sales information; it accesses 3 additional views and joins a total of 23 tables. Before the suggestions you gave, it returned in 7 - 10 minutes, and it now returns in less than 10 seconds. However, we have a copy of our data on another database where it runs in less than 0.5 seconds. We think, based on the previous messages in this thread, that it is still choosing a sub-optimal query plan for the views.

This is the explain analyze for our customer data view.
http://pastebin.com/kSfb2dqy

> SELECT name, current_setting(name), source
> FROM pg_settings
> WHERE source NOT IN ('default', 'override');
> SELECT oid, relname, relpages, reltuples FROM pg_class
> WHERE relname = 'contact';
> SELECT * FROM pg_stat_user_tables WHERE relname = 'contact';

Here is the output for the queries you provided.
http://pastebin.com/Yp80HCpe

> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Joseph Pravato <joseph.pravato@nomagic.com> wrote:

> This query that we have been talking about is just a sample that
> we used to get a start on performance improvements. The original
> performance-related issue we had was with a large view that we
> use for our customer & sales information; it accesses 3
> additional views and joins a total of 23 tables.

Ah, well that gets us into a whole new ballgame. With that many tables, planning time can become an issue in itself, so there are various things that PostgreSQL does to try to keep planning time from ballooning to a point where it takes longer than the time saved by plan improvement. The first thing I would try is boosting these settings to more than the number of table references:

from_collapse_limit
join_collapse_limit
geqo_threshold

Try an EXPLAIN ANALYZE and see whether the "actual time" at the top level node of the plan looks good, and see how it compares to "Total runtime" at the bottom. The difference is primarily planning time, so you can see how good a plan you got versus how expensive it was to find that plan. If you see that there are good plans, but it is too expensive to find them, you might want to let the "genetic query optimizer" have a shot at planning, by adjusting the above values. The concept of this alternative planner is that it tries to get a plan which is "good enough" with bounded planning time.

See the docs for details.

> Before the suggestions you gave, it returned in 7 - 10 minutes,
> and it now returns in less than 10 seconds. However, we have a
> copy of our data on another database where it runs in less than
> 0.5 seconds. We think, based on the previous messages in this
> thread, that it is still choosing a sub-optimal query plan for
> the views.
>
> This is the explain analyze for our customer data view.
> http://pastebin.com/kSfb2dqy
> Here is the output for the queries you provided.
> http://pastebin.com/Yp80HCpe

The plan seems to be reluctant to use index scans, which might be related to the ratio between these values:

random_page_cost | 1
seq_page_cost    | 0.1

Do you get a better plan if these are equal? If random is only twice the sequential cost?

The other possible issue is that depending on how the views are used, it sometimes creates an optimization barrier. In general, the planner will see more options if the views are joined than if they are used in subqueries or CTEs.

You might also want to try pasting your plan into:

http://explain.depesz.com/

This formats the plan and highlights portions you might want to pay special attention to.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
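A per-session sketch of raising the settings mentioned above, picking 25 simply because it exceeds the 23 table references discussed; the view name here is a placeholder, not the real one.

set from_collapse_limit = 25;
set join_collapse_limit = 25;
set geqo_threshold = 25;

explain analyze
select * from customer_sales_view;      -- placeholder name for the large view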
Hi Kevin,

Again, many thanks! The team is implementing your suggestions.

From a sysadmin perspective, something odd also seems to be the lack of OS disk caching. After some very heavy use over many hours, 12 of the 16GB total RAM is still free rather than being allocated to disk buffering:

Mem:  16013788k total,  4013608k used, 12000180k free,   108108k buffers
Swap:  2097148k total,        0k used,  2097148k free,  3267868k cached

When a query hangs, we also see a core become and stay saturated:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27135 postgres  20   0 3510m 471m 315m R 100.0  3.0  71:06.82 postmaster

Without that core doing any disk I/O:

Cpu1 : 52.8%us, 47.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

I get the feeling that we've missed a basic setting change that's required to move from that default, low-end laptop you described in an earlier post to the enterprise environment we'd like to operate in. :)

Best regards,
-Tom

On 02/27/2013 10:44 AM, Kevin Grittner wrote:
> Joseph Pravato <joseph.pravato@nomagic.com> wrote:
>
>> This query that we have been talking about is just a sample that
>> we used to get a start on performance improvements. The original
>> performance-related issue we had was with a large view that we
>> use for our customer & sales information; it accesses 3
>> additional views and joins a total of 23 tables.
> Ah, well that gets us into a whole new ballgame. With that many
> tables, planning time can become an issue in itself, so there are
> various things that PostgreSQL does to try to keep planning time
> from ballooning to a point where it takes longer than the time
> saved by plan improvement. The first thing I would try is boosting
> these settings to more than the number of table references:
>
> from_collapse_limit
> join_collapse_limit
> geqo_threshold
>
> Try an EXPLAIN ANALYZE and see whether the "actual time" at the top
> level node of the plan looks good, and see how it compares to
> "Total runtime" at the bottom. The difference is primarily
> planning time, so you can see how good a plan you got versus how
> expensive it was to find that plan. If you see that there are good
> plans, but it is too expensive to find them, you might want to let
> the "genetic query optimizer" have a shot at planning, by adjusting
> the above values. The concept of this alternative planner is that
> it tries to get a plan which is "good enough" with bounded planning
> time.
>
> See the docs for details.
>
>> Before the suggestions you gave, it returned in 7 - 10 minutes,
>> and it now returns in less than 10 seconds. However, we have a
>> copy of our data on another database where it runs in less than
>> 0.5 seconds. We think, based on the previous messages in this
>> thread, that it is still choosing a sub-optimal query plan for
>> the views.
>>
>> This is the explain analyze for our customer data view.
>> http://pastebin.com/kSfb2dqy
>> Here is the output for the queries you provided.
>> http://pastebin.com/Yp80HCpe
> The plan seems to be reluctant to use index scans, which might be
> related to the ratio between these values:
>
> random_page_cost | 1
> seq_page_cost    | 0.1
>
> Do you get a better plan if these are equal? If random is only
> twice the sequential cost?
>
> The other possible issue is that depending on how the views are
> used, it sometimes creates an optimization barrier. In general,
> the planner will see more options if the views are joined than if
> they are used in subqueries or CTEs.
>
> You might also want to try pasting your plan into:
>
> http://explain.depesz.com/
>
> This formats the plan and highlights portions you might want to pay
> special attention to.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Kevin Grittner wrote:
> Ah, well that gets us into a whole new ballgame. With that many
> tables, planning time can become an issue in itself, so there are
> various things that PostgreSQL does to try to keep planning time
> from ballooning to a point where it takes longer than the time
> saved by plan improvement. The first thing I would try is boosting
> these settings to more than the number of table references:
>
> from_collapse_limit
> join_collapse_limit
> geqo_threshold
>
> Try an EXPLAIN ANALYZE and see whether the "actual time" at the top
> level node of the plan looks good, and see how it compares to
> "Total runtime" at the bottom. The difference is primarily
> planning time, so you can see how good a plan you got versus how
> expensive it was to find that plan. If you see that there are good
> plans, but it is too expensive to find them, you might want to let
> the "genetic query optimizer" have a shot at planning, by adjusting
> the above values. The concept of this alternative planner is that
> it tries to get a plan which is "good enough" with bounded planning
> time. See the docs for details.
We've changed both 'from_collapse_limit' and 'join_collapse_limit' from 8 to 15. We tried multiple combinations of both values from 8, 12-17, & 20, and both at 15 seemed to be a sweet spot for this view. geqo_threshold didn't really make any difference once we changed the other two.

We've started to test more of our queries that involve the view, and we believe we may have come across a possible reason for the poor performance in our views. The view is pulling data from many tables as said before; one of them has a varchar(255) column (ColA) that is copied over into the view. However, in the view column definition, it is a varchar(2147483647). In the query we tested, we are running an equivalence check against this column, along with another varchar column of size 1 (ColB). When we remove the check against ColA, the query returns in 2.5 seconds. When it is included, it takes 200 seconds. There is almost no difference when the ColB check is removed (2.5s vs 2.3s).

It is our belief that this varchar(2147483647) could be causing performance problems. ColA might be defaulting to 2147483647 because it is being union-ed with the same column a couple of times in different circumstances. So we are wondering: is there a way to set the column's varchar size in a view definition?
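On that last question: the type a view column ends up with can be pinned by casting explicitly in each branch of the UNION in the view's defining query. The view, table, and column names below are made up purely for illustration.

create view contact_report as           -- hypothetical names throughout
select cast(c.some_text as varchar(255)) as col_a
from contact c
union all
select cast(a.some_text as varchar(255)) as col_a
from archived_contact a;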
> The plan seems to be reluctant to use index scans, which might be
> related to the ratio between these values:
>
> random_page_cost | 1
> seq_page_cost    | 0.1
>
> Do you get a better plan if these are equal? If random is only
> twice the sequential cost?
We have gone ahead and made these settings equal and saw no significant performance increase. We're unsure what would constitute a better plan; please see the next section we commented on.
> The other possible issue is that depending on how the views are
> used, it sometimes creates an optimization barrier. In general,
> the planner will see more options if the views are joined than if
> they are used in subqueries or CTEs.
>
> You might also want to try pasting your plan into:
>
> http://explain.depesz.com/
Side comment: I like this website, thanks for the link. However, we are not exactly DBAs and are new to databases & Postgres, so we are still doing some research on what the site outputs since the information presented is new to us. Do you know of any good sites that show good vs. bad query plans? Given the sheer amount of information in the view's query plan, we aren't sure what's going on.

Thank you again for all of your assistance!
Thanks again to everyone that has been assisting us with our query speed issue. We have possibly found a reason why our views are running slow; does anyone know if that could be our problem?
Please read below.
> We've started to test more of our queries that involve the view,
> and we believe we may have come across a possible reason for the
> poor performance in our views. The view is pulling data from many
> tables as said before; one of them has a varchar(255) column
> (ColA) that is copied over into the view. However, in the view
> column definition, it is a varchar(2147483647).
>
> In the query we tested, we are running an equivalence check
> against this column, along with another varchar column of size 1
> (ColB). When we remove the check against ColA, the query returns
> in 2.5 seconds. When it is included, it takes 200 seconds. There
> is almost no difference when the ColB check is removed (2.5s vs
> 2.3s).
>
> It is our belief that this varchar(2147483647) could be causing
> performance problems. ColA might be defaulting to 2147483647
> because it is being union-ed with the same column a couple of
> times in different circumstances. So we are wondering: is there a
> way to set the column's varchar size in a view definition?
Thanks!
Joseph Pravato <joseph.pravato@nomagic.com> writes:
> Thanks again to everyone that has been assisting us with our query
> speed issue. We have possibly found a reason why our views are
> running slow; does anyone know if that could be our problem?
> Please read below.

>> We've started to test more of our queries that involve the view,
>> and we believe we may have come across a possible reason for the
>> poor performance in our views. The view is pulling data from many
>> tables as said before; one of them has a varchar(255) column
>> (ColA) that is copied over into the view. However, in the view
>> column definition, it is a varchar(2147483647).
>> In the query we tested, we are running an equivalence check
>> against this column, along with another varchar column of size 1
>> (ColB). When we remove the check against ColA, the query returns
>> in 2.5 seconds. When it is included, it takes 200 seconds. There
>> is almost no difference when the ColB check is removed (2.5s vs
>> 2.3s).
>>
>> It is our belief that this varchar(2147483647) could be causing
>> performance problems. ColA might be defaulting to 2147483647
>> because it is being union-ed with the same column a couple of
>> times in different circumstances. So we are wondering: is there a
>> way to set the column's varchar size in a view definition?

In principle that should not make any difference. (We have had bugs in the query planner in the past that were triggered by such things ... but not recently.) You would be more likely to get useful answers if you posted the table schemas and EXPLAIN ANALYZE results on pgsql-performance. There are some tips on asking good questions here:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

            regards, tom lane