Thread: Large values for duration of COMMITs and slow queries. Due to large WAL config values?
Large values for duration of COMMITs and slow queries. Due to large WAL config values?
From
Cody Caughlan
Date:
Postgres 9.1.1, master with 2 slaves via streaming replication. I've enabled slow query logging of 150ms and am seeing a large number of slow COMMITs: 2011-11-12 06:55:02 UTC pid:30897 (28/0-0) LOG: duration: 232.398 ms statement: COMMIT 2011-11-12 06:55:08 UTC pid:30896 (27/0-0) LOG: duration: 1078.789 ms statement: COMMIT 2011-11-12 06:55:09 UTC pid:30842 (15/0-0) LOG: duration: 2395.432 ms statement: COMMIT 2011-11-12 06:55:09 UTC pid:30865 (23/0-0) LOG: duration: 2395.153 ms statement: COMMIT 2011-11-12 06:55:09 UTC pid:30873 (17/0-0) LOG: duration: 2390.106 ms statement: COMMIT The machine has 16GB of RAM and plenty of disk space. What I think might be relevant settings are: wal_buffers = 16MB checkpoint_segments = 32 max_wal_senders = 10 checkpoint_completion_target = 0.9 wal_keep_segments = 1024 maintenance_work_mem = 256MB work_mem = 88MB shared_buffers = 3584MB effective_cache_size = 10GB Recently we have bumped up wal_keep_segments and checkpoint_segments because we wanted to run long running queries on the slaves and we're receiving cancellation errors on the slaves. I think the master was recycling WAL logs from underneath the slave and thus canceling the queries. Hence, I believed I needed to crank up those values. It seems to work, I can run long queries (for statistics / reports) on the slaves just fine. But I now wonder if its having an adverse effect on the master, ala these slow commit times and other slow queries (e.g. primary key lookups on tables with not that many records), which seem to have increased since the configuration change. I am watching iostat and sure enough, when %iowait gets > 15 or so then a bunch more slow queries get logged. So I can see its disk related. I just dont know what the underlying cause is. Any pointers would be appreciated. Thank you.
Re: Large values for duration of COMMITs and slow queries. Due to large WAL config values?
From
Cody Caughlan
Date:
I've run VACUUM ANALYZE on all my tables to make sure the house has been cleaned. I still see a lot of slow queries / commits, even on primary key lookups and well indexed tables. /Cody On Fri, Nov 11, 2011 at 11:04 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Postgres 9.1.1, master with 2 slaves via streaming replication. > > I've enabled slow query logging of 150ms and am seeing a large number > of slow COMMITs: > > 2011-11-12 06:55:02 UTC pid:30897 (28/0-0) LOG: duration: 232.398 ms > statement: COMMIT > 2011-11-12 06:55:08 UTC pid:30896 (27/0-0) LOG: duration: 1078.789 ms > statement: COMMIT > 2011-11-12 06:55:09 UTC pid:30842 (15/0-0) LOG: duration: 2395.432 ms > statement: COMMIT > 2011-11-12 06:55:09 UTC pid:30865 (23/0-0) LOG: duration: 2395.153 ms > statement: COMMIT > 2011-11-12 06:55:09 UTC pid:30873 (17/0-0) LOG: duration: 2390.106 ms > statement: COMMIT > > The machine has 16GB of RAM and plenty of disk space. What I think > might be relevant settings are: > > wal_buffers = 16MB > checkpoint_segments = 32 > max_wal_senders = 10 > checkpoint_completion_target = 0.9 > wal_keep_segments = 1024 > maintenance_work_mem = 256MB > work_mem = 88MB > shared_buffers = 3584MB > effective_cache_size = 10GB > > Recently we have bumped up wal_keep_segments and checkpoint_segments > because we wanted to run long running queries on the slaves and we're > receiving cancellation errors on the slaves. I think the master was > recycling WAL logs from underneath the slave and thus canceling the > queries. Hence, I believed I needed to crank up those values. It seems > to work, I can run long queries (for statistics / reports) on the > slaves just fine. > > But I now wonder if its having an adverse effect on the master, ala > these slow commit times and other slow queries (e.g. primary key > lookups on tables with not that many records), which seem to have > increased since the configuration change. > > I am watching iostat and sure enough, when %iowait gets > 15 or so > then a bunch more slow queries get logged. So I can see its disk > related. > > I just dont know what the underlying cause is. > > Any pointers would be appreciated. Thank you. >
Re: Re: Large values for duration of COMMITs and slow queries. Due to large WAL config values?
From
Igor Polishchuk
Date:
I saw such behavior a few years ago on multiple very busy databases connected to the same EMC SAN. The SAN's cache got overwhelmed by the databases IO, and the storage latency went up significantly. I don't remember now what was the latency, but it was above 40ms. Is everything ok with your storage system? Is it possible your databases produce more IOPS than your storage may handle? On Sat, 2011-11-12 at 13:47 -0800, Cody Caughlan wrote: > I've run VACUUM ANALYZE on all my tables to make sure the house has > been cleaned. I still see a lot of slow queries / commits, even on > primary key lookups and well indexed tables. > > /Cody > > On Fri, Nov 11, 2011 at 11:04 PM, Cody Caughlan <toolbag@gmail.com> wrote: > > Postgres 9.1.1, master with 2 slaves via streaming replication. > > > > I've enabled slow query logging of 150ms and am seeing a large number > > of slow COMMITs: > > > > 2011-11-12 06:55:02 UTC pid:30897 (28/0-0) LOG: duration: 232.398 ms > > statement: COMMIT > > 2011-11-12 06:55:08 UTC pid:30896 (27/0-0) LOG: duration: 1078.789 ms > > statement: COMMIT > > 2011-11-12 06:55:09 UTC pid:30842 (15/0-0) LOG: duration: 2395.432 ms > > statement: COMMIT > > 2011-11-12 06:55:09 UTC pid:30865 (23/0-0) LOG: duration: 2395.153 ms > > statement: COMMIT > > 2011-11-12 06:55:09 UTC pid:30873 (17/0-0) LOG: duration: 2390.106 ms > > statement: COMMIT > > > > The machine has 16GB of RAM and plenty of disk space. What I think > > might be relevant settings are: > > > > wal_buffers = 16MB > > checkpoint_segments = 32 > > max_wal_senders = 10 > > checkpoint_completion_target = 0.9 > > wal_keep_segments = 1024 > > maintenance_work_mem = 256MB > > work_mem = 88MB > > shared_buffers = 3584MB > > effective_cache_size = 10GB > > > > Recently we have bumped up wal_keep_segments and checkpoint_segments > > because we wanted to run long running queries on the slaves and we're > > receiving cancellation errors on the slaves. I think the master was > > recycling WAL logs from underneath the slave and thus canceling the > > queries. Hence, I believed I needed to crank up those values. It seems > > to work, I can run long queries (for statistics / reports) on the > > slaves just fine. > > > > But I now wonder if its having an adverse effect on the master, ala > > these slow commit times and other slow queries (e.g. primary key > > lookups on tables with not that many records), which seem to have > > increased since the configuration change. > > > > I am watching iostat and sure enough, when %iowait gets > 15 or so > > then a bunch more slow queries get logged. So I can see its disk > > related. > > > > I just dont know what the underlying cause is. > > > > Any pointers would be appreciated. Thank you. > > >
Re: Large values for duration of COMMITs and slow queries. Due to large WAL config values?
From
Simon Riggs
Date:
On Sat, Nov 12, 2011 at 7:04 AM, Cody Caughlan <toolbag@gmail.com> wrote: > Postgres 9.1.1, master with 2 slaves via streaming replication. > > I've enabled slow query logging of 150ms and am seeing a large number > of slow COMMITs: > > 2011-11-12 06:55:02 UTC pid:30897 (28/0-0) LOG: duration: 232.398 ms > statement: COMMIT > 2011-11-12 06:55:08 UTC pid:30896 (27/0-0) LOG: duration: 1078.789 ms > statement: COMMIT Could well be related to checkpoints. Please try log_checkpoints = on and see if there is a correlation. > Recently we have bumped up wal_keep_segments and checkpoint_segments > because we wanted to run long running queries on the slaves and we're > receiving cancellation errors on the slaves. I think the master was > recycling WAL logs from underneath the slave and thus canceling the > queries. Hence, I believed I needed to crank up those values. It seems > to work, I can run long queries (for statistics / reports) on the > slaves just fine. That reasoning isn't sound because it doesn't work like that. Recycling WAL files has nothing to do with query cancelation on hot standby. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services