Thread: Postgres Replaying WAL slowly
Hello Everyone ...
We have six PG 9.1.12 installations: one master (Ubuntu 10.04), one Slony slave (Ubuntu 10.04), and four streaming replicas (two on Ubuntu 10.04 and two on RHEL 6.5 (Santiago), which are in a different datacenter). All the Ubuntu machines are in the same datacenter. The master ships WAL archives to the Slony slave.
This is the infrastructure description:
200 Mbit link between datacenters; ESX 5.5 on HP blade chassis, ProLiant Gen7 blades; Postgres servers dedicated to ESX hosts (no other VMs on those ESX hosts); 3PAR disk backends with 4 and 8 Gbit Fibre Channel connections; 10 Gbit Ethernet Virtual Connects on the HP chassis; Cisco fabric and network switches.
All Postgres installations are from Ubuntu/RHEL packages.
Everything worked fine until Thursday, when we had high load on the master; after that, every streaming replica lagged further and further behind the master, even at night and on the weekend when all server load is low. The Slony slave, however, is completely fine.
We thought it was a network issue, so we decided to copy the WAL files to local disk on one of the streaming replicas and replay them from there. After a PG restart it replays WAL at a good speed, about 3 seconds per WAL file, but as time goes on the speed decreases: we saw 30 seconds per WAL file, and the worst we got was 3 minutes to replay a single WAL file.
The rate of WAL produced by the master is normal, like usual. Also, on Thursday we had WAL files in pg_xlog on the streaming replica server, but no other WAL files.
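A quick way to quantify lag like this, as a sketch using the function names available in 9.1, is to compare WAL positions on the master and a standby:

-- On the master: current WAL write location
SELECT pg_current_xlog_location();

-- On a standby: last WAL location received from the master, and last replayed
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();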
This is the configuration:
SELECT name, current_setting(name)
FROM pg_settings
WHERE source NOT IN ('default', 'override');
name | current_setting
--------------------------------+---------------------------------------------------
application_name | psql
archive_command | /var/lib/postgresql/scripts/wal_archive "%p" "%f"
archive_mode | on
checkpoint_completion_target | 0.7
checkpoint_segments | 30
client_encoding | UTF8
DateStyle | ISO, MDY
default_text_search_config | pg_catalog.english
effective_cache_size | 125GB
effective_io_concurrency | 3
external_pid_file | /var/run/postgresql/9.1-main.pid
hot_standby | on
hot_standby_feedback | on
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
listen_addresses | *
log_checkpoints | on
log_connections | on
log_destination | csvlog
log_directory | pg_log
log_disconnections | on
log_filename | postgresql-%a.log
log_line_prefix | %t
log_lock_waits | on
log_rotation_age | 1d
log_rotation_size | 0
log_temp_files | 100kB
log_timezone | localtime
log_truncate_on_rotation | on
logging_collector | on
maintenance_work_mem | 1GB
max_connections | 750
max_locks_per_transaction | 900
max_pred_locks_per_transaction | 900
max_stack_depth | 2MB
max_wal_senders | 6
port | 5432
shared_buffers | 8GB
ssl | on
temp_buffers | 64MB
TimeZone | America/Chicago
unix_socket_directory | /var/run/postgresql
wal_keep_segments | 50
wal_level | hot_standby
work_mem | 256MB
(47 rows)
Thanks for any help.
Regards,
On 06/29/2014 11:14 AM, Soni M wrote:
> Everything worked fine until Thursday, when we had high load on the master; after that, every streaming replica lagged further and further behind the master, even at night and on the weekend when all server load is low. The Slony slave, however, is completely fine.

What does 'top' on the standby say? Is the startup process using 100% of (one) CPU replaying records, or is it waiting for I/O? How large is the database, does it fit in RAM? Any clues in the system or PostgreSQL logs?

- Heikki
top and sar say 100% CPU usage of one core, with no sign of I/O wait. The database is 1.5 TB in size. RAM on the master is 145 GB; on the slaves it differs, some have about 16 GB and others also have 145 GB.
There is nothing suspicious in the standby's Postgres log.
In the master's Postgres log:
WARNING,01000,"pgstat wait timeout",,,,,,,,,""
ERROR,57014,"canceling autovacuum task",,,,,"automatic vacuum of table ""consprod._consprod_replication.sl_event""",,,,""
ERROR,57014,"canceling statement due to statement timeout",,,,,,"
"PARSE",2014-06-26 00:39:35 CDT,91/0,0,ERROR,25P02,"current transaction is aborted, commands ignored until end of transaction block",,,,,,"select 1",,,""
"could not receive data from client: Connection reset by peer",,,,,,,,,""
The log files are big, anyway. If you can specify some pattern to look for in the logs, that would really help.
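With log_destination set to csvlog, one way to search the logs with arbitrary patterns is to load a log file into a table, a sketch following the csvlog table layout documented for 9.1 (the file path here is only an example):

CREATE TABLE postgres_log
(
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)
);

-- Load one day's CSV log, then query it like any other table.
COPY postgres_log FROM '/var/lib/postgresql/9.1/main/pg_log/postgresql-Thu.csv' WITH csv;

SELECT log_time, error_severity, message
FROM postgres_log
WHERE error_severity IN ('ERROR', 'FATAL', 'PANIC')
ORDER BY log_time;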
Regards,
Here's some data from the standby (this is when replaying takes 44 seconds per WAL file):
2014-06-29 00:07:36.513 CDT,,,16682,,53af6f46.412a,44,,2014-06-28 20:43:34 CDT,,0,LOG,00000,"restartpoint complete: wrote 63187 buffers (3.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=209.170 s, sync=0.482 s, total=209.667 s; sync files=644, longest=0.036 s, average=0.000 s",,,,,,,,,""
2014-06-29 00:07:36.513 CDT,,,16682,,53af6f46.412a,45,,2014-06-28 20:43:34 CDT,,0,LOG,00000,"recovery restart point at 27CE/170056A8","last completed transaction was at log time 2014-06-27 13:39:00.542624-05",,,,,,,,""
2014-06-29 00:28:59.678 CDT,,,16682,,53af6f46.412a,47,,2014-06-28 20:43:34 CDT,,0,LOG,00000,"restartpoint complete: wrote 70942 buffers (3.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=209.981 s, sync=0.493 s, total=210.486 s; sync files=723, longest=0.156 s, average=0.000 s",,,,,,,,,""
2014-06-29 00:28:59.678 CDT,,,16682,,53af6f46.412a,48,,2014-06-28 20:43:34 CDT,,0,LOG,00000,"recovery restart point at 27CE/35002678","last completed transaction was at log time 2014-06-27 13:42:05.121121-05",,,,,,,,""
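For tracking lag like this over time, a simple sketch using a function added in 9.1 reports how far behind the standby's replay is in wall-clock terms:

-- On the standby: age of the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;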
Regards,
On 06/29/2014 03:43 PM, Soni M wrote:
> top and sar say 100% CPU usage of one core, with no sign of I/O wait.

Hmm, I wonder what it's doing then... If you have "perf" installed on the system, you can do "perf top" to get a quick overview of where the CPU time is spent.

- Heikki
Here's what 'perf top' said on a streaming replica:
Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
95.97% postgres [.] 0x00000000002210f3
0.41% perf [.] 0x000000000005f225
0.39% libc-2.12.so [.] __strstr_sse2
0.22% libc-2.12.so [.] memchr
0.22% [kernel] [k] kallsyms_expand_symbol
0.18% perf [.] symbols__insert
0.18% [kernel] [k] format_decode
0.15% libc-2.12.so [.] __GI___strcmp_ssse3
0.13% [kernel] [k] string
0.12% [kernel] [k] number
0.12% [kernel] [k] vsnprintf
0.12% libc-2.12.so [.] _IO_vfscanf
0.11% perf [.] dso__find_symbol
0.11% [kernel] [k] _spin_unlock_irqrestore
0.10% perf [.] hex2u64
0.10% postgres [.] hash_search_with_hash_value
0.09% perf [.] rb_next
0.08% libc-2.12.so [.] memcpy
0.07% libc-2.12.so [.] __strchr_sse2
0.07% [kernel] [k] clear_page
0.06% [kernel] [k] strnlen
0.05% perf [.] perf_evsel__parse_sample
0.05% perf [.] rb_insert_color
0.05% [kernel] [k] pointer
Regards,
On 06/30/2014 05:46 PM, Soni M wrote:
> Here's what 'perf top' said on a streaming replica:
>
> Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
>  95.97% postgres [.] 0x00000000002210f3

Ok, so it's stuck doing something... Can you get a build with debug symbols installed, so that we could see the function name?

- Heikki
On Jun 30, 2014, at 9:14 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Ok, so it's stuck doing something... Can you get a build with debug symbols installed, so that we could see the function name?

Looks like StandbyReleaseLocks:

Samples: 10K of event 'cpu-clock', Event count (approx.): 8507
 89.21% postgres [.] StandbyReleaseLocks
  0.89% libc-2.12.so [.] __strstr_sse2
  0.83% perf [.] 0x000000000005f1e5
  0.74% [kernel] [k] kallsyms_expand_symbol
  0.52% libc-2.12.so [.] memchr
  0.47% perf [.] symbols__insert
  0.47% [kernel] [k] format_decode
On 2014-06-30 19:14:24 +0300, Heikki Linnakangas wrote:
> Ok, so it's stuck doing something... Can you get a build with debug symbols installed, so that we could see the function name?

My guess is it's a spinlock, probably xlogctl->info_lck via RecoveryInProgress(). Unfortunately inline assembler doesn't always seem to show up correctly in profiles...

What worked for me was to build with -fno-omit-frame-pointer - that normally shows the callers, even if it can't generate a proper symbol name.

Soni: Do you use Hot Standby? Are there connections active while you have that problem? Any other processes with high cpu load?

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Soni: Do you use Hot Standby? Are there connections active while you
> have that problem? Any other processes with high cpu load?
It is

 96.62% postgres [.] StandbyReleaseLocks

as Jeff said. It runs for quite a long time, more than 5 minutes I think.

We also use hot standby. We have 4 streaming replicas; some of them have active connections, some do not. This issue has lasted more than 4 days. On one of the standbys, the postgres process above is the only process consuming high CPU load.
Regards,
On Jun 30, 2014, at 10:29 AM, Soni M <diptatapa@gmail.com> wrote:
> It is
>
>  96.62% postgres [.] StandbyReleaseLocks
>
> as Jeff said. It runs for quite a long time, more than 5 minutes I think.
>
> We also use hot standby. We have 4 streaming replicas; some of them have active connections, some do not. This issue has lasted more than 4 days. On one of the standbys, the postgres process above is the only process consuming high CPU load.
Compiling with -fno-omit-frame-pointer doesn't yield much more info:
76.24% postgres [.] StandbyReleaseLocks
2.64% libcrypto.so.1.0.1e [.] md5_block_asm_data_order
2.19% libcrypto.so.1.0.1e [.] RC4
2.17% postgres [.] RecordIsValid
1.20% [kernel] [k] copy_user_generic_unrolled
1.18% [kernel] [k] _spin_unlock_irqrestore
0.97% [vmxnet3] [k] vmxnet3_poll_rx_only
0.87% [kernel] [k] __do_softirq
0.77% [vmxnet3] [k] vmxnet3_xmit_frame
0.69% postgres [.] hash_search_with_hash_value
0.68% [kernel] [k] fin
However, this server started progressing through the WAL files quite a bit better before I finished compiling, so we'll leave it running with this version and see if there's more info available the next time it starts replaying slowly.
On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
> Compiling with -fno-omit-frame-pointer doesn't yield much more info:

You'd need to do perf record -ga instead of perf record -a to see additional information. But:

> 76.24% postgres [.] StandbyReleaseLocks

already is quite helpful.

What are you doing on that system? Is there anything requiring large amounts of access exclusive locks on the primary? Possibly large amounts of temporary relations?

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2014-06-30 20:34 GMT+02:00 Jeff Frost <jeff@pgexperts.com>:
> On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> My guess is it's a spinlock, probably xlogctl->info_lck via
>> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
>> to show up correctly in profiles...
For this kind of issue, SystemTap or DTrace can be useful:

http://postgres.cz/wiki/Monitorov%C3%A1n%C3%AD_lwlocku_pomoc%C3%AD_systemtapu

There you can identify which locking is the problem - please use Google Translate.
Regards
Pavel
On Jun 30, 2014, at 11:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> You'd need to do perf record -ga instead of perf record -a to see
> additional information.

Ah! That's right. Here's how that looks:

Samples: 473K of event 'cpu-clock', Event count (approx.): 473738
+ 68.42% init     [kernel.kallsyms]   [k] native_safe_halt
+ 26.07% postgres postgres            [.] StandbyReleaseLocks
+  2.82% swapper  [kernel.kallsyms]   [k] native_safe_halt
+  0.19% ssh      libcrypto.so.1.0.1e [.] md5_block_asm_data_order
+  0.19% postgres postgres            [.] RecordIsValid
+  0.16% ssh      libcrypto.so.1.0.1e [.] RC4
+  0.10% postgres postgres            [.] hash_search_with_hash_value
+  0.06% postgres [kernel.kallsyms]   [k] _spin_unlock_irqrestore
+  0.05% init     [vmxnet3]           [k] vmxnet3_poll_rx_only
+  0.04% postgres [kernel.kallsyms]   [k] copy_user_generic_unrolled
+  0.04% init     [kernel.kallsyms]   [k] finish_task_switch
+  0.04% init     [kernel.kallsyms]   [k] __do_softirq
+  0.04% ssh      [kernel.kallsyms]   [k] _spin_unlock_irqrestore
+  0.04% ssh      [vmxnet3]           [k] vmxnet3_xmit_frame
+  0.03% postgres postgres            [.] PinBuffer
+  0.03% init     [vmxnet3]           [k] vmxnet3_xmit_frame
+  0.03% ssh      [kernel.kallsyms]   [k] copy_user_generic_unrolled
+  0.03% postgres postgres            [.] XLogReadBufferExtended
+  0.03% ssh      ssh                 [.] 0x000000000002aa07
+  0.03% init     [kernel.kallsyms]   [k] _spin_unlock_irqrestore
+  0.03% ssh      [vmxnet3]           [k] vmxnet3_poll_rx_only
+  0.02% ssh      [kernel.kallsyms]   [k] __do_softirq
+  0.02% postgres libc-2.12.so        [.] _wordcopy_bwd_dest_aligned
+  0.02% postgres postgres            [.] mdnblocks
+  0.02% ssh      libcrypto.so.1.0.1e [.] 0x00000000000e25a1
+  0.02% scp      [kernel.kallsyms]   [k] copy_user_generic_unrolled
+  0.02% ssh      libc-2.12.so        [.] memcpy
+  0.02% postgres libc-2.12.so        [.] memcpy

> But:
>
>> 76.24% postgres [.] StandbyReleaseLocks
>
> already is quite helpful.
>
> What are you doing on that system? Is there anything requiring large
> amounts of access exclusive locks on the primary? Possibly large amounts
> of temporary relations?

The last time we did a 100% logging run, the peak temp table creation was something like 120k/hr, but the replicas seemed able to keep up with that just fine. Hopefully Soni can answer whether that has increased significantly since May.
On Jun 30, 2014, at 12:17 PM, Jeff Frost <jeff@pgexperts.com> wrote:
>> What are you doing on that system? Is there anything requiring large
>> amounts of access exclusive locks on the primary? Possibly large amounts
>> of temporary relations?
>
> The last time we did a 100% logging run, the peak temp table creation was something like 120k/hr, but the replicas seemed able to keep up with that just fine.
Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
mode | count
--------------------------+-------
AccessExclusiveLock | 11
AccessShareLock | 2089
ExclusiveLock | 46
RowExclusiveLock | 81
RowShareLock | 17
ShareLock | 4
ShareUpdateExclusiveLock | 5
Seems to be relatively consistent. Of course, it's hard to say what it looked like back when the issue began.
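A summary like the one above can be produced with a simple aggregate over pg_locks; the exact query used isn't shown in the thread, but presumably something like:

SELECT mode, count(*)
FROM pg_locks
GROUP BY mode
ORDER BY count(*) DESC;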
Jeff Frost <jeff@pgexperts.com> writes:
> Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:

That's not too helpful if you don't pay attention to what the lock is on; it's likely that all the ExclusiveLocks are on transactions' own XIDs, which isn't relevant to the standby's behavior. The AccessExclusiveLocks are probably interesting though --- you should look to see what those are on.

regards, tom lane
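Tom's point can be checked directly by breaking the counts down by locktype as well (a sketch):

SELECT locktype, mode, count(*)
FROM pg_locks
GROUP BY locktype, mode
ORDER BY count(*) DESC;

-- ExclusiveLock rows whose locktype is 'transactionid' or 'virtualxid' are the
-- self-locks every transaction holds on its own XID; only locktype 'relation'
-- entries are of interest here.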
On Jun 30, 2014, at 12:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That's not too helpful if you don't pay attention to what the lock is on;
> it's likely that all the ExclusiveLocks are on transactions' own XIDs,
> which isn't relevant to the standby's behavior. The AccessExclusiveLocks
> are probably interesting though --- you should look to see what those
> are on.

You're right about the ExclusiveLocks. Here's how the AccessExclusiveLocks look:

 locktype | database |  relation  | page | tuple | virtualxid | transactionid | classid |   objid    | objsubid | virtualtransaction |  pid  |        mode         | granted
----------+----------+------------+------+-------+------------+---------------+---------+------------+----------+--------------------+-------+---------------------+---------
 relation |   111285 | 3245291551 |      |       |            |               |         |            |          | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292820 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292833 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245287874 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292836 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245292774 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292734 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292827 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245288540 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292773 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292775 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292743 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292751 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288669 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292817 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245288657 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 object   |   111285 |            |      |       |            |               |    2615 | 1246019760 |        0 | 233/170813         | 23509 | AccessExclusiveLock | t
 relation |   111285 | 3245292746 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245287876 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
 relation |   111285 | 3245292739 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292826 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292825 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245292832 |      |       |            |               |         |            |          | 173/1723993        | 23407 | AccessExclusiveLock | t
 relation |   111285 | 3245292740 |      |       |            |               |         |            |          | 5/22498235         | 23427 | AccessExclusiveLock | t
 relation |   111285 | 3245287871 |      |       |            |               |         |            |          | 133/3818415        | 23348 | AccessExclusiveLock | t
(25 rows)

And if you go fishing in pg_class for any of the oids, you don't find anything:

SELECT s.procpid, s.query_start, n.nspname, c.relname, l.mode, l.granted, s.current_query
FROM pg_locks l, pg_class c, pg_stat_activity s, pg_namespace n
WHERE l.relation = c.oid
  AND l.pid = s.procpid
  AND c.relnamespace = n.oid
  AND l.mode = 'AccessExclusiveLock';

 procpid | query_start | nspname | relname | mode | granted | current_query
---------+-------------+---------+---------+------+---------+---------------
(0 rows)

Temp tables maybe?
On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost <jeff@pgexperts.com> wrote:
> And if you go fishing in pg_class for any of the oids, you don't find anything:
That is probably because you are connected to the wrong database. Once you connect to the database of interest, you don't even need to query pg_class; just cast the relation attribute to regclass:
SELECT relation::regclass, ...
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = current_database());
Regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira <matioli.matheus@gmail.com> wrote:
> That is probably because you are connected to the wrong database. Once you connect to the database of interest, you don't even need to query pg_class; just cast the relation attribute to regclass:
>
> SELECT relation::regclass, ...
> FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = current_database());
Yeah, I thought about that too, but I verified I am in the correct DB. Just for clarity's sake:
SELECT relation::regclass
FROM pg_locks WHERE database = (SELECT oid FROM pg_database WHERE datname = current_database()) and mode = 'AccessExclusiveLock';
relation
------------
3245508214
3245508273
3245508272
3245508257
3245508469
3245508274
3245508373
3245508468
3245508210
3245508463
3245508205
3245508260
3245508265
3245508434
(16 rows)
On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
> Yeah, I thought about that too, but I verified I am in the correct DB. Just for clarity's sake:

So these are probably relations created in uncommitted transactions. Possibly ON COMMIT DROP temp tables?

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Jun 30, 2014, at 1:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> So these are probably relations created in uncommitted
> transactions. Possibly ON COMMIT DROP temp tables?
That would make sense. There are definitely quite a few of those being used.
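Andres's hypothesis can be tested from the primary with a sketch like the following: relations created by still-uncommitted transactions have no pg_class row visible to other sessions, so their lock entries fail to resolve.

SELECT (c.oid IS NULL) AS no_visible_pg_class_row, count(*)
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE l.mode = 'AccessExclusiveLock'
  AND l.locktype = 'relation'
GROUP BY 1;

-- Locks whose relation has no visible pg_class row most likely belong to
-- uncommitted transactions, e.g. ON COMMIT DROP temp tables.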
Another item of note is that the system catalogs are quite bloated:
schemaname | tablename | tbloat | wastedmb | idxbloat | wastedidxmb
------------+--------------+--------+----------+----------+-------------
pg_catalog | pg_attribute | 3945 | 106.51 | 2770 | 611.28
pg_catalog | pg_class | 8940 | 45.26 | 4420 | 47.89
pg_catalog | pg_type | 4921 | 18.45 | 5850 | 81.16
pg_catalog | pg_depend | 933 | 10.23 | 11730 | 274.37
pg_catalog | pg_index | 3429 | 8.36 | 3920 | 24.24
pg_catalog | pg_shdepend | 983 | 2.67 | 9360 | 30.56
(6 rows)
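The raw on-disk sizes behind those bloat estimates can be cross-checked directly (a sketch):

SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('pg_attribute', 'pg_class', 'pg_type',
                  'pg_depend', 'pg_index', 'pg_shdepend')
ORDER BY pg_total_relation_size(oid) DESC;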
Would that cause the replica to spin on StandbyReleaseLocks?
Jeff Frost <jeff@pgexperts.com> writes:
> On Jun 30, 2014, at 1:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> So these are probably relations created in uncommitted
>> transactions. Possibly ON COMMIT DROP temp tables?

> That would make sense. There are definitely quite a few of those being used.

Uh-huh. I doubt that the mechanism that handles propagation of AccessExclusiveLocks to the standby is smart enough to ignore locks on temp tables :-(

> Another item of note is that the system catalogs are quite bloated:
> Would that cause the replica to spin on StandbyReleaseLocks?

AFAIK, no. It's an unsurprising consequence of heavy use of short-lived temp tables though.

So it seems like we have a candidate explanation. I'm a bit surprised that StandbyReleaseLocks would get this slow if there are only a dozen AccessExclusiveLocks in place at any one time, though. Perhaps that was a low point and there are often many more?

regards, tom lane
On Jun 30, 2014, at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another item of note is that the system catalogs are quite bloated:
>> Would that cause the replica to spin on StandbyReleaseLocks?
>
> AFAIK, no. It's an unsurprising consequence of heavy use of short-lived
> temp tables though.

Yah, this has been an issue in the past, so we tend to cluster them regularly during off-hours to minimize the issue.

> So it seems like we have a candidate explanation. I'm a bit surprised
> that StandbyReleaseLocks would get this slow if there are only a dozen
> AccessExclusiveLocks in place at any one time, though. Perhaps that
> was a low point and there are often many more?

Entirely possible that it was a low point. We'll set up some monitoring to track the number of AccessExclusiveLocks and see how much variance there is throughout the day.
On Jun 30, 2014, at 1:46 PM, Jeff Frost <jeff@pgexperts.com> wrote:
>> So it seems like we have a candidate explanation. I'm a bit surprised
>> that StandbyReleaseLocks would get this slow if there are only a dozen
>> AccessExclusiveLocks in place at any one time, though. Perhaps that
>> was a low point and there are often many more?
>
> Entirely possible that it was a low point. We'll set up some monitoring to track the number of AccessExclusiveLocks and see how much variance there is throughout the day.
Since we turned on the monitoring for that, we had a peak of 13,550 AccessExclusiveLocks. So far most of the samples have been in the double digits, with that and two other outliers: 6,118 and 12,747.
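A minimal version of such monitoring, sketched here with a hypothetical table (Jeff describes his actual setup later in the thread), could be:

CREATE TABLE lock_samples (
    sample_time      timestamptz NOT NULL DEFAULT now(),
    access_exclusive integer     NOT NULL
);

-- Run once a minute, e.g. from cron:
INSERT INTO lock_samples (access_exclusive)
SELECT count(*) FROM pg_locks WHERE mode = 'AccessExclusiveLock';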
Jeff Frost <jeff@pgexperts.com> writes:
> Since we turned on the monitoring for that, we had a peak of 13,550
> AccessExclusiveLocks.

Ah ... that's more like a number I can believe something would have trouble coping with. Did you see a noticeable slowdown with this? Now that we've seen that number, of course it's possible there was an even higher peak occurring when you saw the trouble.

Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe it just takes awhile to handle that many locks. Did you check whether the locks were all on temp tables of the ON COMMIT DROP persuasion?

regards, tom lane
On 2014-06-30 19:04:20 -0400, Tom Lane wrote:
> Jeff Frost <jeff@pgexperts.com> writes:
>> Since we turned on the monitoring for that, we had a peak of 13,550
>> AccessExclusiveLocks.

Any chance the workload also uses lots of subtransactions?

> Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
> it just takes awhile to handle that many locks.

I don't think there's a O(n^2) in StandbyReleaseLocks() itself, but in combination with StandbyReleaseLockTree() it looks possibly bad. The latter will call StandbyReleaseLocks() for every xid/subxid, and each of the StandbyReleaseLocks() will then trawl the entire RecoveryLockList...

It'd probably be better to implement ReleaseLocksTree() by sorting the subxid list and bsearching that while iterating RecoveryLockList.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Jun 30, 2014, at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah ... that's more like a number I can believe something would have
> trouble coping with. Did you see a noticeable slowdown with this?
> Now that we've seen that number, of course it's possible there was an
> even higher peak occurring when you saw the trouble.
>
> Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe
> it just takes awhile to handle that many locks.
>
> Did you check whether the locks were all on temp tables of the
> ON COMMIT DROP persuasion?
Unfortunately not, because I went for a poor man's: SELECT count(*) FROM pg_locks WHERE mode = 'AccessExclusiveLock'
run in cron every minute.
That said, I'd bet it was mostly ON COMMIT DROP temp tables.
The unfortunate thing is I wouldn't know how to correlate that spike with the corresponding slowdown because the replica is about 5.5hrs lagged at the moment.
Hopefully it will get caught up tonight and we can see if there's a correlation tomorrow.
On Jun 30, 2014, at 4:57 PM, Jeff Frost <jeff@pgexperts.com> wrote:
> Hopefully it will get caught up tonight and we can see if there's a correlation tomorrow.
And indeed it did catch up overnight and the lag increased shortly after a correlating spike in AccessExclusiveLocks that were generated by temp table creation with on commit drop.
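For context, the implicated workload shape boils down to transactions doing something like this (a sketch with hypothetical names):

BEGIN;
-- Creating the table takes an AccessExclusiveLock, which is WAL-logged and
-- must later be released by StandbyReleaseLocks() on every hot standby.
CREATE TEMP TABLE scratch (id int, payload text) ON COMMIT DROP;
INSERT INTO scratch SELECT g, 'x' FROM generate_series(1, 100) g;
COMMIT;  -- the temp table is dropped, and its lock released, here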
Jeff Frost <jeff@pgexperts.com> writes:
>> On Jun 30, 2014, at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Did you check whether the locks were all on temp tables of the
>>> ON COMMIT DROP persuasion?

> And indeed it did catch up overnight and the lag increased shortly after a correlating spike in AccessExclusiveLocks that were generated by temp table creation with on commit drop.

OK, so we have a pretty clear idea of where the problem is now.

It seems like there are three, not mutually exclusive, ways we might address this:

1. Local revisions inside StandbyReleaseLocks to make it perform better in the presence of many locks. This would only be likely to improve matters much if there's a fixable O(N^2) algorithmic issue; but there might well be one.

2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on the grounds that no standby should be touching them. I'm not entirely sure that that argument is bulletproof though; in particular, even though a standby couldn't access the table's data, it's possible that it would be interested in seeing consistent catalog entries.

3. Avoid WAL-logging AccessExclusiveLocks associated with new-in-transaction tables, temp or not, on the grounds that no standby could even see such tables until they're committed. We could go a bit further and not take out any locks on a new-in-transaction table in the first place, on the grounds that other transactions on the master can't see 'em either.

It sounded like Andres had taken a preliminary look at #1 and found a possible avenue for improvement, which I'd encourage him to pursue.

For both #2 and the conservative version of #3, the main implementation problem would be whether the lock WAL-logging code has cheap access to the necessary information. I suspect it doesn't.

The radical version of #3 might be pretty easy to do, at least to the extent of removing locks taken out during CREATE TABLE. I suspect there are some assertions or other consistency checks that would get unhappy if we manipulate relations without locks, though, so those would have to be taught about the exception. Also, we sometimes forget new-in-transaction status during relcache flush events; it's not clear if that would be a problem for this.

I don't plan to work on this myself, but perhaps someone with more motivation will want to run with these ideas.

regards, tom lane
On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
> 1. Local revisions inside StandbyReleaseLocks to make it perform better in
> the presence of many locks. This would only be likely to improve matters
> much if there's a fixable O(N^2) algorithmic issue; but there might well
> be one.
>
> It sounded like Andres had taken a preliminary look at #1 and found a
> possible avenue for improvement, which I'd encourage him to pursue.

I don't have the resources to do this right now, but yes, I think we can relatively easily get rid of the O(num_locks * num_subtransactions) behaviour.

> 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
> the grounds that no standby should be touching them. I'm not entirely
> sure that that argument is bulletproof though; in particular, even though
> a standby couldn't access the table's data, it's possible that it would be
> interested in seeing consistent catalog entries.

Hm. We definitely perform checks surprisingly late for those. It's possible to do SELECT * FROM pg_temp_<nn>.whatever; without an error if there's no rows or if the rest of the plan doesn't do accesses to that table. The check prohibiting access is only in bufmgr.c... So yea, I don't think we can do this for at least < 9.4. And there it'll still be hard.

> For both #2 and the conservative version of #3, the main implementation
> problem would be whether the lock WAL-logging code has cheap access to
> the necessary information. I suspect it doesn't.

Not trivially. It's logged directly in LockAcquireExtended(). We could add the information into locktags as there's unused fields for relation locktags, but brrr.

> Also, we sometimes forget new-in-transaction
> status during relcache flush events; it's not clear if that would be a
> problem for this.

I think that hole is actually pluggable in newer releases - at least there's no code around that assumes rd_createSubid now is persistent, even across cache resets.

But I think more importantly it's probably quite possible to hit a similar problem without ON COMMIT DROP relations. Say DISCARD TEMP inside a transaction (with several subxacts) or so? So we probably really should fix the bad scaling.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> But I think more importantly it's probably quite possible to hit a
> similar problem without ON COMMIT DROP relations. Say DISCARD TEMP
> inside a transaction (with several subxacts) or so? So we probably really
> should fix the bad scaling.

Well, my thought was that these approaches would address somewhat different sets of use-cases, and we might well want to do more than one. Even if StandbyReleaseLocks were zero-cost, not emitting the WAL in the first place is surely considerably cheaper yet.

regards, tom lane
On 1 July 2014 20:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't plan to work on this myself, but perhaps someone with more
> motivation will want to run with these ideas.

I was planning to work on improving performance of replication apply over the summer, mid July - Aug, so I'll add this to the list.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1 July 2014 20:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
> the grounds that no standby should be touching them. I'm not entirely
> sure that that argument is bulletproof though; in particular, even though
> a standby couldn't access the table's data, it's possible that it would be
> interested in seeing consistent catalog entries.

Patch implements option 2 in the above.

Skipping the locks entirely seems like it opens a can of worms. Skipping the lock for temp tables is valid since locks don't need to exist on the standby. Any catalog entries for them will exist, but the rows will show them as temp and nobody would expect them to be valid outside of the original session.

Patch implements a special case that takes the lock normally, but skips WAL logging the lock info.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services