Thread: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
federico@brandwatch.com
Date:
The following bug has been logged on the website:

Bug reference: 8192
Logged by: Federico Campoli
Email address: federico@brandwatch.com
PostgreSQL version: 9.2.4
Operating system: Debian 6.0

Description:

/*

It seems on very large tables the concurrent update with vacuum (or autovacuum), when the slave is in hot standby mode, generates long loops in read on a single wal segment during the recovery process.

This has two nasty effects: a massive read IO peak, and the replay lag increasing as the recovery process hangs for long periods on a pointless loop.

PostgreSQL version: PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit

Steps to reproduce the error: set up a hot standby server. The error occurs with streaming replication enabled and disabled.
*/

SET client_min_messages='debug2';
SET trace_sort='on';

--create a new empty table
DROP TABLE IF EXISTS t_vacuum;
CREATE TABLE t_vacuum
(
    i_id_row integer,
    ts_time timestamp with time zone default now()
);

--disable the autovacuum, we need to run it manually
ALTER TABLE t_vacuum SET (autovacuum_enabled='off');

--insert into the table enough data to have multiple file nodes
INSERT INTO t_vacuum (i_id_row)
SELECT * FROM generate_series(1,40000000);

/*
start iotop on the hot standby monitoring the postgres startup process and eventually the wal receiver, for example: iotop -p 31293

wait for all the wal files generated during the update to be applied - this does not apply on live servers of course, but it is useful to trap only the IO activity generated by vacuum

If streaming replication is on, this select will tell when the slave has finished. In any case it is useful to check the replication lag during the vacuum.
SELECT pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), replay_location))
FROM pg_stat_replication;
*/

--in a new session start a huge table update
UPDATE t_vacuum SET ts_time=now() WHERE i_id_row<20000000;

--then vacuum the table
VACUUM VERBOSE t_vacuum;

--at some point the startup process will get stuck recovering one single wal file and
--the DISK READ column will show a huge IO for a while.
--if you monitor the wal receiver you will also notice the stream continues without problems
--sooner or later the startup process will resume the recovery but meanwhile the DISK READ is not zero
--the slave will accumulate replication lag
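As an aside on what pg_xlog_location_diff() computes above: it is just the byte distance between two WAL locations. A minimal Python sketch of that arithmetic, assuming the modern linear 64-bit LSN space (9.3+; note that 9.2 still skipped one segment per logical WAL file, so the built-in differs slightly there):

```python
def lsn_to_int(lsn):
    """Convert a WAL location string like '1F/7C00E90' to an absolute
    byte position. The text format is two hexadecimal numbers separated
    by '/': the high and low 32 bits of the 64-bit location."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(master_lsn, replay_lsn):
    """Replication lag in bytes between the master's current WAL write
    position and the standby's replay position, i.e. roughly what the
    pg_xlog_location_diff() call above reports."""
    return lsn_to_int(master_lsn) - lsn_to_int(replay_lsn)
```

For example, lag_bytes('1/0', '0/FF000000') gives 16777216, which pg_size_pretty() would render as "16 MB".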
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Simon Riggs
Date:
On 30 May 2013 17:43, <federico@brandwatch.com> wrote:

> It seems on very large tables the concurrent update with vacuum (or
> autovacuum), when the slave is in hot standby mode, generates long loops
> in read on a single wal segment during the recovery process.

Not all aspects of hot standby are optimally tuned, as yet. Some room for improvement exists. In some cases, conscious choices were made to keep processing on master fast at the expense of speed on the standby.

There doesn't seem to be anything which is in itself an error here.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Jeff Janes
Date:
On Thursday, May 30, 2013, wrote:

> The following bug has been logged on the website:
>
> Bug reference: 8192
> Logged by: Federico Campoli
> Email address: federico@brandwatch.com
> PostgreSQL version: 9.2.4
> Operating system: Debian 6.0
>
> It seems on very large tables the concurrent update with vacuum (or
> autovacuum), when the slave is in hot standby mode, generates long loops
> in read on a single wal segment during the recovery process.
>
> This has two nasty effects. A massive read IO peak and the replay lag
> increasing as the recovery process hangs for long periods on a
> pointless loop.

Are you observing a loop, and if so how are you observing it? What is it that is looping?

> SET client_min_messages='debug2';
> SET trace_sort='on';

Are these settings useful? What are they showing you?

> --in a new session and start an huge table update
> UPDATE t_vacuum set ts_time=now() WHERE i_id_row<20000000;
>
> --then vacuum the table
> VACUUM VERBOSE t_vacuum;

Are you running the update and vacuum concurrently or serially?

> --at some point the startup process will stuck recovering one single wal
> file and
> --the DISK READ column will show a huge IO for a while.

What is huge? I don't know if I can reproduce this or not. I certainly get spiky lag, but I see no reason to think it is anything other than IO congestion, occurring during stretches of WAL records where compact records describe a larger amount of work that needs to be done in terms of poorly-cached IO. Perhaps the kernel's read-ahead mechanism is not working as well as it theoretically could be. Also the standby isn't using a ring-buffer strategy, but I see no reason to think it would help were it to do so.

The DISK READ column is not what I would call huge during this, often 10-15 MB/s, because much of the IO is scattered rather than sequential. The IO wait % on the other hand is maxed out.
It is hard to consider it a bug that the performance is not as high as one might wish it to be.

Is this behavior a regression from some earlier version? What if hot standby is turned off?

Cheers,

Jeff
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Federico Campoli
Date:
On 02/06/13 01:17, Jeff Janes wrote:

> Are you observing a loop, and if so how are you observing it? What is
> it that is looping?

I'm sorry, I was just guessing it could be a loop. The read remains stuck on the same segment.

On my testbox I see at least 1 minute at 20 MB/s. On the live system the peak is 124 MB/s for 2 to 3 minutes without any progress in the wal replay.

I've attached the part of postgresql's log with debug2 from my sandbox when that happens.

In warm standby everything is fine, no lag at all.

At the moment, as a workaround, I've switched the slaves to warm standby: despite the low wal generation on the master (~200 MB/minute) the slave accumulates a massive lag when the autovacuum processes start with hot standby (the peak was 400 GB and was still increasing before switching to warm standby).

The database is ~4 TB, constantly updated.

Many thanks
Federico

--
Federico Campoli
Database Administrator
brandwatch.com
Attachment
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Andres Freund
Date:
On 2013-06-04 13:57:58 +0100, Federico Campoli wrote:

> I'm sorry, just guessing it could be a loop.
> The read remains stuck on the same segment.

Well, if you check the output in that file you can see that 'apply' is progressing, so it's not stuck in some specific area.

Is the startup process cpu bound during that time? If so, any chance to get a profile?

> In warm standby everything is fine no lag at all.
>
> At moment as workaround I've switched to warm standby the slaves as,
> despite the low wal generation on the master ~200Mb/minute the slave
> accumulates a massive lag when the autovacuum processes starts with hot
> standby (the peak was 400 GB and was still increasing before switching
> to warm standby).

By switching to a warm standby you mean disabling hot_standby=on in the standbys or changing wal_level?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Andres Freund
Date:
Hi,

On 2013-06-04 16:20:12 +0100, Federico Campoli wrote:

> > Well, if you check the output in that file you can see that 'apply' is
> > progressing, so it's not stuck in some specific area.
> > Is the startup process cpu bound during that time? If so, any chance to
> > get a profile?
>
> Please find attached the profile file and the postgresql log generated in
> the test run on my sandbox.

Unfortunately a gprof profile isn't meaningful without the generating binary as far as I know. Could you either generate the callgraph or just generate a perf profile like:

# start_test
# perf record -a
# ctrl-C
# perf report > somefile

And then send somefile?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Federico Campoli
Date:
On 04/06/13 16:29, Andres Freund wrote:

> Unfortunately a gprof profile isn't meaningful without the generating
> binary as far as I know. Could you either generate the callgraph or just
> generate a perf profile like:
>
> # start_test
> # perf record -a
> # ctrl-C
> # perf report > somefile
>
> And then send somefile?

This is the gprof output for the profile.

Many thanks
Federico

--
Federico Campoli
Database Administrator
brandwatch.com
Attachment
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Federico Campoli
Date:
On 04/06/13 14:14, Andres Freund wrote:

> Well, if you check the output in that file you can see that 'apply' is
> progressing, so it's not stuck in some specific area.
> Is the startup process cpu bound during that time? If so, any chance to
> get a profile?

Please find attached the profile file and the postgresql log generated in the test run on my sandbox.

> By switching to a warm standby you mean disabling hot_standby=on in the
> standbys or changing wal_level?

The master's wal_level is hot_standby. I set up the warm standby on the slave by changing the hot_standby parameter to off.

Cheers
Federico

--
Federico Campoli
Database Administrator
brandwatch.com
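For readers following along, the two modes being compared differ only in one standby-side setting; a sketch of the relevant 9.2 configuration (the master keeps wal_level = hot_standby in both cases):

```
# postgresql.conf on the master (unchanged in both tests)
wal_level = hot_standby

# postgresql.conf on the standby
hot_standby = on    # hot standby: read-only queries allowed during recovery
#hot_standby = off  # warm standby: recovery only, no client connections
```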
Attachment
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Jeff Janes
Date:
On Tue, Jun 4, 2013 at 5:57 AM, Federico Campoli <federico@brandwatch.com> wrote:

> I'm sorry, just guessing it could be a loop.
> The read remains stuck on the same segment.
> On my testbox I have at least 1 minute at 20 MB/s.
> On the live system the peak is 124 MB/s for 2 to 3 minutes without any
> progress in the wal replay.
>
> I've attached the part of postgresql's log with debug2 from my sandbox
> when that happens.

It looks like it is making progress, just very slowly. Basically I think every WAL record that needs to be replayed triggers the random read of some data block which is not already cached on the standby.

> In warm standby everything is fine no lag at all.

OK, then I don't think I can reproduce it. The spiky replay I see is the same whether the standby is hot or warm.

Cheers,

Jeff
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Jeff Janes
Date:
On Tue, Jun 4, 2013 at 8:36 AM, Federico Campoli <federico@brandwatch.com> wrote:

> This is the gprof output for the profile.

This looks to me like the gprof of a process that is either idle, or completely IO bound.

I'd probably approach this with a combination of "strace -T -ttt -p <PID>" and "lsof -p <PID>" on the PID of the start-up process, to see which file descriptors it is waiting for reads or writes on, and what the underlying names of the files for those file descriptors are.

Cheers,

Jeff
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Federico Campoli
Date:
On 06/06/13 21:22, Jeff Janes wrote:

> I'd probably approach this with a combination of "strace -T -ttt -p
> <PID>" and "lsof -p <PID>" on the PID of the start-up process, to see
> which file descriptors it is waiting for reads or writes on, and what the
> underlying names of the files for those file descriptors are.

I've generated a strace from the startup process, a new profile report and a new log file.

It's quite big so you can download it here:

http://d01.megashares.com/dl/BqZLh5H/HS_data_lag.tar

I've monitored the startup process with lsof and the only files accessed are the recovering wal and the relations affected by the master's activity.

Regarding the warm standby, I've repeated the test on my sandbox with the slave in warm standby and I've noticed a replication lag spike.

This does not affect production (same database version and architecture, from the debian package).

If I turn on hot standby on the failover, in 30 minutes it accumulates a lag of several gigabytes. When I switch it off the lag disappears.

I can only guess that in warm standby the lag has less impact as the random read resumes quicker than in hot standby.

Many thanks
Federico

--
Federico Campoli
Database Administrator
brandwatch.com
Re: BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
From
Jeff Janes
Date:
On Fri, Jun 7, 2013 at 6:08 AM, Federico Campoli <federico@brandwatch.com> wrote:

> I've generated a strace from the startup process, a new profile report
> and a new log file.
>
> It's quite big so you can download here
>
> http://d01.megashares.com/dl/BqZLh5H/HS_data_lag.tar
>
> I've monitored the startup process lsof and the only files accessed are
> the recovering wal and the relation affected by the master's activity.

You didn't need to strace the entire operation, just a sampling for ~1 minute in the middle of it should be enough.

It looks like it is basically idle before 1370603193.846487, so I ignored that part. Of the remaining 472 seconds covered (during which time it replays ~19 WAL files), the sum of the system call times covers only 210 seconds, so the rest is presumably CPU time (although the error due to rounding and instrumentation vagaries could be quite high).

Even during the active part, quite a bit of the time is still spent idle ("poll([{fd=3, events=POLLIN}], 1, 5000)"). And the biggest remaining chunk is spent on "lseek(11,", but again I don't know how much to trust that, as any individual call is at the limit of measurability, so it could just be the accumulation of instrumentation errors, not real time.
These are the commands I used:

zcat pg_HS_strace.out.gz | perl -lne '/<([\d.]+)>/ and $_ > 1370603193.846487 and print "$1\t$_"' | sort -rn | perl -lne '$sum+=$_; print "$sum\t$_"' | less

zcat pg_HS_strace.out.gz | perl -lne '/ ([^,]+,).*<([\d.]+)>/ and $_ > 1370603193.846487 and $h{$1}+=$2; END {print "$h{$_}\t$_" foreach keys %h}' | sort -g

For the warm stand-by, if I cut off the uninteresting parts at the beginning and end (1370604295.966518 and 1370605296.458290), then it applies ~33 WAL files in 1000 seconds, so it doesn't seem to be any faster than the hot standby. Are you sure the strace covered the events of interest?

Do you have output of vmstat or sar for the period of interest? It would be interesting to see what the user time and wait time are.

> Regarding the warm standby, I've repeated the test on my sandbox with the
> slave in warm standby and I've noticed a replication lag spike.
>
> This does not affect the production, same database version and
> architecture from debian package.

I'm not sure what you are saying here. Both HS and WS lag on your test machine, but only HS does on the production? And the straces you posted are from the test machine?

In any case, nothing really stands out here. Other than the idle periods, no one thing seems to be dominating the time.

Cheers,

Jeff
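The per-syscall accounting done by the perl one-liners above can also be sketched in Python, for readers less fluent in perl. This is an illustrative sketch, not the exact commands from the thread: it assumes the typical `strace -T -ttt` line shape (leading epoch timestamp, syscall name, elapsed time in angle brackets at the end), and the cutoff timestamp is the one quoted above.

```python
import re
from collections import defaultdict

def sum_syscall_times(lines, start_ts=0.0):
    """Sum the <elapsed> time of each syscall in `strace -T -ttt` output,
    skipping everything before start_ts (the idle lead-in)."""
    totals = defaultdict(float)
    # epoch timestamp, syscall name, then "<elapsed>" at end of line
    pat = re.compile(r'^(\d+\.\d+)\s+(\w+)\(.*<(\d+\.\d+)>$')
    for line in lines:
        m = pat.match(line.strip())
        if not m:
            continue  # unfinished/resumed calls, signals, etc.
        ts, call, elapsed = float(m.group(1)), m.group(2), float(m.group(3))
        if ts < start_ts:
            continue
        totals[call] += elapsed
    return dict(totals)
```

Usage would be something like `sorted(sum_syscall_times(open('pg_HS_strace.out'), 1370603193.846487).items(), key=lambda kv: kv[1])` to see which syscalls dominate the traced interval.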