Re: Postgres slave not catching up (on 9.2)

From: Ruben Domingo Gaspar Aparicio
Subject: Re: Postgres slave not catching up (on 9.2)
Date: ,
Msg-id: 6FCBD86BDC789C43AC696969EC340228010942123C@CERNXCHG44.cern.ch
(view: Whole thread, Raw)
In response to: Postgres slave not catching up (on 9.2)  (Ruben Domingo Gaspar Aparicio)
List: pgsql-performance

Tree view

Postgres slave not catching up (on 9.2)  (Ruben Domingo Gaspar Aparicio, )
 Re: Postgres slave not catching up (on 9.2)  (Robert Klemme, )
  Re: Postgres slave not catching up (on 9.2)  (Ruben Domingo Gaspar Aparicio, )
 Re: Postgres slave not catching up (on 9.2)  (Andres Freund, )
  Re: Postgres slave not catching up (on 9.2)  (Ruben Domingo Gaspar Aparicio, )
 Re: Postgres slave not catching up (on 9.2)  (Ruben Domingo Gaspar Aparicio, )
 Re: Postgres slave not catching up (on 9.2)  (Ruben Domingo Gaspar Aparicio, )

> > The slave (I don't have control on the master) is using 2 NFS file
> > systems, one for WALs and another one for the data, on Netapp controllers:
> >
> > dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs
> >
> (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=60
> > 0)
> >
> > dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs
> >
> (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=60
> > 0)
> 
> You should use noatime to avoid unnecessary IO.
> 

Just to mention that changing the mount points from:

dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,
actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,
actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)

to

dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs
(rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs
(rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600)

it did have a big impact. The profile of the recovery process on terms of calls changed quite a lot:

From: 

[postgres@itrac1202 tmp]$  strace -p 9596 -c
Process 9596 attached - interrupt to quit
Process 9596 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
78.73    0.217824           0    456855    381376 read
17.87    0.049453           0    515320           lseek
  2.89    0.007989          12       669       669 poll
  0.33    0.000912          25        36           open
  0.07    0.000206           0       994       994 stat
  0.05    0.000151           0       995       787 rt_sigreturn
  0.05    0.000133           0       673           write
  0.00    0.000000           0        36           close
  0.00    0.000000           0        52           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.276668                975630    383826 total


To:

[postgres@itrac1202 tmp]$  strace -p 9596 -c
Process 9596 attached - interrupt to quit
Process 9596 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
78.73    0.217824           0    456855    381376 read
17.87    0.049453           0    515320           lseek
  2.89    0.007989          12       669       669 poll
  0.33    0.000912          25        36           open
  0.07    0.000206           0       994       994 stat
  0.05    0.000151           0       995       787 rt_sigreturn
  0.05    0.000133           0       673           write
  0.00    0.000000           0        36           close
  0.00    0.000000           0        52           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.276668                975630    383826 total

We did also increased the shared_buffers from 12 to 24GB.

The lag has decreased most of the time:

*/10 * * * *     /usr/local/pgsql/postgresql-9.2.9/bin/psql -U postgres -h /var/lib/pgsql/ -p 6600 -d puppetdb -c
"SELECTnow(), now() - pg_last_xact_replay_timestamp()  AS time_lag" | perl -ne 'if
(/\|\s+(\d{2}):(\d{2}):(\d{2})\.\d+/){$hour=$1;$min=$2;$sec=$3; print $_;}' >> /tmp/lag929morememmount.log
 


...
2014-12-14 14:10:01.688947+01 | 00:00:00.096524
 2014-12-14 14:20:01.798223+01 | 00:00:00.024083
 2014-12-14 14:30:01.884448+01 | 00:00:00.420791
 2014-12-14 14:40:01.960623+01 | 00:00:00.168318
 2014-12-14 14:50:01.191487+01 | 00:00:00.163832
 2014-12-14 15:00:02.146436+01 | 00:00:00.026934
 2014-12-14 15:10:01.277963+01 | 00:00:00.332185
 2014-12-14 15:20:01.353979+01 | 00:00:00.020616
 2014-12-14 15:30:01.417092+01 | 00:00:00.584768
 2014-12-14 15:40:01.575347+01 | 00:00:00.151685
 2014-12-14 15:50:01.205507+01 | 00:00:00.102073
 2014-12-14 16:00:01.321511+01 | 00:00:00.590677
 2014-12-14 16:10:01.570474+01 | 00:00:00.182683
 2014-12-14 16:20:01.640095+01 | 00:00:00.420185
 2014-12-14 16:30:01.767033+01 | 00:00:00.015989
 2014-12-14 16:40:01.849532+01 | 00:00:00.106296
 2014-12-14 16:50:01.920876+01 | 00:00:00.258851
 2014-12-14 17:00:02.000278+01 | 00:00:00.119841
 2014-12-14 17:10:01.894227+01 | 00:00:00.091599
 2014-12-14 17:20:01.61729+01 | 00:00:00.367367
 2014-12-14 17:30:01.683326+01 | 00:00:00.103884
 2014-12-14 17:40:01.755904+01 | 00:00:00.051262
 2014-12-14 17:50:01.833825+01 | 00:00:00.06901
 2014-12-14 18:00:01.901236+01 | 00:00:00.17467
 2014-12-14 18:10:01.186283+01 | 00:00:00.214941
 2014-12-14 18:20:01.145413+01 | 00:00:00.03517
 2014-12-14 18:30:01.241746+01 | 00:00:00.207842
 2014-12-14 18:40:01.299413+01 | 00:00:00.147878
 2014-12-14 18:50:01.368541+01 | 00:00:00.393893
 2014-12-14 19:00:01.430736+01 | 00:00:00.031226
 2014-12-14 19:10:01.672117+01 | 00:05:03.512832
 2014-12-14 19:20:01.9195+01 | 00:06:39.08761
 2014-12-14 19:30:02.184486+01 | 00:00:00.307668
 2014-12-14 19:40:01.227278+01 | 00:00:00.054831
 2014-12-14 19:50:01.305485+01 | 00:00:00.425595
 2014-12-14 20:00:01.410501+01 | 00:00:00.394526
 2014-12-14 20:10:01.984196+01 | 00:00:00.388844
 2014-12-14 20:20:01.031042+01 | 00:00:00.503092
 2014-12-14 20:30:01.225871+01 | 00:00:00.241493
 2014-12-14 20:40:01.305696+01 | 00:00:00.280656
 2014-12-14 20:50:01.379617+01 | 00:00:00.151103
 2014-12-14 21:00:01.468849+01 | 00:00:00.014412
 2014-12-14 21:10:01.724514+01 | 00:00:00.147476
 2014-12-14 21:20:01.799292+01 | 00:00:00.08696
 2014-12-14 21:30:01.866336+01 | 00:00:00.035226
 2014-12-14 21:40:01.942882+01 | 00:00:00.111701
 2014-12-14 21:50:02.010419+01 | 00:00:00.215121
 2014-12-14 22:00:01.110033+01 | 00:00:16.460612
 2014-12-14 22:10:01.568286+01 | 00:00:00.077897
 2014-12-14 22:20:01.682714+01 | 00:00:00.104112
 2014-12-14 22:30:01.758958+01 | 00:00:00.061474
 2014-12-14 22:40:01.970545+01 | 00:00:00.108613
 2014-12-14 22:50:01.038908+01 | 00:00:00.039637
 2014-12-14 23:00:01.120295+01 | 00:00:00.338731
 2014-12-14 23:10:01.365371+01 | 00:00:00.680065
 2014-12-14 23:20:01.423365+01 | 00:00:00.154614
 2014-12-14 23:30:01.48998+01 | 00:00:00.014643
 2014-12-14 23:40:01.569452+01 | 00:00:00.126961
 2014-12-14 23:50:01.63047+01 | 00:00:00.303156
 2014-12-15 00:00:01.278047+01 | 00:00:00.351391
 2014-12-15 00:10:01.382566+01 | 00:00:00.012265
 2014-12-15 00:20:01.444746+01 | 00:07:39.002651
 2014-12-15 00:30:01.510413+01 | 00:16:13.476753
 2014-12-15 00:40:01.97735+01 | 00:00:00.105011
 2014-12-15 00:50:01.082528+01 | 00:01:10.313796
 2014-12-15 01:00:01.124843+01 | 00:00:01.508016
 2014-12-15 01:10:01.818415+01 | 00:00:00.082441
 2014-12-15 01:20:01.961064+01 | 00:00:00.048221
 2014-12-15 01:30:01.355472+01 | 00:00:00.37941
 2014-12-15 01:40:01.42728+01 | 00:00:00.013836
 2014-12-15 01:50:01.486446+01 | 00:00:00.110321
 2014-12-15 02:00:01.566731+01 | 00:00:00.290281
 2014-12-15 02:10:01.236574+01 | 00:01:15.954532
 2014-12-15 02:20:01.440259+01 | 00:00:00.471677
 2014-12-15 02:30:01.5733+01 | 00:00:00.208574
 2014-12-15 02:40:01.662425+01 | 00:00:00.591091
 2014-12-15 02:50:01.263385+01 | 00:00:00.050648
 2014-12-15 03:00:01.340777+01 | 00:00:00.289115
 2014-12-15 03:10:01.993079+01 | 00:00:00.790201
 2014-12-15 03:20:01.061826+01 | 00:00:00.043176
 2014-12-15 03:30:01.125639+01 | 00:00:00.172924
 2014-12-15 03:40:01.252033+01 | 00:03:05.113579
 2014-12-15 03:50:01.362396+01 | 00:00:00.254974
 2014-12-15 04:00:01.370922+01 | 00:00:00.208254
 2014-12-15 04:10:01.472816+01 | 00:00:00.077214
 2014-12-15 04:20:01.553443+01 | 00:00:00.135887
 2014-12-15 04:30:01.63607+01 | 00:00:00.027272
 2014-12-15 04:40:01.696442+01 | 00:00:00.130954
 2014-12-15 04:50:01.786961+01 | 00:00:00.572573
 2014-12-15 05:00:01.790753+01 | 00:00:00.491799
 2014-12-15 05:10:01.078332+01 | 00:07:58.438202  **** likely autovacuum
 2014-12-15 05:20:01.139541+01 | 00:00:00.057486
 2014-12-15 05:30:01.251079+01 | 00:00:00.053462
 2014-12-15 05:40:01.322349+01 | 00:00:00.084701
 2014-12-15 05:50:01.607937+01 | 00:00:00.205241
 2014-12-15 06:00:01.699406+01 | 00:00:00.121415
 2014-12-15 06:10:01.756047+01 | 00:00:00.20769
 2014-12-15 06:20:01.854222+01 | 00:00:00.03397
 2014-12-15 06:30:02.041054+01 | 00:03:07.271295
 2014-12-15 06:40:01.891882+01 | 00:00:00.263748
 2014-12-15 06:50:01.987809+01 | 00:00:00.155619
 2014-12-15 07:00:01.068556+01 | 00:00:00.119866
 2014-12-15 07:10:01.318478+01 | 00:00:00.092856
 2014-12-15 07:20:01.704899+01 | 00:00:00.106533
 2014-12-15 07:30:01.773268+01 | 00:00:00.135743
 2014-12-15 07:40:01.730152+01 | 00:00:00.06358
 2014-12-15 07:50:01.798179+01 | 00:00:00.529685
 2014-12-15 08:00:01.868205+01 | 00:00:00.194482
 2014-12-15 08:10:01.219339+01 | 00:00:00.063553
 2014-12-15 08:20:01.309426+01 | 00:00:00.056698
 2014-12-15 08:30:01.120431+01 | 00:00:00.425596
 2014-12-15 08:40:01.201882+01 | 00:00:00.00909
 2014-12-15 08:50:01.272526+01 | 00:00:00.019492
 2014-12-15 09:00:01.361022+01 | 00:00:00.423997
 2014-12-15 09:10:01.603702+01 | 00:00:00.066705
 2014-12-15 09:20:01.682277+01 | 00:00:09.251202
 2014-12-15 09:30:01.934477+01 | 00:00:00.311553
 2014-12-15 09:40:02.03221+01 | 00:00:00.125678
 2014-12-15 09:50:01.105372+01 | 00:00:00.294006
 2014-12-15 10:00:01.201109+01 | 00:00:00.014641
 2014-12-15 10:10:01.164478+01 | 00:01:51.375378
 2014-12-15 10:20:01.264589+01 | 00:09:54.476361  **** likely autovacuum
 2014-12-15 10:30:01.351103+01 | 00:00:00.213636
 2014-12-15 10:40:01.623903+01 | 00:00:00.488103
 2014-12-15 10:50:01.768132+01 | 00:00:00.080799
 2014-12-15 11:00:01.880247+01 | 00:00:20.401738
 2014-12-15 11:10:01.215509+01 | 00:00:00.036288
 2014-12-15 11:20:01.265607+01 | 00:00:00.057142
 2014-12-15 11:30:01.343731+01 | 00:00:00.036609
 2014-12-15 11:40:01.41248+01 | 00:00:00.218139
 2014-12-15 11:50:01.48113+01 | 00:00:00.242754
 2014-12-15 12:00:01.685114+01 | 00:00:00.82528
 2014-12-15 12:10:01.995243+01 | 00:02:29.971448
 2014-12-15 12:20:01.962833+01 | 00:00:00.118112
 2014-12-15 12:30:01.100587+01 | 00:00:00.214437
 2014-12-15 12:40:01.226111+01 | 00:00:00.052599
 2014-12-15 12:50:01.300061+01 | 00:00:00.162205
 2014-12-15 13:00:01.4007+01 | 00:00:00.707891
 2014-12-15 13:10:02.005526+01 | 00:00:00.162238
 2014-12-15 13:20:01.072375+01 | 00:00:00.214978
 2014-12-15 13:30:01.446005+01 | 00:00:00.121816
 2014-12-15 13:40:01.483524+01 | 00:00:00.650178
 2014-12-15 13:50:01.796143+01 | 00:00:00.065482
 2014-12-15 14:00:01.886071+01 | 00:00:00.237577
 2014-12-15 14:10:01.134148+01 | 00:00:00.193941
 2014-12-15 14:20:01.199047+01 | 00:00:00.068058
 2014-12-15 14:30:01.27777+01 | 00:00:00.022991
 2014-12-15 14:40:01.361959+01 | 00:00:00.439753
 2014-12-15 14:50:01.421515+01 | 00:00:00.037749
 2014-12-15 15:00:01.500559+01 | 00:00:00.174448
 2014-12-15 15:10:01.811804+01 | 00:06:09.196648  **** likely autovacuum
..



It goes up till a maximum of 25 minutes (for the last two weeks), it looks correlated with an autovacuum at the master
inone of the big tables of the schema. It happens at about 5hours interval. Is there a way to avoid this ? Should I ask
tothe master db dba to try to have a more active autovacuum policy?
 


Thank you,
Ruben


pgsql-performance by date:

From: Ruben Domingo Gaspar Aparicio
Date:
Subject: Re: Postgres slave not catching up (on 9.2)
From: "Graeme B. Bell"
Date:
Subject: Re: 8xIntel S3500 SSD in RAID10 on Dell H710p