Re: wal-size limited to 16MB - Performance issue for subsequent backup - Mailing list pgsql-hackers
From: jesper@krogh.cc
Subject: Re: wal-size limited to 16MB - Performance issue for subsequent backup
Msg-id: 305d21c86ca9194220c2cf91c11b78df.squirrel@shrek.krogh.cc
In response to: Re: wal-size limited to 16MB - Performance issue for subsequent backup (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: wal-size limited to 16MB - Performance issue for subsequent backup
List: pgsql-hackers
> On 2014-10-20 21:41:26 +0200, jesper@krogh.cc wrote:
>>
>> > On 2014-10-20 21:03:59 +0200, jesper@krogh.cc wrote:
>> >> One of our "production issues" is that the system generates lots of
>> >> wal-files, lots is like 151952 files over the last 24h, which is about
>> >> 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself,
>> >> but the system does indeed work fine. We do subsequently gzip the files
>> >> to limit actual disk-usage, this makes the files roughly 30-50% in size.
>
> I'm a bit doubtful that 16MB vs., say, 64MB files really changes
> anything substantial for you. If it indeed is a problem, it's simple
> enough to join the files temporarily?

I am trying to get my head around a good way to do that. 64MB probably
isn't a silver bullet, but it would definitely benefit the backup in terms
of single-threaded access to data on rotating drives.

>> > Have you analyzed what the source of that volume is? Which version of
>> > postgres are you using? What's your checkpoint_timeout/segments
>> > settings?
>>
>> Suggestions are surely welcome.
>
> Once you're on 9.3 I'd suggest using pg_xlogdump --stats on it. There's
> a backport of the facility for 9.3 (looking somewhat different than what
> is now in 9.5) at
> http://archives.postgresql.org/message-id/CABRT9RAzGowqLFcEE8aF6VdPoFEy%2BP9gmu7ktGRzw0dgRwVr9Q%40mail.gmail.com
>
> That'd tell you a fair bit more. It's noticeably harder to backport to < 9.3.

I'll bookmark that one.

>> I do suspect the majority is from 30 concurrent processes updating a
>> 506GB GIN index, but it would be nice to confirm that. There is also a
>> message-queue in the DB with a fairly high turnaround.
>
> A 506GB GIN index? Uh, interesting :). What's it used for? Trigrams?

It is for full-text search, but it is being updated entirely regularly,
~100M records. A dump/restore cycle typically reduces the size to 30-40%
of its current size.
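[Editor's note: Andres's suggestion to "join the files temporarily" could be sketched roughly as below. The function name, directory layout, and group size of four segments (4 x 16MB = 64MB) are illustrative assumptions, not something from the thread; this is not a tested archiving recipe.]

```shell
# Sketch (untested): concatenate completed 16MB WAL segments into larger
# bundles before compressing, so the backup reads fewer, larger sequential
# files. bundle_wal SRCDIR DSTDIR [GROUP] - GROUP segments per bundle.
bundle_wal() {
    src=$1; dst=$2; group=${3:-4}
    set -- "$src"/*                 # lexicographic order matches WAL order
    [ -e "$1" ] || return 0         # no segments to bundle
    while [ "$#" -ge "$group" ]; do
        # name the bundle after its first segment
        bundle="$dst/$(basename "$1").bundle"
        : > "$bundle"
        i=0
        while [ "$i" -lt "$group" ]; do
            cat "$1" >> "$bundle" && rm -f "$1"
            shift
            i=$((i+1))
        done
        gzip -f "$bundle"
    done
}
```

A leftover partial group (fewer than GROUP segments) is deliberately left unbundled until more segments arrive.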
>> Currently PG 9.2 moving to 9.3 hopefully before end-of-year,
>> checkpoint_timeout = 30min, checkpoint_segments = 4096.
>
> Generally a high checkpoint_timeout can significantly reduce the WAL
> volume because of fewer full page writes. I've seen cases where spacing
> checkpoints further apart by a factor of two reduced the overall WAL
> volume by more than two.

I'll work with that; I was just uncomfortable bumping checkpoint_segments
up much higher. Any field experience in that corner?

>> According to logs checkpoints are roughly 15 minutes apart.
>
> Can you show log_checkpoints output?

2014-10-20 18:10:22 CEST LOG: checkpoint starting: time
2014-10-20 18:15:44 CEST LOG: checkpoint complete: wrote 76851 buffers (7.3%); 0 transaction log file(s) added, 0 removed, 3238 recycled; write=295.834 s, sync=23.903 s, total=322.011 s; sync files=2115, longest=0.278 s, average=0.011 s
2014-10-20 18:40:22 CEST LOG: checkpoint starting: time
2014-10-20 18:44:30 CEST LOG: checkpoint complete: wrote 60550 buffers (5.8%); 0 transaction log file(s) added, 0 removed, 3460 recycled; write=224.678 s, sync=21.795 s, total=248.340 s; sync files=2090, longest=0.963 s, average=0.010 s
2014-10-20 19:10:22 CEST LOG: checkpoint starting: time
2014-10-20 19:14:11 CEST LOG: checkpoint complete: wrote 42720 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 3598 recycled; write=206.259 s, sync=21.185 s, total=229.254 s; sync files=2065, longest=0.945 s, average=0.010 s
2014-10-20 19:40:22 CEST LOG: checkpoint starting: time
2014-10-20 19:43:31 CEST LOG: checkpoint complete: wrote 32897 buffers (3.1%); 0 transaction log file(s) added, 0 removed, 3626 recycled; write=161.801 s, sync=26.936 s, total=189.635 s; sync files=2115, longest=0.458 s, average=0.012 s
2014-10-20 20:10:22 CEST LOG: checkpoint starting: time
2014-10-20 20:14:04 CEST LOG: checkpoint complete: wrote 37557 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 3285 recycled; write=205.011 s, sync=16.550 s, total=222.442 s; sync files=2113, longest=0.935 s, average=0.007 s
2014-10-20 20:40:22 CEST LOG: checkpoint starting: time
2014-10-20 20:45:18 CEST LOG: checkpoint complete: wrote 58012 buffers (5.5%); 0 transaction log file(s) added, 0 removed, 3678 recycled; write=252.750 s, sync=39.178 s, total=295.107 s; sync files=2075, longest=0.990 s, average=0.018 s
2014-10-20 21:10:22 CEST LOG: checkpoint starting: time
2014-10-20 21:13:31 CEST LOG: checkpoint complete: wrote 40530 buffers (3.9%); 0 transaction log file(s) added, 0 removed, 3652 recycled; write=167.925 s, sync=19.719 s, total=189.057 s; sync files=2077, longest=0.470 s, average=0.009 s
2014-10-20 21:40:22 CEST LOG: checkpoint starting: time
2014-10-20 21:44:20 CEST LOG: checkpoint complete: wrote 45158 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 3449 recycled; write=202.986 s, sync=32.564 s, total=237.441 s; sync files=2100, longest=0.445 s, average=0.015 s
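[Editor's note: the "recycled" counts in this log_checkpoints output offer a rough cross-check of the reported volume: each recycled file is one 16MB segment, so e.g. 3238 recycled is roughly 50GB of WAL in a single ~30-minute checkpoint interval, which lines up with the ~2.4TB/day figure earlier in the thread. A small awk sketch over such a log file (the helper name and log-file argument are assumptions):]

```shell
# Sketch: estimate total WAL volume from log_checkpoints output by summing
# the "N recycled" counts and multiplying by the 16MB segment size.
wal_from_checkpoints() {
    awk '/checkpoint complete/ {
        for (i = 1; i < NF; i++)
            if ($(i+1) == "recycled;") total += $i
    }
    END { printf "%.1f GB\n", total * 16 / 1024 }' "$1"
}
```

Summing the eight checkpoints shown above this way gives roughly 437GB over about four hours.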