Thread: Zen and art if WAL and dump file mainteance
We are trying to confirm our understanding of WAL. Thank you in advance for any comments and help. We have version 8.24.
We noticed that when our nightly maintenance of the WAL folders kicked off *** - see below. Production processing was "KILLED". We were in the process of rekeying MANY existing tables.
We were doing heavy processing(which we normally don't do at the time of maintenance - adding fields, table creation and filling tables going on. During this time, archiving was enabled.
## Our data is set up like this
A /sprj/sqldumps_Linux.pgsql_serverX is our storage for the WAL
B /sqldata_Linux.pgsql_serverX is another computer/folder where the actual data is (pgdata).
We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B.
We are thinking we should have turned off the nightly maintenance at this point, but we are wondering why and trying to improve our process logic. We were thinking that our zipping and deleting
activity on A would not have an affect on B. And it never has before, although we never had heavy processing at that time before.
Since, the archives don't happen too often, until the 36 logfile segments locally gets filled up, then one gets pushed out. However, our clean up process is running from 11:30 - 12:32 in this case.
In our case, the first script wasn't finished zipping causing the second remove script to fail.
***
#This is what happened on A - nightly maintenance of dumps folder *** Maybe there is a better way to do this and we are missing something here
30 19,23 * * * find /maintenance/NFSmntdd.dump001/sqldumps_Linux.pgsql_* -type f -mmin +10 ! -name \*.gz -size +0c -exec gzip {} \;
30 0 * * * find /maintenance/NFSmntdd.dump001/sqldumps_Linux.pgsql_* -type f -mmin +20 -exec rm {} \
# This is what was happening on B
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
# - Settings -
#fsync = on # turns forced synchronization on or off
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_buffers = 64kB # min 32kB
wal_buffers = 128kB # min 32kB
# (change requires restart)
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
checkpoint_segments = 36 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_warning = 30s # 0 is off
# - Archiving -
#archive_command = '' # command to use to archive a logfile segment
archive_command = 'cp "%p" /sprj/sqldumps_Linux.pgsql_serverx/"%f"' # command to use to archive a logfile segment
archive_timeout = 0 # force a logfile segment switch after this
# many seconds; 0 is off
Shed those extra pounds with MSN and The Biggest Loser! Learn more.
dx k9 <bitsandbytes88@hotmail.com> writes: > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing(killed it) on pgdata - B. Sounds like an OOM kill problem --- do you have memory overcommit disabled on that machine? regards, tom lane
We have: vm.overcommit_memory = 2 -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane Sent: Monday, January 28, 2008 4:27 PM To: dx k9 Cc: posgres support Subject: Re: [ADMIN] Zen and art if WAL and dump file mainteance dx k9 <bitsandbytes88@hotmail.com> writes: > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B. Sounds like an OOM kill problem --- do you have memory overcommit disabled on that machine? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly -------------------------------------------------------- Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the senderof the error.
I've seen this quite a bit lately especially when Vmware is involved. Over committing the host server is becoming more common and Vmware has this "balloon" driver which can bring out the Linux oom-killer process. Check your messages file for "oom-killer". If it's there, Tom's right and you need to check into tweaking the kernel a little. sysctl -w vm.overcommit_memory=2 OR echo "vm.overcommit_memory=2" >> /etc/sysctl.conf sysctl -p Greg -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane Sent: Monday, January 28, 2008 4:27 PM To: dx k9 Cc: posgres support Subject: Re: [ADMIN] Zen and art if WAL and dump file mainteance dx k9 <bitsandbytes88@hotmail.com> writes: > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B. Sounds like an OOM kill problem --- do you have memory overcommit disabled on that machine? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly CONFIDENTIALITY NOTICE: This e-mail communication and any attachments may contain confidential and privileged informationfor the use of the designated recipients named above. If you are not the intended recipient, you are hereby notifiedthat you have received this communication in error and that any review, disclosure, dissemination, distribution orcopying of it or its contents is prohibited. If you have received this communication in error; please notify Cranel Incorporatedimmediately by telephone at 614-431-8000 or 800-288-3475 and destroy all copies of this communication and anyattachments.
I notice in 'pg_data' that a file will stay in pg_xlog until it passes 16 MB, then it goes to the archive pretty much right away. So, when it's busy you get a lot of files swapping into the archive , but when it's not you can get one file taking an hour or more to get up to 16 MB before it swaps over to the archive. In reading the documentation "every 5 minutes or 36 log file segments which ever is first" - - 74 files in my case. But I noticed when it's slow it's more it can be longer than 5 minutes. I'm thinking the "5 minutes or 36 log segments" rule applies to just after files hit 16 MB in size, until then, they are building up.
Does that all sound right?
> To: bitsandbytes88@hotmail.com
> CC: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Zen and art if WAL and dump file mainteance
> Date: Mon, 28 Jan 2008 16:27:05 -0500
> From: tgl@sss.pgh.pa.us
>
> dx k9 <bitsandbytes88@hotmail.com> writes:
> > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B.
>
> Sounds like an OOM kill problem --- do you have memory overcommit
> disabled on that machine?
>
> regards, tom lane
Climb to the top of the charts! Play the word scramble challenge with star power. Play now!
Does that all sound right?
> To: bitsandbytes88@hotmail.com
> CC: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Zen and art if WAL and dump file mainteance
> Date: Mon, 28 Jan 2008 16:27:05 -0500
> From: tgl@sss.pgh.pa.us
>
> dx k9 <bitsandbytes88@hotmail.com> writes:
> > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B.
>
> Sounds like an OOM kill problem --- do you have memory overcommit
> disabled on that machine?
>
> regards, tom lane
Climb to the top of the charts! Play the word scramble challenge with star power. Play now!
We do have overcommit set to 2 in the sysctl.conf file as well.
> To: bitsandbytes88@hotmail.com
> CC: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Zen and art if WAL and dump file mainteance
> Date: Mon, 28 Jan 2008 16:27:05 -0500
> From: tgl@sss.pgh.pa.us
>
> dx k9 <bitsandbytes88@hotmail.com> writes:
> > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B.
>
> Sounds like an OOM kill problem --- do you have memory overcommit
> disabled on that machine?
>
> regards, tom lane
Climb to the top of the charts! Play the word scramble challenge with star power. Play now!
> To: bitsandbytes88@hotmail.com
> CC: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Zen and art if WAL and dump file mainteance
> Date: Mon, 28 Jan 2008 16:27:05 -0500
> From: tgl@sss.pgh.pa.us
>
> dx k9 <bitsandbytes88@hotmail.com> writes:
> > We are trying to figure out why maintenance on A (zipping up & removing files) would have an affect on production processing (killed it) on pgdata - B.
>
> Sounds like an OOM kill problem --- do you have memory overcommit
> disabled on that machine?
>
> regards, tom lane
Climb to the top of the charts! Play the word scramble challenge with star power. Play now!