Re: Big UPDATE breaking replication - Mailing list pgsql-admin
From: Kouber Saparev
Subject: Re: Big UPDATE breaking replication
Date:
Msg-id: 51B07E67.9060707@saparev.com
In response to: Re: Big UPDATE breaking replication (Steve Crawford <scrawford@pinpointresearch.com>)
Responses: Re: Big UPDATE breaking replication
List: pgsql-admin
On 06/04/2013 06:47 PM, Steve Crawford wrote:
> On 06/04/2013 04:53 AM, Kouber Saparev wrote:
>> Hello,
>>
>> We are using the 9.1 built-in streaming replication.
>>
>> Recently our slave nodes fell behind because of an UPDATE statement.
>> It took about 3 minutes to execute, but it affected half a million
>> records, hence the replication broke with the "requested WAL segment
>> ... has already been removed" series of error messages.
>>
>> The WAL settings we have are:
>>
>> max_wal_senders = 6
>> wal_keep_segments = 60
>> max_standby_archive_delay = 300s
>>
>> I guess increasing the wal_keep_segments value would prevent it from
>> happening in the future, but increase it by how much? What value
>> would be high enough?
>
> You can use WAL shipping to protect against this or set
> wal_keep_segments higher. I set my main server to a tad over 1,000 and
> know I can do a full restore on the master without coming close to
> breaking replication. My xlog dir is 17G. A bit of a waste, perhaps,
> but I've noted no ill effects and it's still only a sliver of the
> total drive capacity.

Well, streaming replication already ships the WAL. Do you mean to archive
the WALs somewhere and then scp them with a cron job? I doubt that would
be fast enough either.

>> Also we noticed a strange error message appearing shortly before and
>> after this same statement: "LOG: out of file descriptors: Too many
>> open files; release and retry".
>>
>> Could it be related somehow, and what does it mean exactly?
>
> What is your setup (Linux? Mac? Windows? VM in the cloud? How many
> simultaneous connections? ...) You will find a lot of info in old
> messages on the subject but, annotating the docs: if the kernel is
> enforcing a safe per-process limit, you don't need to worry about this
> setting. But on some platforms (notably, most BSD systems - looking at
> you, Mac), the kernel will allow individual processes to open many more
> files than the system can actually support if many processes all try
> to open that many files...
>
> An old email from Tom Lane notes that you need to make sure your kernel
> can support approximately
> max_connections * (max_files_per_process + max_connections)
> open file handles, plus any requirements imposed by other processes on
> the system, and comments that Mac treats each semaphore as an open file.
>
> My interpretation is that if your kernel enforces things properly you
> don't need to worry. If it doesn't, reduce your max_connections and/or
> max_files_per_process as needed.

We are running virtual machines with Ubuntu Server 12.04.1 LTS. The error
looks very confusing given:

root@DB1:~# cat /proc/sys/fs/file-max
1621645

root@DB1:~# lsof | wc -l
7013

The message does not normally appear in our logs, which is why I suspect
it is correlated with the WAL issue above. Could it be that the master was
not able to open the WAL sender properly for this transaction?

--
Kouber Saparev
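
A minimal sketch of how the standby lag could be measured on 9.1, assuming
psql runs as a superuser and "db2" stands in for the standby host:

-- on the master: current WAL write position and what each standby has received
-- (in 9.1 the backend column is still named procpid; it was renamed pid in 9.2)
SELECT application_name, client_addr, sent_location, replay_location
  FROM pg_stat_replication;
SELECT pg_current_xlog_location();

-- on the standby (db2): last WAL position received and replayed
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();

Each WAL segment is 16 MB, so wal_keep_segments = 60 retains at most about
960 MB of WAL on the master; an UPDATE rewriting half a million rows (plus
its index updates) can plausibly generate more than that during a 3-minute
run, after which the master recycles segments the standby has not yet
streamed.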
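
On the archiving question, the usual safety net on 9.1 is to combine
streaming with a WAL archive that the standby can fall back to once the
master has recycled a segment, driven by archive_command rather than a
cron job. A sketch only; the archive directory, host names, and
replication user below are placeholders:

# master postgresql.conf (enabling archive_mode requires a restart)
archive_mode = on
archive_command = 'rsync -a %p standby:/var/lib/postgresql/wal_archive/%f'

# standby recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=db1 user=replicator'
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'

With restore_command in place, a standby that falls behind the streamed
WAL replays the missing segments from the archive and then reconnects, so
wal_keep_segments only has to cover brief hiccups instead of the
worst-case transaction.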
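
On the file-descriptor message: /proc/sys/fs/file-max is the system-wide
table, while "out of file descriptors: Too many open files" is the
per-process EMFILE error, so the per-process limits of the postmaster and
its backends are the numbers to check. A sketch, assuming the stock Ubuntu
data directory /var/lib/postgresql/9.1/main:

# soft per-process limit the postgres user starts with
su - postgres -c 'ulimit -n'

# limits of the running postmaster (first line of postmaster.pid is its PID)
grep 'open files' /proc/$(head -1 /var/lib/postgresql/9.1/main/postmaster.pid)/limits

Plugging illustrative numbers into the formula quoted above: with
max_connections = 100 and the default max_files_per_process = 1000,
PostgreSQL alone may want roughly 100 * (1000 + 100) = 110,000 descriptors
system-wide, which a file-max of 1.6 million covers easily; but a
per-process soft limit of 1024 sits barely above max_files_per_process, so
a single backend touching many relations during a big UPDATE can still run
out.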