Thread: dropdb breaks replication?
I have two PostgreSQL 9.1.6 servers running on 64-bit CentOS 5.8 Linux. They are replicated asynchronously.

Yesterday I dropped a 20 GB database, and replication then broke, requiring me to manually synchronize both servers again.

Is dropdb (or perhaps createdb) expected to break existing replication between servers?

Thanks,
Edson
On Wed, Oct 31, 2012 at 10:32 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
> I've two PostgreSQL 9.1.6 running on Linux CentOS 5.8 64bit.
> They are replicated asynchronously.
>
> Yesterday, I've dropped a database of 20Gb, and then replication has broken,
> requiring me to manually synchronize both servers again.
>
> It is expected that dropdb (or, perhaps, createdb) break existing
> replication between servers?

How did you determine that replication was broken, and how did you manually synchronize the servers? Are you certain that replication was working prior to dropping the database?
On 31/10/2012 15:39, Lonni J Friedman wrote:
> How did you determine that replication was broken, and how did you
> manually synchronize the servers? Are you certain that replication
> was working prior to dropping the database?

I'm sure replication was running. I usually keep two windows open, one on each server, running the following.

On the master:

    watch -n 2 "ps aux | egrep sender"

On the slave:

    watch -n 2 "ps aux | egrep receiver"

At the moment the dropdb command was executed, both processes disappeared from my "radar". The log also contains the following error:

    LOG: replicação em fluxo conectou-se com sucesso ao servidor principal
    FATAL: não pôde receber dados do fluxo do WAL: FATAL: segmento do WAL solicitado 0000000100000001000000BE já foi removido

(In English: "streaming replication successfully connected to primary", followed by "could not receive data from WAL stream: FATAL: requested WAL segment 0000000100000001000000BE has already been removed".)

Could the cause be that I do not keep enough segments (currently 80) for the dropdb command? Does dropdb log every deleted page in the transaction log?

Thanks,
Edson
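The segment name in the error above can be taken apart to see which part of the WAL stream the standby was asking for. The following is a minimal sketch, assuming the usual 24-hex-digit file name layout (timeline, log id, segment, eight digits each); the function and type names are made up for illustration and are not part of PostgreSQL.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int  # first 8 hex digits of the file name
    log_id: int    # middle 8 hex digits
    segment: int   # last 8 hex digits


def parse_wal_segment(name: str) -> WalSegment:
    """Split a 24-character WAL segment file name into its three hex fields."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {len(name)}: {name!r}")
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return WalSegment(
        timeline=int(name[0:8], 16),
        log_id=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )


# The segment named in the standby's FATAL message.
print(parse_wal_segment("0000000100000001000000BE"))
```

For the name in the log this gives timeline 1, log id 1, segment 190 (0xBE). Comparing that with the oldest file still present in the master's pg_xlog directory shows how far behind the standby had fallen.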
On Wed, Oct 31, 2012 at 11:01 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
> Also, in the log there is the following error:
>
> LOG: replicação em fluxo conectou-se com sucesso ao servidor principal
> FATAL: não pôde receber dados do fluxo do WAL: FATAL: segmento do WAL
> solicitado 0000000100000001000000BE já foi removido
>
> May the cause not having enough segments (currently 80) for dropdb command?
> Is dropdb logged in transaction log page-by-page excluded?

I can't read Portuguese(?), but I think the gist of the error is that the WAL segment was already removed before the slave could consume it. I'm guessing that you aren't keeping enough of them, and that dropping the database generated a huge volume of WAL which flushed out the old segments before your slave could consume them.
Lonni J Friedman <netllama@gmail.com> writes:
> On Wed, Oct 31, 2012 at 11:01 AM, Edson Richter
> <edsonrichter@hotmail.com> wrote:
>> May the cause not having enough segments (currently 80) for dropdb command?
>> Is dropdb logged in transaction log page-by-page excluded?

> I can't read Portuguese(?), but I think the gist of the error is that
> the WAL segment was already removed before the slave could consume it.
> I'm guessing that you aren't keeping enough of them, and dropping the
> database generated a huge volume which flushed out the old ones before
> they could get consumed by your slave.

dropdb generates one, not very large, WAL record saying "go rm -rf this directory". So sheer WAL volume is not the correct explanation. It's possible though that the slave spent long enough executing the rm -rf to fall behind the master.

In any case, it should have been able to catch up automatically if WAL archiving was configured properly.

regards, tom lane
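As an illustration of the archiving setup referred to above, here is a configuration sketch for a 9.1 master and standby. It is not taken from the thread: the archive directory /mnt/wal_archive, the host name, and the replication user are hypothetical, and it assumes a location that both servers can read and write, which may not hold for every deployment.

```
# postgresql.conf on the master (sketch; paths and values are illustrative)
wal_level = hot_standby
max_wal_senders = 3
archive_mode = on
# Copy each finished segment to the archive, refusing to overwrite an existing file.
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
wal_keep_segments = 80

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
# Used when a segment is no longer available from the master over streaming.
restore_command = 'cp /mnt/wal_archive/%f %p'
```

With a restore_command in place, a standby that asks for a segment the master has already recycled can fetch it from the archive instead of failing with "requested WAL segment ... has already been removed".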
On 31/10/2012 16:09, Lonni J Friedman wrote:
> I can't read Portuguese(?), but I think the gist of the error is that
> the WAL segment was already removed before the slave could consume it.
> I'm guessing that you aren't keeping enough of them, and dropping the
> database generated a huge volume which flushed out the old ones before
> they could get consumed by your slave.

Sorry for the Portuguese text.

Yes, your assumption is correct: the WAL segment was removed before it could be replicated. I keep 80 WAL segments, but I was wondering whether a database drop is logged at all: it is so fast that I thought it wasn't. And what is the purpose of logging (and replicating) the database drop if you will not be able to recover it? IMHO, dropdb should be replicated as a "database deactivation" or something like that...

Edson
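To put the figure of 80 retained segments in perspective, here is a rough sketch of how much WAL that is and how long a stalled standby could survive on it. It assumes the 80 refers to wal_keep_segments and that the server uses the default 16 MiB segment size; the WAL generation rate in the example is invented, and the result is only an estimate because the master may keep more segments than the configured minimum.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default segment size (assumed, not stated in the thread)


def retained_wal_bytes(keep_segments: int) -> int:
    """Bytes of WAL guaranteed to stay on the master for standbys."""
    return keep_segments * WAL_SEGMENT_BYTES


def stall_budget_seconds(keep_segments: int, wal_bytes_per_second: float) -> float:
    """Estimate how long a standby may stop consuming WAL before it needs a recycled segment."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL generation rate must be positive")
    return retained_wal_bytes(keep_segments) / wal_bytes_per_second


# 80 segments, with a hypothetical master writing 1 MiB of WAL per second.
print(retained_wal_bytes(80) // (1024 * 1024), "MiB retained")
print(stall_budget_seconds(80, 1024 * 1024), "seconds of stall tolerated")
```

Under these assumptions 80 segments amount to 1280 MiB, so a standby that spends longer than the estimated budget on a single slow replay step would find the next segment it needs already gone.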
On 31/10/2012 16:34, Tom Lane wrote:
> dropdb generates one, not very large, WAL record saying "go rm -rf this
> directory". So sheer WAL volume is not the correct explanation. It's
> possible though that the slave spent long enough executing the rm -rf
> to fall behind the master.

Your assumption is right: the slave is a slow, single-processor, low-memory cloud machine, and it would have taken a very long time to delete everything.

> In any case, it should have been able to catch up automatically if WAL
> archiving was configured properly.

I don't use WAL archiving. Both servers are miles away from each other, and nothing connects them except PostgreSQL async replication over a VPN.

Edson
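Watching for the sender and receiver processes shows whether replication is alive, but not how far the standby is behind. A minimal sketch of measuring the gap follows. It assumes you have collected the output of pg_current_xlog_location() on the master and pg_last_xlog_replay_location() on the standby as text, and that, as in releases before 9.3, each log id covers 0xFF000000 bytes because the last segment of every log id is skipped. The helper names and the sample locations are made up.

```python
BYTES_PER_LOG_ID = 0xFF000000  # 255 usable 16 MiB segments per log id (pre-9.3 assumption)


def xlog_location_to_bytes(location: str) -> int:
    """Convert an 'X/Y' hexadecimal xlog location into an absolute byte position."""
    log_id, offset = location.split("/")
    return int(log_id, 16) * BYTES_PER_LOG_ID + int(offset, 16)


def replication_lag_bytes(master_location: str, standby_location: str) -> int:
    """Bytes of WAL the master has written that the standby has not yet replayed."""
    return xlog_location_to_bytes(master_location) - xlog_location_to_bytes(standby_location)


# Invented sample values: the standby is two 16 MiB segments behind.
lag = replication_lag_bytes("1/C0000000", "1/BE000000")
print(lag // (16 * 1024 * 1024), "segments behind")
```

If the lag approaches the amount of WAL the master retains, the standby is close to requesting a segment that has already been removed.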
On 10/31/12 11:34 AM, Edson Richter wrote:
> Sorry for the Portuguese text. Yes, your assumption is correct: WAL
> segment has been excluded before being able to replicate.
> I keep 80 WAL segments, but I was wondering if a drop database is
> being logged: it's just so fast, I thought it wasn't logged.
> And what is the purpose to log (and replicate) the database drop, if
> you will not be able to recover it - IMHO, dropdb should be replicated
> as "database deactivation" or something more or like that...

WAL is not a 'redo' log like Oracle uses.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Edson --

> I've two PostgreSQL 9.1.6 running on Linux CentOS 5.8 64bit.
> They are replicated asynchronously.
>
> Yesterday, I've dropped a database of 20Gb, and then replication has broken, requiring me to manually synchronize both servers again.
>
> It is expected that dropdb (or, perhaps, createdb) break existing replication between servers?

Sorry for the slow response -- as others have indicated, the drop db is probably not the problem. We have one system that drops a several-gigabyte database hourly, and the replication has never failed. We see issues on the master with dead file handles, but the replication itself is rock solid.

Greg
On 31/10/2012 20:47, Greg Williamson wrote:
> Sorry for the slow response -- as others have indicated, the drop db is probably not the problem. We have one system that drops a several-gig database hourly and the replication has never failed. We see issues on the master with dead file handles but the replication itself is rock solid.
>
> Greg

Our application should (almost) never delete databases, but just in case I'll keep an eye open and manually resync the replication if needed. It is not a major issue; it was more a matter of curiosity. Also, John pointed out that the xlog in PostgreSQL is not the same as the concept I had from my Oracle days.

Thanks, Greg (and everyone).

Edson