Thread: Data duplication when moving datafiles from one server to another.
Good evening.
Yesterday we experienced some data duplication in several database tables after one sysadmin decided to test in a production environment an rsync script in order to migrate a database from one server to another one.
Postgresql (8.2) was running in source server and rsync script was launched from second one. Second one server had a one day old copy of the same database. Rsync script create a datafile replica in destination server.
Our sysadmin swear he didn't launch the script in a reverse way (that is, from destination to source)... so my question is
How this data duplication could happen? Due to an rsync lock in checkpoint segment or transaction logs? Or he has mistaken source with destination server?
Thans in advance.
Yesterday we experienced some data duplication in several database tables after one sysadmin decided to test in a production environment an rsync script in order to migrate a database from one server to another one.
Postgresql (8.2) was running in source server and rsync script was launched from second one. Second one server had a one day old copy of the same database. Rsync script create a datafile replica in destination server.
Our sysadmin swear he didn't launch the script in a reverse way (that is, from destination to source)... so my question is
How this data duplication could happen? Due to an rsync lock in checkpoint segment or transaction logs? Or he has mistaken source with destination server?
Thans in advance.
IñigoMartinez Lasala <imartinez@vectorsf.com> wrote: > Yesterday we experienced some data duplication in several database > tables after one sysadmin decided to test in a production > environment an rsync script in order to migrate a database from > one server to another one. > Postgresql (8.2) was running in source server and rsync script was > launched from second one. Second one server had a one day old copy > of the same database. Rsync script create a datafile replica in > destination server. > Our sysadmin swear he didn't launch the script in a reverse way > (that is, from destination to source)... so my question is > How this data duplication could happen? Where did this duplication happen -- source or destination? If in the source, I would suspect operator error by your sysadmin; I have never seen rsync cause problems with the source files. If on the target, I would suspect that you didn't follow PITR recovery techniques properly. If you need to follow up on this, please include a few more details, like the exact PostgreSQL version from `select version();` and the exact version of the OS. The exact rsync statement(s) used would be good to see, too -- maybe from the bash history (or similar) on the machines. -Kevin
On Mon, Dec 20, 2010 at 12:22 PM, Iñigo Martinez Lasala <imartinez@vectorsf.com> wrote: > Good evening. > > Yesterday we experienced some data duplication in several database tables > after one sysadmin decided to test in a production environment an rsync > script in order to migrate a database from one server to another one. > Postgresql (8.2) was running in source server and rsync script was launched > from second one. Second one server had a one day old copy of the same > database. Rsync script create a datafile replica in destination server. > Our sysadmin swear he didn't launch the script in a reverse way (that is, > from destination to source)... so my question is > How this data duplication could happen? Due to an rsync lock in checkpoint > segment or transaction logs? Or he has mistaken source with destination > server? I use this method all the time to backup big dbs from one machine to another: (on remote machine): /etc/init.d/postgresql stop (on main server): rsync -avl --delete /data/* backupserver:/data/* # takes forever sudo /etc/init.d/postgresql stop rsync -avl --delete /data/* backupserver:/data/* # fast, we're just catching up sudo /etc/init.d/postgresql start (on remote machine): /etc/init.d/postgresql start You can run such a script at 2 in the morning with a few minutes downtime without having to jump through a lot of hoops. So, can you show us what commands your sysadmin / dba actually typed? My guess is he left out the --delete or left the backup server db up and running at the time of the rsync. It needs to be shut down to do this right.
Re: Data duplication when moving datafiles from one server to another.
From
Iñigo Martinez Lasala
Date:
Hi Kevin, Scott.
Thank you for your response.
Data duplication happened on source server (that is, on production environment). This data duplication affected only on some tables, with some duplicated PKs and altered sequences (sequences value didn't match "select max()" and we had to fix it manually).
Postgres version is 8.2.7 64bit on Ubuntu Server 8.04 LTS
Rsync command was:
rsync -avt -lHpogDtx /srv/postgresql/8.2/main/ root@destination_server:/srv/postgresql/8.2/main/
and was launched from source server.
Remember: source server was running.
If rsync sentence was launched correctly (I assume yes), could a temporary lock over a checkpoint segment cause this behaviour?
We have managed to delete duplicated registers and restore incorrect sequences, but I don't really know what happened. Since I've been indirect responsable from this (sysadmin reports me) I need to clarify the cause. I need an valid explanation for our client.
-----Original Message-----
From: Kevin Grittner <Kevin.Grittner@wicourts.gov>
To: pgsql-admin@postgresql.org, IñigoMartinez Lasala <imartinez@vectorsf.com>
Subject: Re: [ADMIN] Data duplication when moving datafiles from one server to another.
Date: Mon, 20 Dec 2010 13:43:01 -0600
Thank you for your response.
Data duplication happened on source server (that is, on production environment). This data duplication affected only on some tables, with some duplicated PKs and altered sequences (sequences value didn't match "select max()" and we had to fix it manually).
Postgres version is 8.2.7 64bit on Ubuntu Server 8.04 LTS
Rsync command was:
rsync -avt -lHpogDtx /srv/postgresql/8.2/main/ root@destination_server:/srv/postgresql/8.2/main/
and was launched from source server.
Remember: source server was running.
If rsync sentence was launched correctly (I assume yes), could a temporary lock over a checkpoint segment cause this behaviour?
We have managed to delete duplicated registers and restore incorrect sequences, but I don't really know what happened. Since I've been indirect responsable from this (sysadmin reports me) I need to clarify the cause. I need an valid explanation for our client.
-----Original Message-----
From: Kevin Grittner <Kevin.Grittner@wicourts.gov>
To: pgsql-admin@postgresql.org, IñigoMartinez Lasala <imartinez@vectorsf.com>
Subject: Re: [ADMIN] Data duplication when moving datafiles from one server to another.
Date: Mon, 20 Dec 2010 13:43:01 -0600
IñigoMartinez Lasala <imartinez@vectorsf.com> wrote: > Yesterday we experienced some data duplication in several database > tables after one sysadmin decided to test in a production > environment an rsync script in order to migrate a database from > one server to another one. > Postgresql (8.2) was running in source server and rsync script was > launched from second one. Second one server had a one day old copy > of the same database. Rsync script create a datafile replica in > destination server. > Our sysadmin swear he didn't launch the script in a reverse way > (that is, from destination to source)... so my question is > How this data duplication could happen? Where did this duplication happen -- source or destination? If in the source, I would suspect operator error by your sysadmin; I have never seen rsync cause problems with the source files. If on the target, I would suspect that you didn't follow PITR recovery techniques properly. If you need to follow up on this, please include a few more details, like the exact PostgreSQL version from `select version();` and the exact version of the OS. The exact rsync statement(s) used would be good to see, too -- maybe from the bash history (or similar) on the machines. -Kevin
IñigoMartinez Lasala <imartinez@vectorsf.com> wrote: > Data duplication happened on source server (that is, on production > environment). This data duplication affected only on some tables, > with some duplicated PKs and altered sequences (sequences value > didn't match "select max()" and we had to fix it manually). We rsync from production servers all the time, and have never seen such problems. Maybe someone else has ideas beyond my long shots (below). > Postgres version is 8.2.7 64bit on Ubuntu Server 8.04 LTS There have been 12 bug fix releases for the 8.2 major release since 8.2.7. It's not impossible that you somehow hit a bug which has been fixed in the last 33 months. http://www.postgresql.org/support/versioning http://www.postgresql.org/docs/8.2/static/release.html > Rsync command was: > rsync -avt -lHpogDtx /srv/postgresql/8.2/main/ > root@destination_server:/srv/postgresql/8.2/main/ Did it run successfully on the first try, or might there have been an attempt where the target was mis-specified? A space instead of a colon between the target server and path might have caused the kind of corruption you're seeing. Also -- I would avoid using root for such things. We have remote root login disabled entirely on our machines, and limit the use of that login to situations where there is no reasonable alternative. > and was launched from source server. I find it's harder to accidentally damage the remote copy, so I run rsync from the target where possible. -Kevin