Re: [GENERAL] Move rows from one database to other - Mailing list pgsql-general

From Francisco Olarte
Subject Re: [GENERAL] Move rows from one database to other
Date
Msg-id CA+bJJbwjH7fhQD9RUN5iqjVCQud6pSSe3DHuSOwSW37rTEdXjw@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] Move rows from one database to other  (Thomas Güttler <guettliml@thomas-guettler.de>)
Responses Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database toother  (Thomas Güttler <guettliml@thomas-guettler.de>)
List pgsql-general
Thomas:

On Thu, Feb 23, 2017 at 4:16 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
> Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
>> only written on a successful transfer. To improve the chances of
>> successful transfer more smaller transfer batches
>> rather then larger transfers.

> I really need a solid solution.
> You said "... improve the chances of successful transfer ...". This makes me
> nervous.

I think what Adrian say is you improve the individual transfer time,
if it fails you retry.

> Delays are no problems, but data loss or duplication is.

Remember you can never guarantee 'exactly once' without very complex
solutions, I think you can do "at least once" or "at most once". That
means lose or duplicate.

That being said, IF you have some kind of global, unchanging ( at the
central site ) unique key, you could try the following, using a
holding table in each satellite and assuming you have 'on conflict do
nothing'.

1.- Move rows from main to holding table in the satelite, in a single
transaction. This is to let you work with an unmovable set ( as your
process is the only one touching the holding tables ). If there is
some data in holding it is no problem, they are from a previous
crashed transfer.

2.- Insert every thing from the holding table in main, using on
conflict do nothing.

3.- When everything is commited in main, truncate the satellite holding table.

If satellite crashes in 1 it will roll back, you have not touched main.

If you crash in 2 you will find 1 partially full in the next round,
and main will be rolled back ( it's important to not commit until
everything is done in 2, i.e., if you have read problems in the
satellite do no go to 3, just crash and rollback everything ). You can
either do a loop with the current set or append more data, your
choice, does not matter, as you have to reinsert. The on conflict do
nothing in 2 will take care of potential duplicates.

If you crash in 3 you will transfer the lot again, but the do-nothing
in 2 will eliminate it and 3 will eventually purge it.

You can optimize on that, but basically you just repeat this until
everything goes fine. I do these ( just with two DBs, not 100 ) and it
works.

It does a lot of duplicate work, but only on problems, it normally runs smooth.

If you do not have "on conflict do nothing" ( I do not remember the
versions ) you can use an extra step. Instead of inserting in main in
2 do 2.a - Copy holding to main ( truncating before hand if copy
present ) and 2.b insert news from the copy, either by using and
anti-join with main or by deleting ( in the same transaction ) the
dupes before inserting.

Francisco Olarte.


pgsql-general by date:

Previous
From: Benoit Lobréau
Date:
Subject: Re: [GENERAL] NOTIFY command impact
Next
From: Francisco Olarte
Date:
Subject: Re: [GENERAL] Move rows from one database to other