Thread: [GENERAL] Move rows from one database to other

[GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:
I want to move table rows from one database to an central database.

Both run PostgreSQL.

My use case looks like this:

There are N satellite databases in different data centers. N is about 100 at the moment.

There is one central database.

I need a way to reliably move rows from the satellite databases to the central one

Example

The rows of host1 look like this:

  host1, 2017-02-21, abc
  host1, 2017-02-20, def
  host1, 2017-02-19, ghi

The rows of host2 look like this:

  host2, 2017-02-21, foo
  host2, 2017-02-20, bar
  host2, 2017-02-19, blu

After syncing, all lines which were transferred should be deleted on the satellite databases.

The central table should look like this (it has the same schema)

  host1, 2017-02-21, abc
  host1, 2017-02-20, def
  host1, 2017-02-19, ghi
  host2, 2017-02-21, foo
  host2, 2017-02-20, bar
  host2, 2017-02-19, blu


I don't want to code this myself, since there a tons of possible race conditions:

  - inserts can happen during syncing.
  - Network can break during syncing.
  - inserts into the central table can break (e.g. disk full): No loss at the satellite database must happen.
  - ...

How to solve this with PostgreSQL?

Regards,
   Thomas Güttler




--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Adrian Klaver
Date:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
> I want to move table rows from one database to an central database.

You actually talking about moving from ~100 databases to the central
database, correct?

>
> Both run PostgreSQL.

Are all the Postgres instances the same version and what is the version
or versions?

>
> My use case looks like this:
>
> There are N satellite databases in different data centers. N is about
> 100 at the moment.
>
> There is one central database.
>
> I need a way to reliably move rows from the satellite databases to the
> central one

Two ways I can think of:

https://www.postgresql.org/docs/9.6/static/dblink.html

https://www.postgresql.org/docs/9.6/static/postgres-fdw.html

>
> Example
>
> The rows of host1 look like this:
>
>  host1, 2017-02-21, abc
>  host1, 2017-02-20, def
>  host1, 2017-02-19, ghi
>
> The rows of host2 look like this:
>
>  host2, 2017-02-21, foo
>  host2, 2017-02-20, bar
>  host2, 2017-02-19, blu
>
> After syncing, all lines which were transferred should be deleted on the
> satellite databases.
>
> The central table should look like this (it has the same schema)
>
>  host1, 2017-02-21, abc
>  host1, 2017-02-20, def
>  host1, 2017-02-19, ghi
>  host2, 2017-02-21, foo
>  host2, 2017-02-20, bar
>  host2, 2017-02-19, blu

Is there a Primary Key on the satellite tables or some way of
determining unique rows?

Is there any existing overlap between the data in the central database
and the satellite databases?


>
>
> I don't want to code this myself, since there a tons of possible race
> conditions:

How much data are you talking about moving from each database?

How active are the satellite databases?

>
>  - inserts can happen during syncing.

Can UPDATEs happen?

>  - Network can break during syncing.
>  - inserts into the central table can break (e.g. disk full): No loss at
> the satellite database must happen.
>  - ...
>
> How to solve this with PostgreSQL?
>
> Regards,
>   Thomas Güttler
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Move rows from one database to other

From
William Ivanski
Date:
You can try OmniDB: http://www.omnidb.com.br/en_index.aspx

OmniDB has a Convert feature, where you can set a data transfer, even if the target table exists.

Em ter, 21 de fev de 2017 às 11:18, Adrian Klaver <adrian.klaver@aklaver.com> escreveu:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
> I want to move table rows from one database to an central database.

You actually talking about moving from ~100 databases to the central
database, correct?

>
> Both run PostgreSQL.

Are all the Postgres instances the same version and what is the version
or versions?

>
> My use case looks like this:
>
> There are N satellite databases in different data centers. N is about
> 100 at the moment.
>
> There is one central database.
>
> I need a way to reliably move rows from the satellite databases to the
> central one

Two ways I can think of:

https://www.postgresql.org/docs/9.6/static/dblink.html

https://www.postgresql.org/docs/9.6/static/postgres-fdw.html

>
> Example
>
> The rows of host1 look like this:
>
>  host1, 2017-02-21, abc
>  host1, 2017-02-20, def
>  host1, 2017-02-19, ghi
>
> The rows of host2 look like this:
>
>  host2, 2017-02-21, foo
>  host2, 2017-02-20, bar
>  host2, 2017-02-19, blu
>
> After syncing, all lines which were transferred should be deleted on the
> satellite databases.
>
> The central table should look like this (it has the same schema)
>
>  host1, 2017-02-21, abc
>  host1, 2017-02-20, def
>  host1, 2017-02-19, ghi
>  host2, 2017-02-21, foo
>  host2, 2017-02-20, bar
>  host2, 2017-02-19, blu

Is there a Primary Key on the satellite tables or some way of
determining unique rows?

Is there any existing overlap between the data in the central database
and the satellite databases?


>
>
> I don't want to code this myself, since there a tons of possible race
> conditions:

How much data are you talking about moving from each database?

How active are the satellite databases?

>
>  - inserts can happen during syncing.

Can UPDATEs happen?

>  - Network can break during syncing.
>  - inserts into the central table can break (e.g. disk full): No loss at
> the satellite database must happen.
>  - ...
>
> How to solve this with PostgreSQL?
>
> Regards,
>   Thomas Güttler
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--

William Ivanski - Microsoft MVP

Re: [GENERAL] Move rows from one database to other

From
Melvin Davidson
Date:


On Tue, Feb 21, 2017 at 9:27 AM, William Ivanski <william.ivanski@gmail.com> wrote:
You can try OmniDB: http://www.omnidb.com.br/en_index.aspx

OmniDB has a Convert feature, where you can set a data transfer, even if the target table exists.

Em ter, 21 de fev de 2017 às 11:18, Adrian Klaver <adrian.klaver@aklaver.com> escreveu:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
> I want to move table rows from one database to an central database.

You actually talking about moving from ~100 databases to the central
database, correct?

>
> Both run PostgreSQL.

Are all the Postgres instances the same version and what is the version
or versions?

>
> My use case looks like this:
>
> There are N satellite databases in different data centers. N is about
> 100 at the moment.
>
> There is one central database.
>
> I need a way to reliably move rows from the satellite databases to the
> central one

Two ways I can think of:

https://www.postgresql.org/docs/9.6/static/dblink.html

https://www.postgresql.org/docs/9.6/static/postgres-fdw.html

>
> Example
>
> The rows of host1 look like this:
>
>  host1, 2017-02-21, abc
>  host1, 2017-02-20, def
>  host1, 2017-02-19, ghi
>
> The rows of host2 look like this:
>
>  host2, 2017-02-21, foo
>  host2, 2017-02-20, bar
>  host2, 2017-02-19, blu
>
> After syncing, all lines which were transferred should be deleted on the
> satellite databases.
>
> The central table should look like this (it has the same schema)
>
>  host1, 2017-02-21, abc
>  host1, 2017-02-20, def
>  host1, 2017-02-19, ghi
>  host2, 2017-02-21, foo
>  host2, 2017-02-20, bar
>  host2, 2017-02-19, blu

Is there a Primary Key on the satellite tables or some way of
determining unique rows?

Is there any existing overlap between the data in the central database
and the satellite databases?


>
>
> I don't want to code this myself, since there a tons of possible race
> conditions:

How much data are you talking about moving from each database?

How active are the satellite databases?

>
>  - inserts can happen during syncing.

Can UPDATEs happen?

>  - Network can break during syncing.
>  - inserts into the central table can break (e.g. disk full): No loss at
> the satellite database must happen.
>  - ...
>
> How to solve this with PostgreSQL?
>
> Regards,
>   Thomas Güttler
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--

William Ivanski - Microsoft MVP

Depending on how much data you want to move, and if the tables have the same structure, you might also want to consider using
pg_dump -a

OR
multiple instances of
on satellite
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) } TO { 'filename' | PROGRAM 'command' | STDOUT } [ [ WITH ] ( option [, ...] ) ]
on central
COPY table_name [ ( column_name [, ...] ) ] FROM { 'filename' | PROGRAM 'command' | STDIN } [ [ WITH ] ( option [, ...] ) ] A BIG consideration is:
Does the Central DB have the same table structures as all satellite DB's?
Does the Central DB already have records in the tables.
Do all Satellite tables have unique records for each other?

As Adrian stated, it would be very helpful if you provided us with all O/S and PostgreSQL vesions involved.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] Move rows from one database to other

From
Moreno Andreo
Date:
Il 21/02/2017 15:38, Melvin Davidson ha scritto:
Depending on how much data you want to move, and if the tables have the same structure, you might also want to consider using
pg_dump -a

OR
multiple instances of
on satellite
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }   TO { 'filename' | PROGRAM 'command' | STDOUT }   [ [ WITH ] ( option [, ...] ) ]
on central
COPY table_name [ ( column_name [, ...] ) ]   FROM { 'filename' | PROGRAM 'command' | STDIN }   [ [ WITH ] ( option [, ...] ) ]

A BIG consideration is: 
Does the Central DB have the same table structures as all satellite DB's?
Does the Central DB already have records in the tables.
Do all Satellite tables have unique records for each other?

As Adrian stated, it would be very helpful if you provided us with all O/S and PostgreSQL vesions involved.

AFAIK pg_dump is version-independent (I use it to upgrade some small clusters), given that you are using the one from the newest version and you are dumping from old and restoring from new.
Another thing that I saw is that pg_dump is less I/O stressful (I can almost safely dump databases from my production server in traffic hours, even if I never tried doing it in rush hours :-) ) and therefore slower than COPY, but on the other side, using COPY, the risk is to have all I/O bandwidth eaten by data transfer, especially if the amount of data to be transferred is considerable. I tried some times to COPY across different clusters with different versions, but I always had issues.

My 2 cents.

Cheers
Moreno.


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:
Am 21.02.2017 um 15:12 schrieb Adrian Klaver:
> On 02/21/2017 12:53 AM, Thomas Güttler wrote:
>> I want to move table rows from one database to an central database.
>
> You actually talking about moving from ~100 databases to the central database, correct?
>
>>
>> Both run PostgreSQL.
>
> Are all the Postgres instances the same version and what is the version or versions?

Yes, all run postgres, but the version can be different (but not much).
Satellite-DB 9.5 and 9.6 and central 9.6.


>
>>
>> My use case looks like this:
>>
>> There are N satellite databases in different data centers. N is about
>> 100 at the moment.
>>
>> There is one central database.
>>
>> I need a way to reliably move rows from the satellite databases to the
>> central one
>
> Two ways I can think of:
>
> https://www.postgresql.org/docs/9.6/static/dblink.html
>
> https://www.postgresql.org/docs/9.6/static/postgres-fdw.html


Since  postgres_fdw is newer, I would focus on it, right?

If I understood it correctly, then there would be N (about 100) tables in the central DB.

What happens if there is a network outage (for about 3 minutes) during accessing a remote table?


> Is there a Primary Key on the satellite tables or some way of determining unique rows?

The concrete schema is not specified up to now. But I guess UUID as primary key would be the best fit.
Or am I wrong?


> Is there any existing overlap between the data in the central database and the satellite databases?

No, there won't be overlaps. Every satellite system creates its own rows.

> How much data are you talking about moving from each database?
 > How active are the satellite databases?

100k rows per day per satellite. Each row has only few bytes.

Moving of rows should happen every ten minutes.

>>  - inserts can happen during syncing.
>
> Can UPDATEs happen?

No, rows get created and moved and later deleted.

Thank you Adrian for your questions. It helped me to narrow down my problem.

Regards,
   Thomas


--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Adrian Klaver
Date:
On 02/21/2017 12:53 AM, Thomas Güttler wrote:
> I want to move table rows from one database to an central database.
>
> Both run PostgreSQL.
>

>
> How to solve this with PostgreSQL?

Should have added earlier. This is a specific case of the more general
case of ETL(Extract/Transform/Load). There are a host of tools out there
that do this. For instance I use Python and the following is available:

http://petl.readthedocs.io/en/latest/index.html

What tool you choose comes down to what you are comfortable with:

1) Writing your own programs/scripts and in what language?

2) Using a GUI that sets things up for you.

>
> Regards,
>   Thomas Güttler
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:

Am 21.02.2017 um 15:27 schrieb William Ivanski:
> You can try OmniDB: http://www.omnidb.com.br/en_index.aspx
>
> OmniDB has a Convert feature, where you can set a data transfer, even if the target table exists.

I am unsure if omnidb is the right tool here.

I don't need a GUI. The movement of the rows should happen in background.

Regards,
   Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:
>
> Depending on how much data you want to move, and if the tables have the same structure, you might also want to
consider
> using
> pg_dump -a
>
> OR
> multiple instances of
>
> on satellite
> COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
>     TO { 'filename' | PROGRAM 'command' | STDOUT }
>     [ [ WITH ] ( option [, ...] ) ]
>
> on central
> COPY table_name [ ( column_name [, ...] ) ]
>     FROM { 'filename' | PROGRAM 'command' | STDIN }
>     [ [ WITH ] ( option [, ...] ) ]
>
> A BIG consideration is:
> Does the Central DB have the same table structures as all satellite DB's?

yes, same schema

> Does the Central DB already have records in the tables.

yes, movement of rows should happen every ten minutes.

> Do all Satellite tables have unique records for each other?

Yes, UUID primary key.

> As Adrian stated, it would be very helpful if you provided us with all O/S and PostgreSQL vesions involved.

Versions are 9.5 and 9.6

I have other concerns: atomar transaction. Movement should happen completely or not all.

I don't think you can do this reliable (atomic transaction) with "copy table_name".

Regards,
   Thomas



--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Melvin Davidson
Date:


On Tue, Feb 21, 2017 at 11:10 AM, Thomas Güttler <guettliml@thomas-guettler.de> wrote:

Depending on how much data you want to move, and if the tables have the same structure, you might also want to consider
using
pg_dump -a

OR
multiple instances of

on satellite
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

on central
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]

A BIG consideration is:
Does the Central DB have the same table structures as all satellite DB's?

yes, same schema

Does the Central DB already have records in the tables.

yes, movement of rows should happen every ten minutes.

Do all Satellite tables have unique records for each other?

Yes, UUID primary key.

As Adrian stated, it would be very helpful if you provided us with all O/S and PostgreSQL vesions involved.

Versions are 9.5 and 9.6

I have other concerns: atomar transaction. Movement should happen completely or not all.

I don't think you can do this reliable (atomic transaction) with "copy table_name".

Regards,
  Thomas



--
Thomas Guettler http://www.thomas-guettler.de/

>I have other concerns: atomar transaction. Movement should happen completely or not all.
>I don't think you can do this reliable (atomic transaction) with "copy table_name".

You can if you wrap it in a transaction:
EG:
BEGIN;
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]
COMMIT;

BEGIN;
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
COMMIT;

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] Move rows from one database to other

From
Adrian Klaver
Date:
On 02/21/2017 07:53 AM, Thomas Güttler wrote:
>
> Am 21.02.2017 um 15:12 schrieb Adrian Klaver:
>> On 02/21/2017 12:53 AM, Thomas Güttler wrote:
>>> I want to move table rows from one database to an central database.
>>
>> You actually talking about moving from ~100 databases to the central
>> database, correct?
>>
>>>
>>> Both run PostgreSQL.
>>
>> Are all the Postgres instances the same version and what is the
>> version or versions?
>
> Yes, all run postgres, but the version can be different (but not much).
> Satellite-DB 9.5 and 9.6 and central 9.6.
>
>
>>
>>>
>>> My use case looks like this:
>>>
>>> There are N satellite databases in different data centers. N is about
>>> 100 at the moment.
>>>
>>> There is one central database.
>>>
>>> I need a way to reliably move rows from the satellite databases to the
>>> central one
>>
>> Two ways I can think of:
>>
>> https://www.postgresql.org/docs/9.6/static/dblink.html
>>
>> https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
>
>
> Since  postgres_fdw is newer, I would focus on it, right?
>
> If I understood it correctly, then there would be N (about 100) tables
> in the central DB.
>
> What happens if there is a network outage (for about 3 minutes) during
> accessing a remote table?

I misunderstood your original intent, I thought this was a one time
process to move data to the central database. Given that it is to be a
continuous process a FDW may not be the answer, one of the reasons being
the above question. You will be denied the data in the remote table
during the outage. Also not sure what you will be doing with the data in
the central database and how often? In any case it will involve reaching
out to all the satellites each time you want to query the latest data.
Looks more like some kind of push mechanism from the satellites to the
central database is in order. Then once the data is on the central
database it is 'captured'.  A question that comes to mind is if there is
a problem should the data transfer from one or more satellites lag that
of the others?


>
>> Is there a Primary Key on the satellite tables or some way of
>> determining unique rows?
>
> The concrete schema is not specified up to now. But I guess UUID as
> primary key would be the best fit.
> Or am I wrong?
>
>
>> Is there any existing overlap between the data in the central database
>> and the satellite databases?
>
> No, there won't be overlaps. Every satellite system creates its own rows.
>
>> How much data are you talking about moving from each database?
>> How active are the satellite databases?
>
> 100k rows per day per satellite. Each row has only few bytes.
>
> Moving of rows should happen every ten minutes.
>
>>>  - inserts can happen during syncing.
>>
>> Can UPDATEs happen?
>
> No, rows get created and moved and later deleted.
>
> Thank you Adrian for your questions. It helped me to narrow down my
> problem.
>
> Regards,
>   Thomas
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Move rows from one database to other

From
Adrian Klaver
Date:
On 02/21/2017 08:06 AM, Thomas Güttler wrote:
>
>
> Am 21.02.2017 um 15:27 schrieb William Ivanski:
>> You can try OmniDB: http://www.omnidb.com.br/en_index.aspx
>>
>> OmniDB has a Convert feature, where you can set a data transfer, even
>> if the target table exists.
>
> I am unsure if omnidb is the right tool here.
>
> I don't need a GUI. The movement of the rows should happen in background.

Given the versions of Postgres you are using there is logical replication:

https://www.postgresql.org/docs/9.5/static/protocol-replication.html

https://2ndquadrant.com/en/resources/pglogical/
"AGGREGATE - Accumulate changes from sharded database servers into a
Data Warehouse"

I have not used this capability yet, so others would have to comment on
its applicability.

>
> Regards,
>   Thomas Güttler
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:
> I misunderstood your original intent, I thought this was a one time process to move data to the central database.
Given
> that it is to be a continuous process a FDW may not be the answer, one of the reasons being the above question. You
will
> be denied the data in the remote table during the outage. Also not sure what you will be doing with the data in the
> central database and how often? In any case it will involve reaching out to all the satellites each time you want to
> query the latest data. Looks more like some kind of push mechanism from the satellites to the central database is in
> order. Then once the data is on the central database it is 'captured'.  A question that comes to mind is if there is
a
> problem should the data transfer from one or more satellites lag that of the others?

I am sorry. Yes, I forgot to mention that the process should be a continous push.

If not all data is in the central, that's ok.

The central should only see the data which is already in the central place.

Up to now it is not important who triggers this process. It could be a push (from satellite to central) or a pull (from

central to satellite)

If there is a lag ... then there is a lag. That can happen (network outage), this is not very important.
More important is that no single row gets lost or duplicated.

Regards,
   Thomas

--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:
>>I have other concerns: atomar transaction. Movement should happen completely or not all.
>>I don't think you can do this reliable (atomic transaction) with "copy table_name".
>
> You can if you wrap it in a transaction:

I want to **move** the data. The data should get deleted on the satellite after transfer.

I don't know how to delete the data which was copied, since inserts can happen during the copy statement.

Regards,
   Thomas Güttler



--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Adrian Klaver
Date:
On 02/22/2017 04:51 AM, Thomas Güttler wrote:
>>> I have other concerns: atomar transaction. Movement should happen
>>> completely or not all.
>>> I don't think you can do this reliable (atomic transaction) with
>>> "copy table_name".
>>
>> You can if you wrap it in a transaction:
>
> I want to **move** the data. The data should get deleted on the
> satellite after transfer.

Well the replication suggestion is out.

>
> I don't know how to delete the data which was copied, since inserts can
> happen during the copy statement.

However you end up doing this I think you will probably need some sort
of flag on the rows on the satellites. It could be a timestamp field of
when the rows where inserted on the satellite or a boolean
field(copied). First instinct is to use an insert timestamp and a
tracking table that stores the last timestamp used to move rows, where
the timestamp is only written on a successful transfer. To improve the
chances of successful transfer more smaller transfer batches rather then
larger transfers.

>
> Regards,
>   Thomas Güttler
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Move rows from one database to other

From
Francisco Olarte
Date:
Thomas:

On Wed, Feb 22, 2017 at 1:51 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
> I want to **move** the data. The data should get deleted on the satellite
> after transfer.
> I don't know how to delete the data which was copied, since inserts can
> happen during the copy statement.

Depending on the structure / control / kind of ops you have on the
satellite you can try a two-table trick, similar to a common one used
to process files in spool directories.

1st, you move everything to a holding table transactionally in the
satellite ( insert into holding table delete from main table returning
* )

Then you have the holding table to transfer and clear, not touched by
anyone ( as transfer process is the only one touching it )

You can do some tricks. You could inherit holding from main if you
want to be able to see data while it is been transferred. Or play
renaming tricks. It all depends in the concrete app, but you can try
to fit the pattern in it, I've done it several times and its a useful
one.

Francisco Olarte.


Re: [GENERAL] Move rows from one database to other

From
Leknín Řepánek
Date:
Maybe you can do something like

WITH cte AS (
DELETE FROM t1 WHERE cond
RETURNINIG *
)
INSERT into t2
SELECT * FROM cte;

To move rows between tables with combination with fdw_postgres and
foreign table.

Je;

On Thu, Feb 23, 2017 at 10:33:27AM +0100, Francisco Olarte wrote:
> Thomas:
>
> On Wed, Feb 22, 2017 at 1:51 PM, Thomas Güttler
> <guettliml@thomas-guettler.de> wrote:
> > I want to **move** the data. The data should get deleted on the satellite
> > after transfer.
> > I don't know how to delete the data which was copied, since inserts can
> > happen during the copy statement.
>
> Depending on the structure / control / kind of ops you have on the
> satellite you can try a two-table trick, similar to a common one used
> to process files in spool directories.
>
> 1st, you move everything to a holding table transactionally in the
> satellite ( insert into holding table delete from main table returning
> * )
>
> Then you have the holding table to transfer and clear, not touched by
> anyone ( as transfer process is the only one touching it )
>
> You can do some tricks. You could inherit holding from main if you
> want to be able to see data while it is been transferred. Or play
> renaming tricks. It all depends in the concrete app, but you can try
> to fit the pattern in it, I've done it several times and its a useful
> one.
>
> Francisco Olarte.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:

Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
> On 02/22/2017 04:51 AM, Thomas Güttler wrote:
>>>> I have other concerns: atomar transaction. Movement should happen
>>>> completely or not all.
>>>> I don't think you can do this reliable (atomic transaction) with
>>>> "copy table_name".
>>>
>>> You can if you wrap it in a transaction:
>>
>> I want to **move** the data. The data should get deleted on the
>> satellite after transfer.
>
> Well the replication suggestion is out.
>
>>
>> I don't know how to delete the data which was copied, since inserts can
>> happen during the copy statement.
>
> However you end up doing this I think you will probably need some sort of flag on the rows on the satellites. It
could
> be a timestamp field of when the rows where inserted on the satellite or a boolean field(copied). First instinct is
to
> use an insert timestamp and a tracking table that stores the last timestamp used to move rows, where the timestamp is
> only written on a successful transfer. To improve the chances of successful transfer more smaller transfer batches
> rather then larger transfers.


I really need a solid solution.

You said "... improve the chances of successful transfer ...". This makes me nervous.

Delays are no problems, but data loss or duplication is.

Regards,
   Thomas



--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:

Am 23.02.2017 um 10:33 schrieb Francisco Olarte:
> Thomas:
>
> On Wed, Feb 22, 2017 at 1:51 PM, Thomas Güttler
> <guettliml@thomas-guettler.de> wrote:
>> I want to **move** the data. The data should get deleted on the satellite
>> after transfer.
>> I don't know how to delete the data which was copied, since inserts can
>> happen during the copy statement.



> Depending on the structure / control / kind of ops you have on the
> satellite you can try a two-table trick, similar to a common one used
> to process files in spool directories.
 > [cut]

This sound good. Is there a name for this trick, to find more details?

Regards,
   Thomas


--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Thomas Güttler
Date:

Am 23.02.2017 um 13:44 schrieb Leknín Řepánek:
> Maybe you can do something like
>
> WITH cte AS (
> DELETE FROM t1 WHERE cond
> RETURNINIG *
> )
> INSERT into t2
> SELECT * FROM cte;
>
> To move rows between tables with combination with fdw_postgres and
> foreign table.


... this way you don't need the second table ... nice.


--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] Move rows from one database to other

From
Francisco Olarte
Date:
Thomas:

On Thu, Feb 23, 2017 at 4:16 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
> Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
>> only written on a successful transfer. To improve the chances of
>> successful transfer more smaller transfer batches
>> rather then larger transfers.

> I really need a solid solution.
> You said "... improve the chances of successful transfer ...". This makes me
> nervous.

I think what Adrian say is you improve the individual transfer time,
if it fails you retry.

> Delays are no problems, but data loss or duplication is.

Remember you can never guarantee 'exactly once' without very complex
solutions, I think you can do "at least once" or "at most once". That
means lose or duplicate.

That being said, IF you have some kind of global, unchanging ( at the
central site ) unique key, you could try the following, using a
holding table in each satellite and assuming you have 'on conflict do
nothing'.

1.- Move rows from main to holding table in the satelite, in a single
transaction. This is to let you work with an unmovable set ( as your
process is the only one touching the holding tables ). If there is
some data in holding it is no problem, they are from a previous
crashed transfer.

2.- Insert every thing from the holding table in main, using on
conflict do nothing.

3.- When everything is commited in main, truncate the satellite holding table.

If satellite crashes in 1 it will roll back, you have not touched main.

If you crash in 2 you will find 1 partially full in the next round,
and main will be rolled back ( it's important to not commit until
everything is done in 2, i.e., if you have read problems in the
satellite do no go to 3, just crash and rollback everything ). You can
either do a loop with the current set or append more data, your
choice, does not matter, as you have to reinsert. The on conflict do
nothing in 2 will take care of potential duplicates.

If you crash in 3 you will transfer the lot again, but the do-nothing
in 2 will eliminate it and 3 will eventually purge it.

You can optimize on that, but basically you just repeat this until
everything goes fine. I do these ( just with two DBs, not 100 ) and it
works.

It does a lot of duplicate work, but only on problems, it normally runs smooth.

If you do not have "on conflict do nothing" ( I do not remember the
versions ) you can use an extra step. Instead of inserting in main in
2 do 2.a - Copy holding to main ( truncating before hand if copy
present ) and 2.b insert news from the copy, either by using and
anti-join with main or by deleting ( in the same transaction ) the
dupes before inserting.

Francisco Olarte.


Re: [GENERAL] Move rows from one database to other

From
Francisco Olarte
Date:
Thomas:

On Thu, Feb 23, 2017 at 4:26 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
> Am 23.02.2017 um 13:44 schrieb Leknín Řepánek:
>> Maybe you can do something like
>> WITH cte AS (
>> DELETE FROM t1 WHERE cond
>> RETURNINIG *
>> )
>> INSERT into t2
>> SELECT * FROM cte;
>>
>> To move rows between tables with combination with fdw_postgres and
>> foreign table.

> ... this way you don't need the second table ... nice.

If you do this WITH a FDW be careful with the semantics, as I think
you have several commits.

If the local ( t1 ) is commited before the remote, you can lose rows.

If the remote ( t2 ) is commited before the local you'll have dupes,
so you need some way to purge them.

These things can be solved with the aid of transaction manager, and
prepared transactions, but I'm not sure of the status of it in your
versions, and those things are difficult.

Francisco Olarte.


Re: [GENERAL] Move rows from one database to other

From
Francisco Olarte
Date:
Thomas:

On Thu, Feb 23, 2017 at 4:25 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
> This sound good. Is there a name for this trick, to find more details?

Not that I know of. It's really old, basic stuff, with many variations
possible. I've being doing variation of it since the half-inch tape
and punched cards times, and use it a lot for file processing ( as
renaming in Linux is atomic in many filesystems )

Francisco Olarte.


Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database toother

From
Thomas Güttler
Date:
Thank you for explaining the steps of your algorithm.

Just one question: How to do the actual transfer of data?

I see two solutions:

1, Read the data into a script (via psycopg2 (we love python))
and dump it into a second connection.

2, connect postgres to postgres and transfer the data without a database
adapter like psycopg2.

Regards,
   Thomas



Am 23.02.2017 um 17:40 schrieb Francisco Olarte:
> Thomas:
>
> On Thu, Feb 23, 2017 at 4:16 PM, Thomas Güttler
> <guettliml@thomas-guettler.de> wrote:
>> Am 22.02.2017 um 16:00 schrieb Adrian Klaver:
>>> only written on a successful transfer. To improve the chances of
>>> successful transfer more smaller transfer batches
>>> rather then larger transfers.
>
>> I really need a solid solution.
>> You said "... improve the chances of successful transfer ...". This makes me
>> nervous.
>
> I think what Adrian say is you improve the individual transfer time,
> if it fails you retry.
>
>> Delays are no problems, but data loss or duplication is.
>
> Remember you can never guarantee 'exactly once' without very complex
> solutions, I think you can do "at least once" or "at most once". That
> means lose or duplicate.
>
> That being said, IF you have some kind of global, unchanging ( at the
> central site ) unique key, you could try the following, using a
> holding table in each satellite and assuming you have 'on conflict do
> nothing'.
>
> 1.- Move rows from main to holding table in the satelite, in a single
> transaction. This is to let you work with an unmovable set ( as your
> process is the only one touching the holding tables ). If there is
> some data in holding it is no problem, they are from a previous
> crashed transfer.
>
> 2.- Insert every thing from the holding table in main, using on
> conflict do nothing.
>
> 3.- When everything is commited in main, truncate the satellite holding table.
>
> If satellite crashes in 1 it will roll back, you have not touched main.
>
> If you crash in 2 you will find 1 partially full in the next round,
> and main will be rolled back ( it's important to not commit until
> everything is done in 2, i.e., if you have read problems in the
> satellite do no go to 3, just crash and rollback everything ). You can
> either do a loop with the current set or append more data, your
> choice, does not matter, as you have to reinsert. The on conflict do
> nothing in 2 will take care of potential duplicates.
>
> If you crash in 3 you will transfer the lot again, but the do-nothing
> in 2 will eliminate it and 3 will eventually purge it.
>
> You can optimize on that, but basically you just repeat this until
> everything goes fine. I do these ( just with two DBs, not 100 ) and it
> works.
>
> It does a lot of duplicate work, but only on problems, it normally runs smooth.
>
> If you do not have "on conflict do nothing" ( I do not remember the
> versions ) you can use an extra step. Instead of inserting in main in
> 2 do 2.a - Copy holding to main ( truncating before hand if copy
> present ) and 2.b insert news from the copy, either by using and
> anti-join with main or by deleting ( in the same transaction ) the
> dupes before inserting.
>
> Francisco Olarte.
>
>

--
Thomas Guettler http://www.thomas-guettler.de/


Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database to other

From
Francisco Olarte
Date:
Thomas:

On Mon, Feb 27, 2017 at 12:47 PM, Thomas Güttler
<guettliml@thomas-guettler.de> wrote:
> Thank you for explaining the steps of your algorithm.

My pleasure. But check it anyway, I may have forgotten something ( I
normally implement this things after writing a big flow diagram on a
piece of paper and checking it for a while, or something similar, I
find easier to spot the missing spots graphically )

> Just one question: How to do the actual transfer of data?

It does not matter too much, but ..

> I see two solutions:
> 1, Read the data into a script (via psycopg2 (we love python))
> and dump it into a second connection.
> 2, connect postgres to postgres and transfer the data without a database
> adapter like psycopg2.

For 2 you need and adapter, the foreign data wrapper, anyway. I
personally would go for 1, especially if you can collocate the program
near main db ( same machine or network, so you can have enough speed )
. Normally problems are much easier to diagnose this way, as you only
deal with psycopg2, not with psyco AND fdw, and you will need a
program to do the transfers anyway. Also, IIRC, you had a lot of
machines, so you will need a main program to do all the retrying and
accounting. And you can optimize some things ( like copying from
several satellites and then inserting them at once ).

YMMV anyway, just use whichever is easier for you, but avoid false lazyness ;-)

Francisco Olarte.