Thread: Archiving Data to Another DB?
On 04/11/2018 11:15 AM, Don Seiler wrote:
> Let's say I have two DBs: main (9.6.6) and archive (10.2).
>
> I have a table in main where I want to archive data older than 60 days.
> For various reasons, the table is not partitioned, so for now we must use
> DELETE. The destination table in the archive DB is partitioned with the
> new Pg10 partitioning.
>
> My initial plan was to have a stored procedure on the archive DB use
> postgres_fdw to do an INSERT INTO / SELECT to select the data remotely and
> insert into the local archive table. It would then issue a single DELETE
> command to remotely remove the data from the main DB. However I found that
> doing this resulted in the main DB calling thousands (perhaps millions if
> it's one-per-row) of individual DELETE statements based on a ctid column.
> Aside from WAL behavior concerns, it is flooding my postgresql server logs
> since I log any DML.
>
> On top of that, I'm told that a remote DELETE wouldn't be transactional,
> so if I were to compare inserted rows vs deleted rows and found a
> mismatch, I couldn't just roll back the DELETE. I plan to verify this with
> a small test case later but for now I'll assume this to be true.
>
> Right now I'm thinking of falling back to the far-less-elegant method of
> dumping the data to a flat file via COPY, running psql to connect to the
> archive DB remotely and running a COPY to load the data (or maybe
> transferring the flat file to the archive DB to load it there, offloading
> that part of the workload), then deleting the data from the main DB. I
> could capture the rows dumped in a control table and compare the rows
> deleted against that and then roll back the delete if necessary.
>
> Like I said, not elegant, but I don't want to risk losing data that wasn't
> successfully archived to the archive DB. I'm very interested to hear what
> others might be doing for tasks like this.

It might not be elegant, but a COPY / DELETE / LOAD is granular, so you can
restart at any point.

--
Angular momentum makes the world go 'round.
On 04/11/2018 10:24 AM, Ron wrote:
> On 04/11/2018 11:15 AM, Don Seiler wrote:
>> [snip: Don's original message, quoted in full above]
>
> It might not be elegant, but a COPY / DELETE / LOAD is granular, so
> you can restart at any point.

I might be inclined to COPY/LOAD/check/DELETE.
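For illustration, a minimal sketch of that COPY / LOAD / check / DELETE
sequence might look like the following. The table, column, and file names
(orders, created_at, orders_batch.csv, orders_archive) are invented for the
example; the thread does not show the real schema.

-- On main (9.6.6): record which rows are being archived, then dump them.
BEGIN;
CREATE TABLE archive_batch AS                 -- the "control table"
  SELECT id FROM orders
  WHERE created_at < now() - interval '60 days';
COMMIT;
\copy (SELECT o.* FROM orders o JOIN archive_batch b USING (id)) TO 'orders_batch.csv' CSV

-- On archive (10.2): load the file into the partitioned table.
-- psql reports "COPY <n>"; compare <n> with the row count of archive_batch.
\copy orders_archive FROM 'orders_batch.csv' CSV

-- Back on main: delete only the rows recorded in the control table.
BEGIN;
DELETE FROM orders o USING archive_batch b WHERE o.id = b.id;
-- COMMIT only if the DELETE row count matches the loaded count; otherwise ROLLBACK.
COMMIT;

Because each step leaves an artifact behind (the control table, the flat
file, the loaded rows), any step can be re-run or checked before the delete
is finally committed.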
On 04/11/2018 09:15 AM, Don Seiler wrote:
> [snip: Don's original message, quoted in full above]
>
> On top of that, I'm told that a remote DELETE wouldn't be transactional,
> so if I were to compare inserted rows vs deleted rows and found a
> mismatch, I couldn't just roll back the DELETE. I plan to verify this
> with a small test case later but for now I'll assume this to be true.

That would be worth testing:

https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
"F.33.3. Transaction Management
During a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction. The remote transaction is committed or aborted when the local transaction commits or aborts. Savepoints are similarly managed by creating corresponding remote savepoints.
..."
I may be missing something, but why not reverse your original setup?
Assuming transactional behavior works as expected, something like:
1) Set up postgres_fdw in the main database.
2) Create a FOREIGN TABLE pointing to the table in the archive database.
3) INSERT INTO/SELECT from main table to archive table.
4) DELETE FROM main table.
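In rough SQL that might look like the sketch below. The server, mapping,
table, and column names are all invented for the example, and the real
column list would of course have to match the archive table:

-- On main (9.6.6): point postgres_fdw at the archive database.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER archive_srv FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'archive-host', dbname 'archivedb', port '5432');
CREATE USER MAPPING FOR CURRENT_USER SERVER archive_srv
  OPTIONS (user 'archiver', password 'secret');
CREATE FOREIGN TABLE orders_archive (
  id         bigint,
  created_at timestamptz,
  payload    text
) SERVER archive_srv OPTIONS (schema_name 'public', table_name 'orders_archive');

-- Archive and delete in one local transaction, so both can be rolled
-- back together if the row counts don't match.
BEGIN;
INSERT INTO orders_archive
  SELECT id, created_at, payload FROM orders
  WHERE created_at < now() - interval '60 days';
DELETE FROM orders
  WHERE created_at < now() - interval '60 days';
COMMIT;   -- or ROLLBACK on a mismatch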
--
Adrian Klaver
adrian.klaver@aklaver.com
On 04/11/2018 11:13 AM, Don Seiler wrote:
> On Wed, Apr 11, 2018 at 12:58 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>     [snip: the postgres_fdw transaction-management excerpt quoted above]
>
> Interesting, I'll work on a test case later!
>
>     [snip: the "reverse your original setup" steps quoted above]
>
> I had considered this as well, as this would allow me to roll back the
> delete (assuming my intel on postgres_fdw transactions was correct,
> which it may not be after all). I wondered if a remote insert would be
> broken up into individual inserts like the remote delete was, as that
> would be equally unappealing for the same reasons. But obviously worth
> confirming.

A test case here confirms it sends individual INSERTS:

test_(postgres)# insert into fdw_test_table select * from fdw_test;
INSERT 0 3

Where fdw_test_table is the remote table and fdw_test is the local one.

postgres-2018-04-11 11:29:23.812 PDT-0LOG:  statement: insert into fdw_test_table select * from fdw_test;
postgres-2018-04-11 11:29:23.812 PDT-0LOG:  execute pgsql_fdw_prep_2: INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
postgres-2018-04-11 11:29:23.812 PDT-0DETAIL:  parameters: $1 = '1', $2 = 'one', $3 = 't'
postgres-2018-04-11 11:29:23.813 PDT-10140LOG:  execute pgsql_fdw_prep_2: INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
postgres-2018-04-11 11:29:23.813 PDT-10140DETAIL:  parameters: $1 = '2', $2 = 'two', $3 = 'f'
postgres-2018-04-11 11:29:23.813 PDT-10140LOG:  execute pgsql_fdw_prep_2: INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
postgres-2018-04-11 11:29:23.813 PDT-10140DETAIL:  parameters: $1 = '3', $2 = 'three', $3 = 'f'

So much for that idea :(

--
Adrian Klaver
adrian.klaver@aklaver.com
I have now confirmed that a remote INSERT does get broken up into a single INSERT for each row.
--
Don Seiler
www.seiler.us
Yeah, I saw the same with a 132 row insert. Now imagine that with a monthly 50 million row insert or delete. :p Thanks for the confirmation!
--
Don Seiler
www.seiler.us
Does this apply to SELECT calls as well, or only to DML?
I am planning to use postgres_fdw, but if it goes one row at a time there will be a lot of round trips, which would defeat the purpose.
Regards,
Virendra.
From: Don Seiler [mailto:don@seiler.us]
Sent: Wednesday, April 11, 2018 2:53 PM
To: Adrian Klaver
Cc: pgsql-general@postgresql.org
Subject: Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:38 PM, Don Seiler <don@seiler.us> wrote:
Yeah, I saw the same with a 132 row insert. Now imagine that with a monthly 50 million row insert or delete. :p Thanks for the confirmation!
I went back to look at the postgres logs on my dev server. These logs are rotated once they hit 2G in size. One typical log from the middle of my test last night hit that in 13 minutes and had over 5.2 million DELETE calls. There are quite a few logs like this. That would not be fun for the disk space on the log volume, either.
Don.
--
Don Seiler
www.seiler.us
On 04/11/2018 11:59 AM, Kumar, Virendra wrote:
> Does this apply to SELECT calls as well or only for DMLs.

Easy enough to test:

test=# \d projection
            Foreign table "public.projection"
...
Server: fdw_test_server
FDW options: (schema_name 'public', table_name 'projection')

test=# select count(*) from projection ;
 count
-------
 28430
(1 row)

With log_statement = 'all':

select * from projection ;

postgres-2018-04-11 13:04:33.871 PDT-0LOG:  statement: select * from projection ;
postgres-2018-04-11 13:04:33.872 PDT-0LOG:  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
postgres-2018-04-11 13:04:33.872 PDT-0LOG:  execute <unnamed>: DECLARE c1 CURSOR FOR SELECT line_id, p_item_no, c_id, method, year, qty, sub_method, proj_note, item_key, pot_ct, trial, ts_insert, ts_update, user_insert, user_update, link_key, v_number FROM public.projection
postgres-2018-04-11 13:04:33.872 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.873 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.874 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.875 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.876 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.877 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.878 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.879 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.880 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.881 PDT-0LOG:  statement: FETCH 100 FROM c1
postgres-2018-04-11 13:04:33.882 PDT-0LOG:  statement: FETCH 100 FROM c1
...

test=# explain analyse select * from projection ;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Foreign Scan on projection  (cost=100.00..115.34 rows=178 width=435) (actual time=0.844..163.493 rows=28430 loops=1)
 Planning time: 0.077 ms
 Execution time: 164.735 ms
(3 rows)
--
Adrian Klaver
adrian.klaver@aklaver.com
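Related to the round-trip concern: postgres_fdw (9.6 and later) has a
fetch_size option that raises that 100-rows-per-FETCH default, settable per
server or per foreign table. A sketch, reusing the server and table names
from the test above:

-- Raise the number of rows pulled per FETCH to cut round trips on large scans.
ALTER SERVER fdw_test_server OPTIONS (ADD fetch_size '10000');
-- or for just one foreign table:
ALTER FOREIGN TABLE projection OPTIONS (ADD fetch_size '10000');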