Home > mailing lists

Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL - Mailing list pgsql-admin

From	Ron Johnson
Subject	Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL
Date	May 30 18:29:34
Msg-id	CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA@mail.gmail.com Whole thread Raw
In response to	Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL (Motog Plus <mplus7535@gmail.com>)
List	pgsql-admin

Tree view

On Fri, May 30, 2025 at 3:51 AM Motog Plus <mplus7535@gmail.com> wrote:

Hi Team,

We are currently planning a data archival initiative for our production PostgreSQL databases and would appreciate suggestions or insights from the community regarding best practices and proven approaches.

**Scenario:**
- We have a few large tables (several hundred million rows) where we want to archive historical data (e.g., older than 1 year).
- The archived data should be moved to a separate PostgreSQL database (on a same or different server).
- Our goals are: efficient data movement, minimal downtime, and safe deletion from the source after successful archival.

- PostgreSQL version: 15.12
- Both source and target databases are PostgreSQL.

We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to a SharePoint or similar storage system. However, our infrastructure team raised concerns around the computational load of large CSV processing and potential security implications with file transfers.

We’d like to understand:
- What approaches have worked well for you in practice?

This is how I migrated 6TB of data from an Oracle database to Postgresql, and then implemented quarterly archiving of the PG database:

- COPY FROM (SELECT * FROM live_table WHERE date_fld in some_manageable_date_range) TO STDOUT.

- Compress

- scp

- COPY TO archive_table.

- Index

- DELETE FROM live_table WHERE date_fld in some_manageable_date_range (This I only did in the PG archive process

(Naturally, the Oracle migration used Oracle-specific commands.)

- Are there specific tools or strategies you’d recommend for ongoing archival?

I write generic bash loops to which you pass an array that contains the table name, PK, date column and date range.

Given a list of tables, it did the COPY FROM, lz4 and scp. Once that finished successfully, another script dropped archive indices on the current table, COPY TO and CREATE INDEX statements. A third script did the deletes.

This works even when the live database tables are all connected via FK. You just need to carefully order the tables in your script.

- Any performance or consistency issues we should watch out for?

My rules for scripting are "bite-sized pieces" and "check those return codes!".

Your insights or any relevant documentation/pointers would be immensely helpful.

Index support uber alles. When deleting from a table which relies on a foreign key link to a table which _does_ have a date field, don't hesitate to join on that table.

And DELETE of bite-sized chunks is faster than people give it credit for.

Death to <Redacted>, and butter sauce.

Don't boil me, I'm still alive.

<Redacted> lobster!

pgsql-admin by date:

From: Ron Johnson
Date: 30 May, 18:02:23
Subject: Re: pg_dump verbose start and stop times?

From: Achilleas Mantzios
Date: 31 May, 23:43:52
Subject: PostgreSQL 16.6 , query stuck with STAT Ssl, wait_event_type : IPC , wait_event : ParallelFinish

Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL - Mailing list pgsql-admin

Previous

Next