Thread: PostgreSQL historical database

PostgreSQL historical database

From
Erik Serrano
Date:
Dear all,

Greetings. I would like to ask whether there is any product, approach (architecture), or system that would allow me to create a historical database from a main transactional database in PostgreSQL.

Thank you in advance for any contributions that help me approach this new project.

Thank you very much, guys,
Regards



Erik R. Serrano Saavedra
      Database Administrator

Re: PostgreSQL historical database

From
Samed YILDIRIM
Date:
Hello Erik,

It is not very clear to me what you are looking for, but pg_bitemporal may be the answer for you. I recommend checking out the repository below. If this is not what you want, could you elaborate a little more?


Best regards.
Samed YILDIRIM

On Tue, 5 Nov 2024, 17:31 Erik Serrano, <eserranos@gmail.com> wrote:
Dear all,

Greetings. I would like to ask whether there is any product, approach (architecture), or system that would allow me to create a historical database from a main transactional database in PostgreSQL.

Thank you in advance for any contributions that help me approach this new project.

Thank you very much, guys,
Regards



Erik R. Serrano Saavedra
      Database Administrator

Re: PostgreSQL historical database

From
Ron Johnson
Date:
On Tue, Nov 5, 2024 at 10:31 AM Erik Serrano <eserranos@gmail.com> wrote:
Dear all,

Greetings. I would like to ask whether there is any product, approach (architecture), or system that would allow me to create a historical database from a main transactional database in PostgreSQL.

Are you asking about how to "detach" a table from one database, and then attach it to another database?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

PostgreSQL historical database

From
Thiemo Kellner
Date:
Hi

I am not sure what your purpose is. To the best of my knowledge, transactions have no validity duration and are immutable, such that historicisation would be useless.

Maybe it would help to know your goals in more detail.

Kind regards

Thiemo

Re: PostgreSQL historical database

From
Erik Serrano
Date:

Dear Sirs,

I'll tell you a little about what I need. During the day, records are written to the main database; at the end of the day they are consolidated (accounting closings) and recorded there as well. To keep the main database from growing without limit (it will retain only between 3 months and 1 year of data), the consolidated data must be transferred to another database, where it persists over time and can be consulted by other areas. (Today this is done manually at the end of every day of the year.)
The project therefore seeks to carry out this extraction of the consolidated data to another database, but automatically.

I was thinking of doing this with triggers, or with jobs that carry out these actions. I also thought of replicating only the consolidated tables to the new historical database server, but I have not yet settled on a method.

That's why I need to know whether there is a tool that allows me to create this database.

I hope this clarifies the scope of the new historical database a little.

Thank you very much in advance
Regards


Erik R. Serrano Saavedra
      Database Administrator



On Tue, Nov 5, 2024 at 12:37, Samed YILDIRIM (<samed@reddoc.net>) wrote:
Hello Erik,

It is not very clear to me what you are looking for, but pg_bitemporal may be the answer for you. I recommend checking out the repository below. If this is not what you want, could you elaborate a little more?


Best regards.
Samed YILDIRIM

On Tue, 5 Nov 2024, 17:31 Erik Serrano, <eserranos@gmail.com> wrote:
Dear all,

Greetings. I would like to ask whether there is any product, approach (architecture), or system that would allow me to create a historical database from a main transactional database in PostgreSQL.

Thank you in advance for any contributions that help me approach this new project.

Thank you very much, guys,
Regards



Erik R. Serrano Saavedra
      Database Administrator

Re: PostgreSQL historical database

From
Ron Johnson
Date:
On Tue, Nov 5, 2024 at 12:30 PM Erik Serrano <eserranos@gmail.com> wrote:

Dear Sirs,

I'll tell you a little about what I need. During the day, records are written to the main database; at the end of the day they are consolidated (accounting closings) and recorded there as well. To keep the main database from growing without limit (it will retain only between 3 months and 1 year of data), the consolidated data must be transferred to another database, where it persists over time and can be consulted by other areas. (Today this is done manually at the end of every day of the year.)
The project therefore seeks to carry out this extraction of the consolidated data to another database, but automatically.

I was thinking of doing this with triggers, or with jobs that carry out these actions. I also thought of replicating only the consolidated tables to the new historical database server, but I have not yet settled on a method.

That's why I need to know whether there is a tool that allows me to create this database.

psql in a bash script, run in a cron job.  I wrote something quite similar (sketched below) that runs every month to:
1. copy old records into CSV files,
2. move them to S3 buckets,
3. delete the rows, 
4. VACUUM and ANALYZE the tables.

If there are FK relationships, you'll have to determine the proper order in which to delete them, but that's relatively easy and just has to be done once.

(The only difference is that your Step 2 would be "copy them into the new database".)
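
A rough sketch of that kind of script, assuming a hypothetical table "sales" with a created_at column, a main database "maindb", and an archive database "histdb" (all names are made up; adjust the cutoff and connection settings to taste):

    #!/usr/bin/env bash
    # Sketch: archive rows older than one year, then trim the main database.
    set -euo pipefail

    CUTOFF="now() - interval '1 year'"

    # 1. Copy the old rows out of the main database into a CSV file.
    psql -d maindb -c "\copy (SELECT * FROM sales WHERE created_at < $CUTOFF) TO '/tmp/sales_old.csv' CSV"

    # 2. Load them into the historical database (the table must already exist there).
    psql -d histdb -c "\copy sales FROM '/tmp/sales_old.csv' CSV"

    # 3. Delete the archived rows from the main database.
    psql -d maindb -c "DELETE FROM sales WHERE created_at < $CUTOFF"

    # 4. Reclaim space and refresh planner statistics.
    psql -d maindb -c "VACUUM ANALYZE sales"

Scheduled from cron at end of day, e.g. "55 23 * * * /usr/local/bin/archive_sales.sh".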

Depending on the number of rows per year, table partitioning might also simplify things.  Or it could add complexity and slow things down; you'll have to test that yourself.


I hope this clarifies the scope of the new historical database a little.

Thank you very much in advance
Regards


Erik R. Serrano Saavedra
      Database Administrator



On Tue, Nov 5, 2024 at 12:37, Samed YILDIRIM (<samed@reddoc.net>) wrote:
Hello Erik,

It is not very clear to me what you are looking for, but pg_bitemporal may be the answer for you. I recommend checking out the repository below. If this is not what you want, could you elaborate a little more?


Best regards.
Samed YILDIRIM

On Tue, 5 Nov 2024, 17:31 Erik Serrano, <eserranos@gmail.com> wrote:
Dear all,

Greetings. I would like to ask whether there is any product, approach (architecture), or system that would allow me to create a historical database from a main transactional database in PostgreSQL.

Thank you in advance for any contributions that help me approach this new project.

Thank you very much, guys,
Regards



Erik R. Serrano Saavedra
      Database Administrator


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

Re: PostgreSQL historical database

From
Keith Fiske
Date:


On Tue, Nov 5, 2024 at 12:30 PM Erik Serrano <eserranos@gmail.com> wrote:

Dear Sirs,

I'll tell you a little about what I need. During the day, records are written to the main database; at the end of the day they are consolidated (accounting closings) and recorded there as well. To keep the main database from growing without limit (it will retain only between 3 months and 1 year of data), the consolidated data must be transferred to another database, where it persists over time and can be consulted by other areas. (Today this is done manually at the end of every day of the year.)
The project therefore seeks to carry out this extraction of the consolidated data to another database, but automatically.

I was thinking of doing this with triggers, or with jobs that carry out these actions. I also thought of replicating only the consolidated tables to the new historical database server, but I have not yet settled on a method.

That's why I need to know whether there is a tool that allows me to create this database.

I hope this clarifies the scope of the new historical database a little.

Thank you very much in advance
Regards


Erik R. Serrano Saavedra
      Database Administrator



I would first recommend looking into partitioning for managing data retention like this. As Ron says, you'll want to look into the performance implications, but it is the most efficient way to remove old data from PostgreSQL and is typically worth the overhead costs. Otherwise you're dealing with potentially expensive delete operations and managing bloat, versus just detaching/dropping a table.
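
As a rough illustration of why, with declarative range partitioning (all names hypothetical), retiring a month of data is a metadata operation rather than a row-by-row DELETE:

    -- Hypothetical monthly range-partitioned table.
    CREATE TABLE sales (
        id        bigint GENERATED ALWAYS AS IDENTITY,
        sale_date date NOT NULL,
        amount    numeric
    ) PARTITION BY RANGE (sale_date);

    CREATE TABLE sales_2024_10 PARTITION OF sales
        FOR VALUES FROM ('2024-10-01') TO ('2024-11-01');

    -- Retiring October is cheap: detach the partition, leaving a plain
    -- table that can be dumped, archived, and then dropped.
    ALTER TABLE sales DETACH PARTITION sales_2024_10;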

From there, you can decide how to process the table to move it to another database. The simplest method would be to pg_dump it from the old database, drop it, then restore it to the "archive" database. 
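
That step might look like this for a detached partition (names hypothetical; this sketch restores before dropping, so the restore can be verified first):

    # Dump just the retired table in custom format ...
    pg_dump -d maindb -t sales_2024_10 -Fc -f sales_2024_10.dump

    # ... restore it into the archive database ...
    pg_restore -d archivedb sales_2024_10.dump

    # ... and drop it from the main database once the restore is verified.
    psql -d maindb -c 'DROP TABLE sales_2024_10'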

For time-based partitioning, pg_partman is a commonly used tool. It also includes features to help manage retention, as well as reliably dumping out old tables to be archived or used elsewhere.

https://github.com/pgpartman/pg_partman
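
As a hedged sketch of that workflow (this assumes pg_partman 4.x's calling convention; the create_parent() signature changed in 5.x, so check the documentation for your version):

    CREATE SCHEMA partman;
    CREATE EXTENSION pg_partman SCHEMA partman;

    -- Assumes public.sales already exists declared as
    -- PARTITION BY RANGE (sale_date); pg_partman pre-creates and
    -- maintains the monthly child partitions from here on.
    SELECT partman.create_parent('public.sales', 'sale_date', 'native', 'monthly');

    -- Keep one year of partitions; detach rather than drop the older
    -- ones so they can still be dumped and moved to the archive.
    UPDATE partman.part_config
       SET retention = '1 year',
           retention_keep_table = true
     WHERE parent_table = 'public.sales';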

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

RE: PostgreSQL historical database

From
Michel SALAIS
Date:

Hi,

For your purpose, you should study partitioning and postgres_fdw.
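
A minimal sketch of the postgres_fdw side, with made-up server and credential names: the historical database is exposed inside the main one, so archived data can still be queried in place:

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER histdb_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'histdb.example.com', dbname 'histdb');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER histdb_server
        OPTIONS (user 'archiver', password 'secret');

    -- The archived table now looks like a local one; predicates are
    -- pushed down to the historical server where possible.
    CREATE FOREIGN TABLE sales_history (
        id        bigint,
        sale_date date,
        amount    numeric
    ) SERVER histdb_server OPTIONS (schema_name 'public', table_name 'sales');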

Regards

Michel SALAIS

From: Erik Serrano <eserranos@gmail.com>
Sent: Tuesday, November 5, 2024 18:31
To: Samed YILDIRIM <samed@reddoc.net>
Cc: Pgsql-admin <pgsql-admin@lists.postgresql.org>; pgsql-admin <pgsql-admin@postgresql.org>
Subject: Re: PostgreSQL historical database

Dear Sirs,

I'll tell you a little about what I need. During the day, records are written to the main database; at the end of the day they are consolidated (accounting closings) and recorded there as well. To keep the main database from growing without limit (it will retain only between 3 months and 1 year of data), the consolidated data must be transferred to another database, where it persists over time and can be consulted by other areas. (Today this is done manually at the end of every day of the year.)
The project therefore seeks to carry out this extraction of the consolidated data to another database, but automatically.

I was thinking of doing this with triggers, or with jobs that carry out these actions. I also thought of replicating only the consolidated tables to the new historical database server, but I have not yet settled on a method.

That's why I need to know whether there is a tool that allows me to create this database.

I hope this clarifies the scope of the new historical database a little.

Thank you very much in advance
Regards

Erik R. Serrano Saavedra

      Database Administrator

On Tue, Nov 5, 2024 at 12:37, Samed YILDIRIM (<samed@reddoc.net>) wrote:

Hello Erik,

It is not very clear to me what you are looking for, but pg_bitemporal may be the answer for you. I recommend checking out the repository below. If this is not what you want, could you elaborate a little more?

Best regards.
Samed YILDIRIM

On Tue, 5 Nov 2024, 17:31 Erik Serrano, <eserranos@gmail.com> wrote:

Dear all,

Greetings. I would like to ask whether there is any product, approach (architecture), or system that would allow me to create a historical database from a main transactional database in PostgreSQL.

Thank you in advance for any contributions that help me approach this new project.

Thank you very much, guys,
Regards

Erik R. Serrano Saavedra

      Database Administrator

Re: PostgreSQL historical database

From
Ron Johnson
Date:
On Tue, Nov 5, 2024 at 1:15 PM Keith Fiske <keith.fiske@crunchydata.com> wrote:
[snip] 
I would first recommend looking into partitioning for managing data retention like this. As Ron says, you'll want to look into the performance implications, but it is the most efficient way to remove old data from PostgreSQL and is typically worth the overhead costs. Otherwise you're dealing with potentially expensive delete operations and managing bloat, versus just detaching/dropping a table.
 
Am I the only person who has ever gotten good DELETE performance from PostgreSQL?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: PostgreSQL historical database

From
Rui DeSousa
Date:


On Nov 5, 2024, at 1:15 PM, Keith Fiske <keith.fiske@crunchydata.com> wrote:

I would first recommend looking into partitioning for managing data retention like this.

I don’t think you need partitioning.  There are so many ways to slice this problem.

You need some form of logical replication.  I’ve used both of the methods below in the past to solve exactly the issue at hand.  They are not the only solutions.

1. Queue tables via triggers, then asynchronously replicate those transactions to the archive database in real time (see the sketch below).
2. Message queues: modern applications make use of message queues; just attach the archive database to the same message queue.
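
A minimal sketch of option 1, assuming a hypothetical "sales" table and PostgreSQL 11+ trigger syntax: an AFTER INSERT trigger copies each new row into a queue table, and a scheduled job drains that queue into the archive database:

    -- Queue table with the same row layout as the source table.
    CREATE TABLE sales_archive_queue (LIKE sales);

    CREATE FUNCTION enqueue_sale() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO sales_archive_queue SELECT (NEW).*;
        RETURN NULL;  -- the return value is ignored for AFTER triggers
    END;
    $$;

    CREATE TRIGGER sales_archive_trg
        AFTER INSERT ON sales
        FOR EACH ROW EXECUTE FUNCTION enqueue_sale();

    -- A cron/pg_cron job then ships queued rows to the archive database
    -- (via postgres_fdw, dblink, or a script) and deletes them from the queue.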

I don’t see a need for partitioning.  I managed a multi-terabyte content delivery platform where the data was only licensed for 30 to 90 days.  Records were being purged more or less as fast as new records were being created; in essence, the database was completely reloaded with new data every 30-90 days, depending on the data stream.  We did not use any form of partitioned tables and had no performance issues.

Re: PostgreSQL historical database

From
Erik Serrano
Date:

Thank you very much, Michel.  Regards

On Fri, Nov 15, 2024 at 19:52, Rui DeSousa <rui@crazybean.net> wrote:


On Nov 5, 2024, at 1:15 PM, Keith Fiske <keith.fiske@crunchydata.com> wrote:

I would first recommend looking into partitioning for managing data retention like this.

I don’t think you need partitioning.  There are so many ways to slice this problem.

You need some form of logical replication.  I’ve used both of the methods below in the past to solve exactly the issue at hand.  They are not the only solutions.

1. Queue tables via triggers, then asynchronously replicate those transactions to the archive database in real time.
2. Message queues: modern applications make use of message queues; just attach the archive database to the same message queue.

I don’t see a need for partitioning.  I managed a multi-terabyte content delivery platform where the data was only licensed for 30 to 90 days.  Records were being purged more or less as fast as new records were being created; in essence, the database was completely reloaded with new data every 30-90 days, depending on the data stream.  We did not use any form of partitioned tables and had no performance issues.