Thread: Schedule pg_repack job with pg_cron
To schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL instance, you need to follow a few steps. However, it’s important to note that pg_cron is only supported on certain versions of Amazon RDS, and pg_repack is also a separate extension that must be installed and enabled.
Here’s a general guide on how you can set this up:
Prerequisites
- Check Compatibility: Ensure that your RDS instance supports pg_cron and pg_repack. Both extensions need to be available and supported by the RDS version you are using.
- Install pg_repack: Ensure the pg_repack extension is available for your RDS PostgreSQL version. Unlike pg_cron, pg_repack does not need to be added to shared_preload_libraries; if it is listed in pg_available_extensions, you can enable it with CREATE EXTENSION in the next step.
- Enable pg_cron: pg_cron must be preloaded. In the AWS Management Console, go to RDS -> Parameter Groups, modify the custom parameter group associated with your instance to add pg_cron to the shared_preload_libraries parameter, and reboot the instance. You can verify both prerequisites with the queries shown below.
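A quick way to confirm both prerequisites from inside the database (a minimal sketch; the versions reported will depend on your RDS PostgreSQL release):

-- Check that both extensions are available on this instance
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('pg_cron', 'pg_repack');

-- Confirm pg_cron is preloaded (requires the parameter group change and a reboot)
SHOW shared_preload_libraries;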
Setting Up the Job
- Create the pg_cron Extension: First, create the pg_cron extension. On RDS it has to be created in the database named by the cron.database_name parameter (postgres by default):
CREATE EXTENSION IF NOT EXISTS pg_cron;
- Create the pg_repack Extension: Similarly, create the pg_repack extension:
CREATE EXTENSION IF NOT EXISTS pg_repack;
- Schedule a pg_repack Job: You can schedule a job using pg_cron to run pg_repack. Here is an example of how to set up a weekly job:
SELECT cron.schedule(
    'weekly_repack',
    '0 3 * * 0', -- This runs every Sunday at 3 AM
    $$ SELECT pg_repack.repack_database(); $$
);
In this example, pg_repack.repack_database() is called to reorganize the database; adjust the schedule expression (0 3 * * 0) as needed to fit your desired schedule. Note that a full repack is normally driven by the external pg_repack command-line client, so confirm on a test instance that this SQL-level call is actually available in your pg_repack version before relying on the schedule.
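Once the job exists, pg_cron's catalog and functions let you inspect or remove it (a short sketch using the job name from the example above; cron.unschedule by name requires a recent pg_cron version):

-- List scheduled jobs
SELECT jobid, jobname, schedule, command FROM cron.job;

-- Remove the job if it is no longer needed
SELECT cron.unschedule('weekly_repack');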
Considerations
- Permissions: Ensure the user running the pg_cron job has the necessary permissions to execute pg_repack.
- Performance Impact: Running pg_repack can be resource-intensive. Schedule the job during off-peak hours to minimize the impact on your database.
- Testing: Before scheduling the job, test pg_repack on a non-production instance to ensure it behaves as expected.
- Monitoring: Monitor the job to confirm it completes successfully and troubleshoot any failures; the cron.job_run_details query below is one way to do that.
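On recent pg_cron versions (1.4 and later) each run is recorded in cron.job_run_details; a minimal monitoring query looks roughly like this:

-- Most recent pg_cron runs, newest first
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;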
By setting up pg_repack with pg_cron on Amazon RDS for PostgreSQL, you can automate the process of reclaiming disk space and improving database performance without significant downtime.
Thanks!
Jake
Hi,
Please share the details for scheduling a pg_repack job with pg_cron at the RDS PostgreSQL database instance level.
Thanks,
Sathishreddy
On Aug 7, 2024, at 9:24 AM, jacob ndinkwa <jndinkwa@gmail.com> wrote:
> To schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL instance, you need to follow a few steps. However, it’s important to note that pg_cron is only supported on certain versions of Amazon RDS, and pg_repack is also a separate extension that must be installed and enabled.
Is scheduling pg_repack just a bad idea, introducing even more bloat? Why not just tune autovacuum?

80/20 rule… most schemas are going to have their large/hot tables, etc., and data has a natural life cycle. If you have a healthy application then bloat is not an issue, as free space is used by new tuples. Each database has a data flow to it depending on the maturity and nature of the application/database. Exiting tuples make room for new tuples, etc.

If you have to VACUUM FULL / pg_repack your tables on a scheduled basis then I think there is something very wrong with your application. pg_repack will do more harm in the long run: the entire time pg_repack is running, xmin is frozen, thus creating more bloat everywhere else!

Bloat is overrated, especially in a transactional system where all your data access patterns should be well defined and not doing full table scans. Just focus on identifying bloated indexes periodically and rebuilding those. There should be no need to VACUUM FULL tables under normal circumstances.
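For reference, a minimal sketch of the periodic index-bloat check described above, assuming the pgstattuple extension is available; the index name is hypothetical:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Leaf density well below ~70% is a rough sign a btree index is worth rebuilding
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('orders_created_at_idx');

-- Rebuild without blocking writes (PostgreSQL 12 and later)
REINDEX INDEX CONCURRENTLY orders_created_at_idx;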
On Aug 7, 2024, at 3:39 PM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

On Wed, Aug 7, 2024 at 3:29 PM Rui DeSousa <rui.desousa@icloud.com> wrote:
>> On Aug 7, 2024, at 9:24 AM, jacob ndinkwa <jndinkwa@gmail.com> wrote:
>> To schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL instance, you need to follow a few steps. However, it’s important to note that pg_cron is only supported on certain versions of Amazon RDS, and pg_repack is also a separate extension that must be installed and enabled.
> Is scheduling pg_repack just a bad idea, introducing even more bloat? Why not just tune autovacuum?
>
> 80/20 rule… most schemas are going to have their large/hot tables, etc., and data has a natural life cycle. If you have a healthy application then bloat is not an issue, as free space is used by new tuples. Each database has a data flow to it depending on the maturity and nature of the application/database. Exiting tuples make room for new tuples, etc.
>
> If you have to VACUUM FULL / pg_repack your tables on a scheduled basis then I think there is something very wrong with your application. pg_repack will do more harm in the long run: the entire time pg_repack is running, xmin is frozen, thus creating more bloat everywhere else!
>
> Bloat is overrated, especially in a transactional system where all your data access patterns should be well defined and not doing full table scans. Just focus on identifying bloated indexes periodically and rebuilding those. There should be no need to VACUUM FULL tables under normal circumstances.

Part of a properly-maintained system is regularly archiving/purging (whether that be dropping date-based partitions, or deleting old data from unpartitioned tables or tables partitioned by something other than a date).

For example, I gave a list of tables (all intertwined via FK constraints) to the application support people, and they returned the list stating how many weeks or months of data to retain in each table. Every Saturday night a cron job goes through and deletes the old data from them, and then "manually" vacuum-analyzes them.

No bloat...

--
Death to America, and butter sauce.
Iraq lobster!
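As an illustration of the weekly purge-and-analyze pattern described above (the table name and retention window are hypothetical):

-- Purge rows older than the agreed retention period, then refresh statistics
DELETE FROM app_events WHERE created_at < now() - interval '90 days';
VACUUM (ANALYZE) app_events;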
Ron Johnson wrote on 07.08.2024 at 21:39:
> Part of a properly-maintained system is *regularly* archive/purging
> (whether that be dropping date-based partitions, or deleting old
> data from unpartitioned tables or tables partitioned by something
> other than a date).
>
> For example, I gave a list of tables (all intertwined via FK
> constraints) to the application support people, and they returned
> the list stating how many weeks or months of data to retain in each
> table. Every Saturday night a cron job goes through and deletes the
> old data from, and then "manually" vacuum-analyzes them.
If the application will then insert new data after the cleanup, Postgres
will re-use the free space that the delete "created". So depending
on the speed of inserts, you might not really gain that much.
Ron Johnson wrote on 08.08.2024 at 12:26:
>> Part of a properly-maintained system is *regularly* archive/purging
>> (whether that be dropping date-based partitions, or deleting old
>> data from unpartitioned tables or tables partitioned by something
>> other than a date).
>>
>> For example, I gave a list of tables (all intertwined via FK
>> constraints) to the application support people, and they returned
>> the list stating how many weeks or months of data to retain in
>> each table. Every Saturday night a cron job goes through and
>> deletes the old data from, and then "manually" vacuum-analyzes
>> them.
>
> If the application will then insert new data after the cleanup,
> Postgres will re-use the free space that the delete "created". So
> depending on the speed of inserts, you might not really gain that
> much.
>
> Or did you think that I do a VACUUM FULL on those tables? (No; I
> definitely don't do that, though I /occasionally/ CLUSTER /some/ of
> the tables to make range queries more efficient.)

Sorry, I misread your post and was indeed thinking about VACUUM FULL
as pg_repack is an alternative to that.
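For completeness, the occasional CLUSTER mentioned above looks roughly like this (hypothetical table and index names; note that CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table):

-- Physically reorder the table by the index used for range scans, then refresh statistics
CLUSTER app_events USING app_events_created_at_idx;
ANALYZE app_events;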