Thread: Schedule pg_repack job with pg_cron
To schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL instance, you need to follow a few steps. However, it’s important to note that pg_cron is only supported on certain versions of Amazon RDS, and pg_repack is also a separate extension that must be installed and enabled.
Here’s a general guide on how you can set this up:
Prerequisites
- Check Compatibility: Ensure that your RDS instance supports pg_cron and pg_repack. Both extensions need to be available and supported by the RDS version you are using.
- Install pg_repack: Ensure the pg_repack extension is available for your RDS PostgreSQL version. Unlike pg_cron, pg_repack does not need to be added to shared_preload_libraries; if it is listed in pg_available_extensions, you can enable it with CREATE EXTENSION in the next step.
- Enable pg_cron: pg_cron must be preloaded. In the AWS Management Console, go to RDS -> Parameter Groups, modify the custom parameter group associated with your instance to add pg_cron to the shared_preload_libraries parameter, and reboot the instance. You can verify both prerequisites with the queries shown below.
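A quick way to confirm both prerequisites from inside the database (a minimal sketch; the versions reported will depend on your RDS PostgreSQL release):

-- Check that both extensions are available on this instance
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('pg_cron', 'pg_repack');

-- Confirm pg_cron is preloaded (requires the parameter group change and a reboot)
SHOW shared_preload_libraries;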
Setting Up the Job
- Create the pg_cron Extension: First, create the pg_cron extension. On RDS it has to be created in the database named by the cron.database_name parameter (postgres by default):
CREATE EXTENSION IF NOT EXISTS pg_cron;
- Create the pg_repack Extension: Similarly, create the pg_repack extension:
CREATE EXTENSION IF NOT EXISTS pg_repack;
- Schedule a pg_repack Job: You can schedule a job using pg_cron to run pg_repack. Here is an example of how to set up a weekly job:
SELECT cron.schedule(
    'weekly_repack',
    '0 3 * * 0', -- This runs every Sunday at 3 AM
    $$ SELECT pg_repack.repack_database(); $$
);
In this example, pg_repack.repack_database() is called to reorganize the database; adjust the schedule expression (0 3 * * 0) as needed to fit your desired schedule. Note that a full repack is normally driven by the external pg_repack command-line client, so confirm on a test instance that this SQL-level call is actually available in your pg_repack version before relying on the schedule.
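Once the job exists, pg_cron's catalog and functions let you inspect or remove it (a short sketch using the job name from the example above; cron.unschedule by name requires a recent pg_cron version):

-- List scheduled jobs
SELECT jobid, jobname, schedule, command FROM cron.job;

-- Remove the job if it is no longer needed
SELECT cron.unschedule('weekly_repack');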
Considerations
- Permissions: Ensure the user running the pg_cron job has the necessary permissions to execute pg_repack.
- Performance Impact: Running pg_repack can be resource-intensive. Schedule the job during off-peak hours to minimize the impact on your database.
- Testing: Before scheduling the job, test pg_repack on a non-production instance to ensure it behaves as expected.
- Monitoring: Monitor the job to confirm it completes successfully and troubleshoot any failures; the cron.job_run_details query below is one way to do that.
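On recent pg_cron versions (1.4 and later) each run is recorded in cron.job_run_details; a minimal monitoring query looks roughly like this:

-- Most recent pg_cron runs, newest first
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;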
By setting up pg_repack with pg_cron on Amazon RDS for PostgreSQL, you can automate the process of reclaiming disk space and improving database performance without significant downtime.
Thanks!
Jake
Hi,
Please share the details for scheduling a pg_repack job with pg_cron at the RDS PostgreSQL database instance level.
Thanks,
Sathishreddy
On Aug 7, 2024, at 9:24 AM, jacob ndinkwa <jndinkwa@gmail.com> wrote:
> To schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL instance, you need to follow a few steps. However, it’s important to note that pg_cron is only supported on certain versions of Amazon RDS, and pg_repack is also a separate extension that must be installed and enabled.
Is scheduling pg_repack just a bad idea, introducing even more bloat? Why not just tune autovacuum?

80/20 rule… most schemas are going to have their large/hot tables, etc., and data has a natural life cycle. If you have a healthy application then bloat is not an issue, as free space is used by new tuples. Each database has a data flow to it depending on the maturity and nature of the application/database. Exiting tuples make room for new tuples, etc.

If you have to VACUUM FULL / pg_repack your tables on a scheduled basis then I think there is something very wrong with your application. pg_repack will do more harm in the long run: the entire time pg_repack is running, xmin is frozen, thus creating more bloat everywhere else!

Bloat is overrated, especially in a transactional system where all your data access patterns should be well defined and not doing full table scans. Just focus on identifying bloated indexes periodically and rebuilding those. There should be no need to VACUUM FULL tables under normal circumstances.
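For reference, a minimal sketch of the periodic index-bloat check described above, assuming the pgstattuple extension is available; the index name is hypothetical:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Leaf density well below ~70% is a rough sign a btree index is worth rebuilding
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('orders_created_at_idx');

-- Rebuild without blocking writes (PostgreSQL 12 and later)
REINDEX INDEX CONCURRENTLY orders_created_at_idx;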
On Aug 7, 2024, at 3:39 PM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

On Wed, Aug 7, 2024 at 3:29 PM Rui DeSousa <rui.desousa@icloud.com> wrote:
>> On Aug 7, 2024, at 9:24 AM, jacob ndinkwa <jndinkwa@gmail.com> wrote:
>> To schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL instance, you need to follow a few steps. However, it’s important to note that pg_cron is only supported on certain versions of Amazon RDS, and pg_repack is also a separate extension that must be installed and enabled.
> Is scheduling pg_repack just a bad idea, introducing even more bloat? Why not just tune autovacuum?
>
> 80/20 rule… most schemas are going to have their large/hot tables, etc., and data has a natural life cycle. If you have a healthy application then bloat is not an issue, as free space is used by new tuples. Each database has a data flow to it depending on the maturity and nature of the application/database. Exiting tuples make room for new tuples, etc.
>
> If you have to VACUUM FULL / pg_repack your tables on a scheduled basis then I think there is something very wrong with your application. pg_repack will do more harm in the long run: the entire time pg_repack is running, xmin is frozen, thus creating more bloat everywhere else!
>
> Bloat is overrated, especially in a transactional system where all your data access patterns should be well defined and not doing full table scans. Just focus on identifying bloated indexes periodically and rebuilding those. There should be no need to VACUUM FULL tables under normal circumstances.

Part of a properly-maintained system is regularly archiving/purging (whether that be dropping date-based partitions, or deleting old data from unpartitioned tables or tables partitioned by something other than a date).

For example, I gave a list of tables (all intertwined via FK constraints) to the application support people, and they returned the list stating how many weeks or months of data to retain in each table. Every Saturday night a cron job goes through and deletes the old data from them, and then "manually" vacuum-analyzes them.

No bloat...

--
Death to America, and butter sauce.
Iraq lobster!
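As an illustration of the weekly purge-and-analyze pattern described above (the table name and retention window are hypothetical):

-- Purge rows older than the agreed retention period, then refresh statistics
DELETE FROM app_events WHERE created_at < now() - interval '90 days';
VACUUM (ANALYZE) app_events;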
Ron Johnson wrote on 07.08.2024 at 21:39:
> Part of a properly-maintained system is *regularly* archive/purging
> (whether that be dropping date-based partitions, or deleting old
> data from unpartitioned tables or tables partitioned by something
> other than a date).
>
> For example, I gave a list of tables (all intertwined via FK
> constraints) to the application support people, and they returned
> the list stating how many weeks or months of data to retain in each
> table. Every Saturday night a cron job goes through and deletes the
> old data from, and then "manually" vacuum-analyzes them.
If the application will then insert new data after the cleanup, Postgres
will re-use the free space that the delete "created". So depending
on the speed of inserts, you might not really gain that much.
Ron Johnson wrote on 08.08.2024 at 12:26:
>> Part of a properly-maintained system is *regularly* archive/purging
>> (whether that be dropping date-based partitions, or deleting old
>> data from unpartitioned tables or tables partitioned by something
>> other than a date).
>>
>> For example, I gave a list of tables (all intertwined via FK
>> constraints) to the application support people, and they returned
>> the list stating how many weeks or months of data to retain in
>> each table. Every Saturday night a cron job goes through and
>> deletes the old data from, and then "manually" vacuum-analyzes
>> them.
>
> If the application will then insert new data after the cleanup,
> Postgres will re-use the free space that the delete "created". So
> depending on the speed of inserts, you might not really gain that
> much.
>
> Or did you think that I do a VACUUM FULL on those tables? (No; I
> definitely don't do that, though I /occasionally/ CLUSTER /some/ of
> the tables to make range queries more efficient.)

Sorry, I misread your post and was indeed thinking about VACUUM FULL
as pg_repack is an alternative to that.
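For completeness, the occasional CLUSTER mentioned above looks roughly like this (hypothetical table and index names; note that CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table):

-- Physically reorder the table by the index used for range scans, then refresh statistics
CLUSTER app_events USING app_events_created_at_idx;
ANALYZE app_events;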