Re: rebuild big tables with pgrepack - Mailing list pgsql-admin
| From | pg254kl@georgiou.vip |
|---|---|
| Subject | Re: rebuild big tables with pgrepack |
| Date | |
| Msg-id | 176324240033.6.6265117135512277017.1008813004@georgiou.vip |
| In response to | rebuild big tables with pgrepack (ek ek <livadidrive@gmail.com>) |
| List | pgsql-admin |
On 11/14/25 2:14 PM, ek ek - livadidrive at gmail.com wrote:
Hello everyone, I’m going to rebuild a 900GB table using pg_repack. I’m hesitant to do such a large operation in one go. Is there an ideal or recommended way to repack very large tables?
I recall almost 20 years ago, when Bruce Momjian was educating us on PostgreSQL (we were on Sybase ASE, migrating to PostgreSQL), we discussed the table and index rebuilding mania. I never forgot the "we have clients that haven't rebuilt anything for years and they run just fine", haha. OK, sometimes if you have very "hot" tables it's warranted, but the point that stayed with me is that it's usually unnecessary.
But ok, you may have your reasons, so...
- Connect to your cluster from a box with good connectivity to it (e.g. for AWS RDS, an EC2 instance in the same VPC)
- Definitely run pg_repack inside a tmux session, so a dropped SSH connection can't abort a multi-hour run
- Be safe and have at least 3 x pg_total_relation_size(table) of free space, since pg_repack builds a full copy of the table and its indexes before swapping it in
- Make sure you understand the -D (--no-kill-backend) and --wait-timeout options (note that -k is --no-superuser-check, which you'll need on RDS). By default pg_repack waits up to --wait-timeout seconds for backends blocking its lock, then cancels them (and eventually terminates them); with --no-kill-backend it gives up on the repack instead of killing anyone. Decisions, decisions. Definitely run when things are calm. See the sketch after this list.
- If your default TOAST compression is not lz4, and this table uses TOAST, consider changing it to lz4 prior to this pg_repack so the rewrite recompresses the existing values; lz4 is blazingly fast with low CPU cost (see the second sketch below).
- I don't like running anything heavy-handed on production without practicing/testing first. If things go bad and management asks "did you test this?" the answer should always be "yes". If this is your first time pg_repacking such a table, make sure you first do it on a staging environment under heavy load during a performance run. The cache churn can be an issue if your workload depends on a hot cache. Again, definitely run during the calmest window, but test under load ;)
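To make the list above concrete, here is a minimal shell sketch of the workflow. The host db.example.internal, user admin, database mydb, and table public.bigtable are placeholders; drop -k if you're a superuser on a self-managed cluster:

```bash
# Rough free-space check: the repack builds a full copy of the table and
# its indexes, so ~3x pg_total_relation_size is a comfortable margin.
psql -h db.example.internal -U admin -d mydb -c \
  "SELECT pg_size_pretty(3 * pg_total_relation_size('public.bigtable'));"

# Start (or reattach to) a tmux session so a dropped SSH connection
# doesn't kill the repack mid-flight.
tmux new-session -A -s repack

# Inside the tmux session:
#   --wait-timeout=60          seconds to wait on backends blocking the lock
#   -D / --no-kill-backend     back off instead of cancelling/killing blockers
#   -k / --no-superuser-check  needed on managed services such as AWS RDS
pg_repack -h db.example.internal -U admin -d mydb \
  -t public.bigtable --wait-timeout=60 --no-kill-backend -k
```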
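For the lz4 point, a sketch assuming PostgreSQL 14 or newer (where per-column lz4 compression and the default_toast_compression setting were introduced); bigtable and payload are again placeholder names:

```bash
# Inspect current per-column compression ('' = default, p = pglz, l = lz4).
psql -d mydb -c \
  "SELECT attname, attcompression
     FROM pg_attribute
    WHERE attrelid = 'public.bigtable'::regclass AND attnum > 0;"

# Switch a specific column's newly written values to lz4...
psql -d mydb -c \
  "ALTER TABLE public.bigtable ALTER COLUMN payload SET COMPRESSION lz4;"

# ...or change the cluster-wide default and reload the config.
psql -d mydb -c "ALTER SYSTEM SET default_toast_compression = 'lz4';"
psql -d mydb -c "SELECT pg_reload_conf();"
```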
At 900GB I'd start considering partitioning: it makes maintenance jobs more efficient, not to mention the smaller per-partition indexes should help with better use of your cache (if your partitioning strategy can segregate old, unused data from newer, hot data). A rough sketch follows.
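If partitioning looks attractive, a minimal sketch of declarative range partitioning; the events table, its columns, and the monthly ranges are all hypothetical:

```bash
psql -d mydb <<'SQL'
-- Hypothetical monthly-partitioned layout for a large, time-keyed table.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- Per-partition indexes stay small, and old cold months can be
-- detached or dropped cheaply instead of being repacked forever.
CREATE TABLE events_2025_11 PARTITION OF events
    FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');
CREATE TABLE events_2025_12 PARTITION OF events
    FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

-- Cascades to an index on every partition.
CREATE INDEX ON events (created_at);
SQL
```

As a bonus, once partitioned, pg_repack can work a parent table partition by partition via -I/--parent-table, which addresses the "one big operation" worry directly.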
--
regards, Kiriakos Georgiou