Thread: Autovacuum on Partitioned Tables

Autovacuum on Partitioned Tables

From
Ryan Ruenroeng
Date:
Hello there!

There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum."

What does the above statement mean?
Does autovacuum not process both the parent and the child tables in a partition relationship? 

What is the definition of a partitioned table?

I have a table with 50+ million rows that gets data added to/wiped from it every 90 days. We are planning to break this table into a few thousand partitions. More partitions will likely be added in the future, but we don't have plans to delete any of the partitions. Will we need to manually track the statistics of these partitions and manually vacuum the tables or will autovacuum help to manage them?

Best,
Ryan

Re: Autovacuum on Partitioned Tables

From
Ron
Date:
On 10/31/22 17:26, Ryan Ruenroeng wrote:
Hello there!

There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum."

What does the above statement mean?
Does autovacuum not process both the parent and the child tables in a partition relationship?

Partitioned (aka parent) tables are "virtual".  There's nothing to vacuum are analyze.


What is the definition of a partitioned table?

I have a table with 50+ million rows that gets data added to/wiped from it every 90 days. We are planning to break this table into a few thousand partitions.

1. That's a lot of partitions.  Older (like v12) query optimizers don't do a great job handle that many partitions.
2. Be careful what you partition on.  (We added "partition_date" to PKs so as to partition by date, even though the "real" PK is a synthetic key; it was a query performance disaster in Pg 12.)

More partitions will likely be added in the future, but we don't have plans to delete any of the partitions. Will we need to manually track the statistics of these partitions and manually vacuum the tables or will autovacuum help to manage them?

Autovacuum will handle it.

--
Angular momentum makes the world go 'round.

Re: Autovacuum on Partitioned Tables

From
Tom Lane
Date:
Ron <ronljohnsonjr@gmail.com> writes:
> On 10/31/22 17:26, Ryan Ruenroeng wrote:
>> I have a table with 50+ million rows that gets data added to/wiped from it
>> every 90 days. We are planning to break this table into a few thousand
>> partitions.

> 1. That's a lot of partitions.  Older (like v12) query optimizers don't do a
> great job handle that many partitions.

Newer ones don't either, if you are incautious enough to issue a query
that touches all/most of the partitions --- or even just that the
planner cannot prove doesn't touch all/most of the partitions.  So
that sort of setup hinges critically on having very stylized queries
that you've vetted in advance.  We'll probably continue to make
incremental improvements in how well things work with lots of
partitions, but I don't foresee it ever becoming a cost-free thing.

TBH, if you've got 50m rows, I'm not sure you need partitions at all.
You sure as heck do not need "a few thousand" of them.

You should be basing your partitioning design on predictable maintenance
operations that will match your partition boundaries.  For example,
if your business requirement is to keep five years' worth of records
and you want to drop the oldest month's records in bulk once a month,
then partitioning by month would be pretty helpful to make those drops
cheap.  That would lead to 60 active partitions which is entirely
reasonable.

            regards, tom lane



Re: Autovacuum on Partitioned Tables

From
Laurenz Albe
Date:
On Mon, 2022-10-31 at 17:26 -0500, Ryan Ruenroeng wrote:
> There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum."
> 
> What does the above statement mean?
> Does autovacuum not process both the parent and the child tables in a partition relationship? 

I agree that this is unclear and have proposed an improvement:

https://postgr.es/m/1fd81ddc7710a154834030133c6fea41e55c8efb.camel@cybertec.at

Yours,
Laurenz Albe



Re: Autovacuum on Partitioned Tables

From
Ron
Date:
On 10/31/22 23:05, Tom Lane wrote:
[snip]
> TBH, if you've got 50m rows, I'm not sure you need partitions at all.

Big rows (i.e. document storage tables with bytea or xml fields) can make 
databases explode in size even with only 50M rows.

(Yes, I know the arguments against it, but it works quite well when the 
database is in a cloud instance.  Worries about backup times, at least, are 
eliminated.)

-- 
Angular momentum makes the world go 'round.



Re: Autovacuum on Partitioned Tables

From
Ryan Ruenroeng
Date:
Thanks all for your responses. We have a couple of tables. Each with 50-70mil rows currently, but they are expected to grow. Partitioning seems to be a better long-term strategy, queries to these tables, using their existing indexes, leaves them basically unusable (loooong run times). 

FYI, we are currently on PG v11.16

Let's assume we do go with my current proposal because it is inline with how we are planning to drop partitions that stop seeing activity in the future:

[snip] from Ron's post above:
Autovacuum will handle it.

I still have some doubts based on this:
"Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to run ANALYZE on the parent table manually in order to keep the statistics up to date." [Link]
Q1: Will we at least need to call Analyze via a cron job on the parent table to ensure that the statistics are up to date for autovacuum to catch the tables?
From reading the documentation that a few of you have pointed me to, I'm led to believe that the parent table is the "Partition" table. The children tables are treated by the autovacuum as tables
Q2: Autovacuum will act on the partitions/children to the parent table. Is that a correct statement?
--
It's good to know that the query optimizer will improve with partitions on versions 12+. Thank you. 

Best,
Ryan


On Tue, Nov 1, 2022 at 2:54 AM Ron <ronljohnsonjr@gmail.com> wrote:
On 10/31/22 23:05, Tom Lane wrote:
[snip]
> TBH, if you've got 50m rows, I'm not sure you need partitions at all.

Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.

(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance.  Worries about backup times, at least, are
eliminated.)

--
Angular momentum makes the world go 'round.


Re: Autovacuum on Partitioned Tables

From
Ron
Date:
On 11/1/22 10:11, Ryan Ruenroeng wrote:T
Thanks all for your responses. We have a couple of tables. Each with 50-70mil rows currently, but they are expected to grow. Partitioning seems to be a better long-term strategy, queries to these tables, using their existing indexes, leaves them basically unusable (loooong run times). 

FYI, we are currently on PG v11.16

Let's assume we do go with my current proposal because it is inline with how we are planning to drop partitions that stop seeing activity in the future:

[snip] from Ron's post above:
Autovacuum will handle it.

I still have some doubts based on this:
"Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to run ANALYZE on the parent table manually in order to keep the statistics up to date." [Link]
Q1: Will we at least need to call Analyze via a cron job on the parent table to ensure that the statistics are up to date for autovacuum to catch the tables?

Tuning auto-analyze variables is supposed to obviate the need for manual vacuuming analyzing.  I do manual vacuuming anayzing anyway in a cron job. "This set of tables have had more than X amount of changes, so I'll explictly analyze them."

Once a week, I do bloat calculations and vacuum based on that.

From reading the documentation that a few of you have pointed me to, I'm led to believe that the parent table is the "Partition" table.

"\d" will show you exactly what you need to know.

 The children tables are treated by the autovacuum as tables

Child tables are tables.

Q2: Autovacuum will act on the partitions/children to the parent table. Is that a correct statement?

Parents are meta-tables.  There's nothing to vacuum or analyze.

--
It's good to know that the query optimizer will improve with partitions on versions 12+. Thank you. 

Best,
Ryan


On Tue, Nov 1, 2022 at 2:54 AM Ron <ronljohnsonjr@gmail.com> wrote:
On 10/31/22 23:05, Tom Lane wrote:
[snip]
> TBH, if you've got 50m rows, I'm not sure you need partitions at all.

Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.

(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance.  Worries about backup times, at least, are
eliminated.)

--
Angular momentum makes the world go 'round.



--
Angular momentum makes the world go 'round.