Re: Millions of tables - Mailing list pgsql-performance

From Greg Spiegelberg
Subject Re: Millions of tables
Date
Msg-id CAEtnbpXbKBR1Hp_prQWNMbCDLVWn0Upo6V3_c83f+XFJ-aKNQA@mail.gmail.com
Whole thread Raw
In response to Re: Millions of tables  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Millions of tables
Re: Millions of tables
List pgsql-performance
On Fri, Sep 30, 2016 at 4:49 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 9/29/16 6:11 AM, Alex Ignatov (postgrespro) wrote:
With millions of tables you have to set    autovacuum_max_workers
 sky-high =). We have some situation when at thousands of tables
autovacuum can’t vacuum all tables that need it. Simply it vacuums some
of most modified table and never reach others. Only manual vacuum can
help with this situation. With wraparound issue it can be a nightmare

Specifically, autovac isn't going to start worrying about anti-wrap vacuums until tables start hitting autovacuum_freeze_max_age (or autovacuum_multixact_freeze_max_age). Any tables that hit that threshold go to the front of the line for being vacuumed. (But keep in mind that there is no universal line, just what each worker computes on it's own when it's started).

Where things will completely fall apart for you is if a lot of tables all have roughly the same relfrozenxid (or relminmxid), like they would immediately after a large load. In that scenario you'll suddenly have loads of work for autovac to do, all at the same time. That will make the database, DBAs and you Very Unhappy (tm).

Somehow, some way, you *must* do a vacuum of the entire database. Luckily the freeze map in 9.6 means you'd only have to do that one time (assuming the data really is static). In any older version, (auto)vacuum will need to eventually *read everything in every table* at least once every ~2B transactions.

Data is not static.  The 4M tables fall into one of two groups.

Group A contains 2M tables.  INSERT will occur ~100 times/day and maximum number of records anticipated will be 200k.  Periodic DELETE's will occur removing "old" records.  Age is something the client sets and I have no way of saying 1 or 10k records will be removed.

Group B contains the other 2M tables.  Maximum records ~140k and UPSERT will be the only mechanism used to populate and maintain.  Periodic DELETE's may run on these tables as well removing "old" records.

Will a set of tables require vacuum'ing at the same time?  Quite possibly but I have no way to say 2 or 200k tables will need it.


When you say "must do a vacuum of the entire database", are you saying the entire database must be vacuum'd as a whole per 2B transactions or all tables must be vacuum'd eventually at least once?  I want to be absolutely clear on what you're saying.

 
There is one potentially significant difference between autovac and manual vacuums here; autovac treats toast tables as just another table, with their own stats and their own freeze needs. If you're generating a lot of toast records that might make a difference.

I do not anticipate TOAST entering the picture.  No single column or record > 8KB or even approaching it. We have a few databases that (ab)use pg_toast and I want to avoid those complications.

 
-Greg

pgsql-performance by date:

Previous
From: Gavin Flower
Date:
Subject: Re: MYSQL Stats
Next
From: Jim Nasby
Date:
Subject: Re: Millions of tables