Thread: pg_multixact not getting truncated
Hackers, I'm looking at a couple of high-transaction-rate and high-FK-conflict rate servers where pg_multixact has grown to be more than 1GB in size. One such server doesn't appear to be having any notable issues with vacuuming, and the oldest mxid on the system is about 47m old. VACUUM FREEZEing the oldest databases did not cause the pg_multixact dir to get smaller --- it may have even caused it to get larger. Why would pg_multixact not be truncating? Does it never truncate files with aborted multixacts in them? Might we have another multixact bug? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > Hackers, > > I'm looking at a couple of high-transaction-rate and high-FK-conflict > rate servers where pg_multixact has grown to be more than 1GB in size. > One such server doesn't appear to be having any notable issues with > vacuuming, and the oldest mxid on the system is about 47m old. VACUUM > FREEZEing the oldest databases did not cause the pg_multixact dir to get > smaller --- it may have even caused it to get larger. > > Why would pg_multixact not be truncating? Does it never truncate files > with aborted multixacts in them? Might we have another multixact bug? Have a look at the MultiXactId values in pg_controldata, datminmxid in pg_database, and relminmxid in pg_class. They should advance as you VACUUM FREEZE. If it's stuck at 1, you might be in pg_upgrade trouble if you used 9.3.4 or earlier to upgrade. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
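A minimal way to watch those values from SQL (a sketch only; catalog columns as they exist in 9.3; compare the output against NextMultiXactId from pg_controldata before and after VACUUM FREEZE):

  -- per-database floors, visible from any database
  SELECT datname, datfrozenxid, datminmxid FROM pg_database;

  -- per-relation floors inside the current database (tables, matviews, toast)
  SELECT relname, relfrozenxid, relminmxid
    FROM pg_class
   WHERE relkind IN ('r', 'm', 't')
   ORDER BY relname;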
On 11/03/2014 05:06 PM, Alvaro Herrera wrote: > Josh Berkus wrote: >> Hackers, >> >> I'm looking at a couple of high-transaction-rate and high-FK-conflict >> rate servers where pg_multixact has grown to be more than 1GB in size. >> One such server doesn't appear to be having any notable issues with >> vacuuming, and the oldest mxid on the system is about 47m old. VACUUM >> FREEZEing the oldest databases did not cause the pg_multixact dir to get >> smaller --- it may have even caused it to get larger. >> >> Why would pg_multixact not be truncating? Does it never truncate files >> with aborted multixacts in them? Might we have another multixact bug? > > Have a look at the MultiXactId values in pg_controldata, datminmxid in > pg_database, and relminmxid in pg_class. They should advance as you > VACUUM FREEZE. If it's stuck at 1, you might be in pg_upgrade trouble > if you used 9.3.4 or earlier to upgrade. They're advancing. I also know for a fact that this system was not pg_upgraded, because I set it up. It's always been on 9.3.5. So after some vacuum freezing and a checkpoint, the oldestMultiXid advanced by 9 million, about 20% of the gap with current multiXid. However, that did not delete 20% of the files; before the Freeze there were 4414 files, and now there are 4195. So advancing 9m mxids resulted in deleting only ~220 files. So I think we definitely have an issue here. BTW, the reason I started poking into this was a report from a user that they have a pg_multixact directory which is 21GB in size, and is 2X the size of the database. Here's XID data: Latest checkpoint's NextXID: 0/1126461940 Latest checkpoint's NextOID: 371135838 Latest checkpoint's NextMultiXactId: 162092874 Latest checkpoint's NextMultiOffset: 778360290 Latest checkpoint's oldestXID: 945761490 Latest checkpoint's oldestXID's DB: 370038709 Latest checkpoint's oldestActiveXID: 1126461940 Latest checkpoint's oldestMultiXid: 123452201 Latest checkpoint's oldestMulti's DB: 370038709 Oldest mxid file is 29B2, newest is 3A13 No tables had a relminmxid of 1 (some of the system tables were 0, though), and the data from pg_class and pg_database is consistent. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/03/2014 05:24 PM, Josh Berkus wrote: > BTW, the reason I started poking into this was a report from a user that > they have a pg_multixact directory which is 21GB in size, and is 2X the > size of the database. > > Here's XID data: > > Latest checkpoint's NextXID: 0/1126461940 > Latest checkpoint's NextOID: 371135838 > Latest checkpoint's NextMultiXactId: 162092874 > Latest checkpoint's NextMultiOffset: 778360290 > Latest checkpoint's oldestXID: 945761490 > Latest checkpoint's oldestXID's DB: 370038709 > Latest checkpoint's oldestActiveXID: 1126461940 > Latest checkpoint's oldestMultiXid: 123452201 > Latest checkpoint's oldestMulti's DB: 370038709 > > Oldest mxid file is 29B2, newest is 3A13 > > No tables had a relminmxid of 1 (some of the system tables were 0, > though), and the data from pg_class and pg_database is consistent. More tidbits: I just did a quick check on customer systems (5 of them). This only seems to be happening on customer systems where I specifically know there is a high number of FK lock waits (the system above gets around 1000 per minute that we know of). Other systems with higher transaction rates don't exhibit this issue; I checked a 9.3.5 database which normally needs to do XID wraparound once every 10 days, and its pg_multixact is only 48K (it has one file, 0000). Note that pg_clog on the bad machine is only 64K in size. How many IDs are there per mxid file? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/3/14, 7:40 PM, Josh Berkus wrote: > On 11/03/2014 05:24 PM, Josh Berkus wrote: >> BTW, the reason I started poking into this was a report from a user that >> they have a pg_multixact directory which is 21GB in size, and is 2X the >> size of the database. >> >> Here's XID data: >> >> Latest checkpoint's NextXID: 0/1126461940 >> Latest checkpoint's NextOID: 371135838 >> Latest checkpoint's NextMultiXactId: 162092874 >> Latest checkpoint's NextMultiOffset: 778360290 >> Latest checkpoint's oldestXID: 945761490 >> Latest checkpoint's oldestXID's DB: 370038709 >> Latest checkpoint's oldestActiveXID: 1126461940 >> Latest checkpoint's oldestMultiXid: 123452201 >> Latest checkpoint's oldestMulti's DB: 370038709 >> >> Oldest mxid file is 29B2, newest is 3A13 >> >> No tables had a relminmxid of 1 (some of the system tables were 0, >> though), and the data from pg_class and pg_database is consistent. > > More tidbits: > > I just did a quick check on customer systems (5 of them). This only > seems to be happening on customer systems where I specifically know > there is a high number of FK lock waits (the system above gets around > 1000 per minute that we know of). Other systems with higher transaction > rates don't exhibit this issue; I checked a 9.3.5 database which > normally needs to do XID wraparound once every 10 days, and its > pg_multixact is only 48K (it has one file, 0000). > > Note that pg_clog on the bad machine is only 64K in size. > > How many IDs are there per mxid file? #define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset)) So for 8k blocks, there are 2k offsets (really MultiXactIds) per page, 32 pages per SLRU segment. Your file names aren't making sense to me. :( If I'm doing the math correctly, 29B2 is MXID 699 531 264 and 3A13 is 974 323 712. You're only looking in pg_multixact/members/, yes? Relevant code starts in vacuum.c/vac_update_datfrozenxid() If there are any rows in pg_class for tables/matviews/toast with either relfrozenxid > next XID or relminmxid > next MXID then the code *silently* pulls the plug right there. IMO we should at least issue a warning. That you see relminmxid advancing tells me this isn't the case here. ForceTransactionIdLimitUpdate() is a bit suspect in that it only looks at xidVacLimit, but if it were breaking then you wouldn't see pg_database minmxid advancing. Looking through TruncateMultiXact, I don't see anything that could prevent truncation, unless the way we're handling MultiXactId wraparound is broken (which I don't see any indication of). Can you post the contents of pg_multixact/members/? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
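To make the file-name arithmetic concrete, here is a rough sketch of the mapping being used, assuming the offsets-segment layout quoted above (2048 MultiXactIds per 8 kB page times 32 pages, i.e. 65,536 per segment); note that segments under pg_multixact/members/ hold member entries rather than MultiXactIds, so this mapping only strictly applies to pg_multixact/offsets/:

  -- hex segment name -> first ID covered by that offsets segment
  SELECT x'29B2'::int * 32 * 2048 AS first_id_in_29B2,   -- 699531264
         x'3A13'::int * 32 * 2048 AS first_id_in_3A13;   -- 974323712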
On 11/05/2014 10:40 AM, Jim Nasby wrote: > On 11/3/14, 7:40 PM, Josh Berkus wrote: >> On 11/03/2014 05:24 PM, Josh Berkus wrote: >>> BTW, the reason I started poking into this was a report from a user that >>> they have a pg_multixact directory which is 21GB in size, and is 2X the >>> size of the database. >>> >>> Here's XID data: >>> >>> Latest checkpoint's NextXID: 0/1126461940 >>> Latest checkpoint's NextOID: 371135838 >>> Latest checkpoint's NextMultiXactId: 162092874 >>> Latest checkpoint's NextMultiOffset: 778360290 >>> Latest checkpoint's oldestXID: 945761490 >>> Latest checkpoint's oldestXID's DB: 370038709 >>> Latest checkpoint's oldestActiveXID: 1126461940 >>> Latest checkpoint's oldestMultiXid: 123452201 >>> Latest checkpoint's oldestMulti's DB: 370038709 >>> >>> Oldest mxid file is 29B2, newest is 3A13 >>> >>> No tables had a relminmxid of 1 (some of the system tables were 0, >>> though), and the data from pg_class and pg_database is consistent. >> >> More tidbits: >> >> I just did a quick check on customer systems (5 of them). This only >> seems to be happening on customer systems where I specifically know >> there is a high number of FK lock waits (the system above gets around >> 1000 per minute that we know of). Other systems with higher transaction >> rates don't exhibit this issue; I checked a 9.3.5 database which >> normally needs to do XID wraparound once every 10 days, and it's >> pg_multixact is only 48K (it has one file, 0000). >> >> Note that pg_clog on the bad machine is only 64K in size. >> >> How many IDs are there per mxid file? > #define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset)) > > So for 8k blocks, there are 2k offsets (really MultiXactIds) per page, > 32 pages per SLRU segment. Your file names aren't making sense to me. :( > If I'm doing the math correctly, 29B2 is MXID 699 531 264 and 3A13 is > 974 323 712. You're only looking in pg_multixact/members/, yes? These are members, not offsets. What's stored in members? > Relevant code starts in vacuum.c/vac_update_datfrozenxid() > > If there's any rows in pg_class for tables/matviews/toast with either > relfrozenxid > next XID or relminmxid > next MXID then the code > *silently* pulls the plug right there. IMO we should at least issue a > warning. Wait, how would that situation arise in the first place? Wraparound is supposed to prevent it. Mind you, I checked pg_class, and it matches the minmxid shown by pg_database, so that's not the smoking gun. > Can you post the contents of pg_multixact/members/? Well, not as of last week, obviously. https://gist.github.com/jberkus/d05db3629e8c898664c4 I haven't pasted all the filenames, because, well, look at the count. I also added the contents of the /offsets directory, for full information. Note that we've added 400 multixact files since my first email. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/05/2014 11:15 AM, Josh Berkus wrote: > On 11/05/2014 10:40 AM, Jim Nasby wrote: >> Can you post the contents of pg_multixact/members/? > > Well, not as of last week, obviously. > > https://gist.github.com/jberkus/d05db3629e8c898664c4 > > I haven't pasted all the filenames, because, well, look at the count. I > also added the contents of the /offsets directory, for full information. > > Note that we've added 400 multixact files since my first email. So, just did a quick survey of our managed services customers. 4 out of 6 have pg_multixact bloat. Basically it looks like this: User1 (the one above): 1.1GB User2: 64K User3: 929MB User4: 1.3GB User5: 301MB User6: 48K Nobody (out of 6 more) running a version older than 9.3 had a pg_multixact larger than 128K. None of these users used pg_upgrade, so that's not the source of this problem. Instead, as demonstrated earlier, we are simply not truncating pg_multixact sufficiently. So this looks like a pretty serious bug. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > So, just did a quick survey of our managed services customers. 4 out of > 6 have pg_multixact bloat. Basically it looks like this: > > User1 (the one above): 1.1GB > User2: 64K > User3: 929MB > User4: 1.3GB > User5: 301MB > User6: 48K > > Nobody (out of 6 more) running a version older than 9.3 had a > pg_multixact larger than 128K. 9.3 multixacts are completely different from pre-9.3. We now depend on the freezing parameters to determine how much is kept; previously they were kept only for live transactions. Quoting the pg_controldata output you pasted previously: >>> Latest checkpoint's NextMultiXactId: 162092874 >>> Latest checkpoint's oldestMultiXid: 123452201 This says that the live multixact range goes from 123 million to 162 million; roughly 40 million values. (The default value for vacuum_multixact_freeze_table_age is 150 million, which is what determines how many values are kept.) Your gist.github paste tells us there are 4598 members files. Each file has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * 4598 members, or somewhere around 300 million. For 40 million multixacts, this means there are about 7 members per multixact, on average, which seems a reasonable number to me. If you want to have vacuum truncate pg_multixact more aggressively, you need to decrease vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
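Spelled out, that estimate is roughly the following (a back-of-envelope sketch using the 2045-members-per-page figure above; the exact per-page constant depends on the member-group layout in multixact.c):

  SELECT 4598 * 32 * 2045                           AS approx_total_members,   -- ~301 million
         162092874 - 123452201                      AS live_multixacts,        -- ~39 million
         round((4598 * 32 * 2045)::numeric
               / (162092874 - 123452201), 1)        AS avg_members_per_mxid;   -- ~7.8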
On 11/07/2014 04:43 PM, Alvaro Herrera wrote: > This says that the live multixact range goes from 123 million to 162 > million; roughly 40 million values. (The default value for > vacuum_multixact_freeze_table_age is 150 million, which is what > determines how many values are kept.) > > Your gist.github paste tells us there are 4598 members files. Each file > has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * > 4598 members, or somewhere around 300 million. For 40 million > multixacts, this means there are about 7 members per multixact, on > average, which seems a reasonable number to me. So the basic problem is that multixact files are just huge, with an average of 35 bytes per multixact? > If you want to have vacuum truncate pg_multixact more aggressively, you > need to decrease vacuum_multixact_freeze_table_age and > vacuum_multixact_freeze_min_age. If that's the case, then we need to set the defaults more aggressively. I suggest maybe 10 million. The alternative is allowing it to creep up to 150 million, which would be 5GB. I don't see adding 5GB to user databases without warning them as good behavior. Of course, this will lead to LOTs of additional vacuuming ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
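For illustration only, a hedged sketch of what "more aggressive" settings could look like; the values (including the 10 million figure) are hypothetical rather than recommendations, and the per-table reloptions are assumed to be available in 9.3:

  -- postgresql.conf (cluster-wide):
  --   vacuum_multixact_freeze_min_age     = 1000000
  --   vacuum_multixact_freeze_table_age   = 10000000
  --   autovacuum_multixact_freeze_max_age = 100000000

  -- or only on the tables that actually accumulate FK locks
  -- ("orders" is a hypothetical table name):
  ALTER TABLE orders
    SET (autovacuum_multixact_freeze_min_age   = 1000000,
         autovacuum_multixact_freeze_table_age = 10000000);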
Josh Berkus wrote: > On 11/07/2014 04:43 PM, Alvaro Herrera wrote: > > This says that the live multixact range goes from 123 million to 162 > > million; roughly 40 million values. (The default value for > > vacuum_multixact_freeze_table_age is 150 million, which is what > > determines how many values are kept.) > > > > Your gist.github paste tells us there are 4598 members files. Each file > > has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * > > 4598 members, or somewhere around 300 million. For 40 million > > multixacts, this means there are about 7 members per multixact, on > > average, which seems a reasonable number to me. > > So the basic problem is that multixact files are just huge, with an > average of 35 bytes per multixact? The more members the multixacts have, the more space they occupy. I would have thought this was obvious enough. > > If you want to have vacuum truncate pg_multixact more aggressively, you > > need to decrease vacuum_multixact_freeze_table_age and > > vacuum_multixact_freeze_min_age. > > If that's the case, then we need to set the defaults more aggressively. > I suggest maybe 10 million. The alternative is allowing it to creep up > to 150 million, which would be 5GB. I don't see adding 5GB to user > databases without warning them as good behavior. > > Of course, this will lead to LOTs of additional vacuuming ... There's a trade-off here: more vacuuming I/O usage for less disk space used. How stressed are your customers, really, about 1 GB of disk space? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-07 16:09:44 -0800, Josh Berkus wrote: > On 11/05/2014 11:15 AM, Josh Berkus wrote: > > On 11/05/2014 10:40 AM, Jim Nasby wrote: > >> Can you post the contents of pg_multixact/members/? > > > > Well, not as of last week, obviously. > > > > https://gist.github.com/jberkus/d05db3629e8c898664c4 > > > > I haven't pasted all the filenames, because, well, look at the count. I > > also added the contents of the /offsets directory, for full information. > > > > Note that we've added 400 multixact files since my first email. > > So, just did a quick survey of our managed services customers. 4 out of > 6 have pg_multixact bloat. Basically it looks like this: > > User1 (the one above): 1.1GB > User2: 64K > User3: 929MB > User4: 1.3GB > User5: 301MB > User6: 48K > > Nobody (out of 6 more) running a version older than 9.3 had a > pg_multixact larger than 128K. > > None of these users used pg_upgrade, so that's not the source of this > problem. Instead, as demonstrated earlier, we are simply not truncating > pg_multixact sufficiently. > > So this looks like a pretty serious bug. I've not seen actual evidence of a bug here. What are their multixact-related settings? How large is pg_clog? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: > On 11/07/2014 04:43 PM, Alvaro Herrera wrote: > > This says that the live multixact range goes from 123 million to 162 > > million; roughly 40 million values. (The default value for > > vacuum_multixact_freeze_table_age is 150 million, which is what > > determines how many values are kept.) > > > > Your gist.github paste tells us there are 4598 members files. Each file > > has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * > > 4598 members, or somewhere around 300 million. For 40 million > > multixacts, this means there are about 7 members per multixact, on > > average, which seems a reasonable number to me. > > So the basic problem is that multixact files are just huge, with an > average of 35 bytes per multixact? Depends on the concurrency. The number of members is determined by the number of xacts concurrently locking a row. > > If you want to have vacuum truncate pg_multixact more aggressively, you > > need to decrease vacuum_multixact_freeze_table_age and > > vacuum_multixact_freeze_min_age. > > If that's the case, then we need to set the defaults more > aggressively. Why? If you have that high transaction volume, a few seldom-read files won't hurt you. > Of course, this will lead to LOTs of additional vacuuming ... Yes. And that's likely to cause much, much more grief. Also. Didn't you just *vehemently* oppose making these values tunable at all? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/07/2014 05:29 PM, Alvaro Herrera wrote: > Josh Berkus wrote: >> Of course, this will lead to LOTs of additional vacuuming ... > > There's a trade-off here: more vacuuming I/O usage for less disk space > used. How stressed are your customers, really, about 1 GB of disk space? These customers not so much. The users I encountered on chat whose pg_multixact was over 20GB, and larger than their database? Lots. On 11/08/2014 03:54 AM, Andres Freund wrote: > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: >> So the basic problem is that multixact files are just huge, with an >> average of 35 bytes per multixact? > > Depends on the concurrency. The number of members is determined by the > number of xacts concurrently locking a row. Yeah, that leads to some extreme inflation for databases where FK conflicts are common though. On 11/08/2014 03:54 AM, Andres Freund wrote: > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: >> Of course, this will lead to LOTs of additional vacuuming ... > > Yes. And that's likely to cause much, much more grief. > > Also. Didn't you just *vehemently* oppose making these values tunable at > all? Yes, I opposed adding a *user* tunable with zero information on how it should be tuned or why. I always do and always will. I also think our defaults for multixact freezing should be tied to the ones for xid freezing, and should not by default be completely independent numbers; I'm still not convinced that it makes sense to have a separate multixact threshold at all **since the same amount of vacuuming needs to be done regardless of whether we're truncating xids or mxids**. Certainly when I play with tuning this for customers, I'm going to lower vacuum_freeze_table_age as well. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-11-08 12:10:48 -0800, Josh Berkus wrote: > On 11/07/2014 05:29 PM, Alvaro Herrera wrote: > > Josh Berkus wrote: > >> Of course, this will lead to LOTs of additional vacuuming ... > > > > There's a trade-off here: more vacuuming I/O usage for less disk space > > used. How stressed are your customers, really, about 1 GB of disk space? > > These customers not so much. The users I encountered on chat whose > pg_multixact was over 20GB, and larger than their database? Lots. > On 11/08/2014 03:54 AM, Andres Freund wrote: > > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: > >> So the basic problem is that multixact files are just huge, with an > >> average of 35 bytes per multixact? > > > > Depends on the concurrency. The number of members is determined by the > > number of xacts concurrently locking a row. > > Yeah, that leads to some extreme inflation for databases where FK > conflicts are common though. On the other hand, those are the ones benefitting most from the gain in concurrency. > On 11/08/2014 03:54 AM, Andres Freund wrote: > > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: > >> Of course, this will lead to LOTs of additional vacuuming ... > > > > Yes. And that's likely to cause much, much more grief. > > > > Also. Didn't you just *vehemently* oppose making these values tunable at > > all? > > Yes, I opposed adding a *user* tunable with zero information on how it > should be tuned or why. I always do and always will. I think that's primarily naive. We don't always *have* that knowledge ahead of time. There are interactions in the real world that we are not able to predict. And people are usually happier to find that their problem can be fixed by tuning a somewhat obscure GUC than having to patch their server or, much worse, upgrade to a newer major version that just came out. *No* user knows all our GUCs; even the really experienced ones only know half or so. And that's not because there are too many -- unless there were only three, they never would. > I also think our > defaults for multixact freezing should be tied to the ones for xid > freezing, and should not by default be completely independent numbers; I think it'd be a good idea to tune them more automatically in the future. But I think the current situation, where you can vastly increase autovacuum_freeze_max_age while keeping autovacuum_multixact_freeze_max_age lower, is *much* more useful in practice than when they always were the same. > I'm still not convinced that it makes sense to have a separate multixact > threshold at all **since the same amount of vacuuming needs to be done > regardless of whether we're truncating xids or mxids**. That's just plain wrong. The growth rate of one can be nearly independent of the other. It can e.g. be very sensible to have a huge xid freeze limit, but a much smaller multixact limit. > Certainly when I play with tuning this for customers, I'm going to lower > vacuum_freeze_table_age as well. I'm these days suggesting that people should add manual vacuuming for "older" relations during off peak hours on busy databases. There are too many sites whose service degrades noticeably during a full-table vacuum. If you actually mean autovacuum_freeze_max_age - I don't generally agree. It very often can be a good idea to significantly increase it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2014 01:46 PM, Andres Freund wrote: > I think it'd be a good idea to tune them more automatically in the > future. But I think the current situation, where you can vastly increase > autovacuum_freeze_max_age while keeping > autovacuum_multixact_freeze_max_age lower, is *much* more useful in practice > than when they always were the same. Can you explain that? Because I'm not picturing the situation where that would make sense. >> I'm still not convinced that it makes sense to have a separate multixact >> threshold at all **since the same amount of vacuuming needs to be done >> regardless of whether we're truncating xids or mxids**. > > That's just plain wrong. The growth rate of one can be nearly > independent of the other. It can e.g. be very sensible to have a huge > xid freeze limit, but a much smaller multixact limit. Yah, so who cares? Either way you're in for a full table scan, and if you're doing the full table scan anyway, you might as well freeze the xids. >> Certainly when I play with tuning this for customers, I'm going to lower >> vacuum_freeze_table_age as well. > > I'm these days suggesting that people should add manual vacuuming for > "older" relations during off peak hours on busy databases. There are too > many sites whose service degrades noticeably during a full-table vacuum. Me too: https://github.com/pgexperts/flexible-freeze -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/09/2014 08:00 PM, Josh Berkus wrote: > On 11/08/2014 01:46 PM, Andres Freund wrote: >> I'm these days suggesting that people should add manual vacuuming for >> "older" relations during off peak hours on busy databases. There's too >> many sites which service degrades noticeably during a full table vacuum. > Me too: https://github.com/pgexperts/flexible-freeze It turns out that not even a program of preventative scheduled vacuuming helps. This is because the template0 database anchors the minmxid and prevents it from being advanced until autovacuum gets around to that database, at whatever the minmxid threshold is. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
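A quick way to see which database is holding the cluster's multixact floor, shown as a sketch (ordering by the raw value ignores wraparound, so read it alongside oldestMultiXid from pg_controldata):

  SELECT datname, datallowconn, datfrozenxid, datminmxid
    FROM pg_database
   ORDER BY datminmxid::text::bigint;  -- xid has no direct integer cast; the text round-trip is a workaround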
On Sat, Nov 8, 2014 at 3:10 PM, Josh Berkus <josh@agliodbs.com> wrote: > I also think our > defaults for multixact freezing should be tied to the ones for xid > freezing, and should not by default be completely independent numbers; > I'm still not convinced that it makes sense to have a separate multixact > threshold at all **since the same amount of vacuuming needs to be done > regardless of whether we're truncating xids or mxids**. That doesn't make any sense at all. For one thing, it's not as if there is only ONE threshold here. There are three different ones, controlling three different aspects of the behavior: (a) the age at which we begin trying to freeze the pages we are planning to vacuum anyway, (b) the age at which we force a vacuum that we're planning to do anyway to scan the entire table, and (c) the age at which we trigger an autovacuum that we weren't otherwise planning to do. Generally, the reason why I think we need different thresholds for XIDs and MXIDs is that they may be getting consumed at vastly different rates. It may be useful to have a light that comes on in your car when you only have one gallon of fuel left, but you'd want a different threshold for an airplane because it burns fuel at a different rate. If you made that light come on when there's a gallon of fuel left, it would be way too late to do any good. I think a big part of the tuning problem here is that we don't have any way of knowing what the real burn rates will be in a particular customer environment. If you're burning MXIDs very slowly, you probably want to set threshold (a), the age at which we begin freezing pages we are planning to vacuum anyway, quite low, so that the next full-table vacuum triggered by XID consumption freezes all the MXIDs also, and advances relminmxid, thus preventing freezing passes specifically for MXIDs from ever happening. But if the MXID consumption rate is very high, that may result in unnecessary I/O freezing tuples that would have been updated before MXID age became an issue anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/10/14, 12:16 AM, Josh Berkus wrote: > On 11/09/2014 08:00 PM, Josh Berkus wrote: >> On 11/08/2014 01:46 PM, Andres Freund wrote: >>> I'm these days suggesting that people should add manual vacuuming for >>> "older" relations during off peak hours on busy databases. There's too >>> many sites which service degrades noticeably during a full table vacuum. >> Me too: https://github.com/pgexperts/flexible-freeze > > It turns out that not even a program of preventative scheduled vacuuming > helps. This is because the template0 database anchors the minmxid and > prevents it from being advanced until autovacuum gets around to that > database, at whatever the minmxid threshold is. How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) Regarding linking the two settings, I agree with others that XIDs and MXIDs are basically completely independent (as your customer apparently has discovered). If you set both of the min_age parameters fairly small then it doesn't matter which max limit (the table_age parameters) you hit; you'll get a full scan and the low min_age limits will mean you'll get good freezing of both. The only other thing I can think of would be having yet another set of minimum age limits that come into play when you're doing a full scan as opposed to a partial one, but that seems like overkill to me. I guess another option would be to get more aggressive depending on the size of pg_multixact/... BTW, the only reason I know of not to set both min_age parameters to zero is to prevent loss of forensic information. If that's not a concern you can always just set them to zero. Even if it is a concern, I suspect that the forensic info you could gather from a MultiXact is a lot more limited than for an XID, so it's probably pretty safe setting that to zero. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby wrote: > On 11/10/14, 12:16 AM, Josh Berkus wrote: > >On 11/09/2014 08:00 PM, Josh Berkus wrote: > >On 11/08/2014 01:46 PM, Andres Freund wrote: > >>>I'm these days suggesting that people should add manual vacuuming for > >>>>"older" relations during off peak hours on busy databases. There's too > >>>>many sites which service degrades noticeably during a full table vacuum. > >>Me too: https://github.com/pgexperts/flexible-freeze > > > >It turns out that not even a program of preventative scheduled vacuuming > >helps. This is because the template0 database anchors the minmxid and > >prevents it from being advanced until autovacuum gets around to that > >database, at whatever the minmxid threshold is. > > How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) No, it's normal -- template0 does not have a multixact in any tuple's xmax, but datminmxid is set to the value that is current when it is frozen. > BTW, the only reason I know of not to set both min_age parameters to > zero is to prevent loss of forensic information. If that's not a > concern you can always just set them to zero. Even if it is a concern, > I suspect that the forensic info you could gather from a MultiXact is > a lot more limited than for an XID, so it's probably pretty safe > setting that to zero. Freezing tuples too early could cause useless dirtying of pages; if the tuple is deleted, updated or locked again after being frozen, you end up with more writes. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/12/2014 06:57 PM, Alvaro Herrera wrote: >> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) > No, it's normal -- template0 does not have a multixact in any tuple's > xmax, but datminmxid is set to the value that is current when it is > frozen. So, to follow up on this: it seems to me that we shouldn't be requiring freezing for databases where allowconn=false. This seems like a TODO to me, even possibly a backpatchable bug fix. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > On 11/12/2014 06:57 PM, Alvaro Herrera wrote: > >> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) > > No, it's normal -- template0 does not have a multixact in any tuple's > > xmax, but datminmxid is set to the value that is current when it is > > frozen. > > So, to follow up on this: it seems to me that we shouldn't be requiring > freezing for databases where allowconn=false. This seems like a TODO to > me, even possibly a backpatchable bug fix. Why do we need this for pg_multixact but not for pg_clog? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/19/2014 01:03 PM, Alvaro Herrera wrote: > Josh Berkus wrote: >> On 11/12/2014 06:57 PM, Alvaro Herrera wrote: >>>> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) >>> No, it's normal -- template0 does not have a multixact in any tuple's >>> xmax, but datminmxid is set to the value that is current when it is >>> frozen. >> >> So, to follow up on this: it seems to me that we shouldn't be requiring >> freezing for databases where allowconn=false. This seems like a TODO to >> me, even possibly a backpatchable bug fix. > > Why do we need this for pg_multixact but not for pg_clog? I think we want it for both. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Nov 19, 2014 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/19/2014 01:03 PM, Alvaro Herrera wrote: >> Josh Berkus wrote: >>> On 11/12/2014 06:57 PM, Alvaro Herrera wrote: >>>>> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) >>>> No, it's normal -- template0 does not have a multixact in any tuple's >>>> xmax, but datminmxid is set to the value that is current when it is >>>> frozen. >>> >>> So, to follow up on this: it seems to me that we shouldn't be requiring >>> freezing for databases where allowconn=false. This seems like a TODO to >>> me, even possibly a backpatchable bug fix. >> >> Why do we need this for pg_multixact but not for pg_clog? > > I think we want it for both. So that we can have two ways to lose data? Forbidding connections to a database doesn't prevent XID or MXID wraparound. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> So that we can have two ways to lose data? > > Forbidding connections to a database doesn't prevent XID or MXID wraparound. It does prevent the user from doing anything about it, though, since they can't manually vacuum template0 without knowing unpublished hackery. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
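For reference, the hackery in question usually amounts to something like the following sketch, run as a superuser; it is a workaround rather than an officially documented procedure:

  -- temporarily allow connections to template0
  UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';

  -- then, connected to template0:
  --   VACUUM FREEZE;

  -- and lock it back down afterwards
  UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';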
On Thu, Nov 20, 2014 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: >> So that we can have two ways to lose data? >> >> Forbidding connections to a database doesn't prevent XID or MXID wraparound. > > It does prevent the user from doing anything about it, though, since > they can't manually vacuum template0 without knowing unpublished hackery. True. I don't know what to do about that. Do you? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Thu, Nov 20, 2014 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> So that we can have two ways to lose data? > >> > >> Forbidding connections to a database doesn't prevent XID or MXID wraparound. > > > > It does prevent the user from doing anything about it, though, since > > they can't manually vacuum template0 without knowing unpublished hackery. > > True. I don't know what to do about that. Do you? Maybe tweak autovacuum so that it vacuum-freezes the non-connectable template databases when they are multixact_freeze_min_age old -- or something similar. That would cause the multixact age to go down to zero for those databases with enough frequency. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/20/2014 01:03 PM, Robert Haas wrote: > On Thu, Nov 20, 2014 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> So that we can have two ways to lose data? >>> >>> Forbidding connections to a database doesn't prevent XID or MXID wraparound. >> >> It does prevent the user from doing anything about it, though, since >> they can't manually vacuum template0 without knowing unpublished hackery. > > True. I don't know what to do about that. Do you? Well, the first thing that comes to mind is that template0 should be "permanently frozen". That is, all objects in it should be created with frozen xid and mxids. After all, nobody can modify anything in it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Well, the first thing that comes to mind is that template0 should be > "permanently frozen". That is, all objects in it should be created with > frozen xid and mxids. After all, nobody can modify anything in it. That sounds about as unsafe as can be. You can't stop superusers from connecting to template0 and modifying it if they want to ... and I don't really want to say "ok, the consequence of that is silent disaster many moons later". regards, tom lane
On 11/20/2014 01:47 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Well, the first thing that comes to mind is that template0 should be >> "permanently frozen". That is, all objects in it should be created with >> frozen xid and mxids. After all, nobody can modify anything in it. > > That sounds about as unsafe as can be. You can't stop superusers from > connecting to template0 and modifying it if they want to ... and I don't > really want to say "ok, the consequence of that is silent disaster many > moons later". So it would get unfrozen when they modify it, and they'd have to deal with it. Right now we're optimizing for something only 0.1% of users ever do. The harder part of this -- the handwavy part -- is the whole idea of a "permanent freeze". Right now there's no way to mark anything as "frozen until next modified"; we're just resetting the clock on it. If there were any such thing, it would solve some of the problems around vacuum freeze. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div dir="ltr"><p dir="ltr"><br /> On Thu, Nov 20, 2014 at 9:21 PM, Josh Berkus <<a href="mailto:josh@agliodbs.com" target="_blank">josh@agliodbs.com</a>>wrote:<br /> > Well, the first thing that comes to mind is that template0 shouldbe<br /> > "permanently frozen". That is, all objects in it should be created with<br /> > frozen xid and mxids.After all, nobody can modify anything in it.<p dir="ltr">This is actually the way it used to be. It was changed becauseit was discovered there was some case where an unfrozen xid would end up in template0 anyways and for some reasonit was hard to be sure to avoid it. I don't recall exactly what the situation was that triggered it but the argumentwas made then that it was safest to just include template0 in autovacuum rather than depend on getting this 100%right and risk corruption.<p dir="ltr">--<br /> greg</div>
Greg, > This is actually the way it used to be. It was changed because it was > discovered there was some case where an unfrozen xid would end up in > template0 anyways and for some reason it was hard to be sure to avoid it. I > don't recall exactly what the situation was that triggered it but the > argument was made then that it was safest to just include template0 in > autovacuum rather than depend on getting this 100% right and risk > corruption. Right, and that was fine before pg_multixact, because even with 500m XIDs in the bank, pg_clog is still pretty small. The problem is that with the same number of multixacts, pg_multixact is around *16GB* in size. Thing is, template0 is just there as a check on users messing up template1. Having that kind of precaution cause repeated operational problems for users is not good design. Maybe we should just get rid of template0 and come up with some other mechanism to reset template1 to bare-bones state. Actually, here's a question ... pg_clog is usually smaller than I think it should be (that is, smaller than 4 bytes * XID_age). Why is that? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/21/2014 10:44 AM, Josh Berkus wrote: > Greg, > > >> This is actually the way it used to be. It was changed because it was >> discovered there was some case where an unfrozen xid would end up in >> template0 anyways and for some reason it was hard to be sure to avoid it. I >> don't recall exactly what the situation was that triggered it but the >> argument was made then that it was safest to just include template0 in >> autovacuum rather than depend on getting this 100% right and risk >> corruption. > > Right, and that was fine before pg_multixact, because even with 500m > XIDs in the bank, pg_clog is still pretty small. The problem is that > with the same number of multixacts, pg_multixact is around *16GB* in size. > > Thing is, template0 is just there as a check on users messing up > template1. Having that kind of precaution cause repeated operational > problems for users is not good design. Maybe we should just get rid of > template0 and come up with some other mechanism to reset template1 to > bare-bones state. Or an even simpler solution: provide a way for the superuser to manually vacuum template0 *without* needing to update pg_database. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > Actually, here's a question ... pg_clog is usually smaller than I think > it should be (that is, smaller than 4bytes * XID_age). Why is that? It's not 4 bytes per Xid, but 2 bits per Xid, so Xid_age / 4. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
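In other words, pg_clog needs roughly one byte per four transactions, so the expected footprint for a given XID span is about span/4 bytes; a quick sanity check:

  -- 2 bits per transaction => 4 transactions per byte
  SELECT pg_size_pretty((180000000 / 4)::bigint) AS clog_for_180m_xids;  -- roughly 43 MB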
On 11/21/14, 12:51 PM, Josh Berkus wrote: > On 11/21/2014 10:44 AM, Josh Berkus wrote: >> Greg, >> >> >>> This is actually the way it used to be. It was changed because it was >>> discovered there was some case where an unfrozen xid would end up in >>> template0 anyways and for some reason it was hard to be sure to avoid it. I >>> don't recall exactly what the situation was that triggered it but the >>> argument was made then that it was safest to just include template0 in >>> autovacuum rather than depend on getting this 100% right and risk >>> corruption. >> >> Right, and that was fine before pg_multixact, because even with 500m >> XIDs in the bank, pg_clog is still pretty small. The problem is that >> with the same number of multixacts, pg_multixact is around *16GB* in size. >> >> Thing is, template0 is just there as a check on users messing up >> template1. Having that kind of precaution cause repeated operational >> problems for users is not good design. Maybe we should just get rid of >> template0 and come up with some other mechanism to reset template1 to >> bare-bones state. > > Or an even simpler solution: provide a way for the superuser to > manually vacuum template0 *without* needing to update pg_database. AIUI, this is only an issue because even if you completely freeze a normal database you can't set frozenxid or minmxid to Frozen, because that will be wrong as soon as any DML happens and we don't want to screw with a real-time update to pg_class and pg_database. But any database you can't connect to is clearly a special case. What if we allowed you to vacuum a database you couldn't connect to, and while scanning such a database tracked whether each rel was completely frozen? Because no one else could have connected we know that no new (M)XIDs could have been created in that database. Is there any fundamental reason a vacuum from one database couldn't vacuum relations in another database, so long as no one could be connected to it? I'm sure there are some sanity checks that would need to be modified... We'd need something more sophisticated than datallowconn for this to work as well, since we'd need to prevent anyone from copying a database while it is being vacuumed, as well as preventing anyone from changing datallowconn while the vacuum is running. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com