On 11/03/2014 05:06 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> Hackers,
>>
>> I'm looking at a couple of high-transaction-rate and high-FK-conflict
>> rate servers where pg_multixact has grown to be more than 1GB in size.
>> One such server doesn't appear to be having any notable issues with
>> vacuuming, and the oldest mxid on the system is about 47m old. VACUUM
>> FREEZEing the oldest databases did not cause the pg_multixact dir to get
>> smaller --- it may have even caused it to get larger.
>>
>> Why would pg_multixact not be truncating? Does it never truncate files
>> with aborted multixacts in them? Might we have another multixact bug?
>
> Have a look at the MultiXactId values in pg_controldata, datminmxid in
> pg_database, and relminmxid in pg_class. They should advance as you
> VACUUM FREEZE. If it's stuck at 1, you might be in pg_upgrade trouble
> if you used 9.3.4 or earlier to upgrade.
They're advancing. I also know for a fact that this system was not
pg_upgraded, because I set it up. It's always been on 9.3.5.
So after some vacuum freezing and a checkpoint, the oldestMultiXid
advanced by 9 million, about 20% of the gap with current multiXid.
However, that did not delete 20% of the files; before the Freeze there
were 4414 files, and now there are 4195. So advancing 9m mxids resulted
in only deleting ~~130 files. So I think we definitely have an issue here.
BTW, the reason I started poking into this was a report from a user that
they have a pg_multixact directory which is 21GB in size, and is 2X the
size of the database.
Here's XID data:
Latest checkpoint's NextXID: 0/1126461940
Latest checkpoint's NextOID: 371135838
Latest checkpoint's NextMultiXactId: 162092874
Latest checkpoint's NextMultiOffset: 778360290
Latest checkpoint's oldestXID: 945761490
Latest checkpoint's oldestXID's DB: 370038709
Latest checkpoint's oldestActiveXID: 1126461940
Latest checkpoint's oldestMultiXid: 123452201
Latest checkpoint's oldestMulti's DB: 370038709
Oldest mxid file is 29B2, newest is 3A13
No tables had a relminmxid of 1 (some of the system tables were 0,
though), and the data from pg_class and pg_database is consistent.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com