Re: pg_multixact issues - Mailing list pgsql-general

From Thomas Munro
Subject Re: pg_multixact issues
Date
Msg-id CAEepm=2w2AXSpibKrF==44b1tHo5tWtpB+8NrT+XwQB_pW2-rA@mail.gmail.com
In response to pg_multixact issues  (Kiriakos Georgiou <kg.postgresql@olympiakos.com>)
Responses Re: pg_multixact issues  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-general
On Thu, Feb 11, 2016 at 7:52 AM, Kiriakos Georgiou
<kg.postgresql@olympiakos.com> wrote:
> Hello,
>
> Our pg_multixact directory keeps growing.  I did a "vacuum freeze" which
> didn't help.  I also did a "vacuum full" which didn't help either.
> We had this condition with 9.3.4 as well.  When I upgraded our cluster to
> 9.4.5 (via plain sql dump and load) as expected the issue was resolved but
> now it’s happening again.  Luckily it has no ill effect other than consuming
> 4G of space for an otherwise 1G database.
>
> Can you offer any hints as to how I can cure this?
>
> thanks,
> Kiriakos Georgiou
>
>
> pg_controldata output:
>
> pg_control version number:            942
> Catalog version number:               201409291
> Database system identifier:           6211781659140720513
> Database cluster state:               in production
> pg_control last modified:             Wed Feb 10 13:45:02 2016
> Latest checkpoint location:           D/FB5FE630
> Prior checkpoint location:            D/FB5FE558
> Latest checkpoint's REDO location:    D/FB5FE5F8
> Latest checkpoint's REDO WAL file:    000000010000000D000000FB
> Latest checkpoint's TimeLineID:       1
> Latest checkpoint's PrevTimeLineID:   1
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID:          0/3556219
> Latest checkpoint's NextOID:          2227252
> Latest checkpoint's NextMultiXactId:  2316566
> Latest checkpoint's NextMultiOffset:  823062151
> Latest checkpoint's oldestXID:        668
> Latest checkpoint's oldestXID's DB:   1
> Latest checkpoint's oldestActiveXID:  3556219
> Latest checkpoint's oldestMultiXid:   1
> Latest checkpoint's oldestMulti's DB: 1
> Time of latest checkpoint:            Wed Feb 10 13:45:02 2016
> Fake LSN counter for unlogged rels:   0/1
> Minimum recovery ending location:     0/0
> Min recovery ending loc's timeline:   0
> Backup start location:                0/0
> Backup end location:                  0/0
> End-of-backup record required:        no
> Current wal_level setting:            hot_standby
> Current wal_log_hints setting:        off
> Current max_connections setting:      100
> Current max_worker_processes setting: 8
> Current max_prepared_xacts setting:   0
> Current max_locks_per_xact setting:   1024
> Maximum data alignment:               8
> Database block size:                  8192
> Blocks per segment of large relation: 131072
> WAL block size:                       8192
> Bytes per WAL segment:                16777216
> Maximum length of identifiers:        64
> Maximum columns in an index:          32
> Maximum size of a TOAST chunk:        1996
> Size of a large-object chunk:         2048
> Date/time type storage:               64-bit integers
> Float4 argument passing:              by value
> Float8 argument passing:              by value
> Data page checksum version:           0
>
> the offsets directory:
>
> -rw-------   1 postgres dba       262144 Nov  3 15:22 0000
> -rw-------   1 postgres dba       262144 Nov  5 12:45 0001
> -rw-------   1 postgres dba       262144 Nov  9 14:25 0002
> -rw-------   1 postgres dba       262144 Nov 13 10:10 0003
> -rw-------   1 postgres dba       262144 Nov 16 15:40 0004
> -rw-------   1 postgres dba       262144 Nov 20 09:55 0005
> -rw-------   1 postgres dba       262144 Dec  1 08:00 0006
> -rw-------   1 postgres dba       262144 Dec  9 11:50 0007
> -rw-------   1 postgres dba       262144 Dec 16 08:14 0008
> -rw-------   1 postgres dba       262144 Dec 21 09:40 0009
> -rw-------   1 postgres dba       262144 Dec 31 09:55 000A
> -rw-------   1 postgres dba       262144 Jan  4 21:17 000B
> -rw-------   1 postgres dba       262144 Jan  6 10:50 000C
> -rw-------   1 postgres dba       262144 Jan  7 18:20 000D
> -rw-------   1 postgres dba       262144 Jan 13 13:55 000E
> -rw-------   1 postgres dba       262144 Jan 15 11:55 000F
> -rw-------   1 postgres dba       262144 Jan 22 07:50 0010
> -rw-------   1 postgres dba       262144 Jan 26 16:35 0011
> -rw-------   1 postgres dba       262144 Jan 29 10:16 0012
> -rw-------   1 postgres dba       262144 Feb  3 13:17 0013
> -rw-------   1 postgres dba       262144 Feb  3 16:13 0014
> -rw-------   1 postgres dba       262144 Feb  4 08:24 0015
> -rw-------   1 postgres dba       262144 Feb  5 13:20 0016
> -rw-------   1 postgres dba       262144 Feb  8 11:26 0017
> -rw-------   1 postgres dba       262144 Feb  8 11:46 0018
> -rw-------   1 postgres dba       262144 Feb  8 12:25 0019
> -rw-------   1 postgres dba       262144 Feb  8 13:19 001A
> -rw-------   1 postgres dba       262144 Feb  8 14:23 001B
> -rw-------   1 postgres dba       262144 Feb  8 15:32 001C
> -rw-------   1 postgres dba       262144 Feb  8 17:01 001D
> -rw-------   1 postgres dba       262144 Feb  8 19:19 001E
> -rw-------   1 postgres dba       262144 Feb  8 22:11 001F
> -rw-------   1 postgres dba       262144 Feb  9 01:44 0020
> -rw-------   1 postgres dba       262144 Feb  9 05:57 0021
> -rw-------   1 postgres dba       262144 Feb  9 10:45 0022
> -rw-------   1 postgres dba        98304 Feb 10 13:35 0023
>
> the members directory has 15723 files:
> ls -l|wc -l
>    15723

Maybe you didn't vacuum freeze your template0 database and it's
probably holding back the oldest multixact ID -- take a look at the
datminmxid column in pg_database.
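
For example, something like this should show whether template0 (or
some other database) is the one lagging behind:

  -- the cluster-wide oldest multixact ID can only advance to the
  -- minimum datminmxid across all databases
  SELECT datname, datminmxid, datallowconn
  FROM pg_database
  ORDER BY datname;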

You seem to be generating fairly large multixacts, and they use disk
space until the cluster-wide oldest multixact ID is advanced by vacuum
freeze.  You can estimate the number of multixacts and members, and
from those the average number of members per multixact, like this
(assuming the default page size):

number of members = number of member segment files * 1636 * 32
number of multixacts = number of offsets segment files * 2048 * 32

You have about 2.3 million multixacts and about 823 million members,
so your multixacts have an average of around 355 members each.
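
That arithmetic, using the 36 offsets segments and 15723 members
segments from your listing plus the counters from pg_controldata,
e.g. in psql:

  -- estimates from the segment file counts (default 8kB page size)
  SELECT 15723 * 1636 * 32 AS approx_members,      -- ~823 million
         36 * 2048 * 32    AS approx_multixacts;   -- ~2.36 million
  -- average members per multixact, from the pg_controldata counters
  SELECT 823062151.0 / 2316566 AS members_per_multixact;  -- ~355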

There are two ways for autovacuum to be automatically triggered to
clean up old multixact state:

1.  When you have more than vacuum_multixact_freeze_table_age (default
150 million) multixacts, it will perform a wraparound vacuum to
advance the cluster's oldest multixact ID.  The oldest is 1 for you,
meaning this has probably never happened, and since you're only up to
around 2.3 million it's not going to happen for a long time with the
default setting.  (You can check the current thresholds with the SHOW
commands below these two points.)

2.  When it thinks that the member address space is in danger of
wrapping around, it does the same; this kicks in before the above for
clusters with a large average multixact size.  It happens once
MULTIXACT_MEMBER_SAFE_THRESHOLD is exceeded, which is at around 2.1
billion members, and you're not quite half way there yet.
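
You can see the relevant settings with SHOW:

  SHOW vacuum_multixact_freeze_table_age;     -- default 150000000
  SHOW autovacuum_multixact_freeze_max_age;   -- related autovacuum
                                              -- cutoff, default 400000000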

I'm not sure what the best course of action is for you, but here are
some thoughts:

1.  You could manually freeze all databases with vacuumdb --freeze
--all, but this requires first setting datallowconn to true for your
template0 database, because otherwise it would not be vacuum-frozen by
any command you can issue manually, and would therefore prevent the
oldest multixact ID from advancing.  (See below for a sketch of the
steps.)

2.  You could change vacuum_multixact_freeze_table_age to a value
that will cause vacuum freezes to happen more often.  Assuming your
workload continues with the same average multixact size, you could
work backwards from the numbers above: since 2.3 million multixacts
occupy 4GB, setting vacuum_multixact_freeze_table_age = 500000 should
keep the size under 1GB.  But note that the freeze operations can be
I/O intensive and take a long time depending on your database size,
so you don't want them running too often.  (One way to change the
setting is shown below.)

3.  You could do nothing and wait for autovacuum to detect that you
are using more than half the member address space and trigger a
freeze, which will happen some time after you have around 41k member
segments (occupying around 10GB of disk, since they hold 256kB each).

4.  You could look into whether all those multixacts with many
members are really expected.  (Large numbers of concurrent FK checks
or explicit share locks on the same rows, perhaps?  I see that a good
proportion of this happened on one day this week.)
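
Roughly, option 1 would look like this (untested sketch; run as a
superuser, and note that VACUUM FREEZE has to be run while connected
to each database in turn, which is what vacuumdb --freeze --all does
for you once template0 accepts connections):

  -- temporarily allow connections to template0
  UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';

  -- then, connected to each database in the cluster (including
  -- template0 and template1), run:
  VACUUM FREEZE;

  -- finally, put template0 back the way it was
  UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';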
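
And for option 2, one way to change the setting cluster-wide (9.4 has
ALTER SYSTEM):

  -- lower the freeze threshold so wraparound vacuums come sooner
  ALTER SYSTEM SET vacuum_multixact_freeze_table_age = 500000;
  SELECT pg_reload_conf();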

--
Thomas Munro
http://www.enterprisedb.com

