Thread: Should a DB vacuum use up a lot of space ?
Hi all,
We’ve got a 16TB database that’s run up against the wrap-around tx id issue (despite running auto-vacuum):
ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
HINT: Stop the postmaster and use a standalone backend to vacuum that database.
The message says to VACUUM the whole database, so we’ve dropped down to single-user mode and issued a “VACUUM VERBOSE” command on the backend.
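For reference, the invocation was along these lines (the data directory path here is a placeholder for our setup):

    $ postgres --single -D /path/to/data public
    backend> VACUUM VERBOSE;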
1) Unlike other times I’ve run vacuum verbose, I’m not getting any logging message at all (for the past 9 hours)
2) File system usage is going steadily up (3TB over the past 9 hours). Why is that ? I didn’t expect additional disk usage and we’re going to run out of disk space if this keeps up.
3) Was a database-wide vacuum required or could I have just figured out which tables needed to be vacuumed and vacuumed those (my understanding was no), for example using http://dba.stackexchange.com/a/35234 and a query such as:

    SELECT c.oid::regclass as table_name,
           greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age
    FROM pg_class c
    LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
    WHERE c.relkind IN ('r', 'm')
    order by 2 desc limit 10;
4) Assuming I could have just identified individual tables that required vacuuming, can I CTRL-C the current vacuum process without running into issues ?
Thanks
Philippe Girolami
On 08/05/2016 11:57 PM, Philippe Girolami wrote:
> Hi all,
>
> We’ve got a 16TB database that’s run up against the wrap-around tx id issue (despite running auto-vacuum):

What Postgres version?

> ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> HINT: Stop the postmaster and use a standalone backend to vacuum that database.
>
> The message says to VACUUM the whole database, so we’ve dropped down to single-user mode and issued a “VACUUM VERBOSE” command on the backend.
>
> 1) Unlike other times I’ve run vacuum verbose, I’m not getting any logging message at all (for the past 9 hours)

In single user mode you need to specify the log file on the command line:

https://www.postgresql.org/docs/9.5/static/app-postgres.html

-r filename
    Send all server log output to filename. This option is only honored when supplied as a command-line option.

The file will show up in the $DATA directory.

> 2) File system usage is going steadily up (3TB over the past 9 hours). Why is that ? I didn’t expect additional disk usage and we’re going to run out of disk space if this keeps up.

What was the exact command you gave?

> 3) Was a database-wide vacuum required or could I have just figured out which tables needed to be vacuumed and vacuumed those (my understanding was no), for example using http://dba.stackexchange.com/a/35234 and a query such as
> SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') order by 2 desc limit 10;

See here:
https://www.postgresql.org/docs/9.5/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

> 4) Assuming I could have just identified individual tables that required vacuuming, can I CTRL-C the current vacuum process without running into issues ?

Running a test on a small database here, a Ctrl-D in the single user session closed it down successfully while running a VACUUM. Seems it comes down to which is the bigger issue, running out of disk space or having to redo the VACUUM.

> Thanks
> Philippe Girolami

--
Adrian Klaver
adrian.klaver@aklaver.com
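Concretely, Adrian's -r suggestion amounts to adding the option to the standalone invocation, presumably something like this (data directory path is again a placeholder; the log file ends up under $DATA):

    $ postgres --single -D /path/to/data -r vacuum.log public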
Philippe Girolami <philippe.girolami@mosaik.com> writes:
> We’ve got a 16TB database that’s run up against the wrap-around tx id issue (despite running auto-vacuum):
> ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> The message says to VACUUM the whole database, so we’ve dropped down to single-user mode and issued a “VACUUM VERBOSE” command on the backend.
> 2) File system usage is going steadily up (3TB over the past 9 hours). Why is that ?

Probably because there's no background process to issue auto-checkpoints and thereby recover WAL space. I'd try doing a control-C on the VACUUM, then a CHECKPOINT, then start over. You might have to vacuum large tables individually and checkpoint after each one.

regards, tom lane
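In the standalone backend, Tom's per-table suggestion would look roughly like this (the table names are stand-ins, not from the thread):

    backend> VACUUM VERBOSE some_big_table;     -- hypothetical table name
    backend> CHECKPOINT;                        -- allows the WAL accumulated by the vacuum to be recycled
    backend> VACUUM VERBOSE another_big_table;  -- hypothetical table name
    backend> CHECKPOINT;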
Thanks to Tom & Adrian, here’s what happened (my version was 9.1, sorry I forgot to mention it):

1) 10 hours after my email, the VACUUM had used up about 3.5TB but had stopped using up more disk space; it was now “simply” reading data from the file system
2) I attempted to interrupt using CTRL-D to no avail, so I interrupted with CTRL-C. That stopped it with a clean message (but did not relinquish filesystem space)
3) I exited the backend successfully using CTRL-D and relaunched it with the additional “-r” command line argument
4) I ran the query to see which tables were the “oldest” and did not recognize the ones from before I started the vacuuming (encouraging!)
5) I ran CHECKPOINT on the backend and got all the disk space back
6) I realized that the message regarding wraparound was no longer an ERROR but a WARNING, so I was able to restart postgres “normally”
7) I ran a query based on my previous query to build VACUUM VERBOSE commands on the tables with the oldest transaction ids, wrote it to a text file and then executed that file. I now have tens of millions of transactions back and can restart my server. I’ll do the rest of the VACUUM maintenance during low-load periods.

Cheers
Philippe
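For step 7, the generating query presumably looked something like this (the LIMIT and the output destination are whatever suits; this mirrors the COPY statement that shows up in the logs later in the thread):

    COPY (
      SELECT 'VACUUM VERBOSE ' || c.oid::regclass || ';--' as _command,  -- ';--' so the age value that follows becomes a comment when the file is replayed
             greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age
      FROM pg_class c
      LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
      WHERE c.relkind IN ('r', 'm')
      ORDER BY 2 DESC
      LIMIT 1000
    ) TO STDOUT (FORMAT text);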
On 08/06/2016 12:01 PM, Philippe Girolami wrote:
> Thanks to Tom & Adrian, here’s what happened (my version was 9.1, sorry I forgot to mention it):
>
> 1) 10 hours after my email, the VACUUM had used up about 3.5TB but had stopped using up more disk space; it was now “simply” reading data from the file system
> 2) I attempted to interrupt using CTRL-D to no avail, so I interrupted with CTRL-C. That stopped it with a clean message (but did not relinquish filesystem space)
> 3) I exited the backend successfully using CTRL-D and relaunched it with the additional “-r” command line argument
> 4) I ran the query to see which tables were the “oldest” and did not recognize the ones from before I started the vacuuming (encouraging!)
> 5) I ran CHECKPOINT on the backend and got all the disk space back
> 6) I realized that the message regarding wraparound was no longer an ERROR but a WARNING, so I was able to restart postgres “normally”
> 7) I ran a query based on my previous query to build VACUUM VERBOSE commands on the tables with the oldest transaction ids, wrote it to a text file and then executed that file. I now have tens of millions of transactions back and can restart my server. I’ll do the rest of the VACUUM maintenance during low-load periods.

Thanks for the feedback, it is nice to 'close the loop' on an issue.

> Cheers
> Philippe

--
Adrian Klaver
adrian.klaver@aklaver.com
@Adrian, no problem ! I’m sure someone else will run into this and have the same questions; hopefully they’ll find the answers.

I am seeing something weird though (again, this is v9.1): after my database became usable again, I started getting the 10M warning on template0. So I made it connectable and ran VACUUM FREEZE on it and made it unconnectable again. That resolved the warning.

However, I see the “age” keeps increasing on that database as I ran queries on my own db. Yesterday the age was 32 and now it’s already 77933902.

Is that to be expected ? I didn’t expect it.

Thanks
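Making template0 connectable and then locking it down again is normally done by toggling pg_database.datallowconn as a superuser, presumably something like:

    UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';
    -- connect to template0 and run: VACUUM FREEZE;
    UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';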
On 08/07/2016 12:32 AM, Philippe Girolami wrote:
> @Adrian, no problem ! I’m sure someone else will run into this and have the same questions; hopefully they’ll find the answers.
>
> I am seeing something weird though (again, this is v9.1): after my database became usable again, I started getting the 10M warning on template0. So I made it connectable and ran VACUUM FREEZE on it and made it unconnectable again. That resolved the warning.
>
> However, I see the “age” keeps increasing on that database as I ran queries on my own db. Yesterday the age was 32 and now it’s already 77933902.

Just to be sure, you are talking about template0?

> Is that to be expected ? I didn’t expect it.

As I understand it:

1) xid's are global to the cluster.
2) age(xid) measures the difference between the latest global xid and whatever xid you supply it.
3) age(datfrozenxid) measures the difference between the minimum value for the table frozen ids in a particular database and the latest global xid.
4) template0 has a datfrozenxid so there is something for age(xid) to compute; it just does not mean anything as long as template0 is really a read-only database. In other words template0 is not actually contributing any transactions to the consumption of the global store of xids.

> Thanks

--
Adrian Klaver
adrian.klaver@aklaver.com
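Point 3 is easy to watch directly; the per-database numbers come straight from pg_database (this is the same query that appears later in the thread):

    SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;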
>> I am seeing something weird though (again, this is v9.1): after my database became usable again, I started getting the 10M warning on template0. So I made it connectable and ran VACUUM FREEZE on it and made it unconnectable again. That resolved the warning.
>>
>> However, I see the “age” keeps increasing on that database as I ran queries on my own db. Yesterday the age was 32 and now it’s already 77933902.

> Just to be sure, you are talking about template0?

Yes, I am

>> Is that to be expected ? I didn’t expect it.

> As I understand it:
>
> 1) xid's are global to the cluster.
> 2) age(xid) measures the difference between the latest global xid and whatever xid you supply it.
> 3) age(datfrozenxid) measures the difference between the minimum value for the table frozen ids in a particular database and the latest global xid.
> 4) template0 has a datfrozenxid so there is something for age(xid) to compute; it just does not mean anything as long as template0 is really a read-only database. In other words template0 is not actually contributing any transactions to the consumption of the global store of xids.

Yes, I understand. I’m just worried that, since I see the WARNING at the 100M mark, when it gets to the 1M mark on that database it will shut down the cluster.

More weirdness this afternoon: the wraparound ERROR showed up again even though I have trouble believing I burned through so many transactions in under a day. But let’s assume I did; here is what I noticed:

1) I vacuumed all the other databases. For every one of those, the age went down to 50M instead of zero. Is that normal ?
2) The only database it didn’t work on was template0 (the age did not change). It did work on template1

Should I suspect something fishy going on ?

Thanks
> More weirdness this afternoon: the wraparound ERROR showed up again even though I have trouble believing I burned through so many transactions in under a day. But let’s assume I did; here is what I noticed:
>
> 1) I vacuumed all the other databases. For every one of those, the age went down to 50M instead of zero. Is that normal ?
> 2) The only database it didn’t work on was template0 (the age did not change). It did work on template1
>
> Should I suspect something fishy going on ?
do you have any long running 'idle in transaction' sessions? these would show up in pg_stat_activity; you want to look at now()-xact_start to see the age of the oldest of these. no tuples newer than the oldest xact_start can be vacuumed.
select * from pg_stat_activity where state='idle in transaction';
select count(*), max(now()-xact_start) from pg_stat_activity where state='idle in transaction';
-- john r pierce, recycling bits in santa cruz
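Note that pg_stat_activity.state only exists in 9.2 and later; on the 9.1 server in this thread the equivalent check would presumably filter on current_query instead, something like:

    select count(*), max(now()-xact_start) from pg_stat_activity where current_query = '<IDLE> in transaction';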
On 08/07/2016 10:26 AM, Philippe Girolami wrote:
>>> I am seeing something weird though (again, this is v9.1): after my database became usable again, I started getting the 10M warning on template0. So I made it connectable and ran VACUUM FREEZE on it and made it unconnectable again. That resolved the warning.
>>>
>>> However, I see the “age” keeps increasing on that database as I ran queries on my own db. Yesterday the age was 32 and now it’s already 77933902.
>> Just to be sure, you are talking about template0?
> Yes, I am
>
>>> Is that to be expected ? I didn’t expect it.
>> As I understand it:
>>
>> 1) xid's are global to the cluster.
>> 2) age(xid) measures the difference between the latest global xid and whatever xid you supply it.
>> 3) age(datfrozenxid) measures the difference between the minimum value for the table frozen ids in a particular database and the latest global xid.
>> 4) template0 has a datfrozenxid so there is something for age(xid) to compute; it just does not mean anything as long as template0 is really a read-only database. In other words template0 is not actually contributing any transactions to the consumption of the global store of xids.
> Yes, I understand. I’m just worried that, since I see the WARNING at the 100M mark, when it gets to the 1M mark on that database it will shut down the cluster.
>
> More weirdness this afternoon: the wraparound ERROR showed up again even though I have trouble believing I burned through so many transactions in under a day. But let’s assume I did; here is what I noticed:
>
> 1) I vacuumed all the other databases. For every one of those, the age went down to 50M instead of zero. Is that normal ?
> 2) The only database it didn’t work on was template0 (the age did not change). It did work on template1

That is expected, as template0 is read-only and so VACUUM will not work on it.

> Should I suspect something fishy going on ?

Not sure without more information.

1) Can you be specific about your database references? 'That database' is open-ended.

2) Show the actual numbers from your xid queries. Both the raw values and the age() transformed ones.

3) What are your configuration parameters for the variables mentioned in the section below?:
https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

4) If you want to get an idea of how fast xid's are being created, a quick and dirty way is from here:
https://www.postgresql.org/docs/9.1/static/functions-info.html

txid_current() bigint get current transaction ID

Now if you do select txid_current() outside a transaction it will create an xid on its own; still, if you repeat it over some interval of time you will get an idea of how fast the server is going through xid's.

> Thanks

--
Adrian Klaver
adrian.klaver@aklaver.com
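A rough sketch of that measurement (the interval is arbitrary, and each call itself consumes one xid):

    SELECT txid_current();   -- note the value
    -- wait a fixed interval, e.g. 60 seconds, then:
    SELECT txid_current();   -- the difference divided by the interval approximates the xid consumption rate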
> That is expected, as template0 is read-only and so VACUUM will not work on it.

Isn’t template1 the same ? I’m not seeing that behavior on that one.

>> Should I suspect something fishy going on ?
> Not sure without more information.
> 1) Can you be specific about your database references? 'That database' is open-ended.

“That database” = the database that’s been causing wrap-around problems since yesterday. It’s called “public”

> 2) Show the actual numbers from your xid queries. Both the raw values and the age() transformed ones.

backend> SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
 1: datname (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "template1" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3814003766" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "50000394" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3814003765" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "50000395" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "public" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "1717520404" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "2146483756" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "xxxx" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3814003760" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "50000400" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "osmtest" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3814003762" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "50000398" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "template0" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3732096533" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "131907627" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "drupal_prod" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3814003758" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "50000402" (typeid = 23, len = 4, typmod = -1, byval = t)
----
 1: datname = "anta" (typeid = 19, len = 64, typmod = -1, byval = f)
 2: datfrozenxid = "3814003756" (typeid = 28, len = 4, typmod = -1, byval = t)
 3: age = "50000404" (typeid = 23, len = 4, typmod = -1, byval = t)
----

> 3) What are your configuration parameters for the variables mentioned in the section below?:
> https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Should be the default values, I’ve never changed them.

backend> show vacuum_freeze_min_age
 1: vacuum_freeze_min_age (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: vacuum_freeze_min_age = "50000000" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show vacuum_freeze_table_age
 1: vacuum_freeze_table_age (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: vacuum_freeze_table_age = "150000000" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show autovacuum_freeze_max_age
 1: autovacuum_freeze_max_age (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: autovacuum_freeze_max_age = "200000000" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show autovacuum_vacuum_threshold
 1: autovacuum_vacuum_threshold (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: autovacuum_vacuum_threshold = "50" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show autovacuum_max_workers
 1: autovacuum_max_workers (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: autovacuum_max_workers = "3" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show autovacuum_vacuum_scale_factor
 1: autovacuum_vacuum_scale_factor (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: autovacuum_vacuum_scale_factor = "0.2" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show autovacuum_vacuum_cost_delay
 1: autovacuum_vacuum_cost_delay (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: autovacuum_vacuum_cost_delay = "20ms" (typeid = 25, len = -1, typmod = -1, byval = f)

backend> show autovacuum_vacuum_cost_limit
 1: autovacuum_vacuum_cost_limit (typeid = 25, len = -1, typmod = -1, byval = f)
----
 1: autovacuum_vacuum_cost_limit = "-1" (typeid = 25, len = -1, typmod = -1, byval = f)

> 4) If you want to get an idea of how fast xid's are being created, a quick and dirty way is from here:
> https://www.postgresql.org/docs/9.1/static/functions-info.html
> txid_current() bigint get current transaction ID
> Now if you do select txid_current() outside a transaction it will create an xid on its own; still, if you repeat it over some interval of time you will get an idea of how fast the server is going through xid's.

Well I’m now at a point where that’s not even possible; I have consistently run into the following:
• I vacuum enough tables to get back a couple dozen transactions below the 1M mark
• I exit single-user mode and relaunch the server so I can keep on vacuuming the oldest tables first by copying
• But I don’t even have time to launch my script: as soon as I launch the server, the auto-vacuum daemon kicks in and burns through the transactions. I guess it’s not smart enough to start with the oldest tables ?

So it looks like I’m going to have to bite the bullet and really vacuum the whole database instead of just the oldest tables first, which will impact our production pipelines.
On 08/07/2016 02:55 PM, Philippe Girolami wrote:
>> That is expected, as template0 is read-only and so VACUUM will not work on it.
> Isn’t template1 the same ? I’m not seeing that behavior on that one.
>
>>> Should I suspect something fishy going on ?
>> Not sure without more information.
>> 1) Can you be specific about your database references? 'That database' is open-ended.
> “That database” = the database that’s been causing wrap-around problems since yesterday. It’s called “public”
>
>> 2) Show the actual numbers from your xid queries. Both the raw values and the age() transformed ones.
> backend> SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
>  1: datname (typeid = 19, len = 64, typmod = -1, byval = f)
>  2: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
>  3: age (typeid = 23, len = 4, typmod = -1, byval = t)
> ----
>  1: datname = "template1"  2: datfrozenxid = "3814003766"  3: age = "50000394"
> ----
>  1: datname = "postgres"  2: datfrozenxid = "3814003765"  3: age = "50000395"
> ----
>  1: datname = "public"  2: datfrozenxid = "1717520404"  3: age = "2146483756"
> ----
>  1: datname = "xxxx"  2: datfrozenxid = "3814003760"  3: age = "50000400"
> ----
>  1: datname = "osmtest"  2: datfrozenxid = "3814003762"  3: age = "50000398"
> ----
>  1: datname = "template0"  2: datfrozenxid = "3732096533"  3: age = "131907627"
> ----
>  1: datname = "drupal_prod"  2: datfrozenxid = "3814003758"  3: age = "50000402"
> ----
>  1: datname = "anta"  2: datfrozenxid = "3814003756"  3: age = "50000404"
> ----
>
>> 3) What are your configuration parameters for the variables mentioned in the section below?:
>> https://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
> Should be the default values, I’ve never changed them.
>
> backend> show vacuum_freeze_min_age
>  1: vacuum_freeze_min_age = "50000000"
> backend> show vacuum_freeze_table_age
>  1: vacuum_freeze_table_age = "150000000"
> backend> show autovacuum_freeze_max_age
>  1: autovacuum_freeze_max_age = "200000000"
> backend> show autovacuum_vacuum_threshold
>  1: autovacuum_vacuum_threshold = "50"
> backend> show autovacuum_max_workers
>  1: autovacuum_max_workers = "3"
> backend> show autovacuum_vacuum_scale_factor
>  1: autovacuum_vacuum_scale_factor = "0.2"
> backend> show autovacuum_vacuum_cost_delay
>  1: autovacuum_vacuum_cost_delay = "20ms"
> backend> show autovacuum_vacuum_cost_limit
>  1: autovacuum_vacuum_cost_limit = "-1"
>
>> 4) If you want to get an idea of how fast xid's are being created, a quick and dirty way is from here:
>> https://www.postgresql.org/docs/9.1/static/functions-info.html
>> txid_current() bigint get current transaction ID
>> Now if you do select txid_current() outside a transaction it will create an xid on its own; still, if you repeat it over some interval of time you will get an idea of how fast the server is going through xid's.
> Well I’m now at a point where that’s not even possible; I have consistently run into the following:
> • I vacuum enough tables to get back a couple dozen transactions below the 1M mark

So you are VACUUMing the lesser 'younger' tables?

> • I exit single-user mode and relaunch the server so I can keep on vacuuming the oldest tables first by copying
> • But I don’t even have time to launch my script: as soon as I launch the server, the auto-vacuum daemon kicks in and burns through the transactions. I guess it’s not smart enough to start with the oldest tables ?

Not understanding; 'the auto-vacuum daemon kicks in and burns through the transactions'.
Are you saying it is reclaiming xids for you or using them?
If reclaiming, that is what it is supposed to do and is a good thing.
Or am I misunderstanding?

> So it looks like I’m going to have to bite the bullet and really vacuum the whole database instead of just the oldest tables first, which will impact our production pipelines.

Or VACUUM the most heavily used tables in the database.

The next question to be asked is; what is creating the transactions and is the transaction rate 'normal', or is there a possibility you have a rogue process or rogue processes in action?

--
Adrian Klaver
adrian.klaver@aklaver.com
> So you are VACUUMing the lesser 'younger' tables?

I VACUUM those with the highest age:

    SELECT age, array_agg(table_name)
    FROM (SELECT c.oid::regclass as table_name,
                 greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age
          FROM pg_class c
          LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
          WHERE c.relkind IN ('r', 'm')
          order by 2 desc limit 1000) tt
    group by age order by age desc

>> • I exit single-user mode and relaunch the server so I can keep on vacuuming the oldest tables first by copying
>> • But I don’t even have time to launch my script: as soon as I launch the server, the auto-vacuum daemon kicks in and burns through the transactions. I guess it’s not smart enough to start with the oldest tables ?
> Not understanding; 'the auto-vacuum daemon kicks in and burns through the transactions'.
> Are you saying it is reclaiming xids for you or using them?
> If reclaiming, that is what it is supposed to do and is a good thing.
> Or am I misunderstanding?

Here is what the logs show when I do what I described above.

1) I got 7 transactions back in single user mode

Aug 7 23:40:57 p2 postgres[30376]: [5-1] 2016-08-07 23:40:57 CEST WARNING: database "public" must be vacuumed within 999893 transactions
Aug 7 23:40:57 p2 postgres[30376]: [5-2] 2016-08-07 23:40:57 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.

2) I exit single user mode and restart the database

Aug 7 23:41:40 p2 postgres[15457]: [1-1] 2016-08-07 23:41:40 CEST LOG: database system was shut down at 2016-08-07 23:41:32 CEST
Aug 7 23:41:40 p2 postgres[15458]: [1-1] 2016-08-07 23:41:40 CEST LOG: incomplete startup packet
Aug 7 23:41:40 p2 postgres[15459]: [1-1] 2016-08-07 23:41:40 CEST FATAL: the database system is starting up
Aug 7 23:41:40 p2 postgres[15457]: [2-1] 2016-08-07 23:41:40 CEST WARNING: database with OID 16385 must be vacuumed within 999892 transactions
Aug 7 23:41:40 p2 postgres[15457]: [2-2] 2016-08-07 23:41:40 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
Aug 7 23:41:40 p2 postgres[15457]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:40 p2 postgres[15462]: [1-1] 2016-08-07 23:41:40 CEST LOG: autovacuum launcher started
Aug 7 23:41:40 p2 postgres[15447]: [1-1] 2016-08-07 23:41:40 CEST LOG: database system is ready to accept connections

3) but I don’t even have time to run a query to see the state of the databases

Aug 7 23:41:52 p2 postgres[15487]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:52 p2 postgres[15487]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:52 p2 postgres[15487]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:52 p2 postgres[15487]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: SELECT datname, age(datfrozenxid) FROM pg_database
Aug 7 23:41:52 p2 postgres[15497]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:52 p2 postgres[15497]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:52 p2 postgres[15497]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:52 p2 postgres[15497]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: COPY ( SELECT 'VACUUM VERBOSE ' || c.oid::regclass || ';--' as _command, greatest ( age ( c.relfrozenxid ) , age ( t.relfrozenxid ) ) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ( 'r' , 'm' ) order by 2 desc limit 1000 ) TO STDOUT (FORMAT text)
Aug 7 23:41:52 p2 postgres[15518]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:52 p2 postgres[15518]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:52 p2 postgres[15518]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:52 p2 postgres[15518]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: SELECT datname, age(datfrozenxid) FROM pg_database
Aug 7 23:41:53 p2 rsyslogd-2177: imuxsock lost 3512 messages from pid 15476 due to rate-limiting
Aug 7 23:41:53 p2 postgres[15476]: [2-1] 2016-08-07 23:41:53 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:53 p2 postgres[15476]: [2-2] 2016-08-07 23:41:53 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:53 p2 postgres[15476]: [2-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:53 p2 postgres[15476]: [2-4] 2016-08-07 23:41:53 CEST CONTEXT: automatic analyze of table "public.public.aaaaaa"

4) lots of autovacuum failures

Aug 7 23:41:55 p2 postgres[15476]: [3-1] 2016-08-07 23:41:55 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:55 p2 postgres[15476]: [3-2] 2016-08-07 23:41:55 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:55 p2 postgres[15476]: [3-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:55 p2 postgres[15476]: [3-4] 2016-08-07 23:41:55 CEST CONTEXT: automatic analyze of table "public.public.aaaaab"
Aug 7 23:41:56 p2 postgres[15476]: [4-1] 2016-08-07 23:41:56 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:56 p2 postgres[15476]: [4-2] 2016-08-07 23:41:56 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:56 p2 postgres[15476]: [4-3] #011You might also need to commit or roll back old prepared transactions.
Aug 7 23:41:56 p2 postgres[15476]: [4-4] 2016-08-07 23:41:56 CEST CONTEXT: automatic analyze of table "public.public.aaaaac"
Aug 7 23:41:56 p2 postgres[15476]: [5-1] 2016-08-07 23:41:56 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
Aug 7 23:41:56 p2 postgres[15476]: [5-2] 2016-08-07 23:41:56 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
Aug 7 23:41:56 p2 postgres[15476]: [5-3] #011You might also need to commit or roll back old prepared transactions.
(…)

>> So it looks like I’m going to have to bite the bullet and really vacuum the whole database instead of just the oldest tables first, which will impact our production pipelines.
> Or VACUUM the most heavily used tables in the database.

Why would that help ? Aren’t the tables with the highest age the problem ?

> The next question to be asked is; what is creating the transactions and is the transaction rate 'normal', or is there a possibility you have a rogue process or rogue processes in action?

That’s always a possibility but I can’t think of what that would be.
I was guessing that the auto-vacuum wasn’t vacuuming the tables with the highest age first. Isn’t that a possibility ?

Thanks
On 08/08/2016 12:08 AM, Philippe Girolami wrote:
>> So you are VACUUMing the lesser 'younger' tables?
> I VACUUM those with the highest age:
> SELECT age, array_agg(table_name) FROM (SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') order by 2 desc limit 1000) tt group by age order by age desc
>
>>> • I exit single-user mode and relaunch the server so I can keep on vacuuming the oldest tables first by copying
>>> • But I don’t even have time to launch my script: as soon as I launch the server, the auto-vacuum daemon kicks in and burns through the transactions. I guess it’s not smart enough to start with the oldest tables ?
>> Not understanding; 'the auto-vacuum daemon kicks in and burns through the transactions'.
>> Are you saying it is reclaiming xids for you or using them?
>> If reclaiming, that is what it is supposed to do and is a good thing.
>> Or am I misunderstanding?
> Here is what the logs show when I do what I described above.
>
> 1) I got 7 transactions back in single user mode
> Aug 7 23:40:57 p2 postgres[30376]: [5-1] 2016-08-07 23:40:57 CEST WARNING: database "public" must be vacuumed within 999893 transactions

So the above is from when you enter single user mode?

> Aug 7 23:40:57 p2 postgres[30376]: [5-2] 2016-08-07 23:40:57 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.

I am not seeing what you do in single user mode?

> 2) I exit single user mode and restart the database
> Aug 7 23:41:40 p2 postgres[15457]: [1-1] 2016-08-07 23:41:40 CEST LOG: database system was shut down at 2016-08-07 23:41:32 CEST
> Aug 7 23:41:40 p2 postgres[15458]: [1-1] 2016-08-07 23:41:40 CEST LOG: incomplete startup packet
> Aug 7 23:41:40 p2 postgres[15459]: [1-1] 2016-08-07 23:41:40 CEST FATAL: the database system is starting up
> Aug 7 23:41:40 p2 postgres[15457]: [2-1] 2016-08-07 23:41:40 CEST WARNING: database with OID 16385 must be vacuumed within 999892 transactions

So you actually lost a transaction.

> Aug 7 23:41:40 p2 postgres[15457]: [2-2] 2016-08-07 23:41:40 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
> Aug 7 23:41:40 p2 postgres[15457]: [2-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:40 p2 postgres[15462]: [1-1] 2016-08-07 23:41:40 CEST LOG: autovacuum launcher started
> Aug 7 23:41:40 p2 postgres[15447]: [1-1] 2016-08-07 23:41:40 CEST LOG: database system is ready to accept connections
>
> 3) but I don’t even have time to run a query to see the state of the databases
> Aug 7 23:41:52 p2 postgres[15487]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:52 p2 postgres[15487]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:52 p2 postgres[15487]: [2-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:52 p2 postgres[15487]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: SELECT datname, age(datfrozenxid) FROM pg_database
> Aug 7 23:41:52 p2 postgres[15497]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:52 p2 postgres[15497]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:52 p2 postgres[15497]: [2-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:52 p2 postgres[15497]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: COPY ( SELECT 'VACUUM VERBOSE ' || c.oid::regclass || ';--' as _command, greatest ( age ( c.relfrozenxid ) , age ( t.relfrozenxid ) ) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ( 'r' , 'm' ) order by 2 desc limit 1000 ) TO STDOUT (FORMAT text)
> Aug 7 23:41:52 p2 postgres[15518]: [2-1] 2016-08-07 23:41:52 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:52 p2 postgres[15518]: [2-2] 2016-08-07 23:41:52 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:52 p2 postgres[15518]: [2-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:52 p2 postgres[15518]: [2-4] 2016-08-07 23:41:52 CEST STATEMENT: SELECT datname, age(datfrozenxid) FROM pg_database
> Aug 7 23:41:53 p2 rsyslogd-2177: imuxsock lost 3512 messages from pid 15476 due to rate-limiting

Hmm, I wonder what pid 15476 (see below) is trying to do that is blowing out the logging?

> Aug 7 23:41:53 p2 postgres[15476]: [2-1] 2016-08-07 23:41:53 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:53 p2 postgres[15476]: [2-2] 2016-08-07 23:41:53 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:53 p2 postgres[15476]: [2-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:53 p2 postgres[15476]: [2-4] 2016-08-07 23:41:53 CEST CONTEXT: automatic analyze of table "public.public.aaaaaa"
>
> 4) lots of autovacuum failures

It is trying to analyze tables and being refused, as are all commands.

> Aug 7 23:41:55 p2 postgres[15476]: [3-1] 2016-08-07 23:41:55 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:55 p2 postgres[15476]: [3-2] 2016-08-07 23:41:55 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:55 p2 postgres[15476]: [3-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:55 p2 postgres[15476]: [3-4] 2016-08-07 23:41:55 CEST CONTEXT: automatic analyze of table "public.public.aaaaab"
> Aug 7 23:41:56 p2 postgres[15476]: [4-1] 2016-08-07 23:41:56 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:56 p2 postgres[15476]: [4-2] 2016-08-07 23:41:56 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:56 p2 postgres[15476]: [4-3] #011You might also need to commit or roll back old prepared transactions.
> Aug 7 23:41:56 p2 postgres[15476]: [4-4] 2016-08-07 23:41:56 CEST CONTEXT: automatic analyze of table "public.public.aaaaac"
> Aug 7 23:41:56 p2 postgres[15476]: [5-1] 2016-08-07 23:41:56 CEST ERROR: database is not accepting commands to avoid wraparound data loss in database "public"
> Aug 7 23:41:56 p2 postgres[15476]: [5-2] 2016-08-07 23:41:56 CEST HINT: Stop the postmaster and use a standalone backend to vacuum that database.
> Aug 7 23:41:56 p2 postgres[15476]: [5-3] #011You might also need to commit or roll back old prepared transactions.
> (…)
>
>>> So it looks like I’m going to have to bite the bullet and really vacuum the whole database instead of just the oldest tables first, which will impact our production pipelines.
>> Or VACUUM the most heavily used tables in the database.
> Why would that help ? Aren’t the tables with the highest age the problem ?

I think we are saying the same thing: the tables that have accumulated the most xid debt.

>> The next question to be asked is; what is creating the transactions and is the transaction rate 'normal', or is there a possibility you have a rogue process or rogue processes in action?
> That’s always a possibility but I can’t think of what that would be.
> I was guessing that the auto-vacuum wasn’t vacuuming the tables with the highest age first. Isn’t that a possibility ?

Looks to me like it never gets a chance in normal mode to get its work done. As mentioned before, you need to create some headroom to work with. The logical place to do that would be in single user mode, where other sessions cannot interfere.

> Thanks

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian,

All the logs posted were from syslog (that’s where postgres writes its log on our Ubuntu install).

>> 1) I got 7 transactions back in single user mode
>> Aug 7 23:40:57 p2 postgres[30376]: [5-1] 2016-08-07 23:40:57 CEST WARNING: database "public" must be vacuumed within 999893 transactions
> So the above is from when you enter single user mode?

Yes

>> Aug 7 23:40:57 p2 postgres[30376]: [5-2] 2016-08-07 23:40:57 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
> I am not seeing what you do in single user mode?

I ran

    SELECT age, array_agg(table_name)
    FROM (SELECT c.oid::regclass as table_name,
                 greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age
          FROM pg_class c
          LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
          WHERE c.relkind IN ('r', 'm')
          order by 2 desc limit 1000) tt
    group by age order by age desc

and then one VACUUM per table listed, until I got enough transactions back.

I ended up letting VACUUM run as long as needed in single-user mode (that took another 24h, in addition to the first 30h) and got the max age down to 147M, which is great. In the end, I was probably trying to optimize restoring service too much.

Anyway, thanks for the help and suggestions. The responses were super important for dealing with the disk usage and making sure interrupting the vacuum wouldn’t cause any problems !
On Mon, Aug 8, 2016 at 12:08 AM, Philippe Girolami <philippe.girolami@mosaik.com> wrote:
>> Not understanding; 'the auto-vacuum daemon kicks in and burns through the transactions'.
>> Are you saying it is reclaiming xids for you or using them?
>> If reclaiming, that is what it is supposed to do and is a good thing.
>> Or am I misunderstanding?
> Here is what the logs show when I do what I described above.
>
> 1) I got 7 transactions back in single user mode
> Aug 7 23:40:57 p2 postgres[30376]: [5-1] 2016-08-07 23:40:57 CEST WARNING: database "public" must be vacuumed within 999893 transactions
> Aug 7 23:40:57 p2 postgres[30376]: [5-2] 2016-08-07 23:40:57 CEST HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.

What do you mean by getting 7 back? 'Back' compared to what? You had 999886 before vacuuming that table?

> 2) I exit single user mode and restart the database

999893 is not the number left before hitting the ordinary stop limit, it is the number left before your database becomes corrupted. 999893 is still past the ordinary stop limit (1,000,000), so there is no point taking it out of single user mode at this point, as it would already be in shutdown at the time it starts.

> 4) lots of autovacuum failures

9.1.23 (to be released tomorrow-ish) does include a change that might possibly have helped.

>> The next question to be asked is; what is creating the transactions and is the transaction rate 'normal', or is there a possibility you have a rogue process or rogue processes in action?
> That’s always a possibility but I can’t think of what that would be.

I think you are just misinterpreting what the reported number means, and there is no mysterious transaction consumption going on at all.

Cheers,
Jeff