Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss" - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
Date
Msg-id CA+TgmoYH26szcTjQS5zKN5D8AMGPyQsD2kPFZcrPUJgy70DbkA@mail.gmail.com
Whole thread Raw
In response to Cluster "stuck" in "not accepting commands to avoid wraparound data loss"  (Andres Freund <andres@anarazel.de>)
Responses Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Dec 10, 2015 at 4:55 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> I recently started a pgbench benchmark (to evaluate a piece of hardware,
> not postgres) with master. Unfortunately, by accident, I started
> postgres in a shell, not screen like pgbench.
>
> Just logged back in and saw:
> client 71 aborted in state 8: ERROR:  database is not accepting commands to avoid wraparound data loss in database
"postgres"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
> transaction type: TPC-B (sort of)
> scaling factor: 300
> query mode: prepared
> number of clients: 97
> number of threads: 97
> duration: 300000 s
> number of transactions actually processed: 2566862424
> latency average: 3.214 ms
> latency stddev: 7.336 ms
> tps = 30169.374133 (including connections establishing)
> tps = 30169.378406 (excluding connections establishing)
>
> Hm. Bad news. We apparently didn't keep up vacuuming. But worse news is
> that even now, days later, autovacuum hasn't progressed:
> postgres=# select txid_current();
> ERROR:  database is not accepting commands to avoid wraparound data loss in database "postgres"
> HINT:  Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
>
> Looking at datfrozenxid:
> postgres=# select datname, datfrozenxid, age(datfrozenxid) FROM pg_database ;
>   datname  | datfrozenxid |    age
> -----------+--------------+-----------
>  template1 |   3357685367 |         0
>  template0 |   3357685367 |         0
>  postgres  |   3159867733 | 197817634
> (3 rows)
> reveals that the launcher doesn't do squat because it doesn't think it
> needs to do anything.
>
> (gdb) p *ShmemVariableCache
> $3 = {nextOid = 24576, oidCount = 0, nextXid = 3357685367, oldestXid = 1211201715, xidVacLimit = 1411201715,
xidWarnLimit= 3347685362,
 
>   xidStopLimit = 3357685362, xidWrapLimit = 3358685362, oldestXidDB = 12380, oldestCommitTs = 0, newestCommitTs = 0,
>   latestCompletedXid = 3357685366}
>
> 'oldestXid' shows the problem: We're indeed pretty short before a
> wraparound.

Is the postmaster in a "stopped" state (T)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Next
From: Andres Freund
Date:
Subject: Re: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"