Thread: Assertion failure twophase.c (2) (testing HS/SR)
in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30 With three patches: new_smart_shutdown_20100201.patch extend_format_of_recovery_info_funcs_v4.20100303.patch fix-KnownAssignedXidsRemoveMany-1.patch pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary FailedAssertion, File: "twophase.c", Line: 1201. The standby was restarted and seems to catch up OK again. LOG: database system was interrupted; last known up at 2010-03-04 01:35:23 CET cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000001': No such file or directory FATAL: the database system is starting up LOG: entering standby mode LOG: redo starts at 0/1000020 LOG: consistent recovery state reached at 0/2000000 LOG: database system is ready to accept read only connections ERROR: cannot execute CREATE TABLE in a read-only transaction STATEMENT: create table t (c text); ERROR: cannot execute SELECT INTO in a read-only transaction STATEMENT: create table t as select 1; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; ERROR: cannot execute TRUNCATE TABLE in a read-only transaction STATEMENT: truncate wal; FATAL: database "ms" does not exist TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201) LOG: startup process (PID 18107) was terminated by signal 6: Aborted LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. LOG: database system was interrupted while in recovery at log time 2010-03-04 05:00:24 CET HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000001C0000007F': No such file or directory LOG: entering standby mode LOG: redo starts at 1C/7800F8E0 LOG: consistent recovery state reached at 1C/ADB9C178 LOG: database system is ready to accept read only connections The ERRORs (and FATAL) were accidentally issued commands; I can't tell if they were causing the assertion. (database 'ms' indeed was not present on this instance) see also: http://archives.postgresql.org/pgsql-hackers/2010-02/msg02221.php thanks, Erik Rijkers
Erik Rijkers wrote: > in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30 > > With three patches: > > new_smart_shutdown_20100201.patch > extend_format_of_recovery_info_funcs_v4.20100303.patch Got a link to these two patches? I couldn't find them with a quick search. > fix-KnownAssignedXidsRemoveMany-1.patch > > pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary > > FailedAssertion, File: "twophase.c", Line: 1201. > > The standby was restarted and seems to catch up OK again. > ... > see also: > http://archives.postgresql.org/pgsql-hackers/2010-02/msg02221.php I'm still not any wiser on what's causing that, but I've fixed the bug in KnownAssignedXidsMany() now. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Mar 11, 2010 at 6:29 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Erik Rijkers wrote: >> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30 >> >> With three patches: >> >> new_smart_shutdown_20100201.patch http://archives.postgresql.org/pgsql-hackers/2010-01/msg03116.php >> extend_format_of_recovery_info_funcs_v4.20100303.patch http://archives.postgresql.org/pgsql-hackers/2010-03/msg00175.php > Got a link to these two patches? I couldn't find them with a quick search. For your convenience, I attached those patches in this post. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, 2010-03-11 at 11:29 +0200, Heikki Linnakangas wrote: > I'm still not any wiser on what's causing that, but I've fixed the bug > in KnownAssignedXidsMany() now. Yeh, I've been mulling this over for a few days now and I can't see a way that could have happened either. I agree with your fix and the stronger placement of the Assertion. Thanks. I will be doing some further investigation in that area as well, over next week or so. -- Simon Riggs www.2ndQuadrant.com
On Thu, March 4, 2010 17:00, Erik Rijkers wrote: > in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30 > > With three patches: > > new_smart_shutdown_20100201.patch > extend_format_of_recovery_info_funcs_v4.20100303.patch > fix-KnownAssignedXidsRemoveMany-1.patch > > pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary > > FailedAssertion, File: "twophase.c", Line: 1201. > For the record, this still happens (FailedAssertion, File: "twophase.c", Line: 1201.) (created 2010.03.13 23:49 cvs). Unfortunately, it does not happen always, or predictably. patches:new_smart_shutdown_20100201.patchextend_format_of_recovery_info_funcs_v4.20100303.patch(both here: http://archives.postgresql.org/pgsql-hackers/2010-03/msg00446.php) (fix-KnownAssignedXidsRemoveMany-1.patch has been committed, I think?) I use commandlines like this to copy schemas across from 8.4.2 to 9.0devel: pg_dump -c -h /tmp -p 5432 -n myschema --no-owner --no-privileges mydb \ | psql -1qtA -h /tmp -p 7575 -d replicas (the copied schemas were together 175 GB) As I seem to be the only one who finds this, I started looking what could be unique in this install: and it would be postbio, which we use for its gist-indexing on ranges (http://pgfoundry.org/projects/postbio/). We use postbio's int_interval type as a column type. But keep in mind that sometimes the whole dump+restore+replication completes OK. Other installed modules are: contrib/btree_gist contrib/seg contrib/adminpack log_line_prefix = '%t %p %d %u start=%s ' # slave pgsql.sr_hotslave/logfile: 2010-03-13 23:54:59 CET 15765 start=2010-03-13 23:54:59 CET LOG: database system was interrupted; last known up at 2010-03-13 23:54:31 CET cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000001': No such file or directory 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: entering standby mode 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: redo starts at 0/1000020 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: consistent recovery state reached at 0/2000000 2010-03-13 23:55:00 CET 15763 start=2010-03-13 23:54:59 CET LOG: database system is ready to accept read only connections TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201) 2010-03-14 05:28:59 CET 15763 start=2010-03-13 23:54:59 CET LOG: startup process (PID 15765) was terminated by signal 6: Aborted 2010-03-14 05:28:59 CET 15763 start=2010-03-13 23:54:59 CET LOG: terminating any other active server processes Maybe I'll try now to setup a similar instance without postbio, to see if the crash still occurs. hth, Erik Rijkers
Can you still reproduce this or has some of the changes since then fixed it? We never quite figured out the cause.. Erik Rijkers wrote: > On Thu, March 4, 2010 17:00, Erik Rijkers wrote: >> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30 >> >> With three patches: >> >> new_smart_shutdown_20100201.patch >> extend_format_of_recovery_info_funcs_v4.20100303.patch >> fix-KnownAssignedXidsRemoveMany-1.patch >> >> pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary >> >> FailedAssertion, File: "twophase.c", Line: 1201. >> > > For the record, this still happens (FailedAssertion, File: "twophase.c", Line: 1201.) > (created 2010.03.13 23:49 cvs). > > Unfortunately, it does not happen always, or predictably. > > patches: > new_smart_shutdown_20100201.patch > extend_format_of_recovery_info_funcs_v4.20100303.patch > (both here: http://archives.postgresql.org/pgsql-hackers/2010-03/msg00446.php ) > > (fix-KnownAssignedXidsRemoveMany-1.patch has been committed, I think?) > > > I use commandlines like this to copy schemas across from 8.4.2 to 9.0devel: > pg_dump -c -h /tmp -p 5432 -n myschema --no-owner --no-privileges mydb \ > | psql -1qtA -h /tmp -p 7575 -d replicas > > (the copied schemas were together 175 GB) > > As I seem to be the only one who finds this, I started looking what could be unique in this > install: and it would be postbio, which we use for its gist-indexing on ranges > (http://pgfoundry.org/projects/postbio/). We use postbio's int_interval type as a column type. > But keep in mind that sometimes the whole dump+restore+replication completes OK. > > > Other installed modules are: > contrib/btree_gist > contrib/seg > contrib/adminpack > > log_line_prefix = '%t %p %d %u start=%s ' # slave > > pgsql.sr_hotslave/logfile: > > 2010-03-13 23:54:59 CET 15765 start=2010-03-13 23:54:59 CET LOG: database system was > interrupted; last known up at 2010-03-13 23:54:31 CET > cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000001': > No such file or directory > 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: entering standby mode > 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: redo starts at 0/1000020 > 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: consistent recovery state > reached at 0/2000000 > 2010-03-13 23:55:00 CET 15763 start=2010-03-13 23:54:59 CET LOG: database system is ready to > accept read only connections > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201) > 2010-03-14 05:28:59 CET 15763 start=2010-03-13 23:54:59 CET LOG: startup process (PID 15765) > was terminated by signal 6: Aborted > 2010-03-14 05:28:59 CET 15763 start=2010-03-13 23:54:59 CET LOG: terminating any other active > server processes > > > Maybe I'll try now to setup a similar instance without postbio, to see if the crash still occurs. > > hth, > > Erik Rijkers > > > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, April 22, 2010 09:53, Heikki Linnakangas wrote: > Can you still reproduce this or has some of the changes since then fixed > it? We never quite figured out the cause.. I don't know for sure: >> Unfortunately, it does not happen always, or predictably. The only thing that I established after that email was sent, is that the error can also occur without the postbio package being been installed (this has happened once). It's a very easy test; I will probably run it a few more times. > Erik Rijkers wrote: >> On Thu, March 4, 2010 17:00, Erik Rijkers wrote: >>> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30 >>> >>> With three patches: >>> >>> new_smart_shutdown_20100201.patch >>> extend_format_of_recovery_info_funcs_v4.20100303.patch >>> fix-KnownAssignedXidsRemoveMany-1.patch >>> >>> pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary >>> >>> FailedAssertion, File: "twophase.c", Line: 1201. >>> >> >> For the record, this still happens (FailedAssertion, File: "twophase.c", Line: 1201.) >> (created 2010.03.13 23:49 cvs). >> >> Unfortunately, it does not happen always, or predictably. >> >> patches: >> new_smart_shutdown_20100201.patch >> extend_format_of_recovery_info_funcs_v4.20100303.patch >> (both here: http://archives.postgresql.org/pgsql-hackers/2010-03/msg00446.php ) >> >> (fix-KnownAssignedXidsRemoveMany-1.patch has been committed, I think?) >> >> >> I use commandlines like this to copy schemas across from 8.4.2 to 9.0devel: >> pg_dump -c -h /tmp -p 5432 -n myschema --no-owner --no-privileges mydb \ >> | psql -1qtA -h /tmp -p 7575 -d replicas >> >> (the copied schemas were together 175 GB) >> >> As I seem to be the only one who finds this, I started looking what could be unique in this >> install: and it would be postbio, which we use for its gist-indexing on ranges >> (http://pgfoundry.org/projects/postbio/). We use postbio's int_interval type as a column type. >> But keep in mind that sometimes the whole dump+restore+replication completes OK. >> >> >> Other installed modules are: >> contrib/btree_gist >> contrib/seg >> contrib/adminpack >> >> log_line_prefix = '%t %p %d %u start=%s ' # slave >> >> pgsql.sr_hotslave/logfile: >> >> 2010-03-13 23:54:59 CET 15765 start=2010-03-13 23:54:59 CET LOG: database system was >> interrupted; last known up at 2010-03-13 23:54:31 CET >> cp: cannot stat >> `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000001': >> No such file or directory >> 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: entering standby mode >> 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: redo starts at 0/1000020 >> 2010-03-13 23:55:00 CET 15765 start=2010-03-13 23:54:59 CET LOG: consistent recovery state >> reached at 0/2000000 >> 2010-03-13 23:55:00 CET 15763 start=2010-03-13 23:54:59 CET LOG: database system is ready to >> accept read only connections >> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201) >> 2010-03-14 05:28:59 CET 15763 start=2010-03-13 23:54:59 CET LOG: startup process (PID 15765) >> was terminated by signal 6: Aborted >> 2010-03-14 05:28:59 CET 15763 start=2010-03-13 23:54:59 CET LOG: terminating any other active >> server processes >> >> >> Maybe I'll try now to setup a similar instance without postbio, to see if the crash still >> occurs. >> >> hth, >> >> Erik Rijkers >> >> >> > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com >
On Fri, 2010-04-23 at 03:08 +0200, Erik Rijkers wrote: > It's a very easy test; I will probably run it a few more times. Please share details of your system and hardware. -- Simon Riggs www.2ndQuadrant.com