Thread: The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple"
The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple"
From
hailong Li
Date:
Hi, dear pgsql-general
The details are as follows:
1. environment
DB Master
$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m
$ uname -av
Linux l-xxxxx1.xx.cnx 3.14.29-3.centos6.x86_64 #1 SMP Tue Jan 20 17:48:32 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
$ psql -U postgres
psql (9.3.5)
Type "help" for help.
postgres=# select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)
$ pg_config
BINDIR = /opt/pg93/bin
DOCDIR = /opt/pg93/share/doc/postgresql
HTMLDIR = /opt/pg93/share/doc/postgresql
INCLUDEDIR = /opt/pg93/include
PKGINCLUDEDIR = /opt/pg93/include/postgresql
INCLUDEDIR-SERVER = /opt/pg93/include/postgresql/server
LIBDIR = /opt/pg93/lib
PKGLIBDIR = /opt/pg93/lib/postgresql
LOCALEDIR = /opt/pg93/share/locale
MANDIR = /opt/pg93/share/man
SHAREDIR = /opt/pg93/share/postgresql
SYSCONFDIR = /opt/pg93/etc/postgresql
PGXS = /opt/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg93' '--with-perl' '--with-libxml' '--with-libxslt' '--with-ossp-uuid' 'CFLAGS= -march=core2 -O2 '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -march=core2 -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed -Wl,-rpath,'/opt/pg93/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lxslt -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.5
DB Slave
$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m
$ uname -av
Linux l-xxxx2.xx.cnx 3.14.31-3.centos6.x86_64 #1 SMP Mon Feb 2 15:26:04 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
$ psql -U postgres
psql (9.3.5)
Type "help" for help.
postgres=# select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)
postgres=# show log_line_prefix ;
log_line_prefix
------------------------------
[%u %d %a %h %m %p %c %l %x]
(1 row)
$ pg_config
BINDIR = /opt/pg93/bin
DOCDIR = /opt/pg93/share/doc/postgresql
HTMLDIR = /opt/pg93/share/doc/postgresql
INCLUDEDIR = /opt/pg93/include
PKGINCLUDEDIR = /opt/pg93/include/postgresql
INCLUDEDIR-SERVER = /opt/pg93/include/postgresql/server
LIBDIR = /opt/pg93/lib
PKGLIBDIR = /opt/pg93/lib/postgresql
LOCALEDIR = /opt/pg93/share/locale
MANDIR = /opt/pg93/share/man
SHAREDIR = /opt/pg93/share/postgresql
SYSCONFDIR = /opt/pg93/etc/postgresql
PGXS = /opt/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg93' '--with-perl' '--with-libxml' '--with-libxslt' '--with-ossp-uuid' 'CFLAGS= -march=core2 -O2 '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -march=core2 -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed -Wl,-rpath,'/opt/pg93/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lxslt -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.5
2. the DB log in the Slave's log_directory
[ 2015-02-05 15:38:51.406 CST 2328 54d08abc.918 6 0]WARNING: will not overwrite a used ItemId
[ 2015-02-05 15:38:51.406 CST 2328 54d08abc.918 7 0]CONTEXT: xlog redo insert: rel 38171461/16384/57220350; tid 1778398/9
[ 2015-02-05 15:38:51.406 CST 2328 54d08abc.918 8 0]PANIC: heap_insert_redo: failed to add tuple
[ 2015-02-05 15:38:51.406 CST 2328 54d08abc.918 9 0]CONTEXT: xlog redo insert: rel 38171461/16384/57220350; tid 1778398/9
[ 2015-02-05 15:38:51.765 CST 2320 54d08abb.910 6 0]LOG: startup process (PID 2328) was terminated by signal 6: Aborted
[ 2015-02-05 15:38:51.765 CST 2320 54d08abb.910 7 0]LOG: terminating any other active server processes
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST 61450 54d31d48.f00a 3 0]WARNING: terminating connection because of crash of another server process
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST 61450 54d31d48.f00a 4 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST 61450 54d31d48.f00a 5 0]HINT: In a moment you should be able to reconnect to the database and repeat your command.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST 51208 54d315b6.c808 7 0]WARNING: terminating connection because of crash of another server process
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST 51208 54d315b6.c808 8 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST 51208 54d315b6.c808 9 0]HINT: In a moment you should be able to reconnect to the database and repeat your command.
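A note on reading the bracketed prefix in the log lines above: with the `log_line_prefix` shown earlier, the field just before the line counter is `%c`, the session ID. PostgreSQL documents it as two hexadecimal numbers separated by a dot, the process start time (seconds since the Unix epoch) and the process ID. The sketch below decodes it; the function name `decode_session_id` is hypothetical and introduced only for illustration.

```python
from datetime import datetime, timezone


def decode_session_id(session_id: str):
    """Split a %c session ID into (process start time in UTC, PID).

    Hypothetical helper: assumes the documented "<hex start time>.<hex pid>" layout.
    """
    start_hex, pid_hex = session_id.split(".")
    started = datetime.fromtimestamp(int(start_hex, 16), tz=timezone.utc)
    return started, int(pid_hex, 16)


# Session ID of the startup process taken from the PANIC line above.
started, pid = decode_session_id("54d08abc.918")
print(started.isoformat(), pid)
```

For the startup process, the decoded PID is 2328, which agrees with the `%p` field on the same lines and with the postmaster message "startup process (PID 2328) was terminated". The decoded start time is in UTC, whereas the log timestamps are in CST.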
The slave was running but stopped suddenly, and I have not restarted it since!
Has anyone encountered the same problem? Could you tell me why it happened and how to avoid it?
If you need more detailed information, please tell me and I'll provide it.
Thanks
Best Regards!
Re: The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple"
From
Adrian Klaver
Date:
On 03/02/2015 02:49 AM, hailong Li wrote:
> Hi, dear pgsql-general
> The details are as follows:
> *1. environment*
> *DB Master*
> $ cat /etc/issue
> CentOS release 6.5 (Final)
> Kernel \r on an \m
> $ uname -av
> Linux l-xxxxx1.xx.cnx 3.14.29-3.centos6.x86_64 #1 SMP Tue Jan 20 17:48:32 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
> $ psql -U postgres
> psql (9.3.5)
> Type "help" for help.
> postgres=# select version();
> version
> --------------------------------------------------------------------------------------------------------------
> PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
> (1 row)
> *DB Slave*
> $ cat /etc/issue
> CentOS release 6.5 (Final)
> Kernel \r on an \m
> $ uname -av
> Linux l-xxxx2.xx.cnx 3.14.31-3.centos6.x86_64 #1 SMP Mon Feb 2 15:26:04 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
> $ psql -U postgres
> psql (9.3.5)
> Type "help" for help.
> postgres=# select version();
> version
> --------------------------------------------------------------------------------------------------------------
> PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
> (1 row)
> The slave was running, but stopped suddenly , and I never start it !
> Anyone encounter the same problem? Could tell me why and how to avoid it?
> If you need some more detailed information, please tell me and I'll give it to you.
So what sort of replication (streaming, archiving, synchronous, etc.) were you doing?
Was the streaming happening across a local network or a remote network?
Was there a hardware issue on either of the machines?
> Thanks
> Best Regards!
--
Adrian Klaver
adrian.klaver@aklaver.com
Re: The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple"
From
hailong Li
Date:
2015-03-02 23:21 GMT+08:00 Adrian Klaver <adrian.klaver@aklaver.com>:
> On 03/02/2015 02:49 AM, hailong Li wrote:
> > Hi, dear pgsql-general
> > The details are as follows:
> > *1. environment*
> > *DB Master*
> > $ cat /etc/issue
> > CentOS release 6.5 (Final)
> > Kernel \r on an \m
> > $ uname -av
> > Linux l-xxxxx1.xx.cnx 3.14.29-3.centos6.x86_64 #1 SMP Tue Jan 20 17:48:32 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
> > $ psql -U postgres
> > psql (9.3.5)
> > Type "help" for help.
> > postgres=# select version();
> > version
> > --------------------------------------------------------------------------------------------------------------
> > PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
> > (1 row)
> > *DB Slave*
> > $ cat /etc/issue
> > CentOS release 6.5 (Final)
> > Kernel \r on an \m
> > $ uname -av
> > Linux l-xxxx2.xx.cnx 3.14.31-3.centos6.x86_64 #1 SMP Mon Feb 2 15:26:04 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
> > $ psql -U postgres
> > psql (9.3.5)
> > Type "help" for help.
> > postgres=# select version();
> > version
> > --------------------------------------------------------------------------------------------------------------
> > PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
> > (1 row)
> > The slave was running, but stopped suddenly , and I never start it !
> > Anyone encounter the same problem? Could tell me why and how to avoid it?
> > If you need some more detailed information, please tell me and I'll give it to you.
> So what sort of replication (streaming, archiving, synchronous, etc.) were you doing?
Streaming.
> Was the streaming happening across a local network or a remote network?
Local network.
> Was there a hardware issue on either of the machines?
I did not find any hardware problem on either machine at that time, but some of the data is in a tablespace located on the SSD of the master machine.
psql -U postgres
psql (9.3.5)
Type "help" for help.
postgres=# \db+
List of tablespaces
Name | Owner | Location | Access privileges | Description
------------+----------+---------------+-------------------+-------------
pg_default | postgres | | |
pg_global | postgres | | |
pgtblspc | laser | /ssd/pgtblspc | |
(3 rows)
In the end, I built a new slave instance on the slave server, and it has worked fine so far.
> > Thanks
> > Best Regards!
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
Re: The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple"
From
Jim Nasby
Date:
On 3/3/15 6:52 AM, hailong Li wrote:
> Finally , I made a new slave instance on the slave server and it works
> fine until now.
Just so you're aware, that error means there was page level corruption either on the replica or possibly on the master, or the replication stream or WAL files got corrupted. You probably have either a hardware or a configuration problem somewhere.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
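The page involved is named in the CONTEXT line of the slave's log in the first message: `rel 38171461/16384/57220350; tid 1778398/9`. In PostgreSQL's WAL descriptions the `rel` triple is tablespace OID / database OID / relfilenode, and `tid` is block number / line-pointer offset. The following is a minimal sketch of turning that into a segment file and byte offset. It assumes the default 8 kB block size and 1 GB segment size, which the thread does not confirm; `locate_page` is a hypothetical helper, not a PostgreSQL tool.

```python
import re

BLOCK_SIZE = 8192            # assumption: default block size (BLCKSZ)
BLOCKS_PER_SEGMENT = 131072  # assumption: default 1 GB segment files

CONTEXT_RE = re.compile(r"rel (\d+)/(\d+)/(\d+); tid (\d+)/(\d+)")


def locate_page(context: str) -> dict:
    """Decode a 'rel a/b/c; tid block/offset' context string (hypothetical helper)."""
    match = CONTEXT_RE.search(context)
    if match is None:
        raise ValueError("no rel/tid pair found in context line")
    tablespace, database, relfilenode, block, item = map(int, match.groups())
    # Relations are stored in fixed-size segment files; find the one holding the block.
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    # The first segment has no numeric suffix; later ones are named <relfilenode>.<n>.
    segment_file = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return {
        "tablespace_oid": tablespace,
        "database_oid": database,
        "segment_file": segment_file,
        "byte_offset": block_in_segment * BLOCK_SIZE,
        "item_offset": item,
    }


loc = locate_page("xlog redo insert: rel 38171461/16384/57220350; tid 1778398/9")
print(loc)
```

The tablespace OID here is not 1663 (`pg_default`) or 1664 (`pg_global`), so the relation lives in a user-created tablespace. Comparing the same page on the master and on a replica would be one way to narrow down where the corruption originated.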
Re: The slave suddenly stopped with such DB log : "will not overwrite a used ItemId" and "heap_insert_redo: failed to add tuple"
From
hailong Li
Date:
2015-03-05 16:34 GMT+08:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 3/3/15 6:52 AM, hailong Li wrote:
> > Finally , I made a new slave instance on the slave server and it works
> > fine until now.
> Just so you're aware, that error means there was page level corruption either on the replica or possibly on the master, or the replication stream or WAL files got corrupted. You probably have either a hardware or a configuration problem somewhere.
Actually, there were 3 slave nodes of the same master, and they all stopped at nearly the same time. So I suspect page-level corruption on the master rather than on the slaves, and an SSD hardware problem rather than a configuration problem.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com