Thread: Postgres process is crashing continuously in 9.1.1
Hi,
We are using Postgres 9.1.1 on a board with a Coldfire controller.
The postgres processes crash and restart upon executing a particular instruction, and this keeps repeating. The same problem happens when we try Postgres 9.1.3.
It works fine until the FINANCIALTRANSACTIONID reaches 1000.
The same setup works fine on a Windows PC. We compared the configuration of the Windows PC and the board and found that the only difference is Shared Buffers, which is 32 on the PC and 24 on the board.
I am pasting the server log from the board here.
The line highlighted in yellow is the instruction which is causing the crash.
Please let us know why this crash is happening and how we can fix it.
LOG: redo starts at 0/D9B75B4
LOG: record with zero length at 0/D9BBE5C
LOG: redo done at 0/D9BBE22
LOG: last completed transaction was at log time 2012-05-22 02:22:26.641488+00
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
ERROR: duplicate key value violates unique constraint "financialtransaction_pkey"
DETAIL: Key (financialtransactionid)=(1004) already exists.
STATEMENT: Insert into FINANCIALTRANSACTION (ATTENDANT,ENGINEHOUR,RECEIPTPRINTED,FINANCIALTRANSACTIONID) values ('0','0.0','0','1004')
LOG: server process (PID 4016) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: poll() failed in statistics collector: Unknown error 516
LOG: statistics collector process (PID 3962) exited with exit code 1
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2012-05-22 02:22:29 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: consistent recovery state reached at 0/D9BBEAA
LOG: redo starts at 0/D9BBEAA
LOG: record with zero length at 0/D9C07FA
LOG: redo done at 0/D9C07C0
LOG: last completed transaction was at log time 2012-05-22 02:23:05.372245+00
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
ERROR: duplicate key value violates unique constraint "financialtransaction_pkey"
DETAIL: Key (financialtransactionid)=(1004) already exists.
STATEMENT: Insert into FINANCIALTRANSACTION (ATTENDANT,ENGINEHOUR,RECEIPTPRINTED,FINANCIALTRANSACTIONID) values ('0','0.0','0','1004')
LOG: server process (PID 4098) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: poll() failed in statistics collector: Unknown error 516
LOG: statistics collector process (PID 4035) exited with exit code 1
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2012-05-22 02:23:08 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: consistent recovery state reached at 0/D9C0848
LOG: redo starts at 0/D9C0848
LOG: record with zero length at 0/D9C5218
LOG: redo done at 0/D9C51DE
LOG: last completed transaction was at log time 2012-05-22 02:23:49.659502+00
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Thanks and Regards
Jayashankar
Larsen & Toubro Limited
www.larsentoubro.com
This Email may contain confidential or privileged information for the intended recipient (s). If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system.
Earth Day. Every Day.
We can understand the difference in shared buffer size as the Windows PC has 2GB of RAM and the board has 256MB of RAM.
So please let us know if this shared buffer parameter has any relation to the problem we are facing.
Thanks and Regards
Jayashankar
On 05/21/12 11:05 PM, Jayashankar K B wrote:
> board with Coldfire controller.

what is this board? Coldfire is the embedded 68k-like Freescale processor? what operating system is this under? what sort of storage does this embedded system use for the database?

telling us FINANCIALWHATEVERID > 1000 doesn't really do us much good since we have no idea what your database looks like, or what your code is doing.

the log seems to indicate there was a constraint violation just before the exception hit.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
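The observation that a constraint violation is logged just before each segfault can be checked mechanically. The following is a minimal sketch in Python, not part of the original thread: it walks a server log and pairs every "terminated by signal" line with the most recent ERROR and STATEMENT lines. The function name `crashes_with_context` and the `SAMPLE_LOG` excerpt are illustrative assumptions. Because the posted log has no per-line PID prefix, the pairing is by order only, so it cannot prove that the backend that logged the error is the one that crashed.

```python
import re

# Matches the postmaster's report that a backend died from a signal.
CRASH_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")


def crashes_with_context(log_text):
    """Return one dict per backend crash, with the last ERROR/STATEMENT seen before it.

    Pairing is purely by line order (an assumption); the log has no PID prefix.
    """
    crashes = []
    last_error = None
    last_statement = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("ERROR:"):
            last_error = line[len("ERROR:"):].strip()
            last_statement = None
        elif line.startswith("STATEMENT:"):
            last_statement = line[len("STATEMENT:"):].strip()
        else:
            match = CRASH_RE.search(line)
            if match:
                crashes.append({
                    "pid": int(match.group(1)),
                    "signal": int(match.group(2)),
                    "error": last_error,
                    "statement": last_statement,
                })
                # Reset so a later crash is not blamed on a stale statement.
                last_error = None
                last_statement = None
    return crashes


# Excerpt copied from the log posted in this thread.
SAMPLE_LOG = """\
LOG: autovacuum launcher started
ERROR: duplicate key value violates unique constraint "financialtransaction_pkey"
DETAIL: Key (financialtransactionid)=(1004) already exists.
STATEMENT: Insert into FINANCIALTRANSACTION (ATTENDANT,ENGINEHOUR,RECEIPTPRINTED,FINANCIALTRANSACTIONID) values ('0','0.0','0','1004')
LOG: server process (PID 4016) was terminated by signal 11: Segmentation fault
"""

if __name__ == "__main__":
    for crash in crashes_with_context(SAMPLE_LOG):
        print(crash)
```

Setting log_line_prefix to include the process ID (for example '%p ') would make the pairing reliable, since the ERROR line and the crash report could then be matched on the PID.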
Yes, the board has the embedded 68k-architecture-based Freescale Coldfire processor.
The board runs a custom-built Linux based on kernel 2.6.38.
The database is stored on an SD card of 4GB capacity.

This is the table we have:

CREATE TABLE financialtransaction
(
  FINANCIALTRANSACTIONID BIGINT NOT NULL,
  TIME_STAMP TIMESTAMP,
  ATTENDANT SMALLINT,
  RECEIPTPRINTED BOOLEAN DEFAULT FALSE,
  ODOMETER VARCHAR(20),
  ENGINEHOUR NUMERIC(9,2),
  CONSTRAINT financialtransaction_pkey PRIMARY KEY (FINANCIALTRANSACTIONID)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE financialtransaction OWNER TO postgres;

On writing into this table, a stored procedure is triggered which inserts into another table.
But the crash happens while writing into this financialtransaction table, once the table has more than 1000 records.

Please let me know if you need any other information.

Thanks and Regards
Jayashankar
On 05/22/2012 01:57 PM, Jayashankar K B wrote:
> Please let us know why this crash is happening and how we can fix it.

> LOG: server process (PID 4016) was terminated by signal 11: Segmentation fault

If you can't reproduce this crash on a more developer-friendly machine than your embedded system, what you will need to do is trap this crash and get a backtrace that shows where and how the Pg backend(s) died. Your embedded devs should hopefully have no problem with this.

You can enable core dumps and have Pg coredump if you have the storage. This works even if you can't predict exactly when the crash will happen or which backend will crash. It requires enough disk space to write out a core file. If you're using a vaguely modern Linux kernel you can set a core dump path on an NFS volume or other network file store to write cores to, so you don't need local storage. See man 5 core

http://linux.die.net/man/5/core

and the kernel.core_pattern sysctl. Note that you can even pipe core dumps to a program (like, say, scp or netcat) so they don't have to be written even to a network mounted file system.

Alternately, you can attach gdb to a backend you know will crash and trap the crash that way. See:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

You will need PostgreSQL to have been compiled with debugging enabled and will need the debug symbols for your libraries. On many embedded platforms those are not included; the binaries are typically stripped. If you're working with stripped binaries you'll get one of the useless backtraces shown in the wiki article above.

If your binaries are stripped you can still create a useful backtrace so long as you have access to unstripped copies of those binaries in your development environment, outside the running embedded machine, or you have debuginfo files. You need a core file, either one you let Linux save on crash, or one you created by trapping a crash with gdb and saving it with the "gcore /path/to/core/file/postgres.core" command.

Once you have the core file and have it copied to your development environment, you can debug it with gdb from there using versions of your libraries with full debug symbols or detached debuginfo. Note that the libraries and PostgreSQL binaries must be EXACTLY IDENTICAL to the ones running on the real host except for not being stripped. You can't use binaries that're just the same version of the libraries, they have to be the _same_, built with the same version of the same compiler with the same options as the ones you were actually running. Usually they're the exact same binaries, just copies made before you stripped them for copying onto the embedded device.

Of course, you'll be running gdb inside your cross-compile environment to debug. Again, your embedded developers should know how to do all this.

If your embedded platform doesn't have debuginfo files or unstripped versions of your libraries, yell at whoever built it and get them to fix it.

If you don't have unstripped binaries, you can still build a debug version of PostgreSQL and examine that, you'll just have lots of "???" entries for non-PostgreSQL parts of the call path. The stack trace might be useless, but might not be too.

--
Craig Ringer
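Before waiting for the next crash, it can help to confirm that a core file would actually be produced. The sketch below is an editorial illustration in Python, not something posted in the thread. It interprets a kernel.core_pattern value, checks a soft RLIMIT_CORE value, and builds a gdb command line for reading a saved core. The function names, the default `gdb` program name, and the paths in the usage part are assumptions; in a cross-compile environment the debugger is typically a target-specific gdb build. Note that the limit reported here is that of the Python process, which only reflects the postmaster's limit if both are started from the same environment.

```python
import resource


def classify_core_pattern(pattern):
    """Describe where the kernel sends a core dump for this core_pattern value."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return "pipe"      # core is piped to the stdin of a program
    if pattern.startswith("/"):
        return "absolute"  # core is written under a fixed absolute path
    return "relative"      # core is written relative to the crashing process's cwd


def core_dumps_allowed(soft_limit):
    """True if a soft RLIMIT_CORE value permits writing a non-empty core file."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit > 0


def gdb_backtrace_command(postgres_binary, core_file, gdb="gdb"):
    """Build an argv list that prints a backtrace from a core file and exits."""
    return [gdb, "--batch", "-ex", "bt", postgres_binary, core_file]


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core dumps allowed for this process:", core_dumps_allowed(soft))
    try:
        with open("/proc/sys/kernel/core_pattern") as handle:
            print("core_pattern kind:", classify_core_pattern(handle.read()))
    except OSError:
        print("core_pattern not readable on this system")
    # Hypothetical paths, for illustration only.
    print(" ".join(gdb_backtrace_command("/usr/local/pgsql/bin/postgres", "postgres.core")))
```

On the server side, raising the limit with `ulimit -c unlimited` in the shell that starts the postmaster, or starting it with `pg_ctl start -c`, is the usual way to lift a zero core limit.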
On Tue, May 22, 2012 at 4:51 PM, Jayashankar K B <Jayashankar.KB@lnties.com> wrote:
> On writing into this table, a stored procedure is triggered which inserts into another table.
> But crash is happening while writing into this financialtransaction table once this table has more than 1000 records.

What language is the stored procedure written in? Is it possible that it's that procedure that segfaults?

Postgres experts, do stored procedure segfaults bring down the backend process like that?

ChrisA
But here, the crash is happening right at the insert statement. That is, the insert itself is failing.
Unless the insert is successful, the stored procedure is not triggered.

Thanks and regards
Jayashankar
On Tue, May 22, 2012 at 8:23 PM, Jayashankar K B <Jayashankar.KB@lnties.com> wrote:
> But here, the crash is happening right at the insert statement. That is insert itself is failing.
> Unless the insert is successful, stored procedure is not triggered.

Hmm. I wonder is it possible that going past ID 999 and into a four-digit number is causing stack damage that crashes the server a few iterations later... many things are possible. I'd look at the code of the procedure and see if there's any possible memory/stack issues.

ChrisA
On Tue, May 22, 2012 at 5:41 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, May 22, 2012 at 8:23 PM, Jayashankar K B <Jayashankar.KB@lnties.com> wrote:
> > But here, the crash is happening right at the insert statement. That is insert itself is failing.
> > Unless the insert is successful, stored procedure is not triggered.
>
> Hmm. I wonder is it possible that going past ID 999 and into a
> four-digit number is causing stack damage that crashes the server a
> few iterations later... many things are possible. I'd look at the code
> of the procedure and see if there's any possible memory/stack issues.

Hm, on linux you check stack size with ulimit -s? If stack is set too low, a lower setting of max_stack_depth should prevent the crash. It's pretty hard to hit that unless you have extraordinarily complex and/or recursive functions though. Any chance of seeing the function source?

merlin
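The relationship between `ulimit -s` and max_stack_depth can be sketched as a small calculation. This is an editorial illustration in Python, not part of the original thread. It assumes the rule of thumb from the PostgreSQL documentation, that max_stack_depth should be the kernel stack limit minus a safety margin of about a megabyte. The function name and the 1024 kB default margin are assumptions, and the limit read in the usage part is that of the Python process, not necessarily that of the postmaster.

```python
import resource


def suggest_max_stack_depth_kb(stack_limit_bytes, margin_kb=1024):
    """Return a candidate max_stack_depth in kB, or None if the stack is unlimited.

    The result may be below PostgreSQL's 100 kB minimum (or negative) when the
    kernel limit is smaller than the margin; callers should treat that as
    "stack limit too small" rather than use the value.
    """
    if stack_limit_bytes == resource.RLIM_INFINITY:
        return None
    limit_kb = stack_limit_bytes // 1024
    return limit_kb - margin_kb


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)  # same value as `ulimit -s`, in bytes
    suggestion = suggest_max_stack_depth_kb(soft)
    if suggestion is None:
        print("stack is unlimited; max_stack_depth is not constrained by ulimit")
    elif suggestion < 100:
        print("stack limit leaves no room for the safety margin:", suggestion, "kB")
    else:
        print("candidate setting: max_stack_depth = %dkB" % suggestion)
```

On a board with a small default stack limit, the suggestion can come out below the 2MB default of max_stack_depth, which is the situation where lowering the setting would turn a segfault into an ordinary "stack depth limit exceeded" error.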