Thread: BUG #11807: Postgresql server crashed when running transaction tests

BUG #11807: Postgresql server crashed when running transaction tests

From
bzhao@recognia.com
Date:
The following bug has been logged on the website:

Bug reference:      11807
Logged by:          Bing Zhao
Email address:      bzhao@recognia.com
PostgreSQL version: 9.2.9
Operating system:   CentOS 5.6
Description:

Recently our database component tests started failing. We checked our
source code and nothing had changed. Dev reported that the tests pass on
their dev machine but fail in the test environment. The only difference is
the PostgreSQL server version: dev is running 9.2.8 and test is running
9.2.9. After I downgraded to 9.2.8 in the test environment, the test suite
works again. See the logs and database configuration below.

Log:
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2014-10-28 10:57:41
EDT
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/867123F8
LOG:  record with zero length at 0/867768B8
LOG:  redo done at 0/86776878
LOG:  last completed transaction was at log time 2014-10-28
10:58:08.088209-04
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

Conf:
listen_addresses = '*'
olap.rownum_name = 'default'
#log_autovacuum_min_duration = 0
#log_connections = on
#log_hostname = on
#log_disconnections = on
#log_lock_waits = on
log_statement = 'all'
maintenance_work_mem = 120MB
effective_cache_size = 1408MB
work_mem = 24MB
wal_buffers = 8MB
shared_buffers = 480MB
max_connections = 40
plperl.use_strict = on
plperl.on_init = 'use JSON::XS;'

Re: BUG #11807: Postgresql server crashed when running transaction tests

From
Heikki Linnakangas
Date:
On 10/28/2014 05:22 PM, bzhao@recognia.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      11807
> Logged by:          Bing Zhao
> Email address:      bzhao@recognia.com
> PostgreSQL version: 9.2.9
> Operating system:   CentOS 5.6
> Description:
>
> Recently our database component tests started failing. We checked our
> source code and nothing had changed. Dev reported that the tests pass on
> their dev machine but fail in the test environment. The only difference is
> the PostgreSQL server version: dev is running 9.2.8 and test is running
> 9.2.9. After I downgraded to 9.2.8 in the test environment, the test suite
> works again. See the logs and database configuration below.

Unfortunately there aren't any details in the logs that would help
track this down. Can you get a core dump, and from that a backtrace
using gdb?

What does the test program do? If you can narrow the test case down to a
small self-contained script and post that, that'd be great.

- Heikki

Re: BUG #11807: Postgresql server crashed when running transaction tests

From
Federico Campoli
Date:
On 28/10/14 15:30, Heikki Linnakangas wrote:
> On 10/28/2014 05:22 PM, bzhao@recognia.com wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      11807
>> Logged by:          Bing Zhao
>> Email address:      bzhao@recognia.com
>> PostgreSQL version: 9.2.9
>> Operating system:   CentOS 5.6
>> Description:
>>
>> Recently our database component tests started failing. We checked our
>> source code and nothing had changed. Dev reported that the tests pass on
>> their dev machine but fail in the test environment. The only difference is
>> the PostgreSQL server version: dev is running 9.2.8 and test is running
>> 9.2.9. After I downgraded to 9.2.8 in the test environment, the test suite
>> works again. See the logs and database configuration below.
>
> Unfortunately there aren't any details in the logs that would help
> track this down. Can you get a core dump, and from that a backtrace
> using gdb?
>
> What does the test program do? If you can narrow the test case down to a
> small self-contained script and post that, that'd be great.

We had a similar problem.
The query causing the server crash involved many nested loops in the
execution plan.

We downgraded to 9.2.8 which is fine.

I'm still investigating and I'll file a bug report asap.

Cheers
--
Federico Campoli
Brandwatch | Senior Database Administrator
federico@brandwatch.com |

New York  | San Francisco |  *Brighton*  |  Berlin  |  Stuttgart

Re: BUG #11807: Postgresql server crashed when running transaction tests

From
Michael Paquier
Date:
On Wed, Oct 29, 2014 at 12:40 AM, Federico Campoli
<federico@brandwatch.com> wrote:
> We had a similar problem.
> The query causing the server crash involved many nested loops in the
> execution plan.
This report may be different from yours. What happened here may be a
Postgres bug or, for example, something like the famous OOM killer
performing a headshot on a process. As Heikki mentioned, without
details it is not really possible to track down any root cause.
--
Michael
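
As a first step toward the details requested above, here is a minimal
sketch for scanning a server log for the line the postmaster normally
writes just before "terminating any other active server processes". It
assumes the usual message format "server process (PID N) was terminated by
signal M: ..." and Linux signal numbers. The sample lines, the PID and the
helper names (find_crashes, describe_signal) are hypothetical and not taken
from the reporter's log, which does not include that line.

```python
import re

# Assumed format of the postmaster's crash line, for example:
#   LOG:  server process (PID 1234) was terminated by signal 11: Segmentation fault
CRASH_RE = re.compile(
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)"
)


def find_crashes(log_lines):
    """Return a list of (pid, signal) pairs, one per crash line found."""
    crashes = []
    for line in log_lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append((int(match.group("pid")), int(match.group("sig"))))
    return crashes


def describe_signal(sig):
    """Give a rough hint for a signal number (Linux numbering assumed)."""
    hints = {
        6: "SIGABRT: the process aborted itself, e.g. a failed assertion",
        9: "SIGKILL: killed from outside, possibly by the OOM killer",
        11: "SIGSEGV: segmentation fault, a core dump and backtrace would help",
    }
    return hints.get(sig, "other signal: check the system logs")


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the lines from the server log file.
    sample = [
        "LOG:  server process (PID 1234) was terminated by signal 11: Segmentation fault",
        "LOG:  terminating any other active server processes",
    ]
    for pid, sig in find_crashes(sample):
        print(pid, sig, describe_signal(sig))
```

A signal 9 here would point toward something like the OOM killer (the
kernel log should confirm it), while a signal 11 or 6 would make a core
dump and gdb backtrace the natural next step.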