Thread: Postgresql 9.0.13 core dump

Postgresql 9.0.13 core dump

From
Laurentius Purba
Date:
Hello all,

I am getting core dumps on Postgres 9.0.13 with the message "...was terminated by signal 10: Bus error...".

So I configured the log to include the PID, in order to capture the specific process that is causing this crash. After several crashes, I finally got the PID of the process causing it. But I am still not convinced that this process is what causes the crash. That is why I am sending this email, hoping somebody can help me or has had this experience before.

Based on the PID in the log file, the crash happened while the application was trying to update a table's field with binary (PDF) content. The data type of this field is TEXT.
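One way to do the PID correlation described above is to put the backend PID on every log line (for example with `%p` in `log_line_prefix`) and then pair each crash report with the statements that same PID logged earlier. The Python sketch below assumes a hypothetical prefix of the form `[PID] ` and the postmaster's usual "server process (PID N) was terminated by signal S" wording; adjust both regular expressions to your actual log format.

```python
import re
from collections import defaultdict

# Assumed log_line_prefix of the form '[%p] ', i.e. "[<pid>] " on every line.
PREFIX_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")
# Crash report written by the postmaster: the crashed backend's PID is in the
# message text, not in the prefix (the prefix holds the postmaster's PID).
CRASH_RE = re.compile(r"\(PID (\d+)\) was terminated by signal (\d+)")
# Statement lines, e.g. as produced with log_statement = 'all'.
STMT_RE = re.compile(r"statement:\s*(.*)$")


def statements_of_crashed_backends(lines):
    """Map each crashed backend PID to (signal number, statements it logged)."""
    statements = defaultdict(list)
    crashes = {}
    for line in lines:
        m = PREFIX_RE.match(line)
        if m is None:
            continue  # lines without the assumed prefix are skipped
        pid, message = int(m.group(1)), m.group(2)
        crash = CRASH_RE.search(message)
        if crash:
            crashes[int(crash.group(1))] = int(crash.group(2))
            continue
        stmt = STMT_RE.search(message)
        if stmt:
            statements[pid].append(stmt.group(1))
    return {pid: (sig, statements.get(pid, [])) for pid, sig in crashes.items()}


if __name__ == "__main__":
    # Hypothetical log excerpt; the table name and PIDs are made up.
    log = [
        "[4242] LOG:  statement: UPDATE docs SET body = $1 WHERE id = 7",
        "[100] LOG:  server process (PID 4242) was terminated by signal 10: Bus error",
        "[100] LOG:  server process (PID 5151) was terminated by signal 10: Bus error",
    ]
    print(statements_of_crashed_backends(log))
    # {4242: (10, ['UPDATE docs SET body = $1 WHERE id = 7']), 5151: (10, [])}
```

A crashed PID with an empty statement list only means nothing that process logged matched; it does not prove the backend was idle when it died.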

I am using a FreeBSD 9.1 host that has 3 database jails. The database that is core dumping runs in one of them; the other databases are working fine.

Below is the information about the core dump.

Any help is appreciated.

Thanks!
-Laurent


[pgsql@MY-BOX ~]$ gdb postgres data/postgres.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `postgres'.
Program terminated with signal 10, Bus error.
Reading symbols from /usr/local/lib/libintl.so.9...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libintl.so.9
Reading symbols from /usr/local/lib/libxml2.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libxml2.so.5
Reading symbols from /usr/lib/libssl.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libssl.so.6
Reading symbols from /lib/libcrypto.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.6
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /usr/local/lib/libiconv.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libiconv.so.3
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.6
Reading symbols from /usr/lib/liblzma.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/liblzma.so.5
Reading symbols from /usr/local/lib/nss_ldap.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.4.so.8...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libldap-2.4.so.8
Reading symbols from /usr/local/lib/liblber-2.4.so.8...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/liblber-2.4.so.8
Reading symbols from /usr/lib/libkrb5.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.10
Reading symbols from /usr/lib/libcom_err.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcom_err.so.5
Reading symbols from /usr/lib/libgssapi_krb5.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.10
Reading symbols from /usr/lib/libasn1.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libasn1.so.10
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /usr/lib/libhx509.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libhx509.so.10
Reading symbols from /usr/lib/libroken.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libroken.so.10
Reading symbols from /usr/lib/libgssapi.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgssapi.so.10
Reading symbols from /usr/local/lib/postgresql/plpgsql.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/postgresql/plpgsql.so
Reading symbols from /usr/local/lib/postgresql/dict_snowball.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/postgresql/dict_snowball.so
Reading symbols from /usr/local/lib/postgresql/plperl.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/postgresql/plperl.so
Reading symbols from /usr/local/lib/perl5/5.14.2/mach/CORE/libperl.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/perl5/5.14.2/mach/CORE/libperl.so
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0000000910d8121b in pthread_mutex_lock () from /lib/libthr.so.3
[New Thread 90f129000 (LWP 119947/postgres)]
(gdb) bt
#0  0x0000000910d8121b in pthread_mutex_lock () from /lib/libthr.so.3
#1  0x0000000800f38635 in xmlRMutexLock () from /usr/local/lib/libxml2.so.5
#2  0x0000000800f877bc in __xmlRandom () from /usr/local/lib/libxml2.so.5
#3  0x0000000800f87859 in xmlDictCreate () from /usr/local/lib/libxml2.so.5
#4  0x0000000800ecb685 in xmlInitParserCtxt () from /usr/local/lib/libxml2.so.5
#5  0x0000000800ecb6fe in xmlNewParserCtxt () from /usr/local/lib/libxml2.so.5
#6  0x000000000068e8d5 in cursor_to_xml ()
#7  0x000000000068eacf in xmlparse ()
#8  0x0000000000544540 in GetAttributeByNum ()
#9  0x000000000054127a in ExecProject ()
#10 0x0000000000547bc4 in ExecScan ()
#11 0x0000000000540d6d in ExecProcNode ()
#12 0x0000000000553ec3 in ExecModifyTable ()
#13 0x0000000000540ddd in ExecProcNode ()
#14 0x000000000053f9cf in standard_ExecutorRun ()
#15 0x000000000055e36b in SPI_saveplan ()
#16 0x000000000055e7ad in SPI_execute_plan_with_paramlist ()
#17 0x000000090d1483ad in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#18 0x000000090d14a3e2 in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#19 0x000000090d14bcff in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#20 0x000000090d14a610 in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#21 0x000000090d149ddb in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#22 0x000000090d14c693 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#23 0x000000090d142243 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#24 0x00000000006a79c5 in OidFunctionCall1 ()
#25 0x0000000000544540 in GetAttributeByNum ()
#26 0x000000000054127a in ExecProject ()
#27 0x00000000005559f3 in ExecResult ()
#28 0x0000000000540ded in ExecProcNode ()
#29 0x000000000053f9cf in standard_ExecutorRun ()
#30 0x000000000055e36b in SPI_saveplan ()
#31 0x000000000055e7ad in SPI_execute_plan_with_paramlist ()
#32 0x000000090d146a53 in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#33 0x000000090d149fdd in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#34 0x000000090d149ddb in exec_get_datum_type () from /usr/local/lib/postgresql/plpgsql.so
#35 0x000000090d14c693 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#36 0x000000090d142243 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
#37 0x00000000006a79c5 in OidFunctionCall1 ()
#38 0x0000000000544540 in GetAttributeByNum ()
#39 0x000000000054127a in ExecProject ()
#40 0x00000000005559f3 in ExecResult ()
#41 0x0000000000540ded in ExecProcNode ()
#42 0x000000000053f9cf in standard_ExecutorRun ()
#43 0x00000000005f3d88 in PostgresMain ()
#44 0x00000000005f5271 in PortalRun ()
#45 0x00000000005f2c60 in PostgresMain ()
#46 0x00000000005c174b in ClosePostmasterPorts ()
#47 0x00000000005c23c7 in PostmasterMain ()
#48 0x000000000056c0fe in main ()
(gdb) quit


Re: Postgresql 9.0.13 core dump

From
Kevin Grittner
Date:
Laurentius Purba <lpurba@sproutloud.com> wrote:

> I am having core dump on Postgres 9.0.13 with the message "...was
> terminated by signal 10: Bus error...".

Every time I have seen this it has been a bug in VM or jail code;
although a hardware problem could also cause it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Postgresql 9.0.13 core dump

From
Tomas Vondra
Date:
On 14.10.2013 22:18, Laurentius Purba wrote:
> Hello all,
>
> I am having core dump on Postgres 9.0.13 with the message "...was
> terminated by signal 10: Bus error...".
>
> So, I set a PID on the log file to capture specific PID that causing
> this crash. After, several crashes, I finally got the PID that causing
> this crash. But I am still not convinced that this process that causing
> this crash. That is the reason why I sent out this email, hoping anybody
> can help me or have had this experience before.
>
> Based on the PID on the log file, the crash happened while the
> application was trying to update a table's field with binary (PDF)
> content. The datatype of this field is TEXT.

Hi Laurentius,

wouldn't it be better to use BYTEA columns for binary content, not TEXT?
I'm not sure if that's a problem with PDF, but generally TEXT does not
allow some octet values (e.g. '\0').
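To illustrate the point about TEXT: a PostgreSQL `text` value cannot contain a zero byte, and it must also be valid in the server encoding, whereas `bytea` accepts arbitrary octets. The Python sketch below assumes a UTF8 server encoding (with SQL_ASCII the encoding check would not apply); it tests whether a payload could be stored as `text` at all, and renders bytes in the hex input format (`\x` followed by hex digits) that `bytea` accepts since PostgreSQL 9.0. The function names are illustrative only.

```python
def fits_in_text(data: bytes, server_encoding: str = "utf-8") -> bool:
    """Return True if `data` could be stored in a text column.

    Assumes a UTF8 server encoding by default; pass another Python codec name
    to model a different one.
    """
    if b"\x00" in data:
        return False  # text values can never contain a zero byte
    try:
        data.decode(server_encoding)
    except UnicodeDecodeError:
        return False  # not a valid byte sequence in the server encoding
    return True


def bytea_hex_literal(data: bytes) -> str:
    """Render bytes in bytea hex input format: a backslash, 'x', hex digits."""
    return "\\x" + data.hex()


if __name__ == "__main__":
    # Header plus the high-bit comment line many PDF writers emit.
    pdf_start = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
    print(fits_in_text(pdf_start))     # False: not valid UTF-8
    print(bytea_hex_literal(b"%PDF"))  # \x25504446
```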

The backtrace you've posted however lists a bunch of libxml functions at
the top, and in my experience libxml is not the best coded piece of
software. So I'd guess the problem is somewhere within libxml. What
version of libxml are you using?

Signal 10 usually means either a hardware error (although if the other
jails are running fine, that's unlikely) or roughly the same as a
SEGFAULT (i.e. accessing invalid memory etc.).
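Signal numbers are platform-specific: "signal 10" means SIGBUS here because the server runs on FreeBSD, while on Linux/x86 the same number is SIGUSR1 and a bus error is signal 7. Python's own `signal` module only describes the machine it runs on, so when reading a log copied from another system an explicit table is safer. A minimal sketch with just a few entries (not a complete list):

```python
# Partial signal tables: only numbers that typically show up when a backend
# crashes. "linux" here means Linux on x86/x86-64.
SIGNAL_NAMES = {
    "freebsd": {4: "SIGILL", 6: "SIGABRT", 8: "SIGFPE", 9: "SIGKILL",
                10: "SIGBUS", 11: "SIGSEGV"},
    "linux": {4: "SIGILL", 6: "SIGABRT", 7: "SIGBUS", 8: "SIGFPE",
              9: "SIGKILL", 10: "SIGUSR1", 11: "SIGSEGV"},
}


def signal_name(number: int, platform: str) -> str:
    """Name of signal `number` on `platform`, or 'signal N' if not listed."""
    return SIGNAL_NAMES.get(platform, {}).get(number, f"signal {number}")


if __name__ == "__main__":
    print(signal_name(10, "freebsd"))  # SIGBUS
    print(signal_name(10, "linux"))    # SIGUSR1
```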

What I don't understand is why the call ended up in libxml when you're
dealing with PDFs.

Is this reproducible? Does that happen with a particular PDF, or with
random PDF documents? Can you prepare a small self-contained test case?

regards
Tomas



Re: Postgresql 9.0.13 core dump

From
Laurentius Purba
Date:
Hi Kevin,

Thanks for your response.

I did google this error message ("...signal 10: Bus error...") and found that it can be caused by a hardware problem, namely bad memory.

Do you have any other pointers or clues that I can look into?

-Laurent


On Mon, Oct 14, 2013 at 4:33 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Every time I have seen this it has been a bug in VM or jail code;
> although a hardware problem could also cause it.


Re: Postgresql 9.0.13 core dump

From
Kevin Grittner
Date:
Laurentius Purba <lpurba@sproutloud.com> wrote:

> I did google this error message, "...signal 10: Bus error.." and
> found the issue with hardware problem, memory.
>
> Do you have any other pointers or clues that I can look into?

Well, the very first thing I would do is to make sure that the OS
and jail software was up-to-date, in case it is a bug which has
been fixed.  I might take a look at BIOS and firmware levels, too.
Tomas makes a good point about the XML library tending to be
problematic, so I would make sure that was current.

If I still had the problem after that, I would try to distil it
down to the smallest reproducible test case, and see whether it ran
OK outside the jail.

Another approach would be to run hardware tests, although that
tends to require a longer maintenance window than the other things.

Basically, there are a lot of layers the problem could be in, and
not a lot of reason to suspect any particular layer, so you need to
start ruling things out.  In a situation like that I tend to start
with the fastest, easiest layers to check first.

--
Kevin Grittner


Re: Postgresql 9.0.13 core dump

From
Laurentius Purba
Date:
Kevin,

Thanks for the response. I will look into it based on your suggestion.

-Laurent


On Mon, Oct 14, 2013 at 8:29 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Well, the very first thing I would do is to make sure that the OS
> and jail software was up-to-date, in case it is a bug which has
> been fixed.

Re: Postgresql 9.0.13 core dump

From
Alban Hertroys
Date:
On 15 October 2013 14:48, Laurentius Purba <lpurba@sproutloud.com> wrote:
> Kevin,
>
> Thanks for the response. I will look into it based on your suggestion.

I seem to recall there was an optimization issue in LLVM that could
cause such behaviour with virtual machines. What I don't recall is
whether the solution was to compile the VM or the application (in this
case PostgreSQL) using gcc.

Check the freebsd-stable mailing list archives for this issue with
VMs and LLVM.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Postgresql 9.0.13 core dump

From
Laurentius Purba
Date:
Hi Tomas,

Thanks for your response.

Regarding using BYTEA instead of TEXT for binary content, I did a Google search prior to sending my first email.

Also, as I mentioned in my first email, I am not convinced that this query (updating a table field with PDF content) is what causes the core dump. The reason is that, out of the few crashes (3 or 4), this is the only PID tied to that update query. The other crashes have their own PIDs, which were not tied to any query.

Regarding libxml, I am using libxml2-2.8.0_2 ("XML parser library for GNOME").
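Checking whether libxml2 is current, as suggested earlier in the thread, can be done mechanically from a package name like the one above. A small Python sketch, assuming the FreeBSD `NAME-VERSION_PORTREVISION` naming seen in `libxml2-2.8.0_2` and purely numeric versions; the minimum version you compare against is your own choice, and the `(2, 9, 1)` used below is an arbitrary example, not a release known to fix this crash.

```python
import re

# FreeBSD package names: NAME-VERSION[_PORTREVISION][,PORTEPOCH].
# This sketch only handles purely numeric, dot-separated VERSION parts.
PKG_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)"
    r"(?:_(?P<revision>\d+))?(?:,(?P<epoch>\d+))?$"
)


def parse_pkg(pkg: str):
    """Split e.g. 'libxml2-2.8.0_2' into ('libxml2', (2, 8, 0), 2)."""
    m = PKG_RE.match(pkg)
    if m is None:
        raise ValueError(f"unrecognised package name: {pkg!r}")
    version = tuple(int(part) for part in m.group("version").split("."))
    return m.group("name"), version, int(m.group("revision") or 0)


def _padded(version, width):
    """Pad a version tuple with zeros so (2, 9) compares equal to (2, 9, 0)."""
    return tuple(version) + (0,) * (width - len(version))


def older_than(pkg: str, minimum: tuple) -> bool:
    """True if the package's upstream version is lower than `minimum`.

    The port revision and epoch are ignored in this comparison.
    """
    version = parse_pkg(pkg)[1]
    width = max(len(version), len(minimum))
    return _padded(version, width) < _padded(minimum, width)


if __name__ == "__main__":
    print(parse_pkg("libxml2-2.8.0_2"))                # ('libxml2', (2, 8, 0), 2)
    print(older_than("libxml2-2.8.0_2", (2, 9, 1)))    # True
```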

Is this reproducible? Unfortunately, it is not. As for the PDF update, I am not sure this update is what causes the core dump, as I mentioned in my second paragraph.

Anyway, thanks for your response, Tomas.

-Laurent 


On Mon, Oct 14, 2013 at 4:37 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> wouldn't it be better to use BYTEA columns for binary content, not TEXT?
> ...
> Is this reproducible? Does that happen with a particular PDF, or with
> random PDF documents? Can you prepare a small self-contained test case?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)

Re: Postgresql 9.0.13 core dump

From
anjanu
Date:
We figured out the problem. It was not related to the FreeBSD jails but to
libxml2. We had libxml2 compiled with thread support. After we recompiled
libxml2 with thread support disabled, the core dumps disappeared.
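The resolution matches the top of the backtrace, where frames #0 and #1 show libxml2's `xmlRMutexLock()` calling `pthread_mutex_lock()` in libthr. libxml2 exports `xmlHasFeature()`, which reports compile-time options, so one way to check whether a given libxml2 build has thread support is to call it with `XML_WITH_THREAD`. A minimal Python `ctypes` sketch; the value 1 for `XML_WITH_THREAD` is taken from the `xmlFeature` enum in libxml2's `parser.h` and should be verified against your installed header, and you should pass the exact library the server loads (e.g. `/usr/local/lib/libxml2.so.5` from the gdb output) rather than rely on the default lookup.

```python
import ctypes
import ctypes.util

# Value of XML_WITH_THREAD in libxml2's xmlFeature enum (parser.h);
# check it against the header of the libxml2 you actually have installed.
XML_WITH_THREAD = 1


def libxml2_has_threads(path=None):
    """Return True/False for libxml2 thread support, or None if unknown.

    `path` should name the exact shared object the server loads; if omitted,
    ctypes.util.find_library("xml2") picks whatever the local linker finds.
    """
    if path is None:
        path = ctypes.util.find_library("xml2")
        if path is None:
            return None  # no libxml2 found on this system
    try:
        lib = ctypes.CDLL(path)
        has_feature = lib.xmlHasFeature
    except (OSError, AttributeError):
        return None  # not loadable, or too old to export xmlHasFeature()
    has_feature.argtypes = [ctypes.c_int]
    has_feature.restype = ctypes.c_int
    return bool(has_feature(XML_WITH_THREAD))


if __name__ == "__main__":
    print(libxml2_has_threads("/usr/local/lib/libxml2.so.5"))
```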

Thanks everyone for your help.



--
Sent from the PostgreSQL - general mailing list archive at Nabble.com.