Thread: FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

From: Andrew Alcheyev
Hello everybody.

Today I ran into a strange situation, possibly a bug.

There is a client that has
"default_transaction_isolation = 'serializable'" set in its role
properties, and it connects to Postgres via pgbouncer.

The problem appears after it executes several tens of thousands of
queries, most of them SELECTs; other kinds of queries are very rare
(of course, the tables involved are heavily used by other clients as
well).

At some point the backend with this client crashes due to segmentation
fault with signal 11.

GDB shows the following:
#0  0x00000000006b3a5b in SHMQueueDelete (queue=0x845b4c498) at shmqueue.c:77
77              prevElem->next = queue->next;
[New Thread 8025cd400 (LWP 100635/<unknown>)]
[New LWP 100709]
(gdb) bt
#0  0x00000000006b3a5b in SHMQueueDelete (queue=0x845b4c498) at shmqueue.c:77
#1  0x00000000006c16ef in SummarizeOldestCommittedSxact () at predicate.c:1467
#2  0x00000000006c197f in RegisterSerializableTransactionInt (snapshot=0xbe7480) at predicate.c:1605
#3  0x00000000006c190a in RegisterSerializableTransaction (snapshot=0xbe7480) at predicate.c:1569
#4  0x00000000007ffcf7 in GetTransactionSnapshot () at snapmgr.c:138
#5  0x00000000006cbaf5 in exec_simple_query (query_string=0x80249e030 "SELECT 'DBD::Pg ping test'") at postgres.c:932
#6  0x00000000006cfcb3 in PostgresMain (argc=2, argv=0x80249a890, username=0x80249a860 "stat") at postgres.c:3926
#7  0x000000000068340d in BackendRun (port=0x802448900) at postmaster.c:3601
#8  0x0000000000682b1a in BackendStartup (port=0x802448900) at postmaster.c:3286
#9  0x000000000067ff5c in ServerLoop () at postmaster.c:1455
#10 0x000000000067f73d in PostmasterMain (argc=3, argv=0x7fffffffdb90) at postmaster.c:1116
#11 0x00000000005fa67a in main (argc=3, argv=0x7fffffffdb90) at main.c:199
(gdb) p prevElem
$1 = (SHM_QUEUE *) 0x0

The other conditions are:
- FreeBSD 9.0-RELEASE / amd64 / ZFS, 12 GB RAM;
- PostgreSQL 9.1.2 built from ports (with debug symbols);
- pgbouncer 1.4.2 built from ports on the same host.

/etc/sysctl.conf:
kern.ipc.shmall=393216
kern.ipc.shmmax=2147483648
kern.ipc.shm_use_phys=1

/boot/loader.conf:
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256

postgresql.conf:
shared_buffers = 1GB
temp_buffers = 64MB
max_prepared_transactions = 20
work_mem = 16MB
wal_buffers = 128kB


If the "default_transaction_isolation" variable is unset for that
role, the problem goes away.
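
To be concrete, the setting is applied and removed at the role level
roughly like this (the role name below is just a placeholder, not our
real one):

ALTER ROLE some_client SET default_transaction_isolation = 'serializable';
-- "unsetting" it again means dropping the role-level override:
ALTER ROLE some_client RESET default_transaction_isolation;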

So why does this happen? Is it a bug in PostgreSQL or FreeBSD?

I'd be glad to provide any other meaningful information.


With the best regards, Andrew.

Re: FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

From: Heikki Linnakangas
On 10.01.2012 17:22, Andrew Alcheyev wrote:
> At some point the backend with this client crashes due to segmentation
> fault with signal 11.
>
> GDB shows the following:
> #0  0x00000000006b3a5b in SHMQueueDelete (queue=0x845b4c498) at shmqueue.c:77
> 77              prevElem->next = queue->next;
> [New Thread 8025cd400 (LWP 100635/<unknown>)]
> [New LWP 100709]
> (gdb) bt
> #0  0x00000000006b3a5b in SHMQueueDelete (queue=0x845b4c498) at shmqueue.c:77
> #1  0x00000000006c16ef in SummarizeOldestCommittedSxact () at predicate.c:1467
> #2  0x00000000006c197f in RegisterSerializableTransactionInt (snapshot=0xbe7480) at predicate.c:1605
> #3  0x00000000006c190a in RegisterSerializableTransaction (snapshot=0xbe7480) at predicate.c:1569
> #4  0x00000000007ffcf7 in GetTransactionSnapshot () at snapmgr.c:138
> #5  0x00000000006cbaf5 in exec_simple_query (query_string=0x80249e030 "SELECT 'DBD::Pg ping test'") at postgres.c:932
> #6  0x00000000006cfcb3 in PostgresMain (argc=2, argv=0x80249a890, username=0x80249a860 "stat") at postgres.c:3926
> #7  0x000000000068340d in BackendRun (port=0x802448900) at postmaster.c:3601
> #8  0x0000000000682b1a in BackendStartup (port=0x802448900) at postmaster.c:3286
> #9  0x000000000067ff5c in ServerLoop () at postmaster.c:1455
> #10 0x000000000067f73d in PostmasterMain (argc=3, argv=0x7fffffffdb90) at postmaster.c:1116
> #11 0x00000000005fa67a in main (argc=3, argv=0x7fffffffdb90) at main.c:199
> (gdb) p prevElem
> $1 = (SHM_QUEUE *) 0x0

That clearly looks like a bug in the SSI feature, introduced in
PostgreSQL 9.1.

This looks superficially similar to the bug that Dan Ports spotted on
Friday:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg190135.html.
If you can reproduce the issue easily, could you try the patch he posted
and see if it fixes it?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

From: Andrew Alcheyev
Hello!

On Tuesday, January 10, 2012, 10:11:04 PM you wrote:

HL> That clearly looks like a bug in the SSI feature, introduced in
HL> PostgreSQL 9.1.

HL> This looks superficially similar to the bug that Dan Ports spotted on
HL> Friday:
HL> http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg190135.html.
HL> If you can reproduce the issue easily, could you try the patch he posted
HL> and see if it fixes it?

I applied the suggested patch and it has fixed the backend crashes,
at least for now - the server hasn't crashed for the first half of
today. So I think the patch fixes my situation. Unfortunately, it
solves only half of my problem.

Well, it does help and the backend hasn't crashed yet, but the client
is still experiencing query problems at some point (not far, I guess,
from where its backend would segfault without the patch). This time it
encounters the following error from the backend:

ERROR:  out of shared memory
HINT:  You might need to increase max_pred_locks_per_transaction.
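
For reference, the live value of that parameter can be checked with a
simple:

SHOW max_pred_locks_per_transaction;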

In my first letter I forgot to mention that the client runs as
multiple instances which query the server at the same time. What is
interesting is that this error was reported to all of them more or
less simultaneously.

To repeat myself, if I set the client's
"default_transaction_isolation" to "read committed", the error
disappears.

So what should I do? Do I need to increase
"max_pred_locks_per_transaction" in postgresql.conf?
And how can I calculate the desired value?


With the best regards, Andrew.

Re: FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

From: "Kevin Grittner"
Andrew Alcheyev <buddy@telenet.ru> wrote:

> Well, it does good and the backend hasn't crashed yet, but the
> client is still experiencing query problems at some point (not
> far, I guess, from where its backend would segfault without the
> patch). This time it encounters the following error from the
> backend:
>
> ERROR:  out of shared memory
> HINT:  You might need to increase max_pred_locks_per_transaction.

I noticed that you are using prepared transactions.  Do you have any
lingering transactions prepared but not committed or rolled back?
(You can look in pg_prepared_xacts, and see when they were
prepared.)
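
For example, something along these lines would show anything that is
still hanging around:

-- list prepared transactions, oldest first
SELECT gid, prepared, owner, database
  FROM pg_prepared_xacts
 ORDER BY prepared;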

> So what should I do? Do I need to increase
> "max_pred_locks_per_transaction" in postgresql.conf?

Maybe, but let's rule out other problems first.

> And how can I calculate the desired value?

You would need to review pg_locks under load to get a handle on
that.  I don't think anyone has devised any generalized formula yet,
but if we rule out other problems, I'd be happy to review your lock
situation and make suggestions.
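
As a rough starting point, a query along these lines, run under load,
shows how many SIRead (predicate) locks each backend holds at that
moment; it's only a sketch, of course:

-- count predicate (SIRead) locks per backend
SELECT pid, count(*) AS siread_locks
  FROM pg_locks
 WHERE mode = 'SIReadLock'
 GROUP BY pid
 ORDER BY siread_locks DESC;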

-Kevin

Re: FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

From: "Kevin Grittner"
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Andrew Alcheyev <buddy@telenet.ru> wrote:
>
>> Well, it does help and the backend hasn't crashed yet, but the
>> client is still experiencing query problems at some point (not
>> far, I guess, from where its backend would segfault without the
>> patch). This time it encounters the following error from the
>> backend:
>>
>> ERROR:  out of shared memory
>> HINT:  You might need to increase max_pred_locks_per_transaction.
>
> I noticed that you are using prepared transactions.  Do you have
> any lingering transactions prepared but not committed or rolled
> back?  (You can look in pg_prepared_xacts, and see when they were
> prepared.)
>
>> So what should I do? Do I need to increase
>> "max_pred_locks_per_transaction" in postgresql.conf?
>
> Maybe, but let's rule out other problems first.

Has this been resolved?  Any details would be useful to us.

-Kevin

Re: FreeBSD 9.0/amd64, PostgreSQL 9.1.2, pgbouncer 1.4.2: segmentation fault

From: Andrew Alcheyev
Hello!

I'm sorry I could not answer sooner.

In the meantime I looked into the matter, learned about SSI (mostly
at http://wiki.postgresql.org/wiki/Serializable) and realized that we
wouldn't need the "serializable" transaction isolation level anymore
(thanks to SSI). So I just turned it off in the settings of the
problematic client, which resolved the issue on our production
server.

I don't think I'd like to reproduce the situation intentionally,
because it breaks the normal flow of our important operations and
requires manual intervention in data processing.

On Wednesday, January 11, 2012, 9:26:48 PM you wrote:

KG> Andrew Alcheyev <buddy@telenet.ru> wrote:
KG>
>> Well, it does help and the backend hasn't crashed yet, but the
>> client is still experiencing query problems at some point (not
>> far, I guess, from where its backend would segfault without the
>> patch). This time it encounters the following error from the
>> backend:
>>
>> ERROR:  out of shared memory
>> HINT:  You might need to increase max_pred_locks_per_transaction.
KG>
KG> I noticed that you are using prepared transactions.  Do you have any
KG> lingering transactions prepared but not committed or rolled back?
KG> (You can look in pg_prepared_xacts, and see when they were
KG> prepared.)
KG>
>> So what should I do? Do I need to increase
>> "max_pred_locks_per_transaction" in postgresql.conf?
KG>
KG> Maybe, but let's rule out other problems first.
KG>
>> And how can I calculate the desired value?
KG>
KG> You would need to review pg_locks under load to get a handle on
KG> that.  I don't think anyone has devised any generalized formula yet,
KG> but if we rule out other problems, I'd be happy to review your lock
KG> situation and make suggestions.
KG>
KG> -Kevin



With the best regards, Andrew.