Re: Segmentation fault with core dump - Mailing list pgsql-general

From Hiroshi Inoue
Subject Re: Segmentation fault with core dump
Msg-id 51B73A00.1030206@tpf.co.jp
In response to Re: Segmentation fault with core dump  (Joshua Berry <yoberi@gmail.com>)
Responses Re: Segmentation fault with core dump  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi,

(2013/05/09 1:39), Joshua Berry wrote:
> | I'm using PG 9.1.9 with a client application using various versions of the
> | pgsqlODBC driver on Windows. Cursors are used heavily, as well as some pretty
> | heavy trigger queries on db writes which update several materialized views.
> |
> | The server has 48GB RAM installed, PG is configured for 12GB shared buffers,
> | 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the other
> | settings are defaults.
> |
> | The server will seg fault from every few days to up to two weeks. Each time
> | one of the postgres server processes seg faults, the server gets terminated by
> | signal 11, restarts in recovery for up to 30 seconds, after which time it
> | accepts connections as if nothing ever happened. Unfortunately all the open
> | cursors and connections are lost, so the client apps are left in a bad state.
> |
> | Seg faults have also occurred with PG 8.4. ... I migrated the database to a
> | server running PG9.1 with the hopes that the problem would disappear, but it
> | has not. So now I'm starting to debug.
> |
> | # uname -a
> | Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013
> | x86_64 x86_64 x86_64 GNU/Linux
> | # cat /etc/redhat-release
> | Scientific Linux release 6.3 (Carbon)
> |
> | # psql -U jberry
> | psql (9.1.9)
> | Type "help" for help.
> |
> | jberry=# select version();
> |                                                    version
> | -------------------------------------------------------------------------------
> |  PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
> |  20120313 (Red Hat 4.4.7-3), 64-bit
> | (1 row)
>
> I've had another postmaster segfault on my production server. It appears
> to be the same failure as the last one nearly a month ago, but I wanted
> to post the gdb bt details in case it helps shed light on the issue.
> Please let me know if anyone would like to drill into the dumped core
> with greater detail. Both the OS and PG versions remain unchanged.
>
> Kind Regards,
> -Joshua
>
>
> On Fri, Apr 12, 2013 at 6:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>
>     On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
>      > I wrote:
>      > > (Wanders away wondering just how much the regression tests exercise
>      > > holdable cursors.)
>      >
>      > And the answer is they're not testing this code path at all, because if
>      > you do
>      >       DECLARE c CURSOR WITH HOLD FOR ...
>      >       FETCH ALL FROM c;
>      > then the second query executes with a portal (and resource owner)
>      > created to execute the FETCH command, not directly on the held portal.
>      >
>      > After a little bit of thought I'm not sure it's even possible to
>      > reproduce this problem with libpq, because it doesn't expose any way to
>      > issue a bare protocol Execute command against a pre-existing portal.
>      > (I had thought psqlODBC went through libpq, but maybe it's playing some
>      > games here.)
>      >
>      > Anyway, I'm thinking the appropriate fix might be like this
>      >
>      > -             CurrentResourceOwner = portal->resowner;
>      > +             if (portal->resowner)
>      > +                     CurrentResourceOwner = portal->resowner;
>      >
>      > in several places in pquery.c; that is, keep using
>      > TopTransactionResourceOwner if the portal doesn't have its own.
>      >
>      > A more general but probably much more invasive solution would be to fake
>      > up an intermediate portal when pulling data from a held portal, to
>      > more closely approximate the explicit-FETCH case.
>
>     We could also allocate a new resowner for the duration of that
>     transaction. That would get reassigned to the transaction's resowner in
>     PreCommit_Portals (after a slight change there).
>     That actually seems simple enough?

I made some changes to the multi-thread handling of the psqlodbc driver.
It would also be better to fix the crash on the backend side.

I made 2 patches.
The 1st one temporarily changes CurrentResourceOwner to
CurTransactionResourceOwner during catalog cache handling.
The 2nd one allocates a new resource owner for held portals.
Both fix the crash in my test case.
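
For readers following the thread, here is a toy model of two of the
backend-side ideas discussed above: the guarded assignment quoted from Tom
Lane's message, and giving a held portal a resource owner of its own. It is
only a sketch. The struct definitions, the two function names, and the global
variables are simplified stand-ins invented for illustration; they are not the
real PostgreSQL definitions and not the attached patches. The model assumes, as
the quoted fix implies, that a held portal can reach execution with no resource
owner of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the real PostgreSQL types. */
typedef struct ResourceOwner { const char *name; } ResourceOwner;
typedef struct Portal { ResourceOwner *resowner; } Portal;

/* Toy globals modelling the transaction's owner and the current owner. */
ResourceOwner top_transaction_owner = { "TopTransaction" };
ResourceOwner *CurrentResourceOwner = &top_transaction_owner;

/*
 * Hypothetical helper modelling the guarded assignment from the quoted diff:
 * switch owners only if the portal has one, otherwise keep the current owner.
 */
void enter_portal_guarded(Portal *portal)
{
    assert(portal != NULL);
    if (portal->resowner)
        CurrentResourceOwner = portal->resowner;
}

/*
 * Hypothetical helper modelling the other idea: an ownerless held portal is
 * first given a fresh owner, so the unconditional assignment is never NULL.
 */
void enter_portal_with_new_owner(Portal *portal, ResourceOwner *fresh)
{
    assert(portal != NULL && fresh != NULL);
    if (portal->resowner == NULL)
        portal->resowner = fresh;
    CurrentResourceOwner = portal->resowner;
}
```

In both variants the current owner is never left NULL after entering a held
portal, which is the property the unguarded assignment does not give.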

regards,
Hiroshi Inoue

