
Re: [Incident report]Backend process crashed when executing 2pc transaction - Mailing list pgsql-hackers

From	Marco Slot
Subject	Re: [Incident report]Backend process crashed when executing 2pc transaction
Date	November 28, 2019 11:01:55
Msg-id	CANNhMLAjdTUzdwL50f8LX09je1jh+bZ6C4i=iZh8hgDEH0i0QA@mail.gmail.com
In response to	Re: [Incident report]Backend process crashed when executing 2pc transaction (Amit Langote <amitlangote09@gmail.com>)
Responses	Re: [Incident report]Backend process crashed when executing 2pc transaction (Amit Langote <amitlangote09@gmail.com>)
List	pgsql-hackers


On Thu, Nov 28, 2019 at 6:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
> Interesting.  Still, I think you'd be in better position than anyone
> else to come up with reproduction steps for vanilla PostgreSQL by
> analyzing the stack trace if and when the crash next occurs (or using
> the existing core dump).  It's hard to tell by only guessing what may
> have gone wrong when there is external code involved, especially
> something like Citus that hooks into many points within vanilla
> PostgreSQL.

To clarify: In a Citus cluster you typically have a coordinator which
contains the "distributed tables" and one or more workers which
contain the data. All are PostgreSQL servers with the citus extension.
The coordinator uses every available hook in PostgreSQL to make the
distributed tables behave like regular tables. Any crash on the
coordinator is likely to be attributable to Citus, because most of the
code that is exercised is Citus code. The workers are used as regular
PostgreSQL servers with the coordinator acting as a regular client. On
the worker, the ProcessUtility hook will just pass on the arguments to
standard_ProcessUtility without any processing. The crash happened on
a worker.

One interesting thing is the prepared transaction name generated by
the coordinator, which follows the form:
citus_<coordinator node id>_<pid>_<server-wide transaction number>_<prepared transaction number in session>.
The server-wide transaction number is a 64-bit
counter that is kept in shared memory and starts at 1. That means that
over 4 billion (4207001212) transactions happened on the coordinator
since the server started, which quite possibly resulted in 4 billion
prepared transactions on this particular server. I'm wondering if some
counter is overflowing.
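
As a rough illustration of the overflow suspicion above, the sketch below
splits a prepared transaction name of the described form into its fields and
compares the transaction number against the 32-bit limits. The example name is
hypothetical: only the transaction number 4207001212 comes from this report,
while the node id, pid, and per-session number are made up. The helper names
(CitusGid, parse_gid) are also illustrative and are not Citus or PostgreSQL
APIs. It does not show that any particular counter overflowed, only where the
observed value sits relative to those limits.

```python
from typing import NamedTuple

INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value


class CitusGid(NamedTuple):
    """Fields of a name shaped like citus_<node id>_<pid>_<txn number>_<n>."""
    coordinator_node_id: int
    pid: int
    transaction_number: int
    prepared_number: int


def parse_gid(gid: str) -> CitusGid:
    """Split a prepared transaction name into its four numeric fields."""
    parts = gid.split("_")
    if len(parts) != 5 or parts[0] != "citus":
        raise ValueError(f"unexpected prepared transaction name: {gid!r}")
    return CitusGid(*(int(p) for p in parts[1:]))


# Hypothetical name: only the third numeric field is taken from the report.
gid = parse_gid("citus_0_12345_4207001212_1")

# Already past the signed 32-bit range...
exceeds_int32 = gid.transaction_number > INT32_MAX
# ...but not yet past the unsigned 32-bit range.
exceeds_uint32 = gid.transaction_number > UINT32_MAX
# Increments left before an unsigned 32-bit counter would wrap.
remaining_before_wrap = UINT32_MAX - gid.transaction_number

print(exceeds_int32, exceeds_uint32, remaining_before_wrap)
```

If the value were ever stored in or cast to a signed 32-bit integer somewhere
along the way, it would already be out of range; an unsigned 32-bit counter
would still have some headroom left.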

cheers,
Marco
