Re: [Incident report] Backend process crashed when executing 2pc transaction - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [Incident report] Backend process crashed when executing 2pc transaction
Date
Msg-id CA+HiwqGxmSxu8e07sNLEmKJqFm7-69QhidjA+huA1ifm0n1CnA@mail.gmail.com
In response to Re: [Incident report] Backend process crashed when executing 2pc transaction  (Marco Slot <marco@citusdata.com>)
Responses RE: [Incident report] Backend process crashed when executing 2pc transaction  (Ranier Vilela <ranier_gyn@hotmail.com>)
List pgsql-hackers
Hi Marco,

On Thu, Nov 28, 2019 at 5:02 PM Marco Slot <marco@citusdata.com> wrote:
>
> On Thu, Nov 28, 2019 at 6:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Interesting.  Still, I think you'd be in better position than anyone
> > else to come up with reproduction steps for vanilla PostgreSQL by
> > analyzing the stack trace if and when the crash next occurs (or using
> > the existing core dump).  It's hard to tell by only guessing what may
> > have gone wrong when there is external code involved, especially
> > something like Citus that hooks into many points within vanilla
> > PostgreSQL.
>
> To clarify: In a Citus cluster you typically have a coordinator which
> contains the "distributed tables" and one or more workers which
> contain the data. All are PostgreSQL servers with the citus extension.
> The coordinator uses every available hook in PostgreSQL to make the
> distributed tables behave like regular tables. Any crash on the
> coordinator is likely to be attributable to Citus, because most of the
> code that is exercised is Citus code. The workers are used as regular
> PostgreSQL servers with the coordinator acting as a regular client. On
> the worker, the ProcessUtility hook will just pass on the arguments to
> standard_ProcessUtility without any processing. The crash happened on
> a worker.

Thanks for clarifying.

> One interesting thing is the prepared transaction name generated by
> the coordinator, which follows the form: citus_<coordinator node
> id>_<pid>_<server-wide transaction number >_<prepared transaction
> number in session>. The server-wide transaction number is a 64-bit
> counter that is kept in shared memory and starts at 1. That means that
> over 4 billion (4207001212) transactions happened on the coordinator
> since the server started, which quite possibly resulted in 4 billion
> prepared transactions on this particular server. I'm wondering if some
> counter is overflowing.

Interesting.  This does get us somewhat closer to figuring out what
might have gone wrong, but it's hard to tell without the core dump at hand.

Thanks,
Amit


