Thread: Deadlock detected after pg_repack receives SIGINT
I'm running pg_repack from a bash script with a timeout of 10 minutes, like so (simplified):

timeout -s SIGINT 10m pg_repack --table=frequently_updated_table

After the 10-minute timeout the OS sends SIGINT to the pg_repack process, so pg_repack calls:

SELECT repack.repack_drop($1, $2)

and this deadlocks with another process that is INSERTing into frequently_updated_table, which carries pg_repack's trigger logging every change into repack.log_[OID of frequently_updated_table].

Is this a bug in pg_repack, or am I doing something wrong? Should I add some more detailed info?

Thank you for any help,
Jiri
PS: I already tried the pg_repack mailing list (http://lists.pgfoundry.org/pipermail/reorg-genera) and GitHub without any response. If there is a better place to ask, please tell me.
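As an aside, the wrapper's decision logic can be sketched standalone: GNU timeout exits with status 124 when it had to signal the command, so a script can tell an interrupted repack from a completed one. This is a minimal sketch, with `sleep` standing in for the real pg_repack invocation:

```shell
#!/usr/bin/env bash
# Sketch of the wrapper's decision logic; `sleep 5` stands in for pg_repack.
# GNU timeout exits with status 124 when it had to signal the command,
# so the caller can distinguish "timed out" from "finished".
set -u

timeout -s INT 1 sleep 5    # stand-in for: timeout -s SIGINT 10m pg_repack ...
status=$?

if [ "$status" -eq 124 ]; then
  echo "pg_repack was interrupted by the timeout (cleanup ran)"
else
  echo "pg_repack finished on its own with status $status"
fi
```

Note that adding `--preserve-status` to timeout would change this: the exit code would then reflect the signal rather than 124.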
On Tue, Nov 3, 2015 at 9:51 PM, Jiří Hlinka <jiri.hlinka@gmail.com> wrote:
> After the 10 min timeout, the OS sends SIGINT to pg_repack process so the
> pg_repack calls:
> SELECT repack.repack_drop($1, $2)
> and it causes a deadlock with other process which is INSERTing into
> frequently_updated_table that has a pg_repack's trigger to log every change
> to a repack.log_[OID of frequently_updated_table].

Recalling memories of this stuff: that's part of a cleanup rollback, right? A repack can be a lengthy operation on relations that are heavily updated, as pg_repack keeps a trace of each DML event that happens on the source table and then replays them one by one.

> Is it a bug of pg_repack or am I doing something wrong? Should I add some
> detailed info?

I doubt there is anything involving Postgres here. It seems that some process is still holding a lock on a relation that is being dropped, caused by a race condition in the pg_repack code.

> PS: I was trying a mailing list of pg_repack
> (http://lists.pgfoundry.org/pipermail/reorg-genera) and github without any
> notice. If there is better place to ask, tell me, please.

I guess you should give those folks time to answer. If I were you, I'd begin by giving the repack operation more time to complete.

Regards,
--
Michael
On 11/3/15 7:44 AM, Michael Paquier wrote:
> I doubt there is anything involving Postgres here. It seems that some
> process is still holding a lock on a relation that is being dropped,
> caused by a race condition in pg_repack code.
>
>> PS: I was trying a mailing list of pg_repack
>> (http://lists.pgfoundry.org/pipermail/reorg-genera) and github without any
>> notice. If there is better place to ask, tell me, please.
>
> I guess you should let those folks time to answer. If I were you I'd
> begin first by letting more time for the repack operation to complete.

Make sure you're on the latest version too. There were some recent commits aimed at fixing some race conditions.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Thanks Jim and Michael for the comments.
Michael: your memories are fresh and clear :-), yes, it is part of a cleanup rollback. The problem is that repack.repack_drop calls this statement:
DROP TABLE IF EXISTS repack.log_%u CASCADE
...and it collides with the trigger that inserts new tuples from [frequently_updated_table] into repack.log_[OID of frequently_updated_table]. The trigger is created by this routine:
SELECT 'CREATE TRIGGER z_repack_trigger' ||
       ' BEFORE INSERT OR DELETE OR UPDATE ON ' || repack.oid2text($1) ||
       ' FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger(' ||
       '''INSERT INTO repack.log_' || $1 || '(pk, row) VALUES(' ||
       ' CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.' ||
       repack.get_index_columns($2, ', $1.') || ')::repack.pk_' ||
       $1 || ') END, $2)'')';
As these two actions are both run by pg_repack, it looks like a bug to me: pg_repack should not be able to deadlock itself. But I'm not 100% sure about this part...
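The suspected failure has the classic lock-ordering shape: one session holds the table lock and wants the log-table lock, while the DROP holds the log-table lock and wants the table's. A standalone illustration of that shape, using flock(1) on two files (no PostgreSQL involved; the lock names are purely illustrative):

```shell
#!/usr/bin/env bash
# Standalone illustration (NOT pg_repack code) of a lock-ordering deadlock:
# each "session" takes its first lock, then tries the other's lock. flock's
# -w timeout stands in for PostgreSQL's deadlock detector so the demo ends.
set -u

deadlock_demo() {
  local d; d=$(mktemp -d)

  # Session 1: lock A (the user table), then try B (repack.log_<oid>).
  (
    exec 8>"$d/A"; flock 8
    sleep 1
    exec 9>"$d/B"
    flock -w 2 9 || echo "session 1: stuck waiting for B"
  ) &

  # Session 2: lock B, then try A -- the opposite order.
  (
    exec 8>"$d/B"; flock 8
    sleep 1
    exec 9>"$d/A"
    flock -w 2 9 || echo "session 2: stuck waiting for A"
  ) &

  wait
  rm -rf "$d"
}

result=$(deadlock_demo)
printf '%s\n' "$result"
```

PostgreSQL resolves this by aborting one of the two transactions after deadlock_timeout; the question in this thread is whether both "sessions" really belong to pg_repack itself.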
On Wed, Nov 4, 2015 at 3:48 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 11/3/15 7:44 AM, Michael Paquier wrote:
>> I doubt there is anything involving Postgres here. It seems that some
>> process is still holding a lock on a relation that is being dropped,
>> caused by a race condition in pg_repack code.
>
> Make sure you're on the latest version too. There were some recent
> commits aimed at fixing some race conditions.
Bc. Jiří Hlinka
Tel.: 725 315 263
On Wed, Nov 4, 2015 at 10:16 PM, Jiří Hlinka <jiri.hlinka@gmail.com> wrote:
> I'm on pg_repack 1.3.2 (latest stable, no devel version available to
> check if it is already fixed). The problem is that repack_drop calls
> this statement:
>
> DROP TABLE IF EXISTS repack.log_%u CASCADE
>
> ...and it collides with the trigger inserting new tuples into
> repack.log_[OID of frequently_updated_table]. As these two actions are
> both run by pg_repack, it seems like a bug to me, as pg_repack should
> not be able to deadlock itself, but I'm not 100% sure about this part...
Is the trigger activity moving on, or is it waiting for a lock as well? It sounds like pg_repack is waiting for the transaction running this trigger to finish before it can safely drop the trigger and the relation. I guess you are running large UPDATE queries, right? It really looks like you should give pg_repack more room to do its stuff.
--
Michael
I'll check the logs to see whether the trigger activity was moving on or waiting for a lock.
I'm doing simple UPDATEs, INSERTs and DELETEs on this table, but the frequency of these DMLs is _very_ high. It is a queue table used for one-way, selective (only part of the data is replicated) replication between two instances of the database, say from the main DB to a data warehouse DB: part of the DML on the main DB's tables is "copied" by a trigger into this queue table, which is then used as the source for replicating changes into the data warehouse DB. This is the reason the table needs a VACUUM FULL at least twice a day, or better, a pg_repack run at least twice a day.
My opinion is that pg_repack should guarantee consistent, deadlock-free behaviour via a proper locking policy, whether the table is frequently updated or not; the "worst" case that should be possible is pg_repack waiting for a very long time (which is the reason for the timeout in the script: to stop pg_repack if it cannot obtain the lock it is waiting for). I know this is hard, and I'm not sure whether such "clean" behaviour is achievable at all, but if, for example, a deadlock happens, it can at least be filed as a bug to make the tool even better than it is now.
Maybe I could lower the impact of the problem by checking in the script, once the timeout expires, whether pg_repack is still waiting in "Waiting for N transactions to finish" (in which case it is safe to SIGINT pg_repack), or whether it is already repacking the table, and in that case not stop it. But there still remains the fact that there may be a bug, which I could and should report to the developers.
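That check could be sketched roughly like this. The exact progress message is an assumption modeled on pg_repack's output and not verified; adjust the pattern to whatever your pg_repack version actually logs:

```shell
#!/usr/bin/env bash
# Sketch: decide whether it is safe to SIGINT pg_repack by inspecting the
# last line of its captured output. The message text is an ASSUMPTION.
set -u

safe_to_interrupt() {
  # Safe only while pg_repack is still waiting for old transactions,
  # i.e. before it has started the actual table rewrite.
  tail -n 1 "$1" | grep -q "Waiting for .* transactions"
}

# Simulated pg_repack log for the demo:
log=$(mktemp)
echo "NOTICE: Waiting for 3 transactions to finish" > "$log"

if safe_to_interrupt "$log"; then
  echo "still in the wait phase: sending SIGINT is safe"
else
  echo "already repacking: let it run"
fi
rm -f "$log"
```

In the real script you would run pg_repack with its output redirected to such a log file and consult it before deciding to signal the process.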
Thanks Michael,
Jiri
On Thu, Nov 5, 2015 at 2:57 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Is the trigger activity moving on or is this one waiting as well for a
> lock? It sounds like pg_repack is waiting for the end of the transaction
> running this trigger to finish before being able to drop the trigger and
> this relation safely. I guess that you are running large UPDATE queries,
> right? It really looks like you should let more room to pg_repack to do
> its stuff.
> --
> Michael
On Thursday, November 5, 2015 12:16 AM, Jiří Hlinka <jiri.hlinka@gmail.com> wrote:
> My opinion is, that pg_repack should guarantee a consistent,
> deadlock-free behaviour via proper locking policy

I would be very interested in seeing a description of what locking policy would guarantee deadlock-free behavior when run concurrently with unknown software. If you have a link to a paper on the topic, that would serve as well as a description here.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Kevin,
my point was that pg_repack deadlocked itself. I think it should be possible to guarantee deadlock-free behavior for pg_repack's own operations, at least via an advisory lock (I understand it is not possible to guarantee this across multiple applications). If that is not true, I'd be glad to hear I'm wrong (really!).
Thanks,
Jiri
On Thu, Nov 5, 2015 at 5:43 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> On Thursday, November 5, 2015 12:16 AM, Jiří Hlinka
> <jiri.hlinka@gmail.com> wrote:
>> My opinion is, that pg_repack should guarantee a consistent,
>> deadlock-free behaviour via proper locking policy
>
> I would be very interested in seeing a description of what locking
> policy would guarantee deadlock-free behavior when run concurrently
> with unknown software. If you have a link to a paper on the topic,
> that would serve as well as a description here.
On Fri, Nov 6, 2015 at 4:08 PM, Jiří Hlinka <jiri.hlinka@gmail.com> wrote:
> my point was, that pg_repack deadlocked itself - I think it should be
> possible to guarantee deadlock-free behavior at least via advisory lock for
> operations of pg_repack itself (I understand it is not possible to guarantee
> this across more apps). If it is not true, I'd be glad to hear I'm wrong
> (really!).

Er, well. Based on the information given, there is actually no evidence that pg_repack is deadlocked. The code path calling the cleanup callback after receiving SIGINT is waiting for the transaction working on the trigger being dropped to finish. We could only speak of a deadlock if the transaction inserting data into repack.log_% were actually holding a lock that conflicts with the trigger being dropped.
--
Michael
On 11/5/15 12:14 AM, Jiří Hlinka wrote:
> I'm doing simple UPDATES, INSERTs and DELETEs on this table, but
> frequency of these DMLs is _very_ high [...] - this
> is the reason why the table needs VACUUM FULL at least twice a day, or
> better - running pg_repack on it at least twice a day.

That's a very problematic workload. It's why PgQ (used by londiste) swaps queue tables around and does truncates; Slony now does the same. You'll probably be much happier either using PgQ, using some kind of partitioning on the queue so that you're only truncating, or switching to a different queuing solution altogether.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com