Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations - Mailing list pgsql-hackers

From MauMau
Subject Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Date
Msg-id C26BF550D9A941DF925FBBC9BB6B4A16@maumau
In response to Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations  ("MauMau" <maumau307@gmail.com>)
Responses Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
List pgsql-hackers
I've tracked down the real root cause.  The fix is very simple.  Please
check the attached one-liner patch.

The cause is that temporary relations declared ON COMMIT DELETE ROWS are
truncated at every commit, regardless of whether the transaction accessed
them.  The following sequence of steps results in the hang:

1. A session creates a temporary table with ON COMMIT DELETE ROWS.  This adds
the temporary table to the list of relations to be truncated at transaction
commit.

2. The session receives a sinval catchup signal (SIGUSR1) from another
session.  It starts a transaction and processes sinval messages in the
SIGUSR1 signal handler.  No WAL is output while processing the sinval
messages.

3. When the transaction commits, the list of temporary relations is checked
to see whether they need to be truncated.

4. The temporary table created in step 1 is truncated.  Truncating a
relation requires an AccessExclusiveLock on it.  When hot standby is in use,
acquiring an AccessExclusiveLock generates a WAL record
(RM_STANDBY_ID, XLOG_STANDBY_LOCK).

5. Because the transaction has written WAL, it waits on a latch for a reply
from the synchronous standby.  That latch wait never returns: waking the
latch requires delivery of SIGUSR1, but the backend is still inside the
SIGUSR1 handler it entered in step 2.


The correct behavior is for the transaction not to truncate the temporary
table in step 4, because the transaction didn't use the temporary table.
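The commit-time decision can be sketched roughly as follows.  This is a hypothetical illustration, not the attached patch or the actual PostgreSQL code.  It assumes a per-transaction flag records whether any temporary relation was accessed, and all type and function names here are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical on-commit actions and list entries (illustrative names). */
typedef enum { ON_COMMIT_NOOP, ON_COMMIT_DELETE_ROWS, ON_COMMIT_DROP } OnCommitAction;

typedef struct
{
    unsigned int relid;
    OnCommitAction action;
} OnCommitItem;

/*
 * Collect the relations to truncate at commit into out[] and return the count.
 * If the transaction never accessed a temporary relation, ON COMMIT DELETE
 * ROWS tables must still be empty, so they are skipped.  This also avoids the
 * AccessExclusiveLock and the WAL record that truncating them would cost.
 */
static size_t
relations_to_truncate(const OnCommitItem *items, size_t n,
                      bool xact_accessed_temp_rel, unsigned int *out)
{
    size_t count = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (items[i].action == ON_COMMIT_DELETE_ROWS && xact_accessed_temp_rel)
            out[count++] = items[i].relid;
    }
    return count;
}
```

In the sinval-processing transaction from step 2, the flag would be false, so nothing is truncated and no WAL is written.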

I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied
the code fragment from 9.5devel to 9.2.9.  The attached patch is for 9.2.9.
I didn't check 9.4 and other versions.  Why wasn't the fix applied to 9.2?

Finally, I found a very easy way to reproduce the problem:

1. In terminal session 1, start psql and run:
  CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
Leave the psql session open.

2. In terminal session 2, run:
  pgbench -c8 -t500 -s1 -n -f test.sql dbname
[test.sql]
CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
DROP TABLE t;

3. In the psql session in terminal session 1, run any SQL statement.  It
never returns.  The backend is stuck in SyncRepWaitForLSN().

Regards
MauMau

Attachment
