Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations - Mailing list pgsql-hackers

From MauMau
Subject Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Msg-id 926E7158E8C04E7D87FAF73BE5AA8D6E@maumau
In response to Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
List pgsql-hackers
From: "Robert Haas" <robertmhaas@gmail.com>
> I think the problem here is that it actually is possible for one
> session to access the temporary objects of another session:
> Now, we could prohibit that specific thing.  But at the very least, it
> has to be possible for one session to drop another session's temporary
> objects, because autovacuum does it eventually, and superusers will
> want to do it sooner to shut autovacuum up.  So it's hard to reason
> about whether and to what extent it's safe to not send sinval messages
> for temporary objects.

I was a bit surprised to learn that one session can access the data of 
another session's temporary tables.  That implementation may be complicating 
the situation -- extra sinval messages.


> I think you might be approaching this problem from the wrong end,
> though.  The question in my mind is: why does the
> StartTransactionCommand() / CommitTransactionCommand() pair in
> ProcessCatchupEvent() end up writing a commit record?  The obvious
> possibility that occurs to me is that maybe rereading the invalidated
> catalog entries causes a HOT prune, and maybe there ought to be some
> way for a transaction that has only done HOT pruning to commit
> asynchronously, just as we already do for transactions that only
> modify temporary tables.  Or, failing that, maybe there's a way to
> suppress synchronous commit for this particular transaction.

I was able to identify the WAL record written in the transaction started in 
ProcessCatchupEvent() by inserting an elog() call in XLogInsert().  The record 
was (RM_STANDBY_ID, XLOG_STANDBY_LOCK).

The cause of the hang is now clear.  It happens as follows:

1. When a transaction that used a temporary table created with ON COMMIT 
DELETE ROWS commits, smgrtruncate() issues the sinval catchup signal 
(SIGUSR1).  This is because the temporary table is truncated at 
transaction end.

2. Another session, which is waiting for a client request, receives SIGUSR1. 
It calls ProcessCatchupEvent().

3. ProcessCatchupEvent() calls StartTransactionCommand(), emits the 
XLOG_STANDBY_LOCK WAL record, and then calls CommitTransactionCommand().

4. It then calls SyncRepWaitForLSN(), which in turn waits on the latch.

5. But WaitLatch() never returns, because the session is still running 
inside the SIGUSR1 handler entered in step 2.  WaitLatch() needs another 
SIGUSR1 to be delivered before it can complete.
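
To illustrate steps 2 to 5 outside the backend, here is a minimal sketch in 
plain C.  It is not PostgreSQL code: latch_is_set, catchup_started and 
simulate_wait_inside_handler() are hypothetical names, the latch is reduced 
to a flag, and the wake-up from the other process is simulated with raise().  
The wait loop is bounded so that the program terminates instead of hanging.  
It assumes the handler is installed without SA_NODEFER, so SIGUSR1 stays 
blocked while its own handler runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from PostgreSQL. */
static volatile sig_atomic_t latch_is_set = 0;        /* "latch" flag */
static volatile sig_atomic_t catchup_started = 0;     /* first delivery seen */
static volatile sig_atomic_t wait_gave_up = 0;        /* bounded wait expired */
static volatile sig_atomic_t wakeup_left_pending = 0; /* wake-up not delivered */

static void
sigusr1_handler(int signo)
{
    sigset_t pending;
    int      i;

    (void) signo;

    if (catchup_started)
    {
        /* Second delivery: the wake-up that sets the latch. */
        latch_is_set = 1;
        return;
    }
    catchup_started = 1;    /* step 2: catchup work runs in the handler */

    /* Simulated wake-up.  SIGUSR1 is blocked here, so it only stays pending. */
    raise(SIGUSR1);

    /* Step 4: wait for the latch (bounded here; the real wait is not). */
    for (i = 0; i < 1000000 && !latch_is_set; i++)
        ;

    /* Step 5: the latch is still unset and the wake-up is still pending. */
    sigpending(&pending);
    wakeup_left_pending = (sigismember(&pending, SIGUSR1) == 1);
    wait_gave_up = !latch_is_set;
}

/* Returns 1 if the wait inside the handler could not be satisfied. */
int
simulate_wait_inside_handler(void)
{
    struct sigaction sa;

    latch_is_set = 0;
    catchup_started = 0;
    wait_gave_up = 0;
    wakeup_left_pending = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigusr1_handler;
    sa.sa_flags = 0;            /* no SA_NODEFER: handler blocks SIGUSR1 */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    raise(SIGUSR1);             /* steps 1-2: the catchup signal arrives */

    return (wait_gave_up && wakeup_left_pending) ? 1 : 0;
}
```

Calling simulate_wait_inside_handler() from main() should return 1: the 
wake-up signal is only delivered after the first handler invocation returns, 
which is too late for a wait performed inside that handler.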

I think there is a problem with the latch and SIGUSR1 mechanism.  How can we 
fix this problem?

Regards
MauMau




