Re: Queries that should be canceled will get stuck on secure_write function

From Andres Freund
Subject Re: Queries that should be canceled will get stuck on secure_write function
Date 2021-08-27
Msg-id 20210827192446.3kztmd5wqpy6vjm6@alap3.anarazel.de
In response to Re: Queries that should be canceled will get stuck on secure_write function  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Queries that should be canceled will get stuck on secure_write function  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2021-08-27 08:27:38 -0400, Robert Haas wrote:
> On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > to report an error to a client, and then calls AbortCurrentTransaction()
> > to abort the transaction. If secure_write() called by EmitErrorReport()
> > gets stuck, a backend gets stuck inside transaction block and the locks
> > keep being held unnecessarily. Isn't this problematic? Can we change
> > the order of them?
> ...
> More generally, I think it's hopeless to try to improve the situation
> very much by changing the place where secure_write() happens. It
> happens in so many different places, and it is clearly not going to be
> possible to make it so that in none of those places do we hold any
> important server resources. Therefore I think the only solution is to
> fix secure_write() itself ... and the trick is what to do there given
> that we have to be very careful not to do anything that might try to
> write another message while we are already in the middle of writing a
> message.

I wonder if we could improve the situation somewhat by using the non-blocking
pqcomm functions in a few select places. E.g. if elog.c's
send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
(which'd use the existing pq_putmessage_noblock()) and used
pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
client before AbortCurrentTransaction(), and b) be able to queue further error
messages safely.
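
Something like this, modeled on the existing pq_endmessage() in pqformat.c
(entirely untested sketch; pq_endmessage_noblock() is the new part,
pq_putmessage_noblock() already exists in pqcomm.c):

    void
    pq_endmessage_noblock(StringInfo buf)
    {
        /*
         * Unlike pq_endmessage(), this goes through pq_putmessage_noblock(),
         * which enlarges pqcomm.c's send buffer as needed instead of
         * blocking when the socket buffer is full.
         */

        /* msgtype was saved in cursor field */
        pq_putmessage_noblock(buf->cursor, buf->data, buf->len);

        /* no need to complain about any failure, since pqcomm.c already did */
        pfree(buf->data);
        buf->data = NULL;
    }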

I think this'd not violate the goal of putting pq_flush() into
send_message_to_frontend():
    /*
     * This flush is normally not necessary, since postgres.c will flush out
     * waiting data when control returns to the main loop. But it seems best
     * to leave it here, so that the client has some clue what happened if the
     * backend dies before getting back to the main loop ... error/notice
     * messages should not be a performance-critical path anyway, so an extra
     * flush won't hurt much ...
     */
    pq_flush();

because the only situation in which we'd not send the data out immediately
would be when the socket buffer is already full, in which case the client
wouldn't get the error immediately anyway?
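
The elog.c side in send_message_to_frontend() would then end with roughly
(same caveat, untested):

    /* msgbuf is the StringInfoData the message was assembled into */
    pq_endmessage_noblock(&msgbuf);     /* instead of pq_endmessage() */

    /*
     * Flush only if the socket is writable; otherwise the data just stays
     * queued in pqcomm.c's output buffer and goes out once the socket
     * becomes writable again.
     */
    pq_flush_if_writable();             /* instead of pq_flush() */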

Greetings,

Andres Freund


