LISTEN denial of service with aborted transaction - Mailing list pgsql-hackers

From Matt Newell
Subject LISTEN denial of service with aborted transaction
Date
Msg-id 1740208.GbWj81YA5k@obsidian
Whole thread Raw
Responses Re: LISTEN denial of service with aborted transaction
List pgsql-hackers
This morning with our production database I began receiving reports
of the database being "down".

I checked the log and was surprised to see extremely long durations for a 
LISTEN that happens after each connection is made by our database library. 
This coincided with many(approx 600) new connections happening in a short time 
window due to render nodes automatically being turned on when the first job of 
the morning was submitted(idle nodes are turned off to save power).  My 
initial hunch was that there was some code in postgres that resulted 
exponential execution time if enough listens on new connections happened at 
the same time.  As I was trying to gather more information the listen times 
began to decrease and after about 20 minutes things were back to normal.

A few hours later the symptoms returned but this time the listen was taking 
upwards of 15 minutes.  I did some more reading and checked the pg_notify 
directory and found that there were 49 files.  Having never checked that 
directory before I wasn't sure if that was normal.  A short time later I 
noticed there was a process sitting idle with an aborted transaction.  After 
killing the process things quickly returned to normal.

After doing a search for "idle in transaction (aborted)" I came upon 
http://stackoverflow.com/questions/15036438/postgres-connection-leaks-idle-in-transaction-aborted 

While this is a potential solution for dealing with the problem it seems that 
the postgresql developers have decided to let connections stay in the "idle in 
transaction (aborted)" state for a reason, most likely under the assumption 
that it's relatively safe and only eats up the resources of a single 
connection.  However it's easy to demonstrate that doing:

listen "abc";
begin;
select bad_column from non_existant_table;

...will eventually cause a denial of service situation if the DBA hasn't setup
guards against connection sitting idle in an aborted transaction.

Looking at the code in src/backend/commands/async.c I think there are a couple 
ways to eliminate this problem.

1. When a connection issues it's first LISTEN command, in Exec_ListenPreCommit    
QUEUE_BACKEND_POS(MyBackendId) = QUEUE_TAIL;
this causes the connection to iterate through every notify queued in the slru, 
even though at that point I believe the connection can safely ignore any 
notifications from transactions that are already committed, and if I 
understand correctly notifications aren't put into the slru until precommit,  
so wouldn't it be safe to do:
QUEUE_BACKEND_POS(MyBackendId) = QUEUE_HEAD;
inside Async_Listen?  If that's not safe, then could a new member be added to 
AsyncQueueControl that points to the first uncommitted QueuePosition (wouldn't 
have to be kept completely up to date).

This would solve the problem of slow initial LISTEN in the face of a growing 
pg_notify queue.

2. Would it be possible when a backend is signaled to check if it is idle 
inside an aborted transaction, and if so process the notifications and put any 
that match what the backend is listening on into a local list.  This would 
allow the slru to be cleaned up.  In practice I think most notifications would 
either be disregarded or combined with a duplicate, so the list would most 
likely end up staying very small. If the backend local list does grow too 
large then the connection could be killed or handled in some other appropriate 
way.

I am happy to attempt coming up with a patch if the ideas are deemed 
worthwhile.

Thanks,
Matt Newell






pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!
Next
From: Tom Lane
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!