Re: How do I track down a possible locking problem? - Mailing list pgsql-general
From | Herouth Maoz |
---|---|
Subject | Re: How do I track down a possible locking problem? |
Date | |
Msg-id | FB719A3C-5D09-4F7B-981D-2FF999FD4276@unicell.co.il Whole thread Raw |
In response to | Re: How do I track down a possible locking problem? (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: How do I track down a possible locking problem?
(Jeff Janes <jeff.janes@gmail.com>)
|
List | pgsql-general |
On 18/02/2014, at 19:02, Jeff Janes wrote:
On Mon, Feb 17, 2014 at 8:45 AM, Herouth Maoz <herouth@unicell.co.il> wrote:I have a production system using Postgresql 9.1.2.
The system basically receives messages, puts them in a queue, and then several parallel modules, each in its own thread, read from that queue, and perform two inserts, then release the message to the next queue for non-database-related processing.
Today, after we received complaints from a customer about delays, I noticed odd behavior in the system. About 2 percent of the messages were inserted into the tables more than an hour after they got into the system.How do you know that?
The queue never has more than 27,000 messages at the same time, and all together, the parallel modules process about 5000 or 6000 messages per minute. So basically, the delay for a single message should never be more than a few minutes. Even if one module gets stuck, another will grab the next message from the queue. I believe the only way for a message to be stuck for so long would be for it to be grabbed by a module, and then for the database write to be somehow delayed for a hour, although it's merely a simple insert performed with a prepared statement.
The database in production is very busy with millions of writes per hour. Could there be a situation in which a particular connection gets "starved" while other connections are able to run queries without noticeable delay?If there is a delay like that, it would almost certainly be due to database locks that show up in pg_locks.But, I doubt that that is your problem.
How can I truck such locks down? Does anybody have any ideas other than starvation? The system lets me view statistics of how many messages were processed in each modules and the average latency. None of the four modules running has long average latency or low number of messages processes, so I don't think the issue is related to any particular thread in my (Java) system being slow or locked away by the others.If the insert into PostgreSQL was freezing, wouldn't that time get reflected in your latency monitoring?
It sounds to me like your application has a bug in its queue, where it forgets about items on the queue for a while.
Not likely. This application has been running with the same queue implementation since 2001, even before we started using PostgreSQL... All bugs in that particular code would have been eliminated long ago. The system consists of dozens of queues, and we have monitoring that alerts us to any queue where messages are left unattended. If there was such a bug, we'd see queues that never become empty.
But anyway, I'll put up some cron job that monitors the locks or the query lengths, and see if I can come up with anything. Theoretically, I have a way of logging when each message is enqueued or dequeued from each queue in the system, but this would slow down the production system, so I'll only do that as a last resort.
Thank you for pointing me to the lock monitoring documentation.
pgsql-general by date: