Thread: Monitoring for long running transactions
We had a customer complaining of random data loss for the last 6 months or so. We eventually tracked it down to a combination of bad coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by the web app when viewing a certain page and 'COMMIT' was never emitted after that. So once the app would get restarted, all data changes would be lost. Definitely worst case scenario. So the question is, what is the best way to monitor for this scenario going forward? Are there any plugins or community recommended scripts already made? Regards,
On 6/4/20 10:00 AM, Samuel Smith wrote: > We had a customer complaining of random data loss for the last 6 months > or so. We eventually tracked it down to a combination of bad coding and > a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by > the web app when viewing a certain page and 'COMMIT' was never emitted > after that. So once the app would get restarted, all data changes would > be lost. Definitely worst case scenario. > > So the question is, what is the best way to monitor for this scenario > going forward? Are there any plugins or community recommended scripts > already made? https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW " state text Current overall state of this backend. Possible values are: active: The backend is executing a query. idle: The backend is waiting for a new client command. idle in transaction: The backend is in a transaction, but is not currently executing a query. idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error. fastpath function call: The backend is executing a fast-path function. disabled: This state is reported if track_activities is disabled in this backend. " > > Regards, > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On 6/4/20 2:29 PM, Adrian Klaver wrote: > On 6/4/20 10:00 AM, Samuel Smith wrote: >> We had a customer complaining of random data loss for the last 6 >> months or so. We eventually tracked it down to a combination of bad >> coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being >> emitted by the web app when viewing a certain page and 'COMMIT' was >> never emitted after that. So once the app would get restarted, all >> data changes would be lost. Definitely worst case scenario. >> >> So the question is, what is the best way to monitor for this scenario >> going forward? Are there any plugins or community recommended scripts >> already made? > > https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW > > > " > state text Current overall state of this backend. Possible > values are: > > active: The backend is executing a query. > > idle: The backend is waiting for a new client command. > > idle in transaction: The backend is in a transaction, but is not > currently executing a query. > > idle in transaction (aborted): This state is similar to idle in > transaction, except one of the statements in the transaction caused an > error. > > fastpath function call: The backend is executing a fast-path function. > > disabled: This state is reported if track_activities is disabled in > this backend. > " > >> >> Regards, >> >> >> >> > > Sorry, I should have clarified that I was aware of the pg_stat_activity table. That is how we found the problem in the first place. And yes I could just write a bash script and run it in cron. I just didn't know if there was a more "official" way to go about this since it is probably a common monitoring point and/or if something like this was already made. Regards,
On 6/4/20 12:59 PM, Samuel Smith wrote: > On 6/4/20 2:29 PM, Adrian Klaver wrote: >> > > > Sorry, I should have clarified that I was aware of the pg_stat_activity > table. That is how we found the problem in the first place. And yes I > could just write a bash script and run it in cron. I just didn't know if > there was a more "official" way to go about this since it is probably a > common monitoring point and/or if something like this was already made. I guess it comes down to how much info you want. A list of monitoring tools: https://wiki.postgresql.org/wiki/Monitoring#pgcenter Haven't used it but this one: https://pgmetrics.io/ looks interesting. From here: https://pgmetrics.io/docs/ Active backends: transaction running too long, idling in transaction, waiting for locks Information on scripting above: https://pgmetrics.io/docs/scripting.html > > Regards, > > -- Adrian Klaver adrian.klaver@aklaver.com
## Samuel Smith (pgsql@net153.net): > Sorry, I should have clarified that I was aware of the pg_stat_activity > table. That is how we found the problem in the first place. And yes I > could just write a bash script and run it in cron. I just didn't know if > there was a more "official" way to go about this since it is probably a > common monitoring point and/or if something like this was already made. This works fine for everything nagios-like: https://bucardo.org/check_postgres/check_postgres.pl.html#txn_time Other solutions may be found via the PostgreSQL wiki - as you noted, this is (should be) monitored, so almost everything which can monitor PostgreSQL covers this. (which raises the obvious question - did you monitor your server or did you just use the common "shouting user" approach? ;)) Regards, Christoph -- Spare Space
Samuel Smith schrieb am 04.06.2020 um 21:59: > Sorry, I should have clarified that I was aware of the > pg_stat_activity table. That is how we found the problem in the first > place. And yes I could just write a bash script and run it in cron. I > just didn't know if there was a more "official" way to go about this > since it is probably a common monitoring point and/or if something > like this was already made. What about setting idle_in_transaction_session_timeout to some reasonably high value (15 minutes?) Then you would get an error in your application because the connection was terminated and you would learn early about theproblem. Even with monitoring enabled, you would probably still kill those sessions manually as the application most probably can'tcommit them properly any more. Thomas