Thread: debugging intermittent slow updates under higher load
Hi All,

This is on postgres 9.4.16, same table as the last question I asked; here's an abbreviated description:

# \d alerts_alert
       Table "public.alerts_alert"
     Column      |           Type           | Modifiers
-----------------+--------------------------+-----------
 tags            | jsonb                    | not null
 id              | character varying(86)    | not null
 ...
Indexes:
    "alerts_alert_pkey" PRIMARY KEY, btree (id)

The table has around 1.5M rows, which have been updated/inserted around 121M times. The distribution of updates per row in alerts_alert is quite uneven, ranging from 1 insert with no updates up to 1 insert followed by 0.5M updates.

Under high load (200-300 inserts/updates per second) we see occasional (~10 per hour) updates taking excessively long (2-10s). These updates are always of the form:

UPDATE "alerts_alert" SET ...bunch of fields... WHERE
"alerts_alert"."id" = '...sha1 hash...';

Here's a sample explain:

https://explain.depesz.com/s/Fjq8

What could be causing this? What could we do to debug it? What config changes could we make to alleviate it?

cheers,

Chris
Hello Chris,

One of the reasons could be that the row is already locked by another
backend, doing the same kind of update or something different.

Are these updates performed in longer transactions?
Can they hit the same row from two clients at the same time?
Is there any other write or select-for-update/share load on the table?
Have you tried periodic logging of non-granted locks?

Try querying pg_stat_activity and pg_locks (possibly joined, and maybe
repeatedly self-joined; google for it) to get the backends that wait for
one another while competing to lock the same row or object.

Best,
Alex
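[As an illustration of the pg_locks self-join suggested above, here is a
minimal sketch that should work on 9.4, where pg_blocking_pids() is not
yet available (it only arrived in 9.6). It pairs each waiting lock with
a granted lock on the same object and pulls in the query text from
pg_stat_activity:]

-- show blocked/blocker pairs; run repeatedly while the problem occurs
SELECT blocked.pid          AS blocked_pid,
       blocked_act.query    AS blocked_query,
       blocking.pid         AS blocking_pid,
       blocking_act.query   AS blocking_query
FROM pg_locks blocked
JOIN pg_locks blocking
  ON blocking.granted
 AND NOT blocked.granted
 AND blocking.pid <> blocked.pid
 AND blocking.locktype = blocked.locktype
 AND blocking.database      IS NOT DISTINCT FROM blocked.database
 AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
 AND blocking.page          IS NOT DISTINCT FROM blocked.page
 AND blocking.tuple         IS NOT DISTINCT FROM blocked.tuple
 AND blocking.virtualxid    IS NOT DISTINCT FROM blocked.virtualxid
 AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
 AND blocking.classid       IS NOT DISTINCT FROM blocked.classid
 AND blocking.objid         IS NOT DISTINCT FROM blocked.objid
 AND blocking.objsubid      IS NOT DISTINCT FROM blocked.objsubid
JOIN pg_stat_activity blocked_act  ON blocked_act.pid  = blocked.pid
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid;

[Anything this returns while one of the slow updates is in flight is a
blocked/blocker pair.]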
Also read about HOT updates and the storage parameter named "fillfactor": leaving free space in each block lets updated rows be rewritten within the same block instead of new blocks being allocated, as long as the update doesn't touch any indexed columns.
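[To see whether HOT is already kicking in for this table, a quick sketch
against the statistics views; n_tup_hot_upd is available in
pg_stat_user_tables on 9.4:]

-- fraction of updates that were HOT (index entries not rewritten)
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'alerts_alert';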
The fillfactor parameter can be set on a per-table basis.
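[For example; 90 is only an illustrative value, the right figure depends
on row width and update rate:]

ALTER TABLE alerts_alert SET (fillfactor = 90);
-- only affects pages written from now on; existing pages keep their
-- layout until rewritten (e.g. by VACUUM FULL or CLUSTER).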
On 05/12/2018 15:40, Alexey Bashtanov wrote:
> One of the reasons could be that the row is already locked by another
> backend, doing the same kind of update or something different.
> Are these updates performed in longer transactions?

Nope, the transaction will just be updating one row at a time.

> Can they hit the same row from two clients at the same time?

I've looked for evidence of this, but can't find any. Certainly nothing
running for 2-10s; queries against this table are normally a few hundred ms.

> Is there any other write or select-for-update/share load on the table?

Not that I'm aware of. How would I go about getting metrics on problems
like these?

> Have you tried periodic logging of non-granted locks?
> Try querying pg_stat_activity and pg_locks (possibly joined, and maybe
> repeatedly self-joined; google for it) to get the backends that wait
> for one another while competing to lock the same row or object.

Is there any existing tooling that does this? I'm loath to start hacking
something up when I'd hope others have done a better job already...

Chris
On 05/12/2018 15:47, Rene Romero Benavides wrote:
> Also read about HOT updates and the storage parameter named
> "fillfactor": leaving free space in each block lets updated rows be
> rewritten within the same block instead of new blocks being allocated,
> as long as the update doesn't touch any indexed columns.

I have read about these, but I'd prefer not to make opportunistic
guesses here. How can I collect metrics/logging/etc. as evidence to
confirm what the problem actually is?

cheers,

Chris
> Is there any existing tooling that does this?

There must be some; google for queries involving pg_locks.

> I'm loath to start hacking something up when I'd hope others have done
> a better job already...

If you log all queries that take more than a second to complete, is your
update the only one logged, or does something (the would-be blocker) get
logged together with it?
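[In postgresql.conf terms that suggestion is something like the
following; the thresholds are illustrative. log_lock_waits additionally
logs any session that waits longer than deadlock_timeout for a lock,
which would point straight at a blocker if there were one:]

log_min_duration_statement = 1000   # log statements taking >= 1s (in ms)
log_lock_waits = on                 # log waits exceeding deadlock_timeout
deadlock_timeout = 1s               # threshold for the above (the default)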
On 06/12/2018 11:00, Alexey Bashtanov wrote:
>> I'm loath to start hacking something up when I'd hope others have done
>> a better job already...
> If you log all queries that take more than a second to complete, is your
> update the only one logged, or does something (the would-be blocker) get
> logged together with it?

Nope, the only ones logged are these updates.

Chris
Hi
Can you check the latency of your file system? Latency spikes can be caused by an overloaded file system, for example due to a misconfigured file system cache.
Regards
Pavel
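[As an illustration of Pavel's suggestion, one way to measure I/O time
from inside Postgres; a sketch, assuming you can enable track_io_timing,
which adds some timing overhead and only covers I/O that Postgres itself
issues. OS tools such as iostat would show device-level latency:]

-- once, as superuser (ALTER SYSTEM is new in 9.4), then reload:
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();

-- then watch cumulative time spent in data-file reads/writes (in ms):
SELECT datname, blk_read_time, blk_write_time
FROM pg_stat_database
WHERE datname = current_database();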