Re: weird issue with occasional stuck queries - Mailing list pgsql-general

From Adam Scott
Subject Re: weird issue with occasional stuck queries
Date
Msg-id CA+s62-M3bCtos=ocwNoyo0s8rEx0Q_Pw+BiURNXf55hdCaq=PA@mail.gmail.com
In response to Re: weird issue with occasional stuck queries  (spiral <spiral@spiral.sh>)
List pgsql-general
The logs were helpful. You may want to look at the statements around the errors, since more detail may be there, such as the SQL statement associated with each error.
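
As a rough illustration of pulling out the lines around each error, here is a minimal Python sketch. It assumes a plain-text server log in which error lines contain the string "ERROR:"; the function name, the default window sizes, and the log path in the usage note are hypothetical choices of mine, not anything taken from your setup.

```python
from collections import deque


def context_around_errors(lines, before=2, after=2, marker="ERROR:"):
    """Yield (line_number, text) for each line containing `marker`,
    together with up to `before` preceding and `after` following lines."""
    recent = deque(maxlen=before)  # most recent lines not yet emitted
    remaining = 0                  # trailing context lines still to emit
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if marker in line:
            # Emit the buffered leading context, then the error line itself.
            yield from recent
            recent.clear()
            yield number, line
            remaining = after
        elif remaining > 0:
            # Still inside the trailing context window of an earlier error.
            yield number, line
            remaining -= 1
        else:
            recent.append((number, line))
```

To use it, open the log file (for example a hypothetical `postgresql.log`), pass the file object to `context_around_errors`, and print each yielded pair.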

Deadlocks are an indicator that the client code needs to be examined for improvement. See https://www.cybertec-postgresql.com/en/postgresql-understanding-deadlocks/ for background on deadlocks. They will slow things down and could cause SQL statements to queue up, eventually bogging down the system.
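
One common client-side improvement, which may or may not apply to your code, is to make every transaction touch rows in the same order so that two transactions cannot each hold a lock the other is waiting for. A minimal Python sketch, assuming each transaction is a list of (row_id, new_value) updates; the function name and the sample data are hypothetical:

```python
def in_lock_order(updates):
    """Return (row_id, new_value) pairs sorted by row_id.

    If every transaction applies its updates in this order, row locks
    are always acquired in ascending row_id order.
    """
    return sorted(updates, key=lambda pair: pair[0])


# Two hypothetical transactions that touch the same rows in opposite order.
txn_a = [(7, "x"), (3, "y")]
txn_b = [(3, "z"), (7, "w")]

# After sorting, both touch row 3 first and row 7 second.
ordered_a = in_lock_order(txn_a)
ordered_b = in_lock_order(txn_b)
```

This only covers deadlocks caused by inconsistent row ordering; the DETAIL lines that accompany "deadlock detected" in the log show which statements were actually involved.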

It definitely looks like locking issues, which is why you don't see high CPU. IIRC you might see high system CPU usage, as opposed to userspace CPU, where the kernel is getting overloaded. The `top` command will help to show that.
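
If you would rather log the user/system split over time than watch `top`, here is a minimal Python sketch of the arithmetic. It assumes a Linux host and two samples of the first ("cpu") line of /proc/stat taken some time apart; the function name is hypothetical, and counting irq and softirq time as system time is my own choice.

```python
def cpu_split(before, after):
    """Return (user_pct, system_pct) between two 'cpu' lines from /proc/stat.

    Field order on Linux: user nice system idle iowait irq softirq steal
    (guest and guest_nice follow, but are already counted in user and nice).
    """
    def fields(line):
        values = [int(v) for v in line.split()[1:]]
        # Pad short lines from older kernels so the indexing below is safe.
        return values + [0] * (8 - len(values))

    b, a = fields(before), fields(after)
    delta = [x - y for x, y in zip(a, b)]
    total = sum(delta[:8])
    if total <= 0:
        return 0.0, 0.0
    user = delta[0] + delta[1]               # user + nice
    system = delta[2] + delta[5] + delta[6]  # system + irq + softirq
    return 100.0 * user / total, 100.0 * system / total
```

Read the first line of /proc/stat, wait a second or so, read it again, and pass both lines in. A system percentage well above the user percentage would point toward the kernel rather than the queries themselves.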

The disks could be saturated by the write-ahead log (WAL) handling of all the transactions. More about WAL here: https://www.postgresql.org/docs/10/wal-internals.html You could consider moving the WAL directory to a different disk using a symbolic link (cf. the link).

Anyway, these are the things I would look at.

Adam





On Sat, Apr 2, 2022 at 5:23 AM spiral <spiral@spiral.sh> wrote:
Hey,

> That wait event according to documentation is "Waiting to access the
> multixact member SLRU cache."  SLRU = segmented least recently used
> cache

I see, thanks!

> if you are low on memory, it can slow down the allocation of
> buffers. Do you have a query that is a "select for update" running
> somewhere? If your disk is low on space `df -h` that might explain
> the issue.

- There aren't any queries that are running for longer than the selects
shown earlier; definitely not "select for update" since I don't ever
use that in my code.
- Both disk and RAM utilization is relatively low.

> Is there an ERROR: multixact  something in your postgres log?

There isn't, but while checking I saw some other concerning errors
including "deadlock detected", "could not map dynamic shared memory
segment" and "could not attach to dynamic shared area".
(full logs here: https://paste.sr.ht/blob/9ced99b119c3fce1ecfd71e8554946e7845a44dd )

> Another thing to look at is `iostat -x -y` and look at disk util %.
> This is an indicator, but not definitive, of how much disk access is
> going on.  It may be your drives are just saturated although your
> IOWait looks ok in your attachment.

I didn't specifically look at that, but I did notice *very* high disk
utilization in at least one instance of the stuck queries, as I
mentioned previously. Why would the disks be getting saturated? The
query count isn't noticeably higher than average, and the database
is not autovacuuming, so not sure what could cause that.

spiral
