On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> So far, it has happened three times (during a single week) from the 14.3 ->
> 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>
> Questions:
> 1) Can we safely downgrade from 14.4 to 14.3 by shutting down the instance
> and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
> 2) What is the best way to diagnose what is the root cause?
Hmm. 14.4 has nothing in its release notes that would point to a
change in the vacuum or autovacuum's code paths:
https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
There is nothing specific after a look at the changes as of, and I am
not grabbing anything that would imply a change in memory context
handling either:
`git log --stat REL_14_3..REL_14_4`
`git diff REL_14_3..REL_14_4 -- *.c`
Saying that, you should be able to downgrade safely as there are no
changes in WAL format or such that would break things. Saying that,
the corruption issue caused by CONCURRENTLY is something you'd still
have to face.
> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR: out of
> memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL: Failed
> on request of size 152094068 in memory context "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
> automatic vacuum of table "prematch.replication.tab_queue_tmp"
This is the interesting part. Do you happen to use logical
replication in a custom C++ plugin?
> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG: could
> not fork worker process: Cannot allocate memory
> terminate called after throwing an instance of 'std::bad_alloc'
> what(): std::bad_alloc
>
> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
> terminate
> called after throwing an instance of 'std::bad_alloc' terminate called
> after throwing an instance of 'std::bad_alloc' what(): what():
> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> was terminated by signal 6: Aborted
> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
> terminating any other active server processes
Looks like something is going wrong in the memory handling of one of
your C++ extensions here. If you can isolate an issue using a query
without any custom code, that would be a Postgres problem, but I think
that you are missing a trick in it.
--
Michael