Thread: Failover and vacuum
Hi,

We have 2 nodes (primary and standby, PostgreSQL 15.6) on OpenShift/Kubernetes, in a Patroni setup with 300 GB of data. There had been no failover in the last six months. Suddenly, after a failover, there were a lot of issues, such as "too many connections" errors and slowness.

Is it because ANALYZE was not done on the new node?
On Thu Mar 27, 2025 at 10:40 AM PDT, Raj wrote:
> Hi
>
> Have 2 nodes ( primary and standby postgres 15.6) in openshift kubernetes.
>
> Patroni setup. 300gb data. No failover since last six months. Suddenly
> after failover, there were lot of issues such as too many connections and
> slowness.
>
> Is it due to not analyze done in new node?

There isn't enough information in your message for anyone to take an educated guess. If you see evidence of vacuum not being performed on certain tables for a while, yes, that may explain queries taking too long, which may cause other symptoms. If analyze hasn't been performed for a while, then you may see suboptimal plans for the queries.

But, again, you haven't provided enough information, so you may not get very helpful answers.

Best regards,
Gurjeet
http://Gurje.et
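[One way to look for that evidence, assuming you can connect with psql, is to query pg_stat_user_tables for the last vacuum/analyze timestamps. Note that on a freshly promoted standby these are typically NULL, because the cumulative statistics are not carried over from the old primary:]

```sql
-- Run on the new primary: tables that autovacuum/autoanalyze
-- has not touched recently (or ever, since promotion) sort first.
SELECT relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze,
       n_dead_tup, n_mod_since_analyze
FROM   pg_stat_user_tables
ORDER  BY GREATEST(last_autovacuum, last_autoanalyze) NULLS FIRST
LIMIT  20;
```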
What additional information is required?
On Sat, 29 Mar 2025, 08:55 Gurjeet Singh, <gurjeet@singh.im> wrote:
On Fri, Mar 28, 2025 at 11:25 PM Gurjeet Singh <gurjeet@singh.im> wrote:
But don't vacuuming and analyze statistics get replicated?
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Thu, Mar 27, 2025 at 1:41 PM Raj <rajeshkumar.dba09@gmail.com> wrote:
Is postgresql.conf configured the same on both nodes?
max_connections being lower on the replica node would certainly and immediately cause "too many connections" errors.
diff -y --suppress-common-lines $PGDATA/postgresql.conf \
    <(ssh -q otherserver "cat $PGDATA/postgresql.conf")
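[If ssh between the pods isn't available, which is common on OpenShift, a rough equivalent, assuming psql access to both nodes, is to dump the non-default settings from each node and diff the output files:]

```sql
-- Run on each node, then compare the two result sets.
-- source <> 'default' limits the list to settings that were
-- changed from the built-in defaults (config file, ALTER SYSTEM, etc.).
SELECT name, setting, source
FROM   pg_settings
WHERE  source <> 'default'
ORDER  BY name;
```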
Vacuuming and statistics _are_ replicated: that data lives in tables (e.g. pg_statistic), so it must be replicated. However, _when_ each table was last vacuumed and analyzed is apparently not on disk. Thus, the new primary can't know the number of tuples modified since the last ANALYZE, or the number of dead and inserted rows since the last VACUUM, so autovacuum may not kick in when you'd expect.
Thus, I'd run vacuumdb --analyze-in-stages soon after the switchover.
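[For reference, --analyze-in-stages is roughly equivalent to running ANALYZE three times with increasing statistics targets, so the planner gets usable (if crude) statistics quickly and full-quality statistics later:]

```sql
SET default_statistics_target = 1;   -- stage 1: fastest, minimal statistics
ANALYZE;
SET default_statistics_target = 10;  -- stage 2: better statistics
ANALYZE;
RESET default_statistics_target;     -- stage 3: full statistics at the configured target
ANALYZE;
```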
Great, it's the same configuration on both nodes. We are not taking connections from the standby anyway.
On Sat, 29 Mar 2025, 16:38 Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
So what would cause "too many connections", if both nodes have the same max_connections?
What does your application do when DB response time is slow? I've read that pod-type services can be configured to spawn more connections when the app thinks the DB isn't fast enough.
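[One way to see where the connections are coming from during the next incident, sketched here assuming superuser or pg_read_all_stats access, is to group pg_stat_activity by client:]

```sql
-- Count sessions by application and state; a pile of 'idle' or
-- 'active' sessions from a single application suggests the client
-- pool is opening extra connections when the DB responds slowly.
SELECT application_name, state, count(*)
FROM   pg_stat_activity
GROUP  BY application_name, state
ORDER  BY count(*) DESC;
```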
On Sat, Mar 29, 2025 at 9:10 AM Raj <rajeshkumar.dba09@gmail.com> wrote: