Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From	Melih Mutlu
Subject	Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date	February 1, 2023 12:12:19
Msg-id	CAGPVpCSNExJ3tgK8QgnNUb1QVGvJNprW7LWJ-8fWfGKgtcittw@mail.gmail.com
In response to	Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication (shveta malik <shveta.malik@gmail.com>)
Responses	Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List	pgsql-hackers

Hi Shveta,

shveta malik <shveta.malik@gmail.com> wrote on Wed, 1 Feb 2023 at 15:01:

On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
2) I found a crash in the previous patch (v9), but have not tested it
on the latest yet. The crash happens when all the replication slots are
consumed and we try to create a new one. I tweaked the settings as
below so that it can be reproduced easily:
max_sync_workers_per_subscription=3
max_replication_slots = 2
and then ran the test case shared by you. I think there is some memory
corruption happening. (I tested in debug mode and have not tried
release mode.) I added some traces to identify the root cause.
I observed that worker_1 keeps moving from one table to another
correctly, but at some point it gets corrupted, i.e. the origin name
obtained for it is wrong. It tries to advance that origin, and since
the origin does not exist, it asserts and then something else crashes.
From log: (new trace lines added by me are prefixed by shveta; I also
tweaked the code to fix my comment 1, for clarity on the worker-id).

From the traces below, it is clear that worker_1 was moving from one
relation to another, always getting the correct origin 'pg_16688_1', but
at the end it got 'pg_16688_49', which does not exist. The second part of
the trace shows who updated 'pg_16688_49': it was done by worker_49, which
did not even get a chance to create this origin because
max_replication_slots had been reached.

Thanks for investigating this error. I think it's the same error as the one Shi reported earlier. [1]

I haven't been able to reproduce it yet, but I will apply your tweaks and try again.

Looking into this.

[1] https://www.postgresql.org/message-id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Thanks,

Melih Mutlu

Microsoft
