On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:

2) I found a crash in the previous patch (v9), but have not tested it on the latest one yet. The crash happens when all the replication slots are consumed and we try to create a new one. I tweaked the settings as below so that it can be reproduced easily:

max_sync_workers_per_subscription = 3
max_replication_slots = 2

and then ran the test case shared by you. I think there is some memory corruption happening. (I tested in debug mode; I have not tried release mode.) I added some traces to identify the root cause. I observed that worker_1 keeps moving from one table to another correctly, but at some point it gets corrupted, i.e. the origin name obtained for it is wrong. It then tries to advance that origin, and since the origin does not exist, an assertion fails and then something else crashes.

From the log (new trace lines added by me are prefixed with "shveta"; I also tweaked the code to apply the fix for my comment 1, so that the worker id is clear):
From the traces below, it is clear that worker_1 was moving from one relation to another, always getting the correct origin 'pg_16688_1', but at the end it got 'pg_16688_49', which does not exist. The second part of the trace shows who updated 'pg_16688_49': it was done by worker_49, which did not even get a chance to create this origin because the max_replication_slots limit had been reached.
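
To make the failure mode described above concrete, here is a minimal standalone sketch in C. It is not the patch's code and does not use PostgreSQL's replication origin API. The naming scheme pg_<subid>_<worker id> is only inferred from the names in the trace, and every function and constant below is hypothetical. The sketch shows a worker picking up a name that was never created, and a lookup-before-advance check that reports the problem instead of asserting.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ORIGINS 8        /* hypothetical capacity, for illustration only */
#define ORIGIN_NAME_LEN 64

/* Hypothetical stand-in for the set of origins that actually exist. */
static char created_origins[MAX_ORIGINS][ORIGIN_NAME_LEN];
static int  n_created = 0;

/* Build a name shaped like the ones in the trace (assumed format). */
static void
build_origin_name(unsigned subid, unsigned worker_id, char *buf, size_t len)
{
    snprintf(buf, len, "pg_%u_%u", subid, worker_id);
}

/* Record an origin as created; returns 0 when no room is left. */
static int
create_origin(const char *name)
{
    if (n_created >= MAX_ORIGINS)
        return 0;
    snprintf(created_origins[n_created++], ORIGIN_NAME_LEN, "%s", name);
    return 1;
}

/* Returns 1 if an origin with this name was created, 0 otherwise. */
static int
origin_exists(const char *name)
{
    for (int i = 0; i < n_created; i++)
        if (strcmp(created_origins[i], name) == 0)
            return 1;
    return 0;
}

/* Advance only if the origin exists; report the problem rather than assert. */
static int
advance_origin_checked(const char *name)
{
    if (!origin_exists(name))
    {
        fprintf(stderr, "origin \"%s\" does not exist, not advancing\n", name);
        return 0;
    }
    /* A real implementation would move the origin's progress forward here. */
    return 1;
}
```

With this sketch, creating "pg_16688_1" and advancing it succeeds, while advancing "pg_16688_49" (never created, as for worker_49 in the report) is rejected with a message. A check of this kind would only hide the symptom; it does not explain how worker_1 ended up with worker_49's origin name.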
Thanks for investigating this error. I think it's the same error as the one Shi reported earlier. [1]
I couldn't reproduce it yet but will apply your tweaks and try again.