Re: BUG #17695: Failed Assert in logical replication snapbuild. - Mailing list pgsql-bugs

From Alexander Lakhin
Subject Re: BUG #17695: Failed Assert in logical replication snapbuild.
Date
Msg-id 102cb85d-2205-c8ec-ac37-797c03e025e1@gmail.com
In response to Re: BUG #17695: Failed Assert in logical replication snapbuild.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: BUG #17695: Failed Assert in logical replication snapbuild.  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-bugs
22.05.2023 03:56, Masahiko Sawada wrote:
> On Thu, May 18, 2023 at 11:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>
>> I can easily (without gdb and sleep()) reproduce the issue on master with
>> the following script:
>> ...
> Thank you for sharing the script. But it seems not stable as I could
> not reproduce the issue in my environment. I think we need a stable
> reproducer so that we can include it in core regression tests. Or it
> may be okay not to include it if we could not find a convenient way
> and the fix is trivial.

I've come up with a minimal reproducer:
numclients=40
for ((c=1;c<=numclients;c++)); do
createdb regress_$c
done

for ((c=1;c<=numclients;c++)); do
(
echo "
CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_$c', 'test_decoding');
SELECT data FROM pg_logical_slot_get_changes('regression_slot_$c', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
" | psql -d regress_$c >psql-$c.log
) &
done
wait
grep TRAP server.log

(I've set
fsync = off
wal_level = logical
in postgresql.conf)
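
For anyone repeating this on a scratch cluster, the two settings could be applied along the following lines. This is only a sketch: the `apply_repro_settings` helper name and the `$PGDATA` path in the usage comment are assumptions, not part of the original setup. Note that a change to wal_level takes effect only after a server restart.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original message): append the two
# settings used for the reproducer to the given postgresql.conf file.
apply_repro_settings() {
    local conf="$1"
    cat >> "$conf" <<'EOF'
fsync = off
wal_level = logical
EOF
}

# Example use; the data directory path is an assumption:
#   apply_repro_settings "$PGDATA/postgresql.conf"
#   pg_ctl -D "$PGDATA" -l server.log restart
```

fsync = off is only appropriate for a throwaway test cluster like this one.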

When using a build made with ASAN (and gcc-12), I get several assertion failures at once:
grep TRAP server.log  | wc -l
12
Without ASAN, I get no failures with numclients=40, but still get a series of
them with numclients=80...

It's hardly suitable for a regression test, but it clearly demonstrates the
issue without using gdb. With the fix from [1] applied, I got no failures,
even with numclients=100, in 10 runs.
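
A repeated-run check of this kind could be scripted roughly as below. This is a minimal sketch under stated assumptions: `count_traps` and `run_n_times` are illustrative helper names, `repro.sh` in the usage comment is a hypothetical file holding the reproducer, and cleaning up the databases and slots between runs is left to the caller.

```shell
#!/usr/bin/env bash
# Print the number of lines containing TRAP in a server log
# (prints 0 when there are none or the file does not exist).
count_traps() {
    local log="$1"
    local n
    n=$(grep -c TRAP "$log" 2>/dev/null) || true
    echo "${n:-0}"
}

# Run a command several times and report how many new TRAP lines
# appeared in the log during each run.
# Usage: run_n_times RUNS LOGFILE COMMAND [ARGS...]
run_n_times() {
    local runs="$1" log="$2"
    shift 2
    local i before after
    for ((i = 1; i <= runs; i++)); do
        before=$(count_traps "$log")
        "$@"
        after=$(count_traps "$log")
        echo "run $i: $((after - before)) new TRAP line(s)"
    done
}

# Example use; repro.sh is a hypothetical file name:
#   run_n_times 10 server.log bash repro.sh
```

Counting the difference per run, rather than the total, keeps earlier failures in the same log from being attributed to later runs.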

I also think that the fix is simple enough to be committed without a
complicated/resource-intensive regression test.

[1] https://www.postgresql.org/message-id/CAD21AoDNv09ZMr-E%2BfNzhduvkE6eK2fjCRA7wJHOhF8APH5JdQ%40mail.gmail.com

Best regards,
Alexander


