Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date:
Msg-id: CAA4eK1KjLP7UXY4yo3Eg5S1SnH8UAK57TV7auPu3-H9_FXqFzg@mail.gmail.com
In response to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Noah Misch <noah@leadboat.com>)
Responses: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
           Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List: pgsql-hackers
On Wed, Dec 9, 2020 at 2:56 PM Noah Misch <noah@leadboat.com> wrote:
>
> Further testing showed it was a file location problem, not a deletion problem.
> The worker tried to open
> base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
> were the files actually existing:
>
> [nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*')
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
> total 408
> drwx------    2 nm       usr             256 Dec 08 03:20 .
> drwx------    4 nm       usr             256 Dec 08 03:20 ..
> -rw-------    1 nm       usr          207806 Dec 08 03:20 16393-510.changes.0
>
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
> total 0
> drwx------    2 nm       usr             256 Dec 08 03:20 .
> drwx------    4 nm       usr             256 Dec 08 03:20 ..
> -rw-------    1 nm       usr               0 Dec 08 03:20 16393-511.changes.0
>
> > > I have executed "make check" in the loop with only this file.  I have
> > > repeated it 5000 times but no failure, I am wondering shall we try to
> > > execute in the same machine in a loop where it failed once?
> >
> > Yes, that might help. Noah, would it be possible for you to try that
>
> The problem is xidhash using strcmp() to compare keys; it needs memcmp().  For
> this to matter, xidhash must contain more than one element.  Existing tests
> rarely exercise the multi-element scenario.  Under heavy load, on this system,
> the test publisher can have two active transactions at once, in which case it
> does exercise multi-element xidhash.  (The publisher is sensitive to timing,
> but the subscriber is not; once WAL contains interleaved records of two XIDs,
> the subscriber fails every time.)  This would be much harder to reproduce on a
> little-endian system, where strcmp(&xid, &xid_plus_one)!=0.  On big-endian,
> every small XID has zero in the first octet; they all look like empty strings.
>

Your analysis is correct.
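
To spell out the hazard for the archives (purely an illustrative sketch; the
struct, table, and function names below are made up for the example, not taken
from the patch): dynahash hashes and compares keys as NUL-terminated strings
unless the table is created with HASH_BLOBS, so a table keyed on a raw
TransactionId has to be created roughly like this:

    /* Illustrative sketch only -- names are invented, not patch text. */
    #include "postgres.h"
    #include "storage/buffile.h"
    #include "utils/hsearch.h"

    typedef struct StreamXidEnt
    {
        TransactionId xid;          /* hash key: raw 4-byte value, not a string */
        BufFile    *changes_file;   /* per-transaction spool of streamed changes */
    } StreamXidEnt;

    static HTAB *xidhash = NULL;

    static void
    create_xidhash(void)
    {
        HASHCTL     hash_ctl;

        memset(&hash_ctl, 0, sizeof(hash_ctl));
        hash_ctl.keysize = sizeof(TransactionId);
        hash_ctl.entrysize = sizeof(StreamXidEnt);

        /*
         * Without HASH_BLOBS, dynahash falls back to string hashing and
         * strcmp() on the key.  strcmp() stops at the first zero byte, so on
         * big-endian machines every small XID (leading octet 0) compares
         * equal to every other small XID -- the bug described above.
         * HASH_BLOBS makes dynahash use tag_hash() and memcmp() over the
         * full keysize instead.
         */
        xidhash = hash_create("xidhash", 1024, &hash_ctl,
                              HASH_ELEM | HASH_BLOBS);
    }

With HASH_BLOBS the comparison covers the declared keysize byte-for-byte, so
embedded zero bytes in the key no longer matter.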

> The attached patch has the one-line fix and some test suite changes that make
> this reproduce frequently on any big-endian system.  I'm currently planning to
> drop the test suite changes from the commit, but I could keep them if folks
> like them.  (They'd need more comments and timeout handling.)
>

I think it is better to keep this test, as it reliably exercises multiple
streams on the subscriber.

Thanks for working on this.

-- 
With Regards,
Amit Kapila.


