Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
Date | |
Msg-id | CAA4eK1KjLP7UXY4yo3Eg5S1SnH8UAK57TV7auPu3-H9_FXqFzg@mail.gmail.com |
In response to | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Noah Misch <noah@leadboat.com>) |
Responses |
HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
List | pgsql-hackers |
On Wed, Dec 9, 2020 at 2:56 PM Noah Misch <noah@leadboat.com> wrote:
>
> Further testing showed it was a file location problem, not a deletion problem.
> The worker tried to open
> base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
> were the files actually existing:
>
> [nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*')
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
> total 408
> drwx------ 2 nm usr 256 Dec 08 03:20 .
> drwx------ 4 nm usr 256 Dec 08 03:20 ..
> -rw------- 1 nm usr 207806 Dec 08 03:20 16393-510.changes.0
>
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
> total 0
> drwx------ 2 nm usr 256 Dec 08 03:20 .
> drwx------ 4 nm usr 256 Dec 08 03:20 ..
> -rw------- 1 nm usr 0 Dec 08 03:20 16393-511.changes.0
>
> > > I have executed "make check" in the loop with only this file. I have
> > > repeated it 5000 times but no failure, I am wondering shall we try to
> > > execute in the same machine in a loop where it failed once?
> >
> > Yes, that might help. Noah, would it be possible for you to try that
>
> The problem is xidhash using strcmp() to compare keys; it needs memcmp(). For
> this to matter, xidhash must contain more than one element. Existing tests
> rarely exercise the multi-element scenario. Under heavy load, on this system,
> the test publisher can have two active transactions at once, in which case it
> does exercise multi-element xidhash. (The publisher is sensitive to timing,
> but the subscriber is not; once WAL contains interleaved records of two XIDs,
> the subscriber fails every time.) This would be much harder to reproduce on a
> little-endian system, where strcmp(&xid, &xid_plus_one)!=0. On big-endian,
> every small XID has zero in the first octet; they all look like empty strings.
>

Your analysis is correct.

> The attached patch has the one-line fix and some test suite changes that make
> this reproduce frequently on any big-endian system. I'm currently planning to
> drop the test suite changes from the commit, but I could keep them if folks
> like them. (They'd need more comments and timeout handling.)
>

I think it is better to keep this test which can always test multiple
streams on the subscriber.

Thanks for working on this.

--
With Regards,
Amit Kapila.
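
For readers following the diagnosis quoted above, here is a minimal sketch of why a string comparison confuses small XIDs on a big-endian machine while a byte comparison does not. It is not the attached patch and does not use the PostgreSQL hash table code. The helper names (xid_key_be, xid_key_le, keys_match_strcmp, keys_match_memcmp) are hypothetical, and the key bytes are laid out by hand so the result does not depend on the host's endianness. Each buffer carries one extra zero guard byte so that strcmp() always stops inside the buffer.

```c
#include <stdint.h>
#include <string.h>

#define XID_KEY_LEN 4           /* a 32-bit transaction ID used as a hash key */

/*
 * Hypothetical helper: lay out xid the way a big-endian machine stores it
 * (most significant byte first).  key[4] is a guard byte, not part of the key.
 */
void
xid_key_be(unsigned char key[XID_KEY_LEN + 1], uint32_t xid)
{
    key[0] = (unsigned char) (xid >> 24);
    key[1] = (unsigned char) (xid >> 16);
    key[2] = (unsigned char) (xid >> 8);
    key[3] = (unsigned char) xid;
    key[4] = 0;
}

/* Hypothetical helper: the same value as a little-endian machine stores it. */
void
xid_key_le(unsigned char key[XID_KEY_LEN + 1], uint32_t xid)
{
    key[0] = (unsigned char) xid;
    key[1] = (unsigned char) (xid >> 8);
    key[2] = (unsigned char) (xid >> 16);
    key[3] = (unsigned char) (xid >> 24);
    key[4] = 0;
}

/* String-style match: stops at the first zero byte, wherever that is. */
int
keys_match_strcmp(const unsigned char *a, const unsigned char *b)
{
    return strcmp((const char *) a, (const char *) b) == 0;
}

/* Binary match: always compares all XID_KEY_LEN bytes of the key. */
int
keys_match_memcmp(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, XID_KEY_LEN) == 0;
}
```

Taking 510 and 511, the two values that appear in the file names above, the big-endian keys are 00 00 01 FE and 00 00 01 FF. strcmp() sees two empty strings and reports a match, while memcmp() over four bytes tells them apart. The little-endian keys begin with FE and FF, so even strcmp() distinguishes them, which fits the observation that the failure is much harder to hit there.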