Thread: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
ofir.manor@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      13725
Logged by:          Ofir Manor
Email address:      ofir.manor@gmail.com
PostgreSQL version: 9.4.5
Operating system:   CentOS 7.1 64-bit
Description:

Nasty but easily reproducible.

Test case, script 1.
I clean up the previous test, create a new table and two replication slots.
Then, I run a script in the background (it will trigger the bug), insert
100K rows, and try to get them with pg_logical_slot_get_changes from the
first slot. This works correctly every time - I get back the 100,000 rows
+ one BEGIN + one COMMIT.

psql --quiet << EOF
SELECT pg_drop_replication_slot('test1_slot');
SELECT pg_drop_replication_slot('test2_slot');
DROP TABLE test;
CREATE TABLE test (id int primary key, v varchar);
SELECT * FROM pg_create_logical_replication_slot('test1_slot', 'test_decoding');
SELECT * FROM pg_create_logical_replication_slot('test2_slot', 'test_decoding');
EOF
./get_loop.sh &
psql --quiet --tuples-only -o out1 << EOF
INSERT INTO test SELECT i, (i*3)::varchar FROM generate_series(1,100000) i;
SELECT * FROM pg_logical_slot_get_changes('test1_slot', NULL, NULL);
EOF
cat --squeeze-blank out1 | wc -l

Script 2 (get_loop.sh) continuously polls for changes in a loop of
pg_logical_slot_get_changes calls, using the second replication slot.
Since the test pg database is otherwise idle, only one of those calls
returns any rows.

for i in `seq 1 10000`; do
  echo "SELECT * FROM pg_logical_slot_get_changes('test2_slot', NULL, NULL);"
done | psql --quiet --tuples-only -o out2
cat --squeeze-blank out2 | wc -l

However, I get 116K or 148K rows in the output (after the 100K inserts).
I can clearly see that the output jumps back after 16K or 48K rows,
starting to stream the transaction from the beginning again (see the
first column - the LSNs):

...
1/68F6E1B8 | 450854 | table public.test: INSERT: id[integer]:49149 v[character varying]:'147447'
1/68F6E248 | 450854 | table public.test: INSERT: id[integer]:49150 v[character varying]:'147450'
1/68F6E2D8 | 450854 | table public.test: INSERT: id[integer]:49151 v[character varying]:'147453'
1/68F6E368 | 450854 | table public.test: INSERT: id[integer]:49152 v[character varying]:'147456'
1/68891010 | 450854 | table public.test: INSERT: id[integer]:1 v[character varying]:'3'
1/688910D8 | 450854 | table public.test: INSERT: id[integer]:2 v[character varying]:'6'
1/68891168 | 450854 | table public.test: INSERT: id[integer]:3 v[character varying]:'9'
1/688911F8 | 450854 | table public.test: INSERT: id[integer]:4 v[character varying]:'12'
...

With a larger transaction (inserting 1M or 10M rows) it gets worse,
jumping back to the beginning multiple times.

BTW - I don't know if this is a bug in pg_logical_slot_get_changes or in
the test_decoding output plugin that I'm using to evaluate logical
decoding.

Any fix / workaround?

Thanks in advance,
Ofir
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
"Shulgin, Oleksandr"
Date:
On Sun, Oct 25, 2015 at 8:14 PM, <ofir.manor@gmail.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 13725
>
> [snip]
>
> However, I get 116K or 148K rows in the output (after the 100K inserts).
> I can clearly see that the output jumps back after 16K or 48K rows,
> starting to stream the transaction from the beginning again.
>
> [snip]
>
> Any fix / workaround?
I didn't try it, but it looks like this part of code in logicalfuncs.c is
to blame:

static Datum
pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool binary)
{
    [snip]

    /* compute the current end-of-wal */
    if (!RecoveryInProgress())
        end_of_wal = GetFlushRecPtr();
    else
        end_of_wal = GetXLogReplayRecPtr(NULL);

    [snip]

    while ((startptr != InvalidXLogRecPtr && startptr < end_of_wal) ||
           (ctx->reader->EndRecPtr && ctx->reader->EndRecPtr < end_of_wal))
    {

So we compute end_of_wal before entering the loop, but the WAL keeps
growing as we read through it.

If we do it correctly, there's potential that the loop will never finish
if the WAL grows faster than we can decode it. Shouldn't we also try to
re-write this SRF to support SFRM_ValuePerCall?

Another (quite minor) problem I see is this:

--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -301,7 +301,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
        upto_lsn = PG_GETARG_LSN(1);

        if (PG_ARGISNULL(2))
-               upto_nchanges = InvalidXLogRecPtr;
+               upto_nchanges = 0;
        else
                upto_nchanges = PG_GETARG_INT32(2);
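To illustrate what re-reading the horizon could look like, here's an
UNTESTED sketch (names taken from 9.4's logicalfuncs.c, I haven't even
compiled this) that recomputes end_of_wal on every iteration instead of
once before the loop:

for (;;)
{
    XLogRecPtr  end_of_wal;

    /* re-read the current end-of-wal each time around */
    if (!RecoveryInProgress())
        end_of_wal = GetFlushRecPtr();
    else
        end_of_wal = GetXLogReplayRecPtr(NULL);

    if (!((startptr != InvalidXLogRecPtr && startptr < end_of_wal) ||
          (ctx->reader->EndRecPtr != InvalidXLogRecPtr &&
           ctx->reader->EndRecPtr < end_of_wal)))
        break;

    /* ... existing loop body: read and decode the next record ... */
}

Of course that only widens the window within one call, which is exactly
the never-terminating-loop worry above.

--
Alex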
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
Andres Freund
Date:
On 2015-10-26 10:43:08 +0100, Shulgin, Oleksandr wrote:
> > Test case, script 1.
> >
> > [snip]
> >
> > Script 2 (get_loop.sh) continuously polls for changes in a loop of
> > pg_logical_slot_get_changes calls, using the second replication slot.
> > Since the test pg database is otherwise idle, only one of those calls
> > returns any rows.

If this is anything more than a reproducer: don't do that, use the
streaming protocol.

> > [snip]
> >
> > Any fix / workaround?
>
> I didn't try it, but it looks like this part of code in logicalfuncs.c
> is to blame:
>
> [snip]
>
> So we compute end_of_wal before entering the loop, but the WAL keeps
> growing as we read through it.

So?

> If we do it correctly, there's potential that the loop will never finish
> if the WAL grows faster than we can decode it.

> Shouldn't we also try to re-write this SRF to support
> SFRM_ValuePerCall?

Why? ValuePerCall gets materialized into a tuplestore as well, unless
you call it from the select list.

> Another (quite minor) problem I see is this:
>
> --- a/src/backend/replication/logical/logicalfuncs.c
> +++ b/src/backend/replication/logical/logicalfuncs.c
> @@ -301,7 +301,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
>         upto_lsn = PG_GETARG_LSN(1);
>
>         if (PG_ARGISNULL(2))
> -               upto_nchanges = InvalidXLogRecPtr;
> +               upto_nchanges = 0;
>         else
>                 upto_nchanges = PG_GETARG_INT32(2);

Ugh, yes.

Greetings,

Andres Freund
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
"Shulgin, Oleksandr"
Date:
On Mon, Oct 26, 2015 at 12:05 PM, Andres Freund <andres@anarazel.de> wrote:

> > So we compute end_of_wal before entering the loop, but the WAL keeps
> > growing as we read through it.
>
> So?
>
> > Shouldn't we also try to re-write this SRF to support
> > SFRM_ValuePerCall?
>
> Why? ValuePerCall gets materialized into a tuplestore as well, unless
> you call it from the select list.

Ah, yes: good catch. Then it makes less sense to change this indeed.

--
Alex
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
Andres Freund
Date:
On 2015-10-26 13:21:44 +0200, ofir.manor@gmail.com wrote:
> Yes, this is a small script to reproduce; the real code is Java, and we
> saw sporadic wrong results.
> However, I'm interested in CDC (getting change notifications per row to
> my app), not PG-to-PG replication.

The streaming interface exists for row-by-row notification as well, and
is a *LOT* more efficient.

If there's a bug here, we obviously need to fix it nonetheless.
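If you want to consume the stream from an application, a minimal libpq
consumer looks roughly like the sketch below. This is just an
illustration, not production code: error recovery, keepalive replies and
standby status feedback are all omitted, and without the feedback the
server will eventually drop the connection. pg_recvlogical does the same
thing properly.

/*
 * Minimal sketch of a logical streaming protocol consumer using libpq.
 * NOT production code: no error recovery, no standby status feedback.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "replication=database" asks for a logical replication connection */
    PGconn     *conn = PQconnectdb("dbname=postgres replication=database");
    PGresult   *res;
    char       *buf;
    int         len;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* start streaming; on success the server switches to COPY BOTH mode */
    res = PQexec(conn, "START_REPLICATION SLOT \"test2_slot\" LOGICAL 0/0");
    if (PQresultStatus(res) != PGRES_COPY_BOTH)
    {
        fprintf(stderr, "START_REPLICATION failed: %s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(res);

    while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
    {
        /*
         * XLogData messages ('w') carry one decoded change each, after a
         * 25-byte header (message type + start LSN + end LSN + timestamp).
         * Keepalives ('k') are silently ignored here, which is exactly
         * what you must not do in real code.
         */
        if (buf[0] == 'w')
        {
            fwrite(buf + 25, 1, len - 25, stdout);
            putchar('\n');
        }
        PQfreemem(buf);
    }

    PQfinish(conn);
    return 0;
}

Greetings,

Andres Freund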
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
"Shulgin, Oleksandr"
Date:
On Mon, Oct 26, 2015 at 12:30 PM, Andres Freund <andres@anarazel.de> wrote:

> The streaming interface exists for row-by-row notification as well, and
> is a *LOT* more efficient.

Yeah, but I don't think there's a workable Java implementation available
yet?

> If there's a bug here, we obviously need to fix it nonetheless.

I would assume re-calculating end_of_wal in the while loop condition
would fix this? One can obviously run out of memory with a more
WAL-intensive example, but I would not consider that a bug of this
interface.

--
Alex
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
Andres Freund
Date:
On 2015-10-26 12:34:48 +0100, Shulgin, Oleksandr wrote:
> Yeah, but I don't think there's a workable Java implementation available
> yet?

No idea, but it's not that hard to write one.

> > If there's a bug here, we obviously need to fix it nonetheless.
>
> I would assume re-calculating end_of_wal in the while loop condition
> would fix this?

Why? That'd just lead to outputting more rows in one invocation, and
that's it? I think I'm not following what you see as the problem?
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
"Shulgin, Oleksandr"
Date:
On Mon, Oct 26, 2015 at 12:38 PM, Andres Freund <andres@anarazel.de> wrote:

> > I would assume re-calculating end_of_wal in the while loop condition
> > would fix this?
>
> Why? That'd just lead to outputting more rows in one invocation, and
> that's it? I think I'm not following what you see as the problem?

I think there are just some false expectations involved about how this
interface should work. The OP likely expects that after the partial
results were returned by the first call to pg_logical_slot_get_changes(),
the next call will continue from the point where the first call left off.

This doesn't happen because in the first call we never cross a
transaction boundary? Hm, but why do we see the partial changes anyway?
I would assume that if we started decoding this at all, the transaction
was already committed and end_of_wal will be past its end...

I'm lost.

--
Alex
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
Andres Freund
Date:
Hi,

On 2015-10-26 14:09:24 +0200, ofir.manor@gmail.com wrote:
> Regarding my expectations...
> 1. I have observed that pg_logical_slot_get_changes always returns
> changes at transaction boundaries.

Right, it's documented to do so.

> For example, asking for 1 change (third parameter, replace NULL with a
> number) can still return millions of changes (if the next transaction
> is big), as the check is likely done at the end of each transaction.
> I'm actually OK with that (weird but doesn't hurt me).

The reason for that is that we can't return changes until they're
committed, and that we currently can't specify anything but transaction
boundaries as a "progress" point.

> 2. I have validated that a single call to pg_logical_slot_get_changes
> returns a result set with duplicates, going back to the start (I've
> seen it with a Java debugger, looping over the forward-only cursor of
> the SELECT from the replication slot). That is the bug I'm reporting -
> not across calls but within a single call.

That'd be rather weird. I'll look into it, but due to the pgconf.eu
conference this week I can't promise I'll get to it this week.

> 3. I also don't understand the "end of WAL" thing, since the call must
> wait to see if the transaction commits or rolls back.

The SQL interface doesn't ever block waiting for WAL; it decodes as far
as it can with the existing data.

> Maybe if it reaches "end of WAL", it starts over internally, adding the
> same changes again to some internal buffer that later gets sent?

No.
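To make (1) a bit more concrete, the decode loop has roughly this shape
(paraphrased from memory, so don't hold me to the details):

/*
 * Rough paraphrase of the loop in pg_logical_slot_get_changes_guts();
 * not an exact quote of the 9.4 source.
 */
while (/* reader position still below end_of_wal */)
{
    XLogRecord *record = XLogReadRecord(ctx->reader, startptr, &errm);

    if (record != NULL)
    {
        /*
         * Changes are accumulated in the reorder buffer until the COMMIT
         * record is decoded; at that point the whole transaction is
         * replayed through the output plugin in one go.
         */
        LogicalDecodingProcessRecord(ctx, record);
    }

    /*
     * Limits are only checked here, between WAL records, which is why
     * asking for one change can still return an arbitrarily large
     * transaction.
     */
    if (upto_nchanges != 0 && upto_nchanges <= p->returned_rows)
        break;
    if (upto_lsn != InvalidXLogRecPtr && upto_lsn <= ctx->reader->EndRecPtr)
        break;
}

Andres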
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
ofir.manor@gmail.com
Date:
> > 2. I have validated that a single call to pg_logical_slot_get_changes
> > returns a result set with duplicates, going back to the start (I've
> > seen it with a Java debugger, looping over the forward-only cursor of
> > the SELECT from the replication slot). That is the bug I'm reporting -
> > not across calls but within a single call.
>
> That'd be rather weird. I'll look into it, but due to the pgconf.eu
> conference this week I can't promise I'll get to it this week.
>
> Andres

Sure, triage can wait, make the most out of the conference, and thanks
for the fast replies!

I did update the test case to prove that all output comes from a single
call. I added a different constant to each SQL query - the bash loop
iteration number:

for i in `seq 1 10000`; do
  echo SELECT $i ",v.* FROM pg_logical_slot_get_changes('test2_slot', NULL, NULL) v;"
done | psql --quiet --tuples-only | cat --squeeze-blank > out2
cat out2 | wc -l

You can see that all the output comes from the same SQL statement (in
this case, statement 419):

419 | 1/76344D18 | 450880 | BEGIN 450880
419 | 1/76344D18 | 450880 | table public.test: INSERT: id[integer]:1 v[character varying]:'3'
419 | 1/76344DE0 | 450880 | table public.test: INSERT: id[integer]:2 v[character varying]:'6'
...
419 | 1/76BD9348 | 450880 | table public.test: INSERT: id[integer]:61439 v[character varying]:'184317'
419 | 1/76BD93D8 | 450880 | table public.test: INSERT: id[integer]:61440 v[character varying]:'184320'
419 | 1/76344D18 | 450880 | table public.test: INSERT: id[integer]:1 v[character varying]:'3'
419 | 1/76344DE0 | 450880 | table public.test: INSERT: id[integer]:2 v[character varying]:'6'
...
419 | 1/7713BEB0 | 450880 | table public.test: INSERT: id[integer]:99999 v[character varying]:'299997'
419 | 1/7713BF40 | 450880 | table public.test: INSERT: id[integer]:100000 v[character varying]:'300000'
419 | 1/7713C028 | 450880 | COMMIT 450880

--
Ofir Manor
Blog: http://ofirm.wordpress.com
LinkedIn: http://il.linkedin.com/in/ofirmanor
Twitter: @ofirm
Mobile: +972-54-7801286
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
ofir.manor@gmail.com
Date:
Regarding my expectations...

1. I have observed that pg_logical_slot_get_changes always returns
changes at transaction boundaries. For example, asking for 1 change
(third parameter, replace NULL with a number) can still return millions
of changes (if the next transaction is big), as the check is likely done
at the end of each transaction. I'm actually OK with that (weird but
doesn't hurt me).

2. I have validated that a single call to pg_logical_slot_get_changes
returns a result set with duplicates, going back to the start (I've
seen it with a Java debugger, looping over the forward-only cursor of
the SELECT from the replication slot). That is the bug I'm reporting -
not across calls but within a single call.

3. I also don't understand the "end of WAL" thing, since the call must
wait to see if the transaction commits or rolls back. Maybe if it
reaches "end of WAL", it starts over internally, adding the same changes
again to some internal buffer that later gets sent?

I can also add that it starts over after a "round" number of rows - I
saw 8K, 52K, 64K etc. (run the test case). Maybe that is a hint (or just
an artifact of how INSERT ... SELECT works?)

On Mon, Oct 26, 2015 at 1:51 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:

> I think there are just some false expectations involved about how this
> interface should work. The OP likely expects that after the partial
> results were returned by the first call to pg_logical_slot_get_changes(),
> the next call will continue from the point where the first call left off.
>
> [snip]
>
> I'm lost.
>
> --
> Alex

--
Ofir Manor
Blog: http://ofirm.wordpress.com
LinkedIn: http://il.linkedin.com/in/ofirmanor
Twitter: @ofirm
Mobile: +972-54-7801286
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
ofir.manor@gmail.com
Date:
Yes, this is a small script to reproduce; the real code is Java, and we
saw sporadic wrong results. However, I'm interested in CDC (getting
change notifications per row to my app), not PG-to-PG replication.

Hope it makes sense,
Ofir

On Mon, Oct 26, 2015 at 1:05 PM, Andres Freund <andres@anarazel.de> wrote:

> On 2015-10-26 10:43:08 +0100, Shulgin, Oleksandr wrote:
> > [snip]
>
> If this is anything more than a reproducer: don't do that, use the
> streaming protocol.
>
> [snip]
>
> Greetings,
>
> Andres Freund

--
Ofir Manor
Blog: http://ofirm.wordpress.com
LinkedIn: http://il.linkedin.com/in/ofirmanor
Twitter: @ofirm
Mobile: +972-54-7801286
Re: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
From
ofir.manor@gmail.com
Date:
Hi Andres, any update?

To add another data point: I found that the "concurrent" run (get from
test2_slot) also runs much slower than the get that starts only after
the insert commits (get from test1_slot). You can amplify this by going
from an insert of 100K rows to 1M rows.

Ofir

On Mon, Oct 26, 2015 at 3:05 PM, <ofir.manor@gmail.com> wrote:

> Sure, triage can wait, make the most out of the conference, and thanks
> for the fast replies!
>
> I did update the test case to prove that all output comes from a single
> call. I added a different constant to each SQL query - the bash loop
> iteration number:
>
> [snip]

--
Ofir Manor
Blog: http://ofirm.wordpress.com
LinkedIn: http://il.linkedin.com/in/ofirmanor
Twitter: @ofirm
Mobile: +972-54-7801286