Thread: Re: [HACKERS] make async slave to wait for lsn to be replayed
On 2018-03-06 14:50, Simon Riggs wrote: > On 6 March 2018 at 11:24, Dmitry Ivanov <d.ivanov@postgrespro.ru> > wrote: >>> In PG11, I propose the following command, sticking mostly to Ants' >>> syntax, and allowing to wait for multiple events before it returns. >>> It >>> doesn't hold snapshot and will not get cancelled by Hot Standby. >>> >>> WAIT FOR event [, event ...] options >>> >>> event is >>> LSN value >>> TIMESTAMP value >>> >>> options >>> TIMEOUT delay >>> UNTIL TIMESTAMP timestamp >>> (we have both, so people don't need to do math, they can use >>> whichever >>> they have) >> >> >> I have a (possibly) dumb question: if we have specified several >> events, >> should WAIT finish if only one of them triggered? It's not immediately >> obvious if we're waiting for ALL of them to trigger, or just one will >> suffice (ANY). IMO the syntax could be extended to something like: >> >> WAIT FOR [ANY | ALL] event [, event ...] options, >> >> with ANY being the default variant. > > +1 Here I made new patch of feature, discussed above. WAIT FOR - wait statement to pause beneath statements ========== Synopsis ========== WAIT FOR [ANY | SOME | ALL] event [, event ...] options and event is: LSN value TIMESTAMP value and options is: TIMEOUT delay UNTIL TIMESTAMP timestamp Description ========== WAIT FOR - make to wait statements (that are beneath) on sleep until event happens (Don’t process new queries until an event happens). How to use it ========== WAIT FOR LSN ‘LSN’ [, timeout in ms]; #Wait until LSN 0/303EC60 will be replayed, or 10 second passed. WAIT FOR ANY LSN ‘0/303EC60’, TIMEOUT 10000; #Or same without timeout. WAIT FOR LSN ‘0/303EC60’; #Or wait for some timestamp. WAIT FOR TIMESTAMP '2020-01-02 17:20:19.028161+03'; #Wait until ALL events occur: LSN to be replayed and timestamp passed. WAIT FOR ALL LSN ‘0/303EC60’, TIMESTAMP '2020-01-28 11:10:39.021341+03'; Notice: WAIT FOR will release on PostmasterDeath or Interruption events if they come earlier then LSN or timeout. Testing the implementation ====================== The implementation was tested with src/test/recovery/t/018_waitfor.pl -- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hello. I looked this briefly but not tested. At Fri, 06 Mar 2020 00:24:01 +0300, Kartyshov Ivan <i.kartyshov@postgrespro.ru> wrote in > On 2018-03-06 14:50, Simon Riggs wrote: > > On 6 March 2018 at 11:24, Dmitry Ivanov <d.ivanov@postgrespro.ru> > > wrote: > >>> In PG11, I propose the following command, sticking mostly to Ants' > >>> syntax, and allowing to wait for multiple events before it returns. It > >>> doesn't hold snapshot and will not get cancelled by Hot Standby. > >>> WAIT FOR event [, event ...] options > >>> event is > >>> LSN value > >>> TIMESTAMP value > >>> options > >>> TIMEOUT delay > >>> UNTIL TIMESTAMP timestamp > >>> (we have both, so people don't need to do math, they can use whichever > >>> they have) > >> I have a (possibly) dumb question: if we have specified several > >> events, > >> should WAIT finish if only one of them triggered? It's not immediately > >> obvious if we're waiting for ALL of them to trigger, or just one will > >> suffice (ANY). IMO the syntax could be extended to something like: > >> WAIT FOR [ANY | ALL] event [, event ...] options, > >> with ANY being the default variant. > > +1 > > Here I made new patch of feature, discussed above. > > WAIT FOR - wait statement to pause beneath statements > ========== > > Synopsis > ========== > WAIT FOR [ANY | SOME | ALL] event [, event ...] options > and event is: > LSN value > TIMESTAMP value > > and options is: > TIMEOUT delay > UNTIL TIMESTAMP timestamp The syntax seems getting confused. What happens if we typed in the command "WAIT FOR TIMESTAMP '...' UNTIL TIMESTAMP '....'"? It seems to me the options is useles. Couldn't the TIMEOUT option be a part of event? I know gram.y doesn't accept that syntax but it is not apparent from the description above. As I read through the previous thread, one of the reason for this feature implemented as a syntax is it was inteded to be combined into BEGIN statement. If there is not any use case for the feature midst of a transaction, why don't you turn it into a part of BEGIN command? > Description > ========== > WAIT FOR - make to wait statements (that are beneath) on sleep until > event happens (Don’t process new queries until an event happens). ... > Notice: WAIT FOR will release on PostmasterDeath or Interruption > events > if they come earlier then LSN or timeout. I think interrupts ought to result in ERROR. wait.c adds a fair amount of code and uses proc-array based approach. But Thomas suggested queue-based approach and I also think it is better. We already have a queue-based mechanism that behaves almost the same with this feature in the comit code on master-side. It avoids spurious backend wakeups. Couldn't we extend SyncRepWaitForLSN or share a part of the code/infrastructures so that this feature can share the code? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
As it was discussed earlier, I added wait for statement into begin/start statement. Synopsis ========== BEGIN [ WORK | TRANSACTION ] [ transaction_mode[, ...] ] wait_for_event where transaction_mode is one of: ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY [ NOT ] DEFERRABLE WAIT FOR [ANY | SOME | ALL] event [, event ...] and event is: LSN value [options] TIMESTAMP value and options is: TIMEOUT delay UNTIL TIMESTAMP timestamp -- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 2020-03-21 14:16, Kartyshov Ivan wrote: > As it was discussed earlier, I added wait for statement into > begin/start statement. Thanks! To address the discussion: I like the idea of having WAIT as a part of BEGIN statement rather than a separate command, as Thomas Munro suggested. That way, the syntax itself enforces that WAIT FOR LSN will actually take effect, even for single-snapshot transactions. It seems more convenient for the user, who won't have to remember the details about how WAIT interacts with isolation levels. > BEGIN [ WORK | TRANSACTION ] [ transaction_mode[, ...] ] wait_for_event Not sure about this, but could we add "WAIT FOR .." as another transaction_mode rather than a separate thing? That way, user won't have to worry about the order. As of now, one should remember to always put WAIT FOR as the Last parameter in the BEGIN statement. > and event is: > LSN value [options] > TIMESTAMP value I would maybe remove WAIT FOR TIMESTAMP. As Robert Haas has pointed out, it seems a lot like pg_sleep_until(). Besides, it doesn't necessarily need to be connected to transaction start, which makes it different from WAIT FOR LSN - so I wouldn't mix them together. I had another look at the code: === In WaitShmemSize() you might want to use functions that check for overflow - add_size()/mul_size(). They're used in similar situations, for example in BTreeShmemSize(). === This is how WaitUtility() is called - note that time_val will always be > 0: + if (time_val <= 0) + time_val = 1; +... + res = WaitUtility(lsn, (int)(time_val * 1000), dest); (time_val * 1000) is passed to WaitUtility() as the delay argument. And inside WaitUtility() we have this: +if (delay > 0) + latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH; +else + latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH; Since we always pass a delay value greater than 0, we'll never get to the "else" clause here and we'll never be ready to wait for LSN forever. Perhaps due to that, the current test outputs this after a simple WAIT FOR LSN command: psql:<stdin>:1: NOTICE: LSN is not reached. === Speaking of tests, When I tried to test BEGIN TRANSACTION WAIT FOR LSN, I got a segfault: LOG: statement: BEGIN TRANSACTION WAIT FOR LSN '0/3002808' LOG: server process (PID 10385) was terminated by signal 11: Segmentation fault DETAIL: Failed process was running: COMMIT Could you add some more tests to the patch when this is fixed? With WAIT as part of BEGIN statement + with things such as WAIT FOR ALL ... / WAIT FOR ANY ... / WAIT FOR LSN ... UNTIL TIMESTAMP ... === In WaitSetLatch() we should probably check backend for NULL before calling SetLatch(&backend->procLatch) We might also need to check wait statement for NULL in these two cases: + case T_TransactionStmt: + {... + result = transformWaitForStmt(pstate, (WaitStmt *) stmt->wait); case TRANS_STMT_START: {... + WaitStmt *waitstmt = (WaitStmt *) stmt->wait; + res = WaitMain(waitstmt, dest); === After we added the "wait" attribute to TransactionStmt struct, do we perhaps need to add something to _copyTransactionStmt() / _equalTransactionStmt()? -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
Anna, thank you for your review. On 2020-03-25 21:10, Anna Akenteva wrote: > On 2020-03-21 14:16, Kartyshov Ivan wrote: >> and event is: >> LSN value [options] >> TIMESTAMP value > I would maybe remove WAIT FOR TIMESTAMP. As Robert Haas has pointed > out, it seems a lot like pg_sleep_until(). Besides, it doesn't > necessarily need to be connected to transaction start, which makes it > different from WAIT FOR LSN - so I wouldn't mix them together. I don't mind. But I think we should get one more opinions on this point. > === > This is how WaitUtility() is called - note that time_val will always be > > 0: > + if (time_val <= 0) > + time_val = 1; > +... > + res = WaitUtility(lsn, (int)(time_val * 1000), dest); > > (time_val * 1000) is passed to WaitUtility() as the delay argument. > And inside WaitUtility() we have this: > > +if (delay > 0) > + latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH; > +else > + latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH; > > Since we always pass a delay value greater than 0, we'll never get to > the "else" clause here and we'll never be ready to wait for LSN > forever. Perhaps due to that, the current test outputs this after a > simple WAIT FOR LSN command: > psql:<stdin>:1: NOTICE: LSN is not reached. I fix it, and Interruptions in last patch. Anna, feel free to work on this patch. -- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 2020-03-27 04:15, Kartyshov Ivan wrote: > Anna, feel free to work on this patch. Ivan and I worked on this patch a bit more. We fixed the bugs that we could find and cleaned up the code. For now, we've kept both options: WAIT as a standalone statement and WAIT as a part of BEGIN. The new patch is attached. The syntax looks a bit different now: - WAIT FOR [ANY | ALL] event [, ...] - BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR [ANY | ALL] event [, ...]] where event is one of: LSN value TIMEOUT number_of_milliseconds timestamp Now, one event cannot contain both an LSN and a TIMEOUT. With such syntax, the behaviour seems to make sense. For the (default) WAIT FOR ALL strategy, we pick the maximum LSN and maximum allowed timeout, and wait for the LSN till the timeout is over. If no timeout is specified, we wait forever. If no LSN is specified, we just wait for the time to pass. For the WAIT FOR ANY strategy, it's the same but we pick minimum LSN and timeout. There are still some questions left: 1) Should we only keep the BEGIN option, or does the standalone command have potential after all? 2) Should we change the grammar so that WAIT can be in any position of the BEGIN statement, not necessarily in the end? Ivan and I haven't come to a consensus about this, so more opinions would be helpful. 3) Since we added the "wait" attribute to TransactionStmt struct, do we need to add something to _copyTransactionStmt() / _equalTransactionStmt()? -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
Attachment
On 2020-04-01 02:26, Anna Akenteva wrote: > On 2020-03-27 04:15, Kartyshov Ivan wrote: >> Anna, feel free to work on this patch. > > Ivan and I worked on this patch a bit more. We fixed the bugs that we > could find and cleaned up the code. For now, we've kept both options: > WAIT as a standalone statement and WAIT as a part of BEGIN. The new > patch is attached. > > The syntax looks a bit different now: > > - WAIT FOR [ANY | ALL] event [, ...] > - BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR > [ANY | ALL] event [, ...]] > where event is one of: > LSN value > TIMEOUT number_of_milliseconds > timestamp > > Now, one event cannot contain both an LSN and a TIMEOUT. > In my understanding the whole idea of having TIMEOUT was to do something like 'Do wait for this LSN to be replicated, but no longer than TIMEOUT milliseconds'. What is the point of having plain TIMEOUT? It seems to be equivalent to pg_sleep, doesn't it? Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
I did some code cleanup and added tests - both for the standalone WAIT FOR statement and for WAIT FOR as a part of BEGIN. The new patch is attached. On 2020-04-03 17:29, Alexey Kondratov wrote: > On 2020-04-01 02:26, Anna Akenteva wrote: >> >> - WAIT FOR [ANY | ALL] event [, ...] >> - BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR >> [ANY | ALL] event [, ...]] >> where event is one of: >> LSN value >> TIMEOUT number_of_milliseconds >> timestamp >> >> Now, one event cannot contain both an LSN and a TIMEOUT. >> > > In my understanding the whole idea of having TIMEOUT was to do > something like 'Do wait for this LSN to be replicated, but no longer > than TIMEOUT milliseconds'. What is the point of having plain TIMEOUT? > It seems to be equivalent to pg_sleep, doesn't it? > In the patch that I reviewed, you could do things like: WAIT FOR LSN lsn0, LSN lsn1 TIMEOUT time1, LSN lsn2 TIMEOUT time2; and such a statement was in practice equivalent to WAIT FOR LSN(max(lsn0, lsn1, lsn2)) TIMEOUT (max(time1, time2)) As you can see, even though grammatically lsn1 is grouped with time1 and lsn2 is grouped with time2, both timeouts that we specified are not connected to their respective LSN-s, and instead they kinda act like global timeouts. Therefore, I didn't see a point in keeping TIMEOUT necessarily grammatically connected to LSN. In the new syntax our statement would look like this: WAIT FOR LSN lsn0, LSN lsn1, LSN lsn2, TIMEOUT time1, TIMEOUT time2; TIMEOUT-s are not forced to be grouped with LSN-s anymore, which makes it more clear that all specified TIMEOUTs will be global and will apply to all LSN-s at once. The point of having TIMEOUT is still to let us limit the time of waiting for LSNs. It's just that with the new syntax, we can also use TIMEOUT without an LSN. You are right, such a case is equivalent to pg_sleep. One way to avoid that is to prohibit waiting for TIMEOUT without specifying an LSN. Do you think we should do that? -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
Attachment
Hi! On Fri, Apr 3, 2020 at 9:51 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote: > In the patch that I reviewed, you could do things like: > WAIT FOR > LSN lsn0, > LSN lsn1 TIMEOUT time1, > LSN lsn2 TIMEOUT time2; > and such a statement was in practice equivalent to > WAIT FOR LSN(max(lsn0, lsn1, lsn2)) TIMEOUT (max(time1, time2)) > > As you can see, even though grammatically lsn1 is grouped with time1 and > lsn2 is grouped with time2, both timeouts that we specified are not > connected to their respective LSN-s, and instead they kinda act like > global timeouts. Therefore, I didn't see a point in keeping TIMEOUT > necessarily grammatically connected to LSN. > > In the new syntax our statement would look like this: > WAIT FOR LSN lsn0, LSN lsn1, LSN lsn2, TIMEOUT time1, TIMEOUT time2; > TIMEOUT-s are not forced to be grouped with LSN-s anymore, which makes > it more clear that all specified TIMEOUTs will be global and will apply > to all LSN-s at once. > > The point of having TIMEOUT is still to let us limit the time of waiting > for LSNs. It's just that with the new syntax, we can also use TIMEOUT > without an LSN. You are right, such a case is equivalent to pg_sleep. > One way to avoid that is to prohibit waiting for TIMEOUT without > specifying an LSN. Do you think we should do that? I think specifying multiple LSNs/TIMEOUTs is kind of ridiculous. We can assume that client application is smart enough to calculate minimum/maximum on its side. When multiple LSNs/TIMEOUTs are specified, what should we wait for? Reaching all the LSNs? Reaching any of LSNs? Are timeouts independent from LSNs or sticked together? So if we didn't manage to reach LSN1 in TIMEOUT1, then we don't wait for LSN2 with TIMEOUT2 (or not)? I think that now we would be fine with single LSN and single TIMEOUT. In future we may add multiple LSNs/TIMEOUTs or/and support for expressions as LSNs/TIMEOUTs if we figure out it's necessary. I also think it's good to couple waiting for lsn with beginning of transaction is good idea. Separate WAIT FOR LSN statement called in the middle of transaction looks problematic for me. Imagine we have RR isolation and already acquired the snapshot. Then out snapshot can block applying wal records, which we are waiting for. That would be implicit deadlock. It would be nice to evade such deadlocks by design. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2020-04-03 21:51, Anna Akenteva wrote: > I did some code cleanup and added tests - both for the standalone WAIT > FOR statement and for WAIT FOR as a part of BEGIN. The new patch is > attached. I did more cleanup and code optimization on waiting events on latch. And rebase patch. -- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 2020-04-04 03:14, Alexander Korotkov wrote: > I think that now we would be fine with single LSN and single TIMEOUT. > In future we may add multiple LSNs/TIMEOUTs or/and support for > expressions as LSNs/TIMEOUTs if we figure out it's necessary. > > I also think it's good to couple waiting for lsn with beginning of > transaction is good idea. Separate WAIT FOR LSN statement called in > the middle of transaction looks problematic for me. Imagine we have RR > isolation and already acquired the snapshot. Then out snapshot can > block applying wal records, which we are waiting for. That would be > implicit deadlock. It would be nice to evade such deadlocks by > design. Ok, here is a new version of patch with single LSN and TIMEOUT. Synopsis ========== BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [WAIT FOR LSN 'lsn' [ TIMEOUT 'value']] and START TRANSACTION [ transaction_mode [, ...] ] [WAIT FOR LSN 'lsn' [ TIMEOUT 'value']] where lsn is result of pg_current_wal_flush_lsn on master. and value is uint time interval in milliseconds. Description ========== BEGIN/START...WAIT FOR - pause the start of transaction until a specified LSN has been replayed. (Don’t open transaction if lsn is not reached on timeout). How to use it ========== WAIT FOR LSN ‘LSN’ [, timeout in ms]; # Before starting transaction, wait until LSN 0/84832E8 is replayed. Wait time is not limited here because a timeout was not specified BEGIN WAIT FOR LSN '0/84832E8'; # Before starting transaction, wait until LSN 0/84832E8 is replayed. Limit the wait time with 10 seconds, and if LSN is not reached by then, don't start the transaction. START TRANSACTION WAIT FOR LSN '0/8DFFB88' TIMEOUT 10000; # Same as previous, but with transaction isolation level = REPEATABLE READ BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ WAIT FOR LSN '0/815C0F1' TIMEOUT 10000; Notice: WAIT FOR will release on PostmasterDeath or Interruption events if they come earlier than LSN or timeout. Testing the implementation ====================== The implementation was tested with src/test/recovery/t/020_begin_wait.pl -- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Apr 7, 2020 at 12:58 AM Kartyshov Ivan <i.kartyshov@postgrespro.ru> wrote: > On 2020-04-04 03:14, Alexander Korotkov wrote: > > I think that now we would be fine with single LSN and single TIMEOUT. > > In future we may add multiple LSNs/TIMEOUTs or/and support for > > expressions as LSNs/TIMEOUTs if we figure out it's necessary. > > > > I also think it's good to couple waiting for lsn with beginning of > > transaction is good idea. Separate WAIT FOR LSN statement called in > > the middle of transaction looks problematic for me. Imagine we have RR > > isolation and already acquired the snapshot. Then out snapshot can > > block applying wal records, which we are waiting for. That would be > > implicit deadlock. It would be nice to evade such deadlocks by > > design. > Ok, here is a new version of patch with single LSN and TIMEOUT. I think this quite small feature, which already received quite amount of review. The last version is very pinched. But I think it would be good to commit some very basic version, which is at least some progress in the area and could be extended in future. I'm going to pass trough the code tomorrow and commit this unless I found major issues or somebody objects. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2020-04-07 00:58, Kartyshov Ivan wrote: > Ok, here is a new version of patch with single LSN and TIMEOUT. I had a look at the code and did some more code cleanup, with Ivan's permission. This is what I did: - Removed "WAIT FOR" command tag from cmdtaglist.h and renamed WaitStmt to WaitClause (since there's no standalone WAIT FOR command anymore) - Added _copyWaitClause() and _equalWaitClause() - Removed unused #include-s from utility.c - Adjusted tests and documentation - Fixed/added some code comments I have a couple of questions about WaitUtility() though: - When waiting forever (due to not specifying a timeout), isn't 60 seconds too long of an interval to check for interrupts? - If we did specify a timeout, it might be a very long one. In this case, shouldn't we also make sure to wake up sometimes to check for interrupts? - Is it OK that specifying timeout = 0 (BEGIN WAIT FOR LSN ... TIMEOUT 0) is the same as not specifying timeout at all? -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
Attachment
On Tue, Apr 7, 2020 at 7:56 AM Anna Akenteva <a.akenteva@postgrespro.ru> wrote: > > On 2020-04-07 00:58, Kartyshov Ivan wrote: > > Ok, here is a new version of patch with single LSN and TIMEOUT. > > I had a look at the code and did some more code cleanup, with Ivan's > permission. > This is what I did: > - Removed "WAIT FOR" command tag from cmdtaglist.h and renamed WaitStmt > to WaitClause (since there's no standalone WAIT FOR command anymore) > - Added _copyWaitClause() and _equalWaitClause() > - Removed unused #include-s from utility.c > - Adjusted tests and documentation > - Fixed/added some code comments > > I have a couple of questions about WaitUtility() though: > - When waiting forever (due to not specifying a timeout), isn't 60 > seconds too long of an interval to check for interrupts? > - If we did specify a timeout, it might be a very long one. In this > case, shouldn't we also make sure to wake up sometimes to check for > interrupts? > Right, we should probably wait for 100ms before checking the interrupts. See the similar logic in pg_promote where we wait for specified number of seconds. > - Is it OK that specifying timeout = 0 (BEGIN WAIT FOR LSN ... TIMEOUT > 0) is the same as not specifying timeout at all? > Yes that sounds reasonable to me. Review comments: -------------------------- 1. +/* + * Delete wait event of the current backend from the shared memory array. + * + * TODO: Consider state cleanup on backend failure. + * Check: + * 1) nomal|smart|fast|immediate stop + * 2) SIGKILL and SIGTERM + */ +static void +DeleteEvent(void) I don't see how this is implemented or called to handle any errors. For example in function WaitUtility if the WaitLatch errors out due to any error, then the event won't be deleted. I think we can't assume WaitLatch or any other code in this code path will never error out. For ex. WaitLatch---->WaitEventSetWaitBlock() can error out. Also, in future we can add more code which can error out. 2. + /* + * If received an interruption from CHECK_FOR_INTERRUPTS, + * then delete the current event from array. + */ + if (InterruptPending) + { + DeleteEvent(); + ProcessInterrupts(); + } We generally do this type of handling via CHECK_FOR_INTERRUPTS. One reason is that it behaves slightly differently in Windows. I am not sure why we want to do differently here? This looks quite adhoc to me and may not be correct. If we handle this event in error path, then we might not need to do some special handling. 3. +/* + * On WAIT use a latch to wait till LSN is replayed, + * postmaster dies or timeout happens. + * Returns 1 if LSN was reached and 0 otherwise. + */ +int +WaitUtility(XLogRecPtr target_lsn, const float8 secs) Isn't it better to have a return value as bool? IOW, why this function need int as its return value? 4. +#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) This same define is used elsewhere in the code as well, may be we can define it in some central place and use it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 7, 2020 at 5:56 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Tue, Apr 7, 2020 at 12:58 AM Kartyshov Ivan > <i.kartyshov@postgrespro.ru> wrote: > > On 2020-04-04 03:14, Alexander Korotkov wrote: > > > I think that now we would be fine with single LSN and single TIMEOUT. > > > In future we may add multiple LSNs/TIMEOUTs or/and support for > > > expressions as LSNs/TIMEOUTs if we figure out it's necessary. > > > > > > I also think it's good to couple waiting for lsn with beginning of > > > transaction is good idea. Separate WAIT FOR LSN statement called in > > > the middle of transaction looks problematic for me. Imagine we have RR > > > isolation and already acquired the snapshot. Then out snapshot can > > > block applying wal records, which we are waiting for. That would be > > > implicit deadlock. It would be nice to evade such deadlocks by > > > design. > > Ok, here is a new version of patch with single LSN and TIMEOUT. > > I think this quite small feature, which already received quite amount > of review. The last version is very pinched. But I think it would be > good to commit some very basic version, which is at least some > progress in the area and could be extended in future. I'm going to > pass trough the code tomorrow and commit this unless I found major > issues or somebody objects. > I have gone through this thread and skimmed through the patch and I am not sure if we can say that this patch is ready to go. First, I don't think we have a consensus on the syntax being used in the patch (various people didn't agree to LSN specific syntax). They wanted a more generic syntax and I see that we tried to implement it and it turns out to be a bit complex but that doesn't mean we just give up on the idea and take the simplest approach and that too without a broader agreement. Second, on my quick review, it seems there are a few things like error handling, interrupt checking which need more work. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-04-07 13:32, Amit Kapila wrote: > First, I don't > think we have a consensus on the syntax being used in the patch > (various people didn't agree to LSN specific syntax). They wanted a > more generic syntax and I see that we tried to implement it and it > turns out to be a bit complex but that doesn't mean we just give up on > the idea and take the simplest approach and that too without a broader > agreement. Thank you for your comments! Initially, the syntax used to be "WAITLSN", which confined us with only waiting for LSN-s and not anything else. So we switched to "WAIT FOR LSN", which would allow us to add variations like "WAIT FOR XID" or "WAIT FOR COMMIT TOKEN" in the future if we wanted. A few people seemed to imply that this kind of syntax is expandable enough: On 2018-02-01 14:47, Simon Riggs wrote: > I agree that WAIT LSN is good syntax because this allows us to wait > for something else in future. On 2017-10-31 12:42:56, Ants Aasma wrote: > For lack of a better proposal I would like something along the lines > of: > WAIT FOR state_id[, state_id] [ OPTIONS (..)] As for giving up waiting for multiple events: we can only wait for LSN-s at the moment, and there seems to be no point in waiting for multiple LSN-s at once, because it's equivalent to waiting for the biggest LSN. So we opted for simpler grammar for now, only letting the user specify one LSN and one TIMEOUT. If in the future we allow waiting for something else, like XID-s, we can expand the grammar as needed. What are your own thoughts on the syntax? -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
On Tue, Apr 7, 2020 at 3:07 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote: > On 2017-10-31 12:42:56, Ants Aasma wrote: > > For lack of a better proposal I would like something along the lines > > of: > > WAIT FOR state_id[, state_id] [ OPTIONS (..)] > > As for giving up waiting for multiple events: we can only wait for LSN-s > at the moment, and there seems to be no point in waiting for multiple > LSN-s at once, because it's equivalent to waiting for the biggest LSN. > So we opted for simpler grammar for now, only letting the user specify > one LSN and one TIMEOUT. If in the future we allow waiting for something > else, like XID-s, we can expand the grammar as needed. +1 In the latest version of patch we have very brief and simple syntax allowing to wait for given LSN with given timeout. In future we can expand this syntax in different ways. I don't see that current syntax is limiting us from something. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Apr 7, 2020 at 5:37 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote: > > On 2020-04-07 13:32, Amit Kapila wrote: > > First, I don't > > think we have a consensus on the syntax being used in the patch > > (various people didn't agree to LSN specific syntax). They wanted a > > more generic syntax and I see that we tried to implement it and it > > turns out to be a bit complex but that doesn't mean we just give up on > > the idea and take the simplest approach and that too without a broader > > agreement. > > Thank you for your comments! > > Initially, the syntax used to be "WAITLSN", which confined us with only > waiting for LSN-s and not anything else. So we switched to "WAIT FOR > LSN", which would allow us to add variations like "WAIT FOR XID" or > "WAIT FOR COMMIT TOKEN" in the future if we wanted. A few people seemed > to imply that this kind of syntax is expandable enough: > > On 2018-02-01 14:47, Simon Riggs wrote: > > I agree that WAIT LSN is good syntax because this allows us to wait > > for something else in future. > > On 2017-10-31 12:42:56, Ants Aasma wrote: > > For lack of a better proposal I would like something along the lines > > of: > > WAIT FOR state_id[, state_id] [ OPTIONS (..)] > > As for giving up waiting for multiple events: we can only wait for LSN-s > at the moment, and there seems to be no point in waiting for multiple > LSN-s at once, because it's equivalent to waiting for the biggest LSN. > So we opted for simpler grammar for now, only letting the user specify > one LSN and one TIMEOUT. If in the future we allow waiting for something > else, like XID-s, we can expand the grammar as needed. > > What are your own thoughts on the syntax? > I don't know how users can specify the LSN value but OTOH I could see if users can somehow provide the correct value of commit LSN for which they want to wait, then it could work out. It is possible that I misread and we have a consensus on WAIT FOR LSN [option] because I saw what Simon and Ants have proposed includes multiple state/events and it might be fine to have just one event for now. Anyone else wants to share an opinion on syntax? I think even if we are good with syntax, I could see the code is not completely ready to go as mentioned in few comments raised by me. I am not sure if we want to commit it in the current form and then improve after feature freeze. If it is possible to fix the loose ends quickly and there are no more comments by anyone then probably we might be able to commit it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-04-07 12:58, Amit Kapila wrote: > > Review comments: > 1. > +static void > +DeleteEvent(void) > I don't see how this is implemented or called to handle any errors. > > 2. > + if (InterruptPending) > + { > + DeleteEvent(); > + ProcessInterrupts(); > + } > We generally do this type of handling via CHECK_FOR_INTERRUPTS. > > 3. > +int > +WaitUtility(XLogRecPtr target_lsn, const float8 secs) > Isn't it better to have a return value as bool? > > 4. > +#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) > This same define is used elsewhere in the code as well, may be we can > define it in some central place and use it. Thank you for your review! Ivan and I have worked on the patch and tried to address your comments: 0. Now we wake up at least every 100ms to check for interrupts. 1. Now we call DeleteWaitedLSN() from ProcessInterrupts()=>LockErrorCleanup(). It seems that we can only exit the WAIT cycle improperly due to interrupts, so this should be enough (?) 2. Now we use CHECK_FOR_INTERRUPTS() instead of ProcessInterrupts() 3. Now WaitUtility() returns bool rather than int 4. Now GetNowFloat() is only defined at one place in the code What we changed additionally: - Prohibited using WAIT FOR LSN on master - Added more tests - Checked the code with pgindent and adjusted pgindent/typedefs.list - Changed min_lsn's type to pg_atomic_uint64 + fixed how we work with mutex - Code cleanup in wait.[c|h]: cleaned up #include-s, gave better names to functions, changed elog() to ereport() -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
Attachment
On Tue, Apr 7, 2020 at 10:58 PM Anna Akenteva <a.akenteva@postgrespro.ru> wrote: > Thank you for your review! > Ivan and I have worked on the patch and tried to address your comments: I've pushed this. I promise to do careful post-commit review and resolve any issues arising. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2020-04-08 00:27, Tom Lane wrote: > Alexander Korotkov <akorotkov@postgresql.org> writes: » WAIT FOR LSN lsn [ TIMEOUT timeout ] > > This seems like a really carelessly chosen syntax —- *three* new > keywords, when you probably didn't need any. Are you not aware that > there is distributed overhead in the grammar for every keyword? > Plus, each new keyword carries the risk of breaking existing > applications, since it no longer works as an alias-not-preceded-by-AS. > To avoid creating new keywords, we could change syntax in the following way: WAIT FOR => DEPENDS ON LSN => EVENT TIMEOUT => WITH INTERVAL So START TRANSACTION WAIT FOR LSN '0/3F07A6B1' TIMEOUT 5000; would instead look as START TRANSACTION DEPENDS ON EVENT '0/3F07A6B1' WITH INTERVAL '5 seconds'; [1] https://www.postgresql.org/message-id/28209.1586294824%40sss.pgh.pa.us -- Ivan Kartyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Wed, Apr 8, 2020 at 2:14 AM Kartyshov Ivan <i.kartyshov@postgrespro.ru> wrote: > On 2020-04-08 00:27, Tom Lane wrote: > > Alexander Korotkov <akorotkov@postgresql.org> writes: > » WAIT FOR LSN lsn [ TIMEOUT timeout ] > > > > This seems like a really carelessly chosen syntax —- *three* new > > keywords, when you probably didn't need any. Are you not aware that > > there is distributed overhead in the grammar for every keyword? > > Plus, each new keyword carries the risk of breaking existing > > applications, since it no longer works as an alias-not-preceded-by-AS. > > > > To avoid creating new keywords, we could change syntax in the following > way: > WAIT FOR => DEPENDS ON Looks OK for me. > LSN => EVENT I think it's too generic. Not every event is lsn. TBH, lsn is not event at all :) I wonder is we can still use word lsn, but don't use keyword for that. Can we take arbitrary non-quoted literal there and check it later? > TIMEOUT => WITH INTERVAL I'm not yet sure about this. Probably there are better options. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
At Wed, 8 Apr 2020 02:52:55 +0300, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote in > On Wed, Apr 8, 2020 at 2:14 AM Kartyshov Ivan > <i.kartyshov@postgrespro.ru> wrote: > > On 2020-04-08 00:27, Tom Lane wrote: > > > Alexander Korotkov <akorotkov@postgresql.org> writes: > > » WAIT FOR LSN lsn [ TIMEOUT timeout ] > > > > > > This seems like a really carelessly chosen syntax —- *three* new > > > keywords, when you probably didn't need any. Are you not aware that > > > there is distributed overhead in the grammar for every keyword? > > > Plus, each new keyword carries the risk of breaking existing > > > applications, since it no longer works as an alias-not-preceded-by-AS. > > > > > > > To avoid creating new keywords, we could change syntax in the following > > way: > > WAIT FOR => DEPENDS ON > > Looks OK for me. > > > LSN => EVENT > > I think it's too generic. Not every event is lsn. TBH, lsn is not > event at all :) > > I wonder is we can still use word lsn, but don't use keyword for that. > Can we take arbitrary non-quoted literal there and check it later? > > > TIMEOUT => WITH INTERVAL > > I'm not yet sure about this. Probably there are better options. How about something like the follows. BEGIN AFTER ColId Sconst BEGIN FOLOWING ColId Sconst UNTIL <absolute time>; LIMIT BY <interval>; WITHIN Iconst; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020-04-08 04:09, Kyotaro Horiguchi wrote: > How about something like the follows. > > BEGIN AFTER ColId Sconst > BEGIN FOLOWING ColId Sconst > > UNTIL <absolute time>; > LIMIT BY <interval>; > WITHIN Iconst; > > regards. I like your suggested keywords! I think that "AFTER" + "WITHIN" sound the most natural. We could completely give up the LSN keyword for now. The final command could look something like: BEGIN AFTER ‘0/303EC60’ WITHIN '5 seconds'; or BEGIN AFTER ‘0/303EC60’ WITHIN 5000; I'd like to hear others' opinions on the syntax as well. -- Anna Akenteva Postgres Professional: The Russian Postgres Company http://www.postgrespro.com
Anna Akenteva <a.akenteva@postgrespro.ru> writes: > I'd like to hear others' opinions on the syntax as well. Pardon me for coming very late to the party, but it seems like there are other questions that ought to be answered before we worry about any of this. Why is this getting grafted onto BEGIN/START TRANSACTION in the first place? It seems like it would be just as useful as a separate command, if not more so. You could always start a transaction just after waiting. Conversely, there might be reasons to want to wait within an already-started transaction. If it could survive as a separate command, then I'd humbly suggest that it requires no grammar work at all. You could just invent one or more functions that take suitable parameters. regards, tom lane
At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Anna Akenteva <a.akenteva@postgrespro.ru> writes: > > I'd like to hear others' opinions on the syntax as well. > > Pardon me for coming very late to the party, but it seems like there are > other questions that ought to be answered before we worry about any of > this. Why is this getting grafted onto BEGIN/START TRANSACTION in the > first place? It seems like it would be just as useful as a separate > command, if not more so. You could always start a transaction just > after waiting. Conversely, there might be reasons to want to wait > within an already-started transaction. > > If it could survive as a separate command, then I'd humbly suggest > that it requires no grammar work at all. You could just invent one > or more functions that take suitable parameters. The rationale for not being a fmgr function is stated in the following comments. https://www.postgresql.org/message-id/CAEepm%3D0V74EApmfv%3DMGZa24Ac_pV1vGrp3Ovnv-3rUXwxu9epg%40mail.gmail.com | because it doesn't work for our 2 higher isolation levels as | mentioned." https://www.postgresql.org/message-id/CA%2BTgmob-aG3Lqh6OpvMDYTNR5eyq94VugyEejyk7pLhE9uwnyA%40mail.gmail.com | IMHO, trying to do this using a function-based interface is a really | bad idea for exactly the reasons you mention. I don't see why we'd | resist the idea of core syntax here; transactions are a core part of | PostgreSQL. It seemed to me that they were suggested it to be in a part of BEGIN command, but the next proposed patch implemented "WAIT FOR" command for uncertain reasons to me. I don't object to the isolate command if it is useful than a part of BEGIN command. By the way, for example, pg_current_wal_lsn() is a volatile function and repeated calls within a SERIALIZABLE transaction can return different values. If there's no necessity for this feature to be a core command, I think I would like to be it a function. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/04/09 16:11, Kyotaro Horiguchi wrote: > At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> Anna Akenteva <a.akenteva@postgrespro.ru> writes: >>> I'd like to hear others' opinions on the syntax as well. >> >> Pardon me for coming very late to the party, but it seems like there are >> other questions that ought to be answered before we worry about any of >> this. Why is this getting grafted onto BEGIN/START TRANSACTION in the >> first place? It seems like it would be just as useful as a separate >> command, if not more so. You could always start a transaction just >> after waiting. Conversely, there might be reasons to want to wait >> within an already-started transaction. >> >> If it could survive as a separate command, then I'd humbly suggest >> that it requires no grammar work at all. You could just invent one >> or more functions that take suitable parameters. > > The rationale for not being a fmgr function is stated in the following > comments. This issue happens because the function is executed after BEGIN? If yes, what about executing the function (i.e., as separate transaction) before BEGIN? If so, the snapshot taken in the function doesn't affect the subsequent transaction whatever its isolation level is. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Fujii Masao <masao.fujii@oss.nttdata.com> writes: > On 2020/04/09 16:11, Kyotaro Horiguchi wrote: >> At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in >>> Why is this getting grafted onto BEGIN/START TRANSACTION in the >>> first place? >> The rationale for not being a fmgr function is stated in the following >> comments. [...] > This issue happens because the function is executed after BEGIN? If yes, > what about executing the function (i.e., as separate transaction) before BEGIN? > If so, the snapshot taken in the function doesn't affect the subsequent > transaction whatever its isolation level is. I wonder whether making it a procedure, rather than a plain function, would help any. regards, tom lane
On 2020-04-09 16:33, Tom Lane wrote: > Fujii Masao <masao.fujii@oss.nttdata.com> writes: >> On 2020/04/09 16:11, Kyotaro Horiguchi wrote: >>> At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> >>> wrote in >>>> Why is this getting grafted onto BEGIN/START TRANSACTION in the >>>> first place? > >>> The rationale for not being a fmgr function is stated in the >>> following >>> comments. [...] > >> This issue happens because the function is executed after BEGIN? If >> yes, >> what about executing the function (i.e., as separate transaction) >> before BEGIN? >> If so, the snapshot taken in the function doesn't affect the >> subsequent >> transaction whatever its isolation level is. > > I wonder whether making it a procedure, rather than a plain function, > would help any. > Just another idea in case if one will still decide to go with a separate statement + BEGIN integration instead of a function. We could use parenthesized options list here. This is already implemented for VACUUM, REINDEX, etc. There was an idea to allow CONCURRENTLY in REINDEX there [1] and recently this was proposed again for new options [2], since it is much more extensible from the grammar perspective. That way, the whole feature may look like: WAIT (LSN '16/B374D848', TIMEOUT 100); and/or BEGIN WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT); ... COMMIT; It requires only one reserved keyword 'WAIT'. The advantage of this approach is that it can be extended to support xid, timestamp, csn or anything else, that may be invented in the future, without affecting the grammar. What do you think? Personally, I find this syntax to be more convenient and human-readable compared with function call: SELECT pg_wait_for_lsn('16/B374D848'); BEGIN; [1] https://www.postgresql.org/message-id/aad2ec49-5142-7356-ffb2-a9b2649cdd1f%402ndquadrant.com [2] https://www.postgresql.org/message-id/20200401060334.GB142683%40paquier.xyz Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
On 2020/04/10 3:16, Alexey Kondratov wrote: > On 2020-04-09 16:33, Tom Lane wrote: >> Fujii Masao <masao.fujii@oss.nttdata.com> writes: >>> On 2020/04/09 16:11, Kyotaro Horiguchi wrote: >>>> At Wed, 08 Apr 2020 16:35:46 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in >>>>> Why is this getting grafted onto BEGIN/START TRANSACTION in the >>>>> first place? >> >>>> The rationale for not being a fmgr function is stated in the following >>>> comments. [...] >> >>> This issue happens because the function is executed after BEGIN? If yes, >>> what about executing the function (i.e., as separate transaction) before BEGIN? >>> If so, the snapshot taken in the function doesn't affect the subsequent >>> transaction whatever its isolation level is. >> >> I wonder whether making it a procedure, rather than a plain function, >> would help any. >> > > Just another idea in case if one will still decide to go with a separate statement + BEGIN integration instead of a function.We could use parenthesized options list here. This is already implemented for VACUUM, REINDEX, etc. There was anidea to allow CONCURRENTLY in REINDEX there [1] and recently this was proposed again for new options [2], since it is muchmore extensible from the grammar perspective. > > That way, the whole feature may look like: > > WAIT (LSN '16/B374D848', TIMEOUT 100); > > and/or > > BEGIN > WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT); > ... > COMMIT; > > It requires only one reserved keyword 'WAIT'. The advantage of this approach is that it can be extended to support xid,timestamp, csn or anything else, that may be invented in the future, without affecting the grammar. > > What do you think? > > Personally, I find this syntax to be more convenient and human-readable compared with function call: > > SELECT pg_wait_for_lsn('16/B374D848'); > BEGIN; I can imagine that some users want to specify the LSN to wait for, from the result of another query, for example, SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is valid use case, isn't the function better? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-04-10 05:25, Fujii Masao wrote: > On 2020/04/10 3:16, Alexey Kondratov wrote: >> Just another idea in case if one will still decide to go with a >> separate statement + BEGIN integration instead of a function. We could >> use parenthesized options list here. This is already implemented for >> VACUUM, REINDEX, etc. There was an idea to allow CONCURRENTLY in >> REINDEX there [1] and recently this was proposed again for new options >> [2], since it is much more extensible from the grammar perspective. >> >> That way, the whole feature may look like: >> >> WAIT (LSN '16/B374D848', TIMEOUT 100); >> >> and/or >> >> BEGIN >> WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT); >> ... >> COMMIT; >> >> It requires only one reserved keyword 'WAIT'. The advantage of this >> approach is that it can be extended to support xid, timestamp, csn or >> anything else, that may be invented in the future, without affecting >> the grammar. >> >> What do you think? >> >> Personally, I find this syntax to be more convenient and >> human-readable compared with function call: >> >> SELECT pg_wait_for_lsn('16/B374D848'); >> BEGIN; > > I can imagine that some users want to specify the LSN to wait for, > from the result of another query, for example, > SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is valid use case, > isn't the function better? > I think that the main purpose of the feature is to achieve read-your-writes-consistency, while using async replica for reads. In that case lsn of last modification is stored inside application, so there is no need to do any query for that. Moreover, you cannot store this lsn inside database, since reads are distributed across all replicas (+ primary). Thus, I could imagine that 'xxx' in your example states for some kind of stored procedure, that fetches lsn from the off-postgres storage, but it looks like very narrow case to count on it, doesn't it? Anyway, I am not against implementing this as a function. That was just another option to consider. Just realized that the last patch I have seen does not allow usage of wait on primary. It may be a problem if reads are pooled not only across replicas, but on primary as well, which should be quite usual I guess. In that case application does not know either request will be processed on replica, or on primary. I think it should be allowed without any warning, or just saying some LOG/DEBUG at most, that there was no waiting performed. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Hi, On 2020-04-10 11:25:02 +0900, Fujii Masao wrote: > > BEGIN > > WAIT (LSN '16/B374D848', WHATEVER_OPTION_YOU_WANT); > > ... > > COMMIT; > > > > It requires only one reserved keyword 'WAIT'. The advantage of this approach is that it can be extended to support xid,timestamp, csn or anything else, that may be invented in the future, without affecting the grammar. > > > > What do you think? > > > > Personally, I find this syntax to be more convenient and human-readable compared with function call: > > > > SELECT pg_wait_for_lsn('16/B374D848'); > > BEGIN; > > I can imagine that some users want to specify the LSN to wait for, > from the result of another query, for example, > SELECT pg_wait_for_lsn(lsn) FROM xxx. If this is valid use case, > isn't the function better? I don't think a function is a good idea - it'll cause a snapshot to be held while waiting. Which in turn will cause hot_standby_feedback to not be able to report an increased xmin up. And it will possibly hit snapshot recovery conflicts. Whereas explicit syntax, especially if a transaction control statement, won't have that problem. I'd personally look at 'AFTER' instead of 'WAIT'. Upthread you talked about a reserved keyword - why does it have to be reserved? FWIW, I'm not really convinced there needs to be bespoke timeout syntax for this feature. I can see reasons why you'd not just want to rely on statement_timeout, but at least that should be discussed. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I don't think a function is a good idea - it'll cause a snapshot to be > held while waiting. Which in turn will cause hot_standby_feedback to not > be able to report an increased xmin up. And it will possibly hit > snapshot recovery conflicts. Good point, but we could address that by making it a procedure no? regards, tom lane
Hi, On 2020-04-10 16:29:39 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I don't think a function is a good idea - it'll cause a snapshot to be > > held while waiting. Which in turn will cause hot_standby_feedback to not > > be able to report an increased xmin up. And it will possibly hit > > snapshot recovery conflicts. > > Good point, but we could address that by making it a procedure no? Probably. Don't think we have great infrastructure for builtin procedures yet though? We'd presumably not want to use plpgsql. ISTM that we can make it BEGIN AFTER 'xx/xx' or such, which'd not require any keywords, it'd be easier to use than a procedure. With a separate procedure, you'd likely need more roundtrips / complex logic at the client. You either need to check first if the procedure errored ou, and then send the BEGIN, or send both together and separate out potential errors. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-04-10 16:29:39 -0400, Tom Lane wrote: >> Good point, but we could address that by making it a procedure no? > Probably. Don't think we have great infrastructure for builtin > procedures yet though? We'd presumably not want to use plpgsql. Don't think anyone's tried yet. It's not instantly clear that the amount of code needed would be more than comes along with new syntax, though. > ISTM that we can make it BEGIN AFTER 'xx/xx' or such, which'd not > require any keywords, it'd be easier to use than a procedure. I still don't see a good argument for tying this to BEGIN. If it has to be a statement, why not a standalone statement? (I also have a lurking suspicion that this shouldn't be SQL at all but part of the replication command set.) regards, tom lane
Hi, On 2020-04-10 17:17:10 -0400, Tom Lane wrote: > > ISTM that we can make it BEGIN AFTER 'xx/xx' or such, which'd not > > require any keywords, it'd be easier to use than a procedure. > > I still don't see a good argument for tying this to BEGIN. If it > has to be a statement, why not a standalone statement? Because the goal is to start a transaction where a certain action from the primary is visible. I think there's also some advantages of having it in a single statement for poolers. If a pooler analyzes BEGIN AFTER 'xx/xx' it could e.g. redirect the transaction to a node that's caught up far enough, instead of blocking. But that can't work even close to as easily if it's something that has to be executed before transaction begin. > (I also have a lurking suspicion that this shouldn't be SQL at all > but part of the replication command set.) Hm? I'm not quite following. The feature is useful to achieve read-your-own-writes consistency. Consider Primary: INSERT INTO app_users ...; SELECT pg_current_wal_lsn(); Standby: BEGIN AFTER 'returned/lsn'; Standby: SELECT i_am_a_set_of_very_expensive_queries FROM ..., app_users; without the AFTER/WAIT whatnot, you cannot rely on the insert having been replicated to the standby. Offloading queries from the write node to replicas is a pretty standard technique for scaling out databases (including PG). We just make it harder than necessary. How would this be part of the replication command set? This shouldn't require replication permissions for the user executing the queries. While I'm in favor of merging the replication protocol entirely with the normal protocol, I've so far received very little support for that proposition... Greetings, Andres Freund
On 2020-04-11 00:44, Andres Freund wrote: > I think there's also some advantages of having it in a single statement > for poolers. If a pooler analyzes BEGIN AFTER 'xx/xx' it could > e.g. redirect the transaction to a node that's caught up far enough, > instead of blocking. But that can't work even close to as easily if > it's > something that has to be executed before transaction begin. > I think that's a good point. Also, I'm not sure how we'd expect a wait-for-LSN procedure to work inside a single-snapshot transaction. Would it throw an error inside a RR transaction block? Would it give a warning? -- Anna Akenteva Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
This patch require some rewording of documentation/comments and variable names after the language change introduced by 229f8c219f8f..a9a4a7ad565b, the thread below can be used as reference for how to change: https://www.postgresql.org/message-id/flat/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de cheers ./daniel
On 2020-07-13 14:21, Daniel Gustafsson wrote: > This patch require some rewording of documentation/comments and > variable names > after the language change introduced by 229f8c219f8f..a9a4a7ad565b, the > thread > below can be used as reference for how to change: > > https://www.postgresql.org/message-id/flat/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de > Thank you for the heads up! I updated the most recent patch and removed the use of "master" from it, replacing it with "primary". -- Anna Akenteva Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Aug 18, 2020 at 01:12:51PM +0300, Anna Akenteva wrote: > I updated the most recent patch and removed the use of "master" from it, > replacing it with "primary". This is failing to apply lately, causing the CF bot to complain: http://cfbot.cputube.org/patch_29_772.log -- Michael
Attachment
Re: [HACKERS] make async slave to wait for lsn to be replayed
From
a.pervushina@postgrespro.ru
Date:
Anna Akenteva писал 2020-04-08 22:36: > On 2020-04-08 04:09, Kyotaro Horiguchi wrote: > > I like your suggested keywords! I think that "AFTER" + "WITHIN" sound > the most natural. We could completely give up the LSN keyword for now. > The final command could look something like: > > BEGIN AFTER ‘0/303EC60’ WITHIN '5 seconds'; > or > BEGIN AFTER ‘0/303EC60’ WITHIN 5000; Hello, I've changed the syntax of the command from BEGIN [ WAIT FOR LSN value [ TIMEOUT delay ]] to BEGIN [ AFTER value [ WITHIN delay ]] and removed all the unnecessary keywords. Best regards, Alexandra Pervushina.
Attachment
Re: [HACKERS] make async slave to wait for lsn to be replayed
From
a.pervushina@postgrespro.ru
Date:
Hello, I've changed the BEGIN WAIT FOR LSN statement to core functions pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait. Currently the functions work inside repeatable read transactions, but waitlsn creates a snapshot if called first in a transaction block, which can possibly lead the transaction to working incorrectly, so the function gives a warning. Usage examples ========== select pg_waitlsn(‘LSN’, timeout); select pg_waitlsn_infinite(‘LSN’); select pg_waitlsn_no_wait(‘LSN’);
Attachment
Hello. At Wed, 18 Nov 2020 15:05:00 +0300, a.pervushina@postgrespro.ru wrote in > I've changed the BEGIN WAIT FOR LSN statement to core functions > pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait. > Currently the functions work inside repeatable read transactions, but > waitlsn creates a snapshot if called first in a transaction block, > which can possibly lead the transaction to working incorrectly, so the > function gives a warning. According to the discuttion here, implementing as functions is not optimal. As a Poc, I made it as a procedure. However I'm not sure it is the correct implement as a native procedure but it seems working as expected. > Usage examples > ========== > select pg_waitlsn(‘LSN’, timeout); > select pg_waitlsn_infinite(‘LSN’); > select pg_waitlsn_no_wait(‘LSN’); The first and second usage is coverd by a single procedure. The last function is equivalent to pg_last_wal_replay_lsn(). As the result, the following procedure is provided in the attached. pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1) Any opinions mainly compared to implementation as a command? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 470e113b33..4283b98eb4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -42,6 +42,7 @@ #include "catalog/pg_database.h" #include "commands/progress.h" #include "commands/tablespace.h" +#include "commands/wait.h" #include "common/controldata_utils.h" #include "executor/instrument.h" #include "miscadmin.h" @@ -7463,6 +7464,15 @@ StartupXLOG(void) break; } + /* + * If we replayed an LSN that someone was waiting for, + * set latches in shared memory array to notify the waiter. + */ + if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN()) + { + WaitSetLatch(XLogCtl->lastReplayedEndRecPtr); + } + /* Else, try to fetch the next WAL record */ record = ReadRecord(xlogreader, LOG, false); } while (record != NULL); diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index fa58afd9d7..c19d49e7a4 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1460,6 +1460,10 @@ LANGUAGE internal STRICT IMMUTABLE PARALLEL SAFE AS 'unicode_is_normalized'; +CREATE OR REPLACE PROCEDURE + pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1) + LANGUAGE internal AS 'pg_waitlsn'; + -- -- The default permissions for functions mean that anyone can execute them. -- A number of functions shouldn't be executable by just anyone, but rather diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile index e8504f0ae4..2c0bd41336 100644 --- a/src/backend/commands/Makefile +++ b/src/backend/commands/Makefile @@ -60,6 +60,7 @@ OBJS = \ user.o \ vacuum.o \ variable.o \ - view.o + view.o \ + wait.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index f9bbe97b50..959e96b7e0 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -23,6 +23,7 @@ #include "access/syncscan.h" #include "access/twophase.h" #include "commands/async.h" +#include "commands/wait.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/autovacuum.h" @@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void) size = add_size(size, BTreeShmemSize()); size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); + size = add_size(size, WaitShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -268,6 +270,11 @@ CreateSharedMemoryAndSemaphores(void) SyncScanShmemInit(); AsyncShmemInit(); + /* + * Init array of events for the wait clause in shared memory + */ + WaitShmemInit(); + #ifdef EXEC_BACKEND /* diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index c87ffc6549..2b4d73ba2f 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -38,6 +38,7 @@ #include "access/transam.h" #include "access/twophase.h" #include "access/xact.h" +#include "commands/wait.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/autovacuum.h" @@ -713,6 +714,9 @@ LockErrorCleanup(void) AbortStrongLockAcquire(); + /* If waitlsn was interrupted, then stop waiting for that LSN */ + DeleteWaitedLSN(); + /* Nothing to do if we weren't waiting for a lock */ if (lockAwaited == NULL) { diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 4096faff9a..90876da120 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -373,8 +373,6 @@ pg_sleep(PG_FUNCTION_ARGS) * less than the specified time when WaitLatch is terminated early by a * non-query-canceling signal such as SIGHUP. */ -#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) - endtime = GetNowFloat() + secs; for (;;) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index b5f52d4e4a..918eaedfd5 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -11375,4 +11375,8 @@ proname => 'is_normalized', prorettype => 'bool', proargtypes => 'text text', prosrc => 'unicode_is_normalized' }, +{ oid => '9313', descr => 'wait for LSN to be replayed', + proname => 'pg_waitlsn', prokind => 'p',prorettype => 'void', proargtypes => 'pg_lsn int4', + proargnames => '{wait_lsn,timeout}', + prosrc => 'pg_waitlsn' } ] diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h index 63bf71ac61..6c4ecd704d 100644 --- a/src/include/utils/timestamp.h +++ b/src/include/utils/timestamp.h @@ -113,4 +113,6 @@ extern int date2isoyearday(int year, int mon, int mday); extern bool TimestampTimestampTzRequiresRewrite(void); +#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) + #endif /* TIMESTAMP_H */
On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
Hello.
At Wed, 18 Nov 2020 15:05:00 +0300, a.pervushina@postgrespro.ru wrote in
> I've changed the BEGIN WAIT FOR LSN statement to core functions
> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> Currently the functions work inside repeatable read transactions, but
> waitlsn creates a snapshot if called first in a transaction block,
> which can possibly lead the transaction to working incorrectly, so the
> function gives a warning.
According to the discuttion here, implementing as functions is not
optimal. As a Poc, I made it as a procedure. However I'm not sure it
is the correct implement as a native procedure but it seems working as
expected.
> Usage examples
> ==========
> select pg_waitlsn(‘LSN’, timeout);
> select pg_waitlsn_infinite(‘LSN’);
> select pg_waitlsn_no_wait(‘LSN’);
The first and second usage is coverd by a single procedure. The last
function is equivalent to pg_last_wal_replay_lsn(). As the result, the
following procedure is provided in the attached.
pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
Any opinions mainly compared to implementation as a command?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
The patch (pg_waitlsn_v10_2_kh.patch) does not compile successfully and has compilation errors. Can you please take a look?
xlog.c:45:10: fatal error: commands/wait.h: No such file or directory
#include "commands/wait.h"
^~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [<builtin>: xlog.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
make[2]: *** [common.mk:39: access-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2
#include "commands/wait.h"
^~~~~~~~~~~~~~~~~
compilation terminated.
make[4]: *** [<builtin>: xlog.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
make[2]: *** [common.mk:39: access-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2
I am changing the status to "Waiting on Author"
Ibrar Ahmed
At Thu, 18 Mar 2021 18:57:15 +0500, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote in > On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> > wrote: > > > Hello. > > > > At Wed, 18 Nov 2020 15:05:00 +0300, a.pervushina@postgrespro.ru wrote in > > > I've changed the BEGIN WAIT FOR LSN statement to core functions > > > pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait. > > > Currently the functions work inside repeatable read transactions, but > > > waitlsn creates a snapshot if called first in a transaction block, > > > which can possibly lead the transaction to working incorrectly, so the > > > function gives a warning. > > > > According to the discuttion here, implementing as functions is not > > optimal. As a Poc, I made it as a procedure. However I'm not sure it > > is the correct implement as a native procedure but it seems working as > > expected. > > > > > Usage examples > > > ========== > > > select pg_waitlsn(‘LSN’, timeout); > > > select pg_waitlsn_infinite(‘LSN’); > > > select pg_waitlsn_no_wait(‘LSN’); > > > > The first and second usage is coverd by a single procedure. The last > > function is equivalent to pg_last_wal_replay_lsn(). As the result, the > > following procedure is provided in the attached. > > > > pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1) > > > > Any opinions mainly compared to implementation as a command? > > > > regards. > > > > -- > > Kyotaro Horiguchi > > NTT Open Source Software Center > > > > The patch (pg_waitlsn_v10_2_kh.patch) does not compile successfully and has > compilation errors. Can you please take a look? > > https://cirrus-ci.com/task/6241565996744704 > > xlog.c:45:10: fatal error: commands/wait.h: No such file or directory > #include "commands/wait.h" > ^~~~~~~~~~~~~~~~~ > compilation terminated. > make[4]: *** [<builtin>: xlog.o] Error 1 > make[4]: *** Waiting for unfinished jobs.... > make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2 > make[2]: *** [common.mk:39: access-recursive] Error 2 > make[1]: *** [Makefile:42: all-backend-recurse] Error 2 > make: *** [GNUmakefile:11: all-src-recurse] Error 2 > > I am changing the status to "Waiting on Author" Anna is the autor. The "patch" was just to show how we can implement the feature as a procedure. (Sorry for the bad mistake I made.) The patch still applies to the master. So I resend just rebased version as v10_2, and attached the "PoC" as *.txt which applies on top of the patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6f8810e149..3c580083dd 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -42,6 +42,7 @@ #include "catalog/pg_database.h" #include "commands/progress.h" #include "commands/tablespace.h" +#include "commands/wait.h" #include "common/controldata_utils.h" #include "executor/instrument.h" #include "miscadmin.h" @@ -7535,6 +7536,15 @@ StartupXLOG(void) break; } + /* + * If we replayed an LSN that someone was waiting for, + * set latches in shared memory array to notify the waiter. + */ + if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN()) + { + WaitSetLatch(XLogCtl->lastReplayedEndRecPtr); + } + /* Else, try to fetch the next WAL record */ record = ReadRecord(xlogreader, LOG, false); } while (record != NULL); diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile index e8504f0ae4..2c0bd41336 100644 --- a/src/backend/commands/Makefile +++ b/src/backend/commands/Makefile @@ -60,6 +60,7 @@ OBJS = \ user.o \ vacuum.o \ variable.o \ - view.o + view.o \ + wait.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c new file mode 100644 index 0000000000..1f2483672b --- /dev/null +++ b/src/backend/commands/wait.c @@ -0,0 +1,297 @@ +/*------------------------------------------------------------------------- + * + * wait.c + * Implements waitlsn, which allows waiting for events such as + * LSN having been replayed on replica. + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 2020, Regents of PostgresPro + * + * IDENTIFICATION + * src/backend/commands/wait.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include <math.h> + +#include "access/xact.h" +#include "access/xlog.h" +#include "access/xlogdefs.h" +#include "commands/wait.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "pgstat.h" +#include "storage/backendid.h" +#include "storage/pmsignal.h" +#include "storage/proc.h" +#include "storage/shmem.h" +#include "storage/sinvaladt.h" +#include "storage/spin.h" +#include "utils/builtins.h" +#include "utils/pg_lsn.h" +#include "utils/timestamp.h" + +/* Add to shared memory array */ +static void AddWaitedLSN(XLogRecPtr lsn_to_wait); + +/* Shared memory structure */ +typedef struct +{ + int backend_maxid; + pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */ + slock_t mutex; + /* LSNs that different backends are waiting */ + XLogRecPtr lsn[FLEXIBLE_ARRAY_MEMBER]; +} WaitState; + +static WaitState *state; + +/* + * Add the wait event of the current backend to shared memory array + */ +static void +AddWaitedLSN(XLogRecPtr lsn_to_wait) +{ + SpinLockAcquire(&state->mutex); + if (state->backend_maxid < MyBackendId) + state->backend_maxid = MyBackendId; + + state->lsn[MyBackendId] = lsn_to_wait; + + if (lsn_to_wait < state->min_lsn.value) + state->min_lsn.value = lsn_to_wait; + SpinLockRelease(&state->mutex); +} + +/* + * Delete wait event of the current backend from the shared memory array. + */ +void +DeleteWaitedLSN(void) +{ + int i; + XLogRecPtr lsn_to_delete; + + SpinLockAcquire(&state->mutex); + + lsn_to_delete = state->lsn[MyBackendId]; + state->lsn[MyBackendId] = InvalidXLogRecPtr; + + /* If we are deleting the minimal LSN, then choose the next min_lsn */ + if (lsn_to_delete != InvalidXLogRecPtr && + lsn_to_delete == state->min_lsn.value) + { + state->min_lsn.value = PG_UINT64_MAX; + for (i = 2; i <= state->backend_maxid; i++) + if (state->lsn[i] != InvalidXLogRecPtr && + state->lsn[i] < state->min_lsn.value) + state->min_lsn.value = state->lsn[i]; + } + + /* If deleting from the end of the array, shorten the array's used part */ + if (state->backend_maxid == MyBackendId) + for (i = (MyBackendId); i >= 2; i--) + if (state->lsn[i] != InvalidXLogRecPtr) + { + state->backend_maxid = i; + break; + } + + SpinLockRelease(&state->mutex); +} + +/* + * Report amount of shared memory space needed for WaitState + */ +Size +WaitShmemSize(void) +{ + Size size; + + size = offsetof(WaitState, lsn); + size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr))); + return size; +} + +/* + * Initialize an array of events to wait for in shared memory + */ +void +WaitShmemInit(void) +{ + bool found; + uint32 i; + + state = (WaitState *) ShmemInitStruct("pg_wait_lsn", + WaitShmemSize(), + &found); + if (!found) + { + SpinLockInit(&state->mutex); + + for (i = 0; i < (MaxBackends + 1); i++) + state->lsn[i] = InvalidXLogRecPtr; + + state->backend_maxid = 0; + state->min_lsn.value = PG_UINT64_MAX; + } +} + +/* + * Set latches in shared memory to signal that new LSN has been replayed + */ +void +WaitSetLatch(XLogRecPtr cur_lsn) +{ + uint32 i; + int backend_maxid; + PGPROC *backend; + + SpinLockAcquire(&state->mutex); + backend_maxid = state->backend_maxid; + + for (i = 2; i <= backend_maxid; i++) + { + backend = BackendIdGetProc(i); + + if (backend && state->lsn[i] != 0 && + state->lsn[i] <= cur_lsn) + { + SetLatch(&backend->procLatch); + } + } + SpinLockRelease(&state->mutex); +} + +/* + * Get minimal LSN that someone waits for + */ +XLogRecPtr +GetMinWaitedLSN(void) +{ + return state->min_lsn.value; +} + +/* + * On WAIT use a latch to wait till LSN is replayed, + * postmaster dies or timeout happens. Timeout is specified in milliseconds. + * Returns true if LSN was reached and false otherwise. + */ +bool +WaitUtility(XLogRecPtr target_lsn, const int timeout_ms) +{ + XLogRecPtr cur_lsn = GetXLogReplayRecPtr(NULL); + int latch_events; + float8 endtime; + bool res = false; + bool wait_forever = (timeout_ms <= 0); + + if (!RecoveryInProgress()) { + ereport(ERROR, + errmsg("Cannot use waitlsn on primary")); + return false; + } + + /* + * In transactions, that have isolation level repeatable read or higher + * waitlsn creates a snapshot if called first in a block, which can + * lead the transaction to working incorrectly + */ + + if (IsTransactionBlock() && XactIsoLevel != XACT_READ_COMMITTED) { + ereport(WARNING, + errmsg("Waitlsn may work incorrectly in this isolation level"), + errhint("Call waitlsn before starting the transaction")); + } + + endtime = GetNowFloat() + timeout_ms / 1000.0; + + latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH; + + /* Check if we already reached the needed LSN */ + if (cur_lsn >= target_lsn) + return true; + + AddWaitedLSN(target_lsn); + + for (;;) + { + int rc; + float8 time_left = 0; + long time_left_ms = 0; + + time_left = endtime - GetNowFloat(); + + /* Use 100 ms as the default timeout to check for interrupts */ + if (wait_forever || time_left < 0 || time_left > 0.1) + time_left_ms = 100; + else + time_left_ms = (long) ceil(time_left * 1000.0); + + /* If interrupt, LockErrorCleanup() will do DeleteWaitedLSN() for us */ + CHECK_FOR_INTERRUPTS(); + + /* If postmaster dies, finish immediately */ + if (!PostmasterIsAlive()) + break; + + rc = WaitLatch(MyLatch, latch_events, time_left_ms, + WAIT_EVENT_CLIENT_READ); + + ResetLatch(MyLatch); + + if (rc & WL_LATCH_SET) + cur_lsn = GetXLogReplayRecPtr(NULL); + + if (rc & WL_TIMEOUT) + { + cur_lsn = GetXLogReplayRecPtr(NULL); + /* If the time specified by user has passed, stop waiting */ + time_left = endtime - GetNowFloat(); + if (!wait_forever && time_left <= 0.0) + break; + } + + /* If LSN has been replayed */ + if (target_lsn <= cur_lsn) + break; + } + + DeleteWaitedLSN(); + + if (cur_lsn < target_lsn) + ereport(WARNING, + errmsg("LSN was not reached"), + errhint("Try to increase wait time.")); + else + res = true; + + return res; +} + +Datum +pg_waitlsn(PG_FUNCTION_ARGS) +{ + XLogRecPtr trg_lsn = PG_GETARG_LSN(0); + uint64_t delay = PG_GETARG_INT32(1); + + PG_RETURN_BOOL(WaitUtility(trg_lsn, delay)); +} + +Datum +pg_waitlsn_infinite(PG_FUNCTION_ARGS) +{ + XLogRecPtr trg_lsn = PG_GETARG_LSN(0); + + PG_RETURN_BOOL(WaitUtility(trg_lsn, 0)); +} + +Datum +pg_waitlsn_no_wait(PG_FUNCTION_ARGS) +{ + XLogRecPtr trg_lsn = PG_GETARG_LSN(0); + + PG_RETURN_BOOL(WaitUtility(trg_lsn, 1)); +} \ No newline at end of file diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 3e4ec53a97..fb8f8588a7 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -23,6 +23,7 @@ #include "access/syncscan.h" #include "access/twophase.h" #include "commands/async.h" +#include "commands/wait.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/autovacuum.h" @@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(void) size = add_size(size, BTreeShmemSize()); size = add_size(size, SyncScanShmemSize()); size = add_size(size, AsyncShmemSize()); + size = add_size(size, WaitShmemSize()); #ifdef EXEC_BACKEND size = add_size(size, ShmemBackendArraySize()); #endif @@ -270,6 +272,11 @@ CreateSharedMemoryAndSemaphores(void) SyncScanShmemInit(); AsyncShmemInit(); + /* + * Init array of events for the wait clause in shared memory + */ + WaitShmemInit(); + #ifdef EXEC_BACKEND /* diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 897045ee27..540991146a 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -38,6 +38,7 @@ #include "access/transam.h" #include "access/twophase.h" #include "access/xact.h" +#include "commands/wait.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/autovacuum.h" @@ -716,6 +717,9 @@ LockErrorCleanup(void) AbortStrongLockAcquire(); + /* If waitlsn was interrupted, then stop waiting for that LSN */ + DeleteWaitedLSN(); + /* Nothing to do if we weren't waiting for a lock */ if (lockAwaited == NULL) { diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 634f574d7e..50c836fdb7 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -375,8 +375,6 @@ pg_sleep(PG_FUNCTION_ARGS) * less than the specified time when WaitLatch is terminated early by a * non-query-canceling signal such as SIGHUP. */ -#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) - endtime = GetNowFloat() + secs; for (;;) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index e259531f60..c11387961e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -11411,4 +11411,19 @@ proname => 'is_normalized', prorettype => 'bool', proargtypes => 'text text', prosrc => 'unicode_is_normalized' }, +{ oid => '16387', descr => 'wait for LSN until timeout', + proname => 'pg_waitlsn', prorettype => 'bool', proargtypes => 'pg_lsn int8', + proargnames => '{trg_lsn,delay}', + prosrc => 'pg_waitlsn' }, + +{ oid => '16388', descr => 'wait for LSN for an infinite time', + proname => 'pg_waitlsn_infinite', prorettype => 'bool', proargtypes => 'pg_lsn', + proargnames => '{trg_lsn}', + prosrc => 'pg_waitlsn_infinite' }, + +{ oid => '16389', descr => 'wait for LSN with no timeout', + proname => 'pg_waitlsn_no_wait', prorettype => 'bool', proargtypes => 'pg_lsn', + proargnames => '{trg_lsn}', + prosrc => 'pg_waitlsn_no_wait' }, + ] diff --git a/src/include/commands/wait.h b/src/include/commands/wait.h new file mode 100644 index 0000000000..fd21e43416 --- /dev/null +++ b/src/include/commands/wait.h @@ -0,0 +1,26 @@ +/*------------------------------------------------------------------------- + * + * wait.h + * prototypes for commands/wait.c + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 2020, Regents of PostgresPro + * + * src/include/commands/wait.h + * + *------------------------------------------------------------------------- + */ +#ifndef WAIT_H +#define WAIT_H +#include "postgres.h" +#include "tcop/dest.h" +#include "nodes/parsenodes.h" + +extern bool WaitUtility(XLogRecPtr lsn, const int timeout_ms); +extern Size WaitShmemSize(void); +extern void WaitShmemInit(void); +extern void WaitSetLatch(XLogRecPtr cur_lsn); +extern XLogRecPtr GetMinWaitedLSN(void); +extern void DeleteWaitedLSN(void); + +#endif /* WAIT_H */ diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h index 63bf71ac61..6c4ecd704d 100644 --- a/src/include/utils/timestamp.h +++ b/src/include/utils/timestamp.h @@ -113,4 +113,6 @@ extern int date2isoyearday(int year, int mon, int mday); extern bool TimestampTimestampTzRequiresRewrite(void); +#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0) + #endif /* TIMESTAMP_H */ diff --git a/src/test/recovery/t/021_waitlsn.pl b/src/test/recovery/t/021_waitlsn.pl new file mode 100644 index 0000000000..81dd70ef96 --- /dev/null +++ b/src/test/recovery/t/021_waitlsn.pl @@ -0,0 +1,91 @@ +# Checks waitlsn +use strict; +use warnings; + +use PostgresNode; +use TestLib; +use Test::More tests => 11; + +# Initialize primary node +my $node_primary = get_new_node('primary'); +$node_primary->init(allows_streaming => 1); +$node_primary->start; + +# And some content and take a backup +$node_primary->safe_psql('postgres', + "CREATE TABLE wait_test AS SELECT generate_series(1,10) AS a"); +my $backup_name = 'my_backup'; +$node_primary->backup($backup_name); + +# Using the backup, create a streaming standby with a 1 second delay +my $node_standby = get_new_node('standby'); +my $delay = 1; +$node_standby->init_from_backup($node_primary, $backup_name, + has_streaming => 1); +$node_standby->append_conf('postgresql.conf', qq[ + recovery_min_apply_delay = '${delay}s' +]); +$node_standby->start; + +# Check that timeouts make us wait for the specified time (1s here) +my $current_time = $node_standby->safe_psql('postgres', "SELECT now()"); +my $two_seconds = 2000; # in milliseconds +my $start_time = time(); +$node_standby->safe_psql('postgres', + "SELECT pg_waitlsn('0/FFFFFFFF', $two_seconds)"); +my $time_waited = (time() - $start_time) * 1000; # convert to milliseconds +ok($time_waited >= $two_seconds, "waitlsn waits for enough time"); + +# Check that timeouts let us stop waiting right away, before reaching target LSN +$node_primary->safe_psql('postgres', + "INSERT INTO wait_test VALUES (generate_series(11, 20))"); +my $lsn1 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()"); +my ($ret, $out, $err) = $node_standby->psql('postgres', + "SELECT pg_waitlsn('$lsn1', 1)"); + +ok($ret == 0, "zero return value when failed to waitlsn on standby"); +ok($err =~ /WARNING: LSN was not reached/, + "correct error message when failed to waitlsn on standby"); +ok($out eq "f", "if given too little wait time, WAIT doesn't reach target LSN"); + + +# Check that waitlsn works fine and reaches target LSN if given no timeout + +# Add data on primary, memorize primary's last LSN +$node_primary->safe_psql('postgres', + "INSERT INTO wait_test VALUES (generate_series(21, 30))"); +my $lsn2 = $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn()"); + +# Wait for it to appear on replica, memorize replica's last LSN +$node_standby->safe_psql('postgres', + "SELECT pg_waitlsn_infinite('$lsn2')"); +my $reached_lsn = $node_standby->safe_psql('postgres', + "SELECT pg_last_wal_replay_lsn()"); + +# Make sure that primary's and replica's LSNs are the same after WAIT +my $compare_lsns = $node_standby->safe_psql('postgres', + "SELECT pg_lsn_cmp('$reached_lsn'::pg_lsn, '$lsn2'::pg_lsn)"); +ok($compare_lsns eq 0, + "standby reached the same LSN as primary before starting transaction"); + + +# Make sure that it's not allowed to use waitlsn on primary +($ret, $out, $err) = $node_primary->psql('postgres', + "SELECT pg_waitlsn_infinite('0/FFFFFFFF')"); + +ok($ret != 0, "non-zero return value when trying to waitlsn on primary"); +ok($err =~ /ERROR: Cannot use waitlsn on primary/, + "correct error message when trying to waitlsn on primary"); +ok($out eq '', "empty output when trying to waitlsn on primary"); + +# Make sure that waitlsn gives a warning inside a read commited transaction + +($ret, $out, $err) = $node_standby->psql('postgres', + "BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT pg_waitlsn_no_wait('0/FFFFFFFF')"); +ok($ret == 0, "zero return value when trying to waitlsn in transaction"); +ok($err =~ /WARNING: Waitlsn may work incorrectly in this isolation level/, + "correct warning message when trying to waitlsn in transaction"); +ok($out eq "f", "non empty output when trying to waitlsn in transaction"); + +$node_standby->stop; +$node_primary->stop; \ No newline at end of file diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 1d1d5d2f0e..6075ee5e77 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2730,6 +2730,7 @@ WaitEventIPC WaitEventSet WaitEventTimeout WaitPMResult +WaitState WalCloseMethod WalLevel WalRcvData diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 0dca65dc7b..635508639a 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1474,6 +1474,10 @@ LANGUAGE internal STRICT IMMUTABLE PARALLEL SAFE AS 'unicode_is_normalized'; +CREATE OR REPLACE PROCEDURE + pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1) + LANGUAGE internal AS 'pg_waitlsn'; + -- -- The default permissions for functions mean that anyone can execute them. -- A number of functions shouldn't be executable by just anyone, but rather diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c11387961e..7f25938cbc 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -11426,4 +11426,8 @@ proargnames => '{trg_lsn}', prosrc => 'pg_waitlsn_no_wait' }, +{ oid => '9313', descr => 'wait for LSN to be replayed', + proname => 'pg_waitlsn', prokind => 'p',prorettype => 'void', proargtypes => 'pg_lsn int4', + proargnames => '{wait_lsn,timeout}', + prosrc => 'pg_waitlsn' } ]