Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Implement waiting for wal lsn replay: reloaded |
Date | |
Msg-id | CAPpHfdt5VCM1DoodvVoiRUaoRuXrNEzAmrMQ-eLa0E7wByXaKw@mail.gmail.com Whole thread Raw |
In response to | Re: Implement waiting for wal lsn replay: reloaded (Tomas Vondra <tomas@vondra.me>) |
List | pgsql-hackers |
Hi, Tomas. Thank you so much for your review! Please find the revised patchset. On Thu, Mar 13, 2025 at 4:15 PM Tomas Vondra <tomas@vondra.me> wrote: > I did a quick look at this patch. I haven't found any correctness > issues, but I have some general review comments and questions about the > grammar / syntax. > > 1) The sgml docs don't really show the syntax very nicely, it only shows > this at the beginning of wait_for.sgml: > > WAIT FOR ( <replaceable class="parameter">parameter</replaceable> > '<replaceable class="parameter">value</replaceable>' [, ... ] ) ] > > I kinda understand this comes from using the generic option list (I'll > get to that shortly), but I think it'd be much better to actually show > the "full" syntax here, instead of leaving the "parameters" to later. Sounds reasonable, changed to show the full syntax in the synopsis. > 2) The syntax description suggests "(" and ")" are required, but that > does not seem to be the case - in fact, it's not even optional, and when > I try using that, I get syntax error. Good catch, fixed. > 3) I have my doubts about using the generic_option_list for this. Yes, I > understand this allows using fewer reserved keywords, but it leads to > some weirdness and I'm not sure it's worth it. Not sure what the right > trade off is here. > > Anyway, some examples of the weird stuff implied by this approach: > > - it forces "," between the options, which is a clear difference from > what we do for every other command > > - it forces everything to be a string, i.e. you can' say "TIMEOUT 10", > it has to be "TIMEOUT '10'" > > I don't have a very strong opinion on this, but the result seems a bit > strange to me. I've improved the syntax. I still tried to keep the number of new keywords and grammar rules minimal. That leads to moving some parser login into wait.c. This is probably a bit awkward, but saves our grammar from bloat. Let me know what do you think about this approach. > 4) I'm not sure I understand the motivation of the "throw false" mode, > and I'm not sure I understand this description in the sgml docs: > > On timeout, or if the server is promoted before > <parameter>lsn</parameter> is reached, an error is emitted, > as soon as <parameter>throw</parameter> is not specified or set to > true. > If <parameter>throw</parameter> is set to false, then the command > doesn't throw errors. > > I find it a bit confusing. What is the use case for this mode? The idea here is that application could do some handling of these errors without having to parse the error messages (parsing error messages is inconvenient because of localization etc). > 5) One place in the docs says: > > The target log sequence number to wait for. > > Thie is literally the only place using "log sequence number" in our > code base, I'd just use "LSN" just like every other place. OK fixed. > 6) The docs for the TIMEOUT parameter say this: > > <varlistentry> > <term><replaceable class="parameter">timeout</replaceable></term> > <listitem> > <para> > When specified and greater than zero, the command waits until > <parameter>lsn</parameter> is reached or the specified > <parameter>timeout</parameter> has elapsed. Must be a non- > negative integer, the default is zero. > </para> > </listitem> > </varlistentry> > > That doesn't say what unit does the option use. Is is seconds, > milliseconds or what? > > In fact, it'd be nice to let users specify that in the value, similar > to other options (e.g. SET statement_timeout = '10s'). The default unit of milliseconds is specified. Also, an alternative way to specify timeout is now supported. Timeout might be a string literal consisting of numeric and unit specifier. > 7) One place in the docs says this: > > That is, after this function execution, the value returned by > <function>pg_last_wal_replay_lsn</function> should be greater ... > > I think the reference to "function execution" is obsolete? Actually, this is just the function, which reports current replay LSN, not function introduced by previous version of this patch. We refer it to just express the constraint that LSN must be replayed after execution of the command. > 8) I find this confusing: > > However, if <command>WAIT FOR</command> is > called on primary promoted from standby and <literal>lsn</literal> > was already replayed, then the <command>WAIT FOR</command> command > just exits immediately. > > Does this mean running the WAIT command on a primary (after it was > already promoted) will exit immediately? Why does it matter that it > was promoted from a standby? Shouldn't it exit immediately even for > a standalone instance? I think the previous sentence should give an idea that otherwise error gets thrown. That also happens immediately for sure. > 9) xlogwait.c > > I think this should start with a basic "design" description of how the > wait is implemented, in a comment at the top of the file. That is, what > we keep in the shared memory, what happens during a wait, how it uses > the pairing heap, etc. After reading this comment I should understand > how it all fits together. OK, I've added the header comment. > 10) WaitForLSNReplay / WaitLSNWakeup > > I think the function comment should document the important stuff (e.g. > return values for various situations, how it groups waiters into chunks > of 16 elements during wakeup, ...). Revised header comments for those functions too. > 11) WaitLSNProcInfo / WaitLSNState > > Does this need to be exposed in xlogwait.h? These structs seem private > to xlogwait.c, so maybe declare it there? Hmm, I don't remember why I moved them to xlogwait.h. OK, moved them back to xlogwait.c. ------ Regards, Alexander Korotkov Supabase
Attachment
pgsql-hackers by date: