Re: Add LSN <-> time conversion functionality - Mailing list pgsql-hackers
From | Andrey M. Borodin |
---|---|
Subject | Re: Add LSN <-> time conversion functionality |
Date | |
Msg-id | 334BD6DF-35D6-40E8-BF37-45360955E2CD@yandex-team.ru Whole thread Raw |
In response to | Add LSN <-> time conversion functionality (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-hackers |
This is a copy of my message for pgsql-hackers mailing list. Unfortunately original message was rejected due to one of recipientsaddresses is blocked. > On 1 Aug 2024, at 10:54, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: > > > >> On 1 Aug 2024, at 05:44, Melanie Plageman <melanieplageman@gmail.com> wrote: >> >> Thanks for the review! v6 attached. >> >> On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: >>> >>> PgStartLSN = GetXLogInsertRecPtr(); >>> Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby in HA cluster, which was doing a crashrecovery of 1y WALs in a day, then is in startup for a year as a Hot Standby, and then is promoted? >> >> So, I don't think we will allow use of the LSNTimeStream on a standby, >> since it is unclear what it would mean on a standby. For example, do >> you want to know the time the LSN was generated or the time it was >> replayed? Note that bgwriter won't insert values to the time stream on >> a standby (it explicitly checks). > > Yes, I mentioned Standby because PgStartLSN is not what it says it is. > >> >> But, you bring up an issue that I don't quite know what to do about. >> If the standby doesn't have an LSNTimeStream, then when it is >> promoted, LSN <-> time conversions of LSNs and times before the >> promotion seem impossible. Maybe if the stats file is getting written >> out at checkpoints, we could restore from that previous primary's file >> after promotion? > > I’m afraid that clocks of a Primary from previous timeline might be not in sync with ours. > It’s OK if it causes error, we just need to be prepared when they indicate values from future. Perhaps, by shifting theirlast point to our “PgStartLSN”. > >> >> This brings up a second issue, which is that, currently, bgwriter >> won't insert into the time stream when wal_level is minimal. I'm not >> sure exactly how to resolve it because I am relying on the "last >> important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle >> when the bgwriter actually inserts new records into the LSNTimeStream. >> I could come up with a different timeout interval for updating the >> time stream. Perhaps I should do that? > > IDK. My knowledge of bgwriter is not enough to give a meaningful advise here. > >> >>> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem capable to accommodate LSN*time. And thefunction may return negative result, despite claiming area as a result. It’s intended, but a little misleading. >> >> Ah, great point. I've fixed this. > > Well, not exactly. Result of lsn_ts_calculate_error_area() is still fabs()’ed upon usage. I’d recommend to fabs in function. > BTW lsn_ts_calculate_error_area() have no prototype. > > Also, I’m not a big fan of using IEEE 754 float in this function. This data type have 24 bits of significand bits. > Consider that current timestamp has 50 binary digits. Let’s assume realistic LSNs have same 50 bits. > Then our rounding error is 2^76 byte*microseconds. > Let’s assume we are interested to measure time on a scale of 1GB WAL records. > This gives us rounding error of 2^46 microseconds = 2^26 seconds = 64 million seconds = 2 years. > Seems like a gross error. > > If we use IEEE 754 doubles we have 53 significand bytes. And rounding error will be on a scale of 128 microseconds perGB, which is acceptable. > > So I think double is better than float here. > > Nitpicking, but I’d prefer to sum up (triangle2 + triangle3 + rectangle_part) before subtracting. This can save a bit ofprecision (smaller figures can have lesser exponent). > > >>>> On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplageman@gmail.com> wrote: >>>> change the user-facing functions for estimating an >>>> LSN/time conversion to instead return a floor and a ceiling -- instead >>>> of linearly interpolating a guess. This would be a way to keep users >>>> from misunderstanding the accuracy of the functions to translate LSN >>>> <-> time. >>> >>> I think this is a good idea. And it covers well “server restart problem”. If API just returns -inf as a boundary, callercan correctly interpret this situation. >> >> Providing "ceiling" and "floor" user functions is still a TODO for me, >> however, I think that the patch mostly does handle server restarts. >> >> In the event of a restart, the cumulative stats system will have >> persisted our time stream, so the LSNTimeStream will just be read back >> in with the rest of the stats. I've added logic to ensure that if the >> PgStartLSN is newer than our oldest LSNTimeStream entry, we use the >> oldest entry instead of PgStartLSN when doing conversions LSN <-> >> time. >> >> As for a crash, stats do not persist crashes, but I think Michael's >> patch will go in to write out the stats file at checkpoints, and then >> this should be good enough. >> >> Is there anything else you think that is an issue with restarts? > > Nope, looks good to me. > >> >>> Thanks! Looking forward to more timely freezing. >> >> Thanks! I'll be posting a new version of the opportunistic freezing >> patch that uses the time stream quite soon, so I hope you'll take a >> look at that as well! > > Great! Thank you! > Besides your TODOs and my nitpicking this patch series looks RfC to me. > > I have to address some review comments on my patches, then I hope I’ll switch to reviewing opportunistic freezing. > > > Best regards, Andrey Borodin.
pgsql-hackers by date: