Re: Add LSN <-> time conversion functionality - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Add LSN <-> time conversion functionality |
Date | |
Msg-id | CAAKRu_ad2DsWjt+h2kV43Nwe1+wm5T__VrvXeuZHwV05f6iPjQ@mail.gmail.com Whole thread Raw |
In response to | Add LSN <-> time conversion functionality (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Add LSN <-> time conversion functionality
|
List | pgsql-hackers |
Attached v7 changes the SQL-callable functions to return ranges of LSNs and times covering the target time or LSN instead of linearly interpolating an approximate answer. I also changed the frequency and conditions under which the background writer updates the global LSNTimeStream. There is now a dedicated interval at which the LSNTimeStream is updated (instead of reusing the log standby snapshot interval). I also found that it is incorrect to set PgStartLSN to the insert LSN in PostmasterMain(). The XLog buffer cache is not guaranteed to be initialized in time. Instead of trying to provide an LSN lower bound for locating times before those recorded on the global LSNTimeStream, I simply return a lower bound of InvalidXLogRecPtr. Similarly, I provide a lower bound of -infinity when locating LSNs before those recorded on the global LSNTimeStream. On Thu, Aug 1, 2024 at 3:55 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: > > > On 1 Aug 2024, at 05:44, Melanie Plageman <melanieplageman@gmail.com> wrote: > > > > On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: > >> > >> PgStartLSN = GetXLogInsertRecPtr(); > >> Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby in HA cluster, which was doing a crashrecovery of 1y WALs in a day, then is in startup for a year as a Hot Standby, and then is promoted? > > > > So, I don't think we will allow use of the LSNTimeStream on a standby, > > since it is unclear what it would mean on a standby. For example, do > > you want to know the time the LSN was generated or the time it was > > replayed? Note that bgwriter won't insert values to the time stream on > > a standby (it explicitly checks). > > Yes, I mentioned Standby because PgStartLSN is not what it says it is. Right, I've found another way of dealing with this since PgStartLSN was incorrect. > > But, you bring up an issue that I don't quite know what to do about. > > If the standby doesn't have an LSNTimeStream, then when it is > > promoted, LSN <-> time conversions of LSNs and times before the > > promotion seem impossible. Maybe if the stats file is getting written > > out at checkpoints, we could restore from that previous primary's file > > after promotion? > > I’m afraid that clocks of a Primary from previous timeline might be not in sync with ours. > It’s OK if it causes error, we just need to be prepared when they indicate values from future. Perhaps, by shifting theirlast point to our “PgStartLSN”. Regarding a standby being promoted. I plan to make a version of the LSNTimeStream functions which works on a standby by using getRecordTimestamp() and inserts an LSN from the last record replayed and the associated timestamp. That should mean the LSNTimeStream on the standby is roughly the same as the one on the primary since those records were inserted on the primary. As for time going backwards in general, I've now made it so that values are only inserted if the times are monotonically increasing and the LSN is the same or increasing. This should handle time going backwards, either on the primary itself or after a standby is promoted if the timeline wasn't a perfect match. > > This brings up a second issue, which is that, currently, bgwriter > > won't insert into the time stream when wal_level is minimal. I'm not > > sure exactly how to resolve it because I am relying on the "last > > important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle > > when the bgwriter actually inserts new records into the LSNTimeStream. > > I could come up with a different timeout interval for updating the > > time stream. Perhaps I should do that? > > IDK. My knowledge of bgwriter is not enough to give a meaningful advise here. See my note at top of the email. > >> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem capable to accommodate LSN*time. And thefunction may return negative result, despite claiming area as a result. It’s intended, but a little misleading. > > > > Ah, great point. I've fixed this. > > Well, not exactly. Result of lsn_ts_calculate_error_area() is still fabs()’ed upon usage. I’d recommend to fabs in function. > BTW lsn_ts_calculate_error_area() have no prototype. > > Also, I’m not a big fan of using IEEE 754 float in this function. This data type have 24 bits of significand bits. > Consider that current timestamp has 50 binary digits. Let’s assume realistic LSNs have same 50 bits. > Then our rounding error is 2^76 byte*microseconds. > Let’s assume we are interested to measure time on a scale of 1GB WAL records. > This gives us rounding error of 2^46 microseconds = 2^26 seconds = 64 million seconds = 2 years. > Seems like a gross error. > > If we use IEEE 754 doubles we have 53 significand bytes. And rounding error will be on a scale of 128 microseconds perGB, which is acceptable. > > So I think double is better than float here. > > Nitpicking, but I’d prefer to sum up (triangle2 + triangle3 + rectangle_part) before subtracting. This can save a bit ofprecision (smaller figures can have lesser exponent). Okay, thanks for the detail. See what you think about v7. Some perf testing of bgwriter updates are still a todo. I was thinking that it might be bad to take an exclusive lock on the WAL stats data structure for the entire time I am inserting a new value to the LSNTimeStream. I was thinking maybe I should take a share lock and calculate which element to drop first and then take the exclusive lock? Or maybe I should make a separate lock for just the stream member of PgStat_WalStats. Maybe it isn't worth it? I'm not sure. - Melanie
Attachment
pgsql-hackers by date: