Re: Add LSN <-> time conversion functionality - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add LSN <-> time conversion functionality
Date
Msg-id f6885752-75ef-496f-a6cc-ad759feb907f@vondra.me
In response to Re: Add LSN <-> time conversion functionality  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On 8/9/24 17:48, Melanie Plageman wrote:
> On Fri, Aug 9, 2024 at 9:15 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>>
>> On Fri, Aug 9, 2024 at 9:09 AM Tomas Vondra <tomas@vondra.me> wrote:
>>>
>>> I suggest we do the simplest and most obvious algorithm possible, at
>>> least for now. Focusing on this part seems like a distraction from the
>>> freezing thing you actually want to do.
>>
>> The simplest thing to do would be to pick an arbitrary point in the
>> past (say one week) and then throw out all the points (except the very
>> oldest to avoid extrapolation) from before that cliff. I would like to
>> spend time on getting a new version of the freezing patch on the list,
>> but I think Robert had strong feelings about having a complete design
>> first. I'll switch focus to that for a bit so that perhaps you all can
>> see how I am using the time -> LSN conversion and that could inform
>> the design of the data structure.
> 
> I realize this thought didn't make much sense since it is a fixed size
> data structure. We would have to use some other algorithm to get rid
> of data if there are still too many points from within the last week.
> 

Not sure I understand. Why would the fixed size of the struct mean we
can't discard data that's too old?

I'd imagine we simply reclaim some of the slots and mark them as unused,
"move" the data to make space for recent entries, or something like
that. Or just use something like a cyclic buffer that wraps around and
overwrites the oldest data.
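To illustrate what I mean, a rough sketch of the cyclic-buffer idea (all
names here are made up for illustration, not from your patch): a
fixed-size array of (time, LSN) points where, once full, each insert
silently overwrites the oldest slot, so memory use stays constant while
old data ages out on its own.

```c
#include <stdint.h>
#include <time.h>

#define LSNTIME_SLOTS 64

/* One recorded (time, LSN) point. */
typedef struct LSNTimeEntry
{
    time_t      time;
    uint64_t    lsn;
} LSNTimeEntry;

/* Fixed-size cyclic buffer of points. */
typedef struct LSNTimeStream
{
    int          next;      /* index of the slot to write next */
    int          count;     /* number of valid entries, <= LSNTIME_SLOTS */
    LSNTimeEntry entries[LSNTIME_SLOTS];
} LSNTimeStream;

/* Insert a new point; once the buffer is full, the oldest point is
 * overwritten, so no explicit "discard old data" pass is needed. */
static void
lsntime_insert(LSNTimeStream *stream, time_t t, uint64_t lsn)
{
    stream->entries[stream->next].time = t;
    stream->entries[stream->next].lsn = lsn;
    stream->next = (stream->next + 1) % LSNTIME_SLOTS;
    if (stream->count < LSNTIME_SLOTS)
        stream->count++;
}
```

The trade-off compared to explicitly thinning out old points is that the
retention window is implicit: it depends on the insert rate rather than
on a fixed cutoff like "one week".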

> In the adaptive freezing code, I use the time stream to answer a yes
> or no question. I translate a time in the past (now -
> target_freeze_duration) to an LSN so that I can determine if a page
> that is being modified for the first time after having been frozen has
> been modified sooner than target_freeze_duration (a GUC value). If it
> is, that page was unfrozen too soon. So, my use case is to produce a
> yes or no answer. It doesn't matter very much how accurate I am if I
> am wrong. I count the page as having been unfrozen too soon or I
> don't. So, it seems I care about the accuracy of data from now until
> now  - target_freeze_duration + margin of error a lot and data before
> that not at all. While it is true that if I'm wrong about a page that
> was older but near the cutoff, that might be better than being wrong
> about a very recent page, it is still wrong.
> 
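To make sure I'm reading the use case right, here's how I understand the
yes/no check (again, names and the interpolation helper are mine, not
from your patch): translate (now - target_freeze_duration) into a
cutoff LSN, then compare the LSN at which the page was frozen against
that cutoff.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the patch's time -> LSN conversion:
 * linear interpolation between two recorded (time, LSN) points. */
static uint64_t
estimate_lsn_at_time(time_t t0, uint64_t lsn0,
                     time_t t1, uint64_t lsn1, time_t t)
{
    if (t <= t0)
        return lsn0;
    if (t >= t1)
        return lsn1;
    return lsn0 + (uint64_t) ((double) (lsn1 - lsn0) *
                              (double) (t - t0) / (double) (t1 - t0));
}

/* The yes/no answer: if the page was frozen at or after the cutoff LSN,
 * it is now being unfrozen sooner than target_freeze_duration. */
static bool
unfrozen_too_soon(uint64_t page_frozen_lsn, uint64_t cutoff_lsn)
{
    return page_frozen_lsn >= cutoff_lsn;
}
```

If that's right, then indeed only the accuracy of the estimate near the
cutoff LSN matters for the answer, which I think is the point you're
making below.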

Yeah. But isn't that a bit backwards? The decision can be wrong because
the estimate was too off, or maybe it was spot on and we still made a
wrong decision. That's what happens with heuristics.

I think a natural expectation is that the quality of the answers
correlates with the accuracy of the data / estimates. With accurate
results (say we keep a perfect history, with no loss of precision for
older data) we should be making the right decision most of the time. If
not, it's a lost cause, IMHO. And with lower accuracy it'd get worse,
otherwise why would we need the detailed data?

But now that I think about it, I'm not entirely sure I understand what
point you are making :-(


regards

-- 
Tomas Vondra


