Re: Add LSN <-> time conversion functionality - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Add LSN <-> time conversion functionality
Date:
Msg-id: CA+TgmoZ-huAspWFJ8xP7u7bn8BDVJQ5sNjKGwXug_G6pYOHj0Q@mail.gmail.com
In response to: Re: Add LSN <-> time conversion functionality (Tomas Vondra <tomas@vondra.me>)
List: pgsql-hackers
On Thu, Aug 8, 2024 at 2:34 PM Tomas Vondra <tomas@vondra.me> wrote:
> How could we do this? We have 1s precision, so we start with buckets
> for each second. And we want to allow merging stuff nicely. The
> smallest merges we could do are 1s -> 2s -> 4s -> 8s -> ... but let's
> say we do 1s -> 10s -> 100s -> 1000s instead.
>
> So we start with 100x one-second buckets
>
> [A_0, A_1, ..., A_99] -> 100 x 1s buckets
> [B_0, B_1, ..., B_99] -> 100 x 10s buckets
> [C_0, C_1, ..., C_99] -> 100 x 100s buckets
> [D_0, D_1, ..., D_99] -> 100 x 1000s buckets
>
> We start by adding data into A_k buckets. After filling all 100 of
> them, we grab the oldest 10 buckets and combine/move them into B_k.
> And so on, until B gets full too. Then we grab the 10 oldest B_k
> entries and move them into C, and so on. For D the oldest entries
> would get discarded, or we could add another layer with each bucket
> representing 10k seconds.

Yeah, this kind of algorithm makes sense to me, although as you say
later, I don't think we need this amount of precision. I also think
you're right to point out that this provides certain guaranteed
behavior.

> A-D is already enough to cover 30h, with A-E it'd be ~300h. Do we
> need (or want) to keep a longer history?

I think there is a difference of opinion about this between Melanie
and me. I feel like we should be designing something that does the
exact job we need done for the freezing stuff, and if anyone else can
use it, that's a bonus. For that, I feel that 300h is more than
plenty. The goal of the freezing stuff, roughly speaking, is to answer
the question "will this be unfrozen real soon?". "Real soon" could
arguably mean a minute or an hour, but I don't think it makes sense
for it to be a week. If we're freezing data now that has a good chance
of being unfrozen again within 7 days, we should just freeze it
anyway. The cost of freezing isn't really all that high.
If we keep freezing pages that are going to be unfrozen again within
seconds or minutes, we pay those freezing costs enough times that they
become material, but I have difficulty imagining that it ever matters
if we re-freeze the same page every week. It's OK to be wrong as long
as we aren't wrong too often, and I think that being wrong once per
page per week isn't too often.

But I think Melanie was hoping to create something more general, which
on one level is understandable, but on the other hand it's unclear
what the goals are exactly. If we limit our scope to specifically
VACUUM, we can make reasonable guesses about how much precision we
need and for how long. But a hypothetical other client of this
infrastructure could need anything at all, which makes it very unclear
what the best design is, IMHO.

> Isn't this a sign this does not quite fit into pgstats? Even if this
> happens to deal with unsafe restarts, replica promotions and so on,
> what if the user just does pg_stat_reset? That already causes trouble
> because we simply forget deleted/updated/inserted tuples. If we also
> forget data used for freezing heuristics, that does not seem great ...

+1.

> Wouldn't it be better to write this into WAL as part of a checkpoint
> (or something like that?), and make bgwriter not only add
> LSN/timestamp into the stream, but also write it into WAL. It's
> already waking up; on idle systems ~32B written to WAL does not
> matter, and on a busy system it's just noise.

I am not really sure of the best place to put this data. I agree that
pgstat doesn't feel like quite the right place. But I'm not quite sure
that putting it into every checkpoint is the right idea either.

--
Robert Haas
EDB: http://www.enterprisedb.com
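[Editorial note: a minimal Python sketch of the tiered-bucket scheme
Tomas describes above, under the stated parameters (100 buckets per
level, merge the 10 oldest into the next level, 1s -> 10s -> 100s ->
1000s). The class and method names are illustrative only and do not
come from any actual patch; what each bucket stores when merged (here
simply the earliest time/LSN of the group) is also an assumption.]

```python
class TieredBuckets:
    """Illustrative sketch (not from any patch) of the tiered
    LSN/timestamp bucket scheme: each level holds up to 100 buckets;
    when a level overflows, its 10 oldest buckets are combined into
    one bucket of the next, coarser level."""

    def __init__(self, levels=4, per_level=100, fanout=10):
        self.per_level = per_level
        self.fanout = fanout
        # each level is a list of (start_time, lsn) pairs, oldest first
        self.levels = [[] for _ in range(levels)]

    def add(self, start_time, lsn):
        """Insert a new one-second bucket at the finest level."""
        self._insert(0, (start_time, lsn))

    def _insert(self, level, entry):
        buckets = self.levels[level]
        buckets.append(entry)
        if len(buckets) > self.per_level:
            # grab the 10 oldest buckets and combine them into one
            oldest = buckets[: self.fanout]
            del buckets[: self.fanout]
            # merged bucket keeps the earliest (time, lsn) of the group
            # (an assumption; a real patch might store min/max LSN etc.)
            merged = oldest[0]
            if level + 1 < len(self.levels):
                self._insert(level + 1, merged)
            # at the coarsest level the merged data is simply discarded
```

With four 100-entry levels of widths 1s/10s/100s/1000s, the total
window is 100 * (1 + 10 + 100 + 1000) = 111,100 seconds, i.e. roughly
30 hours, which matches the "A-D is already enough to cover 30h"
figure; a fifth level multiplies that by about ten.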