Re: Add LSN <-> time conversion functionality - Mailing list pgsql-hackers

From Andrey M. Borodin
Subject Re: Add LSN <-> time conversion functionality
Date
Msg-id 38F43C9F-C270-4AA4-8BB6-392D4BE75F90@yandex-team.ru
Whole thread Raw
In response to Re: Add LSN <-> time conversion functionality  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Add LSN <-> time conversion functionality
List pgsql-hackers
Hi!

I’m doing another iteration over the patchset.

PgStartLSN = GetXLogInsertRecPtr();
Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby in HA cluster, which was doing a crash
recoveryof 1y WALs in a day, then is in startup for a year as a Hot Standby, and then is promoted? 

lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem capable to accommodate LSN*time. And the
functionmay return negative result, despite claiming area as a result. It’s intended, but a little misleading. 

i-- > 0
Is there a point to do a backward count in the loop?
Consider dropping not one by one, but half of a stream, LSNTimeStream is ~2Kb of cache and it’s loaded as a whole to
thecache.. 



> On 27 Jun 2024, at 07:18, Melanie Plageman <melanieplageman@gmail.com> wrote:
>
>> 2. Some benchmarks to proof the patch does not have CPU footprint.
>
> This is still a todo. Typically when designing a benchmark like this,
> I would want to pick a worst-case workload to see how bad it could be.
> I wonder if just a write heavy workload like pgbench builtin tpcb-like
> would be sufficient?

Increasing background writer activity to maximum and not seeing LSNTimeStream function in `perf top` seems enough to
me.

>
>> === Nits ===
>> "Timeline" term is already taken.
>
> I changed it to LSNTimeStream. What do you think?
Sounds good to me.


>
>> Tests fail on Windows.
>
> I think this was because of the compiler warnings, but I need to
> double-check now.
Nope, it really looks more serious.
[12:31:25.701] FAILED: src/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj
[12:31:25.701] "cl" "-Isrc\backend\postgres_lib.a.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include"
"-I..\src\include\port\win32""-I..\src\include\port\win32_msvc" "/MDd" "/FIpostgres_pch.h" "/Yupostgres_pch.h"
"/Fpsrc\backend\postgres_lib.a.p\postgres_pch.pch""/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32"
"/DWINDOWS""/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244"
"/wd4273""/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" "/FS"
"/FdC:\cirrus\build\src\backend\postgres_lib.pdb"/Fosrc/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj "/c"
../src/backend/utils/activity/pgstat_wal.c
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(433): error C2375: 'pg_estimate_lsn_at_time': redefinition;
differentlinkage 
[12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2906): note: see declaration of 'pg_estimate_lsn_at_time'
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(434): error C2375: 'pg_estimate_time_at_lsn': redefinition;
differentlinkage 
[12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2905): note: see declaration of 'pg_estimate_time_at_lsn'
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(435): error C2375: 'pg_lsntime_stream': redefinition;
differentlinkage 
[12:31:25.858] c:\cirrus\build\src\include\utils/fmgrprotos.h(2904): note: see declaration of 'pg_lsntime_stream'


>
>> The patch lacks tests.
>
> I thought about this a bit. I wonder what kind of tests make sense.
>
> I could
> 1) Add tests with the existing stats tests
> (src/test/regress/sql/stats.sql) and just test that bgwriter is in
> fact adding to the time stream.
>
> 2) Or should I add some infrastructure to be able to create an
> LSNTimeStream and then insert values to it and do some validations of
> what is added? I did a version of this but it is just much more
> annoying with C & SQL than with python (where I tried out my
> algorithms) [2].

I think just a test which calls functions and discards the result would greatly increase coverage.


> On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplageman@gmail.com> wrote:
> change the user-facing functions for estimating an
> LSN/time conversion to instead return a floor and a ceiling -- instead
> of linearly interpolating a guess. This would be a way to keep users
> from misunderstanding the accuracy of the functions to translate LSN
> <-> time.

I think this is a good idea. And it covers well “server restart problem”. If API just returns -inf as a boundary,
callercan correctly interpret this situation. 

Thanks! Looking forward to more timely freezing.


Best regards, Andrey Borodin.


pgsql-hackers by date:

Previous
From: Dean Rasheed
Date:
Subject: Simplifying width_bucket_numeric()
Next
From: Erik Wienhold
Date:
Subject: Re: XML test error on Arch Linux