Re: emit recovery stats via a new file or a new hook - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: emit recovery stats via a new file or a new hook
Date	October 31, 2021 22:00:17
Msg-id	20211031190017.j3bg3ud22np44fri@alap3.anarazel.de
In response to	emit recovery stats via a new file or a new hook (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses	Re: emit recovery stats via a new file or a new hook (Amit Kapila <amit.kapila16@gmail.com>) Re: emit recovery stats via a new file or a new hook (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List	pgsql-hackers

Hi,

On 2021-10-31 19:06:07 +0530, Bharath Rupireddy wrote:
> It is sometimes super important to be able to answer customer
> questions like: What was the total time taken by the last recovery of
> the server? What was the time taken by each phase of recovery/redo
> processing of the startup process? Why did the recovery take so long?
> We've encountered these questions while dealing with the postgres
> customers. If these stats are available in an easily consumable
> fashion, it will be easier for us to understand, debug and identify
> root cause for "recovery taking a long time" problems, improve if
> possible and answer the customer questions. Also, these recovery stats
> can be read by an external analytical tool to show the recovery
> patterns to the customers directly. Although postgres emits some info
> via server logs thanks to the recent commit [3], it isn't easily
> consumable for the use cases that I mentioned.
> 
> Here are a few thoughts on how we could go about doing this. I
> proposed them earlier in [1],
> 1) capture and write recovery stats into a file
> 2) capture and emit recovery stats via a new hook
> 3) capture and write into a new system catalog table (assuming at the
> end of the recovery the database is in a consistent state, but I'm not
> sure if we ever update any catalog tables in/after the
> startup/recovery phase)
> 
> As Robert rightly suggested at [2], option (3) isn't an easy way to do
> that, so we can set that idea aside. Options (1) and (2) seem
> reasonable.

I don't think 1) is a good approach, because it just leads us down the
path of having dozens of log files. 2) isn't useful either, because
you'd need to load an extension library first, which users won't
have done before hitting the problem. And 3) isn't really possible.

I'm not sure that the new log messages aren't sufficient. But if they
aren't, it seems better to keep additional data in the stats system, and
make them visible via views, rather than adding yet another place to
keep stats.

Greetings,

Andres Freund
