Re: Asynchronous and "direct" IO support for PostgreSQL. - Mailing list pgsql-hackers

From Alexey Lesovsky
Subject Re: Asynchronous and "direct" IO support for PostgreSQL.
Date
Msg-id c8a067ac-55df-42e8-57b0-d70cdd30e0bc@dataegret.com
In response to Asynchronous and "direct" IO support for PostgreSQL.  (Andres Freund <andres@anarazel.de>)
Responses Re: Asynchronous and "direct" IO support for PostgreSQL.
Re: Asynchronous and "direct" IO support for PostgreSQL.
List pgsql-hackers
Hi,

Thank you for the amazing and great work.

On 23.02.2021 15:03, Andres Freund wrote:
> ## Stats
>
> There are two new views: pg_stat_aios showing AIOs that are currently
> in-progress, pg_stat_aio_backends showing per-backend statistics about AIO.

As a DBA I would like to propose a few amendments that might help with 
practical usage of the stats once the feature is finally implemented. My 
suggestions aren’t related to the central idea of the proposed changes, 
but rather to the stats part.

A quick side note: two Prometheus terms are relevant here 
(https://prometheus.io/docs/concepts/metric_types/):
1. Counter. A counter is a cumulative metric that represents a single 
monotonically increasing counter whose value can only increase or be 
reset to zero on restart.
2. Gauge. A gauge is a metric that represents a single numerical value 
that can arbitrarily go up and down.

For the purposes of long-term stats collection, COUNTERs are preferred 
over GAUGEs, because COUNTERs allow us to understand how metrics change 
over time without missing potential spikes in activity. As a result, we 
get a much better historic perspective.

Measuring and collecting GAUGEs is limited to the moments in time when 
the stats are taken (snapshots), so the changes that took place between 
the snapshots remain unmeasured. In systems with a high rate of 
transactions per second, GAUGE measurements won’t provide the full 
picture even with a 1 second interval between the snapshots. In 
addition, most monitoring systems like Prometheus, Zabbix, etc. use 
longer intervals (from 10-15 to 60 seconds).
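
To illustrate the difference, here is a minimal Python sketch. The 
numbers and names (in_flight, started, the 10 second interval) are made 
up for the example and are not taken from the patch. A short burst of 
IO falls between two snapshots: the GAUGE samples miss it entirely, 
while the delta between two COUNTER samples still accounts for it.

```python
# Hypothetical per-second activity over an 11 second window.
# A short spike happens at seconds 3-4, between the two snapshots.
in_flight = [0, 0, 0, 50, 40, 0, 0, 0, 0, 0, 0]  # GAUGE: IOs in progress at each second
started = [0, 0, 0, 50, 10, 0, 0, 0, 0, 0, 0]    # IOs started during each second


def sample_gauge(values, times):
    """Return the gauge value visible at each snapshot time."""
    return [values[t] for t in times]


def sample_counter(increments, times):
    """Return the cumulative counter value visible at each snapshot time."""
    return [sum(increments[: t + 1]) for t in times]


snapshots = [0, 10]  # assumed scrape interval of 10 seconds

gauge_seen = sample_gauge(in_flight, snapshots)
counter_seen = sample_counter(started, snapshots)

# The difference between two counter samples covers the whole interval.
counter_delta = counter_seen[1] - counter_seen[0]

print("gauge samples:", gauge_seen)
print("counter delta:", counter_delta)
```

Both GAUGE samples read zero, so the spike is invisible, whereas the 
COUNTER delta shows how many IOs were started during the interval.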

The main idea is to expose almost all numeric stats as COUNTERs - 
this increases the overall observability of the implemented feature.

pg_stat_aios.
In general, this stat is a set of text values, and at the same time it 
looks GAUGE-like (similar to pg_stat_activity or pg_locks), and is only 
relevant for the moment when the user is looking at it. I think it would 
be better to rename this view to pg_stat_progress_aios and keep the 
name pg_stat_aios for other AIO stats with global COUNTERs (like the 
stuff in pg_stat_user_tables or pg_stat_statements, or the system-wide 
/proc/stat and /proc/diskstats).

pg_stat_aio_backends.
This stat is based on COUNTERs, which is great, but the issue here is 
that its lifespan is limited by the lifespan of the backend processes - 
once the backend exits the stat will no longer be available - which 
could be inappropriate in workloads with short-lived backends.
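
One possible way to address this, sketched in Python purely as an 
illustration: fold the counters of an exiting backend into a global 
accumulator, so that the totals outlive the process. The class and 
method names below (AioStats, record, backend_exit, totals) are 
hypothetical and do not correspond to anything in the patch or in 
PostgreSQL.

```python
from collections import Counter


class AioStats:
    """Toy model: per-backend counters plus a global accumulator."""

    def __init__(self):
        self.per_backend = {}           # pid -> Counter of live backend stats
        self.global_totals = Counter()  # counters inherited from exited backends

    def record(self, pid, name, n=1):
        """Increment a counter for a live backend."""
        self.per_backend.setdefault(pid, Counter())[name] += n

    def backend_exit(self, pid):
        """Fold the backend's counters into the global totals on exit."""
        self.global_totals.update(self.per_backend.pop(pid, Counter()))

    def totals(self):
        """Global view: exited backends plus all live backends."""
        total = Counter(self.global_totals)
        for counters in self.per_backend.values():
            total.update(counters)
        return total


stats = AioStats()
stats.record(101, "ios_completed", 5)
stats.record(102, "ios_completed", 3)
stats.backend_exit(101)  # short-lived backend goes away

print(stats.totals()["ios_completed"])
```

The per-backend view still loses the entry for the exited backend, but 
the global total keeps counting, which is what a monitoring system 
needs.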

I think there might be a few existing examples in the current code that 
could be repurposed to implement the suggestions above (such as 
pg_stat_user_tables, pg_stat_database, etc). With this in mind, I think 
incorporating these changes shouldn’t take significant effort 
considering the benefit it will bring to the final user.

Once again, huge respect for your work on these changes, and good luck.

Regards, Alexey
