stats for network traffic WIP - Mailing list pgsql-hackers

From Nigel Heron
Subject stats for network traffic WIP
Date
Msg-id CAHhq2wJXRqTMJXZwMAOdtQOkxSKxg_aMxxofhvCo=RGXvh0AUg@mail.gmail.com
Whole thread Raw
Responses Re: stats for network traffic WIP  (Stephen Frost <sfrost@snowman.net>)
Re: stats for network traffic WIP  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Hi, I've been using postgres for many years but never took the time to play with the code until now. As a learning experience i came up with this WIP patch to keep track of the # of bytes sent and received by the server over it's communication sockets. Counters are kept per database, per connection and globally/shared.
The counters are incremented for tcp (remote and localhost) and for unix sockets. The major WIP issue so far is that connections using SSL aren't counted properly. If there's any interest, i'll keep working on it.

a few functions are added:
- pg_stat_get_bytes_sent() returns the total count of outgoing bytes for the whole cluster (all dbs and all connections including replication)
- pg_stat_get_bytes_received() same but for incoming data
- pg_stat_get_db_bytes_sent(oid) returns count of outgoing bytes for a specific database
- pg_stat_get_db_bytes_received(oid) same but for incoming data

"bytes_sent" and "bytes_received" columns are added to:
- pg_stat_get_activity function
- pg_stat_activity view
- pg_stat_database view
- pg_stat_replication view

The counters are reset with the existing reset functions, but a new parameter value is added for the shared stats call (i named it "socket" for lack of imagination), eg. pg_stat_reset_shared('socket').

some benefits of the patch:
- can be used to track bandwidth usage of postgres, useful if the host isn't a dedicated db server, where host level statistics would include other traffic.
- can track bandwidth usage of streaming replication.
- can be used to find misbehaving connections.
- can be used in multi-user/multi-database clusters for resource usage tracking.
- competing databases have such metrics.
- could also be added to pg_stat_statements for extra debugging.
- etc.?

some negatives:
- extra code is called for each send() and recv(), I haven't measured the performance impact yet. (but can be turned off using track_counts=off)
- stats collector has more work to do.
- some stats structs are changed which will cause an error while trying to load them from disk the first time and the old stats will be lost.
- PL functions that create their own sockets aren't tracked.
- sockets from FDWs calls aren't tracked.

To debug the counters, i'm using clients connected through haproxy to generate traffic and then compare haproxy's stats with what pg stores in pg_stat/global.stat on shutdown. Attached is a very basic python script that can read the global.stat file (it takes the DATADIR as a parameter).

Any feedback is appreciated,
-nigel.
Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: signed vs. unsigned in TYPEALIGN (was Re: space reserved for WAL record does not match what was written: panic on windows)
Next
From: David Fetter
Date:
Subject: Re: FDW API / flow charts for the docs?