[PATCH] pg_stat_toast - Mailing list pgsql-hackers

From Gunnar \"Nick\" Bluth
Subject [PATCH] pg_stat_toast
Date
Msg-id a08b54fa-7b13-9531-6233-33a3d23773a8@pro-open.de
Whole thread Raw
Responses Re: [PATCH] pg_stat_toast
RE: [PATCH] pg_stat_toast
[PATCH] pg_stat_toast v0.3
List pgsql-hackers
Hello -hackers!

Please have a look at the attached patch, which implements some 
statistics for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I 
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG 
hacking, some very useful hints and valueable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming 
etc. before refining this further.

I'm not a C person and I s**k at git, so please be kind with me! ;-)
Also, I'm not subscribed here, so a CC would be much appreciated!


Why gather TOAST statistics?
============================
TOAST is transparent and opaque at the same time.
Whilst we know that it's there and we know _that_ it works, we cannot 
generally tell _how well_ it works.

What we can't answer (easily) are questions like e.g.
- how many datums have been externalized?
- how many datums have been compressed?
- how often has a compression failed (resulted in no space saving)?
- how effective is the compression algorithm used on a column?
- how much time did the DB spend compressing/decompressing TOAST values?

The patch adds some functionality that will eventually be able to answer 
these (and probably more) questions.

Currently, #1 - #4 can be answered based on the view contained in 
"pg_stats_toast.sql":

postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test  SELECT 
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM 
generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname           | public
reloid               | 16829
attnum               | 2
relname              | test
attname              | lz4
externalizations     | 0
compressions         | 100001
compressionsuccesses | 100001
compressionsizesum   | 6299710
originalsizesum      | 320403204
-[ RECORD 2 ]--------+----------
schemaname           | public
reloid               | 16829
attnum               | 3
relname              | test
attname              | std
externalizations     | 0
compressions         | 100001
compressionsuccesses | 100001
compressionsizesum   | 8198819
originalsizesum      | 320403204


Implementation
==============
I added some callbacks in backend/access/table/toast_helper.c to 
"pgstat_report_toast_activity" in backend/postmaster/pgstat.c.

The latter (and the other additions there) are essentially 1:1 copies of 
the function statistics.

Those were the perfect template, as IMHO the TOAST activities (well, 
what we're interested in at least) are very much comparable to function 
calls:
a) It doesn't really matter if the TOASTed data was committed, as "the 
damage is done" (i.e. CPU cycles were used) anyway
b) The information can (thus/best) be stored on DB level, no need to 
touch the relation or attribute statistics

I didn't find anything that could have been used as a hash key, so the
     PgStat_StatToastEntry
uses the shiny new
     PgStat_BackendAttrIdentifier
(containing relid Oid, attr int).

For persisting in the statsfile, I chose the identifier 'O' (as 'T' was 
taken).


What's working?
===============
- Gathering of TOAST externalization and compression events
- collecting the sizes before and after compression
- persisting in statsfile
- not breaking "make check"
- not crashing anything (afaict)

What's missing (yet)?
===============
- proper definition of the "pgstat_track_toast" GUC
- Gathering of times (for compression [and decompression?])
- improve "pg_stat_toast" view and include it in the catalog
- documentation (obviously)
- proper naming (of e.g. the hash key type, functions, view columns etc.)
- would it be necessary to implement overflow protection for the size & 
time sums?

Thanks in advance & best regards,
-- 
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachment

pgsql-hackers by date:

Previous
From: Zhihong Yu
Date:
Subject: Re: extended stats on partitioned tables
Next
From: Tomas Vondra
Date:
Subject: Re: extended stats on partitioned tables