[PATCH] pg_stat_toast - Mailing list pgsql-hackers
From | Gunnar \"Nick\" Bluth |
---|---|
Subject | [PATCH] pg_stat_toast |
Date | |
Msg-id | a08b54fa-7b13-9531-6233-33a3d23773a8@pro-open.de Whole thread Raw |
Responses |
Re: [PATCH] pg_stat_toast
RE: [PATCH] pg_stat_toast [PATCH] pg_stat_toast v0.3 |
List | pgsql-hackers |
Hello -hackers! Please have a look at the attached patch, which implements some statistics for TOAST. The idea (and patch) have been lurking here for quite a while now, so I decided to dust it off, rebase it to HEAD and send it out for review today. A big shoutout to Georgios Kokolatos, who gave me a crash course in PG hacking, some very useful hints and valueable feedback early this year. I'd like to get some feedback about the general idea, approach, naming etc. before refining this further. I'm not a C person and I s**k at git, so please be kind with me! ;-) Also, I'm not subscribed here, so a CC would be much appreciated! Why gather TOAST statistics? ============================ TOAST is transparent and opaque at the same time. Whilst we know that it's there and we know _that_ it works, we cannot generally tell _how well_ it works. What we can't answer (easily) are questions like e.g. - how many datums have been externalized? - how many datums have been compressed? - how often has a compression failed (resulted in no space saving)? - how effective is the compression algorithm used on a column? - how much time did the DB spend compressing/decompressing TOAST values? The patch adds some functionality that will eventually be able to answer these (and probably more) questions. Currently, #1 - #4 can be answered based on the view contained in "pg_stats_toast.sql": postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text); postgres=# INSERT INTO test SELECT i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM generate_series(0,100000) x(i); postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public'; -[ RECORD 1 ]--------+---------- schemaname | public reloid | 16829 attnum | 2 relname | test attname | lz4 externalizations | 0 compressions | 100001 compressionsuccesses | 100001 compressionsizesum | 6299710 originalsizesum | 320403204 -[ RECORD 2 ]--------+---------- schemaname | public reloid | 16829 attnum | 3 relname | test attname | std externalizations | 0 compressions | 100001 compressionsuccesses | 100001 compressionsizesum | 8198819 originalsizesum | 320403204 Implementation ============== I added some callbacks in backend/access/table/toast_helper.c to "pgstat_report_toast_activity" in backend/postmaster/pgstat.c. The latter (and the other additions there) are essentially 1:1 copies of the function statistics. Those were the perfect template, as IMHO the TOAST activities (well, what we're interested in at least) are very much comparable to function calls: a) It doesn't really matter if the TOASTed data was committed, as "the damage is done" (i.e. CPU cycles were used) anyway b) The information can (thus/best) be stored on DB level, no need to touch the relation or attribute statistics I didn't find anything that could have been used as a hash key, so the PgStat_StatToastEntry uses the shiny new PgStat_BackendAttrIdentifier (containing relid Oid, attr int). For persisting in the statsfile, I chose the identifier 'O' (as 'T' was taken). What's working? =============== - Gathering of TOAST externalization and compression events - collecting the sizes before and after compression - persisting in statsfile - not breaking "make check" - not crashing anything (afaict) What's missing (yet)? =============== - proper definition of the "pgstat_track_toast" GUC - Gathering of times (for compression [and decompression?]) - improve "pg_stat_toast" view and include it in the catalog - documentation (obviously) - proper naming (of e.g. the hash key type, functions, view columns etc.) - would it be necessary to implement overflow protection for the size & time sums? Thanks in advance & best regards, -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bluth@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato
Attachment
pgsql-hackers by date: