Block write statistics WIP - Mailing list pgsql-hackers

From Greg Smith
Subject Block write statistics WIP
Msg-id 51988A0D.5080600@2ndQuadrant.com
Responses Re: Block write statistics WIP  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
I have some time now for working on the mystery of why there are no
block write statistics in the database.  I hacked out the statistics
collection and reporting side already, rough patch attached if you want
to see the boring parts.

I guessed that there had to be a gotcha behind why this wasn't done
before now, and I've found one so far.  All of the read statistics are
collected with a macro that expects to be given a Relation.  Callers
of ReadBuffer pass one.  On the write side, the two places that
increment the existing write counters (pg_stat_statements, vacuum) are
MarkBufferDirty and MarkBufferDirtyHint.  Neither of those is given a
Relation though.  Inspecting the Buffer passed can only find the buffer
tag's RelFileNode.
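
To make the asymmetry concrete, here is a minimal, self-contained sketch.
Every type and function name in it (RelStats, read_buffer,
mark_buffer_dirty, and so on) is a simplified stand-in invented for
illustration; none of it is PostgreSQL's actual definition.  It only
shows the shape of the problem: the read path is handed a relation and
can bump a per-relation counter, while the write path sees nothing but
the buffer and its file node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef unsigned int Oid;

typedef struct RelFileNode
{
    Oid spcNode;            /* tablespace */
    Oid dbNode;             /* database */
    Oid relNode;            /* relation file */
} RelFileNode;

typedef struct RelStats
{
    long blocks_read;
    long blocks_dirtied;    /* the counter we would like to maintain */
} RelStats;

typedef struct RelationData
{
    RelFileNode rd_node;
    RelStats   *stats;      /* NULL when stats are not being tracked */
} RelationData;
typedef RelationData *Relation;

typedef struct BufferDesc
{
    RelFileNode rnode;      /* all the buffer tag knows about its owner */
    unsigned    blockNum;
    int         dirty;
} BufferDesc;

/* Read side: the counting macro needs a Relation to find the counters. */
#define count_buffer_read(rel) \
    do { if ((rel)->stats != NULL) (rel)->stats->blocks_read++; } while (0)

static BufferDesc
read_buffer(Relation rel, unsigned blockNum)
{
    BufferDesc buf = { rel->rd_node, blockNum, 0 };

    count_buffer_read(rel);
    return buf;
}

/*
 * Write side: only the buffer is passed in.  A global counter can be
 * bumped, but there is no Relation here to hold per-relation stats.
 */
static long global_blocks_dirtied = 0;

static void
mark_buffer_dirty(BufferDesc *buf)
{
    if (!buf->dirty)
    {
        buf->dirty = 1;
        global_blocks_dirtied++;
        /* buf->rnode is available here, but no RelStats pointer is. */
    }
}
```

In this sketch, reading a block and then dirtying it leaves blocks_read
at 1 and blocks_dirtied at 0, because mark_buffer_dirty has nowhere to
record the per-relation write.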

I've thought of two paths to get a block write count out of that so far:

-Provide a function to find the Relation from the RelFileNode.  There is
a warning about the perils of assuming you can map that way from a
buftag value in buf_internals.h though:

"Note: the BufferTag data must be sufficient to determine where to write
the block, without reference to pg_class or pg_tablespace entries.  It's
possible that the backend flushing the buffer doesn't even believe the
relation is visible yet (its xact may have started before the xact that
created the rel).  The storage manager must be able to cope anyway."

-Modify every caller of MarkBufferDirty* to include a relation when that
information is available.  There are about 200 callers of those
functions around, so that won't be fun.  I noted that many of them are
in the XLog replay functions, which won't have the relation--but those
aren't important to count anyway.
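
The two options above can be sketched side by side.  This is a rough
illustration under stated assumptions, not a proposed patch: FileNode,
RelEntry, Buf, lookup_rel, mark_dirty_by_lookup and mark_dirty_with_rel
are all hypothetical names, the "visible" array stands in for whatever
catalog state a backend can see, and counting only on the
clean-to-dirty transition is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef struct FileNode { unsigned spc, db, rel; } FileNode;
typedef struct RelEntry { FileNode node; long blocks_dirtied; } RelEntry;
typedef struct Buf      { FileNode node; int dirty; } Buf;

/* Set the dirty flag; return 1 only on the clean-to-dirty transition. */
static int
set_dirty(Buf *buf)
{
    int was_dirty = buf->dirty;

    buf->dirty = 1;
    return !was_dirty;
}

/*
 * Option 1: map the buffer's file node back to a relation entry.
 * The lookup can fail, e.g. when this backend cannot see the relation
 * yet, so the caller must cope with NULL.
 */
static RelEntry *
lookup_rel(RelEntry *visible, size_t n, FileNode node)
{
    for (size_t i = 0; i < n; i++)
    {
        if (visible[i].node.spc == node.spc &&
            visible[i].node.db == node.db &&
            visible[i].node.rel == node.rel)
            return &visible[i];
    }
    return NULL;
}

static void
mark_dirty_by_lookup(Buf *buf, RelEntry *visible, size_t n)
{
    RelEntry *rel = lookup_rel(visible, n, buf->node);

    /* The buffer is dirtied regardless; only the counting is optional. */
    if (set_dirty(buf) && rel != NULL)
        rel->blocks_dirtied++;
}

/*
 * Option 2: the caller passes the relation when it has one, or NULL
 * when it does not (as the replay-style callers would).
 */
static void
mark_dirty_with_rel(Buf *buf, RelEntry *rel)
{
    if (set_dirty(buf) && rel != NULL)
        rel->blocks_dirtied++;
}
```

In both variants the buffer is always marked dirty and the statistics
are best effort.  The difference is where the cost lands: option 1 pays
for a lookup that may not succeed, and option 2 pays in churn across
every call site.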

Neither of these options feels very good to me, so selecting between the
two feels like picking the lesser of two evils.  I'm fine with chugging
through all of the callers to modify MarkBufferDirty*, but I didn't want to do
that only to have the whole approach rejected as wrong.  That's why I
stopped here to get some feedback.

The way that MarkBufferDirty requires this specific underlying storage
manager behavior to work properly strikes me as a bit of a layering violation
too.  I'd like the read and write paths to have a similar API, but here
they don't even operate on the same type of inputs.  Addressing that is
probably harder than just throwing a hack on the existing code though.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

