Thread: machine-readable pg_controldata?

machine-readable pg_controldata?

From
Josh Berkus
Date:
All,

Currently, the only way for admin scripts to get individual data items
out of pg_controldata (such as the next XID or the catalog version) is
via grep and regex. Given that people are going to be relying on some of
this data for replication admin in the future, it seems past time to
have a form of pg_controldata which either outputs machine-readable text
(XML or JSON), or (my preference) takes options to output just the
invididual items, e.g.

pg_controldata --catalog_version

Even better would be the ability to get everything which is in
pg_controldata currently as part of a system view in a running
PostgreSQL; I can get most of them, but certainly not all.

Thoughts?

--Josh Berkus


Re: machine-readable pg_controldata?

From
Magnus Hagander
Date:
2010/3/4 Josh Berkus <josh@agliodbs.com>:
> All,
>
> Currently, the only way for admin scripts to get individual data items
> out of pg_controldata (such as the next XID or the catalog version) is
> via grep and regex. Given that people are going to be relying on some of
> this data for replication admin in the future, it seems past time to
> have a form of pg_controldata which either outputs machine-readable text
> (XML or JSON), or (my preference) takes options to output just the
> invididual items, e.g.

Huh? It's fixed with, you don't need regexps for that. Just split the
string apart.

Taking options for single fields might have a better usecase, of course :-)


> pg_controldata --catalog_version
>
> Even better would be the ability to get everything which is in
> pg_controldata currently as part of a system view in a running
> PostgreSQL; I can get most of them, but certainly not all.

+1 for having all the information available from inside the backend,
if that's technically possible (which I assume it should be)

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: machine-readable pg_controldata?

From
Joshua Tolley
Date:
On Thu, Mar 04, 2010 at 10:54:15PM +0100, Magnus Hagander wrote:
> 2010/3/4 Josh Berkus <josh@agliodbs.com>:
> > pg_controldata --catalog_version
> >
> > Even better would be the ability to get everything which is in
> > pg_controldata currently as part of a system view in a running
> > PostgreSQL; I can get most of them, but certainly not all.
>
> +1 for having all the information available from inside the backend,
> if that's technically possible (which I assume it should be)

I'd love to see pg_config's various bits of information in there as well. I
just haven't gotten around to writing it. But +1 from me, FWIW.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: machine-readable pg_controldata?

From
Joe Conway
Date:
On 03/04/2010 02:09 PM, Joshua Tolley wrote:
> On Thu, Mar 04, 2010 at 10:54:15PM +0100, Magnus Hagander wrote:
>> 2010/3/4 Josh Berkus <josh@agliodbs.com>:
>>> pg_controldata --catalog_version
>>>
>>> Even better would be the ability to get everything which is in
>>> pg_controldata currently as part of a system view in a running
>>> PostgreSQL; I can get most of them, but certainly not all.
>>
>> +1 for having all the information available from inside the backend,
>> if that's technically possible (which I assume it should be)
>
> I'd love to see pg_config's various bits of information in there as well. I
> just haven't gotten around to writing it. But +1 from me, FWIW.

I agree something like this would be useful -- maybe I'll try to come up
with some round tuits...

Joe



Re: machine-readable pg_controldata?

From
Greg Smith
Date:
Magnus Hagander wrote:<br /><blockquote cite="mid:9837222c1003041354v19e7b451xd823e4a8a7489d@mail.gmail.com"
type="cite"><prewrap="">Huh? It's fixed with, you don't need regexps for that. Just split the
 
string apart.

Taking options for single fields might have a better usecase, of course :-) </pre></blockquote><br /> I do find it a
bithard to imagine that any program capable of shelling out to call pg_controldata and doing something with the output
wouldhit a major hurdle parsing the format that's already there.  Moving toward single fields I could see as being
betterfor some cases, but going all the way to XML/JSON is a step backwards from the plain text format as far as I'm
concerned. Anything that can parse one of those complicated formats should be able to trivially chew the existing one
already. Seriously:  I have bash scripts that parse that thing.<br /><br /><blockquote
cite="mid:9837222c1003041354v19e7b451xd823e4a8a7489d@mail.gmail.com"type="cite"><blockquote type="cite"><pre
wrap="">Evenbetter would be the ability to get everything which is in
 
pg_controldata currently as part of a system view in a running
PostgreSQL; I can get most of them, but certainly not all.   </pre></blockquote><pre wrap="">
+1 for having all the information available from inside the backend,
if that's technically possible (which I assume it should be) </pre></blockquote><br /> I revisit this every time I
writeyet another user-space parser and ask myself why I haven't exposed it in the server yet.  The primary answer so
farhas always been "because you can't execute a query on the standby while it's in recovery", making half the stuff I
wantedthe data far (e.g. standby lag monitoring like <a class="moz-txt-link-freetext"
href="http://www.kennygorman.com/wordpress/?p=249">http://www.kennygorman.com/wordpress/?p=249</a>) unable to use that
interfaceanyway.  Now that Hot Standby cracks that objection, it's worth talking about for a minute.<br /><br />
pg_controldataitself just reads the file in directly and dumps the data.  There is a copy of it kept around all the
timein shared memory though (ControlFile in xlog.c), protected by a LWLock.  At a high level you can imagine a new
functionin xlog.c that acquires that lock, copies the block into a space the backend allocated for saving it, releases
thelock, and then returns the whole structure.  Then just wrap some number of superuser-only UDFs around it (I'd guess
nobodywants regular ones able to hold a lock on ControlFile) and you've exposed the results to user-space.<br /><br />
Twoquestions before I'd volunteer to write that:<br /><br /> 1) How do you handle the situation where the
pg_controldatais invalid?  "Not read in yet" and "CRC is bad" are the two most obvious ones that can happen.  Return a
nullfor every field, try and guess (the way pg_resetxlog does), don't return a row of output at all, or throw an
error? Each of these has slightly different implications for how admin code that will do something with these values
willhave to be structured.<br /><br /> 2) While it's easy to say "I only want one or two of these values" and expose a
wholeset of UDFs to grab them individually (perhaps wrapping into a system view via that popular approach), I am
concernedthat people are going to call any single-value versions provided one at a time and get an inconsistent set.  I
thinkthe only reasonable interface to this would not return a single field, it would pop out all of them so you got a
matchingset from the point in time the lock was held.  And if that's the case, I'm not sure of the most reasonable UI
is. Just return a whole row with a column for each field in the file, and then people can select out just the ones they
want? (That's probably the right one)  Produce the mess as a series of rows with (name,value) pairs?  Put them into an
array?<br/><br /> Have re-raised these concerns to myself, this is usually the point in this exercise where I go "screw
it,I'll just parse pg_controldata again instead" and do that instead.  This is happening so much lately that I think
Josh'ssuggestion it's just unworkable to keep going via that model forever has merit though.  I find it hard to mark
this9.0 territory though, given the data is not actually difficult to grab--and that trail is already well blazed,
nothingnew in this version.<br /><br /> In short:  I'd vote for TODO item and would happily write myself for 9.1 given
reasonableagreement on the questions raised above, -1 for doing anything about it right now though.  Given both the
existenceof completely reasonable workarounds and the existence of much more serious blocker problems sitting on the
roadmapto release, can't get real excited about this as the thing to worry about right now.  Same reason I ignored the
ideawhen Joshua Tolley brought it up last month:  <a class="moz-txt-link-freetext"
href="http://archives.postgresql.org/message-id/4b69caeb.9513f30a.731a.3427@mx.google.com">http://archives.postgresql.org/message-id/4b69caeb.9513f30a.731a.3427@mx.google.com</a><br
/><br/><pre class="moz-signature" cols="72">-- 
 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
<a class="moz-txt-link-abbreviated" href="mailto:greg@2ndQuadrant.com">greg@2ndQuadrant.com</a>   <a
class="moz-txt-link-abbreviated"href="http://www.2ndQuadrant.us">www.2ndQuadrant.us</a>
 

</pre>

Re: machine-readable pg_controldata?

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> I do find it a bit hard to imagine that any program capable of shelling 
> out to call pg_controldata and doing something with the output would hit 
> a major hurdle parsing the format that's already there.

+1

> 1) How do you handle the situation where the pg_controldata is invalid?  

Throw an error

> 2) While it's easy to say "I only want one or two of these values" and 
> expose a whole set of UDFs to grab them individually (perhaps wrapping 
> into a system view via that popular approach), I am concerned that 
> people are going to call any single-value versions provided one at a 
> time and get an inconsistent set.

I'm not too concerned about this. This will be a fairly advanced interface, 
and a warning in the docs should suffice. I think a good interface will 
help however. I'd lean towards something like pg_settings.

What I *would* like to see is two tweaks to the output of pg_controldata. 
First, having the "time of latest checkpoint" appear as an epoch (rather than 
or in addition to a localized time string) would help quite a bit. Second, it 
can be hard to build regex solutions when you don't know whan language your 
end user will be using. Not sure of the best solution for that one off the top of 
my head, but there are some workarounds. For example, check_postgres.pl stores 
all the languages translations of "Time of latest checkoint" to help it find that 
information, but I'd sure like a more elegant solution. (One could count lines, 
but that's presumes the order and number of items will never change).

- -- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201003050945
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkuRGYMACgkQvJuQZxSWSsiDvgCgxgFtcy99ehUGt7i7gCp8zRTY
044An1JEEwki9KLZu5BhKXCUNGqfyXDf
=ruYL
-----END PGP SIGNATURE-----




Re: machine-readable pg_controldata?

From
Heikki Linnakangas
Date:
Greg Smith wrote:
> pg_controldata itself just reads the file in directly and dumps the
> data.  There is a copy of it kept around all the time in shared memory
> though (ControlFile in xlog.c), protected by a LWLock.  At a high level
> you can imagine a new function in xlog.c that acquires that lock, copies
> the block into a space the backend allocated for saving it, releases the
> lock, and then returns the whole structure.  Then just wrap some number
> of superuser-only UDFs around it (I'd guess nobody wants regular ones
> able to hold a lock on ControlFile) and you've exposed the results to
> user-space.
> 
> Two questions before I'd volunteer to write that:
> 
> 1) How do you handle the situation where the pg_controldata is invalid? 
> "Not read in yet" and "CRC is bad" are the two most obvious ones that
> can happen.

If the data in pg_control are invalid, the database won't start up, so
by the time you're running the user-defined-functions the data really
ought be valid.

> 2) While it's easy to say "I only want one or two of these values" and
> expose a whole set of UDFs to grab them individually (perhaps wrapping
> into a system view via that popular approach), I am concerned that
> people are going to call any single-value versions provided one at a
> time and get an inconsistent set.  I think the only reasonable interface
> to this would not return a single field, it would pop out all of them so
> you got a matching set from the point in time the lock was held.  And if
> that's the case, I'm not sure of the most reasonable UI is.  Just return
> a whole row with a column for each field in the file, and then people
> can select out just the ones they want?  (That's probably the right
> one)

Yeah, one column for field seems ok to me.

Which fields do you want to expose? Perhaps it would make sense to split
the functionality in a few user-defined functions: one to return static
information about the architecture and compilation options (alignment,
32-bin vs 64 bit, block sizes, etc.) one to return all the fields
regarding latest checkpoint, plus other functions for the rest that are
needed.

The REDO location of last checkpoint might deserve a function of its
own, like we have pg_last_xlog_replay_location() and
pg_last_xlog_receive_location().

> Have re-raised these concerns to myself, this is usually the point in
> this exercise where I go "screw it, I'll just parse pg_controldata again
> instead" and do that instead.  This is happening so much lately that I
> think Josh's suggestion it's just unworkable to keep going via that
> model forever has merit though.  I find it hard to mark this 9.0
> territory though, given the data is not actually difficult to grab--and
> that trail is already well blazed, nothing new in this version.

I have no problem adding this to 9.0 if we have a solid proposal for the
UI. It's low risk and makes the life easier for people. People are
clearly missing this.

Then again, if you don't use the copy in shared memory but just open the
pg_control file and read it in the UDF, you could implement this as a
pgfoundry module that works with older versions too.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: machine-readable pg_controldata?

From
Greg Smith
Date:
Heikki Linnakangas wrote:
> Then again, if you don't use the copy in shared memory but just open the
> pg_control file and read it in the UDF, you could implement this as a
> pgfoundry module that works with older versions too.
>   

This is the direction I'd prefer to see this go in a 9.0 context.  It's 
easy enough to build a fully functional version that lives works via the 
same proposed final UDF interface, just with the extra step of reading 
the file.  Get that working, and you just added a useful module 
supporting all the way back to 8.2 (I think--not sure if there's been 
any other changes that would break this) that people would love to have. 

Once it's done, the UI is solid, all the data is known to be exposed in 
the right way it turns out people wanted it to be, then do the simple 
conversion it to grab from shared memory instead and add it as an 
official 9.1 feature.  I'm not feeling any pressure that this is a 
must-fix item for the 9.0 release freeze--as warts here go, this is a 
both a small one and one that doesn't have to be fixed in core, so two 
strikes against it being critical.

I would rather have the ability to tweak on this for a few months to get 
everything right, while being able to expose regular updates outside of 
core, than to commit "this is the best we've got so far" just under the 
wire for 9.0 without necessarily enough time to do it well.  The few 
messages that have shown up here already have made left me with the 
optinion that just getting the requirements and preferred implementation 
nailed down here is going to take a few rounds of development to work 
out to everyone's satisfaction.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: machine-readable pg_controldata?

From
Josh Berkus
Date:
On 3/5/10 10:28 AM, Greg Smith wrote:
> 
> I would rather have the ability to tweak on this for a few months to get
> everything right, while being able to expose regular updates outside of
> core, than to commit "this is the best we've got so far" just under the
> wire for 9.0 without necessarily enough time to do it well.

Oh, I wasn't proposing doing *anything* for 9.0.  I wanted to get
something on the TODO list for 9.1.  As far as I'm concerned, 9.0 is
closed to new ideas.  We have enough bugs to fix as it is.

--Josh Berkus


Re: machine-readable pg_controldata?

From
Greg Smith
Date:
Josh Berkus wrote:
> Oh, I wasn't proposing doing *anything* for 9.0.  I wanted to get
> something on the TODO list for 9.1.  As far as I'm concerned, 9.0 is
> closed to new ideas.  We have enough bugs to fix as it is.
>   

I didn't get that initially from how you characterized this as "past 
time" to add.  It's at 
http://wiki.postgresql.org/wiki/Todo#Point-In-Time_Recovery_.28PITR.29 now.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: machine-readable pg_controldata?

From
Josh Berkus
Date:
> I didn't get that initially from how you characterized this as "past
> time" to add.  It's at
> http://wiki.postgresql.org/wiki/Todo#Point-In-Time_Recovery_.28PITR.29 now.

Sorry for not being clear.  I took it for granted that since it's past
2/15, no non-critical patches would be even considered.  And
pg_controldata certainly isn't critical.

--Josh Berkus



Re: machine-readable pg_controldata?

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Greg Smith wrote:
>> 1) How do you handle the situation where the pg_controldata is invalid? 

> If the data in pg_control are invalid, the database won't start up, so
> by the time you're running the user-defined-functions the data really
> ought be valid.

Yeah.  If you are pulling from the shared-memory copy this is an
entirely idle concern.  If that data is not correct we have *way* worse
concerns than whether some UDF or other is going to go crazy.

> Which fields do you want to expose?

That's actually the part of this that concerns me most.  The data that
is in pg_control is really somewhat ad-hoc, particularly the items that
have to do with checking database compatibility.  I'm not comfortable
with the notion that we should expose all and only these fields, and
even less so with the idea that they should be tied together at the SQL
level.

What I'd prefer to see is a use-case presented and an API defined to
solve that case (or those cases, as the case may be).  If pulling data
from pg_control is the best solution, great.  But "let's expose
pg_control" seems like a backwards design approach.

(FWIW, my recollection of the original design intention for
pg_controldata was that it was meant as a troubleshooting tool if the
database wouldn't start up.  I'm somewhat bemused to hear that people
are trying to use it as part of production scripts.  It wasn't meant to
produce machine-readable output, much less to give values that you could
rely on with respect to a running server.)
        regards, tom lane


Re: machine-readable pg_controldata?

From
Josh Berkus
Date:
> (FWIW, my recollection of the original design intention for
> pg_controldata was that it was meant as a troubleshooting tool if the
> database wouldn't start up.  I'm somewhat bemused to hear that people
> are trying to use it as part of production scripts.  It wasn't meant to
> produce machine-readable output, much less to give values that you could
> rely on with respect to a running server.)

I'll have a more detailed list when I've gotten further with testing HS/SR.

--Josh Berkus


Re: machine-readable pg_controldata?

From
Bruce Momjian
Date:
Tom Lane wrote:
> (FWIW, my recollection of the original design intention for
> pg_controldata was that it was meant as a troubleshooting tool if the
> database wouldn't start up.  I'm somewhat bemused to hear that people
> are trying to use it as part of production scripts.  It wasn't meant to
> produce machine-readable output, much less to give values that you could
> rely on with respect to a running server.)

pg_migrator reads/parses the output of pg_controldata for cluster
compatibility checking and for setting counters.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do


Re: machine-readable pg_controldata?

From
Joe Conway
Date:
On 03/05/2010 10:28 AM, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> Then again, if you don't use the copy in shared memory but just open the
>> pg_control file and read it in the UDF, you could implement this as a
>> pgfoundry module that works with older versions too.
>
> This is the direction I'd prefer to see this go in a 9.0 context.  It's
> easy enough to build a fully functional version that lives works via the
> same proposed final UDF interface, just with the extra step of reading
> the file.  Get that working, and you just added a useful module
> supporting all the way back to 8.2 (I think--not sure if there's been
> any other changes that would break this) that people would love to have.

FWIW, here is a quick and dirty effort:
 http://www.joeconway.com/source_code/pg_controldata.tar.gz

It turns out some minor changes are needed for 8.4 or earlier as some
attributes have been added to ControlFileData. Once we get general
agreement I can do that too. I have not bothered to start a pgfoundry
project yet -- thoughts?

Joe


Re: machine-readable pg_controldata?

From
Fujii Masao
Date:
On Sat, Mar 6, 2010 at 12:13 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> The REDO location of last checkpoint might deserve a function of its
> own, like we have pg_last_xlog_replay_location() and
> pg_last_xlog_receive_location().

Agreed. I submitted the patch which introduces new function returning
the REDO location of last checkpoint, on other thread.
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00334.php

Are you planning to apply the patch until 9.0 release?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: machine-readable pg_controldata?

From
Dimitri Fontaine
Date:
Joe Conway <mail@joeconway.com> writes:
>  I have not bothered to start a pgfoundry project yet -- thoughts?

For the visibility of the project, pgfoundry is still a good idea it
seems, even if you still have to register separately for the online
catalogue: http://www.postgresql.org/download/products/1

Now, AFAIUI, if you want to ease the porting of the pgfoundry platform
to something else, the best is not to use much of the features there (if
any), but host the project at github or google code or whatever, and
link from pgfoundry.

For example, I've been switching the code hosting of my extensions from
pgfoundry to github for some time now (switching as soon as maintenance
is needed, git cvsimport is all I need to migrate).

Regards,
-- 
dim


Re: machine-readable pg_controldata?

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> (FWIW, my recollection of the original design intention for
> pg_controldata was that it was meant as a troubleshooting tool if the
> database wouldn't start up.  I'm somewhat bemused to hear that people
> are trying to use it as part of production scripts.  It wasn't meant to
> produce machine-readable output, much less to give values that you could
> rely on with respect to a running server.)

I mostly use it to check on the progress of a PITR slave: just curious, 
is there a better/preferred way to get that information?

- -- 
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201003091016
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkuWZnAACgkQvJuQZxSWSsiqWgCglhHZ/awR5lSkjPG+yzd05ulz
X6AAn0uvraweuz8pG15OybCyMzgfH3XX
=QI2N
-----END PGP SIGNATURE-----