Thread: pg_controldata gobbledygook

pg_controldata gobbledygook

From

Peter Eisentraut

Date:

26 April 2013, 03:07:09

I'm not sure who is supposed to be able to read this sort of stuff:

Latest checkpoint's NextXID:          0/7575
Latest checkpoint's NextOID:          49152
Latest checkpoint's NextMultiXactId:  7
Latest checkpoint's NextMultiOffset:  13
Latest checkpoint's oldestXID:        1265
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1

Note that these symbols don't even correspond to the actual symbols used
in the source code in some cases.

The comments in the pg_control.h header file use much more pleasant
terms, which when put to use would lead to output similar to this:

Latest checkpoint's next free transaction ID:             0/7575
Latest checkpoint's next free OID:                        49152
Latest checkpoint's next free MultiXactId:                7
Latest checkpoint's next free MultiXact offset:           13
Latest checkpoint's cluster-wide minimum datfrozenxid:    1265
Latest checkpoint's database with cluster-wide minimum datfrozenxid:  1
Latest checkpoint's oldest transaction ID still running:  0
Latest checkpoint's cluster-wide minimum datminmxid:      1
Latest checkpoint's database with cluster-wide minimum datminmxid:  1

One could even rearrange the layout a little bit like this:

Control data as of latest checkpoint:
    next free transaction ID:             0/7575
    next free OID:                        49152
etc.

Comments?

Re: pg_controldata gobbledygook

From

Tom Lane

Date:

26 April 2013, 03:19:23

Peter Eisentraut <peter_e@gmx.net> writes:
> The comments in the pg_control.h header file use much more pleasant
> terms, which when put to use would lead to output similar to this:

> Latest checkpoint's next free transaction ID:             0/7575
> Latest checkpoint's next free OID:                        49152
> Latest checkpoint's next free MultiXactId:                7
> Latest checkpoint's next free MultiXact offset:           13
> Latest checkpoint's cluster-wide minimum datfrozenxid:    1265
> Latest checkpoint's database with cluster-wide minimum datfrozenxid:  1
> Latest checkpoint's oldest transaction ID still running:  0
> Latest checkpoint's cluster-wide minimum datminmxid:      1
> Latest checkpoint's database with cluster-wide minimum datminmxid:  1

> One could even rearrange the layout a little bit like this:

> Control data as of latest checkpoint:
>     next free transaction ID:             0/7575
>     next free OID:                        49152
> etc.

> Comments?

I think I've heard of scripts grepping the output of pg_controldata for
this that or the other.  Any rewording of the labels would break that.
While I'm not opposed to improving the labels, I would vote against your
second, abbreviated scheme because it would make things ambiguous for
simple grep-based scripts.
        regards, tom lane
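
To make the concern about label-dependent scripts concrete, here is a minimal Python sketch of the kind of parsing such a script might do. The function name `parse_controldata` and the sample strings are illustrative assumptions, not code from any real tool; the samples are copied from the output shown earlier in the thread.

```python
def parse_controldata(text):
    """Parse 'label: value' lines into a dict keyed by the full label.

    Hypothetical helper: splits each line at the first colon, so values
    that themselves contain colons (e.g. timestamps) stay intact.
    """
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that have no label/value separator
        fields[label.strip()] = value.strip()
    return fields


CURRENT_FORMAT = """\
Latest checkpoint's NextXID:          0/7575
Latest checkpoint's NextOID:          49152
Latest checkpoint's oldestXID:        1265
"""

REARRANGED_FORMAT = """\
Control data as of latest checkpoint:
    next free transaction ID:             0/7575
    next free OID:                        49152
"""

current = parse_controldata(CURRENT_FORMAT)
rearranged = parse_controldata(REARRANGED_FORMAT)

# With the current format, the full label identifies the field on its own.
print(current["Latest checkpoint's NextOID"])

# With the rearranged layout, the old key is gone and the remaining
# sub-label no longer says which section it belongs to.
print("Latest checkpoint's NextOID" in rearranged)
print(sorted(rearranged))
```

In the rearranged layout the section header parses as a label with an empty value, and a line-oriented script would have to track that header itself to know what "next free OID" refers to.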

Re: pg_controldata gobbledygook

From

Peter Geoghegan

Date:

26 April 2013, 03:22:51

On Thu, Apr 25, 2013 at 8:07 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Comments?

+1 from me.

I don't think that these particular changes would break WAL-E,
Heroku's continuous archiving tool, which has a class called
PgControlDataParser. However, it's possible to imagine someone being
affected in a similar way. So I'd be sure to document it clearly, and
to perhaps preserve the old label names to avoid breaking scripts.

-- 
Peter Geoghegan

Re: pg_controldata gobbledygook

From

Alvaro Herrera

Date:

26 April 2013, 04:22:13

Tom Lane wrote:

> I think I've heard of scripts grepping the output of pg_controldata for
> this that or the other.  Any rewording of the labels would break that.
> While I'm not opposed to improving the labels, I would vote against your
> second, abbreviated scheme because it would make things ambiguous for
> simple grep-based scripts.

We could provide two alternative outputs, one for human consumption with
the proposed format and something else that uses, say, shell assignment
syntax.  (I did propose this years ago and I might have an unfinished
patch still lingering about somewhere.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
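
The message does not say what the shell-assignment output would look like, so the following Python sketch is only one possible reading. The variable-naming rule (uppercase the label and replace runs of non-alphanumeric characters with underscores) and the function name `to_shell_assignments` are assumptions made for illustration; they do not come from the unfinished patch mentioned above.

```python
import re
import shlex


def to_shell_assignments(text):
    """Turn 'label: value' lines into NAME=value lines a shell could eval.

    The naming scheme is hypothetical. It does not guard against labels
    that would start with a digit, which the sample output never has.
    """
    lines = []
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skip headers and lines without a value
        name = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_").upper()
        # shlex.quote adds quoting only when the value needs it.
        lines.append(f"{name}={shlex.quote(value)}")
    return "\n".join(lines)


SAMPLE = """\
Latest checkpoint's NextXID:          0/7575
Latest checkpoint's NextOID:          49152
"""

print(to_shell_assignments(SAMPLE))
```

A script could then `eval` the result and read the variables directly instead of grepping for labels.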

Re: pg_controldata gobbledygook

From

Fabrízio de Royes Mello

Date:

26 April 2013, 04:25:45

On Fri, Apr 26, 2013 at 12:22 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Apr 25, 2013 at 8:07 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Comments?
>
> +1 from me.
>
> I don't think that these particular changes would break WAL-E,
> Heroku's continuous archiving tool, which has a class called
> PgControlDataParser. However, it's possible to imagine someone being
> affected in a similar way. So I'd be sure to document it clearly, and
> to perhaps preserve the old label names to avoid breaking scripts.

Why don't we add options to pg_controldata to output the info in several other formats, such as JSON, YAML, XML, or another one?

Best regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
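
As a rough illustration of the JSON idea, the sketch below converts the existing text output into a JSON object after the fact. pg_controldata itself has no such option in this discussion; the function name `controldata_to_json` and the choice to keep the existing labels as keys and all values as strings are assumptions.

```python
import json


def controldata_to_json(text):
    """Convert 'label: value' lines into a JSON object (hypothetical helper).

    Values are kept as strings because fields such as '0/7575' are not
    plain numbers.
    """
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a label/value separator
        fields[label.strip()] = value.strip()
    return json.dumps(fields, indent=2, sort_keys=True)


SAMPLE = """\
Latest checkpoint's NextXID:          0/7575
Latest checkpoint's NextOID:          49152
Latest checkpoint's NextMultiXactId:  7
"""

print(controldata_to_json(SAMPLE))
```

Note that a wrapper like this still depends on the label text, so it moves the compatibility question rather than removing it.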

Re: pg_controldata gobbledygook

From

Tom Lane

Date:

26 April 2013, 04:35:01

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> I think I've heard of scripts grepping the output of pg_controldata for
>> this that or the other.  Any rewording of the labels would break that.
>> While I'm not opposed to improving the labels, I would vote against your
>> second, abbreviated scheme because it would make things ambiguous for
>> simple grep-based scripts.

> We could provide two alternative outputs, one for human consumption with
> the proposed format and something else that uses, say, shell assignment
> syntax.  (I did propose this years ago and I might have an unfinished
> patch still lingering about somewhere.)

And a script would use that how?  "pg_controldata --machine-friendly"
would fail outright on older versions.  I think it's okay to ask script
writers to write
        pg_controldata | grep -e 'old label|new label'
but not okay to ask them to deal with anything as complicated as trying
a switch to see if it works or not.
        regards, tom lane
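
The "accept either label" approach can be sketched in Python as follows. The helper name `find_field` is hypothetical, and the two labels are the current one and the rewording proposed at the top of the thread. One caveat when doing the same with grep: in basic regular expressions `|` is a literal character, so the alternation generally needs `grep -E` (or `\|` with GNU grep).

```python
import re


def find_field(text, *labels):
    """Return the value of the first line whose label is one of `labels`.

    Hypothetical helper: returns None if no label matches.
    """
    alternatives = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(
        r"^[ \t]*(?:%s):[ \t]*(.*)$" % alternatives, re.MULTILINE
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


OLD_OUTPUT = "Latest checkpoint's NextOID:          49152\n"
NEW_OUTPUT = "Latest checkpoint's next free OID:                        49152\n"

LABELS = ("Latest checkpoint's NextOID", "Latest checkpoint's next free OID")

# The same call works against either wording of the label.
print(find_field(OLD_OUTPUT, *LABELS))
print(find_field(NEW_OUTPUT, *LABELS))
```

This needs no version check or trial switch, which is the property Tom is asking for.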

Re: pg_controldata gobbledygook

From

Daniel Farina

Date:

26 April 2013, 06:54:17

On Thu, Apr 25, 2013 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> I think I've heard of scripts grepping the output of pg_controldata for
>>> this that or the other.  Any rewording of the labels would break that.
>>> While I'm not opposed to improving the labels, I would vote against your
>>> second, abbreviated scheme because it would make things ambiguous for
>>> simple grep-based scripts.
>
>> We could provide two alternative outputs, one for human consumption with
>> the proposed format and something else that uses, say, shell assignment
>> syntax.  (I did propose this years ago and I might have an unfinished
>> patch still lingering about somewhere.)
>
> And a script would use that how?  "pg_controldata --machine-friendly"
> would fail outright on older versions.  I think it's okay to ask script
> writers to write
>         pg_controldata | grep -e 'old label|new label'
> but not okay to ask them to deal with anything as complicated as trying
> a switch to see if it works or not.

From what I'm reading, it seems like the main benefit of the changes
is to make things easier for humans to skim over.  Automated programs
that care about precise meanings of each field are awkwardly but
otherwise well-served by the precise output as rendered right now.

What about doing something similar but different from the
--machine-readable proposal, such as adding an option for the
*human*-readable variant that is guaranteed to mercilessly change as
human-readers/-hackers sees fit on whim?  It's a bit of a kludge that
this is not the default, but would prevent having to serve two quite
different masters with the same output.

Although I'm not seriously proposing explicitly "-h" (as seen in some
GNU programs in rendering byte sizes and the like...yet could be
confused for 'help'), something like that may serve as prior art.

Re: pg_controldata gobbledygook

From

Gavin Flower

Date:

26 April 2013, 09:01:01

On 26/04/13 18:53, Daniel Farina wrote:
> On Thu, Apr 25, 2013 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Tom Lane wrote:
>>>> I think I've heard of scripts grepping the output of pg_controldata for
>>>> this that or the other.  Any rewording of the labels would break that.
>>>> While I'm not opposed to improving the labels, I would vote against your
>>>> second, abbreviated scheme because it would make things ambiguous for
>>>> simple grep-based scripts.
>>
>>> We could provide two alternative outputs, one for human consumption with
>>> the proposed format and something else that uses, say, shell assignment
>>> syntax.  (I did propose this years ago and I might have an unfinished
>>> patch still lingering about somewhere.)
>>
>> And a script would use that how?  "pg_controldata --machine-friendly"
>> would fail outright on older versions.  I think it's okay to ask script
>> writers to write
>>         pg_controldata | grep -e 'old label|new label'
>> but not okay to ask them to deal with anything as complicated as trying
>> a switch to see if it works or not.
>
> From what I'm reading, it seems like the main benefit of the changes
> is to make things easier for humans to skim over.  Automated programs
> that care about precise meanings of each field are awkwardly but
> otherwise well-served by the precise output as rendered right now.
>
> What about doing something similar but different from the
> --machine-readable proposal, such as adding an option for the
> *human*-readable variant that is guaranteed to mercilessly change as
> human-readers/-hackers sees fit on whim?  It's a bit of a kludge that
> this is not the default, but would prevent having to serve two quite
> different masters with the same output.
>
> Although I'm not seriously proposing explicitly "-h" (as seen in some
> GNU programs in rendering byte sizes and the like...yet could be
> confused for 'help'), something like that may serve as prior art.

I think the current way should remain the default, as Daniel suggests - but a
'--human-readable' (or suitable abbreviation) flag could be added.

Such as in the command to list directory details, using the 'ls' command in
Linux...

(Below, Y = 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 bytes
= 2^80 bytes.)

man ls
[...]
       -h, --human-readable
              with -l, print sizes in human readable format (e.g., 1K 234M 2G)
[...]
       SIZE may be (or may be an integer optionally followed by) one of
       following: KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on
       for G, T, P, E, Z, Y.
[...]

Cheers,
Gavin

Re: pg_controldata gobbledygook

From

Andres Freund

Date:

26 April 2013, 09:08:18

On 2013-04-25 23:07:02 -0400, Peter Eisentraut wrote:
> I'm not sure who is supposed to be able to read this sort of stuff:
> 
> Latest checkpoint's NextXID:          0/7575
> Latest checkpoint's NextOID:          49152
> Latest checkpoint's NextMultiXactId:  7
> Latest checkpoint's NextMultiOffset:  13
> Latest checkpoint's oldestXID:        1265
> Latest checkpoint's oldestXID's DB:   1
> Latest checkpoint's oldestActiveXID:  0
> Latest checkpoint's oldestMultiXid:   1
> Latest checkpoint's oldestMulti's DB: 1
> 
> Note that these symbols don't even correspond to the actual symbols used
> in the source code in some cases.
> 
> The comments in the pg_control.h header file use much more pleasant
> terms, which when put to use would lead to output similar to this:
> 
> Latest checkpoint's next free transaction ID:             0/7575
> Latest checkpoint's next free OID:                        49152
> Latest checkpoint's next free MultiXactId:                7
> Latest checkpoint's next free MultiXact offset:           13
> Latest checkpoint's cluster-wide minimum datfrozenxid:    1265
> Latest checkpoint's database with cluster-wide minimum datfrozenxid:  1
> Latest checkpoint's oldest transaction ID still running:  0
> Latest checkpoint's cluster-wide minimum datminmxid:      1
> Latest checkpoint's database with cluster-wide minimum datminmxid:  1
> 
> One could even rearrange the layout a little bit like this:
> 
> Control data as of latest checkpoint:
>     next free transaction ID:             0/7575
>     next free OID:                        49152

I have to admit I don't see the point. None of those values is particularly
interesting to anybody without implementation level knowledge and those
will likely deal with them just fine. And I find the version with the
shorter names far quicker to read.
The clarity win here doesn't seem to be worth the price of potentially
breaking some tools.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: pg_controldata gobbledygook

From

Bernd Helmle

Date:

26 April 2013, 11:28:21

--On 25. April 2013 23:19:14 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think I've heard of scripts grepping the output of pg_controldata for
> this that or the other.  Any rewording of the labels would break that.
> While I'm not opposed to improving the labels, I would vote against your
> second, abbreviated scheme because it would make things ambiguous for
> simple grep-based scripts.

I had exactly this kind of discussion just a few days ago with a customer, 
who wants to use the output in their scripts and was a little worried about 
the compatibility between major versions.

I don't think we explicitly guarantee any output format compatibility for
corresponding symbols between major versions, but given that
pg_controldata seems to have a broad use case here, maybe we should
document somewhere whether we discourage or encourage people to rely on
it?

-- 
Thanks
Bernd

Re: pg_controldata gobbledygook

From

Robert Haas

Date:

26 April 2013, 12:51:34

On Fri, Apr 26, 2013 at 5:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I have to admit I don't see the point. None of those values is particularly
> interesting to anybody without implementation level knowledge and those
> will likely deal with them just fine. And I find the version with the
> shorter names far quicker to read.
> The clarity win here doesn't seem to be worth the price of potentially
> breaking some tools.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_controldata gobbledygook

From

Jeff Janes

Date:

26 April 2013, 16:31:23

On Fri, Apr 26, 2013 at 2:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-04-25 23:07:02 -0400, Peter Eisentraut wrote:
>> I'm not sure who is supposed to be able to read this sort of stuff:
>>
>> Latest checkpoint's NextXID:          0/7575
>> Latest checkpoint's NextOID:          49152
>> Latest checkpoint's NextMultiXactId:  7
>> Latest checkpoint's NextMultiOffset:  13
>> Latest checkpoint's oldestXID:        1265
>> Latest checkpoint's oldestXID's DB:   1
>> Latest checkpoint's oldestActiveXID:  0
>> Latest checkpoint's oldestMultiXid:   1
>> Latest checkpoint's oldestMulti's DB: 1
>>
>> Note that these symbols don't even correspond to the actual symbols used
>> in the source code in some cases.
>>
>> The comments in the pg_control.h header file use much more pleasant
>> terms, which when put to use would lead to output similar to this:
>>
>> Latest checkpoint's next free transaction ID:             0/7575
>> Latest checkpoint's next free OID:                        49152
>> Latest checkpoint's next free MultiXactId:                7
>> Latest checkpoint's next free MultiXact offset:           13
>> Latest checkpoint's cluster-wide minimum datfrozenxid:    1265
>> Latest checkpoint's database with cluster-wide minimum datfrozenxid:  1
>> Latest checkpoint's oldest transaction ID still running:  0
>> Latest checkpoint's cluster-wide minimum datminmxid:      1
>> Latest checkpoint's database with cluster-wide minimum datminmxid:  1
>>
>> One could even rearrange the layout a little bit like this:
>>
>> Control data as of latest checkpoint:
>>     next free transaction ID:             0/7575
>>     next free OID:                        49152
>
> I have to admit I don't see the point. None of those values is particularly
> interesting to anybody without implementation level knowledge and those
> will likely deal with them just fine. And I find the version with the
> shorter names far quicker to read.

I agree. For the ones I didn't know the meaning of, I still don't know the meaning of them based on the long form, either. While a tutorial on what these things mean might be useful, embedding the tutorial into the output of pg_controldata probably isn't the right place.

Cheers,

Jeff

Re: pg_controldata gobbledygook

From

Bruce Momjian

Date:

02 May 2013, 14:31:39

On Fri, Apr 26, 2013 at 08:51:23AM -0400, Robert Haas wrote:
> On Fri, Apr 26, 2013 at 5:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I have to admit I don't see the point. None of those values is particularly
> > interesting to anybody without implementation level knowledge and those
> > will likely deal with them just fine. And I find the version with the
> > shorter names far quicker to read.
> > The clarity win here doesn't seem to be worth the price of potentially
> > breaking some tools.
> 
> +1.

FYI, pg_upgrade would certainly have to be updated to handle this
change.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +