Thread: ProcessUtility_hook

ProcessUtility_hook

From
Itagaki Takahiro
Date:
We can hook SELECT, INSERT, UPDATE and DELETE with ExecutorXxx_hooks,
but there is no way to hook DDL commands. Can I submit a patch to add
ProcessUtility_hook?

pg_stat_statements module also handle DDL commands as same as normal
queries. Fortunately, autovacuum calls call vacuum() directly,
so the statistics will not filled with VACUUM and ANALYZE commands
by autovacuum.


There might be another usage for the hook. Since we can filter all
SQL commands using ExecutorRun_hook and ProcessUtility_hook, so we
could write query text for database audit with programmable filter
in the hooks.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: ProcessUtility_hook

From
Euler Taveira de Oliveira
Date:
Itagaki Takahiro escreveu:
> We can hook SELECT, INSERT, UPDATE and DELETE with ExecutorXxx_hooks,
> but there is no way to hook DDL commands. Can I submit a patch to add
> ProcessUtility_hook?
> 
+1. Hey, I was thinking about that yesterday. ;)

> pg_stat_statements module also handle DDL commands as same as normal
> queries. Fortunately, autovacuum calls call vacuum() directly,
> so the statistics will not filled with VACUUM and ANALYZE commands
> by autovacuum.
> 
I don't look at the code but maybe there is a way to hook something there too.
But I'd leave it alone for now, unless it's few lines of code.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: ProcessUtility_hook

From
Itagaki Takahiro
Date:
Here is a patch to add ProcessUtility_hook to handle all DDL
commands in user modules. (ProcessUtility_hook_20091021.patch)
It adds a global variable 'ProcessUtility_hook' and
export the original function as 'standard_ProcessUtility'.


Euler Taveira de Oliveira <euler@timbira.com> wrote:

> > pg_stat_statements module also handle DDL commands as same as normal
> > queries. Fortunately, autovacuum calls call vacuum() directly,
> > so the statistics will not filled with VACUUM and ANALYZE commands
> > by autovacuum.
> >
> I don't look at the code but maybe there is a way to hook something there too.
> But I'd leave it alone for now, unless it's few lines of code.

I see it is debatable whether pg_stat_statements should support the hook.
So I split the change in another patch. (pgss_utility_hook_20091021.patch)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: ProcessUtility_hook

From
Euler Taveira de Oliveira
Date:
Itagaki Takahiro escreveu:
> Here is a patch to add ProcessUtility_hook to handle all DDL
> commands in user modules. (ProcessUtility_hook_20091021.patch)
> It adds a global variable 'ProcessUtility_hook' and
> export the original function as 'standard_ProcessUtility'.
> 
I've reviewed your patch. It applies cleanly and compiles without warnings. It
lacks documentation. Could you update it?

The functionality is divided in two parts. The first part is a hook in the
utility module. The idea is capture the commands that doesn't pass through
executor. I'm afraid that that hook will be used only for capturing non-DML
queries. If so, why don't we hack the tcop/postgres.c and grab those queries
from the same point we log statements? This feature is similar to the executor
one. But in the executor case, we use the plan for other things. Other
utilities are (i) using that hook to filter out some non-DML commands and (ii)
developing some non-DML replication mechanism for trigger-based replication
solutions.

The second part is to use that hook to capture non-DML commands for
pg_stat_statements module. Do we need to have rows = 0 for non-DML commands?
Maybe NULL could be an appropriate value. The PREPARE command stopped to countthe number of rows. Should we count the
rowsin EXECUTE command or in the
 
PREPARE command? The other command that doesn't count properly is COPY. Could
you fix that? I'm concerned about storing some commands that expose passwords
like CREATE ROLE foo PASSWORD 'secret'; I know that the queries are only
showed to superusers but maybe we should add this information to documentation
or provide a mechanism to exclude those commands.

I don't know if it is worth the trouble adding some code to capture VACUUM and
ANALYZE commands called inside autovacuum.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: ProcessUtility_hook

From
Itagaki Takahiro
Date:
Euler Taveira de Oliveira <euler@timbira.com> wrote:

> The functionality is divided in two parts. The first part is a hook in the
> utility module. The idea is capture the commands that doesn't pass through
> executor. I'm afraid that that hook will be used only for capturing non-DML
> queries. If so, why don't we hack the tcop/postgres.c and grab those queries
> from the same point we log statements?

DDLs can be used from user defined functions. It has the same reason
why we have executor hooks instead of tcop hooks.


> The second part is to use that hook to capture non-DML commands for
> pg_stat_statements module.

- I fixed a bug that it should handle only isTopLevel command.
- A new parameter pg_stat_statements.track_ddl (boolean) is
  added to enable or disable the feature.

> Do we need to have rows = 0 for non-DML commands?
> Maybe NULL could be an appropriate value.

Yes, but it requires additional management to handle 0 and NULL
separately. I don't think it is worth doing.

> The PREPARE command stopped to count
>  the number of rows. Should we count the rows in EXECUTE command or in the
> PREPARE command?

It requires major rewrites of EXECUTE command to pass the number of
affected rows to caller. I doubt it is worth fixing because almost all
drivers use protocol-level prepared statements instead of PREPARE+EXECUTE.

> The other command that doesn't count properly is COPY. Could
> you fix that?

I added codes for it.

> I'm concerned about storing some commands that expose passwords
> like CREATE ROLE foo PASSWORD 'secret'; I know that the queries are only
> showed to superusers but maybe we should add this information to documentation
> or provide a mechanism to exclude those commands.

I think there is no additonal problem there because we can see the 'secret'
command using pg_stat_activity even now.


> I don't know if it is worth the trouble adding some code to capture VACUUM and
> ANALYZE commands called inside autovacuum.

I'd like to exclude VACUUM and ANALYZE by autovacuum because
pg_stat_statements should not filled with those commands.


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: ProcessUtility_hook

From
Bruce Momjian
Date:
I have applied this patch, with only a minor wording improvement:
 Specify <literal>on</> to track DDL commands, which excludes <command>SELECT</>,
       ^^^^^^^^^^^^^^
 
Thanks.

---------------------------------------------------------------------------

Itagaki Takahiro wrote:
> 
> Euler Taveira de Oliveira <euler@timbira.com> wrote:
> 
> > The functionality is divided in two parts. The first part is a hook in the
> > utility module. The idea is capture the commands that doesn't pass through
> > executor. I'm afraid that that hook will be used only for capturing non-DML
> > queries. If so, why don't we hack the tcop/postgres.c and grab those queries
> > from the same point we log statements?
> 
> DDLs can be used from user defined functions. It has the same reason
> why we have executor hooks instead of tcop hooks.
> 
> 
> > The second part is to use that hook to capture non-DML commands for
> > pg_stat_statements module.
> 
> - I fixed a bug that it should handle only isTopLevel command.
> - A new parameter pg_stat_statements.track_ddl (boolean) is
>   added to enable or disable the feature.
> 
> > Do we need to have rows = 0 for non-DML commands?
> > Maybe NULL could be an appropriate value.
> 
> Yes, but it requires additional management to handle 0 and NULL
> separately. I don't think it is worth doing.
> 
> > The PREPARE command stopped to count
> >  the number of rows. Should we count the rows in EXECUTE command or in the
> > PREPARE command?
> 
> It requires major rewrites of EXECUTE command to pass the number of
> affected rows to caller. I doubt it is worth fixing because almost all
> drivers use protocol-level prepared statements instead of PREPARE+EXECUTE.
> 
> > The other command that doesn't count properly is COPY. Could
> > you fix that?
> 
> I added codes for it.
> 
> > I'm concerned about storing some commands that expose passwords
> > like CREATE ROLE foo PASSWORD 'secret'; I know that the queries are only
> > showed to superusers but maybe we should add this information to documentation
> > or provide a mechanism to exclude those commands.
> 
> I think there is no additonal problem there because we can see the 'secret'
> command using pg_stat_activity even now.
> 
> 
> > I don't know if it is worth the trouble adding some code to capture VACUUM and
> > ANALYZE commands called inside autovacuum.
> 
> I'd like to exclude VACUUM and ANALYZE by autovacuum because
> pg_stat_statements should not filled with those commands.
> 
> 
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
> 

[ Attachment, skipping... ]

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ProcessUtility_hook

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I have applied this patch, with only a minor wording improvement:

Uh, we weren't even done reviewing this were we?
        regards, tom lane


Re: ProcessUtility_hook

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I have applied this patch, with only a minor wording improvement:
> 
> Uh, we weren't even done reviewing this were we?

Uh, I am new to this commitfest wiki thing, but it did have a review by
Euler Taveira de Oliveira:
https://commitfest.postgresql.org/action/patch_view?id=196

and the author replied.  Is there more that needs to be done?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ProcessUtility_hook

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Uh, we weren't even done reviewing this were we?

> Uh, I am new to this commitfest wiki thing, but it did have a review by
> Euler Taveira de Oliveira:
>     https://commitfest.postgresql.org/action/patch_view?id=196
> and the author replied.  Is there more that needs to be done?

It wasn't marked Ready For Committer, so presumably the reviewer
wasn't done with it.  I know I hadn't looked at it at all, because
I was waiting for the commitfest review process to finish.
        regards, tom lane


Re: ProcessUtility_hook

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> Uh, we weren't even done reviewing this were we?
> 
> > Uh, I am new to this commitfest wiki thing, but it did have a review by
> > Euler Taveira de Oliveira:
> >     https://commitfest.postgresql.org/action/patch_view?id=196
> > and the author replied.  Is there more that needs to be done?
> 
> It wasn't marked Ready For Committer, so presumably the reviewer
> wasn't done with it.  I know I hadn't looked at it at all, because
> I was waiting for the commitfest review process to finish.

So, if someone writes a patch, and it is reviewed, and the patch author
updates the patch and replies, it still should be reviewed again before
being committed?  I was unclear on that.  The updated patch only
appeared today, so maybe it was ready, but the commit fest manager has
to indicate that in the status before I review/apply it?   Should I
revert the patch?

So there is nothing for me to do to help?  The only two patches I see as
ready for committer are HS and SR;  not going to touch those.  ;-)

Also, we are two weeks into the commit fest and we have more unapplied
patches than applied ones.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ProcessUtility_hook

From
Tom Lane
Date:
I wrote:
> It wasn't marked Ready For Committer, so presumably the reviewer
> wasn't done with it.  I know I hadn't looked at it at all, because
> I was waiting for the commitfest review process to finish.

... and now that I have, I find at least four highly questionable
things about it:

1. The placement of the hook.  Why is it three lines down in
ProcessUtility?  It's probably reasonable to have the Assert first,
but I don't see why the hook function should have the ability to
editorialize on the behavior of everything about ProcessUtility
*except* the read-only-xact check.

2. The naming and documentation of the added GUC setting for
pg_stat_statements.  "track_ddl" seems pretty bizarre to me because
there are many utility statements that no one would call DDL.  COPY,
for example, is certainly not DDL.  Why not call it "track_utility"?

3. The enable-condition test in pgss_ProcessUtility.  Is it really
appropriate to be gating this by isTopLevel?  I should think that
the nested_level check in pgss_enabled would be sufficient and
more likely to do what's expected.

4. The special case for CopyStmt.  That's just weird, and it adds
a maintenance requirement we don't need.  I don't see a really good
argument why COPY (alone among utility statements) deserves to have
a rowcount tracked by pg_stat_statements, but even if you want that
it'd be better to rely on examining the completionTag after the fact.
The fact that the tag is "COPY nnnn" is part of the user-visible API
for COPY and won't change lightly.  The division of labor between
ProcessUtility and copy.c is far more volatile, but this patch has
injected itself into that.
        regards, tom lane


Re: ProcessUtility_hook

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> So, if someone writes a patch, and it is reviewed, and the patch author
> updates the patch and replies, it still should be reviewed again before
> being committed?

Well, that's for the reviewer to say --- if the update satisfies his
concerns, he should sign off on it, if not not.  I've tried to avoid
pre-empting that process.

> Also, we are two weeks into the commit fest and we have more unapplied
> patches than applied ones.

Yup.  Lots of unfinished reviews out there.  Robert spent a good deal
of effort in the last two fests trying to light fires under reviewers;
do you want to take up that cudgel?  I think wholesale commits of things
that haven't finished review is mostly going to send a signal that the
review process doesn't matter, which is *not* the signal I think we
should send.
        regards, tom lane


Re: ProcessUtility_hook

From
Bruce Momjian
Date:
OK, reverted and placed back in "Needs Review" status.

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > It wasn't marked Ready For Committer, so presumably the reviewer
> > wasn't done with it.  I know I hadn't looked at it at all, because
> > I was waiting for the commitfest review process to finish.
> 
> ... and now that I have, I find at least four highly questionable
> things about it:
> 
> 1. The placement of the hook.  Why is it three lines down in
> ProcessUtility?  It's probably reasonable to have the Assert first,
> but I don't see why the hook function should have the ability to
> editorialize on the behavior of everything about ProcessUtility
> *except* the read-only-xact check.
> 
> 2. The naming and documentation of the added GUC setting for
> pg_stat_statements.  "track_ddl" seems pretty bizarre to me because
> there are many utility statements that no one would call DDL.  COPY,
> for example, is certainly not DDL.  Why not call it "track_utility"?
> 
> 3. The enable-condition test in pgss_ProcessUtility.  Is it really
> appropriate to be gating this by isTopLevel?  I should think that
> the nested_level check in pgss_enabled would be sufficient and
> more likely to do what's expected.
> 
> 4. The special case for CopyStmt.  That's just weird, and it adds
> a maintenance requirement we don't need.  I don't see a really good
> argument why COPY (alone among utility statements) deserves to have
> a rowcount tracked by pg_stat_statements, but even if you want that
> it'd be better to rely on examining the completionTag after the fact.
> The fact that the tag is "COPY nnnn" is part of the user-visible API
> for COPY and won't change lightly.  The division of labor between
> ProcessUtility and copy.c is far more volatile, but this patch has
> injected itself into that.
> 
>             regards, tom lane

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ProcessUtility_hook

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > So, if someone writes a patch, and it is reviewed, and the patch author
> > updates the patch and replies, it still should be reviewed again before
> > being committed?
> 
> Well, that's for the reviewer to say --- if the update satisfies his
> concerns, he should sign off on it, if not not.  I've tried to avoid
> pre-empting that process.

OK, so the reviewer knows he has to reply to the author's comments, OK.

> > Also, we are two weeks into the commit fest and we have more unapplied
> > patches than applied ones.
> 
> Yup.  Lots of unfinished reviews out there.  Robert spent a good deal
> of effort in the last two fests trying to light fires under reviewers;
> do you want to take up that cudgel?  I think wholesale commits of things

I am afraid I am then duplicating work the commit fest manager is doing,
and if someone is bugged by me and the CF manager, they might get upset.

> that haven't finished review is mostly going to send a signal that the
> review process doesn't matter, which is *not* the signal I think we
> should send.

True.

Maybe I am best focusing on open issues like the threading and psql -1
patches I worked on today.  There is certainly enough of that stuff to
keep me busy.  I thought I could help with the commit fest load, but now
I am unsure.  That non-commit-fest stuff has to be done too so maybe
managing that will help.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: CommitFest status/management

From
Greg Smith
Date:
Bruce Momjian wrote:
> So, if someone writes a patch, and it is reviewed, and the patch author
> updates the patch and replies, it still should be reviewed again before
> being committed?
That's generally how things were expected to happen--to at least 
double-check that the proposed fixes match what was expected.  I don't 
think it was spelled out very well in the past though.

> Also, we are two weeks into the commit fest and we have more unapplied
> patches than applied ones.
>   
Considering that one of those was a holiday week with a lot of schedule 
disruption proceeding it, I don't know how much faster things could have 
moved.  There were large chunks of the reviewer volunteers who wanted 
only jobs they could finish before the holiday, and others who weren't 
available at all until afterwards.  And I don't know about every else, 
but I took all four days off and started today behind by several hundred 
list messages.  I was planning to start nagging again tomorrow, hoping 
that most would be caught up from any holiday time off too by then.

Right now, of the 16 patches listed in "Needs Review" besides the ECPG 
ones Michael is working through, the breakdown is like this:

Not reviewed at all yet:  6
Reviewed once, updated, waiting for re-review:  10

So the bulk of the problem for keeping the pipeline moving is in these 
re-reviews holding up transitions to "Ready for committer".  I've had 
some discussion with Robert about how to better distinguish in the 
management app when the reviewer has work they're expected to do vs. 
when they think they're done with things.  We're converging toward a 
more clear set of written guidelines to provide for managing future 
CommitFests as part of that, right now there's a few too many fuzzy 
parts for my liking.

If the need here is to speed up how fast things are fed to committers, 
we can certainly do that.  The current process still favors having 
reviewers do as much as possible first, as shown by all the stuff 
sitting in the re-review queue.  The work we're waiting on them for 
could be done by the committers instead if we want to shorten the whole 
process a bit.  I don't think that's really what you want though.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: CommitFest status/management

From
Andrew Dunstan
Date:

Greg Smith wrote:
>
> If the need here is to speed up how fast things are fed to committers, 
> we can certainly do that.  The current process still favors having 
> reviewers do as much as possible first, as shown by all the stuff 
> sitting in the re-review queue.  The work we're waiting on them for 
> could be done by the committers instead if we want to shorten the 
> whole process a bit.  I don't think that's really what you want though.
>

As I have observed before, I think we need some infrastructure to help 
committers claim items, so we don't duplicate work.

Right now the only items marked "ready for reviewer" are Streaming 
Replication and Hot Standby, which I imagine Heiki will be handling.

I'm going to look at the YAML format for EXPLAIN patch shortly.

cheers

andrew




Re: CommitFest status/management

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> As I have observed before, I think we need some infrastructure to help 
> committers claim items, so we don't duplicate work.

Robert acknowledged the need for a "claimed by committer" field in the
fest application, but he hasn't got round to it yet.  In the meantime
I've been adding a "Taking this one..." type of comment to an entry
I want to claim.

> I'm going to look at the YAML format for EXPLAIN patch shortly.

Do we have consensus yet that we want YAML?  It seemed, well,
yet another format without all that much advantage over what's
there.
        regards, tom lane


Re: CommitFest status/management

From
"David E. Wheeler"
Date:
On Nov 30, 2009, at 8:08 PM, Tom Lane wrote:

>> I'm going to look at the YAML format for EXPLAIN patch shortly.
> 
> Do we have consensus yet that we want YAML?  It seemed, well,
> yet another format without all that much advantage over what's
> there.

Legibility++

David


Re: CommitFest status/management

From
Andrew Dunstan
Date:

Tom Lane wrote:
>> I'm going to look at the YAML format for EXPLAIN patch shortly.
>>     
>
> Do we have consensus yet that we want YAML?  It seemed, well,
> yet another format without all that much advantage over what's
> there.
>
>             
>   

I think you and I are the two main skeptics ;-) But we seem to be rather 
in the minority, and it's not terribly invasive from what I have seen so 
far.

cheers

andrew


Re: CommitFest status/management

From
Robert Haas
Date:
On Mon, Nov 30, 2009 at 10:16 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Considering that one of those was a holiday week with a lot of schedule
> disruption proceeding it, I don't know how much faster things could have
> moved.  There were large chunks of the reviewer volunteers who wanted only
> jobs they could finish before the holiday, and others who weren't available
> at all until afterwards.  And I don't know about every else, but I took all
> four days off and started today behind by several hundred list messages.  I
> was planning to start nagging again tomorrow, hoping that most would be
> caught up from any holiday time off too by then.
>
> Right now, of the 16 patches listed in "Needs Review" besides the ECPG ones
> Michael is working through, the breakdown is like this:
>
> Not reviewed at all yet:  6
> Reviewed once, updated, waiting for re-review:  10
>
> So the bulk of the problem for keeping the pipeline moving is in these
> re-reviews holding up transitions to "Ready for committer".  I've had some
> discussion with Robert about how to better distinguish in the management app
> when the reviewer has work they're expected to do vs. when they think
> they're done with things.  We're converging toward a more clear set of
> written guidelines to provide for managing future CommitFests as part of
> that, right now there's a few too many fuzzy parts for my liking.
>
> If the need here is to speed up how fast things are fed to committers, we
> can certainly do that.  The current process still favors having reviewers do
> as much as possible first, as shown by all the stuff sitting in the
> re-review queue.  The work we're waiting on them for could be done by the
> committers instead if we want to shorten the whole process a bit.  I don't
> think that's really what you want though.

I think the pressure has to be applied all through the process.
People who haven't provided a review by now are certainly overdue for
a polite poke, Thanksgiving or no Thanksgiving.  If the first review
doesn't happen until the third week of the CommitFest, how is the
patch going to get a fair shake by the end of the fourth one?  I mean,
if that happens to a small number of patches, OK, but when it's 20% of
what's in the CommitFest, it seems like it's bound to lead to a huge
crunch at the end.

In any case, unlike last CommitFest, where the problem was a lack of
adequate committer activity, it's pretty clear that the the problem
this CommitFest has been a combination of slow reviews and slow
updates by patch authors.  I've been keeping a loose eye on the patch
queue and my impression is that there has rarely been more than 1
patch other than HS + SR in the "Ready for Committer" state, and many
times zero.  That's means the pipeline is stalling, and that's bad.

...Robert


Re: ProcessUtility_hook

From
Euler Taveira de Oliveira
Date:
Tom Lane escreveu:
> Bruce Momjian <bruce@momjian.us> writes:
>> So, if someone writes a patch, and it is reviewed, and the patch author
>> updates the patch and replies, it still should be reviewed again before
>> being committed?
> 
> Well, that's for the reviewer to say --- if the update satisfies his
> concerns, he should sign off on it, if not not.  I've tried to avoid
> pre-empting that process.
> 
That's correct. I didn't have time to review the new patch yet. :( I'll do it
later today. IIRC Tom had some objections (during the last CF) the way the
patch was proposed and suggested changes. Let's see if the Takahiro-san did
everything that was suggested.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: CommitFest status/management

From
Robert Haas
Date:
On Mon, Nov 30, 2009 at 11:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> As I have observed before, I think we need some infrastructure to help
>> committers claim items, so we don't duplicate work.
>
> Robert acknowledged the need for a "claimed by committer" field in the
> fest application, but he hasn't got round to it yet.  In the meantime
> I've been adding a "Taking this one..." type of comment to an entry
> I want to claim.

Sorry I haven't gotten around to this.  Beyond being a little burned
out a little bit, I have been a little bit under the weather and a
little occupied with life apart from PostgreSQL, as if there were such
a thing.  Anyway, one of the concerns I have about this is that adding
another field to the commitfest_view page seems as though it will
create some layout issues - the leftmost column will get squished.  I
could (a) go ahead and do it anyway or (b) do it, but modify the
layout in some unspecified way so that it doesn't impact the format as
much or of course (c) not do it.  Any thoughts?

It would also like to clarify the use case for this a little bit more.Is this just to track patches which committers
arein the process of 
committing (or have committed)?  Or would a committer potentially set
this on some patch that was still being reviewed, and if so would that
mean "don't review this any more because I'm taking over" or "I'm
planning to pick this up when the review process completes" or
something else?

Thanks,

...Robert


Re: CommitFest status/management

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Mon, Nov 30, 2009 at 11:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andrew Dunstan <andrew@dunslane.net> writes:
> >> As I have observed before, I think we need some infrastructure to help
> >> committers claim items, so we don't duplicate work.
> >
> > Robert acknowledged the need for a "claimed by committer" field in the
> > fest application, but he hasn't got round to it yet. ?In the meantime
> > I've been adding a "Taking this one..." type of comment to an entry
> > I want to claim.
> 
> Sorry I haven't gotten around to this.  Beyond being a little burned
> out a little bit, I have been a little bit under the weather and a
> little occupied with life apart from PostgreSQL, as if there were such
> a thing.  Anyway, one of the concerns I have about this is that adding
> another field to the commitfest_view page seems as though it will
> create some layout issues - the leftmost column will get squished.  I
> could (a) go ahead and do it anyway or (b) do it, but modify the
> layout in some unspecified way so that it doesn't impact the format as
> much or of course (c) not do it.  Any thoughts?
> 
> It would also like to clarify the use case for this a little bit more.
>  Is this just to track patches which committers are in the process of
> committing (or have committed)?  Or would a committer potentially set
> this on some patch that was still being reviewed, and if so would that
> mean "don't review this any more because I'm taking over" or "I'm
> planning to pick this up when the review process completes" or
> something else?

I am thinking we can just add a new status, "claimed by committer" and
not bother about adding a new column with the committer name.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: CommitFest status/management

From
Andrew Dunstan
Date:

Bruce Momjian wrote:
>> It would also like to clarify the use case for this a little bit more.
>>  Is this just to track patches which committers are in the process of
>> committing (or have committed)?  Or would a committer potentially set
>> this on some patch that was still being reviewed, and if so would that
>> mean "don't review this any more because I'm taking over" or "I'm
>> planning to pick this up when the review process completes" or
>> something else?
>>     
>
> I am thinking we can just add a new status, "claimed by committer" and
> not bother about adding a new column with the committer name.
>
>   

I think it would be more flexible and useful to be able to "claim" it 
before the review is finished. It would mean in effect "I am following 
this and intend to commit it when it's ready".

cheers

andrew


Re: CommitFest status/management

From
Robert Haas
Date:
On Tue, Dec 1, 2009 at 8:27 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Bruce Momjian wrote:
>>>
>>> It would also like to clarify the use case for this a little bit more.
>>>  Is this just to track patches which committers are in the process of
>>> committing (or have committed)?  Or would a committer potentially set
>>> this on some patch that was still being reviewed, and if so would that
>>> mean "don't review this any more because I'm taking over" or "I'm
>>> planning to pick this up when the review process completes" or
>>> something else?
>>
>> I am thinking we can just add a new status, "claimed by committer" and
>> not bother about adding a new column with the committer name.
>
> I think it would be more flexible and useful to be able to "claim" it before
> the review is finished. It would mean in effect "I am following this and
> intend to commit it when it's ready".

If we went with Bruce's interpretation, we could have a "committer"
field that only appears when the status is "Claimed by Committer" or
"Committed" and the contents of that field could be displayed in
parentheses in the status column, like this: Claimed by Committer (Tom
Lane).

If we went with Andrew's interpretation, we would need a completely
separate column, because there wouldn't be any logical relationship
between the status field and the committer field.

Any other votes?  Tom?

On a possibly related note, I am not totally sure that we want to
enshrine the principle that committers categorically won't touch
patches that are not yet marked Ready for Committer.  For major
patches like SE-PostgreSQL or the partitioning stuff, early committer
involvement is an extremely important ingredient for success.  And, I
have an uncomfortable feeling about having Tom, Bruce, and Andrew all
intentionally sitting on the bench waiting for reviews to complete
while the days tick away.  On the other hand, I also agree with Tom's
point that, if completing reviews doesn't affect whether the patch
gets in, there's less incentive for people to review.

...Robert


Re: CommitFest status/management

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 30, 2009 at 11:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert acknowledged the need for a "claimed by committer" field in the
>> fest application, but he hasn't got round to it yet.

> Sorry I haven't gotten around to this.  Beyond being a little burned
> out a little bit, I have been a little bit under the weather and a
> little occupied with life apart from PostgreSQL, as if there were such
> a thing.  Anyway, one of the concerns I have about this is that adding
> another field to the commitfest_view page seems as though it will
> create some layout issues - the leftmost column will get squished.  I
> could (a) go ahead and do it anyway or (b) do it, but modify the
> layout in some unspecified way so that it doesn't impact the format as
> much or of course (c) not do it.  Any thoughts?

I would be satisfied if there were a "claimed by" field in the per-patch
detail page, which is where you'd have to go to set it anyway.  If you
want you could add an additional status value "claimed by committer"
so it'd be visible in the main page.

> It would also like to clarify the use case for this a little bit more.

It's to keep committers from treading on each others' toes.  Right now,
if say Andrew is working over a patch with intent to commit, there's no
visibility of that fact in the fest status.

I would imagine that a patch should not normally get into this state
until it's been marked "ready for committer" by the reviewer.  Except
perhaps in cases where the reviewer and committer are the same person.
        regards, tom lane


Re: CommitFest status/management

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If we went with Bruce's interpretation, we could have a "committer"
> field that only appears when the status is "Claimed by Committer" or
> "Committed" and the contents of that field could be displayed in
> parentheses in the status column, like this: Claimed by Committer (Tom
> Lane).

> If we went with Andrew's interpretation, we would need a completely
> separate column, because there wouldn't be any logical relationship
> between the status field and the committer field.

> Any other votes?  Tom?

I'm happy with Andrew's interpretation --- I just want a separate text
field for inserting a committer's name.  I don't want any magic behavior
of that field.

> On a possibly related note, I am not totally sure that we want to
> enshrine the principle that committers categorically won't touch
> patches that are not yet marked Ready for Committer.

No, but I think that should be the default assumption once a reviewer
has been assigned.  If the reviewer doesn't totally fall down on the
job, he/she should be allowed to finish reviewing.
        regards, tom lane


Re: CommitFest status/management

From
Robert Haas
Date:
On Tue, Dec 1, 2009 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If we went with Bruce's interpretation, we could have a "committer"
>> field that only appears when the status is "Claimed by Committer" or
>> "Committed" and the contents of that field could be displayed in
>> parentheses in the status column, like this: Claimed by Committer (Tom
>> Lane).
>
>> If we went with Andrew's interpretation, we would need a completely
>> separate column, because there wouldn't be any logical relationship
>> between the status field and the committer field.
>
>> Any other votes?  Tom?
>
> I'm happy with Andrew's interpretation --- I just want a separate text
> field for inserting a committer's name.  I don't want any magic behavior
> of that field.

OK, I'll add a separate text field for the committer's name, but for
now it won't display on the summary page, just the detail page.

...Robert


Re: CommitFest status/management

From
Tom Lane
Date:
BTW, if you have time for any purely cosmetic details ... the way the
CommitFest field on a patch detail page works has always struck me as
weird.  It's a data field, and so if it has any behavior at all that
behavior ought to involve modifying its value.  But what it actually is
is a navigation link.  I think you ought to drop it down to a plain
text field and add a "Back to CommitFest" link to the page header,
similar to the way navigation works on other dependent pages.
        regards, tom lane


Re: CommitFest status/management

From
Andrew Dunstan
Date:

Robert Haas wrote:
>
> OK, I'll add a separate text field for the committer's name, but for
> now it won't display on the summary page, just the detail page.
>
>
>   

Perhaps it could go underneath the reviewer names, maybe in a different 
color. (And yes, like many of us I suck at GUI design, so feel free to 
discount any suggestions I make in that area).

cheers

andrew


Re: YAML Was: CommitFest status/management

From
Josh Berkus
Date:
On 11/30/09 8:17 PM, Andrew Dunstan wrote:
> Do we have consensus yet that we want YAML?  It seemed, well,
> yet another format without all that much advantage over what's
> there.

Well, what's the code count?  What dependencies, if any, does it add?

--Josh


Re: YAML Was: CommitFest status/management

From
Andrew Dunstan
Date:

Josh Berkus wrote:
> On 11/30/09 8:17 PM, Andrew Dunstan wrote:
>   
>> Do we have consensus yet that we want YAML?  It seemed, well,
>> yet another format without all that much advantage over what's
>> there.
>>     
>
> Well, what's the code count?  What dependencies, if any, does it add?
>
>
>   
The patch itself is quite small. There are no extra external dependencies.

YAML and JSON are pretty much interchangeable for our purposes. 
According to Wikipedia, "Both functionally and syntactically, JSON is 
effectively a subset of YAML." See 
<http://en.wikipedia.org/wiki/JSON#YAML> So the YAML parsers should be 
able to handle the JSON output. The only thing we'd be buying with this 
patch is making a bit happier some people who prefer reading the YAML 
syntax. For machine readability we'd be gaining precisely nothing.

I guess the question is this: when are we going to say "No more output 
formats. We have enough."?

One consideration is this: the more formats we support the dumber the 
output will be. Already the XML output is arguably dumber than it should 
be, because XML elements are two-dimensional (they can have property 
lists (attributes) and child elements) but JSON/YAML nodes are 
one-dimensional, so we have made some things that one might normally 
expect to be attributes in XML into child elements. While adding YAML 
won't impose any additional burden of that kind, because its semantics 
are so close to those of JSON, other output formats well might.

cheers

andrew




Re: YAML Was: CommitFest status/management

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> YAML and JSON are pretty much interchangeable for our purposes. 
> According to Wikipedia, "Both functionally and syntactically, JSON is 
> effectively a subset of YAML." See 
> <http://en.wikipedia.org/wiki/JSON#YAML> So the YAML parsers should be 
> able to handle the JSON output. The only thing we'd be buying with this 
> patch is making a bit happier some people who prefer reading the YAML 
> syntax. For machine readability we'd be gaining precisely nothing.

Hmm.  So the argument for it is "let's make a machine-readable format
more human-readable"?  I'm not getting the point.  People should look
at the regular text output.

> One consideration is this: the more formats we support the dumber the 
> output will be. Already the XML output is arguably dumber than it should 
> be, because XML elements are two-dimensional (they can have property 
> lists (attributes) and child elements) but JSON/YAML nodes are 
> one-dimensional, so we have made some things that one might normally 
> expect to be attributes in XML into child elements. While adding YAML 
> won't impose any additional burden of that kind, because its semantics 
> are so close to those of JSON, other output formats well might.

I tend to look at it the other way around: having to support output
formats that have significantly different data models is a Good Thing
because it forces you to design sufficiently general code mechanisms.
If YAML had yet another data model it might actually be a useful
exercise to get the code to handle that.  However, if it's not teaching
us anything we didn't learn from JSON, there's no gain from that
viewpoint either.
        regards, tom lane


Re: YAML Was: CommitFest status/management

From
Ron Mayer
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> YAML...
> 
> Hmm.  So the argument for it is "let's make a machine-readable format
> more human-readable"?  I'm not getting the point.  People should look
> at the regular text output.

IMHO YAML beats the regular text format for human-readability -
at least for people with narrow terminal windows, and for novices.

Greg posted examples comparing regular-text vs yaml vs json here:
http://archives.postgresql.org/pgsql-hackers/2009-08/msg02090.php

I think it's more human-readable for novices since it explicitly
spells out what values refer to startup values vs totals.
I think it's more human-readable to me because the current text
format frequently wraps for me on even a modestly complex query,
and I find scrolling down easier than scrolling both ways.

None of the other machine-intended formats seem to suit
that purpose well because they're dominated by a lot of markup.

That said, though, it's not that big a deal.




Re: YAML Was: CommitFest status/management

From
Josh Berkus
Date:
All,

If some people want it, and there's no significant maintenance burden
associated with YAML output, then why not?

Mind you, if it was practical, I'd suggest that YAML ... and all
additional Explain formats ... should be a contrib module.  Anything
other than XML and JSON will be fairly marginal.

--Josh Berkus



Re: YAML Was: CommitFest status/management

From
"Joshua D. Drake"
Date:
On Wed, 2009-12-02 at 10:45 -0800, Josh Berkus wrote:
> All,
>
> If some people want it, and there's no significant maintenance burden
> associated with YAML output, then why not?
>
> Mind you, if it was practical, I'd suggest that YAML ... and all
> additional Explain formats ... should be a contrib module.  Anything
> other than XML and JSON will be fairly marginal.

That would be my take... have explain kick out XML (or whatever) and
then parse it into anything you want. That way it isn't additional
burden into core.

Joshua D. Drake


>
> --Josh Berkus
>
>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

Re: YAML Was: CommitFest status/management

From
Brendan Jurd
Date:
2009/12/3 Ron Mayer <rm_pg@cheapcomplexdevices.com>:
> Tom Lane wrote:
>> Hmm.  So the argument for it is "let's make a machine-readable format
>> more human-readable"?  I'm not getting the point.  People should look
>> at the regular text output.
>
> IMHO YAML beats the regular text format for human-readability -
> at least for people with narrow terminal windows, and for novices.

Agreed.  Calling the regular text output of EXPLAIN "human readable"
is an exaggeration.

Cheers,
BJ


Re: YAML Was: CommitFest status/management

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Tom Lane wrote:
>> Hmm.  So the argument for it is "let's make a machine-readable format
>> more human-readable"?  I'm not getting the point.  People should look
>> at the regular text output.

> IMHO YAML beats the regular text format for human-readability -
> at least for people with narrow terminal windows, and for novices.
> Greg posted examples comparing regular-text vs yaml vs json here:
> http://archives.postgresql.org/pgsql-hackers/2009-08/msg02090.php

Mph.  Maybe I've been looking at the traditional format too long,
but I don't find the YAML version better --- it's so verbose that
you could only see a small fraction of a query at a time.

The main strike against the traditional format IME is exactly what
Greg alludes to in that message: it's prone to being rendered totally
unreadable by email line-wrapping.  However, I'm unconvinced that YAML
would be any better; it looks like it's still critically dependent on
the location and amount of whitespace in order to be readable.  The
lines might be a bit shorter on average, but you're still going to hit
a narrow window's right margin pretty quick in any complicated plan.
In any case, the real killer is email clients that feel no obligation to
preserve whitespace layout at all, and this would certainly not look
much better after that treatment.
        regards, tom lane


Re: YAML Was: CommitFest status/management

From
"Joshua D. Drake"
Date:
On Wed, 2009-12-02 at 10:45 -0800, Josh Berkus wrote:
> All,
> 
> If some people want it, and there's no significant maintenance burden
> associated with YAML output, then why not?
> 
> Mind you, if it was practical, I'd suggest that YAML ... and all
> additional Explain formats ... should be a contrib module.  Anything
> other than XML and JSON will be fairly marginal.

That would be my take... have explain kick out XML (or whatever) and
then parse it into anything you want. That way it isn't additional
burden into core.

Joshua D. Drake


> 
> --Josh Berkus
> 
> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander



Re: CommitFest status/management

From
Robert Haas
Date:
On Tue, Dec 1, 2009 at 9:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 1, 2009 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> If we went with Bruce's interpretation, we could have a "committer"
>>> field that only appears when the status is "Claimed by Committer" or
>>> "Committed" and the contents of that field could be displayed in
>>> parentheses in the status column, like this: Claimed by Committer (Tom
>>> Lane).
>>
>>> If we went with Andrew's interpretation, we would need a completely
>>> separate column, because there wouldn't be any logical relationship
>>> between the status field and the committer field.
>>
>>> Any other votes?  Tom?
>>
>> I'm happy with Andrew's interpretation --- I just want a separate text
>> field for inserting a committer's name.  I don't want any magic behavior
>> of that field.
>
> OK, I'll add a separate text field for the committer's name, but for
> now it won't display on the summary page, just the detail page.

Done.

...Robert


Re: CommitFest status/management

From
Andrew Dunstan
Date:

Robert Haas wrote:
>>> I'm happy with Andrew's interpretation --- I just want a separate text
>>> field for inserting a committer's name.  I don't want any magic behavior
>>> of that field.
>>>       
>> OK, I'll add a separate text field for the committer's name, but for
>> now it won't display on the summary page, just the detail page.
>>     
>
> Done.
>
>
>   

Cool. Thanks.

cheers

andrew


Re: ProcessUtility_hook

From
Itagaki Takahiro
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> ... and now that I have, I find at least four highly questionable
> things about it:
>
> 1. The placement of the hook.  Why is it three lines down in
> ProcessUtility?  It's probably reasonable to have the Assert first,
> but I don't see why the hook function should have the ability to
> editorialize on the behavior of everything about ProcessUtility
> *except* the read-only-xact check.

I moved the initialization of completionTag into standard_ProcessUtility.

> 2. The naming and documentation of the added GUC setting for
> pg_stat_statements.  "track_ddl" seems pretty bizarre to me because
> there are many utility statements that no one would call DDL.  COPY,
> for example, is certainly not DDL.  Why not call it "track_utility"?

Ok, fixed.

> 3. The enable-condition test in pgss_ProcessUtility.  Is it really
> appropriate to be gating this by isTopLevel?  I should think that
> the nested_level check in pgss_enabled would be sufficient and
> more likely to do what's expected.

I removed the isTopLevel check. I was worried about auto-generated
utility commands; generated sub commands are called with the same
query string as the top query. Don't it confuse statistics?

> 4. The special case for CopyStmt.  That's just weird, and it adds
> a maintenance requirement we don't need.  I don't see a really good
> argument why COPY (alone among utility statements) deserves to have
> a rowcount tracked by pg_stat_statements, but even if you want that
> it'd be better to rely on examining the completionTag after the fact.
> The fact that the tag is "COPY nnnn" is part of the user-visible API
> for COPY and won't change lightly.  The division of labor between
> ProcessUtility and copy.c is far more volatile, but this patch has
> injected itself into that.

Ok, fixed. I've thought string-based interface is not desirable, but it
should be a stable API. COPY and INSERT/UPDATE/DELETE (used by EXECUTE)
are counted by pg_stat_statements, but EXECUTE SELECT is impossible.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: YAML Was: CommitFest status/management

From
Robert Haas
Date:
On Wed, Dec 2, 2009 at 4:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> Tom Lane wrote:
>>> Hmm.  So the argument for it is "let's make a machine-readable format
>>> more human-readable"?  I'm not getting the point.  People should look
>>> at the regular text output.
>
>> IMHO YAML beats the regular text format for human-readability -
>> at least for people with narrow terminal windows, and for novices.
>> Greg posted examples comparing regular-text vs yaml vs json here:
>> http://archives.postgresql.org/pgsql-hackers/2009-08/msg02090.php
>
> Mph.  Maybe I've been looking at the traditional format too long,
> but I don't find the YAML version better --- it's so verbose that
> you could only see a small fraction of a query at a time.

I've been thinking about this a little more and I think I am going to
vote (if I get a vote) to reject this patch.  I think Andrew hit the
crucial point upthread: the latest version of YAML is a superset of
JSON, which means that the only possible use cases for this patch are:

- people using older YAML parsers that can't handle the latest version
(because if they can handle the latest version then they can just use
that on the existing JSON format), and
- people who want to use the YAML format as a substitute for text
format on grounds of readability.

The first target doesn't seem worth aiming at.  Admittedly the latest
version of the YAML 1.2 spec - whose stated goal is JSON-compatibilty
- is only two months old, so there may be many YAML users who don't
have parsers that are completely JSON-compatible.  But presumably this
problem will resolve itself over time.

With respect to the second one, I am not going to argue that the
current text format is ideal in all ways.  In particular, on complex
plans, I find it difficult to match up the plans for the inner and
outer sides of any given node, which may be far enough apart
vertically that it's difficult to tell exactly which bits are in the
same column.  Furthermore, the main output row for each node is wider
than I would like, especially when using EXPLAIN ANALYZE.  Even on
relatively simple plans, I have to maximize my terminal window to
forestall wrapping, and on complex plans, wrapping is inevitable even
if I do maximize the window.  For all of that, in the nearly-two-years
I've been reading pgsql-hackers, I can't remember one complaint about
the visual presentation of the EXPLAIN output.  It's possible that my
memory is faulty, but I think I would remember if there had been very
many.

On top of that, if you did want YAML for easier readability, what
aspect of the output is more readable in YAML than it is in text
format?  The only answer I can think of is that you like having each
data element on a separate line, so that the plan is much longer but
somewhat narrower.  But if that's what you want, the JSON output is
almost as good - the only difference is a bit of extra punctuation.

...Robert


Re: YAML Was: CommitFest status/management

From
Josh Berkus
Date:
> On top of that, if you did want YAML for easier readability, what
> aspect of the output is more readable in YAML than it is in text
> format?  The only answer I can think of is that you like having each
> data element on a separate line, so that the plan is much longer but
> somewhat narrower.  But if that's what you want, the JSON output is
> almost as good - the only difference is a bit of extra punctuation.

"almost as good" ... I agree with Kevin that it's more readable.

The whole patch just adds 144 lines.  It doesn't look to me like there's
significant maintenance burden involved, but of course I need to defer
to the more experienced.  It's even possible that we could reduce the
size of the patch still further if we really looked at it as just a
differently punctuated JSON.

Having compared the JSON and YAML output formats, I think having YAML as
a 2nd human-readable format might be valuable, even though it adds
nothing to machine-processing.

Again, if there were a sensible way to do YAML as a contrib module, I'd
go for that, but there isn't.

--Josh Berkus


Re: YAML Was: CommitFest status/management

From
Robert Haas
Date:
On Fri, Dec 4, 2009 at 7:42 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On top of that, if you did want YAML for easier readability, what
>> aspect of the output is more readable in YAML than it is in text
>> format?  The only answer I can think of is that you like having each
>> data element on a separate line, so that the plan is much longer but
>> somewhat narrower.  But if that's what you want, the JSON output is
>> almost as good - the only difference is a bit of extra punctuation.
>
> "almost as good" ... I agree with Kevin that it's more readable.
>
> The whole patch just adds 144 lines.  It doesn't look to me like there's
> significant maintenance burden involved, but of course I need to defer
> to the more experienced.  It's even possible that we could reduce the
> size of the patch still further if we really looked at it as just a
> differently punctuated JSON.
>
> Having compared the JSON and YAML output formats, I think having YAML as
> a 2nd human-readable format might be valuable, even though it adds
> nothing to machine-processing.
>
> Again, if there were a sensible way to do YAML as a contrib module, I'd
> go for that, but there isn't.

I don't think the maintenance burden is the issue, per se.  It's more
that I don't like the idea of putting in a bunch of formats that are
only trivially different from each other, and if there were ever a
case where we should reject a new format because it is too similar to
an existing one, this seems to be it.  If that's a bad reason for
rejecting a new format, then I don't have a second one, but we may end
up with a lot of formats - and that WILL be a maintenance burden,
especially if we ever want to make non-trivial extensions to the
output format.

Frankly, just adding new fields (perhaps controlled by new options) is
never going to be that big of a deal, but there will certainly come a
day when someone wants to do something more novel, like dumping
parse-tree representations of expressions or something along the line
of Tom Raney's visual explain tool, which dumped out every path the
planner considered.  I don't really want to be the person who has to
tell the person who writes that patch "sorry, but we have to reject
your patch until it supports every one of our numerous slightly
different output formats".

One possibility for contrib-module-izing this, and perhaps other
output formats that people might like to see, is to write a function
that takes the JSON or XML output as input and does the appropriate
translation.  For something like YAML, whose semantics are so close to
JSON, this should be pretty simple.  One of the reasons why I was hot
to get JSON support into the initial version of machine-readable
EXPLAIN is because JSON maps very cleanly onto the type of data
structures that are common in scripting languages: everything is lists
and hashes, nested inside each other, and text and numeric scalars.
So you can read a JSON object into a script written in Perl, PHP,
Python, Ruby, JavaScript, and probably half a dozen other languages
and get a native object.  From there, it's very easy to write the data
back out in whatever format you happen to prefer just by walking the
data structure.  I suspect that a JSON-to-YAML converter in Perl would
be less than 50 lines.

(The XML format can also be transformed using things like XSLT, but
I'm less familiar with those tools.)

...Robert


Re: YAML Was: CommitFest status/management

From
Ron Mayer
Date:
Josh Berkus wrote:
>> ... YAML for easier readability ...
> 
> "almost as good" ... I agree with Kevin that it's more readable.
> 
> Again, if there were a sensible way to do YAML as a contrib module, I'd
> go for that, but there isn't.

Perhaps that's be a direction this could take?   What would
it take for this feature to be a demo/example for some future
modules system.

It seems like there have been a few recent features related
to decorating output (UTF8 tables, YAML explain, \d... updates).

While there's no great way to make this a contrib module today,
would it make sense to add such hooks for an eventual module system?





Re: YAML Was: CommitFest status/management

From
Euler Taveira de Oliveira
Date:
Ron Mayer escreveu:
> While there's no great way to make this a contrib module today,
> would it make sense to add such hooks for an eventual module system?
> 
I don't think so. It's easier to code a converter tool.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: YAML Was: CommitFest status/management

From
Itagaki Takahiro
Date:
Josh Berkus <josh@agliodbs.com> wrote:

> Having compared the JSON and YAML output formats, I think having YAML as
> a 2nd human-readable format might be valuable, even though it adds
> nothing to machine-processing.

Sure. YAML is human readable and can print more information that is
too verbose in one-line text format. +1 to add YAML format to EXPLAIN.

> Again, if there were a sensible way to do YAML as a contrib module, I'd
> go for that, but there isn't.

Some ideas to export formatters:1. Provides register_explain_format(name, handler) function.   Contrib modules can
registertheir handlers when LOADed.
 
2. Provides a method to specify a filter function. XML output is   passed to the function and probably converted with
XSLT.

BTW, an extensible formatter is surely useful, but what will we do about
XML and JSON formats when we have it? Will we remove them from core?
If we have a plan to the direction, we should complete it before 8.5.0
release. We will be hard to revert them after the release.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: YAML Was: CommitFest status/management

From
Robert Haas
Date:
On Sun, Dec 6, 2009 at 8:34 PM, Itagaki Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Josh Berkus <josh@agliodbs.com> wrote:
>
>> Having compared the JSON and YAML output formats, I think having YAML as
>> a 2nd human-readable format might be valuable, even though it adds
>> nothing to machine-processing.
>
> Sure. YAML is human readable and can print more information that is
> too verbose in one-line text format. +1 to add YAML format to EXPLAIN.
>
>> Again, if there were a sensible way to do YAML as a contrib module, I'd
>> go for that, but there isn't.
>
> Some ideas to export formatters:
>  1. Provides register_explain_format(name, handler) function.
>    Contrib modules can register their handlers when LOADed.
>
>  2. Provides a method to specify a filter function. XML output is
>    passed to the function and probably converted with XSLT.
>
> BTW, an extensible formatter is surely useful, but what will we do about
> XML and JSON formats when we have it? Will we remove them from core?
> If we have a plan to the direction, we should complete it before 8.5.0
> release. We will be hard to revert them after the release.

I sent a previous message about this, but just realized I forgot to
copy the list.  I thought about the possibility of a pluggable system
for formats when I wrote the original machine-readable explain patch.
It seemed to me at the time that if we went this route any contrib
module would have to duplicate significant portions of explain.c, no
matter where we put the hooks.  We'd presumably feel obliged to update
the contrib module when making any changes to explain.c, so I think
the maintenance burden would actually be higher that way than with
everything in core.

The main point here for me is that the JSON format is already
parseable by YAML parsers, and can probably be turned into YAML using
a very short Perl script - possibly even using a sed script.  I think
that it's overkill to support two formats that are that similar.

Tallying up the votes on this patch, we have Tom Lane, Andrew Dunstan,
Josh Drake, and myself arguing against including this in core, and
Josh Berkus and Itagaki Takahiro arguing for it.  Ron Mayer seemed
somewhat in favor of it in his message on-list but sent a message
off-list taking the opposite position.  Brendan Jurd cast aspersions
on the human-readability of the text format but didn't explicitly take
a position on the core issue, and Euler Taveira de Oliveira suggested
that writing a converter would be easier than writing a module
framework (which I think is almost certainly true) but also didn't
explicitly take a position on the core issue.

Overall, that sounds to me like a slight preponderance of "no" votes.
Anyone think I've mis-tallied or want to weigh in?

...Robert


Re: YAML Was: CommitFest status/management

From
Greg Smith
Date:
Robert Haas wrote:
> The main point here for me is that the JSON format is already
> parseable by YAML parsers, and can probably be turned into YAML using
> a very short Perl script - possibly even using a sed script.  I think
> that it's overkill to support two formats that are that similar.
>   
It's not the case that JSON can be turned into YAML or that it just 
happens that it can be parsed by YAML parsers.  While there was some 
possible divergence in earlier versions, a JSON 1.2 document *is* in 
YAML format already.  JSON is actually a subset of YAML that uses one of 
the many possible YAML styles--basically, YAML accepts anything in JSON 
format, along with others.  This means that by providing JSON output, 
we've *already* provided YAML output, too.  Just not the nice looking 
output people tend to associate with YAML.

Accordingly, there is really no basis for this patch to exist from the 
perspective of helping a typical tool author.  If you want to parse YAML 
robustly, you're going to grab someone's parsing library to do it rather 
than writing it yourself, and if you do that it will accept the existing 
JSON output just fine too.  Basically this patch lives or dies by 
whether it looks so much nicer to people as to justify its code weight.

Given the above, I don't understand why writing this patch was deemed 
worthwhile in the first place, but I hate to tell people they can't have 
something they find visually appealing just because I don't think it's 
an improvement.  Consider me a neutral vote, although I suspect the 
above may sway some people who were on the fence toward disapproval.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: YAML Was: CommitFest status/management

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> Given the above, I don't understand why writing this patch was deemed 
> worthwhile in the first place,

It was written and submitted by one person who did not bother to ask
first whether anyone else thought it was worthwhile.  So its presence
on the CF list should not be taken as evidence that there's consensus
for it.
        regards, tom lane


Re: YAML Was: CommitFest status/management

From
Itagaki Takahiro
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It was written and submitted by one person who did not bother to ask
> first whether anyone else thought it was worthwhile.  So its presence
> on the CF list should not be taken as evidence that there's consensus
> for it.

Should we have "Needs Discussion" phase before "Needs Review" ?
Reviews, including me, think patches with needs-review status are
worthwhile. In contrast, contributers often register their patches
to CF without discussions just because of no response; they cannot
find whether no response is silent approval or not.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: YAML Was: CommitFest status/management

From
Hitoshi Harada
Date:
2009/12/7 Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> It was written and submitted by one person who did not bother to ask
>> first whether anyone else thought it was worthwhile.  So its presence
>> on the CF list should not be taken as evidence that there's consensus
>> for it.
>
> Should we have "Needs Discussion" phase before "Needs Review" ?
> Reviews, including me, think patches with needs-review status are
> worthwhile. In contrast, contributers often register their patches
> to CF without discussions just because of no response; they cannot
> find whether no response is silent approval or not.

+1. Sometimes a reviewer waits for the consensus in the community when
someone else waits for review (, because it is marked as "Needs
Review").

Regards,


--
Hitoshi Harada


Re: YAML Was: CommitFest status/management

From
Dimitri Fontaine
Date:
Hi,

Greg Smith <greg@2ndquadrant.com> writes:
> Robert Haas wrote:
>> The main point here for me is that the JSON format is already
>> parseable by YAML parsers, and can probably be turned into YAML using
>> a very short Perl script - possibly even using a sed script.  I think
>> that it's overkill to support two formats that are that similar.
>>   
> It's not the case that JSON can be turned into YAML or that it just happens
> that it can be parsed by YAML parsers.  While there was some possible
> divergence in earlier versions, a JSON 1.2 document *is* in YAML format
> already.  JSON is actually a subset of YAML that uses one of the many
> possible YAML styles--basically, YAML accepts anything in JSON format, along
> with others.  This means that by providing JSON output, we've *already*
> provided YAML output, too.  Just not the nice looking output people tend to
> associate with YAML.

Well we have JSON and agreed it was a good idea to have it. Now JSON is
a subset of YAML and some would prefer another YAML style (me included).

If the problem is supporting 2 formats in core rather than 3, what about
replacing the current JSON support with the YAML one?

At a later point we could even have JSON support back by having the YAML
printer able to output different YAML styles, but I guess that's not
where we are now.

Vote: +1 for YAML even if that means dropping JSON.

Regards,
-- 
dim


Re: YAML Was: CommitFest status/management

From
Peter Eisentraut
Date:
On mån, 2009-12-07 at 17:14 +0900, Hitoshi Harada wrote:
> 2009/12/7 Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
> >
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >> It was written and submitted by one person who did not bother to ask
> >> first whether anyone else thought it was worthwhile.  So its presence
> >> on the CF list should not be taken as evidence that there's consensus
> >> for it.
> >
> > Should we have "Needs Discussion" phase before "Needs Review" ?
> > Reviews, including me, think patches with needs-review status are
> > worthwhile. In contrast, contributers often register their patches
> > to CF without discussions just because of no response; they cannot
> > find whether no response is silent approval or not.
> 
> +1. Sometimes a reviewer waits for the consensus in the community when
> someone else waits for review (, because it is marked as "Needs
> Review").

Yes, I would have had use for this myself a couple of times.



Re: YAML Was: CommitFest status/management

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----                              
Hash: RIPEMD160                                                 


> Tallying up the votes on this patch

First, I would hope that you don't overlook the author of the patch (me) 
as an "aye" vote. :) Additionally, if you are going to tally, please     
consider the positive responses from people on -hackers and -general     
when the patch was first posted and there was a question about what      
the format would look like.                                              

YAML is designed to be machine-parseable (just like XML) *AND* 
human-readable (obstensibly like our current output). JSON tries 
to overcome the shortcomings of XML, but doesn't quite get it right.
YAML does: it's small, intuitive, and only as verbose as needed. Human
readable, yet structured enough to be read by machines as well.

As far as the arguments for making an extensible system for supporting
other formats, but I think this is putting the cart before the horse.
While this feature is only in cvs at the moment, I've not heard a
single person state another format they would like to see. I cannot
think of one myself (machine parseable anyway: I'd love to see an
"abbreviated" version that takes out the explain bits of explain
analyze and changes rows= to R=, loops= to L=, etc.).

This is a small patch, easy to maintain, and useful to more people
than just myself, judging from the responses to the earlier thread.

ObPartingShot: I hope that those in the "machine readable output
should never be seen by human eyes" camp will support my upcoming
patch to remove all extra whitespace, including newlines, from the
XML format. :)

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200912070727
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAksc9QMACgkQvJuQZxSWSshJXwCfeHrspp39mqGJD5Z86121F1Ap
VJEAoMIXrXe8/vS5rPnzyfbX04fJjp36
=H5BM
-----END PGP SIGNATURE-----




Re: YAML Was: CommitFest status/management

From
Florian Weimer
Date:
* Dimitri Fontaine:

> Well we have JSON and agreed it was a good idea to have it. Now JSON is
> a subset of YAML and some would prefer another YAML style (me included).

YAML is rather difficult to parse, and most parsers assume a trusted
document source because they support arbitrary class instantiations
(and it's not possible to switch this off, or it's rather involved to
do so).

Plain JSON doesn't have this issue.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99


Re: YAML Was: CommitFest status/management

From
Tom Lane
Date:
Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It was written and submitted by one person who did not bother to ask
>> first whether anyone else thought it was worthwhile.  So its presence
>> on the CF list should not be taken as evidence that there's consensus
>> for it.

> Should we have "Needs Discussion" phase before "Needs Review" ?
> Reviews, including me, think patches with needs-review status are
> worthwhile. In contrast, contributers often register their patches
> to CF without discussions just because of no response; they cannot
> find whether no response is silent approval or not.

Hm, I guess the question would be: what is the condition for getting
out of that state?  It's clear who is supposed to move a patch out of
'Needs Review', 'Waiting for Author', or 'Ready for Committer'
respectively.  I don't know who's got the authority to decide that
something has or has not achieved community consensus.

Right at the moment we handle this sort of problem in a very informal
way, but if it's going to become part of the commitfest state for a
patch I think we need to be a bit less informal.
        regards, tom lane


Re: YAML Was: CommitFest status/management

From
Greg Smith
Date:
Dimitri Fontaine wrote:
> If the problem is supporting 2 formats in core rather than 3, what about
> replacing the current JSON support with the YAML one?
>   
That's a step backwards.  By providing JSON format, we've also satisfied 
people who want YAML.  Ripping out JSON would mean we *only* support 
YAML.  There are far many more JSON parsers than YAML parsers, which is 
why I thought the current code committed was good enough.

Anyway, the fact that I have a technical opinion suggests to me I've 
caught up with the discussion now, so let's talk about where we're at.  
I think that the ongoing discussion here is likely to consume more 
resources than the expected future maintenance of this small bit of 
code.  I believe the following to be true:

-The patch is small to apply
-It would also be easy to remove in the future should a more modular 
EXPLAIN implementation replace it
-Having one more "legacy" format to satisfy would actually improve the 
odds that a future modular EXPLAIN would be robustly designed
-There is no way a modular explain will be written any time soon
-While it's questionable whether it's strictly a majority on voting 
here, it's close, which suggests there is plenty of support for wanting 
this feature
-Since nothing is removed the people who aren't in favor of this format 
aren't negatively impacted by it being committed

All that suggests to me that we've cleared the usual "do we want it?" 
hurdles that would normally go along with deciding whether a patch 
should go to a committer or not.  That leaves code quality then.  Are 
there any open issues with that?  There's some notes about line-breaks 
in the CF app.  Once we have a patch with those issues resolved, this 
should go to a committer for final veto power on its inclusion, and then 
we're done here.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: YAML Was: CommitFest status/management

From
Ron Mayer
Date:
Greg Smith wrote:
> That's a step backwards.  By providing JSON format, we've also satisfied
> people who want YAML.  Ripping out JSON would mean we *only* support
> YAML.  There are far many more JSON parsers than YAML parsers, which is
> why I thought the current code committed was good enough.

XML parsers are common enough IMHO the other computer readable formats
can't be that important from a computer-readability perspective, leaving
their main benefit as being human friendly.

I like YAML output; but I think the most compelling arguments against the
patch are that if so many people want different formats it may be a good
use case for external modules.  And far more than yaml output, I'd like
to see a flexible module system with an equivalent of "cpan install yaml"
or "gem install yaml".


I suppose one could argue that instead of YAML we design a different
human-oriented format for loosely structured data; but that seems
even harder.



Re: YAML Was: CommitFest status/management

From
Alvaro Herrera
Date:
Florian Weimer escribió:
> * Dimitri Fontaine:
> 
> > Well we have JSON and agreed it was a good idea to have it. Now JSON is
> > a subset of YAML and some would prefer another YAML style (me included).
> 
> YAML is rather difficult to parse, and most parsers assume a trusted
> document source because they support arbitrary class instantiations
> (and it's not possible to switch this off, or it's rather involved to
> do so).

That's irrelevant because EXPLAIN YAML is already a trusted document
source.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: YAML Was: CommitFest status/management

From
Alvaro Herrera
Date:
Robert Haas escribió:

> Tallying up the votes on this patch, we have Tom Lane, Andrew Dunstan,
> Josh Drake, and myself arguing against including this in core, and
> Josh Berkus and Itagaki Takahiro arguing for it.  Ron Mayer seemed
> somewhat in favor of it in his message on-list but sent a message
> off-list taking the opposite position.  Brendan Jurd cast aspersions
> on the human-readability of the text format but didn't explicitly take
> a position on the core issue,

Please add my vote for YAML, because of the human-unreadability of the
default text format.  In fact I'd argue we could rip out the default
text format and replace it with this one.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: YAML Was: CommitFest status/management

From
"Joshua D. Drake"
Date:
On Mon, 2009-12-07 at 14:52 -0300, Alvaro Herrera wrote:
> Robert Haas escribió:
>
> > Tallying up the votes on this patch, we have Tom Lane, Andrew Dunstan,
> > Josh Drake, and myself arguing against including this in core, and
> > Josh Berkus and Itagaki Takahiro arguing for it.  Ron Mayer seemed
> > somewhat in favor of it in his message on-list but sent a message
> > off-list taking the opposite position.  Brendan Jurd cast aspersions
> > on the human-readability of the text format but didn't explicitly take
> > a position on the core issue,
>
> Please add my vote for YAML, because of the human-unreadability of the
> default text format.  In fact I'd argue we could rip out the default
> text format and replace it with this one.

That would almost correct. My standing is we should have one format that
is parsed. The current "text" explain output certainly does not qualify.
The XML one or YAML one does.

Pick one, then parse it into whatever you want from the client.

Sincerely,

Joshua D. Drake



--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

Re: YAML Was: CommitFest status/management

From
"David E. Wheeler"
Date:
On Dec 7, 2009, at 9:52 AM, Alvaro Herrera wrote:

>> Tallying up the votes on this patch, we have Tom Lane, Andrew Dunstan,
>> Josh Drake, and myself arguing against including this in core, and
>> Josh Berkus and Itagaki Takahiro arguing for it.  Ron Mayer seemed
>> somewhat in favor of it in his message on-list but sent a message
>> off-list taking the opposite position.  Brendan Jurd cast aspersions
>> on the human-readability of the text format but didn't explicitly take
>> a position on the core issue,
> 
> Please add my vote for YAML, because of the human-unreadability of the
> default text format.  In fact I'd argue we could rip out the default
> text format and replace it with this one.

+1

David


Re: YAML

From
Josh Berkus
Date:
All,

What it's sounding like is that we ought to have a plug-in (both for
postmaster and for psql) which allows the calling of an external library
to parse explain output.  Then people could covert XML/JSON into
whatever they wanted.

--Josh Berkus



Re: YAML

From
Andrew Dunstan
Date:

Josh Berkus wrote:
> All,
>
> What it's sounding like is that we ought to have a plug-in (both for
> postmaster and for psql) which allows the calling of an external library
> to parse explain output.  Then people could covert XML/JSON into
> whatever they wanted.
>
>
>   

Not everything is sanely convertible into some sort of plugin. A plugin 
mechanism for this would be FAR more trouble that it is worth, IMNSHO.

We are massively over-egging this pudding (as a culinary blogger you 
should appreciate this analogy).

cheers

andrew


Re: YAML

From
Greg Smith
Date:
Josh Berkus wrote:
> What it's sounding like is that we ought to have a plug-in (both for
> postmaster and for psql) which allows the calling of an external library
> to parse explain output.  Then people could covert XML/JSON into
> whatever they wanted.
>   
It would be kinder to just reject the YAML patch than to make it wait 
for this.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: YAML

From
Josh Berkus
Date:
> Not everything is sanely convertible into some sort of plugin. A plugin
> mechanism for this would be FAR more trouble that it is worth, IMNSHO.
> 
> We are massively over-egging this pudding (as a culinary blogger you
> should appreciate this analogy).

OK, then let's just accept it.  It's small, has a maintainer, is useful
to some people, and doesn't create any wierd complications.  I think,
given the knowledge that YAML is now a subdialect of JSON it could
potentially be made smaller, but I can't say how at the moment.

--Josh Berkus


Re: YAML

From
Andrew Dunstan
Date:

Josh Berkus wrote:
>> Not everything is sanely convertible into some sort of plugin. A plugin
>> mechanism for this would be FAR more trouble that it is worth, IMNSHO.
>>
>> We are massively over-egging this pudding (as a culinary blogger you
>> should appreciate this analogy).
>>     
>
> OK, then let's just accept it.  It's small, has a maintainer, is useful
> to some people, and doesn't create any wierd complications.  I think,
> given the knowledge that YAML is now a subdialect of JSON it could
> potentially be made smaller, but I can't say how at the moment.
>
>
>   

Actually, it's the other way, JSON is a subset of YAML.

I was in fact prepared to commit this patch, despite some significant 
misgivings about its wisdom, mainly because it does have such a low 
impact. But then other people raised objections. I'm not sure how strong 
those objections are, though.

I must say that while the YAML output might look a bit nicer than the 
JSON output, the difference strikes me as mostly marginal. But I guess 
it's like beauty and obscenity, something in the eye of the beholder. De 
gustibus non est disputandum.

cheers

andrew





Re: YAML

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I was in fact prepared to commit this patch, despite some significant 
> misgivings about its wisdom, mainly because it does have such a low 
> impact. But then other people raised objections. I'm not sure how strong 
> those objections are, though.

The "lite" version posted by Itagaki-san on 11/30 seems short enough
that maybe we should just stop arguing and apply it.  There were some
other versions that fooled around with existing logic, which I was a lot
less happy about because of the difficulty of being sure that nothing
was broken.

I definitely don't think we should get involved with trying to create
support for plugin formatters or anything like that --- the amount of
effort required seems far out of proportion to the benefit.
        regards, tom lane


Re: YAML Was: CommitFest status/management

From
"Joshua D. Drake"
Date:
On Mon, 2009-12-07 at 14:52 -0300, Alvaro Herrera wrote:
> Robert Haas escribió:
> 
> > Tallying up the votes on this patch, we have Tom Lane, Andrew Dunstan,
> > Josh Drake, and myself arguing against including this in core, and
> > Josh Berkus and Itagaki Takahiro arguing for it.  Ron Mayer seemed
> > somewhat in favor of it in his message on-list but sent a message
> > off-list taking the opposite position.  Brendan Jurd cast aspersions
> > on the human-readability of the text format but didn't explicitly take
> > a position on the core issue,
> 
> Please add my vote for YAML, because of the human-unreadability of the
> default text format.  In fact I'd argue we could rip out the default
> text format and replace it with this one.

That would almost correct. My standing is we should have one format that
is parsed. The current "text" explain output certainly does not qualify.
The XML one or YAML one does.

Pick one, then parse it into whatever you want from the client.

Sincerely,

Joshua D. Drake



-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander



Re: YAML

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I was in fact prepared to commit this patch, despite some significant 
>> misgivings about its wisdom, mainly because it does have such a low 
>> impact. But then other people raised objections. I'm not sure how strong 
>> those objections are, though.
>>     
>
> The "lite" version posted by Itagaki-san on 11/30 seems short enough
> that maybe we should just stop arguing and apply it.  There were some
> other versions that fooled around with existing logic, which I was a lot
> less happy about because of the difficulty of being sure that nothing
> was broken.
>   

Well, I guess he can commit it himself now ;-)

> I definitely don't think we should get involved with trying to create
> support for plugin formatters or anything like that --- the amount of
> effort required seems far out of proportion to the benefit.
>         
>   

right.

cheers

andrew


Re: YAML Was: CommitFest status/management

From
Florian Weimer
Date:
* Alvaro Herrera:

> Florian Weimer escribió:
>> * Dimitri Fontaine:
>>
>> > Well we have JSON and agreed it was a good idea to have it. Now JSON is
>> > a subset of YAML and some would prefer another YAML style (me included).
>>
>> YAML is rather difficult to parse, and most parsers assume a trusted
>> document source because they support arbitrary class instantiations
>> (and it's not possible to switch this off, or it's rather involved to
>> do so).
>
> That's irrelevant because EXPLAIN YAML is already a trusted document
> source.

Uhm, really?  Do I have to expect code execution on the client when I
connect to a database?  I hope not.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99


Re: YAML

From
Tim Bunce
Date:
On Mon, Dec 07, 2009 at 07:07:13PM -0500, Andrew Dunstan wrote:
>
>
> Josh Berkus wrote:
>>> Not everything is sanely convertible into some sort of plugin. A plugin
>>> mechanism for this would be FAR more trouble that it is worth, IMNSHO.
>>>
>>> We are massively over-egging this pudding (as a culinary blogger you
>>> should appreciate this analogy).
>>>     
>>
>> OK, then let's just accept it.  It's small, has a maintainer, is useful
>> to some people, and doesn't create any wierd complications.  I think,
>> given the knowledge that YAML is now a subdialect of JSON it could
>> potentially be made smaller, but I can't say how at the moment.
>
> Actually, it's the other way, JSON is a subset of YAML.

I've no contribution to the main topic, but I'd like to point out that
the "JSON is a subset of YAML" meme is not without controversy:
 http://search.cpan.org/~mlehmann/JSON-XS-2.26/XS.pm#JSON_and_YAML

It may not be relevant in your use-case, but I thought it worth a mention.

Tim.


Re: YAML

From
Andrew Dunstan
Date:

Tim Bunce wrote:
>
> I've no contribution to the main topic, but I'd like to point out that
> the "JSON is a subset of YAML" meme is not without controversy:
>
>   http://search.cpan.org/~mlehmann/JSON-XS-2.26/XS.pm#JSON_and_YAML
>
> It may not be relevant in your use-case, but I thought it worth a mention.
>
>
>   

Ouch. Thanks for bringing it to our attention.

Well, if we're going to commit this, as now appears likely, we should 
have some language lawyers go over our code for both YAML and JSON with 
a fine tooth comb to make sure what we are producing is strictly 
According To Hoyle.

cheers

andrew


Re: YAML

From
Robert Haas
Date:
On Tue, Dec 8, 2009 at 9:13 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Well, if we're going to commit this, as now appears likely, we should have
> some language lawyers go over our code for both YAML and JSON with a fine
> tooth comb to make sure what we are producing is strictly According To
> Hoyle.

+1.  I'm a little concerned about the bit about the YAML specification
changing, too, but at least if we can ensure that we're compliant with
the spec that is current at the time the code goes in we have a leg to
stand on.

...Robert


Re: YAML

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> +1.  I'm a little concerned about the bit about the YAML specification
> changing, too, but at least if we can ensure that we're compliant with
> the spec that is current at the time the code goes in we have a leg to
> stand on.

If the spec is in flux, that seems like More Than Sufficient reason
to reject the patch for the time being.  It can be resubmitted when
it's no longer shooting at a moving target.
        regards, tom lane


Re: YAML

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> If the spec is in flux, that seems like More Than Sufficient reason
> to reject the patch for the time being.  It can be resubmitted when
> it's no longer shooting at a moving target.

Saying that it is in flux is a bit of a stretch. Even if it were, the
parts that do change are nothing that will affect us. We're doing
dirt-simple YAML (and JSON) generation. Basically, 'name: value' pairs
plus some list building via indents and dashes. I'm completely not
worried about our usage ever falling afoul of future YAML or JSON
spec changes.

(This goes for the person on this list concerned about the output
being too hard to parse. Yes, YAML has lots of tiny corner cases
and elaborate syntax, but we're not using any of those, so parsing
should be quite possible for any YAML parser out there).

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200912081104
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkseeZQACgkQvJuQZxSWSsjNlACg3j5zNPnGzNiXtRG0r9OZnlY3
qjkAoOvzcq+S9qLGQIMbZ0BH55P+TtH/
=icE3
-----END PGP SIGNATURE-----




Re: YAML Was: CommitFest status/management

From
Takahiro Itagaki
Date:
Can I ask the final decision whether the YAML formatter should be
applied or rejected?  As far as I read the discussion, we can apply it
because serveral users want it and we don't have a plan to support
extensible formatters in the core.

Greg Smith <greg@2ndquadrant.com> wrote:
> -The patch is small to apply
> -Having one more "legacy" format to satisfy would actually improve the 
> odds that a future modular EXPLAIN would be robustly designed
> -While it's questionable whether it's strictly a majority on voting 
> here, it's close, which suggests there is plenty of support for wanting 
> this feature
> -Since nothing is removed the people who aren't in favor of this format 
> aren't negatively impacted by it being committed

Comments from additional discussion:
 - YAML format is the second (or the best for some people)   human-unreadabe format for EXPLAIN output.
 - JSON is not always a subset of YAML. It is not recommended to   generate/parse YAML with a JSON generator/parser or
viceversa.
 
 - We won't add any core hooks and plug-in to format EXPLAIN output.   Client tools may convert the output if needed.
(ex.using XLST)
 
 - The spec of YAML might be changed in the feature, but we use   only the basic parts. The parts will not be changed.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: YAML Was: CommitFest status/management

From
Greg Smith
Date:
Takahiro Itagaki wrote:
> Can I ask the final decision whether the YAML formatter should be
> applied or rejected?  As far as I read the discussion, we can apply it
> because serveral users want it and we don't have a plan to support
> extensible formatters in the core.
>   
The path I thought made sense at this point was to mark the patch ready 
for a committer, since it sounds like everyone is done with it now, and 
have another committer besides yourself do a final review as part of 
that.  At this point, I think we've justified the feature and confirmed 
the feature works.  Given the controversy, I think another set of eyes 
to make sure it's not going to be a maintenance headache moving forward 
should (as usual) have the final say on whether the code goes in or not, 
because that's only drawback to it left to committing it I see at this 
point.

To be clear about which version we're talking about:  
http://archives.postgresql.org/message-id/20091130123456.4A03.52131E4D@oss.ntt.co.jp 
is the candidate for commit that includes the cleanup you've already done.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: YAML Was: CommitFest status/management

From
Andrew Dunstan
Date:

Greg Smith wrote:
> Takahiro Itagaki wrote:
>> Can I ask the final decision whether the YAML formatter should be
>> applied or rejected?  As far as I read the discussion, we can apply it
>> because serveral users want it and we don't have a plan to support
>> extensible formatters in the core.
>>   
> The path I thought made sense at this point was to mark the patch 
> ready for a committer, since it sounds like everyone is done with it 
> now, and have another committer besides yourself do a final review as 
> part of that. 

That brings us back to where this conversation started ;-) I'll pick it up.

cheers

andrew


Re: ProcessUtility_hook

From
Greg Smith
Date:
It looks like the last round of issues on this patch only came from 
Tom's concerns, so I'm not sure if anyone but Tom (or a similarly picky 
alternate committer) is going to find anything else in a re-review.  It 
looks to me like all the issues were sorted out anyway.  Euler, you're 
off the hook for this one; it looks "ready for committer" to me.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: YAML Was: CommitFest status/management

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> To be clear about which version we're talking about:  
> http://archives.postgresql.org/message-id/20091130123456.4A03.52131E4D@oss.ntt.co.jp 
> is the candidate for commit that includes the cleanup you've already done.

I suppose this is subject to the same auto_explain problem already
noticed for JSON, but I would suggest that we commit this first and
then fix both together, rather than create merge issues.
        regards, tom lane


Re: ProcessUtility_hook

From
Robert Haas
Date:
On Tue, Dec 8, 2009 at 10:12 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> It looks like the last round of issues on this patch only came from Tom's
> concerns, so I'm not sure if anyone but Tom (or a similarly picky alternate
> committer) is going to find anything else in a re-review.  It looks to me
> like all the issues were sorted out anyway.  Euler, you're off the hook for
> this one; it looks "ready for committer" to me.

Since Itagaki Takahiro is now a committer, I sort of assumed he would
be committing his own patches.  I am not really sure what the
etiquette is in this area, but there seems to be an implication here
someone else will be committing this, which isn't necessarily my
understanding of how it works.  Certainly, unless someone has a
contrary opinion, I think he can go ahead if he wishes.  On the other
hand, if it's helpful, I'm more than happy to pick up this one and/or
EXPLAIN BUFFERS for final review and commit.

...Robert


Re: ProcessUtility_hook

From
Greg Smith
Date:
Robert Haas wrote:
> Since Itagaki Takahiro is now a committer, I sort of assumed he would
> be committing his own patches.

Maybe, but I wasn't going to be the one to suggest that Tom get cut out 
of the loop after he raised a list of issues with the patch already. 

I think the situation for EXPLAIN BUFFERS is much simpler, given that 
the last round of feedback was only quibbling over the line formatting 
and docs.  What I think is a reasonable general guideline is to route 
submitted patches back to their author to commit only when the patch has 
been recently free of code issues during its review.  If we've already 
had another committer chime in with concerns, they should probably 
confirm themselves that the updated version is suitable to commit, and 
do so instead of the author.  That just seems like a reasonable 
risk-reduction workflow approach to me, similar to how the "sign-off" 
practice works on some other projects.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: ProcessUtility_hook

From
Robert Haas
Date:
Why does this patch #ifdef out the _PG_fini code in pg_stat_statements?

Where you check for INSERT, UPDATE, and DELETE return codes in
pgss_ProcessUtility, I think this deserves a comment explaining that
these could occur as a result of EXECUTE.  It wasn't obvious to me,
anyway.

It seems to me that the current hook placement does not address this complaint

>> 1. The placement of the hook.  Why is it three lines down in
>> ProcessUtility?  It's probably reasonable to have the Assert first,
>> but I don't see why the hook function should have the ability to
>> editorialize on the behavior of everything about ProcessUtility
>> *except* the read-only-xact check.

...Robert


Re: ProcessUtility_hook

From
Takahiro Itagaki
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> Why does this patch #ifdef out the _PG_fini code in pg_stat_statements?

That's because _PG_fini is never called in current releases.
We could remove it completely, but I'd like to leave it for future
releases where _PG_fini callback is re-enabled.
Or, removing #ifdef (enabling _PG_fini function) is also harmless.

> Where you check for INSERT, UPDATE, and DELETE return codes in
> pgss_ProcessUtility, I think this deserves a comment explaining that
> these could occur as a result of EXECUTE.  It wasn't obvious to me,
> anyway.

Like this?
/** Parse command tag to retrieve the number of affected rows.* COPY command returns COPY tag. EXECUTE command might
returnINSERT,* UPDATE, or DELETE tags, but we cannot retrieve the number of rows* for SELECT. We assume other commands
alwaysreturn 0 row.*/
 

> It seems to me that the current hook placement does not address this complaint
> >> I don't see why the hook function should have the ability to
> >> editorialize on the behavior of everything about ProcessUtility
> >> *except* the read-only-xact check.

I moved the initialization code of completionTag as the comment,
but check_xact_readonly() should be called before the hook.
Am I missing something?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: ProcessUtility_hook

From
Robert Haas
Date:
On Wed, Dec 9, 2009 at 9:33 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Why does this patch #ifdef out the _PG_fini code in pg_stat_statements?
>
> That's because _PG_fini is never called in current releases.
> We could remove it completely, but I'd like to leave it for future
> releases where _PG_fini callback is re-enabled.
> Or, removing #ifdef (enabling _PG_fini function) is also harmless.

I guess my vote is for leaving it alone, but I might not know what I'm
talking about.  :-)

>> Where you check for INSERT, UPDATE, and DELETE return codes in
>> pgss_ProcessUtility, I think this deserves a comment explaining that
>> these could occur as a result of EXECUTE.  It wasn't obvious to me,
>> anyway.
>
> Like this?
> /*
>  * Parse command tag to retrieve the number of affected rows.
>  * COPY command returns COPY tag. EXECUTE command might return INSERT,
>  * UPDATE, or DELETE tags, but we cannot retrieve the number of rows
>  * for SELECT. We assume other commands always return 0 row.
>  */

I'm confused by the "we cannot retrieve the number of rows for SELECT"
part.  Can you clarify that?

>> It seems to me that the current hook placement does not address this complaint
>> >> I don't see why the hook function should have the ability to
>> >> editorialize on the behavior of everything about ProcessUtility
>> >> *except* the read-only-xact check.
>
> I moved the initialization code of completionTag as the comment,
> but check_xact_readonly() should be called before the hook.
> Am I missing something?

Beats me.  I interpreted Tom's remark to be saying the hook call
should come first, but I'm not sure which way is actually better.

...Robert


Re: ProcessUtility_hook

From
Takahiro Itagaki
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> > Like this?
> > /*
> >  * Parse command tag to retrieve the number of affected rows.
> >  * COPY command returns COPY tag. EXECUTE command might return INSERT,
> >  * UPDATE, or DELETE tags, but we cannot retrieve the number of rows
> >  * for SELECT. We assume other commands always return 0 row.
> >  */
> 
> I'm confused by the "we cannot retrieve the number of rows for SELECT"
> part.  Can you clarify that?

Ah, I meant the SELECT was "EXECUTE of SELECT".

If I use internal structure names, the explanation will be:
----
EXECUTE command returns INSERT, UPDATE, DELETE, or SELECT tags.
We can retrieve the number of rows from INSERT, UPDATE, and DELETE tags,
but cannot from SELECT tag because the tag doesn't contain row numbers
and also EState->es_processed is unavailable for EXECUTE commands.
----

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: ProcessUtility_hook

From
Robert Haas
Date:
On Wed, Dec 9, 2009 at 10:14 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>> I'm confused by the "we cannot retrieve the number of rows for SELECT"
>> part.  Can you clarify that?
>
> Ah, I meant the SELECT was "EXECUTE of SELECT".
>
> If I use internal structure names, the explanation will be:
> ----
> EXECUTE command returns INSERT, UPDATE, DELETE, or SELECT tags.
> We can retrieve the number of rows from INSERT, UPDATE, and DELETE tags,
> but cannot from SELECT tag because the tag doesn't contain row numbers
> and also EState->es_processed is unavailable for EXECUTE commands.
> ----

OK, that makes sense.  It might read a little better this way:

The EXECUTE command returns INSERT, UPDATE, DELETE, or SELECT tags.
We can retrieve the number of rows from INSERT, UPDATE, and DELETE tags,
but the SELECT doesn't contain row numbers.  We also can't get it from
EState->es_processed, because that is unavailable for EXECUTE commands.

That seems like a rather unfortunate limitation though...

...Robert


Re: YAML Was: CommitFest status/management

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Greg Smith <greg@2ndquadrant.com> writes:
>   
>> To be clear about which version we're talking about:  
>> http://archives.postgresql.org/message-id/20091130123456.4A03.52131E4D@oss.ntt.co.jp 
>> is the candidate for commit that includes the cleanup you've already done.
>>     
>
> I suppose this is subject to the same auto_explain problem already
> noticed for JSON, but I would suggest that we commit this first and
> then fix both together, rather than create merge issues.
>
>             
>   

OK, I've committed the YAML stuff, so who is working on the auto-explain 
bug? Robert? I'd like that fixed before I add in the query text to 
auto-explain output.

cheers

andrew


Re: YAML Was: CommitFest status/management

From
Robert Haas
Date:
On Thu, Dec 10, 2009 at 8:39 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> OK, I've committed the YAML stuff, so who is working on the auto-explain
> bug? Robert? I'd like that fixed before I add in the query text to
> auto-explain output.

I'm going to propose fixing this in what seems to me to be the
simplest possible way, per the attached.  Anyone want to argue?

...Robert

Attachment

Re: YAML Was: CommitFest status/management

From
Takahiro Itagaki
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Dec 10, 2009 at 8:39 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> > OK, I've committed the YAML stuff, so who is working on the auto-explain
> > bug? Robert?
> 
> I'm going to propose fixing this in what seems to me to be the
> simplest possible way, per the attached.  Anyone want to argue?

+1 to the fix.

Typical usage of explain functions will be:   1. ExplainInitState()   2. (setup ExplainState)   3. ExplainBeginOutput()
 4. ExplainXXX() except ExplainQuery()   5. ExplainEndOutput()
 

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: ProcessUtility_hook

From
Tom Lane
Date:
I applied this patch after a little bit of editorialization concerning
the points mentioned in this discussion:

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 9, 2009 at 9:33 PM, Takahiro Itagaki
> <itagaki.takahiro@oss.ntt.co.jp> wrote:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> Why does this patch #ifdef out the _PG_fini code in pg_stat_statements?
>> 
>> That's because _PG_fini is never called in current releases.
>> We could remove it completely, but I'd like to leave it for future
>> releases where _PG_fini callback is re-enabled.
>> Or, removing #ifdef (enabling _PG_fini function) is also harmless.

> I guess my vote is for leaving it alone, but I might not know what I'm
> talking about.  :-)

I removed this change.  I'm not convinced that taking out _PG_fini is
appropriate in the first place, but if it is, we should do it in all
contrib modules together, not just randomly as side-effects of unrelated
patches.

>>> Where you check for INSERT, UPDATE, and DELETE return codes in
>>> pgss_ProcessUtility, I think this deserves a comment explaining that
>>> these could occur as a result of EXECUTE. �It wasn't obvious to me,
>>> anyway.
>> 
>> Like this?
>> /*
>> �* Parse command tag to retrieve the number of affected rows.
>> �* COPY command returns COPY tag. EXECUTE command might return INSERT,
>> �* UPDATE, or DELETE tags, but we cannot retrieve the number of rows
>> �* for SELECT. We assume other commands always return 0 row.
>> �*/

I took this out, too.  It's inconsistent and IMHO the wrong thing
anyway.  If a regular DML statement is EXECUTE'd, the associated
rowcounts will be tallied by the executor hooks, so this was
double-counting the effects.  The only way it wouldn't be
double-counting is if you have tracking of nested statements turned off;
but if you do, I don't see why you'd expect them to get counted anyway
in this one particular case.

>>> It seems to me that the current hook placement does not address this complaint:
>>>> I don't see why the hook function should have the ability to
>>>> editorialize on the behavior of everything about ProcessUtility
>>>> *except* the read-only-xact check.

Nope, it didn't.  The point here is that one of the purposes of a hook
is to allow a loadable module to editorialize on the behavior of the
hooked function, and that read-only check is part of the behavior of
ProcessUtility.  I doubt that bypassing the test would be a particularly
smart thing for somebody to do, but it is for example reasonable to want
to throw a warning or do nothing at all instead of allowing the error to
occur.  So that should be inside standard_ProcessUtility.
        regards, tom lane