Thread: gSoC - ADD MERGE COMMAND - code patch submission

gSoC - ADD MERGE COMMAND - code patch submission

From
Boxuan Zhai
Date:
Dear All,
 
This is ZHAI BOXUAN, a GSoC 2010 student. My project is to add the MERGE command to Postgres.
 
This is the first submission of our code, which covers the parser, analyzer and rewriter parts.
 
If you are interested in this project, please download the source code in the attachment and test it.
 
There are 3 files in the attachment. The two .patch files were created on the /src/backend and /src/include folders (diffs between the original PostgreSQL 8.4.3 and our modified edition).
 
There are more detailed instructions in the readme.
 
Any comments will be highly appreciated.
 
Thanks!
Attachment

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Robert Haas
Date:
On Fri, Jul 9, 2010 at 10:25 PM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
> Dear All,
>
> This is ZHAI BOXUAN, a student of gSoC 2010. My project is to add merge
> command in postgres.
>
> This is the first submission of our codes, which has finished the parser,
> analyzer and rewriter parts.
>
> If you are interested in this project, please download the source code in
> attachment and have test.
>
> There are 3 files in the attachment. Two .patch file is created on the
> /src/backend and /src/include folders (between original psql 8.4.3 and our
> modified edition).
>
> There is a more detailed instruction in readme.
>
> Any comments will be highly appreciated.

Is there any chance you can submit this as a single patch file?  Or if
not, can you at least use a zip or tar file instead of a RAR archive?

Ideally the patch would be against CVS HEAD, not 8.4.3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Andres Freund
Date:
On Fri, Jul 09, 2010 at 11:33:04PM -0400, Robert Haas wrote:
> On Fri, Jul 9, 2010 at 10:25 PM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
> > Dear All,
> >
> > This is ZHAI BOXUAN, a student of gSoC 2010. My project is to add merge
> > command in postgres.
> > There is a more detailed instruction in readme.
I would find it helpful to find a short recap of how you want to
handle the various problems (mostly around locking) in the readme.

> > Any comments will be highly appreciated.
> Is there any chance you can submit this as a single patch file?  Or if
> not, can you at least use a zip or tar file instead of a RAR archive?
> Ideally the patch would be against CVS HEAD, not 8.4.3.

I would also suggest you base your patch either against the git tree
or CVS. Currently it does include patches against generated files like
gram.y or kwlist.h which make it harder to see the real changes.

Thanks,
Andres


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
David Fetter
Date:
On Sat, Jul 10, 2010 at 11:52:31AM +0200, Andres Freund wrote:
> On Fri, Jul 09, 2010 at 11:33:04PM -0400, Robert Haas wrote:
> > On Fri, Jul 9, 2010 at 10:25 PM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
> > > Dear All,
> > >
> > > This is ZHAI BOXUAN, a student of gSoC 2010. My project is to add merge
> > > command in postgres.
> > > There is a more detailed instruction in readme.
> I would find it helpful to find a short recap of how you want to
> handle the various problems (mostly around locking) in the readme.
>
> > > Any comments will be highly appreciated.
> > Is there any chance you can submit this as a single patch file?  Or if
> > not, can you at least use a zip or tar file instead of a RAR archive?
> > Ideally the patch would be against CVS HEAD, not 8.4.3.
>
> I would also suggest you base your patch either against the git tree
> or CVS. Currently it does include patches against generated files like
> gram.y or kwlist.h which make it harder to see the real changes.
>
> Thanks,
> Andres

Please find enclosed a patch against git master as of
7b2668159bb4d0f5177a23d05bf7c2ab00bc0d75.  It works up to make, but
fails on make check.

I'm thinking the docs for INSERT, UPDATE, and DELETE should link to
the docs for this, as they get written.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> Please find enclosed a patch against git master as of
> 7b2668159bb4d0f5177a23d05bf7c2ab00bc0d75.  It works up to make, but
> fails on make check.

There seem to be about four different comment styles used in this patch,
none of which match the project standard:
http://developer.postgresql.org/pgdocs/postgres/source-format.html

BTW, I notice that that page fails to mention anything about preferred
window width.  I believe the project standard is to make things readable
in an 80-column window --- anyone have an objection to stating that
explicitly?
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Robert Haas
Date:
On Jul 10, 2010, at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
>> Please find enclosed a patch against git master as of
>> 7b2668159bb4d0f5177a23d05bf7c2ab00bc0d75.  It works up to make, but
>> fails on make check.
>
> There seem to be about four different comment styles used in this patch,
> none of which match the project standard:
> http://developer.postgresql.org/pgdocs/postgres/source-format.html
>
> BTW, I notice that that page fails to mention anything about preferred
> window width.  I believe the project standard is to make things readable
> in an 80-column window --- anyone have an objection to stating that
> explicitly?

I certainly don't.

Though, if the worst problem with this patch is the formatting, we're doing *quite* well.

...Robert

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
David Fetter
Date:
On Sat, Jul 10, 2010 at 09:26:38AM -0700, David Fetter wrote:
> On Sat, Jul 10, 2010 at 11:52:31AM +0200, Andres Freund wrote:
> > On Fri, Jul 09, 2010 at 11:33:04PM -0400, Robert Haas wrote:
> > > On Fri, Jul 9, 2010 at 10:25 PM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
> > > > Dear All,
> > > >
> > > > This is ZHAI BOXUAN, a student of gSoC 2010. My project is to add merge
> > > > command in postgres.
> > > > There is a more detailed instruction in readme.
> > I would find it helpful to find a short recap of how you want to
> > handle the various problems (mostly around locking) in the readme.
> >
> > > > Any comments will be highly appreciated.
> > > Is there any chance you can submit this as a single patch file?  Or if
> > > not, can you at least use a zip or tar file instead of a RAR archive?
> > > Ideally the patch would be against CVS HEAD, not 8.4.3.
> >
> > I would also suggest you base your patch either against the git tree
> > or CVS. Currently it does include patches against generated files like
> > gram.y or kwlist.h which make it harder to see the real changes.
> >
> Please find enclosed a patch against git master as of
> 7b2668159bb4d0f5177a23d05bf7c2ab00bc0d75.  It works up to make, but
> fails on make check.
>
> I'm thinking the docs for INSERT, UPDATE, and DELETE should link to
> the docs for this, as they get written.
>
> Cheers,
> David.

Oops.  Just noticed that there were 56 lines' worth of C++ style
comments, which I've corrected in the enclosed patch, along with some
spelling mistakes, grammar, and gratuitous white space.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Though, if the worst problem with this patch is the formatting, we're doing *quite* well.

Well, the worst problem with it is that it hasn't touched the
interesting part, ie, what happens at execution time.  I haven't
seen a design for that, which means it's impossible to evaluate
whether the code that is here is of any use.  We might need some
other representation entirely.

BTW, Fetter's version of the patch seems to be lacking any gram.y
changes, but surely those exist already?
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
David Fetter
Date:
On Sat, Jul 10, 2010 at 01:18:49PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Though, if the worst problem with this patch is the formatting, we're doing *quite* well.
>
> Well, the worst problem with it is that it hasn't touched the
> interesting part, ie, what happens at execution time.  I haven't
> seen a design for that, which means it's impossible to evaluate
> whether the code that is here is of any use.  We might need some
> other representation entirely.
>
> BTW, Fetter's version of the patch seems to be lacking any gram.y
> changes, but surely those exist already?

Oops.

Fixed that now in attached patch.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
David Fetter
Date:
On Sat, Jul 10, 2010 at 10:39:02AM -0700, David Fetter wrote:
> On Sat, Jul 10, 2010 at 01:18:49PM -0400, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > Though, if the worst problem with this patch is the formatting, we're doing *quite* well.
> >
> > Well, the worst problem with it is that it hasn't touched the
> > interesting part, ie, what happens at execution time.  I haven't
> > seen a design for that, which means it's impossible to evaluate
> > whether the code that is here is of any use.  We might need some
> > other representation entirely.
> >
> > BTW, Fetter's version of the patch seems to be lacking any gram.y
> > changes, but surely those exist already?
>
> Oops.
>
> Fixed that now in attached patch.

By the way, "make check" fails here with attached initdb.log:

./pg_regress --inputdir=. --dlpath=. --multibyte=SQL_ASCII --temp-install=./tmp_check --top-builddir=../../.. --schedule=./parallel_schedule
============== creating temporary installation        ==============
============== initializing database system           ==============

pg_regress: initdb failed
Examine /home/shackle/pggit/postgresql/src/test/regress/log/initdb.log for the reason.
Command was: "/home/shackle/pggit/postgresql/src/test/regress/./tmp_check/install//home/shackle/tip/bin/initdb" -D "/home/shackle/pggit/postgresql/src/test/regress/./tmp_check/data" -L "/home/shackle/pggit/postgresql/src/test/regress/./tmp_check/install//home/shackle/tip/share/postgresql" --noclean > "/home/shackle/pggit/postgresql/src/test/regress/log/initdb.log" 2>&1
make[2]: *** [check] Error 2
make[2]: Leaving directory `/home/shackle/pggit/postgresql/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/shackle/pggit/postgresql/src/test'
make: *** [check] Error 2

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> By the way, "make check" fails here with attached initdb.log:

> creating system views ... FATAL:  unrecognized token: "false"

Hm, I'd suspect something fouled up in keyword recognition.
Did you do a "make clean" and rebuild?

BTW, this patch is still a few bricks shy of a load, since there's no
kwlist.h change and so the new MERGE keyword couldn't possibly be
recognized.  More generally, I'm wondering why the original .rar
submission was 300k (presumably compressed) and your diff is only
about 35k ...
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Peter Eisentraut
Date:
On lör, 2010-07-10 at 09:26 -0700, David Fetter wrote:
> Please find enclosed a patch against git master as of
> 7b2668159bb4d0f5177a23d05bf7c2ab00bc0d75.  It works up to make, but
> fails on make check.

It looks like this implementation reaches about the same level of parser
support as the stuff that I had coded up a few months ago at the airport
within a couple of hours [0][1], and I had sent the student the code, so
he could have had that for free.

But as others had commented already, the meat of the problem is how
MERGE statement *execution* is supposed to work.

[0]
http://git.postgresql.org/gitweb?p=users/petere/postgresql.git;a=shortlog;h=refs/heads/merge-statement
[1] http://petereisentraut.blogspot.com/2010/05/merge-syntax.html



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
David Fetter
Date:
On Sat, Jul 10, 2010 at 01:53:53PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > By the way, "make check" fails here with attached initdb.log:
> 
> > creating system views ... FATAL:  unrecognized token: "false"
> 
> Hm, I'd suspect something fouled up in keyword recognition.  Did you
> do a "make clean" and rebuild?

I did make maintainer-clean.

> BTW, this patch is still a few bricks shy of a load, since there's
> no kwlist.h change and so the new MERGE keyword couldn't possibly be
> recognized.  More generally, I'm wondering why the original .rar
> submission was 300k (presumably compressed) and your diff is only
> about 35k ...

I'll look into that.  From what you can see, is it worth trying to
clean up, starting from base, or should we just wait for the next
revision of the patch?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Sat, Jul 10, 2010 at 01:53:53PM -0400, Tom Lane wrote:
>> BTW, this patch is still a few bricks shy of a load, since there's
>> no kwlist.h change and so the new MERGE keyword couldn't possibly be
>> recognized.  More generally, I'm wondering why the original .rar
>> submission was 300k (presumably compressed) and your diff is only
>> about 35k ...

> I'll look into that.  From what you can see, is it worth trying to
> clean up, starting from base, or should we just wait for the next
> revision of the patch?

Well, rebasing against HEAD will presumably help the submitter
(assuming that he takes the advice to work against HEAD not 8.4.x).
But really what we need to see is design documentation, not code.
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Andrew Dunstan
Date:

Tom Lane wrote:
> BTW, I notice that that page fails to mention anything about preferred
> window width.  I believe the project standard is to make things readable
> in an 80-column window --- anyone have an objection to stating that
> explicitly?
>
>
>   

No, on the contrary, I'm in favor of stating it.

cheers

andrew


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Peter Eisentraut
Date:
On lör, 2010-07-10 at 12:45 -0400, Tom Lane wrote:
> I believe the project standard is to make things readable
> in an 80-column window --- anyone have an objection to stating that
> explicitly?

Is that what pgindent reformats it to?



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On lör, 2010-07-10 at 12:45 -0400, Tom Lane wrote:
>> I believe the project standard is to make things readable
>> in an 80-column window --- anyone have an objection to stating that
>> explicitly?

> Is that what pgindent reformats it to?

pgindent tries to leave a character or two to spare, IIRC, so its target
is probably 78 or thereabouts.
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Boxuan Zhai
Date:
Hi, 

Thanks for all the feedback.

I found that people have had problems running my code, which probably comes from my nonstandard submission format. I can compile, install and initialize Postgres on my own machine. The system accepts the MERGE command and throws an error when it reaches the executor, which cannot recognize the MERGE command type so far.

I will make a standard patch as soon as possible. Sorry for the trouble.

Yours Boxuan 



2010/7/11 Tom Lane <tgl@sss.pgh.pa.us>
Peter Eisentraut <peter_e@gmx.net> writes:
> On lör, 2010-07-10 at 12:45 -0400, Tom Lane wrote:
>> I believe the project standard is to make things readable
>> in an 80-column window --- anyone have an objection to stating that
>> explicitly?

> Is that what pgindent reformats it to?

pgindent tries to leave a character or two to spare, IIRC, so its target
is probably 78 or thereabouts.

                       regards, tom lane

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Greg Smith
Date:
Boxuan Zhai wrote:
> I found that people have problems on running my codes, which probably 
> comes from my nonstandard submission format. I can compile, install 
> and initialize postgres in my own machine. The system accepts MERGE 
> command and will throw an error when it runs into the executor, which 
> cannot recognize the MERGE command type so far. 

Your job as a potential contributor to PostgreSQL is to make it as easy 
as possible for others to test your code out and get good results.  I 
sent you some more detailed guidelines over the weekend as to what I 
think you should do here to achieve that.  You should wait until you've 
gotten a private review from one of the two people who have volunteered 
to help you out here before you submit anything else to the list.  
Wasting the time of everyone in the community by sharing code that 
doesn't meet any of the project guidelines is a very bad idea; please 
don't do that again.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Peter Eisentraut
Date:
On mån, 2010-07-12 at 10:04 -0400, Greg Smith wrote:
> Wasting the time of everyone in the community by sharing code that 
> doesn't meet any of the project guidelines is a very bad idea; please 
> don't do that again.

I think it's better to share code that doesn't meet project guidelines
and solicit advice rather than not to share anything.



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
"Joshua D. Drake"
Date:
On Mon, 2010-07-12 at 23:28 +0300, Peter Eisentraut wrote:
> On mån, 2010-07-12 at 10:04 -0400, Greg Smith wrote:
> > Wasting the time of everyone in the community by sharing code that
> > doesn't meet any of the project guidelines is a very bad idea; please
> > don't do that again.
>
> I think it's better to share code that doesn't meet project guidelines
> and solicit advice rather than not to share anything.

Agreed.

It is great that we have guidelines. We should definitely encourage
people to use them. We should also lead, not push people into wanting to
use them.

Collaboration is good.


JD
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Greg Smith
Date:
Peter Eisentraut wrote:
> I think it's better to share code that doesn't meet project guidelines
> and solicit advice rather than not to share anything.
>   

I feel the assumption that code is so valuable that it should be shared 
regardless of whether it meets conventions is a flawed one for this 
project.  There are already dozens, if not hundreds, of useful patch 
submissions that have been sent to this list, consumed time, and then 
gone nowhere because they didn't happen in a way that the community was 
able to integrate them properly.  For anyone who isn't producing 
committer-quality patches, the process is far more important than the 
code if you want to get something non-trivial accomplished.

Also, producing code in whatever format you want and dumping that on the 
community so that people like David Fetter waste their time cleaning it 
up is not the way the GSoC work is supposed to happen.  I didn't want 
any other current or potential future participants in that program to 
get the wrong idea from that example. 

There is a brief "get to know the community" period at the beginning of 
the summer schedule.  I think that next year this project would be well 
served to give each student a small patch to review during that time, as 
a formal intro to the community process.  The tendency among students to 
just wander off coding without doing any interaction like that is both 
common and counterproductive, given how patches to PostgreSQL actually 
shuffle along toward becoming commit quality code.  Far as I'm 
concerned, a day spent working with the patch review checklist on 
someone else's patch pays for itself tenfold when it comes time to 
produce patches that others will be able to review.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> There is a brief "get to know the community" period at the beginning of 
> the summer schedule.  I think that next year this project would be well 
> served to give each student a small patch to review during that time, as 
> a formal intro to the community process.  The tendency among students to 
> just wander off coding without doing any interaction like that is both 
> common and counterproductive, given how patches to PostgreSQL actually 
> shuffle along toward becoming commit quality code.  Far as I'm 
> concerned, a day spent working with the patch review checklist on 
> someone else's patch pays for itself tenfold when it comes time to 
> produce patches that others will be able to review.

That seems like a great idea.

Is there a specific period when that's supposed to happen for GSoC
students?  Can we arrange for a commitfest to be running then?
(I guess it'd need to be early in the fest, else the low-hanging
fruit will be gone already.)
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Robert Haas
Date:
On Jul 12, 2010, at 4:16 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> I feel the assumption that code is so valuable that it should be shared regardless of whether it meets conventions is a flawed one for this project.  There are already dozens, if not hundreds, of useful patch submissions that have been sent to this list, consumed time, and then gone nowhere because they didn't happen in a way that the community was able to integrate them properly.

True - but we don't want to unduly discourage potential contributors or make them afraid of posting, either.  It is for the community to decide whether the effort to clean up a patch is worthwhile, and to provide guidance on what must change.  Individual contributors shouldn't seek to take that process off-list, at least IMHO.

The main problem with this patch is not that it was submitted as a RAR of multiple diffs against 8.4.3 instead of a single diff against HEAD: it's that we've apparently reached GSoC midterms without making progress beyond what Peter hacked together whilst sitting in an airport.

...Robert

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Greg Smith
Date:
Tom Lane wrote:
> Is there a specific period when that's supposed to happen for GSoC
> students?  Can we arrange for a commitfest to be running then

The GSoC "Community bonding period" is described at 
http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html 
and what to cover is a near perfect match for things like introducing 
the patch review and submission process.  This year, the period from 
when proposals were accepted on April 26th through the official coding 
start on May 24th was labeled as being devoted to that.  Given the way 
the release schedule has worked out the last few years, I expect that 
every year there will be a whole stack of possibly moldy patches sitting 
in the queue for the first CF of the next version at that point.  I 
don't think we necessarily need to organize a full on CF around that 
schedule, but picking a small patch for each student to start chewing on 
during that period would usefully settle them into list interaction and 
community development process much more gradually than starting that 
with their code drops. 

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Tom Lane
Date:
Greg Smith <greg@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Is there a specific period when that's supposed to happen for GSoC
>> students?  Can we arrange for a commitfest to be running then

> The GSoC "Community bonding period" is described at 
> http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html 
> and what to cover is a near perfect match for things like introducing 
> the patch review and submission process.  This year, the period from 
> when proposals were accepted on April 26th through the official coding 
> start on May 24th were labeled as being devoted to that.  Given the way 
> the release schedule has worked out the last few years, I expect that 
> every year there will be a whole stack of possibly moldy patches sitting 
> in the queue for the first CF of the next version at that point.

Hmm.  Assuming that we manage to keep to an annual release schedule
(no sure thing, since we've never done it yet) what that would mean
is that the students are looking for feedback while most of the key
developers are in heads-down, let's-get-this-release-to-beta mode.
Not sure how well that will work.  Still, we can try it.
        regards, tom lane


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Boxuan Zhai
Date:
Dear Hackers
 
I have reflected on my situation and found that I have not communicated well with you, which has left you with little confidence in my project. Most of the time I work by myself and do not report to you frequently. I always want to finish a solid stage of progress before making a submission. This may be a bad habit for a remote project.
 
In fact, I have a detailed design for how to implement the command, and I am working hard these days to catch up with the schedule.
 
In my design,
1. The merge command is first transformed into a "MergeStmt" node in the parser. The analyzer then generates a left outer join query as the top query (or main query). This query is similar to a SELECT query, but I set a target relation in it. The top query drives the scanning and joining over the target and source tables.
 
The merge actions are transformed into lower-level queries. I create a Query node for each of them and append them to a newly created List field, mergeActQry. The action queries have different command types and their own target lists and qual lists, according to how the user declared them. But they all share the same range table. This is because we don't need the action queries to be planned later: the join strategy is decided by the top query, and we are only interested in their specific action qualifications. In other words, these action queries are only containers for their target lists and qualifications.
 
2. When the query is ready, it is sent to the rewriter. In this part, we can call RewriteQuery() to handle the action queries. The UPDATE action will trigger rules on UPDATE, and so on. Two things need to be noted: (1) actions of the same type should not be rewritten repeatedly. If there are two UPDATE actions in a merge command, we should not trigger the ON UPDATE rules twice. (2) If an action type is fully replaced by rules, we should remove all actions of this type from the action list.
The rewriter also does some processing on the target list of each action.
 
The first submission has finished the above part.
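
As a rough illustration of the representation and the two rewriter constraints described above, here is a minimal Python sketch. It is a toy model and not PostgreSQL code: the class names ActionQuery and MergeQuery, the function rewrite_actions(), and the instead_rules parameter are all hypothetical, and rule firing is reduced to recording each command type once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionQuery:
    """Container for one merge action: command type, quals and target list."""
    command: str                       # "UPDATE", "DELETE" or "INSERT"
    matched: bool                      # True for WHEN MATCHED, False otherwise
    quals: List[str] = field(default_factory=list)
    target_list: List[str] = field(default_factory=list)


@dataclass
class MergeQuery:
    """Top query (source LEFT OUTER JOIN target) plus its action queries."""
    target_relation: str
    source_relation: str
    join_condition: str
    merge_act_qry: List[ActionQuery] = field(default_factory=list)


def rewrite_actions(query, instead_rules):
    """Apply the two constraints from the design (hypothetical helper).

    instead_rules is an assumed input: the set of command types that rules
    fully replace. Returns (kept_actions, fired), where fired lists each
    command type once, in the order its rules would first be triggered.
    """
    fired = []
    kept = []
    for action in query.merge_act_qry:
        if action.command not in fired:
            fired.append(action.command)   # trigger rules once per type
        if action.command in instead_rules:
            continue                       # type fully replaced: drop action
        kept.append(action)
    return kept, fired
```

With two UPDATE actions and one INSERT action, and INSERT fully replaced by rules, the UPDATE rules are recorded once and both UPDATE actions survive, while the INSERT action is dropped.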
 
3. In the planner, the top-level query is handled in the normal way. Since it has almost the same structure as a SELECT query, the planner() function can work on it straightforwardly. However, we need a small change here. The merge command has a target relation, which needs a ctid junk attribute in the target list. The ctid is required by the UPDATE and DELETE actions.
 
Besides, for each of the action queries, we also need to create a Plan node. We don't need to do full planning on the action queries. The crucial point is to preprocess the target list and qualification of each action. (To explain this point: the execution of a merge action is composed of two parts. The top plan is executed in the main loop and returns the joined tuples one by one. An action then applies its qualification to each returned tuple. If the qualification succeeds, the action is taken and makes the corresponding modification to the target table. Thus, even though we have a Plan node created for each action, we don't want to pass it directly to the planner() function. That would generate a new plan over the tables in the range table, which would very probably differ from the top-level plan. If we ran the action plans directly, they would conflict with each other.)
 
I created a function merge_action_planner() to do this job. This part is added at the end of standard_planner(). After that, all the plans of the merge actions are linked into a new List field in the PlannedStmt result of the top plan.

4. When the planner has finished, the plan is sent to the executor through PortalRun(). As a new command, merge will choose the PORTAL_MULTI_QUERY strategy and be sent to the ProcessQuery() function.
 
5. In the ExecutorStart() part, we need to set a junkfilter for the merge command, since we have a ctid junk attribute in the target list. The merge action plans should also be initialized and transformed into PlanState nodes. However, the initialization of an action plan focuses only on the target list and quals. We don't need the other parts of traditional plan initialization, since these action plans are not for scanning or joining (this is the job of the top plan). We only want to transform the action information into a standard format that can be used by the qualification evaluator in the executor.

I HAVE DONE ALL THE ABOVE IN A SECOND SUBMISSION.
 
6. In the ExecutorRun() part, the top plan will be passed into ExecutePlan(). The action planstates can be found in the estate->es_plannedstmt field.
The top plan can return the tuples of the left outer join of the source table and the target table. (I can see the tuples being returned in my code.) Thus, the design is correct; at least the top plan can do its work well. In the junk filter, if we find a non-null ctid, it is a MATCHED tuple; otherwise, it is a NOT MATCHED tuple. Then we need to evaluate the additional quals of the actions one by one. If the evaluation of one action succeeds, we take this action and skip the remaining ones.
 
Since the target list and qual expressions are all processed by the rewriter, the planner and InitPlan(), I think they will be accepted by the ExecQual() function without many problems.
 
This is the last step, and I am still working on it.
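The action selection in step 6 could look roughly like the following standalone C sketch. This is only an illustration under stated assumptions, not the executor code from the patch: JoinedTuple, MergeActionState, ActionQual, choose_merge_action() and balance_would_stay_positive() are hypothetical names, the qual is modelled as a plain function pointer instead of an expression tree evaluated by ExecQual(), and each action is assumed to be tagged as WHEN MATCHED or WHEN NOT MATCHED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum MergeActionKind { MERGE_UPDATE, MERGE_DELETE, MERGE_INSERT } MergeActionKind;

/* Hypothetical joined tuple produced by the top plan (source LEFT JOIN target). */
typedef struct JoinedTuple {
    bool has_ctid;          /* non-null ctid means the target row exists: MATCHED */
    int  source_vol;
    int  target_balance;    /* meaningless when has_ctid is false */
} JoinedTuple;

/* Stand-in for an additional action qualification. */
typedef bool (*ActionQual)(const JoinedTuple *tuple);

typedef struct MergeActionState {
    MergeActionKind kind;
    bool            when_matched;   /* true: WHEN MATCHED, false: WHEN NOT MATCHED */
    ActionQual      qual;           /* NULL means no additional qualification */
} MergeActionState;

/*
 * Walk the actions in declaration order and return the index of the first
 * one that applies to this tuple, or -1 if none does. Later actions are
 * skipped once one succeeds.
 */
static int choose_merge_action(const MergeActionState *actions, size_t nactions,
                               const JoinedTuple *tuple)
{
    assert(tuple != NULL);
    for (size_t i = 0; i < nactions; i++) {
        if (actions[i].when_matched != tuple->has_ctid)
            continue;               /* wrong MATCHED / NOT MATCHED branch */
        if (actions[i].qual == NULL || actions[i].qual(tuple))
            return (int) i;         /* first passing action wins */
    }
    return -1;
}

/* Example qualification: only update if the balance stays positive. */
static bool balance_would_stay_positive(const JoinedTuple *tuple)
{
    return tuple->target_balance + tuple->source_vol > 0;
}
```

With an action list of (UPDATE when matched and the balance stays positive, DELETE when matched, INSERT when not matched), a matched tuple that fails the first qual falls through to the DELETE action, and a tuple with a null ctid goes straight to the INSERT action.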
 
PS: Heikki asked me what the "EXPLAIN MERGE ..." command will do. Well, I have not tested it, but it may throw an error or just explain the top plan, since I put the action plans in a new field, which cannot be recognized by the old functions.
 
 
Thanks!
 
Yours Boxuan.

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Heikki Linnakangas
Date:
On 16/07/10 03:26, Boxuan Zhai wrote:
> PS: Heikki asked me what the "EXPLAIN MERGE ..." command will do.
> Well, I have not tested it, but it may throw an error or just explain the
> top plan, since I put the action plans in a new field, which cannot be
> recognized by the old functions.

I meant what EXPLAIN MERGE output will look like after the project is 
finished, not what it will do at this stage. I was trying to get a 
picture of how you're planning to implement the executor, and what nodes 
there are in a MERGE plan.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Boxuan Zhai
Date:
Hi,
 
For the EXPLAIN MERGE command, I expect it to return a result similar to that of a SELECT command.
 
I think the purpose of the EXPLAIN command is to show how the tables in a query are scanned and joined. In my design, the merge command generates a top-level query (and plan) as the main query. It is in fact a left join select query over the source and target tables. This main query (plan) decides how the tables are scanned. The merge actions do not affect this process. So when we explain the merge command, a similar result will be returned.

For example, the command
EXPLAIN
MERGE INTO Stock USING Sale ON Stock.stock_id = Sale.sale_id
WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol
WHEN ....
.....

will return a result just like that of the following command:
 
EXPLAIN
SELECT * FROM Sale LEFT JOIN Stock ON stock_id = sale_id;
 
Yours Boxuan.
 

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Heikki Linnakangas
Date:
On 16/07/10 12:26, Boxuan Zhai wrote:
> For the EXPLAIN MERGE command, I expect it to return a result similar to
> that of a SELECT command.
>
> I think the EXPLAIN command is to show how the tables in a query are scanned
> and joined. In my design, the merge command will generate a top-level query
> (and plan) as the main query. It is in fact a left join select query over
> the source and target tables.  This main query (plan) decides how the tables
> are scanned. The merge actions will not affect this process. So when we
> explain the merge command, a similar result will be returned.
>
> For example, the command
> EXPLAIN
> MERGE INTO Stock USING Sale ON Stock.stock_id = Sale.sale_id
> WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol
> WHEN ....
> .....
>
> Will return a result just like that of the following command:
>
> EXPLAIN
> SELECT * FROM Sale LEFT JOIN Stock ON stock_id = sale_id;

You really need to look at the changes in 9.0 in this area: you now have 
an Update/Delete/Insert node (implemented in 
src/backend/executor/nodeModifyTable.c) at the top of the plan for 
update/insert/delete commands:

postgres=# explain UPDATE foo SET id = 456 WHERE id = 123;
                        QUERY PLAN
-----------------------------------------------------------
 Update  (cost=0.00..40.00 rows=12 width=6)
   ->  Seq Scan on foo  (cost=0.00..40.00 rows=12 width=6)
         Filter: (id = 123)
(3 rows)

I would expect there to be a Merge node similar to that, with 
Update/Insert/Delete subnodes for each action.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Simon Riggs
Date:
On Fri, 2010-07-16 at 08:26 +0800, Boxuan Zhai wrote:
> The merge actions are transformed into lower level queries. I create a
> Query node for each of them and append them to a newly created List
> field mergeActQry. The action queries have different command types and
> specific target lists and qual lists, according to their declaration by
> the user. But they all share the same range table. This is because we
> don't need the action queries to be planned later. The joining
> strategy is decided by the top query. We are only interested in their
> specific action qualifications. In other words, these action queries
> are only containers for their target list and qualifications.
>
> 2. When the query is ready, it will be sent to the rewriter. In this part,
> we can call RewriteQuery() to handle the action queries. The UPDATE
> action will trigger rules on UPDATE, and so on. What needs to be
> noticed: 1. the actions of the same type should not be rewritten
> repeatedly. If there are two UPDATE actions in the merge command, we
> should not trigger the ON UPDATE rules twice. 2. if an action type is
> fully replaced by rules, we should remove all actions of this type
> from the action list.
> The rewriter will also do some processing on the target list of each action.

IMHO it is a bad thing that we are attempting to execute each action
statement as a query. That means we need to execute an inner SQL
statement for each row returned by the top level query.

That design makes MERGE similar in performance to an upsert PL/pgsql
function, which will perform terribly on large numbers of rows.

This was exactly the point where I stopped implementation previously:
attempting to make MERGE work with rules is enough to prevent a tighter
in-executor implementation of the action list.

[To Boxuan, on a personal note, you seem to be coping quite well with
the code and the process; congratulations and keep going.]

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services



Fwd: gSoC - ADD MERGE COMMAND - code patch submission

From
Boxuan Zhai
Date:


---------- Forwarded message ----------
From: Boxuan Zhai <bxzhai2010@gmail.com>
Date: 2010/7/17
Subject: Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission
To: Simon Riggs <simon@2ndquadrant.com>




2010/7/17 Simon Riggs <simon@2ndquadrant.com>

> On Fri, 2010-07-16 at 08:26 +0800, Boxuan Zhai wrote:
> > The merge actions are transformed into lower level queries. I create a
> > Query node for each of them and append them to a newly created List
> > field mergeActQry. The action queries have different command types and
> > specific target lists and qual lists, according to their declaration by
> > the user. But they all share the same range table. This is because we
> > don't need the action queries to be planned later. The joining
> > strategy is decided by the top query. We are only interested in their
> > specific action qualifications. In other words, these action queries
> > are only containers for their target list and qualifications.
> >
> > 2. When the query is ready, it will be sent to the rewriter. In this part,
> > we can call RewriteQuery() to handle the action queries. The UPDATE
> > action will trigger rules on UPDATE, and so on. What needs to be
> > noticed: 1. the actions of the same type should not be rewritten
> > repeatedly. If there are two UPDATE actions in the merge command, we
> > should not trigger the ON UPDATE rules twice. 2. if an action type is
> > fully replaced by rules, we should remove all actions of this type
> > from the action list.
> > The rewriter will also do some processing on the target list of each action.
>
> IMHO it is a bad thing that we are attempting to execute each action
> statement as a query. That means we need to execute an inner SQL
> statement for each row returned by the top level query.
>
> That design makes MERGE similar in performance to an upsert PL/pgsql
> function, which will perform terribly on large numbers of rows.

Dear Simon,

Thanks for your feedback. I may not have presented my idea clearly.
In my design, the merge actions are not executed as separate queries. Only the top-level query (that is, a query like "<source table> LEFT JOIN <target_table> ON <matching_qual>") will be planned and executed. For each tuple returned by this plan, we choose a proper action and do the corresponding modification. The tables are scanned and joined only once. A merge action will not do a full run of the table join and then modify the table, as a standard UPDATE/DELETE/INSERT query would. (Is this what you are worried about?)

In fact, for one action, we only need the following information: 1. the action type (UPDATE, DELETE or INSERT), 2. the target list, and 3. the additional qualifications. A Query node is a perfect container for this information. That's why I transform the actions into Query nodes. But all through the analyzer, rewriter, planner and executor, I just call the related functions to put the expressions in their target lists and qual lists into standard form. The range table and join tree are determined only by the top-level query; they will not be affected by the merge actions.
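To make the "joined only once" point concrete, here is a small standalone C simulation using the Stock/Sale example from earlier in the thread. It is an illustration only, not how the patch or the executor is written: StockTable, MergeCounts, find_stock() and merge_sales_into_stock() are hypothetical names, the join is a naive lookup, and the WHEN NOT MATCHED THEN INSERT action is an assumption, since the original example elides its remaining WHEN clauses. The sketch also assumes that rows inserted by the command are not visible to the join of the same command.

```c
#include <assert.h>
#include <stddef.h>

#define STOCK_CAPACITY 8

typedef struct StockRow { int stock_id; int balance; } StockRow;
typedef struct SaleRow  { int sale_id;  int vol;     } SaleRow;

typedef struct StockTable {
    StockRow rows[STOCK_CAPACITY];
    size_t   nrows;
} StockTable;

typedef struct MergeCounts { int updated; int inserted; int skipped; } MergeCounts;

/*
 * Left-join probe: index of the stock row matching id among the first
 * `limit` rows, or -1 when the target side of the join would be NULL.
 */
static int find_stock(const StockTable *stock, size_t limit, int id)
{
    for (size_t i = 0; i < limit; i++)
        if (stock->rows[i].stock_id == id)
            return (int) i;
    return -1;
}

/*
 * One pass over the source rows. Each joined row triggers at most one
 * modification; no action re-runs the join on its own.
 */
static MergeCounts merge_sales_into_stock(StockTable *stock,
                                          const SaleRow *sales, size_t nsales)
{
    MergeCounts counts = {0, 0, 0};
    size_t      visible;

    assert(stock != NULL && (sales != NULL || nsales == 0));
    visible = stock->nrows;     /* assumption: the join sees only pre-existing rows */

    for (size_t i = 0; i < nsales; i++) {
        int match = find_stock(stock, visible, sales[i].sale_id);

        if (match >= 0) {
            /* WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol */
            stock->rows[match].balance += sales[i].vol;
            counts.updated++;
        } else if (stock->nrows < STOCK_CAPACITY) {
            /* assumed WHEN NOT MATCHED THEN INSERT action */
            stock->rows[stock->nrows].stock_id = sales[i].sale_id;
            stock->rows[stock->nrows].balance = sales[i].vol;
            stock->nrows++;
            counts.inserted++;
        } else {
            counts.skipped++;   /* toy table is full */
        }
    }
    return counts;
}
```

The loop body is the only place where the target table is modified, which is the difference from running a separate UPDATE or INSERT statement per source row.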
 
 
 
> This was exactly the point where I stopped implementation previously:
> attempting to make MERGE work with rules is enough to prevent a tighter
> in-executor implementation of the action list.

I am sorry, but I did not catch your meaning here clearly.
As I understand it, if there is a rule on the target table, the rewriter will add a new query to the execution queue (or replace the original query). I think the rule queries will not affect the processing within the original query, because they are totally separate queries which will be run before or after the original query. Are you suggesting that we should not allow rules on the MERGE command?
 
 
> [To Boxuan, on a personal note, you seem to be coping quite well with
> the code and the process; congratulations and keep going.]

Thank you. Your encouragement is very important to me.

> --
>  Simon Riggs           www.2ndQuadrant.com
>  PostgreSQL Development, 24x7 Support, Training and Services



Re: gSoC - ADD MERGE COMMAND - code patch submission

From
Boxuan Zhai
Date:
Hi,
 
I have just moved my modifications to the latest git version, and I made a patch file with git diff as the second submission. I think the format is much better than that of my last submission.

As I mentioned before, our progress has reached the executor. So far, the executor can accept the top-level query and return tuples for it. The next step is to add action qualification evaluation on the returned tuple slot.
 
Thanks
 
Boxuan


 
Attachment

Re: gSoC - ADD MERGE COMMAND - code patch submission

From
taskov
Date:
Hello,
could you tell me where I can find the latest version of the MERGE PATCH
file? I need to use it on PostgreSQL 9.3.
I couldn't find it anywhere in git.

Regards,
Nikolay




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/gSoC-ADD-MERGE-COMMAND-code-patch-submission-tp1956415p5785822.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.