Thread: Need a mentor, and a project.

Need a mentor, and a project.

From
abindra@u.washington.edu
Date:
Hello there,

I am a graduate student at the University of Washington, Tacoma
(http://www.tacoma.washington.edu/tech/) with an interest in databases (especially query processing). I am familiar
with database theory, and in an earlier life I was an application developer and did a lot of SQL/database-related
work. I have been interested in learning about and contributing to Postgres for a while now. This quarter I was the TA
for the undergrad intro to databases class. I convinced my professor to teach with PostgreSQL, and it has been fun. It has
also allowed me to familiarize myself with Postgres from an external user's point of view.
 

Next quarter I am planning to do an Independent Study course where the main objective would be to allow me to get
familiar with the internals of Postgres by working on a project(s). I would like to work on something that could
possibly be accepted as a patch.
 

This is (I think) somewhat similar to what students do during google summer and I was hoping to get some help here in
terms of:
 
1. A good project to work on for a newbie.
2. Would someone be willing to be a mentor? It would be nice to be able to get some guidance on a one-to-one basis.

Thanks for your time. If you have any questions or need more information, please do let me know.

Regards
Ashish








Re: Need a mentor, and a project.

From
"Albe Laurenz"
Date:
abindra wrote:
> Next quarter I am planning to do an Independent Study course
> where the main objective would be to allow me to get familiar
> with the internals of Postgres by working on a project(s). I
> would like to work on something that could possibly be
> accepted as a patch.
>
> This is (I think) somewhat similar to what students do during
> google summer and I was hoping to get some help here in terms of:
> 1. A good project to work on for a newbie.
> 2. Would someone be willing to be a mentor? It would be nice
> to be able to get some guidance on a one-to-one basis.

I would start with the TODO list: http://wiki.postgresql.org/wiki/Todo
These are things for which there is a consensus that it would be
a good idea to implement them. Pick things that look interesting to
you, and try to read the discussions in the archives that led
to the TODO items.

Bring the topic up in the hackers list, say that you would like
to work on this or that TODO item, present your ideas of how you
want to do it. Ask about things where you feel insecure.
If you get some support, proceed to write a patch. Ask for
directions, post half-baked patches and ask for comments.

That is because you will probably receive a good amount of
criticism and maybe rejection, and if you invest a couple of
months into working on something that nobody knows about *and*
your work gets rejected, that is much worse than drawing fire
right away.

It's probably not easy to find a mentor (unless you have money
to give away), but you may find people who are interested in
what you are doing and who will help you.

Yours,
Laurenz Albe


Re: Need a mentor, and a project.

From
Joshua Tolley
Date:
On Mon, Dec 07, 2009 at 09:53:32AM +0100, Albe Laurenz wrote:
> abindra wrote:
> > Next quarter I am planning to do an Independent Study course
> > where the main objective would be to allow me to get familiar
> > with the internals of Postgres by working on a project(s). I
> > would like to work on something that could possibly be
> > accepted as a patch.
> >
> > This is (I think) somewhat similar to what students do during
> > google summer and I was hoping to get some help here in terms of:
> > 1. A good project to work on for a newbie.
> > 2. Would someone be willing to be a mentor? It would be nice
> > to be able to get some guidance on a one-to-one basis.
>
> I would start with the TODO list: http://wiki.postgresql.org/wiki/Todo
> These are things for which there is a consensus that it would be
> a good idea to implement them. Pick things that look interesting to
> you, and try to read the discussions in the archives that led
> to the TODO items.

I agree the TODO list is a good place to start. Other good sources include the
-hackers list and comments in the code. I was surprised when I began taking an
interest in PostgreSQL how rarely interesting projects mentioned on -hackers
made it into the TODO list; I've come to realize that the TODO contains, in
general, very non-controversial items everyone is pretty sure we could use,
whereas -hackers ranges freely over other topics which are still very
interesting but often more controversial or less obviously necessary.
Committed patches both large and small address TODO list items fairly rarely,
so don't get too hung up on finding something from the TODO list alone.

> Bring the topic up in the hackers list, say that you would like
> to work on this or that TODO item, present your ideas of how you
> want to do it. Ask about things where you feel insecure.
> If you get some support, proceed to write a patch. Ask for
> directions, post half-baked patches and ask for comments.
>
> That is because you will probably receive a good amount of
> criticism and maybe rejection, and if you invest a couple of
> months into working on something that nobody knows about *and*
> your work gets rejected, that is much worse than drawing fire
> right away.

+1. Especially when developing a complex patch, and especially when you're new
to the community, you need to avoid working in a vacuum, for social as well as
technical reasons. The more complex a patch, the more consensus you'll
eventually need to achieve before getting it committed, in general, and it
helps to gain that consensus early on, rather than after you've written a lot
of code. The keyword "proposal" might be a useful search term when digging in
the -hackers archives for historical examples.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: Need a mentor, and a project.

From
Robert Haas
Date:
On Sun, Dec 6, 2009 at 9:24 PM,  <abindra@u.washington.edu> wrote:
> 2. Would someone be willing to be a mentor? It would be nice to be able to get some guidance on a one-to-one basis.

I might be willing to do this, but if you pick a project that is
outside my area of knowledge then I might not be able to help as much.

...Robert


Re: Need a mentor, and a project.

From
Ashish
Date:
Albe & Joshua, thanks for the advice. I am in the process of deciding what to work on and am looking at the TODO list.
I definitely do not intend to work in a vacuum :-) I am really excited about this and look forward to being challenged
and learning a lot.
 

Regards
Ashish


On Mon, 7 Dec 2009, Joshua Tolley wrote:

> On Mon, Dec 07, 2009 at 09:53:32AM +0100, Albe Laurenz wrote:
>> abindra wrote:
>>> Next quarter I am planning to do an Independent Study course
>>> where the main objective would be to allow me to get familiar
>>> with the internals of Postgres by working on a project(s). I
>>> would like to work on something that could possibly be
>>> accepted as a patch.
>>>
>>> This is (I think) somewhat similar to what students do during
>>> google summer and I was hoping to get some help here in terms of:
>>> 1. A good project to work on for a newbie.
>>> 2. Would someone be willing to be a mentor? It would be nice
>>> to be able to get some guidance on a one-to-one basis.
>>
>> I would start with the TODO list: http://wiki.postgresql.org/wiki/Todo
>> These are things for which there is a consensus that it would be
>> a good idea to implement them. Pick things that look interesting to
>> you, and try to read the discussions in the archives that led
>> to the TODO items.
>
> I agree the TODO list is a good place to start. Other good sources include the
> -hackers list and comments in the code. I was surprised when I began taking an
> interest in PostgreSQL how rarely interesting projects mentioned on -hackers
> made it into the TODO list; I've come to realize that the TODO contains, in
> general, very non-controversial items everyone is pretty sure we could use,
> whereas -hackers ranges freely over other topics which are still very
> interesting but often more controversial or less obviously necessary.
> Committed patches both large and small address TODO list items fairly rarely,
> so don't get too hung up on finding something from the TODO list alone.
>
>> Bring the topic up in the hackers list, say that you would like
>> to work on this or that TODO item, present your ideas of how you
>> want to do it. Ask about things where you feel insecure.
>> If you get some support, proceed to write a patch. Ask for
>> directions, post half-baked patches and ask for comments.
>>
>> That is because you will probably receive a good amount of
>> criticism and maybe rejection, and if you invest a couple of
>> months into working on something that nobody knows about *and*
>> your work gets rejected, that is much worse than drawing fire
>> right away.
>
> +1. Especially when developing a complex patch, and especially when you're new
> to the community, you need to avoid working in a vacuum, for social as well as
> technical reasons. The more complex a patch, the more consensus you'll
> eventually need to achieve before getting it committed, in general, and it
> helps to gain that consensus early on, rather than after you've written a lot
> of code. The keyword "proposal" might be a useful search term when digging in
> the -hackers archives for historical examples.
>
> --
> Joshua Tolley / eggyknap
> End Point Corporation
> http://www.endpoint.com
>




Re: Need a mentor, and a project.

From
Josh Berkus
Date:
On 12/7/09 4:41 PM, Ashish wrote:
> Albe & Joshua, thanks for the advice. I am in the process of deciding
> what to work on and am looking at the TODO list. I definitely do not
> intend to work in a vacuum :-) I am really excited about this and look
> forward to being challenged and learning a lot.

When you decide what you want to work on, let us know and we'll try to
find you an appropriate mentor.

--Josh Berkus


Re: Need a mentor, and a project.

From
Ashish
Date:
Hi Robert,

Thanks. If I may, what encompasses your area of expertise...

BTW Congratulations on becoming a committer!

Regards
Ashish

On Mon, 7 Dec 2009, Robert Haas wrote:

> On Sun, Dec 6, 2009 at 9:24 PM,  <abindra@u.washington.edu> wrote:
>> 2. Would someone be willing to be a mentor? It would be nice to be able to get some guidance on a one-to-one basis.
>
> I might be willing to do this, but if you pick a project that is
> outside my area of knowledge then I might not be able to help as much.
>
> ...Robert
>




Re: Need a mentor, and a project.

From
Robert Haas
Date:
On Mon, Dec 7, 2009 at 8:04 PM, Ashish <abindra@u.washington.edu> wrote:
> Hi Robert,
>
> Thanks. If I may, what encompasses your area of expertise...
>
> BTW Congratulations on becoming a committer!

Thanks.  As others have said, it's probably best to pick a project
first, or at least an area.  It's more important to find something
you're interested in working on than to think about working with some
particular person.

...Robert


Re: Need a mentor, and a project.

From
Peter Eisentraut
Date:
On Mon, 2009-12-07 at 09:53 +0100, Albe Laurenz wrote:
> I would start with the TODO list: http://wiki.postgresql.org/wiki/Todo
> These are things for which there is a consensus that it would be
> a good idea to implement them.

The Todo list is not a list of things for which such a consensus exists.
The Todo list is in general a list of things that someone thought should
be considered at some point.  But unless the item is linked to a mailing
list thread that already shows a consensus about the feature, you need
to start with a discussion about a plan.

So don't submit a project plan to your university or boss based on "I
will work on item X because it's on the Todo list" without taking ample
time to discuss things here first.



Re: Need a mentor, and a project.

From
Greg Smith
Date:
Peter Eisentraut wrote:
> But unless the item is linked to a mailing
> list thread that already shows a consensus about the feature, you need
> to start with a discussion about a plan.
>   
And realistically, even if the item is so linked, someone new to the 
project still shouldn't just plow away on it without asking for 
confirmation first anyway.  There are many things on the TODO list that 
everyone would like to see fixed, the problem is well defined and 
unambiguous, but the way the solution needs to be structured is much 
harder than is obvious.  As the simplest example, we regularly have people
show up with patches where the "solution" was "just add threading to the 
back-end here..." which might seem completely reasonable to someone 
new--but it will never get committed.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: Need a mentor, and a project.

From
Ashish
Date:
I am thinking about starting with the following TODO item:

--> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified
percentage.

I picked this because it is somewhat related to query processing which is what I am most interested in. It also <seems>
like a good start up project for a newbie like me. Before I start looking into what this would involve and start a
conversation on designing a solution - I wanted to know what you guys think about this particular TODO, and its
suitability to a newbie. Looking forward to your comments...
 

Thanks
Ashish

On Mon, 7 Dec 2009, Josh Berkus wrote:

> On 12/7/09 4:41 PM, Ashish wrote:
>> Albe & Joshua, thanks for the advice. I am in the process of deciding
>> what to work on and am looking at the TODO list. I definitely do not
>> intend to work in a vacuum :-) I am really excited about this and look
>> forward to being challenged and learning a lot.
>
> When you decide what you want to work on, let us know and we'll try to
> find you an appropriate mentor.
>
> --Josh Berkus
>




Re: Need a mentor, and a project.

From
Bruce Momjian
Date:
Ashish wrote:
> I am thinking about starting with the following TODO item:
>
> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
> and actual row counts differ by a specified percentage.
>
> I picked this because it is somewhat related to query processing
> which is what I am most interested in. It also <seems> like a
> good start up project for a newbie like me. Before I start
> looking into what this would involve and start a conversation
> on designing a solution - I wanted to know what you guys think
> about this particular TODO, and its suitability to a newbie.
> Looking forward to your comments...

I even have a sample patch you can use as a start, attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/ref/explain.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v
retrieving revision 1.38
diff -c -c -r1.38 explain.sgml
*** doc/src/sgml/ref/explain.sgml    18 Sep 2006 19:54:01 -0000    1.38
--- doc/src/sgml/ref/explain.sgml    22 Dec 2006 17:09:05 -0000
***************
*** 64,72 ****
    <para>
     The <literal>ANALYZE</literal> option causes the statement to be actually executed, not only
     planned.  The total elapsed time expended within each plan node (in
!    milliseconds) and total number of rows it actually returned are added to
!    the display.  This is useful for seeing whether the planner's estimates
!    are close to reality.
    </para>

    <important>
--- 64,72 ----
    <para>
     The <literal>ANALYZE</literal> option causes the statement to be actually executed, not only
     planned.  The total elapsed time expended within each plan node (in
!    milliseconds) and total number of rows it actually returned and variance are added to
!    the display.  A sign of the variance indicates whether the estimate was too high or too low.
!    This is useful for seeing how close the planner's estimates are to reality.
    </para>

    <important>
***************
*** 222,229 ****

                                                         QUERY PLAN
   

-------------------------------------------------------------------------------------------------------------------------
!  HashAggregate  (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 loops=1)
!    ->  Index Scan using test_pkey on test  (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99 loops=1)
           Index Cond: ((id > $1) AND (id < $2))
   Total runtime: 0.851 ms
  (4 rows)
--- 222,229 ----

                                                         QUERY PLAN
   

-------------------------------------------------------------------------------------------------------------------------
!  HashAggregate  (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 var=-6.00 loops=1)
!    ->  Index Scan using test_pkey on test  (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99 var=+12.24 loops=1)
           Index Cond: ((id > $1) AND (id < $2))
   Total runtime: 0.851 ms
  (4 rows)
Index: src/backend/commands/explain.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/explain.c,v
retrieving revision 1.152
diff -c -c -r1.152 explain.c
*** src/backend/commands/explain.c    4 Oct 2006 00:29:51 -0000    1.152
--- src/backend/commands/explain.c    22 Dec 2006 17:09:09 -0000
***************
*** 57,62 ****
--- 57,63 ----
  static void show_sort_keys(Plan *sortplan, int nkeys, AttrNumber *keycols,
                 const char *qlabel,
                 StringInfo str, int indent, ExplainState *es);
+ static double ExplainVariance(double estimate, double actual);

  /*
   * ExplainQuery -
***************
*** 704,713 ****
      {
          double        nloops = planstate->instrument->nloops;

!         appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
                           1000.0 * planstate->instrument->startup / nloops,
                           1000.0 * planstate->instrument->total / nloops,
                           planstate->instrument->ntuples / nloops,
                           planstate->instrument->nloops);
      }
      else if (es->printAnalyze)
--- 705,716 ----
      {
          double        nloops = planstate->instrument->nloops;

!         appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f var=%+.2f loops=%.0f)",
                           1000.0 * planstate->instrument->startup / nloops,
                           1000.0 * planstate->instrument->total / nloops,
                           planstate->instrument->ntuples / nloops,
+                          ExplainVariance(plan->plan_rows,
+                                     planstate->instrument->ntuples / nloops),
                           planstate->instrument->nloops);
      }
      else if (es->printAnalyze)
***************
*** 1205,1207 ****
--- 1208,1225 ----

      appendStringInfo(str, "\n");
  }
+
+
+ static double ExplainVariance(double estimate, double actual)
+ {
+     if (estimate == actual)
+         return 0;
+     else if (actual == 0)
+         return estimate;
+     else if (estimate == 0)
+         return -actual;
+     else if (estimate > actual)
+         return (estimate / actual) - 1;
+     else
+         return -(actual / estimate - 1);
+ }

Re: Need a mentor, and a project.

From
Robert Haas
Date:
On Fri, Dec 11, 2009 at 9:00 PM, Ashish <abindra@u.washington.edu> wrote:
> I am thinking about starting with the following TODO item:
>
> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual
> row counts differ by a specified percentage.
>
> I picked this because it is somewhat related to query processing which is
> what I am most interested in. It also <seems> like a good start up project
> for a newbie like me. Before I start looking into what this would involve
> and start a conversation on designing a solution - I wanted to know what you
> guys think about this particular TODO, and its suitability to a newbie.
> Looking forward to your comments...

If we're going to do this, I think we should implement this as an
optional behavior controlled by a new EXPLAIN option (maybe VARIANCE,
following Bruce's patch?) and generate the output using
ExplainProperty<some-data-type>.  We could possibly make the option
take an optional threshold indicating how much variance is required
before the variance gets displayed, and display the variance for every
node if VARIANCE is specified without an argument.

...Robert
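
The optional threshold described above could be decided by a small predicate. The following is a minimal sketch in C, assuming a hypothetical `VarianceOption` struct and `should_show_variance()` helper; neither exists in PostgreSQL, and real option parsing in explain.c would look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical settings for a VARIANCE option (an assumption for
 * illustration, not part of PostgreSQL's ExplainState).
 */
typedef struct VarianceOption
{
    bool   enabled;        /* VARIANCE was specified at all */
    bool   has_threshold;  /* VARIANCE was given an argument */
    double threshold;      /* minimum absolute variance to display */
} VarianceOption;

/*
 * Decide whether a node's variance should be printed:
 *   - option absent: never
 *   - option without argument: always
 *   - option with argument: only when |variance| reaches the threshold
 */
static bool
should_show_variance(const VarianceOption *opt, double variance)
{
    double magnitude;

    assert(opt != NULL);

    if (!opt->enabled)
        return false;
    if (!opt->has_threshold)
        return true;

    magnitude = (variance < 0.0) ? -variance : variance;
    return magnitude >= opt->threshold;
}
```

With a threshold of 5, a node whose variance is -6.00 would be shown and one at +2.00 would not; with no argument, every node is shown.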


Re: Need a mentor, and a project.

From
Robert Haas
Date:
On Fri, Dec 11, 2009 at 9:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Ashish wrote:
>> I am thinking about starting with the following TODO item:
>>
>> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
>> and actual row counts differ by a specified percentage.
>>
>> I picked this because it is somewhat related to query processing
>> which is what I am most interested in. It also <seems> like a
>> good start up project for a newbie like me. Before I start
>> looking into what this would involve and start a conversation
>> on designing a solution - I wanted to know what you guys think
>> about this particular TODO, and its suitability to a newbie.
>> Looking forward to your comments...
>
> I even have a sample patch you can use as a start, attached.

Interesting.  The logic in ExplainVariance() doesn't look right to me
- the cases where one argument is zero seem like they will produce a
differently-scaled result than otherwise.

...Robert
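
To make the scaling concern concrete: in the sample patch, 100 estimated rows against 0.5 actual rows per loop yields 199 (a ratio minus one), while 100 against 0 yields only 100 (a raw row count). One possible way to keep every case on the same scale is to clamp both inputs to at least one row before taking the ratio. This is only a sketch under that assumption, not something agreed on in the thread, and `explain_variance_clamped` is a hypothetical name.

```c
#include <assert.h>

/*
 * Signed estimation error on a single scale (ratio minus one).
 * Positive: the estimate was too high; negative: too low.
 * Assumption: row counts below one are treated as one row, so the
 * zero cases need no special handling.
 */
static double
explain_variance_clamped(double estimate, double actual)
{
    double est;
    double act;

    assert(estimate >= 0.0 && actual >= 0.0);

    est = (estimate < 1.0) ? 1.0 : estimate;
    act = (actual < 1.0) ? 1.0 : actual;

    if (est >= act)
        return est / act - 1.0;     /* overestimate, or exact match (0) */
    return -(act / est - 1.0);      /* underestimate */
}
```

For the two nodes in the patch's documentation example, this gives the same values as the original function (-6.00 and +12.24), since neither argument is below one there.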


Re: Need a mentor, and a project.

From
Bruce Momjian
Date:
Robert Haas wrote:
> On Fri, Dec 11, 2009 at 9:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Ashish wrote:
> >> I am thinking about starting with the following TODO item:
> >>
> >> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
> >> and actual row counts differ by a specified percentage.
> >>
> >> I picked this because it is somewhat related to query processing
> >> which is what I am most interested in. It also <seems> like a
> >> good start up project for a newbie like me. Before I start
> >> looking into what this would involve and start a conversation
> >> on designing a solution - I wanted to know what you guys think
> >> about this particular TODO, and its suitability to a newbie.
> >> Looking forward to your comments...
> >
> > I even have a sample patch you can use as a start, attached.
> 
> Interesting.  The logic in ExplainVariance() doesn't look right to me
> - the cases where one argument is zero seem like they will produce a
> differently-scaled result than otherwise.

Yea, it is just a starting point for him.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Need a mentor, and a project.

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Ashish wrote:
>> I am thinking about starting with the following TODO item:
>> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
>> and actual row counts differ by a specified percentage.

> I even have a sample patch you can use as a start, attached.

Of course, the reason that patch isn't already in there is that it's
pretty much useless.  It clutters what's already cluttered output
and doesn't do much of anything to help draw one's attention to the
larger estimation errors, which of course is what the TODO item is
really about.

IMO the hard part of the TODO item is to design a useful user interface
for highlighting specific EXPLAIN entries (and NOTICE messages probably
ain't it either).  Getting the numbers is trivial.

I'm not sure there is any really nice solution within the confines of
plain ASCII text output.  There was an interesting approach online
at http://explain-analyze.info, but that site seems to be down now :-(
        regards, tom lane


Re: Need a mentor, and a project.

From
Euler Taveira de Oliveira
Date:
Tom Lane wrote:
> I'm not sure there is any really nice solution within the confines of
> plain ASCII text output.  There was an interesting approach online
> at http://explain-analyze.info, but that site seems to be down now :-(
> 
Estimation error is one of the ideas. The other ones I have in mind are: (i)
cumulative time or percentage per node and (ii) coloring nodes whose
estimates are off (if the terminal supports colors). Of course, those features
should be enabled using some explain options like ACCUMULATIVE and COLOR.

Another explain tool that has a similar approach is http://explain.depesz.com/ .


--  Euler Taveira de Oliveira http://www.timbira.com/
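
The coloring idea can be illustrated with standard ANSI escape sequences. This is a rough sketch only: `format_plan_line` is a hypothetical helper, and whether the terminal supports colors is passed in as a flag rather than detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define ANSI_RED   "\033[31m"   /* SGR code: red foreground */
#define ANSI_RESET "\033[0m"    /* SGR code: reset attributes */

/*
 * Copy one plan line into buf, wrapping it in red when its estimate
 * is off and color output is allowed. Returns the snprintf() result,
 * i.e. the length the formatted line has (or would have had).
 */
static int
format_plan_line(char *buf, size_t buflen, const char *line,
                 bool estimate_off, bool use_color)
{
    assert(buf != NULL && buflen > 0 && line != NULL);

    if (estimate_off && use_color)
        return snprintf(buf, buflen, ANSI_RED "%s" ANSI_RESET, line);
    return snprintf(buf, buflen, "%s", line);
}
```

Keeping the color decision behind a flag means plain ASCII output stays unchanged when the option is off or the output is not a terminal.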


Re: Need a mentor, and a project.

From
decibel
Date:
On Dec 11, 2009, at 8:44 PM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Ashish wrote:
>>> I am thinking about starting with the following TODO item:
>>> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
>>> and actual row counts differ by a specified percentage.
>
>> I even have a sample patch you can use as a start, attached.
>
> IMO the hard part of the TODO item is to design a useful user interface
> for highlighting specific EXPLAIN entries (and NOTICE messages probably
> ain't it either).  Getting the numbers is trivial.

What about prefixing explain output with line numbers? NOTICEs (or whatever mechanism) could then reference the line
numbers.

Unfortunately, I think you'll be very hard-pressed to come up with a way to denote problems on the lines themselves,
since horizontal space is already very hard to come by in complex plans.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net
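
The line-number idea might look roughly like the following. It is a minimal sketch with hypothetical names (`PlanLine`, `print_numbered_plan`) and an assumed ratio threshold; it prints the numbered plan first and then one NOTICE-style message per line whose estimated and actual row counts differ by at least that factor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One line of EXPLAIN ANALYZE output (hypothetical representation). */
typedef struct PlanLine
{
    const char *text;        /* the line as it would be displayed */
    bool        has_rows;    /* false for lines such as "Index Cond:" */
    double      est_rows;    /* planner's row estimate */
    double      actual_rows; /* rows actually returned per loop */
} PlanLine;

/*
 * Print the plan with line-number prefixes, then emit messages that
 * refer back to those numbers. Returns how many lines were flagged.
 */
static int
print_numbered_plan(FILE *out, const PlanLine *lines, int nlines,
                    double max_ratio)
{
    int flagged = 0;

    assert(out != NULL && (lines != NULL || nlines == 0));

    for (int i = 0; i < nlines; i++)
        fprintf(out, "%3d: %s\n", i + 1, lines[i].text);

    for (int i = 0; i < nlines; i++)
    {
        double est;
        double act;
        double ratio;

        if (!lines[i].has_rows)
            continue;

        /* Treat counts below one row as one row to avoid dividing by zero. */
        est = (lines[i].est_rows < 1.0) ? 1.0 : lines[i].est_rows;
        act = (lines[i].actual_rows < 1.0) ? 1.0 : lines[i].actual_rows;
        ratio = (est > act) ? est / act : act / est;

        if (ratio >= max_ratio)
        {
            fprintf(out, "NOTICE:  line %d: estimated %.0f rows, actual %.0f rows\n",
                    i + 1, lines[i].est_rows, lines[i].actual_rows);
            flagged++;
        }
    }
    return flagged;
}
```

Because the messages follow the plan rather than sitting on the plan lines, this costs one narrow column of horizontal space instead of widening every line.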




Re: Need a mentor, and a project.

From
Gurjeet Singh
Date:
2009/12/16 decibel <decibel@decibel.org>
> On Dec 11, 2009, at 8:44 PM, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >> Ashish wrote:
> >>> I am thinking about starting with the following TODO item:
> >>> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
> >>> and actual row counts differ by a specified percentage.
> >
> >> I even have a sample patch you can use as a start, attached.
> >
> > IMO the hard part of the TODO item is to design a useful user interface
> > for highlighting specific EXPLAIN entries (and NOTICE messages probably
> > ain't it either).  Getting the numbers is trivial.
>
> What about prefixing explain output with line numbers? NOTICEs (or whatever mechanism) could then reference the line numbers.

+1

--
Lets call it Postgres

EnterpriseDB      http://www.enterprisedb.com

gurjeet[.singh]@EnterpriseDB.com

singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
Twitter: singh_gurjeet
Skype: singh_gurjeet

Mail sent from my BlackLaptop device