Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking... - Mailing list pgsql-hackers

From Kang Yuzhe
Subject Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...
Date
Msg-id CAH=t1kqRDKe4tjkUOVwOnxQJBqN3uooh9zSBR5NY56d5ZCYHMQ@mail.gmail.com
In response to Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-hackers
Thanks, Kevin, for taking the time to explain the real difficulty of finding one's place in PG development. And thanks for your genuine advice, which I have taken as is.
My question is: why is there so much hands-on material about PG application development (e.g. connecting to PG using Java/JDBC) but almost nothing in the way of hands-on lessons about PG hacking? For example, I want to add the keyword "Encrypted", as in "CREATE TABLE t1(a int, b int encrypted)" or "CREATE TABLE t1(a int, b int) encrypted". Alas, it's not an easy task.

Lastly, I have come to understand that the PG community is not harsh to newbies, and thus I am feeling at home.

Regards,
Zeray

On Mon, Apr 17, 2017 at 6:53 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
On Tue, Mar 28, 2017 at 10:36 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

> Personally I have to agree that the learning curve is very steep. Some
> of the docs and presentations help, but there's a LOT to understand.

Some small patches can be kept to a fairly narrow set of areas, and
if you can find a similar capability you can crib technique from it
for handling some of the more mysterious areas it might brush up
against.  When I started working on my first *big* patch that was
bound to touch many areas (around the start of development for 9.1)
I counted lines of code and found over a million lines just in .c
and .h files.  We're now closing in on 1.5 million lines.  That's
not counting over 376,000 lines of documentation in .sgml files,
over 12,000 lines of text in README* files, over 26,000 lines of
perl code, over 103,000 lines of .sql code (60% of which is in
regression tests), over 38,000 lines of .y code (for flex/bison
parsing), about 9,000 lines of various types of code just for
generating the configure file, and over 439,000 lines of .po files
(for message translations).  I'm sure I missed a lot of important
stuff there, but it gives some idea of the challenge it is to get your
head around it all.
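
For anyone who wants to reproduce this kind of per-extension tally on
their own checkout, here is a minimal sketch.  It is not the method used
to get the figures above; the function name count_lines_by_extension is
hypothetical, and it assumes a local copy of the source tree and simply
counts physical lines (blank lines and comments included).

```python
import os
from collections import Counter


def count_lines_by_extension(root):
    """Return a Counter mapping file extension to total physical lines.

    Hypothetical helper for illustration; 'root' is assumed to be the
    top directory of a local source checkout.
    """
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so it does not inflate the totals.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # Files without an extension (e.g. README) are grouped together.
            ext = os.path.splitext(name)[1] or "(none)"
            path = os.path.join(dirpath, name)
            try:
                # Binary mode avoids decoding errors on non-UTF-8 files.
                with open(path, "rb") as fh:
                    totals[ext] += sum(1 for _ in fh)
            except OSError:
                # Unreadable files are skipped rather than aborting the walk.
                continue
    return totals
```

Calling count_lines_by_extension on the top of a checkout and looking at
the entries for ".c", ".h", ".sgml", ".y" and ".po" gives a breakdown
along the same lines as the one above, though the exact numbers will
depend on the version and on what gets counted.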

My first advice is to try to identify which areas of the code you
will need to touch, and read those over.  Several times.  Try to
infer the API to areas *that* code needs to reference from looking
at other code (as similar to what you want to work on as you can
find), reading code comments and README files, and asking
questions.  Secondly, there is a lot that is considered to be
"coding rules" that is, as far as I've been able to tell, only
contained inside the heads of veteran PostgreSQL coders, with
occasional references in the discussion list archives.  Asking
questions, proposing approaches before coding, and showing work in
progress early and often will help a lot in terms of discovering
these issues and allowing you to rearrange things to fit these
conventions.  If someone with the "gift of gab" is able to capture
these and put them into a readily available form, that would be
fantastic.

> * SSI (haven't gone there yet myself)

For anyone wanting to approach this area, there is a fair amount to
look at.  There is some overlap, but in rough order of "practical"
to "theoretical foundation", you might want to look at:

https://www.postgresql.org/docs/current/static/transaction-iso.html

https://wiki.postgresql.org/wiki/SSI

The SQL standard

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=refs/heads/master

http://www.vldb.org/pvldb/vol5.html

http://hdl.handle.net/2123/5353

Papers cited in these last two.  I have found papers authored by
Alan Fekete or Atul Adya particularly enlightening.

If any of the other areas that Craig listed have similar work
available, maybe we should start a Wiki page where we list areas of
code (starting with the list Craig included) as section headers, and
put links to useful reading below each?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
