Thread: I am confused after reading codes of PostgreSQL three week

I am confused after reading codes of PostgreSQL three week

From

hom

Date:

17 March 2011, 11:57:21

Hi,
 I am trying to learn how a database is implemented, and I have been reading
the PG source code for a month.

Now, I only know a little about how PG works.  :(

I just know that PG works like this, but I don't know why it works like this.  :(  :(

Even worse, I feel I can't understand the source code any better. It may be
that I couldn't split the large modules into small pieces, which might help
me understand.

Is there any article or other way that could help me understand the source code?

Thanks for the help ~

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

Bruce Momjian

Date:

17 March 2011, 12:22:22

hom wrote:
> Hi,
> 
>   I try to known how a database is implemented and I have been reading
> PG source codes for a month.
> 
> Now, I only know a little about how PG work.  :(
> 
> I just know PG work like this but I don't know why PG work like this.  :(  :(
> 
> even worse, I feel I can better understand the source code. it may be
> that I could't split the large module into small piece which may help
> to understand.
> 
> Is there any article or some way could help understand the source code ?

I assume you have looked at these places:

       http://wiki.postgresql.org/wiki/Developer_FAQ
       http://www.postgresql.org/developer/coding

--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + It's impossible for everything to be true. +

Re: I am confused after reading codes of PostgreSQL three week

From

"Kevin Grittner"

Date:

17 March 2011, 12:50:09

hom <obsidianhom@gmail.com> wrote:

> I try to known how a database is implemented and I have been
> reading PG source codes for a month.

That's ambitious.

find -name '*.h' -or -name '*.c' \
  | egrep -v '^\./src/test/.+/tmp_check/' \
  | xargs cat | wc -l
1059144

Depending on how you do the math, that's about 50,000 lines of code
per day to get through it in the time you mention.
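
The arithmetic behind that estimate can be checked with a short shell sketch. The function names count_source_lines and lines_per_day are made up for illustration, and the 21-day and 30-day periods are assumptions standing in for "three weeks" and "a month". The counting function uses a portable find form rather than the exact pipeline above, and it does not exclude tmp_check directories.

```shell
#!/bin/sh
# Count the lines in all .c and .h files under a directory
# (defaults to the current directory).
count_source_lines() {
    find "${1:-.}" -type f \( -name '*.h' -o -name '*.c' \) -exec cat {} + \
        | wc -l | tr -d ' '
}

# Integer division: lines to read per day to finish in the given number of days.
lines_per_day() {
    echo $(( $1 / $2 ))
}

lines_per_day 1059144 21   # three weeks   -> 50435
lines_per_day 1059144 30   # about a month -> 35304
```

Either way the daily figure is in the tens of thousands of lines, which is the point of the estimate.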

> Is there any article or some way could help understand the source
> code ?

Your best bet would be to follow links from the Developers tab on
the main PostgreSQL web site:

http://www.postgresql.org/developer/

In particular the Developer FAQ page:

http://wiki.postgresql.org/wiki/Developer_FAQ

And the "Coding" links:

http://www.postgresql.org/developer/coding

may help.

Before reading code in a directory, be sure to read any README
file(s) in that directory carefully.

It helps to read this list.

In spite of reviewing all of that myself, it was rather intimidating
when I went to work on a major patch 14 months ago.  Robert Haas
offered some good advice which served me well in that effort --
divide the effort into a series of incremental steps, each of which
deals with a small enough portion of the code to get your head
around.  As you work in any one narrow area, it becomes increasingly
clear; with that as a base you can expand your scope.

When you're working in the code, it is tremendously helpful to use
an editor with ctags support (or similar IDE functionality).

I hope this is helpful.  Good luck.

-Kevin

Re: I am confused after reading codes of PostgreSQL three week

From

Markus Wanner

Date:

18 March 2011, 06:51:49

Hom,

On 03/17/2011 04:49 PM, Kevin Grittner wrote:
> That's ambitious.

Absolutely, yes.  Exercise patience with yourself.

A method that hasn't been mentioned yet is digging out your debugger
and attaching it to a connected Postgres backend.  You can then issue a
query you are interested in and follow the backend doing its work.

That's particularly helpful in trying to find a certain spot of
interest.  Of course, it doesn't help much in getting the big picture.
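
As a rough sketch of that workflow, the shell snippet below prints a small gdb command file. The helper name gdb_commands is hypothetical, and breaking on exec_simple_query (which handles simple-protocol queries in src/backend/tcop/postgres.c) is only one assumed starting point. The process ID comes from running SELECT pg_backend_pid(); in the session you want to trace.

```shell
#!/bin/sh
# Print gdb commands that set a breakpoint on the named function
# and then let the attached backend keep running until it is hit.
gdb_commands() {
    printf 'break %s\ncontinue\n' "$1"
}

# Assumed breakpoint; pick a function in the area you are studying.
gdb_commands exec_simple_query

# Typical use (12345 is a placeholder for the backend's process ID):
#     gdb_commands exec_simple_query > backend.gdb
#     gdb -p 12345 -x backend.gdb
# Then run the query of interest in the psql session that owns that backend.
```

A backend built with debugging symbols and without optimization is much easier to step through; how your build was configured is something to check on your side.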

Good luck on your journey through the code base.

Regards

Markus Wanner

Re: I am confused after reading codes of PostgreSQL three week

From

Brendan Jurd

Date:

18 March 2011, 07:09:51

On 18 March 2011 01:57, hom <obsidianhom@gmail.com> wrote:
>  I try to known how a database is implemented

This objective is so vast and so vague that it's difficult to give
meaningful help.

I'd emphasise Kevin Grittner's very worthwhile advice.  Try to break
your question down into smaller, more specific ones.  With a question
like "how does postgres work" you're likely to flounder.  But with a
more targeted question, e.g., "what format does postgres use to save
data to disk" or "how does postgres implement ORDER BY", you can make
easier progress, and perhaps you could get more useful pointers from
the people on this list.

Have you read through the "Overview of System Internals" chapter in
the documentation [1]?  Perhaps it will help you identify the areas
you wish to explore further, and form more specific questions.

[1] http://www.postgresql.org/docs/current/static/overview.html

Cheers,
BJ

Re: I am confused after reading codes of PostgreSQL three week

From

Vaibhav Kaushal

Date:

18 March 2011, 07:33:32

Hi,

That was the question I was facing 5 months ago, and trust me, I am still working on it even now. Even with an average of 6+ hours going into the PostgreSQL code, and with the best practices suggested by the developers, I still think I know less than 10 percent. It is too huge to be swallowed at once.

I too had to break it down into pieces, and because everything is so interconnected with everything else, it is quite complicated in the beginning. Start with one piece (planner, parser, executor, storage management, whatever) and slowly it should help you get the bigger picture.

regards,

Vaibhav


Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

18 March 2011, 11:10:46

2011/3/17 Bruce Momjian <bruce@momjian.us>:
> hom wrote:
>> Hi,
>>
>>   I try to known how a database is implemented and I have been reading
>> PG source codes for a month.
>>
>> Now, I only know a little about how PG work.  :(
>>
>> I just know PG work like this but I don't know why PG work like this.  :(  :(
>>
>> even worse, I feel I can better understand the source code. it may be
>> that I could't split the large module into small piece which may help
>> to understand.
>>
>> Is there any article or some way could help understand the source code ?
>
> I assume you have looked at these places:
>
>        http://wiki.postgresql.org/wiki/Developer_FAQ
>        http://www.postgresql.org/developer/coding
>
> --
>  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
>  + It's impossible for everything to be true. +
>

Thanks Bruce.
I am also reading your book "PostgreSQL: Introduction and Concepts". :)

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

18 March 2011, 11:16:07

2011/3/17 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> hom <obsidianhom@gmail.com> wrote:
>
>> I try to known how a database is implemented and I have been
>> reading PG source codes for a month.
>
> That's ambitious.
>
> find -name '*.h' -or -name '*.c' \
>  | egrep -v '^\./src/test/.+/tmp_check/' \
>  | xargs cat | wc -l
> 1059144
>
> Depending on how you do the math, that's about 50,000 lines of code
> per day to get through it in the time you mention.
>
>> Is there any article or some way could help understand the source
>> code ?
>
> Your best bet would be to follow links from the Developers tab on
> the main PostgreSQL web site:
>
> http://www.postgresql.org/developer/
>
> In particular the Developer FAQ page:
>
> http://wiki.postgresql.org/wiki/Developer_FAQ
>
> And the "Coding" links:
>
> http://www.postgresql.org/developer/coding
>
> may help.
>
> Before reading code in a directory, be sure to read any README
> file(s) in that directory carefully.
>
> It helps to read this list.
>
> In spite of reviewing all of that myself, it was rather intimidating
> when I went to work on a major patch 14 months ago.  Robert Haas
> offered some good advice which served me well in that effort --
> divide the effort in to a series of incremental steps, each of which
> deals with a small enough portion of the code to get your head
> around.  As you work in any one narrow area, it becomes increasingly
> clear; with that as a base you can expand your scope.
>
> When you're working in the code, it is tremendously helpful to use
> an editor with ctags support (or similar IDE functionality).
>
> I hope this is helpful.  Good luck.
>
> -Kevin
>

Thanks Kevin.
I will follow your advice, and I will also post my questions to the
mailing list for help.
Thanks a lot.

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

18 March 2011, 11:27:20

2011/3/18 Markus Wanner <markus@bluegap.ch>:
> Hom,
>
> On 03/17/2011 04:49 PM, Kevin Grittner wrote:
>> That's ambitious.
>
> Absolutely, yes.  Exercise patience with yourself.
>
> A method that hasn't been mentioned, yet, is digging out your debugger
> and attach it to a connected Postgres backend.  You can then issue a
> query you are interested in and follow the backend doing its work.
>
> That's particularly helpful in trying to find a certain spot of
> interest.  Of course, it doesn't help much in getting the big picture.
>
> Good luck on your journey through the code base.
>
> Regards
>
> Markus Wanner
>

Thanks Markus.
It's a hard time at the beginning.
I should stay patient. :)

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

18 March 2011, 11:35:01

2011/3/18 Brendan Jurd <direvus@gmail.com>:
> On 18 March 2011 01:57, hom <obsidianhom@gmail.com> wrote:
>>  I try to known how a database is implemented
>
> This objective is so vast and so vague that it's difficult to give
> meaningful help.
>
> I'd emphasise Kevin Grittner's very worthwhile advice.  Try to break
> your question down into smaller, more specific ones.  With a question
> like "how does postgres work" you're likely to flounder.  But with a
> more targeted question, e.g., "what format does postgres use to save
> data to disk" or "how does postgres implement ORDER BY", you can make
> easier progress, and perhaps you could get more useful pointers from
> the people on this list.
>
> Have you read through the "Overview of System Internals" chapter in
> the documentation [1]?  Perhaps it will help you identify the areas
> you wish to explore further, and form more specific questions.
>
> [1] http://www.postgresql.org/docs/current/static/overview.html
>
> Cheers,
> BJ
>

Thanks Brendan.
I had a quick glance at "Overview of System Internals" before.
I think it is time to read it again.

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

18 March 2011, 11:44:55

2011/3/18 Vaibhav Kaushal <vaibhavkaushal123@gmail.com>:
> Hi,
> That was the question I was facing 5 months ago and trust me I am doing it
> even now. With an average of 6+ hours going into PostgreSQL Code, even with
> best practices (as suggested by the developers) I still think I know less
> than 10 percent. It is too huge to be swallowed at once.
> I too had to break it down into pieces and because everything is so
> interconnected with everything else, it is quite complicated in the
> beginning. Start with one piece; planner, parser, executor, storage
> management whatever and slowly it should help you get the bigger picture.
> regards,
> Vaibhav
> I had to break it into

Thanks Vaibhav.
I have stepped into the parser before, but I ran into a problem:

When I step into scanner_init() in the debugger, Eclipse always opens scan.l,
and the execution order does not match the file.
I think it should actually be scan.c, but I don't know how to trace
into scan.c :(
PS: I have turned the "Search for duplicate source files" option on.

I have posted this to the mailing list, but it has not been solved.

Here is the link:
http://postgresql.1045698.n5.nabble.com/Open-unmatch-source-file-when-step-into-parse-analyze-in-Eclipse-td3408033.html

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

Vaibhav Kaushal

Date:

19 March 2011, 00:31:45

Hello hom,

Frankly, I am a learner as well. The experts here are almost always ready
to help and would be a better source of information.

Moreover, I am also using Eclipse, but I do not use it for building the
source. I use it only as a source code browser (it's easy in a GUI, isn't
it?). I am trying to learn about the executor, so I can't say much about
the parser. However, I suppose you need to know the rules of the
tools flex and bison to understand the parser. And why are you in
scan.c? It is generated by flex. Read scan.l and gram.y instead.
It is these files that are responsible for the major work done by the
parser.

If you are keen on the parser, go learn lex and yacc (or flex and
bison ... they are almost the same) and then go through the scan.l and
gram.y files. It is actually an _extremely_ tough job to read the
generated files. Once again, do turn off the "Search for duplicate
source files" option. There are no duplicate files in the source tree.

Also, if you are using a copy of the source tree that was already built
in the workspace, things can be a little different.

@others: Well, I do know that there are a few books on the market
written by the devs, but how much do they help when I have already been
banging my head against the source for the last 5 months?

Regards,
Vaibhav


Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

20 March 2011, 00:50:10

2011/3/19 Vaibhav Kaushal <vaibhavkaushal123@gmail.com>:
> Hello hom,
>
> Frankly I am a learner as well. The experts here are almost always ready
> to help and would be a better source of information.
>
> Moreover I am also using eclipse but I do not use it for building the
> source. I use it only as a source code browser (its easy in GUI; isn't
> it? ). I am trying to learn about the executor so can't say much about
> the parser. However I suppose that you must be knowing the rules of the
> tools flex and bison to understand the parser. And why are you into
> scan.c? It is created by flex dear. Read the scan.l and gram.y instead.
> It is these files which are responsible for the major work done by the
> parser.
>
> If you are keen about the parser, go learn lex and yacc (or flex and
> bison ... they are almost the same) and then go through the scan.l and
> gram.y files. It is actually an _extremely_ tough job to read the
> generated files. Once again, do turn off the "Search for duplicate
> source files" option. There are no duplicate files in the source tree.
>
> Also, if you are using the copy of source tree which was built once in
> the workspace, things can be a little different.
>
> @others: Well, I do know that there are a few books in the market
> written by the devs but how much does it help when I am already banging
> my head into source since last 5 months?
>
>
> Regards,
> Vaibhav


Thanks Vaibhav.

I traced into scan.c because I want to know how the parse tree is
built, so I debugged the source step by step.
Then Eclipse picked up scan.l, and the execution order did not
match the code.

Actually, I have no idea which module of the source I should read first.
I have had a quick glance at the source, and I know a little about how a
query executes.
But the modules are so interconnected that I don't know which part I should go deep into.

Now I plan to study mmgr in depth. Would that be suitable?

--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

Martijn van Oosterhout

Date:

20 March 2011, 07:08:53

On Sun, Mar 20, 2011 at 11:50:01AM +0800, hom wrote:
> I trace into scan.c because I want to known how the paser tree is
> built and I debug the source step by step.
> Then the eclipse pick up the scan.I and the excute order does not
> match the code.

Umm, the scanners produced by flex and bison are huge table-driven
parsers, which makes it extremely difficult to follow what is happening
in terms of the "parse tree".

If you want to follow what's happening, see the following page:

http://dinosaur.compilertools.net/bison/bison_11.html

It describes how to make the parser dump what it's doing. As the page says,
stepping through the processed file reveals little, because it's the
same code being executed over and over again; only the variables
change.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: I am confused after reading codes of PostgreSQL three week

From

Nicolas Barbier

Date:

20 March 2011, 07:11:19

2011/3/20 hom <obsidianhom@gmail.com>:

> I trace into scan.c because I want to known how the paser tree is
> built and I debug the source step by step.

I suggest you learn how flex/bison work first. The contents of the *.c
files generated by flex/bison are not generally supposed to be
interpreted by humans, rather you should read their original sources
(*.l and *.y).

> Then the eclipse pick up the scan.I and the excute order does not
> match the code.

Eclipse seems to understand that any code corresponding to the
generated .c file actually originates in the .l file, but apparently
fails to match (some of?) the line numbers. OTOH, I cannot really
imagine how it is supposed to match them as long as you are not
executing lines that are literally copied from the .l file (e.g., as
long as the lexer or parser code itself is being executed), so that
may be normal.
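
The mapping back to the .l file comes from #line directives, which flex writes into the generated C file by default; a debugger reports whatever file and line those directives name. A small shell sketch for listing them follows. The function name line_directive_files is made up for illustration, and the scan.c path assumes a built tree with the generated file in src/backend/parser/.

```shell
#!/bin/sh
# Print the distinct file names mentioned in the #line directives of a C file.
line_directive_files() {
    grep '^#line' "$1" | sed 's/^#line [0-9]* "\(.*\)"$/\1/' | sort -u
}

# Only runs when the generated scanner is present (the path is an assumption).
if [ -f src/backend/parser/scan.c ]; then
    line_directive_files src/backend/parser/scan.c
fi
```

If the output names both the .l file and the generated .c file, the debugger will switch between the two as execution moves between copied rule code and the generated scanner skeleton.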

Again: Do not try to read the generated .c files, but rather read the
corresponding .l and .y files. The tarballs may include those
generated .c files, but as you will find out when checking out the
repository itself, they are not really considered "source" (i.e., they
are not included). When debugging, skip over the lexer and parser code
itself, just put your breakpoints in the C code in the .l and .y files
(I hope Eclipse might match *those* line numbers at least, and make the
breakpoints work).

Nicolas

Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

20 March 2011, 12:31:24

2011/3/20 Martijn van Oosterhout <kleptog@svana.org>:
> On Sun, Mar 20, 2011 at 11:50:01AM +0800, hom wrote:
>> I trace into scan.c because I want to known how the paser tree is
>> built and I debug the source step by step.
>> Then the eclipse pick up the scan.I and the excute order does not
>> match the code.
>
> Umm, the scanners produced by flex and bison are huge table driven
> parsers, which makes following what is happening in terms of "parse
> tree" extremely difficult to follow.
>
> If you want to follow what's happening, see the following page:
>
> http://dinosaur.compilertools.net/bison/bison_11.html
>
> Which will cause the parser to dump what it's doing. As the page says,
> stepping through the processed file reveals little, becuase it's the
> same code being executed over and over again, only the variables
> change.
>
> Have a nice day,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
>> Patriotism is when love of your own people comes first; nationalism,
>> when hate for people other than your own comes first.
>>                                       - Charles de Gaulle
>

Thanks Martijn.
I am trying out lex and yacc on my Linux machine. :)


--
Best Wishes!

                                     hom

Re: I am confused after reading codes of PostgreSQL three week

From

hom

Date:

20 March 2011, 12:44:38

2011/3/20 Nicolas Barbier <nicolas.barbier@gmail.com>:
> 2011/3/20 hom <obsidianhom@gmail.com>:
>
>> I trace into scan.c because I want to known how the paser tree is
>> built and I debug the source step by step.
>
> I suggest you learn how flex/bison work first. The contents of the *.c
> files generated by flex/bison are not generally supposed to be
> interpreted by humans, rather you should read their original sources
> (*.l and *.y).
>
>> Then the eclipse pick up the scan.I and the excute order does not
>> match the code.
>
> Eclipse seems to understand that any code corresponding to the
> generated .c file actually originates in the .l file, but apparently
> fails to match (some of?) the line numbers. OTOH, I cannot really
> imagine how it is supposed to match them as long as you are not
> executing lines that are literally copied from the .l file (e.g., as
> long as the lexer or parser code itself is being executed), so that
> may be normal.
>
> Again: Do not try to read the generated .c files, but rather read the
> corresponding .l and .y files. The tarballs may include those
> generated .c files, but as you will find out when checking out the
> repository itself, they are not really considered "source" (i.e., they
> are not included). When debugging, skip over the lexer and parser code
> itself, just put your breakpoints in the C code in the .l and .y files
> (I hope Eclipse might match *those* line numbers a least, and make the
> breakpoints work).
>
> Nicolas
>

Thanks Nicolas.
I put breakpoints in scan.l, but they don't always work.
It doesn't matter, though. I plan to spend more time on mmgr, storage, and access. :)


--
Best Wishes!

                                     hom