Thread: I am confused after reading PostgreSQL code for three weeks
Hi,
I am trying to learn how a database is implemented, and I have been reading the PG source code for a month.
Now I only know a little about how PG works. :(
I just know that PG works like this, but I don't know why it works like this. :( :(
Even worse, I don't feel I am getting any better at understanding the source code. Maybe it is because I can't split the large modules into the small pieces that would help me understand them.
Is there any article, or some other way, that could help me understand the source code?
Thanks for the help ~
--
Best Wishes!
hom
hom wrote:
> Is there any article, or some other way, that could help me understand
> the source code?
I assume you have looked at these places:
http://wiki.postgresql.org/wiki/Developer_FAQ
http://www.postgresql.org/developer/coding
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
hom <obsidianhom@gmail.com> wrote:
> I am trying to learn how a database is implemented, and I have been
> reading the PG source code for a month.
That's ambitious.
    find . -name '*.h' -o -name '*.c' \
      | grep -E -v '^\./src/test/.+/tmp_check/' \
      | xargs cat | wc -l
    1059144
Depending on how you do the math, that's about 50,000 lines of code per day to get through it in the time you mention.
> Is there any article, or some other way, that could help me understand
> the source code?
Your best bet would be to follow links from the Developers tab on the main PostgreSQL web site:
http://www.postgresql.org/developer/
In particular the Developer FAQ page:
http://wiki.postgresql.org/wiki/Developer_FAQ
And the "Coding" links:
http://www.postgresql.org/developer/coding
may help.
Before reading code in a directory, be sure to read any README file(s) in that directory carefully.
It helps to read this list.
In spite of reviewing all of that myself, it was rather intimidating when I went to work on a major patch 14 months ago. Robert Haas offered some good advice which served me well in that effort: divide the effort into a series of incremental steps, each of which deals with a small enough portion of the code to get your head around. As you work in any one narrow area, it becomes increasingly clear; with that as a base you can expand your scope.
When you're working in the code, it is tremendously helpful to use an editor with ctags support (or similar IDE functionality).
I hope this is helpful. Good luck.
-Kevin
Hom,
On 03/17/2011 04:49 PM, Kevin Grittner wrote:
> That's ambitious.
Absolutely, yes. Exercise patience with yourself.
A method that hasn't been mentioned yet is to dig out your debugger and attach it to a connected Postgres backend. You can then issue a query you are interested in and follow the backend doing its work.
That's particularly helpful in trying to find a certain spot of interest. Of course, it doesn't help much in getting the big picture.
Good luck on your journey through the code base.
Regards
Markus Wanner
On 18 March 2011 01:57, hom <obsidianhom@gmail.com> wrote:
> I am trying to learn how a database is implemented
This objective is so vast and so vague that it's difficult to give meaningful help.
I'd emphasise Kevin Grittner's very worthwhile advice. Try to break your question down into smaller, more specific ones. With a question like "how does postgres work" you're likely to flounder. But with a more targeted question, e.g., "what format does postgres use to save data to disk" or "how does postgres implement ORDER BY", you can make easier progress, and perhaps you could get more useful pointers from the people on this list.
Have you read through the "Overview of System Internals" chapter in the documentation [1]? Perhaps it will help you identify the areas you wish to explore further, and form more specific questions.
[1] http://www.postgresql.org/docs/current/static/overview.html
Cheers,
BJ
Hi,
That was the question I was facing 5 months ago, and trust me, I am still facing it now. Even with an average of 6+ hours going into the PostgreSQL code, and even following the best practices suggested by the developers, I still think I know less than 10 percent of it. It is too huge to be swallowed at once.
I too had to break it down into pieces, and because everything is so interconnected, it is quite complicated in the beginning. Start with one piece (planner, parser, executor, storage management, whatever), and slowly it should help you get the bigger picture.
regards,
Vaibhav
On Fri, Mar 18, 2011 at 3:39 PM, Brendan Jurd <direvus@gmail.com> wrote:
> I'd emphasise Kevin Grittner's very worthwhile advice. Try to break
> your question down into smaller, more specific ones.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2011/3/17 Bruce Momjian <bruce@momjian.us>:
> I assume you have looked at these places:
> http://wiki.postgresql.org/wiki/Developer_FAQ
> http://www.postgresql.org/developer/coding
Thanks Bruce. I am also reading your book "PostgreSQL: Introduction and Concepts". :)
--
Best Wishes!
hom
2011/3/17 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Robert Haas offered some good advice which served me well in that effort:
> divide the effort into a series of incremental steps, each of which
> deals with a small enough portion of the code to get your head around.
Thanks Kevin. I will follow your advice, and I will also post my questions to the mailing list for help. Thanks a lot.
--
Best Wishes!
hom
2011/3/18 Markus Wanner <markus@bluegap.ch>:
> Absolutely, yes. Exercise patience with yourself.
Thanks Markus. It's a hard time at the beginning. I should be patient. :)
--
Best Wishes!
hom
2011/3/18 Brendan Jurd <direvus@gmail.com>:
> Have you read through the "Overview of System Internals" chapter in
> the documentation [1]?
> [1] http://www.postgresql.org/docs/current/static/overview.html
Thanks Brendan. I took a quick glance at "Overview of System Internals" before. I think it is time to read it again.
--
Best Wishes!
hom
2011/3/18 Vaibhav Kaushal <vaibhavkaushal123@gmail.com>:
> Start with one piece (planner, parser, executor, storage management,
> whatever), and slowly it should help you get the bigger picture.
Thanks Vaibhav.
I have stepped into the parser before, but I ran into a problem:
when I step into scanner_init() in the debugger, Eclipse always finds scan.l, and the execution order does not match the file.
I think it should actually be scan.c, but I don't know how to trace into scan.c. :(
PS: I have turned the "Search for duplicate source files" option on.
I have posted to the mailing list, but the problem has not been solved.
Here is the link:
http://postgresql.1045698.n5.nabble.com/Open-unmatch-source-file-when-step-into-parse-analyze-in-Eclipse-td3408033.html
--
Best Wishes!
hom
Hello hom,
Frankly, I am a learner as well. The experts here are almost always ready to help and would be a better source of information.
Moreover, I am also using Eclipse, but I do not use it for building the source. I use it only as a source code browser (it's easy in a GUI, isn't it?). I am trying to learn about the executor, so I can't say much about the parser. However, I suppose you need to know how the tools flex and bison work to understand the parser. And why are you looking at scan.c? It is generated by flex, dear. Read scan.l and gram.y instead. It is these files that are responsible for the major work done by the parser.
If you are keen on the parser, go learn lex and yacc (or flex and bison ... they are almost the same) and then go through the scan.l and gram.y files. It is actually an _extremely_ tough job to read the generated files. Once again, do turn off the "Search for duplicate source files" option. There are no duplicate files in the source tree.
Also, if you are using a copy of the source tree that has already been built once in the workspace, things can be a little different.
@others: Well, I do know that there are a few books on the market written by the devs, but how much do they help when I have already been banging my head against the source for the last 5 months?
Regards,
Vaibhav
On Fri, 2011-03-18 at 22:44 +0800, hom wrote:
> when I step into scanner_init() in the debugger, Eclipse always finds
> scan.l, and the execution order does not match the file.
> I think it should actually be scan.c, but I don't know how to trace
> into scan.c. :(
> PS: I have turned the "Search for duplicate source files" option on.
2011/3/19 Vaibhav Kaushal <vaibhavkaushal123@gmail.com>:
> And why are you looking at scan.c? It is generated by flex, dear. Read
> scan.l and gram.y instead.
Thanks Vaibhav.
I traced into scan.c because I want to know how the parse tree is built, and I was debugging the source step by step. Then Eclipse picked up scan.l, and the execution order did not match the code.
Actually, I have no idea which module of the source I should read first. I have had a quick glance at the source, and I know a little about how a query executes. But the modules are so interconnected that I don't know which part I should dig into.
Now I plan to study mmgr in depth. Would that be a suitable place to start?
--
Best Wishes!
hom
On Sun, Mar 20, 2011 at 11:50:01AM +0800, hom wrote:
> I traced into scan.c because I want to know how the parse tree is
> built, and I was debugging the source step by step.
> Then Eclipse picked up scan.l, and the execution order did not
> match the code.
Umm, the scanners and parsers produced by flex and bison are huge table-driven state machines, which makes what is happening in terms of the "parse tree" extremely difficult to follow.
If you want to follow what's happening, see the following page, which explains how to make the parser dump what it's doing:
http://dinosaur.compilertools.net/bison/bison_11.html
As the page says, stepping through the processed file reveals little, because it's the same code being executed over and over again; only the variables change.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
2011/3/20 hom <obsidianhom@gmail.com>:
> I traced into scan.c because I want to know how the parse tree is
> built, and I was debugging the source step by step.
I suggest you learn how flex/bison work first. The contents of the *.c files generated by flex/bison are not generally supposed to be interpreted by humans; rather, you should read their original sources (*.l and *.y).
> Then Eclipse picked up scan.l, and the execution order did not
> match the code.
Eclipse seems to understand that any code corresponding to the generated .c file actually originates in the .l file, but apparently fails to match (some of?) the line numbers. OTOH, I cannot really imagine how it is supposed to match them as long as you are not executing lines that are literally copied from the .l file (i.e., as long as the lexer or parser code itself is being executed), so that may be normal.
Again: do not try to read the generated .c files, but rather read the corresponding .l and .y files. The tarballs may include those generated .c files, but as you will find out when checking out the repository itself, they are not really considered "source" (i.e., they are not included). When debugging, skip over the lexer and parser code itself; just put your breakpoints in the C code in the .l and .y files (I hope Eclipse will match *those* line numbers at least, and make the breakpoints work).
Nicolas
2011/3/20 Martijn van Oosterhout <kleptog@svana.org>:
> If you want to follow what's happening, see the following page:
> http://dinosaur.compilertools.net/bison/bison_11.html
Thanks Martijn. I am trying out lex and yacc on my Linux machine. :)
--
Best Wishes!
hom
2011/3/20 Nicolas Barbier <nicolas.barbier@gmail.com>:
> When debugging, skip over the lexer and parser code itself; just put
> your breakpoints in the C code in the .l and .y files (I hope Eclipse
> will match *those* line numbers at least, and make the breakpoints work).
Thanks Nicolas.
I put breakpoints in scan.l, but sometimes they don't work. It doesn't matter, though. I plan to spend more time on mmgr, storage, and access. :)
--
Best Wishes!
hom