Thread: initdb needed for newest sources
I have just committed changes that require a recompile and initdb when next you pull the current CVS sources. Be warned.

The changes revise the contents of the pg_statistic system table in order to support more accurate selectivity estimation, as per discussions a few days ago. I had hoped that pg_upgrade would be sufficient to deal with this, but it seems to be out of service at the moment...

			regards, tom lane
> The changes revise the contents of the pg_statistic system table
> in order to support more accurate selectivity estimation, as per
> discussions a few days ago.

Hey Tom, I'm not sure I was all that enthused about trying to optimize the selectivity for *any* particular strategy. How about also allowing the value(s) used for selectivity estimates to be manually set in the table so folks can tune things to their satisfaction? Maybe they can already be set, in which case we could add a SET OPTIMIZATION command (the name is negotiable) to make it easier?

                     - Thomas

--
Thomas Lockhart                         lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> The changes revise the contents of the pg_statistic system table
>> in order to support more accurate selectivity estimation, as per
>> discussions a few days ago.

> Hey Tom, I'm not sure I was all that enthused about trying to optimize
> the selectivity for *any* particular strategy. How about also allowing
> the value(s) used for selectivity estimates to be manually set in the
> table so folks can tune things to their satisfaction?

Well, I'm more interested in putting my effort into making the system do the right thing without help. Manual overrides are OK as long as you remember to revisit the settings whenever anything changes ... otherwise your manual optimization can become manual pessimization ... But if you want to spend time on the manual approach, I have no objection. There's room for everyone to play.

> Maybe they can already be set, in which case we could add a SET
> OPTIMIZATION command (the name is negotiable) to make it easier?

There are no manual inputs presently, except for some rather crude control variables (_enable_mergejoin_ and so on --- see backend/optimizer/path/costsize.c). There is a mixture of SET commands and backend command-line switches (ugh) to set these, and some don't have any tweak method short of source code changes or going in with a debugger. This area could use some cleanup and rethinking, for sure.

			regards, tom lane
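For context on those control variables: they are just global booleans consulted by the costing routines, and a disabled strategy is not rejected outright but charged a huge cost so the planner avoids it unless nothing else works. A minimal sketch of the idea follows; apart from the mergejoin toggle named above, the variable and function names are illustrative and are not the actual costsize.c code.

#include <stdbool.h>

typedef double Cost;

/* Planner control toggle; in the real backend these are set through
 * SET commands or backend command-line switches. */
bool enable_mergejoin = true;

/* A disabled strategy is not rejected outright -- it is just made to
 * look absurdly expensive, so it only wins when it is the only path. */
static const Cost disable_cost = 1.0e8;

/* Toy cost model for a merge join over two pre-sorted inputs. */
Cost
cost_mergejoin_sketch(Cost outer_cost, Cost inner_cost,
                      double outer_rows, double inner_rows)
{
    Cost total = outer_cost + inner_cost;

    /* nominal per-tuple CPU charge for advancing the merge */
    total += 0.01 * (outer_rows + inner_rows);

    if (!enable_mergejoin)
        total += disable_cost;      /* effectively vetoes this path */

    return total;
}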
Another thing I found out during installation is that libpq.so is hard to find. Setting LD_LIBRARY_PATH works, but I hate depending on environment settings, mostly because I always screw them up sooner or later. On searching, I found that you can specify -R/usr/local/pgsql/lib during linking in gcc and in cc on Solaris -- is there some reason that would be bad to do in general? I tried forcing it on psql and it seems to work fine, and I don't need to set LD_LIBRARY_PATH any more. I'm talking about Solaris 2.5 here, by the way.
"J. Michael Roberts" <mirobert@cs.indiana.edu> writes: > On searching, I found that you can specify -R/usr/local/pgsql/lib during > linking in gcc and in cc on Solaris -- is there some reason that would be > bad to do in general? Only that it's platform-specific. We do already do the equivalent incantation for HPUX --- they pronounce it differently, of course, but it's the same idea --- and it seems to work well. If you want to submit a patch to make it happen on Solaris, go for it. (The HPUX tweak is in makefiles/Makefile.hpux, so I suppose you'd want to hack on Makefile.solaris_*.) Ultimately I think we want to get out of the business of learning all these grotty details about shared libraries on different flavors of Unix, and instead rely on GNU libtool to do it for us. But converting to libtool takes work too :-( regards, tom lane
Hi, I know of course that the pgsql optimizer is never wrong but for the sake of argument do you think we could have optimizer hints the way say MSSQL has them? Oh, I'll work on the rest of my wish list later. ;-) Regards John Ridout.
John Ridout wrote:
>
> Hi,
>
> I know of course that the pgsql optimizer is never wrong

:)

OTOH, it is never completely right either :)

> but for the sake of argument do you think we could have
> optimizer hints the way say MSSQL has them?

I think it has been discussed (a little) but more on the lines of 'the way Oracle has them' ;)

It seems that this is not currently very high priority for any of the current real developers, but if you would contribute a nice clean implementation I'm sure it would be accepted.

----------------
Hannu
> Hannu Krosing wrote:
>
> John Ridout wrote:
> >
> > Hi,
> >
> > I know of course that the pgsql optimizer is never wrong
>
> :)
>
> OTOH, it is never completely right either :)
>
> > but for the sake of argument do you think we could have
> > optimizer hints the way say MSSQL has them?
>
> I think it has been discussed (a little) but more on the
> lines of 'the way Oracle has them' ;)
>
Oh, you mean properly.

> It seems that this is not currently very high priority for
> any of the current real developers, but if you would
> contribute a nice clean implementation I'm sure it would
> be accepted.
>
> ----------------
> Hannu
>
Come the end of September I will have enough time to play with the internals of pgsql. I suppose there are more useful things that can be done before optimizer hints such as nearly everything on the TODO list.

John.
> Come the end of September I will have enough time
> to play with the internals of pgsql.
> I suppose there are more useful things that
> can be done before optimizer hints such as nearly
> everything on the TODO list.

Yes, that is basically the issue.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, Aug 02, 1999 at 12:01:03AM -0400, Tom Lane wrote:
>
> Well, I'm more interested in putting my effort into making the system
> do the right thing without help. Manual overrides are OK as long as
> you remember to revisit the settings whenever anything changes ...
> otherwise your manual optimization can become manual pessimization ...
>

Hey Tom -
Ran across this paper, about an interesting approach, pulling in the indices to aid in selectivity estimates.

http://db.cs.berkeley.edu/papers/CSD-98-1021.pdf

I grabbed this from a link at:

http://db.cs.berkeley.edu/papers/

while looking at the Mariposa work (http://mariposa.cs.berkeley.edu) from the Sequoia2000 project. I've convinced my team to let me spend a couple days analyzing what it would take to fold the remote access features of Mariposa into the current PostgreSQL tree. I grabbed the alpha-1 code (June 23, 1996) which seems to be based on an early version of Postgres95. Interesting to see all the academic cruft you guys have already cleaned out ;-)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
> On Mon, Aug 02, 1999 at 12:01:03AM -0400, Tom Lane wrote:
> >
> > Well, I'm more interested in putting my effort into making the system
> > do the right thing without help. Manual overrides are OK as long as
> > you remember to revisit the settings whenever anything changes ...
> > otherwise your manual optimization can become manual pessimization ...
> >
>
> Hey Tom -
> Ran across this paper, about an interesting approach, pulling in the indices
> to aid in selectivity estimates.
>
> http://db.cs.berkeley.edu/papers/CSD-98-1021.pdf
>
> I grabbed this from a link at:
>
> http://db.cs.berkeley.edu/papers/
>
> while looking at the Mariposa work (http://mariposa.cs.berkeley.edu)
> from the Sequoia2000 project. I've convinced my team to let me spend
> a couple days analyzing what it would take to fold the remote access
> features of Mariposa into the current PostgreSQL tree. I grabbed the
> alpha-1 code (June 23, 1996) which seems to be based on an early version
> of Postgres95. Interesting to see all the academic cruft you guys have
> already cleaned out ;-)

We still have a directory called tioga, which is also related to Mariposa. Basically, at the time, no one understood the academic stuff, and we had tons of bugs in general areas. We just didn't see any reason to keep around unusual features while our existing code was so poorly maintained from Berkeley.

The Mariposa remote access features looked like they were heavily done in the executor directory. This makes sense, assuming they wanted the access to be done remotely. They also tried to fix some things while doing Mariposa. A few of those fixes have been added over the years.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > Ran across this paper, about an interesting approach, pulling in the indices > to aid in selectivity estimates. > http://db.cs.berkeley.edu/papers/CSD-98-1021.pdf Looks pretty interesting, but also vastly more complex than I want to tackle at the moment. As of 6.5 the selectivity code is broken for everything except integers. What I'm trying to do for this release cycle is to get it to operate as well as can be expected given the existing design (in which available statistics are not much more than min/max/mode values for each column; strictly speaking the mode was not in the existing design, but given that VACUUM was computing it anyway, we might as well use it). A lot more *could* be done, as this paper suggests; but there are also many other important tasks, and only so many hours in the day. I doubt that building an entirely new selectivity estimation infrastructure is worthwhile until we have cured some more problems elsewhere :-( > while looking at the Mariposa work ( http://mariposa.cs.berkeley.edu) > from the Sequoia2000 project. I've convinced my team to let me spend > a couple days analyzing what it would take to fold the remote access > features of Mariposa into the current PostgreSQL tree. Let us know what you find out... regards, tom lane
On Mon, Aug 02, 1999 at 04:44:10PM -0400, Bruce Momjian wrote:
>
> We still have a directory called tioga, which is also related to
> Mariposa. Basically, at the time, no one understood the academic stuff,
> and we had tons of bugs in general areas. We just didn't see any reason
> to keep around unusual features while our existing code was so poorly
> maintained from Berkeley.

The right thing to do, I concur. Get the basics stable and working well, _then_ tack on the interesting stuff :-) A common complaint about us academics: we only want to do the interesting stuff.

> The Mariposa remote access features looked like they were heavily done
> in the executor directory. This makes sense, assuming they wanted the
> access to be done remotely. They also tried to fix some things while
> doing Mariposa. A few of those fixes have been added over the years.

Right. As I've been able to make out so far, in Mariposa a query passes through the regular parser and single-site optimizer, then the selected plan tree is handed to a 'fragmenter' to break the work up into chunks, which are then handed around to a 'broker' which uses a microeconomic 'bid' process to parcel them out to both local and remote executors. The results from each site then go through a local 'coordinator' which merges the result sets and hands them back to the original client. Whew!

It's interesting to compare the theory describing the workings of Mariposa (such as the paper in VLDB) with the code. For the fragmenter, the paper describes basically a rational decomposition of the plan, while the code applies non-deterministic, but tuneable, methods (lots of calls to random and comparisons to user-specified odds ratios).

It strikes me as a bit odd to optimize the plan for a single site, then break it all apart again. My thoughts on this are to implement two new node types: one a remote table, and one which represents access to a remote table. Remote tables have host info in them, and would always be added to the plan with a remote-access node directly above them. Remote-access nodes would be separate from their remote-table, to allow the communications cost to be slid up the plan tree and merged with other remote-access nodes talking to the same server. This should maintain the order-agnostic nature of the optimizer. The executor will need to build SQL statements from the sub-plans and submit them via standard network db access client libraries.

First step: create a remote-table node and teach the executor how to get info from it. Later, add the separable remote-access node.

How insane does this sound now? Am I still a mad scientist? (...always!)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
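A sketch of what the two proposed node types might look like follows; the struct and field names are invented for illustration and are not existing PostgreSQL node definitions. Keeping the remote-access node separate from the remote-table node is what lets the communication charge migrate up the plan tree and merge with other accesses to the same server.

/* Hypothetical plan nodes for the scheme described above; the names
 * and fields are illustrative, not actual PostgreSQL structures. */

typedef struct RemoteTable
{
    char   *host;           /* server holding the table */
    int     port;
    char   *dbname;
    char   *relname;        /* table name on the remote side */
} RemoteTable;

typedef struct RemoteAccess
{
    RemoteTable *table;     /* which remote table is being read */
    char        *subquery;  /* SQL text built from the sub-plan and
                             * shipped to the remote server, so the
                             * remote backend does as much of the
                             * work as possible */
    double       est_rows;  /* expected result size, used to charge
                             * network transfer cost */
} RemoteAccess;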
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > It strikes me as a bit odd to optimize the plan for a single site, > then break it all apart again. Yes, that sounds pretty peculiar, especially considering that the optimizer's choices are all about access costs. A plan generated for entirely-local execution might be far from optimal when broken across multiple nodes. > My thoughts on this are to implement a two new node types: one a > remote table, and one which represents access to a remote > table. Remote tables have host info in them, and always be added to > the plan with a remote-access node directly above them. Remote-access > nodes would be seperate from their remote-table, to allow the > communications cost to be slid up the plan tree, and merged with other > remote-access nodes talking to the same server. I like that approach a lot better. If the access cost estimates for the shared-table node can be set to reflect remote communication costs, you might actually get reasonable plans out of the optimizer... You should not move too far with an actual implementation until you talk to Jan about rangetable entries for sub-selects. If we are going to invent new RTE types we may as well try to deal with that problem at the same time. regards, tom lane
> It strikes me as a bit odd to optimize the plan for a single site,
> then break it all apart again. My thoughts on this are to implement
> two new node types: one a remote table, and one which represents
> access to a remote table. Remote tables have host info in them, and
> would always be added to the plan with a remote-access node directly
> above them. Remote-access nodes would be separate from their remote-table,
> to allow the communications cost to be slid up the plan tree and merged
> with other remote-access nodes talking to the same server. This should
> maintain the order-agnostic nature of the optimizer. The executor will
> need to build SQL statements from the sub-plans and submit them via
> standard network db access client libraries.
>
> First step: create a remote-table node and teach the executor how to get
> info from it. Later, add the separable remote-access node.
>
> How insane does this sound now? Am I still a mad scientist? (...always!)

Sounds interesting, and doable. People have asked from time to time about this. Our access routines are very modular, so if you can get your stuff working inside the tuple access routines, you will have it made. The easiest way may be to just hack up the storage manager (smgr). Create a new access method, and hook your remote stuff to that.

You could try something easy like common backend NFS with some locking protocol to prevent contention. In fact, NFS would be transparent except for locking issues.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
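For anyone unfamiliar with it, the storage manager switch is a table of function pointers that low-level block reads and writes are dispatched through, so a 'remote' manager amounts to supplying one more row of that table. The sketch below shows only the shape of the idea: the real smgr interface has more entry points and different signatures, and the md_xxx and remote_xxx routines here are hypothetical.

/* Sketch of the storage-manager switch: each low-level operation is
 * dispatched through a table of function pointers, so a "remote"
 * manager is just one more row in the table. */

typedef struct f_smgr_sketch
{
    int (*smgr_open)  (const char *relname);
    int (*smgr_read)  (int fd, long blocknum, char *buffer);
    int (*smgr_write) (int fd, long blocknum, const char *buffer);
    int (*smgr_close) (int fd);
} f_smgr_sketch;

/* Hypothetical implementations (bodies not shown). */
extern int md_open(const char *relname);       /* local "magnetic disk" */
extern int md_read(int fd, long blocknum, char *buffer);
extern int md_write(int fd, long blocknum, const char *buffer);
extern int md_close(int fd);

extern int remote_open(const char *relname);   /* talks to another server */
extern int remote_read(int fd, long blocknum, char *buffer);
extern int remote_write(int fd, long blocknum, const char *buffer);
extern int remote_close(int fd);

static const f_smgr_sketch smgrsw_sketch[] = {
    { md_open,     md_read,     md_write,     md_close     },  /* DEFAULT_SMGR */
    { remote_open, remote_read, remote_write, remote_close },
};

/* With the relsmgr field gone from pg_class, every access currently
 * indexes row 0 (DEFAULT_SMGR); a remote manager would need that
 * per-relation field back, or some equivalent way to pick its row. */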
On Mon, Aug 02, 1999 at 10:25:15PM -0400, Bruce Momjian wrote:
>
> Sounds interesting, and doable. People have asked from time to time
> about this. Our access routines are very modular, so if you can get
> your stuff working inside the tuple access routines, you will have it
> made. The easiest way may be to just hack up the storage manager
> (smgr). Create a new access method, and hook your remote stuff to that.
>

I considered two quick and dirty proof-of-concept implementations first: hacking the smgr, and upgrading functions to allow them to return sets, then building views with an ON SELECT rule that fired an arbitrary db access routine.

One advantage of going the smgr route is I automagically get all the buffer and relation caching that's built in. A negative is that it basically just gives me a remote table: every query's going to pull the whole table, or I'm going to have to write somewhat hairy access methods, I think. I think it's a little _too_ low level. I want to be able to have the backend DB do as much work as possible, minimizing the expensive network transfer.

Oh, and BTW, while the smgr infrastructure is still in place, and all accesses vector through it, the relsmgr field has been removed from the pg_class table, and all calls to smgropen() and friends use the DEFAULT_SMGR #define.

The second option, having functions return sets, is still a possibility, but it suffers from the same lack of extensibility - I only get complete tables back. Even worse, I don't get the buffer caches for free. If rules _currently_ could return sets, it'd be a 1-2 day hack to get something working.

So, I think I'll see what Jan has to say about subselect RTEs, as Tom suggested. Seems the way to go for future work.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
> One advantage of going the smgr route is I automagically get all the
> buffer and relation caching that's built in. A negative is that it
> basically just gives me a remote table: every query's going to pull
> the whole table, or I'm going to have to write somewhat hairy access
> methods, I think. I think it's a little _too_ low level. I want to be
> able to have the backend DB do as much work as possible, minimizing the
> expensive network transfer.
>
> Oh, and BTW, while the smgr infrastructure is still in place, and all
> accesses vector through it, the relsmgr field has been removed from
> the pg_class table, and all calls to smgropen() and friends use the
> DEFAULT_SMGR #define.

That was me. The only other storage manager we had was the stable memory storage manager, and that was just too confusing to the developers. It can be easily added back by just changing the DEFAULT_SMGR define to access the relation tuple. In fact, our code uses the storage manager much more consistently now, because to enable multi-segment tables (>2 gigs for non-supporting OS's) we had to clean that up.

> The second option, having functions return sets, is still a possibility,
> but it suffers from the same lack of extensibility - I only get complete
> tables back. Even worse, I don't get the buffer caches for free. If rules
> _currently_ could return sets, it'd be a 1-2 day hack to get something
> working.
>
> So, I think I'll see what Jan has to say about subselect RTEs, as Tom
> suggested. Seems the way to go for future work.

Yes, I think he needs that for foreign keys, and I think he wants to do that for 6.6.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026