Thread: Re: [INTERFACES] Re: [HACKERS] changes in 6.4
Bruce Momjian wrote:

> > > Hannu Krosing wrote:
> > > >
> > > > The days where every release fixed server crashes, or added a feature that users were 'screaming for' may be a thing of the past.
> > >
> > > Is anyone working on fixing the exploding optimisations for many OR-s, at least the canonic case used by access?
> > >
> > > My impression is that this has fallen somewhere between insightdist and Vadim.
> >
> > This is really big for the ODBCers. (And I suspect for JDBCers too.) Many desktop libraries and end-user tools depend on this "record set" strategy to operate effectively.
> >
> > I have put together a workable hack that runs just before cnfify(). The option is activated through the SET command. Once activated, it identifies queries with this particular multi-OR pattern generated by these RECORD SET strategies. Qualified query trees are rewritten as multiple UNIONs (one for each OR grouping).
> >
> > The results are profound. Queries that used to scan tables because of the ORs now make use of any indexes. Thus, the size of the table has virtually no effect on performance. Furthermore, queries that used to crash the backend now run in under a second.
> >
> > Currently the down sides are:
> >
> > 1. If there is no usable index, performance is significantly worse. The patch does not check to make sure that there is a usable index. I could use some pointers on this.
> >
> > 2. Small tables are actually a bit slower than without the patch.
> >
> > 3. Not very elegant. I am looking for a more generalized solution. I have lots of ideas, but I would need to know the backend much better before attempting any of them. My favorite idea is, before cnfify(), to factor the OR terms and pull the constants out into a virtual (temporary) table space. Then rewrite the query as a join. The optimizer will (should) treat the new query accordingly. This assumes that an efficient factoring algorithm exists and that temporary tables can exist in the heap.
> >
> > Illustration:
> >
> >     SELECT ... FROM tab WHERE
> >         (var1 = const1 AND var2 = const2) OR
> >         (var1 = const3 AND var2 = const4) OR
> >         (var1 = const5 AND var2 = const6)
> >
> >     SELECT ... FROM tab, tmp WHERE
> >         (var1 = var_x AND var2 = var_y)
> >
> >     tmp
> >     var_x | var_y
> >     --------------
> >     const1|const2
> >     const3|const4
> >     const5|const6
>
> David, where are we on this? I know we have OR's using indexes. Do we still need to look at this as a fix, or are we OK? I have not gotten far enough in the optimizer to know how to fix the

Bruce,

If the question is, have I come up with a solution for the cnf'ify problem: No.

If the question is, is it still important: Very much yes.

It is essential for many RAD tools using remote data objects which make use of key sets. Your recent optimization of the OR list goes a long way, but inevitably users are confronted with multi-part keys.

When I look at the problem my head spins. I do not have the experience (yet?) with the backend to be mucking around in the optimizer. As I see it, cnf'ify is doing just what it is supposed to do: boundless boolean logic.

I think hope may lie, though, in identifying each AND'ed group associated with a key and tagging it as a special sub-root node which cnf'ify does not penetrate. This node would be allowed to pass to the later stages of the optimizer, where it would be used to plan index scans. Easy for me to say.

In the meantime, I still have the patch that I described in prior email. It has worked well for us. Let me restate that: we could not survive without it! However, I do not feel it is a sufficiently general approach to be incorporated as a final solution. I will submit the patch if you (or anyone else) do not come up with a better solution. It is coded to be activated by a SET KSQO to minimize its reach.
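For concreteness, a minimal sketch of the rewrite described above, assuming the switch keeps the KSQO name and that tab has a two-part key on (var1, var2); the table, columns, constants, and exact SET syntax are illustrative only:

    -- illustrative only: enable the key-set rewrite before sending the query
    SET ksqo TO 'on';

    -- the multi-OR pattern an ODBC "record set" strategy generates
    SELECT * FROM tab WHERE
        (var1 = 1 AND var2 = 10) OR
        (var1 = 1 AND var2 = 11) OR
        (var1 = 1 AND var2 = 12);

    -- the equivalent form the patch hands to the planner: one UNION arm per
    -- OR group, each of which can use an index on (var1, var2) if one exists
    SELECT * FROM tab WHERE var1 = 1 AND var2 = 10
    UNION
    SELECT * FROM tab WHERE var1 = 1 AND var2 = 11
    UNION
    SELECT * FROM tab WHERE var1 = 1 AND var2 = 12;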
> If the question is, have I come up with a solution for the cnf'ify problem: No.
>
> If the question is, is it still important: Very much yes.
>
> It is essential for many RAD tools using remote data objects which make use of key sets. Your recent optimization of the OR list goes a long way, but inevitably users are confronted with multi-part keys.
>
> When I look at the problem my head spins. I do not have the experience (yet?) with the backend to be mucking around in the optimizer. As I see it, cnf'ify is doing just what it is supposed to do: boundless boolean logic.
>
> I think hope may lie, though, in identifying each AND'ed group associated with a key and tagging it as a special sub-root node which cnf'ify does not penetrate.
> This node would be allowed to pass to the later stages of the optimizer, where it would be used to plan index scans. Easy for me to say.
>
> In the meantime, I still have the patch that I described in prior email. It has worked well for us. Let me restate that: we could not survive without it! However, I do not feel it is a sufficiently general approach to be incorporated as a final solution. I will submit the patch if you (or anyone else) do not come up with a better solution. It is coded to be activated by a SET KSQO to minimize its reach.

OK, let me try this one.

Why is the system cnf'ifying the query? Because it wants to have a list of qualifications that are AND'ed, so it can just pick the most restrictive/cheapest one and evaluate that one first.

If you have:

    (a=b and c=d) or e=1

then without cnf'ify it has to evaluate both parts, because if one is false, you can't be sure the other would be true. In the cnf'ified case,

    (a=b or e=1) and (c=d or e=1)

it can choose either clause and act on just that one; if a row fails to meet it, it can stop and not evaluate the row against the other restriction.

The fact is that it is only going to use a fancy join/index in one of the two cases, so it tries to pick the best one, and does a brute-force qualification test on the remaining item if the first one tried is true.

The problem, of course, is that large WHERE clauses can expand exponentially under this transformation. What it is really trying to do is pick the cheapest restriction, but the memory explosion and query failure are serious problems. The issue is that it thinks it is doing something to help things, while it is actually hurting things.

In the ODBC case of:

    (x=3 and y=4) or
    (x=3 and y=5) or
    (x=3 and y=6) or ...

it clearly is not going to gain anything by choosing any CHEAPEST path, because they are all the same in terms of cost, and the use by ODBC clients is hurting reliability.

I am inclined to agree with David's solution of breaking apart the query into separate UNION queries in certain cases. It seems to be the most logical solution, because the cnf'ify code is working counter to its purpose in these cases.

Now, the question is how/where to implement this. I see your idea of making the OR a join to a temp table that holds all the constants. Another idea would be to do actual UNION queries:

    SELECT * FROM tab
    WHERE (x=3 and y=4)
    UNION
    SELECT * FROM tab
    WHERE (x=3 and y=5)
    UNION
    SELECT * FROM tab
    WHERE (x=3 and y=6) ...

This would work well for tables with indexes, but for a sequential scan, you are doing a sequential scan for each UNION.

Another idea is subselects. Also, you have to make sure you return the proper rows, keeping duplicates where they are in the base table, but not returning them when they meet more than one qualification.

    SELECT * FROM tab
    WHERE (x,y) IN (SELECT 3, 4
                    UNION
                    SELECT 3, 5
                    UNION
                    SELECT 3, 6)

I believe we actually support this. This is not going to use an index on tab, so it may be slow if x and y are indexed.

Another more bizarre solution is:

    SELECT * FROM tab
    WHERE (x,y) = (SELECT 3, 4) OR
          (x,y) = (SELECT 3, 5) OR
          (x,y) = (SELECT 3, 6)

Again, I think we do this too. I don't think cnf'ify does anything with this. I also believe "=" uses indexes on subselects, while IN does not, because IN could return lots of rows, and an index is slower than a non-index join on lots of rows. Of course, now that we index OR's...

Let me ask another question. If I do:

    SELECT * FROM tab WHERE x=3 OR x=4

it works, and uses indexes. Why can't the optimizer just not cnf'ify things sometimes, and just do:

    SELECT * FROM tab
    WHERE (x=3 AND y=4) OR
          (x=3 AND y=5) OR
          (x=3 AND y=6)

Why can it handle x=3 OR x=4, but not the more complicated case above, without trying to be too smart? If (x,y) is a multi-key index, it could use that quite easily. If not, it can do a sequential scan and run the tests.

Another issue: to the optimizer, x=3 and x=y are totally different. In x=3, a column is compared to a constant, while x=y is a join. That makes a huge difference. In the case of (a=b and c=d) or e=1, you pick the best path and do the a=b join, and throw in the e=1 entries. You can't easily do both joins, because you also need the e=1 stuff.

I wonder what would happen if we prevented cnf'ifying of cases where the ORs represent only column = constant restrictions. I meant to really go through the optimizer this month, but other backend items took my time.

Can someone run some tests on disabling the cnf'ify calls? It is my understanding that with the non-cnf-ify'ed query, it can't choose an optimal path, and starts to do either straight index matches, sequential scans, or cartesian products where it joins every row to every other row looking for a match.

Let's say we turn off cnf-ify just for non-join queries. Does that help? I am not sure of the ramifications of telling the optimizer it no longer has a variety of paths to choose for evaluating the query.

--
  Bruce Momjian                        |  830 Blythe Avenue
  maillist@candle.pha.pa.us            |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
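To make the exponential blow-up concrete: cnf'ifying distributes the OR over the ANDs, turning k OR'ed groups of two conditions each into 2^k AND'ed clauses (ten key-set groups already give 1024 clauses). For the three-group ODBC pattern above, the purely illustrative expansion is:

    -- original restriction: 3 OR'ed groups of 2 conditions
    WHERE (x=3 AND y=4) OR (x=3 AND y=5) OR (x=3 AND y=6)

    -- CNF form: one clause for every way of picking one condition per group (2^3 = 8)
    WHERE (x=3 OR x=3 OR x=3) AND (x=3 OR x=3 OR y=6) AND
          (x=3 OR y=5 OR x=3) AND (x=3 OR y=5 OR y=6) AND
          (y=4 OR x=3 OR x=3) AND (y=4 OR x=3 OR y=6) AND
          (y=4 OR y=5 OR x=3) AND (y=4 OR y=5 OR y=6)

A simplifier could collapse the repeated x=3 terms in this particular example, but the clause count still grows as 2^k before any simplification, which is the memory explosion described above.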
Bruce Momjian wrote:

> Another idea would be to do actual UNION queries:
>
>     SELECT * FROM tab
>     WHERE (x=3 and y=4)
>     UNION
>     SELECT * FROM tab
>     WHERE (x=3 and y=5)
>     UNION
>     SELECT * FROM tab
>     WHERE (x=3 and y=6) ...
>
> This would work well for tables with indexes, but for a sequential scan, you are doing a sequential scan for each UNION.

Practically speaking, the lack-of-an-index concern may not be justified. The reason these queries are being generated with this shape is that remote data objects on the client side are being told that a primary key exists on these tables. The object is told about these keys in one of two ways:

1. It queries the database for the primary key of the table. The ODBC driver services this request by querying for the attributes used in {table_name}_pkey.

2. The user manually specifies the primary key. In this case an actual index may not exist. (MS Access, for example, asks the user for this information if a primary key is not found in a table.)

The second case is the only one that would cause a problem. Fortunately, the solution is simple: add a primary key index!

My only concern is being able to accurately identify a query with the proper signature before rewriting it as a UNION. To what degree should this inspection be taken? BTW, I would not do the rewrite on OR's without AND's, since you have fixed the OR's use of the index.

There is one other potential issue. In my experience, using arrays in tables together with UNIONs creates problems: there are missing array comparison operators which are needed by the implied DISTINCT.
> Can someone run some tests on disabling the cnf'ify calls? It is my understanding that with the non-cnf-ify'ed query, it can't choose an optimal path, and starts to do either straight index matches, sequential scans, or cartesian products where it joins every row to every other row looking for a match.
>
> Let's say we turn off cnf-ify just for non-join queries. Does that help?
>
> I am not sure of the ramifications of telling the optimizer it no longer has a variety of paths to choose for evaluating the query.

I did not try this earlier because I thought it was too good to be true. I was right. I tried commenting out the normalize() function in cnfify(). The EXPLAIN showed a sequential scan, and the resulting tuple set was empty. Time will not allow me to dig into this further this weekend.

Unless you come up with a better solution, I am going to submit my patch on Monday to make the Sept. 1st deadline. It includes a SET switch to activate the rewrite so as not to cause problems outside of the ODBC users. We can either improve it or yank it by the Oct. 1st deadline.
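As a sketch of the fix suggested above for case 2 — give the table a real key index, named the way the ODBC driver's {table_name}_pkey lookup expects, so that each arm of the rewritten UNION can be planned as an index scan — something like the following would do; table and column names are illustrative only:

    -- illustrative: a unique two-part key on tab, named so the driver's
    -- {table_name}_pkey lookup finds it
    CREATE UNIQUE INDEX tab_pkey ON tab (x, y);

    -- each arm of the UNION rewrite can now use an index scan on tab_pkey
    SELECT * FROM tab WHERE x = 3 AND y = 4
    UNION
    SELECT * FROM tab WHERE x = 3 AND y = 5;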
Hello,

At 11.40 30/08/98 -0400, David Hartwig wrote:

>> Why is the system cnf'ifying the query? Because it wants to have a list of qualifications that are AND'ed, so it can just pick the most restrictive/cheapest one and evaluate that one first.

Just a small question about all this optimization stuff. I'm not a database expert, but I think we are talking about an NP-complete problem. Couldn't we convert this optimization problem into another NP problem that is known to have a good solution? For example, for the traveling salesman problem there is an algorithm that provides a solution that is never more than two times the optimal one, and it gives results that are *really* near the optimal one most of the time. The simplex algorithm may be another example.

I think that this kind of algorithm would be better than a collection of tricks for special cases, and these tricks could still be used anyway when special cases are detected. Furthermore, I also know of a free program, which I used in the past, that provides this kind of optimization for chip design. I don't remember the exact name of the program, but I remember it came from Berkeley University.

Of course, maybe I'm totally missing the point. Hope it helps! Bye!

	Dr. Sbragion Denis
	InfoTecna
	Tel, Fax: +39 39 2324054
	URL: http://space.tin.it/internet/dsbragio
This is an old message, but still relevant. I believe 6.6 will have much better OR memory usage.

> > > Is anyone working on fixing the exploding optimisations for many OR-s, at least the canonic case used by access?
>
> If the question is, have I come up with a solution for the cnf'ify problem: No.
>
> If the question is, is it still important: Very much yes.
--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Hi All,

I'm a bit lost! Where can I find documentation on accessing postgres from inside Perl (5)?

Any help will be appreciated. (The thing is, I'm sure I've seen the info somewhere, but for the life of me I can't remember where...)

Thanks

Jason Doller
On Sun, 19 Sep 1999, Jason Doller wrote:

> I'm a bit lost! Where can I find documentation on accessing postgres from inside Perl (5)?
>
> Any help will be appreciated. (The thing is, I'm sure I've seen the info somewhere, but for the life of me I can't remember where...)

Under the source tree, go to the interfaces directory, and there is a directory for the Perl interface (Pg.pm). You have to enable the Perl option when you run configure, and you will also have to install the module as root, since it gets installed under the module hierarchy wherever you have Perl installed. Then you only need to do a 'perldoc Pg' to see the documentation on it. See the build instructions for more information on how to install the Perl module.

Brett W. McCoy
http://www.lan2wan.com/~bmccoy/
-----------------------------------------------------------------------
Don't let your mind wander -- it's too little to be let out alone.