Thread: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
> > Very nice, but that's like trying to code factorization of > numbers... not > > pretty, and very CPU intensive on complex queries... > > Yes, but how large are the WHERE clauses going to be? Considering the > cost of cnfify() and UNION, it seems like a clear win. Is it general > enough to solve our problems? Could be... the examples I received where the cnfify() was really bad were cases where the query was submitted alredy in DNF... and where the UNION was a simple one. However, I don't know of any algorithms for generic simplification of logical constraints. One problem is resolution/selection of factors: SELECT * FROM a WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 2 AND c = 3) OR (a = 1 AND b = 5 AND c = 3) OR (a = 1 AND b = 2 AND c = 6); Try that on for size. You can understand why that code gets ugly, fast. Somebody could try coding it, but it's not a clear win to me. My original heuristic was missing one thing: "Where the heuristic fails to process or decide, default to CNF." Since that's the current behavior, we're less likely to break things. Taral
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > > Very nice, but that's like trying to code factorization of > > numbers... not > > > pretty, and very CPU intensive on complex queries... > > > > Yes, but how large are the WHERE clauses going to be? Considering the > > cost of cnfify() and UNION, it seems like a clear win. Is it general > > enough to solve our problems? > > Could be... the examples I received where the cnfify() was really bad were > cases where the query was submitted alredy in DNF... and where the UNION was > a simple one. However, I don't know of any algorithms for generic > simplification of logical constraints. One problem is resolution/selection > of factors: > > SELECT * FROM a WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 2 AND c > = 3) OR (a = 1 AND b = 5 AND c = 3) OR (a = 1 AND b = 2 AND c = 6); > > Try that on for size. You can understand why that code gets ugly, fast. > Somebody could try coding it, but it's not a clear win to me. > > My original heuristic was missing one thing: "Where the heuristic fails to > process or decide, default to CNF." Since that's the current behavior, we're > less likely to break things. OK, but if we use UNION, how to we return the proper rows? Is there any solution for that, and we are executing the query over and over again. Any factoring would be faster than running those multiple queries, wouldn't it? Also, I amagine the case where we are doing a join, so we have: SELECT * FROM tab1, tab2 WHERE tab1.col1 = tab2.col2 AND ((a=1 and b=2 and c=3) OR (a=1 and b=2 and c=4)) How do we do that with UNION, and return the right rows. Seems the _join_ happending multiple times would be much worse than the factoring. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> How do we do that with UNION, and return the right rows. Seems the > _join_ happending multiple times would be much worse than the factoring. Ok... We have two problems: 1) DNF for unjoined queries. 2) Factorization for the rest. I have some solutions for (1). Not for (2). Remember that unjoined queries are quite common. :) For (1), we can always try to parallel the multiple queries... especially in the case where a sequential search is required. Taral
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > How do we do that with UNION, and return the right rows. Seems the > > _join_ happending multiple times would be much worse than the factoring. > > Ok... We have two problems: > > 1) DNF for unjoined queries. > 2) Factorization for the rest. > > I have some solutions for (1). Not for (2). Remember that unjoined queries > are quite common. :) > > For (1), we can always try to parallel the multiple queries... especially in > the case where a sequential search is required. I don't know how to return the proper rows with UNION. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
I have another idea. When we cnfify, this: (A AND B) OR (C AND D) becomes (A OR C) AND (A OR D) AND (B OR C) AND (B OR D) however if A and C are identical, this could become: (A OR A) AND (A OR D) AND (B OR A) AND (B OR D) and A OR A is A: A AND (A OR D) AND (B OR A) AND (B OR D) and since we are now saying A has to be true, we can remove OR's with A: A AND (B OR D) Much smaller, and a big win for queries like: SELECT * FROM tab WHERE (a=1 AND b=2) OR (a=1 AND b=3) This becomes: (a=1) AND (b=2 OR b=3) which is accurate, and uses our OR indexing. Seems I could code cnfify() to look for identical qualifications in two joined OR clauses and remove the duplicates. Sound like big win, and fairly easy and inexpensive in processing time. Comments? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> however if A and C are identical, this could become: > > (A OR A) AND (A OR D) AND (B OR A) AND (B OR D) > > and A OR A is A: > > A AND (A OR D) AND (B OR A) AND (B OR D) > > and since we are now saying A has to be true, we can remove OR's with A: > > A AND (B OR D) Very nice... and you could do that after each iteration of the rewrite, preventing the size from getting too big. :) I have a symbolic expression tree evaluator that would be perfect for this... I'll see if I can't adapt it. Can someone mail me the structures for expression trees? I don't want to have to excise them from the source. Please? Taral
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > however if A and C are identical, this could become: > > > > (A OR A) AND (A OR D) AND (B OR A) AND (B OR D) > > > > and A OR A is A: > > > > A AND (A OR D) AND (B OR A) AND (B OR D) > > > > and since we are now saying A has to be true, we can remove OR's with A: > > > > A AND (B OR D) > > Very nice... and you could do that after each iteration of the rewrite, > preventing the size from getting too big. :) > > I have a symbolic expression tree evaluator that would be perfect for > this... I'll see if I can't adapt it. > > Can someone mail me the structures for expression trees? I don't want to > have to excise them from the source. Please? That is very hard to do. We have lots of structures involved. I recommend you look at backend/optimizer/prep/prepqual.c. That has the CNF'ify code, and I am studying it now. There are supporing functions on backend/nodes that will allow comparisons of many structures. We may not be that far off. normalize() does much of the work, and qual_cleanup() reomves duplicates using remove_duplicates(), but qual_cleanup() is only called after normalize completes, not during the normalization, which seems to be the problem. If we can remove the duplicates BEFORE the OR explosion, we are much better off. You can then use ctags to jump around to see the supporting structures. See the developers FAQ in the web site or doc directory. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
David Hartwig
Date:
Bruce Momjian wrote: > I have another idea. > > When we cnfify, this: > > (A AND B) OR (C AND D) > > becomes > > (A OR C) AND (A OR D) AND (B OR C) AND (B OR D) > > however if A and C are identical, this could become: > > (A OR A) AND (A OR D) AND (B OR A) AND (B OR D) > > and A OR A is A: > > A AND (A OR D) AND (B OR A) AND (B OR D) > > and since we are now saying A has to be true, we can remove OR's with A: > > A AND (B OR D) > > Much smaller, and a big win for queries like: > > SELECT * > FROM tab > WHERE (a=1 AND b=2) OR > (a=1 AND b=3) > > This becomes: > > (a=1) AND (b=2 OR b=3) > > which is accurate, and uses our OR indexing. > > Seems I could code cnfify() to look for identical qualifications in two > joined OR clauses and remove the duplicates. > > Sound like big win, and fairly easy and inexpensive in processing time. > > Comments? Apologies for not commenting sooner. I have been incognito. As to your earlier question, Bruce, the KSQO patch rewrites qualifying queries as UNIONs. (A AND B) OR (C AND D) ==> (A AND B) UNION (C AND D) The rules to qualify are fairly strict. Must be have ANDs; rectangular in shape; all (VAR op CONST) type nodes; minimum of 10 nodes; etc. I was targeting the keysets queries generated by ODBC tools. As for the current direction this thread is going, (factoring) I have one word of caution. PREPARE. If you take this route, you will never be able to implement a workable PREPARE statement. I believe that in order for PostgrerSQL ever become a industrial strength client/server it must implement a PREPARE statement with parameters.
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
> (A AND B) OR (C AND D) ==> (A AND B) UNION (C AND D) > > The rules to qualify are fairly strict. Must be have ANDs; rectangular in > shape; all (VAR op CONST) type nodes; minimum of 10 nodes; etc. I was > targeting the keysets queries generated by ODBC tools. > > As for the current direction this thread is going, (factoring) I have one > word of caution. PREPARE. If you take this route, you will never be able > to implement a workable PREPARE statement. I believe that in order for > PostgrerSQL ever become a industrial strength client/server it must implement > a PREPARE statement with parameters. I see that adding nodes it going to mess up prepare, but we already add extra nodes as part of part of "col in (1, 2, 3)." I think the PARAM's we already use will be duplicated/removed and still retain their values for substitution. They just may be in a different order. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
David Hartwig
Date:
Bruce Momjian wrote: > > (A AND B) OR (C AND D) ==> (A AND B) UNION (C AND D) > > > > The rules to qualify are fairly strict. Must be have ANDs; rectangular in > > shape; all (VAR op CONST) type nodes; minimum of 10 nodes; etc. I was > > targeting the keysets queries generated by ODBC tools. > > > > As for the current direction this thread is going, (factoring) I have one > > word of caution. PREPARE. If you take this route, you will never be able > > to implement a workable PREPARE statement. I believe that in order for > > PostgrerSQL ever become a industrial strength client/server it must implement > > a PREPARE statement with parameters. > > I see that adding nodes it going to mess up prepare, but we already add > extra nodes as part of part of "col in (1, 2, 3)." It's not extra nodes I am worried about. It is factored out nodes. > I think the PARAM's we already use will be duplicated/removed and still > retain their values for substitution. They just may be in a different > order. I realize I may be stretching the point, since I brought it up I will complete my thoughts. Now, you may understand this, but just to be sure. Here is a typical client/server scenario: - prepare statement S - retrieve result description of S - retrieve number of parameters of S - retrieve parameter descriptions of S - put data into parameters of S - execute S - retrieve result [REPEAT] - put different data into parameters of S - execute S - retrieve result [END REPEAT] - free statement S The problem is that you cannot depend upon factoring to reduce these complex statements. We need to retain a place holder (pointer) for each passed parameter. Otherwise we need to re-(parse and plan) the statement before each execution; thus, loosing one of the major benefits of PREPARE. The other major benefits are: 1. Gaining access to the statement result description w/o having to actually execute the statement. Client/server tools live off this stuff. 2. Smaller statement size. The parameters in the WHERE clause can be sent to that backend in separate chunks. Back to the subject at hand. My point is that the factoring approach may be a bit short sighted in the long term evolution of PostgreSQL.
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
> > I see that adding nodes it going to mess up prepare, but we already add > > extra nodes as part of part of "col in (1, 2, 3)." > > It's not extra nodes I am worried about. It is factored out nodes. > > > I think the PARAM's we already use will be duplicated/removed and still > > retain their values for substitution. They just may be in a different > > order. > > I realize I may be stretching the point, since I brought it up I will complete my > thoughts. Now, you may understand this, but just to be sure. Here is a typical > client/server scenario: > > - prepare statement S > - retrieve result description of S > - retrieve number of parameters of S > - retrieve parameter descriptions of S > - put data into parameters of S > - execute S > - retrieve result > [REPEAT] > - put different data into parameters of S > - execute S > - retrieve result > [END REPEAT] > - free statement S > > The problem is that you cannot depend upon factoring to reduce these complex > statements. We need to retain a place holder (pointer) for each passed > parameter. Otherwise we need to re-(parse and plan) the statement before each > execution; thus, loosing one of the major benefits of PREPARE. > > The other major benefits are: > 1. Gaining access to the statement result description w/o having to actually > execute the statement. Client/server tools live off this stuff. > 2. Smaller statement size. The parameters in the WHERE clause can be sent to that > backend in separate chunks. > Back to the subject at hand. > > My point is that the factoring approach may be a bit short sighted in the long term > evolution of PostgreSQL. Yikes. I see what you mean. The factoring of one query with certain constants will be different than another query. That will certainly be a problem. I still haven't had time to look over the cnfify code, to see if calling qual_cleanup earlier in the code will help reduce the palloc failures. If it is easy to do, I will implement it, and we can remove it or change it once we start looking at prepared queries. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
> The problem is that you cannot depend upon factoring to reduce these complex > statements. We need to retain a place holder (pointer) for each passed > parameter. Otherwise we need to re-(parse and plan) the statement before each > execution; thus, loosing one of the major benefits of PREPARE. I think we already have such a problem. When using optimization statistics, the optimizer checks the value of the constant to determine how many rows will be returned by a "x > 10" by looking at the min/max values for the column. Prepared queries where this value will change would make that a problem. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
David Hartwig
Date:
Bruce Momjian wrote: > > The problem is that you cannot depend upon factoring to reduce these complex > > statements. We need to retain a place holder (pointer) for each passed > > parameter. Otherwise we need to re-(parse and plan) the statement before each > > execution; thus, loosing one of the major benefits of PREPARE. > > I think we already have such a problem. When using optimization > statistics, the optimizer checks the value of the constant to determine > how many rows will be returned by a "x > 10" by looking at the min/max > values for the column. Prepared queries where this value will change > would make that a problem. Gad Zooks. The future is here. I wonder how Vadim's SPI_Prepare() will respond to this. I have not used it much, but I believe it accepts parameters. For that matter, I seem to recall some kind of reduction going on in the query plan. In 6.3.2 something like: -- with an index on bar EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 2; -- does not use index; this is expected in 6.3.2 EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 1; -- uses index; I speculated on some reduction going on here. ... I just tried it with on out with our corp (6.3.2) database. _day is an indexed field on dates. corp=> explain select * from dates where _day = '1/1/99'; NOTICE: QUERY PLAN: Index Scan on dates (cost=2.05 size=1 width=24) EXPLAIN corp=> explain select * from dates where _day = '1/1/99' or _day = '1/1/99'; NOTICE: QUERY PLAN: Index Scan on dates (cost=2.05 size=1 width=24) EXPLAIN corp=> explain select * from dates where _day = '1/1/99' or _day = '1/2/99'; NOTICE: QUERY PLAN: Seq Scan on dates (cost=91.27 size=219 width=24) SPI_prepare may need to be tested, along with your example, to see how it responds.
Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
From
Bruce Momjian
Date:
> Gad Zooks. The future is here. I wonder how Vadim's SPI_Prepare() will respond > to this. I have not used it much, but I believe it accepts parameters. > > For that matter, I seem to recall some kind of reduction going on in the query > plan. In 6.3.2 something like: > > -- with an index on bar > EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 2; > -- does not use index; this is expected in 6.3.2 > > EXPLAIN SELECT stuff FROM foo WHERE bar = 1 OR bar = 1; > -- uses index; I speculated on some reduction going on here. > > ... > > I just tried it with on out with our corp (6.3.2) database. _day is an indexed field > on dates. > > corp=> explain select * from dates where _day = '1/1/99'; > NOTICE: QUERY PLAN: > > Index Scan on dates (cost=2.05 size=1 width=24) > > EXPLAIN > corp=> explain select * from dates where _day = '1/1/99' or _day = '1/1/99'; > NOTICE: QUERY PLAN: > > Index Scan on dates (cost=2.05 size=1 width=24) I believe this reduction is done by cnfify when it removes duplicates as its last step. > > EXPLAIN > corp=> explain select * from dates where _day = '1/1/99' or _day = '1/2/99'; > NOTICE: QUERY PLAN: > > Seq Scan on dates (cost=91.27 size=219 width=24) > Yes, sure looks like that is what is happening. > > SPI_prepare may need to be tested, along with your example, to see how it responds. I don't think it has actual values in the prepare, but just place-holders, so it doesn't do the reduction, and my code wouldn't do it either. It is only when they use constants, and want to re-run the query with new constants that could cause a problem. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026