Thread: Another sub-select problem...
(Originally addressed to Tom Lane) I have spent the last 3 months developing an application using PostgreSQL. This application analyzes raw electronic design/timing data which can be in the range of 200 million fields + in size. I am currently using PostgreSQL 7.3.2. Performance on a 4 proc RISC AIX 6GB RAM machine is fair. Performance on a 2 proc 2.0 GHZ Xeon RedHat 9.0 3 GB RAM machine is better.PG_SETTINGS are all defaults except for logging. One query type is absolutely not working, so I am at a stopping point and would like to ask for some advice-- I have been through the PostgreSQL mailing list archives and Google search results many times over the past 6 weeks looking for an answer. I have a required query that looks like this: ---BAD QUERY--- LOG: query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999)) AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45, 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70, 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188, 193,197)) ORDER BY slew_rise LIMIT 500 OFFSET 0 LOG: duration: 3439.646507 sec ---GOOD QUERY FOR COMPARISON--- LOG: query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999)) ORDER BY slew_rise LIMIT 500 OFFSET 0 LOG: duration: 2.269856 sec The "IN(id_list)" portion is derived from a PHP-based recursive query and is passed into postgres based on user input--always a different list of ID's. This seems to rule out a "JOIN" as an alternative query type which has been the general suggestion for avoiding the infamous "IN" issue which you are working on for 7.4. (ORDER BY and LIMIT/OFFSET are necessary for HTML paging of results. It is not unusual for these queries to return 100,000 rows of data.) Question 1) Will this type of query be handled well in 7.4? Question 2) If so,is there a dev version of 7.4 I can work with? Question 3) Can you suggest an alternative strategy for the query? (I have not included my table structures as I do not want to bother you too much.) Thanks for your time, Kevin Ready Toshiba America Electronic Components STI Design Center, Austin Texas (512)838-0332 keready@sti.taec.toshiba.com keready@us.ibm.com
> AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid > IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45, > 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70, > 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188, > 193,197) Have you tried the above in the EXISTS form? AND EXISTS (SELECT TRUE FROM block WHERE parent_component_classid IN (<list>) AND blockid = row.blockid) Exists as shown above will probably be significantly faster for 7.3 and prior if parent_component_classid.blockid is indexed. In 7.4 you may wish to try the IN style again, as there is a chance it'll be right. -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
Kevin Ready wrote: > (Originally addressed to Tom Lane) > I have spent the last 3 months developing an application using > PostgreSQL. This application analyzes raw electronic design/timing data > which can be in the range of 200 million fields + in size. > > I am currently using PostgreSQL 7.3.2. > > Performance on a 4 proc RISC AIX 6GB RAM machine is fair. Performance on > a 2 proc 2.0 GHZ Xeon RedHat 9.0 3 GB RAM machine is better.PG_SETTINGS > are all defaults except for logging. > > One query type is absolutely not working, so I am at a stopping point > and would like to ask for some advice-- I have been through the > PostgreSQL mailing list archives and Google search results many times > over the past 6 weeks looking for an answer. > > I have a required query that looks like this: > > ---BAD QUERY--- > LOG: query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND > slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999)) > AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid > IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45, > 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70, > 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188, > 193,197)) ORDER BY slew_rise LIMIT 500 OFFSET 0 > LOG: duration: 3439.646507 sec I've found improvement on something similar by using a subselect of unions. Try this: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999)) AND blockID IN (SELECT blockid FROM block b, (select 8 as id union all select 22 union all select 23 union all select 24 union all select 25 union all select 26 unionall select 27 union all select 28 union all select 29 union all select 30 union all select 31 union all select 32 unionall select 33 union all select 34 union all select 35 union all select 36 union all select 37 union all select 39 unionall select 40 union all select 41 union all select 42 union all select 43 union all select 44 union all select 45 unionall select 46 union all select 47 union all select 48 union all select 49 union all select 50 union all select 51 unionall select 52 union all select 53 union all select 54 union all select 55 union all select 56 union all select 57 unionall select 58 union all select 59 union all select 60 union all select 61 union all select 62 union all select 63 unionall select 64 union all select 65 union all select 66 union all select 67 union all select 68 union all select 69 unionall select 70 union all select 71 union all select 72 union all select 73 union all select 75 union all select 76 unionall select 77 union all select 78 union all select 79 union all select 80 union all select 81 union all select 82 unionall select 83 union all select 84 union all select 85 union all select 86 union all select 88 union all select 117 unionall select 133 union all select 143 union all select 145 union all select 146 union all select 178 union all select188 union all select 193 union all select 197) as ss WHERE b.parent_component_classid = ss.id) ORDER BY slew_rise LIMIT 500 OFFSET 0; > Question 1) Will this type of query be handled well in 7.4? not sure > Question 2) If so,is there a dev version of 7.4 I can work with? sure see http://developer.postgresql.org/docs/postgres/cvs.html > Question 3) Can you suggest an alternative strategy for the query? (I > have not included my table structures as I do not want to bother you too > much.) See above. HTH, Joe
On 20 May 2003, Kevin Ready wrote: > (Originally addressed to Tom Lane) > I have spent the last 3 months developing an application using > PostgreSQL. This application analyzes raw electronic design/timing data > which can be in the range of 200 million fields + in size. > > I am currently using PostgreSQL 7.3.2. > > Performance on a 4 proc RISC AIX 6GB RAM machine is fair. Performance on > a 2 proc 2.0 GHZ Xeon RedHat 9.0 3 GB RAM machine is better.PG_SETTINGS > are all defaults except for logging. > > One query type is absolutely not working, so I am at a stopping point > and would like to ask for some advice-- I have been through the > PostgreSQL mailing list archives and Google search results many times > over the past 6 weeks looking for an answer. > > I have a required query that looks like this: > > ---BAD QUERY--- > LOG: query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND > slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999)) > AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid > IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45, > 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70, > 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188, > 193,197)) ORDER BY slew_rise LIMIT 500 OFFSET 0 > LOG: duration: 3439.646507 sec > > ---GOOD QUERY FOR COMPARISON--- > LOG: query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND > slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999)) > ORDER BY slew_rise LIMIT 500 OFFSET 0 > LOG: duration: 2.269856 sec > > The "IN(id_list)" portion is derived from a PHP-based recursive query > and is passed into postgres based on user input--always a different list > of ID's. This seems to rule out a "JOIN" as an alternative query type > which has been the general suggestion for avoiding the infamous "IN" Actually, it doesn't entirely afaics (you could get rid of the blockid in (subselect) by joining the subselect as a subselect in from to the main query. But without explain analyze we can't tell if that's really the bottleneck. > Question 1) Will this type of query be handled well in 7.4? Possibly, see above. > Question 2) If so,is there a dev version of 7.4 I can work with? Yes, I think anoncvs is back, so you can get the CVS copy. > Question 3) Can you suggest an alternative strategy for the query? (I > have not included my table structures as I do not want to bother you too > much.) The first question would be whether select * from row, (select distinct blockid from block ... ) as a wheremeta<>1 ... slew_fall<>-999999)) and row.blockid=a.blockidorder byslew_rise limit 500 offset 0 gives the same results and is faster.
> > The "IN(id_list)" portion is derived from a PHP-based recursive query > and is passed into postgres based on user input--always a different list > of ID's. This seems to rule out a "JOIN" as an alternative query type > which has been the general suggestion for avoiding the infamous "IN" > issue which you are working on for 7.4. > You've seen several alternatives already, but my approach would be to populate a temp table with the PHP-based recursive query id_list (the COPY command offers a fast way to do so), and then JOIN on that temp table. Regards, Christoph