Thread: Another sub-select problem...

Another sub-select problem...

From
Kevin Ready
Date:
(Originally addressed to Tom Lane)
I have spent the last 3 months developing an application using
PostgreSQL. This application analyzes raw electronic design/timing data
which can be in the range of 200 million fields + in size. 

I am currently using PostgreSQL 7.3.2.

Performance on a 4 proc RISC AIX 6GB RAM machine is fair. Performance on
a 2 proc 2.0 GHZ Xeon RedHat 9.0 3 GB RAM machine is better.PG_SETTINGS
are all defaults except for logging.

One query type is absolutely not working, so I am at a stopping point
and would like to ask for some advice-- I have been through the
PostgreSQL mailing list archives and Google search results many times
over the past 6 weeks looking for an answer.

I have a required query that looks like this:

---BAD QUERY---
LOG:  query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND
slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999))  
AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid
IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45,
46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,
71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188,
193,197))   ORDER BY slew_rise  LIMIT 500 OFFSET 0
LOG:  duration: 3439.646507 sec

---GOOD QUERY FOR COMPARISON---
LOG:  query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND
slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999))    
ORDER BY slew_rise  LIMIT 500 OFFSET 0
LOG:  duration: 2.269856 sec

The "IN(id_list)" portion is derived from a PHP-based recursive query
and is passed into postgres based on user input--always a different list
of ID's.  This seems to rule out a "JOIN" as an alternative query type
which has been the general suggestion for avoiding the infamous "IN"
issue which you are working on for 7.4.

(ORDER BY and LIMIT/OFFSET are necessary for HTML paging of results. It
is not unusual for these queries to return 100,000 rows of data.)

Question 1) Will this type of query be handled well in 7.4?
Question 2) If so,is there a dev version of 7.4 I can work with?
Question 3) Can you suggest an alternative strategy for the query? (I
have not included my table structures as I do not want to bother you too
much.)

Thanks for your time,

Kevin Ready
Toshiba America Electronic Components
STI Design Center, Austin Texas
(512)838-0332
keready@sti.taec.toshiba.com
keready@us.ibm.com






Re: Another sub-select problem...

From
Rod Taylor
Date:
> AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid
> IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45,
> 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,
> 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188,
> 193,197)

Have you tried the above in the EXISTS form?

AND EXISTS (SELECT TRUE FROM block WHERE parent_component_classid IN
(<list>) AND blockid = row.blockid)

Exists as shown above will probably be significantly faster for 7.3 and
prior if parent_component_classid.blockid is indexed.  In 7.4 you may
wish to try the IN style again, as there is a chance it'll be right.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Another sub-select problem...

From
Joe Conway
Date:
Kevin Ready wrote:
> (Originally addressed to Tom Lane)
> I have spent the last 3 months developing an application using
> PostgreSQL. This application analyzes raw electronic design/timing data
> which can be in the range of 200 million fields + in size. 
> 
> I am currently using PostgreSQL 7.3.2.
> 
> Performance on a 4 proc RISC AIX 6GB RAM machine is fair. Performance on
> a 2 proc 2.0 GHZ Xeon RedHat 9.0 3 GB RAM machine is better.PG_SETTINGS
> are all defaults except for logging.
> 
> One query type is absolutely not working, so I am at a stopping point
> and would like to ask for some advice-- I have been through the
> PostgreSQL mailing list archives and Google search results many times
> over the past 6 weeks looking for an answer.
> 
> I have a required query that looks like this:
> 
> ---BAD QUERY---
> LOG:  query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND
> slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999))  
> AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid
> IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45,
> 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,
> 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188,
> 193,197))   ORDER BY slew_rise  LIMIT 500 OFFSET 0
> LOG:  duration: 3439.646507 sec

I've found improvement on something similar by using a subselect of 
unions. Try this:

SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND
slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999))
AND blockID IN (SELECT blockid FROM block b,
(select 8 as id union all select 22 union all select 23 union all select 24 union all select 25 union all select 26
unionall select 27 union all select 28 union all select 29 union all select 30 union all select 31 union all select 32
unionall select 33 union all select 34 union all select 35 union all select 36 union all select 37 union all select 39
unionall select 40 union all select 41 union all select 42 union all select 43 union all select 44 union all select 45
unionall select 46 union all select 47 union all select 48 union all select 49 union all select 50 union all select 51
unionall select 52 union all select 53 union all select 54 union all select 55 union all select 56 union all select 57
unionall select 58 union all select 59 union all select 60 union all select 61 union all select 62 union all select 63
unionall select 64 union all select 65 union all select 66 union all select 67 union all select 68 union all select 69
unionall select 70 union all select 71 union all select 72 union all select 73 union all select 75 union all select 76
unionall select 77 union all select 78 union all select 79 union all select 80 union all select 81 union all select 82
unionall select 83 union all select 84 union all select 85 union all select 86 union all select 88 union all select 117
unionall select 133 union all select 143 union all select 145 union all select 146 union all select 178 union all
select188 union all select 193 union all select 197) as ss
 
WHERE b.parent_component_classid = ss.id)
ORDER BY slew_rise  LIMIT 500 OFFSET 0;

> Question 1) Will this type of query be handled well in 7.4?

not sure

> Question 2) If so,is there a dev version of 7.4 I can work with?

sure see http://developer.postgresql.org/docs/postgres/cvs.html

> Question 3) Can you suggest an alternative strategy for the query? (I
> have not included my table structures as I do not want to bother you too
> much.)

See above.

HTH,

Joe




Re: Another sub-select problem...

From
Stephan Szabo
Date:
On 20 May 2003, Kevin Ready wrote:

> (Originally addressed to Tom Lane)
> I have spent the last 3 months developing an application using
> PostgreSQL. This application analyzes raw electronic design/timing data
> which can be in the range of 200 million fields + in size.
>
> I am currently using PostgreSQL 7.3.2.
>
> Performance on a 4 proc RISC AIX 6GB RAM machine is fair. Performance on
> a 2 proc 2.0 GHZ Xeon RedHat 9.0 3 GB RAM machine is better.PG_SETTINGS
> are all defaults except for logging.
>
> One query type is absolutely not working, so I am at a stopping point
> and would like to ask for some advice-- I have been through the
> PostgreSQL mailing list archives and Google search results many times
> over the past 6 weeks looking for an answer.
>
> I have a required query that looks like this:
>
> ---BAD QUERY---
> LOG:  query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND
> slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999))
> AND blockID IN (SELECT blockid FROM block WHERE parent_component_classid
> IN(8,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45,
> 46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,
> 71,72,73,75,76,77,78,79,80,81,82,83,84,85,86,88,117,133,143,145,146,178,188,
> 193,197))   ORDER BY slew_rise  LIMIT 500 OFFSET 0
> LOG:  duration: 3439.646507 sec
>
> ---GOOD QUERY FOR COMPARISON---
> LOG:  query: SELECT * FROM row WHERE meta<>1 AND ((slew_rise < 10 AND
> slew_rise <> -999999) OR (slew_fall < 10 AND slew_fall <> -999999))
> ORDER BY slew_rise  LIMIT 500 OFFSET 0
> LOG:  duration: 2.269856 sec
>
> The "IN(id_list)" portion is derived from a PHP-based recursive query
> and is passed into postgres based on user input--always a different list
> of ID's.  This seems to rule out a "JOIN" as an alternative query type
> which has been the general suggestion for avoiding the infamous "IN"

Actually, it doesn't entirely afaics (you could get rid of the blockid in
(subselect) by joining the subselect as a subselect in from to the main
query.  But without explain analyze we can't tell if that's really the
bottleneck.

> Question 1) Will this type of query be handled well in 7.4?

Possibly, see above.

> Question 2) If so,is there a dev version of 7.4 I can work with?

Yes, I think anoncvs is back, so you can get the CVS copy.

> Question 3) Can you suggest an alternative strategy for the query? (I
> have not included my table structures as I do not want to bother you too
> much.)

The first question would be whether
select * from row, (select distinct blockid from block ... ) as a wheremeta<>1 ... slew_fall<>-999999)) and
row.blockid=a.blockidorder byslew_rise limit 500 offset 0
 

gives the same results and is faster.



Re: Another sub-select problem...

From
Christoph Haller
Date:
>
> The "IN(id_list)" portion is derived from a PHP-based recursive query
> and is passed into postgres based on user input--always a different
list
> of ID's.  This seems to rule out a "JOIN" as an alternative query type

> which has been the general suggestion for avoiding the infamous "IN"
> issue which you are working on for 7.4.
>
You've seen several alternatives already, but my approach would be to
populate a temp table with the PHP-based recursive query id_list
(the COPY command offers a fast way to do so), and then JOIN on
that temp table.

Regards, Christoph