Thread: Performance Problem with sub-select using array

Performance Problem with sub-select using array

From

"Travis Whitton"

Date:

24 August 2006, 16:01:55

Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to
complete.Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts
table(program_id, customer_id). If I specify a number for <a href="http://customer.id/" target="_blank">customer.id</a>
inthe sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000
records,but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans
theentre recordset for every row? Transacts is a many to many table for customers and programs. I know this query
doesn'teven reference any columns from programs; however, I dynamically insert where clauses to constrain the result
set.<br /><br />SELECT distinct <a href="http://customers.id/" target="_blank">customers.id</a>, first_name, last_name,
address1,contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs
wherecustomer_id = <a href="http://customers.id/" target="_blank">customers.id</a> and <a href="http://programs.id/"
target="_blank">programs.id</a> = transacts.program_id and submit_status = 'success') AS partners from customers,
transacts,programs where transacts.customer_id = <a href="http://customers.id/" target="_blank">customers.id</a> and
transacts.program_id= <a href="http://programs.id/" target="_blank"> programs.id</a><br /><br />

Re: Performance Problem with sub-select using array

From

"Aaron Bono"

Date:

28 August 2006, 01:10:13

On 8/24/06, Travis Whitton <tinymountain@gmail.com> wrote:

Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customer.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entre recordset for every row? Transacts is a many to many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.

SELECT distinct customers.id, first_name, last_name, address1, contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs where customer_id = customers.id and programs.id = transacts.program_id and submit_status = 'success') AS partners from customers, transacts, programs where transacts.customer_id = customers.id and transacts.program_id = programs.id

My guess is that your problem is that you may be getting 6000 rows, but the array(select ....) is having to run once for each of record returned (so it is running 6000 times).

Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html - that will reveal more of where the performance problem is.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Performance Problem with sub-select using array

From

"Aaron Bono"

Date:

29 August 2006, 11:28:52

Make sure you do a reply to all so you include the list....

On 8/28/06, Travis Whitton <tinymountain@gmail.com > wrote:

Ok, I actually got down to business with EXPLAIN ANALYZE. My performance was actually suffering from the DISTINCT in my SQL query and not the subquery, which I guess isn't run repeatedly since it's not constrained and can be cached by the optimizer? Bottom line is, by replacing DISTINCT with DISTINCT ON all my index conditions show up in the EXPLAIN output. Best of all: Total runtime: 353.588 ms. Thanks for the help.

Travis

On 8/28/06, Travis Whitton <tinymountain@gmail.com> wrote:
I'm pretty sure you're right, which leads me to my next question. Is it possible to pass a column from an outer query to a subquery? For example, is there a way to do something like.

SELECT owners.id AS owner_id, array(SELECT dogs.name WHERE owners.id = owner_id) ...

I would just do a normal inner-join, but then I get a row for each item that would otherwise come back nicely packaged in the array. The overhead of rearranging the data takes even more time than the subquery approach.

On 8/28/06, Aaron Bono < postgresql@aranya.com> wrote:
On 8/24/06, Travis Whitton <tinymountain@gmail.com > wrote:
Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customer.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entre recordset for every row? Transacts is a many to many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.

SELECT distinct customers.id, first_name, last_name, address1, contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs where customer_id = customers.id and programs.id = transacts.program_id and submit_status = 'success') AS partners from customers, transacts, programs where transacts.customer_id = customers.id and transacts.program_id = programs.id

My guess is that your problem is that you may be getting 6000 rows, but the array(select ....) is having to run once for each of record returned (so it is running 6000 times).

Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html - that will reveal more of where the performance problem is.

Re: Performance Problem with sub-select using array

From

"Aaron Bono"

Date:

29 August 2006, 11:34:46

On 8/28/06, Travis Whitton <tinymountain@gmail.com> wrote:

I'm pretty sure you're right, which leads me to my next question. Is it possible to pass a column from an outer query to a subquery? For example, is there a way to do something like.

SELECT owners.id AS owner_id, array(SELECT dogs.name WHERE owners.id = owner_id) ...

I would just do a normal inner-join, but then I get a row for each item that would otherwise come back nicely packaged in the array. The overhead of rearranging the data takes even more time than the subquery approach.

I don't think you can do that but I may be wrong. I usually try to stay away from correlated sub-queries because of performance concerns and query complexity. I find simple subqueries with well formed inner/outer joins work much better.

Does anyone know where documentation about the array function can be found? I did a search but cannot find it on the postgresql web site.

On 8/28/06, Aaron Bono < postgresql@aranya.com> wrote:
On 8/24/06, Travis Whitton <tinymountain@gmail.com > wrote:
Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customer.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entre recordset for every row? Transacts is a many to many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.

SELECT distinct customers.id, first_name, last_name, address1, contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs where customer_id = customers.id and programs.id = transacts.program_id and submit_status = 'success') AS partners from customers, transacts, programs where transacts.customer_id = customers.id and transacts.program_id = programs.id

My guess is that your problem is that you may be getting 6000 rows, but the array(select ....) is having to run once for each of record returned (so it is running 6000 times).

Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html - that will reveal more of where the performance problem is.