Re: Performance Problem with sub-select using array - Mailing list pgsql-sql

From Aaron Bono
Subject Re: Performance Problem with sub-select using array
Date
Msg-id bf05e51c0608290734w7795d260q160abc9b91e8a58e@mail.gmail.com
Whole thread Raw
In response to Performance Problem with sub-select using array  ("Travis Whitton" <tinymountain@gmail.com>)
List pgsql-sql
On 8/28/06, Travis Whitton <tinymountain@gmail.com> wrote:
I'm pretty sure you're right, which leads me to my next question. Is it possible to pass a column from an outer query to a subquery? For example, is there a way to do something like.

SELECT owners.id AS owner_id, array(SELECT dogs.name WHERE owners.id = owner_id)  ...

I would just do a normal inner-join, but then I get a row for each item that would otherwise come back nicely packaged in the array. The overhead of rearranging the data takes even more time than the subquery approach.


I don't think you can do that but I may be wrong.  I usually try to stay away from correlated sub-queries because of performance concerns and query complexity.  I find simple subqueries with well formed inner/outer joins work much better.

Does anyone know where documentation about the array function can be found?  I did a search but cannot find it on the postgresql web site.
 

On 8/28/06, Aaron Bono < postgresql@aranya.com> wrote:
On 8/24/06, Travis Whitton <tinymountain@gmail.com > wrote:
Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customer.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entre recordset for every row? Transacts is a many to many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.

SELECT distinct customers.id, first_name, last_name, address1, contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs where customer_id = customers.id and programs.id = transacts.program_id and submit_status = 'success') AS partners from customers, transacts, programs where transacts.customer_id = customers.id and transacts.program_id = programs.id

 
My guess is that your problem is that you may be getting 6000 rows, but the array(select ....) is having to run once for each of record returned (so it is running 6000 times).

Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html - that will reveal more of where the performance problem is.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

pgsql-sql by date:

Previous
From: "Aaron Bono"
Date:
Subject: Re: Performance Problem with sub-select using array
Next
From: Sumeet
Date:
Subject: