Re: Performance Problem with sub-select using array - Mailing list pgsql-sql

From Aaron Bono
Subject Re: Performance Problem with sub-select using array
Date
Msg-id bf05e51c0608290728t5c1c7cb8h8501aed7e359e3cb@mail.gmail.com
Whole thread Raw
In response to Performance Problem with sub-select using array  ("Travis Whitton" <tinymountain@gmail.com>)
List pgsql-sql
Make sure you do a reply to all so you include the list....

On 8/28/06, Travis Whitton <tinymountain@gmail.com > wrote:
Ok, I actually got down to business with EXPLAIN ANALYZE. My performance was actually suffering from the DISTINCT in my SQL query and not the subquery, which I guess isn't run repeatedly since it's not constrained and can be cached by the optimizer? Bottom line is, by replacing DISTINCT with DISTINCT ON all my index conditions show up in the EXPLAIN output. Best of all: Total runtime: 353.588 ms. Thanks for the help.

Travis


On 8/28/06, Travis Whitton <tinymountain@gmail.com> wrote:
I'm pretty sure you're right, which leads me to my next question. Is it possible to pass a column from an outer query to a subquery? For example, is there a way to do something like.

SELECT owners.id AS owner_id, array(SELECT dogs.name WHERE owners.id = owner_id)  ...

I would just do a normal inner-join, but then I get a row for each item that would otherwise come back nicely packaged in the array. The overhead of rearranging the data takes even more time than the subquery approach.


On 8/28/06, Aaron Bono < postgresql@aranya.com> wrote:
On 8/24/06, Travis Whitton <tinymountain@gmail.com > wrote:
Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customer.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entre recordset for every row? Transacts is a many to many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.

SELECT distinct customers.id, first_name, last_name, address1, contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs where customer_id = customers.id and programs.id = transacts.program_id and submit_status = 'success') AS partners from customers, transacts, programs where transacts.customer_id = customers.id and transacts.program_id = programs.id

 
My guess is that your problem is that you may be getting 6000 rows, but the array(select ....) is having to run once for each of record returned (so it is running 6000 times).

Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html - that will reveal more of where the performance problem is.

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

pgsql-sql by date:

Previous
From: "Jaime Casanova"
Date:
Subject: Re: dinamic sql
Next
From: "Aaron Bono"
Date:
Subject: Re: Performance Problem with sub-select using array