Re: Performance Problem with sub-select using array - Mailing list pgsql-sql
From | Aaron Bono |
---|---|
Subject | Re: Performance Problem with sub-select using array |
Date | |
Msg-id | bf05e51c0608290734w7795d260q160abc9b91e8a58e@mail.gmail.com Whole thread Raw |
In response to | Performance Problem with sub-select using array ("Travis Whitton" <tinymountain@gmail.com>) |
List | pgsql-sql |
On 8/28/06, Travis Whitton <tinymountain@gmail.com> wrote:
I don't think you can do that but I may be wrong. I usually try to stay away from correlated sub-queries because of performance concerns and query complexity. I find simple subqueries with well formed inner/outer joins work much better.
Does anyone know where documentation about the array function can be found? I did a search but cannot find it on the postgresql web site.
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
I'm pretty sure you're right, which leads me to my next question. Is it possible to pass a column from an outer query to a subquery? For example, is there a way to do something like.
SELECT owners.id AS owner_id, array(SELECT dogs.name WHERE owners.id = owner_id) ...
I would just do a normal inner-join, but then I get a row for each item that would otherwise come back nicely packaged in the array. The overhead of rearranging the data takes even more time than the subquery approach.
I don't think you can do that but I may be wrong. I usually try to stay away from correlated sub-queries because of performance concerns and query complexity. I find simple subqueries with well formed inner/outer joins work much better.
Does anyone know where documentation about the array function can be found? I did a search but cannot find it on the postgresql web site.
On 8/28/06, Aaron Bono < postgresql@aranya.com> wrote:On 8/24/06, Travis Whitton <tinymountain@gmail.com > wrote:Hello all, I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customer.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entre recordset for every row? Transacts is a many to many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.
SELECT distinct customers.id, first_name, last_name, address1, contact_city, contact_state, primary_phone, email, array(select programs.program_name from transacts, programs where customer_id = customers.id and programs.id = transacts.program_id and submit_status = 'success') AS partners from customers, transacts, programs where transacts.customer_id = customers.id and transacts.program_id = programs.id
My guess is that your problem is that you may be getting 6000 rows, but the array(select ....) is having to run once for each of record returned (so it is running 6000 times).
Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html - that will reveal more of where the performance problem is.
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================