Home > mailing lists

Thread: Query organization question

Query organization question

From

"Gauthier, Dave"

Date:

27 April 2009, 15:34:12

I have a poorly performing query that looks something like....

select x.name, x.title, x.value

from

(select a.name as name, b.book_title as title, c.cost as value from ......) x

where

exists (select ‘found_it’ from get_jobs(x.name) j where j.job = ‘carpenter’);

I did it this way because I was hoping that it would generate all the records in the... (select a.name as name, b.book_title as title, c.cost as value from ......) x first, and then run x.name into the get_jobs stored procedure later. In fact, running... (select a.name as name, b.book_title as title, c.cost as value from ......) x alone runs fast enough, and then running each of the returned x.name values through get_jobs manually runs fast too. But “explain” seems to indicate that it’s “relocating” that get_jobs stored procedure call inside the (select a.name as name, b.book_title as title, c.cost as value from ......) x , which gives terrible performance.

Is there a way I can force it to get the results from (select a.name as name, b.book_title as title, c.cost as value from ......) x before runing into the stored procedure?

BTW, what’s the formal name for the (select a.name as name, b.book_title as title, c.cost as value from ......) x piece of a query like this?

Thanks

-dave

Re: Query organization question

From

Grzegorz Jaśkiewicz

Date:

27 April 2009, 18:03:56

>   exists (select ‘found_it’ from get_jobs(x.name) j where j.job =
> ‘carpenter’);
What does this function do ?
If it only runs on the tables, than simple join will do it pretty fast.
also, keeping job as integer, if table is large will save you some
space, make index lookup faster, and generally make everything faster.
Subselects always perform poor, so please try writing that query as
join first. Postgresql is capable of reordering, and choosing right
approach for query, this isn't mysql - you don't have try to outsmart
db.


--
GJ

Re: Query organization question

From

"Gauthier, Dave"

Date:

27 April 2009, 18:25:38

The example was fictitious, but the structure is the same as the real problem.

The stored procedure calls another recursive stored procedure that can take a long time to run, usually about 3-4
seconds. Not bad for a handful of records, but it is now operating on a table with over 40,000 records.   

Without the stored procedure call (the fictitious "get_jobs" call), it returns about 50 records.  I can live with the
2-3minutes it'll take to run the stored proc for those, but not the 40,000+. This is why I tried to segregate it the
wayI did. 



-----Original Message-----
From: Grzegorz Jaśkiewicz [mailto:gryzman@gmail.com]
Sent: Monday, April 27, 2009 5:04 PM
To: Gauthier, Dave
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query organization question

>   exists (select 'found_it' from get_jobs(x.name) j where j.job =
> 'carpenter');
What does this function do ?
If it only runs on the tables, than simple join will do it pretty fast.
also, keeping job as integer, if table is large will save you some
space, make index lookup faster, and generally make everything faster.
Subselects always perform poor, so please try writing that query as
join first. Postgresql is capable of reordering, and choosing right
approach for query, this isn't mysql - you don't have try to outsmart
db.


--
GJ

Re: Query organization question

From

Sam Mason

Date:

28 April 2009, 05:42:30

On Mon, Apr 27, 2009 at 02:15:05PM -0700, Gauthier, Dave wrote:
> The stored procedure calls another recursive stored procedure that
> can take a long time to run, usually about 3-4 seconds.  Not bad for
> a handful of records, but it is now operating on a table with over
> 40,000 records.

The most general solution I can think of would be to set the "cost" of
a function when creating it, have a look at the docs for create[1] or
alter[2] function.

The answer to your other question is that the inner select statement is
normally called a "sub-select".  The way to force the sub-select to be
evaluated first is to put an OFFSET 0 at the end of it, i.e.:

  SELECT a, b
  FROM (
    SELECT a.i AS a, b.i AS b
    FROM foo a, foo b
    OFFSET 0) x
  WHERE a < b;

will force PG to create the complete cross product of "foo" with itself
before trying to apply the outer WHERE clause.  I'd try changing the
cost of the function first as it should cause PG to do the "right thing"
when you use the function in other queries.

--
  Sam  http://samason.me.uk/

 [1] http://www.postgresql.org/docs/current/static/sql-createfunction.html
 [2] http://www.postgresql.org/docs/current/static/sql-alterfunction.html