Thread: Getting the output of a function used in a where clause
<div class="Section1"><p class="MsoNormal"><span class="EmailStyle15"><font color="black" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family: Arial">HI,</span></font></span><p class="MsoNormal"><span class="EmailStyle15"><font color="black" face="Arial" size="2"><spanstyle="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family: Arial"> </span></font></span><p class="MsoNormal"><span class="EmailStyle15"><font color="black" face="Arial" size="2"><spanstyle="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family: Arial">I’m a newbie so please bear with me. I have a function defined (got it from one of your threads… thanks Joe Conway)which calculates the distance <span style="mso-spacerun: yes"> </span>between 2 zip code centeroids (in lat,long).This thing works great. However, I want to sort my results by distance without incurring the additional burdenof executing the function twice. A simplified version of my current SQL (written in a perl cgi) <span style="mso-spacerun:yes"> </span>that returns a set of zip codes within a given radius is:</span></font></span><p class="MsoNormal"><spanclass="EmailStyle15"><font color="black" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family: Arial"> </span></font></span><pre><span class="EmailStyle15"><font color="black" face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial">$sql= “SELECT zipcode from zipcodes where </span></font></span><font face="Arial"><spanstyle="font-family:Arial">zipdist($lat1d,$lon1d,lat,long) <= $dist;”;</span></font></pre><pre><fontcolor="black" face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial"> </span></font></pre><pre><fontcolor="black" face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial">WhatI want to write is something like:</span></font></pre><pre><font color="black"face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial"> </span></font></pre><pre><span class="EmailStyle15"><fontcolor="black" face="Arial" size="2"><span style="font-size: 10.0pt;font-family:Arial">$sql = “SELECT zipcode, distance from zipcodes where </span></font></span><font face="Arial"><spanstyle="font-family:Arial">distance <= $dist order by distance;”;</span></font></pre><pre><font color="black"face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial"> </span></font></pre><pre><font color="black"face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial">But I don’t the magic SQL phrase to populatethe distance variable using my nifty function. Do I need to create an output type for distance?</span></font></pre><pre><fontcolor="black" face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial"> </span></font></pre><pre><fontcolor="black" face="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial">Thanksin advance!</span></font></pre><pre><font color="black" face="Arial" size="2"><spanstyle="font-size:10.0pt;font-family:Arial"> </span></font></pre><pre><font color="black" face="Arial" size="2"><spanstyle="font-size:10.0pt;font-family:Arial">Bill</span></font></pre><p class="MsoNormal"><span class="EmailStyle15"><fontcolor="black" face="Arial" size="2"><span style="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family: Arial"> </span></font></span><p class="MsoNormal"><span class="EmailStyle15"><font color="black" face="Arial" size="2"><spanstyle="font-size:10.0pt;mso-bidi-font-size:12.0pt;font-family: Arial"> </span></font></span><p class="MsoNormal"><font color="black" face="Arial" size="3"><span style="font-size: 12.0pt;font-family:Arial;color:black"> </span></font><font color="black" face="Arial"><span style="font-family:Arial;color:black;mso-color-alt: windowtext"></span></font></div>
try: SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes where distance <= $dist;”; OR you could use a gist index with a geometric datatype to get it a lot faster. On Sat, 09 Apr 2005 03:43:39 +0200, Bill Lawrence <bill.lawrence@cox.net> wrote: > HI, > > I’m a newbie so please bear with me. I have a function defined (got it > from > one of your threads… thanks Joe Conway) which calculates the distance > between 2 zip code centeroids (in lat,long). This thing works great. > However, I want to sort my results by distance without incurring the > additional burden of executing the function twice. A simplified version > of > my current SQL (written in a perl cgi) that returns a set of zip codes > within a given radius is: > > > What I want to write is something like: > > $sql = “SELECT zipcode, distance from zipcodes where distance <= $dist > order > by distance;”; > > But I don’t the magic SQL phrase to populate the distance variable using > my > nifty function. Do I need to create an output type for distance? > > Thanks in advance! > > Bill > > >
Boy I sure thought that would work... I received the following from postgres: ERROR: Attribute "distance" not found. Started looking into gist.... Looks complex. Any other ideas? -----Original Message----- From: PFC [mailto:lists@boutiquenumerique.com] Sent: Monday, April 11, 2005 1:51 AM To: Bill Lawrence; pgsql-sql@postgresql.org Subject: Re: [SQL] Getting the output of a function used in a where clause try: SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes where distance <= $dist;"; OR you could use a gist index with a geometric datatype to get it a lot faster. On Sat, 09 Apr 2005 03:43:39 +0200, Bill Lawrence <bill.lawrence@cox.net> wrote: > HI, > > I'm a newbie so please bear with me. I have a function defined (got it > from > one of your threads... thanks Joe Conway) which calculates the distance > between 2 zip code centeroids (in lat,long). This thing works great. > However, I want to sort my results by distance without incurring the > additional burden of executing the function twice. A simplified version > of > my current SQL (written in a perl cgi) that returns a set of zip codes > within a given radius is: > > > What I want to write is something like: > > $sql = "SELECT zipcode, distance from zipcodes where distance <= $dist > order > by distance;"; > > But I don't the magic SQL phrase to populate the distance variable using > my > nifty function. Do I need to create an output type for distance? > > Thanks in advance! > > Bill > > >
Why not just do: SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes where zipdist($lat1d,$lon1d,lat,long) <= $dist;"; On Mon, 2005-04-11 at 20:25, Bill Lawrence wrote: > Boy I sure thought that would work... I received the following from postgres: > > ERROR: Attribute "distance" not found. > > Started looking into gist.... Looks complex. > > Any other ideas? > > > -----Original Message----- > From: PFC [mailto:lists@boutiquenumerique.com] > Sent: Monday, April 11, 2005 1:51 AM > To: Bill Lawrence; pgsql-sql@postgresql.org > Subject: Re: [SQL] Getting the output of a function used in a where clause > > > try: > > SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes > where distance <= $dist;"; > > OR you could use a gist index with a geometric datatype to get it a lot > faster. > > > On Sat, 09 Apr 2005 03:43:39 +0200, Bill Lawrence <bill.lawrence@cox.net> > wrote: > > > HI, > > > > I'm a newbie so please bear with me. I have a function defined (got it > > from > > one of your threads... thanks Joe Conway) which calculates the distance > > between 2 zip code centeroids (in lat,long). This thing works great. > > However, I want to sort my results by distance without incurring the > > additional burden of executing the function twice. A simplified version > > of > > my current SQL (written in a perl cgi) that returns a set of zip codes > > within a given radius is: > > > > > > What I want to write is something like: > > > > $sql = "SELECT zipcode, distance from zipcodes where distance <= $dist > > order > > by distance;"; > > > > But I don't the magic SQL phrase to populate the distance variable using > > my > > nifty function. Do I need to create an output type for distance? > > > > Thanks in advance! > > > > Bill > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings
Thanks a bunch! Looks pretty step-by-step at the site for the link you sent. I'll give it a shot and see how it turns out. Thanks again for all your help! Bill -----Original Message----- From: PFC [mailto:lists@boutiquenumerique.com] Sent: Tuesday, April 12, 2005 1:03 AM To: Bill Lawrence Subject: Re: [SQL] Getting the output of a function used in a where clause > Boy I sure thought that would work... I received the following from > postgres: > > ERROR: Attribute "distance" not found. > > Started looking into gist.... Looks complex. > > Any other ideas? Complex ? CREATE TABLE stuff ( ... coords BOX NOT NULL, ... ) WITHOUT OIDS; CREATE INDEX cities_coords_idx ON geo.cities USING GIST ( coords gist_box_ops ); For some reason you must use BOX instead ot POINT to use the index. CREATE OR REPLACE FUNCTION boxpoint(FLOAT,FLOAT) RETURNS BOX RETURNS NULL ON NULL INPUT LANGUAGE plpgsql AS $$ DECLARE p POINT; BEGIN p := point($1,$2); IF $1=0 AND $2=0 THEN RETURN NULL; END IF; RETURN box(p,p); END; $$; now use boxpoint(x,y) to select a box : INSERT INTO stuff (...,coords,...) VALUES (...,boxpoint(x,y),...) Now to get all the records whose coords are inside a box using the index : SELECT ... WHERE cords && '((xa,ya),(xb,yb))'::box for all the details look there : http://www.postgis.org/docs/ch04.html#id3530280 it's simple once you're into it. You'll need to install postgis.
Thanks, Unfortunately, I think that solution requires the distance calculation to be executed twice for each record in the table. There are ~70K records in the table. Is the postgres query optimizer smart enough to only perform the calculation once? Bill -----Original Message----- From: Scott Marlowe [mailto:smarlowe@g2switchworks.com] Sent: Tuesday, April 12, 2005 6:55 AM To: Bill Lawrence Cc: PFC; pgsql-sql@postgresql.org Subject: Re: [SQL] Getting the output of a function used in a where clause Why not just do: SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes where zipdist($lat1d,$lon1d,lat,long) <= $dist;"; On Mon, 2005-04-11 at 20:25, Bill Lawrence wrote: > Boy I sure thought that would work... I received the following from postgres: > > ERROR: Attribute "distance" not found. > > Started looking into gist.... Looks complex. > > Any other ideas? > > > -----Original Message----- > From: PFC [mailto:lists@boutiquenumerique.com] > Sent: Monday, April 11, 2005 1:51 AM > To: Bill Lawrence; pgsql-sql@postgresql.org > Subject: Re: [SQL] Getting the output of a function used in a where clause > > > try: > > SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes > where distance <= $dist;"; > > OR you could use a gist index with a geometric datatype to get it a lot > faster. > > > On Sat, 09 Apr 2005 03:43:39 +0200, Bill Lawrence <bill.lawrence@cox.net> > wrote: > > > HI, > > > > I'm a newbie so please bear with me. I have a function defined (got it > > from > > one of your threads... thanks Joe Conway) which calculates the distance > > between 2 zip code centeroids (in lat,long). This thing works great. > > However, I want to sort my results by distance without incurring the > > additional burden of executing the function twice. A simplified version > > of > > my current SQL (written in a perl cgi) that returns a set of zip codes > > within a given radius is: > > > > > > What I want to write is something like: > > > > $sql = "SELECT zipcode, distance from zipcodes where distance <= $dist > > order > > by distance;"; > > > > But I don't the magic SQL phrase to populate the distance variable using > > my > > nifty function. Do I need to create an output type for distance? > > > > Thanks in advance! > > > > Bill > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings
On Tue, 2005-04-12 at 23:08 -0700, Bill Lawrence wrote: > Thanks, > > Unfortunately, I think that solution requires the distance calculation to be > executed twice for each record in the table. There are ~70K records in the > table. Is the postgres query optimizer smart enough to only perform the > calculation once? It is in some places, but possibly not in that one. You can force it with a subselect though: SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes) AS tab where distance <= $dist; > Bill > > > > -----Original Message----- > From: Scott Marlowe [mailto:smarlowe@g2switchworks.com] > Sent: Tuesday, April 12, 2005 6:55 AM > To: Bill Lawrence > Cc: PFC; pgsql-sql@postgresql.org > Subject: Re: [SQL] Getting the output of a function used in a where clause > > Why not just do: > > SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from > zipcodes where zipdist($lat1d,$lon1d,lat,long) <= $dist;"; > > > On Mon, 2005-04-11 at 20:25, Bill Lawrence wrote: > > Boy I sure thought that would work... I received the following from > postgres: > > > > ERROR: Attribute "distance" not found. > > > > Started looking into gist.... Looks complex. > > > > Any other ideas? > > > > > > -----Original Message----- > > From: PFC [mailto:lists@boutiquenumerique.com] > > Sent: Monday, April 11, 2005 1:51 AM > > To: Bill Lawrence; pgsql-sql@postgresql.org > > Subject: Re: [SQL] Getting the output of a function used in a where clause > > > > > > try: > > > > SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes > > where distance <= $dist;"; > > > > OR you could use a gist index with a geometric datatype to get it a lot > > faster. > > > > > > On Sat, 09 Apr 2005 03:43:39 +0200, Bill Lawrence <bill.lawrence@cox.net> > > wrote: > > > > > HI, > > > > > > I'm a newbie so please bear with me. I have a function defined (got it > > > from > > > one of your threads... thanks Joe Conway) which calculates the distance > > > between 2 zip code centeroids (in lat,long). This thing works great. > > > However, I want to sort my results by distance without incurring the > > > additional burden of executing the function twice. A simplified version > > > of > > > my current SQL (written in a perl cgi) that returns a set of zip codes > > > within a given radius is: > > > > > > > > > What I want to write is something like: > > > > > > $sql = "SELECT zipcode, distance from zipcodes where distance <= $dist > > > order > > > by distance;"; > > > > > > But I don't the magic SQL phrase to populate the distance variable using > > > my > > > nifty function. Do I need to create an output type for distance? > > > > > > Thanks in advance! > > > > > > Bill > > > > > > > > > > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 7: don't forget to increase your free space map settings > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > --
Rod Taylor <pg@rbt.ca> writes: > You can force it with a subselect though: > SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) > as distance > from zipcodes) AS tab where distance <= $dist; The above will *not* stop zipdist from being run twice, because the planner will happily flatten the subquery into the outer query, resulting in the same situation of zipdist() being present twice in the text of the query. You could force the issue with SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distancefrom zipcodes OFFSET 0) AS tab wheredistance <= $dist; since LIMIT/OFFSET clauses presently disable the flattening optimization. Keep in mind though that the OFFSET is an absolute optimization fence: it will result in the subquery being evaluated completely, even if there were outer conditions that might have avoided the need to look at some rows. For example, if the query is SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distancefrom zipcodes OFFSET 0) AS tab wheredistance <= $distAND some-other-conditions then not letting the some-other-conditions migrate down below the evaluation of zipdist could result in making the query be far slower, not faster, than you would get if you weren't trying to outsmart the planner. In general the system operates on the assumption that function calls are cheap relative to disk I/O. If that's not true for you, you're going to have some issues ... regards, tom lane
> Thanks Tom and Rod. > > There are indeed several additional conditions on the "real" query which > prune the search space (I formulate a quick search box and filter on > Lat/Lon's within the box). Since my user interface limits the search to > a 30 > mile radius, there are at most 81 results (in New York city, far fewer, > for Why don't you use a GiST index which will index this bounding box search ?
Thanks Tom and Rod. There are indeed several additional conditions on the "real" query which prune the search space (I formulate a quick search box and filter on Lat/Lon's within the box). Since my user interface limits the search to a 30 mile radius, there are at most 81 results (in New York city, far fewer, for other regions of the US), so I've elected to post process the results in my script (calculating the distance for each returned record) and display only the closest 20 results (I intended to use a LIMIT clause combined with an ORDER BY in my SQL... LOL). Again, Thanks for all the great advice! -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Monday, April 18, 2005 8:00 PM To: Rod Taylor Cc: Bill Lawrence; Scott Marlowe; PFC; pgsql-sql@postgresql.org Subject: Re: [SQL] Getting the output of a function used in a where clause Rod Taylor <pg@rbt.ca> writes: > You can force it with a subselect though: > SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) > as distance > from zipcodes) AS tab where distance <= $dist; The above will *not* stop zipdist from being run twice, because the planner will happily flatten the subquery into the outer query, resulting in the same situation of zipdist() being present twice in the text of the query. You could force the issue with SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes OFFSET 0) AStab where distance <= $dist; since LIMIT/OFFSET clauses presently disable the flattening optimization. Keep in mind though that the OFFSET is an absolute optimization fence: it will result in the subquery being evaluated completely, even if there were outer conditions that might have avoided the need to look at some rows. For example, if the query is SELECT * FROM (SELECT zipcode, zipdist($lat1d,$lon1d,lat,long) as distance from zipcodes OFFSET 0) AStab where distance <= $dist AND some-other-conditions then not letting the some-other-conditions migrate down below the evaluation of zipdist could result in making the query be far slower, not faster, than you would get if you weren't trying to outsmart the planner. In general the system operates on the assumption that function calls are cheap relative to disk I/O. If that's not true for you, you're going to have some issues ... regards, tom lane