Thread: JDBC/Stored procedure performance issue
Hi All,

I am experiencing a strange performance issue with PostgreSQL (7.4.19) + PostGIS. (I posted to the PostGIS list but got no response, so am trying here.)

We have a table of entries that contains latitude, longitude values and I have a simple query to retrieve all entries within a specified 2-D box. The latitude, longitude are stored as decimals, plus a trigger stores the corresponding geometry object.

When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it executes in a reasonable 159ms:

EXPLAIN ANALYZE SELECT DISTINCT latitude, longitude, color
  FROM NewEntries
  WHERE groupid = 57925
    AND location @ SetSRID(MakeBox2D(SetSRID(MakePoint(-123.75, 36.597889), 4326),
                                     SetSRID(MakePoint(-118.125, 40.979898), 4326)), 4326);

 Unique  (cost=23.73..23.74 rows=1 width=30) (actual time=143.648..156.081 rows=3261 loops=1)
   ->  Sort  (cost=23.73..23.73 rows=1 width=30) (actual time=143.640..146.214 rows=3369 loops=1)
         Sort Key: latitude, longitude, color
         ->  Index Scan using group_index on newentries  (cost=0.00..23.72 rows=1 width=30) (actual time=0.184..109.346 rows=3369 loops=1)
               Index Cond: (groupid = 57925)
               Filter: ("location" @ '0103000020E610000001000000050000000000000000F05EC0000000A0874C42400000000000F05EC0000000406D7D44400000000000885DC0000000406D7D44400000000000885DC0000000A0874C42400000000000F05EC0000000A0874C4240'::geometry)
 Total runtime: 159.430 ms
(7 rows)

If I issue the same query over JDBC or use a PL/pgSQL stored procedure, it takes over 3000 ms, which, of course, is unacceptable!
 Function Scan on gettilelocations  (cost=0.00..12.50 rows=1000 width=30) (actual time=3311.368..3319.265 rows=3261 loops=1)
 Total runtime: 3322.529 ms
(2 rows)

The function gettilelocations is defined as:

CREATE OR REPLACE FUNCTION GetTileLocations(Integer, real, real, real, real)
RETURNS SETOF TileLocation AS '
DECLARE
    R TileLocation;
BEGIN
    FOR R IN
        SELECT DISTINCT latitude, longitude, color
        FROM NewEntries
        WHERE groupid = $1
          AND location @ SetSRID(MakeBox2D(SetSRID(MakePoint($2, $3), 4326),
                                           SetSRID(MakePoint($4, $5), 4326)), 4326)
    LOOP
        RETURN NEXT R;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT;

Can someone please tell me what we are doing wrong? Any help would be greatly appreciated.

Thanks
Claire

--
Claire McLister mclister@zeesource.net
21060 Homestead Road Suite 150 Cupertino, CA 95014
408-733-2737(fax)
http://www.zeemaps.com
Claire McLister <mclister@zeesource.net> writes:
> When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
> executes in a reasonable 159ms:
> ...
> If I issue the same query over JDBC or use a PSQL stored procedure, it
> takes over 3000 ms, which, of course is unacceptable!

I suspect that the problem is with "groupid = $1" instead of "groupid = 57925". The planner is probably avoiding an indexscan in the parameterized case because it's guessing the actual value will match so many rows as to make a seqscan faster. Is the distribution of groupid highly skewed? You might get better results if you increase the statistics target for that column.

Switching to something newer than 7.4.x might help too. 8.1 and up support "bitmap" indexscans which work much better for large numbers of hits, and correspondingly the planner will use one in cases where it wouldn't use a plain indexscan.

regards, tom lane
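For readers following along, the two checks suggested above (is groupid skewed, and does a larger statistics target help) might look roughly like the sketch below. The table and column names come from the original post; the LIMIT of 10 and the target of 1000 are only illustrative choices, not values recommended in the thread at this point.

```sql
-- How skewed is groupid? List the groups that own the most rows.
SELECT groupid, count(*) AS entries
FROM newentries
GROUP BY groupid
ORDER BY count(*) DESC
LIMIT 10;

-- Raise the per-column statistics target (illustrative value), then
-- re-gather statistics so the planner actually sees the finer detail.
ALTER TABLE newentries ALTER COLUMN groupid SET STATISTICS 1000;
ANALYZE newentries;

-- Inspect what the planner now knows about the column.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'newentries' AND attname = 'groupid';
```

The new target has no effect until ANALYZE has been run again on the table.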
Hi Tom,
Is there any way to work out what plan the query is using inside the function? I think I have a similar problem with a query taking much longer from inside a function than it does as a select statement.
Regards
Matthew
Matthew Lunnon <mlunnon@rwa-net.co.uk> writes:
> Is there any way to work out what plan the query is using in side the
> function? I think I have a similar problem with a query taking much
> longer from inside a function than it does as a select statement.

Standard approach is to PREPARE a statement that has parameters in the same places where the function uses variables/parameters, and then use EXPLAIN [ANALYZE] EXECUTE to test it.

regards, tom lane
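Applied to the query from the start of this thread, that approach could be sketched as follows. The statement name tile_probe is made up for illustration; the parameter types mirror the signature of GetTileLocations, and the argument values are the ones from the original EXPLAIN ANALYZE.

```sql
-- Prepare the query with parameters in the same places the function uses
-- $1..$5, so it is planned without knowing the actual values.
PREPARE tile_probe(integer, real, real, real, real) AS
  SELECT DISTINCT latitude, longitude, color
  FROM NewEntries
  WHERE groupid = $1
    AND location @ SetSRID(MakeBox2D(SetSRID(MakePoint($2, $3), 4326),
                                     SetSRID(MakePoint($4, $5), 4326)), 4326);

-- Show the plan (and real timings) chosen for the parameterized form.
EXPLAIN ANALYZE EXECUTE tile_probe(57925, -123.75, 36.597889, -118.125, 40.979898);

-- Drop the prepared statement once done.
DEALLOCATE tile_probe;
```

Comparing this output with the plan for the literal query should show whether the parameterized form has switched from the index scan to a different plan.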
Thanks, Tom. Looks like that was the issue. I changed the function to use groupid = 57925 instead of groupid = $1 (I can do the same change in the JDBC prepare statement), and the performance is much better. It is still more than twice that of the simple query: 401.111 ms vs. 155.544 ms, which, however, is more acceptable than 3000ms.

Will upgrade to 8.1 at some point, but would like to get reasonable performance with 7.4 until then. I did increase the statistics target to 1000.

Claire

On Jan 28, 2008, at 12:51 PM, Tom Lane wrote:
> Claire McLister <mclister@zeesource.net> writes:
>> When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
>> executes in a reasonable 159ms:
>> ...
>> If I issue the same query over JDBC or use a PSQL stored procedure, it
>> takes over 3000 ms, which, of course is unacceptable!
>
> I suspect that the problem is with "groupid = $1" instead of
> "groupid = 57925". The planner is probably avoiding an indexscan
> in the parameterized case because it's guessing the actual value will
> match so many rows as to make a seqscan faster. Is the distribution
> of groupid highly skewed? You might get better results if you increase
> the statistics target for that column.
>
> Switching to something newer than 7.4.x might help too. 8.1 and up
> support "bitmap" indexscans which work much better for large numbers
> of hits, and correspondingly the planner will use one in cases where
> it wouldn't use a plain indexscan.
>
> regards, tom lane
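Hard-coding groupid = 57925 only works for that one group. One possible way to get the same effect for any group, not something proposed in the thread itself, is to build the query text at call time with FOR ... IN EXECUTE so that the planner sees literal values. The sketch below is untested and rests on assumptions: the function name GetTileLocationsDynamic is hypothetical, and converting the real arguments to text may round them to about six significant digits.

```sql
CREATE OR REPLACE FUNCTION GetTileLocationsDynamic(Integer, real, real, real, real)
RETURNS SETOF TileLocation AS '
DECLARE
    R TileLocation;
BEGIN
    -- The query string is assembled and planned on every call, so the
    -- planner sees actual values rather than parameter placeholders.
    -- Quotes are doubled because the function body is itself a quoted string.
    FOR R IN EXECUTE
        ''SELECT DISTINCT latitude, longitude, color FROM NewEntries''
        || '' WHERE groupid = '' || CAST($1 AS text)
        || '' AND location @ SetSRID(MakeBox2D(''
        || ''SetSRID(MakePoint('' || CAST($2 AS text) || '', '' || CAST($3 AS text) || ''), 4326), ''
        || ''SetSRID(MakePoint('' || CAST($4 AS text) || '', '' || CAST($5 AS text) || ''), 4326)), 4326)''
    LOOP
        RETURN NEXT R;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT;
```

The trade-off is that the query is re-planned on every call, which costs a little planning time but lets the plan match the actual groupid. Only numeric arguments are concatenated here; text arguments would need quote_literal() to be safe.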