Thread: JDBC/Stored procedure performance issue

JDBC/Stored procedure performance issue

From
Claire McLister
Date:
Hi All,

I am experiencing a strange performance issue with Postgresql (7.4.19)
+ PostGIS. (I posted to the PostGIS list but got no response, so am
trying here.)

We have a table of entries that contains latitude, longitude values
and I have a simple query to retrieve all entries within a specified 2-
D box.

The latitude, longitude are stored as decimals, plus a trigger stores
the corresponding geometry object.

When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
executes in a reasonable 159ms:

EXPLAIN ANALYZE SELECT DISTINCT latitude, longitude, color FROM
NewEntries
     WHERE groupid = 57925 AND
               location @ SetSRID(MakeBox2D(SetSRID(MakePoint(-123.75,
36.597889), 4326),
                                            SetSRID(MakePoint(-118.125,
40.979898), 4326)), 4326);


Unique  (cost=23.73..23.74 rows=1 width=30) (actual
time=143.648..156.081 rows=3261 loops=1)
   ->  Sort  (cost=23.73..23.73 rows=1 width=30) (actual
time=143.640..146.214 rows=3369 loops=1)
         Sort Key: latitude, longitude, color
         ->  Index Scan using group_index on newentries
(cost=0.00..23.72 rows=1 width=30) (actual time=0.184..109.346
rows=3369 loops=1)
               Index Cond: (groupid = 57925)
               Filter: ("location" @

'0103000020E610000001000000050000000000000000F05EC0000000A0874C42400000000000F05EC0000000406D7D44400000000000885DC0000000406D7D44400000000000885DC0000000A0874C42400000000000F05EC0000000A0874C4240

'::geometry)
Total runtime: 159.430 ms
(7 rows)

If I issue the same query over JDBC or use a PSQL stored procedure, it
takes over 3000 ms, which, of course is unacceptable!

Function Scan on gettilelocations  (cost=0.00..12.50 rows=1000
width=30) (actual time=3311.368..3319.265 rows=3261 loops=1)
Total runtime: 3322.529 ms
(2 rows)

The function gettilelocations is defined as:

CREATE OR REPLACE FUNCTION GetTileLocations(Integer, real, real, real,
real)
    RETURNS SETOF TileLocation
AS
'
    DECLARE
        R TileLocation;
    BEGIN
        FOR R IN SELECT DISTINCT latitude, longitude, color FROM
NewEntries
            WHERE groupid = $1 AND
                location @ SetSRID(MakeBox2D(SetSRID(MakePoint($2,
$3), 4326),
                                     SetSRID(MakePoint($4, $5), 4326)),
                   4326) LOOP
            RETURN NEXT R;
        END LOOP;
    RETURN;
    END;
'
LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT;

Can someone please tell me what we are doing wrong? Any help would be
greatly appreciated.

Thanks

Claire

  --
  Claire McLister                        mclister@zeesource.net
  21060 Homestead Road Suite 150
  Cupertino, CA 95014            408-733-2737(fax)

                      http://www.zeemaps.com




Re: JDBC/Stored procedure performance issue

From
Tom Lane
Date:
Claire McLister <mclister@zeesource.net> writes:
> When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
> executes in a reasonable 159ms:
> ...
> If I issue the same query over JDBC or use a PSQL stored procedure, it
> takes over 3000 ms, which, of course is unacceptable!

I suspect that the problem is with "groupid = $1" instead of
"groupid = 57925".  The planner is probably avoiding an indexscan
in the parameterized case because it's guessing the actual value will
match so many rows as to make a seqscan faster.  Is the distribution
of groupid highly skewed?  You might get better results if you increase
the statistics target for that column.

Switching to something newer than 7.4.x might help too.  8.1 and up
support "bitmap" indexscans which work much better for large numbers
of hits, and correspondingly the planner will use one in cases where
it wouldn't use a plain indexscan.

            regards, tom lane

Re: JDBC/Stored procedure performance issue

From
Matthew Lunnon
Date:
Hi Tom,

Is there any way to work out what plan the query is using in side the function?  I think I have a similar problem with a query taking much longer from inside a function than it does as a select statement.

Regards
Matthew

Tom Lane wrote:
Claire McLister <mclister@zeesource.net> writes: 
When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it  
executes in a reasonable 159ms:
...
If I issue the same query over JDBC or use a PSQL stored procedure, it  
takes over 3000 ms, which, of course is unacceptable!   
I suspect that the problem is with "groupid = $1" instead of
"groupid = 57925".  The planner is probably avoiding an indexscan
in the parameterized case because it's guessing the actual value will
match so many rows as to make a seqscan faster.  Is the distribution
of groupid highly skewed?  You might get better results if you increase
the statistics target for that column.

Switching to something newer than 7.4.x might help too.  8.1 and up
support "bitmap" indexscans which work much better for large numbers
of hits, and correspondingly the planner will use one in cases where
it wouldn't use a plain indexscan.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate      subscribe-nomail command to majordomo@postgresql.org so that your      message can get through to the mailing list cleanly

_____________________________________________________________________
This e-mail has been scanned for viruses by Verizon Business Internet Managed Scanning Services - powered by MessageLabs. For further information visit http://www.verizonbusiness.com/uk 

Re: JDBC/Stored procedure performance issue

From
Tom Lane
Date:
Matthew Lunnon <mlunnon@rwa-net.co.uk> writes:
> Is there any way to work out what plan the query is using in side the
> function?  I think I have a similar problem with a query taking much
> longer from inside a function than it does as a select statement.

Standard approach is to PREPARE a statement that has parameters in the
same places where the function uses variables/parameters, and then use
EXPLAIN [ANALYZE] EXECUTE to test it.

            regards, tom lane

Re: JDBC/Stored procedure performance issue

From
Claire McLister
Date:
Thanks, Tom. Looks like that was the issue.

I changed the function to use groupid = 57925 instead of groupid = $1
(I can do the same change in the JDBC prepare statement), and the
performance is much better.

It is still more than twice that of the simple query: 401.111 ms vs.
155.544 ms, which, however, is more acceptable than 3000ms.

Will upgrade to 8.1 at some point, but would like to get reasonable
performance with 7.4 until then. I did increase the statistics target
to 1000.

Claire

On Jan 28, 2008, at 12:51 PM, Tom Lane wrote:

> Claire McLister <mclister@zeesource.net> writes:
>> When I do an EXPLAIN ANALYZE on one query that returns 3261 rows, it
>> executes in a reasonable 159ms:
>> ...
>> If I issue the same query over JDBC or use a PSQL stored procedure,
>> it
>> takes over 3000 ms, which, of course is unacceptable!
>
> I suspect that the problem is with "groupid = $1" instead of
> "groupid = 57925".  The planner is probably avoiding an indexscan
> in the parameterized case because it's guessing the actual value will
> match so many rows as to make a seqscan faster.  Is the distribution
> of groupid highly skewed?  You might get better results if you
> increase
> the statistics target for that column.
>
> Switching to something newer than 7.4.x might help too.  8.1 and up
> support "bitmap" indexscans which work much better for large numbers
> of hits, and correspondingly the planner will use one in cases where
> it wouldn't use a plain indexscan.
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that
> your
>       message can get through to the mailing list cleanly