Thread: Trouble with plan statistics behaviour for query.

From: Trevor Campbell

We are having trouble with a particular query being slow in a strange manner.

The query is a join over two large tables that are suitably indexed.

select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
   from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc

For some tasks we run this particular query a very large number of times and it has a significant performance impact when it runs slowly.

If we run ANALYSE over the CHANGEITEM table then the performance picks up by a factor of 5 or more.  The problem is that a day later the performance will have dropped back to its previously slow state.

The reason this is so hard to understand is that the activity on this table is very low, with no updates and only a relatively small number of inserts each day, < 0.1% of the table size.

Explain output:
Sort  (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 rows=14 loops=1)
  Sort Key: cg.created, ci.id
  Sort Method: quicksort  Memory: 26kB
  ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual time=0.022..0.061 rows=14 loops=1)
        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..17.91 rows=8 width=33) (actual time=0.012..0.015 rows=7 loops=1)
              Index Cond: (issueid = 81001::numeric)
        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=2 width=91) (actual time=0.005..0.005 rows=2 loops=7)
              Index Cond: (groupid = cg.id)
Total runtime: 0.116 ms

The explain output always seems the same even when the performance is poor, but I can't be sure of that.

Overall it seems like PostgreSQL just forgets about the statistics it has gathered after a short while.

Schema details:
CREATE TABLE changegroup
(
  id numeric(18,0) NOT NULL,
  issueid numeric(18,0),
  author character varying(255),
  created timestamp with time zone,
  CONSTRAINT pk_changegroup PRIMARY KEY (id )
)
WITH (
  OIDS=FALSE
);
CREATE INDEX chggroup_issue
  ON changegroup
  USING btree
  (issueid );

CREATE TABLE changeitem
(
  id numeric(18,0) NOT NULL,
  groupid numeric(18,0),
  fieldtype character varying(255),
  field character varying(255),
  oldvalue text,
  oldstring text,
  newvalue text,
  newstring text,
  CONSTRAINT pk_changeitem PRIMARY KEY (id )
)
WITH (
  OIDS=FALSE
);

CREATE INDEX chgitem_chggrp
  ON changeitem
  USING btree
  (groupid );

CREATE INDEX chgitem_field
  ON changeitem
  USING btree
  (field COLLATE pg_catalog."default" );

Table sizes:
changegroup : 2,000,000 rows
changeitem  : 2,500,000 rows

The changegroup table has on average about 4 rows per issueid value, which is the query parameter.

We run autovacuum and autoanalyse, but as the activity in the table is low these are rarely if ever invoked on these tables.

Environment:
Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu, although this is a problem across a variety of PostgreSQL versions.
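For reference, a quick way to check whether autovacuum/autoanalyse has in fact visited these tables (a sketch; it assumes only the standard statistics collector views) is:

```sql
-- When were planner statistics last refreshed for these two tables?
-- last_autoanalyze stays NULL if autoanalyze has never fired on the table.
SELECT relname, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname IN ('changegroup', 'changeitem');
```

If last_autoanalyze is old or NULL, the per-table autovacuum_analyze_scale_factor storage parameter can be lowered so that a small daily insert volume still triggers an analyze.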

Re: Trouble with plan statistics behaviour for query.

From: Craig James


On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
We are having trouble with a particular query being slow in a strange manner.

The query is a join over two large tables that are suitably indexed.

select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
   from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc

This has an unbound variable '?' in it.


For some tasks we run this particular query a very large number of times and it has a significant performance impact when it runs slowly.

If we run ANALYSE over the CHANGEITEM table then the performance picks up by a factor of 5 or more.  The problem is that a day later the performance will have dropped back to its previously slow state.

The reason this is so hard to understand is that the activity on this table is very low, with no updates and only a relatively small number of inserts each day, < 0.1% of the table size.

Explain output:
Sort  (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 rows=14 loops=1)
  Sort Key: cg.created, ci.id
  Sort Method: quicksort  Memory: 26kB
  ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual time=0.022..0.061 rows=14 loops=1)
        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..17.91 rows=8 width=33) (actual time=0.012..0.015 rows=7 loops=1)
              Index Cond: (issueid = 81001::numeric)
        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=2 width=91) (actual time=0.005..0.005 rows=2 loops=7)
              Index Cond: (groupid = cg.id)
Total runtime: 0.116 ms

What's the exact SQL you used to get this ... did you use a specific CG.ISSUEID to run your test?  If that's the case, this EXPLAIN ANALYZE won't be the same as the one generated for your actual application.

Craig

 



Re: Trouble with plan statistics behaviour for query.

From: Trevor Campbell
On 01/06/12 08:55, Craig James wrote:


On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
We are having trouble with a particular query being slow in a strange manner.

The query is a join over two large tables that are suitably indexed.

select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
   from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc

This has an unbound variable '?' in it.
These queries are being run from a Java application using JDBC, and when run the variable is bound to a long integer value.  While trying to investigate the problem, I have just been hard-coding a value in the statement.




Re: Trouble with plan statistics behaviour for query.

From: Craig James


On Thu, May 31, 2012 at 4:01 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
On 01/06/12 08:55, Craig James wrote:


On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
We are having trouble with a particular query being slow in a strange manner.

The query is a join over two large tables that are suitably indexed.

select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
   from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc

This has an unbound variable '?' in it.
These queries are being run from a Java application using JDBC, and when run the variable is bound to a long integer value.  While trying to investigate the problem, I have just been hard-coding a value in the statement.

I use Perl, not JDBC, but this thread may be relevant to your problem.

http://postgresql.1045698.n5.nabble.com/Slow-statement-when-using-JDBC-td3368379.html

Craig





Re: Trouble with plan statistics behaviour for query.

From: Trevor Campbell
Thanks Craig, that certainly leads down the right path.

The following is all done in pgAdmin3:

Using an actual value, I get the plan I expect:
explain analyze select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
    from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=10006 order by CG.CREATED asc, CI.ID asc

"Sort  (cost=106.18..106.22 rows=13 width=434) (actual time=0.115..0.115 rows=12 loops=1)"
"  Sort Key: cg.created, ci.id"
"  Sort Method: quicksort  Memory: 29kB"
"  ->  Nested Loop  (cost=0.00..105.94 rows=13 width=434) (actual time=0.019..0.067 rows=12 loops=1)"
"        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..19.73 rows=10 width=29) (actual time=0.009..0.013 rows=10 loops=1)"
"              Index Cond: (issueid = 10006::numeric)"
"        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=3 width=411) (actual time=0.004..0.005 rows=1 loops=10)"
"              Index Cond: (groupid = cg.id)"
"Total runtime: 0.153 ms"

Using a prepared statement with a variable, I get a poor plan requiring a sequential scan:
prepare t2(real) as
   select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
    from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=$1 order by CG.CREATED asc, CI.ID asc;

   explain analyze execute t2 (10006);

"Sort  (cost=126448.89..126481.10 rows=12886 width=434) (actual time=1335.615..1335.616 rows=12 loops=1)"
"  Sort Key: cg.created, ci.id"
"  Sort Method: quicksort  Memory: 29kB"
"  ->  Nested Loop  (cost=0.00..125569.19 rows=12886 width=434) (actual time=0.046..1335.556 rows=12 loops=1)"
"        ->  Seq Scan on changegroup cg  (cost=0.00..44709.26 rows=10001 width=29) (actual time=0.026..1335.460 rows=10 loops=1)"
"              Filter: ((issueid)::double precision = $1)"
"        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.05 rows=3 width=411) (actual time=0.007..0.008 rows=1 loops=10)"
"              Index Cond: (groupid = cg.id)"
"Total runtime: 1335.669 ms"

Using a prepared statement with a cast of the variable to the right type, I get the good plan back:
prepare t2(real) as
   select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
    from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=cast($1 as numeric) order by CG.CREATED asc, CI.ID asc;

   explain analyze execute t2 (10006);

"Sort  (cost=106.19..106.22 rows=13 width=434) (actual time=0.155..0.156 rows=12 loops=1)"
"  Sort Key: cg.created, ci.id"
"  Sort Method: quicksort  Memory: 29kB"
"  ->  Nested Loop  (cost=0.00..105.95 rows=13 width=434) (actual time=0.048..0.111 rows=12 loops=1)"
"        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..19.73 rows=10 width=29) (actual time=0.031..0.042 rows=10 loops=1)"
"              Index Cond: (issueid = ($1)::numeric)"
"        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=3 width=411) (actual time=0.006..0.006 rows=1 loops=10)"
"              Index Cond: (groupid = cg.id)"
"Total runtime: 0.203 ms"

Now the challenge is to get Java/JDBC to do this right.  We make a big effort to ensure we always use prepared statements and variable bindings to help protect against SQL injection vulnerabilities.
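One sketch of how that could look from JDBC while keeping a bound variable (the cast-in-SQL trick mirrors the prepared-statement test above; `connection` and `issueId` are placeholders):

```java
String sql =
    "SELECT CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, " +
    "       CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING " +
    "  FROM PUBLIC.CHANGEGROUP CG " +
    " INNER JOIN PUBLIC.CHANGEITEM CI ON CG.ID = CI.GROUPID " +
    // Cast the parameter to the column's declared type, so the planner
    // compares numeric = numeric and can keep using the chggroup_issue index.
    " WHERE CG.ISSUEID = CAST(? AS numeric) " +
    " ORDER BY CG.CREATED ASC, CI.ID ASC";

try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setLong(1, issueId);  // still a bound parameter: no SQL injection exposure
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process one change row
        }
    }
}
```

Alternatively, `ps.setBigDecimal(1, BigDecimal.valueOf(issueId))` should make the driver send the parameter as numeric in the first place.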



On 01/06/12 09:08, Craig James wrote:
> I use Perl, not JDBC, but this thread may be relevant to your problem.
>
> http://postgresql.1045698.n5.nabble.com/Slow-statement-when-using-JDBC-td3368379.html
>
>

Re: Trouble with plan statistics behaviour for query.

From: Craig James
On Thu, May 31, 2012 at 4:34 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
Thanks Craig, that certainly leads down the right path.

The following is all done in pgAdmin3:

Using an actual value, I get the plan I expect:
explain analyze select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
  from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=10006 order by CG.CREATED asc, CI.ID asc

"Sort  (cost=106.18..106.22 rows=13 width=434) (actual time=0.115..0.115 rows=12 loops=1)"
"  Sort Key: cg.created, ci.id"
"  Sort Method: quicksort  Memory: 29kB"
"  ->  Nested Loop  (cost=0.00..105.94 rows=13 width=434) (actual time=0.019..0.067 rows=12 loops=1)"
"        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..19.73 rows=10 width=29) (actual time=0.009..0.013 rows=10 loops=1)"
"              Index Cond: (issueid = 10006::numeric)"
"        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=3 width=411) (actual time=0.004..0.005 rows=1 loops=10)"
"              Index Cond: (groupid = cg.id)"
"Total runtime: 0.153 ms"

Using a prepared statement with a variable, I get a poor plan requiring a sequential scan:
prepare t2(real) as
 select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
  from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=$1 order by CG.CREATED asc, CI.ID asc;

 explain analyze execute t2 (10006);

"Sort  (cost=126448.89..126481.10 rows=12886 width=434) (actual time=1335.615..1335.616 rows=12 loops=1)"
"  Sort Key: cg.created, ci.id"
"  Sort Method: quicksort  Memory: 29kB"
"  ->  Nested Loop  (cost=0.00..125569.19 rows=12886 width=434) (actual time=0.046..1335.556 rows=12 loops=1)"
"        ->  Seq Scan on changegroup cg  (cost=0.00..44709.26 rows=10001 width=29) (actual time=0.026..1335.460 rows=10 loops=1)"
"              Filter: ((issueid)::double precision = $1)"
"        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.05 rows=3 width=411) (actual time=0.007..0.008 rows=1 loops=10)"
"              Index Cond: (groupid = cg.id)"
"Total runtime: 1335.669 ms"

Using a prepared statement with a cast of the variable to the right type, I get the good plan back
prepare t2(real) as
 select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
  from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=cast($1 as numeric) order by CG.CREATED asc, CI.ID asc;

 explain analyze execute t2 (10006);

"Sort  (cost=106.19..106.22 rows=13 width=434) (actual time=0.155..0.156 rows=12 loops=1)"
"  Sort Key: cg.created, ci.id"
"  Sort Method: quicksort  Memory: 29kB"
"  ->  Nested Loop  (cost=0.00..105.95 rows=13 width=434) (actual time=0.048..0.111 rows=12 loops=1)"
"        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..19.73 rows=10 width=29) (actual time=0.031..0.042 rows=10 loops=1)"
"              Index Cond: (issueid = ($1)::numeric)"
"        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=3 width=411) (actual time=0.006..0.006 rows=1 loops=10)"
"              Index Cond: (groupid = cg.id)"
"Total runtime: 0.203 ms"

Now the challenge is to get java/jdbc to get this done right.  We make a big effort to ensure we always use prepared statements and variable bindings to help protect from SQL injection vulnerabilities.

JDBC has some features that are supposed to be convenient (automatic preparing based on a number-of-executions threshold) that strike me as misguided.  It's one thing to hide irrelevant details from the app, and another thing entirely to cause a huge change in the exact SQL that's sent to the server ... which is what JDBC seems to do.

I think the trick is that if you use JDBC prepared statements, you have to understand how it's trying to be tricky and circumvent it so that you're always in full control of what it's doing.
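If memory serves, the PostgreSQL JDBC driver exposes this behaviour as the prepareThreshold setting; a sketch follows (connection details are placeholders, and the property name should be verified against your driver version):

```java
Properties props = new Properties();
props.setProperty("user", "appuser");      // placeholder credentials
props.setProperty("password", "secret");
// 0 = never switch this connection over to server-side prepared statements,
// so every execution is planned with the actual bound value.
props.setProperty("prepareThreshold", "0");
Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/mydb", props);

// The same knob also exists per statement:
// ((org.postgresql.PGStatement) ps).setPrepareThreshold(0);
```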

Craig
 






Re: Trouble with plan statistics behaviour for query.

From: Vitalii Tymchyshyn
If I am correct, JDBC uses a named portal only on the 5th time you use a PreparedStatement (configurable). Before that it uses an unnamed statement that should work as if you had embedded the value. So the solution is to recreate the PreparedStatement each time (you will still have no problems with SQL injection). Note that "smart" pools may detect this situation and reuse a PreparedStatement for the same query text internally; if so, you need to switch this off.
In case you still have problems, I'd recommend asking on the postgresql-jdbc mailing list.
Also, I've heard that somewhere in 9.2 the PostgreSQL server may replan such cases each time.

Best regards, Vitalii Tymchyshyn



Re: Trouble with plan statistics behaviour for query.

From: Maciek Sakrejda
> If I am correct, JDBC uses named portal only on the 5th time you use
> PreparedStatement (configurable). Before it uses unnamed thing that should
> work as if you did embed the value.

If this is due to the difference in parameter type information, this
doesn't have anything to do with named portals.

My guess is that the driver has one idea about parameter types (based
on either the specific setTypeFoo call or the Java type of the
parameter passed to setObject), and the server another (based on the
type information of the CHANGEGROUP.ISSUEID column). Actually, I'm
rather surprised to see 'real' there: if you're using setObject with a
Long, I would imagine that turns into a bigint (which I believe the
server knows how to coerce to numeric). Can you show us your JDBC
code?

Array fundamentals

From: idc danny
Hi all,
I have two complementary functions that do opposite things:
1 - CombineStrings(stringarray character varying[])  RETURNS character varying
This one takes an array of strings as its parameter and returns a string with some formatted information inside.
2 - SplitString2Array(stringtosplit character varying)  RETURNS character varying[]
This one takes a formatted string as its parameter and returns an array of strings.

The following works just fine:
select SplitString2Array(CombineStrings(ARRAY['abba', 'queen']))
will return {'abba', 'queen'}

Now, if I want to do the following:
select CombineStrings(ARRAY[SplitString2Array("SomeTextColumn"), 'New string to add']) from "SomeTable"
I get the following error:
array value must start with "{" or dimension information

What am I doing wrong? I feel I still don't get the array fundamentals. My goal is to add my new string 'New string to add' into the formatted information inside the column "SomeTextColumn", in the same manner as if I had used the following:
Insert into "SomeTable"("SomeTextColumn") values (CombineString(ARRAY['abba', 'queen', 'New string to add']))

Thank you in advance,
Danny
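For what it's worth, a likely explanation: ARRAY[SplitString2Array(...), 'New string to add'] asks PostgreSQL to build an array whose first element is itself an array, so the unknown literal is then parsed as an array value and fails with exactly that error. A sketch of the usual fix is to append a single element instead:

```sql
-- array_append (or the || operator) adds one element to an existing array,
-- instead of trying to build an array of arrays.
select CombineStrings(array_append(SplitString2Array("SomeTextColumn"),
                                   'New string to add'::character varying))
  from "SomeTable";

-- Equivalent: SplitString2Array("SomeTextColumn") || 'New string to add'::character varying
```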

Re: Trouble with plan statistics behaviour for query.

From: Trevor Campbell
Thanks for all your help so far.  I have been away for a couple of days so my apologies for not replying earlier.

We are using a third party library to run our SQL via JDBC (not one of the common ones like Hibernate etc), but I have been able to dig out the exact statements run in the scenario we are experiencing.

We always prepare and then execute a query as follows; we never reuse the prepared statement:

_ps = _connection.prepareStatement(sql, resultSetType, resultSetConcurrency); 

// The values here are:
// sql = SELECT CG.ID, CG.issueid, CG.AUTHOR, CG.CREATED, CI.ID, CI.groupid, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING 
//                           FROM public.changegroup CG INNER JOIN public.changeitem CI ON CG.ID = CI.groupid
//                           WHERE CG.issueid=? ORDER BY CG.CREATED ASC, CI.ID ASC
// resultSetType = ResultSet.TYPE_FORWARD_ONLY
// resultSetConcurrency =  ResultSet.CONCUR_READ_ONLY

_ps.setLong(1, field);     

_rs = _ps.executeQuery();
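One hedged variant of the binding itself: since issueid is declared numeric(18,0), sending the key as a BigDecimal should make the parameter type match the column exactly:

```java
// Instead of setLong (which binds a bigint parameter), bind a numeric
// to match changegroup.issueid numeric(18,0):
_ps.setBigDecimal(1, java.math.BigDecimal.valueOf(field));
```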


Re: Trouble with plan statistics behaviour for query.

From: Trevor Campbell
> Actually, I'm rather surprised to see 'real' there: if you're using setObject with a Long, I would imagine that turns into a bigint (which I believe the server knows how to coerce to numeric).
The (real) is just my fault in testing; I just copy/pasted it from elsewhere, and it is not what is coming from the JDBC driver.  I don't know how to see what the driver is actually sending.

Re: Array fundamentals

From: Robert Haas
On Sat, Jun 2, 2012 at 1:05 PM, idc danny <idcdanny@yahoo.com> wrote:

It sounds like one or both of your functions have a bug in them, but
without knowing what they're supposed to do or seeing the source code,
it's pretty hard to guess what it might be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company