Thread: Trouble with plan statistics behaviour for a query.
We are having trouble with a particular query being slow in a strange manner.
The query is a join over two large tables that are suitably indexed.
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc
For some tasks we run this particular query a very large number of times and it has a significant performance impact when it runs slowly.
If we run ANALYSE over the CHANGEITEM table then the performance picks up by a factor of 5 or more. The problem is that a day later the performance will have dropped back to its previously slow state.
The reason this is so hard to understand is that the activity on this table is very low, with no updates and only a relatively small number of inserts each day, < 0.1% of the table size.
Explain output:
Sort  (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 rows=14 loops=1)
  Sort Key: cg.created, ci.id
  Sort Method: quicksort  Memory: 26kB
  ->  Nested Loop  (cost=0.00..86.71 rows=11 width=118) (actual time=0.022..0.061 rows=14 loops=1)
        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..17.91 rows=8 width=33) (actual time=0.012..0.015 rows=7 loops=1)
              Index Cond: (issueid = 81001::numeric)
        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=2 width=91) (actual time=0.005..0.005 rows=2 loops=7)
              Index Cond: (groupid = cg.id)
Total runtime: 0.116 ms
The explain output always seems the same even when the performance is poor, but I can't be sure of that.
Overall it seems like PostgreSQL just forgets about the statistics it has gathered after a short while.
Schema details:
CREATE TABLE changegroup
(
id numeric(18,0) NOT NULL,
issueid numeric(18,0),
author character varying(255),
created timestamp with time zone,
CONSTRAINT pk_changegroup PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX chggroup_issue
ON changegroup
USING btree
(issueid );
CREATE TABLE changeitem
(
id numeric(18,0) NOT NULL,
groupid numeric(18,0),
fieldtype character varying(255),
field character varying(255),
oldvalue text,
oldstring text,
newvalue text,
newstring text,
CONSTRAINT pk_changeitem PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX chgitem_chggrp
ON changeitem
USING btree
(groupid );
CREATE INDEX chgitem_field
ON changeitem
USING btree
(field COLLATE pg_catalog."default" );
Table sizes
changegroup : 2,000,000 rows
changeitem : 2,500,000 rows
The changegroup table has on average about 4 rows per issueid value, which is the query parameter.
We run autovacuum and autoanalyse, but as the activity in the table is low these are rarely if ever invoked on these tables.
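This can be confirmed from the standard statistics view; a diagnostic sketch (column names are from pg_stat_user_tables) showing when each table was last analyzed, manually or by autoanalyze, and how much insert activity it has seen:

```sql
SELECT relname, last_analyze, last_autoanalyze, n_tup_ins, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('changegroup', 'changeitem');
```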
Environment.
Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu, although this is a problem across a variety of postgres versions.
On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
This has an unbound variable '?' in it.
What's the exact SQL you used to get this ... did you use a specific CG.ISSUEID to run your test? If that's the case, this EXPLAIN ANALYZE won't be the same as the one generated for your actual application.
Craig
On 01/06/12 08:55, Craig James wrote:
These queries are being run from a Java application using JDBC, and when run the variable is bound to a long integer value. While trying to investigate the problem, I have been just hard-coding a value in the statement.
On Thu, May 31, 2012 at 4:01 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
I use Perl, not JDBC, but this thread may be relevant to your problem.
http://postgresql.1045698.n5.nabble.com/Slow-statement-when-using-JDBC-td3368379.html
Craig
On Thu, May 31, 2012 at 4:34 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
Thanks Craig, that certainly leads down the right path.
The following is all done in pgAdmin3:
Using an actual value, I get the plan I expect:
explain analyze select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=10006 order by CG.CREATED asc, CI.ID asc
Sort  (cost=106.18..106.22 rows=13 width=434) (actual time=0.115..0.115 rows=12 loops=1)
  Sort Key: cg.created, ci.id
  Sort Method: quicksort  Memory: 29kB
  ->  Nested Loop  (cost=0.00..105.94 rows=13 width=434) (actual time=0.019..0.067 rows=12 loops=1)
        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..19.73 rows=10 width=29) (actual time=0.009..0.013 rows=10 loops=1)
              Index Cond: (issueid = 10006::numeric)
        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=3 width=411) (actual time=0.004..0.005 rows=1 loops=10)
              Index Cond: (groupid = cg.id)
Total runtime: 0.153 ms
Using a prepared statement with a variable, I get a poor plan requiring a sequential scan:
prepare t2(real) as
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=$1 order by CG.CREATED asc, CI.ID asc;
explain analyze execute t2 (10006);
Sort  (cost=126448.89..126481.10 rows=12886 width=434) (actual time=1335.615..1335.616 rows=12 loops=1)
  Sort Key: cg.created, ci.id
  Sort Method: quicksort  Memory: 29kB
  ->  Nested Loop  (cost=0.00..125569.19 rows=12886 width=434) (actual time=0.046..1335.556 rows=12 loops=1)
        ->  Seq Scan on changegroup cg  (cost=0.00..44709.26 rows=10001 width=29) (actual time=0.026..1335.460 rows=10 loops=1)
              Filter: ((issueid)::double precision = $1)
        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.05 rows=3 width=411) (actual time=0.007..0.008 rows=1 loops=10)
              Index Cond: (groupid = cg.id)
Total runtime: 1335.669 ms
Using a prepared statement with a cast of the variable to the right type, I get the good plan back
prepare t2(real) as
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=cast($1 as numeric) order by CG.CREATED asc, CI.ID asc;
explain analyze execute t2 (10006);
Sort  (cost=106.19..106.22 rows=13 width=434) (actual time=0.155..0.156 rows=12 loops=1)
  Sort Key: cg.created, ci.id
  Sort Method: quicksort  Memory: 29kB
  ->  Nested Loop  (cost=0.00..105.95 rows=13 width=434) (actual time=0.048..0.111 rows=12 loops=1)
        ->  Index Scan using chggroup_issue on changegroup cg  (cost=0.00..19.73 rows=10 width=29) (actual time=0.031..0.042 rows=10 loops=1)
              Index Cond: (issueid = ($1)::numeric)
        ->  Index Scan using chgitem_chggrp on changeitem ci  (cost=0.00..8.58 rows=3 width=411) (actual time=0.006..0.006 rows=1 loops=10)
              Index Cond: (groupid = cg.id)
Total runtime: 0.203 ms
Now the challenge is to get Java/JDBC to do this right. We make a big effort to ensure we always use prepared statements and variable binding to help protect against SQL injection vulnerabilities.
JDBC has some features that are supposed to be convenient (automatic preparing based on a number-of-executions threshold) that strike me as misguided. It's one thing to hide irrelevant details from the app, and another thing entirely to cause a huge change in the exact SQL that's sent to the server ... which is what JDBC seems to do.
I think the trick is that if you use JDBC prepared statements, you have to understand how it's trying to be tricky and circumvent it so that you're always in full control of what it's doing.
Craig
01.06.12 02:34, Trevor Campbell wrote:
If I am correct, JDBC uses a named portal only on the 5th time you use a PreparedStatement (configurable). Before that it uses an unnamed statement, which should work as if you had embedded the value. So the solution is to recreate the PreparedStatement each time (you will still have no problems with SQL injection). Note that "smart" connection pools may detect this situation and reuse the PreparedStatement internally for identical query texts; if so, switch this off. In case you still have problems, I'd recommend asking on the postgresql-jdbc mailing list. Also, I've heard that somewhere around 9.2 the PostgreSQL server may replan such cases each time.
Best regards,
Vitalii Tymchyshyn
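The configurable threshold mentioned above is, in the PostgreSQL JDBC driver, the prepareThreshold setting (default 5). A hedged sketch with a hypothetical host and database name; verify the exact semantics against the pgjdbc documentation for your driver version:

```java
// Connection-URL form: prepareThreshold=0 asks pgjdbc to keep using the
// unnamed statement (planned with the actual parameter values) instead of
// switching to a server-side named prepared statement after five executions.
String url = "jdbc:postgresql://dbhost:5432/jiradb?prepareThreshold=0";

// Per-statement form, via the driver-specific interface:
// ((org.postgresql.PGStatement) preparedStatement).setPrepareThreshold(0);
```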
> If I am correct, JDBC uses named portal only on the 5th time you use
> PreparedStatement (configurable). Before it uses unnamed thing that should
> work as if you did embed the value.
If this is due to the difference in parameter type information, this doesn't have anything to do with named portals. My guess is that the driver has one idea about parameter types (based on either the specific setTypeFoo call or the Java type of the parameter passed to setObject), and the server another (based on the type information of the CHANGEGROUP.ISSUEID column). Actually, I'm rather surprised to see 'real' there: if you're using setObject with a Long, I would imagine that turns into a bigint (which I believe the server knows how to coerce to numeric). Can you show us your JDBC code?
Hi all,
I got 2 complementary functions, which will do opposite things.
1 - CombineStrings(stringarray character varying[]) RETURNS character varying
This one will take as parameter an array of strings and will return a string with some formatted information inside
2- SplitString2Array(stringtosplit character varying) RETURNS character varying[]
This one will take as parameter a formatted string and will return an array of strings.
The following is true; both work just fine:
select SplitString2Array(CombineStrings(ARRAY['abba', 'queen']))
will return {'abba', 'queen'}
Now, if I want to do the following:
select CombineStrings(ARRAY[SplitString2Array("SomeTextColumn"), 'New string to add']) from "SomeTable"
I get the following error:
array value must start with "{" or dimension information
What am I doing wrong? I feel I still don't get the array fundamentals. My goal is to append my new string 'New string to add' to the formatted information already inside the column "SomeTextColumn", in the same manner as if I had used the following:
Insert into "SomeTable"("SomeTextColumn") values (CombineStrings(ARRAY['abba', 'queen', 'New string to add']))
Thank you in advance,
Danny
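A likely explanation, hedged since the function bodies aren't shown: in ARRAY[SplitString2Array(...), 'New string to add'] the constructor resolves the element type to character varying[], so the untyped string literal is parsed as an array literal and fails with exactly that error. Appending a single element to an existing array uses the || operator or array_append instead:

```sql
-- Append an element to the existing array rather than wrapping both in a
-- new ARRAY[...] constructor (which would need the plain string to parse
-- as an array literal):
select CombineStrings(SplitString2Array("SomeTextColumn") || 'New string to add'::character varying)
from "SomeTable";

-- Equivalent built-in form:
select CombineStrings(array_append(SplitString2Array("SomeTextColumn"), 'New string to add'::character varying))
from "SomeTable";
```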
Thanks for all your help so far. I have been away for a couple of days so my apologies for not replying earlier.
We are using a third party library to run our SQL via JDBC (not one of the common ones like Hibernate etc), but I have been able to dig out the exact statements run in the scenario we are experiencing.
We always run a prepare and then execute a query as follows; we never reuse the prepared statement:
_ps = _connection.prepareStatement(sql, resultSetType, resultSetConcurrency);
// The values here are:
// sql = SELECT CG.ID, CG.issueid, CG.AUTHOR, CG.CREATED, CI.ID, CI.groupid, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
// FROM public.changegroup CG INNER JOIN public.changeitem CI ON CG.ID = CI.groupid
// WHERE CG.issueid=? ORDER BY CG.CREATED ASC, CI.ID ASC
// resultSetType = ResultSet.TYPE_FORWARD_ONLY
// resultSetConcurrency = ResultSet.CONCUR_READ_ONLY
_ps.setLong(1, field);
_rs = _ps.executeQuery();
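To check whether the declared parameter type alone changes the plan, the same query can be prepared from psql with different parameter types, bypassing JDBC entirely. This is only a sketch, assuming the schema shown earlier in the thread; setLong() would roughly correspond to the bigint variant, while the column itself is numeric(18,0):

```sql
-- Prepared with a bigint parameter:
PREPARE q_bigint(bigint) AS
  SELECT CG.ID, CG.issueid, CG.AUTHOR, CG.CREATED,
         CI.ID, CI.groupid, CI.FIELDTYPE, CI.FIELD,
         CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
  FROM public.changegroup CG
  INNER JOIN public.changeitem CI ON CG.ID = CI.groupid
  WHERE CG.issueid = $1
  ORDER BY CG.CREATED ASC, CI.ID ASC;
EXPLAIN ANALYZE EXECUTE q_bigint(81001);

-- Prepared with a numeric parameter, matching the column type exactly:
PREPARE q_numeric(numeric) AS
  SELECT CG.ID, CG.issueid, CG.AUTHOR, CG.CREATED,
         CI.ID, CI.groupid, CI.FIELDTYPE, CI.FIELD,
         CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
  FROM public.changegroup CG
  INNER JOIN public.changeitem CI ON CG.ID = CI.groupid
  WHERE CG.issueid = $1
  ORDER BY CG.CREATED ASC, CI.ID ASC;
EXPLAIN ANALYZE EXECUTE q_numeric(81001);
```

If the two EXPLAIN ANALYZE outputs differ in index usage or row estimates, that would point at the bound parameter's type rather than stale statistics.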
On 02/06/12 01:38, Maciek Sakrejda wrote:
>> If I am correct, JDBC uses named portal only on the 5th time you use
>> PreparedStatement (configurable). Before it uses unnamed thing that should
>> work as if you did embed the value.
>
> If this is due to the difference in parameter type information, this doesn't have anything to do with named portals. My guess is that the driver has one idea about parameter types (based on either the specific setTypeFoo call or the Java type of the parameter passed to setObject), and the server another (based on the type information of the CHANGEGROUP.ISSUEID column). Actually, I'm rather surprised to see 'real' there: if you're using setObject with a Long, I would imagine that turns into a bigint (which I believe the server knows how to coerce to numeric). Can you show us your JDBC code?
> Actually, I'm rather surprised to see 'real' there: if you're using setObject
> with a Long, I would imagine that turns into a bigint (which I believe the
> server knows how to coerce to numeric).

The (real) is just my fault in testing; I copy/pasted it from elsewhere, and it is not what is coming from the JDBC driver. I don't know how to see what the driver is actually sending.
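One way to see what the driver is actually sending is to log it on the server side. These are standard postgresql.conf settings (the threshold value here is just an example):

```
# postgresql.conf: log every statement, including the parse/bind/execute
# messages that extended-protocol clients such as JDBC send; bound
# parameter values appear in a DETAIL log line.
log_statement = 'all'

# Or log only statements that run longer than this threshold (ms), to
# catch just the slow executions:
log_min_duration_statement = 100
```

After a reload, log entries of the form `execute <unnamed>: SELECT ...` versus `execute S_1: SELECT ...` should also show whether the driver has switched to a named (server-prepared) statement.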
On Sat, Jun 2, 2012 at 1:05 PM, idc danny <idcdanny@yahoo.com> wrote:
> Now, if I want to do the following:
> select CombineStrings(ARRAY[SplitString2Array("SomeTextColumn"), 'New string to add']) from "SomeTable"
> i get the following error:
> array value must start with "{" or dimension information

It sounds like one or both of your functions have a bug in them, but without knowing what they're supposed to do or seeing the source code, it's pretty hard to guess what it might be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company