Thread: The logic behind the prepared statement in PostgreSQL

The logic behind the prepared statement in PostgreSQL

From
louis
Date:
Hi,

I've got several questions regarding the behaviors of prepared statement in PostgreSQL.

Environment:
JDK version: 6
JDBC driver: 8.4-701.jdbc3
No connection pool is used
prepareThreshold is set to 1

Code:
for (int i = 0; i < 10; i++) {PreparedStatement statement = 	conn.prepareStatement("select * from foo where value = ?");statement.setInt(1, 20);statement.executeQuery();// statement is left un-closed on purpose;// check the contents of pg_prepared_statement;
}


Obvervation:

The pg_prepared_statements contents of the 4th iteration is as follows:
S_2	select * from gsd_36c_unique where tag_length = $1	2010-10-06 19:01:06.052824+08	{integer}	f
S_4	select * from gsd_36c_unique where tag_length = $1	2010-10-06 19:01:06.63442+08	{integer}	f
S_3	select * from gsd_36c_unique where tag_length = $1	2010-10-06 19:01:06.317623+08	{integer}	f
S_1	BEGIN	2010-10-06 19:01:06.052524+08	{}	f
S_5	select * from gsd_36c_unique where tag_length = $1	2010-10-06 19:01:06.811521+08	{integer}	f

Discussion:

1. Even though all the prepare statements have the identical SQL string, the postgresql server creates one execution plan for each statement.
So I think the execution plan is fetched by name (S_1, S_2, etc.) instead of SQL query string from the server-side. On the other hand, 
a JDBC code tracing shows that the execution plan is fetched by SQL query String.  Is the above assertion correct? 

3. In the 5th iteration all the previously-prepared statements are dropped from the postgresql server, the pg_prepared_statements shows the following:
pg_prepared_statemnt S_6	select * from gsd_36c_unique where tag_length = $1	2010-10-06 19:01:06.99907+08	{integer}	f
pg_prepared_statemnt S_1	BEGIN	2010-10-06 19:01:06.052524+08	{}	f  
I think it's because postgresql JDBC Driver has a default limit for the max number of preparedStatement, which is 4. 
Can anyone tell me where to adjuest this parameter?

Cheers,

Louis

Re: The logic behind the prepared statement in PostgreSQL

From
Oliver Jowett
Date:
louis wrote:

> 1. Even though all the prepare statements have the identical SQL string, the postgresql server creates one execution
planfor each statement. 
> So I think the execution plan is fetched by name (S_1, S_2, etc.) instead of SQL query string from the server-side.
Onthe other hand,  
> a JDBC code tracing shows that the execution plan is fetched by SQL query String.  Is the above assertion correct?

From the driver's point of view, if you create multiple different
PreparedStatement objects (as you are doing here), each is a completely
separate statement - even if they happen to share a query string with
some other statement you executed. So a separate server-side statement
is prepared for each.

If you reuse the PreparedStatement object, the driver will reuse the
underlying server-side statement it prepared earlier.

I don't know what you mean by "JDBC code tracing" exactly, but there's
no mapping of SQL query string to statement name; there is just an
association from a PreparedStatement to a corresponding server-side
statement (see e.g. jdbc2.AbstractJdbc2Statement.preparedQuery ->
core.Query -> core.v3.SimpleQuery)

> 3. In the 5th iteration all the previously-prepared statements are dropped from the postgresql server, the
pg_prepared_statementsshows the following: 
>
>     pg_prepared_statemnt S_6    select * from gsd_36c_unique where tag_length = $1    2010-10-06 19:01:06.99907+08
{integer}   f 
>     pg_prepared_statemnt S_1    BEGIN    2010-10-06 19:01:06.052524+08    {}    f
>
> I think it's because postgresql JDBC Driver has a default limit for the max number of preparedStatement, which is 4.
> Can anyone tell me where to adjuest this parameter?

This is just luck of the draw. It is because you are leaking the
statements, rather than closing them. There is a reference queue +
phantom reference that is used to detect this case and free any
server-side resources associated with leaked statements, but exactly
when those references are cleared and enqueued is entirely up to the
JVM's garbage collector. The driver polls the queue before each query
execution and sends appropriate protocol messages to free the
server-side statements of any enqueued references.

If you properly close the prepared statement, the reference is
immediately cleared/enqueued, so you don't have to wait for GC in that case.

Oliver

Re: The logic behind the prepared statement in PostgreSQL

From
louis
Date:
On Thu, 2010-10-07 at 01:19 +1300, Oliver Jowett wrote:
> louis wrote:
>
> > 1. Even though all the prepare statements have the identical SQL string, the postgresql server creates one
executionplan for each statement. 
> > So I think the execution plan is fetched by name (S_1, S_2, etc.) instead of SQL query string from the server-side.
Onthe other hand,  
> > a JDBC code tracing shows that the execution plan is fetched by SQL query String.  Is the above assertion correct?
>
> >From the driver's point of view, if you create multiple different
> PreparedStatement objects (as you are doing here), each is a completely
> separate statement - even if they happen to share a query string with
> some other statement you executed. So a separate server-side statement
> is prepared for each.
>
> If you reuse the PreparedStatement object, the driver will reuse the
> underlying server-side statement it prepared earlier.
>
> I don't know what you mean by "JDBC code tracing" exactly, but there's
> no mapping of SQL query string to statement name; there is just an
> association from a PreparedStatement to a corresponding server-side
> statement (see e.g. jdbc2.AbstractJdbc2Statement.preparedQuery ->
> core.Query -> core.v3.SimpleQuery)

"JDBC code tracing" means firing up a debugger while running the code,
as you have expected. Sorry for the unclear expression.
Thanks for showing me the call stack.

However, while tracing the call stack I found that the
core.v3.QueryExecutorImpl.sendBind method sends a query statement Name
to PGStream, does it mean JDBC driver use the statement name to identify
the corresponding execution plan on a Postgresql server?
I have such assumption since PGStream seems to be the connection between
JDBC driver and the PostgreSQL server, and I can't find other obvious
path for JDBC code to specify an execution plan on the server side.

>
> > 3. In the 5th iteration all the previously-prepared statements are dropped from the postgresql server, the
pg_prepared_statementsshows the following: 
> >
> >     pg_prepared_statemnt S_6    select * from gsd_36c_unique where tag_length = $1    2010-10-06 19:01:06.99907+08
 {integer}    f 
> >     pg_prepared_statemnt S_1    BEGIN    2010-10-06 19:01:06.052524+08    {}    f
> >
> > I think it's because postgresql JDBC Driver has a default limit for the max number of preparedStatement, which is
4. 
> > Can anyone tell me where to adjuest this parameter?
>
> This is just luck of the draw. It is because you are leaking the
> statements, rather than closing them. There is a reference queue +
> phantom reference that is used to detect this case and free any
> server-side resources associated with leaked statements, but exactly
> when those references are cleared and enqueued is entirely up to the
> JVM's garbage collector. The driver polls the queue before each query
> execution and sends appropriate protocol messages to free the
> server-side statements of any enqueued references.
>
> If you properly close the prepared statement, the reference is
> immediately cleared/enqueued, so you don't have to wait for GC in that case.
>
> Oliver
>

Thanks for enlightening me on this issue.
Your explanation is very helpful.

Cheers,


Louis



Re: The logic behind the prepared statement in PostgreSQL

From
Oliver Jowett
Date:
louis wrote:

> However, while tracing the call stack I found that the
> core.v3.QueryExecutorImpl.sendBind method sends a query statement Name
> to PGStream, does it mean JDBC driver use the statement name to identify
> the corresponding execution plan on a Postgresql server?
> I have such assumption since PGStream seems to be the connection between
> JDBC driver and the PostgreSQL server, and I can't find other obvious
> path for JDBC code to specify an execution plan on the server side.

A Bind message associates parameter values with a previously created
statement. You probably want to read the docs that describe the FE/BE
protocol: http://www.postgresql.org/docs/9.0/static/protocol.html

Oliver

Re: The logic behind the prepared statement in PostgreSQL

From
louis
Date:
On Thu, 2010-10-07 at 10:16 +1300, Oliver Jowett wrote:
> louis wrote:
>
> > However, while tracing the call stack I found that the
> > core.v3.QueryExecutorImpl.sendBind method sends a query statement Name
> > to PGStream, does it mean JDBC driver use the statement name to identify
> > the corresponding execution plan on a Postgresql server?
> > I have such assumption since PGStream seems to be the connection between
> > JDBC driver and the PostgreSQL server, and I can't find other obvious
> > path for JDBC code to specify an execution plan on the server side.
>
> A Bind message associates parameter values with a previously created
> statement. You probably want to read the docs that describe the FE/BE
> protocol: http://www.postgresql.org/docs/9.0/static/protocol.html
>
> Oliver
>


Very helpful information.
Much appreciated

Louis


Re: The logic behind the prepared statement in PostgreSQL

From
Radosław Smogura
Date:
Hi,

> I think it's because postgresql JDBC Driver has a default limit for the max
> number of preparedStatement, which is 4. Can anyone tell me where to
> adjuest this parameter?

This parameter 4 is associated with prepare threshold. It says how many times
the given prepared statement (as Java object) will be executed before it will
be compiled into true, server side prepared statement.

In many applications developers uses PreparedStatement class just to wrap
parameters, so compiling such statement in server side statement adds unneeded
overhead. After you execute such PS few times then it's compiled, and in logs
you will see EXECUTE S_1 or S_1/C_1.

PG JDBC driver doesn't have built in support for caching statements in any
way, so two statements with same SQL will be mapped to different statements in
database, even if that are called in scope of same connection.

If you would like to cache statements, currently most servers support this, by
wrapping original database objects.

Kind ragards,
Radosław Smogura
http://www.softperience.eu

Re: The logic behind the prepared statement in PostgreSQL

From
Radosław Smogura
Date:
Hi,

> I think it's because postgresql JDBC Driver has a default limit for the max
> number of preparedStatement, which is 4. Can anyone tell me where to
> adjuest this parameter?

This parameter 4 is associated with prepare threshold. It says how many times
the given prepared statement (as Java object) will be executed before it will
be compiled into true, server side prepared statement.

In many applications developers uses PreparedStatement class just to wrap
parameters, so compiling such statement in server side statement adds unneeded
overhead. After you execute such PS few times then it's compiled, and in logs
you will see EXECUTE S_1 or S_1/C_1.

PG JDBC driver doesn't have built in support for caching statements in any
way, so two statements with same SQL will be mapped to different statements in
database, even if that are called in scope of same connection.

If you would like to cache statements, currently most servers support this, by
wrapping original database objects.

Kind ragards,
Radosław Smogura
http://www.softperience.eu