Re: Postgres restart during CopyManager.copyIn does not free connection, thread stuck on QueryExecutorImpl.waitOnLock - Mailing list pgsql-jdbc

From Alexis Meneses
Subject Re: Postgres restart during CopyManager.copyIn does not free connection, thread stuck on QueryExecutorImpl.waitOnLock
Msg-id CANPkoZS7jNmPYyrPguw-RHJu1KzXAFtKh7teGNeWQZ_TQGro-A@mail.gmail.com
In response to Postgres restart during CopyManager.copyIn does not free connection, thread stuck on QueryExecutorImpl.waitOnLock  (Brendan Reekie <breekie@sandvine.com>)
List pgsql-jdbc
Hi

I think a similar issue has already been reported (see thread http://www.postgresql.org/message-id/flat/CADGbXSQ--8pJcSPkC7+tR6rsGrk7p=141Bp16VJiOR5mg_SQpQ@mail.gmail.com), but it has not been fixed yet.

Would you have time to work on a patch and submit a pull request to the GitHub project?

Thanks.

Alexis


2015-02-09 19:38 GMT+01:00 Brendan Reekie <breekie@sandvine.com>:

Hi,

I’m currently using driver 9.3.1100-jdbc3.jar with a 9.3.5 server.

The behaviour I’m seeing is that if the connection to the database is lost because Postgres is restarted while a CopyManager.copyIn() call is executing, the connection to the database is never freed, and the stack trace shows that the thread is still waiting for the lock to be released:

    java.lang.Object.$$YJP$$wait(Native Method)
    java.lang.Object.wait(Object.java)
    java.lang.Object.wait(Object.java:503)
    org.postgresql.core.v3.QueryExecutorImpl.waitOnLock(QueryExecutorImpl.java:91)
    org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:228)
    org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:560)
    org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:403)
    org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:395)
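The waitOnLock frame suggests that each connection is guarded by a monitor: a statement calls wait() until the current owner (here, the copy operation) releases the connection. The sketch below is a minimal, hypothetical model of that pattern, not the driver's code; the class ConnectionLockModel and its lock/unlock methods are illustrative names. It uses a timed wait so the "owner never releases" case can be observed instead of hanging forever.

```java
// Hypothetical model of a per-connection lock built on wait()/notifyAll().
// Not the driver's implementation; names are illustrative only.
final class ConnectionLockModel {
    private Object lockedFor; // current owner of the connection, or null when free

    // Tries to take the lock for 'owner', waiting at most timeoutMillis.
    // Returns false on timeout or interruption; an untimed wait() would block forever instead.
    synchronized boolean lock(Object owner, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (lockedFor != null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        lockedFor = owner;
        return true;
    }

    synchronized void unlock(Object owner) {
        if (lockedFor != owner) {
            throw new IllegalStateException("lock is not held by this owner");
        }
        lockedFor = null;
        notifyAll(); // wake threads parked in wait()
    }

    public static void main(String[] args) {
        ConnectionLockModel conn = new ConnectionLockModel();
        Object copyOp = new Object();
        conn.lock(copyOp, 100); // the copy operation takes the lock
        // ... cancelCopy() fails with an IOException, so unlock(copyOp) is never called ...
        boolean acquired = conn.lock(new Object(), 200); // a later statement on the same connection
        System.out.println("later statement got the lock: " + acquired); // false
    }
}
```

With the driver's untimed wait(), the later statement would simply stay parked, which matches the thread state in the trace above.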

Debugging through the code, it looks like the issue might be in QueryExecutorImpl.cancelCopy(). When it attempts to flush pgStream, the flush throws an IOException, so the code that releases the lock (processCopyResults) is never called: the connection remains open and the lock is never freed.

    /**
     * Finishes a copy operation and unlocks connection discarding any exchanged data.
     * @param op the copy operation presumably currently holding lock on this connection
     * @throws SQLException on any additional failure
     */
    public void cancelCopy(CopyOperationImpl op) throws SQLException {
        if(!hasLock(op))
            throw new PSQLException(GT.tr("Tried to cancel an inactive copy operation"), PSQLState.OBJECT_NOT_IN_STATE);

        SQLException error = null;
        int errors = 0;

        try {
            if(op instanceof CopyInImpl) {
                synchronized (this) {
                    if (logger.logDebug()) {
                        logger.debug("FE => CopyFail");
                    }
                    final byte[] msg = Utils.encodeUTF8("Copy cancel requested");
                    pgStream.SendChar('f'); // CopyFail
                    pgStream.SendInteger4(5 + msg.length);
                    pgStream.Send(msg);
                    pgStream.SendChar(0);
                    pgStream.flush();
                    do {
                        try {
                            processCopyResults(op, true); // discard rest of input
                        } catch(SQLException se) { // expected error response to failing copy
                            errors++;
                            if( error != null ) {
                                SQLException e = se, next;
                                while( (next = e.getNextException()) != null )
                                    e = next;
                                e.setNextException(error);
                            }
                            error = se;
                        }
                    } while(hasLock(op));
                }
            } else if (op instanceof CopyOutImpl) {
                protoConnection.sendQueryCancel();
            }

        } catch(IOException ioe) {
            throw new PSQLException(GT.tr("Database connection failed when canceling copy operation"), PSQLState.CONNECTION_FAILURE, ioe);
        }

        if (op instanceof CopyInImpl) {
            if(errors < 1) {
                throw new PSQLException(GT.tr("Missing expected error response to copy cancel request"), PSQLState.COMMUNICATION_ERROR);
            } else if(errors > 1) {
                throw new PSQLException(GT.tr("Got {0} error responses to single copy cancel request", String.valueOf(errors)), PSQLState.COMMUNICATION_ERROR, error);
            }
        }
    }
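To illustrate the failure mode, here is a hypothetical, self-contained model of the cancel path; it is not the driver's code, the class and method names are made up, and java.io.Flushable stands in for pgStream. cancelCopyBuggy mirrors the control flow quoted above, where an IOException from flush() skips the lock release. cancelCopyFixed sketches one possible remedy, marking the connection closed and releasing the lock in a finally block; this is an assumption about what a patch could look like, not a confirmed fix.

```java
import java.io.Flushable;
import java.io.IOException;

// Hypothetical model of the CopyIn cancel path; not the driver's code.
final class CancelCopyModel {
    private Object lockedFor; // owner of the connection lock, or null when free
    private boolean closed;   // set once the connection is known to be broken

    CancelCopyModel(Object op) {
        lockedFor = op; // the copy operation starts out holding the lock
    }

    synchronized boolean hasLock(Object op) { return lockedFor == op; }
    synchronized boolean isClosed() { return closed; }

    // Same shape as the quoted cancelCopy(): if flush() throws, the code that
    // releases the lock (processCopyResults() in the driver) is skipped.
    synchronized void cancelCopyBuggy(Object op, Flushable stream) throws IOException {
        if (!hasLock(op)) throw new IllegalStateException("inactive copy operation");
        stream.flush();
        lockedFor = null; // stands in for processCopyResults() releasing the lock
        notifyAll();
    }

    // Possible fix: treat an I/O error as a dead connection and release the
    // lock in a finally block so threads waiting on it are woken up.
    synchronized void cancelCopyFixed(Object op, Flushable stream) throws IOException {
        if (!hasLock(op)) throw new IllegalStateException("inactive copy operation");
        try {
            stream.flush();
        } catch (IOException e) {
            closed = true;
            throw e;
        } finally {
            lockedFor = null;
            notifyAll();
        }
    }

    public static void main(String[] args) {
        Flushable broken = () -> { throw new IOException("server restarted"); };

        Object op1 = new Object();
        CancelCopyModel buggy = new CancelCopyModel(op1);
        try { buggy.cancelCopyBuggy(op1, broken); } catch (IOException expected) { }
        System.out.println("buggy: lock still held = " + buggy.hasLock(op1)); // true

        Object op2 = new Object();
        CancelCopyModel fixed = new CancelCopyModel(op2);
        try { fixed.cancelCopyFixed(op2, broken); } catch (IOException expected) { }
        System.out.println("fixed: lock still held = " + fixed.hasLock(op2)
                + ", closed = " + fixed.isClosed()); // false, true
    }
}
```

The IOException is still rethrown in the fixed variant, so the caller learns that the connection failed; the only change is that waiting threads are no longer left parked on a lock nobody will release.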

I’ve tried the latest driver, 9.4-1200, and observed the same behaviour. To reproduce this, I use a test program that streams data into copyIn(); I set a breakpoint and restart the Postgres server while the copyIn() is in progress.

Has anyone seen this issue before? Is there a workaround for this scenario?
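One possible client-side mitigation, sketched here as an assumption rather than something confirmed in this thread, is to run the copy on a separate thread, stop waiting after a timeout, and then discard the connection. In the sketch, the Callable stands in for a call such as copyIn, and runWithTimeout is a hypothetical helper. Whether the driver thread reacts to the interrupt sent by Future.cancel(true) is not established here, which is why the worker runs as a daemon thread.

```java
import java.util.OptionalLong;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class CopyTimeoutModel {
    // Runs copyTask on a daemon worker thread and stops waiting after timeoutMillis.
    // Returns the task's result (e.g. a row count), or empty if it timed out.
    static OptionalLong runWithTimeout(Callable<Long> copyTask, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "copy-worker");
            t.setDaemon(true); // a worker stuck in wait() must not keep the JVM alive
            return t;
        });
        Future<Long> future = executor.submit(copyTask);
        try {
            return OptionalLong.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; it may or may not react
            return OptionalLong.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return OptionalLong.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("copy failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        CountDownLatch neverReleased = new CountDownLatch(1);
        Callable<Long> stuckCopy = () -> { neverReleased.await(); return 0L; }; // simulates a hung copy
        Callable<Long> normalCopy = () -> 42L;                                 // simulates a copy of 42 rows
        System.out.println(runWithTimeout(normalCopy, 1000)); // OptionalLong[42]
        System.out.println(runWithTimeout(stuckCopy, 200));   // OptionalLong.empty
        // On timeout, the caller should discard the connection rather than reuse it.
    }
}
```

This does not free the driver-side lock; it only keeps the calling thread from being blocked indefinitely, so the affected connection should be treated as unusable afterwards.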

Thanks in advance,

Brendan