Thread: PGStream synchronization

PGStream synchronization

From

Maciek Sakrejda

Date:

25 August 2009, 00:45:19

We've found some synchronization issues around PGStream in
QueryExecutorImpl and ProtocolConnectionImpl. Specifically, while
QueryExecutorImpl synchronizes all its uses off PGStream,
ProtocolConnectionImpl uses the same PGStream directly when close() is
called. This can easily cause protocol-level errors and,
theoretically, at least, bad data shoved into a COPY IN query while
the actual Connection.close() is ignored. This doesn't just affect
COPY, though--anything that causes writes to the QueryExecutorImpl's
pgStream could be affected.

The only thing I've achieved so far is protocol-level errors, but
that's a serious enough issue on its own.

It's not immediately clear how to fix this, since we don't want to
synchronize PGStream itself. Should ProtocolConnectionImpl just
delegate to QueryExecutorImpl on close() perhaps, with
QueryExecutorImpl handling the actual writes to PGStream?

See sample code to reproduce below:

package com.truviso;

import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyIn;

public class Test {

    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        final PGConnection c = (PGConnection)
DriverManager.getConnection("jdbc:postgresql://localhost:5432/cqdb",
"foo", "");
        ((Connection)c).createStatement().execute(
            "drop table if exists foo; create temporary table foo(a text)");
        Thread t1 = new Thread(new Runnable() {
            public void run() {
                try {
                    CopyIn copy = c.getCopyAPI().copyInQuery("copy foo
from stdin");
                    String copyData = "abc";
                    byte[] copyBytes = copyData.getBytes();
                    copy.writeToCopy(copyBytes, 0, copyBytes.length);
                    Thread.sleep(2000);
                    copy.endCopy();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        t1.start();
        Thread.sleep(1000);
        ((Connection)c).close();
        t1.join();
    }

}

Thanks,
--
Maciek Sakrejda | Software Engineer | Truviso

1065 E. Hillsdale Blvd., Suite 230
Foster City, CA 94404
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: PGStream synchronization

From

Oliver Jowett

Date:

25 August 2009, 00:48:23

Maciek Sakrejda wrote:
> We've found some synchronization issues around PGStream in
> QueryExecutorImpl and ProtocolConnectionImpl. Specifically, while
> QueryExecutorImpl synchronizes all its uses off PGStream,
> ProtocolConnectionImpl uses the same PGStream directly when close() is
> called. This can easily cause protocol-level errors and,
> theoretically, at least, bad data shoved into a COPY IN query while
> the actual Connection.close() is ignored. This doesn't just affect
> COPY, though--anything that causes writes to the QueryExecutorImpl's
> pgStream could be affected.
>
> The only thing I've achieved so far is protocol-level errors, but
> that's a serious enough issue on its own.
>
> It's not immediately clear how to fix this, since we don't want to
> synchronize PGStream itself. Should ProtocolConnectionImpl just
> delegate to QueryExecutorImpl on close() perhaps, with
> QueryExecutorImpl handling the actual writes to PGStream?

What's the expected behaviour when Connection.close() is called
concurrently with other work happening on the connection - should the
close interrupt the current operation?

-O

Re: PGStream synchronization

From

Maciek Sakrejda

Date:

25 August 2009, 15:25:53

The jdbc spec seems vague on that (as on all multi-threaded behavior)
and can probably be interpreted either way. I'm not sure what other
jdbc drivers do. I can take a look at the MySQL driver.

Our concern is that even if the close *should* interrupt the current
operation, it should do so cleanly instead of causing a protocol
violation. I haven't been able to craft anything worse than an error
on close, but just looking at the code, there's definitely a lack of
synchronization on the shared PGStream object there that could
potentially lead to worse problems.

On Mon, Aug 24, 2009 at 5:48 PM, Oliver Jowett<oliver@opencloud.com> wrote:
> Maciek Sakrejda wrote:
>> We've found some synchronization issues around PGStream in
>> QueryExecutorImpl and ProtocolConnectionImpl. Specifically, while
>> QueryExecutorImpl synchronizes all its uses off PGStream,
>> ProtocolConnectionImpl uses the same PGStream directly when close() is
>> called. This can easily cause protocol-level errors and,
>> theoretically, at least, bad data shoved into a COPY IN query while
>> the actual Connection.close() is ignored. This doesn't just affect
>> COPY, though--anything that causes writes to the QueryExecutorImpl's
>> pgStream could be affected.
>>
>> The only thing I've achieved so far is protocol-level errors, but
>> that's a serious enough issue on its own.
>>
>> It's not immediately clear how to fix this, since we don't want to
>> synchronize PGStream itself. Should ProtocolConnectionImpl just
>> delegate to QueryExecutorImpl on close() perhaps, with
>> QueryExecutorImpl handling the actual writes to PGStream?
>
> What's the expected behaviour when Connection.close() is called
> concurrently with other work happening on the connection - should the
> close interrupt the current operation?
>
> -O
>



--
Maciek Sakrejda | Software Engineer | Truviso

(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: PGStream synchronization

From

Maciek Sakrejda

Date:

25 August 2009, 16:52:51

Ok, so it looks like MySQL at least *intends to* wait for all
statements to finish executing before closing the connection. I don't
believe this is actually accomplished, since some things synchronize
on the MySQL Connection object itself, whereas some synchronize on a
separate mutex Object it exposes through a getMutex() method, but
that's sort of irrelevant to this discussion.

I'm not quite sure what the right behavior with respect to close()
should be. If we wait for all statements to finish executing, we could
be waiting for a very long time (especially in the case of an open
COPY or something silly like "SELECT pg_sleep(1000000)"). If, on the
other hand, we cancel all still-executing queries and in-progress COPY
operations, the burden of coordinating this falls to the user, which
is somewhat unfortunate. The second choice is probably more
reasonable--the jdbc API says that close()

"Releases this Connection object's database and JDBC resources
immediately instead of waiting for them to be automatically released."

and it would be tough to interpret "immediately" as "block until
everything currently executing is done".

I'm not that familiar with the FEBE protocol (I'll look into this as
well), but we don't need to cancel regular queries explicitly, right?
The Terminate message is always valid, except in COPY mode? If that's
the case, we only need to ensure that the various PGStream messages
are sent atomically (which is currently *not* the case in the case of
Terminate), and handle COPY specially (by issuing CopyFail followed by
Terminate). I'm happy to start work on a patch if we find an approach
we agree on.

Thanks,
--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: PGStream synchronization

From

Oliver Jowett

Date:

25 August 2009, 23:10:33

Maciek Sakrejda wrote:

> I'm not that familiar with the FEBE protocol (I'll look into this as
> well), but we don't need to cancel regular queries explicitly, right?

You can do query cancellation via a separate connection, but that's
really about halting long-running queries while preserving your
connection which is not really what we're after here. All that really
matters is getting the Terminate in at an appropriate point in the
stream so that the backend knows it's a deliberate shutdown and doesn't
complain about an unexpected-client-EOF.

> The Terminate message is always valid, except in COPY mode? If that's
> the case, we only need to ensure that the various PGStream messages
> are sent atomically (which is currently *not* the case in the case of
> Terminate), and handle COPY specially (by issuing CopyFail followed by
> Terminate). I'm happy to start work on a patch if we find an approach
> we agree on.

Right, so long as you get the atomicity correct I think that will work.
This will probably require more finegrained synchronization since
currently there's effectively one big lock that's held for the entire
duration of the query, which doesn't work so well here if you want
close() to interrupt any ongoing operation in a timely fashion.

-O

Re: PGStream synchronization

From

Maciek Sakrejda

Date:

26 August 2009, 09:02:05

> Right, so long as you get the atomicity correct I think that will work.
> This will probably require more finegrained synchronization since
> currently there's effectively one big lock that's held for the entire
> duration of the query, which doesn't work so well here if you want
> close() to interrupt any ongoing operation in a timely fashion.

I wasn't quite sure why this would impact regular (non-COPY) queries,
but after poking around in the source some more, I see what you are
saying. We don't really have a good message-level synchronization
point other than synchronizing on QueryExecutorImpl (i.e., its own
synchronized methods), but that will lead to blocking indefinitely on
close(). This is good enough for most communication since all the
relevant methods in QueryExecutorImpl are synchronized anyway, but
stop() gets around this to avoid blocking on all the other connection
activity before shutdown.

I'm a little wary of introducing message-level synchronization since
(a) it touches a *lot* of core code and (b) there might be a
non-negligible performance impact.

I'm not really sure if there's another clean approach, though. I'm
attaching a patch that takes another dirty approach, but other than
the fact that it seems to work, I don't really like it at all. The
basic idea is to cancel whatever you're doing before closing the
connection: if there's an active copy, have the executor send a
CopyFail; otherwise, have the ProtocolConnectionImpl request a query
cancellation. Then close the connection without fear of blocking
(although this should probably be synchronized with QueryExecutorImpl
since, theoretically, we could have close() trigger a cancel and then
another thread could start another query before we actually send the
Terminate message on PGStream).

However, this seems like a hack. QUERY CANCELED is not really the
right error for when a connection is shutting down while your query is
running. It should really be some sort of CONNECTION EXCEPTION, and we
should just run into that automatically when we send the Terminate
message, but for that, we're back to protocol message-level
synchronization.

If we do want to synchronize at the message level, is there a
reasonable way to gauge the performance impact of this change? I.e.,
is there a standard pgsql-jdbc performance suite for measuring
relative performance changes?

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Attachment

close.diff

Re: PGStream synchronization

From

Oliver Jowett

Date:

27 August 2009, 00:07:54

Maciek Sakrejda wrote:

> I'm not really sure if there's another clean approach, though. I'm
> attaching a patch that takes another dirty approach, but other than
> the fact that it seems to work, I don't really like it at all. The
> basic idea is to cancel whatever you're doing before closing the
> connection: if there's an active copy, have the executor send a
> CopyFail; otherwise, have the ProtocolConnectionImpl request a query
> cancellation.

This seems a bit heavyweight because cancelling a query isn't a trivial
amount of work, and it'll happen whenever close() is called if I read
your patch correctly. 99% of apps are effectively single-threaded so it
seems expensive to always do this when it's rarely needed ..

-O

Re: PGStream synchronization

From

Maciek Sakrejda

Date:

27 August 2009, 00:37:44

Right. I did say I wasn't happy with it...

Would introducing message-level synchronization be the way to go,
then? As I see it, there are four possible approaches:

1. Cancel before close--what I just protoyped in the previous patch.
Pros: straighforward
Cons: overkill for most applications; semantics are ugly (queries
should not be canceled)

2. Status quo:
Pros: status quo, works in single-threaded case
Cons: protocol violations, possibly swallowed Terminate messages *and*
bad data in multi-threaded COPY

3. Introduce protocol-message-level synchronization
Pros: cleanest solution conceptually
Cons: may incur significant additional synchronization overhead,
significant changes in core code

4. Synchronize close() like other QueryExecutorImpl like other
connection activities
Pros: also straightforward
Cons: may block indefinitely on long-running queries or open copy
(although could additionally be made to trigger a CopyFail if a copy
operation is in progress)

I tend to agree that (1) is unworkable as a general-purpose solution.
I'm not sure how you guys feel about (2), but protocol violations seem
like a really, really ugly problem, even if they are only on closing
the connection. (3) is probably ideal, barring performance problems.
(4) may also be workable, but having close() block on currently
running queries is probably not what we want to do.

Any thoughts?

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: PGStream synchronization

From

Oliver Jowett

Date:

27 August 2009, 00:55:47

Maciek Sakrejda wrote:

> Any thoughts?

How about, instead of using raw monitor synchronization to provide
mutual exclusion on access to the stream, we use a lock object (i.e.
something similar to java.util.concurrent.locks.Lock, though we can't
use exactly that class before 1.5 obviously), try to grab the lock
before close, and behave differently depending on if we succeeded or not.

So the close logic can look something like this:

  if (lock.tryLock()) {
    // we have exclusive access to the connection
    // do a normal shutdown
    try {
      sendTerminate();
      stream.close();
    } finally {
      lock.unlock();
    }
  } else {
    // something is concurrently using the connection
    // just abruptly close the connection
    stream.close();
  }

In the concurrent case, we don't send Terminate, but we also don't risk
sending it at the wrong point in the stream.

This means that a concurrent Connection.close() while something is
happening will result in an "unexpected client EOF" logged on the server
side, but that's almost what you want in this case anyway .. concurrent
close pretty much means "help, abort this!" ..

-O

PGStream synchronization

From

Maciek Sakrejda

Date:

27 August 2009, 06:38:23

(missed CCing the list on my reply to Oliver)

> How about, instead of using raw monitor synchronization to provide
> mutual exclusion on access to the stream, we use a lock object (i.e.
> something similar to java.util.concurrent.locks.Lock, though we can't
> use exactly that class before 1.5 obviously), try to grab the lock
> before close, and behave differently depending on if we succeeded or not.
>
> So the close logic can look something like this:
>
>  if (lock.tryLock()) {
>    // we have exclusive access to the connection
>    // do a normal shutdown
>    try {
>      sendTerminate();
>      stream.close();
>    } finally {
>      lock.unlock();
>    }
>  } else {
>    // something is concurrently using the connection
>    // just abruptly close the connection
>    stream.close();
>  }
>
> In the concurrent case, we don't send Terminate, but we also don't risk
> sending it at the wrong point in the stream.
>
> This means that a concurrent Connection.close() while something is
> happening will result in an "unexpected client EOF" logged on the server
> side, but that's almost what you want in this case anyway .. concurrent
> close pretty much means "help, abort this!" ..
>
> -O
>

That's interesting. I think it's not quite in the spirit of the
protocol, but there may not be much more we can do, and it's certainly
a reasonable behavior. I'd like to spend some more time with the FEBE
spec tomorrow and then I'll submit a patch based on this approach.

Thanks,
--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: PGStream synchronization

From

Maciek Sakrejda

Date:

28 August 2009, 05:50:34

> How about, instead of using raw monitor synchronization to provide
> mutual exclusion on access to the stream, we use a lock object (i.e.
> something similar to java.util.concurrent.locks.Lock, though we can't
> use exactly that class before 1.5 obviously), try to grab the lock
> before close, and behave differently depending on if we succeeded or not.
>
> So the close logic can look something like this:
>
>  if (lock.tryLock()) {
>    // we have exclusive access to the connection
>    // do a normal shutdown
>    try {
>      sendTerminate();
>      stream.close();
>    } finally {
>      lock.unlock();
>    }
>  } else {
>    // something is concurrently using the connection
>    // just abruptly close the connection
>    stream.close();
>  }
>
> In the concurrent case, we don't send Terminate, but we also don't risk
> sending it at the wrong point in the stream.
>
> This means that a concurrent Connection.close() while something is
> happening will result in an "unexpected client EOF" logged on the server
> side, but that's almost what you want in this case anyway .. concurrent
> close pretty much means "help, abort this!" ..

I read through the FEBE spec (props to the documenters, by the way--it
was very clear and concise), and discussed this with a colleague, and
he suggested a hybird approach based on what you are doing here and
what I suggested initially.

Essentially, the 'if' part of your pseudocode would be the same, but
in the 'else', instead of just closing the connection, fall back to my
original approach (issue CancelRequest / CopyFail and then Terminate).
I think this is somewhat cleaner but only incurs the expense of a
CancelRequest if you really are doing something else on that
connection at that moment.


--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: PGStream synchronization

From

Maciek Sakrejda

Date:

02 September 2009, 19:52:52

So, for what it's worth, the hybrid approach somehow turned out to be
a massive, massive performance regression in our tests. We're still
looking into why. The only thing I've found that works consistently so
far is to *always* just close the pgStream socket (without sending
anything), which is only marginally better in theory than the current
situation (and probably actually worse in practice, since you're
unlikely to see concurrent use of the same connection, whereas having
a clean protocol shutdown is nice). I'll let the list know if we find
a better approach.

On Thu, Aug 27, 2009 at 10:50 PM, Maciek Sakrejda<msakrejda@truviso.com> wrote:
>> How about, instead of using raw monitor synchronization to provide
>> mutual exclusion on access to the stream, we use a lock object (i.e.
>> something similar to java.util.concurrent.locks.Lock, though we can't
>> use exactly that class before 1.5 obviously), try to grab the lock
>> before close, and behave differently depending on if we succeeded or not.

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com