Thread: psycopg2 (async) socket timeout
Hello,

is it possible to specify the timeout for the socket underlying a connection? Alternatively, since I'm using the async interface anyway, is it possible to proactively cancel a query that is "stuck" because the TCP connection to the database is down?

So the specific case is:
- connect to the postgres database using psycopg2 while the network is up
- run some queries, get the results fine etc.
- send a query
- the network goes down before the result to this last query has been received
- neither a result nor an error callback gets called, as far as I can see (using txpostgres.ConnectionPool)

What's the proper way to deal with that?

Regards,
Danny
On Thu, Feb 3, 2011 at 8:04 PM, Danny Milosavljevic <danny.milo@gmail.com> wrote:
> Hello,
>
> is it possible to specify the timeout for the socket underlying a connection?

You can send a SET statement_timeout command to put a time limit on all further commands sent on that connection:
http://www.postgresql.org/docs/9.0/static/runtime-config-client.html#GUC-STATEMENT-TIMEOUT

> Alternatively, since I'm using the async interface anyway, is it
> possible to proactively cancel a query that is "stuck" since the TCP
> connection to the database is down?

Jan has contributed a connection.cancel() method, available since psycopg 2.3:
http://initd.org/psycopg/docs/connection.html#connection.cancel

> So the specific case is:
> - connect to the postgres database using psycopg2 while the network is up
> - run some queries, get the results fine etc.
> - send a query
> - the network goes down before the result to this last query has been received
> - neither a result nor an error callback gets called, as far as I can
>   see (using txpostgres.ConnectionPool)
>
> What's the proper way to deal with that?

Probably by cancelling the query from the client: if the network is down you wouldn't get the signal that the backend timed out. But I've never dealt with this scenario myself, so I guess you should test whether any of the above works for you :)

-- Daniele
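[Both suggestions in a minimal sketch, for reference. Blocking mode for brevity; the dsn and the timeout value are assumptions, not anything from the thread:

    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("host=localhost dbname=test")  # assumed dsn
    cur = conn.cursor()
    # Ask the server to abort any statement that runs longer than 5 seconds.
    cur.execute("SET statement_timeout = 5000")  # value in milliseconds
    try:
        cur.execute("SELECT pg_sleep(10)")
    except psycopg2.extensions.QueryCanceledError:
        pass  # the server cancelled the statement for us

    # Or cancel an in-flight query from the client side (psycopg 2.3+),
    # e.g. from another thread or from a timer:
    conn.cancel()
]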
----- Original message -----
> So the specific case is:
> - connect to the postgres database using psycopg2 while the network is up
> - run some queries, get the results fine etc.
> - send a query
> - the network goes down before the result to this last query has been received
> - neither a result nor an error callback gets called, as far as I can
>   see (using txpostgres.ConnectionPool)
>
> What's the proper way to deal with that?

I'll try to reproduce this problem. AIUI you should have the Deferred errback if the connection is lost, but perhaps it takes some time for Twisted to detect it (actually it takes time for the kernel to detect it). You might try playing with your TCP keepalive settings.

Another option is implementing a timeout with a callLater. The problem there is that it requires additional code, and txpostgres does not support query cancellation (yet, it's on the roadmap).

Cheers,
Jan
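[A bare-bones version of the callLater approach might look like this. An untested sketch: add_timeout and the 10-second value are made up here, and Deferred.cancel() requires Twisted 10.1+:

    from twisted.internet import reactor

    def add_timeout(d, timeout):
        # Cancel (errback) the Deferred unless it fires within `timeout` seconds.
        delayed = reactor.callLater(timeout, d.cancel)
        def stop_timer(result):
            if delayed.active():
                delayed.cancel()
            return result
        return d.addBoth(stop_timer)

    # usage, e.g. with a txpostgres connection:
    # add_timeout(conn.runQuery("SELECT 1"), 10)
]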
You could add connect_timeout to the dsn, e.g. dsn = "host='localhost' dbname='test' connect_timeout=60".
http://www.postgresql.org/docs/8.4/static/libpq-connect.html#AEN33199

I don't know if this will help with your situation, but it might.

Mark

----- Original Message -----
From: "Danny Milosavljevic" <danny.milo@gmail.com>
To: psycopg@postgresql.org
Sent: Thursday, February 3, 2011 2:04:39 PM
Subject: [psycopg] psycopg2 (async) socket timeout

Hello,

is it possible to specify the timeout for the socket underlying a connection? Alternatively, since I'm using the async interface anyway, is it possible to proactively cancel a query that is "stuck" because the TCP connection to the database is down?

So the specific case is:
- connect to the postgres database using psycopg2 while the network is up
- run some queries, get the results fine etc.
- send a query
- the network goes down before the result to this last query has been received
- neither a result nor an error callback gets called, as far as I can see (using txpostgres.ConnectionPool)

What's the proper way to deal with that?

Regards,
Danny
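[Note that connect_timeout only bounds how long connection establishment may take; it does not limit how long an already-established connection waits for a query result, so it would not have helped with the stuck query above. For completeness, the psycopg2 form of the suggestion (dsn values as in Mark's example):

    import psycopg2

    # connect() raises OperationalError if the server cannot be reached in time
    conn = psycopg2.connect("host='localhost' dbname='test' connect_timeout=60")
]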
Hi,

2011/2/9 Jan Urbański <wulczer@wulczer.org>:
> I'll try to reproduce this problem. AIUI you should have the Deferred
> errback if the connection is lost, but perhaps it takes some time for
> Twisted to detect it (actually it takes time for the kernel to detect
> it). You might try playing with your TCP keepalive settings.

I'm trying. No luck so far...

http://twistedmatrix.com/trac/wiki/FrequentlyAskedQuestions says "If you rely on TCP timeouts, expect as much as two hours (the precise amount is platform specific) to pass between when the disruption occurs and when connectionLost is called". Oops.

Hmm, even when I connect, then just take the network interface down, and only after that call runQuery, it also never calls anything back (well, I didn't wait more than half an hour per try so far).

But good point, although does this even work for async sockets? - where you are not reading actively, that is, nobody knows you want to receive any data? If that worked, that would be the nicest fix. For the not-so-nice fix, read on :-)

I've now started to do it the way Daniele and you suggested ("just close it from the client"), so I modified the Connection to start a timer which will fire if I don't defuse it early enough (and modified ConnectionPool to check connections periodically and reconnect). After I receive a response, I defuse the timer. If not, the timer callback will be run. It will call the errback - which will call connection.close().

As far as noticing the "disconnect" (well, potential disconnect) goes, this works perfectly. However, doing a connection.close() then doesn't seem to help much; still investigating why... I'm getting the following:

  File "/usr/lib/python2.6/site-packages/twisted/internet/selectreactor.py",
    line 104, in doSelect
    [], timeout)
  exceptions.ValueError: file descriptor cannot be a negative integer (-1)

So it seems the FD of the closed connection to postgres is still in the Twisted reactor? Seems I am missing some calls to self.reactor.removeReader or -Writer, maybe. Do those belong in Connection.close()?

If I try to reconnect periodically, can I use the same txpostgres Connection instance and just call connect() again?

> Another option is implementing a timeout with a callLater. The problem
> there is that it requires additional code

Yeah, did that now, see the end of this post for the code...

Since I'm trying to make the pg client more resilient against our flaky network, what I want to do is just close the socket to the server on timeout. What I don't want to do is send a cancellation request over the broken socket, telling the server that the connection is broken and would it please cancel the query :-) (I hope the PostgreSQL server will notice soon enough when the client doesn't answer.)

> and txpostgres does not support query cancellation (yet, it's on the roadmap).

Yeah, but when you say "cancellation", do you mean "sending a - cancellation - request to the server via the non-working connection"? :)

Well, I'll be reading a bit more of the twisted reactor code now, I guess :-)

Cheers,
Danny

The file "postgresto.py", which is all the extra code needed for client-side impatience, follows. For "Connection" it's mostly your code with a few minimal changes: some connectTimeouter() calls sprinkled in. For "ConnectionPool" there's a periodically-checking health checker and reconnecter too now - please note that this is work in progress and in no way stable yet:

#!/usr/bin/env python2
# Postgres with timeouts.
import sys
from txpostgres import txpostgres
from twisted.internet import interfaces, reactor, defer
from twisted.python import log


class Connection(txpostgres.Connection):

    def connectTimeouter(self, d, timeout):
        """ connect a timeouter to a deferred """
        delayedCall = reactor.callLater(timeout, self.handleTimeout, d)
        d.addBoth(self.cancelTimeout, delayedCall)
        return d

    def _runQuery(self, *args, **kwargs):
        c = self.cursor()
        timeout = kwargs.get("timeout") or 10
        d = c.execute(*args, **kwargs)
        return self.connectTimeouter(d, timeout).addCallback(lambda c: c.fetchall())

    def _runOperation(self, *args, **kwargs):
        c = self.cursor()
        timeout = kwargs.get("timeout") or 10
        d = c.execute(*args, **kwargs)
        return self.connectTimeouter(d, timeout).addCallback(lambda _: None)

    def _runInteraction(self, interaction, *args, **kwargs):
        c = self.cursor()
        timeout = kwargs.get("timeout") or 10
        d = c.execute("begin")
        # we assume that the interaction does something on the database here, so
        # if the interaction times out, take it as a database timeout!
        self.connectTimeouter(d, timeout).addCallback(
            lambda arg: self.connectTimeouter(
                defer.maybeDeferred(interaction, arg, *args, **kwargs), timeout))
        # FIXME also timeout the interaction itself.

        def commitAndPassthrough(ret, cursor):
            e = cursor.execute("commit")
            return e.addCallback(lambda _: ret)

        def rollbackAndPassthrough(f, cursor):
            # maybeDeferred in case cursor.execute raises a synchronous exception
            e = defer.maybeDeferred(cursor.execute, "rollback")

            def just_panic(rf):
                log.err(rf)
                return defer.fail(txpostgres.RollbackFailed(self, f))

            # if rollback failed, panic
            e.addErrback(just_panic)
            # reraise the original failure afterwards
            return e.addCallback(lambda _: f)

        d.addCallback(commitAndPassthrough, c)
        d.addErrback(rollbackAndPassthrough, c)
        return d

    def handleTimeout(self, d):
        """ handles the timeout since we DID time out """
        log.err("timed out")
        self.close()  # close the connection (maybe it was a connection problem...)

    def cancelTimeout(self, arg, delayedCall):
        """ cancels the timeout since we DID NOT time out """
        #print >>sys.stderr, "not timed out, OK"
        if delayedCall.active():
            delayedCall.cancel()
        return arg

    def isConnected(self):
        return self.pollable() is not None


class ConnectionPool(txpostgres.ConnectionPool):

    connectionFactory = Connection

    def __init__(self, *args, **kwargs):
        txpostgres.ConnectionPool.__init__(self, *args, **kwargs)
        self.reconnectionInterval = 10  # sec
        reactor.callLater(self.reconnectionInterval, self.reconnectIfNeeded)
        self.connectionAttempts = set()

    def reconnectIfNeeded(self):
        for connection in self.connections:
            if not connection.isConnected() and connection not in self.connectionAttempts:
                # TODO don't try that too often...
log.msg("database connection was lost, trying again") self.connectionAttempts.add(connection) d = connection.connect(*self.connargs, **self.connkw) delayedCall = reactor.callLater(self.reconnectionInterval, self.handleTimeout, d, connection) d.addBoth(self.cancelTimeout, delayedCall, connection) reactor.callLater(self.reconnectionInterval, self.reconnectIfNeeded) def handleTimeout(self, d, connection): """ handles the timeout since we DID time out """ log.err("reconnect timed out") try: self.connectionAttempts.remove(connection) except KeyError: pass d.errback() def cancelTimeout(self, arg, delayedCall, connection): """ cancels the timeout since we DID NOT time out """ print >>sys.stderr, "reconnect DID NOT time out, OK" # does not neccessarily mean that it worked. if delayedCall.active(): delayedCall.cancel() try: self.connectionAttempts.remove(connection) except KeyError: pass return(arg)
On 14/02/11 19:59, Danny Milosavljevic wrote:
> Hi,
>
> 2011/2/9 Jan Urbański <wulczer@wulczer.org>:
>> I'll try to reproduce this problem. AIUI you should have the Deferred
>> errback if the connection is lost, but perhaps it takes some time for
>> Twisted to detect it (actually it takes time for the kernel to detect
>> it). You might try playing with your TCP keepalive settings.
>
> I'm trying. No luck so far...
>
> http://twistedmatrix.com/trac/wiki/FrequentlyAskedQuestions says "If
> you rely on TCP timeouts, expect as much as two hours (the precise
> amount is platform specific) to pass between when the disruption
> occurs and when connectionLost is called". Oops.

Yup, default settings for TCP keepalives are quite high...

> Hmm, even when I connect, then just take the network interface down,
> and only after that call runQuery, it also never calls anything back
> (well, I didn't wait more than half an hour per try so far).
>
> But good point, although does this even work for async sockets? -
> where you are not reading actively, that is, nobody knows you want to
> receive any data? If that worked, that would be the nicest fix. For
> the not-so-nice fix, read on :-)

AFAIK if you're connected through TCP and waiting for data from the other side, and the other side decides to never send you anything (for instance because it died and did not even send you a RST packet), you have no way of detecting that, short of trying to send something every now and then and assuming the connection is down if there's no response. So you actually *need* a heartbeat solution to be able to detect the network dying...

I think the best idea would be starting a timer every time you start a query and cancelling it when it finishes, and (important) setting the timeout of that timer only a little bit higher than the query timeout setting on the server. This way, if your code times out, the server won't keep on running your query.

> I've now started to do it the way Daniele and you suggested ("just
> close it from the client"), so I modified the Connection to start a
> timer which will fire if I don't defuse it early enough (and modified
> ConnectionPool to check connections periodically and reconnect).

Well, something like that ;) I'd try doing it on the per-query level, actually. Since you can't have more than one outstanding query, your keepalive won't be sent until the current query finishes.

Actually, libpq recently got a feature called PQping that just checks the state of the connection. So you can have timeouts on your queries and periodic PQpings when you're not running anything. Reminds me: psycopg2 needs to support PQping, but that should be easy.

> After I receive a response, I defuse the timer. If not, the timer
> callback will be run. It will call the errback - which will call
> connection.close().
>
> As far as noticing the "disconnect" (well, potential disconnect) goes,
> this works perfectly. However, doing a connection.close() then doesn't
> seem to help much; still investigating why... I'm getting the following:
>
>   File "/usr/lib/python2.6/site-packages/twisted/internet/selectreactor.py",
>     line 104, in doSelect
>     [], timeout)
>   exceptions.ValueError: file descriptor cannot be a negative integer (-1)
>
> So it seems the FD of the closed connection to postgres is still in
> the Twisted reactor? Seems I am missing some calls to
> self.reactor.removeReader or -Writer, maybe. Do those belong in
> Connection.close()?
Ha, it always comes back to the ticket I filed when writing txpostgres:
http://twistedmatrix.com/trac/ticket/4539

Believe it or not, this problem seems to also prevent a proper LISTEN/NOTIFY implementation...

> If I try to reconnect periodically, can I use the same txpostgres
> Connection instance and just call connect() again?

I think you can, although recreating the Connection object should not be a problem either.

Jan
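[Jan's "client timer slightly above the server timeout" idea, as a sketch on top of the add_timeout helper sketched earlier in the thread; the concrete numbers are assumptions:

    # Server gives up at 10s, the client timer fires at 11s, so a client-side
    # timeout almost always means the server has already cancelled the query too.
    SERVER_TIMEOUT_MS = 10000
    CLIENT_TIMEOUT = 11.0

    d = conn.runOperation("SET statement_timeout = %d" % SERVER_TIMEOUT_MS)
    d.addCallback(lambda _: add_timeout(conn.runQuery("SELECT ..."), CLIENT_TIMEOUT))
]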
On Mon, Feb 14, 2011 at 7:16 PM, Jan Urbański <wulczer@wulczer.org> wrote:
> Actually, libpq recently got a feature called PQping that just checks
> the state of the connection. So you can have timeouts on your queries
> and periodic PQpings when you're not running anything. Reminds me:
> psycopg2 needs to support PQping, but that should be easy.

That's tasty. It's too early now to include something only available in PG 9.1, but a psycopg2.ping(dsn_or_params) would actually be useful.

-- Daniele
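[Until something like that exists, such a helper can be approximated in pure Python. This is only the shape Daniele sketches, not a real psycopg2 API; note it opens a new connection each time, just as PQping does:

    import psycopg2

    def ping(dsn):
        # Can we establish a connection and run a trivial query?
        try:
            conn = psycopg2.connect(dsn)
            try:
                cur = conn.cursor()
                cur.execute("SELECT 1")
                cur.fetchone()
            finally:
                conn.close()
            return True
        except psycopg2.Error:
            return False
]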
On Thu, Feb 3, 2011 at 10:04 PM, Danny Milosavljevic <danny.milo@gmail.com> wrote:
> is it possible to specify the timeout for the socket underlying a connection?
>
> Alternatively, since I'm using the async interface anyway, is it
> possible to proactively cancel a query that is "stuck" since the TCP
> connection to the database is down?
>
> So the specific case is:
> - connect to the postgres database using psycopg2 while the network is up
> - run some queries, get the results fine etc.
> - send a query
> - the network goes down before the result to this last query has been received
> - neither a result nor an error callback gets called, as far as I can
>   see (using txpostgres.ConnectionPool)
>
> What's the proper way to deal with that?

TCP keepalive. By default the timeouts are quite high, but they are tunable.

libpq supports keepalive tuning since 9.0; on older libpq you can do it yourself:

https://github.com/markokr/skytools/blob/master/python/skytools/psycopgwrapper.py#L153

-- marko
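[The do-it-yourself variant boils down to setting socket options on the connection's file descriptor, along these lines. A sketch, not the actual skytools code; the constants are Linux-specific and the values are examples:

    import socket
    import psycopg2

    conn = psycopg2.connect("host=db dbname=test")  # assumed dsn
    # fromfd() dups the descriptor, but both descriptors refer to the same
    # underlying socket, so the options apply to the libpq connection too.
    s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 240)   # idle seconds before probing
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 15)   # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 4)      # failed probes before giving up
]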
On 15/02/11 06:39, Marko Kreen wrote:
> On Thu, Feb 3, 2011 at 10:04 PM, Danny Milosavljevic
> <danny.milo@gmail.com> wrote:
>> is it possible to specify the timeout for the socket underlying a connection?
>>
>> Alternatively, since I'm using the async interface anyway, is it
>> possible to proactively cancel a query that is "stuck" since the TCP
>> connection to the database is down?
>>
>> [...]
>>
>> What's the proper way to deal with that?
>
> TCP keepalive. By default the timeouts are quite high,
> but they are tunable.
>
> libpq supports keepalive tuning since 9.0; on older libpq
> you can do it yourself:
>
> https://github.com/markokr/skytools/blob/master/python/skytools/psycopgwrapper.py#L153

After doing lots of tests, it seems that keepalives are not the full solution. They're useful if you want to detect the connection breaking while it's idle, but they don't help in the case of:

* the connection is idle
* the app sends a keepalive, receives a response
* before the next keepalive is sent, you want to do a query
* the connection breaks silently
* you try sending the query
* libpq tries to write the query to the connection socket, does not receive TCP confirmation
* the kernel starts retransmitting the data, using TCP's RTO algorithm
* you don't get notified about the failure until TCP gives up, which might be a long time

So it seems to me that you need an application-level timeout as well. I'm thinking about supporting it in txpostgres, but will have to think about exactly how to do it and what the interface would be.

Alternatively, you can lower the kernel TCP retry parameters (net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2), which will make TCP give up earlier. Unfortunately it seems that you can only set them globally at the kernel level and not per connection, which IMHO is a bit too scary. What bothers me is that the keepalive mechanism does not come into play while you're doing TCP retries, but that's apparently how TCP works (at least on Linux...).

If you want to detect the connection failing as soon as possible, and not just the next time you try to make a query, you need to make queries regularly, IOW have a heartbeat. But all the things I wrote before still apply, and without an app-level timeout or lowering the TCP retry parameters it might take a lot of time to detect that the heartbeat failed.

Cheers,
Jan
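[For what it's worth, very recent Linux kernels (2.6.37+) do have a per-socket knob for exactly this case: TCP_USER_TIMEOUT bounds how long written data may stay unacknowledged before the connection is forced down, covering the retransmission window that keepalives miss. A sketch; the constant is not in the Python socket module, hence the literal, and the 30-second value is an example:

    import socket
    import psycopg2

    TCP_USER_TIMEOUT = 18  # Linux-only socket option, kernel >= 2.6.37

    conn = psycopg2.connect("host=db dbname=test")  # assumed dsn
    s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
    # give up on unacknowledged data after 30 seconds (value in milliseconds)
    s.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, 30 * 1000)
]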
On Tue, Feb 15, 2011 at 3:32 PM, Jan Urbański <wulczer@wulczer.org> wrote:
> On 15/02/11 06:39, Marko Kreen wrote:
>> On Thu, Feb 3, 2011 at 10:04 PM, Danny Milosavljevic
>> <danny.milo@gmail.com> wrote:
>>> is it possible to specify the timeout for the socket underlying a connection?
>>>
>>> [...]
>>>
>>> What's the proper way to deal with that?
>>
>> TCP keepalive. By default the timeouts are quite high,
>> but they are tunable.
>>
>> libpq supports keepalive tuning since 9.0; on older libpq
>> you can do it yourself:
>>
>> https://github.com/markokr/skytools/blob/master/python/skytools/psycopgwrapper.py#L153

Keepalive will help to detect if the TCP connection is down; it will not help if the connection is up but the server app is unresponsive.

> After doing lots of tests, it seems that keepalives are not the full
> solution. They're useful if you want to detect the connection breaking
> while it's idle, but they don't help in the case of:
>
> * the app sends a keepalive, receives a response

Sort of true, except Postgres does not have app-level keepalive (except SELECT 1). The PQping mentioned earlier creates a new connection.

> * the connection is idle
> * before the next keepalive is sent, you want to do a query
> * the connection breaks silently
> * you try sending the query
> * libpq tries to write the query to the connection socket, does not
>   receive TCP confirmation

The TCP keepalive should help in those cases; perhaps you are doing something wrong if you are not seeing the effect.

> * the kernel starts retransmitting the data, using TCP's RTO algorithm
> * you don't get notified about the failure until TCP gives up, which
>   might be a long time

I'm not familiar with RTO, so cannot comment. Why would it stop keepalive from working?

> So it seems to me that you need an application-level timeout as well. I'm
> thinking about supporting it in txpostgres, but will have to think
> about exactly how to do it and what the interface would be.
>
> Alternatively, you can lower the kernel TCP retry parameters
> (net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2), which will make TCP
> give up earlier. Unfortunately it seems that you can only set them
> globally at the kernel level and not per connection, which IMHO is a bit
> too scary. What bothers me is that the keepalive mechanism does not
> come into play while you're doing TCP retries, but that's apparently how
> TCP works (at least on Linux...).
>
> If you want to detect the connection failing as soon as possible, and
> not just the next time you try to make a query, you need to make
> queries regularly, IOW have a heartbeat. But all the things I wrote before
> still apply, and without an app-level timeout or lowering the TCP retry
> parameters it might take a lot of time to detect that the heartbeat failed.

The need for a periodic query is exactly the thing that keepalive should fix. OTOH, if you have connections that are idle for a long time, you could simply drop them.
We have the (4m idle + 4x15sec ping) parameters as default and they work fine - a dead connection is killed after 5m.

-- marko
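[In dsn terms (libpq 9.0+), Marko's defaults correspond to something like the following; host and dbname are assumptions:

    import psycopg2

    conn = psycopg2.connect(
        "host=db dbname=test "                       # assumed host/dbname
        "keepalives=1 keepalives_idle=240 "          # probe after 4 minutes idle
        "keepalives_interval=15 keepalives_count=4"  # then 4 probes, 15s apart
    )
]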
On 15/02/11 21:55, Marko Kreen wrote:
> On Tue, Feb 15, 2011 at 3:32 PM, Jan Urbański <wulczer@wulczer.org> wrote:
>> * the app sends a keepalive, receives a response
>
> Sort of true, except Postgres does not have app-level
> keepalive (except SELECT 1). The PQping mentioned
> earlier creates a new connection.

By this I meant that an app is connected using libpq with keepalives enabled.

>> * the connection is idle
>> * before the next keepalive is sent, you want to do a query
>> * the connection breaks silently
>> * you try sending the query
>> * libpq tries to write the query to the connection socket, does not
>>   receive TCP confirmation
>
> The TCP keepalive should help in those cases; perhaps
> you are doing something wrong if you are not seeing the effect.

Well, for me it doesn't help. I'm not sure if it's my fault or the kernel's, or if it's just how TCP ought to work.

>> * the kernel starts retransmitting the data, using TCP's RTO algorithm
>> * you don't get notified about the failure until TCP gives up, which
>>   might be a long time
>
> I'm not familiar with RTO, so cannot comment.
>
> Why would it stop keepalive from working?

Looking at the traffic in Wireshark I'm seeing TCP retransmissions and no keepalive traffic.

> The need for a periodic query is exactly the thing that keepalive
> should fix. OTOH, if you have connections that are idle for a long time,
> you could simply drop them.
>
> We have the (4m idle + 4x15sec ping) parameters as
> default and they work fine - a dead connection is killed
> after 5m.

Hm, so my test is like this:

* I connect with psycopg2, enabling keepalives in the connection string, using "keepalives_idle=4 keepalives_interval=1 keepalives_count=1"
* the test program sends a "select pg_sleep(6)" and then sleeps itself for 6 seconds, and does that in a loop
* each time, after the query is sent and 4 seconds elapse, I'm seeing TCP keepalive packets going to the server and the server responding
* each time, after the program sleeps for 4 seconds, a keepalive is sent

To simulate a connectivity loss I'm adding two rules to my firewall that block (using the iptables DROP target) communication from or to port 5432. Now there are three scenarios:

1. if I block the connection right after the test program goes to sleep, the response to the keepalive is not received and a connectivity loss is detected. The app sends a RST packet (that obviously does not reach the server) and, when it wakes up and tries to send the query, psycopg complains about a broken connection. Important: the backend stays alive and PG shows the connection as "IDLE in transaction".

2. if I block the connection after the test program has already sent the keepalive, but before it sends the query, it actually goes ahead and tries to send the query, and then blocks because the kernel is retrying the TCP delivery of the query. Keepalives are *not* sent, and the process of TCP giving up can take quite some time (depends on the settings for the TCP timeout). The connection stays alive on the server anyway.

3. if I block the connection while it's waiting for the query to complete, a keepalive is sent, the connection is detected to be broken, the execute statement fails with a SystemError: null argument to internal routine (sic!), and the connection stays alive on the server.
I'm not really sure what the deal is with the SystemError, but my conclusions are:

* while a TCP retry is in action, it disables keepalives
* the backend stays alive on the server side anyway

It's possible that TCP retries take a few minutes and I'm simply not patient enough (of course I'm not using a keepalive interval of 1 second in production). So if all you want is to detect a broken connection within a couple of minutes of the moment it happened, you can have client-side keepalives tuned as Marko does it, and check that your TCP stack gives up a delivery attempt in less than a few minutes.

On the other hand, you probably *should* also use server-side keepalives, so the server can detect a broken connection and kill the backend, otherwise you will end up with lots of "IDLE in transaction" backends, which is Very Bad (can block autovacuum, still holds transaction locks etc).

I'm going to do some more tests to see the default timeout for TCP delivery, and if it's really in the range of 5 minutes, I'll be happy. I still don't have any clue what's up with the SystemError I'm getting; I might take a look if I find the time.

Attached is the test script and the command I use to simulate a network outage.

And last but not least, txpostgres does not play that well with client-side keepalives, because while the connection is idle it's not watching its file descriptor, so error conditions on that descriptor will be detected only when you go and do a query. That is something I might fix in the future.

Gah, looking at all that TCP stuff always makes my head spin.

Cheers,
Jan
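[The attachment did not survive in the archive; a reconstruction from the description above would look roughly like this. Host and dbname are assumptions; the keepalive values and the pg_sleep/sleep loop are as described:

    import time
    import psycopg2

    conn = psycopg2.connect("host=db dbname=test "
                            "keepalives=1 keepalives_idle=4 "
                            "keepalives_interval=1 keepalives_count=1")
    cur = conn.cursor()
    while True:
        cur.execute("select pg_sleep(6)")
        time.sleep(6)

    # One possible form of the firewall rules described above:
    #   iptables -A OUTPUT -p tcp --dport 5432 -j DROP
    #   iptables -A INPUT  -p tcp --sport 5432 -j DROP

Server-side keepalives, for comparison, are the tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count settings in postgresql.conf.]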
Hi,

> Well, something like that ;) I'd try doing it on the per-query level,
> actually. Since you can't have more than one outstanding query, your
> keepalive won't be sent until the current query finishes.

Hm, what would be the difference from how I did it? It's per query, isn't it?

I kinda know what you mean... PostgreSQL thinks that connection=transaction, and so you have only one outstanding operation on the connection(=transaction), otherwise it wouldn't be able to serialize. But how would the implementation in the txpostgres Connection code change? Would it?

> psycopg2 needs to support PQping, but that should be easy.

>> However, doing a connection.close() then doesn't seem to help much,

After adding debugging messages to the reactor, I found out why: both the connection and the cursor are still in the select() list of the reactor.

So my current attempt is the following, and this works (so far) - no exceptions raised by the reactor anymore... (note that I pass the cursor instance too, now)

    def handleTimeout(self, d, cursor):
        """ handles the timeout since we DID time out """
        log.err("timed out")
        #cursor.close()  # cannot be used while an async query is in progress :-(
        self.reactor.removeReader(cursor)
        self.reactor.removeWriter(cursor)
        self.close()  # close the connection (maybe it was a conn problem)
        if not d.called:
            d.errback(failure.Failure(RuntimeError()))

    def close(self):
        self.reactor.removeReader(self)
        self.reactor.removeWriter(self)
        return txpostgres.Connection.close(self)

(This needs "from twisted.python import failure" in addition to the imports from the earlier post.)

I don't really understand why I need to remove the cursor from the reactor as well. Shouldn't the cursor and the connection use the same socket? Or does it mean that there are two different Python objects in the reactor which are actually using the same socket, but the reactor doesn't know that?

I feel that I'm so close to it, but unfortunately the next time I try to create a new cursor, it will nag about an async operation still being in progress... I can't call cursor.close(), since it is guarded by EXC_IF_ASYNC_IN_PROGRESS in psycopg2 - and so cannot be called while my query is (technically) still in progress. There's a PQcancel thing on the connection (not the cursor!) which cancels the query by sending a request to the server - that's not exactly what I want, though. I planned to set statement_timeout so the server would automatically abandon the query after a timeout, and to set my client timer to the same timeout - so both would do the right thing without explicit synchronization.

> Ha, it always comes back to the ticket I filed when writing txpostgres:
> http://twistedmatrix.com/trac/ticket/4539

Interesting... however, in my simple case, it seems to be fixed by calling both removeReader and removeWriter to remove both the cursor instance and the connection instance from the watchlist.

> Believe it or not, this problem seems to also prevent a proper
> LISTEN/NOTIFY implementation...

If it helps: in my case I changed the functions in /usr/lib/python2.6/site-packages/twisted/internet/selectreactor.py (addReader, addWriter, removeReader, removeWriter) to print what's going on, and found out about the cursor that way.

What the ValueError means is that some of the file-like objects were closed but are still in the reactor - so the reactor doesn't know what it's supposed to do with them: when is a closed FD ready? So one only has to make sure to remove the file-like object early enough, before the reactor gets control again.

Danny