Thread: Strange slowness
Hi all,
I'm investigating some occasional slowness in our API servers, and I've hit upon some strange behavior that I can't explain. I'm hoping someone on this list can help.
Stack is django 1.3.1, Ubuntu 10.04, python 2.6.5, psycopg 2.4.5, postgresql 9.0, pgbouncer 1.4.1. All of these run on the same machine with the exception of postgresql, which runs on a dedicated db machine. So, when django creates a connection to the database, it is connecting to localhost and really talking to pgbouncer.
Here's the relevant code from django/db/backends/postgresql_psycopg2/base.py:
self.connection = Database.connect(**conn_params)
self.connection.set_client_encoding('UTF8')
self.connection.set_isolation_level(self.isolation_level)
cursor = self.connection.cursor()
cursor.execute("SET TIME ZONE %s", [settings_dict['TIME_ZONE']])
I put timing code between these lines, and found that calling set_client_encoding() was sometimes taking upwards of 6 seconds. Very strange, but it led me to discover that our db encoding was mistakenly set to sql_ascii, so I've fixed that. Now I'm seeing occasional 6-second delays on the next line, set_isolation_level. We're using defaults, so this sets the level to 1, "read committed". Theoretically that's a no-op, though obviously the code still needs to check that the value passed in equals the value already set and then return. Still, that check shouldn't occasionally take upwards of 6 seconds. The machines are not swapping, and aren't doing anything else besides serving some static content.
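For reference, the timing I added was roughly along these lines. This is an illustrative sketch rather than the exact code, and the timed() helper and logger setup are made up for the example:

import time
import logging

logger = logging.getLogger(__name__)

def timed(label, fn, *args, **kwargs):
    # Call fn, log how long it took, and return its result. Diagnosis only.
    start = time.time()
    try:
        return fn(*args, **kwargs)
    finally:
        logger.warning("%s took %.3fs", label, time.time() - start)

# Wrapped versions of the backend lines quoted above:
# self.connection = timed("connect", Database.connect, **conn_params)
# timed("set_client_encoding", self.connection.set_client_encoding, 'UTF8')
# timed("set_isolation_level", self.connection.set_isolation_level, self.isolation_level)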
It seems very strange that after I fixed the db encoding, the timing problem would jump to the next line. Before I fixed the db encoding, we never saw long times spent in set_isolation_level. This leads me to believe the issue is something else, perhaps network related. For example, suppose Database.connect lazily created a connection: now that set_client_encoding can short-circuit because the encodings match, the first time the connection is actually used would be in set_isolation_level. This is just wild speculation, but I'm at a loss to explain what I'm seeing.
Anyone have any ideas about what could be going on, or where to look further? Thanks,
Russ
On Mon, Apr 30, 2012 at 10:29 PM, Russ Neufeld <russn@nasuni.com> wrote:
> Anyone have any ideas about what could be going on, or where to look
> further? Thanks,

From the description I'd look into how pgbouncer connects to the db: if it were a normal PG connection, the slowest of the commands would be connect() (and looking at Django's code, Database is just an alias for psycopg2, so it really is just that). For a test I would connect directly to the db instead of pgbouncer and see whether connect() becomes the slow part. You can also enable logging in the server to get the duration of each statement, which should show you that SET TIME ZONE, or whatever the first command is, is not the slow one (you could combine it with a couple of prints, which I bet will show you that the delay happens between the execute() call and the moment the server receives the command).

Could it be that pgbouncer is slow to connect because of some SSL issue? If you rule it out of your setup and still see the slowness at connect time, then that would be the area to look into.

-- Daniele
On 30/04/12 23:29, Russ Neufeld wrote:
> I'm investigating some occasional slowness in our API servers, and I've
> hit upon some strange behavior that I can't explain. I'm hoping someone
> on this list can help.

Hi Russ,

you may want to configure PostgreSQL to log connections and queries, with execution times. Then cross-check the PostgreSQL, Django and pgbouncer logs to understand exactly where those 6 seconds are spent. I bet on pgbouncer connecting to PostgreSQL (SSL?), but we don't have enough data to do a proper analysis here.

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
 Quis custodiet ipsos custodes? -- Juvenal, Satires, VI, 347
On May 1, 2012, at 5:54 AM, Federico Di Gregorio wrote:
> you may want to configure PostgreSQL to log connections and queries,
> with execution times. Then cross-check the PostgreSQL, Django and
> pgbouncer logs to understand exactly where those 6 seconds are spent.

Thanks Federico. I think you might be right - pgbouncer does not seem to be maintaining connections to postgresql for very long, and as a result, when a load spike comes in, I suspect there's a fair bit of time and contention in reestablishing them. I'm looking there now.
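To cross-check the logs the way you suggest, this is roughly what I'm planning to turn on in postgresql.conf. This is my reading of the docs, and the values are just a starting point for diagnosis, not a recommendation:

log_connections = on
log_disconnections = on
log_min_duration_statement = 0    # log every statement together with its duration
log_line_prefix = '%m [%p] %r '   # timestamp, backend pid, remote host:port

That should make it possible to line up the Django timestamps with what the server actually received.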
On Apr 30, 2012, at 5:46 PM, Daniele Varrazzo wrote:
> Could it be that pgbouncer is slow to connect because of some SSL
> issue? If you rule it out of your setup and still see the slowness at
> connect time, then that would be the area to look into.

Thanks Daniele. We're not using SSL, but I think you might be right to look at pgbouncer. It does not seem to be maintaining connections to postgresql for very long, and as a result, when a load spike comes in, I suspect there's a fair bit of time and contention in reestablishing them. I'm looking there now.
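To test your theory I'll time connect() against pgbouncer on localhost and against the db host directly and compare. Something like the following; the host names, ports and dsn details are placeholders for our real setup:

import time
import psycopg2

# Placeholders for our real connection details.
DSNS = {
    "via pgbouncer": "host=localhost port=6432 dbname=api user=api",
    "direct": "host=db.internal port=5432 dbname=api user=api",
}

for label, dsn in DSNS.items():
    start = time.time()
    conn = psycopg2.connect(dsn)
    connected = time.time()
    cur = conn.cursor()
    cur.execute("SELECT 1")  # first real round trip on this connection
    cur.fetchone()
    done = time.time()
    conn.close()
    print "%s: connect %.3fs, first query %.3fs" % (
        label, connected - start, done - connected)

On the pgbouncer side, if I'm reading the docs right, these are the settings that control how long server connections stick around. The values are just what I plan to experiment with, not recommendations:

[pgbouncer]
; how many server connections to keep per database/user pair
default_pool_size = 20
; seconds an idle server connection is kept before being closed
server_idle_timeout = 600
; seconds before a server connection is closed and recycled
server_lifetime = 3600

If pgbouncer really is churning server connections, raising server_idle_timeout so they survive between load spikes should show up right away in the connect() timings above.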