Thread: Problems with inconsistent query performance.
List,

I posted a little about this a while back to the general list, but never really got anywhere with it, so I'll try again, this time with a little more detail, and hopefully someone can send me in the right direction.

Here is the problem: I have a procedure that is called 100k times a day. Most of the time it's screaming fast, but other times it takes a few seconds. When it does lag, my machine can get behind, which causes other problems, so I'm trying to figure out why there is such a huge delta in performance with this proc.

The proc is pretty large (due to the number of vars) so I will summarize it here:

==========================================================================
CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
BEGIN
  INSERT into tmpaccounting_tab ( ... ) values ( ... );

  IF _acctType = 'start' THEN
    BEGIN
      INSERT into radutmp_tab ( ... ) values ( ... );
    EXCEPTION WHEN UNIQUE_VIOLATION THEN
      NULL;
    END;
  ELSIF _acctType = 'stop' THEN
    UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName = _userName;
    IF (NOT FOUND) THEN
      INSERT into radutmp_tab ( ... ) values ( ... );
    END IF;
  END IF;
END;
$$
LANGUAGE plpgsql;
==========================================================================

So in a nutshell: if I get an accounting record, put it in the tmpaccounting_tab and then insert into or update the radutmp_tab based on what kind of record it is. If for some reason the message is a start message and a duplicate, drop it; if the message is a stop message and we don't have the start, insert it.

The tmpaccounting_tab table doesn't have any indexes and gets flushed to the accounting_tab table nightly, so it should have very good insert performance, as the table is kept small (compared to accounting_tab) and doesn't have index overhead. The radutmp_tab is also kept small, as completed sessions are flushed to another table nightly, but I do keep an index on sessionId and userName so the update isn't slow.

Now that you have the layout, the problem: I log whenever a query takes more than 250ms and have logged this query:

duration: 3549.229 ms  statement: select acctMessage( 'stop', 'username', 'address', 'time', 'session', 'port', 'address', 'bytes', 'bytes', 0, 0, 1, 'reason', '', '', '', 'proto', 'service', 'info')

But when I go do an explain analyze it is very fast:

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.03 rows=1 width=0) (actual time=6.812..6.813 rows=1 loops=1)
 Total runtime: 6.888 ms

So the question is: why, on a relatively simple proc, am I getting a query performance delta between 3549ms and 7ms?

Here are some values from my postgresql.conf to look at:

shared_buffers = 60000              # min 16 or max_connections*2, 8KB each
temp_buffers = 5000                 # min 100, 8KB each
#max_prepared_transactions = 5      # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 131072                   # min 64, size in KB
maintenance_work_mem = 262144       # min 1024, size in KB
max_stack_depth = 2048              # min 100, size in KB
effective_cache_size = 65536        # typically 8KB each

Thanks for any help you can give,
schu
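For readers following the thread: the column lists above are elided, so here is a self-contained sketch of the same structure using entirely hypothetical table and column names. This is not the poster's actual schema, and the unique index is an assumption (something like it must exist for the UNIQUE_VIOLATION branch to fire):

==========================================================================
-- Hypothetical schema and function illustrating the structure above;
-- the real tables have many more accounting columns.
CREATE TABLE tmpaccounting_tab (
    accttype   text,
    sessionid  text,
    username   text,
    accttime   timestamptz
);

CREATE TABLE radutmp_tab (
    sessionid  text,
    username   text,
    starttime  timestamptz,
    stoptime   timestamptz
);

-- Assumed unique index so a duplicate start raises UNIQUE_VIOLATION.
CREATE UNIQUE INDEX radutmp_session_user_idx ON radutmp_tab (sessionid, username);

CREATE FUNCTION acctmessage(_accttype text, _sessionid text,
                            _username text, _accttime timestamptz)
RETURNS void AS $$
BEGIN
    INSERT INTO tmpaccounting_tab (accttype, sessionid, username, accttime)
        VALUES (_accttype, _sessionid, _username, _accttime);

    IF _accttype = 'start' THEN
        BEGIN
            INSERT INTO radutmp_tab (sessionid, username, starttime)
                VALUES (_sessionid, _username, _accttime);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- duplicate start record: drop it
        END;
    ELSIF _accttype = 'stop' THEN
        UPDATE radutmp_tab SET stoptime = _accttime
         WHERE sessionid = _sessionid AND username = _username;
        IF NOT FOUND THEN
            -- stop without a matching start: record it anyway
            INSERT INTO radutmp_tab (sessionid, username, stoptime)
                VALUES (_sessionid, _username, _accttime);
        END IF;
    END IF;
END;
$$ LANGUAGE plpgsql;
==========================================================================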
Periodically taking longer is probably a case of some other process in the database holding a lock you need, or otherwise bogging the system down, especially if you're always running acctmessage from the same connection (because the query plans shouldn't be changing then). I'd suggest looking at what else is happening at the same time.

Also, it's more efficient to operate on chunks of data rather than one row at a time whenever possible. If you have to log each row individually, consider simply logging them into a table, and then periodically pulling data out of that table to do additional processing on it.

BTW, your detection of duplicates/row existence has a race condition. Take a look at example 36-1 at
http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
for a better way to handle it.

On Wed, Sep 27, 2006 at 10:37:22AM -0800, Matthew Schumacher wrote:
> List,
>
> I posted a little about this a while back to the general list, but never
> really got anywhere with it, so I'll try again, this time with a little
> more detail, and hopefully someone can send me in the right direction.
>
> Here is the problem: I have a procedure that is called 100k times a day.
> Most of the time it's screaming fast, but other times it takes a few
> seconds. When it does lag, my machine can get behind, which causes other
> problems, so I'm trying to figure out why there is such a huge delta in
> performance with this proc.
>
> The proc is pretty large (due to the number of vars) so I will summarize
> it here:
>
> ==========================================================================
> CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
> BEGIN
>   INSERT into tmpaccounting_tab ( ... ) values ( ... );
>
>   IF _acctType = 'start' THEN
>     BEGIN
>       INSERT into radutmp_tab ( ... ) values ( ... );
>     EXCEPTION WHEN UNIQUE_VIOLATION THEN
>       NULL;
>     END;
>   ELSIF _acctType = 'stop' THEN
>     UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName = _userName;
>     IF (NOT FOUND) THEN
>       INSERT into radutmp_tab ( ... ) values ( ... );
>     END IF;
>   END IF;
> END;
> $$
> LANGUAGE plpgsql;
> ==========================================================================
>
> So in a nutshell: if I get an accounting record, put it in the
> tmpaccounting_tab and then insert into or update the radutmp_tab based on
> what kind of record it is. If for some reason the message is a start
> message and a duplicate, drop it; if the message is a stop message and we
> don't have the start, insert it.
>
> The tmpaccounting_tab table doesn't have any indexes and gets flushed to
> the accounting_tab table nightly, so it should have very good insert
> performance, as the table is kept small (compared to accounting_tab) and
> doesn't have index overhead. The radutmp_tab is also kept small, as
> completed sessions are flushed to another table nightly, but I do keep
> an index on sessionId and userName so the update isn't slow.
>
> Now that you have the layout, the problem: I log whenever a query takes
> more than 250ms and have logged this query:
>
> duration: 3549.229 ms  statement: select acctMessage( 'stop',
> 'username', 'address', 'time', 'session', 'port', 'address', 'bytes',
> 'bytes', 0, 0, 1, 'reason', '', '', '', 'proto', 'service', 'info')
>
> But when I go do an explain analyze it is very fast:
>
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.03 rows=1 width=0) (actual time=6.812..6.813
>  rows=1 loops=1)
>  Total runtime: 6.888 ms
>
> So the question is: why, on a relatively simple proc, am I getting a
> query performance delta between 3549ms and 7ms?
>
> Here are some values from my postgresql.conf to look at:
>
> shared_buffers = 60000              # min 16 or max_connections*2, 8KB each
> temp_buffers = 5000                 # min 100, 8KB each
> #max_prepared_transactions = 5      # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 131072                   # min 64, size in KB
> maintenance_work_mem = 262144       # min 1024, size in KB
> max_stack_depth = 2048              # min 100, size in KB
> effective_cache_size = 65536        # typically 8KB each
>
> Thanks for any help you can give,
> schu

--
Jim Nasby                                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
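For reference, the pattern Jim points to (example 36-1 in the 8.1 docs, the merge_db example) wraps the UPDATE and INSERT in a retry loop and treats unique_violation as "someone else inserted first, go around and try the UPDATE again". A minimal sketch of that pattern adapted to the names used in this thread (columns are hypothetical, and a unique index on (sessionId, userName) is assumed):

==========================================================================
CREATE FUNCTION radutmp_merge(_sessionid text, _username text,
                              _stoptime timestamptz)
RETURNS void AS $$
BEGIN
    LOOP
        -- First try to update an existing session row.
        UPDATE radutmp_tab SET stoptime = _stoptime
         WHERE sessionid = _sessionid AND username = _username;
        IF FOUND THEN
            RETURN;
        END IF;

        -- Not there, so try to insert it.  If another connection
        -- inserts the same key first, we get a unique_violation and
        -- go around the loop to retry the UPDATE.
        BEGIN
            INSERT INTO radutmp_tab (sessionid, username, stoptime)
                VALUES (_sessionid, _username, _stoptime);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- loop and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
==========================================================================

The difference from the function earlier in the thread is the loop: whichever way the race goes, the function ends up either updating the row or inserting it, instead of failing or silently dropping data.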
Jim,

Thanks for the help. I went and looked at that example and I don't see how it's different from the "INSERT into radutmp_tab" I'm already doing. Both trap the exception, the only difference is that I'm not doing anything with it. Perhaps you are talking about the "IF (NOT FOUND)" I put after the "UPDATE radutmp_tab". Should this be an EXCEPTION instead? Also, I don't know how this could cause a race condition. As far as I understand, each proc is run in its own transaction, and the code in the proc is run serially. Can you explain more about why this could cause a race?

Thanks,
schu

Jim C. Nasby wrote:
> Periodically taking longer is probably a case of some other process in
> the database holding a lock you need, or otherwise bogging the system
> down, especially if you're always running acctmessage from the same
> connection (because the query plans shouldn't be changing then). I'd
> suggest looking at what else is happening at the same time.
>
> Also, it's more efficient to operate on chunks of data rather than one
> row at a time whenever possible. If you have to log each row
> individually, consider simply logging them into a table, and then
> periodically pulling data out of that table to do additional processing
> on it.
>
> BTW, your detection of duplicates/row existence has a race condition.
> Take a look at example 36-1 at
> http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
> for a better way to handle it.

>> ==========================================================================
>> CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
>> BEGIN
>>   INSERT into tmpaccounting_tab ( ... ) values ( ... );
>>
>>   IF _acctType = 'start' THEN
>>     BEGIN
>>       INSERT into radutmp_tab ( ... ) values ( ... );
>>     EXCEPTION WHEN UNIQUE_VIOLATION THEN
>>       NULL;
>>     END;
>>   ELSIF _acctType = 'stop' THEN
>>     UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName = _userName;
>>     IF (NOT FOUND) THEN
>>       INSERT into radutmp_tab ( ... ) values ( ... );
>>     END IF;
>>   END IF;
>> END;
>> $$
>> LANGUAGE plpgsql;
>> ==========================================================================
On Wed, Sep 27, 2006 at 01:33:09PM -0800, Matthew Schumacher wrote:
> Jim,
>
> Thanks for the help. I went and looked at that example and I don't see
> how it's different from the "INSERT into radutmp_tab" I'm already doing.
> Both trap the exception, the only difference is that I'm not doing
> anything with it. Perhaps you are talking about the "IF (NOT FOUND)" I
> put after the "UPDATE radutmp_tab". Should this be an EXCEPTION
> instead? Also, I don't know how this could cause a race condition. As
> far as I understand, each proc is run in its own transaction, and the
> code in the proc is run serially. Can you explain more about why this
> could cause a race?

It can cause a race if another process could be performing those same inserts or updates at the same time.

I know the UPDATE case can certainly cause a race: 2 connections try to update, both hit NOT FOUND, both try to insert... only one will get to commit.

I think that the UNIQUE_VIOLATION case should be safe, since a second inserter should block if there's another insert that's waiting to commit.

DELETEs are something else to think about for both cases.

If you're certain that only one process will be performing DML on those tables at any given time, then what you have is safe. But if that's the case, I'm thinking you should be able to group things into chunks, which should be more efficient.

> Thanks,
> schu
>
> Jim C. Nasby wrote:
> > Periodically taking longer is probably a case of some other process in
> > the database holding a lock you need, or otherwise bogging the system
> > down, especially if you're always running acctmessage from the same
> > connection (because the query plans shouldn't be changing then). I'd
> > suggest looking at what else is happening at the same time.
> >
> > Also, it's more efficient to operate on chunks of data rather than one
> > row at a time whenever possible. If you have to log each row
> > individually, consider simply logging them into a table, and then
> > periodically pulling data out of that table to do additional processing
> > on it.
> >
> > BTW, your detection of duplicates/row existence has a race condition.
> > Take a look at example 36-1 at
> > http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
> > for a better way to handle it.
> >> ==========================================================================
> >> CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
> >> BEGIN
> >>   INSERT into tmpaccounting_tab ( ... ) values ( ... );
> >>
> >>   IF _acctType = 'start' THEN
> >>     BEGIN
> >>       INSERT into radutmp_tab ( ... ) values ( ... );
> >>     EXCEPTION WHEN UNIQUE_VIOLATION THEN
> >>       NULL;
> >>     END;
> >>   ELSIF _acctType = 'stop' THEN
> >>     UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName = _userName;
> >>     IF (NOT FOUND) THEN
> >>       INSERT into radutmp_tab ( ... ) values ( ... );
> >>     END IF;
> >>   END IF;
> >> END;
> >> $$
> >> LANGUAGE plpgsql;
> >> ==========================================================================

--
Jim Nasby                                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Jim C. Nasby wrote:
>
> It can cause a race if another process could be performing those same
> inserts or updates at the same time.

There are inserts and updates running all of the time, but never on the same data. I'm not sure how I can get around this, since the queries are coming from my radius system, which is not able to queue this stuff up because it waits for a successful query before returning an OK packet back to the client.

> I know the UPDATE case can certainly cause a race: 2 connections try to
> update, both hit NOT FOUND, both try to insert... only one will get to
> commit.

Why is that? Doesn't the first update lock the row, causing the second one to wait, and then the second one stomps on the row, allowing both to commit? I must be confused....

> I think that the UNIQUE_VIOLATION case should be safe, since a second
> inserter should block if there's another insert that's waiting to
> commit.

Are you saying that inserts inside of an EXCEPTION block will block, but normal inserts don't?

> DELETEs are something else to think about for both cases.

I only do one delete, and that is every night when I move the data to the primary table and remove that day's worth of data from the tmp table. This is done at non-peak times with a vacuum, so I think I'm good here.

> If you're certain that only one process will be performing DML on those
> tables at any given time, then what you have is safe. But if that's the
> case, I'm thinking you should be able to group things into chunks, which
> should be more efficient.

Yeah, I wish I could, but I really need to do one at a time because of how radius waits for a successful query before telling the access server all is well. If the query fails, the access server won't get the 'OK' packet and will send the data to the secondary radius system where it gets queued.

Do you know of a way to see what is going on with the locking system other than "select * from pg_locks"? I can't ever seem to catch the system when queries start to lag.

Thanks again,
schu
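One way to get a little more context than a bare "select * from pg_locks" is to join it against pg_stat_activity, so that ungranted lock requests show up next to what each backend is running. A sketch using the 8.1-era column names (procpid, current_query; note that current_query is only populated when stats_command_string is on, and later releases renamed these columns):

==========================================================================
-- Show ungranted lock requests alongside what each backend is running.
SELECT l.pid,
       l.relation::regclass AS relation,
       l.mode,
       l.granted,
       a.usename,
       a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
 WHERE NOT l.granted;
==========================================================================

Catching the moment a query lags is still the hard part; running something like this from cron every few seconds and logging the output is one crude way to get a snapshot when it happens.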
On Wed, Sep 27, 2006 at 02:17:23PM -0800, Matthew Schumacher wrote:
> Jim C. Nasby wrote:
> >
> > It can cause a race if another process could be performing those same
> > inserts or updates at the same time.
>
> There are inserts and updates running all of the time, but never on the
> same data. I'm not sure how I can get around this, since the queries are
> coming from my radius system, which is not able to queue this stuff up
> because it waits for a successful query before returning an OK packet
> back to the client.
>
> > I know the UPDATE case can certainly cause a race: 2 connections try to
> > update, both hit NOT FOUND, both try to insert... only one will get to
> > commit.
>
> Why is that? Doesn't the first update lock the row, causing the second
> one to wait, and then the second one stomps on the row, allowing both to
> commit? I must be confused....

What if there's no row to update?

Process A                   Process B
UPDATE .. NOT FOUND
                            UPDATE .. NOT FOUND
INSERT
                            INSERT blocks
COMMIT
                            UNIQUE_VIOLATION

That's assuming that there's a unique index. If there isn't one, you'd get duplicate records.

> > I think that the UNIQUE_VIOLATION case should be safe, since a second
> > inserter should block if there's another insert that's waiting to
> > commit.
>
> Are you saying that inserts inside of an EXCEPTION block will block, but
> normal inserts don't?

No... if there's a unique index, a second INSERT attempting to create a duplicate record will block until the first INSERT either commits or rolls back.

> > DELETEs are something else to think about for both cases.
>
> I only do one delete, and that is every night when I move the data to the
> primary table and remove that day's worth of data from the tmp table.
> This is done at non-peak times with a vacuum, so I think I'm good here.

Except that you might still have someone fire off that function while the delete's running, or vice-versa. So there could be a race condition (I haven't thought enough about what race conditions that could cause).

> > If you're certain that only one process will be performing DML on those
> > tables at any given time, then what you have is safe. But if that's the
> > case, I'm thinking you should be able to group things into chunks, which
> > should be more efficient.
>
> Yeah, I wish I could, but I really need to do one at a time because of
> how radius waits for a successful query before telling the access server
> all is well. If the query fails, the access server won't get the 'OK'
> packet and will send the data to the secondary radius system where it
> gets queued.

In that case, the key is to do the absolute smallest amount of work possible as part of that transaction. Ideally, you would only insert a record into a queue table somewhere, and then periodically process records out of that table in batches.

> Do you know of a way to see what is going on with the locking system
> other than "select * from pg_locks"? I can't ever seem to catch the
> system when queries start to lag.

No. Your best bet is to open two psql sessions and step through things in different combinations (make sure to do this in transactions).

--
Jim Nasby                                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
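To illustrate the queue-table idea Jim describes: the transaction the RADIUS server waits on would do nothing but a cheap append to an unindexed queue, and a periodic job would apply the start/stop logic in batches. A rough sketch with hypothetical names (it reuses the hypothetical radutmp_tab from the sketch earlier in the thread; this is not code from the thread):

==========================================================================
-- The RADIUS-facing call only appends, nothing else.
CREATE TABLE acct_queue (
    accttype   text,
    sessionid  text,
    username   text,
    accttime   timestamptz
);

CREATE FUNCTION acctmessage_enqueue(_accttype text, _sessionid text,
                                    _username text, _accttime timestamptz)
RETURNS void AS $$
BEGIN
    INSERT INTO acct_queue (accttype, sessionid, username, accttime)
        VALUES (_accttype, _sessionid, _username, _accttime);
END;
$$ LANGUAGE plpgsql;

-- A periodic job then drains the queue in one transaction.  For example,
-- the 'stop' records could be applied as a single batched UPDATE instead
-- of one UPDATE per accounting packet, followed by a DELETE of the rows
-- just processed:
UPDATE radutmp_tab
   SET stoptime = acct_queue.accttime
  FROM acct_queue
 WHERE acct_queue.accttype  = 'stop'
   AND radutmp_tab.sessionid = acct_queue.sessionid
   AND radutmp_tab.username  = acct_queue.username;
==========================================================================

Whether this is acceptable depends on how stale radutmp_tab is allowed to be between batch runs.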
> So the question is: why, on a relatively simple proc, am I getting a
> query performance delta between 3549ms and 7ms?

What version of PG is it?

I had such problems in a pseudo-realtime app I use here with Postgres, and they went away when I moved to 8.1 (from 7.4). I guess it is the better shared buffer management code. (Don't you see a big_query_searching_through_half_the_db just before you get this slow insert?)

Greetings
Marcin
Marcin Mank wrote:
>> So the question is: why, on a relatively simple proc, am I getting a
>> query performance delta between 3549ms and 7ms?
>
> What version of PG is it?
>
> I had such problems in a pseudo-realtime app I use here with Postgres,
> and they went away when I moved to 8.1 (from 7.4). I guess it is the
> better shared buffer management code. (Don't you see a
> big_query_searching_through_half_the_db just before you get this slow
> insert?)
>
> Greetings
> Marcin

Marcin,

It is 8.1.4, and there is searching being done on the radutmp_tab all of the time. It is relatively small, though: only a couple of thousand rows. The tmpaccounting_tab table, which also gets inserts, is only used for inserting, plus one large query every night that flushes the data to an indexed table.

What I really need is a way to profile my proc when it runs slow so that I can resolve which of the queries is really slow. Anyone with an idea on how to do this?

schu
In response to Matthew Schumacher <matt.s@aptalaska.net>:
>
> What I really need is a way to profile my proc when it runs slow so that
> I can resolve which of the queries is really slow. Anyone with an idea
> on how to do this?

You could turn on statement logging and duration logging. This would give you a record of when things run and how long they take. A little work analyzing should show you which queries are running when your favorite query slows down.

--
Bill Moran
Collaborative Fusion Inc.
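For reference, the logging Bill describes is driven from postgresql.conf. An illustrative set of 8.1 settings along these lines (values are examples only, adjust to taste):

==========================================================================
# postgresql.conf -- example values only
log_statement = 'all'                 # log every statement (verbose)
log_duration = on                     # log how long each statement took
#log_min_duration_statement = 250     # alternative: only log statements
                                      # slower than 250 ms, with duration
log_line_prefix = '%t [%p] '          # timestamp and backend PID per line
==========================================================================

With a timestamp and PID on each line it becomes possible to line up the slow acctMessage calls against whatever else was running at the same moment.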
On Thu, Sep 28, 2006 at 11:28:43AM -0400, Bill Moran wrote:
> In response to Matthew Schumacher <matt.s@aptalaska.net>:
> >
> > What I really need is a way to profile my proc when it runs slow so that
> > I can resolve which of the queries is really slow. Anyone with an idea
> > on how to do this?
>
> You could turn on statement logging and duration logging. This would
> give you a record of when things run and how long they take. A little
> work analyzing should show you which queries are running when your
> favorite query slows down.

By default, that doesn't help you debug what's happening inside a function, because you only get the call to the function. I don't know if you can increase verbosity to combat that.

Something else to consider is that gettimeofday() on some platforms is painfully slow, which could completely skew all your numbers.

--
Jim Nasby                                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
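One common workaround for that limitation (not something suggested in the thread itself) is to instrument the function by hand: grab timeofday() before and after a statement and RAISE the elapsed time so slow runs show up in the server log. A rough sketch against the hypothetical radutmp_tab used in the earlier sketches; timeofday() is used here because clock_timestamp() does not exist on 8.1:

==========================================================================
-- Hand-rolled timing inside plpgsql (hypothetical names).
CREATE FUNCTION acctmessage_timed(_sessionid text, _username text,
                                  _stoptime timestamptz)
RETURNS void AS $$
DECLARE
    t0      timestamptz;
    elapsed interval;
BEGIN
    t0 := timeofday()::timestamptz;

    UPDATE radutmp_tab SET stoptime = _stoptime
     WHERE sessionid = _sessionid AND username = _username;

    elapsed := timeofday()::timestamptz - t0;
    -- RAISE LOG goes to the server log with the default log_min_messages
    RAISE LOG 'radutmp update took %', elapsed;
END;
$$ LANGUAGE plpgsql;
==========================================================================

Wrapping each statement of the real proc this way would show which of its queries is the one that occasionally takes seconds, subject to the gettimeofday() overhead Jim mentions.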
In response to "Jim C. Nasby" <jim@nasby.net>:
> On Thu, Sep 28, 2006 at 11:28:43AM -0400, Bill Moran wrote:
> > In response to Matthew Schumacher <matt.s@aptalaska.net>:
> > >
> > > What I really need is a way to profile my proc when it runs slow so that
> > > I can resolve which of the queries is really slow. Anyone with an idea
> > > on how to do this?
> >
> > You could turn on statement logging and duration logging. This would
> > give you a record of when things run and how long they take. A little
> > work analyzing should show you which queries are running when your
> > favorite query slows down.
>
> By default, that doesn't help you debug what's happening inside a
> function, because you only get the call to the function. I don't know if
> you can increase verbosity to combat that.

Right, but my point was that he believes another query is interfering when the target query is slow. Turning on those logging statements will:

a) Allow him to identify times when the query is slow.
b) Identify other queries that are running at the same time.

If he sees a pattern (i.e., my query is always slow if query X5 is running at the same time) he'll have a good lead into further research.

--
Bill Moran
Collaborative Fusion Inc.