Thread: pg 8.1.2 performance issue
I have a performance riddle, hoping someone can point me in a helpful direction. We have a pg 8.1.2 cluster using Apache::Sessions and experiencing simple UPDATEs taking sometimes 30+ seconds to do a very simply update, no foreign keys, no triggers: Table "public.sessions" Column | Type | Modifiers -----------+---------------+----------- id | character(32) | not null a_session | text | Indexes: "sessions_pkey" PRIMARY KEY, btree (id) This is on an HP ia64 11.23 box with what appears to be gobs of surplus CPU, I/O (it's on a SAN), and RAM, pretty high query volume from 180 concurrent client connections. Wondering if it is a locking issue, I set up logging to capture existing locks every 10s with this query: SELECT now(), dbu.usename as locker, l.mode as locktype, CASE l.granted WHEN true THEN 'granted' ELSE 'pending' END as status, pg_stat_get_backend_pid(S.backendid) as pid, l.transaction as xid, db.datname||'.'||n.nspname||'.'|| r.relname as relation, case l.mode when 'AccessShareLock' then 1 when 'RowShareLock' then 2 when 'Row ExclusiveLock' then 3 when 'ShareUpdateExclusiveLock' then 4 when 'ShareLock' then 5 when 'ShareRowExclusiveLock' then 6 when 'ExclusiveLock' then 7 else 100 end as exclusivity, pg_stat_get_backend_activity(S.backendid) as query FROM pg_user dbu, (SELECT pg_stat_get_backend_idset() AS backendid) AS S, pg_database db, pg_locks l, pg_class r, pg_namespace n WHERE db.oid = pg_stat_get_backend_dbid(S.backendid) AND dbu.usesysid = pg_stat_get_backend_userid(S.backendid) AND l.pid = pg_stat_get_backend_pid(S.backendid) AND l.relation = r.oid AND l.database = db.oid AND r.relnamespace = n.oid ORDER BY exclusivity DESC, db.datname, n.nspname, r.relname, l.mode;" I see what appear to be many single transactions holding RowExclusiveLocks for sometimes 40-50 seconds while their query shows "<IDLE> in transaction". 2006-03-25 20:04:01.063873-08 | www | RowExclusiveLock | granted | 17192 | 270205914 | db1.public.sessions | 3 | <IDLE> in transaction 2006-03-25 20:04:11.128632-08 | www | RowExclusiveLock | granted | 17192 | 270205914 | db1.public.sessions | 3 | <IDLE> in transaction 2006-03-25 20:04:21.215896-08 | www | RowExclusiveLock | granted | 17192 | 270205914 | db1.public.sessions | 3 | <IDLE> in transaction I'm thinking that means the client is simply tweaking a row and then failing to commit the change for 40-50 seconds. Is that consistent? Is there something else obvious here to explain the delays? Does this sound like a browser stop button issue where they may be aborting the query that has the lock, and then issuing another that waits on the first? Other suggestions? Thanks, Ed
On Saturday March 25 2006 9:36 pm, Ed L. wrote: > I have a performance riddle, hoping someone can point me in a > helpful direction. We have a pg 8.1.2 cluster using > Apache::Sessions and experiencing simple UPDATEs taking > sometimes 30+ seconds to do a very simply update, no foreign > keys, no triggers: > > Table "public.sessions" > Column | Type | Modifiers > -----------+---------------+----------- > id | character(32) | not null > a_session | text | > Indexes: > "sessions_pkey" PRIMARY KEY, btree (id) The table has 6800 rows over 18000 pages, and is getting a minimum of many tens of thousands of updates per day with queries like this: UPDATE sessions SET a_session = ? WHERE id = ? Ed > > > This is on an HP ia64 11.23 box with what appears to be gobs > of surplus CPU, I/O (it's on a SAN), and RAM, pretty high > query volume from 180 concurrent client connections. > > Wondering if it is a locking issue, I set up logging to > capture existing locks every 10s with this query: > > SELECT now(), dbu.usename as locker, l.mode as locktype, CASE > l.granted WHEN true THEN 'granted' ELSE 'pending' END as > status, pg_stat_get_backend_pid(S.backendid) > as pid, l.transaction as xid, > db.datname||'.'||n.nspname||'.'|| r.relname as relation, case > l.mode when 'AccessShareLock' then 1 when 'RowShareLock' then > 2 when 'Row > ExclusiveLock' then 3 when 'ShareUpdateExclusiveLock' then 4 > when 'ShareLock' then 5 when 'ShareRowExclusiveLock' then 6 > when 'ExclusiveLock' then 7 else 100 end as > exclusivity, pg_stat_get_backend_activity(S.backendid) as > query FROM pg_user dbu, > (SELECT pg_stat_get_backend_idset() AS backendid) AS S, > pg_database db, pg_locks l, pg_class r, pg_namespace n > WHERE db.oid = pg_stat_get_backend_dbid(S.backendid) > AND dbu.usesysid = pg_stat_get_backend_userid(S.backendid) > AND l.pid = pg_stat_get_backend_pid(S.backendid) > AND l.relation = r.oid > AND l.database = db.oid > AND r.relnamespace = n.oid > ORDER BY exclusivity DESC, db.datname, n.nspname, r.relname, > l.mode;" > > > I see what appear to be many single transactions holding > RowExclusiveLocks for sometimes 40-50 seconds while their > query shows "<IDLE> in transaction". > > 2006-03-25 20:04:01.063873-08 | www | RowExclusiveLock | > granted > > | 17192 | 270205914 | db1.public.sessions | 3 | > | <IDLE> > > in transaction > 2006-03-25 20:04:11.128632-08 | www | RowExclusiveLock | > granted > > | 17192 | 270205914 | db1.public.sessions | > | 3 <IDLE> in transaction > > 2006-03-25 20:04:21.215896-08 | www | RowExclusiveLock | > granted > > | 17192 | 270205914 | db1.public.sessions | > | 3 <IDLE> in transaction > > I'm thinking that means the client is simply tweaking a row > and then failing to commit the change for 40-50 seconds. Is > that consistent? Is there something else obvious here to > explain the delays? Does this sound like a browser stop > button issue where they may be aborting the query that has the > lock, and then issuing another that waits on the first? Other > suggestions? > > Thanks, > Ed
On Saturday March 25 2006 9:49 pm, Ed L. wrote: > On Saturday March 25 2006 9:36 pm, Ed L. wrote: > > I have a performance riddle, hoping someone can point me in > > a helpful direction. We have a pg 8.1.2 cluster using > > Apache::Sessions and experiencing simple UPDATEs taking > > sometimes 30+ seconds to do a very simply update, no foreign > > keys, no triggers: > > > > Table "public.sessions" > > Column | Type | Modifiers > > -----------+---------------+----------- > > id | character(32) | not null > > a_session | text | > > Indexes: > > "sessions_pkey" PRIMARY KEY, btree (id) > > The table has 6800 rows over 18000 pages, and is getting a > minimum of many tens of thousands of updates per day with > queries like this: > > UPDATE sessions SET a_session = ? WHERE id = ? Sorry, it's getting late. Yes, the table has been analyzed and explain shows an index scan is occurring. Ed
On 3/26/06, Ed L. <pgsql@bluepolka.net> wrote: > On Saturday March 25 2006 9:36 pm, Ed L. wrote: > > I have a performance riddle, hoping someone can point me in a > > helpful direction. We have a pg 8.1.2 cluster using > > Apache::Sessions and experiencing simple UPDATEs taking > > sometimes 30+ seconds to do a very simply update, no foreign > > keys, no triggers: > > > > Table "public.sessions" > > Column | Type | Modifiers > > -----------+---------------+----------- > > id | character(32) | not null > > a_session | text | > > Indexes: > > "sessions_pkey" PRIMARY KEY, btree (id) > > The table has 6800 rows over 18000 pages, and is getting a > minimum of many tens of thousands of updates per day with > queries like this: If you're updating that much, how often are you running 'analyze'? Are you running autovacuum? How often? -- Postgresql & php tutorials http://www.designmagick.com/
On Saturday March 25 2006 9:55 pm, chris smith wrote: > On 3/26/06, Ed L. <pgsql@bluepolka.net> wrote: > > On Saturday March 25 2006 9:36 pm, Ed L. wrote: > > > I have a performance riddle, hoping someone can point me > > > in a helpful direction. We have a pg 8.1.2 cluster using > > > Apache::Sessions and experiencing simple UPDATEs taking > > > sometimes 30+ seconds to do a very simply update, no > > > foreign keys, no triggers: > > > > The table has 6800 rows over 18000 pages, and is getting a > > minimum of many tens of thousands of updates per day with > > queries like this: > > If you're updating that much, how often are you running > 'analyze'? Are you running autovacuum? How often? I count on the built-in autovacuum to do do analyzes (per 8.1.2 docs). I'm running autovacuum with the following non-default parameters: autovacuum = on autovacuum_naptime = 600 # 10 minutes With our set of 4 DBs, that amounts to once every 40 minutes for the given database. I see "LOG: autovacuum: processing database "xyz"" in the log, but I do not see any analyze/vacuum commands being issued at all (does it log when it analyzes/vacuums?). Thanks, Ed
On Sunday March 26 2006 7:22 am, Ed L. wrote: > On Saturday March 25 2006 9:55 pm, chris smith wrote: > > On 3/26/06, Ed L. <pgsql@bluepolka.net> wrote: > > > On Saturday March 25 2006 9:36 pm, Ed L. wrote: > > > > I have a performance riddle, hoping someone can point me > > > > in a helpful direction. We have a pg 8.1.2 cluster > > > > using Apache::Sessions and experiencing simple UPDATEs > > > > taking sometimes 30+ seconds to do a very simply update, > > > > no foreign keys, no triggers: > > > > > > The table has 6800 rows over 18000 pages, and is getting a > > > minimum of many tens of thousands of updates per day with > > > queries like this: > > > > If you're updating that much, how often are you running > > 'analyze'? Are you running autovacuum? How often? > > I count on the built-in autovacuum to do do analyzes (per > 8.1.2 docs). I'm running autovacuum with the following > non-default parameters: > > autovacuum = on > autovacuum_naptime = 600 # 10 minutes I also have these non-default settings: autovacuum_vacuum_cost_delay = 500 autovacuum_vacuum_cost_limit = 200 Thanks, Ed
"Ed L." <pgsql@bluepolka.net> writes: > With our set of 4 DBs, that amounts to once every 40 minutes for > the given database. I see "LOG: autovacuum: processing > database "xyz"" in the log, but I do not see any analyze/vacuum > commands being issued at all (does it log when it > analyzes/vacuums?). Not at the default logging level --- I think you have to go to DEBUG1 or DEBUG2 to get info about per-table actions. (This ought to be rethought, likely, as that will also clutter your log with a ton of extraneous info...) regards, tom lane
"Ed L." <pgsql@bluepolka.net> writes: > I see what appear to be many single transactions holding > RowExclusiveLocks for sometimes 40-50 seconds while their query > shows "<IDLE> in transaction". > ... > I'm thinking that means the client is simply tweaking a row and > then failing to commit the change for 40-50 seconds. Is that > consistent? That's what it sounds like to me. You might consider logging all commands from these clients for awhile so you can check that theory. regards, tom lane
>>>> The table has 6800 rows over 18000 pages, and is getting a >>>> minimum of many tens of thousands of updates per day with >>>> queries like this: >>> If you're updating that much, how often are you running >>> 'analyze'? Are you running autovacuum? How often? >> I count on the built-in autovacuum to do do analyzes (per >> 8.1.2 docs). I'm running autovacuum with the following >> non-default parameters: >> >> autovacuum = on >> autovacuum_naptime = 600 # 10 minutes > > I also have these non-default settings: > > autovacuum_vacuum_cost_delay = 500 > autovacuum_vacuum_cost_limit = 200 Not totally sure, but it sounds like the table isn't getting vacuumed often enough. To help, you might reduce the naptime from 10 minutes to 5. But I think the cost_delay settings might be the larger problem. If I remember correctly, even small values here tend to greatly increase the time it takes vacuum commands to complete, so you might try backing down those settings. Can you tell from the log files how often autovacuum is actually taking actions, or how long those actions are taking? Does anyone out there have any empirical data on good autovacuum_*_cost_* settings? I would be curious to hear about them.
On Sun, Mar 26, 2006 at 11:27:33AM -0500, Matthew T. O'Connor wrote: > >>>>The table has 6800 rows over 18000 pages, and is getting a > >>>>minimum of many tens of thousands of updates per day with > >>>>queries like this: > >>>If you're updating that much, how often are you running > >>>'analyze'? Are you running autovacuum? How often? > >>I count on the built-in autovacuum to do do analyzes (per > >>8.1.2 docs). I'm running autovacuum with the following > >>non-default parameters: > >> > >> autovacuum = on > >> autovacuum_naptime = 600 # 10 minutes > > > >I also have these non-default settings: > > > > autovacuum_vacuum_cost_delay = 500 > > autovacuum_vacuum_cost_limit = 200 > > Not totally sure, but it sounds like the table isn't getting vacuumed > often enough. To help, you might reduce the naptime from 10 minutes to Just look at the number of rows an the number of pages; the table has a huge amount of bloat, and all the indexes will as well. Best bet at this point to get things under control is a VACUUM FULL and a REINDEX. > 5. But I think the cost_delay settings might be the larger problem. If > I remember correctly, even small values here tend to greatly increase > the time it takes vacuum commands to complete, so you might try backing > down those settings. Can you tell from the log files how often > autovacuum is actually taking actions, or how long those actions are taking? > > Does anyone out there have any empirical data on good > autovacuum_*_cost_* settings? I would be curious to hear about them. Even 5 minutes is too long to sleep in this case; I'd go with something closer to 10 seconds. The default threasholds won't work either; I'd cut autovacuum_*_(scale_factor|threshold) in half. And yes, that cost delay is way too large, I'd go with 20-50, given that you've got "gobs of IO". Finally, PostgreSQL just flat-out doesn't handle things like webapp session tables well at all, because of the high update volume. You should consider other alternatives. Someone did point me to a session manager that allows you to do most work in memory, only flushing to permanent storage (ie: the database) periodically, that would probably be a good option for you. Another possibility is using something like SQLite just for storing the session info (though I think it's MVCC based as well, so it might have just as much difficulty with this as PostgreSQL does). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sunday March 26 2006 9:14 am, Tom Lane wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > With our set of 4 DBs, that amounts to once every 40 minutes > > for the given database. I see "LOG: autovacuum: processing > > database "xyz"" in the log, but I do not see any > > analyze/vacuum commands being issued at all (does it log > > when it > > analyzes/vacuums?). > > Not at the default logging level --- I think you have to go to > DEBUG1 or DEBUG2 to get info about per-table actions. (This > ought to be rethought, likely, as that will also clutter your > log with a ton of extraneous info...) Yes, I'd definitely like to see those vacuums and analyzes logged similarly to user-generated queries. Ed
On Sunday March 26 2006 9:16 am, Tom Lane wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > I see what appear to be many single transactions holding > > RowExclusiveLocks for sometimes 40-50 seconds while their > > query shows "<IDLE> in transaction". > > ... > > I'm thinking that means the client is simply tweaking a row > > and then failing to commit the change for 40-50 seconds. Is > > that consistent? > > That's what it sounds like to me. You might consider logging > all commands from these clients for awhile so you can check > that theory. This indeed appears to be locking problem from within Apache::Session where it deletes a row from the DB but fails to commit the change for an extended period while another transaction waits on the same row. Not sure how/why that's happening in the client code, but it's not a DB issue. Thanks, Ed
On Mar 31, 2006, at 1:51 PM, Ed L. wrote: > This indeed appears to be locking problem from within > Apache::Session where it deletes a row from the DB but fails to > commit the change for an extended period while another And you should read well the notes in the Pg driver for Apache::Session where it requires you to define your transaction model. Ie, this is pilot error more than Apache::Session problem. Apache::Session works extremely well with Postgres.
Tom Lane wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > With our set of 4 DBs, that amounts to once every 40 minutes for > > the given database. I see "LOG: autovacuum: processing > > database "xyz"" in the log, but I do not see any analyze/vacuum > > commands being issued at all (does it log when it > > analyzes/vacuums?). > > Not at the default logging level --- I think you have to go to DEBUG1 > or DEBUG2 to get info about per-table actions. (This ought to be > rethought, likely, as that will also clutter your log with a ton of > extraneous info...) Added to TODO: * Allow log_min_messages to be specified on a per-module basis This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +