Thread: Slow sequential scans on one DB but not another; fragmentation?
This is 8.0.8 on Solaris 8.  Yes, I know; old technologies, but we can't
upgrade yet.

I have two identical servers.  One is production, and overnight we do a
complete dump and scp the results to the other.  The other is standby.
It's currently running data that's about 7 months old because we haven't
needed to fail over yet.

I have one specific table ("sweep_users") that has 900,000 rows in prod
and 630,000 on the standby.  On the standby a "select count(*) from
sweep_users" takes a couple of seconds.  On production it takes...
240 seconds!

Data updates on this table consist of the following meta-logic:

  for host in list_of_hosts
    delete from sweep_users where hostid=host
    for user in users_for_host
      insert into sweep_users ....
  vacuum analyze sweep_users

I'm at a loss to understand why the production server is so slow.  While
the query is running, "iostat -x" returns values like this (on a striped
mirror):

  device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b
  md30    764.0    0.0 92110.5    0.0  0.0  1.3    1.6   0  87
  md30   1666.8    0.0 67254.3    0.0  0.0  1.7    1.0   0  92
  md30    844.6    0.4 75716.1    0.3  0.0  1.3    1.6   0  90

The disk on the standby machine only shows 1/20th of that activity.

Now, there is a difference in tuning between the two servers; we tried to
allocate more memory to the database on production to make queries more
likely to be answered from RAM (but we could have made a mistake)...

  % diff prod standby
  < effective_cache_size   | 262144
  ---
  > effective_cache_size   | 1000
  92c92
  < maintenance_work_mem   | 524288
  ---
  > maintenance_work_mem   | 16384
  106c106
  < random_page_cost       | 1.5
  ---
  > random_page_cost       | 4
  113c113
  < shared_buffers         | 30000
  ---
  > shared_buffers         | 1000
  141c141
  < work_mem               | 20480
  ---
  > work_mem               | 1024

The only idea I have is that our update pattern is somehow causing
excessive fragmentation, either at the DB level or the OS file level.
Anyone else have any thoughts?

-- 
rgds
Stephen
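PS: spelled out in rough SQL, one pass of that loop looks something like
this ("hostid" is the real column name; the other column names are only
illustrative):

  DELETE FROM sweep_users WHERE hostid = 'host-a';
  -- one INSERT per user found on that host
  INSERT INTO sweep_users (hostid, username) VALUES ('host-a', 'alice');
  INSERT INTO sweep_users (hostid, username) VALUES ('host-a', 'bob');

and then, once every host has been processed:

  VACUUM ANALYZE sweep_users;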
Stephen Harris <lists@spuddy.org> writes:
> I have one specific table ("sweep_users") that has 900,000 rows in prod
> and 630,000 on the standby.  On the standby a "select count(*) from
> sweep_users" takes a couple of seconds.  On production it takes... 240
> seconds!

Lots of dead rows/free space perhaps?  What does VACUUM VERBOSE have to
say about this table on each machine?

			regards, tom lane
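(Per table, that would be roughly:

  VACUUM VERBOSE sweep_users;

run on both servers, so the page counts and dead-row counts can be
compared side by side.)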
On Wed, Mar 28, 2007 at 11:07:54AM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > I have one specific table ("sweep_users") that has 900,000 rows in prod
> > and 630,000 on the standby.  On the standby a "select count(*) from
> > sweep_users" takes a couple of seconds.  On production it takes... 240
> > seconds!
>
> Lots of dead rows/free space perhaps?  What does VACUUM VERBOSE have to
> say about this table on each machine?

I'm vacuuming every night after the inserts are done.  "vacuum analyze
verbose" says (from the overnight log) on production:

  INFO:  vacuuming "ibusassets.sweep_users"
  INFO:  index "sweep_users_host_id_idx" now contains 972662 row versions in 43147 pages
  DETAIL:  835831 index row versions were removed.
  37954 index pages have been deleted, 20000 are currently reusable.
  CPU 3.76s/10.29u sec elapsed 244.73 sec.
  INFO:  "sweep_users": removed 835831 row versions in 21579 pages
  DETAIL:  CPU 6.77s/6.64u sec elapsed 607.08 sec.
  INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
  DETAIL:  0 dead row versions cannot be removed yet.
  There were 112212932 unused item pointers.
  0 pages are entirely empty.
  CPU 116.15s/48.07u sec elapsed 1145.11 sec.
  INFO:  analyzing "ibusassets.sweep_users"
  INFO:  "sweep_users": scanned 3000 of 2890304 pages, containing 1236 live rows and 0 dead rows; 1236 rows in sample, 1190805 estimated total rows

-- 
rgds
Stephen
Stephen Harris <lists@spuddy.org> writes:
> INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 112212932 unused item pointers.

Oy, that's one bloated table ... only one live row in every three or so
pages.

Probably a CLUSTER is the most effective way of cleaning it up.  Once
you get it down to size, revisit your vacuuming policy, because it
definitely isn't getting vacuumed often enough.

			regards, tom lane
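(For reference: CLUSTER needs an index to order the table by, and on 8.0
the index name comes first in the syntax.  Using the index reported in the
VACUUM VERBOSE output above, the one-time cleanup would look roughly like:

  CLUSTER sweep_users_host_id_idx ON sweep_users;
  ANALYZE sweep_users;

It takes an exclusive lock and rewrites the whole table, so it wants a
maintenance window.)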
On Wed, Mar 28, 2007 at 11:36:27AM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
> > DETAIL:  0 dead row versions cannot be removed yet.
> > There were 112212932 unused item pointers.
>
> Oy, that's one bloated table ... only one live row in every three or so pages.
>
> Probably a CLUSTER is the most effective way of cleaning it up.  Once
> you get it down to size, revisit your vacuuming policy, because it
> definitely isn't getting vacuumed often enough.

It's vacuumed every night after the updates.  There are minimal (zero,
most days) updates during the day.  As I mentioned earlier, nightly we do:

  for host in list_of_hosts
    delete from sweep_users where hostid=host
    for user in users_for_host
      insert into sweep_users ....
  vacuum analyze sweep_users

(in fact we just do "vacuum verbose analyze" for the whole database).

You recommend a "cluster sweep_users" before the vacuum, then?

Thanks!

-- 
rgds
Stephen
Stephen Harris <lists@spuddy.org> writes:
> It's vacuumed every night after the updates.  There are minimal (zero,
> most days) updates during the day.  As I mentioned earlier, nightly we do:
>
>   for host in list_of_hosts
>     delete from sweep_users where hostid=host
>     for user in users_for_host
>       insert into sweep_users ....
>   vacuum analyze sweep_users

Hmm ... no overlap between the sets of users for different hosts?

This looks like the worst-case bloat should be 2X (at most one dead and
one live row per user), but the numbers you are reporting --- particularly
the unused-item-pointer count --- show clearly that at some point there
was a bloat factor of more than 100, ie, there had been at least 100
complete replacements of the table without a vacuum.  Perhaps that vacuum
step was only recently added to this script?

> You recommend a "cluster sweep_users" before the vacuum, then?

I wouldn't think you need to do it every night, it's just a one-time
fix.

			regards, tom lane
On Wed, Mar 28, 2007 at 12:10:27PM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > It's vacuumed every night after the updates.  There are minimal (zero,
> > most days) updates during the day.  As I mentioned earlier, nightly we do:
> >
> >   for host in list_of_hosts
> >     delete from sweep_users where hostid=host
> >     for user in users_for_host
> >       insert into sweep_users ....
> >   vacuum analyze sweep_users
>
> Hmm ... no overlap between the sets of users for different hosts?

No; each row also includes the host ID, so each row is unique.

> Perhaps that vacuum step was only recently added to this script?

No; the subversion logs show it was added to UAT in July 2006 and we
pushed it to production soon after.  When we did the push we switched
between the servers and so rebuilt the database (initdb and imported a
backup).  We do it this way on major releases so we can always switch
back to the previous server if there are problems with the new code :-)

Sometimes the vacuum code doesn't run if earlier code hangs; we've
sometimes gone 3 or 4 days without a vacuum, but the vacuum is generally
consistent.

> I wouldn't think you need to do it every night, it's just a one-time
> fix.

Interesting; maybe we'll schedule one for the next minor code push.

Thanks!

-- 
rgds
Stephen
Stephen Harris wrote:
> I'm vacuuming every night after the inserts are done.

You should vacuum after the deletes and before the inserts, so the
inserts can go into the just-reclaimed space.
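That means splitting the nightly job into two passes, roughly:

  for host in list_of_hosts
      delete from sweep_users where hostid=host
  vacuum sweep_users             -- record the freed space in the FSM
  for host in list_of_hosts
      for user in users_for_host
          insert into sweep_users ....
  vacuum analyze sweep_users     -- refresh planner statistics after the load

(assuming max_fsm_pages is set high enough to remember all the pages the
deletes free up).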
On Wed, Mar 28, 2007 at 11:36:27AM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
>
> Oy, that's one bloated table ... only one live row in every three or so pages.
>
> Probably a CLUSTER is the most effective way of cleaning it up.  Once

OK, we were doing a code release today and so had a change window open.
We ran a CLUSTER command on this table.  It took 15 minutes to complete,
and the number of pages went down to 27,000 - ie by a factor of 100.
"select count(*)" took 4 seconds instead of 220, giving us a 55 times
speed increase.

We'll keep our eye on this, but since it was relatively quick we might
schedule a weekly cluster "just in case".

Thanks!

-- 
rgds
Stephen
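PS: a quick way to keep an eye on whether the table starts bloating again
is to watch its page count in the catalogs (relpages and reltuples are
refreshed by VACUUM and ANALYZE rather than in real time, so they reflect
the last maintenance run), e.g.:

  SELECT relname, relpages, reltuples
  FROM   pg_class
  WHERE  relname = 'sweep_users';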