Slow sequential scans on one DB but not another; fragmentation?

From: Stephen Harris
This is PostgreSQL 8.0.8 on Solaris 8.  Yes, I know: old technology, but
we can't upgrade yet.

I have two identical servers.  One is production; overnight we do a
complete dump and scp the results to the other, which is the standby.
The standby is currently running data that's about 7 months old because
we haven't needed to fail over yet.

I have one specific table ("sweep_users") that has 900,000 rows in prod
and 630,000 on the standby.  On the standby a "select count(*) from
sweep_users" takes a couple of seconds.  On production it takes... 240
seconds!
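
(For what it's worth, on 8.0 a whole-table count(*) is always a plain
sequential scan, since there are no index-only scans; EXPLAIN should show
the same plan on both boxes:

  explain select count(*) from sweep_users;
  -- expect an Aggregate over a Seq Scan on sweep_users

so both servers read the whole table, and the question is why one of them
reads so much more data doing it.)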

Data updates on this table consist of the following meta-logic:

  for host in list_of_hosts
    delete from sweep_users where hostid=host
    for user in users_for_host
      insert into sweep_users ....
  vacuum analyze sweep_users
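
As straight SQL, each pass amounts to something like this (sketch only;
the user columns here are made-up placeholders):

  delete from sweep_users where hostid = 'host1';
  insert into sweep_users (hostid, username) values ('host1', 'alice');
  insert into sweep_users (hostid, username) values ('host1', 'bob');
  -- ...and so on for every host, then finally:
  vacuum analyze sweep_users;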

I'm at a loss to understand why the production server is so slow.  While
the query is running, "iostat -x" (on a striped mirror) reports values like:

  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  md30       764.0    0.0 92110.5    0.0  0.0  1.3    1.6   0  87
  md30      1666.8    0.0 67254.3    0.0  0.0  1.7    1.0   0  92
  md30       844.6    0.4 75716.1    0.3  0.0  1.3    1.6   0  90

The disk on the standby machine only shows 1/20th of that activity.

Now there is a difference in tuning between the two servers: on production
we tried to allocate more memory to the database, to make queries more
likely to be served from RAM (but we could have made a mistake).  The
settings that differ:

  setting                        production   standby
  effective_cache_size           262144       1000
  maintenance_work_mem           524288       16384
  random_page_cost               1.5          4
  shared_buffers                 30000        1000
  work_mem                       20480        1024
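
The same list can be pulled straight from each server, since pg_settings
exists on 8.0:

  select name, setting
    from pg_settings
   where name in ('effective_cache_size', 'maintenance_work_mem',
                  'random_page_cost', 'shared_buffers', 'work_mem');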

The only idea I have is that our update pattern is somehow causing excessive
fragmentation, either at the DB level or the OS file level.

Anyone else have any thoughts?

--

rgds
Stephen

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Tom Lane
Date: Wed, 28 Mar 2007 11:07:54 -0400
Stephen Harris <lists@spuddy.org> writes:
> I have one specific table ("sweep_users") that has 900,000 rows in prod
> and 630,000 on the standby.  On the standby a "select count(*) from
> sweep_users" takes a couple of seconds.  On production it takes... 240
> seconds!

Lots of dead rows/free space perhaps?  What does VACUUM VERBOSE have to
say about this table on each machine?

            regards, tom lane

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Stephen Harris
On Wed, Mar 28, 2007 at 11:07:54AM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > I have one specific table ("sweep_users") that has 900,000 rows in prod
> > and 630,000 on the standby.  On the standby a "select count(*) from
> > sweep_users" takes a couple of seconds.  On production it takes... 240
> > seconds!
>
> Lots of dead rows/free space perhaps?  What does VACUUM VERBOSE have to
> say about this table on each machine?

I'm vacuuming every night after the inserts are done.

"vacuum analyze verbose" says (from the overnight log) on production:

INFO:  vacuuming "ibusassets.sweep_users"
INFO:  index "sweep_users_host_id_idx" now contains 972662 row versions in 43147 pages
DETAIL:  835831 index row versions were removed.
37954 index pages have been deleted, 20000 are currently reusable.
CPU 3.76s/10.29u sec elapsed 244.73 sec.
INFO:  "sweep_users": removed 835831 row versions in 21579 pages
DETAIL:  CPU 6.77s/6.64u sec elapsed 607.08 sec.
INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 112212932 unused item pointers.
0 pages are entirely empty.
CPU 116.15s/48.07u sec elapsed 1145.11 sec.
INFO:  analyzing "ibusassets.sweep_users"
INFO:  "sweep_users": scanned 3000 of 2890304 pages, containing 1236 live rows and 0 dead rows; 1236 rows in sample,
1190805estimated total rows 

--

rgds
Stephen

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Tom Lane
Date: Wed, 28 Mar 2007 11:36:27 -0400
Stephen Harris <lists@spuddy.org> writes:
> INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 112212932 unused item pointers.

Oy, that's one bloated table ... only one live row in every three or so pages.
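
Spelling out the arithmetic: 2890304 pages at the default 8K page size is
roughly 22 GB on disk for only 972662 live rows (about one live row per
three pages), and scanning 22 GB at the ~90 MB/s your iostat shows takes
on the order of 250 seconds, which is about what your count(*) is taking.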

Probably a CLUSTER is the most effective way of cleaning it up.  Once
you get it down to size, revisit your vacuuming policy, because it
definitely isn't getting vacuumed often enough.
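
On 8.0 the syntax puts the index name first; using the index from your
VACUUM output, something like:

  cluster sweep_users_host_id_idx on sweep_users;
  analyze sweep_users;

Bear in mind that CLUSTER takes an exclusive lock and rewrites the whole
table, and it doesn't update statistics, hence the ANALYZE afterwards.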

            regards, tom lane

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Stephen Harris
On Wed, Mar 28, 2007 at 11:36:27AM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages
> > DETAIL:  0 dead row versions cannot be removed yet.
> > There were 112212932 unused item pointers.
>
> Oy, that's one bloated table ... only one live row in every three or so pages.
>
> Probably a CLUSTER is the most effective way of cleaning it up.  Once
> you get it down to size, revisit your vacuuming policy, because it
> definitely isn't getting vacuumed often enough.

It's vacuumed every night after the updates.  There are minimal (zero,
most days) updates during the day.  As I mentioned earlier, nightly we do:

  for host in list_of_hosts
    delete from sweep_users where hostid=host
    for user in users_for_host
      insert into sweep_users ....

  vacuum analyze sweep_users

(in fact we just do "vacuum verbose analyze" for the whole database).

You recommend a "cluster sweep_users" before the vacuum, then?

Thanks!

--

rgds
Stephen

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Tom Lane
Date: Wed, 28 Mar 2007 12:10:27 -0400
Stephen Harris <lists@spuddy.org> writes:
> It's vacuumed every night after the updates.  There are minimal (zero,
> most days) updates during the day.  As I mentioned earlier, nightly we do:

>   for host in list_of_hosts
>     delete from sweep_users where hostid=host
>     for user in users_for_host
>       insert into sweep_users ....

>   vacuum analyze sweep_users

Hmm ... no overlap between the sets of users for different hosts?
This looks like the worst-case bloat should be 2X (at most one dead
and one live row per user), but the numbers you are reporting ---
particularly the unused-item-pointer count --- show clearly that at
some point there was a bloat factor of more than 100, ie, there had
been at least 100 complete replacements of the table without a vacuum.
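(For the record, 112212932 unused item pointers over 972662 live rows
works out to about 115 complete generations of the table.)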

Perhaps that vacuum step was only recently added to this script?

> You recommend a "cluster sweep_users" before the vacuum, then?

I wouldn't think you need to do it every night, it's just a one-time
fix.

            regards, tom lane

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Stephen Harris
On Wed, Mar 28, 2007 at 12:10:27PM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > It's vacuumed every night after the updates.  There are minimal (zero,
> > most days) updates during the day.  As I mentioned earlier, nightly we do:
>
> >   for host in list_of_hosts
> >     delete from sweep_users where hostid=host
> >     for user in users_for_host
> >       insert into sweep_users ....
>
> >   vacuum analyze sweep_users
>
> Hmm ... no overlap between the sets of users for different hosts?

No; each row also includes the host ID so each row is unique.

> Perhaps that vacuum step was only recently added to this script?

No; the subversion logs show it was added to UAT in July 2006 and we
pushed it to production soon after.  When we did the push we switched
between the servers and so rebuilt the database (initdb and imported
a backup).  We do it this way on major releases so we can always switch
back to the previous server if there are problems with the new code :-)

Sometimes the vacuum step doesn't run because an earlier step of the job
hangs; we've occasionally gone 3 or 4 days without a vacuum, but on the
whole it runs consistently.

> I wouldn't think you need to do it every night, it's just a one-time
> fix.

Interesting; maybe we'll schedule one for the next minor code push.

Thanks!

--

rgds
Stephen

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Joseph S
Stephen Harris wrote:

> I'm vacuuming every night after the inserts are done.
>
You should vacuum after the deletes and before the inserts, so the
inserts can reuse the just-reclaimed space.
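
In the meta-logic posted earlier, that means splitting the loop, roughly:

  for host in list_of_hosts
    delete from sweep_users where hostid=host
  vacuum sweep_users
  for host in list_of_hosts
    for user in users_for_host
      insert into sweep_users ....
  vacuum analyze sweep_users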

Re: Slow sequential scans on one DB but not another; fragmentation?

From: Stephen Harris
On Wed, Mar 28, 2007 at 11:36:27AM -0400, Tom Lane wrote:
> Stephen Harris <lists@spuddy.org> writes:
> > INFO:  "sweep_users": found 835831 removable, 972662 nonremovable row versions in 2890304 pages

> Oy, that's one bloated table ... only one live row in every three or so pages.
>
> Probably a CLUSTER is the most effective way of cleaning it up ...

OK, we were doing a code release today and so had a change window open.
We ran a CLUSTER on this table.  It took 15 minutes to complete, and the
number of pages went down to 27,000 - i.e. down by a factor of about 100.
"select count(*)" took 4 seconds instead of 220, a 55-fold speed increase.

We'll keep an eye on this, but since it was relatively quick we might
schedule a weekly cluster "just in case".
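
If we do, it's a one-liner from cron, something like this (the database
name here is a placeholder):

  # weekly, Sunday 03:00
  0 3 * * 0  echo 'cluster sweep_users_host_id_idx on sweep_users; analyze sweep_users;' | psql -d mydb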

Thanks!

--

rgds
Stephen