Re: Heavily fragmented table and index data in 8.0.3 - Mailing list pgsql-general

From Gregory Stark
Subject Re: Heavily fragmented table and index data in 8.0.3
Msg-id 87lk1icoc1.fsf@oxford.xeocode.com
In response to Heavily fragmented table and index data in 8.0.3  (Zoltan Boszormenyi <zb@cybertec.at>)
Responses Re: Heavily fragmented table and index data in 8.0.3
List pgsql-general
"Zoltan Boszormenyi" <zb@cybertec.at> writes:

> Also, VACUUM FULL takes too much time on an otherwise idle database; I
> worked on a copy of their live database. During VACUUM, _bt_getbuf() was
> also called repeatedly with the block number jumping up and down.

VACUUM or VACUUM FULL? Plain VACUUM should only read the table sequentially,
but VACUUM FULL behaves exactly as you describe, which is one of the reasons
it sucks so much.

That said, I think 8.0's VACUUM does jump around when cleaning indexes. That's
a big reason to upgrade to a more modern version. More recent VACUUMs (but
not VACUUM FULL) do only sequential scans of both the table and indexes.

VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL
regularly on this table, you may want to REINDEX it.
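For example (the table name below is just a placeholder):

```sql
-- Rebuild every index on the table. This takes an exclusive lock on
-- the table, so schedule it for a quiet period. "accounts" is a
-- hypothetical table name.
REINDEX TABLE accounts;
```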

> I know, 8.0.3 is quite old. But nothing jumped out from the changelog
> up to 8.0.15 that would explain this excessive slowness. SELECTs are
> pretty fast on any of the tables I tried, but INSERT hangs on this table.
> How does this fragmentation happen and how can we prevent this situation?

I'm not sure "fragmentation" has a direct analogue here, since tuples lie
entirely on one page. Unless perhaps you have a problem with TOAST data being
laid out poorly. Are any individual rows over 2kB?
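One rough way to check is to measure stored column sizes; a sketch, with
hypothetical table and column names (pg_column_size() may not exist on 8.0,
in which case octet_length() on individual text/bytea columns is a
reasonable substitute):

```sql
-- Count rows whose wide column exceeds ~2kB -- the TOAST threshold is
-- roughly a quarter of the 8kB page, so such rows spill into the
-- TOAST table. "mytable" and "payload" are placeholder names.
SELECT count(*) FROM mytable WHERE octet_length(payload) > 2048;
```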

The problems people do run into are either

a) lots of dead space, either because vacuum (plain old vacuum, not full)
wasn't run regularly, or because large batch updates or deletes left holes
that later activity could never reuse

b) index bloat, either due to the above or due to deleting many but not all
tuples from a range and then never inserting into that range again. Indexes
can only reuse space on a page if you insert into the same page again or if
you delete all the tuples on the page.
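VACUUM VERBOSE gives a rough picture of how much dead space a table is
carrying (table name hypothetical):

```sql
-- Prints per-table and per-index statistics: removable and
-- nonremovable tuple counts, pages scanned, and free space. A large
-- gap between the live tuple count and the number of pages used
-- suggests accumulated dead space.
VACUUM VERBOSE mytable;
```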

One trick you could use, if you can stand the downtime, is to periodically
CLUSTER the table. Older versions of Postgres had concurrency bugs in
CLUSTER to watch out for, but as long as you don't run it at the same time as
a very long-running transaction such as pg_dump it shouldn't be a problem.
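For example, using the 8.0-era syntax (table and index names are
placeholders; CLUSTER takes an exclusive lock and rewrites the table):

```sql
-- Rewrite the table in the order of the given index, compacting dead
-- space and rebuilding all its indexes as a side effect. Newer
-- releases spell this "CLUSTER mytable USING mytable_pkey".
CLUSTER mytable_pkey ON mytable;
```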

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!
