Thread: Heavily fragmented table and index data in 8.0.3

Heavily fragmented table and index data in 8.0.3

From
Zoltan Boszormenyi
Date:
Hi,

we have a customer with PostgreSQL 8.0.3 with a quite interesting problem.
They have around 24 identical databases and all but one is working nicely.
The one that doesn't work nicely show this problem: INSERT "hangs"
on an apparently empty table where "select count(*)" returns 0 quite
quickly.

The relfilenodes of the table and its only (non-unique) index are below:

> ls -l ./17230/20387 ./17230/20382
-rw-------   1 postgres postgres 2727936 Jun  6 03:31 ./17230/20382
-rw-------   1 postgres postgres  630784 May 24 13:18 ./17230/20387

The machine is:

> uname -a
SunOS ihds00 5.10 Generic_125100-10 sun4u sparc SUNW,Netra-T12

The realtime trace I captured from the hung INSERT shows that it
enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare().
The pattern in which these functions entered match either _bt_moveright() or
_bt_insertonpg(). Also, VACUUM FULL also takes too much time,
on an otherwise idle database, I worked on a copy of their live database.
During VACUUM, _bt_getbuf() was also called repeatedly with the
block number jumping up and down. Obviously the table data is very
fragmented. The total database size is around 366MB, the only client
at the time was VACUUM, both the table and the index fit easily into
shared_buffers at the same time.

I know, 8.0.3 is quite old. But nothing jumped out from the changelog
up to 8.0.15 that would explain this excessive slowness. SELECTs are
pretty fast on any of the tables I tried, but INSERT hangs on this table.
How does this fragmentation happen and how can we prevent this situation?

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


Re: Heavily fragmented table and index data in 8.0.3

From
Gregory Stark
Date:
"Zoltan Boszormenyi" <zb@cybertec.at> writes:

> Also, VACUUM FULL also takes too much time, on an otherwise idle database, I
> worked on a copy of their live database. During VACUUM, _bt_getbuf() was
> also called repeatedly with the block number jumping up and down.

VACUUM or VACUUM FULL? VACUUM should only read the table sequentially but
VACUUM FULL behaves exactly as you describe which is one of the reasons it
sucks so much.

That said I think 8.0's VACUUM does jump around when cleaning indexes. That's
a big reason to upgrade to a more modern version. More recent VACUUM's (but
not VACUUM FULL) do only sequential scans of both the table and indexes.

VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL
regularly on this table you may want to REINDEX this table.

> I know, 8.0.3 is quite old. But nothing jumped out from the changelog
> up to 8.0.15 that would explain this excessive slowness. SELECTs are
> pretty fast on any of the tables I tried, but INSERT hangs on this table.
> How does this fragmentation happen and how can we prevent this situation?

I'm not sure "fragmentation" has a direct analogy since tuples lie entirely on
one page. Unless perhaps you have a problem with TOAST data being laid out
poorly. Are any individual rows in tables over 2k?

The problems people do run into are either

a) lots of dead space because either vacuum (plain old vacuum, not full)
wasn't run regularly or because large batch updates or deletes were run which
later activity could never reuse

b) indexes with bloat either due to the above or due to deleting many but not
all tuples from a range and then never inserting into that range again.
indexes can only reuse tuples if you insert in the same page again or if you
delete all the tuples on the page.

One trick you could use if you can stand the downtime is to periodically
CLUSTER the table. Older versions of Postgres had a concurrency bugs in
CLUSTER to watch out for, but as long as you don't run it at the same time as
a very long-running transaction such as pg_dump it shouldn't be a problem.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

Re: Heavily fragmented table and index data in 8.0.3

From
Zoltan Boszormenyi
Date:
Gregory Stark írta:
> "Zoltan Boszormenyi" <zb@cybertec.at> writes:
>
>
>> Also, VACUUM FULL also takes too much time, on an otherwise idle database, I
>> worked on a copy of their live database. During VACUUM, _bt_getbuf() was
>> also called repeatedly with the block number jumping up and down.
>>
>
> VACUUM or VACUUM FULL?

Read the first line, it was VACCUUM FULL. When I mentioned it second time
I didn't spell it out again, sorry.

>  VACUUM should only read the table sequentially but
> VACUUM FULL behaves exactly as you describe which is one of the reasons it
> sucks so much.
>
> That said I think 8.0's VACUUM does jump around when cleaning indexes. That's
> a big reason to upgrade to a more modern version. More recent VACUUM's (but
> not VACUUM FULL) do only sequential scans of both the table and indexes.
>
> VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL
> regularly on this table you may want to REINDEX this table.
>
>
>> I know, 8.0.3 is quite old. But nothing jumped out from the changelog
>> up to 8.0.15 that would explain this excessive slowness. SELECTs are
>> pretty fast on any of the tables I tried, but INSERT hangs on this table.
>> How does this fragmentation happen and how can we prevent this situation?
>>
>
> I'm not sure "fragmentation" has a direct analogy since tuples lie entirely on
> one page. Unless perhaps you have a problem with TOAST data being laid out
> poorly. Are any individual rows in tables over 2k?
>

Let me reiterate my original question: the problem was with INSERT hanging,
SELECT COUNT(*) on the table returned immediately telling the nr of rows
==  0.
What I gathered from "truss -d -u a.out::* -v pollsys -p <pid of
backend>" output was:
"

The realtime trace I captured from the hung INSERT shows that it
enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare().
The pattern in which these functions entered match either _bt_moveright() or
_bt_insertonpg().

"

So, it wasn't actually hung, but was very slow. The page numbers jumped
wildly in the trace. It's true that tuples lie inside one page, but
tuple chains
can exceed one page, right? One dead tuple points the next in the chain
which
can be on another page. Consecutive or monotonically increasing page numbers
aren't guaranteed: 8.0.x doesn't have FILLFACTOR, dead space can be
anywhere,
new tuples in the chain can be written on pages much earlier in the
file, causing
seek-dominated load. So, can I call it "fragmentation"?

This is so embarrassing that the file and its only index used up only 3.3MB,
at the time of my testing no one else used the database, shared_buffers is
large enough to hold both the index and the table data totally:

shared_buffers = 4000 # it's about about 31MB on 8.0.x

So, how comes INSERT gets so busy on an empty table (no live records)
that is so small that it fits into the cache and SELECT COUNT(*) returns
immediately?

> The problems people do run into are either
>
> a) lots of dead space because either vacuum (plain old vacuum, not full)
> wasn't run regularly or because large batch updates or deletes were run which
> later activity could never reuse
>

Lots of dead space. Indeed. But on a table that's 2.7MB plus its index
is only 600K?

> b) indexes with bloat either due to the above or due to deleting many but not
> all tuples from a range and then never inserting into that range again.
> indexes can only reuse tuples if you insert in the same page again or if you
> delete all the tuples on the page.
>
> One trick you could use if you can stand the downtime is to periodically
> CLUSTER the table. Older versions of Postgres had a concurrency bugs in
> CLUSTER to watch out for, but as long as you don't run it at the same time as
> a very long-running transaction such as pg_dump it shouldn't be a problem.
>

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


Re: Heavily fragmented table and index data in 8.0.3

From
Alvaro Herrera
Date:
Zoltan Boszormenyi wrote:

> This is so embarrassing that the file and its only index used up only 3.3MB,
> at the time of my testing no one else used the database, shared_buffers is
> large enough to hold both the index and the table data totally:

I would be embarrased if this was on a recent release.  8.0 is an old
release.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Heavily fragmented table and index data in 8.0.3

From
Tom Lane
Date:
Zoltan Boszormenyi <zb@cybertec.at> writes:
> The realtime trace I captured from the hung INSERT shows that it
> enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare().
> The pattern in which these functions entered match either _bt_moveright() or
> _bt_insertonpg().

What that sounds like to me is a corrupt index (infinite loop of
right-links, perhaps).  Have you tried REINDEXing the table?

            regards, tom lane

Re: Heavily fragmented table and index data in 8.0.3

From
Zoltan Boszormenyi
Date:
Tom Lane írta:
> Zoltan Boszormenyi <zb@cybertec.at> writes:
>
>> The realtime trace I captured from the hung INSERT shows that it
>> enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare().
>> The pattern in which these functions entered match either _bt_moveright() or
>> _bt_insertonpg().
>>
>
> What that sounds like to me is a corrupt index (infinite loop of
> right-links, perhaps).  Have you tried REINDEXing the table?
>
>             regards, tom lane
>

No, TRUNCATE was faster because there were no live records in the table.

How can such an infinite loop sneak into an index?
Hardware is from Sun, not a grocery store PC, so I don't suppose it to
be faulty.
Is there anything in the 8.0.x series that fixes this (or a similar) bug?
If I could point to something in the release notes, I may get them to
upgrade
and they may upgrade to a newer generation even.

Thanks.

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


Re: Heavily fragmented table and index data in 8.0.3

From
Andrew Sullivan
Date:
On Thu, Jun 12, 2008 at 05:31:37PM +0200, Zoltan Boszormenyi wrote:

> Hardware is from Sun, not a grocery store PC, so I don't suppose it to
> be faulty.

FWIW, I had a pair of E4500s in one job that I would _happily_ have
traded for any beige box you care to mention.  Heck, at the end I
mighta traded them for a lump of pig iron, they were so flakey.
Having "Sun" on the outside in no way protects you from faulty
hardware.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Heavily fragmented table and index data in 8.0.3

From
"Joshua D. Drake"
Date:

On Thu, 2008-06-12 at 11:56 -0400, Andrew Sullivan wrote:
> On Thu, Jun 12, 2008 at 05:31:37PM +0200, Zoltan Boszormenyi wrote:
>
> > Hardware is from Sun, not a grocery store PC, so I don't suppose it to
> > be faulty.
>
> FWIW, I had a pair of E4500s in one job that I would _happily_ have
> traded for any beige box you care to mention.  Heck, at the end I
> mighta traded them for a lump of pig iron, they were so flakey.
> Having "Sun" on the outside in no way protects you from faulty
> hardware.

Considering that a lot Sun hardware is just rebranded newsys (newisys?)
I wouldn't put too much stock in the old "It's name brand" either. That
is in no way to suggest that newsys is a bad vendor, just that Sun is
just re-branding too.

Sincerely,

Joshua D. Drake



Re: Heavily fragmented table and index data in 8.0.3

From
Tom Lane
Date:
Zoltan Boszormenyi <zb@cybertec.at> writes:
> Tom Lane �rta:
>> What that sounds like to me is a corrupt index (infinite loop of
>> right-links, perhaps).  Have you tried REINDEXing the table?

> How can such an infinite loop sneak into an index?
> Hardware is from Sun, not a grocery store PC, so I don't suppose it to
> be faulty.
> Is there anything in the 8.0.x series that fixes this (or a similar) bug?
> If I could point to something in the release notes, I may get them to
> upgrade
> and they may upgrade to a newer generation even.

Well, for instance there's this bug fixed in 8.0.6:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00206.php

The discussion there only covers what would happen when the bug
manifests on a plain heap relation.  I don't think we ever worked out
what would happen in an index ... but it seems fairly clear that
you would end up with two places linking to the same page, and that
might net out to being a loop, depending...

In any case, there are enough known bugs in 8.0.3 that I hardly
think there needs to be any debate about whether they need to update.

            regards, tom lane