Re: Heavily fragmented table and index data in 8.0.3 - Mailing list pgsql-general

From Zoltan Boszormenyi
Subject Re: Heavily fragmented table and index data in 8.0.3
Date
Msg-id 48510B0A.5050504@cybertec.at
In response to Re: Heavily fragmented table and index data in 8.0.3  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Heavily fragmented table and index data in 8.0.3  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Heavily fragmented table and index data in 8.0.3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Gregory Stark wrote:
> "Zoltan Boszormenyi" <zb@cybertec.at> writes:
>
>
>> Also, VACUUM FULL also takes too much time, on an otherwise idle database, I
>> worked on a copy of their live database. During VACUUM, _bt_getbuf() was
>> also called repeatedly with the block number jumping up and down.
>>
>
> VACUUM or VACUUM FULL?

Read the first line: it was VACUUM FULL. When I mentioned it the second time
I didn't spell it out again, sorry.

>  VACUUM should only read the table sequentially but
> VACUUM FULL behaves exactly as you describe which is one of the reasons it
> sucks so much.
>
> That said I think 8.0's VACUUM does jump around when cleaning indexes. That's
> a big reason to upgrade to a more modern version. More recent VACUUMs (but
> not VACUUM FULL) do only sequential scans of both the table and indexes.
>
> VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL
> regularly on this table you may want to REINDEX this table.
>
>
>> I know, 8.0.3 is quite old. But nothing jumped out from the changelog
>> up to 8.0.15 that would explain this excessive slowness. SELECTs are
>> pretty fast on any of the tables I tried, but INSERT hangs on this table.
>> How does this fragmentation happen and how can we prevent this situation?
>>
>
> I'm not sure "fragmentation" has a direct analogy since tuples lie entirely on
> one page. Unless perhaps you have a problem with TOAST data being laid out
> poorly. Are any individual rows in tables over 2k?
>

Let me reiterate my original question: the problem was INSERT hanging, while
SELECT COUNT(*) on the same table returned immediately, reporting 0 rows.
What I gathered from the "truss -d -u a.out::* -v pollsys -p <pid of backend>"
output was:
"

The realtime trace I captured from the hung INSERT shows that it
enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare().
The pattern in which these functions are entered matches either
_bt_moveright() or _bt_insertonpg().

"

So, it wasn't actually hung, but it was very slow. The page numbers jumped
wildly in the trace. It's true that a tuple lies entirely on one page, but an
update chain can span more than one page, right? Each dead tuple points to the
next one in the chain, which can be on another page. Consecutive or
monotonically increasing page numbers aren't guaranteed: 8.0.x doesn't have
FILLFACTOR, dead space can be anywhere, and new tuples in the chain can be
written on pages much earlier in the file, causing a seek-dominated load. So,
can I call it "fragmentation"?
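
In case it's useful, here is a rough way to quantify the dead space I'm
talking about (the table and index names below are placeholders for the real
ones; relpages * 8192 assumes the default 8K block size, and reltuples is only
as fresh as the last VACUUM/ANALYZE):

    -- compare pages on disk with the live row count from the statistics
    SELECT relname, relpages, reltuples, relpages * 8192 AS approx_bytes
    FROM pg_class
    WHERE relname IN ('problem_table', 'problem_table_idx');

With no live rows but a few hundred pages on disk, practically everything in
those files is dead space.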

The embarrassing part is that the table and its only index take up only 3.3MB;
at the time of my testing no one else was using the database, and
shared_buffers is large enough to hold both the index and the table data
entirely:

shared_buffers = 4000 # about 31MB on 8.0.x

So, how come INSERT gets so busy on an empty table (no live records) that is
so small it fits into the cache, while SELECT COUNT(*) returns immediately?
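
If exact numbers would help, I can run a plain VACUUM VERBOSE on the copy;
something along these lines (placeholder table name) reports the removable and
nonremovable row versions in the heap and the page counts for each index:

    VACUUM VERBOSE problem_table;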

> The problems people do run into are either
>
> a) lots of dead space because either vacuum (plain old vacuum, not full)
> wasn't run regularly or because large batch updates or deletes were run which
> later activity could never reuse
>

Lots of dead space, indeed. But on a table that's only 2.7MB, with an index
of only 600K?

> b) indexes with bloat either due to the above or due to deleting many but not
> all tuples from a range and then never inserting into that range again.
> Indexes can only reuse tuples if you insert in the same page again or if you
> delete all the tuples on the page.
>
> One trick you could use if you can stand the downtime is to periodically
> CLUSTER the table. Older versions of Postgres had concurrency bugs in
> CLUSTER to watch out for, but as long as you don't run it at the same time as
> a very long-running transaction such as pg_dump it shouldn't be a problem.
>
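
For completeness, on 8.0 the maintenance you describe would look roughly like
this (names are placeholders; CLUSTER rewrites the heap in index order and
rebuilds the table's indexes as a side effect, so REINDEX on its own is enough
when only the index is bloated):

    -- 8.0-era syntax: CLUSTER index_name ON table_name
    CLUSTER problem_table_idx ON problem_table;
    -- or, if only the index has bloated:
    REINDEX TABLE problem_table;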

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

