Thread: Heavily fragmented table and index data in 8.0.3
Hi, we have a customer with PostgreSQL 8.0.3 with a quite interesting problem. They have around 24 identical databases and all but one is working nicely. The one that doesn't work nicely show this problem: INSERT "hangs" on an apparently empty table where "select count(*)" returns 0 quite quickly. The relfilenodes of the table and its only (non-unique) index are below: > ls -l ./17230/20387 ./17230/20382 -rw------- 1 postgres postgres 2727936 Jun 6 03:31 ./17230/20382 -rw------- 1 postgres postgres 630784 May 24 13:18 ./17230/20387 The machine is: > uname -a SunOS ihds00 5.10 Generic_125100-10 sun4u sparc SUNW,Netra-T12 The realtime trace I captured from the hung INSERT shows that it enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare(). The pattern in which these functions entered match either _bt_moveright() or _bt_insertonpg(). Also, VACUUM FULL also takes too much time, on an otherwise idle database, I worked on a copy of their live database. During VACUUM, _bt_getbuf() was also called repeatedly with the block number jumping up and down. Obviously the table data is very fragmented. The total database size is around 366MB, the only client at the time was VACUUM, both the table and the index fit easily into shared_buffers at the same time. I know, 8.0.3 is quite old. But nothing jumped out from the changelog up to 8.0.15 that would explain this excessive slowness. SELECTs are pretty fast on any of the tables I tried, but INSERT hangs on this table. How does this fragmentation happen and how can we prevent this situation? Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
"Zoltan Boszormenyi" <zb@cybertec.at> writes: > Also, VACUUM FULL also takes too much time, on an otherwise idle database, I > worked on a copy of their live database. During VACUUM, _bt_getbuf() was > also called repeatedly with the block number jumping up and down. VACUUM or VACUUM FULL? VACUUM should only read the table sequentially but VACUUM FULL behaves exactly as you describe which is one of the reasons it sucks so much. That said I think 8.0's VACUUM does jump around when cleaning indexes. That's a big reason to upgrade to a more modern version. More recent VACUUM's (but not VACUUM FULL) do only sequential scans of both the table and indexes. VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL regularly on this table you may want to REINDEX this table. > I know, 8.0.3 is quite old. But nothing jumped out from the changelog > up to 8.0.15 that would explain this excessive slowness. SELECTs are > pretty fast on any of the tables I tried, but INSERT hangs on this table. > How does this fragmentation happen and how can we prevent this situation? I'm not sure "fragmentation" has a direct analogy since tuples lie entirely on one page. Unless perhaps you have a problem with TOAST data being laid out poorly. Are any individual rows in tables over 2k? The problems people do run into are either a) lots of dead space because either vacuum (plain old vacuum, not full) wasn't run regularly or because large batch updates or deletes were run which later activity could never reuse b) indexes with bloat either due to the above or due to deleting many but not all tuples from a range and then never inserting into that range again. indexes can only reuse tuples if you insert in the same page again or if you delete all the tuples on the page. One trick you could use if you can stand the downtime is to periodically CLUSTER the table. Older versions of Postgres had a concurrency bugs in CLUSTER to watch out for, but as long as you don't run it at the same time as a very long-running transaction such as pg_dump it shouldn't be a problem. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark írta: > "Zoltan Boszormenyi" <zb@cybertec.at> writes: > > >> Also, VACUUM FULL also takes too much time, on an otherwise idle database, I >> worked on a copy of their live database. During VACUUM, _bt_getbuf() was >> also called repeatedly with the block number jumping up and down. >> > > VACUUM or VACUUM FULL? Read the first line, it was VACCUUM FULL. When I mentioned it second time I didn't spell it out again, sorry. > VACUUM should only read the table sequentially but > VACUUM FULL behaves exactly as you describe which is one of the reasons it > sucks so much. > > That said I think 8.0's VACUUM does jump around when cleaning indexes. That's > a big reason to upgrade to a more modern version. More recent VACUUM's (but > not VACUUM FULL) do only sequential scans of both the table and indexes. > > VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL > regularly on this table you may want to REINDEX this table. > > >> I know, 8.0.3 is quite old. But nothing jumped out from the changelog >> up to 8.0.15 that would explain this excessive slowness. SELECTs are >> pretty fast on any of the tables I tried, but INSERT hangs on this table. >> How does this fragmentation happen and how can we prevent this situation? >> > > I'm not sure "fragmentation" has a direct analogy since tuples lie entirely on > one page. Unless perhaps you have a problem with TOAST data being laid out > poorly. Are any individual rows in tables over 2k? > Let me reiterate my original question: the problem was with INSERT hanging, SELECT COUNT(*) on the table returned immediately telling the nr of rows == 0. What I gathered from "truss -d -u a.out::* -v pollsys -p <pid of backend>" output was: " The realtime trace I captured from the hung INSERT shows that it enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare(). The pattern in which these functions entered match either _bt_moveright() or _bt_insertonpg(). " So, it wasn't actually hung, but was very slow. The page numbers jumped wildly in the trace. It's true that tuples lie inside one page, but tuple chains can exceed one page, right? One dead tuple points the next in the chain which can be on another page. Consecutive or monotonically increasing page numbers aren't guaranteed: 8.0.x doesn't have FILLFACTOR, dead space can be anywhere, new tuples in the chain can be written on pages much earlier in the file, causing seek-dominated load. So, can I call it "fragmentation"? This is so embarrassing that the file and its only index used up only 3.3MB, at the time of my testing no one else used the database, shared_buffers is large enough to hold both the index and the table data totally: shared_buffers = 4000 # it's about about 31MB on 8.0.x So, how comes INSERT gets so busy on an empty table (no live records) that is so small that it fits into the cache and SELECT COUNT(*) returns immediately? > The problems people do run into are either > > a) lots of dead space because either vacuum (plain old vacuum, not full) > wasn't run regularly or because large batch updates or deletes were run which > later activity could never reuse > Lots of dead space. Indeed. But on a table that's 2.7MB plus its index is only 600K? > b) indexes with bloat either due to the above or due to deleting many but not > all tuples from a range and then never inserting into that range again. > indexes can only reuse tuples if you insert in the same page again or if you > delete all the tuples on the page. > > One trick you could use if you can stand the downtime is to periodically > CLUSTER the table. Older versions of Postgres had a concurrency bugs in > CLUSTER to watch out for, but as long as you don't run it at the same time as > a very long-running transaction such as pg_dump it shouldn't be a problem. > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
Zoltan Boszormenyi wrote: > This is so embarrassing that the file and its only index used up only 3.3MB, > at the time of my testing no one else used the database, shared_buffers is > large enough to hold both the index and the table data totally: I would be embarrased if this was on a recent release. 8.0 is an old release. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Zoltan Boszormenyi <zb@cybertec.at> writes: > The realtime trace I captured from the hung INSERT shows that it > enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare(). > The pattern in which these functions entered match either _bt_moveright() or > _bt_insertonpg(). What that sounds like to me is a corrupt index (infinite loop of right-links, perhaps). Have you tried REINDEXing the table? regards, tom lane
Tom Lane írta: > Zoltan Boszormenyi <zb@cybertec.at> writes: > >> The realtime trace I captured from the hung INSERT shows that it >> enters two functions repeatedly: _bt_relandgetbuf() and _bt_compare(). >> The pattern in which these functions entered match either _bt_moveright() or >> _bt_insertonpg(). >> > > What that sounds like to me is a corrupt index (infinite loop of > right-links, perhaps). Have you tried REINDEXing the table? > > regards, tom lane > No, TRUNCATE was faster because there were no live records in the table. How can such an infinite loop sneak into an index? Hardware is from Sun, not a grocery store PC, so I don't suppose it to be faulty. Is there anything in the 8.0.x series that fixes this (or a similar) bug? If I could point to something in the release notes, I may get them to upgrade and they may upgrade to a newer generation even. Thanks. -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
On Thu, Jun 12, 2008 at 05:31:37PM +0200, Zoltan Boszormenyi wrote: > Hardware is from Sun, not a grocery store PC, so I don't suppose it to > be faulty. FWIW, I had a pair of E4500s in one job that I would _happily_ have traded for any beige box you care to mention. Heck, at the end I mighta traded them for a lump of pig iron, they were so flakey. Having "Sun" on the outside in no way protects you from faulty hardware. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Thu, 2008-06-12 at 11:56 -0400, Andrew Sullivan wrote: > On Thu, Jun 12, 2008 at 05:31:37PM +0200, Zoltan Boszormenyi wrote: > > > Hardware is from Sun, not a grocery store PC, so I don't suppose it to > > be faulty. > > FWIW, I had a pair of E4500s in one job that I would _happily_ have > traded for any beige box you care to mention. Heck, at the end I > mighta traded them for a lump of pig iron, they were so flakey. > Having "Sun" on the outside in no way protects you from faulty > hardware. Considering that a lot Sun hardware is just rebranded newsys (newisys?) I wouldn't put too much stock in the old "It's name brand" either. That is in no way to suggest that newsys is a bad vendor, just that Sun is just re-branding too. Sincerely, Joshua D. Drake
Zoltan Boszormenyi <zb@cybertec.at> writes: > Tom Lane �rta: >> What that sounds like to me is a corrupt index (infinite loop of >> right-links, perhaps). Have you tried REINDEXing the table? > How can such an infinite loop sneak into an index? > Hardware is from Sun, not a grocery store PC, so I don't suppose it to > be faulty. > Is there anything in the 8.0.x series that fixes this (or a similar) bug? > If I could point to something in the release notes, I may get them to > upgrade > and they may upgrade to a newer generation even. Well, for instance there's this bug fixed in 8.0.6: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00206.php The discussion there only covers what would happen when the bug manifests on a plain heap relation. I don't think we ever worked out what would happen in an index ... but it seems fairly clear that you would end up with two places linking to the same page, and that might net out to being a loop, depending... In any case, there are enough known bugs in 8.0.3 that I hardly think there needs to be any debate about whether they need to update. regards, tom lane