Thread: xid wrap / optimize frozen tables?
Hi,

as many before, I ran into the issue of a postgresql database (8.4.1)

- committing many transactions
- to huge volume tables (3-figure GB in size)
- running the xid wrap vacuum (to freeze tuples)

where the additional read IO load has a negative impact to the extent of the system becoming unusable.

Besides considering the fact that this can be worked around by exchanging printed sheets of paper or plastic (hello to .au) for hardware, I'd very much appreciate answers to these questions:

* have I missed any more recent improvements regarding this problem? My understanding is that the full scan for unfrozen tuples can be made less likely (by reducing the number of transactions and tuning the autovac), but that it is still required. Is this correct?

* A pretty obvious idea seems to be to add special casing for "fully frozen tables": if we could register the fact that a table is fully frozen (has no new tuples after the complete-freeze xid), a vacuum would get reduced to just increasing that "last frozen" xid.

It seems like Alvaro Herrera had implemented a patch along the lines of this idea, but I fail to find any other references to it:

http://grokbase.com/t/postgresql/pgsql-hackers/0666gann3t/how-to-avoid-transaction-id-wrap#200606113hlzxtcuzrcsfwc4pxjimyvwgu

Does anyone have pointers to what happened to the patch?

Thanks, Nils
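For context, the urgency of the wraparound vacuum can be read from the catalogs: age(relfrozenxid) gives the distance from a table's oldest unfrozen xid to the current xid, and autovacuum forces a freeze scan once it passes autovacuum_freeze_max_age. A minimal query along these lines (the ORDER BY and LIMIT are just for illustration):

-- List tables by wraparound urgency; the larger age(relfrozenxid),
-- the sooner an anti-wraparound vacuum is due.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       pg_size_pretty(pg_table_size(c.oid)) AS table_size
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;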
On Sat, May 23, 2015 at 03:46:33PM +0200, Nils Goroll wrote:
> Hi,
>
> as many before, I ran into the issue of a postgresql database (8.4.1)

Uh, did you mean 9.4.1, as 8.4.1 is end-of-lifed?

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ Everyone has their own god. +
Nils Goroll <slink@schokola.de> writes:
> as many before, I ran into the issue of a postgresql database (8.4.1)

*Please* tell us that was a typo. If it wasn't, there were possibly-relevant fixes in 8.4.6, 8.4.9, and perhaps later; I got tired of scanning the release notes.

8.4.x is out of support entirely now, of course, but it ought to be fairly painless to drop in the last 8.4.x release (8.4.22) and see if that makes things any better. If not, you ought to be thinking about an upgrade to something 9.recent rather than trying to hack up 8.4.

regards, tom lane
On 23/05/15 16:50, Tom Lane wrote:
>> as many before, I ran into the issue of a postgresql database (8.4.1)
> *Please* tell us that was a typo.

Yes it was, my sincere apologies. It's 9.4.1.

Nils
On Sat, May 23, 2015 at 6:46 AM, Nils Goroll <slink@schokola.de> wrote:
> Hi,
>
> as many before, I ran into the issue of a postgresql database (8.4.1)
>
> - committing many transactions
> - to huge volume tables (3-figure GB in size)
> - running the xid wrap vacuum (to freeze tuples)
>
> where the additional read IO load has a negative impact to the extent of the
> system becoming unusable.
(9.4.1 noted)
Are you sure it is the read IO that causes the problem? It should be pretty light, as it would mostly be satisfied by read-ahead (unless you have GIN indexes) and for pages that aren't automatically prefetched, it just waits patiently for the requested data to arrive. (As opposed to writing, in which it runs around dirtying things that other processes need, clogging up their IO rather than just its own).
What monitoring techniques do you use to determine the source of the slowdown?
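For what it's worth, one way to get that breakdown from within Postgres is pg_statio_user_tables, which splits block requests into kernel reads and shared-buffer hits. A sketch; the counters are cumulative, so they read best if stats were reset shortly before the vacuum began:

-- heap_blks_read = blocks requested from the kernel,
-- heap_blks_hit  = blocks already found in shared buffers.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_read + heap_blks_hit, 0), 1) AS hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;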
> Besides considering the fact that this can be worked around by exchanging
> printed sheets of paper or plastic (hello to .au) for hardware, I'd very much
> appreciate answers to these questions:
>
> * have I missed any more recent improvements regarding this problem? My
> understanding is that the full scan for unfrozen tuples can be made less likely
> (by reducing the number of transactions and tuning the autovac), but that it is
> still required. Is this correct?
>
> * A pretty obvious idea seems to be to add special casing for "fully frozen
> tables": if we could register the fact that a table is fully frozen (has no new
> tuples after the complete-freeze xid), a vacuum would get reduced to just
> increasing that "last frozen" xid.
>
> It seems like Alvaro Herrera had implemented a patch along the lines of this
> idea, but I fail to find any other references to it:
>
> http://grokbase.com/t/postgresql/pgsql-hackers/0666gann3t/how-to-avoid-transaction-id-wrap#200606113hlzxtcuzrcsfwc4pxjimyvwgu
>
> Does anyone have pointers to what happened to the patch?
I don't know what happened to that, but there is another patch waiting for review and testing:

https://commitfest.postgresql.org/5/221/
Cheers,
Jeff
Hi Jeff and all,

On 23/05/15 22:13, Jeff Janes wrote:
> Are you sure it is the read IO that causes the problem?

Yes. Trouble is here that we are talking about a 361 GB table

                             List of relations
 Schema |            Name             |  Type  |  Owner   |  Size  | Description
--------+-----------------------------+--------+----------+--------+-------------
 public | *redacted*_y2015m04         | table  | postgres | 361 GB |

and while we have

shared_buffers = 325GB
huge_pages = on

this is not the only table of this size (total db size is 1.8TB) and more current data got written to *redacted*_y2015m05 (the manually-partitioned table for May), so most of the m04 data would have been evicted from the cache when this issue surfaced initially.

There is one application pushing data (bulk inserts) and we have transaction rates for this app in a log. The moment the vacuum started, these rates dropped. Unfortunately I cannot present helpful log excerpts here as the autovacuum has not finished so far (because the admin killed the db), so we have zero logging about past autovac events.

At the moment, the application is shut down and the machine is only running the vacs:

query_start | 2015-05-22 19:33:52.44334+02
waiting     | f
query       | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent wraparound)

query_start | 2015-05-22 19:34:02.46004+02
waiting     | f
query       | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to prevent wraparound)

so we know that any io must be caused by the vacs:

shell# uptime
13:33:33 up 1 day, 18:01, 2 users, load average: 5.75, 12.71, 8.43

shell# zpool iostat
                   capacity     operations    bandwidth
pool            alloc   free   read  write   read  write
--------------- -----  -----  -----  -----  -----  -----
tank1            358G  6.90T    872     55  15.1M  3.08M

Again, we know IO capacity is insufficient; the pool is on 2 magnetic disks only atm, so an avg read rate of 872 IOPS averaged over 42 hours is not even bad.

> I don't know what happened to that, but there is another patch waiting for
> review and testing:
>
> https://commitfest.postgresql.org/5/221/

This is really interesting, thank you very much for the pointer.

Cheers, Nils
On 5/24/15 6:42 AM, Nils Goroll wrote:
> shared_buffers = 325GB

FWIW, a lot of people report performance loss with shared buffers that large. At a minimum, if you're going to set them that large then you want to make sure that the OS has a bare minimum of memory in use for its disk cache, since everything in there will be double cached.

> query_start | 2015-05-22 19:33:52.44334+02
> waiting     | f
> query       | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent wraparound)
>
> query_start | 2015-05-22 19:34:02.46004+02
> waiting     | f
> query       | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to prevent wraparound)

By default autovac has a cost delay > 0, which will significantly slow down those vacuums. Normally that's what you want (so you're not swamping the system), but in this case it's going to be problematic for you.

Also, if there are other tables that need to be frozen, you should experiment with starting those vacuums now, by hand. That means you'll have more IO happening than you have spindles, but my experience is that it can take a lot of extra Postgres backends to actually swamp sequential reads on an IO system. VACUUM's reads are sequential by nature, so you should be able to get close to the maximum theoretical sequential read speed from your drives.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
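Concretely, a foreground VACUUM is not throttled by default (vacuum_cost_delay is 0 outside autovacuum), and the delay can also be switched off per table for autovacuum. A sketch, with some_big_table standing in for the redacted table names:

-- Manual freeze, explicitly unthrottled for this session:
SET vacuum_cost_delay = 0;
VACUUM (FREEZE, VERBOSE) public.some_big_table;

-- Or keep autovacuum in charge, but drop the cost delay for this table only:
ALTER TABLE public.some_big_table SET (autovacuum_vacuum_cost_delay = 0);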
<p dir="ltr"><br /> On May 24, 2015 6:42 AM, "Nils Goroll" <<a href="mailto:slink@schokola.de">slink@schokola.de</a>>wrote:<br /> ><br /> > Hi Jeff and all,<br /> ><br /> >On 23/05/15 22:13, Jeff Janes wrote:<br /> > > Are you sure it is the read IO that causes the problem?<br /> ><br/> > Yes. Trouble is here that we are talking about a 361 GB table<br /> ><br /> > List of relations<br /> > Schema | Name | Type | Owner | Size |<br/> > Description<br /> > --------+-----------------------------+----------+----------+------------+-------------<br/> > public | *redacted*_y2015m04 | table | postgres | 361 GB |<br /> ><br /> > and while we have<br /> ><br />> shared_buffers = 325GB<br /> > huge_pages = on<p dir="ltr">As mentioned, that is very large settingfor share buffers.<p dir="ltr">><br /> > this is not the only table of this size (total db size ist 1.8tb) andmore<br /> > current data got written to *redacted*_y2015m05 (the manually-partitioned table<br /> > for may), somost of the m04 data would have got evicted from the cache when<br /> > this issue surfaced initially.<br /> ><br/> > There is one application pushing data (bulk inserts) and we have transaction<br /> > rates for this appin a log. The moment the vacuum started, these rates dropped.<br /> > Unfortunately I cannot present helpful log excerptshere as the autovacuum never<br /> > finished so far (because the admin killed the db), so we have zero loggingabout<br /> > past autovac events.<p dir="ltr">Could you do an experiment in which you do a large sequential readon the database files and measure the impact on the queries that way? Like:<p dir="ltr">tar -cf - data_dir | wc -c<pdir="ltr">Or better, use some fancy version that throttles to the read rate observed below.<p dir="ltr">> At the moment,the application is shut down and the machine is only running the<br /> > vacs:<br /> ><br /> > query_start | 2015-05-22 19:33:52.44334+02<br /> > waiting | f<br /> > query | autovacuum:VACUUM public.*redacted*_y2015m04 (to prevent<br /> > wraparound)<br /> > query_start | 2015-05-2219:34:02.46004+02<br /> > waiting | f<br /> > query | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05(to<br /> > prevent wraparound)<br /> ><br /> > so we know that any io must be causedby the vacs:<br /> ><br /> > shell# uptime<br /> > 13:33:33 up 1 day, 18:01, 2 users, load average: 5.75,12.71, 8.43<p dir="ltr">What OS is this? This load average looks very high. Does the OS charge processes that areblocked on IO against uptime?<p dir="ltr">> shell# zpool iostat<br /> > capacity operations bandwidth<br /> > pool alloc free read write read write<br /> > --------------- ----- ----- ----- ----- ----- -----<br /> > tank1 358G 6.90T 872 55 15.1M 3.08M<pdir="ltr">I'm not familiar with zpool but this shows a lot of writing going on. If the table was already frozenand just needed to be observed as being all frozen, then it should not be dirtying one block for every 5 blocks read. <p dir="ltr">I would not be surprised if it were the reading, not the writing, which caused the performance problem.<br/><p dir="ltr">Cheers, Jeff
Just FYI: We have worked around these issues by running regular (scripted and thus controlled) vacuums on all tables but the active ones and adding L2 ZFS caching (l2arc). I hope to get back to this again soon.
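Such a script can be as simple as generating the VACUUM statements from the catalogs, most urgent table first, while excluding the active partitions; a rough sketch (the NOT LIKE pattern is just an example for the active month's table):

-- Emit one freeze vacuum per table, oldest relfrozenxid first; feed
-- the output back into psql from cron to run the vacuums under control.
SELECT format('VACUUM (FREEZE, VERBOSE) %s;', c.oid::regclass)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND c.relname NOT LIKE '%y2015m05'
ORDER BY age(c.relfrozenxid) DESC;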
On Wed, Jun 3, 2015 at 2:49 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I would not be surprised if it were the reading, not the writing, which caused the performance problem.
Of course I screwed up that last sentence. I meant the opposite: it would not surprise me if it were the writing that caused the problem, despite there being 5 times less of it.
Jeff