Thread: Limitations on 7.0.3?
I have been using postgres 7.0.3 (Solaris 2.9) for the past 4 years without
any major problems, until about a month ago. We are now experiencing backend
crashes ("connection lost to backend") while running queries (inserts, etc.).
Anyway, I don't want to make this too technical, but I'd just like to
understand whether I'm running into some sort of limit on the size of the
database. My biggest table is currently at 1.5B tuples.

Would appreciate it if anyone could let me know of, or is aware of, any
limits with the 7.0 version.

Thx,
Jose
ARTEAGA Jose wrote:
> I have been using postgres 7.0.3 (Solaris 2.9) for the past 4 years
> without any major problems, until about a month ago. We are now
> experiencing backend crashes ("connection lost to backend") while
> running queries (inserts, etc.). Anyway, I don't want to make this too
> technical, but I'd just like to understand whether I'm running into some
> sort of limit on the size of the database. My biggest table is currently
> at 1.5B tuples.
>
> Would appreciate it if anyone could let me know of, or is aware of, any
> limits with the 7.0 version.

I don't remember any specific size limitations on the 7.0 series. For more
detailed help you'll have to provide some specific error messages.

Is there any chance you could move to a more recent version in the near
future? You'll get much better support with a more recent version (there's
just that many more users). You should also notice some substantial
performance improvements compared to 7.0.

--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
> ARTEAGA Jose wrote:
> > I have been using postgres 7.0.3 (Solaris 2.9) for the past 4 years
> > without any major problems, until about a month ago. We are now
> > experiencing backend crashes ("connection lost to backend") while
> > running queries (inserts, etc.). My biggest table is currently at
> > 1.5B tuples.
> >
> > Would appreciate it if anyone could let me know of, or is aware of, any
> > limits with the 7.0 version.
>
> I don't remember any specific size limitations on the 7.0 series. For
> more detailed help you'll have to provide some specific error messages.

7.0 didn't have any protection against Xid wraparound. As soon as you hit
the 4 billion transaction mark, your data suddenly disappeared. That's what
I heard at least -- I didn't have much experience with such an old beast.
We switched rather quickly to 7.1.

--
Alvaro Herrera    http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers
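To make the failure mode Alvaro describes concrete: transaction IDs in those
releases were plain 32-bit counters, and visibility checks before 7.2
compared them more or less directly, so once the counter wrapped past 2^32,
an old committed row's xid suddenly looked like it came from the "future"
and the row vanished from view. A minimal sketch of the arithmetic follows;
the names are illustrative, not the 7.0 source, and the circular comparison
is the style later releases adopted:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /* Naive ordering in the spirit of pre-7.2 visibility checks: once the
     * 32-bit counter wraps to small values, every old xid looks "newer". */
    static int naive_precedes(TransactionId a, TransactionId b)
    {
        return a < b;
    }

    /* Circular (modulo-2^32) ordering, the style adopted later: ids less
     * than 2^31 apart still compare correctly across the wrap. */
    static int circular_precedes(TransactionId a, TransactionId b)
    {
        return (int32_t) (a - b) < 0;
    }

    int main(void)
    {
        TransactionId old_row = 4294967000u;  /* committed just before the wrap */
        TransactionId current = 100u;         /* counter has wrapped around */

        /* Naively, the old row's xid is "in the future", so the row is
         * treated as not yet committed and disappears from query results. */
        printf("naive:    old row precedes current? %s\n",
               naive_precedes(old_row, current) ? "yes" : "no");    /* no  */
        printf("circular: old row precedes current? %s\n",
               circular_precedes(old_row, current) ? "yes" : "no"); /* yes */
        return 0;
    }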
On Wed, Jun 06, 2007 at 02:40:08PM -0500, ARTEAGA Jose wrote:
> I have been using postgres 7.0.3 (Solaris 2.9) for the past 4 years
> without any major problems, until about a month ago. We are now
> experiencing backend crashes ("connection lost to backend") while
> running queries (inserts, etc.). Anyway, I don't want to make this too
> technical, but I'd just like to understand whether I'm running into some
> sort of limit on the size of the database. My biggest table is currently
> at 1.5B tuples.

I don't believe it's a size limit. But 7.0 is pre-WAL. Are you running with
the -F switch turned on, for performance? It could well be that you are
running into data corruption that crashes the database. (The lack of WAL is
not the only reason you might be running into that. 7.0 is a long time ago,
and there are a lot of bugs that have been squished since then.)

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
However important originality may be in some fields, restraint and adherence
to procedure emerge as the more significant virtues in a great many others.
    --Alain de Botton
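For readers who never ran these versions: the -F switch told the backend to
skip fsync() after writes, trading durability for speed. A minimal sketch of
what that trade looks like at the system-call level, assuming nothing beyond
POSIX; none of this is PostgreSQL source:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Append a record and, if `durable` is set, force it to stable storage.
     * Without the fsync() (which is roughly what running with -F amounted
     * to), the bytes may still sit in the OS page cache at crash time,
     * leaving torn or missing pages that a pre-WAL backend can neither
     * detect nor repair. */
    static int write_record(const char *path, const char *rec, int durable)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (fd < 0)
            return -1;
        if (write(fd, rec, strlen(rec)) < 0) {
            close(fd);
            return -1;
        }
        if (durable && fsync(fd) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }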
I've looked at the pg_index table and we are currently at 15 million
entries, which should be OK. After 2-3 days of runtime I just get a
disconnect error from the backend while doing an insert. After I restore the
DB and insert the same entries, it runs fine. Following is the error I get:

"Query pgsql8.1: PGRES_FATAL_ERROR, pqReadData() -- backend closed the
channel unexpectedly.\n\tThis probably means the backend terminated
abnormally\n\tbefore or while processing the request.\n INSERT INTO
teststeprun (description, stepunqid, stepid, stepstarttime, duration,
parentsn, stepresult) VALUES ('Qos Reset', '24920757/3267', ' 95. 4',
'2007-06-02 19:02:05', '0 0:0:25', '5311955', '0')"

Also worth mentioning: I just this week found out about a very, very
important parameter, "shared buffers". Ever since the original person set up
our PG (an individual no longer with us), this DB had been running without
any major glitches, albeit slowly. All this time the shared buffers were
running at the default of 64 (8192 block size). Once I got this back up and
running, I set this to 16000 shared buffers (~125MB). I've since noticed a
dramatic performance improvement; I hope that I've struck gold. But I cannot
claim victory yet; it's only been up for 2 days.

-Jose
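For what it's worth, the arithmetic behind those settings (a
back-of-the-envelope check only; it assumes the stock 8192-byte block size
mentioned above):

    #include <stdio.h>

    int main(void)
    {
        const long block_size  = 8192;   /* stock block size, per the post */
        const long old_buffers = 64;     /* the 7.0-era default            */
        const long new_buffers = 16000;  /* the revised setting            */

        printf("old: %ld buffers * %ld bytes = %ld KB\n",
               old_buffers, block_size, old_buffers * block_size / 1024);
        /* -> 512 KB: tiny for a table with millions of rows */
        printf("new: %ld buffers * %ld bytes = %ld MB\n",
               new_buffers, block_size,
               new_buffers * block_size / (1024 * 1024));
        /* -> 125 MB, matching the figure quoted above */
        return 0;
    }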
On Thu, Jun 07, 2007 at 02:32:09PM -0500, ARTEAGA Jose wrote:
> I've looked at the pg_index table and we are currently at 15 million
> entries, which should be OK. After 2-3 days of runtime I just get a
> disconnect error from the backend while doing an insert. After I restore
> the DB and insert the same entries, it runs fine. Following is the error
> I get:

Note: Version 7.0 *will* eat your data eventually. It has no protection
against XID wraparound; you'll just start noticing data disappearing.

From the documentation:

    Prior to PostgreSQL 7.2, the only defense against XID wraparound was to
    re-initdb at least every 4 billion transactions.

Have a nice day,

--
Martijn van Oosterhout <kleptog@svana.org>  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
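To get a rough feel for how quickly that 4-billion ceiling arrives, here is
a small calculation with assumed transaction rates; the thread doesn't state
the actual workload, so the rates below are purely illustrative:

    #include <stdio.h>

    int main(void)
    {
        const double xid_space = 4294967296.0;       /* 2^32 transaction ids */
        const double rates[]   = { 1e6, 1e7, 1e8 };  /* assumed xacts/day    */

        for (int i = 0; i < 3; i++)
            printf("at %.0e xacts/day: wraparound in roughly %.0f days\n",
                   rates[i], xid_space / rates[i]);
        /* ~4295, ~430, and ~43 days respectively */
        return 0;
    }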
Martijn van Oosterhout wrote:
> On Thu, Jun 07, 2007 at 02:32:09PM -0500, ARTEAGA Jose wrote:
>> I've looked at the pg_index table and we are currently at 15 million
>> entries, which should be OK. After 2-3 days of runtime I just get a
>> disconnect error from the backend while doing an insert. After I restore
>> the DB and insert the same entries, it runs fine. Following is the error
>> I get:
>
> Note: Version 7.0 *will* eat your data eventually. It has no protection
> against XID wraparound; you'll just start noticing data disappearing.
>
> From the documentation:
>     Prior to PostgreSQL 7.2, the only defense against XID wraparound was
>     to re-initdb at least every 4 billion transactions.

This is a very kind way of saying you are completely nuts for running 7.0
anything. The minimum you should be running is 7.4, and preferably 8.2.4.

Joshua D. Drake

> Have a nice day,

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
ARTEAGA Jose wrote:
> Also worth mentioning: I just this week found out about a very, very
> important parameter, "shared buffers". Ever since the original person set
> up our PG (an individual no longer with us), this DB had been running
> without any major glitches, albeit slowly. All this time the shared
> buffers were running at the default of 64 (8192 block size). Once I got
> this back up and running, I set this to 16000 shared buffers (~125MB).
> I've since noticed a dramatic performance improvement; I hope that I've
> struck gold. But I cannot claim victory yet; it's only been up for 2
> days.

If you think increasing shared buffers in 7.0 improves things, you should
see what upgrading to 8.2.4 will do. Seriously, you'll freak at the increase
in speed.

7.0 = handcar: http://www.handcar.net/
8.2 = TGV: http://en.wikipedia.org/wiki/TGV

(p.s. please upgrade before 7.0 eats your data...)
ARTEAGA Jose wrote:
> I have spent the last month battling and looking deeper into the issue;
> here's a summary of where I'm at:
> - Increasing shared buffers improved performance but did not resolve the
>   backend FATAL disconnect error.
> - Dumping and recreating the entire database also did not resolve the
>   issue.

OK, so it's not a corrupted index/file then.

> - Re-initializing the DB and recreating from the dump also did not
>   resolve the issue.
> In both cases above the issue re-occurred within 2-3 days of run-time
> (insert of new records).
>
> I got the issue narrowed down to the point where I was able to re-create
> the issue at will by just inserting enough data; the data content did
> not matter. The issue always occurred while inserting into my
> "teststeprun" table, which is the largest of my tables (~15 million
> rows). The issue is that once I got this table to a certain size, the
> backend would crash.
>
> Since I was able to reproduce it, I then decided to analyze the core
> dumps. Looking at the core dumps I immediately began to see a pattern;
> even the same pattern was there in the initial core dumps from when the
> problem began occurring two months ago. In every case the dump indicated
> that the last instruction was always in the call to tag_hash(). I also
> noticed that each time, the values passed to tag_hash which are used to
> generate the key were just below the 32-bit max value, and tag_hash
> should be returning a uint32 value. Now I'm really suspecting that there
> is some issue with this. Below are the traces of the four core dumps
> which point to the issue I'm suspecting.

I think tag_hash (in /backend/utils/hash/hashfn.c) is responsible for
internal hash-tables (rather than hash indexes). It takes a pointer to a key
to hash and a keysize (in bytes), so either the pointer is bad or the size
is too long and it's reading off the end.

At the other end of your call stack, _bt_insertonpg
(/backend/access/nbtree/nbtinsert.c) is inserting into a btree index. In one
case it's splitting the index page it tries to insert into (because it's
full), but not in the others.

If it's not a hardware-related problem, then it's a bug, but you're unlikely
to get a fix given how old the code is. If an upgrade to 8.2 looks like it
will take a lot of effort, perhaps consider an intermediate upgrade to 7.2;
I think schemas were introduced in 7.3, so anything before that should be
easier.

There is a chance that you might reduce the problem by REINDEXing the table
concerned every night. That's just a guess though, and your real solution
will be to upgrade to something more recent.

--
Richard Huxton
Archonet Ltd
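For context, tag_hash in that code base is essentially a byte-wise hash over
a caller-supplied key. A minimal sketch of the pattern (illustrative only;
the real function used a different mixing formula):

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of a tag_hash-style function: hash `keysize` bytes starting
     * at `key`.  If the caller hands it a bad pointer or an oversized
     * keysize, the loop reads memory it doesn't own -- the failure mode
     * Richard describes -- even though the function itself is trivial. */
    static uint32_t tag_hash_sketch(const void *key, size_t keysize)
    {
        const unsigned char *k = key;
        uint32_t h = 0;

        for (size_t i = 0; i < keysize; i++)
            h = h * 31 + k[i];      /* simple multiplicative mix */
        return h;
    }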
I have spent the last month battling and looking deeper into the issue;
here's a summary of where I'm at:

- Increasing shared buffers improved performance but did not resolve the
  backend FATAL disconnect error.
- Dumping and recreating the entire database also did not resolve the issue.
- Re-initializing the DB and recreating from the dump also did not resolve
  the issue.

In both cases above the issue re-occurred within 2-3 days of run-time
(insert of new records).

I got the issue narrowed down to the point where I was able to re-create the
issue at will by just inserting enough data; the data content did not
matter. The issue always occurred while inserting into my "teststeprun"
table, which is the largest of my tables (~15 million rows). The issue is
that once I got this table to a certain size, the backend would crash.

Since I was able to reproduce it, I then decided to analyze the core dumps.
Looking at the core dumps I immediately began to see a pattern; even the
same pattern was there in the initial core dumps from when the problem began
occurring two months ago. In every case the dump indicated that the last
instruction was always in the call to tag_hash(). I also noticed that each
time, the values passed to tag_hash which are used to generate the key were
just below the 32-bit max value, and tag_hash should be returning a uint32
value. Now I'm really suspecting that there is some issue with this. Below
are the traces of the four core dumps which point to the issue I'm
suspecting.

This is likely already resolved in newer versions, which BTW I am planning
to upgrade to (8.2) in the coming months. But I'd still like to know if
anyone out there knows about this issue and could confirm my suspicions.

DUMP1
core 'pg_core.20070603' of 22757: /home/ttadmin/postgresql/bin/postmaster -D /home/ttadmin/postgresql/da
 001db000 tag_hash (ff4001d8, c, 1daff4, fef8ccd0, fef8d00c, 1) + c
 001da134 call_hash (263370, ff4001d8, 0, 1, 301688, 300740) + 2c
 001da1f8 hash_search (263370, ff4001d8, 0, ff40015b, 301688, 300740) + 64
 00144ae8 BufTableLookup (ff4001d8, ff4001d8, 125aa000, 244c00, 92d4, fefe3c) + 3c
 001458e4 BufferAlloc (2d23e8, 3, ff40027f, 0, 92d3, ff1764) + 94
 00145420 ReadBufferWithBufferLock (2d23e8, 3, 0, 0, 92d2, ff00dc) + f0
 00145314 ReadBuffer (2d23e8, 3, 0, fef8d360, fef8e020, ff001c) + 18
 0005fdec _bt_getbuf (2d23e8, 3, 2, 10000, fef8ecb0, 3) + 28
 000601fc _bt_getstackbuf (2d23e8, 3003c8, 2, fef8ccd0, fef8d00c, 1) + 28
 0005bc8c _bt_insertonpg (2d23e8, 1, 3003c8, 1, 301688, 300740) + d04

DUMP2
core 'core' of 9640: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
 001db000 tag_hash (ff4001c8, c, 1daff4, fd4f13d0, fd4f170c, 1) + c
 001da134 call_hash (262b50, ff4001c8, 0, 1, 465cf8, 464db0) + 2c
 001da1f8 hash_search (262b50, ff4001c8, 0, ff40014b, 465cf8, 464db0) + 64
 00144ae8 BufTableLookup (ff4001c8, ff4001c8, 124f6000, 244c00, 927a, 10efd64) + 3c
 001458e4 BufferAlloc (428980, 3, ff40026f, 0, 9279, 10efd7c) + 94
 00145420 ReadBufferWithBufferLock (428980, 3, 0, 0, 9278, 10efd94) + f0
 00145314 ReadBuffer (428980, 3, 0, fd4f1a60, fd4f2720, 10efdac) + 18
 0005fdec _bt_getbuf (428980, 3, 2, 10000, fd4f33b0, 3) + 28
 000601fc _bt_getstackbuf (428980, 464a38, 2, fd4f13d0, fd4f170c, 1) + 28

DUMP3
core 'core' of 10466: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
 001db000 tag_hash (ff4001c8, c, 1daff4, fd6093d0, fd60970c, 1) + c
 001da134 call_hash (262b50, ff4001c8, 0, 1, 463be0, 462c98) + 2c
 001da1f8 hash_search (262b50, ff4001c8, 0, ff40014b, 463be0, 462c98) + 64
 00144ae8 BufTableLookup (ff4001c8, ff4001c8, 124f6000, 244c00, 927a, 10ef044) + 3c
 001458e4 BufferAlloc (428848, 3, ff40026f, 0, 9279, 10ef05c) + 94
 00145420 ReadBufferWithBufferLock (428848, 3, 0, 0, 9278, 10ef074) + f0
 00145314 ReadBuffer (428848, 3, 0, fd609a60, fd60a720, 10ef08c) + 18
 0005fdec _bt_getbuf (428848, 3, 2, 10000, fd60b3b0, 3) + 28
 000601fc _bt_getstackbuf (428848, 462920, 2, fd6093d0, fd60970c, 1) + 28

DUMP4
core 'core' of 12400: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
 001db000 tag_hash (ff4001f0, c, 1daff4, 19c7, ce, 1) + c
 001da134 call_hash (262b50, ff4001f0, 0, 22, ffffffff, 455bd0) + 2c
 001da1f8 hash_search (262b50, ff4001f0, 0, ff400173, f6000000, 0) + 64
 00144ae8 BufTableLookup (ff4001f0, ff4001f0, 0, 7, 246000, 1146554) + 3c
 001458e4 BufferAlloc (4273d8, ffffffff, ff400297, 0, 0, 0) + 94
 00145420 ReadBufferWithBufferLock (4273d8, ffffffff, 0, 0, 0, 0) + f0
 00145314 ReadBuffer (4273d8, ffffffff, 6, 260000, 0, 0) + 18
 0005fe2c _bt_getbuf (4273d8, ffffffff, 2, 455f18, f9583a60, ffbaa0e8) + 68
 0005cb8c _bt_split (4273d8, 1, 456c48, 19c7, ce, 1) + 28
 0005bd38 _bt_insertonpg (4273d8, 19c7, 455988, 1, 456c48, 455bd0) + db0

Rgds/
Jose Arteaga
Richard Huxton <dev@archonet.com> writes:
> I think tag_hash (in /backend/utils/hash/hashfn.c) is responsible for
> internal hash-tables (rather than hash indexes). It takes a pointer to a
> key to hash and a keysize (in bytes), so either the pointer is bad or
> the size is too long and it's reading off the end.

Those stack traces hardly seem credible --- the key being hashed is a local
variable in BufferAlloc, so the pointer can't really be bad, and the length
apparently is 12 as it should be. I was wondering if maybe the hashp->hash
function pointer got corrupted, but that seems unlikely too seeing that it's
part of a struct that never changes.

> If it's not a hardware-related problem, then it's a bug, but you're
> unlikely to get a fix given how old the code is.

If you can reproduce the problem in something reasonably recent, we'd be
interested in taking a look. Nobody is going to spend any time on 7.0.x
though. It *will* eat your data someday ... you need to put more time into
getting off it and less into studying its bugs.

			regards, tom lane
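To visualize what Tom is pointing at in those traces: BufferAlloc builds a
small fixed-size buffer tag on its own stack, and BufTableLookup hashes
those bytes to find the page in the shared buffer table. A sketch of that
pattern follows, with illustrative field and function names (not the 7.0
structs), showing why the key pointer is necessarily valid and the keysize
is the 12 (0xc) seen in every trace:

    #include <stdint.h>
    #include <string.h>

    /* Illustrative 12-byte buffer tag: three 32-bit fields, matching the
     * keysize of 0xc in the traces.  Field names are assumptions, not the
     * actual 7.0 struct. */
    typedef struct BufferTagSketch
    {
        uint32_t dbId;      /* database  */
        uint32_t relId;     /* relation  */
        uint32_t blockNum;  /* block     */
    } BufferTagSketch;

    /* Same byte-wise hashing sketch as earlier in the thread. */
    static uint32_t tag_hash_sketch(const void *key, size_t keysize)
    {
        const unsigned char *k = key;
        uint32_t h = 0;
        for (size_t i = 0; i < keysize; i++)
            h = h * 31 + k[i];
        return h;
    }

    /* The lookup pattern from the crashing call chain: the tag is a local
     * variable (so the key pointer cannot be bad, as Tom notes) and the
     * keysize is sizeof(tag) == 12. */
    static uint32_t buf_table_hash(uint32_t dbId, uint32_t relId,
                                   uint32_t blockNum)
    {
        BufferTagSketch tag;
        memset(&tag, 0, sizeof(tag));   /* keep any padding deterministic */
        tag.dbId = dbId;
        tag.relId = relId;
        tag.blockNum = blockNum;
        return tag_hash_sketch(&tag, sizeof(tag));
    }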