Thread: database troubles - various errors
We're having a lot of trouble with one of our new servers. It's an AMD64 dual-CPU system, running Gentoo Linux with kernel 2.6.7. Data is stored on a JFS partition. Before we installed the server, memory was fully tested and no problems were found. Postgres is version 7.4.3.

We've had a number of different errors. One of the most common seems to be a "Cannot open segment" error. This usually occurs on an index, generally the primary key index. A reindex will fix the problem.

Last night we got this one: ERROR: could not find a feasible split point for "some_index". This one seemed odd because it was on a btree index of one column. A reindex made the error go away. Every table in the database was reindexed at some point last week. The table with the index involved uses all standard data types and has no non-btree indexes.

Another one we've had and fixed was ERROR: duplicate key violates unique constraint "pg_class_oid_index".

One of the most annoying errors is occasional index corruption that is not reported. We make use of GiST tsearch2 indexes, and sometimes they will stop working without an error. The only clue we have that there is a problem is that search words that generally return big results will return no results or very few. A reindex will fix the problem.

Vacuums are done manually, but generally there's at least one every week.

Basically, I'd like to know if anyone has seen any problems similar to this, and if they have any suggestions on how to fix these issues.

-Adam Palmblad
On Mon, 2004-08-23 at 10:14, A Palmblad wrote:
> Basically, I'd like to know if anyone has seen any problems similar to this,
> and if they have any suggestions on how to fix these issues.

Have you tested the memory, hard drive and CPU on this machine to ensure proper operation? This sounds like a hardware problem to me.
Memory has been tested, and it was okay. What's the best way to test the CPU and hard drive / controller?
-Adam

----- Original Message -----
From: "Scott Marlowe" <smarlowe@qwest.net>
To: "A Palmblad" <adampalmblad@yahoo.ca>
Cc: "General Postgres" <pgsql-general@postgresql.org>
Sent: Monday, August 23, 2004 9:41 AM
Subject: Re: [GENERAL] database troubles - various errors

> Have you tested the memory, hard drive and CPU on this machine to ensure
> proper operation? This sounds like a hardware problem to me.
"A Palmblad" <adampalmblad@yahoo.ca> writes: > Basically, I'd like to know if anyone has seen any problems similar to this, > and if they have any suggestions on how to fix these issues. It's not going to answer your issue directly, but see Joe Conway's recent report of similar failures: http://archives.postgresql.org/pgsql-hackers/2004-08/msg01251.php and earlier messages in that thread. Personally I'm wondering about kernel/filesystem bugs. Gentoo's emphasis on bleeding-edge code is fine for dev work but I think it's a poor choice for a production server. regards, tom lane
On Mon, 2004-08-23 at 11:51, A Palmblad wrote:
> Memory has been tested, and it was okay. What's the best way to test the
> CPU and hard drive / controller?
> -Adam

What method did you use for testing the memory?

For testing the hard drive I usually write a large file of random / semi-random garbage that I have the md5 for, then md5 what's on the disk. Over and over, usually for days when testing a new server.

The CPU will cause many other problems should it really be acting up. While it's a remote possibility the CPU is causing the problems, it's more likely another bit of hardware, if not a kernel / fs bug. I'd read the article the other poster mentioned, and try a different FS to see if the problem goes away.

I'm not quite as leery of Gentoo as the other fellow, but I probably wouldn't be using it in production either.

Do you have an unusual setup? HW/RAID/LVM etc?
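
For anyone wanting to try the disk test described above (write a file with a known md5, then md5 what is on the disk, over and over), here is a rough Python sketch. It is only one possible reading of that procedure, not the exact method used in this thread. The function names (`write_random_file`, `md5_of_file`, `run_passes`), the chunk size, and the file sizes are all made up for illustration. Note also that the operating system may serve re-reads from its page cache rather than from the disk, so for a real test the file would probably need to be larger than RAM and the passes run for a long time.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # read/write in 1 MiB blocks (arbitrary choice)


def write_random_file(path, size_bytes):
    """Write size_bytes of random data to path.

    Returns the md5 hex digest of the data as it was generated,
    i.e. the checksum we expect to read back later.
    """
    digest = hashlib.md5()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return digest.hexdigest()


def md5_of_file(path):
    """Return the md5 hex digest of the file's current contents."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def run_passes(path, size_bytes, passes):
    """Write and re-read the file `passes` times.

    Returns the list of pass numbers (starting at 1) whose checksum
    read back from the file did not match what was written.
    """
    failures = []
    for n in range(1, passes + 1):
        expected = write_random_file(path, size_bytes)
        if md5_of_file(path) != expected:
            failures.append(n)
    return failures


if __name__ == "__main__":
    # Tiny demo in a temporary directory; a real test would point at the
    # suspect partition and use a much larger file and many more passes.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "disktest.bin")
        bad = run_passes(target, 4 * CHUNK, passes=3)
        print("failed passes:", bad)
```

Any pass number showing up in the returned list would mean the data read back differed from what was written, which points at hardware, kernel, or filesystem trouble rather than at Postgres itself.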