Thread: Commercial PostgreSQL support
We are looking for some sort of commercial PostgreSQL support, but we have a limited budget. Email-based support may be enough. Any pointers will be appreciated. Thanks. Regards, Louie Kwan
Hi,
On Mon, 1 Mar 2004, Louie Kwan wrote:
> We are looking for some sort of commercial PostgreSQL support, but we have
> limited budget.
AFAICS, you're located in Canada. Both hub.org and Lanux Limited are located in Canada. You can find their addresses at the following URL: http://www.pgsql.com/partnerlinks/
Regards,
--
Devrim GUNDUZ devrim@gunduz.org devrim.gunduz@linux.org.tr http://www.TDMSoft.com http://www.gunduz.org
We are running a highly transaction-intensive application in the following environment:
Postgres Version: 7.3.5, built from the source tree, not RPMs
Operating System: Red Hat Enterprise Linux AS (v 2.1)
Kernel: 2.4.9-e.34smp
Hardware: Dell PE 2600, dual 2.8 GHz Xeon w/2GB RAM and dual 36GB mirrors using PERC RAID
The same problem has occurred on at least three separate occasions, on 3 different tables, and manifests itself through the following error while performing "COPY TABLE" commands:
ERROR: MemoryContextAlloc: invalid request size 4294967293
lost synchronization with server, resetting connection
In each case, by examining the copy table output up to the point where it errors out, a single row could be identified that contained corrupted char/varchar values but could still be queried using primary key or numeric lookups. We've been able to work around the issue by deleting the row and manually re-inserting it based on values determined from a previous backup. Note that in each case, it has been determined that the corrupt row existed without a problem earlier, as it could be found in old backups. Thus the rows seem to get into the table OK but get whacked at some later point in time. Any ideas on what's causing this?
Marc Mitchell - Senior Application Architect Enterprise Information Solutions, Inc. Downers Grove, IL 60515 marcm@eisolution.com
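A side note on the number in that error message: 4294967293 is 2^32 - 3, which is what -3 looks like when read as an unsigned 32-bit integer. That would be consistent with a garbage length value being read from a damaged row, although nothing in the thread confirms that this is what happened. The minimal Python sketch below only does the reinterpretation; the helper name `as_signed_32` is made up for illustration.

```python
import struct

# Value reported by MemoryContextAlloc in the error message above.
REQUEST_SIZE = 4294967293


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


if __name__ == "__main__":
    print(hex(REQUEST_SIZE))           # 0xfffffffd
    print(as_signed_32(REQUEST_SIZE))  # -3
```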
"Marc Mitchell" <marcm@eisolution.com> writes:
> In each case, by examining the copy table output up to the point where
> it errors out, a single row could be identified that contained corrupted
> char/varchar values but could be queried using primary key or numeric
> lookups. We've been able to work around the issue by deleting the row
> and manually re-inserting it based on values determined from a previous
> backup. Note that in each case, it has been determined that the corrupt
> row existed without a problem earlier as it could be found in old
> backups. Thus the rows seem to get into the table ok but got whacked at
> some future point in time.
> Any ideas on what's causing this?
Hardware problems, possibly? You might try running memtest86 or some such. It would be good to take note of the exact bit pattern of the corruption, if you can match up correct and corrupted versions of the rows.
regards, tom lane
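One way to record the "exact bit pattern" suggested above is to XOR the known-good bytes of a value (from a backup) against the corrupted bytes. The Python sketch below is only an illustration under assumptions: the function names `bit_diff` and `describe` are hypothetical, and the sample value "posdgres" is invented to show a single flipped bit; it is not data from the thread.

```python
def bit_diff(good: bytes, bad: bytes):
    """Return (offset, good_byte, bad_byte, xor_mask) for each differing byte.

    Only the overlapping prefix of the two values is compared.
    """
    diffs = []
    for offset, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            diffs.append((offset, g, b, g ^ b))
    return diffs


def describe(good: bytes, bad: bytes):
    """Return human-readable lines describing how bad differs from good."""
    lines = []
    for offset, g, b, mask in bit_diff(good, bad):
        lines.append(f"offset {offset}: {g:08b} -> {b:08b} (xor {mask:08b})")
    if len(good) != len(bad):
        lines.append(f"length differs: {len(good)} vs {len(bad)} bytes")
    return lines


if __name__ == "__main__":
    # Invented example: 't' (0x74) turned into 'd' (0x64), a single flipped bit.
    for line in describe(b"postgres", b"posdgres"):
        print(line)
    # A truncated value shows up as a length difference only.
    for line in describe(b"postgres", b"post"):
        print(line)
```

A difference confined to one bit would point more toward memory or bus errors, while a value that is simply cut short looks different in character; either way, the pattern is evidence worth keeping.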
This is a follow-up to a problem first reported on 3/1/04. The problem has continued to occur intermittently, and recently we experienced the first occurrence in which the first column of a table was the corrupted one, and thus we could not recover it.
Searching Google Groups has turned up numerous hits from people reporting the same symptoms. While we've seen some instructions for getting things back, we've seen nothing about correcting the root cause. This is becoming a major production problem and is starting to cast doubt on the "Postgres in Production" decision.
We've observed nothing that would lead us to believe there are any hardware problems. Initially we were using write caching with a battery-backed cache, but we turned that off and are using direct I/O, and we are still experiencing the same problem. Furthermore, the fact that the problems seem isolated to 3 specific tables in a 50+ table database makes us wary of a hardware-level explanation.
As far as matching up correct and corrupted rows, here's more detail on a recent occurrence:
[root@cin1 backups]# /usr/local/pgsql/7.3.5/bin/pg_dump -Ft -p 5432 -U postgres solo > solo.dmp
pg_dump: ERROR: MemoryContextAlloc: invalid request size 4294967293
pg_dump: lost synchronization with server, resetting connection
pg_dump: SQL command to dump the contents of table "freight_track_detail" failed: PQendcopy() failed.
pg_dump: Error message from server:
pg_dump: The command was: COPY public.freight_track_detail (ftd_uid, ftm_uid, txl_uid, ref_nbr_1, ref_nbr_2, fg_tab_uid, fg_tab_alias, ftd_status_code, scan_timestamp, add_userid, add_timestamp, mod_userid, mod_timestamp) TO stdout;
..<snip>..
118171 512 2159 00004300854908405208 46366 FGI 2004-01-21 12:25:00 postgres 2004-01-21 15:39:29 OSD 2004-01-22 01:04:40
118153 512 2159 00004304730000071106 46990 FGI 2004-01-21 12:20:00 post 2000-01-01 00:00:00
..End of output.
The second row shows the varchar userid getting lopped off after the first 4 characters.
Note that we've experienced this problem with several different varchar-typed columns though, as mentioned before, we have recently also seen corruption of integer-typed columns. If we issued an update setting that column plus the subsequent 3 columns to "null", everything was then back to normal. This row was right in the middle of the table. Furthermore, we recently found problems reported in the same table from nightly vacuums. See the following cron-generated emails that contain error messages as well as datetimes to show the temporal relationship between these problems:
"Marc Mitchell" <marcm@eisolution.com> writes:
> This is follow-up to a problem first reported on 3/1/04. The problem
> has continued to occur intermittently and recently we experienced the
> first occurrence in which the first column of a table was the corrupted
> one, and thus we could not recover it.
It would be useful to look at pg_filedump output for the affected pages. See http://sources.redhat.com/rhdb/utilities.html to get that program. I find "-i -f" options to be its most useful display format, although the raw hex dump (-d) is also good to look at when investigating data corruption. The first part of the TID (15 in your latest example) is the block number within the table file; you can use contrib/oid2name if you need help figuring out which file is the table you want.
> We've observed nothing that would lead us to believe there are any
> hardware problems.
Have you done anything to proactively test for hardware problems? memtest86, badblocks, etc? It's possible you have a software problem, but the symptoms sound more like hardware glitches to me.
regards, tom lane
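To make the block-number remark concrete: if the server was built with PostgreSQL's default 8192-byte block size (an assumption; a custom build could differ), block 15 starts at byte 15 * 8192 = 122880 of the table file. The Python sketch below shows that arithmetic and a raw read of one block for inspection. It is not a substitute for pg_filedump, the function names are hypothetical, and it should only ever be pointed at a copy of the data file, not the live one.

```python
import os
import tempfile

# PostgreSQL's default block size (BLCKSZ). Assumed here, not confirmed
# for the build discussed in the thread.
BLOCK_SIZE = 8192


def block_offset(block_number: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of the given block within a table file."""
    return block_number * block_size


def read_block(path: str, block_number: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read one whole block from a (copied) table file."""
    with open(path, "rb") as f:
        f.seek(block_offset(block_number, block_size))
        data = f.read(block_size)
    if len(data) != block_size:
        raise ValueError(f"block {block_number} is not fully present in {path}")
    return data


if __name__ == "__main__":
    print(block_offset(15))  # 122880

    # Demonstration on a throwaway file of 16 blocks, each filled with its
    # own block number, standing in for a copy of a table file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for n in range(16):
            tmp.write(bytes([n]) * BLOCK_SIZE)
        demo_path = tmp.name
    try:
        page = read_block(demo_path, 15)
        print(page[:16].hex())  # first 16 bytes of block 15
    finally:
        os.remove(demo_path)
```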
On Wed, 17 Mar 2004, Marc Mitchell wrote:
> This is follow-up to a problem first reported on 3/1/04. The problem
> has continued to occur intermittently and recently we experienced the
> first occurrence in which the first column of a table was the corrupted
> one, and thus we could not recover it.
> Google groups searching have found numerous hits for people reporting
> the same symptoms. While we've seen some instructions to get things
> back, we've seen nothing about correcting the root cause.
Have you tested your hardware and proven to yourself that both your memory and your drive subsystem have no bad blocks or bits? PostgreSQL is good, but it can't make up for broken hardware. If you haven't actually tested your hardware, then you don't know whether it's truly reliable. And if you put a server into production without testing the drives and memory, you can't just expect it to be good. I've seen many a brand-new server, both Intel and Sparc based, with bad memory or drives right from the factory.
> We've observed nothing that would lead us to believe there are any
> hardware problems.
What you are seeing from PostgreSQL IS a sign that you have hardware problems. Please test your hardware immediately.
Marc Mitchell wrote:
| This is follow-up to a problem first reported on 3/1/04. The problem
| has continued to occur intermittently and recently we experienced the
| first occurrence in which the first column of a table was the corrupted
| one, and thus we could not recover it.
| Google groups searching have found numerous hits for people reporting
| the same symptoms. While we've seen some instructions to get things
| back, we've seen nothing about correcting the root cause.
Marc, any update on the problem? Was it hardware? The suspense is growing. Generally speaking, I think people should bother to end the thread they started with something like "unsolved" or "resolution found, thanks".
Cheers,
--
Radu-Adrian Popescu CSA, DBA, Developer Aldratech Ltd. +40213212243