Thread: Commercial PostgreSQL support

Commercial PostgreSQL support

From

Louie Kwan

Date:

01 March 2004, 11:48:11

We are looking for some sort of commercial PostgreSQL support, but we have
a limited budget.

Email-based support may be enough.

Any pointers would be appreciated.

Thanks.

Regards,
Louie Kwan

Re: Commercial PostgreSQL support

From

Devrim GUNDUZ

Date:

01 March 2004, 11:52:14

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hi,

On Mon, 1 Mar 2004, Louie Kwan wrote:

> We are looking for some sort of commercial PostgreSQL support, but we have
> limited budget.

AFAICS, you're located in Canada. Both hub.org and Lanux Limited are
located in Canada.

You can find their addresses at the following URL:

http://www.pgsql.com/partnerlinks/

Regards,
- --
Devrim GUNDUZ
devrim@gunduz.org                devrim.gunduz@linux.org.tr
            http://www.TDMSoft.com
            http://www.gunduz.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFAQ1wptl86P3SPfQ4RAjOaAKC7mwrIhgMD5dq/gy67qFIQbO1U3wCg6v2K
923N9RY6j+exI5gbH8e8Y1c=
=q3e/
-----END PGP SIGNATURE-----

Row data corruption under 7.3.5

From

"Marc Mitchell"

Date:

01 March 2004, 14:35:39

We are running a highly transaction-intensive application in the
following environment:

Postgres Version 7.3.5 built off the source tree, not RPMs
Operating System: Red Hat Enterprise Linux AS (v 2.1)
Kernel: 2.4.9-e.34smp
Hardware: Dell PE 2600, dual 2.8 GHz Xeon w/2GB RAM and dual 36GB mirrors
using PERC RAID

The same problem has occurred on at least three separate occasions on three
different tables and manifests itself through the following error while
performing "COPY TABLE" commands:

ERROR:  MemoryContextAlloc: invalid request size 4294967293
lost synchronization with server, resetting connection

In each case, by examining the copy table output up to the point where
it errors out, a single row could be identified that contained corrupted
char/varchar values but could be queried using primary key or numeric
lookups.  We've been able to work around the issue by deleting the row
and manually re-inserting it based on values determined from a previous
backup.  Note that in each case, it has been determined that the corrupt
row existed without a problem earlier as it could be found in old
backups.  Thus the rows seem to get into the table OK but get whacked at
some later point in time.
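
For reference, the request size in the error above is worth decoding. The short Python sketch below only does the arithmetic: 4294967293 is three less than 2^32, which is -3 when the same 32 bits are read as a signed integer. Reading this as a damaged length field in a char/varchar value is an assumption, not something established in this thread.

```python
import struct

# Value reported in "invalid request size 4294967293".
REQUEST_SIZE = 4294967293

# Reinterpret the unsigned 32-bit value as a signed 32-bit integer.
signed = struct.unpack("<i", struct.pack("<I", REQUEST_SIZE))[0]

print(hex(REQUEST_SIZE))      # 0xfffffffd
print(signed)                 # -3
print(2**32 - REQUEST_SIZE)   # 3
```

A size this close to 2^32 is not a plausible allocation for a text column, which fits the observation that the bad rows contain corrupted char/varchar values.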

Any ideas on what's causing this?

Marc Mitchell - Senior Application Architect
Enterprise Information Solutions, Inc.
Downers Grove, IL 60515
marcm@eisolution.com

Re: Row data corruption under 7.3.5

From

Tom Lane

Date:

01 March 2004, 16:05:51

"Marc Mitchell" <marcm@eisolution.com> writes:
> In each case, by examining the copy table output up to the point where
> it errors out, a single row could be identified that contained corrupted
> char/varchar values but could be queried using primary key or numeric
> lookups.  We've been able to work around the issue by deleting the row
> and manually re-inserting it based on values determined from a previous
> backup.  Note that in each case, it has been determined that the corrupt
> row existed without a problem earlier as it could be found in old
> backups.  Thus the rows seem to get into the table ok but got wacked at
> some future point in time.

> Any ideas on what's causing this?

Hardware problems possibly?  You might try running memtest86 or some
such.

It would be good to take note of the exact bit pattern of the
corruption, if you can match up correct and corrupted versions of the
rows.

            regards, tom lane
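
One way to record the exact bit pattern suggested above is to XOR the correct and corrupted versions of a value byte by byte. The following is a minimal Python sketch; the helper name `bit_diff` and the sample byte strings are hypothetical and are not data from this thread.

```python
def bit_diff(good: bytes, bad: bytes):
    """Return (offset, good_byte, bad_byte, xor_mask) for each differing byte.

    Only the overlapping prefix is compared; a length mismatch has to be
    noted separately by the caller.
    """
    diffs = []
    for offset, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            diffs.append((offset, g, b, g ^ b))
    return diffs


# Hypothetical example: a single flipped bit in the fourth byte.
good_value = b"postgres"
bad_value = b"posugres"

for offset, g, b, mask in bit_diff(good_value, bad_value):
    print(f"offset {offset}: {g:#04x} -> {b:#04x}, flipped bits {mask:08b}")

if len(good_value) != len(bad_value):
    print(f"length differs: {len(good_value)} vs {len(bad_value)}")
```

A mask with a single bit set points toward a flipped bit, while long runs of zeroed or overwritten bytes point elsewhere, so the mask is the detail to keep.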

Re: Row data corruption under 7.3.5

From

"Marc Mitchell"

Date:

17 March 2004, 12:13:17

This is a follow-up to a problem first reported on 3/1/04.  The problem
has continued to occur intermittently, and recently we experienced the
first occurrence where the corrupted column was the first column of the
table, and thus we could not recover it.
Google Groups searches have found numerous hits for people reporting
the same symptoms.  While we've seen some instructions to get things
back, we've seen nothing about correcting the root cause.

This is becoming a major production problem and starting to cast doubt
on the "Postgres in Production" decision.

We've observed nothing that would lead us to believe there are any
hardware problems.  Initially we were using write caching with a
battery-backed cache, but we turned that off and are using direct I/O
and are still experiencing the same problem.  Furthermore, the fact that the
problems seem isolated to 3 specific tables in a 50+ table database
makes us doubt that the cause is at the hardware level.

As far as matching up correct and corrupted rows, here's more detail on
a recent occurrence:

[root@cin1 backups]# /usr/local/pgsql/7.3.5/bin/pg_dump -Ft -p 5432 -U postgres solo > solo.dmp

pg_dump: ERROR:  MemoryContextAlloc: invalid request size 4294967293
pg_dump: lost synchronization with server, resetting connection
pg_dump: SQL command to dump the contents of table
"freight_track_detail" failed: PQendcopy() failed.
pg_dump: Error message from server: pg_dump: The command was: COPY
public.freight_track_detail (ftd_uid, ftm_uid, txl_uid, ref_nbr_1,
ref_nbr_2, fg_tab_uid, fg_tab_alias, ftd_status_code, scan_timestamp,
add_userid, add_timestamp, mod_userid, mod_timestamp) TO stdout;
..<snip>..
118171  512     2159            00004300854908405208    46366   FGI
2004-01-21 12:25:00     postgres        2004-01-21 15:39:29     OSD
2004-01-22 01:04:40

118153  512     2159            00004304730000071106    46990   FGI
2004-01-21 12:20:00     post    2000-01-01 00:00:00
..End of output.

The second row shows the varchar userid getting lopped off after the first
4 characters.  Note that we've experienced this problem with several
different varchar-typed columns, though, as mentioned before, we have
recently seen corruption of integer-typed columns as well.

If we issued an update setting that column plus the subsequent 3 columns
to "null", everything then was back to normal.  This row was right in
the middle of the table.

Furthermore, we recently found problems reported in the same table from
nightly vacuums.  See the following cron-generated emails that contain
error messages as well as datetimes to show the temporal relationship
between these problems:

Re: Row data corruption under 7.3.5

From

Tom Lane

Date:

17 March 2004, 12:53:50

"Marc Mitchell" <marcm@eisolution.com> writes:
> This is a follow-up to a problem first reported on 3/1/04.  The problem
> has continued to occur intermittently, and recently we experienced the
> first occurrence where the corrupted column was the first column of the
> table, and thus we could not recover it.

It would be useful to look at pg_filedump output for the affected pages.
See http://sources.redhat.com/rhdb/utilities.html to get that program.
I find "-i -f" options to be its most useful display format, although
the raw hex dump (-d) is also good to look at when investigating
data corruption.  The first part of the TID (15 in your latest example)
is the block number within the table file; you can use contrib/oid2name
if you need help figuring out which file is the table you want.
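
To pull out just the affected page for inspection, the block number can be turned into a byte offset within the table file. The Python sketch below assumes the default 8192-byte page size (a server built with a different BLCKSZ needs a different value) and a table small enough to live in a single file. The function names are illustrative and are not part of pg_filedump.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL page size (BLCKSZ)


def block_offset(block_number: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of a block within the table file."""
    return block_number * block_size


def read_block(path: str, block_number: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read one whole page from a table file; raise if the file is too short."""
    with open(path, "rb") as f:
        f.seek(block_offset(block_number, block_size))
        page = f.read(block_size)
    if len(page) != block_size:
        raise ValueError(f"block {block_number} is not fully present in {path}")
    return page


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset-prefixed hex lines for eyeballing corruption."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        lines.append(f"{start:06x}  " + " ".join(f"{byte:02x}" for byte in chunk))
    return "\n".join(lines)


# Block 15 starts this many bytes into the file under the 8192-byte assumption.
print(block_offset(15))  # 122880
```

Copying the table file while the server is stopped and reading from the copy avoids looking at a page that is being modified.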

> We've observed nothing that would lead us to believe there are any
> hardware problems.

Have you done anything to proactively test for hardware problems?
memtest86, badblocks, etc?  It's possible you have a software problem,
but the symptoms sound more like hardware glitches to me.

            regards, tom lane

Re: Row data corruption under 7.3.5

From

"scott.marlowe"

Date:

17 March 2004, 12:56:00

On Wed, 17 Mar 2004, Marc Mitchell wrote:

> This is a follow-up to a problem first reported on 3/1/04.  The problem
> has continued to occur intermittently, and recently we experienced the
> first occurrence where the corrupted column was the first column of the
> table, and thus we could not recover it.
> Google Groups searches have found numerous hits for people reporting
> the same symptoms.  While we've seen some instructions to get things
> back, we've seen nothing about correcting the root cause.

Have you tested your hardware and proven to yourself that both your memory
and your drive subsystem have no bad blocks or bits?

PostgreSQL is good, but it can't make up for broken hardware.

If you haven't actually tested your hardware, then you don't know if it's
truly reliable or not.  And if you put a server into production without
testing the drives and memory, you can't just expect it to be good.  I've
seen many a brand new server, both Intel- and SPARC-based, with bad memory
or drives right from the factory.

> We've observed nothing that would lead us to believe there are any
> hardware problems.

What you are seeing from PostgreSQL IS a sign that you have hardware
problems.  Please test your hardware immediately.

Re: Row data corruption under 7.3.5

From

Radu-Adrian Popescu

Date:

22 March 2004, 11:44:28

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Marc Mitchell wrote:
| This is a follow-up to a problem first reported on 3/1/04.  The problem
| has continued to occur intermittently, and recently we experienced the
| first occurrence where the corrupted column was the first column of the
| table, and thus we could not recover it.
| Google Groups searches have found numerous hits for people reporting
| the same symptoms.  While we've seen some instructions to get things
| back, we've seen nothing about correcting the root cause.
|

Marc, any update on the problem? Was it hardware? The suspense is growing.

Generally speaking, I think people should bother to end the threads they
start with something like "unsolved" or "resolution found, thanks".

Cheers,
- --
Radu-Adrian Popescu
CSA, DBA, Developer
Aldratech Ltd.
+40213212243
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFAXwSQVZmwYru5w6ERAnD3AJ0Tk+1jJIUR1VapSHWz83fffDwBSgCfdax2
5/3f6DngeQpR4WVDZsUUXv0=
=npgX
-----END PGP SIGNATURE-----