Re: COPY losing information - Mailing list pgsql-general

From Jaime Silvela
Subject Re: COPY losing information
Date
Msg-id 4609265B.20706@bear.com
Whole thread Raw
In response to Re: Insert fail: could not open relation with OID 3221204992  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: COPY losing information  (Jaime Silvela <JSilvela@Bear.com>)
List pgsql-general
Just bringing back to life a message I sent last July.

The problem I was having was that when importing very large data sets,
COPY seemed to drop some data. I built a script to use INSERTs, and same
problem. My server runs 8.1.3 on Linux. Several people investigated,
Reece Hart was unable to reproduce the problem using my same data file,
and Tom Lane suggested a buggy client installation dropping info, or
hardware problems in the server.

I've come back to this problem recently, and have found a couple of
interesting things.
I'm using a 418MB data file. wc -l gives me 6,802,367 lines

On the server (Linux running Postgres 2.1.3). 4GB RAM, and the disk
Postgres lives in is a network drive.
After copying the data file to the server and running psql locally.
- Import try 1: count(*) gives 6,801,664
- Import try 2: count(*) gives 6,802,241
- Every import gave a different count
The log file doesn't say anything other than
LOG: checkpoints are occurring too frequently (27 seconds apart), but
that's should not be a problem, right?

I also tried it in my local-disk Windows installation,  which runs
Postgres 8.1.0
- Import try 1: count(*) gives 6,824,366
- All imports since then give, correctly, count(*) = 6,802,367

This time, I FTP'd the file to my mac laptop, which is running 8.2.3.
The import gave me the correct count every time.
Remembering the suggestion of hardware problems, I made an empty install
of 8.2.3 on my server. On it, the data file was fully imported every
single time.
I decided to copy all my data to 8.2.3 to test dependency of the bug on
a) size of the database (30GB of data) or b) some problem with data or
some stored procedure. I've tried the import, and again, it's correct
every single time.

During the import of the database data to 8.2.3, I got mostly a clean
set, except for the 3 error messages listed below which suggest a buggy
kernel.

A bunch of questions:
1) Has anybody seen this type of behavior in 8.1.*, is there a known bug
that might explain the problems in both the Linux server and the Windows
box?
2) Those errors on the database import are troubling, but the affected
only 3 tables. Wouldn't a buggy kernel give more trouble? Is it possible
that this was a bug in 8.1.3's pg_dump? Also, since the database is so
big, the output of pg_dump is split'd and bzip2'd, so there's room for
error there too.
3) On Friday I'm going to upgrade the production database from 8.1.3 to
8.2.3. Any caveats or words of advice?

Thank you,
Jaime


ERROR:  unexpected data beyond EOF in block 23662 of relation "portfolio"
HINT:  This has been seen to occur with buggy kernels; consider updating
your system.
CONTEXT:  COPY portfolio, line 3426949: "210395 1974263 1       723"
STATEMENT:  COPY portfolio (deal_id, security_id, amount,
portfolio_version) FROM stdin;
ERROR:  unexpected data beyond EOF in block 4028 of relation
"coverage_test_val"
HINT:  This has been seen to occur with buggy kernels; consider updating
your system.
CONTEXT:  COPY coverage_test_val, line 338035: "340676  588
2006-08-25      STEPUP_TRIGGER2 6       0.0     0.0   \
  7"
STATEMENT:  COPY coverage_test_val (coverage_test_val_id, deal_id,
observation_date, coverage_test_name, coverage_test\
_priority, coverage_test_value, coverage_test_trigger,
coverage_test_type_id) FROM stdin;
ERROR:  unexpected data beyond EOF in block 4049 of relation
"deal_current_val"
HINT:  This has been seen to occur with buggy kernels; consider updating
your system.
CONTEXT:  COPY deal_current_val, line 511050: "612884   7008
2005-09-21      23      1.957871"
STATEMENT:  COPY deal_current_val (deal_current_val_id, deal_id,
observation_date, type_id, deal_current_val) FROM std\
in;



***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************

pgsql-general by date:

Previous
From: merlyn@stonehenge.com (Randal L. Schwartz)
Date:
Subject: Re: cutting out the middleperl
Next
From: "Filip Rembiałkowski"
Date:
Subject: Re: Every user has own database - how?