Thread: COPY bug on 7.3.4 whe copying lots of data

COPY bug on 7.3.4 whe copying lots of data

From
"Mason"
Date:
We have a strange issue going on with PostGres 7.3.4 on NetBSD.  When we
have a COPY command in a script that will be importing a great deal of data
(on the order of 200Meg), the command does not always insert the data into
the table in the same order that is in the script.  i.e.  When you do a
SELECT * FROM table, or else a pg_dump, the data is all correctly there, and
most of it is in the same order as the original script, but ten or twenty
lines will be shifted randomly around.

This is an issue for us because we would like to have a utility to diff the
data in databases once a day or so by dumping their tables into individual
files and then making sure they contain everything that we used to build
them with.

I'm primarily wondering if anyone could explain WHY this happens.  I don't
know the internals of Postgres, so the best I could come up with is that
there may be more than one thread inserting data and they are getting out of
synch in some way.

Can anyone put me on the right track?

Re: COPY bug on 7.3.4 whe copying lots of data

From
Tom Lane
Date:
"Mason" <mason@vanten.com> writes:
> We have a strange issue going on with PostGres 7.3.4 on NetBSD.  When we
> have a COPY command in a script that will be importing a great deal of data
> (on the order of 200Meg), the command does not always insert the data into
> the table in the same order that is in the script.

This is not a bug: neither the SQL standard nor Postgres make any
guarantees about the order of rows in a table.

> I'm primarily wondering if anyone could explain WHY this happens.

Probably a matter of where the free space happens to be.  Once the
system runs out of free space internal to the table, it will
consistently append to the end, but as long as there is any free space
it will get used in a not-very-predictable order.

If you can TRUNCATE the table before you insert then you would get the
result you want, at least as long as no updates happen.  But I think any
code that relies on row ordering for correctness is broken by
definition, and you'd be better off rethinking that assumption.

            regards, tom lane