COPY FROM WITH HEADER skips a tuple every 4 billion tuples - Mailing list pgsql-hackers

From David Rowley
Subject COPY FROM WITH HEADER skips a tuple every 4 billion tuples
Date
Msg-id CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com
Whole thread Raw
Responses Re: COPY FROM WITH HEADER skips a tuple every 4 billion tuples  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I'd been looking over the COPY FROM code tonight when I saw something
pretty scary looking:

/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
    cstate->cur_lineno++;
    if (CopyReadLine(cstate))
       return false; /* done */
}

while it might not look too scary by itself, it gets a bit more so
when you learn that the cur_lineno is only 32 bits wide. This will
result in skipping a tuple every time the 32-bit variable wraps back
around to 0 again.

Maybe when this code was written copying > 4 billion rows was just a
far-off dream, but with today's hardware, it really didn't take that
long to see this actually happen for real.

postgres=# create unlogged table t(a int);
CREATE TABLE
Time: 1.339 ms
postgres=# insert into t select 0 from generate_series(1, 4300000000);
INSERT 0 4300000000
Time: 2128367.019 ms (35:28.367)
postgres=# copy t to '/home/ubuntu/t.csv' with (format csv, header);
COPY 4300000000
Time: 2294331.494 ms (38:14.331)
postgres=# truncate t;
TRUNCATE TABLE
Time: 30.367 ms
postgres=# copy t from '/home/ubuntu/t.csv' with (format csv, header);
COPY 4299999999
Time: 2693186.475 ms (44:53.186)

A patch to fix is attached.

(I just made the variable 64-bit)

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: PG11 jit failing on ppc64el
Next
From: Andres Freund
Date:
Subject: Re: PG11 jit failing on ppc64el