Re: autocommit (true/false) for more than 1 million records - Mailing list pgsql-performance

From: Kevin Grittner
Subject: Re: autocommit (true/false) for more than 1 million records
Date:
Msg-id: 1409092428.41025.YahooMailNeo@web122306.mail.ne1.yahoo.com
In response to: Re: autocommit (true/false) for more than 1 million records  (Alex Goncharov <alex.goncharov.usa@gmail.com>)
Responses: Re: autocommit (true/false) for more than 1 million records
List: pgsql-performance
Alex Goncharov <alex.goncharov.usa@gmail.com> wrote:

> Suppose I COPY a huge amount of data, e.g. 100 records.
>
> My 99 records are fine for the target, and the 100-th is not --
> it comes with a wrong record format or a target constraint
> violation.
>
> The whole COPY is then aborted, and the good 99 records do not
> make it into the target table.

Right.  This is one reason people often batch such copies or check
the data very closely before copying in.
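
For example, if you pre-split the input and run one \copy per chunk
in psql (an untested sketch; the chunk file names and the "target"
table are made up), each chunk commits on its own, so a bad row only
aborts the chunk it belongs to:

  -- each \copy outside an explicit BEGIN runs in its own transaction
  \copy target FROM 'chunk_000.csv' WITH (FORMAT csv)
  \copy target FROM 'chunk_001.csv' WITH (FORMAT csv)
  \copy target FROM 'chunk_002.csv' WITH (FORMAT csv)
  -- a bad row in chunk_002.csv rolls back only that chunk;
  -- chunks 000 and 001 stay committed

Then you only have to fix and re-run the chunk that failed.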

> My question is: where have these 99 records been living on the
> database server while the 100-th one hasn't arrived yet and the
> need to throw the previously accumulated data away hasn't come yet?

They will have been written into the table.  They do not become
visible to any other transaction until and unless the inserting
transaction successfully commits.  These slides may help:

http://momjian.us/main/writings/pgsql/mvcc.pdf
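
You can watch this from two psql sessions (a sketch; the table and
file names are made up):

  -- session 1
  BEGIN;
  \copy target FROM 'big.csv' WITH (FORMAT csv)

  -- session 2, while session 1 is still open
  SELECT count(*) FROM target;  -- the rows being loaded are not counted

  -- session 1
  COMMIT;                       -- only now do other sessions see them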

> There have to be some limits to the space and/or counts taken by
> the new, uncommitted, data, while the COPY operation is still in
> progress.  What are they?

Primarily disk space for the table.  If you are not taking
advantage of the "unlogged load" optimization, you will have
written Write Ahead Log (WAL) records, too -- which (depending on
your configuration) you may be archiving.  In that case, you may
need to be concerned about the archive space required.  If you have
foreign keys defined for the table, you may run into trouble with the
RAM used to track the pending checks for those constraints.  I would
recommend adding any FKs after you are done with the big bulk load.
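
Roughly, such a load could look like this (an untested sketch with
made-up table, column, and file names; skipping WAL for the COPY also
requires wal_level = minimal with archiving and replication off):

  BEGIN;
  TRUNCATE target;          -- or CREATE the table in this transaction
  \copy target FROM 'big.csv' WITH (FORMAT csv)
  COMMIT;

  -- add the foreign key only after the bulk load has finished
  ALTER TABLE target
    ADD CONSTRAINT target_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customer (id);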

PostgreSQL does *not* have a "rollback log" that would impose such a limit.

> Say, I am COPYing 100 TB of data and the bad records are close
> to the end of the feed -- how will this all error out?

The rows will all be in the table, but not visible to any other
transaction.  Autovacuum will clean them out in the background, but
if you want to restart your load against an empty table it might be
a good idea to TRUNCATE that table; it will be a lot faster.
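
That is (with a made-up table name):

  TRUNCATE target;  -- reclaims the space of the never-visible rows at once
  -- fix the bad input, then re-run the COPY against the empty table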

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

