Tom Lane wrote:
> Ultimately, there's always going to be a tradeoff between speed and
> flexibility. It may be that we should just say "if you want to import
> dirty data, it's gonna cost ya" and not worry about the speed penalty
> of subtransaction-per-row. But that still leaves us with the 2^32
> limit. I wonder whether we could break down COPY into sub-sub
> transactions to work around that...
>
Regarding the tradeoff between speed and flexibility, I think we could
propose multiple options:
- maximum speed: the current implementation, which fails on the first error
- speed with error logging: the COPY command still fails if there is an
error, but it keeps scanning the input and logs all errors before aborting
- speed with error logging, best effort: no sub-transactions are used, but
errors that can safely be trapped with PG_TRY/PG_CATCH (no index
violations, no BEFORE INSERT triggers, etc.) are logged and the command can complete
- pre-loading (2-phase copy): phase 1 copies good tuples into a [temp]
staging table and bad tuples into an error table; phase 2 pushes the good
tuples to the destination table. Note that if phase 2 fails, it can be
retried, since the temp table is dropped only once phase 2 succeeds (a
rough SQL sketch follows this list).
- slow but flexible: wrap every row in its own sub-transaction (also
sketched below) -> is there any real benefit compared to pg_loader?
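
To make the pre-loading option more concrete, here is a rough SQL sketch
of how the two phases could be done by hand today with a staging table.
All of the names (staging, load_errors, destination, the input file path)
and the validity checks are purely illustrative assumptions, not proposed
syntax; the destination table is assumed to have integer id and qty
columns plus a text payload:

  -- Phase 1: load everything into a permissive staging table (all columns
  -- as text, no constraints) so that rows failing type or constraint
  -- checks cannot abort the load. Use \copy from psql for a client-side file.
  CREATE TEMP TABLE staging (id text, qty text, payload text);
  COPY staging FROM '/path/to/input.csv' CSV;

  -- Error table keeping the rejected raw rows plus a reason.
  CREATE TABLE load_errors (id text, qty text, payload text, reason text);

  -- Phase 2: push the good tuples to the destination table and divert the
  -- bad ones to the error table. If this step fails it can simply be
  -- re-run, because the staging table is dropped only after it succeeds.
  INSERT INTO destination (id, qty, payload)
    SELECT id::int, qty::int, payload
      FROM staging
     WHERE id ~ '^[0-9]+$' AND qty ~ '^[0-9]+$';

  INSERT INTO load_errors
    SELECT id, qty, payload, 'type check failed'
      FROM staging
     WHERE NOT (id ~ '^[0-9]+$' AND qty ~ '^[0-9]+$');

  DROP TABLE staging;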
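For comparison, the "slow but flexible" option is roughly what can already
be emulated with a PL/pgSQL loop, since every EXCEPTION block opens an
implicit sub-transaction: that per-row overhead is exactly where the speed
penalty, and the overlap with pg_loader, comes from. Same illustrative
table names as above:

  -- One implicit sub-transaction (savepoint) per row: a failing INSERT is
  -- rolled back and the row is diverted to the error table instead.
  CREATE FUNCTION load_with_subxacts() RETURNS void AS $$
  DECLARE
    rec staging%ROWTYPE;
  BEGIN
    FOR rec IN SELECT * FROM staging LOOP
      BEGIN
        INSERT INTO destination VALUES (rec.id::int, rec.qty::int, rec.payload);
      EXCEPTION WHEN OTHERS THEN
        INSERT INTO load_errors VALUES (rec.id, rec.qty, rec.payload, SQLERRM);
      END;
    END LOOP;
  END;
  $$ LANGUAGE plpgsql;

  SELECT load_with_subxacts();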
Tom was also suggesting 'refactoring COPY into a series of steps that
the user can control'. What would these steps be? Would they operate per
row and allow a bad tuple to be discarded?
Emmanuel
--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet