I am using the COPY command to insert 10 million rows from a CSV file
into a database table and I am finding the performance is
unacceptable. When the COPY command is executed the disk I/O is
extremely high which leads to degraded query performance for other
queries being executed on the database.
I have tried removing the indexes from the database table and this
dramatically improved performance (both reducing the execution time
and reducing disk I/O).
The constraints I am working with are:
- Indexes must be in place to ensure that other queries that are
executed on the table while the COPY command is executing have
acceptable performance.
- The table should not be locked for reads during the COPY (it is
fine to lock for writes).
I don't know how COPY is implemented, but it seems that the indexes
are updated as data is inserted into the table. Ideally I would like
to delay updating the indexes until the COPY command has completed and
just update the indexes a single time, I think this would lead to much
better performance. Is this possible?
Another idea that I had was to start a transaction, drop the indexes,
run the COPY command and then recreate the indexes. The problem with
this approach is that DROP INDEX acquires an exclusive lock on the
table, which is not acceptable in this system (otherwise other queries
won't be able to execute). I tried using DROP INDEX CONCURRENTLY, but
that cannot be executed inside a transaction.
What are my options in this scenario?
Thanks