On 9/10/07, Jason L. Buberel <jason@buberel.org> wrote:
>
> When loading very large data exports (> 1 million records) I have found it
> necessary to use the following sequence to achieve even reasonable import
> performance:
>
> 1. Drop all indices on the recipient table
> 2. Use "copy recipient_table from '/tmp/input.file';"
> 3. Recreate all indices on the recipient table
>
> However, I now have tables so large that even the 'recreate all indices'
> step is taking too long (15-20 minutes on 8.2.4).
Well, that's pretty much the fastest way to import data.
If you've got foreign keys, you can look into disabling them while
importing if they're slowing things down.
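The whole cycle comes down to something like this (the index and
constraint names below are just placeholders for whatever your schema
actually has):

  -- drop the foreign key and the indexes before the load
  ALTER TABLE recipient_table DROP CONSTRAINT recipient_sender_fkey;
  DROP INDEX recipient_created_idx;

  -- bulk load; COPY is far faster than row-by-row INSERTs
  COPY recipient_table FROM '/tmp/input.file';

  -- rebuild the indexes and put the constraint back afterwards
  CREATE INDEX recipient_created_idx ON recipient_table (created);
  ALTER TABLE recipient_table ADD CONSTRAINT recipient_sender_fkey
      FOREIGN KEY (sender_id) REFERENCES senders (id);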
I've got indexes that individually take 20 to 30 minutes to create on
one large reporting table. All 8 or so indexes take well over 2 hours
to create. But I don't load it very often.
Note that with 8.2 you can create your new indexes in a non-blocking
mode so that the table is accessible while the indexes are being
created.
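That's CREATE INDEX CONCURRENTLY, along these lines (column name is
just an example):

  -- doesn't take a lock that blocks reads or writes on the table,
  -- but it can't be run inside a transaction block
  CREATE INDEX CONCURRENTLY recipient_created_idx
      ON recipient_table (created);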
> I am considering moving to date-based partitioned tables (each table = one
> month-year of data, for example). Before I go that far - are there any other
> tricks I can or should be using to speed up my bulk data loading?
Partitioning would help. I've used it for tables where I've had to do
stuff like that, and it definitely helps.
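On 8.2 that means table inheritance plus CHECK constraints, with
constraint_exclusion turned on so the planner can skip partitions.
A rough sketch, with made-up column names:

  -- parent table holds no rows itself
  CREATE TABLE recipient_table (
      recipient_id  bigint,
      created       date,
      payload       text
  );

  -- one child per month; the CHECK constraint is what lets the
  -- planner exclude partitions that can't match the query
  CREATE TABLE recipient_table_2007_09 (
      CHECK (created >= DATE '2007-09-01' AND created < DATE '2007-10-01')
  ) INHERITS (recipient_table);

  -- each month's export then loads straight into its own partition
  COPY recipient_table_2007_09 FROM '/tmp/input-2007-09.file';

Loading into a fresh, empty partition also means the index rebuild
only has to touch that month's data instead of the whole table.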