Mass file imports - Mailing list pgsql-performance

From Greg Spiegelberg
Subject Mass file imports
Msg-id 3F1C3638.8060100@cranel.com
List pgsql-performance
Hello,

I'm hunting for some advice on loading 50,000+ files, all smaller than
32KB, into a PostgreSQL 7.3.2 database.  The table is simple.

create table files (
  id    int8 not null primary key,
  file  text not null,
  size  int8 not null,
  uid   int not null,
  raw   oid
);

The script (currently bash) pulls a TAR file out of a queue, unpacks it
to a large ramdisk mounted with noatime and performs a battery of tests
on the files included in the TAR file.  For each file in the TAR it will
add the following to a SQL file...

update files set raw=lo_import('/path/to/file/from/tar') where
file='/path/to/file/from/tar';

This file begins with BEGIN; and ends with END; and is fed to Postgres
via a "psql -f sqlfile" command.  This part of the process can take
anywhere from 30 to over 90 minutes depending on the number of files
included in the TAR file.
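
For reference, the SQL-file generation step described above can be sketched
roughly as follows. This is a minimal illustration rather than the actual
script: the function name gen_sql and its directory argument are
hypothetical, and it assumes bash with GNU find and sort.

```shell
#!/usr/bin/env bash
# Sketch only: gen_sql and its argument are hypothetical names, not taken
# from the real script. Emits one UPDATE per regular file under a directory,
# wrapped in a single BEGIN; ... END; transaction.
gen_sql() {
  local dir=$1 path quoted
  echo "BEGIN;"
  # NUL-delimited paths keep file names containing spaces or newlines intact
  while IFS= read -r -d '' path; do
    # Double any single quotes so the path is a valid SQL string literal
    quoted=${path//\'/\'\'}
    printf "update files set raw=lo_import('%s') where file='%s';\n" \
      "$quoted" "$quoted"
  done < <(find "$dir" -type f -print0 | sort -z)
  echo "END;"
}
```

It would be used along the lines of "gen_sql /path/to/unpacked/tar > sqlfile"
followed by "psql -f sqlfile", where the directory path is a placeholder.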

The system is a RedHat 7.3 box running a current 2.4.20 RedHat kernel with:
   dual PIII 1.4GHz
   2GB of memory
   512MB ramdisk (mounted noatime)
   mirrored internal SCSI160 10k rpm drives for OS and swap
   1 PCI 66MHz 64bit QLA2300
   1 Gbit SAN with several RAID5 LUNs on a Hitachi 9910

All filesystems are ext3.

Any thoughts?

Greg

--
Greg Spiegelberg
  Sr. Product Development Engineer
  Cranel, Incorporated.
  Phone: 614.318.4314
  Fax:   614.431.8388
  Email: gspiegelberg@Cranel.com
Cranel. Technology. Integrity. Focus.


