COPY is the command used for bulk loading in PostgreSQL (see the "SQL
Commands" section in the "Reference Manual" part of the docs).
COPY is not intelligent about interpreting varying data formats. Your
flexibility is limited to specifying the data delimiter and whatever string
you want to represent NULL. If a value in your file does not match the
PostgreSQL data type of its target column, COPY will error out.
If you have data that doesn't meet these criteria, you have two choices:
1. Write a script to edit your data into the required form (which is what
people generally do), or
2. Go to the source code for COPY, and develop it into a more
intelligent/flexible form.
Requests for no. 2 don't seem to be frequent, so I am guessing that
development of COPY is not high on the developers' todo list.
----- Original Message -----
From: "Guy Fraser" <guy@incentre.net>
To: "Joel Pang" <joelpang@spectratech.com>; "PostgreSQL general"
<pgsql-general@postgresql.org>
Sent: Monday, September 10, 2001 12:10 PM
Subject: Re: [GENERAL] SQL Loader?
> Joel Pang wrote:
> >
> > Hi Fraser,
> >
> > I've read your mail on the postgresql web site saying that you have a
> > procedure that will do bulk loading of records into a database. I've been
> > looking for a utility just like the sql loader of Oracle for the Postgres
> > database. I was wondering if you would share your procedure with me, or
> > give me some URL links I can use as a reference on how to write my own
> > sql loader procedure?
> >
> > Thanks in advance for your help.
> >
> > Cheers,
> > Joel
>
> Please note: the script below uses a special program to convert radius
> accounting files into an ordered, tab-delimited format for import.
>
> The script also shows a way to import into an intermediate table, separate
> the unique records out into their own tables, and then move the duplicate
> entries to a table of their own for data integrity.
>
> This also demonstrates a method of deriving table names from the data
> being imported.
>
> I wrote this software a long time ago and have spent little time patching
> or rewriting it. Since the script was first written I have learned better
> ways of performing some of these tasks, but I don't have time to rewrite
> the script, and it has worked for over 3 years, so it is low priority.
>
> Guy Fraser
>
> ======== begin radimport ========
> #!/bin/bash
> ORIG=`pwd`
> WORKDIR='/usr/local/pgsql'
>
> cd $WORKDIR
> echo Start collecting files
> wget -P $WORKDIR/ -t 10 -T 30 -c \
> ftp://username:password@host.domain/path/*.acct.gz # modified for security
> echo Finished collecting files
> echo Start decompressing files
> gunzip -v *.acct.gz
> echo Finished decompressing files
>
> for fname in `ls *.acct`;
> do {
> [ -f $fname ] && {
> date
> echo Start inputting $fname
> # parserad - convert radius detail file to tab delimited format.
> /usr/local/sbin/parserad -i $fname \
> | /usr/local/pgsql/bin/psql -c "\
> copy detail from stdin;" radius
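> # (No delimiter is given, so COPY expects tab-separated columns, in the
> # same order as the columns of the detail table.)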
> echo Finished inputting $fname
> echo Start compressing $fname
> gzip -v $fname
> echo Finished compressing $fname
> #
> #
> # Clean up detail
> #
> /usr/local/bin/psql -c "vacuum detail;" radius
> #
> # If more than one month determine Current and Previous, Month and Year.
> #
> #
> MINTIME=`/usr/local/pgsql/bin/psql -c "\
> select date_trunc('month',min(\"Time-Stamp\")::abstime) from detail;\
> " -A -t radius`
> MINMON=`echo $MINTIME | awk '{print $2}' -`
> MINYEAR=`echo $MINTIME | awk '{print $5}' -`
> MAXTIME=`/usr/local/pgsql/bin/psql -c "\
> select date_trunc('month',max(\"Time-Stamp\")::abstime) from detail;\
> " -A -t radius`
> MAXMON=`echo $MAXTIME | awk '{print $2}' -`
> MAXYEAR=`echo $MAXTIME | awk '{print $5}' -`
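> # MINMON/MAXMON and MINYEAR/MAXYEAR are the month abbreviations and years
> # picked out of the displayed timestamps by awk; this assumes the server
> # shows abstime as "Dow Mon DD HH:MM:SS YYYY TZ" (the 'Postgres' DateStyle).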
> [ "$MAXYEAR" = "" ] && (echo "Exiting: No Data in detail table." || exit
> 1)
> echo Moving $fname
> mv $fname.gz /mnt/sdb3/done/$MAXYEAR
> echo Start processing data from $fname
> #
> # Process records in the detail table and create a unique record table
> # called radius.
> #
> #
> echo Creating lookup table
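> # Build one row per unique session: group on the session-identifying
> # columns and keep min(oid) as a pointer back to one representative row
> # in detail.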
> /usr/local/bin/psql -c "\
> select min(oid) as recnum,max(\"Acct-Session-Time\"),\
> \"Acct-Status-Type\",\"Acct-Session-Id\",\"NAS-IP-Address\",\"NAS-Port-Id\",\
> \"User-Name\",\"Realm\",\"Framed-IP-Address\",\"Calling-Station-Id\" \
> into radius \
> from detail \
> group by \"Acct-Session-Id\",\"NAS-IP-Address\",\"NAS-Port-Id\",\"User-Name\",\
> \"Realm\",\"Framed-IP-Address\",\"Calling-Station-Id\",\"Acct-Status-Type\";\
> " radius
> /usr/local/bin/psql -c "vacuum radius;" radius
>
> #
> #
> # Move stop records to stop table
> #
> echo Creating unique stop record table
> /usr/local/bin/psql -c "\
> select recnum into radstop from radius where \"Acct-Status-Type\" = 'Stop';\
> " radius
> echo Filtering stop records
> /usr/local/bin/psql -c "\
> select a.* into stop from detail as a,radstop as b where a.oid = b.recnum;\
> " radius
> /usr/local/bin/psql -c "select count(*) as \"Filtered\" from stop;"
> radius
> echo Cleaning stop records
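> # The delete below relies on the implicit FROM that old PostgreSQL versions
> # allow: detail rows whose oid appears in radstop.recnum are removed.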
> /usr/local/bin/psql -c "\
> delete from detail where oid = radstop.recnum;\
> " radius
> echo Cleaning temporary tables
> /usr/local/bin/psql -c "drop table radstop;" radius
> /usr/local/bin/psql -c "vacuum detail;" radius
>
> #
> # Move start and alive records to start table
> #
> echo Creating unique start record table
> /usr/local/bin/psql -c "\
> select recnum into radstart from radius where \"Acct-Status-Type\" != 'Stop';\
> " radius
> echo Filtering start records
> /usr/local/bin/psql -c "\
> select a.* into start from detail as a,radstart as b where a.oid = b.recnum;\
> " radius
> /usr/local/bin/psql -c "select count(*) as \"Filtered\" from start;"
> radius
> echo Cleaning start records
> /usr/local/bin/psql -c "\
> delete from detail where oid = radstart.recnum;\
> " radius
> echo Cleaning temporary tables
> /usr/local/bin/psql -c "drop table radstart;" radius
> /usr/local/bin/psql -c "drop table radius;" radius
> /usr/local/bin/psql -c "vacuum detail;" radius
>
> #
> # Move rest of records to dups table
> #
> echo Filtering duplicate records
> /usr/local/bin/psql -c "select * into dups from detail;" radius
> /usr/local/bin/psql -c "select count(*) as \"Filtered\" from dups;"
> radius
> echo Cleaning duplicate records
> /usr/local/bin/psql -c "delete from detail;" radius
> echo Cleaning temporary tables
> /usr/local/bin/psql -c "vacuum detail;" radius
>
> #
> #
> # Determine if more than one month.
> #
> [ "$MAXMON" != "$MINMON" ] && {
> echo Previous Month
> #
> # Starting Previous Month
> #
> #
> #
> #
> # Put Stop records in Previous stop
> #
> #
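> # The BETWEEN window below covers all of the previous month: truncating
> # (first of month + 32 days) gives the first instant of the next month,
> # and subtracting 1 second gives the month's last second.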
> echo Inserting stop records into ${MINMON}stop
> /usr/local/bin/psql -c "\
> insert into ${MINMON}stop \
> select * \
> from stop \
> where \"Time-Stamp\" between \
> date_trunc('month','$MINMON 1 $MINYEAR 00:00:00'::abstime) \
> and date_trunc('month','$MINMON 1 $MINYEAR 00:00:00'::abstime \
> + '32 days')::abstime - '1 second';\
> " radius
> #
> # Put Start and Alive records in Previous start
> #
> echo Inserting start records into ${MINMON}start
> /usr/local/bin/psql -c "\
> insert into ${MINMON}start \
> select * \
> from start \
> where \"Time-Stamp\" between \
> date_trunc('month','$MINMON 1 $MINYEAR 00:00:00'::abstime) \
> and date_trunc('month','$MINMON 1 $MINYEAR 00:00:00'::abstime \
> + '32 days')::abstime - '1 second';\
> " radius
> #
> # Put rest of data in Previous dups
> #
> echo Inserting duplicate records into ${MINMON}dups
> /usr/local/bin/psql -c "\
> insert into ${MINMON}dups \
> select * \
> from dups \
> where \"Time-Stamp\" between \
> date_trunc('month','$MINMON 1 $MINYEAR 00:00:00'::abstime) \
> and date_trunc('month','$MINMON 1 $MINYEAR 00:00:00'::abstime \
> + '32 days')::abstime - '1 second';\
> " radius
> #
> #
> # clean up after previous month.
> #
> echo Finished processing data from $fname for $MINMON $MINYEAR
> date
> echo Cleaning up after $fname
> /usr/local/bin/psql -c "vacuum ${MINMON}stop;" radius
> /usr/local/bin/psql -c "vacuum ${MINMON}start;" radius
> /usr/local/bin/psql -c "vacuum ${MINMON}dups;" radius
> date
> echo Making Daily and Monthly Tables for $MINMON
> /usr/local/pgsql/make-day-time $MINMON
> }
> echo Current Month
> #
> # Starting Current Month
> #
> #
> #
> #
> # Put Stop records in Current stop
> #
> echo Inserting stop records into ${MAXMON}stop
> /usr/local/bin/psql -c "\
> insert into ${MAXMON}stop \
> select * \
> from stop \
> where \"Time-Stamp\" between \
> date_trunc('month','$MAXMON 1 $MAXYEAR 00:00:00'::abstime) \
> and date_trunc('month','$MAXMON 1 $MAXYEAR 00:00:00'::abstime \
> + '32 days')::abstime - '1 second';\
> " radius
> /usr/local/bin/psql -c "drop table stop;" radius
> #
> # Put Start and Alive records in Current start
> #
> echo Inserting start records into ${MAXMON}start
> /usr/local/bin/psql -c "\
> insert into ${MAXMON}start \
> select * \
> from start \
> where \"Time-Stamp\" between \
> date_trunc('month','$MAXMON 1 $MAXYEAR 00:00:00'::abstime) \
> and date_trunc('month','$MAXMON 1 $MAXYEAR 00:00:00'::abstime \
> + '32 days')::abstime - '1 second';\
> " radius
> /usr/local/bin/psql -c "drop table start;" radius
> #
> # Put rest of data in Current dups
> #
> echo Inserting duplicate records into ${MAXMON}dups
> /usr/local/bin/psql -c "\
> insert into ${MAXMON}dups \
> select * \
> from dups \
> where \"Time-Stamp\" between \
> date_trunc('month','$MAXMON 1 $MAXYEAR 00:00:00'::abstime) \
> and date_trunc('month','$MAXMON 1 $MAXYEAR 00:00:00'::abstime \
> + '32 days')::abstime - '1 second';\
> " radius
> /usr/local/bin/psql -c "drop table dups;" radius
> #
> #
> #
> #
> #
> #
> # clean up after current month.
> #
> echo Finished processing data from $fname for $MAXMON $MAXYEAR
> date
> echo Cleaning up after $fname
> /usr/local/bin/psql -c "vacuum ${MAXMON}stop;" radius
> /usr/local/bin/psql -c "vacuum ${MAXMON}start;" radius
> /usr/local/bin/psql -c "vacuum ${MAXMON}dups;" radius
> date
> echo Making Daily and Monthly Tables for $MAXMON
> /usr/local/pgsql/make-day-time $MAXMON
> }
> }
> date
> done
> echo "Finished, no more files to process ."
> cd $ORIG
> ======== end radimport ========
>
> --
> There is a fine line between genius and lunacy, fear not, walk the
> line with pride. Not all things will end up as you wanted, but you
> will certainly discover things the meek and timid will miss out on.
>