Thread: Fastest import?

Fastest import?

From: Matthew Hagerty
Greetings,

What is the fastest way to import data from an ASCII file?  I have ~17,000
records in a .DBF file that I need to get into my pg6.4.2 database.  I have
written a program that reads the .DBF file, massages the data and writes it
out to a text file, one record per line.

I initially had my program issue the insert commands directly instead of
writing them to a text file, but the import took about 2 hours!  We are only
talking about 4Meg of data here; it should not take that long, should it?
I was hoping that once the data was in a text file, psql < text.file would
be faster, but it is not.  Is there any way to speed this up?

I was also looking at the copy command and maybe using the binary mode, but
writing out binary data is not my idea of a good time and I'm on a deadline
here.  Is there a program out there that might take a text file of insert
statements and convert them to a binary format that copy can use?

Any insight would be greatly appreciated.

Thanks,
Matthew Hagerty


Re: [INTERFACES] Fastest import?

From: "Patrick Welche"
> What is the fastest way to import data from an ASCII file?  I have ~17,000
> records in a .DBF file that I need to get into my pg6.4.2 database.  I have
> written a program that reads the .DBF file, massages the data and writes it
> out to a text file, one record per line.

Funny, I just reinvented the wheel too on Friday.

> I initially had my program issue the insert commands instead of writing
> them to a text file, but the import took about 2 hours!

Did you surround the inserts with "BEGIN WORK" and "COMMIT"? That is
rumoured to help - I only had 6000 rows, so it wasn't a problem.

Re the text file, are you using "INSERT" or better "COPY"?
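
Roughly, and with a made-up table name just for illustration, the two things
I mean are:

-- one transaction around the whole batch, instead of one per INSERT
BEGIN WORK;
INSERT INTO mytable VALUES ('first row', 1);
INSERT INTO mytable VALUES ('second row', 2);
-- ... rest of the rows ...
COMMIT;

-- or, usually much faster, COPY from a tab-delimited file
-- (tab is COPY's default delimiter)
COPY mytable FROM '/tmp/text.file';

With COPY from a file the backend does the reading, so the file has to be
readable by the server (and you may need to be the database superuser);
COPY mytable FROM stdin inside a psql script avoids that.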


Cheers,

Patrick

Re: [INTERFACES] Fastest import?

From: Peter T Mount
On Sat, 10 Apr 1999, Matthew Hagerty wrote:

> Greetings,
>
> What is the fastest way to import data from an ASCII file?  I have ~17,000
> records in a .DBF file that I need to get into my pg6.4.2 database.  I have
> written a program that reads the .DBF file, massages the data and writes it
> out to a text file, one record per line.
>
> I initially had my program issue the insert commands instead of writing
> them to a text file, but the import took about 2 hours!  We are only
> talking about 4Meg worth of data here, it should not take that long should
> it?  I was hoping that once in a text file that psql < text.file would be
> faster, but it is not.  Is there any way to speed this up?
>
> I was also looking at the copy command and maybe using the binary mode, but
> writing out binary data is not my idea of a good time and I'm on a deadline
> here.  Is there a program out there that might take a text file of insert
> statements and convert them to a binary format that copy can use?
>
> Any insight would be greatly appreciated.

Why use binary? Just get your program to output Tab delimited ASCII
data, and use ASCII COPY.

Tip: Don't create your indices before the copy. It will load faster
without them. Once the data's in there, then create them.
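
To make that concrete (the table, column, and file names here are only
placeholders), the load order looks something like:

CREATE TABLE mytable (name text, amount int4);

-- ASCII COPY's default delimiter is tab, so it matches
-- one-record-per-line, tab-separated output
COPY mytable FROM '/full/path/to/data.txt';

-- build the indices only after the data is in
CREATE INDEX mytable_name_idx ON mytable (name);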

Peter

--
       Peter T Mount peter@retep.org.uk
      Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres
 Java PDF Generator: http://www.retep.org.uk/pdf


Re: [INTERFACES] Fastest import?

From: Wybo Dekker
Matthew Hagerty wrote:
>
> Greetings,
>
> What is the fastest way to import data from an ASCII file?  I have ~17,000
> records in a .DBF file that I need to get into my pg6.4.2 database.  I have
> written a program that reads the .DBF file, massages the data and writes it
> out to a text file, one record per line.

I do that directly under Linux by reading the .dbf file with a small
C program that writes the data out as tab-separated records. Those
can then be read with the COPY ... FROM command:

/* __________ db2u __________
   Convert a dBase (.dbf) file to tab-separated records.
   Note: the header/field structs below assume a compiler with no
   struct padding and a 4-byte long (as on 32-bit Linux/gcc).
   Usage:
   db2u < dbase-file.dbf > unix_database
    or
   db2u dbase-file.dbf > unix_database
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXFIELDS 100 /* maximum nr of fields */

int main(int argc, char *argv[])
{ char *p,*q,*r;
  typedef char string11[12];
  FILE *in;
  int i,j,n,field_pos[MAXFIELDS+1],nfield=0,pos,lrec;
  unsigned char term;
  char *record,field_type[MAXFIELDS];
  string11 field_name[MAXFIELDS];
  struct
  { char  version;    /* dBase version */
    char  datum[3];   /* date of last update YYMMDD */
    long  nrec;       /* nr of records in the database file */
    short nhead;      /* nr of bytes in the header */
    short lrec;       /* nr of bytes in the record */
    char  dum1[2];
    char  transact;   /* flag for incomplete transaction */
    char  encrypt;    /* encryption flag */
    char  lan[12];    /* reserved for network applications */
    char  mdx;        /* flag for presence of mdx-file */
    char  dum2[3];
  } header;

  struct
  { char  name[11];         /* field name, zero filled */
    char  type;             /* field type (C, D, F, L, M, N) */
    char  dum1[4];
    unsigned char  length;  /* field length in binary */
    char  ndec;             /* field decimal count in binary */
    char  dum2[2];
    char  workid;           /* work area ID */
    char  dum3[11];
  } field;

  switch(argc)
  { case 1:   in=stdin;
              break;
    case 2:   if ((in = fopen(argv[1],"rb")) == NULL)
              { fprintf(stderr,"Input file %s not found\n",argv[1]);
                exit(1);
              }
              break;
    default:  fprintf(stderr,"Usage: %s dbase_file > unix_database\n",argv[0]);
              fprintf(stderr,"Or: cat dbase_file | %s > unix_database\n",argv[0]);
              exit(1);
  }
  if (fread(&header,32,1,in)!=1)
  { fprintf(stderr,"Error reading file header\n");
    exit(1);
  }
  /* walk the field descriptors; byte 0 of each record is the deletion flag */
  n=header.lrec;
  pos=1;
  while (n>1)
  { if (nfield==MAXFIELDS)
    { fprintf(stderr,"Too many fields\n");
      exit(1);
    }
    fread(&field,32,1,in);
    strcpy(field_name[nfield],field.name);
    field_type[nfield]=field.type;
    field_pos[nfield++]=pos;
    pos+=(int)field.length;
    n-=(int)field.length;
  }
  /* these two lines print header rows with the field names;
     strip or skip them before feeding the output to COPY */
  for (i=0;i<nfield;i++) printf("!%d-%s%c",i+1,field_name[i],i==nfield-1?'\n':'\t');
  for (i=0;i<nfield;i++) printf(   "!%s%c",    field_name[i],i==nfield-1?'\n':'\t');
  field_pos[nfield]=pos; /* one extra for length calculation of last field */
  fread(&term,1,1,in); /* read field descriptor terminator */
  if (term!=0xD)
  { fprintf(stderr,"Illegal field terminator; more than %d fields?\n",MAXFIELDS);
    exit(1);
  }
  lrec=header.lrec;
  record=malloc(lrec);
  for (i=0;i<header.nrec;i++)
  { if (fread(record,lrec,1,in)!=1)
    { fprintf(stderr,"Error reading record %d\n",i+1);
      exit(1);
    }
    if (*record!='*')  /* skip deleted records */
    for (j=0;j<nfield;j++)
    {
      p=record+field_pos[j];               /* first char of the field */
      q=record+field_pos[j+1]-1;           /* last char of the field */
      while (p<=q && *p==' ') p++;             /* remove leading spaces */
      while (p<=q && *q==' ') *(q--)='\0';     /* remove trailing spaces */
      for (r=p;r<=q;r++) putchar(*r);
      putchar(j==nfield-1?'\n':'\t');
    }
  }
  return 0;
}
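
One possible way to use it (the database, table, and file names are just
examples, and I haven't tested this exact pipeline): compile it, strip the
two header rows it prints, and feed the result to COPY:

cc -o db2u db2u.c
./db2u mydata.dbf | sed '1,2d' > /tmp/mydata.txt
psql -c "COPY mytable FROM '/tmp/mydata.txt'" mydb

The file path has to be readable by the backend; otherwise load it with
COPY mytable FROM stdin from within psql.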


--
Hartelijke groet, Wybo Dekker
___________________Servalys Analytical Chemistry Services__________________
wybo@servalys.hobby.nl | Deilsedijk 60                 | tel +31-345-652164
www.hobby.nl/~servalys | 4158 CH Deil, The Netherlands | fax +31-345-652383

Re: [INTERFACES] Fastest import?

From: Joe Dieckert
AppGEN works nicely.

Peter T Mount wrote:

> On Sat, 10 Apr 1999, Matthew Hagerty wrote:
>
> > Greetings,
> >
> > What is the fastest way to import data from an ASCII file?  I have ~17,000
> > records in a .DBF file that I need to get into my pg6.4.2 database.  I have
> > written a program that reads the .DBF file, massages the data and writes it
> > out to a text file, one record per line.
> >
> > I initially had my program issue the insert commands instead of writing
> > them to a text file, but the import took about 2 hours!  We are only
> > talking about 4Meg worth of data here, it should not take that long should
> > it?  I was hoping that once in a text file that psql < text.file would be
> > faster, but it is not.  Is there any way to speed this up?
> >
> > I was also looking at the copy command and maybe using the binary mode, but
> > writing out binary data is not my idea of a good time and I'm on a deadline
> > here.  Is there a program out there that might take a text file of insert
> > statements and convert them to a binary format that copy can use?
> >
> > Any insight would be greatly appreciated.
>
> Why use binary? Just get your program to output Tab delimited ASCII
> data, and use ASCII COPY.
>
> Tip: Don't create your indices before the copy. It will load faster
> without them. Once the data's in there, then create them.
>
> Peter
>
> --
>        Peter T Mount peter@retep.org.uk
>       Main Homepage: http://www.retep.org.uk
> PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres
>  Java PDF Generator: http://www.retep.org.uk/pdf


libpq++ sample c++ programs

From: Hugh Lawson
The libpq++ man page says it has 3 sample C++ programs at the end, but on my
system they're not there.

Does anybody know how I can get these sample files? My C++ skills are,
well, extremely limited, and examples would really help.

Hugh Lawson
Greensboro, North Carolina
hglawson@nr.infi.net