Loading speed - Mailing list pgsql-novice

From Cath Lawrence
Subject Loading speed
Date
Msg-id FC0064C3-0436-11D8-B7DB-00039390F614@anu.edu.au
Whole thread Raw
Responses Re: Loading speed  (Oliver Elphick <olly@lfix.co.uk>)
Re: Loading speed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-novice
Hi again,

Is there an FAQ or something I can read about optimising speed of
insertions and queries? I am doing some simple-minded stuff at the
moment with pgsql, and my insertions are very very slow.

A rough idea of the problem here for background:
A protein has a small number of chains ('A', 'B' etc), each of which
has N2 amino acids, each of which has N3 atoms.
Now I've got lots and lots of atoms. Hundreds of thousands most like,
maybe a million or two, I haven't counted.  How can I most efficiently
load these?

The atom table is a bit like this with some language changes:
CREATE TABLE atom (
        id                  INTEGER PRIMARY KEY DEFAULT
nextval('atom_id_seq'),
        protein_id          CHAR(4) REFERENCES protein,
        chain_id            CHAR(1) NOT NULL,
        amino_acid_id       INTEGER REFERENCES amino_acid,
        atom_type           TEXT,
        element             TEXT,
        x                   FLOAT,
        y                   FLOAT,
        z                   FLOAT
);

Now the REFERENCES bits build triggers, and in each case it refers to
the primary key of the table. This is always indexed, yes? When I
construct the table it says  CREATE TABLE / PRIMARY KEY will create
implicit index 'amino_acid_pkey' for table 'amino_acid' - this IS an
index on the column I named as the PRIMARY KEY, not on some oid, isn't
id?

With my simple-minded pygresql script, I have one insert statement per
atom, and of course it crawls very feebly. Perhaps there's an
insert_many method I haven't found yet (the docs suck, I will keep
looking).

Or perhaps I should write them out to a text file and use COPY?

Any advice?

cheers
Cath
Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Senior Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research (room 4088)
Australian National University,  Canberra ACT 0200
ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595


pgsql-novice by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Almost relational PostgreSQL (was: one-to-one)
Next
From: Michael Glaesmann
Date:
Subject: Re: Almost relational PostgreSQL (was: one-to-one)