Buld Insert and Index use. - Mailing list pgsql-performance

From Rudi Starcevic
Subject Buld Insert and Index use.
Date
Msg-id 411AB302.809@wildcash.com
Whole thread Raw
Responses Re: Buld Insert and Index use.  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Hi,

This email is picking up a thread from yesterday on INSERTS and INDEXES.

In this case the question is to use and index or a sequential scan.

I have included the db DDL and SELECT query.

For each month I have a csv data dump of council property data.

So the First CD will have almost all unique records.
 From there most properties will already be in the db.

Each line of the csv file populates three tables.

gc_prop
gc_value
gc_owner

gc_prop has a one to many relationship with gc_value and gc_owner.

For-each line we first SELECT from the gc_prop table.
If there is no record we INSERT into the gc_prop,gc_value and gc_owner
tables.

If there is a matching property we do not need to INSERT into the
gc_prop table only the gc_value and gc_owner tables.

So each row will require either:

a) One SELECT + three INSERTS
or
b) One SELECT + two INSERTS

CREATE TABLE gc_prop
(
gc_id                serial PRIMARY KEY,
acc_no                 numeric(9,0),
master_code         varchar(32),
no_prop             numeric(4,0),
lot_no                 numeric(9,0),
plan_no             varchar(32),
volume_no             numeric(8,0),
folio_no             numeric(8,0),
street_no             varchar(32),
street_name         varchar(30),
suburb_name         varchar(30),
suburb_postcode     numeric(4,0),
address1_txt         varchar(80),
address2_txt         varchar(80),
address3_txt         varchar(80),
address4_txt         varchar(80),
address5_txt         varchar(80),
address6_txt         varchar(80),
owner_postcode         text,
planning_zone         varchar(80),
waterfront_code     varchar(80),
vacant_land         varchar(32),
property_area         text,
improvemeny_type     varchar(20),
bup_indicator         varchar(4),
property_name         varchar(30),
lot_quantity         text,
pool_indicator         varchar(32),
owner_occupied         varchar(32),
prsh_text             varchar(30),
vg_no                 varchar(20),
property_part         varchar(20),
lot_no_txt             varchar(32),
plan_qualifier1     varchar(80),
plan_qualifier2     varchar(80),
owner_trans_date     date,
legal_entity         varchar(32),
"1st_surname_com"     varchar(80),
"1st_given_name"     varchar(64),
"2nd_surname"         varchar(64),
"2nd_given_name"     varchar(64),
easement_flag         varchar(32),
trans_code             varchar(32)
);

CREATE TABLE gc_value
(
gc_v_id                serial PRIMARY KEY,
gc_id                integer NOT NULL, -- foreign key to gc_prop table
current_valuation     numeric(10,0),
valuation_date         date,
last_sale_date         date,
annual_rates         numeric(7,0),
sale_amount         numeric(9,0),
type_sale             varchar(32),
date_sale             date,
pre_val_date         date,
pre_val_amount         numeric(10,0),
pre_vg_no             varchar(20),
future_val_date     varchar(32),
fut_val_amount         numeric(10,0),
fut_val_no             varchar(32)
);

CREATE TABLE gc_owner
(
gc_o_id                serial PRIMARY KEY,
gc_id                integer NOT NULL, -- foreign key to gc_prop table
pre_legal_entity    varchar(32),
pre_surname_com     varchar(76),
pre_given_name         varchar(32),
pre_2nd_surname     varchar(40),
pre_2nd_given         varchar(32),
orig_lot_no            numeric(9,0),
orig_lot_txt        varchar(32),
orig_prop_txt        text,
orig_plan_qualifier1    varchar(32),
orig_plan_qualifier2    varchar(32),
orig_plan_no         varchar(32)
);

CREATE INDEX gc_prop_check ON gc_prop (
acc_no,no_prop,lot_no,plan_no,street_no,street_name,suburb_postcode );

//check if this property already exists using gc_prop_check index
$sql = "
SELECT
gp.gc_id,
gp.acc_no,
gp.no_prop,
gp.lot_no,
gp.plan_no,
gp.street_no,
gp.street_name,
gp.suburb_postcode,
gp.owner_trans_date
FROM    gc_prop gp
WHERE
gp.acc_no = $acc_no,
AND
gp.no_prop = $no_prop,
AND
gp.lot_no = $lot_no
AND
gp.plan_no = '$plan_no'
AND
gp.street_no = '$street_no'
AND
gp.street_name = '$street_name'
AND
gp.suburb_postcode = $suburb_postcode
";

Do you think an Index or Seq. scan should be used?

Thanks.
Regards,
Rudi.

pgsql-performance by date:

Previous
From: Russell Smith
Date:
Subject: Re: Storing binary data.
Next
From: Gaetano Mendola
Date:
Subject: Re: [HACKERS] fsync vs open_sync