Buld Insert and Index use. - Mailing list pgsql-performance
From | Rudi Starcevic |
---|---|
Subject | Buld Insert and Index use. |
Date | |
Msg-id | 411AB302.809@wildcash.com Whole thread Raw |
Responses |
Re: Buld Insert and Index use.
|
List | pgsql-performance |
Hi, This email is picking up a thread from yesterday on INSERTS and INDEXES. In this case the question is to use and index or a sequential scan. I have included the db DDL and SELECT query. For each month I have a csv data dump of council property data. So the First CD will have almost all unique records. From there most properties will already be in the db. Each line of the csv file populates three tables. gc_prop gc_value gc_owner gc_prop has a one to many relationship with gc_value and gc_owner. For-each line we first SELECT from the gc_prop table. If there is no record we INSERT into the gc_prop,gc_value and gc_owner tables. If there is a matching property we do not need to INSERT into the gc_prop table only the gc_value and gc_owner tables. So each row will require either: a) One SELECT + three INSERTS or b) One SELECT + two INSERTS CREATE TABLE gc_prop ( gc_id serial PRIMARY KEY, acc_no numeric(9,0), master_code varchar(32), no_prop numeric(4,0), lot_no numeric(9,0), plan_no varchar(32), volume_no numeric(8,0), folio_no numeric(8,0), street_no varchar(32), street_name varchar(30), suburb_name varchar(30), suburb_postcode numeric(4,0), address1_txt varchar(80), address2_txt varchar(80), address3_txt varchar(80), address4_txt varchar(80), address5_txt varchar(80), address6_txt varchar(80), owner_postcode text, planning_zone varchar(80), waterfront_code varchar(80), vacant_land varchar(32), property_area text, improvemeny_type varchar(20), bup_indicator varchar(4), property_name varchar(30), lot_quantity text, pool_indicator varchar(32), owner_occupied varchar(32), prsh_text varchar(30), vg_no varchar(20), property_part varchar(20), lot_no_txt varchar(32), plan_qualifier1 varchar(80), plan_qualifier2 varchar(80), owner_trans_date date, legal_entity varchar(32), "1st_surname_com" varchar(80), "1st_given_name" varchar(64), "2nd_surname" varchar(64), "2nd_given_name" varchar(64), easement_flag varchar(32), trans_code varchar(32) ); CREATE TABLE gc_value ( gc_v_id serial PRIMARY KEY, gc_id integer NOT NULL, -- foreign key to gc_prop table current_valuation numeric(10,0), valuation_date date, last_sale_date date, annual_rates numeric(7,0), sale_amount numeric(9,0), type_sale varchar(32), date_sale date, pre_val_date date, pre_val_amount numeric(10,0), pre_vg_no varchar(20), future_val_date varchar(32), fut_val_amount numeric(10,0), fut_val_no varchar(32) ); CREATE TABLE gc_owner ( gc_o_id serial PRIMARY KEY, gc_id integer NOT NULL, -- foreign key to gc_prop table pre_legal_entity varchar(32), pre_surname_com varchar(76), pre_given_name varchar(32), pre_2nd_surname varchar(40), pre_2nd_given varchar(32), orig_lot_no numeric(9,0), orig_lot_txt varchar(32), orig_prop_txt text, orig_plan_qualifier1 varchar(32), orig_plan_qualifier2 varchar(32), orig_plan_no varchar(32) ); CREATE INDEX gc_prop_check ON gc_prop ( acc_no,no_prop,lot_no,plan_no,street_no,street_name,suburb_postcode ); //check if this property already exists using gc_prop_check index $sql = " SELECT gp.gc_id, gp.acc_no, gp.no_prop, gp.lot_no, gp.plan_no, gp.street_no, gp.street_name, gp.suburb_postcode, gp.owner_trans_date FROM gc_prop gp WHERE gp.acc_no = $acc_no, AND gp.no_prop = $no_prop, AND gp.lot_no = $lot_no AND gp.plan_no = '$plan_no' AND gp.street_no = '$street_no' AND gp.street_name = '$street_name' AND gp.suburb_postcode = $suburb_postcode "; Do you think an Index or Seq. scan should be used? Thanks. Regards, Rudi.
pgsql-performance by date: