Re: Very Poor Insert Performance - Mailing list pgsql-performance
From | Damien Dougan |
---|---|
Subject | Re: Very Poor Insert Performance |
Date | |
Msg-id | 200310291022.24035.damien.dougan@mobilecohesion.com Whole thread Raw |
In response to | Re: Very Poor Insert Performance (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Very Poor Insert Performance
|
List | pgsql-performance |
On Monday 27 October 2003 8:12 pm, Tom Lane wrote: > Damien Dougan <damien.dougan@mobilecohesion.com> writes: > > Has anyone any ideas as to what could be causing the spiraling > > performance? > > You really haven't provided any information that would allow anything > but guesses, but I'll guess anyway: poor plans for foreign key checks? > See nearby threads. > > regards, tom lane Apologies for not including more info - I had been hoping that spiralling performance was a known tell-tale sign of something:) Here is some additional information - sorry if its overload, but I figured I should give an intro to the schema before showingthe EXPLAIN results! Firstly, a quick schema overview for the relevant tables: contact has many contactparts address has many addressparts contact has many address Now, the above table relationships are connected via relationship tables (rather than foreign indexes directly to each other),so we have: contact rel_contact_has_contactpart address rel_address_has_addresspart (The reasons behind this are for meta-data purposes - our database is intended to be very abstract from the code...) Table "public.contact" Column | Type | Modifiers ------------------+-----------------------------+--------------------------------------------------- id | integer | default nextval('contact_id_seq'::text) version | integer | default 1 contactid | character varying | enddate | timestamp without time zone | preferredaddress | character varying | startdate | timestamp without time zone | Indexes: "contact_id_idx" unique, btree (id) "contact_key" unique, btree (contactid) So we have an index on the meta-data related "id" and the externally visible "contactid" values. The "id" is used with therel_contact_has_XXX tables (see below). Table "public.contactpart" Column | Type | Modifiers -------------+-------------------+------------------------------------------------ id | integer | default nextval('contactpart_id_seq'::text) version | integer | default 1 detailname | character varying | not null detailvalue | character varying | Indexes: "contactpart_id_idx" unique, btree (id) So we have an index on the meta-data related "id". Table "public.address" Column | Type | Modifiers ---------+-----------------------------+-------------------------------------------- id | integer | default nextval('mc_address_id_seq'::text) version | integer | default 1 enddate | timestamp without time zone | format | character varying | type | character varying | not null value | character varying | Indexes: "address_id_idx" unique, btree (id) "address_value_key" btree (value) So we have an index on the meta-data related "id". Table "public.addresspart" Column | Type | Modifiers -------------+-------------------+------------------------------------------------ id | integer | default nextval('addresspart_id_seq'::text) version | integer | default 1 detailname | character varying | not null detailvalue | character varying | Indexes: "addresspart_id_idx" unique, btree (id) So we have an index on the meta-data related "id". This is used with the rel_address_has_addresspart table (see below). Table "public.rel_contact_has_contactpart" Column | Type | Modifiers ----------------------+---------+----------- contact_id | integer | contactpart_id | integer | Indexes: "rel_contact_has_contactpart_idx2" unique, btree (contactpart_id) "rel_contact_has_contactpart_idx1" btree (contact_id) So we have a unique index on the contactpart and a non-unique index on the contact (to reflect the 1:M relationship contacthas contactparts) Table "public.rel_address_has_addresspart" Column | Type | Modifiers -------------------+---------+----------- address_id | integer | addresspart_id | integer | Indexes: "rel_address_has_addresspart_idx2" unique, btree (addresspart_id) "rel_address_has_addresspart_idx1" btree (address_id) So we have a unique index on the addresspart and a non-unique index on the address (to reflect the 1:M relationship addresshas addressparts) Table "public.rel_contact_has_address" Column | Type | Modifiers ----------------------+---------+----------- contact_id | integer | address_id | integer | Indexes: "rel_contact_has_address_idx2" unique, btree (address_id) "rel_contact_has_address_idx1" btree (contactdetails_id) So we have a unique index on the address and a non-unique index on the contact (to reflect the 1:M relationship contact hasaddresses) However, to add a layer of abstraction to the business logic, the underlying tables are never directly exposed through anythingother than public views. The public views combine the <table> and <tablepart> into a single table. So we have 2 public views: PvContact which ties together the contact and contactparts, and PvAddress which ties togetherthe address and addresspart. View "public.pvcontact" Column | Type | Modifiers ------------------+-----------------------------+----------- version | integer | contactid | character varying | startdate | timestamp without time zone | enddate | timestamp without time zone | preferredaddress | character varying | firstname | character varying | lastname | character varying | (Note - firstname and lastname are dervied from the detailnames of contactpart table) View "public.pvaddress" Column | Type | Modifiers -----------+-----------------------------+----------- version | integer | contactid | character varying | type | character varying | format | character varying | enddate | timestamp without time zone | npi | character varying | ton | character varying | number | character varying | prefix | character varying | addrvalue | character varying | link | character varying | house | character varying | street | character varying | town | character varying | city | character varying | county | character varying | postcode | character varying | state | character varying | zipcode | character varying | extension | character varying | (Note - number, prefix, link, house, street, town, city, postcode etc are derived from the detailnames of addresspart table) For example, suppose we have 2 contactparts for a particular contact (with unique id = y): FirstName and LastName, then thecontactpart table would have 2 rows like: id = x version = 1 partname = 'FirstName' partvalue = 'John' id = x+1 version = 1 partname = 'LastName' partvalue = 'Doe' Then the public view, PvContact, would look like: Version = 1 ContactId = y StartDate = ... EndDate = ... PreferredAddress = ... FirstName = John LastName = Doe All Create, Read, Update, Delete operations on the DB are performed by StoredProcedures (again, to help abstract the codefrom the DB). The SPs are capable of dealing with externally (public view) advertised schemas, and ensuring data integrityacross the underlying tables. Now, our problem seems to be the delays introduced by reading from the public views. I've taken some measurements of rawINSERTS that mirror what the SPs are doing (but the data is invalid and its not correctly linked across tables (whichis what the StoredProcs are responsible for) - but the INSERTS are happening in the same order and frequency for avalid data upload). We can upload 2000 sets of users+contacts+addresses in about 17 seconds. But when it is done via theStored Procedures (which do some inserts, some reads, some more inserts etc to ensure tables are properly linked via therelationships), this drops to 2 minutes for 2000. And the performance spirals down to less than 1 user+contact+addressper second after a short while. First of all then, the definition of the PublicView PvAddress View definition: SELECT address.id AS addressid, address."version", contact.id AS contactid, contact.contactid AS contactuuid, address."type",address.format, address.enddate, address.value, rel_npi.npiid, rel_npi.npi, rel_ton.tonid, rel_ton.ton, rel_number.numberid,rel_number.number, rel_prefix.prefixid, rel_prefix.prefix, rel_addrvalue.addrvalueid, rel_addrvalue.addrvalue,rel_link.linkid, rel_link.link, rel_house.houseid, rel_house.house, rel_street.streetid, rel_street.street,rel_town.townid, rel_town.town, rel_city.cityid, rel_city.city, rel_county.countyid, rel_county.county,rel_postcode.postcodeid, rel_postcode.postcode, rel_state.stateid, rel_state.state, rel_zipcode.zipcodeid,rel_zipcode.zipcode, rel_extension.extensionid, rel_extension.extension FROM svcurrentcontactdetails contact, rel_contact_has_address rel_contact, svcurrentaddress address LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS npiid, det.detailvalue AS npi FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Npi'::text) rel_npi ON address.id = rel_npi.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS tonid, det.detailvalue AS ton FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Ton'::text) rel_ton ON address.id = rel_ton.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS numberid, det.detailvalue AS number FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Number'::text) rel_number ON address.id = rel_number.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS prefixid, det.detailvalue AS prefix FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Prefix'::text) rel_prefix ON address.id = rel_prefix.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS addrvalueid, det.detailvalue AS addrvalue FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'AddrValue'::text) rel_addrvalue ON address.id= rel_addrvalue.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS linkid, det.detailvalue AS link FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Link'::text) rel_link ON address.id = rel_link.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS houseid, det.detailvalue AS house FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'House'::text) rel_house ON address.id = rel_house.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS streetid, det.detailvalue AS street FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Street'::text) rel_street ON address.id = rel_street.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS townid, det.detailvalue AS town FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Town'::text) rel_town ON address.id = rel_town.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS cityid, det.detailvalue AS city FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'City'::text) rel_city ON address.id = rel_city.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS countyid, det.detailvalue AS county FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'County'::text) rel_county ON address.id = rel_county.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS postcodeid, det.detailvalue AS postcode FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Postcode'::text) rel_postcode ON address.id= rel_postcode.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS stateid, det.detailvalue AS state FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'State'::text) rel_state ON address.id = rel_state.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS zipcodeid, det.detailvalue AS zipcode FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Zipcode'::text) rel_zipcode ON address.id =rel_zipcode.addressid LEFT JOIN ( SELECT rel.address_id AS addressid, rel.addresspart_id AS extensionid, det.detailvalue AS extension FROM rel_address_has_addresspart rel, addresspart det WHERE rel.addresspart_id = det.id AND det.detailname::text = 'Extension'::text) rel_extension ON address.id= rel_extension.addressid WHERE contact.id = rel_contact.contact_id AND address.id = rel_contact.address_id; (The JOINs are where our problems are below ...) hydradb=# explain select * from pvaddress where contactuuid = 'test' and type = 'sms' and format is null ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------- Merge Join (cost=42499.93..44975.38 rows=1 width=358) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=42491.11..44957.05 rows=3795 width=323) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=41822.20..44278.07 rows=3795 width=305) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=41153.29..43599.08 rows=3795 width=287) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=40484.39..42920.10 rows=3795 width=269) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=39815.48..42241.12 rows=3795 width=251) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=39146.58..41562.13 rows=3795 width=233) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=38477.67..40883.15 rows=3795 width=215) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=37808.76..40204.16 rows=3795 width=197) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=37139.86..39525.18 rows=3795 width=179) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=36470.95..38846.20 rows=3795 width=161) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=35802.04..38167.21 rows=3795 width=143) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=28634.40..30852.70 rows=3795width=125) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=21495.85..23569.10rows=3795 width=107) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=14328.21..16254.59rows=3795 width=89) Merge Cond: ("outer".id = "inner".address_id) -> Merge Left Join (cost=7102.23..8878.06rows=3795 width=71) Merge Cond: ("outer".id = "inner".address_id) -> Index Scan using address_id_idxon address (cost=0.00..1633.07 rows=3795 width=53) Filter: (((enddate ISNULL) OR (('now'::text)::timestamp(6) with time zone < (enddate)::timestamp with time zone)) AND (("typ e")::text = 'sms'::text) AND (format IS NULL)) -> Sort (cost=7102.23..7159.65rows=22970 width=22) Sort Key: rel.address_id -> Merge Join (cost=0.00..5438.34rows=22970 width=22) Merge Cond: ("outer".id= "inner".addressline_id) -> Index Scan usingaddressline_id_idx on addressline det (cost=0.00..2773.61 rows=22969 width=18) Filter: ((detailname)::text= 'Npi'::text) -> Index Scan usingrel_address_has_addressline_idx2 on rel_address_has_addressline rel (cost=0.00..2082.13 rows=1181 93 width=8) -> Sort (cost=7225.98..7286.76 rows=24310width=22) Sort Key: rel.address_id -> Merge Join (cost=0.00..5455.09rows=24310 width=22) Merge Cond: ("outer".id= "inner".addressline_id) -> Index Scan using addressline_id_idxon addressline det (cost=0.00..2773.61 rows=24309 width=18) Filter: ((detailname)::text= 'Ton'::text) -> Index Scan using rel_address_has_addressline_idx2on rel_address_has_addressline rel (cost=0.00..2082.13 rows=118193 wid th=8) -> Sort (cost=7167.64..7226.84 rows=23679width=22) Sort Key: rel.address_id -> Merge Join (cost=0.00..5447.20rows=23679 width=22) Merge Cond: ("outer".id = "inner".addressline_id) -> Index Scan using addressline_id_idxon addressline det (cost=0.00..2773.61 rows=23678 width=18) Filter: ((detailname)::text= 'Number'::text) -> Index Scan using rel_address_has_addressline_idx2on rel_address_has_addressline rel (cost=0.00..2082.13 rows=118193 width=8) -> Sort (cost=7138.56..7196.97 rows=23364 width=22) Sort Key: rel.address_id -> Merge Join (cost=0.00..5443.27 rows=23364width=22) Merge Cond: ("outer".id = "inner".addressline_id) -> Index Scan using addressline_id_idxon addressline det (cost=0.00..2773.61 rows=23363 width=18) Filter: ((detailname)::text= 'Prefix'::text) -> Index Scan using rel_address_has_addressline_idx2on rel_address_has_addressline rel (cost=0.00..2082.13 rows=118193 width=8) -> Sort (cost=7167.64..7226.84 rows=23679 width=22) Sort Key: rel.address_id -> Merge Join (cost=0.00..5447.20 rows=23679width=22) Merge Cond: ("outer".id = "inner".addressline_id) -> Index Scan using addressline_id_idxon addressline det (cost=0.00..2773.61 rows=23678 width=18) Filter: ((detailname)::text = 'AddrValue'::text) -> Index Scan using rel_address_has_addressline_idx2on rel_address_has_addressline rel (cost=0.00..2082.13 rows=118193 width=8) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx onaddressline det (cost=0.00..366.01 rows=99 width=18) Index Cond: ((detailname)::text = 'Link'::text) -> Index Scan using rel_address_has_addressline_idx2on rel_address_has_addressline rel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addresslinedet (cost=0.00..366.01 rows=99 width=18) Index Cond: ((detailname)::text = 'House'::text) -> Index Scan using rel_address_has_addressline_idx2on rel_address_has_addressline rel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addresslinedet (cost=0.00..366.01 rows=99 width=18) Index Cond: ((detailname)::text = 'Street'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addresslinerel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01 rows=99 width=18) Index Cond: ((detailname)::text = 'Town'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addresslinerel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01rows=99 width=18) Index Cond: ((detailname)::text = 'City'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addresslinerel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01rows=99 width=18) Index Cond: ((detailname)::text = 'County'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addresslinerel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01rows=99 width=18) Index Cond: ((detailname)::text = 'Postcode'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addresslinerel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01 rows=99width=18) Index Cond: ((detailname)::text = 'State'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addressline rel (cost=0.00..3.01 rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01 rows=99 width=18) Index Cond: ((detailname)::text = 'Zipcode'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addressline rel (cost=0.00..3.01rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=668.91..669.16 rows=100 width=22) Sort Key: rel.address_id -> Nested Loop (cost=0.00..665.58 rows=100 width=22) -> Index Scan using addressline_detail_idx on addressline det (cost=0.00..366.01 rows=99 width=18) Index Cond: ((detailname)::text = 'Extension'::text) -> Index Scan using rel_address_has_addressline_idx2 on rel_address_has_addressline rel (cost=0.00..3.01rows=1 width=8) Index Cond: (rel.addressline_id = "outer".id) -> Sort (cost=8.83..8.83 rows=2 width=39) Sort Key: rel_contact.address_id -> Nested Loop (cost=0.00..8.82 rows=2 width=39) -> Index Scan using contact_key on contact (cost=0.00..5.77 rows=1 width=35) Index Cond: ((contactid)::text = 'test'::text) Filter: (((startdate IS NULL) OR (('now'::text)::timestamp(6) with time zone >= (startdate)::timestampwith time zone)) AND ((enddate IS NULL) OR (('now'::text)::timestamp(6) with time zone < (enddate):: timestamp with time zone))) -> Index Scan using rel_contact_has_address_idx1 on rel_contact_has_address rel_contact (cost=0.00..3.02rows=2 width=8) Index Cond: ("outer".id = rel_contact.contact_id) (147 rows) As you can see, the PublicView is resulting in a huge nested loop, with an index scan of the contact only occurring at theend. I would have expected something more like: (1) An index scan of the contact table to determine the correct contact (2) An index scan of the address table using the rel_contact_has_address.address_id to obtain the (relatively small - max16, and typically 2) addresses (3) A number of joins - at the same level rather than looping - to obtain the detailnames for the new column names of thepublic view As I said in my original email, these delays are after applying all the performance related enhancements (fsync off, increasedbackbuffers, sort memory etc) I have picked up from the archives and FAQ. The upload script was also modified tocommit and vacuum analyze at different intervals without providing any significant improvement. top reports the CPU usageat 99% - so I believe its all looping of the above intermediate SELECTs that is causing the spiralling delays as thenumber of rows increases. Again, any help would be very much appreciated! Damien
pgsql-performance by date: