Update performance ... is 200,000 updates per hour what I should expect? - Mailing list pgsql-performance
From | Erik Norvelle |
---|---|
Subject | Update performance ... is 200,000 updates per hour what I should expect? |
Date | |
Msg-id | A461D6B8-24DF-11D8-BDFB-000A9583BF06@norvelle.net Whole thread Raw |
Responses |
Re: Update performance ... is 200,000 updates per hour
Re: Update performance ... is 200,000 updates per hour what I should expect? Re: Update performance ... is 200,000 updates per hour what I should expect? |
List | pgsql-performance |
<fixed><fontfamily><param>Courier New</param>Folks: I´m running a query which is designed to generate a foreign key for a table of approx. 10 million records (I've mentioned this in an earlier posting). The table is called "indethom", and each row contains a single word from the works of St. Thomas Aquinas, along with grammatical data about the word form, and (most importantly for my current problem) a set of columns identifying the particular work/section/paragraph that the word appears in. This database is completely non-normalized, and I'm working on performing some basic normalization, beginning with creating a table called "s2.sectiones" which (naturally) contains a complete listing of all of the sections of all the works of St. Thomas. I will then eliminate this information from the original "indethom" table, replacing it with the foreign key I am currently generating. ** My question has to do with whether or not I am getting maximal speed out of PostgreSQL, or whether I need to perform further optimizations. I am currently getting about 200,000 updates per hour, and updating the entire 10 million rows thus requires 50 hours, which seems a bit much. Here's the query I am running: update indethom set query_counter = nextval('s2.query_counter_seq'), -- Just for keeping track of how fast the query is running sectref = (select clavis from s2.sectiones where s2.sectiones.nomeoper = indethom.nomeoper and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b); Here´s the query plan: QUERY PLAN ------------------------------------------------------------------------------------- Seq Scan on indethom (cost=0.00..1310352.72 rows=10631972 width=212) SubPlan -> Index Scan using sectiones_ndx on sectiones (cost=0.00..6.03 rows=1 width=4) Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8)) (4 rows) <smaller>Note: I have just performed a VACUUM ANALYZE on the indethom table, as suggested by this listserve.</smaller> Here's the structure of the s2.sectiones table: it=> \d s2.sectiones Table "s2.sectiones" Column | Type | Modifiers ----------+--------------+----------- nomeoper | character(3) | refere1a | character(2) | refere1b | character(2) | refere2a | character(2) | refere2b | character(2) | refere3a | character(2) | refere3b | character(2) | refere4a | character(2) | refere4b | character(2) | clavis | integer | Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b) Finally, here is the structure of indethom (some non-relevant columns not shown): it=> \d indethom Table "public.indethom" Column | Type | Modifiers ---------------+-----------------------+----------- numeoper | smallint | not null nomeoper | character(3) | not null editcrit | character(1) | refere1a | character(2) | refere1b | character(2) | refere2a | character(2) | refere2b | character(2) | refere3a | character(2) | refere3b | character(2) | refere4a | character(2) | refere4b | character(2) | refere5a | character(2) | not null refere5b | smallint | not null referen6 | smallint | not null ... several columns skipped ... verbum | character varying(22) | not null ... other columns skipped ... poslinop | integer | not null posverli | smallint | not null posverop | integer | not null clavis | integer | not null articref | integer | sectref | integer | query_counter | integer | Indexes: indethom_pkey primary key btree (clavis), indethom_articulus_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b), indethom_sectio_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b), verbum_ndx btree (verbum) Thanks for your assistance! -Erik Norvelle</fontfamily></fixed>Folks: I´m running a query which is designed to generate a foreign key for a table of approx. 10 million records (I've mentioned this in an earlier posting). The table is called "indethom", and each row contains a single word from the works of St. Thomas Aquinas, along with grammatical data about the word form, and (most importantly for my current problem) a set of columns identifying the particular work/section/paragraph that the word appears in. This database is completely non-normalized, and I'm working on performing some basic normalization, beginning with creating a table called "s2.sectiones" which (naturally) contains a complete listing of all of the sections of all the works of St. Thomas. I will then eliminate this information from the original "indethom" table, replacing it with the foreign key I am currently generating. ** My question has to do with whether or not I am getting maximal speed out of PostgreSQL, or whether I need to perform further optimizations. I am currently getting about 200,000 updates per hour, and updating the entire 10 million rows thus requires 50 hours, which seems a bit much. Here's the query I am running: update indethom set query_counter = nextval('s2.query_counter_seq'), -- Just for keeping track of how fast the query is running sectref = (select clavis from s2.sectiones where s2.sectiones.nomeoper = indethom.nomeoper and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b); Here´s the query plan: QUERY PLAN ------------------------------------------------------------------------ ------------- Seq Scan on indethom (cost=0.00..1310352.72 rows=10631972 width=212) SubPlan -> Index Scan using sectiones_ndx on sectiones (cost=0.00..6.03 rows=1 width=4) Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8)) (4 rows) Note: I have just performed a VACUUM ANALYZE on the indethom table, as suggested by this listserve. Here's the structure of the s2.sectiones table: it=> \d s2.sectiones Table "s2.sectiones" Column | Type | Modifiers ----------+--------------+----------- nomeoper | character(3) | refere1a | character(2) | refere1b | character(2) | refere2a | character(2) | refere2b | character(2) | refere3a | character(2) | refere3b | character(2) | refere4a | character(2) | refere4b | character(2) | clavis | integer | Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b) Finally, here is the structure of indethom (some non-relevant columns not shown): it=> \d indethom Table "public.indethom" Column | Type | Modifiers ---------------+-----------------------+----------- numeoper | smallint | not null nomeoper | character(3) | not null editcrit | character(1) | refere1a | character(2) | refere1b | character(2) | refere2a | character(2) | refere2b | character(2) | refere3a | character(2) | refere3b | character(2) | refere4a | character(2) | refere4b | character(2) | refere5a | character(2) | not null refere5b | smallint | not null referen6 | smallint | not null ... several columns skipped ... verbum | character varying(22) | not null ... other columns skipped ... poslinop | integer | not null posverli | smallint | not null posverop | integer | not null clavis | integer | not null articref | integer | sectref | integer | query_counter | integer | Indexes: indethom_pkey primary key btree (clavis), indethom_articulus_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b), indethom_sectio_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b), verbum_ndx btree (verbum) Thanks for your assistance! -Erik Norvelle
pgsql-performance by date: