-Erik Norvelle</fontfamily></fixed>Folks:
I'm running a query designed to generate a foreign key for a table of
approx. 10 million records (I've mentioned this in an earlier posting).
The table is called "indethom", and each row contains a single word
from the works of St. Thomas Aquinas, along with grammatical data about
the word form and (most importantly for my current problem) a set of
columns identifying the particular work/section/paragraph in which the
word appears.
This database is completely non-normalized, and I'm working on
performing some basic normalization, beginning with creating a table
called "s2.sectiones" which (naturally) contains a complete listing of
all of the sections of all the works of St. Thomas. I will then
eliminate this information from the original "indethom" table,
replacing it with the foreign key I am currently generating.
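For context, the overall shape of the normalization looks roughly like
the following. This is only a sketch -- the sequence and constraint
names are made up for illustration, and s2.sectiones has in fact
already been populated:

    -- 1. Build s2.sectiones from the distinct section identifiers in
    --    indethom, assigning each one a surrogate key from a sequence
    --    (sequence name is illustrative only).
    CREATE SEQUENCE s2.sectiones_clavis_seq;

    INSERT INTO s2.sectiones
        (nomeoper, refere1a, refere1b, refere2a, refere2b,
         refere3a, refere3b, refere4a, refere4b, clavis)
    SELECT nomeoper, refere1a, refere1b, refere2a, refere2b,
           refere3a, refere3b, refere4a, refere4b,
           nextval('s2.sectiones_clavis_seq')
      FROM (SELECT DISTINCT nomeoper, refere1a, refere1b, refere2a,
                   refere2b, refere3a, refere3b, refere4a, refere4b
              FROM indethom) AS distinct_sections;

    -- 2. clavis has to be unique before it can be the target of a
    --    foreign key.
    ALTER TABLE s2.sectiones ADD PRIMARY KEY (clavis);

    -- 3. Once sectref is populated (the UPDATE below), the redundant
    --    reference columns can be dropped from indethom and the
    --    foreign key declared (constraint name is illustrative only).
    ALTER TABLE indethom
        ADD CONSTRAINT indethom_sectref_fkey
        FOREIGN KEY (sectref) REFERENCES s2.sectiones (clavis);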
** My question is whether I am getting maximal speed out of PostgreSQL,
or whether I need to perform further optimizations. I am currently
getting about 200,000 updates per hour, so updating all 10 million rows
will take roughly 50 hours, which seems a bit much.
Here's the query I am running:
update indethom
   set query_counter = nextval('s2.query_counter_seq'),
                       -- Just for keeping track of how fast the query is running
       sectref = (select clavis
                    from s2.sectiones
                   where s2.sectiones.nomeoper = indethom.nomeoper
                     and s2.sectiones.refere1a = indethom.refere1a
                     and s2.sectiones.refere1b = indethom.refere1b
                     and s2.sectiones.refere2a = indethom.refere2a
                     and s2.sectiones.refere2b = indethom.refere2b
                     and s2.sectiones.refere3a = indethom.refere3a
                     and s2.sectiones.refere3b = indethom.refere3b
                     and s2.sectiones.refere4a = indethom.refere4a
                     and s2.sectiones.refere4b = indethom.refere4b);
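The query_counter assignment is purely bookkeeping: the idea is to poll
the sequence from a second session while the UPDATE runs and work out
the rate from the difference between two readings, e.g.:

    -- from another session, while the UPDATE is in progress
    SELECT last_value FROM s2.query_counter_seq;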
Here's the query plan:
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
   SubPlan
     ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
           Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
                        (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND
                        (refere3b = $6) AND (refere4a = $7) AND (refere4b = $8))
(4 rows)
Note: I have just performed a VACUUM ANALYZE on the indethom table, as
suggested on this list.
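For reference, one rewrite I have been wondering about (untested, just
a sketch, and leaving out the query_counter bookkeeping) is folding the
subselect into PostgreSQL's UPDATE ... FROM form, so that the planner
can consider a single join against s2.sectiones rather than one index
probe per updated row:

    update indethom
       set sectref = s.clavis
      from s2.sectiones s
     where s.nomeoper = indethom.nomeoper
       and s.refere1a = indethom.refere1a
       and s.refere1b = indethom.refere1b
       and s.refere2a = indethom.refere2a
       and s.refere2b = indethom.refere2b
       and s.refere3a = indethom.refere3a
       and s.refere3b = indethom.refere3b
       and s.refere4a = indethom.refere4a
       and s.refere4b = indethom.refere4b;

Would something like that be expected to help here?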
Here's the structure of the s2.sectiones table:
it=> \d s2.sectiones
Table "s2.sectiones"
Column | Type | Modifiers
----------+--------------+-----------
nomeoper | character(3) |
refere1a | character(2) |
refere1b | character(2) |
refere2a | character(2) |
refere2b | character(2) |
refere3a | character(2) |
refere3b | character(2) |
refere4a | character(2) |
refere4b | character(2) |
clavis | integer |
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a,
refere2b, refere3a, refere3b, refere4a, refere4b)
Finally, here is the structure of indethom (some non-relevant columns
not shown):
it=> \d indethom
Table "public.indethom"
Column | Type | Modifiers
---------------+-----------------------+-----------
numeoper | smallint | not null
nomeoper | character(3) | not null
editcrit | character(1) |
refere1a | character(2) |
refere1b | character(2) |
refere2a | character(2) |
refere2b | character(2) |
refere3a | character(2) |
refere3b | character(2) |
refere4a | character(2) |
refere4b | character(2) |
refere5a | character(2) | not null
refere5b | smallint | not null
referen6 | smallint | not null
... several columns skipped ...
verbum | character varying(22) | not null
... other columns skipped ...
poslinop | integer | not null
posverli | smallint | not null
posverop | integer | not null
clavis | integer | not null
articref | integer |
sectref | integer |
query_counter | integer |
Indexes: indethom_pkey primary key btree (clavis),
indethom_articulus_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b),
indethom_sectio_ndx btree (nomeoper, refere1a, refere1b,
refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
verbum_ndx btree (verbum)
Thanks for your assistance!
-Erik Norvelle