Our DB has an audit table of 500M rows and growing. (FYI, the objects
being audited are grouped semantically, not stored as individual field values.)
Recently we wanted to add a new feature, so we altered the table to add a
new varchar(255) column. We are backfilling this column with a TCL script
that pages through the rows: every update is an UPDATE ... WHERE id >= x
AND id < x+10, and a commit is performed after every 1000 UPDATE
statements, i.e. every 10,000 rows.
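
To make the batching concrete, one commit unit looks roughly like the sketch
below (the new column's real name isn't shown in the DDL further down, so
"new_col" is just a placeholder, "id" refers to the audit_impt_id primary
key, and the literal values stand in for whatever the script actually
computes):

BEGIN;
UPDATE mdx_core.audit_impt SET new_col = '...'
    WHERE audit_impt_id >= 0 AND audit_impt_id < 10;
UPDATE mdx_core.audit_impt SET new_col = '...'
    WHERE audit_impt_id >= 10 AND audit_impt_id < 20;
-- ... 997 more UPDATEs, each covering a 10-id slice ...
UPDATE mdx_core.audit_impt SET new_col = '...'
    WHERE audit_impt_id >= 9990 AND audit_impt_id < 10000;
COMMIT; -- one commit per 1000 UPDATE statements, i.e. per 10,000 rows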
The table has 10 columns, six of which are indexed. Rough calculations
suggest the backfill will take two to three weeks to complete (i.e. only a
few hundred rows per second) on an 8-core CPU with more than enough memory.
As a ballpark estimate: is this sort of performance for 500M updates what
one would expect of PG given the table structure (detailed below), or
should I dig deeper to look for performance issues?
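
In case it helps, this is the sort of check I can run on a single slice to
confirm the plan is hitting the primary key index ("new_col" is again a
stand-in for the real column name; since EXPLAIN ANALYZE actually executes
the UPDATE, I wrap it in a transaction and roll back):

BEGIN;
EXPLAIN ANALYZE
UPDATE mdx_core.audit_impt
SET new_col = 'x'  -- placeholder value
WHERE audit_impt_id >= 1000000 AND audit_impt_id < 1000010;
ROLLBACK;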
As always, thanks!
Carlo
Table/index structure:
CREATE TABLE mdx_core.audit_impt
(
  audit_impt_id serial NOT NULL,
  impt_session integer,
  impt_version character varying(255),
  impt_name character varying(255),
  impt_id integer,
  target_table character varying(255),
  target_id integer,
  target_op character varying(10),
  note text,
  source_table character varying(255),
  CONSTRAINT audit_impt_pkey PRIMARY KEY (audit_impt_id)
);
CREATE INDEX audit_impt_impt_id_idx
ON mdx_core.audit_impt
USING btree
(impt_id);
CREATE INDEX audit_impt_impt_name
ON mdx_core.audit_impt
USING btree
(impt_name, impt_version);
CREATE INDEX audit_impt_session_idx
ON mdx_core.audit_impt
USING btree
(impt_session);
CREATE INDEX audit_impt_source_table
ON mdx_core.audit_impt
USING btree
(source_table);
CREATE INDEX audit_impt_target_id_idx
ON mdx_core.audit_impt
USING btree
(target_id, audit_impt_id);
CREATE INDEX audit_impt_target_table_idx
ON mdx_core.audit_impt
USING btree
(target_table, target_id, audit_impt_id);