Re: Massive table (500M rows) update nightmare - Mailing list pgsql-performance

From Eduardo Morras
Subject Re: Massive table (500M rows) update nightmare
Date
Msg-id 20100108112431.46277558B18@s21sec.com
Whole thread Raw
In response to Massive table (500M rows) update nightmare  ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
List pgsql-performance
At 08:17 07/01/2010, Carlo Stonebanks wrote:
>Our DB has an audit table which is 500M rows and growing. (FYI the objects being audited are grouped semantically, not
individualfield values). 
>
>Recently we wanted to add a new feature and we altered the table to add a new column. We are backfilling this
varchar(255)column by writing a TCL script to page through the rows (where every update is a UPDATE ... WHERE id >= x
ANDid < x+10 and a commit is performed after every 1000 updates statement, i.e. every 10000 rows.) 
>
>We have 10 columns, six of which are indexed. Rough calculations suggest that this will take two to three weeks to
completeon an 8-core CPU with more than enough memory. 
>
>As a ballpark estimate - is this sort of performance for an 500M updates what one would expect of PG given the table
structure(detailed below) or should I dig deeper to look for performance issues? 
>
>As always, thanks!
>
>Carlo

You can dump the table with pg_dumpand get a file like this (only a few columns, not all of them). Note that the last
line,has the data in TSV (Tab Separate Format) plus a LF. 

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

CREATE TABLE mdx_core.audit_impt
(
 audit_impt_id serial NOT NULL,
 impt_session integer,
 impt_version character varying(255),
}

COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin;
1       1       tateti
2       32      coll

You can add a new column hacking it, just adding the new column to the schema, the name in the copy statement and a
tabulator+dataat the end of the line (before the LF). Note that the table name is different from the original. 

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE mdx_core.audit_impt2
(
 audit_impt_id serial NOT NULL,
 impt_session integer,
 impt_version character varying(255),
 source_table character varying(255)
}
COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version, source_table) FROM stdin;
1       1       tateti  tentown
1       32      coll    krof


After this, add indexes, constraints as usual.

HTH

--------------------------------
Eduardo Morrás González
Dept. I+D+i e-Crime Vigilancia Digital
S21sec Labs
Tlf: +34 902 222 521
Móvil: +34 555 555 555
www.s21sec.com, blog.s21sec.com


Salvo que se indique lo contrario, esta información es CONFIDENCIAL y
contiene datos de carácter personal que han de ser tratados conforme a la
legislación vigente en materia de protección de datos. Si usted no es
destinatario original de este mensaje, le comunicamos que no está autorizado
a revisar, reenviar, distribuir, copiar o imprimir la información en él
contenida y le rogamos que proceda a borrarlo de sus sistemas.

Kontrakoa adierazi ezean, posta elektroniko honen barruan doana ISILPEKO
informazioa da eta izaera pertsonaleko datuak dituenez, indarrean dagoen
datu pertsonalak babesteko legediaren arabera tratatu beharrekoa. Posta
honen hartzaile ez zaren kasuan, jakinarazten dizugu baimenik ez duzula
bertan dagoen informazioa aztertu, igorri, banatu, kopiatu edo inprimatzeko.
Hortaz, erregutzen dizugu posta hau zure sistemetatik berehala ezabatzea.

Antes de imprimir este mensaje valora si verdaderamente es necesario. De
esta forma contribuimos a la preservación del Medio Ambiente.


pgsql-performance by date:

Previous
From: Tore Halvorsen
Date:
Subject: FusionIO performance
Next
From: "Kevin Grittner"
Date:
Subject: Re: Massive table (500M rows) update nightmare