Re: Massive table (500M rows) update nightmare - Mailing list pgsql-performance
From | Eduardo Morras |
---|---|
Subject | Re: Massive table (500M rows) update nightmare |
Date | |
Msg-id | 20100108112431.46277558B18@s21sec.com Whole thread Raw |
In response to | Massive table (500M rows) update nightmare ("Carlo Stonebanks" <stonec.register@sympatico.ca>) |
List | pgsql-performance |
At 08:17 07/01/2010, Carlo Stonebanks wrote: >Our DB has an audit table which is 500M rows and growing. (FYI the objects being audited are grouped semantically, not individualfield values). > >Recently we wanted to add a new feature and we altered the table to add a new column. We are backfilling this varchar(255)column by writing a TCL script to page through the rows (where every update is a UPDATE ... WHERE id >= x ANDid < x+10 and a commit is performed after every 1000 updates statement, i.e. every 10000 rows.) > >We have 10 columns, six of which are indexed. Rough calculations suggest that this will take two to three weeks to completeon an 8-core CPU with more than enough memory. > >As a ballpark estimate - is this sort of performance for an 500M updates what one would expect of PG given the table structure(detailed below) or should I dig deeper to look for performance issues? > >As always, thanks! > >Carlo You can dump the table with pg_dumpand get a file like this (only a few columns, not all of them). Note that the last line,has the data in TSV (Tab Separate Format) plus a LF. SET statement_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = off; SET check_function_bodies = false; SET client_min_messages = warning; SET escape_string_warning = off; SET search_path = public, pg_catalog; SET default_tablespace = ''; SET default_with_oids = false; CREATE TABLE mdx_core.audit_impt ( audit_impt_id serial NOT NULL, impt_session integer, impt_version character varying(255), } COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin; 1 1 tateti 2 32 coll You can add a new column hacking it, just adding the new column to the schema, the name in the copy statement and a tabulator+dataat the end of the line (before the LF). Note that the table name is different from the original. SET statement_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = off; SET check_function_bodies = false; SET client_min_messages = warning; SET escape_string_warning = off; SET search_path = public, pg_catalog; SET default_tablespace = ''; SET default_with_oids = false; CREATE TABLE mdx_core.audit_impt2 ( audit_impt_id serial NOT NULL, impt_session integer, impt_version character varying(255), source_table character varying(255) } COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version, source_table) FROM stdin; 1 1 tateti tentown 1 32 coll krof After this, add indexes, constraints as usual. HTH -------------------------------- Eduardo Morrás González Dept. I+D+i e-Crime Vigilancia Digital S21sec Labs Tlf: +34 902 222 521 Móvil: +34 555 555 555 www.s21sec.com, blog.s21sec.com Salvo que se indique lo contrario, esta información es CONFIDENCIAL y contiene datos de carácter personal que han de ser tratados conforme a la legislación vigente en materia de protección de datos. Si usted no es destinatario original de este mensaje, le comunicamos que no está autorizado a revisar, reenviar, distribuir, copiar o imprimir la información en él contenida y le rogamos que proceda a borrarlo de sus sistemas. Kontrakoa adierazi ezean, posta elektroniko honen barruan doana ISILPEKO informazioa da eta izaera pertsonaleko datuak dituenez, indarrean dagoen datu pertsonalak babesteko legediaren arabera tratatu beharrekoa. Posta honen hartzaile ez zaren kasuan, jakinarazten dizugu baimenik ez duzula bertan dagoen informazioa aztertu, igorri, banatu, kopiatu edo inprimatzeko. Hortaz, erregutzen dizugu posta hau zure sistemetatik berehala ezabatzea. Antes de imprimir este mensaje valora si verdaderamente es necesario. De esta forma contribuimos a la preservación del Medio Ambiente.
pgsql-performance by date: