Re: Savepoints in transactions for speed? - Mailing list pgsql-performance

From Claudio Freire
Subject Re: Savepoints in transactions for speed?
Date
Msg-id CAGTBQpbQSCkPcGR_buPs4PZsLBOR6+2BepNX0eXpeA6dkbpVdA@mail.gmail.com
Whole thread Raw
In response to Re: Savepoints in transactions for speed?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Savepoints in transactions for speed?
List pgsql-performance
On Thu, Nov 29, 2012 at 3:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2012-11-29 at 00:48 -0300, Claudio Freire wrote:
>> Not really that fast if you have indices (and who doesn't have a PK or two).
>>
>> I've never been able to update (update) 2M rows in one transaction in
>> reasonable times (read: less than several hours) without dropping
>> indices. Doing it in batches is way faster if you can't drop the
>> indices, and if you can leverage HOT updates.
>
> I tried a quick test with 2M tuples and 3 indexes over int8, numeric,
> and text (generated data). There was also an unindexed bytea column.
> Using my laptop, a full update of the int8 column (which is indexed,
> forcing cold updates) took less than 4 minutes.
>
> I'm sure there are other issues with real-world workloads, and I know
> that it's wasteful compared to something that can make use of HOT
> updates. But unless there is something I'm missing, it's not really
> worth the effort to batch if that is the size of the update.

On a pre-production database I have (that is currently idle), on a
server with 4G RAM and a single SATA disk (probably similar to your
laptop in that sense more or less, possibly more TPS since the HD rpm
is 7k and your laptop probably is 5k), it's been running for half an
hour and is still running (and I don't expect it to finish today if
past experience is to be believed).

The database sees somewhat real test workloads from time to time, so
it's probably a good example of a live database (sans the concurrent
load).

The table is probably a lot wider than yours, having many columns,
some text typed, and many indices too. Updating one indexed int4 like
so:

begin;
update users set country_id = 1 where id < (328973625/2);
rollback;

(the where condition returns about 2M rows out of a ~5M total)

There is quite a few foreign key constraints that I expect are
interfering as well. I could try dropping them, just not on this
database.

The schema:


CREATE TABLE users
(
  id integer NOT NULL,
  about_me character varying(500),
  birth_date timestamp without time zone,
  confirmed boolean,
  creation_date timestamp without time zone,
  email character varying(255),
  first_name character varying(255),
  image_link character varying(255),
  is_native_location boolean,
  is_panelist boolean,
  last_name character varying(255),
  privacy bigint NOT NULL,
  sex integer,
  username character varying(255),
  city_id integer,
  country_id integer,
  state_id integer,
  last_access_to_inbox timestamp without time zone,
  profile_background_color character varying(255),
  profile_background_image_link character varying(255),
  url character varying(255),
  notifications bigint,
  last_activity_date timestamp without time zone,
  site_country_id integer,
  show_welcome_message boolean NOT NULL,
  invited boolean,
  partner_id integer,
  panelist_update bigint,
  unregistered boolean DEFAULT false,
  import_state integer,
  show_alerts_since timestamp without time zone,
  super_user boolean DEFAULT false,
  survey_id integer,
  site_id smallint NOT NULL,
  panelist_percentage smallint NOT NULL DEFAULT 0,
  reason integer,
  unregistered_date timestamp without time zone,
  is_panelist_update_date timestamp without time zone,
  confirmation_update_date timestamp without time zone,
  no_panelist_reason integer,
  facebook_connect_status smallint,
  CONSTRAINT user_pkey PRIMARY KEY (id ),
  CONSTRAINT fk36ebcb26f1a196 FOREIGN KEY (site_country_id)
      REFERENCES countries (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_city_id FOREIGN KEY (city_id)
      REFERENCES cities (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_country_id FOREIGN KEY (country_id)
      REFERENCES countries (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_state_id FOREIGN KEY (state_id)
      REFERENCES states (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_survey_id FOREIGN KEY (survey_id)
      REFERENCES surveys (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT users_id_check CHECK (id >= 0)
)

Indices on:

  (creation_date );

  (panelist_update )
  WHERE panelist_update IS NOT NULL;

  (birth_date );

  (email COLLATE pg_catalog."default" );

  (lower(email::text) COLLATE pg_catalog."default" );

  (partner_id , creation_date );

  (site_id , country_id , city_id );

  (site_id , country_id , state_id );

  (username COLLATE pg_catalog."default" );


pgsql-performance by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Savepoints in transactions for speed?
Next
From: Jeff Janes
Date:
Subject: Re: Savepoints in transactions for speed?