Thread: Savepoints in transactions for speed?

Savepoints in transactions for speed?

From
Mike Blackwell
Date:
I need to delete about 1.5 million records from a table and reload it in one transaction.  The usual advice when loading with inserts seems to be group them into transactions of around 1k records.  Committing at that point would leave the table in an inconsistent state.  Would issuing a savepoint every 1k or so records negate whatever downside there is to keeping a transaction open for all 1.5 million records, or just add more overhead?

The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL) so I'm not sure the COPY alternative applies here.

Any suggestions are welcome.


Mike

Re: Savepoints in transactions for speed?

From
Richard Huxton
Date:
On 27/11/12 22:04, Mike Blackwell wrote:
> I need to delete about 1.5 million records from a table and reload it
> in one transaction.

> The data to reload the table is coming from a Perl DBI connection to a
> different database (not PostgreSQL) so I'm not sure the COPY
> alternative applies here.
No reason why it shouldn't.

https://metacpan.org/module/DBD::Pg#COPY-support

--
   Richard Huxton
   Archonet Ltd


Re: Savepoints in transactions for speed?

From
Steve Atkins
Date:
On Nov 27, 2012, at 2:04 PM, Mike Blackwell <mike.blackwell@rrd.com> wrote:

> I need to delete about 1.5 million records from a table and reload it in one transaction.  The usual advice when loading with inserts seems to be group them into transactions of around 1k records.  Committing at that point would leave the table in an inconsistent state.

I'd probably just do the whole thing in one transaction.

Do you have specific reasons you want to avoid a long transaction, or just relying on rules of thumb? Postgresql isn't going to run out of resources doing a big transaction, in the way some other databases will.

Long running transactions will interfere with vacuuming, but inserting a couple of million rows shouldn't take that
long.

>  Would issuing a savepoint every 1k or so records negate whatever downside there is to keeping a transaction open for all 1.5 million records, or just add more overhead?


Savepoints are going to increase overhead and have no effect on the length of the transaction. If you want to catch errors and not have to redo the entire transaction, they're great, but that's about it.

> The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL) so I'm not sure the COPY alternative applies here.

COPY works nicely from perl:

$dbh->do("COPY foo FROM STDIN");
$dbh->pg_putcopydata("foo\tbar\tbaz\n");
$dbh->pg_putcopyend();

The details are in DBD::Pg. I use this a lot for doing big-ish (tens of millions of rows) bulk inserts. It's not as fast as you can get, but it's probably as fast as you can get with perl.
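
Fleshed out a little (an untested sketch; the connection strings, table, and column names are made up), the whole delete-and-reload might look something like:

use DBI;

# Source is whatever non-PostgreSQL database DBI can reach; destination is PostgreSQL.
my $src = DBI->connect('dbi:ODBC:source_dsn', 'user', 'pass', { RaiseError => 1 });
my $dst = DBI->connect('dbi:Pg:dbname=mydb', '', '', { AutoCommit => 0, RaiseError => 1 });

my $sth = $src->prepare('SELECT col1, col2, col3 FROM source_table');
$sth->execute;

$dst->do('DELETE FROM target_table');
$dst->do('COPY target_table FROM STDIN');
while (my @row = $sth->fetchrow_array) {
    # COPY text format: tab-separated columns, newline-terminated rows.
    # Real data would also need escaping of tabs/newlines/backslashes and \N for NULLs.
    $dst->pg_putcopydata(join("\t", @row) . "\n");
}
$dst->pg_putcopyend();
$dst->commit;    # the DELETE and the reload become visible atomically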

Cheers,
  Steve



Re: Savepoints in transactions for speed?

From
Bob Lunney
Date:
Mike,

Is there anything that the 1.5 million rows have in common that would allow you to use partitions?  If so, you could load the new data into a partition at your leisure, start a transaction, alter the partition table with the old data to no longer inherit from the parent, alter the new partition table to inherit from the parent, commit, then drop the old table.  This operation would be very fast; the users probably won't even notice.
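
Roughly, as an untested sketch with made-up names (orders_old currently inherits from the parent orders; orders_new has been preloaded with the new data):

use DBI;
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });

$dbh->begin_work;
$dbh->do('ALTER TABLE orders_old NO INHERIT orders');   # detach the old data
$dbh->do('ALTER TABLE orders_new INHERIT orders');      # attach the freshly loaded data
$dbh->commit;                                           # the swap itself is near-instant
$dbh->do('DROP TABLE orders_old');                      # clean up afterwards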

Bob Lunney

On Nov 27, 2012, at 4:04 PM, Mike Blackwell <mike.blackwell@rrd.com> wrote:

I need to delete about 1.5 million records from a table and reload it in one transaction.  The usual advice when loading with inserts seems to be group them into transactions of around 1k records.  Committing at that point would leave the table in an inconsistent state.  Would issuing a savepoint every 1k or so records negate whatever downside there is to keeping a transaction open for all 1.5 million records, or just add more overhead?

The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL) so I'm not sure the COPY alternative applies here.

Any suggestions are welcome.


Mike

Re: Savepoints in transactions for speed?

From
Mike Blackwell
Date:
Steve Atkins wrote:

> Postgresql isn't going to run out of resources doing a big transaction, in the way some other databases will.

I thought I had read something at one point about keeping the transaction size on the order of a couple thousand because there were issues when it got larger.  As that apparently is not an issue I went ahead and tried the DELETE and COPY in a transaction.  The load time is quite reasonable this way.

Thanks!

Mike

Re: Savepoints in transactions for speed?

From
Claudio Freire
Date:
On Tue, Nov 27, 2012 at 10:08 PM, Mike Blackwell <mike.blackwell@rrd.com> wrote:
>
> > Postgresql isn't going to run out of resources doing a big transaction, in the way some other databases will.
>
> I thought I had read something at one point about keeping the transaction size on the order of a couple thousand because there were issues when it got larger.  As that apparently is not an issue I went ahead and tried the DELETE and COPY in a transaction.  The load time is quite reasonable this way.

Updates are faster if batched, if your business logic allows it,
because batching creates less bloat and more opportunities for
HOT updates. I don't think it applies to inserts, though, and I
haven't heard that it does.

In any case, if your business logic doesn't allow it (and your case
seems to suggest it doesn't), there's no point in worrying.


Re: Savepoints in transactions for speed?

From
"Franklin, Dan"
Date:
On Tue, Nov 27, 2012 at 6:26 PM, Steve Atkins <steve@blighty.com> wrote:

On Nov 27, 2012, at 2:04 PM, Mike Blackwell <mike.blackwell@rrd.com> wrote:

> I need to delete about 1.5 million records from a table and reload it in one transaction.  The usual advice when loading with inserts seems to be group them into transactions of around 1k records.  Committing at that point would leave the table in an inconsistent state.

I'd probably just do the whole thing in one transaction.

Do you have specific reasons you want to avoid a long transaction, or just relying on rules of thumb? Postgresql isn't going to run out of resources doing a big transaction, in the way some other databases will.

Long running transactions will interfere with vacuuming, but inserting a couple of million rows shouldn't take that long.

>  Would issuing a savepoint every 1k or so records negate whatever downside there is to keeping a transaction open for all 1.5 million records, or just add more overhead?


Savepoints are going to increase overhead and have no effect on the length of the transaction. If you want to catch errors and not have to redo the entire transaction, they're great, but that's about it.

> The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL) so I'm not sure the COPY alternative applies here.

COPY works nicely from perl:

$dbh->do("COPY foo FROM STDIN");
$dbh->pg_putcopydata("foo\tbar\tbaz\n");
$dbh->pg_putcopyend();

The details are in DBD::Pg. I use this a lot for doing big-ish (tens of millions of rows) bulk inserts. It's not as fast as you can get, but it's probably as fast as you can get with perl.

Cheers,
  Steve
 
I do this as well - insert a few million rows into a table using the DBD::Pg copy interface.  It works well.

I ended up batching the copies so that each COPY statement only does a few hundred thousand at a time, but it's all one transaction.

The batching was necessary because of an idiosyncrasy of COPY in Pg 8.1: each COPY statement's contents were buffered in malloc'd space, and if there were several million rows buffered up, the allocated virtual memory could get quite large - as in several GB.  That, plus the buffer pool, sometimes exceeded the amount of RAM I had available at the time (several years ago), with bad effects on performance.

This may have been fixed since then, or maybe RAM's gotten big enough that it's not a problem.
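
The batching looks roughly like this (untested sketch, reusing the hypothetical $dst handle and $sth source statement from Steve's example above):

my $batch = 200_000;
my $count = 0;

$dst->do('COPY target_table FROM STDIN');
while (my @row = $sth->fetchrow_array) {
    $dst->pg_putcopydata(join("\t", @row) . "\n");
    if (++$count % $batch == 0) {
        $dst->pg_putcopyend();                      # finish this COPY statement...
        $dst->do('COPY target_table FROM STDIN');   # ...and start a fresh one
    }
}
$dst->pg_putcopyend();
$dst->commit;    # still a single transaction around all the COPY statements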

Dan Franklin 

Re: Savepoints in transactions for speed?

From
Willem Leenen
Date:

Savepoints are not created for performance.  If you have one very long-running transaction that fails in the end, it will all be rolled back. So be pretty sure about your data quality, or use savepoints.

Re: Savepoints in transactions for speed?

From
Mike Blackwell
Date:


On Tue, Nov 27, 2012 at 7:16 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
Updates are faster if batched, if your business logic allows it,
because batching creates less bloat and more opportunities for
HOT updates. I don't think it applies to inserts, though, and I
haven't heard that it does.

Ah.  That must have been what I'd half-remembered.  Thanks for the clarification.

Mike 

Re: Savepoints in transactions for speed?

From
Willem Leenen
Date:

Commit marks are written to disk after each transaction, so transaction size has an impact on performance.
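
For example (an untested sketch with a made-up table): committing per row forces a WAL flush per row, while a single transaction pays that cost once.

use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { AutoCommit => 0, RaiseError => 1 });
my $sth = $dbh->prepare('INSERT INTO items (id, name) VALUES (?, ?)');

for my $i (1 .. 100_000) {
    $sth->execute($i, "name $i");
    # $dbh->commit;   # committing here would flush the WAL 100,000 times
}
$dbh->commit;          # one commit record flushed for the whole load
$dbh->disconnect;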



Re: Savepoints in transactions for speed?

From
Jeff Davis
Date:
On Tue, 2012-11-27 at 22:16 -0300, Claudio Freire wrote:
> Updates are faster if batched, if your business logic allows it,
> because batching creates less bloat and more opportunities for
> HOT updates. I don't think it applies to inserts, though, and I
> haven't heard that it does.

Huge updates (e.g. UPDATE with no WHERE clause) are less likely to
benefit from HOT. HOT has two main optimizations:

1. Remove dead tuples faster without waiting for VACUUM -- this only
works if the transaction that updated/deleted the tuple actually
finished (otherwise the tuple can't be removed yet), so it only benefits
the *next* update to come along. But if it's one big update, then VACUUM
is probably just as good at cleaning up the space.

2. Doesn't make new index entries for the new tuple; reuses the old
index entries -- this only works if the update is on the same page, but
large updates tend to fill pages up (because of the buildup of dead
tuples) and force new tuples to go to new pages.

HOT is essentially designed for lots of small updates, which didn't
perform well before PG 8.3.

Batching of inserts/updates/deletes has a big benefit over separate
transactions, but only up to a point, after which it levels off. I'm not
sure exactly when that point is, but after that, the downsides of
keeping a transaction open (like inability to remove the previous
version of an updated tuple) take over.

Regards,
    Jeff Davis



Re: Savepoints in transactions for speed?

From
Jeff Davis
Date:
On Tue, 2012-11-27 at 16:04 -0600, Mike Blackwell wrote:
> I need to delete about 1.5 million records from a table and reload it
> in one transaction.  The usual advice when loading with inserts seems
> to be group them into transactions of around 1k records.  Committing
> at that point would leave the table in an inconsistent state.  Would
> issuing a savepoint every 1k or so records negate whatever downside
> there is to keeping a transaction open for all 1.5 million records, or
> just add more overhead?

A large transaction isn't really a big problem for postgres, and 1.5M
records should be processed quickly anyway.

The main problem with a long-running delete or update transaction is
that the dead tuples (deleted tuples or the old version of an updated
tuple) can't be removed until the transaction finishes. That can cause
temporary "bloat", but 1.5M records shouldn't be noticeable.

Adding subtransactions into the mix won't help, but probably won't hurt,
either. The transaction will still run just as long, and you still can't
delete the tuples ahead of time (unless you abort a subtransaction). If
you *do* use subtransactions, make sure to release them as quickly as
you create them (don't just use ROLLBACK TO, that still leaves the
savepoint there); having 1500 open subtransactions might cause
performance problems elsewhere.
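
For example, an untested sketch of per-batch savepoints that are released promptly whether or not the batch fails (the table and data are made up):

use DBI;

my $dbh  = DBI->connect('dbi:Pg:dbname=mydb', '', '', { AutoCommit => 0, RaiseError => 1 });
my $sth  = $dbh->prepare('INSERT INTO items (id, name) VALUES (?, ?)');
my @rows = map { [ $_, "name $_" ] } 1 .. 1_500_000;   # stand-in for the real data

while (my @chunk = splice(@rows, 0, 1000)) {
    $dbh->do('SAVEPOINT batch');
    my $ok = eval {
        $sth->execute(@$_) for @chunk;
        1;
    };
    $dbh->do('ROLLBACK TO SAVEPOINT batch') unless $ok;   # discard only the failed batch
    $dbh->do('RELEASE SAVEPOINT batch');                  # release it right away either way
}
$dbh->commit;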

Regards,
    Jeff Davis



Re: Savepoints in transactions for speed?

From
Claudio Freire
Date:
On Wed, Nov 28, 2012 at 8:28 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>
> The main problem with a long-running delete or update transaction is
> that the dead tuples (deleted tuples or the old version of an updated
> tuple) can't be removed until the transaction finishes. That can cause
> temporary "bloat", but 1.5M records shouldn't be noticeable.

Not really that fast if you have indices (and who doesn't have a PK or two).

I've never been able to update (update) 2M rows in one transaction in
reasonable times (read: less than several hours) without dropping
indices. Doing it in batches is way faster if you can't drop the
indices, and if you can leverage HOT updates.
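
By batches I mean something like this untested sketch (table, column, and id range are made up): one transaction per slice, so dead tuples from finished slices can be cleaned up and HOT has space to reuse between slices.

use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });   # AutoCommit stays on

my $step   = 10_000;
my $max_id = 2_000_000;   # made-up upper bound
for (my $lo = 0; $lo < $max_id; $lo += $step) {
    $dbh->begin_work;
    $dbh->do('UPDATE users SET country_id = ? WHERE id >= ? AND id < ?',
             undef, 1, $lo, $lo + $step);
    $dbh->commit;         # dead tuples from this slice can now be pruned
}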


Re: Savepoints in transactions for speed?

From
Mike Blackwell
Date:

On Wed, Nov 28, 2012 at 9:48 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

On Wed, Nov 28, 2012 at 8:28 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>
> The main problem with a long-running delete or update transaction is
> that the dead tuples (deleted tuples or the old version of an updated
> tuple) can't be removed until the transaction finishes. That can cause
> temporary "bloat", but 1.5M records shouldn't be noticeable.

Not really that fast if you have indices (and who doesn't have a PK or two).

I've never been able to update (update) 2M rows in one transaction in
reasonable times (read: less than several hours) without dropping
indices. Doing it in batches is way faster if you can't drop the
indices, and if you can leverage HOT updates.

What I'm trying at this point is:

BEGIN;
DROP INDEX -- only one unique index exists
DELETE FROM table;
COPY table FROM STDIN;
COMMIT;
CREATE INDEX CONCURRENTLY;

Do I understand correctly that DROP/CREATE index are not transactional, and thus the index will disappear immediately for other transactions?  Am I better off in that case moving the DROP INDEX outside the transaction?

The access pattern for the table is such that I can afford the occasional stray hit without an index during the reload time.  It's been pretty quick using the above.

Mike

Re: Savepoints in transactions for speed?

From
Scott Marlowe
Date:
On Thu, Nov 29, 2012 at 9:38 AM, Mike Blackwell <mike.blackwell@rrd.com> wrote:
> On Wed, Nov 28, 2012 at 9:48 PM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Wed, Nov 28, 2012 at 8:28 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> >
>> > The main problem with a long-running delete or update transaction is
>> > that the dead tuples (deleted tuples or the old version of an updated
>> > tuple) can't be removed until the transaction finishes. That can cause
>> > temporary "bloat", but 1.5M records shouldn't be noticeable.
>>
>> Not really that fast if you have indices (and who doesn't have a PK or
>> two).
>>
>> I've never been able to update (update) 2M rows in one transaction in
>> reasonable times (read: less than several hours) without dropping
>> indices. Doing it in batches is way faster if you can't drop the
>> indices, and if you can leverage HOT updates.
>
>
> What I'm trying at this point is:
>
> BEGIN;
> DROP INDEX -- only one unique index exists
> DELETE FROM table;
> COPY table FROM STDIN;
> COMMIT;
> CREATE INDEX CONCURRENTLY;
>
> Do I understand correctly that DROP/CREATE index are not transactional, and
> thus the index will disappear immediately for other transactions?  Am I
> better off in that case moving the DROP INDEX outside the transaction?
>
> The access pattern for the table is such that I can afford the occasional
> stray hit without an index during the reload time.  It's been pretty quick
> using the above.

Drop / create index ARE transactional, like most other things in
postgresql (only drop / create database and drop / create tablespace
are non-transactional).  Your current sequence will result in the
table you are dropping the index on being locked for other
transactions until commit or rollback.  Run two psql sessions and test
it to see.


Re: Savepoints in transactions for speed?

From
Mike Blackwell
Date:
Ah.  So it does.  Testing with two psql sessions shows the lock as you said, and moving the DROP INDEX to a separate transaction gives the results I was looking for.
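
For the record, the sequence that ended up working, roughly (untested as written here; the handle, index, table, and column names are placeholders):

use DBI;

my $dbh        = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });
my @copy_lines = ("1\tfoo\n", "2\tbar\n");   # placeholder for the real tab-separated rows

# DROP INDEX commits on its own, so the reload transaction never holds the
# index's ACCESS EXCLUSIVE lock for the duration of the load.
$dbh->do('DROP INDEX big_table_key_idx');

$dbh->begin_work;
$dbh->do('DELETE FROM big_table');
$dbh->do('COPY big_table FROM STDIN');
$dbh->pg_putcopydata($_) for @copy_lines;
$dbh->pg_putcopyend();
$dbh->commit;

# CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so it runs
# last, after the commit, in autocommit mode.
$dbh->do('CREATE UNIQUE INDEX CONCURRENTLY big_table_key_idx ON big_table (key_col)');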

Thanks,
Mike







Re: Savepoints in transactions for speed?

From
Jeff Janes
Date:
On Thu, Nov 29, 2012 at 8:38 AM, Mike Blackwell <mike.blackwell@rrd.com> wrote:
>
> What I'm trying at this point is:
>
> BEGIN;
> DROP INDEX -- only one unique index exists
> DELETE FROM table;
> COPY table FROM STDIN;
> COMMIT;
> CREATE INDEX CONCURRENTLY;
>
> Do I understand correctly that DROP/CREATE index are not transactional, and
> thus the index will disappear immediately for other transactions?

The DROP is transactional.

But the way it works here is that the index is locked ACCESS EXCLUSIVE
when the DROP is encountered (and so is the table), so any other
transaction will block on it, even though the index is still there.
(Transactionality does not inherently demand this behavior; it is just
the way PG implements it.  For example, it could take a weaker lock at
the time DROP is encountered and then escalate it to exclusive only
during the commit processing.  But that would greatly expand the risk
of deadlock, and would certainly be more complicated to code.)


>  Am I
> better off in that case moving the DROP INDEX outside the transaction?
>
> The access pattern for the table is such that I can afford the occasional
> stray hit without an index during the reload time.

If you don't mind queries doing full table scans, and not having
the benefit of the unique constraint, for that period, then yes, you
should move the drop index into a separate transaction.

But if you do keep the drop index inside the transaction, then you
would probably be better off using truncate rather than delete,
rebuilding the index non-concurrently, and moving that inside the
transaction as well.
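
That is, something like this untested sketch (with $dbh and @copy_lines as placeholders, as in the earlier examples), everything inside the one transaction and a plain, non-concurrent index build before the commit:

$dbh->begin_work;
$dbh->do('DROP INDEX big_table_key_idx');
$dbh->do('TRUNCATE big_table');              # truncate instead of delete
$dbh->do('COPY big_table FROM STDIN');
$dbh->pg_putcopydata($_) for @copy_lines;
$dbh->pg_putcopyend();
$dbh->do('CREATE UNIQUE INDEX big_table_key_idx ON big_table (key_col)');
$dbh->commit;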

Cheers,

Jeff


Re: Savepoints in transactions for speed?

From
Mike Blackwell
Date:



On Thu, Nov 29, 2012 at 12:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

But if you do keep the drop index inside the transaction, then you
would probably be better off using truncate rather than delete,
rebuilding the index non-concurrently, and moving that inside the
transaction as well.



Hmm....  From the 9.2 manual it seems that might not work out so well:

TRUNCATE is not MVCC-safe (see Chapter 13 for general information about MVCC). After truncation, the table will appear empty to all concurrent transactions, even if they are using a snapshot taken before the truncation occurred.

It looks like other transactions could find an empty table while it was being reloaded under that approach.

Re: Savepoints in transactions for speed?

From
Jeff Davis
Date:
On Thu, 2012-11-29 at 00:48 -0300, Claudio Freire wrote:
> Not really that fast if you have indices (and who doesn't have a PK or two).
>
> I've never been able to update (update) 2M rows in one transaction in
> reasonable times (read: less than several hours) without dropping
> indices. Doing it in batches is way faster if you can't drop the
> indices, and if you can leverage HOT updates.

I tried a quick test with 2M tuples and 3 indexes over int8, numeric,
and text (generated data). There was also an unindexed bytea column.
Using my laptop, a full update of the int8 column (which is indexed,
forcing cold updates) took less than 4 minutes.

I'm sure there are other issues with real-world workloads, and I know
that it's wasteful compared to something that can make use of HOT
updates. But unless there is something I'm missing, it's not really
worth the effort to batch if that is the size of the update.

Regards,
    Jeff Davis




Re: Savepoints in transactions for speed?

From
Claudio Freire
Date:
On Thu, Nov 29, 2012 at 3:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2012-11-29 at 00:48 -0300, Claudio Freire wrote:
>> Not really that fast if you have indices (and who doesn't have a PK or two).
>>
>> I've never been able to update (update) 2M rows in one transaction in
>> reasonable times (read: less than several hours) without dropping
>> indices. Doing it in batches is way faster if you can't drop the
>> indices, and if you can leverage HOT updates.
>
> I tried a quick test with 2M tuples and 3 indexes over int8, numeric,
> and text (generated data). There was also an unindexed bytea column.
> Using my laptop, a full update of the int8 column (which is indexed,
> forcing cold updates) took less than 4 minutes.
>
> I'm sure there are other issues with real-world workloads, and I know
> that it's wasteful compared to something that can make use of HOT
> updates. But unless there is something I'm missing, it's not really
> worth the effort to batch if that is the size of the update.

On a pre-production database I have (that is currently idle), on a
server with 4G RAM and a single SATA disk (probably similar to your
laptop in that sense, possibly more TPS since the HD is 7k rpm and
your laptop is probably 5k rpm), the same kind of update has been
running for half an hour and is still going (and I don't expect it to
finish today, if past experience is to be believed).

The database sees somewhat real test workloads from time to time, so
it's probably a good example of a live database (sans the concurrent
load).

The table is probably a lot wider than yours, having many columns,
some text typed, and many indices too. Updating one indexed int4 like
so:

begin;
update users set country_id = 1 where id < (328973625/2);
rollback;

(the where condition returns about 2M rows out of a ~5M total)

There are quite a few foreign key constraints that I expect are
interfering as well. I could try dropping them, just not on this
database.

The schema:


CREATE TABLE users
(
  id integer NOT NULL,
  about_me character varying(500),
  birth_date timestamp without time zone,
  confirmed boolean,
  creation_date timestamp without time zone,
  email character varying(255),
  first_name character varying(255),
  image_link character varying(255),
  is_native_location boolean,
  is_panelist boolean,
  last_name character varying(255),
  privacy bigint NOT NULL,
  sex integer,
  username character varying(255),
  city_id integer,
  country_id integer,
  state_id integer,
  last_access_to_inbox timestamp without time zone,
  profile_background_color character varying(255),
  profile_background_image_link character varying(255),
  url character varying(255),
  notifications bigint,
  last_activity_date timestamp without time zone,
  site_country_id integer,
  show_welcome_message boolean NOT NULL,
  invited boolean,
  partner_id integer,
  panelist_update bigint,
  unregistered boolean DEFAULT false,
  import_state integer,
  show_alerts_since timestamp without time zone,
  super_user boolean DEFAULT false,
  survey_id integer,
  site_id smallint NOT NULL,
  panelist_percentage smallint NOT NULL DEFAULT 0,
  reason integer,
  unregistered_date timestamp without time zone,
  is_panelist_update_date timestamp without time zone,
  confirmation_update_date timestamp without time zone,
  no_panelist_reason integer,
  facebook_connect_status smallint,
  CONSTRAINT user_pkey PRIMARY KEY (id ),
  CONSTRAINT fk36ebcb26f1a196 FOREIGN KEY (site_country_id)
      REFERENCES countries (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_city_id FOREIGN KEY (city_id)
      REFERENCES cities (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_country_id FOREIGN KEY (country_id)
      REFERENCES countries (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_state_id FOREIGN KEY (state_id)
      REFERENCES states (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_survey_id FOREIGN KEY (survey_id)
      REFERENCES surveys (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT users_id_check CHECK (id >= 0)
)

Indices on:

  (creation_date );

  (panelist_update )
  WHERE panelist_update IS NOT NULL;

  (birth_date );

  (email COLLATE pg_catalog."default" );

  (lower(email::text) COLLATE pg_catalog."default" );

  (partner_id , creation_date );

  (site_id , country_id , city_id );

  (site_id , country_id , state_id );

  (username COLLATE pg_catalog."default" );


Re: Savepoints in transactions for speed?

From
Jeff Janes
Date:
On Thu, Nov 29, 2012 at 10:14 AM, Mike Blackwell <mike.blackwell@rrd.com> wrote:
>
>
>
> On Thu, Nov 29, 2012 at 12:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>>
>> But if you do keep the drop index inside the transaction, then you
>> would probably be better off using truncate rather than delete,
>> rebuilding the index non-concurrently, and moving that inside the
>> transaction as well.
>>
>
>
> Hmm....  From the 9.2 manual it seems that might not work out so well:
>
> TRUNCATE is not MVCC-safe (see Chapter 13 for general information about
> MVCC). After truncation, the table will appear empty to all concurrent
> transactions, even if they are using a snapshot taken before the truncation
> occurred.
>
> It looks like other transactions could find an empty table while it was
> being reloaded under that approach.

They would block during the load; it is just after the load that they
would see the table as empty.  I thought that would only be a
problem for repeatable read or higher, but a test shows that read
committed has that problem as well.  But yeah, that could definitely
be a problem with that method.


Cheers,

Jeff


Re: Savepoints in transactions for speed?

From
Jeff Janes
Date:
On Thu, Nov 29, 2012 at 11:58 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Nov 29, 2012 at 3:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> I tried a quick test with 2M tuples and 3 indexes over int8, numeric,
>> and text (generated data). There was also an unindexed bytea column.
>> Using my laptop, a full update of the int8 column (which is indexed,
>> forcing cold updates) took less than 4 minutes.
>>
>> I'm sure there are other issues with real-world workloads, and I know
>> that it's wasteful compared to something that can make use of HOT
>> updates. But unless there is something I'm missing, it's not really
>> worth the effort to batch if that is the size of the update.
>
> On a pre-production database I have (that is currently idle), on a
> server with 4G RAM and a single SATA disk (probably similar to your
> laptop in that sense more or less, possibly more TPS since the HD rpm
> is 7k and your laptop probably is 5k), it's been running for half an
> hour and is still running (and I don't expect it to finish today if
> past experience is to be believed).

So probably Jeff Davis's indexes fit in RAM (or the part that can be
dirtied without causing thrashing), and yours do not.

But, does batching them up help at all?  I doubt it does.

Cheers,

Jeff