Re: Update table performance - Mailing list pgsql-performance

From Erik Jones
Subject Re: Update table performance
Date
Msg-id 1C6EEA75-603F-4B02-8127-920F17AC5411@myemma.com
Whole thread Raw
In response to Re: Update table performance  (Mark Makarowsky <bedrockconstruction@yahoo.com>)
Responses Re: Update table performance
Re: Update table performance
List pgsql-performance
On Aug 7, 2007, at 6:13 PM, Mark Makarowsky wrote:

> Can you provide more detail on what you mean by your
> two suggestions below:
>
> Yeah, I've used "vertical partitioning" very
> successfully in the past, though I've never done it
> for just a single field. I'll typically leave the few
> most common fields in the "main" table and pull
> everything else into a second table.

Vertical partitioning is where you split up your table on disk by
columns, i.e on the vertical lines.  He quoted it because Postgres
doesn't actually support it transparently but you can always fake it
by splitting up your table.  For example, given the following table
wherein column bar gets updated a lot but the others don't:

create table foo (
id    int     not null,
bar    int,
baz     int,

primary key (id)
);

You could split it up like so:

create table foo_a (
id     int,
baz    int,

primary key (id)
);

create table foo_b (
foo_id    int,
bar        int,

foreign key foo_a_id (foo_id) references foo_a (id)
);

The reason you'd ever want to do this is that when Postgres goes to
update a row what it actually does is inserts a new row with the new
value(s) that you changed and marks the old one as deleted.  So, if
you have a wide table and frequently update only certain columns,
you'll take a performance hit as you're having to re-write a lot of
static values.

>
> I should mention that if you can handle splitting the
> update into multiple transactions, that will help a
> lot since it means you won't be doubling the size of
> the table.

As I mentioned above, when you do an update you're actually inserting
a new row and deleting the old one.  That deleted row is still
considered part of the table (for reasons of concurrency, read up on
the concurrency chapter in the manual for the details) and once it is
no longer visible by any live transactions can be re-used by future
inserts.  So, if you update one column on every row of a one million
row table all at once, you have to allocate and write out one million
new rows.  But, if you do the update a quarter million at a time, the
last three updates would be able to re-use many of the rows deleted
in earlier updates.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



pgsql-performance by date:

Previous
From: Mark Makarowsky
Date:
Subject: Re: Update table performance
Next
From: Heikki Linnakangas
Date:
Subject: Re: Update table performance