Re: Update table performance - Mailing list pgsql-performance

From Decibel!
Subject Re: Update table performance
Date
Msg-id 20070808172814.GJ20424@nasby.net
Whole thread Raw
In response to Re: Update table performance  (Erik Jones <erik@myemma.com>)
List pgsql-performance
On Tue, Aug 07, 2007 at 08:46:20PM -0500, Erik Jones wrote:
> Vertical partitioning is where you split up your table on disk by
> columns, i.e on the vertical lines.  He quoted it because Postgres
> doesn't actually support it transparently but you can always fake it
> by splitting up your table.  For example, given the following table
> wherein column bar gets updated a lot but the others don't:
>
> create table foo (
> id    int     not null,
> bar    int,
> baz     int,
>
> primary key (id)
> );
>
> You could split it up like so:
>
> create table foo_a (
> id     int,
> baz    int,
>
> primary key (id)
> );
>
> create table foo_b (
> foo_id    int,
> bar        int,
>
> foreign key foo_a_id (foo_id) references foo_a (id)
> );

FWIW, the cases where I've actually used this have been on much wider
tables, and a number of the attributes are in-frequently accessed. An
example would be if you keep snail-mail address info for users; you
probably don't use those fields very often, so they would be good
candidates for going into a second table.

When does it actually make sense to use this? When you do a *lot* with a
small number of fields in the table. In this example, perhaps you very
frequently need to look up either user_name or user_id, probably via
joins. Having a table with just name, id, perhaps password and a few
other fields might add up to 50 bytes per row (with overhead), while
address information by itself could easily be 50 bytes. So by pushing
that out to another table, you cut the size of the main table in half.
That means more efficient use of cache, faster seqscans, etc.

The case Erik is describing is more unique to PostgreSQL and how it
handles MVCC. In some cases, splitting a frequently updated row out to a
separate table might not gain as much once we get HOT, but it's still a
good tool to consider. Depending on what you're doing another useful
technique is to not update the field as often by logging updates to be
performed into a separate table and periodically processing that
information into the main table.
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Attachment

pgsql-performance by date:

Previous
From: Erik Jones
Date:
Subject: Re: Update table performance
Next
From: Vivek Khera
Date:
Subject: Re: When/if to Reindex