Re: Re: best practice for moving millions of rows to child table when setting up partitioning? - Mailing list pgsql-admin

From Greg Smith
Subject Re: Re: best practice for moving millions of rows to child table when setting up partitioning?
Date
Msg-id 4DBE2CB7.7020208@2ndQuadrant.com
Whole thread Raw
In response to Re: best practice for moving millions of rows to child table when setting up partitioning?  (Mark Stosberg <mark@summersault.com>)
List pgsql-admin
On 04/27/2011 03:35 PM, Mark Stosberg wrote:
> In particular, I wanted to check whether the UPDATE statement would
> alter all the rows automatically, or if the underlying trigger would
> cause all the rows processed a row at a time.
>
> It appears from my test that the result of the UPDATE was going to
> appear all at once. I'm worried about the resource implications of
> inserting mullions of rows all at once.
>

You can put a LIMIT on an UPDATE, same as any other type of query.  No
reason that style of migration must happen all at once, you can just
target a smaller number of rows at a time and slowly siphon rows over to
the children by iteration.

I don't see anything inherently wrong with the approach you're
proposing.  CREATE INDEX CONCURRENTLY does something similar to the
logic you've outlined--mark where data was inserted at, move over all
older data, then just copy over the new rows at the end.  The main
downsides of that is complexity and the need for downtime to get an
atomic swap to using the child tables.  You can't add them to the parent
until the original is gone, if you've inserted duplicate data into them.

I'm not sure what all that complexity buys you, compared to just adding
all the children, putting a limit on the UPDATE, and looping over that
with some delay after each iteration until it's finished if you want to
further control the rate.  There's usually no reason you have to be in a
rush to moving data over.  Relying on the database's transactional
scheme to avoid making any mistakes here--making it so a goof will
ROLLBACK--and avoiding any need for downtime are normally higher
priorities in a partition migration than making the move happen as fast
as possible.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: Option shared_buffers in PostgreSQL
Next
From: Greg Smith
Date:
Subject: Re: archive_timeout behavior (8.4.6)