Need help optimizing DB migration - Mailing list pgsql-general

From Steve Frampton
Subject Need help optimizing DB migration
Date
Msg-id Pine.LNX.4.33.0108020650020.420-100000@astro.phpwebhosting.com
List pgsql-general

Hello:

I've written some ODBC-based code in C which does a migration between two
given DB's.  I need to do this with code as opposed to a straight pg_dump,
because DB "A" represents state transitions and billable transactions,
while DB "B" is a billing database, and therefore only appropriate columns
should be inserted or updated in DB "B".

For example, user Joe Blow has never made a purchase before.  His
transaction results in a demographic record being inserted into DB "A", as
well as a charge transaction for what he has purchased.  At night, my
custom migration facility runs; this data is inserted into DB "B" and
purged from DB "A".

Next time, Joe Blow comes back and makes another purchase.  He's already
been recognized as a valid user, but wants to use a different credit card.
Therefore, the demographic record on "A" consists of a mostly blank
record, with only his primary key and the new credit card number fields
set.  Again, a charge transaction of what he has purchased is inserted
into another table.  That night, my custom migration facility runs,
notices that Joe Blow's demographic information already exists in "B", and
therefore updates only the non-empty fields from "A".  The charge
transaction is inserted, since it is a new transaction.

My migration works fine, but it is SLOOOOOOOOOW.  I am hoping there is a
way to optimize it.  Here is how my migration facility basically works:

- prior to running the batch job, all outstanding records in DB "A" have a
"processed_ind" set to "M" (for "migration").

- the ODBC-based batch job starts, and turns OFF auto-commit
- the batch job first sets all "M" records to "P" (processing)
- the batch job selects all "P" records into an ODBC cursor
  - for each record, any empty columns are removed from the data structure
    so they won't modify existing "good" values if they are present in "B"
  - for each record, it constructs a "WHERE" clause so it can see whether
    a record already exists in DB "B" (with a SELECT COUNT(*))
  - if the record exists in "B", it is updated; otherwise it is inserted
- after all records have been processed, their processed_ind status is set
  to "F" (finished processing)

- after the batch job completes, all "F" records are deleted from "A".

As expected, the status changes (setting records in "A" to "M", then "M"
to "P", then "P" to "F") perform fairly quickly (although much more
slowly than I would have expected).

A lot of the overhead appears to be from PostgreSQL's multiple row
versioning.  Is there a way to use an Oracle-like simple rollback segment
which would probably speed things up?

Is there any way I can optimize my migration routine?  (Earlier, I was
setting the "P" status to "F", inserting/updating, and committing each
transaction individually if successful.  That was _INCREDIBLY SLOW_, and
batching the commits gave me a factor of n speedup, but perhaps I can
make it even faster.)

Any advice would be appreciated.

----------------< LINUX: The choice of a GNU generation. >-------------
Steve Frampton   <frampton@LinuxNinja.com>   http://www.LinuxNinja.com
GNU Privacy Guard ID: D055EBC5  (see http://www.gnupg.org for details)
GNU-PG Fingerprint: EEFB F03D 29B6 07E8 AF73  EF6A 9A72 F1F5 D055 EBC5

