Need help optimizing DB migration - Mailing list pgsql-general
From | Steve Frampton
Subject | Need help optimizing DB migration
Date |
Msg-id | Pine.LNX.4.33.0108020650020.420-100000@astro.phpwebhosting.com
List | pgsql-general
Hello:

I've written some ODBC-based code in C which does a migration between two given DBs. I need to do this with code, as opposed to a straight pg_dump, because DB "A" represents state transitions and billable transactions, while DB "B" is a billing database, and therefore only the appropriate columns should be inserted or updated in DB "B".

For example, user Joe Blow has never made a purchase before. His transaction results in a demographic record being inserted into DB "A", as well as a charge transaction for what he has purchased. At night, my custom migration facility runs, this data is inserted into DB "B", and purged from DB "A".

The next time Joe Blow comes back and makes a purchase, he's already recognized as a valid user, but wants to use a different credit card. Therefore, the demographic record in "A" is a mostly blank record with only his primary key and the new credit card number fields set. Again, a charge transaction for what he has purchased is inserted into another table. That night, my custom migration facility runs, notices that Joe Blow's demographic information already exists in "B", and therefore updates only the non-empty fields from "A". The charge transaction is inserted, as it is a new transaction.

My migration works fine, but it is SLOOOOOOOOOW. I am hoping there is a way to optimize it. Here is how my migration facility basically works:

- Prior to running the batch job, all outstanding records in DB "A" have their "processed_ind" set to "M" (for "migration").
- The ODBC-based batch job starts and turns OFF auto-commit.
- The batch job first sets all "M" records to "P" (processing).
- The batch job selects all "P" records into an ODBC cursor:
  - For each record, any empty columns are removed from the data structure so they won't overwrite existing "good" values if they are present in "B".
  - For each record, it constructs a WHERE clause so it can check whether the record already exists in DB "B" (with a SELECT COUNT(*)).
  - If the record exists in "B", it is updated; otherwise it is inserted (see the first sketch after the message).
- After all records have been processed, their processed_ind status is set to "F" (finished processing).
- After the batch job completes, all "F" records are deleted from "A".

As expected, the status switches from "A" to "M", then "M" to "P", then "P" to "F" perform fairly quickly (although much slower than I would have expected). A lot of the overhead appears to come from PostgreSQL's multiple row versioning. Is there a way to use an Oracle-like simple rollback segment, which would probably speed things up?

Is there any way I can optimize my migration routine? (Earlier, I was setting the "P" status to "F", inserting/updating, and committing each transaction individually if successful. That was _INCREDIBLY SLOW_; batching the commits gave me a factor-of-n speedup (see the second sketch below), but perhaps I can make it even faster.)

Any advice would be appreciated.

---------------< LINUX: The choice of a GNU generation. >-------------
Steve Frampton <frampton@LinuxNinja.com>    http://www.LinuxNinja.com
GNU Privacy Guard ID: D055EBC5 (see http://www.gnupg.org for details)
GNU-PG Fingerprint: EEFB F03D 29B6 07E8 AF73 EF6A 9A72 F1F5 D055 EBC5
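First sketch: a minimal version of the per-record "exists? update : insert" step described above. The table and column names (customer, cust_id), and the idea that the SET/VALUES fragments have already been built as strings from the non-empty columns, are illustrative assumptions, not the poster's actual code; error checking is omitted for brevity.

/*
 * Per-record step: probe DB "B" with SELECT COUNT(*), then UPDATE
 * only the non-empty columns or INSERT a fresh row.  Table and
 * column names (customer, cust_id) are hypothetical.
 */
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

/* Return nonzero if a demographic row for cust_id already exists in "B". */
static int record_exists(SQLHDBC hdbcB, long cust_id)
{
    SQLHSTMT   hstmt;
    SQLCHAR    query[256];
    SQLINTEGER count = 0;
    SQLLEN     ind = 0;

    SQLAllocHandle(SQL_HANDLE_STMT, hdbcB, &hstmt);
    snprintf((char *) query, sizeof(query),
             "SELECT COUNT(*) FROM customer WHERE cust_id = %ld", cust_id);
    SQLExecDirect(hstmt, query, SQL_NTS);
    SQLBindCol(hstmt, 1, SQL_C_SLONG, &count, 0, &ind);
    SQLFetch(hstmt);
    SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
    return count > 0;
}

/* Migrate one cursor row.  set_clause/insert_cols/insert_vals were built
 * from the non-empty columns only, so existing "good" values in "B"
 * are never overwritten. */
static void migrate_record(SQLHDBC hdbcB, long cust_id,
                           const char *set_clause,
                           const char *insert_cols, const char *insert_vals)
{
    SQLHSTMT hstmt;
    SQLCHAR  query[1024];

    if (record_exists(hdbcB, cust_id))
        snprintf((char *) query, sizeof(query),
                 "UPDATE customer SET %s WHERE cust_id = %ld",
                 set_clause, cust_id);
    else
        snprintf((char *) query, sizeof(query),
                 "INSERT INTO customer (%s) VALUES (%s)",
                 insert_cols, insert_vals);

    SQLAllocHandle(SQL_HANDLE_STMT, hdbcB, &hstmt);
    SQLExecDirect(hstmt, query, SQL_NTS);
    SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
}

Note that this pattern costs two statements per record (the COUNT(*) probe plus the UPDATE or INSERT). A common variant is to issue the UPDATE first and fall back to an INSERT only when SQLRowCount() reports zero rows affected, which halves the round trips for rows that already exist.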
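Second sketch: the batch-commit pattern the message refers to, with auto-commit turned off and one commit every COMMIT_EVERY records instead of one per record. COMMIT_EVERY, the transactions table name, and the fetch_next_record() helper are illustrative assumptions.

#include <sql.h>
#include <sqlext.h>

#define COMMIT_EVERY 1000             /* assumed batch size; tune as needed */

extern int fetch_next_record(void);   /* hypothetical cursor-fetch helper */

void run_migration_batch(SQLHDBC hdbc)
{
    SQLHSTMT hstmt;
    long     n = 0;

    /* One transaction per batch instead of one per record. */
    SQLSetConnectAttr(hdbc, SQL_ATTR_AUTOCOMMIT,
                      (SQLPOINTER) SQL_AUTOCOMMIT_OFF, 0);

    /* Promote the outstanding records: "M" -> "P". */
    SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);
    SQLExecDirect(hstmt, (SQLCHAR *)
                  "UPDATE transactions SET processed_ind = 'P' "
                  "WHERE processed_ind = 'M'", SQL_NTS);
    SQLFreeHandle(SQL_HANDLE_STMT, hstmt);

    /* Walk the cursor of "P" records, migrating each one (see the
     * previous sketch) and committing every COMMIT_EVERY rows. */
    while (fetch_next_record()) {
        /* ... migrate_record(...) for the current row ... */
        if (++n % COMMIT_EVERY == 0)
            SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);
    }

    SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);   /* final commit */
}

One caveat: whether an open cursor survives a mid-batch commit is driver-dependent (SQLGetInfo with SQL_CURSOR_COMMIT_BEHAVIOR reports this); if the driver closes cursors on commit, the batch must re-select the remaining "P" records after each commit.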