Thread: Suggestions for a data-warehouse migration routine

Suggestions for a data-warehouse migration routine

From: Richard Rowell
I've ported enough of my company's database to Postgres to make
warehousing on PG a real possibility.  I thought I would toss my data
migration architecture ideas out for the list to shoot apart...

1.  Script on production server dumps the production database (MSSQL) to
a set of delimited text files.
2.  Script on production server moves files via FTP to a Postgres
database server.
3.  File Alteration Monitor trigger on PG server executes script when
last file is transferred.
4.  Script on PG server drops the target database (by issuing a "dropdb"
command).
5.  Script on PG server re-creates target database. (createdb command)
6.  Script on PG server re-creates the tables.
7.  Script on PG server issues COPY commands to import data.
8.  Script on PG server indexes tables.
9.  Script on PG server builds de-normalized reporting tables.
10. Script on PG server indexes the reporting tables.
11. Script on PG server creates needed reporting functions.
12. Vacuum analyze?
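
A minimal sketch of what steps 4-12 might look like on the PG side (the
database name, file layout, and delimiter are assumptions for illustration):

    #!/bin/sh
    # Hypothetical rebuild script, fired once the last file has arrived.
    set -e                                   # abort on the first failure
    DB=warehouse
    DATADIR=/var/lib/pgload/incoming

    dropdb "$DB"                             # step 4 (fails if sessions are still connected)
    createdb "$DB"                           # step 5
    psql -d "$DB" -f create_tables.sql       # step 6: bare tables, no indexes yet

    # Step 7: bulk-load one delimited file per table with COPY.
    for f in "$DATADIR"/*.txt; do
        t=$(basename "$f" .txt)              # assumes file name == table name
        psql -d "$DB" -c "\\copy $t from '$f' with delimiter '|'"
    done

    psql -d "$DB" -f create_indexes.sql      # step 8: index after the load
    psql -d "$DB" -f build_reporting.sql     # steps 9-11: reporting tables,
                                             #   their indexes, functions
    psql -d "$DB" -c "VACUUM ANALYZE"        # step 12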

My question revolves around the drop/create for the database.  Is there
a significant downside to this approach?  I'm taking this approach because
it is simpler from a scripting point of view to simply start from
scratch on each warehouse update.  If I do not drop the database, I would
need to delete the contents of each table and drop all indexes prior to
the COPY/data import.  My assumption is that all the table deletes and
index drops would be more expensive than just dropping/re-creating the
entire database.

Also, is the Vacuum analyze step needed on a freshly minted database
where the indexes have all been newly created?

Thanks in advance for all feedback.




Re: Suggestions for a data-warehouse migration routine

From: John A Meinel
Richard Rowell wrote:
> I've ported enough of my company's database to Postgres to make
> warehousing on PG a real possibility.  I thought I would toss my data
> migration architecture ideas out for the list to shoot apart...
>
> 1.  Script on production server dumps the production database (MSSQL) to
> a set of delimited text files.
> 2.  Script on production server moves files via FTP to a Postgres
> database server.
> 3.  File Alteration Monitor trigger on PG server executes script when
> last file is transferred.
> 4.  Script on PG server drops the target database (by issuing a "dropdb"
> command).
> 5.  Script on PG server re-creates target database. (createdb command)
> 6.  Script on PG server re-creates the tables.
> 7.  Script on PG server issues COPY commands to import data.
> 8.  Script on PG server indexes tables.
> 9.  Script on PG server builds de-normalized reporting tables.
> 10. Script on PG server indexes the reporting tables.
> 11. Script on PG server creates needed reporting functions.
> 12. Vacuum analyze?
>
> My question revolves around the drop/create for the database.  Is there
> a significant downside to this approach?  I'm taking this approach because
> it is simpler from a scripting point of view to simply start from
> scratch on each warehouse update.  If I do not drop the database, I would
> need to delete the contents of each table and drop all indexes prior to
> the COPY/data import.  My assumption is that all the table deletes and
> index drops would be more expensive than just dropping/re-creating the
> entire database.

I believe you are correct. If you are going to completely wipe the
database, just drop it and re-create it. Deleting rows is much slower than
dropping the objects that hold them. (One of the uses of partitioning is
that you can simply drop one of the tables rather than deleting its
entries.) Dropping the whole db also skips any foreign-key checks, etc.
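
For contrast, the per-table route would mean something like this for every
table (illustrative names), plus a VACUUM afterwards to reclaim the dead
rows that DELETE leaves behind:

    psql -d warehouse -c "DROP INDEX orders_date_idx"   # hypothetical index
    psql -d warehouse -c "DELETE FROM orders"           # row by row, firing any
                                                        #   FK checks, leaving dead tuples
    # ...then reload, re-create the index, and VACUUM.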

>
> Also, is the Vacuum analyze step needed on a freshly minted database
> where the indexes have all been newly created?
>
> Thanks in advance for all feedback.

ANALYZE is needed, since you haven't gathered any statistics yet, so the
planner doesn't really know how many rows there are or how the values are
distributed.

VACUUM probably isn't, since a freshly loaded database has no dead rows
to reclaim.
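
One way to see this (hypothetical table name): the per-column statistics
in pg_stats stay empty until ANALYZE runs.

    psql -d warehouse -c "SELECT attname, n_distinct FROM pg_stats
                          WHERE tablename = 'orders'"    # no rows yet
    psql -d warehouse -c "ANALYZE orders"
    psql -d warehouse -c "SELECT attname, n_distinct FROM pg_stats
                          WHERE tablename = 'orders'"    # one row per column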

John
=:->


Re: Suggestions for a data-warehouse migration routine

From: Mischa Sandberg
Quoting Richard Rowell <richard@bowmansystems.com>:

> I've ported enough of my company's database to Postgres to make
> warehousing on PG a real possibility.  I thought I would toss my data
> migration architecture ideas out for the list to shoot apart...
>
[...]
Not much feedback required.

Yes, dropping the entire database is faster and simpler.
If your database is small enough that you can rebuild it from scratch
every time, go for it.

Yes, a vacuum analyze is required; creating indexes alone does not create
statistics.

Once this is running, I'd dump an extract of pg_stat[io_]user_(tables|indexes)
periodically to see how index usage and table load change over time.
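
A minimal sketch of that kind of snapshot, suitable for a cron job (the
database name and log paths are assumptions):

    # Append timestamped usage counters for later trend analysis.
    STAMP=$(date +%Y%m%d)
    psql -d warehouse -A -F '|' -c "SELECT relname, seq_scan, idx_scan,
        n_tup_ins FROM pg_stat_user_tables" > /var/log/pgstats/tables.$STAMP
    psql -d warehouse -A -F '|' -c "SELECT indexrelname, idx_scan,
        idx_tup_read FROM pg_stat_user_indexes" > /var/log/pgstats/indexes.$STAMP
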
--
"Dreams come true, not free." -- S.Sondheim, ITW