Thread: [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

[Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

From
Paul Lambert
Date:
Sorry, posted this to the wrong list :(

-------- Original Message --------
Subject: [PORTS] M$ SQL server DTS package equivalent in Postgres
Date: Tue, 23 Jan 2007 10:15:06 +0900
From: Paul Lambert <paul.lambert@autoledgers.com.au>
To: pgsql-ports@postgresql.org

G'day,

Is there an equivalent in Postgres to the DTS Packages available in M$
SQL server?

I use these in SQL server to pre-load data from CSV files prior to 
enabling replication from my primary application. Any pointers on where 
best to go for this would be appreciated. I'm reading about something 
called EMS, is that the way to go?

Sample of one of the files I use:


DEALER_ID^DATE_CHANGED^TIME_CHANGED^BILLING_CODE_ID^DES^BILLING_CODE_TYPE^LABOUR_RATE^LABOUR_SALES_GROUP^CUSTOMER_NO^PARTS_SALES_GROUP^COMEBACK^WORKSHOP^FRANCHISE^LOCATION^DELETEFLAG
F65^23-Jan-2007^10:13^AA^ADVERSITING ADMIN^I^45^40^2196^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AN^ADV NEW^I^45^40^1636^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AP^ADV PARTS^I^45^40^1919^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AS^ADV SERV^I^45^40^2057^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AU^ADV USED^I^45^40^1775^18^^0^BLANK^0^N
F65^23-Jan-2007^10:13^BA^B R&M ADM^I^45^40^2823^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^BG^BUILDING MAINTENANCE GM HOLDEN^I^45^40^1311^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^BN^B R&M NEW^I^45^40^2268^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^BP^B R&M PART^I^45^40^2541^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^BS^B R&M SERV^I^45^40^2680^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^BU^B R&M USED^I^45^40^2401^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^F^FLEET^C^50^27^0^17^^0^BLANK^0^
F65^23-Jan-2007^10:13^FC^FORD COMEBACK MECHANIC^I^65^21^140^19^Y^0^BLANK^0^

Cheers,
Paul.

-- 
Paul Lambert
Database Administrator
AutoLedgers





-- 
Paul Lambert
Technical Support Team Leader and Database Administrator
AutoLedgers
Level 3, 823 Wellington Street, West Perth, W.A. 6005
Postal: P.O. Box 106, West Perth, W.A. 6872
Ph: 08 9217 5086 Fax: 08 9217 5055
AutoLedgers Technical Support Desk: 1800 649 987 (Free call) 08 9217 5050 (Perth local and mobile)
Email: paul.lambert@autoledgers.com.au <http://www.reynolds.com.au>
------------------------------------------------------------------------------------
For AutoLedgers technical support, please send an email to helpdesk@autoledgers.com.au.

Re: [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

From
"Tomi N/A"
Date:

2007/1/23, Paul Lambert <paul.lambert@autoledgers.com.au>:
> G'day,
>
> Is there an equivalent in Postgres to the DTS Packages available in M$
> SQL server?
>
> I use these in SQL server to pre-load data from CSV files prior to
> enabling replication from my primary application. Any pointers on where
> best to go for this would be appreciated. I'm reading about something
> called EMS, is that the way to go?
>
> Sample of one of the files I use:
>
> DEALER_ID^DATE_CHANGED^TIME_CHANGED^BILLING_CODE_ID^DES^BILLING_CODE_TYPE^LABOUR_RATE^LABOUR_SALES_GROUP^CUSTOMER_NO^PARTS_SALES_GROUP^COMEBACK^WORKSHOP^FRANCHISE^LOCATION^DELETEFLAG
> F65^23-Jan-2007^10:13^AA^ADVERSITING ADMIN^I^45^40^2196^18^^0^BLANK^0^
> F65^23-Jan-2007^10:13^AN^ADV NEW^I^45^40^1636^18^^0^BLANK^0^
> F65^23-Jan-2007^10:13^AP^ADV PARTS^I^45^40^1919^18^^0^BLANK^0^
> F65^23-Jan-2007^10:13^AS^ADV SERV^I^45^40^2057^18^^0^BLANK^0^
> F65^23-Jan-2007^10:13^AU^ADV USED^I^45^40^1775^18^^0^BLANK^0^N


Paul,
what you're looking for exists in a number of variations.
You can use a good text editor with Postgres' COPY command for simple
bulk .csv loading, but in the long run you might want to consider a
full-fledged ETL tool like the Pentaho Data Integration suite. It
allows you to use a wide variety of data sources (anything with a JDBC
driver, .xls, .csv, .xml for starters), chain data transformations,
store them, and execute them on a regular basis or on demand from a
Java app etc. A number of such tools are available, but I suggested
Pentaho Data Integration because I have experience with it and because
it is under an open source licence. If anyone can suggest a better
tool for the job, please say so: I for one would like to have my
horizons broadened. :)
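
For the simple COPY route, a rough sketch against the sample file
above might look like this (the staging table name, column names and
file path are placeholders, not Paul's actual schema):

-- staging table: everything as text so COPY never chokes on a value
CREATE TABLE billing_codes_load (
    dealer_id text, date_changed text, time_changed text,
    billing_code_id text, des text, billing_code_type text,
    labour_rate text, labour_sales_group text, customer_no text,
    parts_sales_group text, comeback text, workshop text,
    franchise text, location text, deleteflag text
);

-- '^' is the delimiter and the first line of the file is a header row
COPY billing_codes_load FROM '/path/to/billing_codes.csv'
    WITH DELIMITER '^' CSV HEADER;

(From psql, \copy instead of COPY reads the file on the client machine
rather than on the database server.)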

Cheers,
t.n.a.

Re: [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

From
"Merlin Moncure"
Date:
On 1/26/07, Tomi N/A <hefest@gmail.com> wrote:
> 2007/1/23, Paul Lambert <paul.lambert@autoledgers.com.au>:
> > Is there an equivalent in Postgres to the DTS Packages available in M$
> > SQL server.

> what you're looking for exists in a number of variations.
> You can use a good text editor with the postgres' COPY command for simple
> bulk .csv loading, but in the long run, you might want to consider a

In my opinion, if your input data is in well-formed csv, you don't
really need much of anything.  Make a table(s) with all text columns
which will accept the csv data from the copy statement.  After that,
write queries to insert...select data from your import tables into the
actual tables holding the data doing all the appropriate casting
in-query.
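
For instance, against an all-text import table the second step might
be nothing more than this (column names, types and the date format are
illustrative only, not Paul's actual schema):

-- cast from the all-text import table into the properly typed target
INSERT INTO billing_codes
       (dealer_id, billing_code_id, changed_at, labour_rate, customer_no)
SELECT dealer_id,
       billing_code_id,
       to_timestamp(date_changed || ' ' || time_changed, 'DD-Mon-YYYY HH24:MI'),
       labour_rate::numeric,
       customer_no::integer
  FROM billing_codes_import;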

Besides being easy to schedule and very flexible, manipulating data
with queries is extremely powerful and fairly easy to maintain
assuming you know a little SQL -- thanks to postgresql's huge array of
built in string manipulation functions.  Your skills learned here will
pay off using the database as well for other things.

Not only that, but this approach will be fast since it is declarative
and handles entire tables at once as opposed to DTS-ish solutions
which tend to do processing record by record.  Not to mention they are
overcomplicated and tend to suck.  (DTS does have the ability to read
from any ODBC source which is nice...but that does not apply here).
In fact, my favorite use for DTS is to convert databases out of
Microsoft SQL server and (ugh!) Access, a task which it excels
at...but the real magic here is in the ODBC driver, not DTS.

Worst case scenario is you have to do some preprocessing in C or perl
on the csv document if it is not completely well formed and blows up
postgresql's copy statement.  In other words, you don't need a data
processor, PostgreSQL *is* a data processor.

merlin

Re: [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

From
"Tomi N/A"
Date:
> Besides being easy to schedule and very flexible, manipulating data
> with queries is extremely powerful and fairly easy to maintain
> assuming you know a little SQL -- thanks to postgresql's huge array of
> built in string manipulation functions.  Your skills learned here will
> pay off using the database as well for other things.
>
> Not only that, but this approach will be fast since it is declarative
> and handles entire tables at once as opposed to DTS-ish solutions
> which tend to do processing record by record.  Not to mention they are
> overcomplicated and tend to suck. (DTS does have the ability to read
> from any ODBC source which is nice...but that does not apply here).

Different strokes for different folks, it seems.
I'd argue that COPY followed by a barrage of plpgsql statements can't
be used for anything but the most trivial data migration cases (where
it's invaluable): cases where you have line-organized data input for a
handful of tables at most.
In my experience (which is probably very different from anyone
else's), most real world situations include data from a number of very
different sources, ranging from the simplest (.csv and, arguably,
.xml) to the relatively complex (a couple of proprietary databases,
lots of tables, on-the-fly row merging, splitting or generating
primary keys, date format problems and general pseudo-structured,
messed-up information).
Once you've got your data in your target database (say, pgsql), using
SQL to manipulate the data makes sense, but it is only the _final_
step of an average, real world data transformation.

Cheers,
t.n.a.

Re: [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

From
"Jeremy Haile"
Date:
I've also used Pentaho Data Integration (previously known as Kettle)
quite extensively, and can recommend it.  It supports many different
databases and has fairly good documentation (although thin in some
areas).  It has a GUI drag-and-drop tool that can be used to configure
transformations and is very flexible.  It also has an active community
that responds when you have issues.

I use it as part of a regular job that runs every 5 minutes and hourly
to copy and transform data from a SQL Server DB to a PostgreSQL DB.  I
use COPY when I can simply select data into a CSV and load it into
another DB - but as Tomi said, when you have to do primary key
generation, row merging, data cleanup, and data transformations - I
would use some sort of ETL tool over just SQL.
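
As a rough sketch of that select-to-CSV-and-load pattern when both
ends happen to be PostgreSQL (table names and the query are
placeholders; exporting from SQL Server itself goes through its own
bulk tooling instead):

-- source side: dump a query result to a CSV file
COPY (SELECT * FROM customers WHERE updated_at > now() - interval '1 day')
    TO '/tmp/customers_delta.csv' WITH CSV HEADER;

-- target side: bulk-load the file into an import table
COPY customers_import FROM '/tmp/customers_delta.csv' WITH CSV HEADER;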

My 2 cents,
Jeremy Haile


On Fri, 26 Jan 2007 15:14:22 +0000, "Tomi N/A" <hefest@gmail.com> said:
> > Besides being easy to schedule and very flexible, manipulating data
> > with queries is extremely powerful and fairly easy to maintain
> > assuming you know a little SQL -- thanks to postgresql's huge array of
> > built in string manipulation functions.  Your skills learned here will
> > pay off using the database as well for other things.
> >
> > Not only that, but this approach will be fast since it is declarative
> > and handles entire tables at once as opposed to DTS-ish solutions
> > which tend to do processing record by record.  Not to mention they are
> > overcomplicated and tend to suck. (DTS does have the ability to read
> > from any ODBC source which is nice...but that does not apply here).
>
> Different strokes for different folks, it seems.
> I'd argue that COPY followed by a barrage of plpgsql statements can't
> be used for anything but the most trivial data migration cases (where
> it's invaluable): cases where you have line-organized data input for a
> handful of tables at most.
> In my experience (which is probably very different from anyone
> else's), most real world situations include data from a number of very
> different sources, ranging from the simplest (.csv and, arguably,
> .xml) to the relatively complex (a couple of proprietary databases,
> lots of tables, on-the-fly row merging, splitting or generating
> primary keys, date format problems and general pseudo-structured,
> messed-up information).
> Once you've got your data in your target database (say, pgsql), using
> SQL to manipulate the data makes sense, but it is only the _final_
> step of an average, real world data transformation.
>
> Cheers,
> t.n.a.
>

Re: [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

From
Paul Lambert
Date:
Thanks all for your tips and pointers.

Looking at COPY, I think it may do just what I need. The tables I load
the data into have the same columns in the same order as those in the
CSV file. Loading data in this manner is going to be a rare occurrence -
just when we install a new customer site and need to do an initial
transfer of data from the main system before we switch on my real-time
replication program. The programs that extract these csv files already
take care of duplicate key checking and so forth, so there shouldn't be
any issues as far as data integrity checking goes. I.e. there's no
actual data transformation, row merging and the like.

Thanks again to everyone who's offered some advice, much appreciated.

Regards,
Paul.