Thread: Data Warehousing

Data Warehousing

From
"Rob Kirkbride"
Date:
Hi,

I've got a postgres database collecting logged data. This data I have to keep for at least 3 years. The data is initially recorded in a postgres cluster, and then needs to be moved to a reports database server for analysis. Therefore I'd like a job to dump data from the cluster, say every hour, and record it in the reports database. The clustered database could then be purged of, say, data more than a week old.

So basically I need a dump/restore that only appends new data to the reports server database.

I've googled but can't find anything, can anyone help?

Thanks

Rob

Re: Data Warehousing

From
"Scott Marlowe"
Date:
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
> Hi,
>
> I've got a postgres database collecting logged data. This data I have to
> keep for at least 3 years. The data is initially recorded in a postgres
> cluster, and then needs to be moved to a reports database server for
> analysis. Therefore I'd like a job to dump data from the cluster, say every
> hour, and record it in the reports database. The clustered database could
> then be purged of, say, data more than a week old.
>
> So basically I need a dump/restore that only appends new data to the reports
> server database.
>
> I've googled but can't find anything, can anyone help?

You might find an answer in partitioning your data.  There's a section
in the docs on it.  If you partition by week, you can dump the data from
the newest couple of partitions to ship over to the reports server, and
purge anything older with a simple delete where date < now() - interval
'1 week', or something like that.
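
Roughly, something like this (untested; the partition and column names are
made up, and the data-only dump assumes the table already exists on the
reports server):

  # ship the newest weekly partition's data to the reports box
  pg_dump -a -t log_data_2007w35 clusterdb | psql -h reports-host reportsdb

  # purge rows more than a week old from the cluster
  psql clusterdb -c \
    "DELETE FROM log_data WHERE logged_time < now() - interval '1 week'"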

Re: Data Warehousing

From
"Andrej Ricnik-Bay"
Date:
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

> So basically I need a dump/restore that only appends new
> data to the reports server database.
I guess that will all depend on whether or not your
data has a record of the time it got stuck in the cluster
... if there's no concept of a time-stamp attached
to the records as they get entered, I don't think it can be
done.


> Thanks
>
> Rob
Cheers,
Andrej

Re: Data Warehousing

From
"Rob Kirkbride"
Date:
On 03/09/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
> > Hi,
> >
> > I've got a postgres database collecting logged data. This data I have to
> > keep for at least 3 years. The data is initially recorded in a postgres
> > cluster, and then needs to be moved to a reports database server for
> > analysis. Therefore I'd like a job to dump data from the cluster, say
> > every hour, and record it in the reports database. The clustered database
> > could then be purged of, say, data more than a week old.
> >
> > So basically I need a dump/restore that only appends new data to the
> > reports server database.
> >
> > I've googled but can't find anything, can anyone help?
>
> You might find an answer in partitioning your data.  There's a section
> in the docs on it.  If you partition by week, you can dump the data from
> the newest couple of partitions to ship over to the reports server, and
> purge anything older with a simple delete where date < now() - interval
> '1 week', or something like that.


We're using Hibernate to write to the database, and partitioning looks like it would be too much of a re-architecture. In reply to Andrej: we do have a logged_time column in the required tables. That being the case, how does that help me with the tools provided?

Might I have to write a custom JDBC application to do the data migration?

Rob


Re: Data Warehousing

From
"Andrej Ricnik-Bay"
Date:
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

> We're using Hibernate to write to the database, and partitioning looks like
> it would be too much of a re-architecture. In reply to Andrej: we do have a
> logged_time column in the required tables. That being the case, how does
> that help me with the tools provided?
>
> Might I have to write a custom JDBC application to do the data migration?
That would be one option :}

If the server is on a Unix/Linux-platform you should be able
to achieve the result with a reasonably simple shell-script
and cron, I'd say.
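
For instance, a crontab entry along these lines (the script path is just
a placeholder):

  # export new log rows to the reports server, five past every hour
  5 * * * *  /usr/local/bin/export_logs.sh

where export_logs.sh does the actual SELECT on the cluster and the load
into the reports database.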


> Rob
Cheers,
Andrej



--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Data Warehousing

From
Rob Kirkbride
Date:
Andrej Ricnik-Bay wrote:
> On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
>
>
>> We're using Hibernate to write to the database, and partitioning looks like
>> it would be too much of a re-architecture. In reply to Andrej: we do have a
>> logged_time column in the required tables. That being the case, how does
>> that help me with the tools provided?
>>
>> Might I have to write a custom JDBC application to do the data migration?
>>
> That would be one option :}
>
> If the server is on a Unix/Linux-platform you should be able
> to achieve the result with a reasonably simple shell-script
> and cron, I'd say.
>
>
I am on a Linux platform but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump looking
for where I need to start from?




Re: Data Warehousing

From
"Andrej Ricnik-Bay"
Date:
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

> I am on a Linux platform but I'm going to need some pointers regarding
> the cron job. Are you suggesting that I parse the dump file? I assume I
> would need to switch to using inserts and then parse the dump looking
> for where I need to start from?
The question is: how complex is the data you need to
extract?  I guess where I was heading was to run a select
(with the interval condition Scott described) from psql into
a file, and then COPY FROM that into the analysis database.

However, if the structure is more complex, if you needed
to join tables, then parsing a dump file may be an option,
even though always retaining a week's worth of data might
make that quite some overhead.
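
Very roughly, something like this (completely untested; the table name
log_data, its logged_time column, the hostnames and the file path are all
invented for the example):

  #!/bin/sh
  # pull the last hour's rows off the cluster into a COPY-format file ...
  psql -h cluster-host -d logdb -c \
    "\copy (SELECT * FROM log_data WHERE logged_time >= now() - interval '1 hour') TO '/tmp/log_data.copy'"

  # ... and append them to the same table in the analysis database
  psql -h reports-host -d reportsdb -c \
    "\copy log_data FROM '/tmp/log_data.copy'"

In practice you'd probably want to remember the highest logged_time you
have already copied (in a file or a little bookkeeping table) and select
everything newer than that, rather than trusting now() - 1 hour, so a
late or skipped cron run doesn't lose rows.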


Cheers,
Andrej



--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Data Warehousing

From
Ken.Colson@sage.com
Date:
> I am on a Linux platform but I'm going to need some pointers regarding
> the cron job. Are you suggesting that I parse the dump file? I assume I
> would need to switch to using inserts and then parse the dump looking
> for where I need to start from?

Something that you may want to consider is dblink from contrib.  We have a
similar situation for the archiving of collected data and have been able to
implement a fairly easy solution that does not require the parsing of dump
files, just a simple(ish) query based on the time inserted.
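
For example, run from the reports database, something along these lines
(untested; the connection string, table name and column list are invented
for the example, and contrib/dblink has to be installed on the reports
side):

  -- pull the last hour's rows straight off the cluster and append them
  INSERT INTO log_data
  SELECT *
  FROM dblink('host=cluster-host dbname=logdb',
              'SELECT id, logged_time, message
                 FROM log_data
                WHERE logged_time >= now() - interval ''1 hour''')
       AS t(id integer, logged_time timestamptz, message text);

A delete with the same time condition on the cluster side then keeps the
source table down to about a week's worth of data.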

-Ken




