Thread: Merging Data from Multiple DB
Hi, I am working on an application and I need some advice on the data warehousing aspect of it. The situation is this: I have an application that will be used at each one of my branch offices. Each branch office is autonomous, and each has its own instance of the db. There is currently no interaction between branches. However, I need to do centralized reporting, which means I need a way to merge all the data from the branches into the central data warehouse at HQ. Obviously, there is a huge problem with duplicated primary keys.

For performance reasons, each branch must have its own database, and a centralized transactional system is not an option. I was considering just centralizing primary key generation, but that seems very slow too. Segmenting primary keys among the branches is doable, but it is too much of a maintenance nightmare.

What do you suggest? Thanks.
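To make the duplicate-key problem concrete, here is a hypothetical example (the orders table is illustrative, not my actual schema):

-- Every branch runs the same schema, so every branch's sequence starts at 1.
CREATE TABLE orders (
    id     serial PRIMARY KEY,
    placed timestamptz NOT NULL DEFAULT now()
);

-- Branch A's first order gets id = 1, and so does branch B's first order.
-- Copying both rows into one central table violates the primary key, and
-- nothing in the row says which branch it came from.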
> What do you suggest?

In your reporting tool, I suggest using (branch_id, local_primary_key) as the primary key. This way the rows are guaranteed unique across branches.
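For example, the corresponding warehouse table at HQ could look like this (table and column names are hypothetical):

-- The branch id disambiguates rows that share a branch-local key.
CREATE TABLE hq_orders (
    branch_id integer NOT NULL,
    order_id  integer NOT NULL,         -- the branch-local primary key
    placed    timestamptz NOT NULL,
    PRIMARY KEY (branch_id, order_id)   -- unique across all branches
);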
anon permutation wrote:
> For performance reasons, each branch must have its own database, and a
> centralized transactional system is not an option.
>
> I was considering just centralizing primary key generation, but that
> seems very slow too.
>
> Segmenting primary keys among the branches is doable, but it is too much
> of a maintenance nightmare.
>
> What do you suggest?

We have a similar application. What we did is this:

1. Each database instance is assigned a unique identifier, stored in a one-row, one-column table (with a trigger to ensure it stays that way).

2. Write a function that takes two integers, converts them to text, and concatenates them. In our case we convert to hex and concatenate with a delimiter character.

3. Write another function, called something like 'nextrowid', that takes a sequence name as its argument. Use the sequence name to get the next value from the sequence, look up the local unique identifier from the table defined in #1, and pass them both to the function defined in #2.

4. Use nextrowid('seq_name') to generate your primary keys (a sketch of the whole scheme follows below).

HTH,

Joe
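A minimal PostgreSQL sketch of the above (all names are illustrative; the original functions were not posted):

-- 1. One-row table holding this instance's unique identifier.
CREATE TABLE instance_id (id integer NOT NULL);
INSERT INTO instance_id VALUES (42);    -- assigned once per branch

-- The guard trigger rejects INSERT and DELETE, so the table keeps
-- exactly the one row inserted above.
CREATE FUNCTION instance_id_guard() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'instance_id must contain exactly one row';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER instance_id_protect
    BEFORE INSERT OR DELETE ON instance_id
    FOR EACH ROW EXECUTE FUNCTION instance_id_guard();

-- 2. Convert two integers to hex and concatenate with a delimiter.
CREATE FUNCTION make_rowid(integer, bigint) RETURNS text AS $$
    SELECT to_hex($1) || '-' || to_hex($2);
$$ LANGUAGE sql;

-- 3. Combine the local instance id with the sequence's next value.
CREATE FUNCTION nextrowid(seqname text) RETURNS text AS $$
    SELECT make_rowid((SELECT id FROM instance_id), nextval($1));
$$ LANGUAGE sql;

-- 4. Use it as the default for a text primary key.
CREATE SEQUENCE orders_seq;
CREATE TABLE orders (
    id     text PRIMARY KEY DEFAULT nextrowid('orders_seq'),
    placed timestamptz NOT NULL DEFAULT now()
);

With instance id 42, the first row gets id '2a-1', the next '2a-2', and no key can ever collide with one generated at another instance.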
You could also say you can have 2^N databases and 2^(63-N) records in each database: use a BIGSERIAL with the N high bits holding the database number and the 63-N low bits being the actual serial. That's faster than strings for indexing, and you just initialize the sequence to start at the first value for this database and to stop at the last value for this database... just another way to generate GUIDs.

On Mon, 03 Jan 2005 07:20:54 -0800, Joe Conway <mail@joeconway.com> wrote:
> 2. Write a function that takes two integers, converts them to text,
> and concatenates them. In our case we convert to hex and concatenate
> with a delimiter character.
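A sketch of that layout, assuming N = 16 (so up to 65536 databases, each with 2^47 keys; branch number 5 is an arbitrary example). BIGSERIAL would create its own unbounded sequence, so the bounded sequence is created by hand:

-- Branch 5's id range is [5 * 2^47, 6 * 2^47 - 1].
CREATE SEQUENCE orders_id_seq
    MINVALUE   703687441776640      -- 5 * 2^47, first id for branch 5
    MAXVALUE   844424930131967      -- 6 * 2^47 - 1, last id for branch 5
    START WITH 703687441776640
    NO CYCLE;                       -- error out rather than reuse ids

CREATE TABLE orders (
    id     bigint PRIMARY KEY DEFAULT nextval('orders_id_seq'),
    placed timestamptz NOT NULL DEFAULT now()
);

-- The branch number can be recovered from any id with a shift:
SELECT 703687441776640 >> 47;       -- returns 5

Because the ranges never overlap, rows from every branch merge cleanly into one bigint-keyed table at HQ, and a bigint index is cheaper than the equivalent text key.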
Thank you for the feedback. I am considering just centralizing pk generation at HQ. Basically, every so often, each branch will connect back to HQ to get a block of pks for each table. This way the performance impact is limited and maintenance is simple (a sketch of the HQ side follows below).

thanks.
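A minimal sketch of the HQ side, assuming a simple allocator table (all names here are hypothetical, not an established scheme):

-- At HQ: one row per replicated table, recording the next unissued key.
CREATE TABLE pk_block_allocator (
    table_name text PRIMARY KEY,
    next_start bigint NOT NULL DEFAULT 1
);
INSERT INTO pk_block_allocator (table_name) VALUES ('orders');

-- Hand out the block [returned value, returned value + p_block_size - 1].
-- The UPDATE's row lock serializes concurrent requests from branches.
CREATE FUNCTION allocate_pk_block(p_table text, p_block_size bigint)
RETURNS bigint AS $$
    UPDATE pk_block_allocator
       SET next_start = next_start + p_block_size
     WHERE table_name = p_table
    RETURNING next_start - p_block_size;
$$ LANGUAGE sql;

-- A branch connects and fetches, say, 10000 keys for its orders table:
SELECT allocate_pk_block('orders', 10000);   -- returns e.g. 1

The branch then restricts its local sequence to that range (ALTER SEQUENCE ... MINVALUE / MAXVALUE / RESTART) and requests a new block before the current one runs out.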