Thread: Replication for a large database
Hello all,

I apologize if this has already been covered in the past, but I couldn't find an adequate solution to my problem in the archives.

I have a database that is used for a bandwidth tracking system at a broadband ISP. To make a long story short, I'm inserting over 800,000 records per day into this database. Suffice it to say, the uptime of this database is of paramount importance, so I would like to have a more up-to-date backup copy of my database in the event of a failure (more recent than my twice-per-day db_dump backup).

I have two servers, both dual 2GHz Xeons with 4GB of RAM, and would like to replicate between the two. I would like to have "live" replication, but I couldn't find a solution for that for PostgreSQL. I tried RServ but, after attempting it, I saw a mailing list posting saying that it is more or less useless for databases that have a large number of inserts (like mine).

When I perform a replication after a batch of data is inserted, the query literally runs for hours before it returns. I have never actually been present for an entire replication run, since it takes longer than my 8-12 hour days here at work.

Is there any replication solution that would fit my needs? I'm taking advantage of some PG 7.2 features, so "downgrading" to the 6.x version of Postgres that has replication support isn't an option.

Thanks.

--man
Ok, maybe this is just because I'm coming from a layman's perspective regarding enterprise-level databases, but couldn't you fake replication by inserting the data into both databases? (Granted, this involves having source access to the program doing the insertion.)

It may be a kludge, but it would work until something better came along.

Ryan
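Ryan's dual-write suggestion could be sketched roughly like this. This is a minimal illustration, not production code: sqlite3 stands in for the two PostgreSQL connections, and the `usage` table and values are made up.

```python
import sqlite3

# sqlite3 in-memory databases stand in for the two PostgreSQL servers,
# purely for illustration.
servers = [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")]
for db in servers:
    db.execute("CREATE TABLE usage (modem TEXT, bytes INTEGER)")

def insert_everywhere(sql, params):
    # "Fake replication": repeat the same write on every server.
    for db in servers:
        db.execute(sql, params)
        db.commit()

insert_everywhere("INSERT INTO usage VALUES (?, ?)", ("modem-001", 52428800))

# Both copies now hold the same row.
counts = [db.execute("SELECT COUNT(*) FROM usage").fetchone()[0]
          for db in servers]
print(counts)  # [1, 1]
```

The obvious weakness is divergence: if the write succeeds on one server and fails on the other, nothing here brings the two copies back in sync.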
Here's something interesting: http://www.objectweb.org/c-jdbc/index.html

It's alpha quality, and it's not "true" replication, but it might be interesting to keep an eye on. I've only read about it, so I can't claim any personal experience. It sounds like a good solution for this situation, if it works as advertised.

MT

--- Ryan <pgsql-sql@seahat.com> wrote:
> Ok, maybe this is just because I'm coming from a layman's perspective
> regarding enterprise-level databases, but couldn't you fake replication
> by inserting the data into both databases?
I have thought about this. The problem I run into is data consistency. I have at least 8 different processes that harvest data, and an intranet website that can also manipulate the database (to assign customers to different packages, re-assign modems to different customers, etc.). Trying to maintain consistency across the entire application would be such a nightmare, I don't want to think about it.

If I go with a centralized middleware server that manages all database access, then I could perhaps do that in there... and then I could use transactions on both databases, and if either transaction fails I'll roll back the other. But this would make my entire framework very rigid.

On Monday 05 May 2003 09:16 am, Ryan wrote:
> Ok, maybe this is just because I'm coming from a layman's perspective
> regarding enterprise-level databases, but couldn't you fake replication
> by inserting the data into both databases? (Granted, this involves
> having source access to the program doing the insertion.)
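The middleware idea (run the same transaction on both databases, and roll back the other side if either fails) might look something like the following sketch. Again sqlite3 stands in for the two PostgreSQL connections, and the `customers` table is invented; note that a crash between the two commits can still lose consistency, so this is weaker than true two-phase commit.

```python
import sqlite3

a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
for db in (a, b):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, package TEXT)")

def coordinated(statements):
    """Run the same statements inside a transaction on both databases.
    If either side raises, roll both back so neither diverges."""
    try:
        for sql, params in statements:
            a.execute(sql, params)
            b.execute(sql, params)
    except sqlite3.Error:
        a.rollback()
        b.rollback()
        return False
    # Caveat: a failure between these two commits still leaves the
    # databases inconsistent; real middleware would need 2PC here.
    a.commit()
    b.commit()
    return True

ok = coordinated([("INSERT INTO customers VALUES (?, ?)", (1, "gold"))])
bad = coordinated([("INSERT INTO customers VALUES (?, ?)", (1, "dup"))])
print(ok, bad)  # True False -- the duplicate key was rolled back everywhere
```

This is roughly the rigidity being described: every writer in the system has to funnel through `coordinated()` (or something like it) for the scheme to hold.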
IIRC, pgsql.com's product (Postgres-R, I think it's called) is based on PostgreSQL 6.4.2, which won't work for me since I use a lot of features available only in the 7.x generation.

I'm using the Rserv code from the contrib section of Postgres, and that's what I'm having problems with.

On Monday 05 May 2003 10:30 am, Rajesh Kumar Mallah wrote:
> Have you tried the Rserv solution which pgsql.com provides
> commercially? I have heard it's pretty good.
>
> regds
> mallah.
I've thought some more about this, and I want to run this idea past you guys.

What do you think about a replication proxy: essentially a daemon that sits between a PostgreSQL client and server. Every SQL query, transaction statement, etc. that the proxy receives, it repeats to all the database servers. That way, if a back-end database server goes down, queries continue unabated (except that the downed server won't receive updates).

Basically, the proxy server could intercept these queries and place them in a queue (on a per-database basis), and when every server has acknowledged an update, that query is removed from the queue. Each database server can have its own position in the queue, so if servers A and B successfully run a query but C doesn't (e.g. it requires human intervention), C is removed from the list of acceptable servers while A and B keep moving through the queue.

What do you think? Also, should this discussion be moved to another mailing list?

On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote:
> If I go with a centralized middleware server that manages all database
> access, then I could perhaps do that in there... and then I could use
> transactions on both databases, and if either transaction fails I'll
> roll back the other. But this would make my entire framework very rigid.
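The per-server queue position being proposed can be sketched as a toy data structure. Everything here is illustrative (the class, the statements, the pass/fail executors are all made up); a real proxy would also have to speak the PostgreSQL wire protocol and persist the queue.

```python
class ReplicationProxy:
    """Toy sketch of the proposed proxy: every statement goes into a
    shared queue, and each backend server tracks its own position in
    it. A server whose apply fails is dropped from the pool (pending
    human intervention) rather than blocking the others."""

    def __init__(self, servers):
        self.queue = []                       # pending statements
        self.pos = {s: 0 for s in servers}    # per-server queue position

    def submit(self, statement):
        self.queue.append(statement)

    def apply(self, server, executor):
        """Replay outstanding statements to one server. `executor` is
        whatever actually runs the SQL; here it just returns True/False.
        (Trimming statements every server has passed is omitted.)"""
        while self.pos[server] < len(self.queue):
            if not executor(self.queue[self.pos[server]]):
                del self.pos[server]          # drop the failed server
                return False
            self.pos[server] += 1
        return True

proxy = ReplicationProxy(["A", "B", "C"])
proxy.submit("INSERT INTO usage VALUES ('modem-001', 1024)")
proxy.submit("INSERT INTO usage VALUES ('modem-002', 2048)")

proxy.apply("A", lambda sql: True)                    # A applies everything
proxy.apply("B", lambda sql: True)                    # so does B
proxy.apply("C", lambda sql: "modem-002" not in sql)  # C fails on the 2nd

print(sorted(proxy.pos))  # ['A', 'B'] -- C was removed, A and B kept going
```

A and B finish the queue independently of C's failure, which is exactly the behaviour described above.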
--- Michael A Nachbaur <mike@nachbaur.com> wrote:
> IIRC, pgsql.com's product (Postgres-R I think it's
> called) is based on PostgreSQL-6.4.2, which won't
> work for me since I use a lot of features available
> only in the 7.x generation.

You are thinking of the pgreplication project here. The pgsql.com product is (IIRC) an improved version of rserv. It is reported to work well in enterprise-level applications.
Postgres-R is not pgsql.com's product ... eRServer is ... the code that is in the product has been quite extensively pounded on by both the .org and .info registries ...

On Mon, 5 May 2003, Michael A Nachbaur wrote:
> IIRC, pgsql.com's product (Postgres-R I think it's called) is based on
> PostgreSQL-6.4.2, which won't work for me since I use a lot of features
> available only in the 7.x generation.
>
> I'm using the Rserv code from the contrib section of Postgres, and
> that's what I'm having problems with.

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org