From: William Yu
Subject: Re: hardware performance and some more
Msg-id: 3F200C90.5050001@talisys.com
In response to: hardware performance and some more (Kasim Oztoprak <kasim@saglik.gov.tr>)
List: pgsql-performance
| first of all I would like to learn that, any of you use the postgresql
| within the clustered environment? Or, let me ask you the question, in
| different manner, can we use postgresql in a cluster environment? If
| we can do what is the support method of the postgresql for clusters?

You could do active-active but it would require work on your end. I did
a recent check on all the Postgres replication packages and they all
seem to be single master -> single/many slaves. Updating on more than 1
server looks to be problematic. I run an active-active setup now but I
had to develop my own custom replication strategy.

As a background, we develop & host web-based apps that use Postgres as
the DB engine. Since our clients access our server over the internet,
uptime is a big issue. Hence, we have two server farms: one colocated in
San Francisco and the other in Sterling, VA. In addition to redundancy,
we also wanted to spread the load across the servers. To do this, we
went with the expedient method of 1-minute DNS zonemaps where, if both
farms are up, 70% of traffic is sent to the faster farm and 30% to the
other. Both farms are constantly monitored and if one goes down, a new
zonemap is pushed out listing only the servers that are still up.
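
The monitoring side doesn't need to be fancy. As a rough sketch (Python,
with made-up hostnames, addresses and file paths -- not our actual
scripts), the idea is just a loop that health-checks each farm and
rewrites a short-TTL zone file:

    import socket, time

    # Illustrative placeholders, not our real farms
    ADDRESSES = {"sf-farm.example.com": "192.0.2.10",
                 "va-farm.example.com": "198.51.100.20"}

    def is_up(host, port=80, timeout=5):
        # Crude health check: can we open a TCP connection to the web front end?
        try:
            socket.create_connection((host, port), timeout).close()
            return True
        except OSError:
            return False

    def write_zonemap(live_hosts, path="/tmp/db.example.com"):
        # Emit A records with a 60-second TTL so resolvers re-query quickly
        # after a failover. The 70/30 weighting and pushing the file out to
        # the nameservers are left out of the sketch.
        f = open(path, "w")
        for host in live_hosts:
            f.write("www 60 IN A %s\n" % ADDRESSES[host])
        f.close()

    while True:
        write_zonemap([h for h in ADDRESSES if is_up(h)])
        time.sleep(60)

With a 60-second TTL, clients pick up the new zonemap within a minute or
two of a farm dropping out.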

The first step in making this work was converting all integer keys to
character keys. By making keys into characters, we could prepend a
server location code so ID 100 generated at SF would not conflict with
ID 100 generated in Sterling. Instead, they would be marked as S00000100
and V00000100. Another benefit is the increase in possible key
combinations from being able to use alphanumeric characters (36^(n-1) versus 10^n).
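
In code terms the key scheme amounts to something like this (a minimal
sketch; the sequence source and zero-padding width are placeholders):

    # Minimal sketch of location-prefixed keys. The one-letter location code
    # is prepended to a zero-padded per-server sequence value, so the same
    # sequence number on both farms still yields distinct keys.
    LOCATION_CODE = "S"              # "S" in San Francisco, "V" in Sterling

    def next_key(seq_value, width=8):
        return "%s%0*d" % (LOCATION_CODE, width, seq_value)

    next_key(100)                    # -> "S00000100" on the SF farm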

At this time, the method we use is a periodic sweep of all updated
records. In every table, we add extra fields to mark the date/time the
record was last inserted/updated/deleted. All records touched since the
last resync are extracted, zipped up, PGP-encrypted and then posted on
an FTP server. Files are then transferred between servers, records
unpacked and inserted/updated. Some checks are needed to determine what
takes precedence if users updated the same record on both servers but
otherwise it's a straightforward process.
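
In outline, the export half of the sweep looks something like the sketch
below -- assuming a last_modified column on each table, psycopg2-style DB
access and shelling out to gpg; the table names, hosts and credentials
are placeholders rather than our real setup:

    import gzip, subprocess, ftplib, psycopg2

    # Sketch of the "export" half of a sweep, assuming every table carries a
    # last_modified timestamp. Table names, hosts, accounts and recipients
    # are placeholders.
    def export_changes(conn, table, last_resync, outfile):
        cur = conn.cursor()
        # table comes from our own fixed list, so interpolating it is fine here
        cur.execute("SELECT * FROM %s WHERE last_modified > %%s" % table,
                    (last_resync,))
        out = gzip.open(outfile, "wt")
        for row in cur:
            out.write("\t".join(str(col) for col in row) + "\n")
        out.close()

    def encrypt_and_post(path):
        # Encrypt with gpg (produces path + ".gpg"), then drop the result on
        # the exchange FTP server for the other farm to pick up.
        subprocess.check_call(["gpg", "--encrypt",
                               "--recipient", "replication@example.com", path])
        ftp = ftplib.FTP("ftp.example.com", "replica", "secret")
        f = open(path + ".gpg", "rb")
        ftp.storbinary("STOR " + path.split("/")[-1] + ".gpg", f)
        f.close()
        ftp.quit()

The import side on the other farm reverses the process -- decrypt, unzip
and apply each row, with the precedence checks mentioned above deciding
collisions.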

As far as I can tell, the performance impact seems to be minimal.
There's a periodic storm of replication updates in cases where there
have been mass updates since the last resync. But if you have mostly reads and few
writes, you shouldn't see this situation. The biggest performance impact
seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

I'm thinking over strategies to get more "real-time" replication working. I
suppose I could just make the resync program run more often but that's a
bit inelegant. Perhaps I could capture every update/delete/insert/alter
statement from the Postgres logs, parse them out into commands and then
zip/encrypt every command as a separate item to be processed. Or
add triggers to every table where updated records are pushed to a custom
"updated log".

The biggest problem is of course locks -- especially at the application
level. I'm still thinking over what to do here.

