Thread: 24x7x365 high-volume ops ideas
I have a few high-volume, fairly large clusters that I'm struggling to keep
up 24x7x365. I want to ask for advice from anyone with similar experience
or hard-won wisdom.

Generally these are clusters with 100-200 queries/second, maybe 10GB-30GB
of data (always increasing), and maybe 10% writes. A little regular routine
downtime for maintenance would do wonders for these systems, but
unfortunately the requirement is 100% uptime all the time, and any downtime
at all is a liability. Here are some of the issues:

1) Big tables. When the tables grow large enough, it takes too long to
vacuum them. In some cases there's just too much data. In other cases it's
dead space, but both reindex and vacuum full block production queries (a
lesser version of downtime; see the sketch after this message). In the
past, we have taken a PR hit for downtime to dump/reload (we've found it to
be faster than vacuum full). Async replication helps with cluster moves
from one server to another, but we still don't have a low-to-zero downtime
solution for regular maintenance.

2) Big tables, part 2. Of course, customers want all data that ever existed
online and quickly available via sub-second queries. I assume at some point
this data is going to be too much for one table (how much is too much?).
This is a little vague, I know, but what sorts of segmenting strategies do
folks employ to deal with data that cannot be retired but gets too
expensive to vacuum, etc.?

3) Simple restarts for configuration changes (shared_buffers,
max_connections, etc). When we have to break client connections, we have to
notify the customer and take a PR hit. Maybe pgpool is a possible solution?

Are these issues for Oracle, DB2, etc. as well?

TIA.

Ed
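A minimal sketch of the locking difference behind issue 1, using a
hypothetical table name ("orders" is not from the original post):

    -- Plain (lazy) VACUUM marks dead rows as reusable space and does not
    -- lock out normal reads and writes, so it can run during production:
    VACUUM ANALYZE orders;

    -- VACUUM FULL compacts the table and REINDEX rebuilds its indexes;
    -- both hold locks that block production queries for their duration,
    -- the "lesser version of downtime" described above:
    VACUUM FULL orders;
    REINDEX TABLE orders;

Plain VACUUM only makes the dead space reusable; it does not shrink the
files on disk, which is why dump/reload or VACUUM FULL ends up being needed
once a table has badly bloated.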
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org On Behalf Of Ed L.
> Sent: Wednesday, November 03, 2004 5:10 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] 24x7x365 high-volume ops ideas
>
> 1) Big tables. When the tables grow large enough, it takes too long to
> vacuum them. In some cases there's just too much data. In other cases
> it's dead space, but both reindex and vacuum full block production
> queries (a lesser version of downtime). [...]

This is a tough problem. Related to those below, of course.

> 2) Big tables, part 2. Of course, customers want all data that ever
> existed online and quickly available via sub-second queries. [...] what
> sorts of segmenting strategies do folks employ to deal with data that
> cannot be retired but gets too expensive to vacuum, etc.?

This is an opportunity for savings. Create a history table that contains
everything more than about 2 years old. For people who want access to
everything, create a view with UNION ALL against current and history. Once
they see how fast the current data is and how rarely they really need the
old stuff, they will shy away from using it.

E.g.:
  Table inventory      has current data
  Table inventory_hist has old data
  View  inventory_all  has "SELECT * FROM inventory
                            UNION ALL
                            SELECT * FROM inventory_hist"

(A fuller SQL sketch follows this message.)

> 3) Simple restarts for configuration changes (shared_buffers,
> max_connections, etc). When we have to break client connections, we have
> to notify the customer and take a PR hit. Maybe pgpool is a possible
> solution?

There's the time-tested method of throwing more hardware at it. Would an
8-CPU machine with 64 GB RAM and an Ultra320 striped RAID array help? The
hardware is always less expensive than the software and the data -- usually
by orders of magnitude. If a $25,000-$50,000 machine will save one man-year
of effort over its lifetime, then it is a splendid idea.

> Are these issues for Oracle, DB2, etc. as well?

Yes, but not as pronounced. For instance, with SQL*Server you do "UPDATE
STATISTICS" to get the equivalent of an ANALYZE.
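A minimal SQL sketch of the history-table split described above, assuming a
hypothetical "updated_at" timestamp column and the roughly-two-year cutoff
from the example:

    -- One-time split; the column name and cutoff are assumptions.
    CREATE TABLE inventory_hist AS
        SELECT * FROM inventory
        WHERE updated_at < now() - interval '2 years';

    DELETE FROM inventory
    WHERE updated_at < now() - interval '2 years';

    -- Most traffic keeps hitting the small, current table; the few
    -- queries that really need everything go through the view.
    CREATE VIEW inventory_all AS
        SELECT * FROM inventory
        UNION ALL
        SELECT * FROM inventory_hist;

The history table gets whatever indexes its own queries need. Moving aged
rows on an ongoing basis would be an INSERT ... SELECT plus a DELETE inside
one transaction, so readers of inventory_all never see rows missing or
doubled; the current table still needs a plain VACUUM afterward to make the
deleted rows' space reusable.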
On Wed, 2004-11-03 at 18:10, Ed L. wrote:
> unfortunately, the requirement is 100% uptime all the time, and any
> downtime at all is a liability. Here are some of the issues:

Seems like 100% uptime is always an issue, but not even close to reality.
I think it's unreasonable to expect a single piece of software to NEVER be
restarted. Never is a really long time.

For this case, isn't replication sufficient? (FWIW, in 1 month I have to
answer this same question.) Would this work?

* 'Main' db server up 99.78% of time
* 'Replicant' up 99.78% of time (using Slony, dbmirror)
* When Main goes down (crisis, maintenance), Replicant answers for Main,
  in a read-only fashion.
* When Main comes back up, any waiting writes can now happen.
* Likewise, Replicant can be taken down for maint, then Main syncs to it
  when going back online.

Is this how it's done?

\<.
A long time ago, in a galaxy far, far away, Karim.Nassar@NAU.EDU (Karim Nassar) wrote:
> For this case, isn't replication sufficient? (FWIW, in 1 month I have to
> answer this same question.) Would this work?
> [...]
> * When Main goes down (crisis, maintenance), Replicant answers for Main,
>   in a read-only fashion.
> [...]
> Is this how it's done?

The challenge lies in two places:

1. You need some mechanism to detect that the "replica" should take over,
   and to actually perform that takeover. That "takeover" requires having
   some way for your application to become aware of the new IP address of
   the DB host.

2. Some changes need to take place in order to prepare the "replica" to be
   treated as "master."

For instance, in the case of Slony-I, you can do a full-scale "failover"
where you tell it to treat the "main" database as being dead. At that
point, the replica becomes the master. That essentially discards the former
"master" as dead.

Alternatively, there's a "MOVE SET," which is suitable for predictable
maintenance; that shifts the "master" node from one node to another. You
can take MAIN out of service for a while, and add it back, perhaps making
it the "master" again.

None of these systems _directly_ address how apps would get pointed to the
shifting servers.

A neat approach would involve making pgpool, a C-based 'connection pool'
manager, Slony-I-aware. If it were to submit either MOVE SET or FAILOVER,
it would be aware of which DB to point things to, so that applications that
pass requests through pgpool would not necessarily need to be aware of
there being a change, beyond perhaps seeing some transactions terminated.
That won't be ready tomorrow...

Something needs to be "smart enough" to point apps to the right place;
that's something to think about...
-- 
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/advocacy.html
"XFS might (or might not) come out before the year 3000. As far as
kernel patches go, SGI are brilliant. As far as graphics, especially
OpenGL, go, SGI is untouchable. As far as filing systems go, a
concussed doormouse in a tarpit would move faster." -- jd on Slashdot
On Sun, 2004-11-07 at 21:16, Christopher Browne wrote:
> None of these systems _directly_ address how apps would get pointed to
> the shifting servers.
<snip>
> Something needs to be "smart enough" to point apps to the right place;
> that's something to think about...

Seems like it would be pretty easy to be smart in PHP:

function db_connect() {
    $conn = pg_connect("dbname='foo' user='dawg' password='HI!' host='master'");
    if (!($conn AND (pg_connection_status($conn) == 0))) {
        // problem with master
        $conn = pg_connect("dbname='foo' user='dawg' password='HI!' host='replica'");
        if ($conn AND (pg_connection_status($conn) == 0)) {
            return $conn;
        }
    } else {
        return $conn;
    }
    return NULL;
}

Whatever client-side language one uses, the technique is the same (though
the coding style might differ :P ), can be used for persistent connections
(eg: with pg_pconnect in PHP), and seems like it could be extended to any
reasonable number of database servers.

What is the problem with this? The only issue I can see is that "replica"
might be behind. Depending on the application, this might not be bad. If
the app MUST have the very most accurate DB, you could remove the logic
that connects to the replica, but then that nullifies this whole
conversation...

\<.
Karim.Nassar@NAU.EDU (Karim Nassar) writes:
> Seems like it would be pretty easy to be smart in PHP:
>
> [db_connect() example snipped -- try the master, fall back to the replica]
>
> Whatever client-side language one uses, the technique is the same
> (though the coding style might differ :P ), can be used for persistent
> connections (eg: with pg_pconnect in PHP), and seems like it could be
> extended to any reasonable number of database servers.
>
> What is the problem with this? The only issue I can see is that
> "replica" might be behind.

The "problem" is that this requires modifications to the application, and
communicating configuration changes gets that bit more complicated.
Supposing, for instance, the code that accesses connections has already
gotten wrapped in some more-or-less arcane object class specific to the
application, it may be somewhat troublesome to make the modification.

It would be attractive to allow the configuration change to take place
outside the application in a manner that allows the application to be
completely ignorant about it.

By the way, your db_connect() suggestion doesn't cope with the problem
where a connection is broken and the application continues to use that
broken connection. There may be a need to cope with that...
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.
Hi Chris and Karim,

I haven't been following this thread, so excuse me if I suggest something
that has already been tossed out.

Solaris allows multiple IP addresses to be assigned to a single NIC. I
just looked at the man page for Linux ifconfig but didn't see quickly how
to do this. If Linux doesn't allow this, the same thing can be
accomplished using multiple NICs per server.

We reserve a special IP for the DB server. This IP can be assigned to the
NIC of the machine currently hosting the database. If you want apps to
connect to a different server, remove the IP from one machine and reassign
it to the other. This special DB IP is assigned on top of the regular IP
for the machine.

Newly connecting apps are never the wiser, but existing connections must be
terminated.

Machine A (DB server):
  ifconfig hme0 192.1.1.1 up
  ifconfig hme0:1 192.1.1.100 up    # special DB IP

To change the DB server from machine A to B:

Machine B (new DB server):
  ifconfig hme0 192.1.1.2 up
  ifconfig hme0:1 192.1.1.100 up

Machine A:
  ifconfig hme0:1 down

Hope this helps,

Rick
Hi,

On Tue, 2004-11-16 at 14:17, Richard_D_Levine@raytheon.com wrote:
> Solaris allows multiple IP addresses to be assigned to a single NIC. I
> just looked at the man page for Linux ifconfig but didn't see quickly
> how to do this.
> [...]
> Newly connecting apps are never the wiser, but existing connections must
> be terminated.

Yes, Linux can do it as well. But in either case, beware the ARP cache :-)
Until the neighbors and switches learn that the floating IP now lives at a
new MAC address (for example via a gratuitous ARP sent from the new host,
or once their cache entries expire), traffic can keep going to the old
machine.

There is sqlrelay, which could do the switching as well without forcing
the apps to reconnect.

Regards
Tino
Tino Wildenhain wrote:
> On Tue, 2004-11-16 at 14:17, Richard_D_Levine@raytheon.com wrote:
>> Solaris allows multiple IP addresses to be assigned to a single NIC. I
>> just looked at the man page for Linux ifconfig but didn't see quickly
>> how to do this. If Linux doesn't allow this, the same thing can be
>> accomplished using multiple NICs per server.

For Linux:

  ifconfig device:<alias> ipaddress up

Sincerely,

Joshua D. Drake

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
Thanks Joshua,

I just played with it and it is more flexible than Solaris. Linux allows
<alias> to be four characters, but Solaris only allows low numbers. I like
this:

  ifconfig eth0:dbms 192.1.1.100 up

Tino's right about the ARP tables. Gotta watch that one, especially with
lower grade switches.

Rick