Thread: postgresql and openmosix migration
Hi, I am trying to build a cluster for a database -- PostgreSQL, MySQL, or any other free database. I have looked at openMosix patched with the MigSHM patch for shared-memory support, and it seems that neither database would work fully. PostgreSQL in particular uses "shared memory but not the system semaphores for locking it", so apparently it won't benefit from an openMosix cluster. MySQL doesn't seem to migrate either, because it is multithreaded. Any ideas of how I can cluster my database (around 800 GB in size, so even partial replication is not really practical)?
If you're interested, this is my source for openMosix and MigSHM information: http://howto.ipng.be/MigSHM-openMosix/x90.html
Thanks.
Bill,

> Any ideas of how I can cluster my database (around 800 GB
> in size so even partial replication is not really practical)?

Um, raise $150,000 to pay for a clustering implementation?

Various techniques of "shared memory clustering" have been tried with PostgreSQL, and none work. Neither does LinuxLabs "ClusGres", which is based on similar principles -- unfortunately. (At least, LL repeatedly postponed the demo they said they'd give me. I've yet to see anything working ...)

Frankly, we're waiting for a well-funded corporation to jump in and decide they want PostgreSQL clustering. Database server clustering is a "big ticket item" requiring roughly 1,000 hours of programming and troubleshooting. As such, you're not likely to see it come out of the OSS community unaided.

Oh, and FYI, MySQL's "clustering" doesn't work either. It requires your entire database to fit into available RAM.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Ok, so maybe someone on this group will have a better idea. We have a database of financial information, and this has literally millions of entries. I have installed indices, but for the rather computationally demanding processes we like to use, like a select query to find the commodity with the highest monthly or annual returns, the computer generally runs unacceptably slow. So, other than clustering, how could I achieve a speed increase in these complex queries? Is this better in mysql or postgresql?

Thanks.
On Tue, Jun 22, 2004 at 12:31:15 -0500, Bill <bill@math.uchicago.edu> wrote:
> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow. So, other than clustering, how could I achieve a
> speed increase in these complex queries? Is this better in mysql or
> postgresql?

Queries using max (or min) can often be rewritten as queries using ORDER BY and LIMIT, so that they can take advantage of indexes. Doing this might help with some of the problems you are seeing.

If you commonly query on aggregated data, it might be better to create derived tables of the aggregated data, maintained by triggers, and query against them. If you do lots of selects relative to inserts and updates, this could be a big win.
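To make the ORDER BY/LIMIT rewrite concrete, here is a minimal sketch. The prices(commodity_id, month, monthly_return) table and the index name are hypothetical, not from Bill's schema; in the PostgreSQL releases of this era a bare max()/min() cannot use an index, while the LIMIT form can.

-- slow form: forces a full scan of the table
--   SELECT max(monthly_return) FROM prices;

-- fast form: an index on the column lets the planner read just the top row
CREATE INDEX prices_return_idx ON prices (monthly_return);

SELECT commodity_id, monthly_return
FROM prices
ORDER BY monthly_return DESC
LIMIT 1;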
Sounds like an issue I have experienced in Oracle as well. If you can, you might want to consider breaking your database out into an OLTP (online transaction processing) database and a data warehouse database. You run any reports you can nightly into a set of warehouse tables, and save your daytime CPUs for incoming data and the special real-time (hottest commodity of the day) reports that you have tuned as best you can. Anything you can calculate in advance that won't change over time should be saved in the warehouse tables, so you don't waste CPU re-working data in real time. Pre-running your reports won't speed them up, but your users won't be waiting for a report to calculate while they are looking at the screen.
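A minimal sketch of the warehouse-table idea above, again using the hypothetical prices table (all names are illustrative only). The refresh block would typically run from a nightly cron job; wrapping it in a transaction means readers keep seeing the old summary until the new one commits.

-- pre-aggregated summary table, rebuilt nightly
CREATE TABLE monthly_return_summary (
    commodity_id  integer,
    month         date,
    total_return  float8
);

-- nightly refresh
BEGIN;
DELETE FROM monthly_return_summary;
INSERT INTO monthly_return_summary (commodity_id, month, total_return)
SELECT commodity_id, month, sum(monthly_return)
FROM prices
GROUP BY commodity_id, month;
COMMIT;

-- daytime reports then read a few summary rows instead of scanning millions
SELECT commodity_id, total_return
FROM monthly_return_summary
ORDER BY total_return DESC
LIMIT 10;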
On Tue, 22 Jun 2004 12:31:15 -0500 Bill <bill@math.uchicago.edu> wrote:

> I have installed indices, but

are there any statistics? vacuum analyze is your friend

> but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow. So, other than clustering, how could I achieve a
> speed increase in these complex queries?

1) have you gone to the effort to tune the values in postgresql.conf?
2) have you tried using explain to find out what the query planner is up to?

> Is this better in mysql or
> postgresql?

if there is any complexity to the queries, postgresql will serve you better if you learn how to use it properly.

richard
--
Richard Welty rwelty@averillpark.net
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security
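As a concrete example of points 1) and 2) above (a sketch only; the prices table is hypothetical): refresh the planner statistics, then look at the plan the optimizer actually chooses. Settings such as shared_buffers, sort_mem and effective_cache_size live in postgresql.conf and are tuned separately.

-- refresh planner statistics so row-count estimates are realistic
VACUUM ANALYZE prices;

-- see what the planner does with the slow query, with real timings
EXPLAIN ANALYZE
SELECT commodity_id, monthly_return
FROM prices
ORDER BY monthly_return DESC
LIMIT 10;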
Hi Bill,

I am more often in the "needing help" category than the "giving help" category when it comes to advice about using PostgreSQL. I have found it to be an extremely powerful tool and by far the best performance/price for my work.

I think you will get some excellent answers and help with your performance questions if you send the list details about specific queries that are running too slow. If you are willing to throw more/bigger hardware at the problem, let people know that when you ask, and they will tell you whether your bottleneck can be alleviated through more RAM, disks, CPU or whatever. Having watched this list for some time now, I suspect most performance problems can be improved using non-intuitive query or configuration modifications (for example, replacing min()/max() as suggested by Mr. Wolf).

The heavy hitters on the list will usually ask for an "explain analyze" of your query. If your query is "select * from foo", then change it to "EXPLAIN ANALYZE select * from foo" and post the output. It will look something like this:

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1.04 rows=4 width=44) (actual time=8.46..8.47 rows=4 loops=1)
 Total runtime: 19.63 msec
(2 rows)

I'm sure your data is confidential; mine is too. The good news is that none of your data is included in the output -- only technical details about what the database is doing.

If your problem might involve the application that works with the data, give some details about that. For example, if you're using a Java application, let people know what driver version you use, what JVM, and other related info. There are lurkers on this list using just about every programming language imaginable, on more platforms than you can shake a stick at (I don't care how good you are at shaking sticks, either).

The more details you give, the better help you're going to get, and you'd be amazed at the results I've seen people achieve with a judicious amount of tweaking. The other day someone had a query that took hours drop to less than 10 minutes by using some techniques prescribed by members on the list. Bringing 30-60 second queries down to 2-3 seconds is commonplace.

You seem to be ready to throw money at the problem by investing in new hardware, but I would suggest digging into the performance problems first. Too many times we've seen people on the list say, "I've just spent $x0,000 on a new xyz and I'm still having problems with this query." Often the true solution is rewriting queries, tweaking config parameters, adding RAM and upgrading disks (in that order, I believe).

As I found out even today on the SQL list, it's best to ask questions in this form: "I want to do this... I've been trying this... I'm getting this... which is problematic because..." The more clearly you state the abstract goal, the more creative answers you'll get, with people often suggesting things you'd never considered.

I hope this helps, and I hope that you achieve your goal of a well-performing application.

Matthew Nuzum      | Makers of "Elite Content Management System"
www.followers.net  | View samples of Elite CMS in action
matt@followers.net | http://www.followers.net/portfolio/
Bill wrote:
> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow. So, other than clustering, how could I achieve a
> speed increase in these complex queries? Is this better in mysql or
> postgresql?

If the bottleneck is really computational, not I/O, you might try PL/R in conjunction with the rpvm R package. rpvm allows R to make use of PVM to split its load among a cluster.

See:
R:    http://www.r-project.org/
PL/R: http://www.joeconway.com/plr/
rpvm: http://cran.r-project.org/src/contrib/Descriptions/rpvm.html
      http://cran.r-project.org/doc/packages/rpvm.pdf

I haven't had a chance to play with this myself yet, but I hope to relatively soon.

HTH,

Joe
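For readers who haven't seen PL/R, a minimal non-parallel sketch of pushing a computation into R looks roughly like this (assuming PL/R is already installed and registered in the database; the function name and values are illustrative, and the rpvm/PVM dispatch Joe mentions is not shown):

-- an R function exposed to SQL; the array argument arrives as an R vector
CREATE OR REPLACE FUNCTION r_annualized(float8[]) RETURNS float8 AS '
    # compound the monthly returns into an annualized figure
    prod(1 + arg1) - 1
' LANGUAGE 'plr';

-- example call
SELECT r_annualized(ARRAY[0.01, -0.02, 0.03, 0.015]);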
Bill wrote:
| Ok, so maybe someone on this group will have a better idea. We have a
| database of financial information, and this has literally millions of
| entries. I have installed indices, but for the rather computationally
| demanding processes we like to use, like a select query to find the
| commodity with the highest monthly or annual returns, the computer generally
| runs unacceptably slow. So, other than clustering, how could I achieve a
| speed increase in these complex queries? Is this better in mysql or
| postgresql?

Postgres generally beats MySQL on complex queries.

The easiest solution to speed issues is to throw hardware at it. Generally, you're bound first by disk, then RAM, then CPU.

1) Move your data over to an array of smallish 15kRPM disks. The more spindles the better.
2) Use a 64-bit platform and take advantage of >4 GB memory.

There are dozens of options for the disk array. For the processing platform, I'd recommend looking at Opteron. I've heard only good things, and their price is much more reasonable than the other options.

--
Andrew Hammond    416-673-4138    ahammond@ca.afilias.info
Database Administrator, Afilias Canada Corp.
CB83 2838 4B67 D40F D086 3568 81FC E7E5 27AF 4A9A
Bill wrote:
> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer
> generally runs unacceptably slow. So, other than clustering, how could I achieve a
> speed increase in these complex queries? Is this better in mysql or
> postgresql?

This is a very broad question. Optimizing your SQL to run fast on Postgres, as on any other database, is something of an art form, and a topic that could fill a book. For example, a common performance killer is not having enough sort memory for large ordered result sets.

A critical skill is being able to figure out whether the planner is optimizing your queries badly. Knowing this is a mixture of observation and intuition that comes with experience.

The best-case performance of a query is roughly defined by the data that must be looked at to generate the result set, and by the size of the result set itself, when the query is pulling data from the cache. The cache problem can be eased by throwing more money at it, but a poorly planned query will run slowly on any hardware.

I would suggest isolating particular problems and posting them to the list (explain analyze works wonders).

Merlin
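As one concrete example of the sort-memory point (a sketch only; the table is hypothetical and the right value depends on available RAM and how many sorts run concurrently): in the 7.x series the per-sort memory setting is sort_mem, specified in kilobytes, and it can be raised for a single session before running a big ordered query.

-- allow roughly 64 MB per sort for this session only
SET sort_mem = 65536;

EXPLAIN ANALYZE
SELECT commodity_id, month, monthly_return
FROM prices
ORDER BY monthly_return DESC;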
Bill,

> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer
> generally runs unacceptably slow. So, other than clustering, how could I
> achieve a speed increase in these complex queries?

Well, you can do this two ways:

1) You can pick out one query at a time and send us complete information on it, like Matt's really nice e-mail describes. People on this list will help you troubleshoot it. It will take a lot of time, but no money.

2) You can hire a PG database expert. This will be much faster, but cost you a lot of money.

> Is this better in mysql
> or postgresql?

Complex queries? Large databases? That's us. MySQL is optimized for simple queries on small databases.

--
Josh Berkus
Aglio Database Solutions
San Francisco
> 2) You can hire a PG database expert. This will be much faster, but cost
> you a lot of money.

I wouldn't exactly say "a lot of money". Lots of consultants out there are willing to put in a week's worth of effort, on site, for significantly less than a support contract with most commercial DB organizations (including MySQL) -- and they often give better results, since they're on-site rather than over the phone or via email.

But yes, doing it via this mailing list is probably the cheapest option.
On Wed, 23 Jun 2004 13:52:39 -0400 Rod Taylor <pg@rbt.ca> wrote:

> But yes, doing it via this mailing list is probably the cheapest option.

yes, he just needs to decide how big a hurry he's in.

also, if he does decide to hire a consultant, i suggest he pop over to pgsql-jobs and ask there.

richard
--
Richard Welty rwelty@averillpark.net
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security