Thread: postgresql and openmosix migration

postgresql and openmosix migration

From
"Bill"
Date:
Hi, I am trying to make a cluster out of any database: postgresql, mysql, or any other free database.  I have looked at openMosix patched with the migshm patch for shared memory support, and it seems that neither works fully.  Postgresql in particular uses "shared memory but not the system semaphores for locking it", so apparently it won't benefit from an openMosix cluster.  In addition, mysql doesn't seem to migrate because it is multithreaded.  Any ideas of how I can cluster my database (around 800 GB in size, so even partial replication is not really practical)?
 
If you're interested, this is my source for openMosix and MigSHM information: http://howto.ipng.be/MigSHM-openMosix/x90.html
 
Thanks.

Re: postgresql and openmosix migration

From
Josh Berkus
Date:
Bill,

>   Any ideas of how I can cluster my database (around 800 GB
> in size so even partial replication is not really practical)?

Um, raise $150,000 to pay for a clustering implementation?

Various techniques of "shared memory clustering" have been tried with
PostgreSQL, and none work.    Neither does LinuxLabs "ClusGres", which is
based on similar principles -- unfortunately. (at least, LL repeatedly
postponed the demo they said they'd give me.  I've yet to see anything
working ...)

Frankly, we're waiting for a well-funded corporation to jump in and decide
they want PostgreSQL clustering.   Database server clustering is a "big
ticket item" requiring roughly 1,000 hours of programming and
troubleshooting.   As such, you're not likely to see it come out of the OSS
community unaided.

Oh, and FYI, MySQL's "clustering" doesn't work either.  It requires your
entire database to fit into available RAM ....

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: postgresql and openmosix migration

From
"Bill"
Date:
Ok, so maybe someone on this group will have a better idea.  We have a
database of financial information, and this has literally millions of
entries.  I have installed indexes, but for the rather computationally
demanding processes we like to use, like a select query to find the
commodity with the highest monthly or annual returns, the computer generally
runs unacceptably slowly.  So, other than clustering, how could I achieve a
speed increase in these complex queries?  Is this better in mysql or
postgresql?

Thanks.
----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Bill" <bill@math.uchicago.edu>; <pgsql-performance@postgresql.org>
Sent: Tuesday, June 22, 2004 11:31 AM
Subject: Re: [PERFORM] postgresql and openmosix migration


> Bill,
>
> >   Any ideas of how I can cluster my database (around 800 GB
> > in size so even partial replication is not really practical)?
>
> Um, raise $150,000 to pay for a clustering implementation?
>
> Various techniques of "shared memory clustering" have been tried with
> PostgreSQL, and none work.    Neither does LinuxLabs "ClusGres", which is
> based on similar principles -- unfortunately. (at least, LL repeatedly
> postponed the demo they said they'd give me.  I've yet to see anything
> working ...)
>
> Frankly, we're waiting for a well-funded corporation to jump in and decide
> they want PostgreSQL clustering.   Database server clustering is a "big
> ticket item" requiring roughly 1,000 hours of programming and
> troubleshooting.   As such, you're not likely to see it come out of the OSS
> community unaided.
>
> Oh, and FYI, MySQL's "clustering" doesn't work either.  It requires your
> entire database to fit into available RAM ....
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>



Re: postgresql and openmosix migration

From
Bruno Wolff III
Date:
On Tue, Jun 22, 2004 at 12:31:15 -0500,
  Bill <bill@math.uchicago.edu> wrote:
> Ok, so maybe someone on this group will have a better idea.  We have a
> database of financial information, and this has literally millions of
> entries.  I have installed indicies, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow.  So, other than clustring, how could I achieve a
> speed increase in these complex queries?  Is this better in mysql or
> postgresql?

Queries using max (or min) can often be rewritten as queries using ORDER BY
and LIMIT so that they can take advantage of indexes. Doing this might help
with some of the problems you are seeing.
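For example (table and column names below are made up, just to show the shape
of the rewrite):

    -- the aggregate form typically has to scan everything:
    SELECT max(monthly_return) FROM returns WHERE commodity = 'gold';

    -- the rewrite can walk an index on (commodity, monthly_return):
    SELECT monthly_return
      FROM returns
     WHERE commodity = 'gold'
     ORDER BY monthly_return DESC
     LIMIT 1;
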
If you commonly query on aggregated data it might be better to create
derived tables of the aggregated data maintained by triggers, and query
against them. If you do lots of selects relative to inserts and updates,
this could be a big win.
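A rough sketch of the derived-table approach (all table and column names here
are hypothetical, and the code is untested):

    CREATE TABLE monthly_returns (
        commodity     text    NOT NULL,
        month         date    NOT NULL,
        total_return  numeric NOT NULL DEFAULT 0,
        PRIMARY KEY (commodity, month)
    );

    -- assumes a trades table with commodity, trade_date and ret columns
    CREATE FUNCTION maintain_monthly_returns() RETURNS trigger AS '
    BEGIN
        UPDATE monthly_returns
           SET total_return = total_return + NEW.ret
         WHERE commodity = NEW.commodity
           AND month = date_trunc(''month'', NEW.trade_date)::date;
        IF NOT FOUND THEN
            INSERT INTO monthly_returns (commodity, month, total_return)
            VALUES (NEW.commodity,
                    date_trunc(''month'', NEW.trade_date)::date,
                    NEW.ret);
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER trades_monthly_returns AFTER INSERT ON trades
        FOR EACH ROW EXECUTE PROCEDURE maintain_monthly_returns();

Queries like "best month per commodity" then hit the small summary table
instead of the raw data.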

Re: postgresql and openmosix migration

From
Date:
Sounds like an issue I have experienced in Oracle as well. If you can,
you might want to consider breaking your database out into an OLTP (online
transaction processing) database and a data warehouse database. You run any
reports you can nightly into a set of warehouse tables and save your daytime
CPUs for incoming info and special real-time (hottest commodity of the
day) reports that you have tuned as best you can. Anything you can
calculate in advance that won't change over time should be saved in the
warehouse tables, so you don't waste CPU re-working data in real time.
Pre-running your reports won't speed them up, but your users won't be
waiting for a report to calculate while they are looking at the screen.
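
As a sketch of the nightly piece (table and column names invented purely for
illustration), a cron job could run something like:

    -- rebuild a pre-computed report table once a night
    BEGIN;
    DELETE FROM annual_returns_report;
    INSERT INTO annual_returns_report (commodity, year, annual_return)
    SELECT commodity,
           extract(year FROM trade_date) AS year,
           sum(ret) AS annual_return
      FROM trades
     GROUP BY commodity, extract(year FROM trade_date);
    COMMIT;

and the daytime "top performer" report becomes a cheap query against
annual_returns_report instead of the raw trade data.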

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Bill
Sent: Tuesday, June 22, 2004 11:31 AM
To: Josh Berkus
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] postgresql and openmosix migration

Ok, so maybe someone on this group will have a better idea.  We have a
database of financial information, and this has literally millions of
entries.  I have installed indicies, but for the rather computationally
demanding processes we like to use, like a select query to find the
commodity with the highest monthly or annual returns, the computer generally
runs unacceptably slow.  So, other than clustring, how could I achieve a
speed increase in these complex queries?  Is this better in mysql or
postgresql?

Thanks.
----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Bill" <bill@math.uchicago.edu>; <pgsql-performance@postgresql.org>
Sent: Tuesday, June 22, 2004 11:31 AM
Subject: Re: [PERFORM] postgresql and openmosix migration


> Bill,
>
> >   Any ideas of how I can cluster my database (around 800 GB
> > in size so even partial replication is not really practical)?
>
> Um, raise $150,000 to pay for a clustering implementation?
>
> Various techniques of "shared memory clustering" have been tried with
> PostgreSQL, and none work.    Neither does LinuxLabs "ClusGres", which is
> based on similar principles -- unfortunately. (at least, LL repeatedly
> postponed the demo they said they'd give me.  I've yet to see anything
> working ...)
>
> Frankly, we're waiting for a well-funded corporation to jump in and decide
> they want PostgreSQL clustering.   Database server clustering is a "big
> ticket item" requiring roughly 1,000 hours of programming and
> troubleshooting.   As such, you're not likely to see it come out of the OSS
> community unaided.
>
> Oh, and FYI, MySQL's "clustering" doesn't work either.  It requires your
> entire database to fit into available RAM ....
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>




Re: postgresql and openmosix migration

From
Richard Welty
Date:
On Tue, 22 Jun 2004 12:31:15 -0500 Bill <bill@math.uchicago.edu> wrote:
> I have installed indicies,

but are there any statistics? vacuum analyze is your friend

> but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow.  So, other than clustring, how could I achieve a
> speed increase in these complex queries?

1) have you gone to the effort to tune the values in postgresql.conf?

2) have you tried using explain to find out what the query planner is
    up to?
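
for (1), some rough starting points (7.4-era parameter names; the right
numbers depend entirely on your RAM and workload):

    shared_buffers = 16384         # 8 kB pages, roughly 128 MB
    sort_mem = 32768               # kB per sort; big ORDER BYs need this
    effective_cache_size = 262144  # 8 kB pages, ~2 GB of OS cache
    random_page_cost = 3           # lower if the data mostly fits in cache

and for (2), just prepend EXPLAIN ANALYZE to the slow query and read the
plan it prints.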

>  Is this better in mysql or
> postgresql?

if there is any complexity to the queries, postgresql will serve you better
if you learn how to use it properly.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
    Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security

Re: postgresql and openmosix migration

From
"Matthew Nuzum"
Date:
Hi Bill, I am more often in the "needing help" category than the "giving
help" when it comes to advise about using postgresql.  I have found it to be
an extremely powerful tool and by far the best performance/price for my
work.

I think you will get some excellent answers and help to your performance
questions if you send the list details about specific queries that are
running too slow.  If you are willing to throw more/bigger hardware at the
problem, let people know that when you ask and they will tell you if your
bottleneck can be alleviated through more ram, disks, cpu or whatever.
Having been watching this list for some time now, I suspect most of the
performance problems can be improved using non-intuitive query or
configuration modifications (for example, replacing min()/max() as suggested
by Mr. Wolff).

The heavy hitters on the list will usually ask for an "explain analyze" of
your query.  If your query is "select * from foo", then change it to
"EXPLAIN ANALYZE select * from foo" and post the output.  It will look
something like this:
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1.04 rows=4 width=44) (actual time=8.46..8.47 rows=4 loops=1)
 Total runtime: 19.63 msec
(2 rows)

I'm sure your data is confidential; mine is too.  The good news is that none
of your data is included in that output, only technical details about what the
database is doing.

If your problem might involve the application that works with the data, give
some details about that. For example, if you're using a Java application,
let people know what driver version you use, what jvm and other related
info.  There are lurkers on this list using just about every programming
language imaginable on more platforms than you can shake a stick at (I don't
care how good you are at shaking sticks, either).

The more details you give the better help you're going to get and you'd be
amazed at the results I've seen people get with a judicious amount of
tweaking. The other day someone had a query that took hours decrease to less
than 10 minutes by using some techniques prescribed by members on the list.
Bringing 30 - 60 second queries down to 2-3 seconds is commonplace.

You seem to be ready to throw money at the problem by investing in new
hardware but I would suggest digging into the performance problems first.
Too many times we've seen people on the list say, "I've just spent $x0,000
on a new xyz and I'm still having problems with this query."  Oftentimes
the true solution is rewriting queries, tweaking config parameters, adding
RAM and upgrading disks (in that order I believe).

As I found out even today on the SQL list, it's best to ask questions in
this form:
"I want to do this...  I've been trying this...  I'm getting this... which
is problematic because..."

The more clearly you state the abstract goal the more creative answers
you'll get with people often suggesting things you'd never considered.

I hope this helps and I hope that you achieve your goals of a well
performing application.

Matthew Nuzum        | Makers of "Elite Content Management System"
www.followers.net        | View samples of Elite CMS in action
matt@followers.net    | http://www.followers.net/portfolio/


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Bill
> Sent: Tuesday, June 22, 2004 1:31 PM
> To: Josh Berkus
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] postgresql and openmosix migration
>
> Ok, so maybe someone on this group will have a better idea.  We have a
> database of financial information, and this has literally millions of
> entries.  I have installed indicies, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow.  So, other than clustring, how could I achieve a
> speed increase in these complex queries?  Is this better in mysql or
> postgresql?
>
> Thanks.


Re: postgresql and openmosix migration

From
Joe Conway
Date:
Bill wrote:
> Ok, so maybe someone on this group will have a better idea.  We have a
> database of financial information, and this has literally millions of
> entries.  I have installed indicies, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow.  So, other than clustring, how could I achieve a
> speed increase in these complex queries?  Is this better in mysql or
> postgresql?

If the bottleneck is really computational, not I/O, you might try PL/R
in conjunction with the rpvm R package. rpvm allows R to make use of pvm
to split its load among a cluster. See:

R:
   http://www.r-project.org/

PL/R:
   http://www.joeconway.com/plr/

rpvm:
   http://cran.r-project.org/src/contrib/Descriptions/rpvm.html
   http://cran.r-project.org/doc/packages/rpvm.pdf
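
Just to give a flavor of what a PL/R function looks like (an untested
sketch; argument handling details vary between PL/R versions):

   CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS '
       median(arg1)
   ' LANGUAGE 'plr';

   -- e.g.:  SELECT r_median(ARRAY[1.0, 2.5, 4.0]);

The rpvm calls would go inside the R body, farming the heavy computation
out to the pvm cluster.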

I haven't had a chance to play with this myself yet, but I hope to
relatively soon.

HTH,

Joe

Re: postgresql and openmosix migration

From
Andrew Hammond
Date:
Bill wrote:
| Ok, so maybe someone on this group will have a better idea.  We have a
| database of financial information, and this has literally millions of
| entries.  I have installed indicies, but for the rather computationally
| demanding processes we like to use, like a select query to find the
| commodity with the highest monthly or annual returns, the computer generally
| runs unacceptably slow.  So, other than clustring, how could I achieve a
| speed increase in these complex queries?  Is this better in mysql or
| postgresql?

Postgres generally beats MySQL on complex queries. The easiest solution
to speed issues is to throw hardware at it. Generally, you're bound first
by disk, then by RAM, then by CPU.

1) Move your data over to an array of smallish 15kRPM disks. The more
spindles the better.
2) Use a 64 bit platform and take advantage of >4 GB memory.

There are dozens of options for the disk array. For the processing
platform, I'd recommend looking at Opteron. I've heard only good things
and their price is much more reasonable than the other options.

--
Andrew Hammond    416-673-4138    ahammond@ca.afilias.info
Database Administrator, Afilias Canada Corp.
CB83 2838 4B67 D40F D086 3568 81FC E7E5 27AF 4A9A


Re: postgresql and openmosix migration

From
"Merlin Moncure"
Date:
Bill wrote:
> Ok, so maybe someone on this group will have a better idea.  We have a
> database of financial information, and this has literally millions of
> entries.  I have installed indicies, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slow.  So, other than clustring, how could I achieve a
> speed increase in these complex queries?  Is this better in mysql or
> postgresql?

This is a very broad question.  Optimizing your SQL to run fast, here as on
any other database, is something of an art form, and the topic could fill a
book.  For example, a common performance killer is not having enough sort
memory for large ordered result sets.
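Something as simple as (a sketch; the right value depends on how much RAM you
can spare per sort):

    SET sort_mem = 65536;    -- kB, for this session only (7.x parameter name)

before a big ORDER BY can keep the sort in memory instead of spilling to disk.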

A critical skill is being able to figure out whether the planner is
optimizing your queries badly.  Knowing this is a mixture of observation
and intuition that comes with experience.  The absolute best-case
performance of a query is roughly defined by the amount of data that must be
looked at to generate the result set and by the size of the result set
itself, assuming the query is pulling data from the cache.  The cache side
can be helped by throwing more money (that is, RAM) at the problem, but a
poorly planned query will run slowly on any hardware.

I would suggest isolating particular problems and posting them to the
list. (explain analyze works wonders).

Merlin

Re: postgresql and openmosix migration

From
Josh Berkus
Date:
Bill,

> Ok, so maybe someone on this group will have a better idea.  We have a
> database of financial information, and this has literally millions of
> entries.  I have installed indicies, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer
> generally runs unacceptably slow.  So, other than clustring, how could I
> achieve a speed increase in these complex queries?

Well, you can do this 2 ways:
1) you can pick out one query at a time, and send us complete information on
it, like Matt's really nice e-mail describes.     People on this list will
help you troubleshoot it.   It will take a lot of time, but no money.

2) You can hire a PG database expert.    This will be much faster, but cost
you a lot of money.

>Is this better in mysql
> or postgresql?

Complex queries?   Large databases?   That's us.   MySQL is optimized for
simple queries on small databases.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: postgresql and openmosix migration

From
Rod Taylor
Date:
> 2) You can hire a PG database expert.    This will be much faster, but cost
> you a lot of money.

I wouldn't exactly say "a lot of money". Lots of consultants out there
are willing to put in a week's worth of effort, on site, for
significantly less than a support contract with most commercial DB
organizations (including MySQL) -- and often give better results since
they're on-site rather than over the phone or via email.

But yes, doing it via this mailing list is probably the cheapest option.



Re: postgresql and openmosix migration

From
Richard Welty
Date:
On Wed, 23 Jun 2004 13:52:39 -0400 Rod Taylor <pg@rbt.ca> wrote:
> But yes, doing it via this mailing list is probably the cheapest option.

yes, he just needs to decide how big a hurry he's in.

also, if he does decide to hire a consultant, i suggest he pop over
to pgsql-jobs and ask there.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
    Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security