Thread: PostgreSQL for a mission critical system

PostgreSQL for a mission critical system

From
Andrés Robinet
Date:
Greetings everyone, I'm new to this list (No, I'm not about to ask how to
install pgsql on Windows 3.11!)

We are involved with a financial project, similar to Paypal (Web Payments
Standard/Payment Buttons) but the scope and functionality is much narrower. This
project will be released in a market niche that will not exceed thirty thousand
(30000) users in the very best (and very optimistic) of scenarios and will
surely have a much lower transaction/user/day ratio than Paypal.

Our development department has to take all the technical decisions (what RDBMS
to use, what additional personnel is required and what equipment to
acquire/rent), and at the head of those decisions is, unfortunately, me. We have
vast experience in MySQL, though not in large-scale or mission critical database
setups. We have reviewed some MySQL specific use cases and whitepapers and have
done intensive research (Paggo is a highlighted use case here, since it's
similar though much bigger).

However, I refuse to take the bait and just follow the propaganda. I've been
working with PostgreSQL a few (very few) times before and I believe it's
superior, TCO is much better (since you hire expertise and not licenses), but I
don't know how easy it can be to scale up/out and if it can handle the most
optimistic scenario (optimistic from the business point of view, for me it will
be a pain in the ...).

In addition, we have to design this application in such a way that it can be
scaled up/out to the presumed limit (30000 users) by just upgrading hardware,
acquiring new hardware, or modifying the hardware setup. We don't want to
redesign the application or modify it heavily once launched, though some tune up
or bug fix will surely happen. We have our way to deal with this, we've planned
for some front-end web applications (written in PHP, with caching and
acceleration) and a back-end which is responsible for all the database
management (probably written in Java and C/C++, we haven't decided yet). The
front-end and back-end would communicate by means of a web services API (say
SOAP, or the more lightweight XML-RPC), this will allow us to scale the
front-end and back-end parts separately.

This system could grow to more than 30000 users (say, to a million) in the very
far future, but for that to happen, this "pilot test" should be a complete
success (which means our client pocket would be full of money) and the market
niche should have to be extended. Meaning, if we need to jump higher, we can
redesign our application and/or switch to another database/programming
technology, such as Oracle (which I strongly hate :@). However, it's more likely
that if we succeed, you find me again on this list looking for arguments to
adopt PostgreSQL :D.

What I would like to know (if you are so kind ;) ) is:

1) Do you really think that we can start low (say, a master/master or
master/slave replication system on commodity servers) and jump high (say,
upgrading to Opteron 8300 servers)? That is... how much can we scale up, without
the need to scale out?

2) Do you know of any Hosting/Collocation service that can provide PostgreSQL
installation/configuration support at an affordable price? (This is very
important in the early stages). I've seen the hosting list at postgresql.org,
but if possible, please share your experiences.

3) Any articles, benchmarking, white papers, ebooks, seminars, personal
experience, use cases that you can recommend about this? (Yes, I've downloaded
the postgresql.org use cases and I can use Google, but your voice is louder,
specially if it speaks for your experience)

4) Any other comments are welcome.


Thanks in advance (and sorry if I have bothered you in any way),


Andrés Robinet | Lead Developer | BESTPLACE CORPORATION 
5100 Bayview Drive 206, Royal Lauderdale Landings, Fort Lauderdale, FL 33308 |
TEL 954-607-4296 | FAX 954-337-2695 |
Email: info@bestplace.net  | MSN Chat: best@bestplace.net  |  SKYPE: bestplace |
 Web: bestplace.biz  | Web: seo-diy.com


how to do this select?

From
Yi Zhao
Date:
hi, all, I have a table in postgresql which have 2 columns like this:
a|b
------
X A
X A
Y A
D B
H B
E B
D B
P C
when I do select and order by, I got this:
 a | b
---+---
 X | A
 X | A
 Y | A
 D | B
 H | B
   | B
 D | B
 P | C

I want to get the only 2 rows(limit or random) if the column "b" have
the same value. so, the result of the above is
a | b
---+---
X | A
X | A
D | B
H | B
P | C

how to do that, thanks all!


Re: how to do this select?

From
Craig Ringer
Date:
Yi Zhao wrote:

> I want to get the only 2 rows(limit or random) if the column "b" have
> the same value. so, the result of the above is
> a | b
> ---+---
> X | A
> X | A
> D | B
> H | B
> P | C
>
> how to do that, thanks all!
>
>

I don't understand what you want based on your description and your
example results. What do you mean by "if the column `b' have the same
value" ?


By the way, when you're posting this kind of question it's really
helpful to send SQL code to set up your test environment. For example,
if you had sent:


CREATE TABLE ab (a text, b text);

INSERT INTO ab (a,b) VALUES
('X','A'),('X','B'),('Y','A'),('D','B'),('H','B'),('E','B'),('D','B'),('P','C');



... then it would've saved the hassle of assembling a test table.

--
Craig Ringer

Re: how to do this select?

From
Yi Zhao
Date:
thanks Ringer.
my mean is that:
I want less than 2 rows which have the same value of column "b"!

for example, there is 3 columns have the same value "A",
 X | A
 X | A
 Y | A
I want my result have two of them.

thanks.
On Thu, 2009-02-19 at 15:49 +0900, Craig Ringer wrote:
> Yi Zhao wrote:
>
> > I want to get the only 2 rows(limit or random) if the column "b" have
> > the same value. so, the result of the above is
> > a | b
> > ---+---
> > X | A
> > X | A
> > D | B
> > H | B
> > P | C
> >
> > how to do that, thanks all!
> >
> >
>
> I don't understand what you want based on your description and your
> example results. What do you mean by "if the column `b' have the same
> value" ?
>
>
> By the way, when you're posting this kind of question it's really
> helpful to send SQL code to set up your test environment. For example,
> if you had sent:
>
>
> CREATE TABLE ab (a text, b text);
>
> INSERT INTO ab (a,b) VALUES
> ('X','A'),('X','B'),('Y','A'),('D','B'),('H','B'),('E','B'),('D','B'),('P','C');
>
>
>
> ... then it would've saved the hassle of assembling a test table.
>
> --
> Craig Ringer
>


Re: how to do this select?

From
"A. Kretschmer"
Date:
In response to Craig Ringer :
> I don't understand what you want based on your description and your
> example results. What do you mean by "if the column `b' have the same
> value" ?

Additionally, don't hijack other threads by answer to an old message and
changing the subject. Your eMails contains a references-header for a old
thread, because ot this, your mail occurs in a foreign thread.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: how to do this select?

From
Craig Ringer
Date:
Yi Zhao wrote:
> thanks Ringer.
> my mean is that:
> I want less than 2 rows which have the same value of column "b"!
>
> for example, there is 3 columns have the same value "A",
>  X | A
>  X | A
>  Y | A
> I want my result have two of them.

Less than two? ie just one? That's easy:

test=# SELECT DISTINCT ON (b) a, b FROM ab;
  a | b
---+---
  Y | A
  D | B
  P | C
(3 rows)

but if you want a randomly selected TWO rows, rather than just one, it's
harder. In PostgreSQL 8.4 (not yet released) you could use the new
windowing functions to do it quite easily. I can't off the top of my
head think of any reasonably nice way to do it in 8.3.

--
Craig Ringer

Re: how to do this select?

From
"A. Kretschmer"
Date:
In response to Yi Zhao :
> thanks Ringer.
> my mean is that:
> I want less than 2 rows which have the same value of column "b"!
>
> for example, there is 3 columns have the same value "A",
>  X | A
>  X | A
>  Y | A
> I want my result have two of them.

I think, you are searching for DISTINCT:

test=# create table foo (a text, b text);
CREATE TABLE
test=*# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a    a
>> a    a
>> b    a
>> \.
test=*# select distinct a,b from foo;
 a | b
---+---
 a | a
 b | a
(2 rows)


Please, no silly top-posting.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: how to do this select?

From
Yi Zhao
Date:
ok, thanks, I will create a new message when I post next time.

about my question, I think distinct can't solve my problem, because I
want to get more than one rows. if there is more  than (or equal) 2 (eg:
2, 3, 4, 100 ...)rows have the same value of column 'b' , I want to get
only 2 rows. if lesse than 2, I want get all the result of them.

ps: I' think, the *2* in my example is not appropriate, how about 10,
50?

thanks.
On Thu, 2009-02-19 at 08:33 +0100, A. Kretschmer wrote:
> In response to Yi Zhao :
> > thanks Ringer.
> > my mean is that:
> > I want less than 2 rows which have the same value of column "b"!
> >
> > for example, there is 3 columns have the same value "A",
> >  X | A
> >  X | A
> >  Y | A
> > I want my result have two of them.
>
> I think, you are searching for DISTINCT:
>
> test=# create table foo (a text, b text);
> CREATE TABLE
> test=*# copy foo from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself.
> >> a    a
> >> a    a
> >> b    a
> >> \.
> test=*# select distinct a,b from foo;
>  a | b
> ---+---
>  a | a
>  b | a
> (2 rows)
>
>
> Please, no silly top-posting.
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
>


Re: how to do this select?

From
Craig Ringer
Date:
Yi Zhao wrote:
> ok, thanks, I will create a new message when I post next time.

And it's nice to reply below the original message, after cutting off the
bits that don't matter anymore. It saves space and makes your messages
easier for other people to read, which means you are more likely to get
replies and people are more likely to help you out.

> About my question, I think distinct [on] can't solve my problem, because I
> want to get more than one rows. if there is more  than (or equal) 2 (eg:
> 2, 3, 4, 100 ...)rows have the same value of column 'b' , I want to get
> only 2 rows. if lesse than 2, I want get all the result of them.
>
> ps: I' think, the *2* in my example is not appropriate, how about 10,
> 50?

OK, so for each distinct value in `b' you wish to obtain a random
selection of `n' or fewer rows in which that value of `b' occurs?

I'm pretty sure you want the PostgreSQL 8.4 windowing functions. See:

http://www.depesz.com/index.php/2009/01/21/waiting-for-84-window-functions/
http://developer.postgresql.org/pgdocs/postgres/tutorial-window.html
http://developer.postgresql.org/pgdocs/postgres/functions-window.html

Is this a production system, or something in development? If it's only
in development I strongly suggest moving to PostgreSQL 8.4 now, so that
by the time you're ready to release 8.4 will have been released too and
you will be able to use its features.

If that's not an option ... there are probably clever ways of doing what
you want, but personally I'd just hack something ugly in Pl/PgSQL like
(using my previous example code):

CREATE OR REPLACE FUNCTION blah () RETURNS SETOF ab AS
$$
DECLARE
   thisb text;
BEGIN
   FOR thisb IN SELECT DISTINCT(b) FROM ab LOOP
     RETURN QUERY SELECT a, b FROM ab WHERE b = thisb LIMIT 2;
   END LOOP;
   RETURN;
END;
$$ LANGUAGE 'plpgsql' STABLE;

test=# SELECT * FROM blah();
  a | b
---+---
  X | A
  Y | A
  X | B
  D | B
  P | C
(5 rows)

--
Craig Ringer

Re: how to do this select?

From
"A. Kretschmer"
Date:
In response to Yi Zhao :
> ok, thanks, I will create a new message when I post next time.
>
> about my question, I think distinct can't solve my problem, because I
> want to get more than one rows. if there is more  than (or equal) 2 (eg:
> 2, 3, 4, 100 ...)rows have the same value of column 'b' , I want to get
> only 2 rows. if lesse than 2, I want get all the result of them.
>
> ps: I' think, the *2* in my example is not appropriate, how about 10,
> 50?

Okay, as Craig Ringer said, you can use new features in 8.4. For example:

test=# select * from foo;
 a | b
---+---
 a | a
 b | a
 c | a
 d | a
 a | b
 b | b
 c | b
 a | c
 a | d
 a | e
 b | e
 c | e
 d | e
(13 rows)


So, now i'm counting the rows, partition by b:

test=# select b, a, row_number() over (partition by b) from foo order by b, a;
 b | a | row_number
---+---+------------
 a | a |          1
 a | b |          2
 a | c |          3
 a | d |          4
 b | a |          1
 b | b |          2
 b | c |          3
 c | a |          1
 d | a |          1
 e | a |          1
 e | b |          2
 e | c |          3
 e | d |          4
(13 rows)

Next step, only up to 2 entries for every value in b:

test=# select * from (select b, a, row_number() over (partition by b) from foo order by b, a) temp where row_number <=
2;
 b | a | row_number
---+---+------------
 a | a |          1
 a | b |          2
 b | a |          1
 b | b |          2
 c | a |          1
 d | a |          1
 e | a |          1
 e | b |          2
(8 rows)



Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net