Thread: PostgreSQL for a mission critical system
Greetings everyone, I'm new to this list (No, I'm not about to ask how to install pgsql on Windows 3.11!) We are involved with a financial project, similar to Paypal (Web Payments Standard/Payment Buttons) but the scope and functionality is much narrower. This project will be released in a market niche that will not exceed thirty thousand (30000) users in the very best (and very optimistic) of scenarios and will surely have a much lower transaction/user/day ratio than Paypal. Our development department has to take all the technical decisions (what RDBMS to use, what additional personnel is required and what equipment to acquire/rent), and at the head of those decisions is, unfortunately, me. We have vast experience in MySQL, though not in large-scale or mission critical database setups. We have reviewed some MySQL specific use cases and whitepapers and have done intensive research (Paggo is a highlighted use case here, since it's similar though much bigger). However, I refuse to take the bait and just follow the propaganda. I've been working with PostgreSQL a few (very few) times before and I believe it's superior, TCO is much better (since you hire expertise and not licenses), but I don't know how easy it can be to scale up/out and if it can handle the most optimistic scenario (optimistic from the business point of view, for me it will be a pain in the ...). In addition, we have to design this application in such a way that it can be scaled up/out to the presumed limit (30000 users) by just upgrading hardware, acquiring new hardware, or modifying the hardware setup. We don't want to redesign the application or modify it heavily once launched, though some tune up or bug fix will surely happen. We have our way to deal with this, we've planned for some front-end web applications (written in PHP, with caching and acceleration) and a back-end which is responsible for all the database management (probably written in Java and C/C++, we haven't decided yet). The front-end and back-end would communicate by means of a web services API (say SOAP, or the more lightweight XML-RPC), this will allow us to scale the front-end and back-end parts separately. This system could grow to more than 30000 users (say, to a million) in the very far future, but for that to happen, this "pilot test" should be a complete success (which means our client pocket would be full of money) and the market niche should have to be extended. Meaning, if we need to jump higher, we can redesign our application and/or switch to another database/programming technology, such as Oracle (which I strongly hate :@). However, it's more likely that if we succeed, you find me again on this list looking for arguments to adopt PostgreSQL :D. What I would like to know (if you are so kind ;) ) is: 1) Do you really think that we can start low (say, a master/master or master/slave replication system on commodity servers) and jump high (say, upgrading to Opteron 8300 servers)? That is... how much can we scale up, without the need to scale out? 2) Do you know of any Hosting/Collocation service that can provide PostgreSQL installation/configuration support at an affordable price? (This is very important in the early stages). I've seen the hosting list at postgresql.org, but if possible, please share your experiences. 3) Any articles, benchmarking, white papers, ebooks, seminars, personal experience, use cases that you can recommend about this? (Yes, I've downloaded the postgresql.org use cases and I can use Google, but your voice is louder, specially if it speaks for your experience) 4) Any other comments are welcome. Thanks in advance (and sorry if I have bothered you in any way), Andrés Robinet | Lead Developer | BESTPLACE CORPORATION 5100 Bayview Drive 206, Royal Lauderdale Landings, Fort Lauderdale, FL 33308 | TEL 954-607-4296 | FAX 954-337-2695 | Email: info@bestplace.net | MSN Chat: best@bestplace.net | SKYPE: bestplace | Web: bestplace.biz | Web: seo-diy.com
hi, all, I have a table in postgresql which have 2 columns like this: a|b ------ X A X A Y A D B H B E B D B P C when I do select and order by, I got this: a | b ---+--- X | A X | A Y | A D | B H | B | B D | B P | C I want to get the only 2 rows(limit or random) if the column "b" have the same value. so, the result of the above is a | b ---+--- X | A X | A D | B H | B P | C how to do that, thanks all!
Yi Zhao wrote: > I want to get the only 2 rows(limit or random) if the column "b" have > the same value. so, the result of the above is > a | b > ---+--- > X | A > X | A > D | B > H | B > P | C > > how to do that, thanks all! > > I don't understand what you want based on your description and your example results. What do you mean by "if the column `b' have the same value" ? By the way, when you're posting this kind of question it's really helpful to send SQL code to set up your test environment. For example, if you had sent: CREATE TABLE ab (a text, b text); INSERT INTO ab (a,b) VALUES ('X','A'),('X','B'),('Y','A'),('D','B'),('H','B'),('E','B'),('D','B'),('P','C'); ... then it would've saved the hassle of assembling a test table. -- Craig Ringer
thanks Ringer. my mean is that: I want less than 2 rows which have the same value of column "b"! for example, there is 3 columns have the same value "A", X | A X | A Y | A I want my result have two of them. thanks. On Thu, 2009-02-19 at 15:49 +0900, Craig Ringer wrote: > Yi Zhao wrote: > > > I want to get the only 2 rows(limit or random) if the column "b" have > > the same value. so, the result of the above is > > a | b > > ---+--- > > X | A > > X | A > > D | B > > H | B > > P | C > > > > how to do that, thanks all! > > > > > > I don't understand what you want based on your description and your > example results. What do you mean by "if the column `b' have the same > value" ? > > > By the way, when you're posting this kind of question it's really > helpful to send SQL code to set up your test environment. For example, > if you had sent: > > > CREATE TABLE ab (a text, b text); > > INSERT INTO ab (a,b) VALUES > ('X','A'),('X','B'),('Y','A'),('D','B'),('H','B'),('E','B'),('D','B'),('P','C'); > > > > ... then it would've saved the hassle of assembling a test table. > > -- > Craig Ringer >
In response to Craig Ringer : > I don't understand what you want based on your description and your > example results. What do you mean by "if the column `b' have the same > value" ? Additionally, don't hijack other threads by answer to an old message and changing the subject. Your eMails contains a references-header for a old thread, because ot this, your mail occurs in a foreign thread. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Yi Zhao wrote: > thanks Ringer. > my mean is that: > I want less than 2 rows which have the same value of column "b"! > > for example, there is 3 columns have the same value "A", > X | A > X | A > Y | A > I want my result have two of them. Less than two? ie just one? That's easy: test=# SELECT DISTINCT ON (b) a, b FROM ab; a | b ---+--- Y | A D | B P | C (3 rows) but if you want a randomly selected TWO rows, rather than just one, it's harder. In PostgreSQL 8.4 (not yet released) you could use the new windowing functions to do it quite easily. I can't off the top of my head think of any reasonably nice way to do it in 8.3. -- Craig Ringer
In response to Yi Zhao : > thanks Ringer. > my mean is that: > I want less than 2 rows which have the same value of column "b"! > > for example, there is 3 columns have the same value "A", > X | A > X | A > Y | A > I want my result have two of them. I think, you are searching for DISTINCT: test=# create table foo (a text, b text); CREATE TABLE test=*# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> a a >> a a >> b a >> \. test=*# select distinct a,b from foo; a | b ---+--- a | a b | a (2 rows) Please, no silly top-posting. A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
ok, thanks, I will create a new message when I post next time. about my question, I think distinct can't solve my problem, because I want to get more than one rows. if there is more than (or equal) 2 (eg: 2, 3, 4, 100 ...)rows have the same value of column 'b' , I want to get only 2 rows. if lesse than 2, I want get all the result of them. ps: I' think, the *2* in my example is not appropriate, how about 10, 50? thanks. On Thu, 2009-02-19 at 08:33 +0100, A. Kretschmer wrote: > In response to Yi Zhao : > > thanks Ringer. > > my mean is that: > > I want less than 2 rows which have the same value of column "b"! > > > > for example, there is 3 columns have the same value "A", > > X | A > > X | A > > Y | A > > I want my result have two of them. > > I think, you are searching for DISTINCT: > > test=# create table foo (a text, b text); > CREATE TABLE > test=*# copy foo from stdin; > Enter data to be copied followed by a newline. > End with a backslash and a period on a line by itself. > >> a a > >> a a > >> b a > >> \. > test=*# select distinct a,b from foo; > a | b > ---+--- > a | a > b | a > (2 rows) > > > Please, no silly top-posting. > > A: Because it messes up the order in which people normally read text. > Q: Why is top-posting such a bad thing? > A: Top-posting. > Q: What is the most annoying thing in e-mail? > > > Andreas > -- > Andreas Kretschmer > Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) > GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net >
Yi Zhao wrote: > ok, thanks, I will create a new message when I post next time. And it's nice to reply below the original message, after cutting off the bits that don't matter anymore. It saves space and makes your messages easier for other people to read, which means you are more likely to get replies and people are more likely to help you out. > About my question, I think distinct [on] can't solve my problem, because I > want to get more than one rows. if there is more than (or equal) 2 (eg: > 2, 3, 4, 100 ...)rows have the same value of column 'b' , I want to get > only 2 rows. if lesse than 2, I want get all the result of them. > > ps: I' think, the *2* in my example is not appropriate, how about 10, > 50? OK, so for each distinct value in `b' you wish to obtain a random selection of `n' or fewer rows in which that value of `b' occurs? I'm pretty sure you want the PostgreSQL 8.4 windowing functions. See: http://www.depesz.com/index.php/2009/01/21/waiting-for-84-window-functions/ http://developer.postgresql.org/pgdocs/postgres/tutorial-window.html http://developer.postgresql.org/pgdocs/postgres/functions-window.html Is this a production system, or something in development? If it's only in development I strongly suggest moving to PostgreSQL 8.4 now, so that by the time you're ready to release 8.4 will have been released too and you will be able to use its features. If that's not an option ... there are probably clever ways of doing what you want, but personally I'd just hack something ugly in Pl/PgSQL like (using my previous example code): CREATE OR REPLACE FUNCTION blah () RETURNS SETOF ab AS $$ DECLARE thisb text; BEGIN FOR thisb IN SELECT DISTINCT(b) FROM ab LOOP RETURN QUERY SELECT a, b FROM ab WHERE b = thisb LIMIT 2; END LOOP; RETURN; END; $$ LANGUAGE 'plpgsql' STABLE; test=# SELECT * FROM blah(); a | b ---+--- X | A Y | A X | B D | B P | C (5 rows) -- Craig Ringer
In response to Yi Zhao : > ok, thanks, I will create a new message when I post next time. > > about my question, I think distinct can't solve my problem, because I > want to get more than one rows. if there is more than (or equal) 2 (eg: > 2, 3, 4, 100 ...)rows have the same value of column 'b' , I want to get > only 2 rows. if lesse than 2, I want get all the result of them. > > ps: I' think, the *2* in my example is not appropriate, how about 10, > 50? Okay, as Craig Ringer said, you can use new features in 8.4. For example: test=# select * from foo; a | b ---+--- a | a b | a c | a d | a a | b b | b c | b a | c a | d a | e b | e c | e d | e (13 rows) So, now i'm counting the rows, partition by b: test=# select b, a, row_number() over (partition by b) from foo order by b, a; b | a | row_number ---+---+------------ a | a | 1 a | b | 2 a | c | 3 a | d | 4 b | a | 1 b | b | 2 b | c | 3 c | a | 1 d | a | 1 e | a | 1 e | b | 2 e | c | 3 e | d | 4 (13 rows) Next step, only up to 2 entries for every value in b: test=# select * from (select b, a, row_number() over (partition by b) from foo order by b, a) temp where row_number <= 2; b | a | row_number ---+---+------------ a | a | 1 a | b | 2 b | a | 1 b | b | 2 c | a | 1 d | a | 1 e | a | 1 e | b | 2 (8 rows) Regards, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net