Thread: Limitations of PostgreSQL
Hello There...
I am new to PostgreSQL.
I usually check out negative sides of any software or system, before implementing it or using it.
I would like to know the limitations of PostgreSQL.
Also, I would like to know, whether there is any limitations for the number of records or the size of the tables or database in PostgreSQL?
Thanks in Advance,
Regards,
Denis
On Mon, Oct 03, 2005 at 12:01:02PM -0000, Denis G Dudhia wrote: > I would like to know the limitations of PostgreSQL. > > Also, I would like to know, whether there is any limitations for the > number of records or the size of the tables or database in > PostgreSQL? See PostgreSQL FAQ. Section 4.4) What is the maximum size for a row, a table, and a database? http://www.postgresql.org/docs/faqs.FAQ.html#4.4 -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Denis G Dudhia wrote: > Hello There... > > I am new to PostgreSQL. > > I usually check out negative sides of any software or system, before > implementing it or using it. > Compared to MySQL, I can't think of any downsides. All relevant usability issues have been solved, though there are some functions like INTERVAL that are not supported (see my migration guide at http://www.metatrontech.com/wpapers/) Compared to Oracle, DB2, Terradata, etc, PostgreSQL lacks support for parallelism in queries, thereby making it a poor choice for certain types of data warehousing applications. This being said, 8.1, depending on the application, may meet your needs once it comes out if you are doing data warehousing. Josh Berkus can correct me if I am wrong, but I believe he is working on a version of the open source Bizgress with will have massively parallel queries. Multimaster async replication w/updates is a pain at the moment and mostly a set of kludges. I am not aware of any good sync. replication solutions for PostgreSQL at the moment. Does not have full XA support at the moment (does have TPC). May not be the best choice on Windows for production use, though for development, it should be adequate. Hope this helps, Chris Travers Metatron Technology COnsulting > > I would like to know the limitations of PostgreSQL. > > Also, I would like to know, whether there is any limitations for the > number of records or the size of the tables or database in PostgreSQL? > > Thanks in Advance, > > Regards, > Denis > > > > <http://adworks.rediff.com/cgi-bin/AdWorks/sigclick.cgi/www.rediff.com/signature-home.htm/1507191490@Middle5?PARTNER=3>
On Wed, 2005-10-12 at 16:16, Chris Travers wrote: > Denis G Dudhia wrote: > > > Hello There... > > > > I am new to PostgreSQL. > > > > I usually check out negative sides of any software or system, before > > implementing it or using it. > > > Compared to MySQL, I can't think of any downsides. All relevant > usability issues have been solved, though there are some functions like > INTERVAL that are not supported (see my migration guide at > http://www.metatrontech.com/wpapers/) What, exactly, is the interval function in MySQL? IS that one that creates a sequence of numbers or whatnot? If so, there is an equivalent in 8.0 now. By the way, interval is a SQL reserved keyword, so it's surprising MySQL would choose to name a function after it. Thought I'd comment on this. According to the author of the innodb engine, innodb uses MVCC. OTOH, I consider innodb to be broken in production, due to issues with constant growth and no way to reclaim the lost space. This means that vacuuming, a minor annoyance in PostgreSQL, is a major issue for 24/7 mysql databases running on innodb, where they must be shut down and restarted to clear up the unused space in the innodb tablespace. About Mammoth, I'm pretty sure that when you purchase a support contract you get a copy of the source code. > Multimaster async replication w/updates is a pain at the moment and > mostly a set of kludges. There really are too many use cases for there to be a "simple" resolution to the problems presented by multi-master replication. It's a complex problem that creates more complex problems as you attempt to solve it. > I am not aware of any good sync. replication solutions for PostgreSQL at > the moment. pgpool does a good job. Many folks miss the fact that it can do replication as well as load balancing. pgcluster uses parts of pgpool to do its clustering as well. They are, however, statement level, not log level. > Does not have full XA support at the moment (does have TPC). I'd point out here that MySQL's XA support is quite primitive, and only useful for a fairly smaller number of cases. > May not be the best choice on Windows for production use, though for > development, it should be adequate. Agreed...
On Wed, Oct 12, 2005 at 04:49:59PM -0500, Scott Marlowe wrote: > On Wed, 2005-10-12 at 16:16, Chris Travers wrote: > > Compared to MySQL, I can't think of any downsides. All relevant > > usability issues have been solved, though there are some functions like > > INTERVAL that are not supported (see my migration guide at > > http://www.metatrontech.com/wpapers/) > > What, exactly, is the interval function in MySQL? IS that one that > creates a sequence of numbers or whatnot? If so, there is an equivalent > in 8.0 now. By the way, interval is a SQL reserved keyword, so it's > surprising MySQL would choose to name a function after it. Surprising? C'mon now, this is MySQL :-> Here's an excerpt from the MySQL documentation: INTERVAL(N,N1,N2,N3,...) Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is NULL. All arguments are treated as integers. It is required that N1 < N2 < N3 < ... < Nn for this function to work correctly. This is because a binary search is used (very fast). mysql> SELECT INTERVAL(23, 1, 15, 17, 30, 44, 200); -> 3 mysql> SELECT INTERVAL(10, 1, 10, 100, 1000); -> 2 mysql> SELECT INTERVAL(22, 23, 30, 44, 200); -> 0 -- Michael Fuhr
Wow, I must be missing something, because I cannot even dream up a case where I think I might find a use for the mysql INTERVAL() function, far less actually NEED it :) Terry Michael Fuhr wrote: > On Wed, Oct 12, 2005 at 04:49:59PM -0500, Scott Marlowe wrote: > >>On Wed, 2005-10-12 at 16:16, Chris Travers wrote: >> >>>Compared to MySQL, I can't think of any downsides. All relevant >>>usability issues have been solved, though there are some functions like >>>INTERVAL that are not supported (see my migration guide at >>>http://www.metatrontech.com/wpapers/) >> >>What, exactly, is the interval function in MySQL? IS that one that >>creates a sequence of numbers or whatnot? If so, there is an equivalent >>in 8.0 now. By the way, interval is a SQL reserved keyword, so it's >>surprising MySQL would choose to name a function after it. > > > Surprising? C'mon now, this is MySQL :-> > > Here's an excerpt from the MySQL documentation: > > INTERVAL(N,N1,N2,N3,...) > Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is > NULL. All arguments are treated as integers. It is required > that N1 < N2 < N3 < ... < Nn for this function to work > correctly. This is because a binary search is used (very fast). > > mysql> SELECT INTERVAL(23, 1, 15, 17, 30, 44, 200); > -> 3 > mysql> SELECT INTERVAL(10, 1, 10, 100, 1000); > -> 2 > mysql> SELECT INTERVAL(22, 23, 30, 44, 200); > -> 0 > -- Terry Fielder terry@greatgulfhomes.com Associate Director Software Development and Deployment Great Gulf Homes / Ashton Woods Homes Fax: (416) 441-9085
Scott Marlowe wrote: >On Wed, 2005-10-12 at 16:16, Chris Travers wrote: > > >>Denis G Dudhia wrote: >> >> >> >>>Hello There... >>> >>>I am new to PostgreSQL. >>> >>>I usually check out negative sides of any software or system, before >>>implementing it or using it. >>> >>> >>> >>Compared to MySQL, I can't think of any downsides. All relevant >>usability issues have been solved, though there are some functions like >>INTERVAL that are not supported (see my migration guide at >>http://www.metatrontech.com/wpapers/) >> >> > >What, exactly, is the interval function in MySQL? IS that one that >creates a sequence of numbers or whatnot? If so, there is an equivalent >in 8.0 now. By the way, interval is a SQL reserved keyword, so it's >surprising MySQL would choose to name a function after it. > > > It is sort of like LEAST and GREATEST but somewhat different.... For example, INTERVAL(45, 0, 20, 40, 80, 100) will return 3 (I think) because 45 is between 40 and 80. I don't know if it is specified in the standard or not or even if it is useful. Just thought I would mention it. >Thought I'd comment on this. > >According to the author of the innodb engine, innodb uses MVCC. >OTOH, I consider innodb to be broken in production, due to issues with >constant growth and no way to reclaim the lost space. > > Any sources on that? I would love to have info on that. >This means that vacuuming, a minor annoyance in PostgreSQL, is a major >issue for 24/7 mysql databases running on innodb, where they must be >shut down and restarted to clear up the unused space in the innodb >tablespace. > > > > No kidding. Would like more info. >>Multimaster async replication w/updates is a pain at the moment and >>mostly a set of kludges. >> >> > >There really are too many use cases for there to be a "simple" >resolution to the problems presented by multi-master replication. It's >a complex problem that creates more complex problems as you attempt to >solve it. > > I have come up with some ways of doing this but they are difficult. And the question is always "How good is good enough" > > >>I am not aware of any good sync. replication solutions for PostgreSQL at >>the moment. >> >> > >pgpool does a good job. Many folks miss the fact that it can do >replication as well as load balancing. pgcluster uses parts of pgpool >to do its clustering as well. They are, however, statement level, not >log level. > > I will remember that. > > >>Does not have full XA support at the moment (does have TPC). >> >> > >I'd point out here that MySQL's XA support is quite primitive, and only >useful for a fairly smaller number of cases. > > Again, I was comparing with DB2 and Oracle. One should consider all new features of MySQL to be both overmarketed and primitive for some time. Best Wishes, Chris Travers Metatron Technology Consulting
On Wed, 2005-10-12 at 16:49 -0500, Scott Marlowe wrote: > On Wed, 2005-10-12 at 16:16, Chris Travers wrote: > > Denis G Dudhia wrote: > > > > > Hello There... > > > > > > I am new to PostgreSQL. > > > > > > I usually check out negative sides of any software or system, before > > > implementing it or using it. <snip request for pimples> <snip pimple list> Remarkable how exchanges like that continue to leave an impression with me. I can't believe I'm the only one, though.. Somebody says, "Hey, can you tell me what's wrong with your product?" And there's an answer from the product's camp: "Well, the consensus is pretty much this...". Simple request, honest answer. That's the kind of currency that should be coined everywhere. PostgreSQL has a strong and growing following because of its quality and capability, and more power to it, I say. But there are plenty of us who have made the choice between equal functionalities, based completely upon the quality I've just mentioned. Outstanding, folks. (as well as outstanding folks) Andy
[snipped] > May not be the best choice on Windows for production use, though for > development, it should be adequate. Are there known issues with the Windows version for production use, or is this simply because of the relative newness of the Windows-native version? - Bill
Bill Bartlett wrote: > [snipped] > >> May not be the best choice on Windows for production use, though for >> development, it should be adequate. > > Are there known issues with the Windows version for production use, or > is this simply because of the relative newness of the Windows-native > version? Performance isn't as good (yet) as on the *nix releases, and there are issues with the autovacuum module and unicode support. Might be worth checking out the 8.1 beta release if you have specific requirements. -- Richard Huxton Archonet Ltd
I would like to believe that its the latter, it should improve with time. On Wed, 12 Oct 2005, Bill Bartlett wrote: > >[snipped] > >> May not be the best choice on Windows for production use, though for >> development, it should be adequate. > >Are there known issues with the Windows version for production use, or >is this simply because of the relative newness of the Windows-native >version? > >- Bill > > >---------------------------(end of broadcast)--------------------------- >TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- Aly S.P Dharshi aly.dharshi@telus.net "A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
> > [snipped] > > > >> May not be the best choice on Windows for production use, > though for > >> development, it should be adequate. > > > > Are there known issues with the Windows version for > production use, or > > is this simply because of the relative newness of the > Windows-native > > version? > > Performance isn't as good (yet) as on the *nix releases, and > there are issues with the autovacuum module and unicode > support. Might be worth checking out the 8.1 beta release if > you have specific requirements. Specifically, both autovacuum and unicode support is fixed in 8.1. Along with other things, of course , but those are the two big ones biting win32 people in 8.0. (IIRC nothing specific done for win32 regarding performance since 8.0.x, for example - but it was done in one of the early 8.0.x releases) //Magnus
On Wed, 2005-10-12 at 23:11, Chris Travers wrote: > Scott Marlowe wrote: > >Thought I'd comment on this. > > > >According to the author of the innodb engine, innodb uses MVCC. > >OTOH, I consider innodb to be broken in production, due to issues with > >constant growth and no way to reclaim the lost space. > > > > > Any sources on that? I would love to have info on that. Just do a google search for innodb and mvcc. Lots of soruces, not sure how good any one might be. > >This means that vacuuming, a minor annoyance in PostgreSQL, is a major > >issue for 24/7 mysql databases running on innodb, where they must be > >shut down and restarted to clear up the unused space in the innodb > >tablespace. > > > > > > > > > No kidding. Would like more info. Setup a MySQL database with 5.0.13 and insert and delete a lot in an mvcc table. The innodb tablespace just grows and grows. > >>Multimaster async replication w/updates is a pain at the moment and > >>mostly a set of kludges. > >> > >> > > > >There really are too many use cases for there to be a "simple" > >resolution to the problems presented by multi-master replication. It's > >a complex problem that creates more complex problems as you attempt to > >solve it. > > > > > I have come up with some ways of doing this but they are difficult. And > the question is always "How good is good enough" Right, and the real issue is that what might be a good compromise between performance and data security in one situation might be an awful choice in another. So, multi-master replication isn't likely to become a plug in module for postgresql any time soon. > >>Does not have full XA support at the moment (does have TPC). > >> > >> > > > >I'd point out here that MySQL's XA support is quite primitive, and only > >useful for a fairly smaller number of cases. > > > > > Again, I was comparing with DB2 and Oracle. One should consider all new > features of MySQL to be both overmarketed and primitive for some time. True. PostgreSQL IS working on XA support (specifically the jdbc list is.)
On Wed, 2005-10-12 at 20:08, Michael Fuhr wrote: > On Wed, Oct 12, 2005 at 04:49:59PM -0500, Scott Marlowe wrote: > > On Wed, 2005-10-12 at 16:16, Chris Travers wrote: > > > Compared to MySQL, I can't think of any downsides. All relevant > > > usability issues have been solved, though there are some functions like > > > INTERVAL that are not supported (see my migration guide at > > > http://www.metatrontech.com/wpapers/) > > > > What, exactly, is the interval function in MySQL? IS that one that > > creates a sequence of numbers or whatnot? If so, there is an equivalent > > in 8.0 now. By the way, interval is a SQL reserved keyword, so it's > > surprising MySQL would choose to name a function after it. > > Surprising? C'mon now, this is MySQL :-> > > Here's an excerpt from the MySQL documentation: > > INTERVAL(N,N1,N2,N3,...) > Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is > NULL. All arguments are treated as integers. It is required > that N1 < N2 < N3 < ... < Nn for this function to work > correctly. This is because a binary search is used (very fast). > > mysql> SELECT INTERVAL(23, 1, 15, 17, 30, 44, 200); > -> 3 > mysql> SELECT INTERVAL(10, 1, 10, 100, 1000); > -> 2 > mysql> SELECT INTERVAL(22, 23, 30, 44, 200); I could see how it might be possible to make a two argument user defined function that took an argument like: select intvl(10,'20 30 40 50 60'); so that the multiple arguments are really just a space or comma separated list fed to the function. I wouldn't name it interval though. :)
On Thu, Oct 13, 2005 at 10:46:29AM -0500, Scott Marlowe wrote: > choice in another. So, multi-master replication isn't likely to become > a plug in module for postgresql any time soon. It's not even a thing, so it can't become a plug-in. Consider just two kinds of multi-master: 1. Oracle's RAC. This is a shared-disk, engine-failover kind of multi-master. It provides a certain amount of scaling, but nothing I've seen or heard suggests that the license cost couldn't just as easily and effectively be thrown at larger hardware for better scaling. The really big reason to use RAC is five-nines situations: you're trying to make sure that even unlikely failures of your machines never cause the database to stop working (for suitably lawyer-understood values of "stop". RAC remastering is not a zero-cost, nor even invisible, operation. But from an application perspective, it can be made to look like "database is slow" as opposed to "database crashed"). 2. Disconnected sales forces with local copies of some portion of the sales database. This is completely distributed database use, with potential for conflicts and an associated need for conflict resolution strategies. These are different sorts of systems addressing completely different use cases. But they're both potentially marketed as "multi-master". Often, a manager asking for multi-master thinks s/he is buying all of this; which desire is probably impossible to satisfy with one piece of software (as opposed to one thing all called by the same name by the marketing department). A -- Andrew Sullivan | ajs@crankycanuck.ca The whole tendency of modern prose is away from concreteness. --George Orwell
Scott Marlowe <smarlowe@g2switchworks.com> writes: > I could see how it might be possible to make a two argument user defined > function that took an argument like: > select intvl(10,'20 30 40 50 60'); > so that the multiple arguments are really just a space or comma > separated list fed to the function. I wouldn't name it interval though. Use an array: select intvl(10, array[20,30,40,50,60]); I think you could even code this as a generic polymorphic function: create function intvl(anyelement, anyarray) returns anyelement ... Anybody feel like filling it in with a standard binary search algorithm? regards, tom lane
What's the point of a binary search if the list is small enough to fit on a line or two? And if a query can be substituted for N1-NN, you have to read all the values anyway, and then the function is trivially expressed as a normal query with no decrease in speed. -- Dean > On Wed, 2005-10-12 at 20:08, Michael Fuhr wrote: > >> Here's an excerpt from the MySQL documentation: >> INTERVAL(N,N1,N2,N3,...) >> Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is >> NULL. All arguments are treated as integers. It is required >> that N1 < N2 < N3 < ... < Nn for this function to work >> correctly. This is because a binary search is used (very fast). >>
I have to admit, my thoughts on it were to build a query with case statements in it and execute it. That sounds about like you're proposing, right? On Thu, 2005-10-13 at 11:30, Dean Gibson (DB Administrator) wrote: > What's the point of a binary search if the list is small enough to fit > on a line or two? And if a query can be substituted for N1-NN, you have > to read all the values anyway, and then the function is trivially > expressed as a normal query with no decrease in speed. > > -- Dean > > On Wed, 2005-10-12 at 20:08, Michael Fuhr wrote: > > > >> Here's an excerpt from the MySQL documentation: > >> INTERVAL(N,N1,N2,N3,...) > >> Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is > >> NULL. All arguments are treated as integers. It is required > >> that N1 < N2 < N3 < ... < Nn for this function to work > >> correctly. This is because a binary search is used (very fast). > >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
On Thu, Oct 13, 2005 at 10:53:51AM -0700, Chris Travers wrote: > Now, what about PgPool as a multimaster sync replication solution? Sure > it is statement level.... But is there any reason why you cannot have > multiple PgPool instances running against a number of DB servers? Well, to begin with, you have a serious race condition: pgpool begins T1 on M1 and M2. Someone logs into M2 and does some work in T2. M1 completes the work of T1. M2 completes the work of T2. pgpool issues COMMIT. M1 replies with the COMMIT. M2 detects a deadlock when T2 tries to COMMIT. Now what? There's nothing to prevent this in the system, as near as I can see, so it's just not bullet proof enough for the cases where people really, really need only five minutes of down time a year. If you _really_ needed that, you'd be willing (and able) to pay the costs. Of course, we can do what we can to make those costs go down. :-> But they're not that low yet. Also, there is still (or was last I checked) a limitation on the number of machines pgpool could address, and there are some stability and reliability issues we've seen. It's a great piece of code, don't get me wrong; but it's not stable enough yet to bet millions of dollars on. ObNit: ORAC isn't really synchronous; it just looks that way. > This is multimaster async replication. But it can be further broken > down into four types: Sure; I think you could break it even smaller sub-types, if you worked at it, too. For example, an async system that tolerates farily brief interruptions in two-way communications is very different from the one where your sales force (or your Palm) shows up after a week and dumps a whole bunch of new conflicts on your lap. This second case is something Slony wouldn't tolerate; but I think a relatively-high availability would probably work with some multi-way conflict resolution, if someone were willing to build it. That wasn't the itch Afilias needed scratching, because of the kinds of problems we have to solve (to begin with, exactly one person may be the registrant of record of a domain name at any one time, so conflict resolution is just not allowed in our problem set: we have to maintain global uniqueness). But we did have some discussions about how one might file the corners of the hole to make it square enough for the peg. I think it's possible, if someone volunteers to do the work (maybe in a sub-project, maybe as a co-operative project). I don't have the problem, so I can't justify the staff time. So if someone _else_ has the problem, maybe s/he can. A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Andrew Sullivan <ajs@crankycanuck.ca> writes: > On Thu, Oct 13, 2005 at 10:53:51AM -0700, Chris Travers wrote: >> Now, what about PgPool as a multimaster sync replication solution? Sure >> it is statement level.... But is there any reason why you cannot have >> multiple PgPool instances running against a number of DB servers? > Well, to begin with, you have a serious race condition: > pgpool begins T1 on M1 and M2. > Someone logs into M2 and does some work in T2. > M1 completes the work of T1. > M2 completes the work of T2. > pgpool issues COMMIT. > M1 replies with the COMMIT. > M2 detects a deadlock when T2 tries to COMMIT. > Now what? This particular issue is fixable as of 8.1: pgpool should be using 2-phase commit. regards, tom lane
On Thu, Oct 13, 2005 at 02:30:11PM -0400, Tom Lane wrote: > > This particular issue is fixable as of 8.1: pgpool should be using > 2-phase commit. Sure, but if we're talking about what people can put in production today, I don't think 8.1 + pgpool with 2-phase commit is on the list. (We got started down this road, I think, because someone was wanting a multi-master system right away.) But I think it will be soon. A -- Andrew Sullivan | ajs@crankycanuck.ca A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton
Ok. I was completely unclear regarding how I think Slony would fit into a multimaster async solution with updates. Andrew Sullivan wrote: >Also, there is still (or was last I checked) a limitation on the >number of machines pgpool could address, and there are some stability >and reliability issues we've seen. It's a great piece of code, don't >get me wrong; but it's not stable enough yet to bet millions of >dollars on. > >ObNit: ORAC isn't really synchronous; it just looks that way. > > > If it is shared disk, is it even really replication? >>This is multimaster async replication. But it can be further broken >>down into four types: >> >> > >Sure; I think you could break it even smaller sub-types, if you >worked at it, too. For example, an async system that tolerates >farily brief interruptions in two-way communications is very >different from the one where your sales force (or your Palm) shows up >after a week and dumps a whole bunch of new conflicts on your lap. > Well, a disconnected sales force is interesting in that you will almost certainly have a "star" topology in your replication environment. After all, your sales force is not replicating the entire sales database onto their laptops we hope.... I think you could still use Slony as a sort of pre-conflict-resolution update-log replication solution. Then you could have custom triggers on your update logs to actually attempt conflict resolution and handle failure on that area. I.e. anytime you have updates, you are going to have a replication solution that operates in the following way: 1) Replicates changed data (updatelog) 2) Resolves conflicts 3) Makes changes to authoritative data sets 4) Replicates datasets back. Stages 1 and 4 could be handled by Slony, while 2 and 3 would require custom triggers. In essence this is really master/slave that appears multimaster. You will have tradeoffs here in granularity of conflict resolution versus performance. For a smaller number of nodes (maybe one at each branch office, where you might have a small level of interruption, you could use a mesh topology, but it uses the same 4 stages up above. Just you will probably have a separate update log for every other node, and use a similar solution. Again, you will have a tradeoff between granularity in conflict resolution and performance. > >This second case is something Slony wouldn't tolerate; but I think a >relatively-high availability would probably work with some multi-way >conflict resolution, if someone were willing to build it. > Why would Slony have to do your conflict resolution at all? Why not just use it to replicate update logs and data sets, and leave conflict resolution to custom triggers? Conflict resolution is really going to be the area where you have individual needs anyway. Slony makes a good piece of a solution but it is just that, a piece. Because multimaster async replication is unlikely to be a one-size fits all area, any out of the box solution will likely be incomplete, unwieldy, or worse. Maybe a pgfoundry project consisting of sample setups might be useful as starting points, however. > That >wasn't the itch Afilias needed scratching, because of the kinds of >problems we have to solve (to begin with, exactly one person may be >the registrant of record of a domain name at any one time, so >conflict resolution is just not allowed in our problem set: we have >to maintain global uniqueness). But we did have some discussions >about how one might file the corners of the hole to make it square >enough for the peg. I think it's possible, if someone volunteers to >do the work (maybe in a sub-project, maybe as a co-operative >project). I don't have the problem, so I can't justify the staff >time. So if someone _else_ has the problem, maybe s/he can. > > Well, I want to thank Afilias for such a useful tool. Again, if we all could make such contributions, we would take over the world in no time whatsoever..... Again, I see Slony as a very useful piece of a multimaster async replication solution, but as you point out, Slony can't handle it by itself. This isn't in my view a question of a square peg in a round hole, but an issue of a useful piece of a solution. Best Wishes, Chris Travers Metatron Technology Consulting
On Thu, Oct 13, 2005 at 12:01:19PM -0700, Chris Travers wrote: > Stages 1 and 4 could be handled by Slony, while 2 and 3 would require > custom triggers. In essence this is really master/slave that appears > multimaster. You will have tradeoffs here in granularity of conflict > resolution versus performance. This sounds very similar to an idea Josh Berkus sketched for me while getting off a train at OSCON. So I bet I didn't get all the details :) But I think it's likely worth pursuing. > Why would Slony have to do your conflict resolution at all? Why not > just use it to replicate update logs and data sets, and leave conflict > resolution to custom triggers? Conflict resolution is really going to That would probably work. The real thing Slony can't stand is really long periods of disconnection. You might be able to hack the log shipping approach to help there, though. In any case, it's probably a subject for the Slony lists, once a full-ish proposal is ready. > Well, I want to thank Afilias for such a useful tool. Again, if we all Afilias didn't exactly contribute it -- it was always intended to be a community project, and we just contributed some seeds (read "Jan" and later "Chris") to get it growing. That seems to have worked -- I think there is now as much contribution from non-Afilias folks as Afilias folks, and I think Slony is better for it. A -- Andrew Sullivan | ajs@crankycanuck.ca Information security isn't a technological problem. It's an economics problem. --Bruce Schneier
Here's one (of many) solutions: CREATE OR REPLACE FUNCTION range( ANYELEMENT, ANYARRAY ) RETURNS INTEGER LANGUAGE SQL AS 'SELECT CASE array_upper( $2, 1 ) WHEN 1 THEN 0 ELSE range( $1, $2[ 1:(array_upper( $2, 1 ) - 1) ] ) END + CASE WHEN $1 > $2[ array_upper( $2, 1 ) ] THEN 1 ELSE 0 END; '; Called as SELECT range( 25, ARRAY[ 10, 20, 30, 40, 50 ] ); -- Dean On 2005-10-13 09:48, Scott Marlowe wrote: > I have to admit, my thoughts on it were to build a query with case statements in it and execute it. That sounds aboutlike you're proposing, right? > > On Thu, 2005-10-13 at 11:30, Dean Gibson (DB Administrator) wrote: > >> What's the point of a binary search if the list is small enough to fit on a line or two? And if a query can be substitutedfor N1-NN, you have to read all the values anyway, and then the function is trivially expressed as a normal querywith no decrease in speed. >> >> -- Dean >> >>> On Wed, 2005-10-12 at 20:08, Michael Fuhr wrote: >>> >>> >>>> Here's an excerpt from the MySQL documentation: >>>> INTERVAL(N,N1,N2,N3,...) >>>> Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is >>>> NULL. All arguments are treated as integers. It is required >>>> that N1 < N2 < N3 < ... < Nn for this function to work >>>> correctly. This is because a binary search is used (very fast). >>>> >>>>
> On Thu, Oct 13, 2005 at 10:53:51AM -0700, Chris Travers wrote: > > Now, what about PgPool as a multimaster sync replication solution? Sure > > it is statement level.... But is there any reason why you cannot have > > multiple PgPool instances running against a number of DB servers? > > Well, to begin with, you have a serious race condition: > > pgpool begins T1 on M1 and M2. > Someone logs into M2 and does some work in T2. > M1 completes the work of T1. > M2 completes the work of T2. > pgpool issues COMMIT. > M1 replies with the COMMIT. > M2 detects a deadlock when T2 tries to COMMIT. > > Now what? There's nothing to prevent this in the system, as near as > I can see, so it's just not bullet proof enough for the cases where > people really, really need only five minutes of down time a year. If > you _really_ needed that, you'd be willing (and able) to pay the > costs. Of course, we can do what we can to make those costs go down. > :-> But they're not that low yet. Why pgpool should bother? pgpool supposes every transaction should go through pgpool. Your example sounds like someone logs into M2 and tries to shut down it. > Also, there is still (or was last I checked) a limitation on the > number of machines pgpool could address, and there are some stability > and reliability issues we've seen. It's a great piece of code, don't > get me wrong; but it's not stable enough yet to bet millions of > dollars on. I don't know what you kind of problem you are talking about, but... If you find problems, please post it to pgpool-general and let's solve it. That's the open source way. -- SRA OSS, Inc. Japan Tatsuo Ishii > ObNit: ORAC isn't really synchronous; it just looks that way. > > > This is multimaster async replication. But it can be further broken > > down into four types: > > Sure; I think you could break it even smaller sub-types, if you > worked at it, too. For example, an async system that tolerates > farily brief interruptions in two-way communications is very > different from the one where your sales force (or your Palm) shows up > after a week and dumps a whole bunch of new conflicts on your lap. > This second case is something Slony wouldn't tolerate; but I think a > relatively-high availability would probably work with some multi-way > conflict resolution, if someone were willing to build it. That > wasn't the itch Afilias needed scratching, because of the kinds of > problems we have to solve (to begin with, exactly one person may be > the registrant of record of a domain name at any one time, so > conflict resolution is just not allowed in our problem set: we have > to maintain global uniqueness). But we did have some discussions > about how one might file the corners of the hole to make it square > enough for the peg. I think it's possible, if someone volunteers to > do the work (maybe in a sub-project, maybe as a co-operative > project). I don't have the problem, so I can't justify the staff > time. So if someone _else_ has the problem, maybe s/he can. > > A > > -- > Andrew Sullivan | ajs@crankycanuck.ca > I remember when computers were frustrating because they *did* exactly what > you told them to. That actually seems sort of quaint now. > --J.D. Baldwin > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
On Fri, Oct 14, 2005 at 07:48:00AM +0900, Tatsuo Ishii wrote: > > Why pgpool should bother? pgpool supposes every transaction should go > through pgpool. Your example sounds like someone logs into M2 and tries > to shut down it. But because there's no enforcement of "every transaction should go through pgpool", it's not enough for the managers who are ultimately responsible for deciding on system design. In the hypothetical case, we're aiming at multimaster systems that are there for reliability, not performance. Decreasing the reliance on fault-tolerant hardware by increasing the potential for human error does not solve that problem. > I don't know what you kind of problem you are talking about, but... > > If you find problems, please post it to pgpool-general and let's solve > it. That's the open source way. We have been (my colleague Brad is the one who's been working on this). But for something to qualify for real production-grade use, it needs to be rock solid stable in heavy use for a considerable period of time. We're not there yet, is all I'm suggesting. (This principle is why it's also a good thing that Red Hat Enterprise isn't always completely up to date with the community sources.) A -- Andrew Sullivan | ajs@crankycanuck.ca It is above all style through which power defers to reason. --J. Robert Oppenheimer
Andrew Sullivan wrote: >On Thu, Oct 13, 2005 at 10:46:29AM -0500, Scott Marlowe wrote: > > >>choice in another. So, multi-master replication isn't likely to become >>a plug in module for postgresql any time soon. >> >> > >It's not even a thing, so it can't become a plug-in. > > I was referring specifically to Multimaster Async Replication. >Consider just two kinds of multi-master: > >1. Oracle's RAC. This is a shared-disk, engine-failover kind of >multi-master. It provides a certain amount of scaling, but nothing >I've seen or heard suggests that the license cost couldn't just as >easily and effectively be thrown at larger hardware for better >scaling. The really big reason to use RAC is five-nines situations: >you're trying to make sure that even unlikely failures of your >machines never cause the database to stop working (for suitably >lawyer-understood values of "stop". RAC remastering is not a >zero-cost, nor even invisible, operation. But from an application >perspective, it can be made to look like "database is slow" as >opposed to "database crashed"). > > So this is basically a multimaster synchronous replication solution utilizing a shared disk architecture. I generally agree with your assessment that the license costs could be better spent on redundant hardware and more scalable hardware. Also if the shared disk fails, you may lose everything after your last backup. Now, what about PgPool as a multimaster sync replication solution? Sure it is statement level.... But is there any reason why you cannot have multiple PgPool instances running against a number of DB servers? >2. Disconnected sales forces with local copies of some portion >of the sales database. This is completely distributed database use, >with potential for conflicts and an associated need for conflict >resolution strategies. > > This is multimaster async replication. But it can be further broken down into four types: 1) Insert-only, small number of nodes (not too hard to do with Slony-I). 2) Insert-only, large number of nodes (a real pain to do with Slony-I, could become unmanageable easily, but I suppose one could build automated management tools) 3) Insert/Update, small number of nodes. Probably would require a custom solution on top of Slony-I 4) Insert/update, large number of nodes. Not sure if this is possible to manage effectively under any circumstances. Best Wishes, Chris Travers Metatron Technoloy Consulting
Attachment
<snip>
Of course thats highly unlikely because in Oracle you have _two_ complete copies of your active database from your last backup with archive redo logs, so in reality you would have to loose your _entire_ disk cluster, which if you have things organised by the book, you would have archive redo on a seperate controller, and preferably on a seperate array for that very reason.
Oracle though this out pretty well ;)
>multi-master. It provides a certain amount of scaling, but nothing
>I've seen or heard suggests that the license cost couldn't just as
>easily and effectively be thrown at larger hardware for better
>scaling. The really big reason to use RAC is five-nines situations:
>you're trying to make sure that even unlikely failures of your
>machines never cause the database to stop working (for suitably
>lawyer-understood values of "stop". RAC remastering is not a
>zero-cost, nor even invisible, operation. But from an application
>perspective, it can be made to look like "database is slow" as
>opposed to "database crashed").
>
>
So this is basically a multimaster synchronous replication solution
utilizing a shared disk architecture. I generally agree with your
assessment that the license costs could be better spent on redundant
hardware and more scalable hardware. Also if the shared disk fails, you
may lose everything after your last backup.
Of course thats highly unlikely because in Oracle you have _two_ complete copies of your active database from your last backup with archive redo logs, so in reality you would have to loose your _entire_ disk cluster, which if you have things organised by the book, you would have archive redo on a seperate controller, and preferably on a seperate array for that very reason.
Oracle though this out pretty well ;)
<snip>
> But because there's no enforcement of "every transaction should go > through pgpool", it's not enough for the managers who are ultimately > responsible for deciding on system design. In the hypothetical case, > we're aiming at multimaster systems that are there for reliability, > not performance. Decreasing the reliance on fault-tolerant hardware > by increasing the potential for human error does not solve that > problem. Enforcement? There would be plenty of ways to achieve that. For example, you could set pg_hba.conf so that on ly the host where pgpool is running on could connect to the host where postmaster is running on. > We have been (my colleague Brad is the one who's been working on > this). But for something to qualify for real production-grade use, > it needs to be rock solid stable in heavy use for a considerable > period of time. We're not there yet, is all I'm suggesting. (This > principle is why it's also a good thing that Red Hat Enterprise isn't > always completely up to date with the community sources.) Right. It's your freedom that you do not use pgpool until you think it's solid enough. -- SRA OSS, Inc. Japan Tatsuo Ishii
On Thu, 2005-10-13 at 17:48, Tatsuo Ishii wrote: > > On Thu, Oct 13, 2005 at 10:53:51AM -0700, Chris Travers wrote: > > > Now, what about PgPool as a multimaster sync replication solution? Sure > > > it is statement level.... But is there any reason why you cannot have > > > multiple PgPool instances running against a number of DB servers? > > > > Well, to begin with, you have a serious race condition: > > > > pgpool begins T1 on M1 and M2. > > Someone logs into M2 and does some work in T2. > > M1 completes the work of T1. > > M2 completes the work of T2. > > pgpool issues COMMIT. > > M1 replies with the COMMIT. > > M2 detects a deadlock when T2 tries to COMMIT. > > > > Now what? There's nothing to prevent this in the system, as near as > > I can see, so it's just not bullet proof enough for the cases where > > people really, really need only five minutes of down time a year. If > > you _really_ needed that, you'd be willing (and able) to pay the > > costs. Of course, we can do what we can to make those costs go down. > > :-> But they're not that low yet. > > Why pgpool should bother? pgpool supposes every transaction should go > through pgpool. Your example sounds like someone logs into M2 and tries > to shut down it. I had the same thought. Every time I've set up a system with pgpool, I've always configured the db servers on an odd port, cut it off with a firewall to anything but the pgpool machine and set it to only answer to pgpool's IP and only let users access the system through pgpool. USers accessing machines behind the scenes is a VERY bad idea. It's not a pgpool bug, is a user bug. :)
On Fri, Oct 14, 2005 at 11:54:19PM +0900, Tatsuo Ishii wrote: > Enforcement? There would be plenty of ways to achieve that. For > example, you could set pg_hba.conf so that on ly the host where pgpool > is running on could connect to the host where postmaster is running > on. That just changes the problem to someone logging in from that host. (This isn't a theoretical problem, by the way; it's an objection that I've heard from people.) > Right. It's your freedom that you do not use pgpool until you think > it's solid enough. And my managers approve it :) I should note, for the record, that I'm extremely impressed with pgpool. I just think we have some room to grow before we can say we have something to really compete with the commercial multimaster systems. a -- Andrew Sullivan | ajs@crankycanuck.ca The plural of anecdote is not data. --Roger Brinner
On Fri, Oct 14, 2005 at 10:20:41AM -0500, Scott Marlowe wrote: > USers accessing machines behind the scenes is a VERY bad idea. It's not > a pgpool bug, is a user bug. :) The problem with this glib answer is that we are talking about systems where such a "user bug" can cost people millions of dollars. They want the _machine_ to prevent the user bug. That's what they think they're buying, and my understanding is that some of the other systems provide greater protection. Remember, a five-nines system means five minutes of downtime, all told, per year. People who really need that are willing to pay for it, because it's worth it. Most of the time, it isn't, and most so called five-nines systems really aren't. (There is no way you could really claim reliable five nines performance on the in-memory-only MySQL system, for instance: it'd be too risky, unless you could guarantee you'd never exceed your memory. Who's willing to guarantee the data set won't grow unexpectedly?) That said, using pgpool for higher-reliability, we-checked-it-real- good systems isn't a bad idea; on the contrary. Just let's not pretend it's something that it isn't really. A -- Andrew Sullivan | ajs@crankycanuck.ca In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland
On Fri, 2005-10-14 at 10:48, Andrew Sullivan wrote: > On Fri, Oct 14, 2005 at 11:54:19PM +0900, Tatsuo Ishii wrote: > > Enforcement? There would be plenty of ways to achieve that. For > > example, you could set pg_hba.conf so that on ly the host where pgpool > > is running on could connect to the host where postmaster is running > > on. > > That just changes the problem to someone logging in from that host. > (This isn't a theoretical problem, by the way; it's an objection that > I've heard from people.) You're users shouldn't be able to do that. If they can, you've set up your system wrong. Only the DBA should have access to that machine. This is the same kind of problem as having a user log into a slony replicant and issue the command "drop schema _clustername cascade" being a problem. It's a permission / user problem. > > Right. It's your freedom that you do not use pgpool until you think > > it's solid enough. > > And my managers approve it :) I should note, for the record, that > I'm extremely impressed with pgpool. I just think we have some room > to grow before we can say we have something to really compete with > the commercial multimaster systems. Agreed. pgpool is great. But it's not RAC (That's a good thing, actually :)
Please allow me to wad into this discussion giving it a different view. LDAP is a form of a database, it has Multi-Master afaik it runs somewhat decently for two masters, can't we use a similar setup as say Fedora Directory Server or OpenLDAP's replication strategy and with some modification/improvement use it to help PostgreSQL do Multi-Master/Replication ? Am I missing the point ? Cheers, Aly. On Fri, 14 Oct 2005, Scott Marlowe wrote: >On Thu, 2005-10-13 at 17:48, Tatsuo Ishii wrote: >> > On Thu, Oct 13, 2005 at 10:53:51AM -0700, Chris Travers wrote: >> > > Now, what about PgPool as a multimaster sync replication solution? Sure >> > > it is statement level.... But is there any reason why you cannot have >> > > multiple PgPool instances running against a number of DB servers? >> > >> > Well, to begin with, you have a serious race condition: >> > >> > pgpool begins T1 on M1 and M2. >> > Someone logs into M2 and does some work in T2. >> > M1 completes the work of T1. >> > M2 completes the work of T2. >> > pgpool issues COMMIT. >> > M1 replies with the COMMIT. >> > M2 detects a deadlock when T2 tries to COMMIT. >> > >> > Now what? There's nothing to prevent this in the system, as near as >> > I can see, so it's just not bullet proof enough for the cases where >> > people really, really need only five minutes of down time a year. If >> > you _really_ needed that, you'd be willing (and able) to pay the >> > costs. Of course, we can do what we can to make those costs go down. >> > :-> But they're not that low yet. >> >> Why pgpool should bother? pgpool supposes every transaction should go >> through pgpool. Your example sounds like someone logs into M2 and tries >> to shut down it. > >I had the same thought. Every time I've set up a system with pgpool, >I've always configured the db servers on an odd port, cut it off with a >firewall to anything but the pgpool machine and set it to only answer to >pgpool's IP and only let users access the system through pgpool. > >USers accessing machines behind the scenes is a VERY bad idea. It's not >a pgpool bug, is a user bug. :) > >---------------------------(end of broadcast)--------------------------- >TIP 2: Don't 'kill -9' the postmaster > -- Aly S.P Dharshi aly.dharshi@telus.net "A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
> On Fri, 2005-10-14 at 10:48, Andrew Sullivan wrote: > > On Fri, Oct 14, 2005 at 11:54:19PM +0900, Tatsuo Ishii wrote: > > > Enforcement? There would be plenty of ways to achieve that. For > > > example, you could set pg_hba.conf so that on ly the host where pgpool > > > is running on could connect to the host where postmaster is running > > > on. > > > > That just changes the problem to someone logging in from that host. > > (This isn't a theoretical problem, by the way; it's an objection that > > I've heard from people.) > > You're users shouldn't be able to do that. If they can, you've set up > your system wrong. Only the DBA should have access to that machine. > This is the same kind of problem as having a user log into a slony > replicant and issue the command "drop schema _clustername cascade" being > a problem. It's a permission / user problem. > > > > Right. It's your freedom that you do not use pgpool until you think > > > it's solid enough. > > > > And my managers approve it :) I should note, for the record, that > > I'm extremely impressed with pgpool. I just think we have some room > > to grow before we can say we have something to really compete with > > the commercial multimaster systems. > > Agreed. pgpool is great. But it's not RAC (That's a good thing, > actually :) BTW, the reason why I myself stick with pgpool is there's no perfect or acceptable replication solution for PostgreSQL (please do not talk about RAC or MySQL Cluster. I hate them:-). Slony-I seems great but after all it's an async. PGCluster is even greater. However the performance for updation is too poor(don't get me wrong. It's a great product. The performance for SELECT is amazing, and has truly high-avilabilty). Maybe Slony-II is one of the hope, but I have no idea how the performance is... -- SRA OSS, Inc. Japan Tatsuo Ishii
On Fri, Oct 14, 2005 at 11:16:36AM -0500, Scott Marlowe wrote: > You're users shouldn't be able to do that. If they can, you've set up > your system wrong. Only the DBA should have access to that machine. And DBAs aren't users? Oftentimes, a big goal is to protect against operator error. DBAs who are called to resolve a problem at 2 AM are _exactly_ the people we're protecting against. No, you can never completely lock down a system to protect against root doing 'rm -rf /'. But you can make it harder, and this approach doesn't do that well enough to be able to advertise that you can't muck with the system by accident. (Note that erserver had this problem, too; it's one of the things we were at some pains to prevent in Slony-I. We didn't get it perfect, though, and there are gaps in that system as a result. It'd be even more dangerous in a multimaster system.) And again, this is not to say there are any flies on pgpool. A -- Andrew Sullivan | ajs@crankycanuck.ca The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
On Sat, Oct 15, 2005 at 01:33:22AM +0900, Tatsuo Ishii wrote: > > BTW, the reason why I myself stick with pgpool is there's no > perfect or acceptable replication solution for PostgreSQL (please do > not talk about RAC or MySQL Cluster. I hate them:-). And that's part of why we're looking at it, too. Slony-I has a target, but it's not this one. > and has truly high-avilabilty). Maybe Slony-II is one of the hope, but > I have no idea how the performance is... Well, since it doesn't exist except in prototype yet, I think the performance is pretty bad :-) A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Andrew Sullivan wrote: >On Fri, Oct 14, 2005 at 10:20:41AM -0500, Scott Marlowe wrote: > > >>USers accessing machines behind the scenes is a VERY bad idea. It's not >>a pgpool bug, is a user bug. :) >> >> > >The problem with this glib answer is that we are talking about >systems where such a "user bug" can cost people millions of dollars. >They want the _machine_ to prevent the user bug. That's what they >think they're buying, and my understanding is that some of the other >systems provide greater protection. > > Out of curiosity, what is wrong with requiring client SSL certs to access the system and only issuing them to the PGPool system (or using a different CA if you need to issue client certs to the end users)? This doesn't eliminate the problem, but it does mean that you have to take the gun out of its sheath, take the safety off, and carefully aim it at your foot before you can get bit by this user bug.... In other words, it should eliminate inadvertant circumvention of the protections in place (though deliberate circumvention is always an issue when both sides are open source and the DBA has access to all systems-- after all, the DBA could rewrite the pg_hba to allow connecting from another host, and then issue the change, but then this isn't really the same objection, is it). Best Wishes, Chris Travers Metatron Technology Consulting
Andrew Sullivan wrote: >On Fri, Oct 14, 2005 at 11:54:19PM +0900, Tatsuo Ishii wrote: > > >>Enforcement? There would be plenty of ways to achieve that. For >>example, you could set pg_hba.conf so that on ly the host where pgpool >>is running on could connect to the host where postmaster is running >>on. >> >> > >That just changes the problem to someone logging in from that host. >(This isn't a theoretical problem, by the way; it's an objection that >I've heard from people.) > > Repeat after me... Data integrity, like security, is a process, not a product. Your objection is not an insoluble one. But it involves care in designing the system. If your DBA logs in to the actual DB host and issues a command, then that is the DBA's fault, and your DBA should know better. If he/she doesn't you have bigger problems than replication. A more valid concern is the question of not knowing that the DBA logged in from the wrong system to the wrong system. This can be greately reduced by intelligently engineering the network, and setting things up so that the system automatically logs in with a read-only account unless a username is specified.... I think that this risk can be reduced to something acceptable even in the world of high availability systems, but you are right, it is not so done out of the box. I think that this objection comes from the difference between FOSS where you have projects as toolkits for solutions and COTS Software where the packages are supposed to be complete solutions. > > > >>Right. It's your freedom that you do not use pgpool until you think >>it's solid enough. >> >> > >And my managers approve it :) I should note, for the record, that >I'm extremely impressed with pgpool. I just think we have some room >to grow before we can say we have something to really compete with >the commercial multimaster systems. > > Now, I never said it was a direct competitor by itself with commerical multimaster systems. I do think that with appropriate consulting services, network engineering, etc. services it could be with a little work, if these are the only objections you have :-) Best Wishes, Chris Travers Metatron Technology Consulting
Attachment
On Sat, Oct 15, 2005 at 06:04:54PM -0700, Chris Travers wrote: > Out of curiosity, what is wrong with requiring client SSL certs to > access the system and only issuing them to the PGPool system (or using a > different CA if you need to issue client certs to the end users)? This Hmm, I like this, although client SSL certs still didn't work with JDBC last I checked, so it won't solve all the problems. But you're right, this would mostly solve the problem I was thinking of, provided it was described correctly to the (mostly-clueless) technology rule-producers. > place (though deliberate circumvention is always an issue when both > sides are open source and the DBA has access to all systems-- after all, Open source has nothing to do with it, of course. Malicious attack by technical staff is something virtually no technology can guarantee against. The best you can usually get is adequate logging (and probably log monitoring) -- and we already provide that. A -- Andrew Sullivan | ajs@crankycanuck.ca The plural of anecdote is not data. --Roger Brinner
On Sat, Oct 15, 2005 at 05:58:20PM -0700, Chris Travers wrote: > Repeat after me... Data integrity, like security, is a process, not a > product. I understand that. The problem is not that _I_ don't understand it, but that the market for real, industrial multi-master replication is so far relatively small, and it looks like the market for RAC. Which means that one has to present the complete package to people who don't know what a Postgres is, and don't want to. It is _of course_ not impossible to provide for this. But one needs canned, "this is standard" ways to do these things, or else the people who are in a position to authorise Postgres use instead of something else are not going to do it. They don't want to be guinea pigs, and they can afford not to be. (Note that I'm not suggesting this is really a technical problem. It's just a social problem to which we have to present technology.) The sad truth of the matter is that if you want to alter the social circumstances in favour of some new, unconventional approach, your unconventional approach has to be _better than_ the existing convention, not merely as good as. And in the market we're talking about "cheaper" is not the main consideration for "better than". I think other arguments are useful -- access to source (and therefore auditability) is an obvious one -- but one needs to establish a well-known set of practices around these things if one wishes to be taken seriously for this kind of application. A -- Andrew Sullivan | ajs@crankycanuck.ca In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland
On 18 Oct 2005, at 15:59, Andrew Sullivan wrote: > On Sat, Oct 15, 2005 at 05:58:20PM -0700, Chris Travers wrote: > <snip> > And in the market we're talking > about "cheaper" is not the main consideration for "better than". I > think other arguments are useful -- access to source (and therefore > auditability) is an obvious one -- but one needs to establish a > well-known set of practices around these things if one wishes to be > taken seriously for this kind of application. The current market thinks like that, but I suspect that a lot of small to medium sized companies which don't want to get sucked into the Oracle consultancy / £16,000 per CPU licensing vacuum are currently prepared to simply choose a less good solution that just happens to kinda get the job done. The current market for these solutions is made up of high paying customers with very expensive data precisely because nobody has released a cheaper alternative. There is no serious variation in price in the market, and hence the client base doesn't change because there isn't any real innovation. Release a cheaper / free alternative and people will use it because they will have almost no reason not to. This means that cheaper and as good as does have a place in the market even if it's not a conventional solution. It just needs evidence and evangelism. The current market should not be the principal target. I do agree that there being a single solution under PostgreSQL to this problem is the best path though, it is attractive to everyone for there to by one way to do it, not just the current users of similar systems.
On Tue, Oct 18, 2005 at 04:37:23PM +0100, Alex Stapleton wrote: > Release a cheaper / free alternative and people will use it because > they will have almost no reason not to. This means that cheaper and > as good as does have a place in the market even if it's not a > conventional solution. It just needs evidence and evangelism. The > current market should not be the principal target. I agree with this; but I'm always concerned about something that's _almost_ as good as the competition, but not quite there, being pointed at as being "as good as" the competition. That way lies an invitation to the point-and-laugh responses that MySQL's so-called cluster system has garnered: it's too dangerous to use for many systems where the data is important enough, because the failure mode is "near-complete catastrophe". See another thread, where I talk about the strategy some companies may be using of lumping PostgreSQL in with other products, and then attacking the other product. Irrelevance may be fallacious, but it makes for depressingly successful marketing. A -- Andrew Sullivan | ajs@crankycanuck.ca A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton
On 18 Oct 2005, at 16:57, Andrew Sullivan wrote: > On Tue, Oct 18, 2005 at 04:37:23PM +0100, Alex Stapleton wrote: > >> Release a cheaper / free alternative and people will use it because >> they will have almost no reason not to. This means that cheaper and >> as good as does have a place in the market even if it's not a >> conventional solution. It just needs evidence and evangelism. The >> current market should not be the principal target. >> > > I agree with this; but I'm always concerned about something that's > _almost_ as good as the competition, but not quite there, being > pointed at as being "as good as" the competition. That way lies an > invitation to the point-and-laugh responses that MySQL's so-called > cluster system has garnered: it's too dangerous to use for many > systems where the data is important enough, because the failure mode > is "near-complete catastrophe". See another thread, where I talk > about the strategy some companies may be using of lumping PostgreSQL > in with other products, and then attacking the other product. > Irrelevance may be fallacious, but it makes for depressingly > successful marketing. > If MySQL advertised it's bad points as well as it's good points it would not be nearly as dangerous, or generally laughable as a database. For example MySQL Cluster (NDB) is actually not that bad in a lot of cases, and can probably be pretty useful in a lot of situations. Unfortunately it seems to get mistaken (possibly even marketed as?) an actual database, rather than just a particularly smart caching system. A known quantity, is often infinitely better than a homegrown or unknown solution (which most homegrown ones really are.) Even if it is not as good as RAC. I agree that any solution which has a failure mode of nearly complete catastrophe is pretty much useless, but it's only dangerous (and an industry joke) if marketed using FUD in the same way MySQL do. I don't see any reason to worry about PostgreSQL getting bad (false) press, there are plenty of big names using PG, and plenty of commercial interest in it. It's not like PG should be aiming to render Oracle and so on redundant, it's the wrong way of thinking about it imo.
Andrew Sullivan wrote: >On Sat, Oct 15, 2005 at 06:04:54PM -0700, Chris Travers wrote: > > >>Out of curiosity, what is wrong with requiring client SSL certs to >>access the system and only issuing them to the PGPool system (or using a >>different CA if you need to issue client certs to the end users)? This >> >> > >Hmm, I like this, although client SSL certs still didn't work with >JDBC last I checked, so it won't solve all the problems. But you're >right, this would mostly solve the problem I was thinking of, >provided it was described correctly to the (mostly-clueless) >technology rule-producers. > Oops. I guess PgPool doesn't support SSL connections to backend servers. Too bad :-( This would have been a really nice elegant solution to this problem. It looks like PgCluster may support SSL, I am not sure.... The problem is that one needs some way of authenticating the client not just the user. SSL would work for that. I can't think of any other way to authenticate the client while still allowing one to authenticate the user afterwards... And I doubt that it is possible to use Kerberos to authenticate the daemon as well as the end user... Best Wishes, Chris Travers Metatron Technology Consulting
Attachment
On Wed, 2005-12-10 at 18:02 -0400, Bill Bartlett wrote: > [snipped] > > > May not be the best choice on Windows for production use, though for > > development, it should be adequate. > > Are there known issues with the Windows version for production use, or > is this simply because of the relative newness of the Windows-native > version? Yes, there is... Windows itself ;) (sorry, that was too easy!)