Thread: Hope for a new PostgreSQL era?
http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/

Some of the points mentioned:
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in
  some ways. (Database extensibility if nothing else.)
- Neither EnterpriseDB (which now calls itself "The enterprise
  PostgreSQL company") nor the PostgreSQL community leadership have
  covered themselves with stewardship glory.
- PostgreSQL advancement is not dead.

Comments?
On Wed, Dec 7, 2011 at 7:52 PM, Rodrigo E. De León Plicet
<rdeleonp@gmail.com> wrote:
> http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/
>
> Some of the points mentioned:
> - MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL
>   in some ways. (Database extensibility if nothing else.)
> - Neither EnterpriseDB (which now calls itself "The enterprise
>   PostgreSQL company") nor the PostgreSQL community leadership have
>   covered themselves with stewardship glory.
> - PostgreSQL advancement is not dead.

I'd like to see the author's thoughts filled out on these points. They
seem rather vague and overly simplistic, and I wonder what specific
points he might have to make rather than this vague "hand wavy" list he
has so far.
On Wed, Dec 7, 2011 at 8:08 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
> I'd like to see the author's thoughts filled out on these points.
> They seem rather vague and overly simplistic, and I wonder what
> specific points he might have to make rather than this vague "hand
> wavy" list he has so far.

Additionally, I am not entirely sure what he means by the last point.
If you look at the work that NTT along with EDB has put into
Postgres-XC, for example, it looks to me like the Postgres ecosystem is
growing by leaps and bounds and we are approaching an era where Oracle
is no longer ahead in any significant use case.

The thing I am personally worried about is the ability of one company
to dominate the framing of PostgreSQL service offerings. For example,
while in the US it hasn't caught on, a lot of people at MYGOSSCON
accepted EnterpriseDB's framing of the official PostgreSQL release as
the "community edition." If you have a single vendor which dominates
the dialogue, that's a bad thing.

To be clear, this isn't a criticism of EDB. I greatly appreciate the
substantial effort they have put into building Pg awareness here in SE
Asia. However, it is a caution about the recommendation that we need a
corporate steward. I argue corporate stewardship would be a strong net
negative, because it would be first and foremost a way to crowd
everyone else out. We have stewardship. It's the core committee, and
it's the best kind of stewardship we can have.

Here's a useful post that I was forwarded by another LSMB developer:

http://openlife.cc/blogs/2010/november/how-grow-your-open-source-project-10x-and-revenues-5x

Additionally, I would suggest that PostgreSQL has a lot of users
because we have a great---and open---community. I think a new
PostgreSQL era is coming, but I don't think it will happen the way that
blog poster implies. There is a tremendous need for Pg skills in SE
Asia right now, and I expect this to continue to grow exponentially.
PostgreSQL advancement, by my view, is not merely "not dead" but in
fact accelerating.

Best Wishes,
Chris Travers
On Dec 8, 2011 1:27 PM, "Chris Travers" <chris.travers@gmail.com> wrote:
>
> Additionally I am not entirely sure what he means by the last point.
> If you look at the work that NTT along with EDB has put into
> Postgres-XC, for example, it looks to me like the Postgres ecosystem
> is growing by leaps and bounds and we are approaching an era where
> Oracle is no longer ahead in any significant use case.
While Pg is impressively capable now, I don't agree that Oracle (or DB2, MS-SQL etc) isn't ahead for any significant use case. Not on a purely technical basis anyway - once cost is considered there may be a stronger argument.
Multi-tenant hosting is a weak point for Pg for quite a few reasons, some of which appear below. It's not the only role Pg isn't a great fit for, but probably one of the more obvious.
Areas in which Pg seems significantly less capable include:
- multi-tenant hosting and row level security
- admission control, queuing and resource limiting to optimally load a machine. Some limited level is possible with external pooling, but only by limiting concurrent workers (see the sketch after this list).
- performance monitoring and diagnostics. It's way harder to find out what's causing load on a busy Pg server or report on frequent/expensive queries etc. Tooling is limited and fairly primitive. It's fine, but nowhere near as powerful and easy as some of the other DBs.
- prioritisation of queries or users. It's hard to say "prefer this query over this one, give it more resources" or "user A's work always preempts user B's" in Pg.
- transparent failover and recovery back to the original master.
- shared-storage clustering. Dunno if anyone still cares about this one though.
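(For the admission-control point: the closest built-in knob today is a
per-role connection limit - a minimal sketch, with made-up role names:

    -- cap the application role at 20 concurrent backends
    ALTER ROLE webapp CONNECTION LIMIT 20;
    -- allow an ad-hoc reporting role only 2 connections at once
    ALTER ROLE reporting CONNECTION LIMIT 2;

but this rejects excess connections outright rather than queuing them,
so it is not real admission control.)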
On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:

> Areas in which Pg seems significantly less capable include:

Please can you explain the features Oracle has in these areas, I'm not
clear. Thanks.

> - admission control, queuing and resource limiting to optimally load a
>   machine. Some limited level is possible with external pooling, but
>   only by limiting concurrent workers.

> - prioritisation of queries or users. It's hard to say "prefer this
>   query over this one, give it more resources" or "user A's work
>   always preempts user B's" in Pg.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 Prosinec 2011, 12:24, Craig Ringer wrote:
> - admission control, queuing and resource limiting to optimally load a
>   machine. Some limited level is possible with external pooling, but
>   only by limiting concurrent workers.

The first thing I'd like to see is "user profiles" - being able to set
things like work_mem, synchronous_commit, etc. on a per-user basis.

> - performance monitoring and diagnostics. It's way harder to find out
>   what's causing load on a busy Pg server or report on
>   frequent/expensive queries etc. Tooling is limited and fairly
>   primitive. It's fine, but nowhere near as powerful and easy as some
>   of the other DBs.

True. Greg Smith actually mentioned this as one of the frequently
requested features in his post about two weeks ago
(http://blog.2ndquadrant.com/en/2011/11/global-trends-in-deploying-pos.html).
I've started to build my own tool and got it more or less working for
my needs, and there are other tools available, but none of them is
really a complete solution. It would be nice to form a dev group that
would work on this.

> - prioritisation of queries or users. It's hard to say "prefer this
>   query over this one, give it more resources" or "user A's work
>   always preempts user B's" in Pg.

I wonder if the prioritisation could be done using nice - each backend
is a separate process, so why not do 'nice(10)' for low priority
processes or something like that.
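(For the frequent/expensive-queries part there is at least the
pg_stat_statements contrib module - a minimal sketch, assuming it has
been installed and loaded via shared_preload_libraries:

    -- top ten statements by cumulative execution time
    SELECT query, calls, total_time, rows
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 10;

It only aggregates per-statement counters, so it is a long way from a
complete monitoring solution, but it does cover the "report on
expensive queries" case.)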
On 12/08/2011 08:53 PM, Tomas Vondra wrote:
> On 8 Prosinec 2011, 12:24, Craig Ringer wrote:
>> - admission control, queuing and resource limiting to optimally load
>>   a machine. Some limited level is possible with external pooling,
>>   but only by limiting concurrent workers.
>
> The first thing I'd like to see is "user profiles" - being able to set
> things like work_mem, synchronous_commit, etc. on a per-user basis.

You can:

    ALTER USER username SET work_mem = '100MB';

It's not a hard cap - the user can raise/lower it however they like.
The initial value can be set globally, per-user, or per-database.

> I wonder if the prioritisation could be done using nice - each backend
> is a separate process, so why not do 'nice(10)' for low priority
> processes or something like that.

Yes, to a limited degree you can prioritise queries using nice and
ionice, but it's awkward because:

- All queries run as `postgres' so you can't do per-user limiting very
  easily

- The postmaster doesn't have a way to set the nice level and ionice
  level when it forks a backend, nor does the backend have any way to
  do it later. You can use your own user-defined C functions for this,
  though.

- Most importantly, even if you nice and ionice using C functions or
  manually with the cmdline utilities, you can't affect the bgwriter,
  nor can you affect how much data a low-priority query pushes out of
  cache.

--
Craig Ringer
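(An expanded sketch of the same settings machinery; the role and
database names are invented. Since 9.0 the stored defaults are visible
in pg_db_role_setting:

    -- per-role default
    ALTER ROLE batch_user SET work_mem = '256MB';
    -- per-database default
    ALTER DATABASE reporting SET synchronous_commit = off;
    -- per-role-in-database default (9.0+)
    ALTER ROLE batch_user IN DATABASE reporting SET work_mem = '512MB';
    -- inspect what has been stored
    SELECT * FROM pg_db_role_setting;
)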
On Thu, Dec 08, 2011 at 01:53:45PM +0100, Tomas Vondra wrote:
> I wonder if the prioritisation could be done using nice - each backend
> is a separate process, so why not do 'nice(10)' for low priority
> processes or something like that.

This won't work because if you are holding a lock on something someone
else needs, your low nice score is going to cause them problems. It
could make things worse rather than better. (This suggestion comes up a
lot, by the way, so there's been a lot of discussion of it
historically.)

--
Andrew Sullivan
ajs@crankycanuck.ca
2011/12/08 21:53, Tomas Vondra wrote:
>> - performance monitoring and diagnostics. It's way harder to find out
>>   what's causing load on a busy Pg server or report on
>>   frequent/expensive queries etc. Tooling is limited and fairly
>>   primitive. It's fine, but nowhere near as powerful and easy as some
>>   of the other DBs.
>
> True. Greg Smith actually mentioned this as one of the frequently
> requested features in his post about two weeks ago
> (http://blog.2ndquadrant.com/en/2011/11/global-trends-in-deploying-pos.html).
> I've started to build my own tool and got it more or less working for
> my needs, and there are other tools available, but none of them is
> really a complete solution. It would be nice to form a dev group that
> would work on this.

A good point. I'm trying to build "a complete solution". :)

Anyway, one of the reasons it is so difficult to build "a complete
solution" is that it needs support from the *entire* core code. Without
core support, a complete solution can never be built.

Obtaining LWLock statistics or write I/O statistics is actually pretty
tough work for a "non-experienced" PostgreSQL DBA, like me. :)

For example, I've been investigating PostgreSQL LWLock behaviour
precisely for a few weeks. It could not be observed from within
PostgreSQL itself, so I picked up SystemTap. However, SystemTap cannot
be used on a production system, because it often kills the target
processes. :( So how can I observe LWLocks in a production system?

There are several tools to monitor system behaviour at the operating
system level, but they are far from sufficient for understanding
PostgreSQL behaviour. And DBAs coming from other RDBMSes, in particular
proprietary ones, need this, because they are already used to having
such facilities (or tools) in their RDBMSes. That's the reason why we
need more facilities to observe what happens inside PostgreSQL.

In addition, one more reason for the difficulty is that experienced
PostgreSQL DBAs (or hackers) generally do not need such facilities,
because they can imagine how PostgreSQL works in a particular
situation.

I still think we can implement (or enhance) those facilities if we
focus on them, but I sometimes feel it's like a chicken-and-egg
situation.

Regards,
--
NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
On Wed, Dec 7, 2011 at 8:52 PM, Rodrigo E. De León Plicet
<rdeleonp@gmail.com> wrote:
> http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/
>
> Some of the points mentioned:
> - MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL
>   in some ways. (Database extensibility if nothing else.)

There is simply no comparing mysql's backend programming features with
those of postgres. Postgres is a development platform in a box, mysql
is not.

merlin
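(A trivial illustration of that point - a user-defined aggregate
written entirely in SQL, the kind of in-database extensibility being
referred to; all names here are invented:

    -- transition function: append with a comma
    CREATE FUNCTION comma_cat(acc text, item text) RETURNS text
        LANGUAGE sql IMMUTABLE STRICT
        AS $$ SELECT CASE WHEN acc = '' THEN item
                          ELSE acc || ', ' || item END $$;

    CREATE AGGREGATE comma_agg (text) (
        sfunc    = comma_cat,
        stype    = text,
        initcond = ''
    );

    -- e.g. list all ordinary tables in a single row
    SELECT comma_agg(relname::text) FROM pg_class WHERE relkind = 'r';
)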
On Thu, 8 Dec 2011 12:27:22 +0000, Simon Riggs <simon@2ndQuadrant.com>
wrote:

> On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer <ringerc@ringerc.id.au>
> wrote:
>
> > Areas in which Pg seems significantly less capable include:
>
> Please can you explain the features Oracle has in these areas, I'm not
> clear. Thanks.

Maybe I can answer from my own Oracle experience. I hope it will be
what Craig had in mind :)

> > - admission control, queuing and resource limiting to optimally
> >   load a machine. Some limited level is possible with external
> >   pooling, but only by limiting concurrent workers.

Oracle has natively two ways of handling inbound connections:
- Dedicated, which is very similar to the PostgreSQL way of accepting
  connections: accept(), fork() and so on
- Shared, which is based on processes listening on and handling the
  connections (called dispatchers) and processes doing the real work
  (called workers, obviously). All of this works internally with some
  sort of queuing and storing of results in shared memory (I don't
  remember the details of it).

The advantage of this second architecture is of course that you can't
have more than N workers hitting your database simultaneously. So it's
easier to keep the load on the server at a reasonable value.

> > - prioritisation of queries or users. It's hard to say "prefer this
> >   query over this one, give it more resources" or "user A's work
> >   always preempts user B's" in Pg.

It's called the resource manager in Oracle. You define 'resource
plans', 'consumer groups', etc… and you get some sort of QoS for your
queries. It's mostly about CPU resource allocation if I remember
correctly (I never used it, except during training :) )

Being able to change the backend's nice level may do something similar,
I guess. I don't think Oracle's resource manager solves the priority
inversion due to locking in the database, but I'm not sure of it.
On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> On Thu, 8 Dec 2011 12:27:22 +0000, Simon Riggs
> <simon@2ndQuadrant.com> wrote:
> [...]
> Maybe I can answer from my own Oracle experience. I hope it will be
> what Craig had in mind :)
>
> Oracle has natively two ways of handling inbound connections:
> - Dedicated, which is very similar to the PostgreSQL way of accepting
>   connections: accept(), fork() and so on
> - Shared, which is based on processes listening on and handling the
>   connections (called dispatchers) and processes doing the real work
>   (called workers, obviously). All of this works internally with some
>   sort of queuing and storing of results in shared memory (I don't
>   remember the details of it).
>
> The advantage of this second architecture is of course that you can't
> have more than N workers hitting your database simultaneously. So it's
> easier to keep the load on the server at a reasonable value.

You have a couple of very good options to achieve the same in postgres
-- pgbouncer, pgpool.

merlin
On Thu, Dec 08, 2011 at 09:29:28AM -0600, Merlin Moncure wrote:
>
> You have a couple of very good options to achieve the same in postgres
> -- pgbouncer, pgpool.

One of the central issues that Postgres has in "enterprise" land is
exactly this sort of answer: "you have good options" but they're "not
part of the core release". Now, we all know that this is a stupid and
wrong way of thinking about it. But one cannot complain about being
held to those sorts of enterprisey standards when one is having an
enterprisey discussion. The original analysis (on the blog) seemed to
be primarily aimed at exactly that sort of discussion, and I suspect
that this is the kind of thing that was meant by the "community
leadership" not having covered itself in stewardship glory.

One of the "stewardship" tests, from a business-analysis point of view,
is whether you're going to be able to find a reliable supply of
experienced admins at all levels for your systems. Having a bunch of
different, indifferently-documented projects that are all doing similar
but slightly different things is, to someone looking from that point of
view, a liability and not a strength.

I happen to disagree, but it always seemed to me that something the
Postgres community did poorly (and I count myself in that number,
though less now than in the past) was understanding the hardships of
the integrator and coming up with reasonably simple answers for those
kinds of questions. It is not unreasonable to say that there are no
simple answers here; but as unhappy as it makes me, those reasonably
simple answers are necessary for some classes of users.

And let's face it: companies like Oracle (and products like MySQL) are
in a position to treat those sorts of answers as part of the cost of
doing business, because they have revenue associated with their
licenses, so they can pay for coming up with those answers that way. In
Postgres-land, everyone needs to charge money for those answers (i.e.
be consultants), because that's the only real place to make a living.
Alternatively, you can put together those answers as part of your own
package; but in that case, it's not "the core PostgreSQL product", but
something else.

In this respect, the decision of the core team a number of years ago to
say, "We're going to have 'integrated' replication that does x, y, and
z," was the right one, despite the fact that it undermined the momentum
of other interesting projects (and ones better suited to some
environments). Sometimes, it's better to cut off options.

Best,
A

--
Andrew Sullivan
ajs@crankycanuck.ca
On Thu, 8 Dec 2011 09:29:28 -0600, Merlin Moncure <mmoncure@gmail.com>
wrote:

> On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
> wrote:
> [...]
> > Oracle has natively two ways of handling inbound connections:
> > - Dedicated, which is very similar to the PostgreSQL way of
> >   accepting connections: accept(), fork() and so on
> > - Shared, which is based on processes listening on and handling the
> >   connections (called dispatchers) and processes doing the real work
> >   (called workers, obviously). All of this works internally with
> >   some sort of queuing and storing of results in shared memory (I
> >   don't remember the details of it).
> >
> > The advantage of this second architecture is of course that you
> > can't have more than N workers hitting your database simultaneously.
> > So it's easier to keep the load on the server at a reasonable value.
>
> You have a couple of very good options to achieve the same in postgres
> -- pgbouncer, pgpool.

I wish it was the same (I use and like both pgbouncer and pgpool too,
and they do a good job, I'm not arguing on that). But unfortunately it
isn't: you still have the notion of a session for each connected client
in Oracle when using the shared servers model.

It means you keep your session variables, your prepared statements,
your running transaction, etc… in each individual session while having
the multiplexing equivalent of 'statement level' from pgbouncer.
On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
wrote:
> I wish it was the same (I use and like both pgbouncer and pgpool too,
> and they do a good job, I'm not arguing on that). But unfortunately it
> isn't: you still have the notion of a session for each connected
> client in Oracle when using the shared servers model.
>
> It means you keep your session variables, your prepared statements,
> your running transaction, etc… in each individual session while
> having the multiplexing equivalent of 'statement level' from
> pgbouncer.

In Oracle - can the pool share connections between DB users and/or
databases on the instance? If the answer is yes to either, that is a
fair bit better than what we can achieve today.

Brad.
On 8 Prosinec 2011, 14:17, Craig Ringer wrote:
> You can:
>
>     ALTER USER username SET work_mem = '100MB';
>
> It's not a hard cap - the user can raise/lower it however they like.
> The initial value can be set globally, per-user, or per-database.

Oh, shame on me! I wasn't aware of this. Too bad it's not possible to
restrict this (changing certain config values). A simple 'before SET'
hook might do the trick.

>> I wonder if the prioritisation could be done using nice - each
>> backend is a separate process, so why not do 'nice(10)' for low
>> priority processes or something like that.
>
> Yes, to a limited degree you can prioritise queries using nice and
> ionice, but it's awkward because:
>
> - All queries run as `postgres' so you can't do per-user limiting
>   very easily
>
> - The postmaster doesn't have a way to set the nice level and ionice
>   level when it forks a backend, nor does the backend have any way to
>   do it later. You can use your own user-defined C functions for
>   this, though.

Yes, that's what I meant.

> - Most importantly, even if you nice and ionice using C functions or
>   manually with the cmdline utilities, you can't affect the bgwriter,
>   nor can you affect how much data a low-priority query pushes out of
>   cache.

IMHO the bgwriter may be reasonably tuned by the bgwriter_* GUC
variables. The user backends are probably more interesting here.

Tomas
On 8 Prosinec 2011, 16:11, Marc Cousin wrote:
>> > - admission control, queuing and resource limiting to optimally
>> >   load a machine. Some limited level is possible with external
>> >   pooling, but only by limiting concurrent workers.
>
> Oracle has natively two ways of handling inbound connections:
> - Dedicated, which is very similar to the PostgreSQL way of accepting
>   connections: accept(), fork() and so on
> - Shared, which is based on processes listening on and handling the
>   connections (called dispatchers) and processes doing the real work
>   (called workers, obviously). All of this works internally with some
>   sort of queuing and storing of results in shared memory (I don't
>   remember the details of it).
>
> The advantage of this second architecture is of course that you can't
> have more than N workers hitting your database simultaneously. So it's
> easier to keep the load on the server at a reasonable value.

Which is exactly what pgbouncer and other connection pools are for ...

>> > - prioritisation of queries or users. It's hard to say "prefer this
>> >   query over this one, give it more resources" or "user A's work
>> >   always preempts user B's" in Pg.
>
> It's called the resource manager in Oracle. You define 'resource
> plans', 'consumer groups', etc… and you get some sort of QoS for your
> queries. It's mostly about CPU resource allocation if I remember
> correctly (I never used it, except during training :) )

And it's damn difficult to get it working properly ... the simpler the
better here.

Tomas
On 8 Prosinec 2011, 17:11, Marc Cousin wrote:
> On Thu, 8 Dec 2011 09:29:28 -0600, Merlin Moncure
> <mmoncure@gmail.com> wrote:
> [...]
>> you have a couple of very good options to achieve the same in
>> postgres -- pgbouncer, pgpool.
>
> I wish it was the same (I use and like both pgbouncer and pgpool too,
> and they do a good job, I'm not arguing on that). But unfortunately it
> isn't: you still have the notion of a session for each connected
> client in Oracle when using the shared servers model.

True, it is not exactly the same, it's similar. I don't think
PostgreSQL will ever provide 'shared backends' the way Oracle does - it
would require significant code change. Most of the benefits can be
achieved by using a connection pool without the added complexity.

> It means you keep your session variables, your prepared statements,
> your running transaction, etc… in each individual session while
> having the multiplexing equivalent of 'statement level' from
> pgbouncer.

Yes. But if you expect that PostgreSQL will mimic the Oracle
architecture, then you're mistaken. It's simply a different solution
with a different architecture, and that means you may need to use a
different application design sometimes.

Tomas
On Thu, 8 Dec 2011 16:27:56 +0000, "Nicholson, Brad (Toronto, ON, CA)"
<bnicholson@hp.com> wrote:

> On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
> wrote:
> > I wish it was the same (I use and like both pgbouncer and pgpool
> > too, and they do a good job, I'm not arguing on that). But
> > unfortunately it isn't: you still have the notion of a session for
> > each connected client in Oracle when using the shared servers
> > model.
> >
> > It means you keep your session variables, your prepared statements,
> > your running transaction, etc… in each individual session while
> > having the multiplexing equivalent of 'statement level' from
> > pgbouncer.
>
> In Oracle - can the pool share connections between DB users and/or
> databases on the instance? If the answer is yes to either, that is a
> fair bit better than what we can achieve today.

Between users, yes. But there is only one DB per instance in Oracle :)
On Thu, Dec 8, 2011 at 10:11 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> On Thu, 8 Dec 2011 09:29:28 -0600, Merlin Moncure
> <mmoncure@gmail.com> wrote:
> [...]
>> you have a couple of very good options to achieve the same in
>> postgres -- pgbouncer, pgpool.
>
> I wish it was the same (I use and like both pgbouncer and pgpool too,
> and they do a good job, I'm not arguing on that). But unfortunately it
> isn't: you still have the notion of a session for each connected
> client in Oracle when using the shared servers model.
>
> It means you keep your session variables, your prepared statements,
> your running transaction, etc… in each individual session while
> having the multiplexing equivalent of 'statement level' from
> pgbouncer.

Yeah -- maybe we could use a server-side feature that could allow you
to save a session state and load it up later, to make life easier for
connection-pooled applications. However, it's not really that much work
to organize most of the things you'd use this for in an
application-managed session instead of a database-managed one.

Regarding the "enterprises won't use community-supplied postgresql
add-ons" point, this is completely true in many cases. I do think
pgbouncer should be seriously considered for advancement as a core
feature. That said, this should be done on its own merits, not to
satisfy the capricious whims of enterprises.

merlin
On Thu, 8 Dec 2011 17:54:20 +0100, "Tomas Vondra" <tv@fuzzy.cz> wrote:

> On 8 Prosinec 2011, 16:11, Marc Cousin wrote:
> [...]
> > The advantage of this second architecture is of course that you
> > can't have more than N workers hitting your database simultaneously.
> > So it's easier to keep the load on the server at a reasonable value.
>
> Which is exactly what pgbouncer and other connection pools are for ...

Yep. But with some limitations (not that important, but they exist) as
detailed in another message. I like the pgbouncer approach as it is
much simpler, but it has the limitation that the real sessions aren't
in the database anymore, so context is lost, etc…

> >> > - prioritisation of queries or users. It's hard to say "prefer
> >> >   this query over this one, give it more resources" or "user A's
> >> >   work always preempts user B's" in Pg.
>
> > It's called the resource manager in Oracle. You define 'resource
> > plans', 'consumer groups', etc… and you get some sort of QoS for
> > your queries. It's mostly about CPU resource allocation if I
> > remember correctly (I never used it, except during training :) )
>
> And it's damn difficult to get it working properly ... the simpler
> the better here.

Yep, it's very hard and ugly to use. That's by the way the reason I
used it only in training, not in production (in production, when it
doesn't work, you have to debug the damn thing, not just throw it
away :) )
On 8 Prosinec 2011, 17:27, Nicholson, Brad (Toronto, ON, CA) wrote:
> On Thu, Dec 8, 2011 at 9:11 AM, Marc Cousin <cousinmarc@gmail.com>
> wrote:
> > I wish it was the same (I use and like both pgbouncer and pgpool
> > too, and they do a good job, I'm not arguing on that). But
> > unfortunately it isn't: you still have the notion of a session for
> > each connected client in Oracle when using the shared servers
> > model.
> >
> > It means you keep your session variables, your prepared statements,
> > your running transaction, etc… in each individual session while
> > having the multiplexing equivalent of 'statement level' from
> > pgbouncer.
>
> In Oracle - can the pool share connections between DB users and/or
> databases on the instance? If the answer is yes to either, that is a
> fair bit better than what we can achieve today.

Yes. Each session has UGA (User Global Area) memory, and this is placed
either in the PGA (Process Global Area) in the case of a "dedicated
server" or in the SGA (Shared Global Area) in the case of a "shared
server." If you use a shared server, then each request might be handled
by a different backend process.

PostgreSQL uses a dedicated architecture, which means a simpler code
base, configuration etc. If you need something like a "shared server"
then you can use a connection pool, but you have to handle the session
state on your own (in the application).

Tomas
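(A rough sketch of what "handling the session state in the
application" looks like in practice; the settings shown are just
examples. Between clients a pooler typically resets the backend, and
the application then re-establishes whatever per-session state it
relies on:

    -- run by the pooler when a connection is returned to the pool
    DISCARD ALL;

    -- run by the application after checking a connection out again
    SET search_path = app, public;
    SET work_mem = '64MB';
)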
On 8 Prosinec 2011, 18:00, Marc Cousin wrote:
> On Thu, 8 Dec 2011 16:27:56 +0000, "Nicholson, Brad (Toronto, ON,
> CA)" <bnicholson@hp.com> wrote:
> [...]
>> In Oracle - can the pool share connections between DB users and/or
>> databases on the instance? If the answer is yes to either, that is a
>> fair bit better than what we can achieve today.
>
> Between users, yes. But there is only one DB per instance in Oracle :)

Because Oracle uses schemas instead of databases. One schema = one user
= one database.

Tomas
On Thu, 8 Dec 2011 18:02:51 +0100, "Tomas Vondra" <tv@fuzzy.cz> wrote:

> On 8 Prosinec 2011, 17:11, Marc Cousin wrote:
> [...]
> > I wish it was the same (I use and like both pgbouncer and pgpool
> > too, and they do a good job, I'm not arguing on that). But
> > unfortunately it isn't: you still have the notion of a session for
> > each connected client in Oracle when using the shared servers
> > model.
>
> True, it is not exactly the same, it's similar. I don't think
> PostgreSQL will ever provide 'shared backends' the way Oracle does -
> it would require significant code change. Most of the benefits can be
> achieved by using a connection pool without the added complexity.

I didn't ask for it. It's just not the same, which was what I was
answering to.

> > It means you keep your session variables, your prepared statements,
> > your running transaction, etc… in each individual session while
> > having the multiplexing equivalent of 'statement level' from
> > pgbouncer.
>
> Yes. But if you expect that PostgreSQL will mimic the Oracle
> architecture, then you're mistaken. It's simply a different solution
> with a different architecture, and that means you may need to use a
> different application design sometimes.

I just don't understand what you're getting all heated up about. I
don't want PostgreSQL to mimic Oracle, and I'm only answering Simon's
question at the beginning. I'm just saying that there are differences
between the pgbouncer approach and the shared server approach, and
benefits to the latter (and to the former too, by the way, mainly
simplicity, so fewer bugs - the shared servers architecture has
suffered from a lot of bugs).
On Thu, 8 Dec 2011 10:56:39 -0600, Merlin Moncure <mmoncure@gmail.com>
wrote:

> On Thu, Dec 8, 2011 at 10:11 AM, Marc Cousin <cousinmarc@gmail.com>
> wrote:
> [...]
> > It means you keep your session variables, your prepared statements,
> > your running transaction, etc… in each individual session while
> > having the multiplexing equivalent of 'statement level' from
> > pgbouncer.
>
> Yeah -- maybe we could use a server-side feature that could allow you
> to save a session state and load it up later, to make life easier for
> connection-pooled applications. However, it's not really that much
> work to organize most of the things you'd use this for in an
> application-managed session instead of a database-managed one.

For those of us who can change our application code, of course. But
some people can't.

> Regarding the "enterprises won't use community-supplied postgresql
> add-ons" point, this is completely true in many cases. I do think
> pgbouncer should be seriously considered for advancement as a core
> feature. That said, this should be done on its own merits, not to
> satisfy the capricious whims of enterprises.

Sure. I'm not advocating this. Neither am I advocating using Oracle, by
the way. I'm just as pro-postgresql as anyone else. I just wanted to
weigh the pros and cons of Oracle's way of doing things versus
pgbouncer. And the shared server approach has its merits.
On 8 Prosinec 2011, 17:56, Merlin Moncure wrote:
> Regarding the "enterprises won't use community-supplied postgresql
> add-ons" point, this is completely true in many cases. I do think
> pgbouncer should be seriously considered for advancement as a core
> feature. That said, this should be done on its own merits, not to
> satisfy the capricious whims of enterprises.

I don't think so. In my experience, enterprises are not using
PostgreSQL (or other open source software in general) because they
think there's no guarantee or support available. If there's a third
party (it might be a local consulting company) providing acceptable
guarantees and support for PostgreSQL, it may as well provide
guarantees for pgbouncer, and the enterprise customer is fine. They
simply want a package with guarantees; it does not matter whether it's
in core or not.

Tomas
On 8 Prosinec 2011, 18:14, Marc Cousin wrote:
>> > It means you keep your session variables, your prepared
>> > statements, your running transaction, etc… in each individual
>> > session while having the multiplexing equivalent of 'statement
>> > level' from pgbouncer.
>>
>> Yes. But if you expect that PostgreSQL will mimic the Oracle
>> architecture, then you're mistaken. It's simply a different solution
>> with a different architecture, and that means you may need to use a
>> different application design sometimes.
>
> I just don't understand what you're getting all heated up about. I
> don't want PostgreSQL to mimic Oracle, and I'm only answering Simon's
> question at the beginning.

Sorry, I was not arguing with you, nor heated up. It's rather a
translation mistake - in my native language we often use 'you' when we
actually mean 'someone.' So it should be something like "If someone
expects ... he's mistaken," etc.

Tomas
On Thursday 08 December 2011 18:34:35 Tomas Vondra wrote:
> On 8 Prosinec 2011, 18:14, Marc Cousin wrote:
> [...]
> > I just don't understand what you're getting all heated up about. I
> > don't want PostgreSQL to mimic Oracle, and I'm only answering
> > Simon's question at the beginning.
>
> Sorry, I was not arguing with you, nor heated up. It's rather a
> translation mistake - in my native language we often use 'you' when
> we actually mean 'someone.' So it should be something like "If
> someone expects ... he's mistaken," etc.

Ok, no problem, that just felt weird :)
On 8 Prosinec 2011, 14:20, Andrew Sullivan wrote:
> On Thu, Dec 08, 2011 at 01:53:45PM +0100, Tomas Vondra wrote:
>> I wonder if the prioritisation could be done using nice - each
>> backend is a separate process, so why not do 'nice(10)' for low
>> priority processes or something like that.
>
> This won't work because if you are holding a lock on something
> someone else needs, your low nice score is going to cause them
> problems. It could make things worse rather than better. (This
> suggestion comes up a lot, by the way, so there's been a lot of
> discussion of it historically.)

I'm aware of that, but there are cases where this may actually work.
For example, we do have an OLTP system, but we need to build exports to
other systems regularly. The export may need to read a lot of data, but
I don't want to annoy the people who are using the system. So I could
lower the priority of the backend generating the export.

Yes, there are cases where this "priority inversion" makes it unusable.

Tomas
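(A sketch of how one might actually do that today; 'nightly_export' is
an invented marker in the query text, and the column names are as of
9.1:

    -- find the backend running the export
    SELECT procpid, current_query
      FROM pg_stat_activity
     WHERE current_query LIKE '%nightly_export%';

    -- then, from a root shell on the server (not SQL):
    --   renice +10 -p <procpid>

With the caveats above: this touches CPU priority only, and a lock held
by the reniced backend can still stall higher-priority sessions.)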
On 12/08/2011 03:24 AM, Craig Ringer wrote:
> - shared-storage clustering. Dunno if anyone still cares about this
>   one though.

This one seems to be moving into the legacy category over the next 3-5
years.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
On 12/08/11 10:14 AM, Joshua D. Drake wrote:
>
>> - shared-storage clustering. Dunno if anyone still cares about this
>>   one though.
>
> This one seems to be moving into the legacy category over the next
> 3-5 years.

Um, I believe this is referring to Oracle RAC clustering, not HA
active/standby. I seriously doubt Oracle is dropping RAC.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
On 12/08/2011 10:54 AM, John R Pierce wrote:
> On 12/08/11 10:14 AM, Joshua D. Drake wrote:
>>
>>> - shared-storage clustering. Dunno if anyone still cares about this
>>>   one though.
>>
>> This one seems to be moving into the legacy category over the next
>> 3-5 years.
>
> Um, I believe this is referring to Oracle RAC clustering, not HA
> active/standby. I seriously doubt Oracle is dropping RAC.

I meant worrying about it for Pg.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
On 12/08/11 11:16 AM, Joshua D. Drake wrote:
>>
>> Um, I believe this is referring to Oracle RAC clustering, not HA
>> active/standby. I seriously doubt Oracle is dropping RAC.
>
> I meant worrying about it for Pg.

The odds of Postgres developing something as complex and intricate as
RAC are probably between zilch and none. RAC was for many years
completely unusable, and even now it's complicated, fragile, and
expensive.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
On 12/08/2011 11:24 AM, John R Pierce wrote:
> On 12/08/11 11:16 AM, Joshua D. Drake wrote:
>>>
>>> Um, I believe this is referring to Oracle RAC clustering, not HA
>>> active/standby. I seriously doubt Oracle is dropping RAC.
>>
>> I meant worrying about it for Pg.
>
> The odds of Postgres developing something as complex and intricate as
> RAC are probably between zilch and none. RAC was for many years
> completely unusable, and even now it's complicated, fragile, and
> expensive.

Exactly.

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
On Thu, Dec 8, 2011 at 3:24 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>
> On Dec 8, 2011 1:27 PM, "Chris Travers" <chris.travers@gmail.com> wrote:
>>
>> Additionally I am not entirely sure what he means by the last point.
>> If you look at the work that NTT along with EDB has put into
>> Postgres-XC, for example, it looks to me like the Postgres ecosystem
>> is growing by leaps and bounds and we are approaching an era where
>> Oracle is no longer ahead in any significant use case.
>
> While Pg is impressively capable now, I don't agree that Oracle (or
> DB2, MS-SQL etc) isn't ahead for any significant use case. Not on a
> purely technical basis anyway - once cost is considered there may be
> a stronger argument.

I said "approaching an era" for a reason. We aren't there yet, but we
are fast approaching it. Major areas I didn't think PostgreSQL would
ever directly compete in are now within arm's reach.

Also, when I say use case, I am talking about something like: "I have a
2TB database and need to be able to run aggregates across 20M-row
tables as part of my transactional system." What I see you mentioning
are missing tools, which in some cases point to use cases we aren't so
good at (high-security databases where row-level security needs to be
enforced, or accounting systems for holding companies or the like), but
given that the above use case is now within reach, I have to think the
others will be soon as well.

Best Wishes,
Chris Travers
On Thu, 08 Dec 2011 11:24:12 -0800, John R Pierce <pierce@hogranch.com>
wrote:
> On 12/08/11 11:16 AM, Joshua D. Drake wrote:
>>>
>>> Um, I believe this is referring to Oracle RAC clustering, not HA
>>> active/standby. I seriously doubt Oracle is dropping RAC.
>>
>> I meant worrying about it for Pg.
>
> The odds of Postgres developing something as complex and intricate as
> RAC are probably between zilch and none. RAC was for many years
> completely unusable, and even now it's complicated, fragile, and
> expensive.

Happily, the complications and fragility are now utilised by Oracle to
help sell Exadata units, on the basis that if you give Oracle even more
money, they'll sell you a RAC that actually works!
On Thu, Dec 8, 2011 at 6:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> There is simply no comparing mysql's backend programming features
> with those of postgres. Postgres is a development platform in a box,
> mysql is not.

This, and there are annoyances I have run into with MySQL, such as
deadlocks.... on a system with only one running transaction.... I
traced this problem eventually to multi-row inserts and concluded it
was a race condition in threaded processing. As I say, any db can and
should allow conflicting transactions to deadlock. It takes special
talent to allow transactions to deadlock against *themselves*......

Interestingly, I have customers who run both MySQL and PostgreSQL and
have had similar issues with transactions deadlocking against
themselves on MySQL, so I know I am not unique here.

MySQL, I am sorry to say, is not catching up to PostgreSQL, at least
from where I stand, except for the few very specific workloads that its
query cache is designed to work with.

Best Wishes,
Chris Travers
On Thu, Dec 8, 2011 at 4:09 PM, Rodger Donaldson
<rodgerd@diaspora.gen.nz> wrote:
> [...]
> Happily, the complications and fragility are now utilised by Oracle
> to help sell Exadata units, on the basis that if you give Oracle even
> more money, they'll sell you a RAC that actually works!

Looking at the general design of Postgres-XC compared to RAC, which
workloads would the latter excel at as a matter of design that the
former would not? Granted, Postgres-XC is still pre-1.0 (the latest
release iirc is 0.9.6) and it doesn't yet support everything it needs
to, but it looks very promising in this area, and it is open source.

Best Wishes,
Chris Travers
On Thu, 8 Dec 2011 16:34:49 -0800, Chris Travers
<chris.travers@gmail.com> wrote:
> [...]
> Looking at the general design of Postgres-XC compared to RAC, which
> workloads would the latter excel at as a matter of design that the
> former would not?

Not having touched -XC I can only theorycraft, but if I've understood
the shared-nothing model correctly, I'd expect it to hammer RAC in
high-write workloads, based on my experience of contention in RAC
servers. A common condition is that since the SGA is (effectively)
shared across the RAC interconnect, you're effectively limiting your
peak write performance to the throughput and latency of your ethernet
link (which is, I assume, why Exadata uses Infiniband). We've had to do
things like set up separate connection pools which write to one node at
a time (with failover, of course) for high-INSERT tables to avoid the
RAC becoming painfully slow.

On the other hand, I don't see anything that would suggest -XC will
work as seamlessly for failover as RAC does. We went through a period
where our applications (and DBAs!) failed to notice kernel-panic
induced reboots of RAC members at least once every couple of weeks,
with absolutely no customer impact.

Anyway, my comment was more of a dig at the selling pitch for Exadata:
I have had it pitched with a straight face that I should want to buy
one because RAC is so hard to configure and maintain.

> Granted, Postgres-XC is still pre-1.0 (the latest release iirc is
> 0.9.6) and it doesn't yet support everything it needs to, but it
> looks very promising in this area, and it is open source.

It looks really interesting. Thanks for the pointer.
On 12/08/2011 08:27 PM, Simon Riggs wrote:
> On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer
> <ringerc@ringerc.id.au> wrote:
>
>> Areas in which Pg seems significantly less capable include:
>
> Please can you explain the features Oracle has in these areas, I'm
> not clear. Thanks.

Marc has, as I was hoping, done so much better than I could. Most of
what I know is 2nd hand from Oracle users - I'm not one myself.

It's interesting to see the view that the resource manager for query
and user prioritisation is hard to use in practice. That's not
something I'd heard before, but I can't say I'm entirely surprised
given how complicated problems around lock management and priority
inversion are to get right, even in a system where there *aren't*
free-form dynamic user-defined queries running.

--
Craig Ringer
On 12/09/2011 01:02 AM, Tomas Vondra wrote:
> On 8 Prosinec 2011, 17:11, Marc Cousin wrote:
> [...]
>> I wish it was the same (I use and like both pgbouncer and pgpool
>> too, and they do a good job, I'm not arguing on that). But
>> unfortunately it isn't: you still have the notion of a session for
>> each connected client in Oracle when using the shared servers model.
>
> True, it is not exactly the same, it's similar. I don't think
> PostgreSQL will ever provide 'shared backends' the way Oracle does -
> it would require significant code change. Most of the benefits can be
> achieved by using a connection pool without the added complexity.

Yep - a connection pool that could save and restore session state,
separating "executor/backend" from "connection/session", would produce
much the same result with a lot less complexity. It's one of the
reasons I'd love to see in-core pooling, because I don't see how an
out-of-core solution can maintain session state like advisory locks,
HOLD cursors, etc.

--
Craig Ringer
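(A concrete illustration of the session state in question; the cursor
name and lock key below are arbitrary. Both of these survive COMMIT but
are bound to one backend, so a transaction- or statement-level pooler
cannot hand the connection to another client without silently breaking
them:

    BEGIN;
    SELECT pg_advisory_lock(42);  -- held until released or session end
    DECLARE report_cur CURSOR WITH HOLD FOR
        SELECT relname FROM pg_class;
    COMMIT;                       -- the holdable cursor survives...
    FETCH 10 FROM report_cur;     -- ...but only in this same session
)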
On Thu, Dec 8, 2011 at 6:47 PM, Rodger Donaldson
<rodgerd@diaspora.gen.nz> wrote:
> [...]
> Not having touched -XC I can only theorycraft, but if I've understood
> the shared-nothing model correctly, I'd expect it to hammer RAC in
> high-write workloads, based on my experience of contention in RAC
> servers. A common condition is that since the SGA is (effectively)
> shared across the RAC interconnect, you're effectively limiting your
> peak write performance to the throughput and latency of your ethernet
> link (which is, I assume, why Exadata uses Infiniband). We've had to
> do things like set up separate connection pools which write to one
> node at a time (with failover, of course) for high-INSERT tables to
> avoid the RAC becoming painfully slow.

I think you have to define "shared-nothing" in this context. Like a
shared-storage architecture, this has two tiers, namely a tier that
orchestrates queries and a storage tier, for lack of a better word. In
other words, XC is attempting to be like Teradata more than like RAC.
In fact, one of the key issues here is that Postgres-XC is supposed to
be write-scalable.

> On the other hand, I don't see anything that would suggest -XC will
> work as seamlessly for failover as RAC does. We went through a period
> where our applications (and DBAs!) failed to notice kernel-panic
> induced reboots of RAC members at least once every couple of weeks,
> with absolutely no customer impact.

It looks to me like it would be quite possible to set it up so that it
would be seamless in the event of node reboots, at least on the
coordinator tier. On the storage tier, I'm not sure to what extent this
would be possible; it might depend on PostgreSQL replication options
and how well they are supported by XC.

> Anyway, my comment was more of a dig at the selling pitch for
> Exadata: I have had it pitched with a straight face that I should
> want to buy one because RAC is so hard to configure and maintain.

In which case it's good that Pg isn't gunning for that market, right?

>> Granted, Postgres-XC is still pre-1.0 (the latest release iirc is
>> 0.9.6) and it doesn't yet support everything it needs to, but it
>> looks very promising in this area, and it is open source.
>
> It looks really interesting. Thanks for the pointer.

YW :-)

Chris Travers
On Fri, 09 Dec 2011 11:11:12 +0800, Craig Ringer <ringerc@ringerc.id.au> wrote: > On 12/08/2011 08:27 PM, Simon Riggs wrote: > > On Thu, Dec 8, 2011 at 11:24 AM, Craig > > Ringer <ringerc@ringerc.id.au> wrote: > > > >> Areas in which Pg seems significantly less capable include: > > Please can you explain the features Oracle has in these areas, I'm > > not clear. Thanks. > > > Marc has, as I was hoping, done so much better than I could. Most of > what I know is 2nd hand from Oracle users - I'm not one myself. > > It's interesting to see the view that the resource manager for query > and user prioritisation is hard to use in practice. That's not > something I'd heard before, but I can't say I'm entirely surprised > given how complicated problems around lock management and priority > inversion are to get right even in a system where there *aren't* > free-form dynamic user-defined queries running. The complexity, at least for me, came from the user interface (at least a dozen stored procedures with a complex syntax) to set up and monitor the resource manager. I don't think it manages the priority inversion problems, just CPU priorities. I asked the Oracle trainer, who wasn't sure either :)
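For a sense of what that interface looks like, here is a heavily abbreviated sketch, written from memory of the 11g DBMS_RESOURCE_MANAGER package (the plan and group names are invented, and the calls that map users to consumer groups are omitted; treat it as illustrative, not as a working setup):

    BEGIN
      DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
      DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
        consumer_group => 'BATCH_GROUP', comment => 'low-priority work');
      DBMS_RESOURCE_MANAGER.CREATE_PLAN(
        plan => 'DAYTIME_PLAN', comment => 'favour OLTP over batch');
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan => 'DAYTIME_PLAN', group_or_subplan => 'BATCH_GROUP',
        comment => 'cap batch CPU share', mgmt_p1 => 20);
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan => 'DAYTIME_PLAN', group_or_subplan => 'OTHER_GROUPS',
        comment => 'everyone else', mgmt_p1 => 80);
      DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
      DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
    END;
    /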
On Thu, Dec 8, 2011 at 3:11 PM, Marc Cousin <cousinmarc@gmail.com> wrote: > On Thu, 8 Dec 2011 12:27:22 +0000, > Simon Riggs <simon@2ndQuadrant.com> wrote: > >> On Thu, Dec 8, 2011 at 11:24 AM, Craig Ringer <ringerc@ringerc.id.au> >> wrote: >> >> > Areas in which Pg seems significantly less capable include: >> >> Please can you explain the features Oracle has in these areas, I'm not >> clear. Thanks. > > Maybe I can answer from my own Oracle experience. I hope it will be what > Craig had in mind :) > >> >> >> > - admission control, queuing and resource limiting to optimally >> > load a machine. Some limited level is possible with external >> > pooling, but only by limiting concurrent workers. > > Oracle has natively two ways of handling inbound connections: > - Dedicated, which is very similar to the PostgreSQL way of accepting > connections: accept(), fork() and so on > - Shared, which is based on processes listening and handling the > connections (called dispatchers) and processes doing the real work > (called workers, obviously). All of this works internally with > some sort of queuing and storing results in shared memory (I don't > remember the details of it) > The advantage of this second architecture being of course that you > can't have more than N workers hitting your database simultaneously. So > it's easier to keep the load on the server to a reasonable value. > >> >> > - prioritisation of queries or users. It's hard to say "prefer this >> > query over this one, give it more resources" or "user A's work >> > always preempts user B's" in Pg. >> > It's called the resource manager in Oracle. You define 'resource plans', > 'consumer groups', etc… and you get some sort of QoS for your queries. > It's mostly about CPU resource allocation if I remember correctly (I > never used it, except during training :) ) > > Being able to change the backend's nice level may do something > similar I guess. I don't think Oracle's resource manager solves > the priority inversion due to locking in the database, but I'm not sure > of it. Thanks, sounds interesting. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 08.12.2011 19:54, John R Pierce wrote: > On 12/08/11 10:14 AM, Joshua D. Drake wrote: >> >>> - shared-storage clustering. Dunno if anyone still cares about this one >>> though. >> >> This one seems to be moving into the legacy category over the next 3-5 >> years. > > um, I believe this is referring to Oracle RAC clustering, not HA > active/standby. I seriously doubt Oracle is dropping RAC. Oracle Exadata requires RAC. Therefore I also don't think they will drop this ;-) -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On Thu, Dec 8, 2011 at 2:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Dec 7, 2011 at 8:52 PM, Rodrigo E. De León Plicet > <rdeleonp@gmail.com> wrote: >> http://www.dbms2.com/2011/11/23/hope-for-a-new-postgresql-era/ >> >> Some of the points mentioned: >> - MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL >> in some ways. (Database extensibility if nothing else.) > > There is simply no comparing mysql's backend programming features with > those of postgres. Postgres is a development platform in a box, mysql > is not. A key point, I think, but not just as a comparison against other RDBMS. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
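To make the extensibility point concrete, here is a small sketch of a user-defined aggregate, something stock MySQL has no SQL-level equivalent of (the products table is hypothetical, inputs are assumed positive, and $1/$2 positional parameters are used since named parameters in SQL-language functions are a newer feature):

    -- geometric mean as a custom aggregate
    CREATE FUNCTION gmean_acc(numeric[], numeric) RETURNS numeric[]
        AS 'SELECT ARRAY[($1)[1] + ln($2), ($1)[2] + 1]'
        LANGUAGE sql IMMUTABLE;
    CREATE FUNCTION gmean_final(numeric[]) RETURNS numeric
        AS 'SELECT CASE WHEN ($1)[2] = 0 THEN NULL
                        ELSE exp(($1)[1] / ($1)[2]) END'
        LANGUAGE sql IMMUTABLE;
    CREATE AGGREGATE gmean(numeric) (
        sfunc     = gmean_acc,
        stype     = numeric[],
        finalfunc = gmean_final,
        initcond  = '{0,0}'
    );
    SELECT gmean(price) FROM products;  -- hypothetical table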
On 12/08/2011 09:48 AM, Satoshi Nagayasu wrote: > For examples, I've been working on investigating PostgreSQL LWLock > behaviors > precisely for a few weeks, and it could not be obtained within PostgreSQL > itself, therefore, I picked up SystemTap. However, SystemTap could not be > used in a production system, because it often kills the target > processes. :( > How can I observe LWLocks in the production system? I decided about a year ago that further work on using SystemTap was a black hole: time goes in, nothing really usable on any production server seems to come out. It can be useful for collecting data in a developer context. But the sort of problems people are more interested in all involve "why is the production server doing this?", and as you've also discovered the only reasonable answer so far doesn't involve SystemTap; it involves DTrace and either Solaris or FreeBSD (or Mac OS, for smaller server hardware deployments). Since those platforms are problematic to run database servers on in many cases, that doesn't help very much. I'm planning to put that instrumentation into the database directly, which is what people with an Oracle background are asking for. There are two underlying low-level problems to solve before even starting that: - How can the overhead of collecting the timing data be kept down? It's really high in some places. This is being worked out right now on pgsql-hackers; see "Timing overhead and Linux clock sources" - How do you log the potentially large amount of data collected without killing server performance? Initial discussions are also happening right now; see "logging in high performance systems". I feel this will increasingly be the top blocker for performance-sensitive deployments in the coming year; people used to having these tools in Oracle cannot imagine how they would operate without them. One of my big picture goals is to have this available as a compile-time option starting in PostgreSQL 9.3 in 2013, piggybacked off the existing DTrace support. And the earlier the better--since many migrations have a long lead time, just knowing it's coming in the next version would be good enough for some people who are blocked right now to start working on theirs. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On 12/10/2011 09:54 AM, Greg Smith wrote: > I'm planning to put that instrumentation into the database directly, > which is what people with Oracle background are asking for. FWIW, even for folks like me who've come from a general OSS DB background with a smattering of old Sybase and other primitive stuff, it's still a concern. I know enough about Linux's guts, I/O and memory behaviour to be able to track down many issues but it takes *time*, time I can't spend on any of the numerous other things I have to also be doing. Right now Pg performance fine-tuning is more of an expert skill (as you know!) and for those with a less strong background in Linux/UNIX systems and tuning it's a bit of a barrier. Thankfully my workload is so light I don't need to care; manual EXPLAIN ANALYSE and auto_explain along with vmstat/iotop is enough for me. > -How can the overhead of collecting the timing data be kept down? > It's really high in some places. This is being worked out right now > on pgsql-hackers, see "Timing overhead and Linux clock sources" One thing I think would be interesting for this would be to identify slow queries (without doing detailed plan timing) and flag them for more detailed timing if they're run again within <x> time. I suspect this would only be practical with parameterised prepared statements where the query string remained the same, but that'd still be interesting - essentially automatically upgrading the log level for problem queries from slow query logging to auto_explain with explain analyse. The main issue would be exempting queries that're expected to take longer than the slow query threshold, like reporting queries, where you wouldn't want to pay that overhead. That should be handled by forgetting about slow queries that aren't run again too soon, so they get flagged for EXPLAIN ANALYZE next run but forgotten about before they're next run. I don't actually need this myself, it's just something I've been thinking about as a way to reduce the admin load of identifying and tuning problem queries. > I feel this will increasingly be the top blocker for performance > sensitive deployments in the coming year, people used to having these > tools in Oracle cannot imagine how they would operate without them. Yep, especially since there's nothing in Pg to manage user/query priorities for I/O or CPU, so the ability to actively manage performance problems from the outside is particularly crucial. You'll always want to do that of course, and it's probably _better_ than relying on work priorities, especially since it sounds from recent comments like even on Oracle those priority features aren't what you'd call friendly. Personally I'd choose good performance monitoring over user/query priorities any day. With good perf monitoring I can script from the outside I have a lot more control, can send alerts, etc etc. -- Craig Ringer
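For reference, the static version of this already exists as the auto_explain contrib module; the heuristic described above would essentially automate flipping these knobs (the threshold is just an example value):

    LOAD 'auto_explain';                          -- requires superuser
    SET auto_explain.log_min_duration = '500ms';  -- log plans for anything slower
    SET auto_explain.log_analyze = on;            -- actual times and row counts,
                                                  -- at a timing-overhead cost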
On Sat, Dec 10, 2011 at 7:28 PM, Craig Ringer <ringerc@ringerc.id.au> wrote: > The main issue would be exempting queries that're expected to take longer > than the slow query threshold, like reporting queries, where you wouldn't > want to pay that overhead. One trick you can use for this is to assign the reporting application a different user and then alter user yada yada to turn off logging of slow queries etc for that user. Since it's often useful to increase work_mem and / or change random_page_cost and so on for a reporting user, it's pretty common to do this anyway.
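Concretely, with a hypothetical reporting role (the first statement has to be issued by a superuser, since that parameter is superuser-only):

    ALTER ROLE reporting SET log_min_duration_statement = -1;  -- no slow-query logging
    ALTER ROLE reporting SET work_mem = '256MB';               -- bigger sorts and hashes
    ALTER ROLE reporting SET random_page_cost = 2.0;           -- lean towards index scans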
On 12/10/2011 09:28 PM, Craig Ringer wrote: > One thing I think would be interesting for this would be to identify > slow queries (without doing detailed plan timing) and flag them for > more detailed timing if they're run again within <x> time. I suspect > this would only be practical with parameterised prepared statements > where the query string remained the same, but that'd still be interesting There are actually two patches sitting in the current PostgreSQL CommitFest that allow normalizing query strings in a way that they could be handled like this even if not prepared, as part of pg_stat_statements. What you're asking for is basically a hybrid of that and auto_explain, with something smarter deciding when the explain is triggered. Interesting idea, I hadn't thought of that heuristic before. It won't be hard to do if the query normalization stuff commits. > Personally I'd choose good performance monitoring over user/query > priorities any day. With good perf monitoring I can script from the > outside I have a lot more control, can send alerts, etc etc. Luckily for you it's hard to do it in any other order. When I think about how we'd have to validate whether query prioritization code was operating as expected or not, I imagine some extra monitoring tools really need to get built first. Might as well expose those for people like yourself too, once they're built for that purpose. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
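For anyone following along, the view those patches extend is already queryable today; something like this gives a quick "worst offenders" list once the contrib module is installed:

    -- top five queries by cumulative execution time
    SELECT query, calls, total_time, rows
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 5;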
2011/12/10 10:54, Greg Smith wrote: > On 12/08/2011 09:48 AM, Satoshi Nagayasu wrote: >> For examples, I've been working on investigating PostgreSQL LWLock behaviors >> precisely for a few weeks, and it could not be obtained within PostgreSQL >> itself, therefore, I picked up SystemTap. However, SystemTap could not be >> used in a production system, because it often kills the target processes. :( >> How can I observe LWLocks in the production system? > > I decided about a year ago that further work on using SystemTap was a black hole: time goes in, nothing really usable on any production server seems to come out. It can be useful for collecting data in a developer context. But the sort of problems people are more interested in all involve "why is the production server doing this?", and as you've also discovered the only reasonable answer so far doesn't involve SystemTap; it involves DTrace and either Solaris or FreeBSD (or Mac OS, for smaller server hardware deployments). Since those platforms are problematic to run database servers on in many cases, that doesn't help very much. Absolutely. SystemTap would be useful if I'm able to reproduce the situation outside the production system. However, in most cases, it would be actually difficult. > I'm planning to put that instrumentation into the database directly, which is what people with an Oracle background are asking for. There are two underlying low-level problems to solve before even starting that: > > - How can the overhead of collecting the timing data be kept down? It's really high in some places. This is being worked out right now on pgsql-hackers; see "Timing overhead and Linux clock sources" > > - How do you log the potentially large amount of data collected without killing server performance? Initial discussions are also happening right now; see "logging in high performance systems". > > I feel this will increasingly be the top blocker for performance-sensitive deployments in the coming year; people used to having these tools in Oracle cannot imagine how they would operate without them. One of my big picture goals is to have this available as a compile-time option starting in PostgreSQL 9.3 in 2013, piggybacked off the existing DTrace support. And the earlier the better--since many migrations have a long lead time, just knowing it's coming in the next version would be good enough for some people who are blocked right now to start working on theirs. I'm glad to hear that. I'm very interested in focusing on it, and will follow the threads. Thanks. -- NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
> Absolutely. SystemTap would be useful if I'm able to reproduce the situation
> outside the production system. However, in most cases, it would be actually
> difficult.
At the db level, Oracle provides a "Database Replay" feature that lets you replay production server events in a development/test environment.
http://docs.oracle.com/cd/B28359_01/server.111/e12253/dbr_intro.htm
Won't something like this be useful in PostgreSQL? It will let us mimic the production environment load and analyze it better.
Regards,
Jayadevan
Hi, On 12 December 2011 15:39, Jayadevan M <Jayadevan.Maymala@ibsplc.com> wrote: > At the db level, Oracle provides "Database replay" feature. that lets you > replay the production server events in the development/test environment. > http://docs.oracle.com/cd/B28359_01/server.111/e12253/dbr_intro.htm > Won't something like this be useful in PostgreSQL? It will let us mimic the > production environment load and analyze it better. There is a project called pgreplay (http://pgreplay.projects.postgresql.org/): pgreplay reads a PostgreSQL log file (not a WAL file), extracts the SQL statements and executes them in the same order and relative time against a PostgreSQL database cluster. The idea is to replay a real-world database workload as exactly as possible. -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
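The capture side of pgreplay is just the server's own logging; roughly the following postgresql.conf settings are needed (paraphrased from memory of the pgreplay documentation, which is authoritative on the exact requirements):

    log_statement = 'all'              # or log_min_duration_statement = 0
    log_connections = on               # session start events...
    log_disconnections = on            # ...and session end events
    log_line_prefix = '%m|%u|%d|%c|'   # timestamp, user, database, session id
    lc_messages = 'C'                  # log messages must be parseable English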
Sorry for the delay, >>I wonder if the prioritisation could be done >>using nice - each backend is a separate >>process, so why not to do 'nice(10)' for low >>priority processes or something like that. > >Yes, to a limited degree you can prioritise >queries using nice and ionice, but it's awkward because: > >- All queries run as `postgres' so you can't do per-user limiting very easily > >- The postmaster doesn't have a way to set the >nice level and ionice level when it forks a >backend, nor does the backend have any way to do >it later. You can use your own user-defined C functions for this, though. The postmaster has no way to modify the system nice or ionice levels, but it can send signals to a child process. Sending a user signal that puts the child completely to sleep (I/O and CPU), or one before any I/O call, can do the trick. Perhaps it's time to add a query scheduler, or a more complete/complex one, to Postgres. >-- >Craig Ringer ---------------------------------------------- Si la vida te da la espalda, ¡tocale el culo!
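Until then, the external approach is at least scriptable: pair a catalog query with the standard Linux tools (a rough sketch; 12345 is a made-up pid, and the 9.1 column names are used):

    -- find the backend worth demoting
    SELECT procpid, usename, current_query FROM pg_stat_activity;
    -- then, as root on the server:
    --   renice 10 -p 12345     (CPU priority only)
    --   ionice -c3 -p 12345    (idle I/O class; Linux CFQ scheduler only)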
> At the db level, Oracle provides a "Database Replay" feature that lets you replay production server events in a development/test environment.
> http://docs.oracle.com/cd/B28359_01/server.111/e12253/dbr_intro.htm
> Won't something like this be useful in PostgreSQL? It will let us mimic the production environment load and analyze it better.
There are several projects aiming at this goal in various ways: http://wiki.postgresql.org/wiki/Statement_Playback
Some of the features currently under development will make this sort of thing easier to build into the core database. For example, the recent "Command Triggers" feature submission will make it easier to catch DDL changes as well as queries for this sort of thing.
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On 12/09/2011 08:54 PM, Greg Smith wrote: > I decided about a year ago that further work on using SystemTap was a > black hole: time goes in, nothing really usable on any production > server seems to come out. My off-list e-mail this weekend has, quite rightly, pointed out that this cheap shot is unfair bordering on libel toward the hard-working SystemTap developers. I'd like to publicly apologize for that and clarify my frustrated statement here (I'd *really* like this sort of tool to be more widely available). The main problem I've had with SystemTap is its reputation; I don't actually have any real, informed gripes about its current state. But the sort of customers I have are very risk-averse. PostgreSQL does a good job attracting that sort of user. I'm sure we have a disproportionate number of them relative to your average open-source program. Accordingly, unless a piece of software is very trusted, it's hard for me to convince anyone to use it. (See "why Greg hates the disclaimers around the PostgreSQL contrib modules".) That makes it hard for me to give SystemTap a serious spin on most of the production servers where I see my hardest problems. That's the reason behind the statement I made badly here--regardless of how much I know about it, I can't seem to get SystemTap deployed in the places where I spend the most time working. True or false, the "SystemTap is more likely to crash your kernel than DTrace" meme is out there. I think some of that is an unexpectedly bad side-effect of its open-source development. DTrace had the luxury of being hidden from the world at large until it was well formed, whereas a lot of people saw SystemTap in a really early state, formed opinions several years ago, and the oldest of those opinions are now some of the highest-ranking pages when you search for information. I just searched again today, and mounds of stuff from 2006 and 2007 that surely doesn't reflect the current state of things came back. SystemTap didn't get a 1.0 release until September 2009. As someone who has spent a lot of time at the wrong end of the "PostgreSQL is slower than MySQL" meme, I shouldn't have just thrown this sort of criticism out there without explaining the basis for my statement. I hope this clears up what I meant. Ultimately I think we need both more tools like SystemTap and DTrace and more instrumentation inside PostgreSQL, to cover all of the things people would like visibility into. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
> One thing I think would be interesting for this would be to identify slow queries (without doing detailed plan timing) and flag them for more detailed timing if they're run again within <x> time. I suspect this would only be practical with parameterised prepared statements where the query string remained the same, but that'd still be interesting - essentially automatically upgrading the log level for problem queries from slow query logging to auto_explain with explain analyse.
>
> Some of the features currently under development will make
> this sort of thing easier to build into the core database. For
> example, the recent "Command Triggers" feature submission will make
> it easier to catch DDL changes as well as queries for this sort of thing.
Thank you for all the responses. I am a long-time Oracle guy who loves trying PostgreSQL because it looks like a great database. I try it out on my PC, and have no production experience with PostgreSQL. We recently tried moving our product (using Oracle) with more than 100 GB of data to PostgreSQL and conducted performance testing. It did an outstanding job!
But I miss all those user-friendly trouble-shooting utilities (like Automatic Workload Repository, Active Session History, etc.) in PostgreSQL. Yes - some of them are there, but one has to search, download, configure, etc. I hope many of these features will become part of the 'core' soon.
Regards,
Jayadevan
> But I miss all those user-friendly trouble-shooting utilities (like Automatic Workload Repository, Active Session History, etc.) in PostgreSQL. Yes - some of them are there, but one has to search, download, configure, etc. I hope many of these features will become part of the 'core' soon.
That's unlikely, simply because the definition of "core" in PostgreSQL doesn't quite include the full experience of user tools like this. For example, you might see the core collect the data needed for something that acts like an AWR baseline. But you're unlikely to ever get, in core, the sort of easy baseline management and graphing tools that Oracle's Enterprise Manager layers on top of them. There's no place to put a GUI/web tool like that there, and I wouldn't expect that to ever change. You might see it provided as a feature of the pgAdmin tool, though, or as a separate web application.
The fact that there are multiple pieces of software involved doesn't have to make this hard. Ultimately the problem you're identifying is a packaging one. Something doesn't have to be in the PostgreSQL core to be packaged nicely so that you can easily install and use it. It's probably easy for you to get pgAdmin installed and working for example, and that's not a part of core. There's just been a lot more work put into packaging it than most tools have gotten so far.
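As a stopgap, the raw material for something ASH-like is already in the system views, and sampling them into a history table from cron is a common trick. A minimal sketch, using the 9.1 column names and an invented table name:

    CREATE TABLE activity_history AS
        SELECT now() AS sample_time, procpid, usename, waiting, current_query
          FROM pg_stat_activity LIMIT 0;          -- structure only

    -- run every few seconds from cron or any scheduler:
    INSERT INTO activity_history
        SELECT now(), procpid, usename, waiting, current_query
          FROM pg_stat_activity
         WHERE current_query <> '<IDLE>';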
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us