Thread: Contrib -- PostgreSQL shared variables
This is a first pass on a simple shared memory variable system for PostgreSQL. I would appreciate anyone interested in this functionality ripping it apart.

It basically adds this functionality:

SetShared('name', value);
GetSharedInt('name');
SetSharedText('name');
RemoveShared('name');

I also added two extra functions that are sort of a kludge, but could be very helpful:

AddSharedInt('name', value);
SubSharedInt('name', value);

These add or subtract 'value' from the variable and return the result.
Attachment
LockShared('name');

pgsql@mohawksoft.com wrote:
> This is a first pass on a simple shared memory variable system for
> PostgreSQL. I would appreciate anyone interested in this functionality
> ripping it apart.
> [...]
> AddSharedInt('name', value);
> SubSharedInt('name', value);
>
> These add or subtract 'value' from the variable and return the result.
> LockShared('name');

Hmmm, I thought about that, but it is *WAY* more complicated than it looks. What if, after a "Lock", the process crashes before it can call "Unlock"? It is this problem that inspired the "add" and "sub" calls.
pgsql@mohawksoft.com wrote:
> It basically adds this functionality:
>
> SetShared('name', value);
> GetSharedInt('name');
> [...]
> AddSharedInt('name', value);
> SubSharedInt('name', value);

Something that I've found very useful when dealing with shared memory is the ability to do an atomic exchange of values. With that in place, it's possible to perform atomic operations involving several variables. Perhaps it could be as simple as splitting your SetShared into SetSharedInt and SetSharedString and then having them return the old value?

Here's an example of what I mean (I use C syntax for clarity; I know the intended use is from SQL):

/* Loop until something other than LOCKED is returned. When
 * that happens, we have the lock. */
while (SetSharedInt('lock', LOCKED) == LOCKED)
    usleep(WAIT_PERIOD);

/* Set a group of variables here. */

/* Unlock */
SetSharedInt('lock', UNLOCKED);

Regards,
Thomas Hallgren
This new version contains, in addition to the previous version, SwapShared(..), which allows you to atomically swap a value in a variable.
Attachment
On Wednesday 25 August 2004 02:15 pm, pgsql@mohawksoft.com wrote:
> This is a first pass on a simple shared memory variable system for
> PostgreSQL. I would appreciate anyone interested in this functionality
> ripping it apart.
> [...]
> AddSharedInt('name', value);
> SubSharedInt('name', value);
>
> These add or subtract 'value' from the variable and return the result.

I don't see how this is different from "CREATE TABLE shared_variables (name VARCHAR PRIMARY KEY, value VARCHAR)" and inserting/updating/deleting/selecting from that. Perhaps these are per-session shared variables? In which case, what is the utility of sharing them across shared memory?

--
Jonathan Gardner
jgardner@jonathangardner.net
> I don't see how this is different from "CREATE TABLE shared_variables
> (name VARCHAR PRIMARY KEY, value VARCHAR)" and
> inserting/updating/deleting/selecting from that. Perhaps these are
> per-session shared variables? In which case, what is the utility of
> sharing them across shared memory?
>
> --
> Jonathan Gardner

Well, the issue you don't see is this: what if you have to update the variables [n] times a second?

You have to vacuum very frequently. If you update a variable a hundred times a second and vacuum only once every minute, the time it takes to update ranges from reading one row from the database to reading past 5999 dead rows to get to the live one (100 updates/sec over a 60-second vacuum interval leaves 6000 row versions, all but one of them dead). Then you vacuum, and you are back to one row again. On top of that, there is all the WAL logging that has to take place for each "transaction."

I hope no one misunderstands my motives here. I think PostgreSQL is a GREAT project. I differ with the core team on a frequent basis. Maybe the problem is that I've been corrupted by Windows programming or something. If you develop software on Windows, the trick is knowing how to do things not intended to be done by you in order to accomplish your task. These small little tricks are common knowledge on the M$ campuses, but in the outside world they are the street cred that gets you paid. I may play a little too fast and loose with "standards."

PostgreSQL is a "GREAT" SQL database; however, the "database" is no longer the product. The product is the set of functions and infrastructure on top of the database. The SQL database is a commodity; it is what you can do with it that makes it interesting.

The stuff that I contribute is often met with resistance, and that may or may not be a good thing, but over the years, things I've said they NEEDED to do have been done, and PostgreSQL is better for it. I remember an exchange with Tom about four or five years ago about the need to return a number of rows from a PostgreSQL function.
I don't remember it too well, but my memory tells me he said it wasn't possible, or maybe that it was possible but not worth it, or something like that. PostgreSQL returns sets of rows now. It took a few years, but no one would say, now, that they didn't need to do it.

Now don't think I'm talking trash about Tom either. These guys do a lot of hard work. I have the luxury of working on other projects and noticing the sorts of things that are a problem or could be great features. I have great respect for them, even if not they for me.

I think the shared variable module is another one of those things. The cost overhead of a single variable implemented as a row is too high, especially if you want to update it many times a second. This is a simple module. It will work for a good number of deployments and make PostgreSQL more viable for them. I fully expect that the module will grow into something better, a la rserv; be folded into PostgreSQL, a la TOAST; or just languish in contrib (if it is added at all). I believe in Open Source. I have needed this functionality in the past, and I will probably need it in the future. If I can share it with the project, why not?
On Sat, 28 Aug 2004 pgsql@mohawksoft.com wrote:
> What if you have to update the variables [n] times a second?
>
> You have to vacuum very frequently. If you update a variable a hundred
> times a second and vacuum only once every minute, the time it takes to
> update ranges from reading one row from the database to reading 5999
> dead rows to get to the live one. Then you vacuum, and you are back to
> one row again.

I think the right approach is to tackle that problem instead of working around it with a completely new variable mechanism.

I've been playing with the idea of a quick vacuum that runs through the shmem buffers. The idea is that since the pages are already in memory, the vacuum runs very quickly. Vacuuming the hot pages frequently enough should avoid the problem you describe. It also saves I/O in the long run, since dirty pages are vacuumed before they are written to disk, eliminating the need to read in, vacuum, and write the same pages again later.

The problem, of course, is that to vacuum the heap pages, you have to make sure that there are no references to the dead tuples from any indexes.

The trivial case is that the table has no indexes. But I believe that even if the table has ONE index, it's very probable that the corresponding index pages of the dead tuples are also in memory, since the tuple was probably accessed through the index. As the number of indexes gets bigger, the chance of all corresponding index pages being in memory gets smaller.
If the "quick" vacuum, or opportunistic vacuum as I call it, is clever enough to recognize that there is a dead tuple in memory, and that all the index pages that reference it are in memory too, it could reliably vacuum just those tuples without scanning through the whole relation and without doing any extra I/O.

I've written some code that implements the trivial case of no indexes. I'm hoping to extend it to handle the indexes too if I have time. Then we'll see if it's any good. I've attached a patch with my current ugly implementation if you want to give it a try.

> On top of that, all the WAL logging that has to take place for each
> "transaction."

How is that a bad thing? You don't want to give up ACID, do you?

- Heikki
Jonathan,

> The stuff that I contribute is often met with resistance, and that may
> or may not be a good thing, but over the years, things I've said they
> NEEDED to do have been done, and PostgreSQL is better for it.

> Now don't think I'm talking trash about Tom either. These guys do a
> lot of hard work. I have the luxury of working on other projects and
> noticing the sorts of things that are a problem or could be great
> features. I have great respect for them, even if not they for me.

I think you're reading too much into the objections. One of the things that keeps PostgreSQL fast, accessible, and open to new OSS developers is a high barrier to entry for new code proposed for the core. A certain minimalism is called for if we don't want to end up like a Microsoft product: bloated, self-conflicted, and utterly impenetrable even to its own experts. Often it's Tom and Peter expressing that minimalism, but others would have to were they to go away (gods forfend).

This means that, for a project to get into the core, any proposer will have to spend a lot of time convincing the people on this list of its worthiness, both through lobbying and through code. Thus the very first piece of criticism you'll see is "Why do we need it?" But you can overcome these sorts of objections -- others have, as you yourself pointed out.

Unfortunately, /contrib is no better than core in this way as our code gets larger and maintenance gets harder. I believe our eventual goal is/should be to *remove* contrib entirely, replacing it with a build environment of pluggable components, which Peter is working on. This will have the beneficial effect of erasing the difficult-to-cross gap which currently exists between external (pgFoundry, GBorg, SourceForge) projects and contrib. Go Peter!

Also, keep in mind that Tom just completed a 300-hour marathon of reviewing features for 8.0. I can't imagine that he'll take a positive attitude toward new features of any kind for about a month.
> I think the shared variable module is another one of those things. The
> cost overhead of a single variable implemented as a row is too high,
> especially if you want to update it many times a second.

Question: How will these "system variables" behave regarding transactions? If I update a system variable and roll back the transaction, does it change back? Do changes in a running transaction remain invisible until COMMIT?

Excuse me if you've already answered these; I've not caught up on my -hackers backlog since I broke my foot.

--
Josh Berkus
Aglio Database Solutions
San Francisco
>> I think the shared variable module is another one of those things.
>> The cost overhead of a single variable implemented as a row is too
>> high, especially if you want to update it many times a second.
>
> Question: How will these "system variables" behave regarding
> transactions? If I update a system variable and roll back the
> transaction, does it change back? Do changes in a running transaction
> remain invisible until COMMIT?

Actually, they are not involved with transactions in any way.

There are classes of problems in which a proper ACID implementation is not necessary. In fact, there are subclasses within a larger system that need not have the overhead imposed by transactions. Think about this: in one part of the system, you update a table:

update summary_table set sum = sum + $newvalue;

In another part of the system you perform this:

select sum from summary_table;

You did this to keep from having to do:

select sum(col) from real_table;

which, as everyone knows, can cover millions of records, and a full table scan would be too expensive.

In an active system there is *no* way that the results of these transactions can be anything but an estimate or a snapshot. By the time the results have been reported, they surely would have changed. To get this inaccurate information, you had to incur a dead row in the summary table for each update, or the cost of a full table scan. Using the variable metaphor, the overall cost of doing this is almost nil, and with care the results can be just as accurate with respect to time.

Like I said, it is *NOT* a general solution, but it *is* a solution a lot of developers using databases have to implement *somewhere*. Why not in the database in which the data is stored?
> I think the right approach is to tackle that problem instead of
> working around it with a completely new variable mechanism.

You assume there is a problem with the existing mechanism that needs fixing. There isn't; PostgreSQL does the right thing already. The issue is that the class of problem we are trying to solve does not fit the MVCC methodology very well.

I've given this example a number of times. Consider this query:

select sum(col) from table;

If this is an active system with many inserts and updates a second, and the table has a couple million rows, there is NO way to get any meaningful answer out of this query. It is only a snapshot, not much better than an estimate. The shared variables are a way of expressing, "I know this is just an estimate, but I need to do it FAST." A LOT of websites could benefit from this, like a quick "number of people online."

> I've been playing with the idea of a quick vacuum that runs through
> the shmem buffers. The idea is that since the pages are already in
> memory, the vacuum runs very quickly. Vacuuming the hot pages
> frequently enough should avoid the problem you describe.
> It also saves I/O in the long run since dirty pages are vacuumed
> before they are written to disk, eliminating the need to read in,
> vacuum and write the same pages again later.

That's fine, but how does it affect performance?

> The problem is of course that to vacuum the heap pages, you have to
> make sure that there are no references to the dead tuples from any
> indexes.

Exactly.

> I've written some code that implements the trivial case of no indexes.
> I'm hoping to extend it to handle the indexes too if I have time. Then
> we'll see if it's any good. I've attached a patch with my current ugly
> implementation if you want to give it a try.

This is cool, but is it actually "more" work over the long run than vacuum? I mean, a while ago, vacuum had to lock the tables. This non-locking vacuum is pretty cool and takes away almost all the objections I can think of with PostgreSQL.

>> On top of that, all the WAL logging that has to take place for each
>> "transaction."
>
> How is that a bad thing? You don't want to give up ACID do you?

Actually, yes, yes I do. Not for everything, but for classes of problems that do not need it. I have some experience with this. I have a session manager for PHP.
I wrote it because databases fail miserably when what you want is SQL-esque information that needs to be FAST and can go away without worrying too much about it.

Think about a website shopping cart. The session and the things in the cart don't matter after 10 or 15 minutes. Most of the time, the cart will just be abandoned. The only time it is important is at checkout, at which point it is transferred to an ACID-compliant system.

My session manager on a modern dual PIII can handle 8000 full "get vars, set vars" cycles a second. This is more or less impossible with a SQL database without spending a LOT of money on clustering. Mine can do it with a single standard x86 Linux server box.

Some information can go away without worry. Paying to protect it with CPU and disk I/O can be VERY expensive. PostgreSQL does what it does very well. Sometimes what it does can get in the way of using it.
Josh Berkus <josh@agliodbs.com> writes:
> Question: How will these "system variables" behave regarding
> transactions? If I update a system variable and roll back the
> transaction, does it change back? Do changes in a running transaction
> remain invisible until COMMIT?

I think this is precisely the objection. He wants something that allows him to break out of transaction semantics entirely. Other backends would see the value he sets before he commits. If he rolls back, his changes would remain.

I expect to see substantial resistance to incorporating such a feature. One argument will be that he should simply use a separate system for such data. For example, memcached would be much faster and more scalable than communicating via postgres.

The only disadvantage to using something like memcached would be that you couldn't transparently use values from there inside your queries. You would have to fetch them and then pass them as parameters to postgres.

--
greg
Greg Stark wrote:
> I expect to see substantial resistance to incorporating such a
> feature. One argument will be that he should simply use a separate
> system for such data. For example, memcached would be much faster and
> more scalable than communicating via postgres.

Postgres should do what postgres is good at, IMNSHO. Small is beautiful.

> The only disadvantage to using something like memcached would be that
> you couldn't transparently use values from there inside your queries.
> You would have to fetch them and then pass them as parameters to
> postgres.

Could you not overcome that using server-side functions in, say, C or plperlu? In fact, I suspect you could implement this whole scheme using memcached (which looks quite cool and exactly made for this) plus a very small amount of plperlu glue.

cheers

andrew
On Sun, Aug 29, 2004 at 09:24:59AM -0400, pgsql@mohawksoft.com wrote:
> Think about a website shopping cart. The session and the things in the
> cart don't matter after 10 or 15 minutes. Most of the time, the cart
> will just be abandoned. The only time it is important is at checkout,
> at which point it is transferred to an ACID-compliant system.

Much of the rest of what you said makes sense, but this is a TERRIBLE example. As a user, I have been burned repeatedly by crappy, non-persistent shopping carts built by developers who, apparently, agree with you that, "Oh, it's just a shopping cart, no one cares if it randomly disappears and the user has to start over from scratch."

Nonsense! Damn it, if I put 12 different items in my shopping cart and then go to lunch, I expect my shopping cart to STILL BE THERE when I get back! Note that Amazon does this right, so any ecommerce site doing it WRONG does so at severe peril of losing its customers.

> My session manager on a modern dual PIII can handle 8000 full "get
> vars, set vars" cycles a second.

I don't really see why you need or want to do this in PostgreSQL, though. Can't you do it in your web or application server? (I certainly can in AOLserver with its nsv API, for example.)

What's the advantage of instead stuffing these in-memory variables into PostgreSQL? That it still works correctly and simply even if you have a farm of 10 separate web servers sitting in front of it? That could be useful for certain applications, but is there any more to it than that?

I've occasionally REALLY wanted a small in-memory ACId (no "D" for Durability) RDBMS for tracking transient data that doesn't need to go to disk (and one of these days I will experiment with using SQLite for that), but that's not what you're talking about, right? The shared variables you're talking about here are just simple scalar values: no tables, no integrity constraints, no MVCC, none of that, right?

--
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/
Mohawksoft,

> Actually, they are not involved with transactions in any way.
>
> There are classes of problems in which a proper ACID implementation is
> not necessary. In fact, there are subclasses within a larger system
> that need not have the overhead imposed by transactions.

Hmmm, wait, are you implementing this, or is Jonathan?

As a database developer who would use this or a similar feature, I do not agree that the lack of transactionality is a benefit. Quite the contrary. Imagine this scenario, once we have error-trapping in place for 8.1 (pardon the syntax if it's off the proposal; I think you'll see the point anyway):

BEGIN
    SAVEPOINT before_update;
    UPDATE jehosaphat SET slivy = 60 WHERE status = 4;
    SET SHARED jehosaphat_last_update = now();
    IF transaction_error > 0 THEN
        ROLLBACK TO before_update;
    ELSE
        COMMIT;
    END IF;
...

Now assume that the reason I'm using a shared variable is that "jehosaphat" needs to be updated every 500ms, and thus we don't want to be storing the timestamp in a table. In that case, if shared variables are not transactional, the above would be deceptive; the system would think that jehosaphat had been updated when it had not.

Ah, you say, but you could work around that if you coded correctly. Maybe so, although I find the idea that transactions don't apply to some objects very liable to cause confusion. Also consider this case:

CREATE FUNCTION update_jehosaphat_last () RETURNS TRIGGER AS '
BEGIN
    SET SHARED jehosaphat_last_update = now();
END;
' LANGUAGE plpgsql;

CREATE TRIGGER tg_jehosaphat_update
    BEFORE UPDATE OR DELETE ON jehosaphat
    FOR EACH STATEMENT EXECUTE PROCEDURE update_jehosaphat_last();

NOW imagine that your update gets rolled back; the shared variable will be misreporting the last update, and there isn't any coding around it.

My personal perspective is that shared variables should be transactional, just like everything else in PostgreSQL except sequences.
Just because they are transactional does not mean they have to be saved on disk; that's where most of your "dead tuple overhead" comes in. If you could place the variables in "reserved" shared memory and never sync, then keeping several copies, which split and collapse to stay current with the existing transactions, should not be a major performance cost. Of course, it will make the code a lot harder, but that's one of the reasons why nobody has done it before now.

Now, if you're telling me that it's too difficult for you to implement transactions at this time, and that you want to push this patch without transactions so they can be added sometime later, then we can have it out on that basis. But you're going to have a hard time convincing me or several other people that ignoring transactions is somehow a benefit.

--
Josh Berkus
Aglio Database Solutions
San Francisco
> Much of the rest of what you said makes sense, but this is a TERRIBLE
> example. As a user, I have been burned repeatedly by crappy,
> non-persistent shopping carts [...] Note that Amazon does this right,
> so any ecommerce site doing it WRONG does so at severe peril of losing
> its customers.

Your complaint is about the timing, not the process. Most web sessions go away; less than 10% of all shopping carts ever get checked out, period. The Msession manager has a module that will save expired sessions to a database and reload them if the session ID ever comes back, if you like.

Listen, I don't want to ramble on about web sessions; that is a different topic, and I only wanted an example of data that is important to use in the database but is not persistent enough or irreplaceable enough to warrant the full ACID overhead.

Web sessions are temporary. If you have a machine that crashes every 9 months, causing all the active sessions to be lost, this is probably acceptable as long as there is failover. That would be totally unacceptable for a database.

There are classes of undependable, transient, or reproducible data in a system which work in concert with dependable data. It is inefficient to require the transient data to be managed under the strict guidelines of the dependable data.
Furthermore, the more independent systems you have within a larger system, the less reliable the larger system is.

>> My session manager on a modern dual PIII can handle 8000 full "get
>> vars, set vars" cycles a second.
>
> I don't really see why you need or want to do this in PostgreSQL,
> though. Can't you do it in your web or application server? (I
> certainly can in AOLserver with its nsv API, for example.)

Having managed many projects, I can tell you, point blank: the fewer things you need to use, the better. An extension to an interface you already use is much better than a different interface. Also, the data in the shared variables is related to the data in the database. It belongs close to it.

> What's the advantage of instead stuffing these in-memory variables
> into PostgreSQL? That it still works correctly and simply even if you
> have a farm of 10 separate web servers sitting in front of it? That
> could be useful for certain applications, but is there any more to it
> than that?

Isn't "that could be useful for certain applications" enough of a reason to add something to contrib?

> I've occasionally REALLY wanted a small in-memory ACId (no "D" for
> Durability) RDBMS for tracking transient data that doesn't need to go
> to disk [...], but that's not what you're talking about, right? The
> shared variables you're talking about here are just simple scalar
> values, no tables, no integrity constraints, no MVCC, none of that,
> right?

Right. The pattern comes up in almost every non-trivial system. The developer can either search for some other system to do it and figure out some way to use that system's data in PostgreSQL, or load this contrib module and have it at the SQL level. Honestly, I don't see the problem.
> Hmmm, wait, are you implementing this, or is Jonathan?
>
> As a database developer who would use this or a similar feature, I do
> not agree that the lack of transactionality is a benefit. Quite the
> contrary.

[snip -- example]

Your example makes sense if you think that you'll only be using this in the SQL environment.

One of the most common SQL database applications is as the backend of some higher-level system, like a web site. In these systems, usually done in Java, PHP, or heaven help us, .NET, you have multiple front end machines communicating with a SQL database backend.

These systems use the database as the main data server. The overall scalability of their system is based on the scalability of the database. At some point, you'll max out the DB and be stuck. The shared variables module is designed to let you scale better by allowing you to maintain data cheaply. So, rather than call this:

select sum(field) from mytable;

or

update mytable_summary set field = field + n;

you can do this:

select addshared('mytable', n);

Now, in Java or PHP, you'd do something like this:

somefunc()
{
    if (sql.exec(...)) {
        /* Worked */
        sql.exec("select addshared('mytable', n)");
    }
}

OK, now you're going to say, why not use a different system for the caching? As I said in a different post, the more systems you glue together, the more complicated the bigger system gets, and the more complicated it gets, the less reliable it gets. Besides, this is data you're going to use with the data in PostgreSQL, so why not provide an extension for it?
Mohawksoft,

> One of the most common SQL database applications is as the backend of some
> higher level system, like a web site. In these systems, ususlly done in
> Java, PHP, or heaven help us, .NET, you have multiple front end machine
> communicating with a SQL database backend.

Leaving aside that Java, .NET and even PHP5 have much more sophisticated ways
of sharing variables ...

> These systems use the database as the main data server. The overall
> scalability of their system is based on the scalability of the database.
> At some point, you'll max out the DB and be stuck. The shared variables
> module is designed to allow you to scale better by allowing you to cheapy
> maintain data.

I really don't follow how this makes a lack of transaction-awareness for these
global variables an advantage. If you're saying that it's not possible to
make shared variables which are both cheap and transaction-aware, then say so
and be clear about it.

Please understand that I am not opposed to the idea behind the shared variable
proposal. I just think there are some details to be worked out.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
> Mohawksoft,
>
>> One of the most common SQL database applications is as the backend of some
>> higher level system, like a web site. In these systems, ususlly done in
>> Java, PHP, or heaven help us, .NET, you have multiple front end machine
>> communicating with a SQL database backend.
>
> Leaving aside that Java, .NET and even PHP5 have much more sophisticated ways
> of sharing variables ...

Yes, and how does a developer share between PHP, Java, and .NET? In a homogeneous system that may work, but not across multiple systems, and it still doesn't address the data being 'in' the database as opposed to some other system.

>
>> These systems use the database as the main data server. The overall
>> scalability of their system is based on the scalability of the database.
>> At some point, you'll max out the DB and be stuck. The shared variables
>> module is designed to allow you to scale better by allowing you to cheapy
>> maintain data.
>
> I really don't follow how this makes a lack of transaction-awareness for these
> global variables an advantage. If you're saying that it's not possible to
> make shared variables which are both cheap and transaction-aware, then say so
> and be clear about it.
>
> Please understand that I am not opposed to the idea behind the shared variable
> proposal. I just think there are some details to be worked out.

You won't understand it from what you quoted because the reasoning has been snipped, and I've posted the example so many times my fingers are sore.

Some data does not need to be transactional, and data that doesn't need transactional guarantees shouldn't have to pay the overhead for them. If that overhead can be reduced or eliminated for such data, the database scales better. Isn't that completely obvious? If your system is bonking out, every last bit of efficiency is important. (And this is no small efficiency!)
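The "cheap but still correct" claim can be illustrated in-process (again a Python simulation, not the actual shared-memory C code): many concurrent writers performing an atomic add never lose an update, even though no transaction machinery is involved, just one short critical section per call.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # stands in for the module's internal latch

def addshared(n):
    """One short critical section per call: no WAL, no MVCC snapshot,
    no client-visible lock that a crashed session could leak."""
    global counter
    with counter_lock:
        counter += n
        return counter

# Eight simulated frontends each add 1, a thousand times.
threads = [threading.Thread(target=lambda: [addshared(1) for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 8000 -- no lost updates
```

Contrast this with `update mytable_summary set field = field + n` in a transaction, where every increment carries row locking, WAL writes, and dead-tuple cleanup, all unnecessary for data that never needs to survive a restart.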
It is these sorts of tricks that can make the difference between a smoothly running site and one that looks like it has been slashdotted.
On Monday 30 August 2004 09:50, pgsql@mohawksoft.com wrote:
> > On Sun, Aug 29, 2004 at 09:24:59AM -0400, pgsql@mohawksoft.com wrote:
> > What's the advantage of instead stuffing these in-memory variables
> > into PostgreSQL? That it still works correctly and simply even if you
> > have a farm of 10 separate web servers sitting in front of it? That
> > could be useful for certain applications, but is there any more to it
> > than that?
>
> Isn't "that could be useful for certain applications" enough of a reason
> to add something to contrib?
>

That's enough of a reason to add something to *gborg*, but I can think of a lot of things I would rather see the postgresql major contributors maintaining than this ... especially when you consider that we've already removed things like jdbc from contrib.

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL