Thread: Contrib -- PostgreSQL shared variables
This is a first pass on a simple shared memory variable system for PostgreSQL. I would appreciate anyone interested in this functionality ripping it apart.

It basically adds this functionality:

SetShared('name', value);
GetSharedInt('name');
SetSharedText('name');
RemoveShared('name');

I also added two extra functions that are sort of a kludge, but could be very helpful:

AddSharedInt('name', value);
SubSharedInt('name', value);

These add or subtract 'value' from the variable and return the result.
Attachment
LockShared('name');

pgsql@mohawksoft.com wrote:
> This is a first pass on a simple shared memory variable system for
> PostgreSQL. I would appreciate anyone interested in this functionality
> ripping it apart.
> [...]
> AddSharedInt('name', value);
> SubSharedInt('name', value);
>
> These add or subtract 'value' from the variable and return the result.
> LockShared('name');

Hmmm, I thought about that, but it is *WAY* more complicated than it looks. What if, after a "Lock", the process crashes before it can call "Unlock"? It is this problem that inspired the "add" and "sub" calls.
pgsql@mohawksoft.com wrote:
> It basically adds this functionality:
>
> SetShared('name', value);
> GetSharedInt('name');
> [...]
> AddSharedInt('name', value);
> SubSharedInt('name', value);

Something that I've found very useful when dealing with shared memory is the ability to do an atomic exchange of values. With that in place, it's possible to perform atomic operations involving several variables. Perhaps it could be as simple as splitting your SetShared into SetSharedInt and SetSharedString and then having them return the old value?

Here's an example of what I mean (I use C syntax for clarity; I know the intended use is from SQL):

/* Loop until something other than LOCKED is returned. When
 * that happens, we have the lock. */
while (SetSharedInt('lock', LOCKED) == LOCKED)
    usleep(WAIT_PERIOD);

/* Set a group of variables here. */

/* Unlock */
SetSharedInt('lock', UNLOCKED);

Regards,
Thomas Hallgren
This new version contains, in addition to the previous version, SwapShared(..), which allows you to atomically swap a value in a variable.
Attachment
On Wednesday 25 August 2004 02:15 pm, pgsql@mohawksoft.com wrote:
> This is a first pass on a simple shared memory variable system for
> PostgreSQL. I would appreciate anyone interested in this functionality
> ripping it apart.
> [...]
> AddSharedInt('name', value);
> SubSharedInt('name', value);
>
> These add or subtract 'value' from the variable and return the result.

I don't see how this is different from "CREATE TABLE shared_variables (name VARCHAR PRIMARY KEY, value VARCHAR)" and inserting/updating/deleting/selecting from that. Perhaps these are per-session shared variables? In which case, what is the utility of sharing them across shared memory?

--
Jonathan Gardner
jgardner@jonathangardner.net
> I don't see how this is different from "CREATE TABLE shared_variables
> (name VARCHAR PRIMARY KEY, value VARCHAR)" and
> inserting/updating/deleting/selecting from that. Perhaps these are
> per-session shared variables? In which case, what is the utility of
> sharing them across shared memory?
>
> --
> Jonathan Gardner

Well, the issue you don't see is this: what if you have to update the variables [n] times a second?

You have to vacuum very frequently. If you update a variable a hundred times a second and vacuum only once every minute, the time it takes to update ranges from reading one row from the database to reading past 5999 dead rows to get to the live one (100 updates/sec over a 60-second vacuum interval leaves 6000 row versions, all but one of them dead). Then you vacuum, and you are back to one row again. On top of that, there is all the WAL logging that has to take place for each "transaction."

I hope no one misunderstands my motives here. I think PostgreSQL is a GREAT project. I differ with the core team on a frequent basis. Maybe the problem is that I've been corrupted by Windows programming or something. If you develop software on Windows, the trick is knowing how to do things not intended to be done by you in order to accomplish your task. These small little tricks are common knowledge on the M$ campuses, but in the outside world they are the street cred that gets you paid. I may play a little too fast and loose with "standards."

PostgreSQL is a "GREAT" SQL database; however, the "database" is no longer the product. The product is the set of functions and infrastructure on top of the database. The SQL database is a commodity; it is what you can do with it that makes it interesting.

The stuff that I contribute is often met with resistance, and that may or may not be a good thing, but over the years, things I've said they NEEDED to do have been done, and PostgreSQL is better for it. I remember an exchange with Tom about four or five years ago about the need to return a number of rows from a PostgreSQL function.
I don't remember it too well, but my memory tells me he said it wasn't possible, or maybe that it was possible but not worth it, or something like that. PostgreSQL returns sets of rows now. It took a few years, but no one would say, now, that they didn't need to do it.

Now don't think I'm talking trash about Tom either. These guys do a lot of hard work. I have the luxury of working on other projects and noticing the sorts of things that are a problem or could be great features. I have great respect for them, even if not they for me.

I think the shared variable module is another one of those things. The cost overhead of a single variable implemented as a row is too high, especially if you want to update it many times a second. This is a simple module. It will work for a good number of deployments and make PostgreSQL more viable for them. I fully expect that the module will grow into something better, a la rserv; be folded into PostgreSQL, a la TOAST; or just languish in contrib (if it is added at all). I believe in Open Source. I have needed this functionality in the past, and I will probably need it in the future. If I can share it with the project, why not?
On Sat, 28 Aug 2004 pgsql@mohawksoft.com wrote:
> What if you have to update the variables [n] times a second?
>
> You have to vacuum very frequently. If you update a variable a hundred
> times a second and vacuum only once every minute, the time it takes to
> update ranges from reading one row from the database to reading 5999
> dead rows to get to the live one. Then you vacuum, and you are back to
> one row again.

I think the right approach is to tackle that problem instead of working around it with a completely new variable mechanism.

I've been playing with the idea of a quick vacuum that runs through the shmem buffers. The idea is that since the pages are already in memory, the vacuum runs very quickly. Vacuuming the hot pages frequently enough should avoid the problem you describe. It also saves I/O in the long run, since dirty pages are vacuumed before they are written to disk, eliminating the need to read in, vacuum, and write the same pages again later.

The problem, of course, is that to vacuum the heap pages, you have to make sure that there are no references to the dead tuples from any indexes.

The trivial case is that the table has no indexes. But I believe that even if the table has ONE index, it's very probable that the corresponding index pages of the dead tuples are also in memory, since the tuple was probably accessed through the index. As the number of indexes gets bigger, the chance of all corresponding index pages being in memory gets smaller.
If the "quick" vacuum, or opportunistic vacuum as I call it, is clever enough to recognize that there is a dead tuple in memory, and that all the index pages that reference it are in memory too, it could reliably vacuum just those tuples without scanning through the whole relation and without doing any extra I/O.

I've written some code that implements the trivial case of no indexes. I'm hoping to extend it to handle the indexes too if I have time. Then we'll see if it's any good. I've attached a patch with my current ugly implementation if you want to give it a try.

> On top of that, all the WAL logging that has to take place for each
> "transaction."

How is that a bad thing? You don't want to give up ACID, do you?

- Heikki
Jonathan,

> The stuff that I contribute is often met with resistance, and that may
> or may not be a good thing, but over the years, things I've said they
> NEEDED to do have been done, and PostgreSQL is better for it.

> Now don't think I'm talking trash about Tom either. These guys do a
> lot of hard work. I have the luxury of working on other projects and
> noticing the sorts of things that are a problem or could be great
> features. I have great respect for them, even if not they for me.

I think you're reading too much into the objections. One of the things that keeps PostgreSQL fast, accessible, and open to new OSS developers is a high barrier to entry for new code proposed for the core. A certain minimalism is called for if we don't want to end up like a Microsoft product: bloated, self-conflicted, and utterly impenetrable even to its own experts. Often it's Tom and Peter expressing that minimalism, but others would have to were they to go away (gods forfend).

This means that, for a project to get into the core, any proposer will have to spend a lot of time convincing the people on this list of its worthiness, both through lobbying and through code. Thus the very first piece of criticism you'll see is "Why do we need it?" But you can overcome these sorts of objections -- others have, as you yourself pointed out.

Unfortunately, /contrib is no better than core in this way as our code gets larger and maintenance gets harder. I believe our eventual goal is/should be to *remove* contrib entirely, replacing it with a build environment of pluggable components, which Peter is working on. This will have the beneficial effect of erasing the difficult-to-cross gap which currently exists between external (pgFoundry, GBorg, SourceForge) projects and contrib. Go Peter!

Also, keep in mind that Tom just completed a 300-hour marathon of reviewing features for 8.0. I can't imagine that he'll take a positive attitude toward new features of any kind for about a month.
> I think the shared variable module is another one of those things. The
> cost overhead of a single variable implemented as a row is too high,
> especially if you want to update it many times a second.

Question: How will these "system variables" behave regarding transactions? If I update a system variable and roll back the transaction, does it change back? Do changes in a running transaction remain invisible until COMMIT?

Excuse me if you've already answered these; I've not caught up on my -hackers backlog since I broke my foot.

--
Josh Berkus
Aglio Database Solutions
San Francisco
>> I think the shared variable module is another one of those things.
>> The cost overhead of a single variable implemented as a row is too
>> high, especially if you want to update it many times a second.
>
> Question: How will these "system variables" behave regarding
> transactions? If I update a system variable and roll back the
> transaction, does it change back? Do changes in a running transaction
> remain invisible until COMMIT?

Actually, they are not involved with transactions in any way.

There are classes of problems in which a proper ACID implementation is not necessary. In fact, there are subclasses within a larger system that need not have the overhead imposed by transactions. Think about this: in one part of the system, you update a table:

update summary_table set sum = sum + $newvalue;

In another part of the system you perform this:

select sum from summary_table;

You did this to keep from having to do:

select sum(col) from real_table;

which, as everyone knows, can cover millions of records, and a full table scan would be too expensive.

In an active system there is *no* way that the results of these transactions can be anything but an estimate or a snapshot. By the time the results have been reported, they surely would have changed. To get this inaccurate information, you had to incur a dead row in the summary table for each update, or the cost of a full table scan. Using the variable metaphor, the overall cost of doing this is almost nil, and with care the results can be just as accurate with respect to time.

Like I said, it is *NOT* a general solution, but it *is* a solution a lot of developers using databases have to implement *somewhere*. Why not in the database in which the data is stored?
> I think the right approach is to tackle that problem instead of
> working around it with a completely new variable mechanism.

You assume there is a problem with the existing mechanism that needs fixing. There isn't; PostgreSQL does the right thing already. The issue is that the class of problem we are trying to solve does not fit the MVCC methodology very well.

I've given this example a number of times. Consider this query:

select sum(col) from table;

If this is an active system with many inserts and updates a second, and the table has a couple million rows, there is NO way to get any meaningful answer out of this query. It is only a snapshot, not much better than an estimate. The shared variables are a way of expressing, "I know this is just an estimate, but I need to do it FAST." A LOT of websites could benefit from this, like a quick "number of people online."

> I've been playing with the idea of a quick vacuum that runs through
> the shmem buffers. The idea is that since the pages are already in
> memory, the vacuum runs very quickly. Vacuuming the hot pages
> frequently enough should avoid the problem you describe.
> It also saves I/O in the long run since dirty pages are vacuumed
> before they are written to disk, eliminating the need to read in,
> vacuum and write the same pages again later.

That's fine, but how does it affect performance?

> The problem is of course that to vacuum the heap pages, you have to
> make sure that there are no references to the dead tuples from any
> indexes.

Exactly.

> I've written some code that implements the trivial case of no indexes.
> I'm hoping to extend it to handle the indexes too if I have time. Then
> we'll see if it's any good. I've attached a patch with my current ugly
> implementation if you want to give it a try.

This is cool, but is it actually "more" work over the long run than vacuum? I mean, a while ago, vacuum had to lock the tables. This non-locking vacuum is pretty cool and takes away almost all the objections I can think of with PostgreSQL.

>> On top of that, all the WAL logging that has to take place for each
>> "transaction."
>
> How is that a bad thing? You don't want to give up ACID do you?

Actually, yes, yes I do. Not for everything, but for classes of problems that do not need it. I have some experience with this. I have a session manager for PHP.
I wrote it because databases fail miserably when what you want is SQL-esque information that needs to be FAST and can go away without worrying too much about it.

Think about a website shopping cart. The session and the things in the cart don't matter after 10 or 15 minutes. Most of the time, the cart will just be abandoned. The only time it is important is at checkout, at which point it is transferred to an ACID-compliant system.

My session manager on a modern dual PIII can handle 8000 full "get vars, set vars" cycles a second. This is more or less impossible with a SQL database without spending a LOT of money on clustering. Mine can do it with a single standard x86 Linux server box.

Some information can go away without worry. Paying to protect it with CPU and disk I/O can be VERY expensive. PostgreSQL does what it does very well. Sometimes what it does can get in the way of using it.
Josh Berkus <josh@agliodbs.com> writes:
> Question: How will these "system variables" behave regarding
> transactions? If I update a system variable and roll back the
> transaction, does it change back? Do changes in a running transaction
> remain invisible until COMMIT?

I think this is precisely the objection. He wants something that allows him to break out of transaction semantics entirely. Other backends would see the value he sets before he commits. If he rolls back, his changes would remain.

I expect to see substantial resistance to incorporating such a feature. One argument will be that he should simply use a separate system for such data. For example, memcached would be much faster and more scalable than communicating via postgres.

The only disadvantage to using something like memcached would be that you couldn't transparently use values from there inside your queries. You would have to fetch them and then pass them as parameters to postgres.

--
greg
Greg Stark wrote:
> I expect to see substantial resistance to incorporating such a
> feature. One argument will be that he should simply use a separate
> system for such data. For example, memcached would be much faster and
> more scalable than communicating via postgres.

Postgres should do what postgres is good at, IMNSHO. Small is beautiful.

> The only disadvantage to using something like memcached would be that
> you couldn't transparently use values from there inside your queries.
> You would have to fetch them and then pass them as parameters to
> postgres.

Could you not overcome that using server-side functions in, say, C or plperlu? In fact, I suspect you could implement this whole scheme using memcached (which looks quite cool and exactly made for this) plus a very small amount of plperlu glue.

cheers

andrew
On Sun, Aug 29, 2004 at 09:24:59AM -0400, pgsql@mohawksoft.com wrote:
> Think about a website shopping cart. The session and the things in the
> cart don't matter after 10 or 15 minutes. Most of the time, the cart
> will just be abandoned. The only time it is important is at checkout,
> at which point it is transferred to an ACID-compliant system.

Much of the rest of what you said makes sense, but this is a TERRIBLE example. As a user, I have been burned repeatedly by crappy, non-persistent shopping carts built by developers who, apparently, agree with you that, "Oh, it's just a shopping cart, no one cares if it randomly disappears and the user has to start over from scratch."

Nonsense! Damn it, if I put 12 different items in my shopping cart and then go to lunch, I expect my shopping cart to STILL BE THERE when I get back! Note that Amazon does this right, so any ecommerce site doing it WRONG does so at severe peril of losing its customers.

> My session manager on a modern dual PIII can handle 8000 full "get
> vars, set vars" cycles a second.

I don't really see why you need or want to do this in PostgreSQL, though. Can't you do it in your web or application server? (I certainly can in AOLserver with its nsv API, for example.)

What's the advantage of instead stuffing these in-memory variables into PostgreSQL? That it still works correctly and simply even if you have a farm of 10 separate web servers sitting in front of it? That could be useful for certain applications, but is there any more to it than that?

I've occasionally REALLY wanted a small in-memory ACId (no "D" for Durability) RDBMS for tracking transient data that doesn't need to go to disk (and one of these days I will experiment with using SQLite for that), but that's not what you're talking about, right? The shared variables you're talking about here are just simple scalar values: no tables, no integrity constraints, no MVCC, none of that, right?

--
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/
Mohawksoft,

> Actually, they are not involved with transactions in any way.
>
> There are classes of problems in which a proper ACID implementation is
> not necessary. In fact, there are subclasses within a larger system
> that need not have the overhead imposed by transactions.

Hmmm, wait, are you implementing this, or is Jonathan?

As a database developer who would use this or a similar feature, I do not agree that the lack of transactionality is a benefit. Quite the contrary. Imagine this scenario, once we have error-trapping in place for 8.1 (pardon the syntax if it's off the proposal; I think you'll see the point anyway):

BEGIN
    SAVEPOINT before_update;
    UPDATE jehosaphat SET slivy = 60 WHERE status = 4;
    SET SHARED jehosaphat_last_update = now();
    IF transaction_error > 0 THEN
        ROLLBACK TO before_update;
    ELSE
        COMMIT;
    END IF;
...

Now assume that the reason I'm using a shared variable is that "jehosaphat" needs to be updated every 500ms, and thus we don't want to be storing the timestamp in a table. In that case, if shared variables are not transactional, the above would be deceptive; the system would think that jehosaphat had been updated when it had not.

Ah, you say, but you could work around that if you coded correctly. Maybe so, although I find the idea that transactions don't apply to some objects very liable to cause confusion. Also consider this case:

CREATE FUNCTION update_jehosaphat_last () RETURNS TRIGGER AS '
BEGIN
    SET SHARED jehosaphat_last_update = now();
END;
' LANGUAGE plpgsql;

CREATE TRIGGER tg_jehosaphat_update
    BEFORE UPDATE OR DELETE ON jehosaphat
    FOR EACH STATEMENT EXECUTE PROCEDURE update_jehosaphat_last();

NOW imagine that your update gets rolled back; the shared variable will be misreporting the last update, and there isn't any coding around it.

My personal perspective is that shared variables should be transactional, just like everything else in PostgreSQL except sequences.
Just because they are transactional does not mean they have to be saved on disk; that's where most of your "dead tuple overhead" comes in. If you could place the variables in "reserved" shared memory and never sync, then keeping several copies, which split and collapse to stay current with the existing transactions, should not be a major performance cost. Of course, it will make the code a lot harder, but that's one of the reasons why nobody has done it before now.

Now, if you're telling me that it's too difficult for you to implement transactions at this time, and that you want to push this patch without transactions so they can be added sometime later, then we can have it out on that basis. But you're going to have a hard time convincing me or several other people that ignoring transactions is somehow a benefit.

--
Josh Berkus
Aglio Database Solutions
San Francisco
> Much of the rest of what you said makes sense, but this is a TERRIBLE
> example. As a user, I have been burned repeatedly by crappy,
> non-persistent shopping carts [...] Note that Amazon does this right,
> so any ecommerce site doing it WRONG does so at severe peril of losing
> its customers.

Your complaint is about the timing, not the process. Most web sessions go away; less than 10% of all shopping carts ever get checked out, period. The Msession manager has a module that will save expired sessions to a database and reload them if the session ID ever comes back, if you like.

Listen, I don't want to ramble on about web sessions; that is a different topic, and I only wanted an example of data that is important to use in the database but is not persistent enough or irreplaceable enough to warrant the full ACID overhead.

Web sessions are temporary. If you have a machine that crashes every 9 months, causing all the active sessions to be lost, this is probably acceptable as long as there is failover. That would be totally unacceptable for a database.

There are classes of undependable, transient, or reproducible data in a system which work in concert with dependable data. It is inefficient to require the transient data to be managed under the strict guidelines of the dependable data.
Furthermore, the more independent systems you have within a larger system, the less reliable the larger system is.

>> My session manager on a modern dual PIII can handle 8000 full "get
>> vars, set vars" cycles a second.
>
> I don't really see why you need or want to do this in PostgreSQL,
> though. Can't you do it in your web or application server? (I
> certainly can in AOLserver with its nsv API, for example.)

Having managed many projects, I can tell you, point blank: the fewer things you need to use, the better. An extension to an interface you already use is much better than a different interface. Also, the data in the shared variables is related to the data in the database. It belongs close to it.

> What's the advantage of instead stuffing these in-memory variables
> into PostgreSQL? That it still works correctly and simply even if you
> have a farm of 10 separate web servers sitting in front of it? That
> could be useful for certain applications, but is there any more to it
> than that?

Isn't "that could be useful for certain applications" enough of a reason to add something to contrib?

> I've occasionally REALLY wanted a small in-memory ACId (no "D" for
> Durability) RDBMS for tracking transient data that doesn't need to go
> to disk [...], but that's not what you're talking about, right? The
> shared variables you're talking about here are just simple scalar
> values, no tables, no integrity constraints, no MVCC, none of that,
> right?

Right. The pattern comes up in almost every non-trivial system. The developer can either search for some other system to do it and figure out some way to use that system's data in PostgreSQL, or load this contrib module and have it at the SQL level. Honestly, I don't see the problem.
> Hmmm, wait, are you implementing this, or is Jonathan?
>
> As a database developer who would use this or a similar feature, I do
> not agree that the lack of transactionality is a benefit. Quite the
> contrary.

[snip -- example]

Your example makes sense if you think that you'll only be using this in the SQL environment.

One of the most common SQL database applications is as the backend of some higher-level system, like a web site. In these systems, usually done in Java, PHP, or heaven help us, .NET, you have multiple front end machines communicating with a SQL database backend.

These systems use the database as the main data server. The overall scalability of their system is based on the scalability of the database. At some point, you'll max out the DB and be stuck. The shared variables module is designed to let you scale better by allowing you to maintain data cheaply. So, rather than call this:

select sum(field) from mytable;

or

update mytable_summary set field = field + n;

you can do this:

select addshared('mytable', n);

Now, in Java or PHP, you'd do something like this:

somefunc()
{
    if (sql.exec(...)) {
        /* Worked */
        sql.exec("select addshared('mytable', n)");
    }
}

OK, now you're going to say, why not use a different system for the caching? As I said in a different post, the more systems you glue together, the more complicated the bigger system gets, and the more complicated it gets, the less reliable it gets. Besides, this is data you're going to use with the data in PostgreSQL, so why not provide an extension for it?
Mohawksoft,

> One of the most common SQL database applications is as the backend of some
> higher level system, like a web site. In these systems, ususlly done in
> Java, PHP, or heaven help us, .NET, you have multiple front end machine
> communicating with a SQL database backend.

Leaving aside that Java, .NET and even PHP5 have much more sophisticated ways
of sharing variables ...

> These systems use the database as the main data server. The overall
> scalability of their system is based on the scalability of the database.
> At some point, you'll max out the DB and be stuck. The shared variables
> module is designed to allow you to scale better by allowing you to cheapy
> maintain data.

I really don't follow how this makes a lack of transaction-awareness for these
global variables an advantage. If you're saying that it's not possible to
make shared variables which are both cheap and transaction-aware, then say so
and be clear about it.

Please understand that I am not opposed to the idea behind the shared variable
proposal. I just think there are some details to be worked out.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
> Mohawksoft,
>
>> One of the most common SQL database applications is as the backend of some
>> higher level system, like a web site. In these systems, ususlly done in
>> Java, PHP, or heaven help us, .NET, you have multiple front end machine
>> communicating with a SQL database backend.
>
> Leaving aside that Java, .NET and even PHP5 have much more sophisticated ways
> of sharing variables ...

Yes, and how does a developer share between PHP, Java, and .NET? In a homogeneous system that may work, but not across multiple systems, and it still doesn't address the data being 'in' the database as opposed to some other system.

>
>> These systems use the database as the main data server. The overall
>> scalability of their system is based on the scalability of the database.
>> At some point, you'll max out the DB and be stuck. The shared variables
>> module is designed to allow you to scale better by allowing you to cheapy
>> maintain data.
>
> I really don't follow how this makes a lack of transaction-awareness for these
> global variables an advantage. If you're saying that it's not possible to
> make shared variables which are both cheap and transaction-aware, then say so
> and be clear about it.
>
> Please understand that I am not opposed to the idea behind the shared variable
> proposal. I just think there are some details to be worked out.

You won't understand it from what you quoted because the reasoning has been snipped, and I've posted the example so many times my fingers are sore.

Some data does not need to be transactional, and data that doesn't need transactional guarantees shouldn't have to pay the overhead for them. If that overhead can be reduced or eliminated for such data, the database scales better. Isn't that completely obvious? If your system is bonking out, every last bit of efficiency is important. (And this is no small efficiency!)
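The "cheap but still correct" claim can be illustrated in-process (again a Python simulation, not the actual shared-memory C code): many concurrent writers performing an atomic add never lose an update, even though no transaction machinery is involved, just one short critical section per call.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # stands in for the module's internal latch

def addshared(n):
    """One short critical section per call: no WAL, no MVCC snapshot,
    no client-visible lock that a crashed session could leak."""
    global counter
    with counter_lock:
        counter += n
        return counter

# Eight simulated frontends each add 1, a thousand times.
threads = [threading.Thread(target=lambda: [addshared(1) for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 8000 -- no lost updates
```

Contrast this with `update mytable_summary set field = field + n` in a transaction, where every increment carries row locking, WAL writes, and dead-tuple cleanup, all unnecessary for data that never needs to survive a restart.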
It is these sorts of tricks that can make the difference between a smoothly running site and one that looks like it has been slashdotted.
On Monday 30 August 2004 09:50, pgsql@mohawksoft.com wrote:
> > On Sun, Aug 29, 2004 at 09:24:59AM -0400, pgsql@mohawksoft.com wrote:
> > What's the advantage of instead stuffing these in-memory variables
> > into PostgreSQL? That it still works correctly and simply even if you
> > have a farm of 10 separate web servers sitting in front of it? That
> > could be useful for certain applications, but is there any more to it
> > than that?
>
> Isn't "that could be useful for certain applications" enough of a reason
> to add something to contrib?
>

That's enough of a reason to add something to *gborg*, but I can think of a lot of things I would rather see the postgresql major contributors maintaining than this ... especially when you consider that we've already removed things like jdbc from contrib.

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL