Thread: serverless postgresql
For ease of configuration and other reasons, I would like for my single-user GUI app to be able to use postgresql in-process as a library, accessing a database created in the user's home directory. I think I could possibly get what I want by launching a captive copy of postmaster with appropriate args, but it seems conceptually cleaner to not have a separate process at all. Has anyone tried to do anything like this?

I've looked at sqlite and it might be workable for my project, but I would prefer the more standard and featureful SQL of postgres. In particular sqlite lacks date arithmetic and has some funny type issues (it claims to be typeless, except when it isn't, or something).
* Jeff Bowden <jlb@houseofdistraction.com> [2004-01-13 13:38:02 -0800]:

> I think I could possibly get what I want by launching a captive copy of
> postmaster with appropriate args, but it seems conceptually cleaner to not
> have a separate process at all. Has anyone tried to do anything like this?

hmm, i've also played around a little bit w/ postgresql standalone. you could also do nice things like calling postmaster from inetd ;-)

but I don't think you really want to have the postmaster in the same process w/ your application, since it relies on forks, signals, mmap() and some other things which directly affect process control - this would probably interfere with your application. On the application side you can't really control what it does, so it sounds quite difficult. Having the postmaster in a separate process connected by a socket or pty seems to be the better solution.

cu
--
Enrico Weigelt == metux IT services
phone: +49 36207 519931   fax: +49 36207 519932   cellphone: +49 174 7066481
www: http://www.metux.de/   email: contact@metux.de
This mail was sent via UUCP. http://www.metux.de/uucp/
I have just about the same sort of needs now and concluded that postgres just is not suited for embedding into apps like that. I am going with sqlite and it is working fairly well. We just made user-defined functions in php for sqlite to match the date functions in postgres. Who knows what other issues we may run into, but so far sqlite is working fairly well. I think that sqlite stores everything as a string, but for searching, sorting, ordering etc. it uses the type info.

I too would absolutely love to just include a postgres dll and have postgres in-process and just store everything in a nice little file, but from what I have read here that would involve major changes that the developers are not interested in making. So far sqlite has done the job for me.

Also I am considering looking into firebird for an embedded database solution, since it can give you pretty much what you are talking about here, with an in-process db that uses a single file to store its data. While I don't want to switch my server stuff to it, it is probably more full-featured than sqlite (I'm sure it has date and arithmetic functions) and is made to work well in embedded situations.

I would be interested to see what you end up doing.

Rick

----

"Rick Gigger" <rick@alpinenetworking.com> writes:
> I too would absolutely love to just include a postgres dll and have
> postgres in-process and just store everything in a nice little file, but
> from what I have read here that would involve major changes that the
> developers are not interested in making.

Not only are the developers uninterested in it, the developers actively oppose it. We think an embedded database library cannot be reliable enough to meet our notion of a "database", since it would be subject to failures anytime the surrounding application has a bug. Keeping the client code in a separate process is a far more robust design.

regards, tom lane
Tom Lane wrote:
> Not only are the developers uninterested in it, the developers actively
> oppose it. We think an embedded database library cannot be reliable
> enough to meet our notion of a "database", since it would be subject to
> failures anytime the surrounding application has a bug. Keeping the
> client code in a separate process is a far more robust design.

That makes sense to me. I wonder if sqlite suffers from this problem (e.g. the app crashing and corrupting the database).

What about the notion of running postmaster on-demand as the user? Is that something that anyone has experience with? It seems like it would solve the complex configuration problems without compromising robustness or requiring any special support other than sufficient command-line parameters.

Oh yeah, that brings me to another question. I was looking at the postmaster command-line switches and I couldn't find any that would allow me to point it at an arbitrary config file, but then I had a look around and it seems that the .conf files are already used by the various startup scripts. Does postmaster itself ever read the .conf files or is it controlled strictly by switches?
On Jan 13, 2004, at 13:38, Jeff Bowden wrote:
> but it seems conceptually cleaner to not have a separate process at all.

That depends a lot on what you mean by cleaner. A few small programs in their own process space, dedicated to a specific set of functionality with well-defined interfaces, can make for a much more reliable application (see postfix).

--
Dustin Sallings
Jeff Bowden <jlb@houseofdistraction.com> writes:
> That makes sense to me. I wonder if sqlite suffers from this problem
> (e.g. the app crashing and corrupting the database).

Likely. I can tell you that Ann Harrison once told me she made a decent amount of money as a consultant fixing broken Interbase/Firebird database files. It would be hard to make a living in the same game for Postgres. Now I don't think that Firebird is any buggier than Postgres. But it comes in an embedded-library form; I'll bet lunch that most of those data corruption problems were actually induced by crashes of surrounding applications.

> What about the notion of running postmaster on-demand as the user?

Possibly. You'd have to think carefully about what conditions the postmaster should be shut down under, and especially what conditions it should NOT be shut down under --- eg, a kill to the parent client application shouldn't cause an ungraceful postmaster exit. It could be tricky to get the signal handling right, especially under shells that try to deliver signals to all children of a process being signaled. On the whole I suspect it'd be easier just to leave the postmaster running in the background...

> Oh yeah, that brings me to another question. I was looking at the
> postmaster command-line switches and I couldn't find any that would
> allow me to point it at an arbitrary config file

The config files all live in $PGDATA and so are determined by the -D switch. There was some talk of changing this, awhile back, but it foundered on lack of consensus about exactly what to do instead.

regards, tom lane
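To make the "captive copy of postmaster" idea from the original post concrete: a minimal, untested C sketch, assuming the per-user data directory has already been created with initdb and that the postmaster binary is on the PATH (the function name is invented for illustration):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Fork and exec a private postmaster on the caller's own data
     * directory.  -D points at the directory holding both the data files
     * and the .conf files; -k keeps the Unix socket there too, so nothing
     * touches /tmp or a system-wide cluster.  Returns the postmaster's
     * pid, or -1 on failure. */
    static pid_t launch_captive_postmaster(const char *datadir)
    {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
        {
            /* child: become the postmaster */
            execlp("postmaster", "postmaster",
                   "-D", datadir,        /* data + config directory */
                   "-k", datadir,        /* Unix socket directory */
                   (char *) NULL);
            perror("execlp postmaster"); /* only reached if exec failed */
            _exit(1);
        }
        return pid;  /* parent keeps the pid for a later graceful shutdown */
    }

Shutting such a postmaster down cleanly would mean sending it SIGTERM (smart shutdown) or SIGINT (fast shutdown) and waiting for it to exit, never SIGKILL, for exactly the reasons discussed further down the thread.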
Tom Lane wrote:
> Possibly. You'd have to think carefully about what conditions the
> postmaster should be shut down under, and especially what conditions
> it should NOT be shut down under [...] On the whole I suspect it'd be
> easier just to leave the postmaster running in the background...

Details, details.... :-)

> The config files all live in $PGDATA and so are determined by the -D
> switch. There was some talk of changing this, awhile back, but it
> foundered on lack of consensus about exactly what to do instead.

As long as it can be done.
Tom Lane <tgl@sss.pgh.pa.us> said:
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>        joining column's datatypes do not match

Re: tip 9 :-). In 7.3.5, is it possible that the planner is finding a type mismatch when there isn't one, resulting in a seq scan being selected? I've had a problem with joining tables on an int8 column (the primary key column in one of the tables). Tested left outer and where clause syntax.

TIA,
Jim Wilson

----

"Jim Wilson" <jimw@kelcomaine.com> writes:
> Re: tip 9 :-). In 7.3.5, is it possible that the planner is finding a type
> mismatch when there isn't one, resulting in a seq scan being selected?

When you gave no details, how is anyone to know? For questions like this, you need to post the table definitions, the exact query, and the results of EXPLAIN ANALYZE on that query.

BTW, pgsql-performance is a more appropriate list than -general for such questions.

regards, tom lane
> Likely. I can tell you that Ann Harrison once told me she made a decent
> amount of money as a consultant fixing broken Interbase/Firebird
> database files. It would be hard to make a living in the same game for
> Postgres. Now I don't think that Firebird is any buggier than Postgres.
> But it comes in an embedded-library form; I'll bet lunch that most of
> those data corruption problems were actually induced by crashes of
> surrounding applications.

I remember reading about how some of the original commercial code from Interbase came with bugs that corrupted databases, and that those bugs were eventually fixed in Firebird. Don't know if that's related to this, but it could be.

Anyway, since postgres uses WAL files to verify the integrity of the database, couldn't it more or less make the same guarantees in an embedded version? As long as the app uses the db libs unmodified and doesn't mess with the files it creates, how does simply making it embedded increase the chance of db errors resulting in database corruption?

In addition, there are times when you want the ease of a relational database but you don't need the same level of reliability that you would on your server. In these cases it would be nice to just have something that is easy to drop into your app and have it just work. That being said, I think that in these cases the developer isn't usually going to care if it runs in-process or not. Maybe a good solution would be to make a postgres library that could be included in-process and would interface with the application. All it would do is manage spawning the real postgres process. Then you would have the reliability of a real server but the ease of use of an embedded database. Couldn't that end up being pretty transparent to the developer? If it was made to be easy, postgres could become very popular as an "embedded" db solution but still not have its reputation diluted by rampant database corruption from the application.

----

"Rick Gigger" <rick@alpinenetworking.com> writes:
> Anyway, since postgres uses WAL files to verify the integrity of the
> database, couldn't it more or less make the same guarantees in an embedded
> version? As long as the app uses the db libs unmodified and doesn't mess
> with the files it creates, how does simply making it embedded increase the
> chance of db errors resulting in database corruption?

If an app's internal data structures get corrupted and it goes nuts, pretty much anything can happen to its address space.

-Doug
> Anyway, since postgres uses WAL files to verify the integrity of the
> database, couldn't it more or less make the same guarantees in an embedded
> version? As long as the app uses the db libs unmodified and doesn't mess
> with the files it creates, how does simply making it embedded increase the
> chance of db errors resulting in database corruption?

I have a different idea. I've been thinking about coding it, but haven't had the time. This could be done with no changes to Postgres itself.

Basically, you would have a library which exported functions such as:

    pg_instance *pg_start(char *directory);
    pg_get_connection(pg_instance *);
    pg_stop(pg_instance *);
    pg_initdb(char *directory);

pg_start would do the following:

1) Check "directory" for an instance of the UNIX socket.
   If it is there, make a Postgres connection.
   If it is not there, start the postmaster with "-k directory -D directory/data" and then make a Postgres connection.
2) Create a struct to contain the directory and any other data item we need to connect to the database.
3) Return this structure.

pg_get_connection would just be a wrapper for PQconnectdb().

pg_stop would kill the database.

pg_initdb would simply run initdb.

Does anyone see a reason why this wouldn't work?

Jon
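Purely as an untested sketch of how those calls might hang together on top of libpq; the use of pg_ctl, the .s.PGSQL.5432 socket-file check, and the hard-coded template1 database are assumptions for illustration, not part of the proposal above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    typedef struct pg_instance
    {
        char directory[1024];   /* where the Unix socket (and data/) live */
    } pg_instance;

    /* Start a postmaster whose socket lives in "directory", or attach to
     * one already running there.  Assumes directory/data was made with
     * initdb beforehand. */
    pg_instance *pg_start(const char *directory)
    {
        char         sockpath[1100];
        pg_instance *inst;

        snprintf(sockpath, sizeof(sockpath), "%s/.s.PGSQL.5432", directory);
        if (access(sockpath, F_OK) != 0)
        {
            /* no socket yet: bring up a postmaster bound to this directory */
            char cmd[2300];

            snprintf(cmd, sizeof(cmd),
                     "pg_ctl -w -D '%s/data' -o \"-k '%s'\" start",
                     directory, directory);
            if (system(cmd) != 0)
                return NULL;
        }

        inst = malloc(sizeof(*inst));
        if (inst == NULL)
            return NULL;
        strncpy(inst->directory, directory, sizeof(inst->directory) - 1);
        inst->directory[sizeof(inst->directory) - 1] = '\0';
        return inst;
    }

    /* Thin wrapper around PQconnectdb(): connect over the private socket.
     * (Connects to template1 here just to keep the sketch short.) */
    PGconn *pg_get_connection(pg_instance *inst)
    {
        char conninfo[1200];

        snprintf(conninfo, sizeof(conninfo),
                 "host='%s' dbname=template1", inst->directory);
        return PQconnectdb(conninfo);
    }

    /* Ask the postmaster for a fast shutdown and release the handle. */
    void pg_stop(pg_instance *inst)
    {
        char cmd[1300];

        snprintf(cmd, sizeof(cmd), "pg_ctl -D '%s/data' -m fast stop",
                 inst->directory);
        system(cmd);
        free(inst);
    }

pg_initdb would be a similar one-liner around system("initdb -D ..."). The point is that the backend still runs as its own process; the library only hides the start/stop plumbing from the application programmer.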
This is just what I recommended (I think) in my earlier post. This would be FANTASTIC. Then I could scrap sqlite and any sqlite custom code that I end up having to write (there is a little). Plus I could have the reliability of a multi-process postgres, "embedded".

Now if they just finish the Win32 port sometime soon...

Rick
Jeff Bowden wrote:
> For ease of configuration and other reasons, I would like for my
> single-user GUI app to be able to use postgresql in-process as a library,
> accessing a database created in the user's home directory.

[Sorry for not actually answering this question]

I believe the demand for an embedded/"serverless" version of PostgreSQL will increase significantly once PostgreSQL is natively available on Windows. So I would expect official embedded support to follow quite shortly after the win32 port has stabilized. :-)

--
dave
Rick Gigger wrote:
> I have just about the same sort of needs now and concluded that postgres
> just is not suited for embedding into apps like that.

Why not? It's not as if the PostgreSQL backend is a mammoth like Oracle. The Firebird embedded version is pretty much the same as their server, but without the network and client authentication layer.

However, embedded usually demands that the backend be threaded. Otherwise it will be pretty useless/very inconvenient to use in many apps. Perhaps this is the major change that's hard to do?

--
dave
Tom Lane wrote:
> Likely. I can tell you that Ann Harrison once told me she made a decent
> amount of money as a consultant fixing broken Interbase/Firebird
> database files. It would be hard to make a living in the same game for
> Postgres. Now I don't think that Firebird is any buggier than Postgres.
> But it comes in an embedded-library form; I'll bet lunch that most of
> those data corruption problems were actually induced by crashes of
> surrounding applications.

Do the developers generally oppose the idea of a threaded (but non-embedded) backend as well? If the backend is thread-safe, then users can still choose to run multiprocess or multithreaded, right?

--
dave
> Do the developers generally oppose the idea of a threaded (but
> non-embedded) backend as well? If the backend is thread-safe, then users
> can still choose to run multiprocess or multithreaded, right?

I've been under the impression that the developers were opposed to a threaded server because of the complete lack of consistency in threading behavior across platforms. However, I don't see how doing it as a multiprocess server as opposed to a multithreaded server affects embedded use - as long as it can simply be called from a library, why would a user or developer care?

Jon
tgl@sss.pgh.pa.us says...
> Likely. I can tell you that Ann Harrison once told me she made a decent
> amount of money as a consultant fixing broken Interbase/Firebird
> database files. [...] I'll bet lunch that most of those data corruption
> problems were actually induced by crashes of surrounding applications.

The reason IMHO for "server" crashes in IB/FB is quite simply that newbie types who are running IB/FB will tend to run it on the crappiest PC in the office, under Wintendo 95, on the machine used by the secretary who's a bit of a ditz and keeps switching it off by mistake or spilling her coffee on the exposed hard drive. No forced writes on, no UPS, no RAID, no server room - nada, zip, diddly.

Basically, it boils down to the fact that just because IB/FB *_lets_* you do something, it's not necessarily a good idea. IB/FB will run reasonably on the crappiest of hardware, and on the crappiest of OS's, which is what leads to the problem.

It's unlikely that PostgreSQL faces this problem, since it will only run under far more robust OS's, and you're not likely to have the office eejit running the Unix/Linux/BSD box with the db server on it.

Paul...

--
plinehan y_a_h_o_o and d_o_t com
C++ Builder 5 SP1, Interbase 6.0.1.6 IBX 5.04 W2K Pro
Please do not top-post.
"XML avoids the fundamental question of what we should do, by focusing entirely on how we should do it." quote from http://www.metatorial.com
> Rick Gigger wrote:
>> I have just about the same sort of needs now and concluded that postgres
>> just is not suited for embedding into apps like that.
>
> Why not? It's not as if the PostgreSQL backend is a mammoth like Oracle.
> The Firebird embedded version is pretty much the same as their server,
> but without the network and client authentication layer.
>
> However, embedded usually demands that the backend be threaded.
> Otherwise it will be pretty useless/very inconvenient to use in many
> apps. Perhaps this is the major change that's hard to do?

I should have said that postgres isn't suited for running in-process with the rest of your app. Or at least the developers don't think (and I guess I agree now) that it's a good idea. They could, however, make a library that would start up the server in multiprocess mode and run it in the background.
> Do the developers generally oppose the idea of a threaded (but
> non-embedded) backend as well? If the backend is thread-safe, then users
> can still choose to run multiprocess or multithreaded, right?

My impression is that they don't want to go multithreaded. There was some talk of this for the windows port, but I believe that Bruce is simply modifying the multi-process code so that it will work on windows.
David Garamond <lists@zara.6.isreserved.com> writes:
> Do the developers generally oppose the idea of a threaded (but
> non-embedded) backend as well? If the backend is thread-safe, then users
> can still choose to run multiprocess or multithreaded, right?

The backend isn't thread-safe. There have been repeated discussions about using threading in the backend (see the pgsql-hackers archives), but so far no one has made a convincing case for it.

BTW, this whole discussion is getting pretty off-topic for -general; I'd suggest pursuing it on -hackers.

regards, tom lane
From: "David Garamond" <lists@zara.6.isreserved.com>
> I believe the demand for an embedded/"serverless" version of PostgreSQL
> will increase significantly once PostgreSQL is natively available on
> Windows. So I would expect official embedded support to follow quite
> shortly after the win32 port has stabilized. :-)

People are always asking for embedded dbms's without really considering the consequences. For example, if you need to share data, you end up with all the MS Access sorts of issues. I know because I used to work at Microsoft in the department that provided support both for Access and the developer products. I suspect the support nightmares may be part of the reason for pushing MSDE, and hence *getting away* from the embedded dbms model ;-) For example, do you realize that the marketing info said that Access supported 256 *concurrent* users to a database? At the same time, troubleshooting corruption was nearly unsupported for a while.

I agree with the approach of a wrapper library which would wrap the startup/shutdown of a postgresql server so that the programmer doesn't have to worry about the details, but I would add another idea: namely, that the library should be able to determine whether the server is running remotely, and simply pass the connection to libpq. This would also create a conceptually cleaner framework for configuration of software which may need to access a local or remote data store.

Best Wishes,
Chris Travers

----

"Chris Travers" <chris@travelamericas.com> writes:
> I agree with the approach of a wrapper library which would wrap the
> startup/shutdown of a postgresql server so that the programmer doesn't
> have to worry about the details,

The reason that the client programmer doesn't have to worry about starting/stopping the database is that it's not his responsibility. I don't think that having the client control this is a good idea at all. David conveniently ignored the points I made before, but they are real issues --- if the client is in charge of starting or stopping the DB, it just adds potential for mucking things up. I can see the bug reports now: "I decided I'd make the shutdown routine 'kill -9' the postmaster because I didn't like the multi-second delay for a normal shutdown. Now my database is corrupt."

Another set of objections to this center around the fact that with this sort of arrangement, the database files would necessarily belong to the client user, since there's no way to launch the postmaster as a different userid. (Unless the client is running as root, which I sincerely hope he is not.) That means there's no filesystem protection between the client and the database, which is another recipe for trouble. Not much point in keeping an address-space firewall between client and server when the client can scribble on the database anyway.

regards, tom lane
> The reason that the client programmer doesn't have to worry about
> starting/stopping the database is that it's not his responsibility.
> I don't think that having the client control this is a good idea at all.
> [...] I can see the bug reports now: "I decided I'd make the shutdown
> routine 'kill -9' the postmaster because I didn't like the multi-second
> delay for a normal shutdown. Now my database is corrupt."

I recall a discussion a while back where people were complaining that some OS (I think it was Mac OS X) would just kill all processes after something like 30 seconds on system shutdown if they didn't quit fast enough on their own. The response was that, because of how postgres uses WAL files to prevent db corruption, this was not an issue. Why does that not apply in this situation?

> Another set of objections to this center around the fact that with this
> sort of arrangement, the database files would necessarily belong to the
> client user, since there's no way to launch the postmaster as a
> different userid. [...] Not much point in keeping an address-space
> firewall between client and server when the client can scribble on the
> database anyway.

By saying "the client can scribble on the database anyway", do you mean the client app actually writing over the db files on disk? It seems like this would only be a problem with an exceptionally stupid programmer. How could that happen on accident?
> People are always asking for embedded dbms's without really considering
> the consequences. For example, if you need to share data, you end up
> with all the MS Access sorts of issues.

My argument for having an embedded or pseudo-embedded postgres is this: I am never, ever going to trust any really, really important data to an embedded database. If the user has it on his system, the moron might delete the database files, in which case it really doesn't matter how you're protecting it. The user can potentially screw everything up.

The reason I need an embedded database is that sometimes my clients need to take a copy of the app offline and look at the info, make reports, etc. They also may eventually be filling out small amounts of data that can then be synced with the live database once they are back online. If a client's thinkpad lights on fire while he's on the plane and he loses 6 hours of data entry, well, that's not my problem. But I've ALWAYS got most of the data in a nice consistent postgres database on my server, which is backed up constantly to multiple locations.

So there are cases where the reliability requirements just go down, and what I want is something that acts exactly like postgres so that I don't have to write custom code for 2 databases. Since I end up just running sqlite anyway, I would prefer to just have an embedded (or pseudo-embedded) postgres. I'm not going to get all up in arms if something bad happens, because I understand the risks of running it embedded. But those risks will be the same with any embedded app. I just don't care.

Does this apply to the vast majority of embedded users or is it just me?

----

"Rick Gigger" <rick@alpinenetworking.com> writes:
> I recall a discussion a while back where people were complaining that some
> OS (I think it was Mac OS X) would just kill all processes after something
> like 30 seconds on system shutdown if they didn't quit fast enough on
> their own. The response was that, because of how postgres uses WAL files
> to prevent db corruption, this was not an issue. Why does that not apply
> in this situation?

Because a system shutdown doesn't leave anything else running. kill -9 on the postmaster leaves orphaned backends still running, and orphaned shared memory segments still in existence. The latter could prevent you from starting a fresh postmaster (because two shmem segments could exceed the kernel's SHMMAX) until you do manual cleanup, which most people don't know how to do. If you do succeed in starting another postmaster with a fresh shared memory segment, you will have two independent sets of backends modifying the database files with no interlocking, which is a sure recipe for data corruption.

There is an interlock that is supposed to prevent this catastrophic scenario, but I do not trust it unreservedly (and not at all on non-Unix-derived platforms, because it depends on some rather arcane features of the SysV shared memory API, which might not be implemented fully on other platforms).

This is why the "don't kill -9 the postmaster" TIP is still around, BTW. It has nothing to do with crash safety.

> By saying "the client can scribble on the database anyway", do you mean
> the client app actually writing over the db files on disk? It seems like
> this would only be a problem with an exceptionally stupid programmer.
> How could that happen on accident?

Scribbling on memory that doesn't belong to you isn't something one does intentionally, either --- but it happens. However, my real concern here is not so much with program bugs as clueless users. We see enough cases already of people removing lock files or "unnecessary" log files; I fear it'd be a lot worse if those files were sitting in the user's home directory.

regards, tom lane
Tom Lane wrote:
> Another set of objections to this center around the fact that with this
> sort of arrangement, the database files would necessarily belong to the
> client user, since there's no way to launch the postmaster as a
> different userid. [...] Not much point in keeping an address-space
> firewall between client and server when the client can scribble on the
> database anyway.

Still, the main problem I, and I suspect others, would like to solve is installation/configuration. For my app I don't want the user to have to understand anything about how keeping data in a shared system-administered database is different from keeping data in local files. Everything should "just work". There is no requirement for concurrent access.

So kill -9 on postmaster can lead to database corruption? What happens in a power failure?
Personally, I would use something like Berkeley DB for an embedded database engine. As a general rule, if it needs to be embedded, you don't need SQL. I'm sure you could think of areas where you could use a full-blown SQL server, but in practice that generally isn't the case. And like others have already mentioned, if you are embedding a database in software that a user has control of, you want it as SIMPLE as possible.

Chris
> Personally, I would use something like Berkeley DB for an embedded
> database engine. As a general rule, if it needs to be embedded, you
> don't need SQL. I'm sure you could think of areas where you could use a
> full-blown SQL server, but in practice that generally isn't the case.
> And like others have already mentioned, if you are embedding a database
> in software that a user has control of, you want it as SIMPLE as possible.

But the user isn't the one accessing the database directly. Have you ever used a program like Delphi to create applications? Using a relational database, you can create applications using nothing but drag-and-drop and SQL. No real code. If your application is a simple end-user app (like a wedding planner, for example), embedded databases are your dream come true.

Linux has no real embedded database. PG would be an excellent embedded database. I have already proposed some functions that would help PG as an embedded database, and, if I have time, I will implement them (can't right now because I'm finishing a book).

Anyway, I get the feeling that those people who don't like the idea of having PG as an embedded database have never used one. I think Tom, for example, has been focusing on enterprise-level applications too much to think about the personal applications, and Chris, I'm going to guess you've never built an application w/ an embedded DB.

Anyway, this thread is going nowhere; I'll come back if/when I have some code. Perhaps with something concrete we can all see better where this is going.

Jon
Jeff Bowden <jlb@houseofdistraction.com> writes:
> Still, the main problem I, and I suspect others, would like to solve is
> installation/configuration. For my app I don't want the user to have to
> understand anything about how keeping data in a shared
> system-administered database is different from keeping data in local
> files. Everything should "just work".

Sure, but most of the existing packagings of PG already try to make this automatic (at least Lamar's RPMs and Oliver's Debian package do). No doubt further work could be invested to make it even smoother, but that doesn't mean we need a client-started database.

> So kill -9 on postmaster can lead to database corruption? What happens
> in a power failure?

See my last response to Rick Gigger. kill -9 is a lot different from a power failure.

regards, tom lane
On Fri, Jan 16, 2004 at 02:00:35PM -0800, Jonathan Bartlett wrote:
> having PG as an embedded database have never used one. I think Tom, for
> example, has been focusing on enterprise-level applications too much to
> think about the personal applications, and Chris, I'm going to guess

This seems to suggest that Postgres should focus on something other than enterprise-level applications, and (presumably) on something else.

Frankly, I am _mighty glad_ that the developers are focussed on enterprise-level applications. That's what makes Postgres eligible to back enterprise-level software. Applications that try to do everything for everyone tend to end up doing nothing very well. I don't think one needs to look too hard to discover examples of that phenomenon.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary and imaginative work need not end up well. --Dennis Ritchie
Tom Lane wrote:
> Sure, but most of the existing packagings of PG already try to make this
> automatic (at least Lamar's RPMs and Oliver's Debian package do). No
> doubt further work could be invested to make it even smoother, but that
> doesn't mean we need a client-started database.

So maybe this is a packaging issue. On Debian, when I install postgres it is necessary to do root shit in order to enable non-privileged users to create and destroy databases. My understanding has always been that these operations are restricted because they can allow users to accidentally or intentionally interfere with each other's use of the server. If there is some way for this to be set up sanely by default, to allow users to successfully use createdb and dropdb on only their own data, then I will file a wishlist bug against the Debian packages to add this. Or is this all a pipe dream?
> This seems to suggest that Postgres should focus on something other
> than enterprise-level applications, and (presumably) on something else.

Nope.

> Frankly, I am _mighty glad_ that the developers are focussed on
> enterprise-level applications.

Me too. But I think we can get it for small-scale stuff with no impact on the rest. Just an additional function or 6 to make embedded developers' lives easier.

Jon
On Fri, 16 Jan 2004, Jeff Bowden wrote:
> So maybe this is a packaging issue. On Debian, when I install postgres
> it is necessary to do root shit in order to enable non-privileged users
> to create and destroy databases. My understanding has always been that
> these operations are restricted because they can allow users to
> accidentally or intentionally interfere with each other's use of the
> server. If there is some way for this to be set up sanely by default, to
> allow users to successfully use createdb and dropdb on only their own
> data, then I will file a wishlist bug against the Debian packages to add
> this. Or is this all a pipe dream?

I don't see what the problem is. PostgreSQL is perfectly able to be installed and run as any old user. Even if one is distributing via a package system, such as RPM perhaps, which has problems installing as a non-root user, if one sees the package already installed then there's no need to try to install it again. Even with the package installations there's bugger all stopping any old user doing an initdb -D myownplace.

The standard postgres user on a system is only special in that it is generally the user the postmaster is run as. If you have other non-privileged users and there's no need for them to share a cluster, why try to make them?

--
Nigel Andrews
Nigel J. Andrews wrote:
> I don't see what the problem is. PostgreSQL is perfectly able to be
> installed and run as any old user. [...] Even with the package
> installations there's bugger all stopping any old user doing an
> initdb -D myownplace.
>
> The standard postgres user on a system is only special in that it is
> generally the user the postmaster is run as. If you have other
> non-privileged users and there's no need for them to share a cluster,
> why try to make them?

I'm sorry, maybe I didn't state my ideas clearly enough. I was following on to Tom Lane's suggestion that the Debian and RPM packages make configuration "automatic". I was trying to explain that even after installation, some administrator configuration is required to make the server usable for non-privileged users: either in the form of creating and dropping databases, or in giving the users rights to do it themselves.

The way I'd like to make it work for my app is to run a separate postmaster as the user, to avoid involving the machine administrator or interfering with other users. If a shared install can be made to work in an equivalent way, then I would rather go with that. I'm not clear on whether that's possible, though. My current understanding is that it is not, and that my original notion is the only way for my app.
I still have to respectfully disagree. Postgresql is IMO just the wrong software for the job, and given that there are still a number of really important things that postgresql lacks, it should concentrate on those. I am not against it, however, for technical reasons, because those things can always be overcome. I just wouldn't want postgresql to start branching out in different directions at this point; it makes no sense if the project wants to keep focused and one day become comparable side by side to Oracle. IMO that should be its main goal, and embedded functionality would be a detour that has more chance of doing harm than good.

Chris

>> Frankly, I am _mighty glad_ that the developers are focussed on
>> enterprise-level applications.
>
> Me too. But I think we can get it for small-scale stuff with no impact
> on the rest. Just an additional function or 6 to make embedded
> developers' lives easier.
>
> Jon
Chris Ochs wrote:
> I still have to respectfully disagree. Postgresql is IMO just the wrong
> software for the job, and given that there are still a number of really
> important things that postgresql lacks, it should concentrate on those.
> [...] IMO that should be its main goal, and embedded functionality would
> be a detour that has more chance of doing harm than good.

I believe the basic mission of Postgres will stay the same [for a long time]: providing a database that is 1) reliable, 2) advanced, and 3) standards-compliant. However, venturing into the win32 world _will_ generate demands like embedding and other desktop app-oriented features, since I expect that's what many people will be using Postgres for on win32 (currently people are using IB/FB for this and not MySQL/Postgres; MySQL's embedded version is GPL/commercial). And we all know it's all a matter of what itches the developers the most. If enough people are bugging them about something, they will do it... :-)

--
dave
Hi Rick;

I have been looking for a few years at a similar situation with my CRM software, and soon (once I have the SOAP interfaces done) I will start on a stand-alone client. I have eventually decided on a very different structure than you have, though the result is more or less the same.

Rather than using PostgreSQL as an embedded database manager, I have decided that it makes more sense for my app to "cache objects" in XML documents and then use those on trips to customers' sites. The XML documents can then be checked and the objects re-sync'd when the app goes online again. The caching would be automatic, and cached objects would expire unless explicitly saved.

Why not use XML for this sort of problem? Or Berkeley DB? Is there any reason that PostgreSQL would be better? I am not aware of any functional replication systems which would work in this way, so I am guessing your app will have to handle all the logic for the synchronization anyway.

Best Wishes,
Chris Travers
Hi Tom and others;

I agree with your reasoning insofar as it pertains to dedicating core PostgreSQL time to developing such a library. I also agree insofar as it pertains to the PostgreSQL core distribution.

I do think, however, that given the current demand for a serverless solution to PostgreSQL (whether or not it is the right tool for the job), those who are interested and have the necessary skills should set up a gborg project. Perhaps at some point, if such PostgreSQL distributions become more commonplace, we will see it used, and this will open up new opportunities for PostgreSQL. It should be remembered that more options in critical areas (such as Win32) may mean more community, and hopefully, in the long run, more developers.

Best Wishes,
Chris Travers
On Fri, 16 Jan 2004, Jeff Bowden wrote:
> The way I'd like to make it work for my app is to run a separate
> postmaster as the user, to avoid involving the machine administrator or
> interfering with other users. If a shared install can be made to work
> in an equivalent way, then I would rather go with that. I'm not clear
> on whether that's possible, though. My current understanding is that it
> is not, and that my original notion is the only way for my app.

I think I probably didn't do the explanation clearly enough. This does depend on what you mean by shared install. I was assuming you just wanted to manage data independently, only using the shared binaries installed from the package. If that's not the case then obviously this isn't what you want.

If that is the case, then you could run the following as any old user, provided the environment has been set appropriately of course:

    initdb -D $HOME/pgdata
    sed -e 's/^#port = 5432/port = 12345/' $HOME/pgdata/postgresql.conf > /tmp/p
    mv /tmp/p $HOME/pgdata/postgresql.conf
    pg_ctl -D $HOME/pgdata start

then access the cluster using:

    psql -p 12345 template1

That's it. That's access as the db superuser for that cluster. Sure, if the system admin upgrades the shared software then you could have problems.

My apologies if I've got the wrong end of things again; I've got a lot of unread email and I keep reading things in a very haphazard order.

--
Nigel J. Andrews
On Fri, 2004-01-16 at 16:00, Jonathan Bartlett wrote: > Linux has no real embedded database. PG would be an excellent > embedded database. SQLite works great for these kinds of applications, FWIW. b.g.
Quoth jlb@houseofdistraction.com (Jeff Bowden): > For ease of configuration and other reasons, I would like for my > single-user GUI app to be able to use postgresql in-process as a > library accessing a database created in the users home directory. I > think I could possibly get what I want by launching a captive copy of > postmaster with appropriate args but it seems conceptually cleaner to > not have a seperate process at all. Has anyone tried to do anything > like this? One of the properties of Unix is that spawning extra processes to "do their thing" allows each application to be designed more 'cleanly.' Curiously enough, the implementors of alternatives to the Sendmail MTA have typically concluded that it was not only conceptually cleaner to have a set of separate cooperating processes, but that they would also reap benefits in terms of speed and improved security. Thus, I think you'll find that people disagree with you on this. It seems to me that it is conceptually cleaner to have the "application" do "application stuff," and not muddle things up by mixing that with the "database stuff." -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://cbbrowne.com/info/sgml.html Rules of the Evil Overlord #87. "My vats of hazardous chemicals will be covered when not in use. Also, I will not construct walkways above them." <http://www.eviloverlord.com/>
jlb@houseofdistraction.com (Jeff Bowden) wrote: > Tom Lane wrote: >>Not only are the developers uninterested in it, the developers actively >>oppose it. We think an embedded database library cannot be reliable >>enough to meet our notion of a "database", since it would be subject to >>failures anytime the surrounding application has a bug. Keeping the >>client code in a separate process is a far more robust design. > That makes sense to me. I wonder if sqlite suffers for this problem > (e.g. app crashing and corrupting the database). I'm not sure its use has yet been so widespread that there is a good feel for this. It is doubtless _possible_; "crash patterns" likely _don't_ lead to enormous disasters when databases are small, localized, and the apps probably _don't_ crash all the time. > What about the notion of running postmaster on-demand as the user? > Is that something that anyone has experience with? It seems like it > would solve the complex configuration problems without compromising > robustness or requiring any special support other than sufficient > command-line parameters. I haven't had call for this being controlled "by the application," but I can't see it being vastly troublesome. And I _don't_ see it requiring lots of "command line" parameters; you just need to specify the directory where the configuration is. > Oh yeah, that brings me to another question. I was looking at the > postmaster command-line switches and I couldn't find any that would > allow me to point it at an arbitrary config file but then I had a > look around and it seems that the .conf files are already used by > the various startup scripts. Does postmaster itself ever read the > .conf files or is it controlled strictly by switches? All of the .conf files are in one directory, and that directory is controlled by either the value of environment variable PGDATA or the "-D" command parameter. It seems not-overly-valuable to have the .conf files be able to be specified in random other locations. Is there some particular reason you have in mind why you would want to _ignore_ the configuration in $HOME/DBDIR and instead use configuration in some other random location? I would think that collecting the config into one directory, as is done right now, is a _good_ thing. -- let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;; http://www3.sympatico.ca/cbbrowne/lsf.html "Even in the area of anticompetitive conduct, Microsoft is mainly an imitator." -- Ralph Nader (1998/11/11)
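To make the two mechanisms Christopher mentions concrete, a minimal sketch (reusing the $HOME/pgdata directory from earlier in the thread):

export PGDATA=$HOME/pgdata
postmaster &                    # reads $PGDATA/postgresql.conf, pg_hba.conf, ...
# ...or, equivalently, with no environment variable set:
postmaster -D $HOME/pgdata &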
Christopher Browne wrote: >jlb@houseofdistraction.com (Jeff Bowden) wrote: > > >>Oh yeah, that brings me to another question. I was looking at the >>postmaster command-line switches and I couldn't find any that would >>allow me to point it at an arbitrary config file but then I had a >>look around and it seems that the .conf files are already used by >>the various startup scripts. Does postmaster itself ever read the >>.conf files or is it controlled strictly by switches? >> >> > >All of the .conf files are in one directory, and that directory is >controlled by either the value of environment variable PGDATA or the >"-D" command parameter. > >It seems not-overly-valuable to have the .conf files be able to be >specified in random other locations. > >Is there some particular reason you have in mind why you would want to >_ignore_ the configuration in $HOME/DBDIR and instead use >configuration in some other random location? I would think that >collecting the config into one directory, as is done right now, is a >_good_ thing. > > If the existing user configuration could be reliably determined to be sane and have sufficient permissions or could be made so, that would be fine. I am writing an app for users who are not necessarily knowledgeable about databases. If they don't have an existing useable postgres configuration, I would like for my app to be able to create one without forcing the user to have to be root or learn about postgres configuration. The data for this app should remain private to the user who created it and there is no need for concurrent access.
> On Fri, Jan 16, 2004 at 02:00:35PM -0800, Jonathan Bartlett wrote: > > > having PG as an embedded database have never used one. I think Tom, for > > example, has been focusing on enterprise-level applications too much to > > think about the personal applications, and Chris, I'm going to guess > > This seems to suggest that Postgres should focus on something other > than enterprise-level applications, and (presumably) on something > else. Yes, but sometimes an enterprise-level application may need to be put on a laptop and taken off-line. Having an embedded database that is compatible with the one on the server makes this a bit easier to do. > Frankly, I am _mighty glad_ that the developers are focussed on > enterprise-level applications. That's what makes Postgres eligible > to back enterprise-level software. Applications that try to do > everything for everyone tend to end up doing nothing very well. I > don't think one needs to look too hard to discover examples of that > phenomenon. Well, this very well may be true. It does seem as if many of the features that add the reliability and performance necessary for an enterprise server application are what make it difficult to make a good embedded solution.
> Hi Rick; > > I have been looking for a few years at a similar situation with my CRM > software and soon (once I have the SOAP interfaces done, I will start on a > stand-alone client. I have eventually decided on a very different structure > than you have, though the result is more or less the same. > > Rather than using PostgreSQL as an embedded database manager, I have decided > that it makes more sense for my app to "cache objects" in XML documents and > then use those on trips to customers' sites. The XML documents can then be > checked and the objects re-sync'd when the app goes online again. The > cacheing would be automatic and cached objects would expire unless > explicitly saved. > > Why not use XML for this sort of problem? Or berkeley DB? Is there any > reason that PostgreSQL would be better? I am not aware of any functional > replication systems which would work in this way, so I am guessing your app > will have to handle all the logic for the syncronization anyway. Well, since the desktop and server app use almost exactly the same code, it is nice not to have to recode everything that accesses the database. While data synchronization code will have to be written either way, I want the existing code to "just work" once taken offline.
On Wed, Jan 21, 2004 at 11:23:40AM -0700, Rick Gigger wrote: > > Yes but sometimes an enterprise level application may need to be put on a > laptop and taken off-line. Having an embedded database that is compatible > with the one on the server makes this a bit easier to do. Why can't you just run a postgres instance for this? What is magic about "embedded" for this sort of application? Sounds like a clever wrapper script is all that's necessary for something like that, no? A -- Andrew Sullivan | ajs@crankycanuck.ca My friend, you can put wheels on your mama but that doesn't make her a bus. --Joel Spolsky
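A rough sketch of the sort of wrapper Andrew means, assuming the packaged binaries are on the PATH; the data directory, port and the application name myapp are invented for the example:

#!/bin/sh
# start a private, per-user cluster, run the application, then shut it down again
PGDATA=$HOME/.myapp/pgdata
PGPORT=54321
export PGDATA PGPORT
[ -d "$PGDATA" ] || initdb       # first run only: create the private cluster
pg_ctl -w start                  # -w waits until the postmaster accepts connections
myapp                            # hypothetical application; it picks up PGPORT itself
pg_ctl stop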
On Wed, 21 Jan 2004, Andrew Sullivan wrote: > On Wed, Jan 21, 2004 at 11:23:40AM -0700, Rick Gigger wrote: > > > > Yes but sometimes an enterprise level application may need to be put on a > > laptop and taken off-line. Having an embedded database that is compatible > > with the one on the server makes this a bit easier to do. > > Why can't you just run a postgres instance for this? What is magic > about "embedded" for this sort of application? Sounds like a clever > wrapper script is all that's necessary for something like that, no? That's what I still don't get. Embedded means something running on [and probably running] your wrist watch, _not_ something running on a full-blown system. I understand that in this thread's context "embedded" is used to mean embedded within an application on an ordinary system; it took me a while to realise that, though, which just means I can't see why it is wanted like that. Imagine if there were a DB around that was used by direct library calls from an application. What would be one of the first things that would be programmed using it? A server, perhaps? -- Nigel Andrews
Quoting Rick Gigger <rick@alpinenetworking.com>: > > Hi Rick; > > > > I have been looking for a few years at a similar situation with my CRM > > software and soon (once I have the SOAP interfaces done, I will start on a > > stand-alone client. I have eventually decided on a very different > structure > > than you have, though the result is more or less the same. > > > > Rather than using PostgreSQL as an embedded database manager, I have > decided > > that it makes more sense for my app to "cache objects" in XML documents > and > > then use those on trips to customers' sites. The XML documents can then > be > > checked and the objects re-sync'd when the app goes online again. The > > cacheing would be automatic and cached objects would expire unless > > explicitly saved. > > > > Why not use XML for this sort of problem? Or berkeley DB? Is there any > > reason that PostgreSQL would be better? I am not aware of any functional > > replication systems which would work in this way, so I am guessing your > app > > will have to handle all the logic for the syncronization anyway. > > Well since the desktop and server app are using almost exactly the same code > it is nice to not have to recode everything that accesses the database. > While data syncronization code will have to be written either way I want the > existing code to "just work" once taken off line. Rick, I'm not sure what OS you're using (I might have missed it in the thread) but I've never even thought about an embedded database because PG runs very nicely on my laptop (P III - 850MHz, and I did NOT buy it new). Now, I'm running Linux so maybe that is the difference. In fact, I've found it invaluable to take it to a client meeting and show not only the high-level application gloss but also the guts of an application/web site. Now, I do understand the syncing issue with the database and I haven't had that issue yet, but since you'd have two Pg servers, I would think this issue could be handled by one of the replication projects that are available or by something you custom build (like a perl script that connects to both systems and syncs by checking for missing pkeys in both systems). If you have the ability, you might want to invest in building a laptop with Linux on it until the features you want are available in the methods you want to use them in. -$0.02 -- Keith C. Perry, MS E.E. Director of Networks & Applications VCSN, Inc. http://vcsn.com ____________________________________ This email account is being hosted by: VCSN, Inc : http://vcsn.com
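For what it's worth, the "check for missing pkeys" comparison Keith describes can be sketched with nothing more than psql and comm; the host, database and table names here are made up:

psql -h laptop -At -c "SELECT id FROM orders" appdb | sort > /tmp/laptop.ids
psql -h server -At -c "SELECT id FROM orders" appdb | sort > /tmp/server.ids
comm -23 /tmp/laptop.ids /tmp/server.ids    # ids on the laptop that are missing on the server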
Andrew Sullivan wrote: > On Wed, Jan 21, 2004 at 11:23:40AM -0700, Rick Gigger wrote: > >>Yes but sometimes an enterprise level application may need to be put on a >>laptop and taken off-line. Having an embedded database that is compatible >>with the one on the server makes this a bit easier to do. > > Why can't you just run a postgres instance for this? What is magic > about "embedded" for this sort of application? Sounds like a clever > wrapper script is all that's necessary for something like that, no? The "magic" seems to be that the application appears to be managing the database on its own, without the help of any separate database product. I've seen a commercial product that uses embedded Firebird and changes the default extension of the database file from *.fdb to *.dat or something. Of course, with the current cluster/database directory layout, it's still easy to spot PostgreSQL footprints all over the place (pg_hba.conf, pg_xlog/, PG_VERSION, etc). -- dave
David Garamond wrote: > Andrew Sullivan wrote: > >> On Wed, Jan 21, 2004 at 11:23:40AM -0700, Rick Gigger wrote: >> >>> Yes but sometimes an enterprise level application may need to be put >>> on a >>> laptop and taken off-line. Having an embedded database that is >>> compatible >>> with the one on the server makes this a bit easier to do. >> >> >> Why can't you just run a postgres instance for this? What is magic >> about "embedded" for this sort of application? Sounds like a clever >> wrapper script is all that's necessary for something like that, no? > > > The "magic" seems to be that the application appears to be managing > the database by its own, without the help of any separate database > product. I've seen commercial product that uses embedded Firebird and > changes the default extension of the database file from *.fdb into > *.dat or something. > > Of course, with the current cluster/database directory layout, it's > still easy to spot PostgreSQL footprints all over the place > (pg_hba.conf, pg_xlog/, PG_VERSION, etc). > It's not so important to hide the fact that postgres is being used as it is to spare the user from having to know anything about general database administration when all the app really needs from the db is for it to be a more powerful way of storing and retrieving information from user-owned files.
On Thu, Jan 22, 2004 at 09:03:13AM -0800, Jeff Bowden wrote: > >>Why can't you just run a postgres instance for this? What is magic > >>about "embedded" for this sort of application? Sounds like a clever > >>wrapper script is all that's necessary for something like that, no? > > > > > >The "magic" seems to be that the application appears to be managing > >the database by its own, without the help of any separate database > >product. I've seen commercial product that uses embedded Firebird and > >changes the default extension of the database file from *.fdb into > >*.dat or something. > > > >Of course, with the current cluster/database directory layout, it's > >still easy to spot PostgreSQL footprints all over the place > >(pg_hba.conf, pg_xlog/, PG_VERSION, etc). > > > > It's not so important to hide the fact that postgres is being used as it > is to spare the user from having to know anything about general database > administration when the all the app really needs from the db is for it > to be a more powerful way of storing and retreiving information from > user-owned files.

Installation and upgrades are another issue. To 'install' a typical embedded database requires no action at all - all the infrastructure is embedded within the application, or at most a single external dynamic library. I deploy and support applications that use PostgreSQL as a backend database, and a huge amount of support effort is consumed by PostgreSQL issues. (We're just upgrading from 7.2.* to 7.4.1, and walking even very unix-literate users through the pain of a dump, upgrade, restore process is very painful; the steps are sketched below.)

Another difference between an embedded database and a typical RDBMS is that the embedded database is likely to be between twice as fast and ten times as fast, depending on the details of the application. One tradeoff, of course, is that the embedded database is likely to be single-user, or at most single-writer, multiple-reader. The other advantage that the embedded DB likely has is that it probably accepts queries and returns results as language-native types via direct calls, avoiding all the overhead of translating queries into SQL and translating the result sets back again.

Another difference between the two is that an RDBMS like PostgreSQL makes more demands of the underlying OS (for instance shared memory and semaphores) - meaning that the kernel may need to be reconfigured and the machine rebooted.

If what you need is a single-user database then, nice as it is to have a full RDBMS, you really don't need all that infrastructure, and the users will (likely) be far happier without it. If it'd be nice to have SQL support, then there are embedded, single-user databases with some level of SQL support.

All of this explains why an embedded PostgreSQL isn't a great idea. It being a true multi-user database means that even if you went through all the work needed to turn it into an embedded database you wouldn't get most of the advantages. Short term, if you want embedded SQL, use something like SQLite that's designed to work that way. If you want to share code between an embedded DB and PostgreSQL then use a database-agnostic API (such as DBI or libdbi) and restrict yourself to fairly vanilla SQL.
If you must have PostgreSQL used by a non-DB-aware user then you'll have to do what I've done, which is to build some automated maintenance code around PostgreSQL to handle installation and day-to-day maintenance and monitoring automagically, or patch something similar together based on the vacuum daemon and so on. Now, while I think that an embedded fork of PostgreSQL is completely missing the point, I do think that a low-maintenance fork or configuration option would be a very useful feature. I'd love to be able to ship an application that would

o Have a private installation of PostgreSQL

o That would run semi-persistently - if the DB isn't running, the application will transparently start it, and if the DB is idle for some length of time it gracefully shuts down

o Is zero-maintenance - all vacuuming, analysing etc is handled automatically. So are database version upgrades.

o That runs under the permissions of the user running the application

o And that could, by tweaking an application configuration variable, swap out the private PostgreSQL installation and instead access a standard installation

But (apart from the easy DB version upgrade bit) I'd far rather have point-in-time or incremental backups, easier-to-use replication, richer SQL support and easier reporting. So, much as I'd like it, a low-maintenance build is a long way down my list of priorities. Cheers, Steve
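For readers who haven't been through it, the dump, upgrade, restore dance Steve mentions is roughly the following; paths are illustrative, and the dump must be taken with the old release's tools while the old server is still running:

pg_dumpall > /tmp/everything.sql           # dump the whole cluster with the *old* binaries
pg_ctl -D $PGDATA stop
mv $PGDATA $PGDATA.old                     # keep the old cluster until the restore is verified
# install the new PostgreSQL release, then:
initdb -D $PGDATA
pg_ctl -D $PGDATA -w start
psql -d template1 -f /tmp/everything.sql   # restore into the new cluster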
> I'm not sure what OS you're using (I might have missed it in the thread) but > I've never even thought about an embedded database because PG runs very nice on > my laptop (P III - 850Mhz but I did NOT it new). Now, I'm running Linux so > maybe that is the difference. In fact, I'm found it to be invaluable to take it > to a client meeting and show not only the high level application gloss but I can > also show the guts of a application/web site. If it were just my laptop that would be fine. But I just can't really install a full on database server (nor would I feel comfortable doing so) on every machine that I would need it on. > Now I do understand the sync'ning issue with the database and I haven't had that > issue yet but since you'd have two Pg server's, I would think this issue could > be handled one of the replication project that is available or by something you > custom build (like a perl script that connects to both systems and syncs by > checking for missing pkey's in both systems). This is not an issue for postgres at this point. I will most likely just handle the syncing of specific parts of the database on my own.
> > It's not so important to hide the fact that postgres is being used as it > > is to spare the user from having to know anything about general database > > administration when the all the app really needs from the db is for it > > to be a more powerful way of storing and retreiving information from > > user-owned files. > > Installation and upgrades are another issue. > > To 'install' a typical embedded database requires no action at all - > all the infrastructure is embedded within the application, or at most > a single external dynamica library.

This is by far (other than reliability and compatibility) the most important thing for me.

> All of this explains why an embedded PostgreSQL isn't a great idea. It > being a true multi-user database means that even if you went though > all the work needed to turn it into an embedded database you wouldn't > get most of the advantages.

Is it true that postgres is not suited for this and should not be used as such, or is it just a matter of spending the time to make it possible to compile an embedded version?

> Short term, if you want embedded SQL, use something like SQLite that's > designed to work that way. If you want to share code between an > embedded DB and PostgreSQL then use a database agnostic API (such as > DBI or libdbi) and restrict yourself to fairly vanilla SQL.

This is exactly the approach that I have been using and it has worked very, very well. The only problems that I have come across have occurred when my SQL just wasn't quite "vanilla" enough. For instance, sqlite doesn't have date functions. We have not come across any showstoppers for this approach, but we do always have to consider that we are coding for two different dbs, and sometimes we do have to write db-specific SQL when there are differences and test on both dbs. I understand that if that's what I have to do then that's what I have to do. It would just be very, very nice to have postgres in both situations, if it were possible.

> Now, while I think that an embedded fork of PostgreSQL is completely missing the point I do think that a low maintenance fork or configuration option would be a very useful feature. I'd love to be able to ship an application that would
>
> o Have a private installation of PostgreSQL
>
> o That would run semi-persistently - if the DB isn't running, the application will transparently start it, and if the DB is idle for some length of time it gracefully shuts down
>
> o Is zero-maintenance - all vacuuming, analysing etc is handled automatically. So are database version upgrades.
>
> o That runs under the permissions of the user running the application
>
> o And that could, by tweaking an application configuration variable could swap out the private PostgreSQL installation and instead access a standard installation

This just might work fine for me. :-)

> But (apart from the easy DB version upgrade bit) I'd far rather have > point-in-time or incremental backups, easy replication, richer SQL > support, easier reporting and easier to use replication. So, much as > I'd like it, a low maintenance build is a long way down my list of > priorities.

Point-in-time and incremental backups and easy replication would be big for me as well, especially point-in-time and incremental backups.
"Rick Gigger" <rick@alpinenetworking.com> writes: >> All of this explains why an embedded PostgreSQL isn't a great idea. It >> being a true multi-user database means that even if you went though >> all the work needed to turn it into an embedded database you wouldn't >> get most of the advantages. > Is it true that postgres is not suited for this and should not be used as > such or is it just a matter of spending the time to allow you maybe compile > an embedded version? I think that Steve has it exactly right here. Postgres isn't designed to be an embedded database in that sense, and none of the developers are interested in moving it in that direction. It would require too many compromises versus the full-fledged-server situation. This is definitely a case where one size does not fit all. Rather than trying to force-fit Postgres to an application it's not suited for, you should use another product that is designed for that application. In short: your time would be better spent on upgrading SQLite to do what you need. regards, tom lane
> "Rick Gigger" <rick@alpinenetworking.com> writes: > >> All of this explains why an embedded PostgreSQL isn't a great idea. It > >> being a true multi-user database means that even if you went though > >> all the work needed to turn it into an embedded database you wouldn't > >> get most of the advantages. > > > Is it true that postgres is not suited for this and should not be used as > > such or is it just a matter of spending the time to allow you maybe compile > > an embedded version? > > I think that Steve has it exactly right here. Postgres isn't designed > to be an embedded database in that sense, and none of the developers are > interested in moving it in that direction. It would require too many > compromises versus the full-fledged-server situation. > > This is definitely a case where one size does not fit all. Rather > than trying to force-fit Postgres to an application it's not suited for, > you should use another product that is designed for that application. > In short: your time would be better spent on upgrading SQLite to do what > you need.

How about the following comment from an earlier post:

> Now, while I think that an embedded fork of PostgreSQL is completely missing the point I do think that a low maintenance fork or configuration option would be a very useful feature. I'd love to be able to ship an application that would
>
> o Have a private installation of PostgreSQL
>
> o That would run semi-persistently - if the DB isn't running, the application will transparently start it, and if the DB is idle for some length of time it gracefully shuts down
>
> o Is zero-maintenance - all vacuuming, analysing etc is handled automatically. So are database version upgrades.
>
> o That runs under the permissions of the user running the application
>
> o And that could, by tweaking an application configuration variable could swap out the private PostgreSQL installation and instead access a standard installation

Is this something that could make sense for postgres?
On Thu, Jan 22, 2004 at 01:04:29PM -0700, Rick Gigger wrote: [PostgreSQL ill-suited to embedded use]

> How about the following comment from an earlier post:
>
> > Now, while I think that an embedded fork of PostgreSQL is completely missing the point I do think that a low maintenance fork or configuration option would be a very useful feature. I'd love to be able to ship an application that would
> >
> > o Have a private installation of PostgreSQL
> >
> > o That would run semi-persistently - if the DB isn't running, the application will transparently start it, and if the DB is idle for some length of time it gracefully shuts down
> >
> > o Is zero-maintenance - all vacuuming, analysing etc is handled automatically. So are database version upgrades.
> >
> > o That runs under the permissions of the user running the application
> >
> > o And that could, by tweaking an application configuration variable could swap out the private PostgreSQL installation and instead access a standard installation
>
> Is this something that could make sense for postgres?

It is something that could be done without touching the PostgreSQL core code - for instance, by dropping a transparent shim in front of libpq that maintains a little state and intercepts some of the calls to libpq, yet presents the same API to the application. Add a flag to the connect string to identify an attempt to connect to a private installation (passing the top-level path so the shim can find the binaries, database directory and socket). On a failed attempt to connect to the private installation, try to fork/exec the postmaster, wait for it to spin up and then continue. Once connected, kill -TERM the postmaster (so that it'll shut down gracefully once everyone is disconnected). That'll provide everything needed for a private installation, optional connection to a real installation, and automatic spin-up and spin-down.

Vacuuming and analysing can be handled using the algorithms (and the code, come to that) from pg_autovacuum. That provides everything on my wish-list apart from automated version upgrades, with only a trivial amount of coding needed - probably no more than a hundred lines of new code for a minimalist version. Cheers, Steve
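To make the spin-up/spin-down behaviour concrete, here is the same idea rendered as a plain wrapper script rather than the libpq shim itself; the data directory and the application name are invented for the example:

#!/bin/sh
PGDATA=$HOME/.myapp/pgdata
export PGDATA
# spin up the private postmaster only if it isn't already running
pg_ctl status >/dev/null 2>&1 || pg_ctl -w start
myapp                 # hypothetical application; holds a connection while it runs
# "smart" shutdown: the postmaster exits once the last client has disconnected
pg_ctl -m smart stop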
Steve Atkins <steve@blighty.com> writes: > [ much snipped ] > Vacuuming and analysing can be handled using the algorithms (and the code, > come to that) from pg_autovacuum. Autovacuum will undoubtedly migrate into the core. I'm not sure how soon, or whether the end result will look much like the present contrib code, but everybody agrees that this is something we need for main-line development. Accordingly, I don't see the point in expending extra effort on this in a (hypothetical) embedded-like wrapper for PG. regards, tom lane