Thread: How do I use the backend APIs
Hi, In Postgres, is there a C language API which would give me access to BTrees like Berkeley DB does? eg to seek to a particular key/value pair and iterate forward from there? If not whats the nearest thing to this in Postgres? Cheers.
On Thu, Feb 16, 2006 at 07:41:09AM -0800, Chad wrote: > In Postgres, is there a C language API which would give me access to > BTrees like Berkeley DB does? eg to seek to a particular key/value pair > and iterate forward from there? If not whats the nearest thing to this > in Postgres? Could you tell us about the problem you're trying to solve? Are you writing client-side or server-side code? -- Michael Fuhr
"Chad" <chadzakary@hotmail.com> wrote > > In Postgres, is there a C language API which would give me access to > BTrees like Berkeley DB does? eg to seek to a particular key/value pair > and iterate forward from there? AFAIK there is no such API for this purpose. The reason is that to access BTree, you have to setup complex enough environment to enable so. For example, the buffer pool support, the WAL support etc. So though exporting such API is easy to do, there is no pratical usage of it. Regards, Qingqing
On Fri, 2006-02-17 at 11:34 +0800, Qingqing Zhou wrote: > AFAIK there is no such API for this purpose. The reason is that to access > BTree, you have to setup complex enough environment to enable so. For > example, the buffer pool support, the WAL support etc. So though exporting > such API is easy to do, there is no pratical usage of it. Well, if the API is going to be invoked by C UDFs, it could assume that the environment has been appropriately initialized. I think it would be possible to provide such an API (although it would take a considerable amount of work). However, I don't see the point -- why would an application want to use the API? SQL is much more flexible. -Neil
A long time ago, in a galaxy far, far away, neilc@samurai.com (Neil Conway) wrote: > On Fri, 2006-02-17 at 11:34 +0800, Qingqing Zhou wrote: >> AFAIK there is no such API for this purpose. The reason is that to access >> BTree, you have to setup complex enough environment to enable so. For >> example, the buffer pool support, the WAL support etc. So though exporting >> such API is easy to do, there is no pratical usage of it. > > Well, if the API is going to be invoked by C UDFs, it could assume that > the environment has been appropriately initialized. > > I think it would be possible to provide such an API (although it would > take a considerable amount of work). However, I don't see the point -- > why would an application want to use the API? SQL is much more flexible. It would probably make more sense to create an API that runs the activities via a translation into SQL... -- let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];; http://linuxdatabases.info/info/slony.html Rules of the Evil Overlord #46. "If an advisor says to me "My liege, he is but one man. What can one man possibly do?", I will reply "This." and kill the advisor." <http://www.eviloverlord.com/>
On Thu, Feb 16, 2006 at 07:41:09AM -0800, Chad wrote: > Hi, > > In Postgres, is there a C language API which would give me access to > BTrees like Berkeley DB does? eg to seek to a particular key/value pair > and iterate forward from there? If not whats the nearest thing to this > in Postgres? Well, in the backend you can do things like open a btree index, setup an ScanKey to indicate which values you want and then keep calling getnext(). If you set your scankey to (col1 >= 'A') it will start at 'A' and go up from there... Most of the time though you just create a query and use SPI_exec. Then you don't actually have to worry about details like names of the indexes, OIDs, types, comparison functions, etc... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Thanks Martijn, This is exactly what I am looking for. I'm wondering how easy it is to sit on top of this backend. Does anybody have any stand alone sample code? Is it a library that can be linked or do you need to produce a modified version of the postgres server? Can it be used in shared library form and if so will it support multiple processes using it on the same machine? ---------------------- "Well, in the backend you can do things like open a btree index, setup an ScanKey to indicate which values you want and then keep calling getnext(). If you set your scankey to (col1 >= 'A') it will start at 'A' and go up from there... "
In a word: The kind of problems people use Berkeley DB for. People use BDB for more fine grained cursor access to BTrees. Stuff you CANNOT do with SQL. There is a market for this. See their website. I'd like something similar from Postgres so that the data would be stored in a full fledged RDBMS but I could use the cursor methods for searching more efficient than SQL. Best of both worlds. I've had a quick browse around the Postgres code and found some functions like "_bt_first()" but no sample code to use it. BTW its for developing an alternative server based access to the underlying relational data.
"Chad" <chadzakary@hotmail.com> writes: > This is exactly what I am looking for. I'm wondering how easy it is to > sit on top of this backend. You can't, and you'll get exactly zero community support for trying. We don't believe in embedded databases --- or at least, we don't believe in trying to use Postgres as one. We like a hard-and-fast separation between client and database server, so that client programming mistakes can't corrupt the database. You could possibly do what you are thinking of in the form of user-defined functions executing in the backend, but the communication overhead to the client side is likely more than you want, and you'll be relying on APIs that we consider backend-internal and feel free to whack around at the drop of a hat. I'd suggest looking for something that's actually intended to be an embedded database. sqlite maybe, though I'm no expert on the subject. For that matter, have you looked at good old dbm? regards, tom lane
On Fri, Feb 17, 2006 at 01:06:16AM -0800, Chad wrote: > In a word: The kind of problems people use Berkeley DB for. > > People use BDB for more fine grained cursor access to BTrees. Stuff you > CANNOT do with SQL. There is a market for this. See their website. I'd > like something similar from Postgres so that the data would be stored > in a full fledged RDBMS but I could use the cursor methods for > searching more efficient than SQL. Best of both worlds. Well, just the brief look at the docs doesn't immediatly reveal anything that couldn't be done with straight SQL and server side functions. It would be helpful if you could give an example of what you actually want to do. > I've had a quick browse around the Postgres code and found some > functions like "_bt_first()" but no sample code to use it. BTW its for > developing an alternative server based access to the underlying > relational data. Well, that function is several levels below where you need to be looking. Using it directly will probably get you into a world of hurt. BTW, what does "alternative server based access to the underlying relational data" mean? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
A long time ago, in a galaxy far, far away, "Chad" <chadzakary@hotmail.com> wrote: > In a word: The kind of problems people use Berkeley DB for. > People use BDB for more fine grained cursor access to BTrees. Stuff you > CANNOT do with SQL. There is a market for this. See their website. I'd > like something similar from Postgres so that the data would be stored > in a full fledged RDBMS but I could use the cursor methods for > searching more efficient than SQL. Best of both worlds. I daresay we get enough challenges to fill the day when we use the "coarse graining" of SQL. I'm generally keener on getting aggregate results that let me not bother needing to search in fantastical ways... As far as I'm concerned, you're not pointing at a better world; you're pointing at a worse one. I've seen far too many bugs falling out of the navigational complexities of navigation-oriented data structures. The sheer scope of bugginess of that is why my ears perk up when mention of languages like R and APL and such come up; I don't want to navigate through data; I want to parallel process it :-). > I've had a quick browse around the Postgres code and found some > functions like "_bt_first()" but no sample code to use it. BTW its > for developing an alternative server based access to the underlying > relational data. Those sorts of functions are intended as internals, and public usage can be expected to break gloriously badly as changing them is fair game as PostgreSQL progresses to new versions. For things for "public use," you should look at what is offered in libpq. If you could outline some usage that might make it more powerful, it is not implausible that people would listen. There are doubtless ways that cursors could be enhanced, and that might be the direction you would want to go. But you're not too likely to see PostgreSQL rewritten for the sake of attracting the "market" of people who need to manipulate the fine semantics of B-tree navigation. -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://cbbrowne.com/info/rdbms.html "For those of you who are into writing programs that are as obscure and complicated as possible, there are opportunities for... real fun here" -- Arthur Norman
Chad wrote: > Thanks Martijn, > "Well, in the backend you can do things like open a btree index, setup > an ScanKey to indicate which values you want and then keep calling > getnext(). If you set your scankey to (col1 >= 'A') it will start at > 'A' and go up from there... " That looks suspiciously much like cursors. Isn't this what you're looking for?: OPEN curs1 FOR SELECT * FROM foo WHERE key = mykey; FETCH cursor INTO target; CLOSE cursor; (from: http://www.postgresql.org/docs/8.1/static/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS ) I figure the other PL/ languages feature cursors as well, but I've never used those. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Firstly thank you to all who have taken the time to reply so far. I need to clarify a few things based on the feedback I have received. 1. I understand the concerns you have about people using internal APIs that the developers are free to change. I also understand the risks I take if I use an undocumented API. I would prefer to use a supported public API with full support of the community. I want a clean API as you have put it. My understanding is that the level of abstraction at which I want to use Postgres is not supported by a public API today. This API may exist at a less official level for internal use. Lets revisit the argument as to whether I NEED this level of abstraction in a later point as people have made good suggestions in this regard to counter my opinion. 2. I want to base my development on a fully fledged relational DB (not just an embedded DB). This is because I believe it will increase the appeal of my product. 3. We see cleanly defined layers in Networking Software and Operating Systems Software. For example, I can send stuff over a network as Ethernet Frames if I want to use a library at that level of abstraction, OR I can send stuff as TCP packets, this assuming I have at my disposal libraries exposing these levels of abstraction. I believe there is a case for DBMS software to support this concept of layering. I'm not aware of any standards based database software stack but I believe that if it existed it could result in better reuse of database core technology. This stack would maybe begin with a buffer cache at level 0, a data structures layer at level 1 (Btrees, heaps etc), a data model layer at level 2...eventually you work up to client APIs and so on. Whatever this stack looks like, I would like to be able to link to a library that exposed the data structure layer of a fully fledged relational DB. I'm not going to try to implement that from scratch and I want what I develop to have full SQL support. 4. Why I think I need to work at this level of abstraction. I believe, certain types of queries can be realized more efficiently by code that can seek to specific parts of an index and immediately returning a small number of rows to the client. The query would remain active and the client would only retrieve the next rows after the previous cursor position when the client was ready to do so. With a huge underlying table set this would enable very free incremental browsing of the data. Imagine a table browser application sitting on top of a multi million-row table. The user types in a partial name match ("Mal"). They want the table to scroll to the first name matching the name they have typed ("Malcolm"). It should then be cheap to interactively scroll downwards from here. It should also be cheap to scroll upwards to view records immediately preceding "Malcolm". The most natural way I can think of doing this is an API to cursor seek to "Malcolm" and navigate forwards or backwards from that cursor pos. I feel a B-Tree cursor API matches the way you would implement a table like this, only retrieving data as the user scrolls to it.
On Mon, Feb 20, 2006 at 05:08:33AM -0800, Chad wrote: > 4. Why I think I need to work at this level of abstraction. I believe, > certain types of queries can be realized more efficiently by code that > can seek to specific parts of an index and immediately returning a > small number of rows to the client. The query would remain active and > the client would only retrieve the next rows after the previous cursor > position when the client was ready to do so. With a huge underlying <snip> You realise you can acheive almost exactly what you want here by saying things like: DECLARE c AS CURSOR FOR select * from table where name >= 'Mal' order by name; FETCH FORWARD 1; -- Go down FETCH FORWARD 1; -- Go down FETCH BACKWARD 2; -- Go back again If you have an index it will just step through the index to find the rows you want. While you're free to try and make this work by using the index code directly, you're not asking anything that's otherwise impossible. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Mon, 2006-02-20 at 16:40, Martijn van Oosterhout wrote: > On Mon, Feb 20, 2006 at 05:08:33AM -0800, Chad wrote: > > 4. Why I think I need to work at this level of abstraction. I believe, > > certain types of queries can be realized more efficiently by code that > > can seek to specific parts of an index and immediately returning a > > small number of rows to the client. The query would remain active and > > the client would only retrieve the next rows after the previous cursor > > position when the client was ready to do so. With a huge underlying > > <snip> > > You realise you can acheive almost exactly what you want here by saying > things like: > > DECLARE c AS CURSOR FOR select * from table where name >= 'Mal' order by name; > FETCH FORWARD 1; -- Go down > FETCH FORWARD 1; -- Go down > FETCH BACKWARD 2; -- Go back again > > If you have an index it will just step through the index to find the > rows you want. While you're free to try and make this work by using the > index code directly, you're not asking anything that's otherwise > impossible. I think the problem here is that when you have a hammer, everything looks like a nail. We see it with people coming from Oracle or DB2 as well. "I learned it this way" is often a bigger obstacle to learning postgresql than no knowledge at all.
Thanks Martijn/Alban, This look interesting. I'll make some time to try this problem out using your approach. I have a few questions like: -Could I skip the FETCH FORWARD and go straight to the FETCH BACKWARD i.e. declare cursor to be at "Mal" and go backwards from there or is the cursor limited to going backward only as far as "Mal"? -Does the DB avoid transferring the data until the FETCH command? -When rows change in between opening the cursor and fetching the changed rows, will the FETCH retrieve the new data or is a snapshot taken when the cursor is declared ? -What about concurrency? If a cursor is kept open while other transactions change the same table or does it cause those writer transactions to block? Perhaps this is configurable.
Chad wrote: > Thanks Martijn/Alban, > > This look interesting. I'll make some time to try this problem out > using your approach. > I have a few questions like: > -Could I skip the FETCH FORWARD and go straight to the FETCH BACKWARD > i.e. declare cursor to be at "Mal" and go backwards from there or is > the cursor limited to going backward only as far as "Mal"? > -Does the DB avoid transferring the data until the FETCH command? Indeed it does. > -When rows change in between opening the cursor and fetching the > changed rows, will the FETCH retrieve the new data or is a snapshot > taken when the cursor is declared ? > -What about concurrency? If a cursor is kept open while other > transactions change the same table or does it cause those writer > transactions to block? Perhaps this is configurable. Transactions are isolated. If data is changed in other transactions, they don't influence the data in the transaction you're currently looking at. I assume the same goes for cursors, it's still a query after all. I don't know what happens if you change the data you're looping over in a way that the result set changes, but a test case can't be too hard to think up. I'm talking about something like this: OPEN cur FOR SELECT val FROM values WHERE val BETWEEN 1 AND 10 ORDER BY val; LOOP FETCH cur INTO record; -- This reverses the order in which the record would be fetched UPDATE values SET val = 11 - val WHERE val = record.val; END LOOP; Makes me kind of curious what happens... Would it only get to halfway the values 1..10 and then go backwards again? -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
On Tue, Feb 21, 2006 at 02:41:13AM -0800, Chad wrote: > Thanks Martijn/Alban, > > This look interesting. I'll make some time to try this problem out > using your approach. > I have a few questions like: > -Could I skip the FETCH FORWARD and go straight to the FETCH BACKWARD > i.e. declare cursor to be at "Mal" and go backwards from there or is > the cursor limited to going backward only as far as "Mal"? Only as far back as "Mal". However, if you set your original query to ORDER BY blah DESC, then FETCH FORWARD would scan backward through the index. > -Does the DB avoid transferring the data until the FETCH command? Of course. It would kind of defeat the purpose to do otherwise. > -When rows change in between opening the cursor and fetching the > changed rows, will the FETCH retrieve the new data or is a snapshot > taken when the cursor is declared ? Standard visibility rules apply. READ COMMITTED shows anything committed, even after you've started. SERIALIZABLE gives you a consistant snapshot. > -What about concurrency? If a cursor is kept open while other > transactions change the same table or does it cause those writer > transactions to block? Perhaps this is configurable. Some as normal. PostgreSQL doesn't acquire any locks for plain SELECTs so no risk there... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > On Tue, Feb 21, 2006 at 02:41:13AM -0800, Chad wrote: >> -When rows change in between opening the cursor and fetching the >> changed rows, will the FETCH retrieve the new data or is a snapshot >> taken when the cursor is declared ? > Standard visibility rules apply. READ COMMITTED shows anything > committed, even after you've started. SERIALIZABLE gives you a > consistant snapshot. I believe that a cursor always shows a snapshot --- whether the transaction is READ COMMITTED or SERIALIZABLE only affects whether the snapshot is current as of the DECLARE CURSOR command or the transaction's BEGIN command. In either case you won't see changes occurring after the cursor is opened. regards, tom lane
Thanks Martijn.