Thread: Access to current database from C-language function
Hi, I have read http://www.postgresql.org/docs/9.1/static/xfunc-c.html and my idea is, to write a C function which returns aset of rows. To generate the result set, I would like to access indexes directly using the information I found at http://www.postgresql.org/docs/9.1/static/indexam.html.But I don't get the idea how to glue both parts together!? Could somebodygive me a starting point? How do I get a handle to the current database inside a C function? cheers, Achim
On Jul25, 2011, at 13:40 , Achim Domma wrote: > I have read http://www.postgresql.org/docs/9.1/static/xfunc-c.html and my idea is, to write a C function which returnsa set of rows. To generate the result set, I would like to access indexes directly using the information I found athttp://www.postgresql.org/docs/9.1/static/indexam.html. But I don't get the idea how to glue both parts together!? Couldsomebody give me a starting point? How do I get a handle to the current database inside a C function? If you want to access the database using SQL from C-language functions, there the SPI API for that. (http://www.postgresql.org/docs/9.0/interactive/spi.html).The API provided by SPI is conceptually similar to the one providedby the client library libpq, i.e. it works in terms of statements, cursors, ...). SPI takes care of handling allthe low-level details like making sure you're using a valid snapshot, are inside a transaction, correctly handle lockedrows, ... A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert. However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. best regards, Florian Pflug
Am 25.07.2011 um 14:48 schrieb Florian Pflug: > A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert. However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. > I see the problems with the second approach, but that's definitively what I would like to do. But I'm only interested inreading, which will hopefully make it a bit easier. Could you guide me to a starting point? Assuming my database has atable T with an index I. How do I get access to that index? cheers, Achim
On Jul25, 2011, at 22:31 , Achim Domma wrote: > Am 25.07.2011 um 14:48 schrieb Florian Pflug: >> A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert. However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. > > I see the problems with the second approach, but that's definitively what I would like to do. You're in for a lot of work, then. I still suggest that you explain your ultimate goals before you embark on your endeavor- people might be able to point our easier ways to achieve those. > But I'm only interested in reading, which will hopefully make it a bit easier. Maybe. But note that if you want your code to be correct for all transaction isolation level, you have to be quite carefuleven when only reading. Especially from postgres 9.1 forward, due to the new true serializability feature of thatrelease. > Could you guide me to a starting point? Assuming my database has a table T with an index I. How do I get access to thatindex? I suggest you start by reading nodeIndexScan.c. This is the code the query executor uses to implement index scans. However,before you embark on your endeavor. best regards, Florian Pflug
Am 26.07.2011 um 00:40 schrieb Florian Pflug: > On Jul25, 2011, at 22:31 , Achim Domma wrote: >> Am 25.07.2011 um 14:48 schrieb Florian Pflug: >>> A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert. However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. >> >> I see the problems with the second approach, but that's definitively what I would like to do. > > You're in for a lot of work, then. I still suggest that you explain your ultimate goals before you embark on your endeavor- people might be able to point our easier ways to achieve those. I have tables which store two integer IDs and a floating point rank. So the table MyTable might have these columns: EntityID -> int PropertyID -> int Rank -> float My algorithm needs to retrieve EntityID-Rank-Pairs for some given PropertyIDs. So I basically want to execute a "select EntityID,Rank from MyTable where PropertyID=123 oder by Rank desc". But I need to execute multiple of those statements andI don't want to load all the data into memory, but rather iterate over the results step by step, stopping at certain thresholds. My algorithm is somewhat nested and contains logic which I cannot express in SQL, but retrieving those pairs is the mostbasic operation I would need. Are cursors and option too? Is there a limitation for the number of open cursors? One callmight open 100 cursors or so. cheers, Achim
On Aug1, 2011, at 13:23 , Achim Domma wrote: > I have tables which store two integer IDs and a floating point rank. So the table MyTable might have these columns: > > EntityID -> int > PropertyID -> int > Rank -> float > > My algorithm needs to retrieve EntityID-Rank-Pairs for some given PropertyIDs. So I basically want to execute a "selectEntityID, Rank from MyTable where PropertyID=123 oder by Rank desc". But I need to execute multiple of those statementsand I don't want to load all the data into memory, but rather iterate over the results step by step, stopping atcertain thresholds. > > My algorithm is somewhat nested and contains logic which I cannot express in SQL, Are you aware that postgres supports recursive subselects via the SQL standard's WITH RECURSIVE notation? I've found thatWITH RECURSIVE lets you express a lot of things nicely which are otherwise very difficult, or even impossible, to expressin SQL. Whether or not WITH RECURSIVE helps depends on whether your algorithm can be stated as a kind of closure process- i.e., a process where you continue to compute new rows from existing ones until no more new rows can found. > but retrieving those pairs is the most basic operation I would need. > Are cursors and option too? It certainly seems so. Rows are computed and returned on demand when fetched through a cursor - opening the cursor does *not*,in general, load the whole result set into memory. There are statements where parts of the result are materializedbefore the first row is returned, though - statements which require a sorting step, for example. But in yourcase, an index on (PropertyID, Rank DESC) should avoid the need for explicit sorting, and instead allow the executorto read the rows in the desired output order. Thus, a cursor for your SQL statements should have a small startupcost and an equally small (and constant) cost per row afterwards. In fact, for your SQL statement, any hand-crafted code that you might come up with will end up being very similar to whatthe executor does if you simply us a cursor. > Is there a limitation for the number of open cursors? One call might open 100 cursors or so. I don't know of any hard limit, and a couple of 100 cursors doesn't sounds unreasonable. I suggest you simply try it out. best regards, Florian Pflug
On Mon, Aug 01, 2011 at 01:23:26PM +0200, Achim Domma wrote: > Am 26.07.2011 um 00:40 schrieb Florian Pflug: > > > On Jul25, 2011, at 22:31 , Achim Domma wrote: > >> Am 25.07.2011 um 14:48 schrieb Florian Pflug: > >>> A more low-level API is provided by > >>> {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} > >>> and index_insert. However, correct handling of transactions > >>> using this API isn't easy - for example, to update a row you'd > >>> first have to find the latest version of that row, then decide > >>> if you're allowed to update it, and finally create a new > >>> version. > >> > >> I see the problems with the second approach, but that's > >> definitively what I would like to do. > > > > You're in for a lot of work, then. I still suggest that you > > explain your ultimate goals before you embark on your endeavor - > > people might be able to point our easier ways to achieve those. > > I have tables which store two integer IDs and a floating point rank. > So the table MyTable might have these columns: > > EntityID -> int PropertyID -> int Rank -> float > > My algorithm needs to retrieve EntityID-Rank-Pairs for some given > PropertyIDs. So I basically want to execute a "select EntityID, Rank > from MyTable where PropertyID=123 oder by Rank desc". But I need to > execute multiple of those statements and I don't want to load all > the data into memory, but rather iterate over the results step by > step, stopping at certain thresholds. > > My algorithm is somewhat nested and contains logic which I cannot > express in SQL, SQL is Turing-complete, so the chances are excellent that it's possible to express that algorithm in it. ;) Look into common table expressions for iteration/recursion, and windowing functions, commonly used for time series. http://www.postgresql.org/docs/current/static/queries-with.html http://www.postgresql.org/docs/current/static/tutorial-window.html Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Aug 1, 2011 at 11:04 AM, David Fetter <david@fetter.org> wrote: > SQL is Turing-complete, so the chances are excellent that it's > possible to express that algorithm in it. ;) > > Look into common table expressions for iteration/recursion, and > windowing functions, commonly used for time series. > > http://www.postgresql.org/docs/current/static/queries-with.html > http://www.postgresql.org/docs/current/static/tutorial-window.html That doesn't guarantee that it can be expressed efficiently, though. I think the OP may be going overboard by trying to bypass SPI, but putting the logic inside the server seems like it could well be worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Aug 1, 2011 at 10:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Aug 1, 2011 at 11:04 AM, David Fetter <david@fetter.org> wrote: >> SQL is Turing-complete, so the chances are excellent that it's >> possible to express that algorithm in it. ;) >> >> Look into common table expressions for iteration/recursion, and >> windowing functions, commonly used for time series. >> >> http://www.postgresql.org/docs/current/static/queries-with.html >> http://www.postgresql.org/docs/current/static/tutorial-window.html > > That doesn't guarantee that it can be expressed efficiently, though. > I think the OP may be going overboard by trying to bypass SPI, but > putting the logic inside the server seems like it could well be > worthwhile. I think David is probably right and this can be handled in pure sql simply and easily (perhaps in a function, perhaps not). The SPI interface is great, but the sql and plpgsql languages are very powerful and should always be preferred over a C solution all else being equal. merlin
Am 01.08.2011 um 21:37 schrieb Merlin Moncure: > I think David is probably right and this can be handled in pure sql > simply and easily (perhaps in a function, perhaps not). The SPI > interface is great, but the sql and plpgsql languages are very > powerful and should always be preferred over a C solution all else > being equal. Thanks for all the input! I was not aware of all the available options. As I don't have time pressure at the moment, I'llinvestigate the different options further before deciding for one direction. I'm quite happy to see what's possible withPostgresql! :-) kind regards, Achim