Thread: Access to current database from C-language function

Access to current database from C-language function

From

Achim Domma

Date:

25 July 2011, 09:03:55

Hi,

I have read http://www.postgresql.org/docs/9.1/static/xfunc-c.html and my idea is, to write a C function which returns
aset of rows. To generate the result set, I would like to access indexes directly using the information I found at
http://www.postgresql.org/docs/9.1/static/indexam.html.But I don't get the idea how to glue both parts together!? Could
somebodygive me a starting point? How do I get a handle to the current database inside a C function? 

cheers,
Achim

Re: Access to current database from C-language function

From

Florian Pflug

Date:

25 July 2011, 09:48:26

On Jul25, 2011, at 13:40 , Achim Domma wrote:
> I have read http://www.postgresql.org/docs/9.1/static/xfunc-c.html and my idea is, to write a C function which
returnsa set of rows. To generate the result set, I would like to access indexes directly using the information I found
athttp://www.postgresql.org/docs/9.1/static/indexam.html. But I don't get the idea how to glue both parts together!?
Couldsomebody give me a starting point? How do I get a handle to the current database inside a C function? 

If you want to access the database using SQL from C-language functions, there the SPI API for that.
(http://www.postgresql.org/docs/9.0/interactive/spi.html).The API provided by SPI is conceptually similar to the one
providedby the client library libpq, i.e. it works in terms of statements, cursors, ...). SPI takes care of handling
allthe low-level details like making sure you're using a valid snapshot, are inside a transaction, correctly handle
lockedrows, ... 

A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert.
However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to
findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. 

best regards,
Florian Pflug

Re: Access to current database from C-language function

From

Achim Domma

Date:

25 July 2011, 17:31:19

Am 25.07.2011 um 14:48 schrieb Florian Pflug:

> A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert.
However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to
findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. 
>

I see the problems with the second approach, but that's definitively what I would like to do. But I'm only interested
inreading, which will hopefully make it a bit easier. Could you guide me to a starting point? Assuming my database has
atable T with an index I. How do I get access to that index? 

cheers,
Achim

Re: Access to current database from C-language function

From

Florian Pflug

Date:

25 July 2011, 19:40:12

On Jul25, 2011, at 22:31 , Achim Domma wrote:
> Am 25.07.2011 um 14:48 schrieb Florian Pflug:
>> A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert.
However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to
findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. 
>
> I see the problems with the second approach, but that's definitively what I would like to do.

You're in for a lot of work, then. I still suggest that you explain your ultimate goals before you embark on your
endeavor- people might be able to point our easier ways to achieve those. 

> But I'm only interested in reading, which will hopefully make it a bit easier.

Maybe. But note that if you want your code to be correct for all transaction isolation level, you have to be quite
carefuleven when only reading. Especially from postgres 9.1 forward, due to the new true serializability feature of
thatrelease. 

> Could you guide me to a starting point? Assuming my database has a table T with an index I. How do I get access to
thatindex? 

I suggest you start by reading nodeIndexScan.c. This is the code the query executor uses to implement index scans.
However,before you embark on your endeavor. 

best regards,
Florian Pflug

Re: Access to current database from C-language function

From

Achim Domma

Date:

01 August 2011, 08:23:35

Am 26.07.2011 um 00:40 schrieb Florian Pflug:

> On Jul25, 2011, at 22:31 , Achim Domma wrote:
>> Am 25.07.2011 um 14:48 schrieb Florian Pflug:
>>> A more low-level API is provided by {heap,index}_{beginscan,endscan}, heap_{insert,update,delete} and index_insert.
However,correct handling of transactions using this API isn't easy - for example, to update a row you'd first have to
findthe latest version of that row, then decide if you're allowed to update it, and finally create a new version. 
>>
>> I see the problems with the second approach, but that's definitively what I would like to do.
>
> You're in for a lot of work, then. I still suggest that you explain your ultimate goals before you embark on your
endeavor- people might be able to point our easier ways to achieve those. 

I have tables which store two integer IDs and a floating point rank. So the table MyTable might have these columns:

EntityID -> int
PropertyID -> int
Rank -> float

My algorithm needs to retrieve EntityID-Rank-Pairs for some given PropertyIDs. So I basically want to execute a "select
EntityID,Rank from MyTable where PropertyID=123 oder by Rank desc". But I need to execute multiple of those statements
andI don't want to load all the data into memory, but rather iterate over the results step by step, stopping at certain
thresholds.

My algorithm is somewhat nested and contains logic which I cannot express in SQL, but retrieving those pairs is the
mostbasic operation I would need. Are cursors and option too? Is there a limitation for the number of open cursors? One
callmight open 100 cursors or so. 

cheers,
Achim

Re: Access to current database from C-language function

From

Florian Pflug

Date:

01 August 2011, 09:01:40

On Aug1, 2011, at 13:23 , Achim Domma wrote:
> I have tables which store two integer IDs and a floating point rank. So the table MyTable might have these columns:
>
> EntityID -> int
> PropertyID -> int
> Rank -> float
>
> My algorithm needs to retrieve EntityID-Rank-Pairs for some given PropertyIDs. So I basically want to execute a
"selectEntityID, Rank from MyTable where PropertyID=123 oder by Rank desc". But I need to execute multiple of those
statementsand I don't want to load all the data into memory, but rather iterate over the results step by step, stopping
atcertain thresholds. 
>
> My algorithm is somewhat nested and contains logic which I cannot express in SQL,

Are you aware that postgres supports recursive subselects via the SQL standard's WITH RECURSIVE notation? I've found
thatWITH RECURSIVE lets you express a lot of things nicely which are otherwise very difficult, or even impossible, to
expressin SQL. Whether or not WITH RECURSIVE helps depends on whether your algorithm can be stated as a kind of closure
process- i.e., a process where you continue to compute new rows from existing ones until no more new rows can found. 

> but retrieving those pairs is the most basic operation I would need.
> Are cursors and option too?

It certainly seems so. Rows are computed and returned on demand when fetched through a cursor - opening the cursor does
*not*,in general, load the whole result set into memory. There are statements where parts of the result are
materializedbefore the first row is returned, though - statements which require a sorting step, for example. But in
yourcase, an index on (PropertyID, Rank DESC) should avoid the need for explicit sorting, and instead allow the
executorto read the rows in the desired output order. Thus, a cursor for your SQL statements should have a small
startupcost and an equally small (and constant) cost per row afterwards. 

In fact, for your SQL statement, any hand-crafted code that you might come up with will end up being very similar to
whatthe executor does if you simply us a cursor. 

> Is there a limitation for the number of open cursors? One call might open 100 cursors or so.

I don't know of any hard limit, and a couple of 100 cursors doesn't sounds unreasonable. I suggest you simply try it
out.

best regards,
Florian Pflug

Re: Access to current database from C-language function

From

David Fetter

Date:

01 August 2011, 12:05:11

On Mon, Aug 01, 2011 at 01:23:26PM +0200, Achim Domma wrote:
> Am 26.07.2011 um 00:40 schrieb Florian Pflug:
> 
> > On Jul25, 2011, at 22:31 , Achim Domma wrote:
> >> Am 25.07.2011 um 14:48 schrieb Florian Pflug:
> >>> A more low-level API is provided by
> >>> {heap,index}_{beginscan,endscan}, heap_{insert,update,delete}
> >>> and index_insert. However, correct handling of transactions
> >>> using this API isn't easy - for example, to update a row you'd
> >>> first have to find the latest version of that row, then decide
> >>> if you're allowed to update it, and finally create a new
> >>> version.
> >> 
> >> I see the problems with the second approach, but that's
> >> definitively what I would like to do.
> > 
> > You're in for a lot of work, then. I still suggest that you
> > explain your ultimate goals before you embark on your endeavor -
> > people might be able to point our easier ways to achieve those.
> 
> I have tables which store two integer IDs and a floating point rank.
> So the table MyTable might have these columns:
> 
> EntityID -> int PropertyID -> int Rank -> float
> 
> My algorithm needs to retrieve EntityID-Rank-Pairs for some given
> PropertyIDs. So I basically want to execute a "select EntityID, Rank
> from MyTable where PropertyID=123 oder by Rank desc". But I need to
> execute multiple of those statements and I don't want to load all
> the data into memory, but rather iterate over the results step by
> step, stopping at certain thresholds.
> 
> My algorithm is somewhat nested and contains logic which I cannot
> express in SQL,

SQL is Turing-complete, so the chances are excellent that it's
possible to express that algorithm in it. ;)

Look into common table expressions for iteration/recursion, and
windowing functions, commonly used for time series.

http://www.postgresql.org/docs/current/static/queries-with.html
http://www.postgresql.org/docs/current/static/tutorial-window.html

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Access to current database from C-language function

From

Robert Haas

Date:

01 August 2011, 12:18:44

On Mon, Aug 1, 2011 at 11:04 AM, David Fetter <david@fetter.org> wrote:
> SQL is Turing-complete, so the chances are excellent that it's
> possible to express that algorithm in it. ;)
>
> Look into common table expressions for iteration/recursion, and
> windowing functions, commonly used for time series.
>
> http://www.postgresql.org/docs/current/static/queries-with.html
> http://www.postgresql.org/docs/current/static/tutorial-window.html

That doesn't guarantee that it can be expressed efficiently, though.
I think the OP may be going overboard by trying to bypass SPI, but
putting the logic inside the server seems like it could well be
worthwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Access to current database from C-language function

From

Merlin Moncure

Date:

01 August 2011, 16:37:30

On Mon, Aug 1, 2011 at 10:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 1, 2011 at 11:04 AM, David Fetter <david@fetter.org> wrote:
>> SQL is Turing-complete, so the chances are excellent that it's
>> possible to express that algorithm in it. ;)
>>
>> Look into common table expressions for iteration/recursion, and
>> windowing functions, commonly used for time series.
>>
>> http://www.postgresql.org/docs/current/static/queries-with.html
>> http://www.postgresql.org/docs/current/static/tutorial-window.html
>
> That doesn't guarantee that it can be expressed efficiently, though.
> I think the OP may be going overboard by trying to bypass SPI, but
> putting the logic inside the server seems like it could well be
> worthwhile.

I think David is probably right and this can be handled in pure sql
simply and easily (perhaps in a function, perhaps not).  The SPI
interface is great, but the sql and plpgsql languages are very
powerful and should always be preferred over a C solution all else
being equal.

merlin

Re: Access to current database from C-language function

From

Achim Domma

Date:

02 August 2011, 09:48:17

Am 01.08.2011 um 21:37 schrieb Merlin Moncure:

> I think David is probably right and this can be handled in pure sql
> simply and easily (perhaps in a function, perhaps not).  The SPI
> interface is great, but the sql and plpgsql languages are very
> powerful and should always be preferred over a C solution all else
> being equal.

Thanks for all the input! I was not aware of all the available options. As I don't have time pressure at the moment,
I'llinvestigate the different options further before deciding for one direction. I'm quite happy to see what's possible
withPostgresql! :-) 

kind regards,
Achim