Thread: Does a call to a language handler provide a context/session, and somewhere to keep session data?

Given that a language handler would be expected to be persistent, and to support concurrent (and reentrant) calls
withina single database, is there a unique context or session identifier available? 

Is there a place to store data, to be retrieved later in the same session?

Is there a place to store data (a cache?) that has been retrieved from the database for use by concurrent sessions
usingthat database? 

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org




On Sun, Mar 6, 2016 at 10:21 PM, <david@andl.org> wrote:
Given that a language handler would be expected to be persistent, and to support concurrent (and reentrant) calls within a single database, is there a unique context or session identifier available?

Is there a place to store data, to be retrieved later in the same session?

Is there a place to store data (a cache?) that has been retrieved from the database for use by concurrent sessions using that database?


​PL/R also has an implementation for this kind of thing.

David J.

Thank you for your response.

 

Yes, I was aware of GD and SD. My question is about what facilities Postgres provides for implementing such a thing. Where is the proper place for the root of the SD/GD? What does an implementation use to determine that two calls belong to the same session?

 

I’m not finding that easy to understand by reading source code.

 

Regards

David M Bennett FACS


MD Powerflex Corporation, creators of PFXplus

To contact us, please call +61-3-9548-9114 or go to www.pfxcorp.com/contact.htm

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of David G. Johnston
Sent: Monday, 7 March 2016 4:28 PM
To: david@andl.org
Cc: pgsql-general-owner+M220479=david=andl.org@postgresql.org; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

 

On Sun, Mar 6, 2016 at 10:21 PM, <david@andl.org> wrote:

Given that a language handler would be expected to be persistent, and to support concurrent (and reentrant) calls within a single database, is there a unique context or session identifier available?

Is there a place to store data, to be retrieved later in the same session?

Is there a place to store data (a cache?) that has been retrieved from the database for use by concurrent sessions using that database?

 

 

​PL/R also has an implementation for this kind of thing.

 

David J.

 

On 3/7/2016 1:34 AM, david@andl.org wrote:

Yes, I was aware of GD and SD. My question is about what facilities Postgres provides for implementing such a thing. Where is the proper place for the root of the SD/GD? What does an implementation use to determine that two calls belong to the same session?



the process ID is unique for each active session.   of course, the OS can recycle a PID when a process/connection terminates


-- 
john r pierce, recycling bits in santa cruz

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce

Yes, I was aware of GD and SD. My question is about what facilities Postgres provides for implementing such a thing. Where is the proper place for the root of the SD/GD? What does an implementation use to determine that two calls belong to the same session?

the process ID is unique for each active session.   of course, the OS can recycle a PID when a process/connection terminates

[dmb>] Thanks for the idea, but I’m wary of using PID for that purpose.

[dmb>] In the Python implementation the GD appears to just be stored as a simple variable at file scope in the DLL. Would I be right in saying that the language handler DLL is loaded exactly once for each session (when the language is first used)? If so, then any unique identifier allocated in PG_init (such as a GUID or timestamp or counter) would seem to serve the purpose. I just wondered if there was something clever I hadn’t found out about yet.

 

Regards

David M Bennett FACS


Andl - A New Database Language - andl.org

 

On March 8, 2016 12:18:08 AM david@andl.org wrote:
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
>
>
>
> Yes, I was aware of GD and SD. My question is about what facilities Postgres
> provides for implementing such a thing. Where is the proper place for the
> root of the SD/GD? What does an implementation use to determine that two
> calls belong to the same session?
>
> the process ID is unique for each active session.   of course, the OS can
> recycle a PID when a process/connection terminates
>
> [dmb>] Thanks for the idea, but I’m wary of using PID for that purpose.
>
> [dmb>] In the Python implementation the GD appears to just be stored as a
> simple variable at file scope in the DLL. Would I be right in saying that
> the language handler DLL is loaded exactly once for each session (when the
> language is first used)? If so, then any unique identifier allocated in
> PG_init (such as a GUID or timestamp or counter) would seem to serve the
> purpose. I just wondered if there was something clever I hadn’t found out
> about yet.
>

One thing that's probably key here is that pgsql isn't multi-threaded.
Individual connections are handled by forked backends, which share a shared-
memory cache that's not accessible by SQL-land code (which includes language
handlers). So I think your problem goes away once you realize that all the
data you have is tied to a single connection anyway.

You cannot use multi-threaded code (which touches the database) in language
handlers or other "plug-in" code.

Also, trying to outsmart the db engine's cache by building your own is usually
an exercise in futility and often counter-productive. I speak from experience
:-P



Thank you for your response.

 

Yes, I was aware of GD and SD. My question is about what facilities Postgres provides for implementing such a thing. Where is the proper place for the root of the SD/GD? What does an implementation use to determine that two calls belong to the same session?

 

I’m not finding that easy to understand by reading source code.

 

Regards

David M Bennett FACS


MD Powerflex Corporation, creators of PFXplus

To contact us, please call +61-3-9548-9114 or go to www.pfxcorp.com/contact.htm

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of David G. Johnston
Sent: Monday, 7 March 2016 4:28 PM
To: david@andl.org
Cc: pgsql-general-owner+M220479=david=andl.org@postgresql.org; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

 

On Sun, Mar 6, 2016 at 10:21 PM, <david@andl.org> wrote:

Given that a language handler would be expected to be persistent, and to support concurrent (and reentrant) calls within a single database, is there a unique context or session identifier available?

Is there a place to store data, to be retrieved later in the same session?

Is there a place to store data (a cache?) that has been retrieved from the database for use by concurrent sessions using that database?

 

 

​PL/R also has an implementation for this kind of thing.

 

David J.

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jan de Visser

One thing that's probably key here is that pgsql isn't multi-threaded.
Individual connections are handled by forked backends, which share a shared- memory cache that's not accessible by
SQL-landcode (which includes language handlers). So I think your problem goes away once you realize that all the data
youhave is tied to a single connection anyway. 

[dmb>]
[dmb>] Thanks. Yes, I knew that. My problem is a strategy to start up the language engine once (per session), load
stufffrom the database once, and then keep the per-call cost as low as possible. I know that palloc() memory goes away;
thatit has a lifetime longer than a function invocation and should span a recursive function call, but definitely not
longerthan a transaction. So I need to get some memory per session and keep a pointer to it per session (which it seems
isper process). 

You cannot use multi-threaded code (which touches the database) in language handlers or other "plug-in" code.
[dmb>] That's good -- I hate threads. I've never been able to figure out how to test them adequately. No loss there.

Also, trying to outsmart the db engine's cache by building your own is usually an exercise in futility and often
counter-productive.I speak from experience :-P 
[dmb>] Isn't cache invalidation the second hardest thing in computing (after naming things)?

[dmb>] Seems like DLL static memory with allocation from process memory (or even malloc()) is "the simplest thing that
couldpossibly work". 

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org






On 3/7/2016 3:28 PM, david@andl.org wrote:
> [dmb>] Thanks. Yes, I knew that. My problem is a strategy to start up the language engine once (per session), load
stufffrom the database once, and then keep the per-call cost as low as possible.... 

this stuff you're loading from the database once, that's just data about
your language plugin's configuration, or is it user data, or what?

if its just a few global settings, you should consider using custom
settings variables, rather than database tables.   for instance, pljava
has a setting,
pljava.libjvm_location='/usr/lib/jvm/java-1.8.0/lib/libjvm.so' or
whatever which it uses to find the Java native calls interface library...




--
john r pierce, recycling bits in santa cruz



From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce

this stuff you're loading from the database once, that's just data about your language plugin's configuration, or is it
userdata, or what? 
[dmb>] It's the catalog for Andl. It contains defined functions, types, persistent scalar (non table) data values and
linksto tables.  

if its just a few global settings, you should consider using custom
settings variables, rather than database tables.   for instance, pljava has a setting,
pljava.libjvm_location='/usr/lib/jvm/java-1.8.0/lib/libjvm.so' or whatever which it uses to find the Java native calls
interfacelibrary... 
[dmb>] Andl has something similar, but that problem is already solved.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







On March 8, 2016 11:35:00 AM david@andl.org wrote:
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
>
> this stuff you're loading from the database once, that's just data about
> your language plugin's configuration, or is it user data, or what? [dmb>]
> It's the catalog for Andl. It contains defined functions, types, persistent
> scalar (non table) data values and links to tables.
>
> if its just a few global settings, you should consider using custom
> settings variables, rather than database tables.   for instance, pljava has
> a setting, pljava.libjvm_location='/usr/lib/jvm/java-1.8.0/lib/libjvm.so'
> or whatever which it uses to find the Java native calls interface
> library... [dmb>] Andl has something similar, but that problem is already
> solved.

You're being pretty oblique about what it is you're trying to achieve.

To go back to one of your earlier emails: the hardest problem in computing
isn't cache invalidation. It is clearly explaining what the problem at hand
is.


From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jan de Visser

One thing that's probably key here is that pgsql isn't multi-threaded.
Individual connections are handled by forked backends, which share a shared- memory cache that's not accessible by
SQL-landcode (which includes language handlers). So I think your problem goes away once you realize that all the data
youhave is tied to a single connection anyway. 

[dmb>]
[dmb>] Thanks. Yes, I knew that. My problem is a strategy to start up the language engine once (per session), load
stufffrom the database once, and then keep the per-call cost as low as possible. I know that palloc() memory goes away;
thatit has a lifetime longer than a function invocation and should span a recursive function call, but definitely not
longerthan a transaction. So I need to get some memory per session and keep a pointer to it per session (which it seems
isper process). 

You cannot use multi-threaded code (which touches the database) in language handlers or other "plug-in" code.
[dmb>] That's good -- I hate threads. I've never been able to figure out how to test them adequately. No loss there.

Also, trying to outsmart the db engine's cache by building your own is usually an exercise in futility and often
counter-productive.I speak from experience :-P 
[dmb>] Isn't cache invalidation the second hardest thing in computing (after naming things)?

[dmb>] Seems like DLL static memory with allocation from process memory (or even malloc()) is "the simplest thing that
couldpossibly work". 

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







From: Jan de Visser [mailto:jan@de-visser.net]

You're being pretty oblique about what it is you're trying to achieve.
[dmb>] Sorry you see it that way. I know exactly where I'm trying to get to,
but it would take many pages to explain and I don't want to unduly trouble
other busy people.

To go back to one of your earlier emails: the hardest problem in computing
isn't cache invalidation. It is clearly explaining what the problem at hand
is.
[dmb>] What would you like to know? You can find out about Andl by following
the link in my footer. You can find out about me from various links. I have
a long history of experience in C/C++ in multiple environments. Andl is
written in C#, and I've been doing that since pre 1.0. Much of my life has
been writing compilers and related tools for developers, never for end
users. I don't have problems that look anything like the others I see on
this list.

I know from experience that writing long questions is a good way to get
ignored. So I try to pick out one hard question and ask it as briefly as
possible, in the hope that someone with deep Postgres knowledge will
understand what I need and help me find it. At the moment I have two
pressing problems.

One is the start-up phase: getting the Andl runtime stoked up, load its
catalog, set up its execution environment (including its own type system),
ready for business. That process in Postgres seems to be undocumented, but I
think I have it sorted (barring memory lifetime issues down the track).

The other is type conversions: incoming and outgoing. That is undocumented
too, and that's disappointing. Anyone writing functions or a language
handler will really need this. I'm finding it hard to pick a good path right
now.

The third would be queries, but that doesn't look too hard. SPI is quite
well documented.


Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







On March 8, 2016 03:00:14 PM david@andl.org wrote:
> At the moment I have two pressing problems.
>
> One is the start-up phase: getting the Andl runtime stoked up, load its
> catalog, set up its execution environment (including its own type system),
> ready for business. That process in Postgres seems to be undocumented, but I
> think I have it sorted (barring memory lifetime issues down the track).
>
> The other is type conversions: incoming and outgoing. That is undocumented
> too, and that's disappointing. Anyone writing functions or a language
> handler will really need this. I'm finding it hard to pick a good path right
> now.
>
> The third would be queries, but that doesn't look too hard. SPI is quite
> well documented.
>

OK, still not quite sure what your architecture is. Is it


(1)

  +---------------+
  | Client system |
  +---------------+
          |
          v
  +---------------+
  |      andl     |
  +---------------+
          |
          v (SQL)
  +---------------+
  |     pgsql     |
  +---------------+

Or (2)

  +---------------+
  | Client system |
  +---------------+
          |
          v (SQL)
  +---------------+    +----------+
  |  driver/pgsql | -> |   andl   |
  +---------------+    +----------+


In case (1), you're writing a driver: you abstract out the actual datastore
from your client program using andl. In case (2) you have a procedural
language handler where your client program still consciously connects to a
pgsql database, and within that database some/all data processing is delegated
to andl.

The reason I'm asking is to set terminology. I've browsed some of your
website, and I'm still not clear which of the two options you're after. It
could even be both I think. So let's set some parameters.

Depending on where that goes, you should get pgsql-hackers involved.


I'm not asking you this because


From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jan de Visser

OK, still not quite sure what your architecture is. Is it


(1)

  +---------------+
  | Client system |
  +---------------+
          |
          v
  +---------------+
  |      andl     |
  +---------------+
          |
          v (SQL)
  +---------------+
  |     pgsql     |
  +---------------+

Or (2)

  +---------------+
  | Client system |
  +---------------+
          |
          v (SQL)
  +---------------+    +----------+
  |  driver/pgsql | -> |   andl   |
  +---------------+    +----------+


In case (1), you're writing a driver: you abstract out the actual datastore
from your client program using andl. In case (2) you have a procedural
language handler where your client program still consciously connects to a
pgsql database, and within that database some/all data processing is
delegated to andl.

[dmb>] It's a fair question, and not obvious at various levels. The plan at
this stage is to continue to use as much of the Postgres client
infrastructure as possible, which means sticking with (2). The client will
definitely connect with a Postgres database and could use ODBC/JDBC/libpq or
whatever.

[dmb>] But the logical model is more like (1). Andl compiles into byte code
which is stored as functions in its catalog (which BTW is just another
Postgres table). The only useful SQL query is: "SELECT func(args);", which
executes an Andl function with some arguments and returns a result set. It
is not intended to ever embed an Andl function call into an SQL query as I
think is done with other languages.

[dmb>] But it would be equally possible to implement (1) directly, using the
Thrift interface (or something else that I haven't tripped over yet). I have
an Sqlite implementation that works like that.

The reason I'm asking is to set terminology. I've browsed some of your
website, and I'm still not clear which of the two options you're after. It
could even be both I think. So let's set some parameters.
[dmb>] The website always lags behind. I only started with Postgres about 2
weeks ago, and I really am only now getting a clear enough idea to be able
to write coherently about it.

Depending on where that goes, you should get pgsql-hackers involved.
[dmb>] Love to. It takes time to learn the "culture of the lists" too.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org






On 03/08/2016 03:44 PM, david@andl.org wrote:
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jan de Visser
>
> OK, still not quite sure what your architecture is. Is it
>
>
> (1)
>
>    +---------------+
>    | Client system |
>    +---------------+
>            |
>            v
>    +---------------+
>    |      andl     |
>    +---------------+
>            |
>            v (SQL)
>    +---------------+
>    |     pgsql     |
>    +---------------+
>
> Or (2)
>
>    +---------------+
>    | Client system |
>    +---------------+
>            |
>            v (SQL)
>    +---------------+    +----------+
>    |  driver/pgsql | -> |   andl   |
>    +---------------+    +----------+
>
>
> In case (1), you're writing a driver: you abstract out the actual datastore
> from your client program using andl. In case (2) you have a procedural
> language handler where your client program still consciously connects to a
> pgsql database, and within that database some/all data processing is
> delegated to andl.
>
> [dmb>] It's a fair question, and not obvious at various levels. The plan at
> this stage is to continue to use as much of the Postgres client
> infrastructure as possible, which means sticking with (2). The client will
> definitely connect with a Postgres database and could use ODBC/JDBC/libpq or
> whatever.
>
> [dmb>] But the logical model is more like (1). Andl compiles into byte code
> which is stored as functions in its catalog (which BTW is just another
> Postgres table). The only useful SQL query is: "SELECT func(args);", which
> executes an Andl function with some arguments and returns a result set. It
> is not intended to ever embed an Andl function call into an SQL query as I
> think is done with other languages.
>
> [dmb>] But it would be equally possible to implement (1) directly, using the
> Thrift interface (or something else that I haven't tripped over yet). I have
> an Sqlite implementation that works like that.
>
> The reason I'm asking is to set terminology. I've browsed some of your
> website, and I'm still not clear which of the two options you're after. It
> could even be both I think. So let's set some parameters.
> [dmb>] The website always lags behind. I only started with Postgres about 2
> weeks ago, and I really am only now getting a clear enough idea to be able
> to write coherently about it.
>
> Depending on where that goes, you should get pgsql-hackers involved.
> [dmb>] Love to. It takes time to learn the "culture of the lists" too.

On that subject. I have followed this thread, but only sort of as the
quoting your email client is doing tends to obscure the flow. I know for
me that is making it hard to follow your ideas.

Is that something that can be changed?

>
> Regards
> David M Bennett FACS
>
> Andl - A New Database Language - andl.org
>
>
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


> owner@postgresql.org] On Behalf Of Adrian Klaver

> On that subject. I have followed this thread, but only sort of as the
> quoting your email client is doing tends to obscure the flow. I know
> for me that is making it hard to follow your ideas.
>
> Is that something that can be changed?

I use Outlook, and while it has lots of options, I can only pick one. I've
now switched to what seems to work best for plain text mailing lists, but it
looks terrible on anything else. Very annoying.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







On 03/08/2016 04:39 PM, david@andl.org wrote:
>> owner@postgresql.org] On Behalf Of Adrian Klaver
>
>> On that subject. I have followed this thread, but only sort of as the
>> quoting your email client is doing tends to obscure the flow. I know
>> for me that is making it hard to follow your ideas.
>>
>> Is that something that can be changed?
>
> I use Outlook, and while it has lots of options, I can only pick one. I've
> now switched to what seems to work best for plain text mailing lists, but it
> looks terrible on anything else. Very annoying.

Thanks for trying.

>
> Regards
> David M Bennett FACS
>
> Andl - A New Database Language - andl.org
>


--
Adrian Klaver
adrian.klaver@aklaver.com


david@andl.org wrote:
> > owner@postgresql.org] On Behalf Of Adrian Klaver
>
> > On that subject. I have followed this thread, but only sort of as the
> > quoting your email client is doing tends to obscure the flow. I know
> > for me that is making it hard to follow your ideas.
> >
> > Is that something that can be changed?
>
> I use Outlook, and while it has lots of options, I can only pick one. I've
> now switched to what seems to work best for plain text mailing lists, but it
> looks terrible on anything else. Very annoying.

On that subject.  I noticed that Outlook seems to add the "return path"
addresses (sometimes called bounce address or envelope sender) to the CC
header, which sets a new record in the stupidity scale.  Since we use
VERP, each message gets a different return path address, so with each
reply you make, Outlook adds a new address to the CC.

Maybe this is tweakable.  If so, please turn it off.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services