Thread: Suggestions for the best strategy to emulate returning multiple sets of results

Suggestions for the best strategy to emulate returning multiple sets of results

From: Seref Arikan
Greetings,
I have a binary blob which is passed to a plpython function by a plpgsql function. plpython is used to create 2 different transformations of this binary blob into sets of postgresql type instances.
The flow is:  blob -> plpython -> canonical python based data model ->  (set of db_type_As + set of db_type_Bs)
The problem is, transforming the binary blob to postgresql types is expensive, and a single binary blob is the source of two transformations. I have not found a way of returning two sets of data from the plpython function.
At the moment, I have two options:
1) calling two functions in plpython that use the same blob and return different sets of postgresql types (heavyweight transformation will happen twice: bad)
2) creating two temp tables and calling the plpython function which in turn writes to these temp tables, and then using the temp tables from plpgsql.
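
For illustration only, a minimal sketch of option 2 (the column layouts and the parse_blob/model helpers are made up):

create temp table tmp_type_a (id int, val text) on commit drop;
create temp table tmp_type_b (id int, val text) on commit drop;

CREATE FUNCTION fill_temp_tables(blob bytea) RETURNS void AS $$
model = parse_blob(blob)  # hypothetical: the expensive transformation, run once
plan_a = plpy.prepare("insert into tmp_type_a values ($1, $2)", ["int", "text"])
plan_b = plpy.prepare("insert into tmp_type_b values ($1, $2)", ["int", "text"])
for a in model.type_a_rows():
    plpy.execute(plan_a, [a["id"], a["val"]])
for b in model.type_b_rows():
    plpy.execute(plan_b, [b["id"], b["val"]])
$$ LANGUAGE plpythonu;

The plpgsql caller creates the temp tables, calls fill_temp_tables(blob), and then reads tmp_type_a and tmp_type_b directly.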

Do you think there are any other options that I might be missing? What would be the most efficient way of passing temp tables to a plpython function?

Kind regards
Seref

On Mon, Oct 8, 2012 at 3:14 PM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Greetings,
> I have a binary blob which is passed to a plpython function by a plpgsql
> function. plpython is used to create 2 different transformations of this
> binary blob to sets of postgresql type instances.
> The flow is:  blob -> plpython -> canonical python based data model ->  (set
> of db_type_As + set of db_type_Bs)
> The problem is, transforming the binary blob to postgresql is expensive, and
> a single binary blob is the source of two transformations. I have not found
> a way of returning two sets of data from the plpython function.
> At the moment, I have two options:
> 1) calling two functions in plpython that use the same blob and return
> different sets of postgresql types (heavyweight transformation will happen
> twice: bad)
> 2) creating two temp tables and calling the plpython function which in turn
> writes to these temp tables, and then using the temp tables from plpgsql.
>
> Do you think there are any other options that I might be missing? What would
> be the most efficient way of passing temp tables to plpython function?

Are the two sets the same size?  If so, you probably want to do a
vanilla SRF.  If not, consider a composite containing arrays:

create type foo as(a int[], b int[]);

CREATE FUNCTION get_stuff()
  RETURNS foo
AS $$
return [(1, 2, 3, 4, 5), (1,2,3)];
$$ LANGUAGE plpythonu;

select * from get_stuff();
postgres=# select * from get_stuff();
      a      |    b
-------------+---------
 {1,2,3,4,5} | {1,2,3}
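
On the caller's side the arrays can be unpacked back into row sets with unnest(), for example (and similarly for b):

with s as (select * from get_stuff())
select unnest(a) as a_val from s;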

merlin


Thanks Merlin,
I've tried arrays, but plpython does not support returning arrays of custom db types (which is what I'd need to do).

On Monday, 8 October 2012, Merlin Moncure wrote:
On Mon, Oct 8, 2012 at 3:14 PM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Greetings,
> I have a binary blob which is passed to a plpython function by a plpgsql
> function. plpython is used to create 2 different transformations of this
> binary blob to sets of postgresql type instances.
> The flow is:  blob -> plpython -> canonical python based data model ->  (set
> of db_type_As + set of db_type_Bs)
> The problem is, transforming the binary blob to postgresql is expensive, and
> a single binary blob is the source of two transformations. I have not found
> a way of returning two sets of data from the plpython function.
> At the moment, I have two options:
> 1) calling two functions in plpython that use the same blob and return
> different sets of postgresql types (heavyweight transformation will happen
> twice: bad)
> 2) creating two temp tables and calling the plpython function which in turn
> writes to these temp tables, and then using the temp tables from plpgsql.
>
> Do you think there are any other options that I might be missing? What would
> be the most efficient way of passing temp tables to plpython function?

Are the two sets the same size?  If so, you probably want to do a
vanilla SRF.  If not, consider a composite containing arrays:

create type foo as(a int[], b int[]);

CREATE FUNCTION get_stuff()
  RETURNS foo
AS $$
return [(1, 2, 3, 4, 5), (1,2,3)];
$$ LANGUAGE plpythonu;

select * from get_stuff();
postgres=# select * from get_stuff();
      a      |    b
-------------+---------
 {1,2,3,4,5} | {1,2,3}

merlin
On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Thanks Merlin,
> I've  tried arrays but plpython does not support returning arrays of custom
> db types (which is what I'd need to do)


hm -- yeah.  can your custom types be broken down into plain SQL types
(that is, composite types)?  maybe stash the results in a global
variable and return them in two calls, or insert them into a temp
table that drops on commit?

merlin


Hi Merlin,
Thanks for the response. At the moment, the main function creates two temp tables that drop on commit, and the python function fills these. Not too bad, but I'd like to push these temp tables to ram, which is a bit tricky since there is no direct method of doing this with postgresql (a topic that has been discussed in the past in this mail group).

The global variable idea is interesting though. I have not encountered this before; is it the global dictionary SD/GD mentioned here: http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
It may help perform the expensive transformations once and reuse the results.
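
For illustration, a minimal sketch of that idea (parse_blob and the result layout are made up; note it uses GD rather than SD, since GD is shared across all plpython functions in a session while SD is per function):

CREATE FUNCTION get_type_a(blob bytea, key text) RETURNS SETOF db_type_a AS $$
if key not in GD:
    GD[key] = parse_blob(blob)   # hypothetical: the expensive transformation
return GD[key]["type_a_rows"]    # a list of dicts matching db_type_a's columns
$$ LANGUAGE plpythonu;

CREATE FUNCTION get_type_b(blob bytea, key text) RETURNS SETOF db_type_b AS $$
cached = GD.pop(key, None)       # the second call reuses and releases the cache
if cached is None:
    cached = parse_blob(blob)
return cached["type_b_rows"]
$$ LANGUAGE plpythonu;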

Kind regards
Seref

On Wed, Oct 10, 2012 at 2:06 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Thanks Merlin,
> I've  tried arrays but plpython does not support returning arrays of custom
> db types (which is what I'd need to do)


hm -- yeah.  can your custom types be broken down into plain SQL types
(that is, composite types)?  maybe stash the results in a global
variable and return them in two calls, or insert them into a temp
table that drops on commit?

merlin

On Wed, Oct 10, 2012 at 8:27 AM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Hi Merlin,
> Thanks for the response. At the moment, the main function is creating two
> temp tables that drops on commit, and python functions fills these. Not too
> bad, but I'd like to push these temp tables to ram, which is a bit tricky
> due to not having a direct method of doing this with postgresql. (a topic
> that has been discussed in the past in this mail group)
>
> The global variable idea is interesting though. I have not encountered this
> before, is it the global dictionary SD/GD mentioned here:
> http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
> It may help perform the expensive transformations once and reuse the
> results.

yeah, maybe -- though you might find that the overhead of temp tables
is already pretty good: they are mostly ram based in typical usage as
they aren't synced.  I find the greatest overhead in using them is
actually creation and dropping, so for very low latency transactions
I use an unlogged permanent table with the value returned by
txid_current() as the leading field in the key.

merlin
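
For illustration, a rough sketch of that pattern with a made-up column layout: the table is permanent but unlogged (no WAL), and txid_current() keys each transaction's rows so concurrent callers can share it.

create unlogged table scratch_type_a (
    txid bigint,               -- value of txid_current() for the writing transaction
    id   int,
    val  text,
    primary key (txid, id)     -- txid as the leading field in the key
);

-- writer: tag rows with the current transaction id
insert into scratch_type_a
    select txid_current(), s.id, s.val from some_source s;  -- some_source is hypothetical

-- reader, in the same transaction: fetch only this transaction's rows
select id, val from scratch_type_a where txid = txid_current();

-- cleanup before commit
delete from scratch_type_a where txid = txid_current();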


Comments inline (sorry, did not cc the group in the other mail)

On Wed, Oct 10, 2012 at 2:55 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Oct 10, 2012 at 8:27 AM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Hi Merlin,
> Thanks for the response. At the moment, the main function is creating two
> temp tables that drops on commit, and python functions fills these. Not too
> bad, but I'd like to push these temp tables to ram, which is a bit tricky
> due to not having a direct method of doing this with postgresql. (a topic
> that has been discussed in the past in this mail group)
>
> The global variable idea is interesting though. I have not encountered this
> before, is it the global dictionary SD/GD mentioned here:
> http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
> It may help perform the expensive transformations once and reuse the
> results.

yeah.  maybe though you might find that the overhead of temp tables is
already pretty good -- they are mostly ram based in typical usage as
they aren't synced.  I find actually the greatest overhead in terms of
using them is creation and dropping -- so for very low latency
transactions I use a unlogged permanent table with value returned by
txid_current() as the leading field in the key.
This is very interesting. The reason I've tried to avoid a shared temp table is that I'd have to have a session id for calls, which led to severe performance issues with the entity-attribute-value approach I'm using in the temp table.
Your approach sounds as if it was designed to overcome my problem, but I have no idea what an unlogged table does, or how you are using txid_current. Could you explain a bit?

Regards
Seref
 

merlin

create a ramdrive
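
For illustration, one way to do that with made-up paths and sizes: mount a tmpfs and put a tablespace for temporary objects on it (note the directory disappears on reboot, so only throwaway objects should live there).

# as root, on the database host
mkdir -p /mnt/pg_ram
mount -t tmpfs -o size=1G tmpfs /mnt/pg_ram
chown postgres:postgres /mnt/pg_ram

-- then, in postgres
CREATE TABLESPACE ram_space LOCATION '/mnt/pg_ram';
SET temp_tablespaces = 'ram_space';  -- temp tables now land on the ramdrive
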
On Wed, 2012-10-10 at 14:27 +0100, Seref Arikan wrote:
> Hi Merlin,
> Thanks for the response. At the moment, the main function is creating
> two temp tables that drops on commit, and python functions fills
> these. Not too bad, but I'd like to push these temp tables to ram,
> which is a bit tricky due to not having a direct method of doing this
> with postgresql. (a topic that has been discussed in the past in this
> mail group)
>
> The global variable idea is interesting though. I have not encountered
> this before, is it the global dictionary SD/GD mentioned here:
> http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
> It may help perform the expensive transformations once and reuse the
> results.
>
> Kind regards
> Seref
>
> On Wed, Oct 10, 2012 at 2:06 PM, Merlin Moncure <mmoncure@gmail.com>
> wrote:
>         On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan
>         <serefarikan@kurumsalteknoloji.com> wrote:
>         > Thanks Merlin,
>         > I've  tried arrays but plpython does not support returning
>         arrays of custom
>         > db types (which is what I'd need to do)
>
>
>
>         hm -- yeah.  can your custom types be broken down into plain
>         SQL types
>         (that is, composite types?).  maybe stash the results in
>         global
>         variable and return it in two calls, or insert into into a
>          tempt
>         table that drops on commit?
>
>         merlin
>




Thanks Bret,
I'm concerned about what happens when my functions, under high load, fill the ramdrive with the temporary tables I'm using. Telling postgres to use ram with an option to fall back to disk would be significantly better in terms of uptime.
However, I was thinking about some mechanism in the middle tier that watches the space in the ram drive and redirects queries to functions that create temp tables on disk if the ram drive is close to full. That may help me accomplish what I'm trying to do.

Regards
Seref


On Wed, Oct 10, 2012 at 3:58 PM, Bret Stern <bret_stern@machinemanagement.com> wrote:
create a ramdrive
On Wed, 2012-10-10 at 14:27 +0100, Seref Arikan wrote:
> Hi Merlin,
> Thanks for the response. At the moment, the main function is creating
> two temp tables that drops on commit, and python functions fills
> these. Not too bad, but I'd like to push these temp tables to ram,
> which is a bit tricky due to not having a direct method of doing this
> with postgresql. (a topic that has been discussed in the past in this
> mail group)
>
> The global variable idea is interesting though. I have not encountered
> this before, is it the global dictionary SD/GD mentioned here:
> http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
> It may help perform the expensive transformations once and reuse the
> results.
>
> Kind regards
> Seref
>
> On Wed, Oct 10, 2012 at 2:06 PM, Merlin Moncure <mmoncure@gmail.com>
> wrote:
>         On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan
>         <serefarikan@kurumsalteknoloji.com> wrote:
>         > Thanks Merlin,
>         > I've  tried arrays but plpython does not support returning
>         arrays of custom
>         > db types (which is what I'd need to do)
>
>
>
>         hm -- yeah.  can your custom types be broken down into plain
>         SQL types
>         (that is, composite types?).  maybe stash the results in
>         global
>         variable and return it in two calls, or insert into into a
>          tempt
>         table that drops on commit?
>
>         merlin
>



On 2012-10-10, Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote:
> Thanks Bret,
> I'm concerned about what happens when my functions under high load fills
> the ramdrive with temporary tables I'm using. The advantage of telling
> postgres to use ram with an option to fall back to disk is significantly
> better in terms of uptime.
> However, I was thinking about some mechanism in the middle tier that
> watches the space in the ram drive and redirects queries to functions that
> create temp tables on disk, if ram drive is close to full. That may help me
> accomplish what I'm trying to

That's what operating systems are for. A ramdisk is only ever a hint:
if ram is short it will wind up in swap, and if ram is plentiful a disk
table will be fully buffered in ram.

--
⚂⚃ 100% natural

Jasen,
Thanks for this. My last use of a ramdisk was ages ago, and I've always had the idea that it was just a disk in ram with no capability to spill over to disk.
It appears the mind refuses to acknowledge that this was only the situation many years ago :)

Some google searches returned others asking the same question; surely someone must have properly established this under *nix. I'll keep searching, and post my solution for feedback.

Kind regards
Seref

On Sun, Oct 14, 2012 at 5:54 AM, Jasen Betts <jasen@xnet.co.nz> wrote:
On 2012-10-10, Seref Arikan <serefarikan@kurumsalteknoloji.com> wrote:
> Thanks Bret,
> I'm concerned about what happens when my functions under high load fills
> the ramdrive with temporary tables I'm using. The advantage of telling
> postgres to use ram with an option to fall back to disk is significantly
> better in terms of uptime.
> However, I was thinking about some mechanism in the middle tier that
> watches the space in the ram drive and redirects queries to functions that
> create temp tables on disk, if ram drive is close to full. That may help me
> accomplish what I'm trying to

That's what operating systems are for,  ramdisk is only ever a hint,
if ram is short it will wind up in swap, if ram is plentiful a disk
table will be fully buffered in ram.

--
⚂⚃ 100% natural


