Thread: looping over a small record set over and over in a function

looping over a small record set over and over in a function

From
Ivan Sergio Borgonovo
Date:
I've a record set on which I have to loop several times.
The function should return the same record set + one more computed
field.
Something that could look like:

foreach(row) {
  // compute stuff
  if(...) {
  }
  // place stuff in field[N+1] of the row
}
if(some condition) {
  //
}
foreach(row) {
  // compute stuff
  if(...) {
  }
  // place stuff in a field[N+1] of the row
}
if(some condition) {
  //
}
...

and finally return the rows + the computed field,

in PL/pgSQL,

where each loop depends on the result of the previous.
The dataset is very small.


If I could easily load all the dataset into an array, loop through
it and then just update the computed field it would be nice... but
how?
Is it really worth loading the whole record set into an array,
looping over it etc... instead of e.g. building a temp table with the
same structure as the input record set + 1 field, looping over the
table etc...? What about all the UPDATEs involved to change field N+1
of the temp table? Will they be expensive?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: looping over a small record set over and over in a function

From
Alban Hertroys
Date:
On Jun 19, 2009, at 8:23 PM, Ivan Sergio Borgonovo wrote:
> I've a record set on which I have to loop several times.
> The function should return the same record set + one more computed
> field.
> Something that could look like:
>
> foreach(row) {
>  // compute stuff
>  if(...) {
>  }
>  // place stuff in field[N+1] of the row
> }
> if(some condition) {
>  //
> }
> foreach(row) {
>  // compute stuff
>  if(...) {
>  }
>  // place stuff in a field[N+1] of the row
> }
> if(some condition) {
>  //
> }
> ...
>
> and finally return the rows + the computed field,
>
> in PL/pgSQL,
>
> where each loop depends on the result of the previous.
> The dataset is very small.
>
>
> If I could easily load all the dataset into an array, loop through
> it and then just update the computed field it would be nice... but
> how?

You could add a column to your query as a placeholder for the computed
value.
For example, SELECT *, 0 AS computed_value FROM table.

If you use a scrollable cursor (possible in PL/pgSQL these days,
although it still has some limitations) you could just loop through
its results and rewind the cursor until you're done.
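
Untested and with made-up names (mytable, computed_value), but
roughly:

DECLARE
  curs SCROLL CURSOR FOR
    SELECT *, 0::numeric AS computed_value FROM mytable;
  rec record;
BEGIN
  OPEN curs;
  LOOP                            -- first pass
    FETCH curs INTO rec;
    EXIT WHEN NOT FOUND;
    -- compute stuff from rec here
  END LOOP;
  MOVE ABSOLUTE 0 FROM curs;      -- rewind
  -- ... fetch through it again for the next pass ...
  CLOSE curs;
END;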

It does look like you're implementing some kind of aggregate function
though. Maybe you can implement it as an actual aggregate function,
maybe in combination with the windowing stuff in 8.4? Then you could
just add the aggregate to your query instead of using your query
results in a function. That's about the inverse of what you're
attempting now.
You seem to be implementing something that would look like SELECT
sum(SELECT * FROM table), while you may be better off aiming for
SELECT SUM(value) FROM table.
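
Purely to illustrate the shape (mystate here is only a stand-in for
your real per-row logic, not your actual computation):

CREATE FUNCTION mystate(numeric, numeric) RETURNS numeric AS $$
  SELECT $1 + $2;                 -- stand-in transition function
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE myaggregate(numeric) (
  sfunc    = mystate,
  stype    = numeric,
  initcond = '0'
);

-- then simply: SELECT myaggregate(value) FROM table;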

Considering you want to loop over the results multiple times that may
not be possible; only you can tell.

> Is it really worth loading the whole record set into an array,
> looping over it etc... instead of e.g. building a temp table with
> the same structure as the input record set + 1 field, looping over
> the table etc...? What about all the UPDATEs involved to change
> field N+1 of the temp table? Will they be expensive?

Neither of those look like a convenient approach. With the array you
lose all the meta-data the record set provided (types, column names,
etc), while the temp table approach looks like it will be kind of
inefficient.

Is the result of that function volatile or can you maybe store it
after computing it once and fire a trigger to update the computed
value if the underlying data changes?

I wonder why you need to re-iterate over your result set multiple
times? It's kind of rare that once isn't enough.

And lastly, if your current approach really is the only way to compute
what you're after, then maybe PL/pgSQL isn't the right match for the
problem; it looks like you'd be better served by a language that can
work with arrays of typed structures. As I'm not familiar with the
other PL languages I can't tell whether they would be suitable in that
respect, but I suspect Python or Java would be able to handle this
better.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: looping over a small record set over and over in a function

From
Ivan Sergio Borgonovo
Date:
On Sat, 20 Jun 2009 12:30:42 +0200
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:

> You could add a column to your query as a placeholder for the
> computed value.
> For example, SELECT *, 0 AS computed_value FROM table.

> If you use a scrollable cursor (possible in PL/pgSQL these days,
> although it still has some limitations) you could just loop
> through its results and rewind the cursor until you're done.

> It does look like you're implementing some kind of aggregate
> function though. Maybe you can implement it as an actual aggregate
> function, maybe in combination with the windowing stuff in 8.4?
> Then you could just add the aggregate to your query instead of
> using your query results in a function. That's about the inverse
> of what you're attempting now.
> You seem to be implementing something that would look like SELECT
> sum(SELECT * FROM table), while you may be better off aiming for
> SELECT SUM(value) FROM table.
>
> Considering you want to loop over the results multiple times that
> may not be possible; only you can tell.

> > Is it really worth loading the whole record set into an array,
> > looping over it etc... instead of e.g. building a temp table with
> > the same structure as the input record set + 1 field, looping over
> > the table etc...? What about all the UPDATEs involved to change
> > field N+1 of the temp table? Will they be expensive?
>
> Neither of those look like a convenient approach. With the array
> you lose all the meta-data the record set provided (types, column
> names, etc), while the temp table approach looks like it will be
> kind of inefficient.

> Is the result of that function volatile or can you maybe store it
> after computing it once and fire a trigger to update the computed
> value if the underlying data changes?

The function is volatile.
What I was thinking about was firing a trigger to wipe the temp
table if the table on which the computation is made changes.
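
Just as a sketch of the invalidation idea (names invented; note it
would have to be a regular cache table rather than a temp one, since
a trigger fired from another session can't see this session's temp
table):

CREATE FUNCTION wipe_cache() RETURNS trigger AS $$
BEGIN
  TRUNCATE computed_cache;        -- hypothetical cache table
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER wipe_cache_trg
  AFTER INSERT OR UPDATE OR DELETE ON base_table
  FOR EACH STATEMENT EXECUTE PROCEDURE wipe_cache();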

Considering I can't make the loops run in parallel, aggregates may
be a way to go... but somehow they look as if they'd make the
implementation a bit hard to manage. Anyway, it still has to be seen
whether I could actually implement the loops with aggregates, since
every loop has 2 "side effects": compute some aggregates for the
whole record set and compute an extra field for each row.

> I wonder why you need to re-iterate over your result set multiple
> times? It's kind of rare that once isn't enough.

> And lastly, if your current approach really is the only way to
> compute what you're after, then maybe PL/pgSQL isn't the right
> match for the problem; it looks like you'd be better served by a

Yeah. I gave a look at Python but I don't want to add one more
language to the mix.
I enjoy the strict type checking of plpgsql even if a bit of
syntactic sugar would help to make it more pleasing, and I think it
is the most lightweight among those on offer.
Still, I don't know how easy it is with e.g. Python to load an array
with a result set, change it and place it back into the table it
came from.

> language that can work with arrays of typed structures. As I'm not
> familiar with the other PL languages I can't tell whether they
> would be suitable in that respect, but I suspect Python or Java
> would be able to handle this better.


Your suggestion about cursors could be the way... but I don't know
enough about cursor internals to understand whether updating a field
through a cursor will cause disk writes.

Currently my main concern is making these things readable and
extensible. I'm not planning to optimise yet. The result set on which
the computations are made is pretty small. It is just taken out of
a large one. But I don't want to cut off my path to optimisation.

So one way could be:

select the record set into a temp table.

Build up a trigger that will wipe the temp table if the tables on
which the record set is built change. This may not even be
necessary, since everything related to that record set is going to
happen in one session.

loop several times over the temp table

Since every loop should actually correspond to a function... I
wonder if I could build up the cursor and pass it along to several
functions.
Every function will MOVE FIRST, update each row of the record
set and return some computed fields that will be used by the next
function etc...

If this is possible, this approach has the advantage of being
extensible.
I wonder if it is efficient, since I don't know whether an
UPDATE table SET ... WHERE CURRENT OF cursor
on a temp table is going to incur any disk writes.
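
Something like this, just as an untested sketch (names invented,
assuming a temp table t(a, b, c, result) and a hypothetical myfunc):

CREATE FUNCTION one_pass(curs refcursor) RETURNS numeric AS $$
DECLARE
  rec record;
  agg numeric := 0;
BEGIN
  LOOP
    FETCH curs INTO rec;
    EXIT WHEN NOT FOUND;
    UPDATE t SET result = myfunc(rec.a, rec.b, rec.c)
      WHERE CURRENT OF curs;
    agg := agg + rec.a;           -- whatever the pass has to hand on
  END LOOP;
  RETURN agg;                     -- used by the next pass
END;
$$ LANGUAGE plpgsql;

-- caller: OPEN curs FOR SELECT * FROM t; t1 := one_pass(curs); ...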

thanks



--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: looping over a small record set over and over in a function

From
John DeSoi
Date:
On Jun 20, 2009, at 8:35 AM, Ivan Sergio Borgonovo wrote:

>> And lastly, if your current approach really is the only way to
>> compute what you're after, then maybe PL/pgSQL isn't the right
>> match for the problem; it looks like you'd be better served by a
>
> Yeah. I gave a look at Python but I don't want to add one more
> language to the mix.
> I enjoy the strict type checking of plpgsql even if a bit of
> syntactic sugar would help to make it more pleasing, and I think it
> is the most lightweight among those on offer.
> Still, I don't know how easy it is with e.g. Python to load an array
> with a result set, change it and place it back into the table it
> came from.
>
>> language that can work with arrays of typed structures. As I'm not
>> familiar with the other PL languages I can't tell whether they
>> would be suitable in that respect, but I suspect Python or Java
>> would be able to handle this better.
>
>
> Your suggestion about cursors could be the way... but I don't know
> enough about cursor internals to understand whether updating a field
> through a cursor will cause disk writes.

I have not tried this, but the documentation says arrays can be
created for "any built-in or user-defined base type, enum type, or
composite type." So maybe you could define a composite type and stuff
those into a single array?
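
Untested, and the names are invented, but roughly:

CREATE TYPE item_calc AS (id int, price numeric, computed numeric);

-- in a PL/pgSQL block:
DECLARE
  items item_calc[];
  it    item_calc;
BEGIN
  items := ARRAY(SELECT ROW(t.id, t.price, 0)::item_calc FROM mytable t);
  FOR i IN 1 .. coalesce(array_upper(items, 1), 0) LOOP
    it := items[i];
    it.computed := it.price * 2;  -- stand-in computation
    items[i] := it;
  END LOOP;
END;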




John DeSoi, Ph.D.





Re: looping over a small record set over and over in a function

From
Craig Ringer
Date:
On Fri, 2009-06-19 at 20:23 +0200, Ivan Sergio Borgonovo wrote:

> If I could easily load all the dataset into an array, loop through
> it and then just update the computed field it would be nice... but
> how?

Are you sure you _really_ need a PL/PgSQL function for this?

Can't you:

SELECT x.*, computefield(x, p)
FROM smalltable x, otherparams p;

or something along those lines?

--
Craig Ringer


On Sun, 21 Jun 2009 10:57:51 +0800
Craig Ringer <craig@postnewspapers.com.au> wrote:

> On Fri, 2009-06-19 at 20:23 +0200, Ivan Sergio Borgonovo wrote:
>
> > If I could easily load all the dataset into an array, loop
> > through it and then just update the computed field it would be
> > nice... but how?
>
> Are you sure you _really_ need a PL/PgSQL function for this?

Not really sure. I'm investigating.

But well, I noticed that even working with a fully fledged
procedural language I'll have to loop twice to replace/add the
computed field to the "original" one, so it's surely not a big loss
if I "loop" twice with SQL.

I think everything could be summed up as:

select into t myaggregate1(field) from dataset where condition1;
if(t>10) then
  update dataset set field=myfunc1(a,b,c) where condition1;
end if;

select into t myaggregate2(field) from dataset where condition2;
if(t>44) then
  update dataset set field=myfunc2(a,b,c) where condition2;
end if;

Actually there is a trick I could use to "skip" the update loop and
pass the result to the next loop, but it only looks convenient in the
more general case where the aggregate has to be computed on the
"updated" value. I'm still not sure whether the class of functions
I'm working with always falls into the above class, where
myaggregate works on the "original" field.

Of course myaggregate could compute myfunc... but then I'd have
to compute myfunc twice.

In a loop I could

create temp table t on commit drop as
  select *, field as field1, field as field2, field as result
  from t1
  join t2 on ...
  join t3 on ...
  where ...;

flipflop='field1';
flopflip='field2';

foreach(row) {
  if(condition on row) {
    row[flopflip]=myfunc(row[flipflop]);
  } else {
    row[flopflip]=row[flipflop];
  }
  agg+=row[flopflip];
}
if(condition on agg) {
  swap flipflop and flopflip
}

foreach(row) {
  if(condition on row) {
    row[flopflip]=myfunc(row[flipflop]);
  } else {
    row[flopflip]=row[flipflop];
  }
  agg+=row[flopflip];
}
if(condition on agg) {
  swap flipflop and flopflip
}

execute 'update t set result=' || flipflop;

I think this could be obtained using cursors.
This is going to be much more verbose since, to use the "flipflop"
technique, I'll have to use dynamic statements and EXECUTE.

EXECUTE
  'UPDATE t SET ' || flipflop || ' = ' || newfield ||
    ' WHERE CURRENT OF cursor';

Even "condition on row" has to be expressed as a dynamic statement
if the condition involves the computed field.

Compared to the select + update solution it is going to avoid
looping and checking the condition twice, but it is going to work on
a larger dataset and run the update even for unchanged rows (maybe
the else branch could be omitted??).
I'm not sure the actual set of functions has the update condition
equal to the select condition anyway.
But I think wasted updates have a larger cost if they require disk
IO.

I think stuff would look more "convenient" if there were a simple
and cheap way to load a record set into an array, do stuff with the
procedural language and substitute the old record set with the
computed one.

Currently I'll get most of the speed-up from looping through a
smaller data set, avoiding a join over a very large table.
Considering the very small data set I'm expecting (no more than 20
records) I don't think dynamic statements, lack of indexes etc... are
going to have an impact.

As the number of clients increases, optimizing the loops may become
necessary.
I'll have to see if all the cases I'm dealing with can be solved
by the SELECT + conditional UPDATE technique, since it looks much
easier to maintain.
Furthermore I suspect that in my real case the conditional check
will be made on the original value, so that myfunc will be
computed at most once and I can delay the aggregate computation on
the computed field until after the last loop has been executed.
The remaining costs will be:
- looping twice over the same table (getting the aggregate + updating
  the table)
- the updates, if they cause disk IO

I even suspect that the test will mostly fail, so that updates will
be rare, but I still would like to understand how this could work in
the most general case and how the temp table + cursor + flipflop
technique is going to work internally.


I think I really don't have a clear picture of how temp tables
really work.
They can be seen by concurrent transactions in the same session.
But if the transaction in which a temp table is created is not
committed yet, other transactions won't see it.
If the transaction where the temp table is created is aborted... no
other transaction will ever know about the existence of the temp
table.

If the temp table is defined as ON COMMIT DROP... somehow there is
no need to make the changes happening on the temp table land on
disk. So it may actually look more as a temporary, private storage
that doesn't have to be aware of concurrency.

So temp tables should require less disk IO. Is it?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


On Sun, 2009-06-21 at 14:57 +0200, Ivan Sergio Borgonovo wrote:

> I think everything could be summed up as:
>
> select into t myaggregate1(field) from dataset where condition1;
> if(t>10) then
>   update dataset set field=myfunc1(a,b,c) where condition1;
> end if;
>
> select into t myaggregate2(field) from dataset where condition2;
> if(t>44) then
>   update dataset set field=myfunc2(a,b,c) where condition2;
> end if;

That's really too simplified to see what you're actually doing.

I've found that in the _vast_ majority of non-trigger cases where I've
considered using PL/PgSQL, a bit more thought and proper consideration
of the use of generate_series, subqueries in FROM, join types, etc has
allowed me to find a way to do it in SQL. It's almost always faster,
cleaner, and nicer if I do find a way to express it in SQL, too.

> I think I really don't have a clear picture of how temp tables
> really work.
> They can be seen by concurrent transactions in the same session.

Eh? In this context, what do you mean by "session"? Did you mean
consecutive rather than concurrent, ie:

BEGIN;
CREATE TEMPORARY TABLE x (...);
-- do other work
COMMIT;
BEGIN;
-- Can see `x' from here
COMMIT;

?

Normally, you couldn't see another session's temp tables, even after
commit, unless you explicitly schema-qualified their names - eg
'pg_temp_4.x'. If even then; I haven't actually checked.

Two concurrent sessions that issue 'CREATE TEMPORARY TABLE x(...)' get
two DIFFERENT tables, both named `x', in different pg_temp schema, eg
'pg_temp_2.x' and 'pg_temp_3.x' .
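
You can check which pg_temp schema your own session's temp table
ended up in with something like:

CREATE TEMP TABLE x (i int);
SELECT n.nspname
  FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relname = 'x' AND pg_table_is_visible(c.oid);
-- returns e.g. pg_temp_2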

> But if the transaction in which a temp table is created is not
> committed yet, other transactions won't see it.

Of course, since PostgreSQL doesn't support the READ UNCOMMITTED
isolation level.

> So it may actually look more as a temporary, private storage
> that doesn't have to be aware of concurrency.
>
> So temp tables should require less disk IO. Is it?

The big thing is that they're private to a session, so different
sessions can concurrently be doing things with temp tables by the same
name without treading on each others' feet.

Because they're limited to the lifetime of the session, though,
PostgreSQL doesn't need to care about ensuring that they're consistent
in the case of a backend crash, unexpected server reset, etc. Tom Lane
recently pointed out that as a result writes don't need to go through
the WAL, so my understanding is that you're avoiding the doubled-up disk
I/O from that. They also don't need to be written with O_SYNC or
fsync()ed since we don't care if writes make it to the table in order.

As a result I'm pretty sure temp tables don't ever have to hit the disk.
If the OS has enough write-cache space it can just store them in RAM
from creation to deletion.

--
Craig Ringer


On Sun, 21 Jun 2009 21:43:16 +0800
Craig Ringer <craig@postnewspapers.com.au> wrote:

> On Sun, 2009-06-21 at 14:57 +0200, Ivan Sergio Borgonovo wrote:
>
> > I think everything could be summed up as:
> >
> > select into t myaggregate1(field) from dataset where condition1;
> > if(t>10) then
> >   update dataset set field=myfunc1(a,b,c) where condition1;
> > end if;
> >
> > select into t myaggregate2(field) from dataset where condition2;
> > if(t>44) then
> >   update dataset set field=myfunc2(a,b,c) where condition2;
> > end if;
>
> That's really too simplified to see what you're actually doing.

I'm still checking whether the real cases go far beyond the above
or whether I have some more general case.

> I've found that in the _vast_ majority of non-trigger cases where
> I've considered using PL/PgSQL, a bit more thought and proper
> consideration of the use of generate_series, subqueries in FROM,
> join types, etc has allowed me to find a way to do it in SQL. It's
> almost always faster, cleaner, and nicer if I do find a way to
> express it in SQL, too.

The stuff is really serial in its nature and can't be made parallel.

> > I think I really don't have a clear picture of how temp tables
> > really work.
> > They can be seen by concurrent transactions in the same session.

> Eh? In this context, what do you mean by "session"? Did you mean
> consecutive rather than concurrent, ie:

http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
"If specified, the table is created as a temporary table. Temporary
tables are automatically dropped at the end of a session"

I'd interpret it as a connection.
I don't even know if it is possible to send more than one command
over a single connection and wait for the results asynchronously.
Any clarification?

If a session is a connection, and it's not possible to send more
than one command and wait for the results asynchronously... or the
server doesn't run several commands sent across the same connection
in parallel... then a temp table looks like what I need.

> BEGIN;
> CREATE TEMPORARY TABLE x (...);
> -- do other work
> COMMIT;
> BEGIN;
> -- Can see `x' from here
> COMMIT;
>
> ?

I was planning to use ON COMMIT DROP
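
e.g. (with CREATE TABLE ... AS, the ON COMMIT clause goes before AS;
base_table is just a placeholder):

BEGIN;
CREATE TEMP TABLE t ON COMMIT DROP AS
  SELECT * FROM base_table;
-- use t here
COMMIT;                           -- t disappears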

> Normally, you couldn't see another session's temp tables, even after
> commit, unless you explicitly schema-qualified their names - eg
> 'pg_temp_4.x'. If even then; I haven't actually checked.

> Two concurrent sessions that issue 'CREATE TEMPORARY TABLE x(...)'
> get two DIFFERENT tables, both named `x', in different pg_temp
> schema, eg 'pg_temp_2.x' and 'pg_temp_3.x' .

That's perfect, and what I really need.

> > But if the transaction in which a temp table is created is not
> > committed yet, other transactions won't see it.

> Of course, since PostgreSQL doesn't support the READ UNCOMMITTED
> isolation level.

> > So it may actually look more as a temporary, private storage
> > that doesn't have to be aware of concurrency.

> > So temp tables should require less disk IO. Is it?

> The big thing is that they're private to a session, so different
> sessions can concurrently be doing things with temp tables by the
> same name without treading on each others' feet.

> Because they're limited to the lifetime of the session, though,
> PostgreSQL doesn't need to care about ensuring that they're
> consistent in the case of a backend crash, unexpected server
> reset, etc. Tom Lane recently pointed out that as a result writes
> don't need to go through the WAL, so my understanding is that
> you're avoiding the doubled-up disk I/O from that. They also don't
> need to be written with O_SYNC or fsync()ed since we don't care if
> writes make it to the table in order.

> As a result I'm pretty sure temp tables don't ever have to hit the
> disk. If the OS has enough write-cache space it can just store
> them in RAM from creation to deletion.

So postgresql actually issues writes to disk and delegate to the OS
management of the cache/actual write on disk.
I thought it could just try to hold them in RAM and still delegate
to the OS to save them on disk in swap if the system is short on RAM.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


On Sun, 2009-06-21 at 22:11 +0200, Ivan Sergio Borgonovo wrote:

> > > I think I really don't have a clear picture of how temp tables
> > > really work.
> > > They can be seen by concurrent transactions in the same session.
>
> > Eh? In this context, what do you mean by "session"? Did you mean
> > consecutive rather than concurrent, ie:
>
> http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
> "If specified, the table is created as a temporary table. Temporary
> tables are automatically dropped at the end of a session"
>
> I'd interpret it as a connection.

Correctly.

> I don't even know if it is possible to send more than one command
> over a single connection and wait for the results asynchronously.
> Any clarification?

To an extent cursors provide that ability. The "result" is returned
quite promptly, but it's a placeholder that allows you to retrieve the
real results progressively as you need them. Whether the database
generates the results immediately and stores them to return later, or
whether it generates them on demand, isn't something you can easily
tell; it's up to the database.
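
At the plain SQL level that looks roughly like this (table name made
up):

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM some_table;
FETCH 10 FROM c;                  -- first batch
FETCH 10 FROM c;                  -- next batch, whenever the client asks
CLOSE c;
COMMIT;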

I don't know if PostgreSQL will pause the execution of set-returning
PL/PgSQL functions connected to cursors until the client requests more
data. Anyone?

By the way, a cursor is only valid within the context of the transaction
in which it's created.

If you don't explicitly ask to use a cursor, then no, commands cannot
run concurrently within the same session.

> > As a result I'm pretty sure temp tables don't ever have to hit the
> > disk. If the OS has enough write-cache space it can just store
> > them in RAM from creation to deletion.
>
> So postgresql actually issues writes to disk and delegate to the OS
> management of the cache/actual write on disk.

Yes.

> I thought it could just try to hold them in RAM and still delegate
> to the OS to save them on disk in swap if the system is short on RAM.

For a variety of reasons, you REALLY don't want it to work that way.

OS memory managers tend to be _much_ better and faster at managing pages
that're backed by a file. They'll write dirty data out pre-emptively so
that execution doesn't stall when memory runs low; they write data to
the file in order for best disk performance; they efficiently buffer and
read-ahead when pulling the data back in, etc.

The OS knows much less about what anonymous memory (memory not backed by
a file) "means" to a program and can't be as clever with it. Swapping
tends to be _much_ more CPU expensive than writing dirty buffers to a
file. It's a lot more expensive to retrieve from disk, too, and usually
involves lots of seeks for quite scattered pages instead of nice block
readahead.

The OS knows much better than PostgreSQL does when the table will fit in
RAM and when it needs to spill to disk, and it's much better at managing
that than Pg can ever be. It's great that Pg just uses the OS's hardware
knowledge, system-wide awareness, and highly optimised memory manager +
disk IO management to take care of the problem.

--
Craig Ringer


Re: temp tables, sessions, pgpool and disk

From
Ivan Sergio Borgonovo
Date:
On Mon, 22 Jun 2009 07:26:56 +0800
Craig Ringer <craig@postnewspapers.com.au> wrote:

> > http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
> > "If specified, the table is created as a temporary table.
> > Temporary tables are automatically dropped at the end of a
> > session"
> >
> > I'd interpret it as a connection.
>
> Correctly.
>
> > I don't even know if it is possible to send more than one command
> > over a single connection and wait for the results asynchronously.
> > Any clarification?
>
> To an extent cursors provide that ability. The "result" is returned
> quite promptly, but it's a placeholder that allows you to retrieve
> the real results progressively as you need them. Whether the
> database generates the results immediately and stores them to
> return later, or whether it generates them on demand, isn't
> something you can easily tell; it's up to the database.

http://www.postgresql.org/docs/8.3/interactive/libpq-async.html
"PQsendQuery cannot be called again (on the same connection) until
PQgetResult has returned a null pointer, indicating that the command
is done."

Asynchronous calls can't be made to parallelize postgres queries on
the same "session", but just to parallelize client and server work.

So a temp table seems as private as I need it.

I wonder what will happen if I put something like pgpool between
postgresql and a web app.

> > So postgresql actually issues writes to disk and delegate to the
> > OS management of the cache/actual write on disk.

> Yes.

> > I thought it could just try to hold them in RAM and still
> > delegate to the OS to save them on disk in swap if the system is
> > short on RAM.

> For a variety of reasons, you REALLY don't want it to work that
> way.

mmm... first sorry for the noise...
Interpret the following as reality checks.
I'm perfectly aware building up a DB is not easy, and I'm not
pretending I know how to write one. ;)

> OS memory managers tend to be _much_ better and faster at managing
> pages that're backed by a file. They'll write dirty data out
> pre-emptively so that execution doesn't stall when memory runs
> low; they write data to the file in order for best disk
> performance; they efficiently buffer and read-ahead when pulling
> the data back in, etc.

> The OS knows much less about what anonymous memory (memory not
> backed by a file) "means" to a program and can't be as clever with
> it. Swapping tends to be _much_ more CPU expensive than writing

But by issuing a write to disk PostgreSQL doesn't actually say
anything more about what it is placing on the disk and how it is
going to access it... and it is actually adding overhead to move it
back and forth, no matter if this overhead happens in RAM or on disk.
Actually, since temp tables are private to the connection they should
(?) be private to a postgresql process, so the OS should be able to
do a good job.
I don't see any atomicity constraint, so... if something fails while
writing to RAM, as you said, you shouldn't need a WAL.

> dirty buffers to a file. It's a lot more expensive to retrieve
> from disk, too, and usually involves lots of seeks for quite
> scattered pages instead of nice block readahead.

Once you're running out of memory I see no guarantee your file will
end up in a fast, easily accessible area of your disk... and you're
going to add the overhead associated with a file system
(journalling, permissions/ownership, locks). Swap is volatile... and
it should offer the guarantees you "need" for a temp table.

> The OS knows much better than PostgreSQL does when the table will
> fit in RAM and when it needs to spill to disk, and it's much
> better at managing that than Pg can ever be. It's great that Pg
> just uses the OS's hardware knowledge, system-wide awareness, and
> highly optimised memory manager + disk IO management to take care
> of the problem.

The same should be true for virtual memory, not just file
management, and postgresql has a configuration file that should give
a clue to the DB about the expected workload and hardware.
Surely postgresql can't forecast how many and how large the temp
tables for a single connection will be... but substantially I got
the idea that a connection is somehow serial in its execution and
that storage could be garbage collected or just released early
(drop table, on commit drop).
This looks as if it is taking temp tables very far from the standard.
And yeah... once you want to do memory management/resource management
inside SQL you've opened the doors of Hell.
But well

From what I could see of SQL99 the definition of temp tables is very
terse... and a bit confusing (at least for me) about global and
local.
I took a quick look at what's available in MS SQL... and they have
a sort of "in memory temp table" but you can't modify its schema.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: temp tables, sessions, pgpool and disk

From
Martijn van Oosterhout
Date:
On Mon, Jun 22, 2009 at 09:53:59AM +0200, Ivan Sergio Borgonovo wrote:
> > The OS knows much less about what anonymous memory (memory not
> > backed by a file) "means" to a program and can't be as clever with
> > it. Swapping tends to be _much_ more CPU expensive than writing
>
> But issuing a write to disk Postgresql doesn't actually say anything
> more about what it is placing on the disk and how it is going to
> access it... and it is actually adding overhead to move it back and
> forward, no matter if this overhead happens on RAM or disk.
> Actually since temp table are private to the connection they should
> (?) be private to a postgresql process, so the OS should be able to
> do a good job.
> I don't see any atomicity constraint, so... if something fail while
> writing to RAM, as you said you shouldn't need a WAL.

For the record, temp tables are in fact handled differently, in
particular they are not stored in the shared_buffers, but instead are
in backend local (private) buffers, whose size is controlled by
temp_buffers. They are indeed not WAL archived, nor written to disk
unless needed.

So yes, small temp tables will likely stay in memory, but large temp
tables may spill to disk. There's no flushing or syncing so quite
likely they'll end up in the OS disk cache for a while. Once the temp
table is deleted, the file is deleted and the OS throws that data away.
So temp tables most likely won't use any disk I/O, but they *can* if
the need arises.
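
For example (note that temp_buffers can only be changed before the
first use of temp tables within a session):

SET temp_buffers = '32MB';
CREATE TEMP TABLE scratch (i int);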

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


Re: temp tables, sessions, pgpool and disk

From
Ivan Sergio Borgonovo
Date:
On Mon, 22 Jun 2009 11:40:08 +0200
Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Mon, Jun 22, 2009 at 09:53:59AM +0200, Ivan Sergio Borgonovo
> wrote:
> > > The OS knows much less about what anonymous memory (memory not
> > > backed by a file) "means" to a program and can't be as clever
> > > with it. Swapping tends to be _much_ more CPU expensive than
> > > writing
> >
> > But by issuing a write to disk PostgreSQL doesn't actually say
> > anything more about what it is placing on the disk and how it is
> > going to access it... and it is actually adding overhead to move
> > it back and forth, no matter if this overhead happens in RAM
> > or on disk. Actually, since temp tables are private to the
> > connection they should (?) be private to a postgresql process, so
> > the OS should be able to do a good job.
> > I don't see any atomicity constraint, so... if something fails
> > while writing to RAM, as you said, you shouldn't need a WAL.
>
> For the record, temp tables are in fact handled differently, in
> particular they are not stored in the shared_buffers, but instead
> are in backend local (private) buffers, whose size is controlled by
> temp_buffers. They are indeed not WAL archived, nor written to disk
> unless needed.

> So yes, small temp tables will likely stay in memory, but large
> temp tables may spill to disk. There's no flushing or syncing so
> quite likely they'll end up in the OS disk cache for a while. Once
> the temp table is deleted, the file is deleted and the OS throws
> that data away. So temp tables most likely won't use any disk I/O,
> but they *can* if the need arises.

Just to make it extra-clear to people unaware of pg internals...
since the second paragraph may seem to contradict the first one...

could "nor written to disk unless needed" be rephrased as:
even repeated UPDATEs/INSERTs won't issue writes (no matter whether
they end up on disk or not, no writes are issued to the OS) if the
table fits in the buffer?

I see the default is somewhat "large" (8MB) and it is not
pre-allocated. Looks nice.

> Have a nice day,

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it