Thread: RAM-only temporary tables
An idea for a possible enhancement to PostgreSQL: allow creation of a temporary table without generating any disk I/O. (Creating and dropping a three-column temporary table within a database transaction currently generates about 150 disk writes.) If it isn't feasible in all circumstances to do this as a RAM-only operation, perhaps a clause could be added to the CREATE TEMPORARY TABLE syntax to specify this behavior, along with whatever limitations on the temporary table are required for this to work. (For example, maybe this is only feasible if the table will be dropped by the end of the transaction?) -Kevin
On Wed, Nov 5, 2008 at 16:00, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > An idea for a possible enhancement to PostgreSQL: allow creation of a > temporary table without generating any disk I/O. (Creating and > dropping a three-column temporary table within a database transaction > currently generates about 150 disk writes). Well, arguably you can already do this:

create tablespace ram location '/dev/shm/';
create temp table junk (junk int) tablespace ram;

In theory, because only one backend needs the temp table, we should be able to use mmap() to fake it being a file as well without too much trouble. But I'm sure I am missing something...
Kevin Grittner wrote: > An idea for a possible enhancement to PostgreSQL: allow creation of a > temporary table without generating any disk I/O. (Creating and > dropping a three-column temporary table within a database transaction > currently generates about 150 disk writes). > > If some circumstances don't make it feasible to always do this as a > RAM-only operation, perhaps a clause could be added to the CREATE > TEMPORARY TABLE syntax to specify this behavior along with whatever > limitations on the temporary table are required for this to work. > (For example, maybe this is only feasible if the table will be dropped > by the end of the transaction?) As someone else already pointed out, you can put temp tables on a RAM disk, but the larger issue is that temp tables still cause system table churn, and the system tables will always need to be on stable media.
Kevin Grittner wrote: > An idea for a possible enhancement to PostgreSQL: allow creation of a > temporary table without generating any disk I/O. (Creating and > dropping a three-column temporary table within a database transaction > currently generates about 150 disk writes). Most of these are catalog updates. A trace of WAL logs including only heap inserts says that to create a temp table with 3 columns (2 int, 1 text) and no indexes there are this many inserts:

 3  1247 (pg_type)
20  1249 (pg_attribute)
 3  1259 (pg_class)
 7  2608 (pg_depend)
 1  2610 (pg_index)

Note the excess of pg_attribute entries! There are 3 in the table, 3 in the toast table, and then there are 14 extra attrs which are for system columns (7 for the main table, 7 for the toast table). Just getting rid of pg_attribute entries for those would probably prove to be an important gain. (Don't forget the index updates for each of those heap inserts; for pg_type it's 2 btree inserts for each heap insert.) If you do this, you've shaved 42 of those 150 writes. Perhaps another gain is getting rid of array types for temp tables, a new feature of 8.3. Who wants those anyway? That also removes one or two of the pg_depend entries. Maybe these extra array types are the additional 8.3 overhead. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
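[Editorial aside: the pg_attribute count above can be checked with simple arithmetic. A sketch; the seven system columns assumed here are the pre-8.4 set, including oid.]

```python
# Back-of-the-envelope check of the pg_attribute insert count in the WAL trace.
user_columns = 3        # 2 int + 1 text
toast_columns = 3       # chunk_id, chunk_seq, chunk_data
system_columns = 7      # ctid, oid, xmin, cmin, xmax, cmax, tableoid (pre-8.4)

# One pg_attribute row per column, for both the main table and its toast table:
total_rows = user_columns + toast_columns + 2 * system_columns
print(total_rows)  # 20, matching the 20 pg_attribute inserts above
```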
>>> Alvaro Herrera <alvherre@commandprompt.com> wrote: > Kevin Grittner wrote: >> An idea for a possible enhancement to PostgreSQL: allow creation of a >> temporary table without generating any disk I/O. (Creating and >> dropping a three-column temporary table within a database transaction >> currently generates about 150 disk writes). > > Most of these are catalog updates. A trace of WAL logs including only > heap inserts says that to create a temp table with 3 columns (2 int, 1 > text) and no indexes there are this many inserts: > >  3  1247 (pg_type) > 20  1249 (pg_attribute) >  3  1259 (pg_class) >  7  2608 (pg_depend) >  1  2610 (pg_index) > > Note the excess of pg_attribute entries! There are 3 in the table, 3 in > the toast table, and then there are 14 extra attrs which are for system > columns (7 for the main table, 7 for the toast table). Just getting rid > of pg_attribute entries for those would probably prove to be an > important gain. (Don't forget the index updates for each of those heap > inserts; for pg_type it's 2 btree inserts for each heap insert.) If > you do this, you've shaved 42 of those 150 writes. Note that the 150 disk writes were for the CREATE and the DROP. Does that mean that we'd actually shave 84 of 150 writes? Also, if you're looking to account for all the writes, it's worth noting that my test declared a one-column primary key (on an integer column) in the CREATE TEMPORARY TABLE statement. In suggesting this enhancement, my hope is that each session could check for a referenced table as a temporary in RAM before going to the system tables, in a manner vaguely similar to how space reserved by the temp_buffers GUC is used for temp table data. I, of course, am suggesting this from a position of blissful ignorance of the actual complexity of making such a change. -Kevin
Kevin Grittner wrote: > Note that the 150 disk writes were for the CREATE and the DROP. Does > that mean that we'd actually shave 84 of 150 writes? Hmm, you'd shave more than 42 but not 84, because index entries are not deleted until a later vacuum. (I'd say about 56 -- 42 plus the 14 heap deletions). > Also, if you're looking to account for all the writes, it's worth > noting that my test declared a one-column primary key (on an integer > column) in the CREATE TEMPORARY TABLE statement. That probably makes up for the extra few writes that I didn't see in my quick test. > In suggesting this enhancement, my hope is that each session could > check for a referenced table as a temporary in RAM before going to the > system tables, in a manner vaguely similar to how space reserved by > the temp_buffers GUC is used for temp table data. I, of course, am > suggesting this from a position of blissful ignorance of the actual > complexity of making such a change. Right -- I don't expect we can make use of such an idea readily. Not creating unnecessary pg_attribute entries for system columns is probably a lot easier to do. The idea of uncatalogued temp tables has been suggested and rejected several times in the past. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
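[Editorial aside: the 42-vs-56 estimate above can be laid out explicitly. A sketch; it assumes pg_attribute carries two indexes, so each heap insert implies two index inserts.]

```python
extra_attr_rows = 14             # 7 system columns x (main table + toast table)
pg_attribute_indexes = 2         # assumed index count on pg_attribute

# CREATE: one heap insert plus one insert per index, per unneeded row.
create_savings = extra_attr_rows * (1 + pg_attribute_indexes)

# DROP: only the heap deletions happen right away; the index entries
# linger until a later vacuum, so they don't count here.
drop_savings = extra_attr_rows

print(create_savings, create_savings + drop_savings)  # 42 56
```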
>>> Alvaro Herrera <alvherre@commandprompt.com> wrote: > Kevin Grittner wrote: > >> Note that the 150 disk writes were for the CREATE and the DROP. Does >> that mean that we'd actually shave 84 of 150 writes? > > Hmm, you'd shave more than 42 but not 84, because index entries are not > deleted until a later vacuum. (I'd say about 56 -- 42 plus the 14 heap > deletions). > >> Also, if you're looking to account for all the writes, it's worth >> noting that my test declared a one-column primary key (on an integer >> column) in the CREATE TEMPORARY TABLE statement. > > That probably makes up for the extra few writes that I didn't see in my > quick test. It sounds like you were counting the 8kB pages pushed from the PostgreSQL cache to the OS cache, and I was counting the 1kB blocks pushed from the OS cache to the RAID controller cache. By watching vmstat results after pushing this to a more-or-less steady state, I was probably picking up the results of autovacuum runs, but multiple writes to a single page were often combined by the OS. If we match, it's really just a coincidence. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Note that the 150 disk writes were for the CREATE and the DROP. Does > that mean that we'd actually shave 84 of 150 writes? It really shouldn't be the case that each system catalog tuple insertion generates a separate write --- especially not for multiple insertions into the same catalog, which we could expect to go into the same page or few pages. I think a large fraction of the writes you're measuring are coming from the file create/unlink operations. It would certainly be important to identify where the bulk of the cost *really* is before we start expending effort on a solution. > In suggesting this enhancement, my hope is that each session could > check for a referenced table as a temporary in RAM before going to the > system tables, in a manner vaguely similar to how space reserved by > the temp_buffers GUC is used for temp table data. This isn't very workable. For one thing, client-side operations such as psql's \dt still need to see catalog entries for temp tables. There's been some handwaving about keeping catalog entries for temp tables in temp children of the main system catalogs, but it hasn't got past the handwaving stage. regards, tom lane
Kevin Grittner wrote: > It sounds like you were counting the 8kB pages pushed from the > PostgreSQL cache to the OS cache, and I was counting the 1kB blocks > pushed from the OS cache to the RAID controller cache. By watching > vmstat results after pushing this to a more-or-less steady state, I > was probably picking up the results of autovacuum runs, but multiple > writes to a single page were often combined by the OS. If we match, > it's really just a coincidence. I was counting WAL inserts actually, so "logical database operations". I left out "storage" items (filesystem actions), so it was mainly just catalog changes. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Right -- I don't expect we can make use of such an idea readily. Not > creating unnecessary pg_attribute entries for system columns is probably > a lot easier to do. I seem to recall having proposed that in the past, and getting shot down on the basis that clients might be depending on those pg_attribute entries being there. I'm not sure how big a risk there really is --- most of the code I've seen explicitly selects attnum > 0 --- but it's a consideration. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think a large fraction of the writes you're measuring are coming from > the file create/unlink operations. It would certainly be important to > identify where the bulk of the cost *really* is before we start > expending effort on a solution. Any ideas on a good way to gather that information? Given the temp_buffers space, would it make sense to defer the creation of the actual file until there is actually a need to spill data to the disk? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think a large fraction of the writes you're measuring are coming from >> the file create/unlink operations. It would certainly be important to >> identify where the bulk of the cost *really* is before we start >> expending effort on a solution. > Any ideas on a good way to gather that information? I had done some preliminary trials using strace (you need to trace the active backend, the bgwriter, and the wal writer process to be sure you see everything going on). However it's difficult to tell how much physical I/O results from the create or unlink syscalls. It might be interesting to make a test program that just creates 4000 files and then removes them again, and see what sort of load you see from that. > Given the temp_buffers space, would it make sense to defer the > creation of the actual file until there is actually a need to spill > data to the disk? No, because that opens us to problems with reuse of relfilenode numbers. One place that I've always wanted to look at was suppressing the creation of a btree metapage until there's some useful data in the table. We managed to avoid creating a root page until there's data, but at the time avoiding the metapage seemed too invasive. (Admittedly, though, if one assumes that your real world case does involve putting some data in the tables, this wouldn't actually save anything...) regards, tom lane
Tom Lane wrote: > One place that I've always wanted to look at was suppressing the > creation of a btree metapage until there's some useful data in the > table. We managed to avoid creating a root page until there's data, > but at the time avoiding the metapage seemed too invasive. (Admittedly, > though, if one assumes that your real world case does involve putting > some data in the tables, this wouldn't actually save anything...) Agreed on the parenthesized comment -- it'd be just benchmark optimization. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: > it's difficult to tell how much > physical I/O results from the create or unlink syscalls. It might be > interesting to make a test program that just creates 4000 files We use xfs with noatime for our databases. In a fresh subdirectory of such a mountpoint:

for ((i=0 ; i < 4000 ; ++i)) ; do touch $i ; done

causes 44,969 block writes.

> and then > removes them again, and see what sort of load you see from that.

rm *

causes 26,820 block writes. That would make the file creation and unlink just under half the load. -Kevin
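[Editorial aside: the test above can be reproduced in a self-contained form, sketched here; the block-write counts must still be observed externally, e.g. with vmstat in another terminal, and will vary by filesystem and mount options.]

```python
import os
import tempfile

# Create and then remove 4000 empty files in a scratch directory,
# mirroring the touch/rm test above.
d = tempfile.mkdtemp()
for i in range(4000):
    open(os.path.join(d, str(i)), "w").close()

for name in os.listdir(d):
    os.unlink(os.path.join(d, name))
os.rmdir(d)
print("done")
```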
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > That would make the file creation and unlink just under half the load. Worst possible case :-( ... means that we wouldn't get much improvement without addressing both aspects. It strikes me however that this does put some urgency into the question of how much per-relation FSM is going to cost us. For short-lived temp tables the FSM is never going to have any usefulness at all, but in the current HEAD code it'll double the create/unlink load. Heikki, would it be reasonable to fix things so that a nonexistent FSM fork is semantically the same as an empty one, and not create FSM until there's actually something to put in it? regards, tom lane
Tom Lane wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> That would make the file creation and unlink just under half the load. > > Worst possible case :-( ... means that we wouldn't get much improvement > without addressing both aspects. > > It strikes me however that this does put some urgency into the question > of how much per-relation FSM is going to cost us. For short-lived temp > tables the FSM is never going to have any usefulness at all, but in the > current HEAD code it'll double the create/unlink load. Agreed. > Heikki, would it be reasonable to fix things so that a nonexistent FSM > fork is semantically the same as an empty one, and not create FSM until > there's actually something to put in it? Possibly, but I'd like to understand what exactly the problem is. I tried running this:

CREATE TEMPORARY TABLE footemp (id int4);
DROP TABLE footemp;

with pgbench -f, but can't see any meaningful difference between 8.3 and CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. There probably is a measurable difference there if you run longer tests, but doesn't seem like the extra file creation+unlink is worth worrying about. With the caveat that this is on reiserfs, on my laptop. Does someone see a difference on other filesystems? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> Heikki, would it be reasonable to fix things so that a nonexistent FSM >> fork is semantically the same as an empty one, and not create FSM until >> there's actually something to put in it? > Possibly, but I'd like to understand what exactly the problem is. I > tried running this: > CREATE TEMPORARY TABLE footemp (id int4); > DROP TABLE footemp; > with pgbench -f, but can't see any meaningful difference between 8.3 and > CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. Try several thousand temp tables within one transaction. regards, tom lane
Heikki Linnakangas wrote: > I tried running this: > > CREATE TEMPORARY TABLE footemp (id int4); > DROP TABLE footemp; > > with pgbench -f, but can't see any meaningful difference between 8.3 and > CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. There > probably is a measurable difference there if you run longer tests, but > doesn't seem like the extra file creation+unlink is worth worrying > about. With the caveat that this is on reiserfs, on my laptop. Does > someone see a difference on other filesystems? I just tried that on a test server with the data directory on a small RAID array with ext3. Results are similar, at ~120 tps with both CVS HEAD and 8.3, and with fsync=off, around ~1700 tps on both versions. oprofile does suggest that more time is spent in the kernel in CVS HEAD.

CVS HEAD:

37387  13.9383  no-vmlinux   postgres    (no symbols)
20604   7.6814  postgres     postgres    CatalogCacheFlushRelation
16075   5.9929  postgres     postgres    hash_search_with_hash_value
10109   3.7688  postgres     postgres    LWLockAcquire
 9225   3.4392  postgres     postgres    _bt_compare
 8973   3.3452  postgres     postgres    XLogInsert
 8368   3.1197  postgres     postgres    LWLockRelease
 8009   2.9858  postgres     postgres    hash_any
 6944   2.5888  no-vmlinux   no-vmlinux  (no symbols)
 4432   1.6523  no-vmlinux   pgbench     (no symbols)
 4197   1.5647  postgres     postgres    AllocSetAlloc
 3773   1.4066  libc-2.7.so  postgres    memcmp
 3359   1.2523  postgres     postgres    _bt_checkkeys
 3315   1.2359  postgres     postgres    FunctionCall2
 3135   1.1688  postgres     postgres    PinBuffer

8.3:

27264   9.3254  no-vmlinux   postgres    (no symbols)
20673   7.0710  postgres     postgres    CatalogCacheFlushRelation
18576   6.3537  postgres     postgres    hash_search_with_hash_value
12795   4.3764  postgres     postgres    LWLockAcquire
11565   3.9557  postgres     postgres    _bt_compare
11538   3.9465  postgres     postgres    hash_any
10414   3.5620  postgres     postgres    XLogInsert
10100   3.4546  postgres     postgres    LWLockRelease
 6306   2.1569  postgres     postgres    _bt_checkkeys
 5096   1.7430  postgres     postgres    AllocSetAlloc
 4835   1.6538  no-vmlinux   no-vmlinux  (no symbols)
 4090   1.3989  postgres     postgres    PrepareToInvalidateCacheTuple
 4021   1.3753  postgres     postgres    FunctionCall2
 3965   1.3562  no-vmlinux   pgbench     (no symbols)
 3869   1.3234  libc-2.7.so  postgres    memcmp

but frankly I can't get too excited about that. I also quickly tried a test case with an ON COMMIT DELETE ROWS temp table. I modified pgbench slightly, so that it creates a temp table with ON COMMIT DELETE ROWS after connecting, and then ran a script with a simple one-row INSERT to the temp table. The results look similar; I'm getting ~11000-12000 tps on both 8.3 and CVS HEAD. oprofile suggests that about ~50% of the time is spent in the kernel, so I'm actually a bit surprised that the new FSM file isn't hurting more there. So, I'm not convinced we need to do anything, based on these tests. Maybe I missed the point of the OP; if so, a repeatable test case would be nice. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Tom Lane wrote: >>> Heikki, would it be reasonable to fix things so that a nonexistent FSM >>> fork is semantically the same as an empty one, and not create FSM until >>> there's actually something to put in it? > >> Possibly, but I'd like to understand what exactly the problem is. I >> tried running this: > >> CREATE TEMPORARY TABLE footemp (id int4); >> DROP TABLE footemp; > >> with pgbench -f, but can't see any meaningful difference between 8.3 and >> CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. > > Try several thousand temp tables within one transaction. I ran into an interesting problem while doing that. I created a SQL script with 10000 CREATE TEMPORARY TABLE statements. After testing with that a few times, I got this:

WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

Not that surprising, but when I then just tried to run a single CREATE TEMPORARY TABLE in a new psql session, I got the same error. I then stopped and started postmaster, and I still got the same error! Testing with trace_locks=on, it looks like the table creation takes a lock on all the temp tables that are already gone, and runs out of memory doing that. I'll hunt that down, and try benchmarking the "thousands of temp tables in one transaction" case again after that... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
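[Editorial aside: the failure is consistent with the size of the shared lock table. A sketch with default settings assumed; the server sizes the lock table as roughly max_locks_per_transaction times the allowed transaction count, plus some slack.]

```python
# Why ~10000 temp tables blow out the shared lock table under defaults.
max_locks_per_transaction = 64   # default GUC value (assumed here)
max_connections = 100            # default GUC value (assumed here)

lock_slots = max_locks_per_transaction * max_connections  # ~6400 shared slots
temp_tables = 10000              # each holds at least one lock until commit

print(temp_tables > lock_slots)  # True -> "out of shared memory"
```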
Heikki Linnakangas wrote: > Tom Lane wrote: >> Try several thousand temp tables within one transaction. > > I ran into an interesting problem while doing that. I created a SQL > script with 10000 CREATE TEMPORARY TABLE statements. After testing with > that a few times, I got this: > > WARNING: out of shared memory > ERROR: out of shared memory > HINT: You might need to increase max_locks_per_transaction. > > Not that surprising, but when I then just tried to run a single CREATE > TEMPORARY TABLE in a new psql session, I got the same error. I then > stopped and started postmaster, and I still get the same error! Testing > with trace_locks=on, looks like the table creation takes a lock on all > the temp tables that are already gone, and runs out of memory doing that. What's happening is that there are simply so many temporary tables in pg_class that when the new backend tries to clear them with RemoveTempRelations, it runs out of lock space. One idea would be to modify RemoveTempRelations to not acquire locks on the temp objects, but that's pretty ugly and I'm not sure how it would interact with concurrent DROP TYPE CASCADE or similar. Any better ideas? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > What's happening is that there are simply so many temporary tables in > pg_class that when the new backend tries to clear them with > RemoveTempRelations, it runs out of lock space. What happened to the original owner of the tables? It must have had locks on all those tables ... regards, tom lane
Heikki Linnakangas wrote: > oprofile does suggest that more time is spent in the kernel in CVS HEAD. > CVS HEAD: > > 37387 13.9383 no-vmlinux postgres (no symbols) > 8.3: > > 27264 9.3254 no-vmlinux postgres (no symbols) What this means is that there are a lot more syscalls in HEAD than 8.3 ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> What's happening is that there are simply so many temporary tables in >> pg_class that when the new backend tries to clear them with >> RemoveTempRelations, it runs out of lock space. > > What happened to the original owner of the tables? It must have had > locks on all those tables ... It's pretty easily reproducible with this:

CREATE OR REPLACE FUNCTION createtemps (iters int4) RETURNS VOID
LANGUAGE plpgsql AS $$
DECLARE
  i int4;
BEGIN
  i := 0;
  WHILE i < iters LOOP
    EXECUTE 'CREATE TEMPORARY TABLE "dummytemp' || i || '_' || random() || '" (id int4)';
    i := i + 1;
  END LOOP;
END;
$$;

postgres=# SELECT createtemps(7000);
 createtemps
-------------

(1 row)

postgres=# SELECT createtemps(7000);
 createtemps
-------------

(1 row)

postgres=# \q
$ psql postgres
psql (8.4devel)
Type "help" for help.

postgres=# CREATE TEMPORARY TABLE footemp(id int4);
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

In this case, the temp tables are created in two separate transactions, which is why the creations don't run out of shared memory. But when they're later dropped, the new backend tries to grab a lock on all of them at the same time. I'm not quite sure how I got into that situation with my test case. I might have run the test case without a BEGIN/COMMIT at first, while developing the test script, so that all the thousands of temp tables were created in separate transactions. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote: > Heikki Linnakangas wrote: > >> oprofile does suggest that more time is spent in the kernel in CVS HEAD. >> CVS HEAD: >> >> 37387 13.9383 no-vmlinux postgres (no >> symbols) > >> 8.3: >> >> 27264 9.3254 no-vmlinux postgres (no >> symbols) > > What this means is that there are a lot more syscalls in HEAD than 8.3 ... Yeah, sure. My point was that it doesn't seem to be significant in the big scheme of things. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Tom Lane wrote: >>> Heikki, would it be reasonable to fix things so that a nonexistent FSM >>> fork is semantically the same as an empty one, and not create FSM until >>> there's actually something to put in it? > >> Possibly, but I'd like to understand what exactly the problem is. I >> tried running this: > >> CREATE TEMPORARY TABLE footemp (id int4); >> DROP TABLE footemp; > >> with pgbench -f, but can't see any meaningful difference between 8.3 and >> CVS HEAD. Both can do about 300 tpm, or 700-800 with fsync=off. > > Try several thousand temp tables within one transaction. After increasing max_locks_per_transaction to avoid the issue I mentioned elsewhere in this thread, using this test script:

#!/bin/sh
echo "BEGIN;"
for ((i=0;i<=10000;i+=1)); do
  echo "CREATE TEMPORARY TABLE footemp$i (id int4);"
done
echo "COMMIT;"

And repeating a few times with:

$ time sh temptest.sh | ~/installations/cvshead/bin/psql postgres -q

I'm again seeing no big difference between 8.3 and CVS HEAD. I'm getting values around both sides of 6 seconds, with 5.8s being the minimum for CVS HEAD, and 5.6s the minimum for PG 8.3.
oprofile for 8.3:

samples  %        image name    app name    symbol name
23844    11.1625  postgres      postgres    LockReassignCurrentOwner
22299    10.4392  no-vmlinux    postgres    (no symbols)
15461     7.2380  postgres      postgres    hash_seq_search
12309     5.7624  postgres      postgres    CatalogCacheFlushRelation
10981     5.1407  postgres      postgres    hash_search_with_hash_value
7139      3.3421  postgres      postgres    get_tabstat_entry
7133      3.3393  postgres      postgres    XLogInsert
7112      3.3294  no-vmlinux    no-vmlinux  (no symbols)
6294      2.9465  postgres      postgres    _bt_compare
6027      2.8215  postgres      postgres    LWLockAcquire
5189      2.4292  postgres      postgres    hash_any
4744      2.2209  postgres      postgres    LWLockRelease
3355      1.5706  bash          bash        (no symbols)

oprofile for CVS HEAD:

samples  %        image name    app name    symbol name
30167    11.0321  postgres      postgres    LockReassignCurrentOwner
27345    10.0001  no-vmlinux    postgres    (no symbols)
19537     7.1447  postgres      postgres    hash_seq_search
17376     6.3544  postgres      postgres    CatalogCacheFlushRelation
14910     5.4526  postgres      postgres    findDependentObjects
14187     5.1882  postgres      postgres    hash_search_with_hash_value
9268      3.3893  postgres      postgres    get_tabstat_entry
7478      2.7347  postgres      postgres    _bt_compare
6861      2.5091  no-vmlinux    no-vmlinux  (no symbols)
6779      2.4791  postgres      postgres    LWLockAcquire
6764      2.4736  postgres      postgres    XLogInsert
5122      1.8731  postgres      postgres    LWLockRelease
4838      1.7693  postgres      postgres    hash_any
3501      1.2803  bash          bash        (no symbols)

I'm quite surprised to see LockReassignCurrentOwner being called at all.
gdb shows where the call is coming from:

#0  LockReassignCurrentOwner () at lock.c:1653
#1  0x0000000000662865 in ResourceOwnerReleaseInternal (owner=0xab21f8, phase=RESOURCE_RELEASE_LOCKS, isCommit=1 '\001', isTopLevel=0 '\0') at resowner.c:261
#2  0x00000000006628f0 in ResourceOwnerRelease (owner=0xab21f8, phase=RESOURCE_RELEASE_LOCKS, isCommit=1 '\001', isTopLevel=0 '\0') at resowner.c:171
#3  0x000000000066139b in PortalDrop (portal=0xacc060, isTopCommit=<value optimized out>) at portalmem.c:418
#4  0x00000000005aa162 in exec_simple_query (query_string=0xa91350 "CREATE TEMPORARY TABLE footemp7013 (id int4);") at postgres.c:972

I'm also a bit surprised that the higher number of syscalls in CVS HEAD isn't visible in this oprofile report. The top percentages seemed to vary by a couple percentage points from run to run, though, so perhaps it's just drowned by noise. Kevin, what was your original scenario like that led you to investigate this? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Sorry for the late response; I was on vacation. >>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Kevin, what was your original scenario like that led you to investigate > this? We noticed a performance degradation in application code which, within a database transaction, looped through large numbers of iterations where each iteration created, used, and dropped a temporary table. The temporary table always had several columns (typically ten to twenty, many of which were varchar) and had a primary key. We initially thought this was because of an upgrade of PostgreSQL from 8.2.5 to 8.3.4, but subsequent testing showed that it was because of the concurrent update of the Linux kernel from one which defaulted to not using write barriers to one which did default to using write barriers. The file system was XFS on five spindles in RAID 5 on a good BBU controller. Updating the relevant /etc/fstab entry with nobarrier brought performance back to an acceptable level. -Kevin
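[Editorial aside: the fix described amounts to a single mount-option change. A hypothetical /etc/fstab line is sketched below; the device, mount point, and other options are placeholders, and nobarrier is only appropriate when, as here, a battery-backed controller cache stands in for write barriers.]

```
/dev/sdb1  /var/lib/pgsql  xfs  noatime,nobarrier  0 0
```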