Thread: TRUNCATE TABLE

TRUNCATE TABLE

From
Adriaan van Os
Date:
Recently, I have been doing extensive profiling of a version 8.1.4 Postgres DB with about 175
tables and 5 GB of data (the server running on Fedora Linux and the clients on Windows XP).
Surprisingly, one of the bottlenecks is TRUNCATE TABLE and that command is really slow as compared
to other operations. For example, we have operations like:

TRUNCATE TABLE my_temporary_table
COPY my_temporary_table ... FROM STDIN BINARY
do_something

where do_something is using the data in my_temporary_table to do something like a JOIN or a mass
UPDATE or whatever.

Now, it turns out that typically most time is lost in TRUNCATE TABLE; in fact, it spoils the
performance of most operations on the DB!

I read in a mailing-list archive that TRUNCATE TABLE has been slow since it was made
transaction-safe somewhere in version 7, but for operations on a temporary table (with data coming
from the outside world) that is irrelevant, at least for my application, in this case a middleware
software package.

So, my questions are:

1. Why is TRUNCATE TABLE so slow (even if transaction-safe)?
2. Is there a way to dig up, somewhere in the source code, a quick-and-dirty TRUNCATE TABLE
alternative for operations on temporary tables that need not be transaction-safe (because the
middleware itself can easily restore anything that goes wrong there)?

I noticed, by the way, that removing records in general is painfully slow, but I haven't done a
detailed analysis of that issue yet.

As an alternative to TRUNCATE TABLE I tried to CREATE and DROP a table, but that wasn't any faster.

Sincerely,

Adriaan van Os

Re: TRUNCATE TABLE

From
Tom Lane
Date:
Adriaan van Os <postgres@microbizz.nl> writes:
> Surprisingly, one of the bottlenecks is TRUNCATE TABLE and that
> command is really slow as compared to other operations.

When you don't quantify that statement at all, it's hard to make an
intelligent comment on it, but TRUNCATE per se shouldn't be slow.
Are you sure you are not measuring a delay to obtain exclusive lock
on the table before it can be truncated (ie, waiting for other
transactions to finish with it)?

            regards, tom lane

Re: TRUNCATE TABLE

From
Gregory Stark
Date:
"Adriaan van Os" <postgres@microbizz.nl> writes:

> Recently, I have been doing extensive profiling of a version 8.1.4 Postgres DB
> with about 175 tables and 5 GB of data (the server running on Fedora Linux and
> the clients on Windows XP). Surprisingly, one of the bottlenecks is TRUNCATE
> TABLE and that command is really slow as compared to other operations. For
> example, we have operations like:

What filesystem is this? Some filesystems are notoriously slow at deleting
large files. The mythtv folk who face this problem regularly recommend either
JFS or XFS for this purpose.

Postgres generally doesn't really need to be able to delete large files quickly. The only cases
that come to mind where files are deleted are DROP, TRUNCATE, and possibly VACUUM of a table.

> I noticed, by the way, that removing records in general is painfully slow, but
> I didn't do a detailed analysis of that issue yet.

That's strange. Deleting should be the *quickest* operation in Postgres. Do
you perchance have foreign key references referencing this table? Do you have
any triggers?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Gregory Stark wrote:
> That's strange. Deleting should be the *quickest* operation in Postgres. Do
> you perchance have foreign key references referencing this table?

No.

> Do you have any triggers?

No.

Tom Lane wrote:
> Adriaan van Os <postgres@microbizz.nl> writes:
>> Surprisingly, one of the bottlenecks is TRUNCATE TABLE and that
>> command is really slow as compared to other operations.
>
> When you don't quantify that statement at all, it's hard to make an
> intelligent comment on it, but TRUNCATE per se shouldn't be slow.
> Are you sure you are not measuring a delay to obtain exclusive lock
> on the table before it can be truncated (ie, waiting for other
> transactions to finish with it)?

During the tests, there is only one connection to the database server. No other transactions are
running.

> When you don't quantify that statement at all, it's hard to make an
> intelligent comment on it, but TRUNCATE per se shouldn't be slow.

Below are some timings, in milliseconds.

> TRUNCATE TABLE my_temporary_table
> COPY my_temporary_table ... FROM STDIN BINARY
> do_something

The temporary table has one INT4 field and no indices.

Numrows   TRUNCATE (ms)   COPY (ms)   SELECT (ms)
   5122            80.6       16.1          51.2
   3910            79.5       12.9          39.9
   2745            90.4       10.7          32.4
   1568            99.5        7.6          24.7
    398           161.1        4.0          22.1
    200            79.5        3.3          22.0
    200            87.9        3.1          22.0
 222368          4943.5      728.6        7659.5
 222368          1685.7      512.2        2883.1

Note how fast the COPY is (which is nice). The SELECT statement uses the temporary table.
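
For what it's worth, the anomaly in those numbers can be made explicit by computing per-row costs.
A quick sketch in Python (not part of the original thread; figures copied from the table above):

```python
# Timings from the table above: (rows, TRUNCATE ms, COPY ms)
data = [
    (5122, 80.6, 16.1),
    (3910, 79.5, 12.9),
    (2745, 90.4, 10.7),
    (1568, 99.5, 7.6),
    (398, 161.1, 4.0),
    (200, 79.5, 3.3),
    (222368, 4943.5, 728.6),
    (222368, 1685.7, 512.2),
]

# COPY cost per row stays within one order of magnitude across all
# sizes, i.e. it scales roughly linearly with the row count...
copy_us_per_row = [1000.0 * c / n for n, _, c in data]

# ...but TRUNCATE, which should be roughly constant-time, jumps by
# more than 10x as soon as the table holds 222368 rows.
small_truncates = [t for n, t, _ in data if n < 10000]
big_truncates = [t for n, t, _ in data if n >= 10000]

print("COPY us/row:            %.1f .. %.1f" % (min(copy_us_per_row), max(copy_us_per_row)))
print("TRUNCATE, small tables: %.1f .. %.1f ms" % (min(small_truncates), max(small_truncates)))
print("TRUNCATE, 222368 rows:  %.1f .. %.1f ms" % (min(big_truncates), max(big_truncates)))
```

A constant-time operation should look like the first seven TRUNCATE rows; the two 222368-row runs
clearly do not.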

Regards,

Adriaan van Os


Re: TRUNCATE TABLE

From
Tom Lane
Date:
Adriaan van Os <postgres@microbizz.nl> writes:
> Tom Lane wrote:
>> When you don't quantify that statement at all, it's hard to make an
>> intelligent comment on it, but TRUNCATE per se shouldn't be slow.

> Below are some timings, in milliseconds.

I can only conclude that you're using a seriously bad filesystem :-(

I tried to replicate your results on a fairly old and slow HPUX box.
I get a fairly repeatable time of around 40msec to truncate a table;
this is presumably mostly filesystem time to create one file and delete
another.  I used CVS HEAD for this because the devel version of psql
supports reporting \timing for \copy commands, but I'm quite sure that
TRUNCATE isn't any faster than it was in 8.2:

regression=# create table tab(f1 int);
CREATE TABLE
Time: 63.775 ms
regression=# insert into tab select random()*10000 from generate_series(1,5000);
INSERT 0 5000
Time: 456.011 ms
regression=# \copy tab to 'tab.data' binary
Time: 80.343 ms
regression=# truncate table tab;
TRUNCATE TABLE
Time: 35.825 ms
regression=# \copy tab from 'tab.data' binary
Time: 391.928 ms
regression=# select count(*) from tab;
 count
-------
  5000
(1 row)

Time: 21.457 ms
regression=# truncate table tab;
TRUNCATE TABLE
Time: 47.867 ms
regression=# \copy tab from 'tab.data' binary
Time: 405.074 ms
regression=# select count(*) from tab;
 count
-------
  5000
(1 row)

Time: 20.247 ms

If I increase the test size to 200K rows, I get a proportional increase
in the copy and select times, but truncate stays about the same:

regression=# truncate table tab;
TRUNCATE TABLE
Time: 40.196 ms
regression=# \copy tab from 'tab.data' binary
Time: 15779.689 ms
regression=# select count(*) from tab;
 count
--------
 200000
(1 row)

Time: 642.965 ms

Your numbers are not making any sense to me.  In particular there is no
reason in the Postgres code for it to take longer to truncate a 200K-row
table than a 5K-row table.  (I would expect some increment at the point
of having 1GB in the table, where we'd create a second table segment
file, but you are nowhere near that.)

The bottom line seems to be that you have a filesystem that takes a
long time to delete a file, with the cost rising rapidly as the file
gets bigger.  Can you switch to a different filesystem?

            regards, tom lane

Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Gregory Stark wrote:
> What filesystem is this?

Ext3 on Fedora Linux.

> Some filesystems are notoriously slow at deleting
> large files. The mythtv folk who face this problem regularly recommend either
> JFS or XFS for this purpose.

That's remarkable advice, because XFS is known to be slow at creating and deleting files, see
<http://en.wikipedia.org/wiki/XFS> and <http://everything2.com/index.pl?node_id=1479435>.

Regards,

Adriaan van Os


Re: TRUNCATE TABLE

From
Gregory Stark
Date:
"Adriaan van Os" <postgres@microbizz.nl> writes:

> That's a remarkable advice, because XFS is known to be slow at creating and
> deleting files, see <http://en.wikipedia.org/wiki/XFS> and
> <http://everything2.com/index.pl?node_id=1479435>.

I think this is a case of "you're both right". XFS may have to do more work than other filesystems
for meta-information updates. However, it still only has to do a constant, or nearly constant,
amount of work. So it may be slower than ext3 at managing a large directory of thousands of small
files, but it's faster than ext3 at deleting a single 1 GB file.

On mythtv the experience is that if you use ext3 and delete a large file while recording another
program, you can expect the new recording to stutter at that point. The large delete will lock the
recording out of writing to the filesystem for several seconds.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: TRUNCATE TABLE

From
Jean-Max Reymond
Date:
Adriaan van Os a écrit :
> That's a remarkable advice, because XFS is known to be slow at creating
> and deleting files, see <http://en.wikipedia.org/wiki/XFS> and
> <http://everything2.com/index.pl?node_id=1479435>.
>

Date of that article: Fri Jul 25 2003!

Re: TRUNCATE TABLE

From
"Thomas Samson"
Date:
On 7/13/07, Jean-Max Reymond <jmreymond@gmail.com> wrote:
> Adriaan van Os a écrit :
> > That's a remarkable advice, because XFS is known to be slow at creating
> > and deleting files, see <http://en.wikipedia.org/wiki/XFS> and
> > <http://everything2.com/index.pl?node_id=1479435>.
> >
>
> date of article: Fri Jul 25 2003  !
>

Even at that date, the article ends with:

"More interestingly, my delete performance has actually superseded that of ext3, for both random
and sequential deletes! The most major weakness of XFS has been eliminated, and my spankin' new
filesystem is ready to rock. Cheers!"

--
Thomas SAMSON
I came, I saw, I deleted all your files.

Re: TRUNCATE TABLE

From
Michael Stone
Date:
On Fri, Jul 13, 2007 at 09:47:06AM +0200, Adriaan van Os wrote:
>That's a remarkable advice, because XFS is known to be slow at creating and
>deleting files, see <http://en.wikipedia.org/wiki/XFS> and
><http://everything2.com/index.pl?node_id=1479435>.

xfs' slowness is proportional to the *number* rather than the *size* of
the files. In postgres you'll tend to have fewer, larger, files than you
would in (e.g.) a source code repository, so it is generally more
important to have a filesystem that deletes large files quickly than a
filesystem that deletes lots of files quickly. I'd suspect that the same
is true for mythtv.
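
The number-of-files vs. size-of-file distinction is easy to probe directly. A rough sketch in
Python (not from the thread; results depend heavily on the filesystem, the page cache, and whether
the data ever hit disk, so treat it as a toy probe rather than a benchmark):

```python
import os
import tempfile
import time

def time_unlink(paths):
    """Return seconds spent unlinking the given files."""
    start = time.perf_counter()
    for p in paths:
        os.unlink(p)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    # One 16 MB file ...
    big = os.path.join(d, "big")
    with open(big, "wb") as f:
        f.write(b"\0" * (16 * 1024 * 1024))

    # ... versus 256 files of 64 KB each (same total size).
    small = []
    for i in range(256):
        p = os.path.join(d, "small%d" % i)
        with open(p, "wb") as f:
            f.write(b"\0" * (64 * 1024))
        small.append(p)

    t_big = time_unlink([big])
    t_small = time_unlink(small)

print("one 16 MB file:  %.1f ms" % (t_big * 1000))
print("256 64 KB files: %.1f ms" % (t_small * 1000))
```

On metadata-heavy filesystems the many-small-files case dominates; on extent-based filesystems the
single large file can be the cheap one, which is the distinction being drawn above.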

Mike Stone

Re: TRUNCATE TABLE

From
Tom Lane
Date:
Michael Stone <mstone+postgres@mathom.us> writes:
> xfs' slowness is proportional to the *number* rather than the *size* of
> the files. In postgres you'll tend to have fewer, larger, files than you
> would in (e.g.) a source code repository, so it is generally more
> important to have a filesystem that deletes large files quickly than a
> filesystem that deletes lots of files quickly.

The weird thing is that the files in question were hardly "large".
IIRC his test case used a single int4 column, so the rows were probably
36 bytes apiece allowing for all overhead.  So the test cases with about
5K rows were less than 200K in the file, and the ones with 200K rows
were still only a few megabytes.
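
The size estimate checks out arithmetically (36 bytes/row is the rough figure quoted above for
header plus data plus item-pointer overhead, not an exact on-disk number):

```python
BYTES_PER_ROW = 36  # rough per-row estimate from the message above

small_table = 5000 * BYTES_PER_ROW     # bytes in the ~5K-row test cases
large_table = 200000 * BYTES_PER_ROW   # bytes in the 200K-row test cases

print("5K rows:   %.0f KB" % (small_table / 1024.0))   # under 200 KB
print("200K rows: %.1f MB" % (large_table / 1e6))      # a few megabytes
```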

I tried the test on my Linux machine (which I couldn't do when I
responded earlier because it was tied up with another test), and
saw truncate times of a few milliseconds for both table sizes.
This is ext3 on Fedora 6.

So I'm still of the opinion that there's something broken about
Adriaan's infrastructure, but maybe we have to look to an even
lower level than the filesystem.  Perhaps he should try getting
some bonnie++ benchmark numbers to see if his disk is behaving
properly.

            regards, tom lane

Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Tom Lane wrote:
> Michael Stone <mstone+postgres@mathom.us> writes:
>> xfs' slowness is proportional to the *number* rather than the *size* of
>> the files. In postgres you'll tend to have fewer, larger, files than you
>> would in (e.g.) a source code repository, so it is generally more
>> important to have a filesystem that deletes large files quickly than a
>> filesystem that deletes lots of files quickly.
>
> The weird thing is that the files in question were hardly "large".
> IIRC his test case used a single int4 column, so the rows were probably
> 36 bytes apiece allowing for all overhead.  So the test cases with about
> 5K rows were less than 200K in the file, and the ones with 200K rows
> were still only a few megabytes.

Right.

> I tried the test on my Linux machine (which I couldn't do when I
> responded earlier because it was tied up with another test), and
> saw truncate times of a few milliseconds for both table sizes.
> This is ext3 on Fedora 6.
>
> So I'm still of the opinion that there's something broken about
> Adriaan's infrastructure, but maybe we have to look to an even
> lower level than the filesystem.  Perhaps he should try getting
> some bonnie++ benchmark numbers to see if his disk is behaving
> properly.

Well, I can hardly believe that something is broken with the infrastructure, because I have seen
the same behaviour on other hardware (unless the culprit is that I am using the standard
postgresql.conf).

I started another test. I copied an existing database (not very large, 35 tables, typically a few
hundred up to a few thousand records) with CREATE DATABASE testdb TEMPLATE mydb and started to
remove random tables from testdb with DROP TABLE and TRUNCATE TABLE. I did this with the query tool
of pgAdmin III, to exclude any doubts about my own software (which uses libpq). The hardware is an
Intel dual-core 17-inch MacBook Pro running Mac OS X 10.4.

I cannot make any sense of the results. Truncating or dropping a table typically takes 1-2 ms, or
30-70 ms, or 200-500 ms. I have seen truncating the *same* table with the *same* data take 1 ms in
one test and 532 ms in another. The database has no foreign keys.

Based on these results, I still believe there is a problem in Postgres.

Regards,

Adriaan van Os

Re: TRUNCATE TABLE

From
"Steinar H. Gunderson"
Date:
On Fri, Jul 13, 2007 at 06:17:18PM +0200, Adriaan van Os wrote:
> The hardware is an Intel dual-core 17-inch MacBook Pro running Mac
> OS X 10.4.

To isolate things, have you tried testing a different operating system?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: TRUNCATE TABLE

From
Tom Lane
Date:
Adriaan van Os <postgres@microbizz.nl> writes:
> I started another test. I copied an existing database (not very large,
> 35 tables, typically a few hundred up to a few thousand records) with
> CREATE DATABASE testdb TEMPLATE mydb and started to remove random
> tables from testdb with DROP TABLE and TRUNCATE TABLE. I did this with
> the query tool of pgAdmin III, to exclude any doubts about my own
> software (which uses libpq).

Can you try it with plain psql?  pgAdmin is a variable that wasn't
accounted for in my tests.

> The hardware is an Intel dual-core 17-inch
> MacBook Pro running Mac OS X 10.4.

Hmm.  I thought you said Fedora before.  However, I'd done a few tests
yesterday on my own Mac laptop (Al G4) and not gotten results that were
out of line with HPUX or Fedora.

Does anyone else want to try replicating these tests?

            regards, tom lane

Re: TRUNCATE TABLE

From
"Jim C. Nasby"
Date:
On Fri, Jul 13, 2007 at 12:30:46PM -0400, Tom Lane wrote:
> Adriaan van Os <postgres@microbizz.nl> writes:
> > I started another test. I copied an existing database (not very large,
> > 35 tables, typically a few hundred up to a few thousand records) with
> > CREATE DATABASE testdb TEMPLATE mydb and started to remove random
> > tables from testdb with DROP TABLE and TRUNCATE TABLE. I did this with
> > the query tool of pgAdmin III, to exclude any doubts about my own
> > software (which uses libpq).
>
> Can you try it with plain psql?  pgAdmin is a variable that wasn't
> accounted for in my tests.
>
> > The hardware is an Intel dual-core 17-inch
> > MacBook Pro running Mac OS X 10.4.
>
> Hmm.  I thought you said Fedora before.  However, I'd done a few tests
> yesterday on my own Mac laptop (Al G4) and not gotten results that were
> out of line with HPUX or Fedora.
>
> Does anyone else want to try replicating these tests?

The following is consistently between 1 and 3 ms:
decibel=# create table i as select * from generate_series(1,20000) i; drop table i;
SELECT
Time: 42.413 ms
DROP TABLE
Time: 1.415 ms
decibel=# select version();
                                                           version
------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.3devel on i386-apple-darwin8.10.1, compiled by GCC i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5363)
(1 row)

Time: 46.870 ms
decibel=# \! uname -a
Darwin platter.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
decibel=#

Truncate is a different story... this is consistently either 6-something or 17-something ms:

decibel=# insert into i select generate_series(1,20000); truncate i;
INSERT 0 20000
Time: 600.940 ms
TRUNCATE TABLE
Time: 6.313 ms
decibel=#

This is on a 17" MBP, fsync turned on.
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: TRUNCATE TABLE

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Adriaan van Os <postgres@microbizz.nl> writes:
> > I started another test. I copied an existing database (not very large,
> > 35 tables, typically a few hundred up to a few thousand records) with
> > CREATE DATABASE testdb TEMPLATE mydb and started to remove random
> > tables from testdb with DROP TABLE and TRUNCATE TABLE. I did this with
> > the query tool of pgAdmin III, to exclude any doubts about my own
> > software (which uses libpq).
>
> Can you try it with plain psql?  pgAdmin is a variable that wasn't
> accounted for in my tests.
>
> > The hardware is an Intel dual-core 17-inch
> > MacBook Pro running Mac OS X 10.4.
>
> Hmm.  I thought you said Fedora before.  However, I'd done a few tests
> yesterday on my own Mac laptop (Al G4) and not gotten results that were
> out of line with HPUX or Fedora.
>
> Does anyone else want to try replicating these tests?

I notice that the times are sometimes different when the table is TEMP.
DROP TABLE times are sometimes in the vicinity of 13ms and at other
times 200ms.  My test is

vacuum pg_class; vacuum pg_type; vacuum pg_attribute;
create temp table van_os (a int);
insert into van_os select * from generate_series(1, 200000); drop table van_os;

passed as a single line to psql (no -c).

Times are closer to 2ms when the table has only 5000 tuples.




Doing this
insert into van_os select * from generate_series(1, 200000); truncate van_os;

I get about 200ms on the truncate step.

Whereas if I do this
insert into van_os select * from generate_series(1, 5000); truncate van_os;
times are closer to 8-13 ms.

I guess the difference is the amount of data that ext3 is logging in its journal. My ext3 journal
settings are default.

--
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"What difference does it make to the dead, the orphans, and the homeless, whether the mad
destruction is wrought under the name of totalitarianism or the holy name of liberty and
democracy?" (Gandhi)

Re: TRUNCATE TABLE

From
"Pavel Stehule"
Date:
Hello,

I tested the speed difference between TRUNCATE TABLE and DROP TABLE
(tested on my notebook, ext3, Linux Fedora 7):

CREATE OR REPLACE FUNCTION test01() RETURNS SETOF double precision
AS $$
DECLARE t1 timestamp with time zone;
BEGIN
  CREATE TEMP TABLE foo(a integer);
  FOR i IN 1..1000 LOOP
    INSERT INTO foo SELECT 1 FROM generate_series(1,10000);
    t1 := clock_timestamp();
    TRUNCATE TABLE foo;
    RETURN NEXT EXTRACT('ms' FROM clock_timestamp()-t1);
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION test02() RETURNS SETOF double precision
AS $$
DECLARE t1 timestamp with time zone;
BEGIN
  FOR i IN 1..1000 LOOP
    EXECUTE 'CREATE TEMP TABLE foo(a integer);';
    EXECUTE 'INSERT INTO foo SELECT 1 FROM generate_series(1,10000);';
    t1 := clock_timestamp();
    EXECUTE 'DROP TABLE foo;';
    RETURN NEXT EXTRACT('ms' FROM clock_timestamp()-t1);
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;

vacuum pg_class; vacuum pg_type; vacuum pg_attribute;

postgres=# select count(*), min(t), max(t), avg(t), stddev_samp(t),
stddev_pop(t) from test01() t(t);
 count |  min  |   max   |   avg    |   stddev_samp    |    stddev_pop
-------+-------+---------+----------+------------------+------------------
  1000 | 0.295 | 803.971 | 3.032483 | 30.0036729610037 | 29.9886673721876
(1 row)

Time: 33826,841 ms
postgres=# select count(*), min(t), max(t), avg(t), stddev_samp(t),
stddev_pop(t) from test02() t(t);
 count |  min  |  max   |   avg    |   stddev_samp    |    stddev_pop
-------+-------+--------+----------+------------------+-------------------
  1000 | 0.418 | 20.792 | 0.619168 | 0.81550718804297 | 0.815099332459549
(1 row)

Time: 33568,818 ms

It's true, stddev_samp(TRUNCATE) >> stddev_samp(DROP)
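
The shape of those TRUNCATE numbers (min 0.3 ms, mean 3 ms, stddev 30 ms, max 804 ms) is the
classic signature of a mostly-fast operation with a handful of extreme outliers. A synthetic
illustration in Python (the 0.5/400/800 ms values are made up to mimic the distribution, not taken
from the test):

```python
import statistics

# 998 fast truncates at 0.5 ms plus two pathological ones, mimicking
# the shape of the measured distribution.
timings = [0.5] * 998 + [400.0, 800.0]

mean = statistics.mean(timings)
stdev = statistics.stdev(timings)    # sample stddev, like stddev_samp()
median = statistics.median(timings)

print("median=%.3f mean=%.3f stddev=%.3f max=%.3f"
      % (median, mean, stdev, max(timings)))
```

Two outliers out of a thousand are enough to push the stddev an order of magnitude above the mean
while the median stays at the fast value, which is why the average alone is misleading here.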

Regards
Pavel Stehule

Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Tom Lane wrote:
> Adriaan van Os <postgres@microbizz.nl> writes:
>> I started another test. I copied an existing database (not very large,
>> 35 tables, typically a few hundred up to a few thousand records) with
>> CREATE DATABASE testdb TEMPLATE mydb and started to remove random
>> tables from testdb with DROP TABLE and TRUNCATE TABLE. I did this with
>> the query tool of pgAdmin III, to exclude any doubts about my own
>> software (which uses libpq).
>
> Can you try it with plain psql?  pgAdmin is a variable that wasn't
> accounted for in my tests.

Will do that and report the results.

>> The hardware is an Intel dual-core 17-inch
>> MacBook Pro running Mac OS X 10.4.
>
> Hmm.  I thought you said Fedora before.

Yes, the test that I mentioned yesterday was on Fedora, but as you were "of the opinion that
there's something broken about Adriaan's infrastructure", I tried the new test on a completely
different system today.

> However, I'd done a few tests
> yesterday on my own Mac laptop (Al G4) and not gotten results that were
> out of line with HPUX or Fedora.
>
> Does anyone else want to try replicating these tests?

Thanks,

Adriaan van Os

Re: TRUNCATE TABLE

From
"Jim C. Nasby"
Date:
On Fri, Jul 13, 2007 at 09:12:34PM +0200, Pavel Stehule wrote:
> Hello,
>
> I tested speed difference between TRUNCATE TABLE and  DROP TABLE
> (tested on my notebook ext3 and Linux fedora 7):
>
> CREATE OR REPLACE FUNCTION test01() RETURNS SETOF double precision
> AS $$
> DECLARE t1 timestamp with time zone;
> BEGIN
>  CREATE TEMP TABLE foo(a integer);
>  FOR i IN 1..1000 LOOP
>    INSERT INTO foo SELECT 1 FROM generate_series(1,10000);
>    t1 := clock_timestamp();
>    TRUNCATE TABLE foo;
>    RETURN NEXT EXTRACT('ms' FROM clock_timestamp()-t1);
>  END LOOP;
>  RETURN;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE OR REPLACE FUNCTION test02() RETURNS SETOF double precision
> AS $$
> DECLARE t1 timestamp with time zone;
> BEGIN
>  FOR i IN 1..1000 LOOP
>    EXECUTE 'CREATE TEMP TABLE foo(a integer);';
>    EXECUTE 'INSERT INTO foo SELECT 1 FROM generate_series(1,10000);';
>    t1 := clock_timestamp();
>    EXECUTE 'DROP TABLE foo;';
>    RETURN NEXT EXTRACT('ms' FROM clock_timestamp()-t1);
>  END LOOP;
>  RETURN;
> END;
> $$ LANGUAGE plpgsql;

Are you sure you can ignore the added cost of an EXECUTE? I tried the following as a test, but my
repeatability sucks... :/

CREATE OR REPLACE FUNCTION test02() RETURNS SETOF double precision AS $$
DECLARE t1 timestamp with time zone;
BEGIN
  CREATE TEMP TABLE foo(a integer);
  FOR i IN 1..1000 LOOP
    EXECUTE 'INSERT INTO foo SELECT 1 FROM generate_series(1,10000)';
    t1 := clock_timestamp();
    EXECUTE 'TRUNCATE TABLE foo';
    RETURN NEXT EXTRACT('ms' FROM clock_timestamp()-t1);
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;

decibel=# drop table foo; select count(*), min(t), max(t), avg(t), stddev_samp(t), stddev_pop(t) from test01() t(t);
drop table foo; select count(*), min(t), max(t), avg(t), stddev_samp(t), stddev_pop(t) from test03() t(t);
drop table foo; select count(*), min(t), max(t), avg(t), stddev_samp(t), stddev_pop(t) from test01() t(t);
drop table foo; select count(*), min(t), max(t), avg(t), stddev_samp(t), stddev_pop(t) from test03() t(t);
drop table foo; select count(*), min(t), max(t), avg(t), stddev_samp(t), stddev_pop(t) from test01() t(t);
drop table foo; select count(*), min(t), max(t), avg(t), stddev_samp(t), stddev_pop(t) from test03() t(t);
ERROR:  table "foo" does not exist
 count |  min  |   max    |   avg    |   stddev_samp    |    stddev_pop
-------+-------+----------+----------+------------------+------------------
  1000 | 0.533 | 1405.747 | 3.444874 | 44.4166419484871 | 44.3944280726548
(1 row)

Time: 44945.101 ms
DROP TABLE
Time: 11.204 ms
 count |  min  |   max    |   avg    |   stddev_samp    |    stddev_pop
-------+-------+----------+----------+------------------+------------------
  1000 | 0.446 | 1300.168 | 7.611269 | 79.7606049935278 | 79.7207147159672
(1 row)

Time: 44955.870 ms
DROP TABLE
Time: 148.186 ms
 count | min  |  max   |   avg    |   stddev_samp   |    stddev_pop
-------+------+--------+----------+-----------------+------------------
  1000 | 0.46 | 21.585 | 1.991845 | 1.2259573313755 | 1.22534419938848
(1 row)

Time: 47566.985 ms
DROP TABLE
Time: 5.065 ms
 count |  min  |   max    |   avg    |   stddev_samp    |    stddev_pop
-------+-------+----------+----------+------------------+------------------
  1000 | 0.479 | 1907.865 | 5.368207 | 73.8576562901696 | 73.8207182251985
(1 row)

Time: 48681.777 ms
DROP TABLE
Time: 7.863 ms
 count |  min  |   max    |   avg    |   stddev_samp   |   stddev_pop
-------+-------+----------+----------+-----------------+-----------------
  1000 | 0.562 | 1009.578 | 2.998867 | 31.874023877249 | 31.858082879064
(1 row)

Time: 37426.441 ms
DROP TABLE
Time: 4.935 ms
 count | min  |  max   |   avg    |   stddev_samp    |    stddev_pop
-------+------+--------+----------+------------------+------------------
  1000 | 0.42 | 20.721 | 2.064845 | 1.24241007069275 | 1.24178871027844
(1 row)

Time: 47906.628 ms
decibel=#
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: TRUNCATE TABLE

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Fri, Jul 13, 2007 at 09:12:34PM +0200, Pavel Stehule wrote:
>> I tested speed difference between TRUNCATE TABLE and  DROP TABLE
>> (tested on my notebook ext3 and Linux fedora 7):

> Are you sure you can ignore the added cost of an EXECUTE? I tried the
> following as a test, but my repeatability sucks... :/

The repeatability was sucky for me too, until I turned off autovacuum.
I am not sure why autovac is interfering with truncate more than with
create/drop, but that seems to be what's happening.  Note that all the
tables involved are temp, so autovac really shouldn't be touching them
at all, so this is a bit surprising.

[ investigates awhile... ] There is a fairly serious mistake in Pavel's
test script, which is that it is testing 1000 iterations *within a
single transaction*.  We do not drop removable tables until end of
transaction, and that means that the actual filesystem effects of drop
or truncate are not being timed by the script.  The part that he's
really timing is:

* for DROP: mark a bunch of system catalog tuples as deleted

* for TRUNCATE: create one new, empty disk file, then mark one pg_class
tuple as deleted and insert a replacement one.

Thus the timing issue (at least as exhibited by this script) has nothing
whatever to do with the time to delete a file, but with the time to
create one.  Since the part of DROP being timed has probably got no I/O
involved at all (the tuples being touched are almost surely still in
shared buffers), it's unsurprising that it is consistently fast.

I tried strace -T on the backend while running the TRUNCATE script, and
got a smoking gun: most of the open(O_CREAT) calls take only 130 to 150
microseconds, but the tail of the distribution is awful:

0.000186
0.000187
0.000188
0.000190
0.000193
0.000194
0.000204
0.000208
0.000235
0.000265
0.000274
0.000289
0.000357
0.000387
0.000410
0.000434
0.000435
0.000488
0.000563
0.065674
0.583236

Somehow, autovac is doing something that makes the filesystem go nuts
every so often, and take an astonishingly long time to create an empty
file.  But autovac itself doesn't create or delete any files, so what's
up here?
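
Feeding the quoted strace figures into a quick calculation makes the skew concrete (these are
already the *slowest* 21 calls of the whole run, so the full distribution is even more lopsided):

```python
# open(O_CREAT) latencies (seconds) from the strace tail quoted above
lat = sorted([
    0.000186, 0.000187, 0.000188, 0.000190, 0.000193, 0.000194,
    0.000204, 0.000208, 0.000235, 0.000265, 0.000274, 0.000289,
    0.000357, 0.000387, 0.000410, 0.000434, 0.000435, 0.000488,
    0.000563, 0.065674, 0.583236,
])

typical = lat[len(lat) // 2]    # median of this tail
worst = lat[-1]

print("median of tail: %.3f ms" % (typical * 1000))
print("worst case:     %.1f ms, %.0fx the median" % (worst * 1000, worst / typical))
```

Even within this tail, the worst file creation is more than three orders of magnitude slower than
the typical one.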

Also, I was able to reproduce the variability in timing on HPUX and
Darwin as well as Linux, so we can't put all the blame on ext3.
(I didn't drill down to the strace level on the other two machines,
though, so it's possible that there is a different mechanism at work
there.)

            regards, tom lane

Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Tom Lane wrote:

> Thus the timing issue (at least as exhibited by this script) has nothing
> whatever to do with the time to delete a file, but with the time to
> create one.  Since the part of DROP being timed has probably got no I/O
> involved at all (the tuples being touched are almost surely still in
> shared buffers), it's unsurprising that it is consistently fast.

In my original profiling, CREATE TEMPORARY TABLE/DROP TABLE wasn't much faster than TRUNCATE TABLE.
When I try it again now, I see that DROP TABLE is consistently fast, while the timings of CREATE
TEMPORARY TABLE vary as much as those of TRUNCATE TABLE. Your observations on the time needed to
open a file confirm that, I think.

In my test databases, autovacuum is off.

Regards,

Adriaan van Os

Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Tom Lane wrote:

> Somehow, autovac is doing something that makes the filesystem go nuts
> every so often, and take an astonishingly long time to create an empty
> file.  But autovac itself doesn't create or delete any files, so what's
> up here?
>
> Also, I was able to reproduce the variability in timing on HPUX and
> Darwin as well as Linux, so we can't put all the blame on ext3.
> (I didn't drill down to the strace level on the other two machines,
> though, so it's possible that there is a different mechanism at work
> there.)

Any news since this message? Should I file a bug report?

Regards,

Adriaan van Os

Re: TRUNCATE TABLE

From
"Kevin Grittner"
Date:
>>> On Mon, Jul 16, 2007 at  7:18 PM, in message <25418.1184631498@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Somehow, autovac is doing something that makes the filesystem go nuts
> every so often, and take an astonishingly long time to create an empty
> file.  But autovac itself doesn't create or delete any files, so what's
> up here?

Have you ruled out checkpoints as the culprit?

-Kevin




Re: TRUNCATE TABLE

From
Alvaro Herrera
Date:
Adriaan van Os wrote:
> Tom Lane wrote:
>
>> Somehow, autovac is doing something that makes the filesystem go nuts
>> every so often, and take an astonishingly long time to create an empty
>> file.  But autovac itself doesn't create or delete any files, so what's
>> up here?
>> Also, I was able to reproduce the variability in timing on HPUX and
>> Darwin as well as Linux, so we can't put all the blame on ext3.
>> (I didn't drill down to the strace level on the other two machines,
>> though, so it's possible that there is a different mechanism at work
>> there.)
>
> Any news since this message ? Should I file a bug report ?

Were you able to show that turning off autovacuum removes the
performance problem?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: TRUNCATE TABLE

From
Adriaan van Os
Date:
Kevin Grittner wrote:
>>>> On Mon, Jul 16, 2007 at  7:18 PM, in message <25418.1184631498@sss.pgh.pa.us>,
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Somehow, autovac is doing something that makes the filesystem go nuts
>> every so often, and take an astonishingly long time to create an empty
>> file.  But autovac itself doesn't create or delete any files, so what's
>> up here?
>
> Have you ruled out checkpoints as the culprit?

That's a good question. I will do some more tests, but I also suspect fsync "cascading"
<http://www.uwsg.iu.edu/hypermail/linux/kernel/0708.0/1435.html>.

Regards,

Adriaan van Os


Re: TRUNCATE TABLE

From
Decibel!
Date:
On Sat, Aug 04, 2007 at 11:39:31PM +0200, Adriaan van Os wrote:
> Kevin Grittner wrote:
> >>>>On Mon, Jul 16, 2007 at  7:18 PM, in message
> >>>><25418.1184631498@sss.pgh.pa.us>,
> >Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>Somehow, autovac is doing something that makes the filesystem go nuts
> >>every so often, and take an astonishingly long time to create an empty
> >>file.  But autovac itself doesn't create or delete any files, so what's
> >>up here?
> >
> >Have you ruled out checkpoints as the culprit?
>
> That's a good question. I will do some more tests, but I also suspect fsync
> "cascading"
> <http://www.uwsg.iu.edu/hypermail/linux/kernel/0708.0/1435.html>.

Interesting. I'm guessing that ext3 has to sync out the entire journal
up to the point in time that fsync() is called, regardless of what
files/information the journal contains. Fortunately I think it's common
knowledge to mount PostgreSQL filesystems with data=writeback, which
hopefully eliminates much of that bottleneck... but if you don't do
noatime you're probably still spewing a lot out to the drive.
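
For reference, data=writeback and noatime are ext3 mount options. A hypothetical /etc/fstab entry
for a dedicated PostgreSQL data partition might look like this (the device and mount point are
made-up examples):

```
/dev/sdb1  /var/lib/pgsql  ext3  noatime,data=writeback  0  2
```

Note that data=writeback journals only metadata and so weakens ordering guarantees for file data;
PostgreSQL tolerates that because it fsyncs its WAL explicitly, but other applications sharing the
filesystem may not.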
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: TRUNCATE TABLE

From
Tom Lane
Date:
Decibel! <decibel@decibel.org> writes:
> Interesting. I'm guessing that ext3 has to sync out the entire journal
> up to the point in time that fsync() is called, regardless of what
> files/information the journal contains. Fortunately I think it's common
> knowledge to mount PostgreSQL filesystems with data=writeback, which
> hopefully eliminates much of that bottleneck... but if you don't do
> noatime you're probably still spewing a lot out to the drive.

FWIW, I tried to test the above by running Pavel's script on an ext3
partition mounted noatime,data=writeback.  This didn't seem to make any
difference --- still very large deviations in the time to do a TRUNCATE.
However the problem seems harder to reproduce now than it was three weeks
ago.  In the meantime I installed a 2.6.22-based kernel instead of the
2.6.20 one that Fedora was using before; I wonder whether the kernel
guys tweaked something related ...

            regards, tom lane