Thread: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

Hello,

My question below is almost an exact copy of the one on SO: http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way

The post on SO drew a few answers, all of them as one stating "DO ONLY TRUNCATION - this is the fastest".

Also, I think my question was somewhat misunderstood there. I would greatly appreciate it if you could take a look at it, as people I can trust on performance questions.

Here goes the SO subject, formulating the exact task I want to accomplish: this procedure is intended to run before or after each test, to ensure the database is clean enough and that the unique identifier columns are reset (the id of the first User should be 1, not the number left over from the previous test in the suite). Here goes the message:

==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones ====

I wonder what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions possible.

I have found such a solution for MySQL; it performs much faster than just truncating the tables one by one. But I am interested in the fastest solutions for MySQL too. See my results here (they are for MySQL only, of course): https://github.com/bmabey/database_cleaner/issues/126

I have the following assumptions:

    I have 30-100 tables. Let them be 30.

    Half of the tables are empty.

    Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large.

    I need the option to exclude 2 or 5 or N tables from this procedure.

    I cannot! use transactions.

I need the fastest cleaning strategy for such a case, working on both PostgreSQL 8 and 9.

I see the following approaches:

1) Truncate each table. It is too slow, I think, especially for empty tables.

2) Check each table for emptiness by a faster method, and if it is empty, reset its unique identifier column (the analog of AUTO_INCREMENT in MySQL) to its initial state (1), i.e. restore the last_value of its sequence (the same AUTO_INCREMENT analog) back to 1; otherwise run TRUNCATE on it.

I use Ruby code to iterate through all the tables, calling the code below on each of them. I tried to set up SQL code to run against each table like this:

DO $$DECLARE r record;
BEGIN
  somehow_captured = SELECT last_value from #{table}_id_seq
  IF (somehow_captured == 1) THEN
    == restore initial unique identifier column value here ==
  END

  IF (somehow_captured > 1) THEN
    TRUNCATE TABLE #{table};
  END IF;
END$$;

However I manipulated this code, I could not make it work, because I am unfamiliar with PostgreSQL functions and blocks (and variables).

My guess was also that EXISTS(SELECT something FROM table) could somehow serve as one of the "check" units the cleaning procedure should consist of, but I have not accomplished that either.

I would appreciate any hints on how this procedure could be accomplished in a PostgreSQL-native way.

Thanks!

UPDATE:

I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or clean up after itself (so-called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers; in my case the switch to the truncation strategy is needed. Now that I have mentioned RoR: please do not post answers like "Obviously, you need DatabaseCleaner for PG" and so on.

==== post ends ====

Thanks,

Stanislaw.
On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:
>     I cannot! use transactions.
Everything in PostgreSQL uses transactions, they are not optional.

I'm assuming you mean you can't use explicit transaction demarcation, ie BEGIN and COMMIT.

> I need the fastest cleaning strategy for such a case, working on both PostgreSQL 8 and 9.
Just so you know, there isn't really any "PostgreSQL 8" or "PostgreSQL 9". Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major versions. This is different to most software and IMO pretty damn annoying, but that's how it is.


> 1) Truncate each table. It is too slow, I think, especially for empty tables.
Really?!? TRUNCATE should be extremely fast, especially on empty tables.

You're aware that you can TRUNCATE many tables in one run, right?

TRUNCATE TABLE a, b, c, d, e, f, g;


> 2) Check each table for emptiness by a faster method, and if it is empty, reset its unique identifier column (the analog of AUTO_INCREMENT in MySQL) to its initial state (1), i.e. restore the last_value of its sequence (the same AUTO_INCREMENT analog) back to 1; otherwise run TRUNCATE on it.
You can examine the value with SELECT last_value FROM the_sequence; that's the equivalent of the MySQL hack you're using. To set it, use setval(...).

http://www.postgresql.org/docs/9.1/static/functions-sequence.html
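
For example (untested; the sequence name is illustrative):

SELECT last_value FROM users_id_seq;
SELECT setval('users_id_seq', 1, false);  -- the next nextval() will return 1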

> I use Ruby code to iterate through all tables

If you want to be fast, get rid of iteration. Do it all in one query or a couple of simple queries. Minimize the number of round-trips and queries.

I'll be truly stunned if the fastest way isn't to just TRUNCATE all the target tables in a single statement (not iteratively one by one with separate TRUNCATEs).

--
Craig Ringer
On 07/06/2012 07:29 PM, Craig Ringer wrote:
> I'll be truly stunned if the fastest way isn't to just TRUNCATE all the
> target tables in a single statement (not iteratively one by one with
> separate TRUNCATEs).

Oh, also, you can setval(...) a bunch of sequences at once:

SELECT
  setval('first_seq', 1, false),
  setval('second_seq', 1, false),
  setval('third_seq', 1, false),
  setval('fourth_seq', 1, false);

... etc. You should only need two statements, fast ones, to reset your DB to the default state.
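
If you don't want to list the sequences by hand, something like this
(untested; string_agg needs 9.0+) can generate that statement for every
sequence in the public schema:

SELECT 'SELECT ' ||
       string_agg('setval(' || quote_literal(c.oid::regclass::text)
                            || ', 1, false)', ', ')
       || ';'
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'S' AND n.nspname = 'public';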

--
Craig Ringer
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> 1) Truncate each table. It is too slow, I think, especially for empty
> tables.
>
> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>
> You're aware that you can TRUNCATE many tables in one run, right?
>
> TRUNCATE TABLE a, b, c, d, e, f, g;

I have seen "trivial" cases -- in terms of data size -- where
TRUNCATE is much slower than a full-table DELETE.  The most common use
case for that is rapid setup/teardown of tests, where it can add up
quite quickly and in a very big way. This is probably an artifact of the
speed of one's file system at truncating and/or unlinking everything.

I haven't tried a multi-truncate though.  Still, I don't know a
mechanism besides slow file system truncation time that would explain
why DELETE would be significantly faster.

--
fdr

On 07/06/2012 07:38 PM, Daniel Farina wrote:
> On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> 1) Truncate each table. It is too slow, I think, especially for empty
>> tables.
>>
>> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>>
>> You're aware that you can TRUNCATE many tables in one run, right?
>>
>> TRUNCATE TABLE a, b, c, d, e, f, g;
> I have seen in "trivial" cases -- in terms of data size -- where
> TRUNCATE is much slower than a full-table DELETE.  The most common use
> case for that is rapid setup/teardown of tests, where it can add up
> quite quickly and in a very big way. This is probably an artifact of the
> speed of one's file system at truncating and/or unlinking everything.
That makes some sense, actually. DELETEing from a table that has no
foreign keys, triggers, etc while nothing else is accessing the table is
fairly cheap and doesn't take much (any?) cleanup work afterwards. For
tiny deletes I can easily see it being better than forcing the OS to
journal a metadata change or two and a couple of fsync()s for a truncate.

I suspect truncating many tables at once will prove a win over
iteratively DELETEing from many tables at once. I'd benchmark it except
that it's optimizing something I don't care about at all, and the
results would be massively dependent on the file system (ext3, ext4,
xfs) and its journal configuration.

--
Craig Ringer

Stanislaw Pankevich wrote:
> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each
> non-empty table and reset unique identifier column of empty ones ====
> 
> I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in
> the fastest solutions ever possible.

> I have following assumptions:
> 
>     I have 30-100 tables. Let them be 30.
> 
>     Half of the tables are empty.
> 
>     Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large.
> 
>     I need an optional possibility to exclude 2 or 5 or N tables from this procedure.
> 
>     I cannot! use transactions.

Why? That would definitely speed up everything.

> I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.
> 
> I see the following approaches:
> 
> 1) Truncate each table. It is too slow, I think, especially for empty tables.

Did you actually try it? That's the king's way to performance questions!
Truncating a single table is done in a matter of microseconds, particularly
if it is not big.
Do you have tens of thousands of tables?

> 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique
> identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its
> last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.

That seems fragile and won't work everywhere.

What if the table has no primary key with a DEFAULT that uses a sequence?
What if it has such a key, but the DEFAULT was not used for an INSERT?
What if somebody manually reset the sequence?

Besides, how do you find out what the sequence for a table's primary key
is? With a SELECT, I guess. That SELECT is probably not faster than
a simple TRUNCATE.

> Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one
> of the "check procedure" units, cleaning procedure should consist of, but haven't accomplished it too.

You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
think that this will be considerably faster than just truncating the table.

> I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way.
> 
> Thanks!
> 
> UPDATE:
> 
> I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should
> have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are
> very good, but they become unusable when running tests against particular webdrivers, in my case the
> switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post
> here the answers about "Obviously, you need DatabaseCleaner for PG" and so on and so on.

I completely fail to understand what you are talking about here.

Yours,
Laurenz Albe

On Friday, July 06, 2012 01:38:56 PM Daniel Farina wrote:
> Still, I don't know a
> mechanism besides slow file system truncation time that would explain
> why DELETE would be significantly faster.
There is no filesystem truncation happening. The heap and the indexes get
mapped into a new file. Otherwise rollback would be pretty hard to implement.

I guess the biggest cost in a bigger cluster is dropping the buffers that
were formerly mapped to that relation (DropRelFileNodeBuffers).

Andres
--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote:

> Question: Is there a possibility in PostgreSQL to do DELETE on many
> tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...?

Yes, you can do it with a writable common table expression, but you
wanted version portability.

WITH
   discard1 AS (DELETE FROM test1),
   discard2 AS (DELETE FROM test2 AS b)
SELECT 1;

Not only will this not work in older versions (IIRC it only works with
9.1, maybe 9.0 too but I don't see it in the documentation for SELECT
for 9.0) but I find it hard to imagine any performance benefit over
simply sending

   DELETE FROM test1; DELETE FROM test2;

This all smells like premature optimisation of cases that don't matter.
What problem are you solving with this?

--
Craig Ringer

Stanislaw Pankevich wrote:
>>> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each
>>> non-empty table and reset unique identifier column of empty ones ====
>>>
>>> I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in
>>> the fastest solutions ever possible.

>>> I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.
>>>
>>> I see the following approaches:
>>>
>>> 1) Truncate each table. It is too slow, I think, especially for empty tables.

>> Did you actually try it? That's the king's way to performance questions!
>> Truncating a single table is done in a matter of microseconds, particularly
>> if it is not big.
>> Do you have tens of thousands of tables?

> Actually, 10-100 tables.

>> You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
>> think that this will be considerably faster than just truncating the table.
> 
> Exactly this query is much faster, believe me. You can see my latest
> results on https://github.com/stanislaw/truncate-vs-count.

Ok, I believe you.

My quick tests showed that a single truncate (including the transaction
and the client-server round trip via UNIX sockets) takes some 10 to 30
milliseconds.

Multiply that by 100, and you end up with just a few seconds at most.
Or what did you measure?

I guess you run that deletion very often, so that it becomes painful.

Still I think that the biggest performance gain is to be had by using
PostgreSQL's features (truncate several tables in one statement, ...).

Try to bend your Ruby framework!

Yours,
Laurenz Albe




Stanislaw Pankevich wrote:
>>> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each
>>> non-empty table and reset unique identifier column of empty ones ====

Hello,

2 'exotic' ideas:

- use dblink_send_query to do the job in multiple threads (I doubt this really could be faster)
- have prepared empty tables in a separate schema, and a "garbage schema":

   ALTER TABLE x set schema garbage;
   ALTER TABLE prepared.x set schema "current";

you should be ready for the next test,

but you still have to clean the garbage schema and move the tables back to prepared for the next test but one, in the background...

best regards,

Marc Mamin






On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

> > 1) Truncate each table. It is too slow, I think, especially for empty
> > tables.
>
> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>
> You're aware that you can TRUNCATE many tables in one run, right?
>
> TRUNCATE TABLE a, b, c, d, e, f, g;

This still calls DropRelFileNodeAllBuffers once for each table (and
each index), even if the table is empty.

With large shared_buffers, this can be relatively slow.

Cheers,

Jeff

On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:
>
> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) -
> the fastest way to clean each non-empty table and reset unique
> identifier column of empty ones ====
>
> I wonder, what is the fastest way to accomplish this kind of task in
> PostgreSQL. I am interested in the fastest solutions ever possible.
>
It would help if we really understood your use-case. If you want to
fully reset your database to a known starting state for test runs, why
not just have a base database initialized exactly as you wish, say
"test_base", then just drop your test database and create the new
database from your template:
drop database test;
create database test template test_base;

This should be very fast but it won't allow you to exclude individual
tables.

Are you interested in absolute fastest as a mind-game or is there a
specific use requirement, i.e. how fast is fast enough? This is the
basic starting point for tuning, hardware selection, etc.

Truncate should be extremely fast but on tables that are as tiny as
yours the difference may not be visible to an end-user. I just tried a
"delete from" to empty a 10,000 record table and it took 14 milliseconds
so you could do your maximum of 100 tables each containing 10-times your
max number of records in less than two seconds.

Regardless of the method you choose, you need to be sure that nobody is
accessing the database when you reset it. The drop/create database
method will, of course, require and enforce that. Truncate requires an
exclusive lock so it may appear to be very slow if it is waiting to get
that lock. And even if you don't have locking issues, your reluctance to
wrap your reset code in transactions means that a client could be
updating some table or tables whenever the reset script isn't actively
working on that same table, leading to unexplained weird test results.

Cheers,
Steve


Daniel Farina wrote:
>
> On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> 1) Truncate each table. It is too slow, I think, especially for empty
>> tables.
>>
>> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>>
>> You're aware that you can TRUNCATE many tables in one run, right?
>>
>> TRUNCATE TABLE a, b, c, d, e, f, g;
>
> I have seen in "trivial" cases -- in terms of data size -- where
> TRUNCATE is much slower than a full-table DELETE.  The most common use
> case for that is rapid setup/teardown of tests, where it can add up
> quite quickly and in a very big way. This is probably an artifact of the
> speed of one's file system at truncating and/or unlinking everything.
>
> I haven't tried a multi-truncate though.  Still, I don't know a
> mechanism besides slow file system truncation time that would explain
> why DELETE would be significantly faster.
>
> --
> fdr

That's my experience - I have a set of regression tests that clean the
database (deletes everything from a single parent table and lets the
referential integrity checks cascade to delete five other tables) at the end
of each test run, and it can complete 90 tests (including 90 mass deletes)
in a little over five seconds. If I replace that simple delete with a
truncation of all six tables at once, my test run balloons to 42 seconds.

I run my development database with synchronous_commit = off, though, so I
guess TRUNCATE has to hit the disk while the mass delete doesn't.
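
(It can also be turned off for just the test connection, e.g. with
SET synchronous_commit = off; since it is an ordinary per-session setting.)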


Thanks for the answer.

Please, see my answers below:

On Fri, Jul 6, 2012 at 2:35 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/06/2012 07:29 PM, Craig Ringer wrote:
>
> On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:
>
>     I cannot! use transactions.
>
> Everything in PostgreSQL uses transactions, they are not optional.
>
> I'm assuming you mean you can't use explicit transaction demarcation, ie
> BEGIN and COMMIT.

Yes, right!

>  need the fastest cleaning strategy for such case working on PostgreSQL both
> 8 and 9.

> Just so you know, there isn't really any "PostgreSQL 8" or "PostgreSQL 9".
> Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major
> versions. This is different to most software and IMO pretty damn annoying,
> but that's how it is.

Yes, right! By this I meant "queries as universal across different
versions as possible".

>
> 1) Truncate each table. It is too slow, I think, especially for empty
> tables.
>
> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>
> You're aware that you can TRUNCATE many tables in one run, right?
>
> TRUNCATE TABLE a, b, c, d, e, f, g;

YES, I know it ;) and I use this option!

> 2) Check each table for emptiness by more faster method, and then if it is
> empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL)
> to initial state (1), i.e to restore its last_value from sequence (the same
> AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.
>
> You can examine the value of SELECT last_value FROM the_sequence ;

I tried using last_value, but somehow it was equal to 1 both for a table
with 0 rows and for a table with 1 row, and began to increment only when
rows > 1! This seemed very strange to me, but I confirmed it works this
way by running my test script many times. Because of this, I am using
SELECT currval.

> that's
> the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'.
>
> http://www.postgresql.org/docs/9.1/static/functions-sequence.html
>
> I use Ruby code to iterate through all tables
>
>
> If you want to be fast, get rid of iteration. Do it all in one query or a
> couple of simple queries. Minimize the number of round-trips and queries.
>
> I'll be truly stunned if the fastest way isn't to just TRUNCATE all the
> target tables in a single statement (not iteratively one by one with
> separate TRUNCATEs).
>
>
> Oh, also, you can setval(...) a bunch of sequences at once:
>
> SELECT
>   setval('first_seq', 1, false),
>   setval('second_seq', 1, false),
>   setval('third_seq', 1, false),
>   setval('fourth_seq', 1, false);
> ... etc. You should only need two statements, fast ones, to reset your DB to
> the default state.

Good idea!

Could you please look at my latest results at
https://github.com/stanislaw/truncate-vs-count? I think they are
great for a test-oriented context.

In the slower way of resetting ids, I do SELECT currval('#{table}_id_seq');
and then check whether it raises an error or returns > 0.

In the faster way, just checking for the presence of rows, for each table I do:
at_least_one_row = execute(<<-TR
  SELECT true FROM #{table} LIMIT 1;
TR
)

If there is at least one row, I add this table to the list of
tables_to_truncate.
Finally I run a single multi-table truncate: TRUNCATE tables_to_truncate;
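
For illustration, the per-table checks could even be collapsed into a
single round trip with something like this (untested; the table names are
just examples):

SELECT 'users' AS non_empty_table WHERE EXISTS (SELECT 1 FROM users)
UNION ALL
SELECT 'posts' WHERE EXISTS (SELECT 1 FROM posts)
UNION ALL
SELECT 'comments' WHERE EXISTS (SELECT 1 FROM comments);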

Thanks,
Stanislaw.

On Fri, Jul 6, 2012 at 4:38 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/06/2012 07:38 PM, Daniel Farina wrote:
>>
>> On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer <ringerc@ringerc.id.au>
>> wrote:
>>>
>>> 1) Truncate each table. It is too slow, I think, especially for empty
>>> tables.
>>>
>>> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>>>
>>> You're aware that you can TRUNCATE many tables in one run, right?
>>>
>>> TRUNCATE TABLE a, b, c, d, e, f, g;
>>
>> I have seen in "trivial" cases -- in terms of data size -- where
>> TRUNCATE is much slower than a full-table DELETE.  The most common use
>> case for that is rapid setup/teardown of tests, where it can add up
>> quite quickly and in a very big way. This is probably an artifact of the
>> speed of one's file system at truncating and/or unlinking everything.
>
> That makes some sense, actually. DELETEing from a table that has no foreign
> keys, triggers, etc while nothing else is accessing the table is fairly
> cheap and doesn't take much (any?) cleanup work afterwards. For tiny deletes
> I can easily see it being better than forcing the OS to journal a metadata
> change or two and a couple of fsync()s for a truncate.
>
> I suspect truncating many tables at once will prove a win over iteratively
> DELETEing from many tables at once. I'd benchmark it except that it's
> optimizing something I don't care about at all, and the results would be
> massively dependent on the file system (ext3, ext4, xfs) and its journal
> configuration.

Question:
Is there a possibility in PostgreSQL to do DELETE on many tables
massively, the way TRUNCATE allows? Like DELETE table1, table2, ...?

On Fri, Jul 6, 2012 at 4:39 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Stanislaw Pankevich wrote:
>> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each
>> non-empty table and reset unique identifier column of empty ones ====
>>
>> I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in
>> the fastest solutions ever possible.
>
>> I have following assumptions:
>>
>>     I have 30-100 tables. Let them be 30.
>>
>>     Half of the tables are empty.
>>
>>     Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large.
>>
>>     I need an optional possibility to exclude 2 or 5 or N tables from this procedure.
>>
>>     I cannot! use transactions.
>
> Why? That would definitely speed up everything.
It is because of the specifics of the Ruby or Rails testing environment,
when running tests against a webdriver, which uses its own connection,
separate from the one the test suite itself uses. Transactions are
great, but not for all cases.

>> I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.
>>
>> I see the following approaches:
>>
>> 1) Truncate each table. It is too slow, I think, especially for empty tables.
>
> Did you actually try it? That's the king's way to performance questions!
> Truncating a single table is done in a matter of microseconds, particularly
> if it is not big.
> Do you have tens of thousands of tables?

Actually, 10-100 tables.

>> 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique
>> identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its
>> last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.
>
> That seems fragile and won't work everywhere.
>
> What if the table has no primary key with a DEFAULT that uses a sequence?
> What if it has such a key, but the DEFAULT was not used for an INSERT?
> What if somebody manually reset the sequence?

I'm using currval in my latest code.

> Besides, how do you find out what the sequence for a table's primary key
> is? With a SELECT, I guess. That SELECT is probably not faster than
> a simple TRUNCATE.
>
>> Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one
>> of the "check procedure" units, cleaning procedure should consist of, but haven't accomplished it too.
>
> You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
> think that this will be considerably faster than just truncating the table.

Exactly this query is much faster, believe me. You can see my latest
results on https://github.com/stanislaw/truncate-vs-count.

>> I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should
>> have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are
>> very good, but they become unusable when running tests against particular webdrivers, in my case the
>> switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post
>> here the answers about "Obviously, you need DatabaseCleaner for PG" and so on and so on.
>
> I completely fail to understand what you are talking about here.
Yes, I know it is very Ruby and Ruby on Rails specific. But I tried to
make my question clear and abstract enough to be understandable
without the context it was originally drawn from.

Thanks.

On Fri, Jul 6, 2012 at 5:22 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote:
>
>> Question: Is there a possibility in PostgreSQL to do DELETE on many tables
>> massively, like TRUNCATE allows. Like DELETE table1, table2, ...?
>
>
> Yes, you can do it with a writable common table expression, but you wanted
> version portability.
>
> WITH
>   discard1 AS (DELETE FROM test1),
>   discard2 AS (DELETE FROM test2 AS b)
> SELECT 1;
>
> Not only will this not work in older versions (IIRC it only works with 9.1,
> maybe 9.0 too but I don't see it in the documentation for SELECT for 9.0)
> but I find it hard to imagine any performance benefit over simply sending
>
>   DELETE FROM test1; DELETE FROM test2;
>
> This all smells like premature optimisation of cases that don't matter. What
> problem are you solving with this?

I will write tests for both massive TRUNCATE and DELETE (DELETE from
each table) for my case with the Ruby testing environment, and let you
know about the results. For now, I think, I should go for massive
TRUNCATE.

Interesting catch; I will try to test the behavior of 'DELETE vs
multiple TRUNCATE'.

I'll post it here if I discover any amazing results.

Marc, thanks for the answer.

No, these do not seem universal enough, or easy enough to hook into the
existing truncation strategies used in the Ruby world.


If someone is interested in the current strategy I am using for this,
see this Ruby-based repo,
https://github.com/stanislaw/truncate-vs-count, covering both MySQL and
PostgreSQL.

MySQL: the fastest strategy for cleaning databases is truncation with the
following modifications:
1) We check whether the table is non-empty, and only then truncate it.
2) If the table is empty, we check whether its AUTO_INCREMENT was changed.
If it was, we do a truncate.

For MySQL, plain truncation is much faster than plain deletion. The only
case where DELETE beats TRUNCATE is on an empty table.
For MySQL, truncation with emptiness checks is much faster than plain
multiple truncation.
For MySQL, deletion with emptiness checks is much faster than plain DELETE
on each table.

PostgreSQL: the fastest strategy for cleaning databases is deletion
with the same modifications.

For PostgreSQL, plain deletion is much faster than plain TRUNCATE (even a
multi-table one).
For PostgreSQL, a multi-table TRUNCATE preceded by emptiness checks is
slightly faster than a plain multi-table TRUNCATE.
For PostgreSQL, deletion with emptiness checks is slightly faster than
plain deletion.

This is where it began:
https://github.com/bmabey/database_cleaner/issues/126
This is the resulting code and the long discussion:
https://github.com/bmabey/database_cleaner/issues/126

We have begun collecting user feedback confirming that my idea of first
checking for empty tables is right.

Thanks to all participants, especially those who've suggested trying
DELETE as well as optimizing TRUNCATE.

Stanislaw


On 07/13/2012 03:50 PM, Stanislaw Pankevich wrote:
> MySQL: the fastest strategy for cleaning databases is truncation with the
> following modifications:
> 1) We check whether the table is non-empty, and only then truncate it.
> 2) If the table is empty, we check whether its AUTO_INCREMENT was changed.
> If it was, we do a truncate.
>
> For MySQL, plain truncation is much faster than plain deletion.

You're talking about MySQL like it's only one database. Is this with MyISAM tables? InnoDB? Something else? I don't see any mention of table formats in a very quick skim of the discussion you linked to.

PostgreSQL will never be able to compete with MyISAM on raw speed of small, simple operations. There might be things that can be made faster than they are right now, but I really doubt it'll ever surpass MyISAM.

My mental analogy is asking an abseiler, who is busy clipping in and testing their gear at the top of a bridge, why they aren't at the bottom of the canyon with the BASE jumper yet.

The BASE jumper will always get there faster, but the abseiler will always get there alive.

If you're talking about InnoDB or another durable, reliable table structure then I'd be interested in the mechanics of what MySQL's truncates are doing.

--
Craig Ringer
On Tue, Jul 3, 2012 at 10:22 AM, Stanislaw Pankevich
<s.pankevich@gmail.com> wrote:
> [...]
>
> I use Ruby code to iterate through all the tables, calling the code below
> on each of them. I tried to set up SQL code to run against each table like
> this:
>
> DO $$DECLARE r record;
> BEGIN
>   somehow_captured = SELECT last_value from #{table}_id_seq
>   IF (somehow_captured == 1) THEN
>     == restore initial unique identifier column value here ==
>   END
>
>   IF (somehow_captured > 1) THEN
>     TRUNCATE TABLE #{table};
>   END IF;
> END$$;

This didn't work because you can't use variables for table names in
non-dynamic statements; you need dynamic SQL, that is, a statement
built and executed as a string. You'd probably want:

EXECUTE 'TRUNCATE TABLE ' || #{table};

As to performance, TRUNCATE in Postgres (just like MySQL) has the nice
property that the speed of truncation is mostly not dependent on table
size: truncating a table with 100 records is not very much faster than
truncating a table with millions of records.  For very small tables,
it might be faster to simply fire off a delete.
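
For completeness, an untested sketch of the whole check-then-truncate idea
in a single DO block (needs 9.1+ for format() and FOREACH; assumes the
#{table}_id_seq naming convention, and the table names are illustrative):

DO $$
DECLARE
  t text;
  has_rows boolean;
BEGIN
  -- the array here would be built from the Ruby-side list of tables
  FOREACH t IN ARRAY ARRAY['users', 'posts'] LOOP
    EXECUTE format('SELECT EXISTS (SELECT 1 FROM %I)', t) INTO has_rows;
    IF has_rows THEN
      EXECUTE format('TRUNCATE TABLE %I', t);
    ELSE
      -- empty table: just rewind its id sequence
      EXECUTE format('SELECT setval(%L, 1, false)', t || '_id_seq');
    END IF;
  END LOOP;
END$$;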

merlin