Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones. - Mailing list pgsql-performance

From Stanislaw Pankevich
Subject Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Date
Msg-id CAFXpGYbD-nZACnT2w5C0Xasg6=xtxKrvcuubw-Bhz9-gCg7cYA@mail.gmail.com
Whole thread Raw
In response to Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
List pgsql-performance
On Fri, Jul 6, 2012 at 4:39 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Stanislaw Pankevich wrote:
>> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each
>> non-empty table and reset unique identifier column of empty ones ====
>>
>> I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in
>> the fastest solutions ever possible.
>
>> I have following assumptions:
>>
>>     I have 30-100 tables. Let them be 30.
>>
>>     Half of the tables are empty.
>>
>>     Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large.
>>
>>     I need an optional possibility to exclude 2 or 5 or N tables from this procedure.
>>
>>     I cannot! use transactions.
>
> Why? That would definitely speed up everything.
It is because of specifics of Ruby or the Rails testing environment,
when running tests again webdriver, which uses its own connection
separate from one, which test suite itself uses. Transactions are
great, but not for all cases.

>> I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.
>>
>> I see the following approaches:
>>
>> 1) Truncate each table. It is too slow, I think, especially for empty tables.
>
> Did you actually try it? That's the king's way to performance questions!
> Truncating a single table is done in a matter of microseconds, particularly
> if it is not big.
> Do you have tens of thousands of tables?

Actually, 10-100 tables.

>> 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique
>> identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its
>> last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.
>
> That seems fragile an won't work everywhere.
>
> What if the table has no primary key with a DEFAULT that uses a sequence?
> What if it has such a key, but the DEFAULT was not used for an INSERT?
> What if somebody manually reset the sequence?

I'm using currval in my latest code.

> Besides, how do you find out what the sequence for a table's primary key
> is? With a SELECT, I guess. That SELECT is probably not faster than
> a simple TRUNCATE.
>
>> Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one
>> of the "check procedure" units, cleaning procedure should consist of, but haven't accomplished it too.
>
> You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
> think that this will be considerably faster than just truncating the table.

Exactly this query is much faster, believe me. You can see my latest
results on https://github.com/stanislaw/truncate-vs-count.

>> I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should
>> have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are
>> very good, but they become unusable when running tests against particular webdrivers, in my case the
>> switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post
>> here the answers about "Obviously, you need DatabaseCleaner for PG" and so on and so on.
>
> I completely fail to understand what you talk about here.
Yes, I know it is very Ruby and Ruby on Rails specific. But I tried to
make my question clear and abstract enough, to be understandable
without the context it was originally drawn from.

Thanks.

pgsql-performance by date:

Previous
From: Stanislaw Pankevich
Date:
Subject: Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Next
From: Stanislaw Pankevich
Date:
Subject: Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.