Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones. - Mailing list pgsql-performance

From Stanislaw Pankevich
Subject Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Date
Msg-id CAFXpGYYtHZ+CWG_Vxv1w5ipGJZ4_wHG-amU6Tbmbnfo0GHpmuA@mail.gmail.com
Whole thread Raw
In response to Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-performance
Thanks for the answer.

Please, see my answers below:

On Fri, Jul 6, 2012 at 2:35 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/06/2012 07:29 PM, Craig Ringer wrote:
>
> On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:
>
>     I cannot! use transactions.
>
> Everything in PostgreSQL uses transactions, they are not optional.
>
> I'm assuming you mean you can't use explicit transaction demarcation, ie
> BEGIN and COMMIT.

Yes, right!

>  need the fastest cleaning strategy for such case working on PostgreSQL both
> 8 and 9.

> Just so you know, there isn't really any "PostgreSQL 8" or "PostgreSQL 9".
> Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major
> versions. This is different to most software and IMO pretty damn annoying,
> but that's how it is.

Yes, right! I've meant "queries as much universal across different
versions as possible" by saying this.

>
> 1) Truncate each table. It is too slow, I think, especially for empty
> tables.
>
> Really?!? TRUNCATE should be extremely fast, especially on empty tables.
>
> You're aware that you can TRUNCATE many tables in one run, right?
>
> TRUNCATE TABLE a, b, c, d, e, f, g;

YES, I know it ;) and I use this option!

> 2) Check each table for emptiness by more faster method, and then if it is
> empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL)
> to initial state (1), i.e to restore its last_value from sequence (the same
> AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.
>
> You can examine the value of SELECT last_value FROM the_sequence ;

I tried using last_value, but somehow, it was equal 1, for table with
0 rows, and for table with 1 rows, and began to increment only after
rows > 1! This seemed very strange to me, but I ensured it working
this way by many times running my test script. Because of this, I am
using SELECT currval.

> that's
> the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'.
>
> http://www.postgresql.org/docs/9.1/static/functions-sequence.html
>
> I use Ruby code to iterate through all tables
>
>
> If you want to be fast, get rid of iteration. Do it all in one query or a
> couple of simple queries. Minimize the number of round-trips and queries.
>
> I'll be truly stunned if the fastest way isn't to just TRUNCATE all the
> target tables in a single statement (not iteratively one by one with
> separate TRUNCATEs).
>
>
> Oh, also, you can setval(...) a bunch of sequences at once:
>
> SELECT
>   setval('first_seq', 0),
>   setval('second_seq', 0),
>   setval('third_seq', 0),
>   setval('fouth_seq', 0);
> ... etc. You should only need two statements, fast ones, to reset your DB to
> the default state.

Good idea!

Could please look at my latest results at
https://github.com/stanislaw/truncate-vs-count? I think they are
awesome for test oriented context.

In slower way, resetting ids I do SELECT currval('#{table}_id_seq');
then check whether it raises an error or > 0.

In a faster way, just checking for a number of rows, for each table I do:
at_least_one_row = execute(<<-TR
        SELECT true FROM #{table} LIMIT 1;
TR
)

If there is at least one row, I add this table to the list of
tables_to_truncate.
Finally I run multiple truncate: TRUNCATE tables_to_truncate;

Thanks,
Stanislaw.

pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Paged Query
Next
From: Stanislaw Pankevich
Date:
Subject: Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.