Re: Subject: bool / vacuum full bug followup part 2 - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Subject: bool / vacuum full bug followup part 2
Date
Msg-id Pine.LNX.4.33.0205031702200.1471-100000@css120.ihs.com
Whole thread Raw
In response to Re: Subject: bool / vacuum full bug followup part 2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Subject: bool / vacuum full bug followup part 2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Fri, 3 May 2002, Tom Lane wrote:

> Scott Marlowe <scott.marlowe@ihs.com> writes:
> > Well, my keys aren't changing and the index is growing like they are.
>
> Could we see the exact details of your test case?

Sure.  I think I posted most of it here already...

Here's my table:

scott.marlowe=# \d test
         Table "test"
 Column |  Type   | Modifiers
--------+---------+-----------
 buf    | text    |
 yn     | boolean |
 id     | integer |
Indexes: test_id_dx,
         test_yn_dx

Here's the indexes:

scott.marlowe=# \di test_id_dx
         List of relations
    Name    | Type  |     Owner
------------+-------+---------------
 test_id_dx | index | scott.marlowe

scott.marlowe=# \di test_yn_dx
         List of relations
    Name    | Type  |     Owner
------------+-------+---------------
 test_yn_dx | index | scott.marlowe
(1 row)

Here's the php script that creates the test data, it makes $count number
of rows and sets the bool to true or false randomly:

set_time_limit(3600);
$conn = pg_connect("dbname=scott.marlowe host=css120.ihs.com");
$count = 100000;
pg_exec($conn,"begin");
$flag[0]="false";
$flag[1]="true";
for ($i=0;$i<$count;$i++){
        if ($i%1000==0) {
                pg_exec($conn,"end");
                pg_exec($conn,"begin");
        }
        $letter = chr(rand(65,91));
        $tf = rand(0,1);
        $query = "insert into test (buf,yn) values ('";
    $query.= $letter."',".$flag[$tf].")";
        pg_exec($conn,$query);
}
pg_exec($conn,"end");

Here's the files in my database directory, and their size by du in
kbytes after vacuum full;
vacuum;
reindex index test_id_dx;
reindex index test_yn_dx;

[postgres@css120 16556]$ oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
126572 = accounts
126574 = accounts_pkey
126566 = branches
126568 = branches_pkey
126575 = history
126569 = tellers
126571 = tellers_pkey
16557  = test
1126687 = test_id_dx
1126688 = test_yn_dx

[postgres@css120 16556]$ du -s 16557 1126687 1126688
11448   16557 (test)
1772    1126687 (test_id_dx)
1772    1126688 (test_yn_dx)

WHAT I DID:

scott.marlowe=# update test set yn=true where yn=true;
UPDATE 50080

[postgres@css120 16556]$ du -s 16557 1126687 1126688
17176   16557
3516    1126687
2924    1126688

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
17176   16557
3516    1126687
2924    1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
11448   16557
3516    1126687
4052    1126688  <-- Notice that the index here just GREW

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

[postgres@css120 16556]$ du -s 16557 1126687 1126688
68744   16557
13980   1126687
15660   1126688

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
68736   16557
13964   1126687
15652   1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
11448   16557
13964   1126687
16808   1126688

***************
So, now thinking the problem might be just vacuum full, I try plain old
vacuums
***************

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
22908   16557
13964   1126687
20088   1126688

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres@css120 16556]$ du -s 16557 1126687 1126688
22908   16557
13964   1126687
22380   1126688

*****************************
Nope, the index on the bools just keeps growing and growing.
Given a few million updates and it will be bigger than the data it is
supposed to index.

scott.marlowe=# reindex index test_yn_dx;
REINDEX
scott.marlowe=# reindex index test_id_dx;
REINDEX

oid2name -d scott.marlowe |grep dx
1126690 = test_id_dx
1126689 = test_yn_dx
[postgres@css120 16556]$ du -s 16557 1126690 1126689
22908   16557
1772    1126690
1772    1126689

and now they're small again.

It would at least be nice if reindex was operational in a transaction so
it would be safe to use on a live database, since it appears to not be
intended for this purpose really, but for fixing broken indexes.  Til then
I'll write a script that asks pg_indexes that drops the index and
recreates it in a transaction to keep my data store svelt and clean.

Thanks for the attention to this.  Scott.


pgsql-general by date:

Previous
From: Uros Gruber
Date:
Subject: Re: storing intermediate results in recursive plpgsql f
Next
From: "Hiroshi Inoue"
Date:
Subject: Re: Using views and MS access via odbc