Thread: Index corruption

Index corruption

From
Bankim Bhavsar
Date:
Hello postgres experts,

We are running a test that periodically abruptly kills postgres
process(equivalent to kill -9) and restarts it.
After running this test for 24 hrs or so, we see duplicate primary key
entries in postgres table.

We detect this as we load internal hash-table data-structure in a
separate process with primary key entries.

Before hitting this issue we see following warning messages in pg_log

17365 2015-03-24 03:01:42.729 GMTWARNING:  page is not marked
all-visible but visibility map bit is set in relation "table_foo" page 12
17365 2015-03-24 03:01:42.729 GMTWARNING:  page is not marked
all-visible but visibility map bit is set in relation "table_foo" page 13

Some information about schema.
- This table can contain upto 150k entries.
- *IMPORTANT*: We constantly insert new entries and remove older entries
from the table.

Relevant columns in table_foo
-----------------------------------------------------------------------------
  pk_col3  | bigint    | not null default 0::bigint
  pk_col1  | bigint    | not null default 0::bigint
  pk_col2  | bigint    | not null default 0::bigint

"table_foo_pkey" PRIMARY KEY, btree (pk_col1, pk_col2, pk_col3)

There are 3 other indexes on non-primary key columns in the table.

Duplicate entries

db=# select pk_col1, pk_col2, pk_col3, count(1) from table_foo group by
pk_col1, pk_col2, pk_col3 having count(1) > 1;
      pk_col1        | pk_col2| pk_col3  | count
--------------------+--------+----------+-------
  627708949163497688 |      1 |    13467 |     2
  627708949163497688 |      4 |    13566 |     2
  627708949163497688 |    266 |    13565 |     2
(3 rows)


Query analyzer using index only scan.

sodb=# explain select pk_col1, pk_col2, pk_col3, count(1) from table_foo
group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order by pk_col3;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Sort (cost=166.25..167.97 rows=689 width=24)
Sort Key: pk_col3
-> HashAggregate (cost=125.16..133.77 rows=689 width=24)
Filter: (count(1) > 1)
-> Index Only Scan using table_foo_pkey on table_foo (cost=0.00..113.36
rows=944 width=24)
(5 rows)


When non-primary key column is queried we don't get duplicate entries.
Query analyzer is using sequential scan on table_foo table.

sodb=# select pk_col1, pk_col2, pk_col3, creation_time, count(1) from
table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1 order
by pk_col3;
pk_col1 | pk_col2 | pk_col3 | creation_time | count
-------------------------------------------
(0 rows)

sodb=# explain select pk_col1, pk_col2, pk_col3, creation_time, count(1)
from table_foo group by pk_col1, pk_col2, pk_col3 having count(1) > 1
order by pk_col3;
QUERY PLAN
--------------------------------------------------------------------------
Sort (cost=174.33..176.06 rows=689 width=32)
Sort Key: pk_col3
-> HashAggregate (cost=133.24..141.85 rows=689 width=32)
Filter: (count(1) > 1)
-> Seq Scan on table_foo (cost=0.00..121.44 rows=944 width=32)
(5 rows)

We ran an experiment wherein we reindex the offending table on every
postgres startup and we don't see the same issue after reindex.
This leads us to believe that the index is corrupted but actual data on
the table is fine.

Some information about postgres setup.
- 9.2.0
- We use standard configuration with shared_buffer setting as 32MB and
checkpoint_timeout as 1 min.
- In this particular case postgres replication is not enabled.

Let me know if more information is needed to help understand this issue.

Any help or pointers will be appreciated.

Thanks,
Bankim.



Re: Index corruption

From
John R Pierce
Date:
On 3/24/2015 11:49 AM, Bankim Bhavsar wrote:
> - 9.2.0

9.2 is currently at 9.2.10.  9.2.0 was released 2.5 years ago. several
of the bugs fixed in the 10 incremental updates were data corruption
related.



--
john, recycling bits in santa cruz



Re: Index corruption

From
"David G. Johnston"
Date:
On Tuesday, March 24, 2015, Bankim Bhavsar <bankim@nimblestorage.com> wrote:
Hello postgres experts,

We are running a test that periodically abruptly kills postgres process(equivalent to kill -9) and restarts it.
After running this test for 24 hrs or so, we see duplicate primary key entries in postgres table.

We detect this as we load internal hash-table data-structure in a separate process with primary key entries.

Before hitting this issue we see following warning messages in pg_log

17365 2015-03-24 03:01:42.729 GMTWARNING:  page is not marked all-visible but visibility map bit is set in relation "table_foo" page 12
17365 2015-03-24 03:01:42.729 GMTWARNING:  page is not marked all-visible but visibility map bit is set in relation "table_foo" page 13


 
Some information about postgres setup.
- 9.2.0
- We use standard configuration with shared_buffer setting as 32MB and checkpoint_timeout as 1 min.
- In this particular case postgres replication is not enabled.

Let me know if more information is needed to help understand this issue.

Any help or pointers will be appreciated.


If you really are running 9.2.0 instead of 9.2.10 you should upgrade and reconfirm.

David J.

Re: Index corruption

From
Bankim Bhavsar
Date:
Thanks for the input.

We'll upgrade to 9.2.10 and attempt to reproduce the issue.

If possible, can someone point to corruption related issues fixed after 9.2.0?

Thanks,
-Bankim.

On 03/24/2015 11:56 AM, David G. Johnston wrote:
On Tuesday, March 24, 2015, Bankim Bhavsar <bankim@nimblestorage.com> wrote:
Hello postgres experts,

We are running a test that periodically abruptly kills postgres process(equivalent to kill -9) and restarts it.

After running this test for 24 hrs or so, we see duplicate primary key entries in postgres table.

We detect this as we load internal hash-table data-structure in a separate process with primary key entries.


Before hitting this issue we see following warning messages in pg_log


17365 2015-03-24 03:01:42.729 GMTWARNING:  page is not marked all-visible but visibility map bit is set in relation "table_foo" page 12

17365 2015-03-24 03:01:42.729 GMTWARNING:  page is not marked all-visible but visibility map bit is set in relation "table_foo" page 13


 
Some information about postgres setup.
- 9.2.0
- We use standard configuration with shared_buffer setting as 32MB and checkpoint_timeout as 1 min.
- In this particular case postgres replication is not enabled.

Let me know if more information is needed to help understand this issue.


Any help or pointers will be appreciated.



If you really are running 9.2.0 instead of 9.2.10 you should upgrade and reconfirm.

David J.

Re: Index corruption

From
John R Pierce
Date:
On 3/24/2015 12:05 PM, Bankim Bhavsar wrote:
> We'll upgrade to 9.2.10 and attempt to reproduce the issue.
>
> If possible, can someone point to corruption related issues fixed
> after 9.2.0?

in the postgres manual, see the release notes for 9.2.1 through 9.2.10



--
john, recycling bits in santa cruz



Re: Index corruption

From
Kevin Grittner
Date:
John R Pierce <pierce@hogranch.com> wrote:
> On 3/24/2015 12:05 PM, Bankim Bhavsar wrote:

>> If possible, can someone point to corruption related issues fixed
>> after 9.2.0?
>
> in the postgres manual, see the release notes for 9.2.1 through 9.2.10

You can find links to those from here:

http://www.postgresql.org/docs/9.2/static/release.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Index corruption

From
Bankim Bhavsar
Date:
> We'll upgrade to 9.2.10 and attempt to reproduce the issue.

Been running the same test with postgres 9.2.10 for over 48 hrs and we are unable to reproduce the index corruption
issueobserved earlier with 9.2.0. 

Thanks for the help everyone!!
-Bankim.

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Kevin Grittner
[kgrittn@ymail.com]
Sent: Tuesday, March 24, 2015 12:47 PM
To: John R Pierce; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index corruption

John R Pierce <pierce@hogranch.com> wrote:
> On 3/24/2015 12:05 PM, Bankim Bhavsar wrote:

>> If possible, can someone point to corruption related issues fixed
>> after 9.2.0?
>
> in the postgres manual, see the release notes for 9.2.1 through 9.2.10

You can find links to those from here:

http://www.postgresql.org/docs/9.2/static/release.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Index corruption

From
Scott Marlowe
Date:
On Tue, Mar 24, 2015 at 12:49 PM, Bankim Bhavsar
<bankim@nimblestorage.com> wrote:
> Hello postgres experts,
>
> We are running a test that periodically abruptly kills postgres
> process(equivalent to kill -9) and restarts it.
> After running this test for 24 hrs or so, we see duplicate primary key
> entries in postgres table.
>

Are you killing the POSTMASTER process, deleting the pid file, and
then restarting it without killing its children? That's a recipe for
corruption.