Thread: Possible corrupt index?

Possible corrupt index?

From
Zahir Lalani
Date:

Hi All

 

New on this list!

 

We have an existing postgres deployment which is showing some odd behaviour on Live. We use Navicat to manage it, and when we open a specific table via navicat on live, it says we don’t have a primary index – although the design view does show it. We have auto deployment scripts between environments, and none of the other environments show this. Doing a structure dump shows that there is indeed a primary index:

 

CREATE TABLE "public"."briefs_master" (

  "id" int4 NOT NULL DEFAULT nextval('briefs_master_id_seq'::regclass),

…..

"ext_system_ref" varchar(255) COLLATE "pg_catalog"."default"

…..

 

ALTER TABLE "public"."briefs_master" ADD CONSTRAINT "briefs_master_pkey" PRIMARY KEY ("id");

 

However, we are seeing very strange behaviour on live. We can search via the ID field just fine. Searching on ext_system_ref returns no records.

If I manually add data to the ext field, it then queries correctly. But as other records are added to the table, this field data disappears on the row I manually edited.

 

We have run a reindex on the specific index as well as table. We have run a vacuum on the table. Nothing seems to resolve and we are quite confused as to what the issue could be. Any help and guidance would be most appreciated

 

Regards

 

Zahir

Re: Possible corrupt index?

From
Michael Lewis
Date:
Which version? What are the queries you are running which give unexpected behavior? Have your run explain analyze on those to check what plan is being used? Have your reindexed all or only the one you suspect?

Re: Possible corrupt index?

From
Adrian Klaver
Date:
On 4/16/19 10:02 AM, Zahir Lalani wrote:
> Hi All
> 
> New on this list!
> 
> We have an existing postgres deployment which is showing some odd 
> behaviour on Live. We use Navicat to manage it, and when we open a 

What does Live refer to?

> specific table via navicat on live, it says we don’t have a primary 
> index – although the design view does show it. We have auto deployment 
> scripts between environments, and none of the other environments show 
> this. Doing a structure dump shows that there is indeed a primary index:
> 
> CREATE TABLE "public"."briefs_master" (
> 
>    "id" int4 NOT NULL DEFAULT nextval('briefs_master_id_seq'::regclass),
> 
> …..
> 
> "ext_system_ref" varchar(255) COLLATE "pg_catalog"."default"
> 
> …..
> 
> ALTER TABLE "public"."briefs_master" ADD CONSTRAINT "briefs_master_pkey" 
> PRIMARY KEY ("id");
> 
> However, we are seeing very strange behaviour on live. We can search via 
> the ID field just fine. Searching on ext_system_ref returns no records.
> 
> If I manually add data to the ext field, it then queries correctly. But 
> as other records are added to the table, this field data disappears on 
> the row I manually edited.
> 
> We have run a reindex on the specific index as well as table. We have 
> run a vacuum on the table. Nothing seems to resolve and we are quite 
> confused as to what the issue could be. Any help and guidance would be 
> most appreciated
> 
> Regards
> 
> Zahir
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Possible corrupt index?

From
Zahir Lalani
Date:

 

>Which version? What are the queries you are running which give unexpected behavior? Have your run explain analyze on those to check >what plan is being used? Have your reindexed all or only the one you suspect?

 

Hi Michael

 

Version: PostgreSQL 9.6.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit

 

LIVE – production environment (as opposed to Dev and UAT)

 

Query: select id from briefs_master where ext_system_ref = '12345'

 

Explain:

Seq Scan on briefs_master  (cost=0.00..2937.90 rows=1 width=4) (actual time=18.082..18.082 rows=0 loops=1)

  Filter: ((ext_system_ref)::text = '12345'::text)

  Rows Removed by Filter: 31235

Planning time: 0.242 ms

Execution time: 18.096 ms

 

 

Reindex was done initially on the primary and then on all in the table.

 

So when we reset the data into the ext_system_ref field, the next query returns fine. However, the issue is that since the system thinks there is no primary, we are seeing this value get over-written with a null several minutes later as other rows are added

 

Z

Re: Possible corrupt index?

From
Adrian Klaver
Date:
On 4/16/19 10:16 AM, Zahir Lalani wrote:
>>Which version? What are the queries you are running which give unexpected behavior? Have your run explain analyze on
thoseto check >what plan is being used? Have your reindexed all or only the  one you suspect?
 
> 
> Hi Michael
> 
> Version: PostgreSQL 9.6.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
> 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit

Is this the same for the other environments?

What does:

SHOW lc_collate;

produce in each environment?

Are you doing the below through Navicat or psql?

If through Navicat, what happens if you use psql?

> 
> LIVE – production environment (as opposed to Dev and UAT)
> 
> Query: select id from briefs_master where ext_system_ref = '12345'
> 
> Explain:
> 
> Seq Scan on briefs_master  (cost=0.00..2937.90 rows=1 width=4) (actual 
> time=18.082..18.082 rows=0 loops=1)
> 
>    Filter: ((ext_system_ref)::text = '12345'::text)
> 
>    Rows Removed by Filter: 31235
> 
> Planning time: 0.242 ms
> 
> Execution time: 18.096 ms
> 
> Reindex was done initially on the primary and then on all in the table.
> 
> So when we reset the data into the ext_system_ref field, the next query 
> returns fine. However, the issue is that since the system thinks there 
> is no primary, we are seeing this value get over-written with a null 
> several minutes later as other rows are added
> 
> Z
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Possible corrupt index?

From
Zahir Lalani
Date:
Hi 

>Is this the same for the other environments?

We have recently upgraded others to 10 and are in testing before we upgrade production. However we still some 9.6 dev
serverswhich are fine.
 

>What does:
>SHOW lc_collate;

en_US.UTF-8

>Are you doing the below through Navicat or psql?

Using Navicat for the test in all environments, but we have the same issue with app layer code connecting via perl
DBI.

Z

Re: Possible corrupt index?

From
Adrian Klaver
Date:
On 4/16/19 2:04 PM, Zahir Lalani wrote:
> Hi
> 
>> Is this the same for the other environments?
> 
> We have recently upgraded others to 10 and are in testing before we upgrade production. However we still some 9.6 dev
serverswhich are fine.
 

I am not following above.

What is running version 10?

Did you restore a dump from a version 10 onto a version 9.6 machine?

> 
>> What does:
>> SHOW lc_collate;
> 
> en_US.UTF-8

The above is the same for all the servers?

> 
>> Are you doing the below through Navicat or psql?
> 
> Using Navicat for the test in all environments, but we have the same issue with app layer code connecting via perl
DBI.
> 
> Z
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Possible corrupt index?

From
Zahir Lalani
Date:
>I am not following above.
>What is running version 10?

Production is 9.6, others are 10 (as we are testing). Prod has always been fine, its been running for 2 years without a
hitch.This issue has only occurred recently
 

>Did you restore a dump from a version 10 onto a >version 9.6 machine?

No - as I said, I don’t believe it’s a version issue as its been fine for 2 years.

> 
>> What does:
>> SHOW lc_collate;
> 
> en_US.UTF-8

>The above is the same for all the servers?

The upgraded 10 servers are en_GB.UTF_8. The current prod 9.6 is as above


Z

Re: Possible corrupt index?

From
Adrian Klaver
Date:
On 4/17/19 12:42 AM, Zahir Lalani wrote:
>> I am not following above.
>> What is running version 10?
> 
> Production is 9.6, others are 10 (as we are testing). Prod has always been fine, its been running for 2 years without
ahitch. This issue has only occurred recently
 

Any changes occur between the time it worked and the time it did not?
Say a crash, change in schema, new data added and so on.

Per below the version 10 servers use en_GB.UTF_8 and the production one 
en_US.UTF-8.

Why the difference?

And did the production used to be en_GB.UTF_8?

Have you logged into the production instance using psql and tried the 
queries to see if they work?

> 
>> Did you restore a dump from a version 10 onto a >version 9.6 machine?
> 
> No - as I said, I don’t believe it’s a version issue as its been fine for 2 years.
> 
>>
>>> What does:
>>> SHOW lc_collate;
>>
>> en_US.UTF-8
> 
>> The above is the same for all the servers?
> 
> The upgraded 10 servers are en_GB.UTF_8. The current prod 9.6 is as above
> 
> 
> Z
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Possible corrupt index?

From
Zahir Lalani
Date:
>Any changes occur between the time it worked and the time it did not?
>Say a crash, change in schema, new data added and so on.

Not as far as we can tell - one of my DB team noticed it day before yesterday - there were no tell tale signs of any
issuesbut we have seen issues yesterday with certain records not being found
 

>Per below the version 10 servers use en_GB.UTF_8 and the production one en_US.UTF-8.
>Why the difference?
>And did the production used to be en_GB.UTF_8?

All servers used to be en_US as I think that was the default and had been setup early in our use of PG. I suspect our
devopsguy corrected this during the upgrade.
 

>Have you logged into the production instance using psql and tried the queries to see if they work?
Not via psql, but as I said we saw issues from our app layer. I will do a similar test with psql and let you know


Z


Re: Possible corrupt index?

From
Adrian Klaver
Date:
On 4/17/19 7:28 AM, Zahir Lalani wrote:
> 
>> Any changes occur between the time it worked and the time it did not?
>> Say a crash, change in schema, new data added and so on.
> 
> Not as far as we can tell - one of my DB team noticed it day before yesterday - there were no tell tale signs of any
issuesbut we have seen issues yesterday with certain records not being found
 

Does that including looking at the logs?

> 
>> Per below the version 10 servers use en_GB.UTF_8 and the production one en_US.UTF-8.
>> Why the difference?
>> And did the production used to be en_GB.UTF_8?
> 
> All servers used to be en_US as I think that was the default and had been setup early in our use of PG. I suspect our
devopsguy corrected this during the upgrade.
 

Did that extend to fiddling with the locale on the production machine?

> 
>> Have you logged into the production instance using psql and tried the queries to see if they work?
> Not via psql, but as I said we saw issues from our app layer. I will do a similar test with psql and let you know
> 
> 
> Z
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com