Thread: Primary key Index Error

Primary key Index Error

From

Manoj K P

Date:

24 October 2011, 08:40:27

Postgres recovered from data folder , after that some queries shows error


select * from table2 order by app_id  ; - its work ( 50000000 data)
select * from table2 order by app_id   desc ; - its work

Here app_id contains  binary index


select * from table2 order by id  ; - its work ( 50000000 data)
select * from table2 order by id   desc ; - shows following error

manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT:  Please REINDEX it.

index "tbl2_id_pkey" is primary key

Reindex not pratical this table (250 GB data)

How i can solve this ?

Regards
Manoj K P

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-Index-Error-tp4931714p4931714.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Primary key Index Error

From

manoj

Date:

24 October 2011, 16:19:02

On 10/24/2011 06:38 PM, Merlin Moncure wrote:

On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:

Postgres recovered from data folder , after that some queries shows error


select * from table2 order by app_id  ; - its work ( 50000000 data)
select * from table2 order by app_id   desc ; - its work

Here app_id contains  binary index


select * from table2 order by id  ; - its work ( 50000000 data)
select * from table2 order by id   desc ; - shows following error

manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT:  Please REINDEX it.

index "tbl2_id_pkey" is primary key

Reindex not pratical this table (250 GB data)

How i can solve this ?

It looks like you have data corruption.  Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online.  I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup.  Did you have any
unexpected power events?  Server crashes?

merlin

  Actually this is backup server ,no power failure & Server crashes happens  in between , The data folder is base backup(pg_start_backup) on the server. pg_dump not practical in this case  because of data size. After recovering from base backup run  WAL recovery file successfully and its updates current date data.  When i am running query on big table    (more than 5 Million  data)  shows error other wise its work fine.
 select id , name  from table2 order by id  ; It uses the index

  select id , name  from  table2 order by id desc ; - shows the error

Error happens only using (DESC clause against primary key) , all other case its work fine

 REINDEX will take ages

--
Manoj K P
Postgres DBA
Comodo India

Attachment

smime.p7s

Re: Primary key Index Error

From

Merlin Moncure

Date:

24 October 2011, 16:27:39

On Mon, Oct 24, 2011 at 8:32 AM, manoj <manoj@comodo.com> wrote:
> On 10/24/2011 06:38 PM, Merlin Moncure wrote:
>
> On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:
>
> Postgres recovered from data folder , after that some queries shows error
>
>
> select * from table2 order by app_id  ; - its work ( 50000000 data)
> select * from table2 order by app_id   desc ; - its work
>
> Here app_id contains  binary index
>
>
> select * from table2 order by id  ; - its work ( 50000000 data)
> select * from table2 order by id   desc ; - shows following error
>
> manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
> page at block 311121
> manoj [local] postgres HINT:  Please REINDEX it.
>
> index "tbl2_id_pkey" is primary key
>
> Reindex not pratical this table (250 GB data)
>
> How i can solve this ?
>
> It looks like you have data corruption.  Your first priority should be
> to do the minimum necessary to get a usable database dump and then to
> get back online.  I would consider your database off line until this
> is done. Essentially, the only practical ways to repair the error you
> see would be to reindex or restore from backup.  Did you have any
> unexpected power events?  Server crashes?
>
> merlin
>
>   Actually this is backup server ,no power failure & Server crashes happens
> in between , The data folder is base backup(pg_start_backup) on the server.
> pg_dump not practical in this case  because of data size.
>   After recovering from base backup run  WAL recovery file successfully and
> its updates current date data.
>   When i am running query on big table    (more than 5 Million  data)  shows
> error other wise its work fine.
>
>   select id , name  from table2 order by id  ; It uses the index
>
>   select id , name  from  table2 order by id desc ; - shows the error
>
> Error happens only using (DESC clause against primary key) , all other case
> its work fine
>
>  REINDEX will take ages

do you happen to still have the database logs on the backup from
startup to end of recovery?  anything interesting in there?

merlin

Re: Primary key Index Error

From

Merlin Moncure

Date:

24 October 2011, 17:17:43

On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:
> Postgres recovered from data folder , after that some queries shows error
>
>
> select * from table2 order by app_id  ; - its work ( 50000000 data)
> select * from table2 order by app_id   desc ; - its work
>
> Here app_id contains  binary index
>
>
> select * from table2 order by id  ; - its work ( 50000000 data)
> select * from table2 order by id   desc ; - shows following error
>
> manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
> page at block 311121
> manoj [local] postgres HINT:  Please REINDEX it.
>
> index "tbl2_id_pkey" is primary key
>
> Reindex not pratical this table (250 GB data)
>
> How i can solve this ?

It looks like you have data corruption.  Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online.  I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup.  Did you have any
unexpected power events?  Server crashes?

merlin

Re: Primary key Index Error

From

Manoj K P

Date:

25 October 2011, 06:10:11

Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1] 2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: 418583.238 ms statement: select pg_start_backup('fortnightly'); Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02 03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: 8034.385 ms statement: select pg_stop_backup(); In between stop and start process server_host_name is receiving all type of DML & DDL and generating new WAL file Taking base backup in between start and stop process Client Log Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25 05:16:18.202 BST 28858 LOG: could not open file "pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1] 2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203 BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18 client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858 HINT: If you are not restoring from a backup, try removing the file "/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup process (PID 28858) exited with exit code 1 Oct 25 05:16:18 client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857 LOG: aborting startup due to startup process failure Oct 25 05:20:53 client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030 LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353, segment 104): No such file or directory Oct 25 05:20:53 client_server_name postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not find redo location referenced by checkpoint record Oct 25 05:20:53 client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030 HINT: If you are not restoring from a backup, try removing the file "/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup process (PID 29030) exited with exit code 1 Oct 25 05:20:53 client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029 LOG: aborting startup due to startup process failure manually copy following file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068 After words i can start postgres and accessing the database , but same error

View this message in context: Re: Primary key Index Error
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Primary key Index Error

From

Raghavendra

Date:

25 October 2011, 07:41:55

On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@comodo.com> wrote:

Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1] 2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: 418583.238 ms statement: select pg_start_backup('fortnightly'); Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02 03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration: 8034.385 ms statement: select pg_stop_backup(); In between stop and start process server_host_name is receiving all type of DML & DDL and generating new WAL file Taking base backup in between start and stop process Client Log Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25 05:16:18.202 BST 28858 LOG: could not open file "pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1] 2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203 BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18 client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858 HINT: If you are not restoring from a backup, try removing the file "/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup process (PID 28858) exited with exit code 1 Oct 25 05:16:18 client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857 LOG: aborting startup due to startup process failure Oct 25 05:20:53 client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030 LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353, segment 104): No such file or directory Oct 25 05:20:53 client_server_name postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not find redo location referenced by checkpoint record Oct 25 05:20:53 client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030 HINT: If you are not restoring from a backup, try removing the file "/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup process (PID 29030) exited with exit code 1 Oct 25 05:20:53 client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029 LOG: aborting startup due to startup process failure manually copy following file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068 After words i can start postgres and accessing the database , but same error

As per the logs, do you see missing XLOG files in Archive Destination ? becz these kind of situations mostly missing files will be in WAL-Archive location. You need to copy to pg_xlog directory and start the instance.

As Merlin Said, you need to dig more to know why its crashing by increasing the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of space in log-location, so make sure you have good space for logs to get what exactly happening at the time of backup in particular. Am not sure whether its safe to attach ***backtrace*** to instance for information.

---

Regards,

Raghavendra

EnterpriseDB Corporation

Blog: http://raghavt.blogspot.com/

Re: Primary key Index Error

From

Merlin Moncure

Date:

25 October 2011, 13:09:53

On Tue, Oct 25, 2011 at 2:41 AM, Raghavendra
<raghavendra.rao@enterprisedb.com> wrote:
> On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@comodo.com> wrote:
>>
>> Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1]
>> 2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG:
>> duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
>> Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
>> 03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
>> 8034.385 ms statement: select pg_stop_backup(); In between stop and start
>> process server_host_name is receiving all type of DML & DDL and generating
>> new WAL file Taking base backup in between start and stop process Client Log
>> Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25
>> 05:16:18.202 BST 28858 LOG: could not open file
>> "pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
>> file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1]
>> 2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25
>> 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203
>> BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18
>> client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858
>> HINT: If you are not restoring from a backup, try removing the file
>> "/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name
>> postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup
>> process (PID 28858) exited with exit code 1 Oct 25 05:16:18
>> client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857
>> LOG: aborting startup due to startup process failure Oct 25 05:20:53
>> client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030
>> LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353,
>> segment 104): No such file or directory Oct 25 05:20:53 client_server_name
>> postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not
>> find redo location referenced by checkpoint record Oct 25 05:20:53
>> client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030
>> HINT: If you are not restoring from a backup, try removing the file
>> "/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name
>> postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup
>> process (PID 29030) exited with exit code 1 Oct 25 05:20:53
>> client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029
>> LOG: aborting startup due to startup process failure manually copy following
>> file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068
>> After words i can start postgres and accessing the database , but same
>> error
>
> As per the logs, do you see missing XLOG files in Archive Destination ? becz
> these kind of situations mostly missing files will be in WAL-Archive
> location. You need to copy to pg_xlog directory and start the instance.
>
> As Merlin Said, you need to dig more to know why its crashing by increasing
> the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of
> space in log-location, so make sure you have good space for logs to get what
> exactly happening at the time of backup in particular. Am not sure whether
> its safe to attach ***backtrace*** to instance for information.

yeah. also, what's the setting of archive_command (or is it even set)?
 taking a 'hot' filesystem backup without having an archive_command
and not doing any other intervention to guarantee the necessary WAL
segments are present will not give you a complete backup.  my money is
on you having an invalid backup procedure.  the only way to take a
filesystem snapshot without dealing with WAL files is to bring the
database down.

merlin