Thread: trying to run PITR recovery
Ayuda por favor
Cuando ejecuto este comando me sale una doble tupla del usuario postgres, esto me error cuando ejecuto el comando Pg_dump para realizar el respectivo backup..
radar=# select * from pg_user;
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig
-----------+----------+-------------+----------+-----------+----------+----------+-----------
postgres | 1 | t | t | t | ******** | |
postgres | 1 | t | t | t | ******** | |
userradar | 100 | f | f | f | ******** | |
(3 rows)
He intentado borrar los usuario directamente de la tabla subiendo postgres com postgres -O -P -D directorio
Y corriendo estos comandos..
select * from pg_shadow;
delete from pg_shadow;
CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1; CREATE USER userradar PASSWORD '***'
Pero me crea los mismos tres usuarios en la tabla pg_shadow, el problema es que no he podido hacer el backup y ya no tengo espacio en la particion donde tengo montada la base
Ayuda por favor
Cuando ejecuto este comando me sale una doble tupla del usuario postgres, esto me error cuando ejecuto el comando Pg_dump para realizar el respectivo backup..
radar=# select * from pg_user;
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig
-----------+----------+-------------+----------+-----------+----------+----------+-----------
postgres | 1 | t | t | t | ******** | |
postgres | 1 | t | t | t | ******** | |
userradar | 100 | f | f | f | ******** | |
(3 rows)
He intentado borrar los usuario directamente de la tabla subiendo postgres com postgres -O -P -D directorio
Y corriendo estos comandos..
select * from pg_shadow;
delete from pg_shadow;
CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1; CREATE USER userradar PASSWORD '***'
Pero me crea los mismos tres usuarios en la tabla pg_shadow, el problema es que no he podido hacer el backup y ya no tengo espacio en la particion donde tengo montada la base
De: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] En nombre de Warren Little
Enviado el: Friday, March 23, 2007 6:16 PM
Para: pgsql-admin@postgresql.org
Asunto: [ADMIN] trying to run PITR recovery
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig
-----------+----------+-------------+----------+-----------+----------+----------+-----------
postgres | 1 | t | t | t | ******** | |
postgres | 1 | t | t | t | ******** | |
userradar | 100 | f | f | f | ******** | |
(3 rows)
pg_dump: saving database definition
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: more than one row returned by a subq
uery used as an expression
pg_dump: The command was: SELECT (SELECT usename FROM pg_user WHERE usesysid = d
atdba) as dba, pg_encoding_to_char(encoding) as encoding, datpath FROM pg_databa
se WHERE datname = 'radar'
pg_dump: *** aborted because of error
select * from pg_shadow;
delete from pg_shadow;
CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1;
CREATE USER userradar PASSWORD '***'
De: Daniel Ricardo Medina
Enviado el: Friday, March 23, 2007 6:38 PM
Para: pgsql-admin@postgresql.org
Asunto: AYUDA URGENTE CON TABLA PG_SHADOW
Ayuda por favor
Cuando ejecuto este comando me sale una doble tupla del usuario postgres, esto me error cuando ejecuto el comando Pg_dump para realizar el respectivo backup..
radar=# select * from pg_user;
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig
-----------+----------+-------------+----------+-----------+----------+----------+-----------
postgres | 1 | t | t | t | ******** | |
postgres | 1 | t | t | t | ******** | |
userradar | 100 | f | f | f | ******** | |
(3 rows)
He intentado borrar los usuario directamente de la tabla subiendo postgres com postgres -O -P -D directorio
Y corriendo estos comandos..
select * from pg_shadow;
delete from pg_shadow;
CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1; CREATE USER userradar PASSWORD '***'
Pero me crea los mismos tres usuarios en la tabla pg_shadow, el problema es que no he podido hacer el backup y ya no tengo espacio en la particion donde tengo montada la base
Ayuda por favor
Cuando ejecuto este comando me sale una doble tupla del usuario postgres, esto me error cuando ejecuto el comando Pg_dump para realizar el respectivo backup..
radar=# select * from pg_user;
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig
-----------+----------+-------------+----------+-----------+----------+----------+-----------
postgres | 1 | t | t | t | ******** | |
postgres | 1 | t | t | t | ******** | |
userradar | 100 | f | f | f | ******** | |
(3 rows)
He intentado borrar los usuario directamente de la tabla subiendo postgres com postgres -O -P -D directorio
Y corriendo estos comandos..
select * from pg_shadow;
delete from pg_shadow;
CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1; CREATE USER userradar PASSWORD '***'
Pero me crea los mismos tres usuarios en la tabla pg_shadow, el problema es que no he podido hacer el backup y ya no tengo espacio en la particion donde tengo montada la base
De: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] En nombre de Warren Little
Enviado el: Friday, March 23, 2007 6:16 PM
Para: pgsql-admin@postgresql.org
Asunto: [ADMIN] trying to run PITR recovery
"Daniel Ricardo Medina" <dmedina@computec.com.co> writes: > I have try to run this command in single user mod > > select * from pg_shadow; > delete from pg_shadow; > CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1; > CREATE USER userradar PASSWORD '***' > but the result is the same..... What do you see if you look in pg_shadow right after doing the DELETE? What PG version is this? regards, tom lane
Deleting doesn't work because postgres only looks for the record shown by the index (though this may have been fixed in more recent releases). At least this is the behaviour I observed.
The way I fixed it was to drop the unique index, remove the offending rows, reinsert only one of the offending rows (so things would once again be unique), and remake the index. It worked.
There are two things that concern me:
1. Trying to do this on a system table is likely ill-advised at best, and dangerous at worst.
2. If the table only has 3 rows in it and it got corrupted, there's something really wrong. My table in question was a user table that had almost a million rows.
"Daniel Ricardo Medina" <dmedina@computec.com.co> writes:
> I have try to run this command in single user mod
>
> select * from pg_shadow;
> delete from pg_shadow;
> CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1;
> CREATE USER userradar PASSWORD '***'
> but the result is the same.....
What do you see if you look in pg_shadow right after doing the DELETE?
What PG version is this?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Warren Little <Warren.Little@MeridiasCapital.com> writes: > I'm testing my PITR recovery procedures and something doesn't look > right. > The following is from the logs upon starting postgres with > recovery.conf file in place > @ 2007-03-23 05:57:33 MDTLOG: restored log file > "000000010000011A000000FD" from archive > @ 2007-03-23 05:57:35 MDTLOG: incorrect resource manager data > checksum in record at 11A/FD492B20 > @ 2007-03-23 05:57:35 MDTLOG: redo done at 11A/FD492210 > My concern is that there were many more logfiles to be played > following "00000010000011A000000FD" > (ie 000000010000011E0000005C) yet it appears the recovery stop at > that point and called it good. Perhaps you have a bad copy of that particular log segment file? The "incorrect checksum" failure is normal in certain circumstances (eg, where the system crashes partway through writing out a large WAL record), which is why the code treats it as indicating the end of WAL. But in this case it looks more like corrupt data ... regards, tom lane
After delete all, postgres said no exist the user id 1... The version is 7.4.8 Daniel Ricardo Medina Ing. de Desarrollo Computec S.A. Tel 260 71 11 Ext 1640 -----Mensaje original----- De: Tom Lane [mailto:tgl@sss.pgh.pa.us] Enviado el: Sunday, March 25, 2007 1:52 PM Para: Daniel Ricardo Medina CC: pgsql-admin@postgresql.org Asunto: Re: [ADMIN] URGENT TABLE PG_SHADOW CORRUTEP "Daniel Ricardo Medina" <dmedina@computec.com.co> writes: > I have try to run this command in single user mod > > select * from pg_shadow; > delete from pg_shadow; > CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1; > CREATE USER userradar PASSWORD '***' > but the result is the same..... What do you see if you look in pg_shadow right after doing the DELETE? What PG version is this? regards, tom lane
"Daniel Ricardo Medina" <dmedina@computec.com.co> writes: > After delete all, postgres said no exist the user id 1... Maybe reindexing pg_shadow would help. regards, tom lane
De: Peter Koczan [mailto:pjkoczan@gmail.com]
Enviado el: Sunday, March 25, 2007 7:28 PM
Para: Tom Lane
CC: Daniel Ricardo Medina; pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] URGENT TABLE PG_SHADOW CORRUTEP
Deleting doesn't work because postgres only looks for the record shown by the index (though this may have been fixed in more recent releases). At least this is the behaviour I observed.
The way I fixed it was to drop the unique index, remove the offending rows, reinsert only one of the offending rows (so things would once again be unique), and remake the index. It worked.
There are two things that concern me:
1. Trying to do this on a system table is likely ill-advised at best, and dangerous at worst.
2. If the table only has 3 rows in it and it got corrupted, there's something really wrong. My table in question was a user table that had almost a million rows.
"Daniel Ricardo Medina" <dmedina@computec.com.co> writes:
> I have try to run this command in single user mod
>
> select * from pg_shadow;
> delete from pg_shadow;
> CREATE USER postgres WITH CREATEDB CREATEUSER PASSWORD '***' SYSID 1;
> CREATE USER userradar PASSWORD '***'
> but the result is the same.....
What do you see if you look in pg_shadow right after doing the DELETE?
What PG version is this?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
On Fri, 2007-03-23 at 17:16 -0600, Warren Little wrote: > My concern is that there were many more logfiles to be played > following "00000010000011A000000FD" > (ie 000000010000011E0000005C) yet it appears the recovery stop at that > point and called it good. > I would assume all WAL logs would be restored. I'm interested in your feedback here. How would you like it to have acted? The WAL file was clearly corrupt. 1. Don't continue and don't come up. Have the recovery fail. In order to bring the server up, we would have to restart recovery with an additional command to say "I note that my recovery has failed and would like recovery to come up at the last possible point." 2. Attempt to continue after we fail the CRC check. This is both dangerous and in many cases won't work either, since this is one of the normal ending points. 3. Continue after a CRC check, don't attempt to apply the records, just look at them to determine if they look correct. i.e. see if the CRC error applies to just that record 4. Add a command to ignore specific WAL records ignore_record = '11A/FD492B20' These may also not work very well at all, since many records depend upon previous data changes, so could quickly end in further errors. What would you suggest? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
@ 2007-03-23 05:57:33 MDTLOG: restored log file"000000010000011A000000FD" from archive@ 2007-03-23 05:57:35 MDTLOG: incorrect resource manager datachecksum in record at 11A/FD492B20@ 2007-03-23 05:57:35 MDTLOG: redo done at 11A/FD492210
On Fri, 2007-03-23 at 17:16 -0600, Warren Little wrote:My concern is that there were many more logfiles to be playedfollowing "00000010000011A000000FD"(ie 000000010000011E0000005C) yet it appears the recovery stop at thatpoint and called it good.I would assume all WAL logs would be restored.I'm interested in your feedback here. How would you like it to haveacted?The WAL file was clearly corrupt.1. Don't continue and don't come up. Have the recovery fail. In order tobring the server up, we would have to restart recovery with anadditional command to say "I note that my recovery has failed and wouldlike recovery to come up at the last possible point."2. Attempt to continue after we fail the CRC check. This is bothdangerous and in many cases won't work either, since this is one of thenormal ending points.3. Continue after a CRC check, don't attempt to apply the records, justlook at them to determine if they look correct. i.e. see if the CRCerror applies to just that record4. Add a command to ignore specific WAL recordsignore_record = '11A/FD492B20'These may also not work very well at all, since many records depend uponprevious data changes, so could quickly end in further errors.What would you suggest?--Simon Riggs
On Fri, 2007-03-30 at 08:23 -0600, Warren Little wrote: > On slightly different topic, is there some way to determine the > timeline of the corrupted segment, ie what was the original time of > the last restored transaction. You need to examine the WAL files using xlogdump available from pgfoundry.org - its not part of the main distribution yet. Reporting the time of the last commit/abort record seen in the WAL seems like an easy and useful addition to make. e.g. LOG: last transaction originally completed at 2007-02-23 05:57:35 Point-in-time recovery really ought to tell you the point-in-time it has stopped at sounds blindingly obvious in retrospect. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Sun, 2007-03-25 at 20:57 -0400, Tom Lane wrote: > Warren Little <Warren.Little@MeridiasCapital.com> writes: > > I'm testing my PITR recovery procedures and something doesn't look > > right. > > The following is from the logs upon starting postgres with > > recovery.conf file in place > > > @ 2007-03-23 05:57:33 MDTLOG: restored log file > > "000000010000011A000000FD" from archive > > @ 2007-03-23 05:57:35 MDTLOG: incorrect resource manager data > > checksum in record at 11A/FD492B20 > > @ 2007-03-23 05:57:35 MDTLOG: redo done at 11A/FD492210 > > > My concern is that there were many more logfiles to be played > > following "00000010000011A000000FD" > > (ie 000000010000011E0000005C) yet it appears the recovery stop at > > that point and called it good. > > Perhaps you have a bad copy of that particular log segment file? > The "incorrect checksum" failure is normal in certain circumstances > (eg, where the system crashes partway through writing out a large > WAL record), which is why the code treats it as indicating the end > of WAL. But in this case it looks more like corrupt data ... Tom, I think there is a problem here. If we stop before the end of logs we should be incrementing the timeline id. I'm not going to think about it right now, but I'll think on that next week. Not the most likely problem we'll ever see, I grant you. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > I think there is a problem here. If we stop before the end of logs we > should be incrementing the timeline id. There is no good reason here to think that we have stopped before the end of logs, and I don't think I want the code bumping the timeline ID on every crash restart. regards, tom lane
On Fri, 2007-03-30 at 12:40 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > I think there is a problem here. If we stop before the end of logs we > > should be incrementing the timeline id. > > There is no good reason here to think that we have stopped before the > end of logs, and I don't think I want the code bumping the timeline ID > on every crash restart. The timeline is a protection against confusing ourselves when we have two log files both called the same thing. In the OP's case, there were clearly unapplied log files that end up as duplicates. I can see the difficulty in knowing whether or not to bump the timeline id. At very least we need to document this, since if the manual's advice were taken "The archive command should generally be designed to refuse to overwrite any pre-existing archive file." then the OP's system would start throwing errors when the first xlog fills after the recovered system re-enters normal operation. We should say: "During recovery it is possible, if you're unlucky, that one of the WAL files has been damaged. If so, recovery will stop at the point at which the damage has occurred. It is probable that WAL files higher than the damaged WAL file exist in the archive. If that is the case, you may need to begin archiving to a different location, or move the earlier WAL files out of the archive, to allow the newly restored server to continue archive operations correctly. If you don't, the server will operate normally but further archiving may not occur correctly. Take good care of your archived WAL files or better still take two copies.". -- Simon Riggs EnterpriseDB http://www.enterprisedb.com