Re: Streaming replication - invalid resource manager ID - Mailing list pgsql-general
From | Laurenz Albe |
---|---|
Subject | Re: Streaming replication - invalid resource manager ID |
Date | |
Msg-id | 789fdb8f4c304f8e70cf9e13eee50080690349fa.camel@cybertec.at Whole thread Raw |
In response to | Streaming replication - invalid resource manager ID ("Maeldron T." <maeldron@gmail.com>) |
List | pgsql-general |
Maeldron T. wrote: > I’m a bit stuck with a few LOG messages. The one in the subject disturbs me the most. > > (FreeBSD 11.2, PostgreSQL 10.6) > > When I restart the hot standby, sometimes I see > > 2019-02-13 21:17:31 CET LOG: redo starts at 0/488A2748 > 2019-02-13 21:17:31 CET LOG: invalid record length at 0/488A2780: wanted 24, got 0 > 2019-02-13 21:17:31 CET LOG: database system is ready to accept read only connections > 2019-02-13 21:17:31 CET LOG: started streaming WAL from primary at 0/48000000 on timeline 1 > > I try to believe it is harmless. I found two emails in the archive telling that this is harmless, > but they were from many years ago. Is it really harmless? (Without PANIC, etc). > The main reason I believe it’s harmless is that I often see it. It is harmless. When WAL is replayed from a file, there comes the time when you hit the end of the WAL. WAL files have a fixed, preallocated length, so you notice that you have reached the end if you try to read the next WAL record and find only garbage, or the checksum is not correct. This is a typical "I found garbage, so I guess it's the end of WAL". PostgreSQL just explains why it considers the WAL record garbage, so debugging is easier. If the database logs a LOG, it thinks that nothing is wrong. > The one I see less often is: "invalid resource manager ID" > > The last time I encountered it when I made the standby the new master, and the original master the new standby. Steps: That is the same thing, only the check if the next WAL record header is valid passed the first check (the length was big enough) and fails the second. It means that the number found in the "resource manager" field of the header was bigger than the allowed maximum. See the function "ValidXLogRecordHeader" in "src/backend/access/transam/xlogreader.c". > 2019-02-13 20:38:45 CET LOG: entering standby mode > 2019-02-13 20:38:45 CET LOG: redo starts at 8F42/E7DC51B8 > 2019-02-13 20:38:47 CET WARNING: could not open directory "base/26083437": No such file or directory > 2019-02-13 20:38:47 CET CONTEXT: WAL redo at 8F42/EDE3E720 for Database/DROP: dir 26083437/1663 > 2019-02-13 20:38:47 CET WARNING: some useless files may be left behind in old database directory "base/26083437" > 2019-02-13 20:38:47 CET CONTEXT: WAL redo at 8F42/EDE3E720 for Database/DROP: dir 26083437/1663 > > I don’t like these because they are WARNINGS. I believe they might be there because sometimes > I check the replication with a createdb something (on master); psql -l (on standby); > dropdb something (on master). Although I do it when they both are running. Wow, that's an ... interesting method to check if replication is working. What about something lighter, like running an UPDATE on a test table? If you read "dbase_redo" in "src/backend/commands/dbcommands.c", you'll see this: /* * Our theory for replaying a CREATE is to forcibly drop the target * subdirectory if present, then re-copy the source data. This may be * more work than needed, but it is simple to implement. */ if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode)) { if (!rmtree(dst_path, true)) /* If this failed, copydir() below is going to error. */ ereport(WARNING, (errmsg("some useless files may be left behind in old database directory \"%s\"", dst_path))); } That explains this nicely. > Since I’m not someone who says: "It’s okay, it works", I ended up running pg_basebackups after each > promote due to these messages. > I have been using PostgreSQL for ages. I use and love almost every feature of it. > > The log messages, on the other hand, are more esoteric to me than the original Maya calendar > was which sank with Atlantis. Maybe I’m not the only user, who, without reading the source code, > would expect something like this: > > LOG: Everything is all right. I did what you asked, and I talk about it. > NOTICE: Please fix your pg_hba.conf > WARNING: If I weren’t rock solid ACID, you would be in a trouble now. Don’t do this again. > ERROR: The query was wrong but your data is safe. > FATAL: Do you have a backup? > PANIC: Do you have a legal insurance? > > But the reality in my understanding is something like this: > > LOG: Your data is either corrupted or not. > WARNING: I think something is wrong. (But it’s not) > ERROR: Your data is either corrupted or not. > FATAL: I’m the least harmful kind of log message in PostgreSQL. Just give me two seconds. Your data is safe. The logic is like this: LOG: This should go to the log so the reader knows what happened. No reason to worry. NOTICE: Similar to log, but for the client, not the DBA. "Hey, I automatically created a sequence for you." WARNING: Processing can continue, but this looks suspiciously like you made a mistake somewhere. (e.g., entering COMMIT when no transaction is active.) ERROR: The current activity has to be interrupted, and the current transaction is dead. This could be a typo or data corruption. FATAL: A database connection has to be terminated. It is fatal only for the connection, not for the database. (e.g., the WAL receiver complains that hist friend on the other end has hung up, so it goes to diem having nothing to do) PANIC: an error in a critical section there is no good way to recover from this, and shared memory state might be corrupted, so reinitialize and perform crash recovery. I have included the references to the source deliberately. Since you say you like PostgreSQL and are someone who wants to understand what is going on, your way will eventually lead to the source code. Many of the error messages only make sense if you understand how PostgreSQL works, e.g. that there is a WAL receiver process that cannot proceed if the master server is shut down. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
pgsql-general by date: