Re: invalid magic number in log segment - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: invalid magic number in log segment |
Date | |
Msg-id | 52A81BC0.3080409@vmware.com Whole thread Raw |
In response to | invalid magic number in log segment ("Erikjan Rijkers" <er@xs4all.nl>) |
List | pgsql-hackers |
On 12/11/2013 09:44 AM, Erikjan Rijkers wrote: > I don't know whether the below constitutes a bug, but: > > Daily (sometimes even more often) I recompile 9.4devel (after git pull) to run a large dev database (100 GB or so). > > To avoid frequent initdb and many-hour-restore of data, I do this only when the following two #defines are unchanged: > CATALOG_VERSION_NO in src/include/catalog/catversion.h, and > PG_CONTROL_VERSION in src/include/catalog/pg_control.h > > the goal being to always run the latest db, without having to reload the ~100 GB dev db unexpectedly at inconvenient times. > > Generally, this works OK. > > However, the last few weeks I sometimes get, after such recompiling, 'invalid magic number' errors from which I don'tknow > how to recover (it means, apparently, an initdb is needed and I have then to reload the database). > > > 2013-12-11 00:15:25.627 CET 25304 LOG: received smart shutdown request > 2013-12-11 00:15:25.631 CET 25306 LOG: shutting down > 2013-12-11 00:15:25.904 CET 25306 LOG: database system is shut down > 2013-12-11 08:11:59.858 CET 25490 LOG: database system was shut down at 2013-12-11 00:15:25 CET > 2013-12-11 08:11:59.901 CET 25490 LOG: invalid magic number D078 in log segment 000000010000006300000034, offset 0 > 2013-12-11 08:11:59.901 CET 25490 LOG: invalid primary checkpoint record > 2013-12-11 08:11:59.901 CET 25490 LOG: invalid magic number D078 in log segment 000000010000006300000034, offset 0 > 2013-12-11 08:11:59.901 CET 25490 LOG: invalid secondary checkpoint record > 2013-12-11 08:11:59.901 CET 25490 PANIC: could not locate a valid checkpoint record > 2013-12-11 08:12:00.326 CET 25492 FATAL: the database system is starting up > 2013-12-11 08:12:01.328 CET 25493 FATAL: the database system is starting up > 2013-12-11 08:12:01.682 CET 25489 LOG: startup process (PID 25490) was terminated by signal 6: Aborted > 2013-12-11 08:12:01.682 CET 25489 LOG: aborting startup due to startup process failure > > > My question is two-fold: > > 1. (general:) is this 'invalid magic number' unexpected, and should it be reported always? The magic number it's complaining about is the constant stored in the WAL logs, that acts as a version number. Think of it as the CATALOG_VERSION_NO, but for the WAL log format. It's bumped whenever the WAL format changes. > 2. (for my setup specifically:) is there any way that I can recognize, beforehand, at the code base level, such an > impending 'invalid magic number' state? Keep an eye on the XLOG_PAGE_MAGIC constaint in xlog_internal.h. > Can de db be recovered from easily? (although this dev database is expendable, it takes many hours to rebuild; I'd liketo > avoid that if possible). Yes, you can use pg_resetxlog to recover. BTW, you could also use pg_upgrade. That should work also when catversion has been bumped. - Heikki
pgsql-hackers by date: