Hi,
> At 2018-05-15 01:49:41, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >=?GBK?B?19S8ug==?= <zoulx1982@163.com> writes:
> >> i run test using pg10.0 on my machine, and the program crashed on _bt_getbuf.
> >> And i found the following code:
> >> the routine _bt_page_recyclable say maybe the page is all-zero page,
> >> if so then the code run (BTPageOpaque) PageGetSpecialPointer(page);
> >> it will be failed because it access invalid memory.
> >> I don't know whether it is so. Look forward t your reply, thanks.
> >
> >This code's clearly broken, as was discussed before:
> >
> >https://www.postgresql.org/message-id/flat/2628.1474272158%40localhost
> >
> >but nothing was done about it, perhaps partly because we didn't have a
> >reproducible test case. Do you have one?
> >
> > regards, tom lane
>
> Unfortunately, I don't have a complete test case.
I recently looked at this code and the previous discussion, and tried to trigger the crash.
I describe how to trigger it at the end of this mail, but I don't know whether it is useful: I had to use gdb
to trigger the crash, so it is not really a reproducible test case.
As was discussed before, this crash happens when recycling an all-zeroes page in an index.
According to the comment in the code below, an all-zeroes page is created when a backend crashes during a page split, after
extending the index's relation to get a new page and before writing the WAL entry for it.
bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    /*
     * It's possible to find an all-zeroes page in an index --- for example, a
     * backend might successfully extend the relation one page and then crash
     * before it is able to make a WAL entry for adding the page. If we find a
     * zeroed page then reclaim it.
     */
    if (PageIsNew(page))
        return true;
    ...
}
If a backend is killed at that point, the newly extended page stays uninitialized: crash recovery
does nothing for it because there is no WAL entry for the new page, and it becomes recyclable when vacuum runs.
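To illustrate why the special pointer cannot be used on such a page, here is a small stand-alone model in C. It is only a sketch, not the PostgreSQL source: ModelPageHeader, MODEL_BLCKSZ and the model_* functions are hypothetical simplifications I made up for this mail. It assumes that a new page is recognized by pd_upper == 0 and that a valid special pointer must lie past the page header and within the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BLCKSZ 8192

/* Hypothetical, simplified page header: only the offsets needed here. */
typedef struct ModelPageHeader
{
    uint16_t pd_lower;   /* offset to start of free space */
    uint16_t pd_upper;   /* offset to end of free space */
    uint16_t pd_special; /* offset to start of special space */
} ModelPageHeader;

/* Copy the header out of the raw page so no alignment is assumed. */
ModelPageHeader
model_read_header(const unsigned char *page)
{
    ModelPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr;
}

/* Assumption: a never-initialized (all-zeroes) page has pd_upper == 0. */
bool
model_page_is_new(const unsigned char *page)
{
    return model_read_header(page).pd_upper == 0;
}

/* The special space must start after the header and inside the block. */
bool
model_special_is_valid(const unsigned char *page)
{
    uint16_t special = model_read_header(page).pd_special;

    return special >= sizeof(ModelPageHeader) && special <= MODEL_BLCKSZ;
}

/* Initialize a page with a special area of special_size bytes at its end. */
void
model_page_init(unsigned char *page, uint16_t special_size)
{
    ModelPageHeader hdr;

    memset(page, 0, MODEL_BLCKSZ);
    hdr.pd_lower = (uint16_t) sizeof(ModelPageHeader);
    hdr.pd_special = (uint16_t) (MODEL_BLCKSZ - special_size);
    hdr.pd_upper = hdr.pd_special;
    memcpy(page, &hdr, sizeof(hdr));
}

/* Only safe once the caller has ruled out a zeroed page. */
const unsigned char *
model_get_special(const unsigned char *page)
{
    assert(model_special_is_valid(page));
    return page + model_read_header(page).pd_special;
}
```

For a zero-filled buffer, model_page_is_new() returns true and model_special_is_valid() returns false, so calling model_get_special() on it would trip the assert. After model_page_init() both checks pass.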
With these conditions in mind, I reproduced the crash as follows.
I tested on master (11beta1), compiled with --enable-cassert and --enable-debug, with hot standby.
<<method for making a recyclable new page>>
(psql) CREATE TABLE mytab (id int, val int);
(psql) CREATE INDEX idx_val ON mytab(val);
(gdb) b nbtinsert.c:1467 (at XLogBeginInsert(); in _bt_split())
(gdb) c
while(breakpoint is not hit){
(psql) INSERT INTO mytab SELECT t, t FROM generate_series(1, 3000) t;
}
[bash] kill -s SIGKILL (backend pid)
(psql) VACUUM;
<<method for triggering the crash>>
while(crash has not occurred){
(psql) INSERT INTO mytab SELECT t, t FROM generate_series(1, 3000) t;
}
Yoshikazu Imai