Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Support for N synchronous standby servers - take 2 |
Date | |
Msg-id | CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g@mail.gmail.com Whole thread Raw |
In response to | Re: Support for N synchronous standby servers - take 2 (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: Support for N synchronous standby servers - take 2
|
List | pgsql-hackers |
On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q@mail.gmail.com> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> > Thank you for the new patch. Sorry to have overlooked some >> > versions. I'm looking the v19 patch now. >> > >> > make complains for an unused variable. > > Thank you. I'll have a closer look on it a bit later. > >> >> Attached latest patch incorporating all review comments so far. >> >> >> >> Aside from the review comments, I did following changes; >> >> - Add logic to avoid fatal exit in yy_fatal_error(). >> > >> > Maybe good catch, but.. >> > >> >> syncrep_scanstr(const char *str) >> > .. >> >> * Regain control after a fatal, internal flex error. It may have >> >> * corrupted parser state. Consequently, abandon the file, but trust >> > ~~~~~~~~~~~~~~~~ >> >> * that the state remains sane enough for syncrep_yy_delete_buffer(). >> > ~~~~~~~~~~~~~~~~~~~~~~~~ >> > >> > guc-file.l actually abandones the config file but syncrep_scanner >> > reads only a value of an item in it. And, the latter is >> > eventually true but a bit hard to understand. >> > >> > The patch will emit a mysterious error message like this. >> > >> >> invalid value for parameter "synchronous_standby_names": "2[a,b,c]" >> >> configuration file ".../postgresql.conf" contains errors >> > >> > This is utterly wrong. A bit related to that, it seems to me that >> > syncrep_scan.l doesn't need the same mechanism with >> > guc-file.l. The nature of the modification would be making >> > call_*_check_hook to be tri-state instead of boolean. So just >> > cathing errors in call_*_check_hook and ereport()'ing as SQL >> > parser does seems enough, but either will do for me. >> >> Well, I think that call_*_check_hook can not catch such a fatal error. > > As mentioned in my comment, SQL parser converts yy_fatal_error > into ereport(ERROR), which can be caught by the upper PG_TRY (by > #define'ing fprintf). So it is doable if you mind exit(). I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal flex fatal error occurs, postmaster just exits instead of jumping out of parser. ISTM that, when an internal flex fatal error occurs, it's better to elog(FATAL) and terminate the problematic process. This might lead to the server crash (e.g., if postmaster emits a FATAL error, it and its all child processes will exit soon). But probably we can live with this because the fatal error basically rarely happens. OTOH, if we make the process keep running even after it gets an internal fatal error (like Sawada's patch or your idea do), this might cause more serious problem. Please imagine the case where one walsender gets the fatal error (e.g., because of OOM), abandon new setting value of synchronous_standby_names, and keep running with the previous setting value. OTOH, the other walsender processes successfully parse the setting and keep running with new setting. In this case, the inconsistency of the setting which each walsender is based on happens. This completely will mess up the synchronous replication. Therefore, I think that it's better to make the problematic process exit with FATAL error rather than ignore the error and keep it running. Regards, -- Fujii Masao
pgsql-hackers by date: