Re: WIP: Access method extendability - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: WIP: Access method extendability |
Date | |
Msg-id | CA+U5nMKNcbcy1wFZVfr3S9QREJSVJ_xNuU1VS9o0McTjFgtZxg@mail.gmail.com |
In response to | WIP: Access method extendability (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: WIP: Access method extendability
Re: WIP: Access method extendability Re: WIP: Access method extendability |
List | pgsql-hackers |
On 15 October 2014 13:08, Alexander Korotkov <aekorotkov@gmail.com> wrote:

> Postgres was initially designed to support access methods extendability.
> This extendability lives to present day. However, this is mostly internal
> in-core extendability. One can quite easily add new access method into
> PostgreSQL core. But if one try to implement access method as external
> module, he will be faced with following difficulties:

...

> Problem of WAL is a bit more complex. According to previous discussions, we
> don't want to let extensions declare their own xlog records. If we let them
> then recovery process will depend on extensions. That is much violates
> reliability. Solution is to implement some generic xlog record which is able
> to represent difference between blocks in some general manner.

Thank you for progressing with these thoughts.

I'm still a little uncertain about the approach, now that my eyes are open
to the problems of extendability.

The main problem we had in the past was that GiST and GIN indexes both
had faulty implementations for redo, which in some cases caused severe
issues. New index types will suffer the same problems, so I see a
different starting place.

The faults there raised the need for us to be able to mark specific
indexes as corrupt, so that they could be avoided during Hot Standby
and in normal running after promotion.

Here's the order of features I think we need:

1. A mechanism to mark an index as corrupt so that it won't be usable
by queries. That needs to work during recovery, so we can persist a
data structure which tells us which indexes are corrupt. Then
something that checks whether an index is known corrupt during
relcache access. So if we decide an index is bad, we record the index
as corrupt and then fire a relcache invalidation.

2. Some additional code in Autovacuum to rebuild corrupt indexes at
startup, using AV worker processes to perform a REINDEX CONCURRENTLY.

This will give us what we need to allow an AM to behave sensibly, even
in the face of its own bugs. It also gives us UNLOGGED indexes for
free. Unlogged indexes mean we can change the way unlogged tables
behave to allow them to truncate down to the highest unchanged data at
recovery, so we don't lose all the data when we crash.

3. That then allows us to move towards having indexes that are marked
"changed" when we perform the first DML on the table in any checkpoint
cycle. That allows us to rebuild indexes which were in the middle of
being changed when we crashed. (The way we'd do that is to have an LSN
on the metapage and then only write WAL for the metapage.) The
difference here is that they are UNLOGGED but do not get trashed on
recovery unless they were in the process of changing.

If we do those things, then we won't even need to worry about needing
AMs to write their own WAL records. Recovery will be safe AND we won't
need to go through the problems of buggy persistence implementations
in new types of index.

Or, to put it another way, it will be easier to write new index AMs
because we'll be able to skip the WAL part until we know we want it.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
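
As a rough illustration of how steps 1 and 3 in the message above might fit together, here is a toy Python model. It is only a sketch: the names `Catalog`, `mark_corrupt`, `is_usable`, `first_dml` and `crash_recovery` are hypothetical and do not correspond to any PostgreSQL API, the relcache is reduced to a dictionary, and a checkpoint is assumed to leave every index consistent on disk.

```python
from dataclasses import dataclass, field


@dataclass
class IndexState:
    """Toy stand-in for one index; not a PostgreSQL structure."""
    name: str
    # Step 3: set by the first DML in a checkpoint cycle (hypothetical flag).
    changed_since_checkpoint: bool = False


@dataclass
class Catalog:
    indexes: dict = field(default_factory=dict)
    # Step 1: the persisted record of which indexes are known corrupt.
    corrupt: set = field(default_factory=set)
    # Simplified relcache: cached "is this index usable?" answers.
    relcache: dict = field(default_factory=dict)

    def add_index(self, name):
        self.indexes[name] = IndexState(name)

    def mark_corrupt(self, name):
        # Record the index as corrupt, then invalidate its cache entry
        # so the next lookup re-checks the corrupt list.
        self.corrupt.add(name)
        self.relcache.pop(name, None)

    def is_usable(self, name):
        # The check performed on (simulated) relcache access.
        if name not in self.relcache:
            self.relcache[name] = name not in self.corrupt
        return self.relcache[name]

    def first_dml(self, name):
        self.indexes[name].changed_since_checkpoint = True

    def checkpoint(self):
        # Assumption: after a checkpoint every index is consistent on disk.
        for idx in self.indexes.values():
            idx.changed_since_checkpoint = False

    def crash_recovery(self):
        # Only indexes that were mid-change at crash time are distrusted.
        for idx in self.indexes.values():
            if idx.changed_since_checkpoint:
                self.mark_corrupt(idx.name)
                idx.changed_since_checkpoint = False

    def rebuild(self, name):
        # Stand-in for step 2, where an autovacuum worker would reindex.
        self.corrupt.discard(name)
        self.relcache.pop(name, None)
```

In this model an index that was untouched since the last checkpoint survives a crash, while one that had seen DML is marked corrupt and hidden from queries until `rebuild` is called for it.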