
Re: WIP: Access method extendability - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: WIP: Access method extendability
Date	October 28, 2014 14:22:48
Msg-id	CA+U5nMKNcbcy1wFZVfr3S9QREJSVJ_xNuU1VS9o0McTjFgtZxg@mail.gmail.com
In response to	WIP: Access method extendability (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: WIP: Access method extendability Re: WIP: Access method extendability Re: WIP: Access method extendability
List	pgsql-hackers


On 15 October 2014 13:08, Alexander Korotkov <aekorotkov@gmail.com> wrote:

> Postgres was initially designed to support access method extendability.
> This extendability survives to the present day. However, it is mostly
> internal, in-core extendability. One can quite easily add a new access
> method to the PostgreSQL core. But anyone who tries to implement an access
> method as an external module faces the following difficulties:

...

> The problem of WAL is a bit more complex. According to previous discussions,
> we don't want to let extensions declare their own xlog records. If we did,
> the recovery process would depend on extensions, which badly violates
> reliability. The solution is to implement a generic xlog record that can
> represent the difference between blocks in some general manner.

Thank you for progressing with these thoughts.

I'm still a little uncertain about the approach, now that my eyes are
open to the problems of extendability.

The main problem we had in the past was that GiST and GIN indexes both
had faulty implementations for redo, which in some cases caused severe
issues. New index types will suffer from the same problems, so I
see a different starting place.

The faults there raised the need for us to be able to mark specific
indexes as corrupt, so that they could be avoided during Hot Standby
and in normal running after promotion.

Here's the order of features I think we need:

1. A mechanism to mark an index as corrupt so that it won't be usable
by queries. That needs to work during recovery, so we need to persist a
data structure which tells us which indexes are corrupt. Then we need
something that checks whether an index is known corrupt during
relcache access. So if we decide an index is bad, we record the index
as corrupt and then fire a relcache invalidation.

2. Some additional code in Autovacuum to rebuild corrupt indexes at
startup, using AV worker processes to perform a REINDEX CONCURRENTLY.

This will give us what we need to allow an AM to behave sensibly, even
in the face of its own bugs. It also gives us UNLOGGED indexes for
free. Unlogged indexes mean we can change the way unlogged tables
behave to allow them to truncate down to the highest unchanged data at
recovery, so we don't lose all the data when we crash.
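
As a rough sketch only, the bookkeeping behind item 1 might look
something like the following. Every name here (CorruptIndexRegistry,
mark_index_corrupt, index_usable_for_queries and so on) is hypothetical
and is not existing PostgreSQL code. The "invalidations" counter merely
stands in for firing a relcache invalidation, and persisting the
structure across restarts is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an index identifier; not PostgreSQL's Oid. */
typedef uint32_t IndexId;

#define MAX_CORRUPT_INDEXES 64

/* Hypothetical structure listing the indexes currently known corrupt. */
typedef struct CorruptIndexRegistry
{
    size_t   count;
    IndexId  ids[MAX_CORRUPT_INDEXES];
    uint64_t invalidations;  /* stands in for relcache invalidations fired */
} CorruptIndexRegistry;

/* Is this index recorded as corrupt? */
bool
index_is_corrupt(const CorruptIndexRegistry *reg, IndexId id)
{
    assert(reg != NULL);
    for (size_t i = 0; i < reg->count; i++)
    {
        if (reg->ids[i] == id)
            return true;
    }
    return false;
}

/*
 * Record an index as corrupt and "fire" an invalidation.
 * Returns false only if the registry is full.
 */
bool
mark_index_corrupt(CorruptIndexRegistry *reg, IndexId id)
{
    assert(reg != NULL);
    if (index_is_corrupt(reg, id))
        return true;            /* already recorded, nothing more to do */
    if (reg->count >= MAX_CORRUPT_INDEXES)
        return false;
    reg->ids[reg->count++] = id;
    reg->invalidations++;
    return true;
}

/* Forget an index once it has been rebuilt; returns false if not found. */
bool
clear_index_corrupt(CorruptIndexRegistry *reg, IndexId id)
{
    assert(reg != NULL);
    for (size_t i = 0; i < reg->count; i++)
    {
        if (reg->ids[i] == id)
        {
            reg->ids[i] = reg->ids[--reg->count];
            reg->invalidations++;
            return true;
        }
    }
    return false;
}

/* The check a relcache-style lookup would make before offering the index. */
bool
index_usable_for_queries(const CorruptIndexRegistry *reg, IndexId id)
{
    return !index_is_corrupt(reg, id);
}
```

In this sketch a rebuild worker (item 2) would call clear_index_corrupt
once its rebuild has finished, making the index usable again.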

3. That then allows us to move towards having indexes that are marked
"changed" when we perform the first DML on the table in any checkpoint
cycle, which allows us to rebuild indexes that were in the middle of
being changed when we crashed. (The way we'd do that is to have an LSN
on the metapage and then only write WAL for the metapage.) The
difference here is that they are UNLOGGED but do not get trashed on
recovery unless they were in the process of changing.
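
One possible reading of item 3, sketched under stated assumptions: the
metapage carries the LSN at which the index was last marked "changed",
and recovery compares it against the redo pointer of the last
checkpoint. The types and functions below (Lsn, IndexMetaPage,
note_index_change, index_needs_rebuild) are hypothetical illustrations,
not PostgreSQL APIs, and the sketch assumes current_lsn is never older
than the checkpoint redo LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical WAL position type; 0 means "never marked changed". */
typedef uint64_t Lsn;

/* Hypothetical metapage content: the only part of the index that is WAL-logged. */
typedef struct IndexMetaPage
{
    Lsn changed_lsn;  /* LSN at which the index was last marked "changed" */
} IndexMetaPage;

/*
 * Called before DML touches the index. Returns true when this is the first
 * change in the current checkpoint cycle, i.e. when the caller would need to
 * write a WAL record for the metapage.
 */
bool
note_index_change(IndexMetaPage *meta, Lsn checkpoint_redo_lsn, Lsn current_lsn)
{
    assert(meta != NULL);
    if (meta->changed_lsn != 0 && meta->changed_lsn >= checkpoint_redo_lsn)
        return false;           /* already marked in this cycle */
    meta->changed_lsn = current_lsn;
    return true;
}

/*
 * At crash recovery: the index must be rebuilt only if it was marked
 * "changed" at or after the last checkpoint's redo pointer.
 */
bool
index_needs_rebuild(const IndexMetaPage *meta, Lsn checkpoint_redo_lsn)
{
    assert(meta != NULL);
    return meta->changed_lsn != 0 && meta->changed_lsn >= checkpoint_redo_lsn;
}
```

With this shape, an index untouched since the last checkpoint survives a
crash as-is, while one modified in the current cycle is flagged for
rebuild.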

If we do those things, then we won't even need to worry about needing
AMs to write their own WAL records. Recovery will be safe AND we won't
need to go through problems of buggy persistence implementations in
new types of index.

Or, to put it another way, it will be easier to write new index AMs
because we'll be able to skip the WAL part until we know we want it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
