
Re: Why does CREATE INDEX CONCURRENTLY need two scans? - Mailing list pgsql-general

From	Joshua Ma
Subject	Re: Why does CREATE INDEX CONCURRENTLY need two scans?
Date	April 1, 2015 04:06:56
Msg-id	CAG9XPV=sdOrbPs3d+M8qd5RpFTGpaHM=F1s3MS4L3XHYB-VAww@mail.gmail.com
In response to	Re: Why does CREATE INDEX CONCURRENTLY need two scans? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general


Ah, that's exactly what I was looking for. Thanks everyone for the responses!

- Josh


On Tue, Mar 31, 2015 at 8:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh@benchling.com> wrote:
>> Why are two scans necessary? What would break if it did something like the
>> following?
>>
>> 1) insert pg_index entry, wait for relevant txns to finish, mark index
>> open for inserts
>>
>> 2) build index in a single snapshot, mark index valid for searches

>> Wouldn't new inserts update the index correctly? Between the snapshot and
>> index-updating txns afterwards, wouldn't all updates be covered?

> When an index is built with index_build, only the tuples visible at the
> start of the first scan are included in the index. A second scan is needed
> to add index entries for the tuples that were inserted into the table
> during the build phase.
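
To make the point above concrete, here is a toy model of the two scans. It is only a sketch and not PostgreSQL internals: the table is a plain list, the index is a set, and the "concurrent session" is simulated by an append in the middle of the first scan. All names are made up for illustration.

```python
# Toy model only: `table`, `index` and the simulated concurrent insert are
# assumptions for illustration, not PostgreSQL data structures.
table = [1, 2, 3]

# First scan: build from the rows visible when the scan starts.
snapshot = list(table)
index = set()
for key in snapshot:
    index.add(key)
    if key == 2:
        # A concurrent session inserts a row mid-build. The index is not yet
        # usable, so the new row gets no index entry from that session.
        table.append(99)

missing_after_first_scan = [k for k in table if k not in index]

# Second scan: add entries for the rows the first scan did not see.
for key in table:
    if key not in index:
        index.add(key)

missing_after_second_scan = [k for k in table if k not in index]
print(missing_after_first_scan, missing_after_second_scan)
```

In this model the row inserted during the build is missing after the first scan and present after the second.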

More to the point: Joshua's design supposes that retail insertions into
an index can happen in parallel with index build. Or in other words,
that index build consists of instantaneously creating an empty-but-valid
index file and then doing a lot of ordinary inserts into it. That's a
possible design, but it's not very efficient, and most of our index AMs
don't do it that way. btree, for instance, starts by sorting all the
entries and creating the leaf-level pages. Then it builds the upper tree
levels. It doesn't have a complete tree that could support retail
insertions until the very end. Moreover, most of the work is done in
storage that's local to the backend running CREATE INDEX, and isn't
accessible to other processes at all.
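
As a rough illustration of the build order described above (sort everything, lay out the leaf pages, then build the upper levels), here is a minimal bottom-up build in Python. It is a sketch under stated assumptions, not PostgreSQL's btree code: PAGE_CAPACITY, chunk() and bulk_build() are hypothetical names, and the tree is just a list of levels.

```python
# Toy sketch of a bottom-up ("bulk load") btree build; not PostgreSQL's code.
# PAGE_CAPACITY and the list-of-levels layout are assumptions for illustration.
PAGE_CAPACITY = 4  # hypothetical number of entries per page


def chunk(items, size):
    """Split a list into consecutive pages of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_build(keys):
    """Return the tree as a list of levels, leaf level first, root level last."""
    # Step 1: sort every entry up front, in storage private to the builder.
    leaves = chunk(sorted(keys), PAGE_CAPACITY)
    levels = [leaves]
    # Step 2: build each upper level from the first key of every page below.
    while len(levels[-1]) > 1:
        separators = [page[0] for page in levels[-1]]
        levels.append(chunk(separators, PAGE_CAPACITY))
    # Only now does a root exist, so only now could a search or a retail
    # insertion descend the tree.
    return levels


levels = bulk_build([42, 7, 19, 3, 88, 61, 5, 23, 14, 70])
for depth, level in enumerate(levels):
    print(f"level {depth}: {level}")
```

Until bulk_build() returns there is no root to descend from, which is the sense in which the structure cannot accept retail insertions during the build.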

regards, tom lane
