Thread: online reindex

online reindex

From
Greg Stark
Date:
It sounds to me like it should be fairly straightforward to implement online
reindex. That is, reindex without locking the table. This is an important
feature for 24x7 operation. Since postgres doesn't modify data in place the
only thing required for online reindex is to ensure that the reindex operation
sees all of the latest data.

If reindex sets a flag that causes all new inserts and updates to allocate new
space at the end of the heap without checking for free space, then a simple
linear scan should be guaranteed to catch all the data.

(I'm not sure how the indexing operation works, if it reads in the whole table
and then sorts it there would have to be an extra step where any new tuples
are read in and inserted.)

There would only have to be a small window with the table locked while the
indexes are swapped at the end.

-- 
greg



Re: online reindex

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> If reindex sets a flag that causes all new inserts and updates to allocate new
> space at the end of the heap without checking for free space, then a simple
> linear scan should be guaranteed to catch all the data.

Oh?  If people are inserting tuples at the same time you're reading, I'm
not sure this holds good at all.

> There would only have to be a small window with the table locked while the
> indexes are swapped at the end.

Can you say "deadlock"?  Upgrading a lock from shared to exclusive won't
work in general, because there may be other transactions trying to do
the same thing.
        regards, tom lane


Re: online reindex

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > If reindex sets a flag that causes all new inserts and updates to allocate new
> > space at the end of the heap without checking for free space, then a simple
> > linear scan should be guaranteed to catch all the data.
> 
> Oh?  If people are inserting tuples at the same time you're reading, I'm
> not sure this holds good at all.

The premise is that reusing space within the heap is disabled during this
scan, so any new tuples are at the end. When you've reached the end you've
read all the ones inserted during your scan as well. 

There is a window between the end of the scan and locking the table when more
extra tuples could be added but presumably that window would be small and the
extra tuples would be limited. They would have to be processed while the table
is locked.

I don't know much about postgres's locking yet, so I can't comment on how to
arrange to be able to get a lock on the table. At first blush this sounds like
there needs to be a "upgradable lock" that only one process can hold but
allows other processes to read while it's held until it's upgraded. But as I
said I don't know much details about the locking policies in use currently.

-- 
greg