Thread: online reindex
It sounds to me like it should be fairly straightforward to implement
online reindex, that is, reindex without locking the table. This is an
important feature for 24x7 operation.

Since postgres doesn't modify data in place, the only thing required for
online reindex is to ensure that the reindex operation sees all of the
latest data. If reindex sets a flag that causes all new inserts and
updates to allocate new space at the end of the heap without checking
for free space, then a simple linear scan should be guaranteed to catch
all the data. (I'm not sure how the indexing operation works; if it
reads in the whole table and then sorts it, there would have to be an
extra step where any new tuples are read in and inserted.)

There would only have to be a small window with the table locked while
the indexes are swapped at the end.

-- greg
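A toy sketch of the invariant being claimed, in Python rather than
PostgreSQL code (all names here are illustrative, not postgres
internals): if the heap is append-only for the duration of the scan, a
linear scan that keeps reading until it reaches a stable end, plus one
final catch-up pass, sees every tuple inserted concurrently.

```python
import threading

class Heap:
    """In-memory stand-in for a heap with free-space reuse disabled."""
    def __init__(self):
        self.tuples = []          # append-only list standing in for heap pages
        self.lock = threading.Lock()
        self.append_only = False  # the "flag" a reindex would set; in this
                                  # toy model inserts always append anyway

    def insert(self, value):
        with self.lock:
            # With free-space reuse disabled, new tuples always land at
            # the end, behind any in-progress scan position.
            self.tuples.append(value)

    def scan(self, start=0):
        """Linear scan that keeps reading until the end stops moving."""
        seen = []
        i = start
        while True:
            with self.lock:
                chunk = self.tuples[i:]
            if not chunk:
                break
            seen.extend(chunk)
            i += len(chunk)
        return seen

heap = Heap()
for v in range(100):
    heap.insert(v)

heap.append_only = True  # "reindex" begins: no free-space reuse

# A concurrent inserter adds tuples while the scan runs.
inserter = threading.Thread(
    target=lambda: [heap.insert(100 + v) for v in range(50)])
inserter.start()
seen = heap.scan()
inserter.join()

# The final locked window would pick up any stragglers that slipped in
# between the end of the scan and taking the lock.
tail = heap.scan()
assert sorted(tail) == list(range(150))
```

The race Greg acknowledges is visible here: `seen` may or may not
contain all 150 tuples depending on timing, which is exactly why the
proposal needs the short locked catch-up pass at the end.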
Greg Stark <gsstark@mit.edu> writes:
> If reindex sets a flag that causes all new inserts and updates to
> allocate new space at the end of the heap without checking for free
> space, then a simple linear scan should be guaranteed to catch all
> the data.

Oh?  If people are inserting tuples at the same time you're reading, I'm
not sure this holds good at all.

> There would only have to be a small window with the table locked while
> the indexes are swapped at the end.

Can you say "deadlock"?  Upgrading a lock from shared to exclusive won't
work in general, because there may be other transactions trying to do
the same thing.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Greg Stark <gsstark@mit.edu> writes:
> > If reindex sets a flag that causes all new inserts and updates to
> > allocate new space at the end of the heap without checking for free
> > space, then a simple linear scan should be guaranteed to catch all
> > the data.
>
> Oh?  If people are inserting tuples at the same time you're reading,
> I'm not sure this holds good at all.

The premise is that reusing space within the heap is disabled during
this scan, so any new tuples are at the end. When you've reached the
end, you've read all the ones inserted during your scan as well.

There is a window between the end of the scan and locking the table in
which more tuples could be added, but presumably that window would be
small and the extra tuples few. They would have to be processed while
the table is locked.

I don't know much about postgres's locking yet, so I can't comment on
how to arrange to get a lock on the table. At first blush this sounds
like there needs to be an "upgradable lock" that only one process can
hold, but that allows other processes to read while it's held, until it
is upgraded. But as I said, I don't know many details about the locking
policies currently in use.

-- greg
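The "upgradable lock" Greg speculates about can be sketched as follows
(a toy Python model under the stated assumptions, not an existing
postgres facility): at most one holder may take the lock in upgradable
mode, ordinary readers are still admitted, and the upgrade blocks new
readers and drains existing ones. Because only one transaction can ever
be waiting to upgrade, the shared-to-exclusive deadlock between two
upgraders cannot arise.

```python
import threading
import time

class UpgradableLock:
    """Toy lock: one upgradable holder, many readers until upgrade."""
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0
        self.upgrader = False   # someone holds the single upgradable slot
        self.exclusive = False  # upgrade in progress or complete

    def acquire_shared(self):
        with self.cond:
            self.cond.wait_for(lambda: not self.exclusive)
            self.readers += 1

    def release_shared(self):
        with self.cond:
            self.readers -= 1
            self.cond.notify_all()

    def acquire_upgradable(self):
        with self.cond:
            # Only one upgradable holder at a time; readers still allowed.
            self.cond.wait_for(lambda: not self.upgrader and not self.exclusive)
            self.upgrader = True

    def upgrade(self):
        with self.cond:
            self.exclusive = True     # block new readers...
            self.cond.wait_for(lambda: self.readers == 0)  # ...drain old ones

    def release(self):
        with self.cond:
            self.upgrader = False
            self.exclusive = False
            self.cond.notify_all()

lock = UpgradableLock()
order = []

def reader():
    lock.acquire_shared()   # admitted even though an upgrader holds the lock
    order.append("read")
    time.sleep(0.2)
    lock.release_shared()

lock.acquire_upgradable()
r = threading.Thread(target=reader)
r.start()
time.sleep(0.05)            # let the reader take its shared lock
lock.upgrade()              # waits for the reader to finish
order.append("exclusive")
lock.release()
r.join()

assert order == ["read", "exclusive"]
```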