Re: Reducing relation locking overhead - Mailing list pgsql-hackers
From: Greg Stark
Subject: Re: Reducing relation locking overhead
Date:
Msg-id: 874q5rfdpl.fsf@stark.xeocode.com
In response to: Re: Reducing relation locking overhead (Simon Riggs <simon@2ndquadrant.com>)
Responses:
  Re: Reducing relation locking overhead
  Re: Reducing relation locking overhead
List: pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:

> On Fri, 2005-12-02 at 02:14 -0500, Tom Lane wrote:
> > Greg Stark <gsstark@mit.edu> writes:
> > > It was a *major* new feature that many people were waiting for when
> > > Oracle finally implemented live CREATE INDEX and REINDEX. The ability
> > > to create an index without blocking any operations on a table, even
> > > updates, was absolutely critical for 24x7 operation.
> >
> > Well, we're still not in *that* ballpark and I haven't seen any serious
> > proposals to make us so. How "absolutely critical" is it really?
> > Is REINDEX-in-parallel-with-reads-but-not-writes, which is what we
> > actually have at the moment, an "absolutely critical" facility?

Alright, I'll grant Tom that "absolutely critical" was a bit of hyperbole.

> I know at least one other RDBMS that uses optimistic locking when
> creating indexes. It checks the table description, builds the index with
> a read lock, then checks the table description again before attempting
> to lock the catalog, "create" the index and then complete. There is a
> risk of getting a "table restructured error" after the build is nearly
> complete.

I suspect this comes out of a very different storage model from Postgres's.

Postgres would have no trouble building an index of the existing data using
only shared locks. The problem is that any newly inserted (or updated)
records could be missing from such an index.

To finish the job you would then have to gather up all those newly inserted
records. Of course, while you're doing that, still more records could be
inserted, and so on. There's no guarantee the process would ever finish,
though I suppose you could detect the situation if the size of each new
batch wasn't converging to 0 and throw an error (a rough sketch of that
convergence loop is at the end of this mail).

One optimization would be a flag that disables use of the FSM, forcing all
inserts to extend the table and allocate new tuples at the end. That would
at least limit how much the index build has to rescan; the build could then
do one-by-one insertions for the remaining tuples until it catches up to
the head of the table.

At the end of the index build there's also the problem of upgrading locks
to put the new index in place, which creates a deadlock risk. Perhaps
that's where the "table restructured error" comes up in these other
databases?

> 24x7 operation is actually fairly common. Maybe not with a strong SLA
> for availability, but many websites and embedded apps are out there all
> the time. The PostgreSQL claim to fame has concurrency at the top of the
> list, so we should assume that in all we do.

Off the top of my head I would put these items on the list of things
necessary for 24x7 operation:

. (non-FULL) VACUUM
. Online/PITR backups
. Partitioned tables
. Online index builds

Of which Postgres has 2.5 out of 4, and most of those have arrived in just
the last 12 months or so. Doing pretty damned good.

--
greg
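
P.S. Here's a rough toy sketch of the catch-up/convergence loop described
above. This is purely illustrative C with made-up names and numbers
(sim_concurrent_inserts, MAX_PASSES, the assumed 4x scan-vs-insert ratio),
not anything resembling actual backend code; the point is just to show a
build that repeatedly indexes whatever was inserted during the previous
pass and bails out if the batches stop shrinking toward zero.

/*
 * Toy sketch of the "catch-up" index build idea: index the existing
 * tuples under a shared lock, then repeatedly index whatever was
 * inserted while the previous pass ran, erroring out if the batches
 * stop converging toward zero.
 */
#include <stdio.h>

#define MAX_PASSES 10

/* Pretend table: tuples are just sequential ids. heap_end grows as
 * concurrent inserts arrive (simulated below). */
static long heap_end = 100000;   /* tuples present when the build starts */
static long indexed_up_to = 0;   /* how far the index build has scanned */

/* Simulate concurrent inserts: proportional to how long the previous
 * pass took, so the batches shrink as the passes get shorter. */
static long sim_concurrent_inserts(long tuples_scanned)
{
    return tuples_scanned / 4;   /* assumed: inserts arrive 4x slower than we scan */
}

static void index_range(long from, long to)
{
    /* A real implementation would insert index entries for each tuple here. */
    printf("pass indexed tuples %ld..%ld (%ld tuples)\n", from + 1, to, to - from);
}

int main(void)
{
    for (int pass = 0; pass < MAX_PASSES; pass++)
    {
        long batch = heap_end - indexed_up_to;

        if (batch == 0)
        {
            printf("converged after %d passes; ready to swap the index in\n", pass);
            return 0;
        }

        index_range(indexed_up_to, heap_end);
        indexed_up_to = heap_end;

        /* New tuples were appended at the end of the heap while we scanned. */
        heap_end += sim_concurrent_inserts(batch);
    }

    fprintf(stderr, "batches are not converging to 0; aborting index build\n");
    return 1;
}

With the made-up 4:1 ratio this converges in a handful of passes; if the
insert rate were close to or above the scan rate it would hit MAX_PASSES
and abort, which is exactly the "no guarantee it ever finishes" case.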