Re: Reducing relation locking overhead - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: Reducing relation locking overhead |
Date | |
Msg-id | 1134326038.3567.50.camel@localhost.localdomain Whole thread Raw |
In response to | Re: Reducing relation locking overhead (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Reducing relation locking overhead
|
List | pgsql-hackers |
Ühel kenal päeval, L, 2005-12-10 kell 21:07, kirjutas Tom Lane: > Hannu Krosing <hannu@skype.net> writes: > > How do you plan to determine "any rows not already present in the index" > > without explicitly remembering the start and end snapshots of existing > > CREATE INDEX (SNAP1 and SNAP2 in my proposal)? > > I was thinking in terms of actually looking into the index to see if the > particular TID is present or not. You could use snapshots to optimize > this by avoiding index probes for tuples that must be present, which > hopefully will be most of 'em. Also you need a snapshot to detect > tuples that are new enough that they certainly will be indexed by their > inserting transaction, so that you don't have a race condition between > an active inserter and the REINDEX. (I think this is possible but maybe > I missed something.) So would it be possible to do the whole indexing whil another long transaction runs, possibly in SERIALIZABLE isolation level, inserting tuples all the way. > That leaves you looking at just the tuples > inserted by transactions that might or might not have known about the > index. So yeah, you do need SNAP1 and SNAP2 but they're being used in > a different way than the original proposal. By taking snaps at points with no concurrrent transactions I was hoping to make things (both concept and code) simpler by reducing SNAP1 and SNAP2 to mere transaction ids and making sure that these and only these tuples inserted by transactions which fall between them need to be added to the index. If it does not actually make things clearer then it may not be a good idea to try to fit them between transactions. Otoh, we still do need to wait - possibly a long time - for "3. Attempt to acquire ShareLock". So why not wait also at start in case that would make things cleares/simpler ? > > In the last round of discussion you pointed out that index itself can't > > be effectively used for this in case there are lots of equal index keys. > > True, but if you can avoid going to the index at all for the majority of > the tuples, I think this is tolerable. And it seems that we could store same-valued index leafs in TID order, at least during concurrent create index/reindex, making the index lookup cheaper. To me, it seems a good idea to store similar values in index always in TID order, as this makes an index scan do the right thing (scan in heap order), at least for key=const case, even without intermediate bitmap index. Or are there downsides to this in general case ? > In any case the design idea here > seems to be "we don't care how long REINDEX takes as long as it's not > blocking anyone". Yes, thats the general idea. Within reason of course, for example making a seqscan over the index for each and every tuple inserted during building the first index would probably still be too slow :) > regards, tom lane
pgsql-hackers by date: