Thread: BUG #5157: Hash index not concurrency safe
The following bug has been logged online:

Bug reference:      5157
Logged by:          Jeff Janes
Email address:      jeff.janes@gmail.com
PostgreSQL version: 8.4.1
Operating system:   Linux
Description:        Hash index not concurrency safe
Details:

Hash index is not concurrency safe, starting in REL8_4_0 and up to HEAD.

T1:

create table foo (id int, x text);
create index asdlfkjsdf on foo using hash (id);
insert into foo select 1, 'xxxxxxxxxxx' from generate_series(1,100);
set enable_seqscan =off;
set enable_bitmapscan =off;
\timing on

select count(pg_sleep(.3)) from foo where id=1;
 count
-------
   100
(1 row)

Time: 30897.835 ms

select count(pg_sleep(.3)) from foo where id=1;

While that is running, switch to T2:

insert into foo select generate_series, 'xxxxxxxxxxx' from
generate_series(1,100000);

Back in T1:

 count
-------
     8
(1 row)

Time: 2474.709 ms

The pg_sleep is simply there to give me time to force the two
transactions to overlap.

The problem is that hashgettuple releases the buffer content share lock
when it returns a tuple, so when it comes back to get another tuple the
block may have been rearranged by concurrent inserts.  But the state of
the scan object, specifically so->hashso_curpos, makes no attempt to
detect or correct for this rearrangement.
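The failure mode can be shown without any PostgreSQL internals.  Here is
a toy, self-contained C sketch (hypothetical names, not PostgreSQL
source) of the same pattern: a scan remembers only a bare offset into a
sorted page, so a concurrent insert that shifts items makes the resumed
scan return the wrong entry.

/*
 * Toy simulation of the bug, not PostgreSQL code.  A "scan" saves a
 * bare offset (like so->hashso_curpos), the "lock" is dropped, a
 * concurrent insert keeps the page sorted by shifting items, and the
 * resumed scan advances from its stale offset.
 */
#include <stdio.h>
#include <string.h>

#define MAXITEMS 16

static int page[MAXITEMS];      /* sorted "index page" of hash codes */
static int nitems = 0;

/* Insert keeping the page sorted, shifting later items right. */
static void
page_insert(int hashcode)
{
    int pos = 0;

    while (pos < nitems && page[pos] < hashcode)
        pos++;
    memmove(&page[pos + 1], &page[pos], (nitems - pos) * sizeof(int));
    page[pos] = hashcode;
    nitems++;
}

int
main(void)
{
    int curpos;                 /* scan state: a bare offset */

    page_insert(10);
    page_insert(20);
    page_insert(30);

    /* Scan returns page[0] == 10, saves curpos = 0, "drops the lock". */
    curpos = 0;
    printf("scan returned %d at offset %d\n", page[curpos], curpos);

    /* Concurrent insert of 5 shifts every item right by one slot. */
    page_insert(5);

    /* Scan resumes from the stale offset and returns 10 a second time. */
    curpos++;
    printf("scan resumed, returned %d at offset %d (expected 20!)\n",
           page[curpos], curpos);
    return 0;
}

Depending on where the concurrent inserts land, the resumed scan can
just as easily skip entries, which is how T1's second count came back as
8 instead of 100.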
"Jeff Janes" <jeff.janes@gmail.com> writes: > Hash index is not concurrency safe, starting in REL8_4_0 and up to HEAD. Ouch. This used to be okay, because adding new entries to a hash page always added them at the end. The 8.4 changes to keep individual hash pages sorted by hashcode broke it :-(. I think we could recover by having the hashgettuple code path re-synchronize by looking for the heap TID it previously returned. That must be at the same or higher index TID as we had stopped at. (Deletions are not possible, so we only have to search forward, and the TID must be there someplace.) regards, tom lane
On Sun, Nov 1, 2009 at 8:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jeff Janes" <jeff.janes@gmail.com> writes:
>> Hash index is not concurrency safe, starting in REL8_4_0 and up to HEAD.
>
> Ouch.  This used to be okay, because adding new entries to a hash page
> always added them at the end.  The 8.4 changes to keep individual hash
> pages sorted by hashcode broke it :-(.
>
> I think we could recover by having the hashgettuple code path
> re-synchronize by looking for the heap TID it previously returned.
> That must be at the same or higher index TID as we had stopped at.
> (Deletions are not possible, so we only have to search forward,
> and the TID must be there someplace.)

Can it get pushed to another page (an overflow page)?  My quick reading
of the code suggests it can't get pushed, which makes the fix easier.

I'll work on a fix for it.  But if 8.4.2 is coming out in the next
couple of weeks and we want the fix to be in it, then we might want
someone more proficient than me to work on it.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Sun, Nov 1, 2009 at 8:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we could recover by having the hashgettuple code path
>> re-synchronize by looking for the heap TID it previously returned.

> Can it get pushed to another page (an overflow page)?  My quick
> reading of the code suggests it can't get pushed, which makes the fix
> easier.

It can't, which is an important part of the reason why this fix is okay.

> I'll work on a fix for it.  But if 8.4.2 is coming out in the next
> couple of weeks and we want the fix to be in it, then we might want
> someone more proficient than me to work on it.

I'm working on it already ... I figure it's my fault for not catching
this while reviewing the patch :-(.

There's no timetable as yet for 8.4.2.

			regards, tom lane
I wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> Hash index is not concurrency safe, starting in REL8_4_0 and up to HEAD.

> Ouch.  This used to be okay, because adding new entries to a hash page
> always added them at the end.  The 8.4 changes to keep individual hash
> pages sorted by hashcode broke it :-(.

Actually, now that I am looking at it, that patch COMPLETELY destroyed
hash indexes.  The search logic requires that index entries within a
page are ordered by hash value.  Although the insertion code preserves
that property, neither _hash_splitbucket nor _hash_squeezebucket makes
any attempt to do so.  So it's not just a transient concurrency issue;
you can easily get corruption of a hash index leading to permanent
search failures.

Ugh.  Mea culpa for letting this one through.

			regards, tom lane
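To see why the ordering requirement matters, here is a toy sketch
(hypothetical names, assuming the page search works by binary search on
hash value, as the sorted-page invariant suggests).  A binary search for
the first entry with a given hash code is only correct on a sorted page;
if a bucket split or squeeze moves tuples without preserving the order,
the probe can land past a matching entry and report a false "not found".

/*
 * Toy sketch: find the first offset whose hash code is >= target.
 * Correct only if hashes[] is sorted ascending.
 */
static int
find_first_ge(const int *hashes, int nitems, int target)
{
    int lo = 0, hi = nitems;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (hashes[mid] < target)
            lo = mid + 1;       /* wrong step if the page is unsorted */
        else
            hi = mid;
    }
    return lo;                  /* first offset with hashes[off] >= target */
}

On a page that a split or squeeze has left unsorted, say {20, 10, 30},
find_first_ge(page, 3, 10) returns offset 0, whose entry is 20, so a
probe for hash code 10 reports no match even though 10 is present at
offset 1.  That is a permanent search failure, not a transient one.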
I wrote:
> Ugh.  Mea culpa for letting this one through.

I've committed patches to fix this.

			regards, tom lane