Thread: [sqlsmith] Failed assertion in _hash_splitbucket_guts

[sqlsmith] Failed assertion in _hash_splitbucket_guts

From
Andreas Seltenreich
Date:
Hi,

the new hash index code on 11003eb failed an assertion yesterday:
   TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line: 1037)

Statement was
   update public.hash_i4_heap set seqno = public.hash_i4_heap.random;

It can be reproduced with the data directory (Debian stretch amd64) I've
put here:
   http://ansel.ydns.eu/~andreas/_hash_splitbucket_guts.tar.xz (12 MB)

Backtrace below.  The cluster hasn't suffered any crashes before this
incident.

regards,
Andreas

Core was generated by `postgres: smith regression [local] UPDATE                           '.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
#0  0x00007f49c40cc198 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f49c40cd61a in __GI_abort () at abort.c:89
#2  0x00000000007f55c1 in ExceptionalCondition (conditionName=conditionName@entry=0x84f890 "!(bucket == obucket)",
errorType=errorType@entry=0x83665d "FailedAssertion", fileName=fileName@entry=0x84f86a "hashpage.c",
lineNumber=lineNumber@entry=1037) at assert.c:54
#3  0x00000000004a3d41 in _hash_splitbucket_guts (rel=rel@entry=0x1251ff8, metabuf=metabuf@entry=1703,
obucket=obucket@entry=37, nbucket=nbucket@entry=549, obuf=obuf@entry=3082, nbuf=nbuf@entry=1754, htab=0x0,
maxbucket=549, highmask=1023, lowmask=511) at hashpage.c:1037
#4  0x00000000004a5627 in _hash_splitbucket (lowmask=511, highmask=1023, maxbucket=549, nbuf=1754, obuf=3082,
nbucket=549, obucket=37, metabuf=1703, rel=0x1251ff8) at hashpage.c:894
#5  _hash_expandtable (rel=0x1251ff8, metabuf=1703) at hashpage.c:768
#6  0x00000000004a1f71 in _hash_doinsert (rel=rel@entry=0x1251ff8, itup=itup@entry=0x26dc830) at hashinsert.c:236
#7  0x00000000004a01c3 in hashinsert (rel=0x1251ff8, values=<optimized out>, isnull=<optimized out>, ht_ctid=0x26dc6fc,
heapRel=<optimized out>, checkUnique=<optimized out>) at hash.c:247
#8  0x00000000005ded1b in ExecInsertIndexTuples (slot=slot@entry=0x26dbd10, tupleid=tupleid@entry=0x26dc6fc,
estate=estate@entry=0x2530028, noDupErr=noDupErr@entry=0 '\000', specConflict=specConflict@entry=0x0,
arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:388
#9  0x00000000005fddaa in ExecUpdate (tupleid=tupleid@entry=0x7ffcaa7c9e40, oldtuple=oldtuple@entry=0x0,
slot=slot@entry=0x26dbd10, planSlot=planSlot@entry=0x26db278, epqstate=epqstate@entry=0x26dac98,
estate=estate@entry=0x2530028, canSetTag=1 '\001') at nodeModifyTable.c:1030
#10 0x00000000005fe49c in ExecModifyTable (node=node@entry=0x26dabf0) at nodeModifyTable.c:1516
#11 0x00000000005e3a18 in ExecProcNode (node=node@entry=0x26dabf0) at execProcnode.c:396
#12 0x00000000005dfabe in ExecutePlan (dest=0x1c2ecd0, direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, planstate=0x26dabf0,
estate=0x2530028) at execMain.c:1567
#13 standard_ExecutorRun (queryDesc=0x1c2ed68, direction=<optimized out>, count=0) at execMain.c:338
#14 0x0000000000701b94 in ProcessQuery (plan=<optimized out>,
sourceText=0xfff228 "update public.hash_i4_heap set \n  seqno = public.hash_i4_heap.random\nreturning \n  (select option_value from information_schema.foreign_server_options limit 1 offset 2)\n     as c0",
params=0x0, dest=0x1c2ecd0, completionTag=0x7ffcaa7ca020 "") at pquery.c:185
#15 0x0000000000701e0b in PortalRunMulti (portal=portal@entry=0x25c52b0, isTopLevel=isTopLevel@entry=1 '\001',
setHoldSnapshot=setHoldSnapshot@entry=1 '\001', dest=dest@entry=0x1c2ecd0, altdest=0xca30e0 <donothingDR>,
completionTag=completionTag@entry=0x7ffcaa7ca020 "") at pquery.c:1299
#16 0x00000000007020f9 in FillPortalStore (portal=portal@entry=0x25c52b0, isTopLevel=isTopLevel@entry=1 '\001') at
pquery.c:1045
#17 0x0000000000702bcd in PortalRun (portal=portal@entry=0x25c52b0, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x199f248, altdest=altdest@entry=0x199f248,
completionTag=completionTag@entry=0x7ffcaa7ca3d0 "") at pquery.c:782
#18 0x0000000000700379 in exec_simple_query (
query_string=0xfff228 "update public.hash_i4_heap set \n  seqno = public.hash_i4_heap.random\nreturning\n  (select option_value from information_schema.foreign_server_options limit 1 offset 2)\n     as c0")
at postgres.c:1094
#19 PostgresMain (argc=<optimized out>, argv=argv@entry=0xfad1d8, dbname=<optimized out>, username=<optimized out>) at
postgres.c:4069
#20 0x000000000046d6c9 in BackendRun (port=0xfa8c60) at postmaster.c:4271
#21 BackendStartup (port=0xfa8c60) at postmaster.c:3945
#22 ServerLoop () at postmaster.c:1701
#23 0x0000000000698ab9 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0xf765d0) at postmaster.c:1309
#24 0x000000000046e88d in main (argc=4, argv=0xf765d0) at main.c:228



Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts

From
Amit Kapila
Date:
On Sat, Dec 3, 2016 at 2:06 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Hi,
>
> the new hash index code on 11003eb failed an assertion yesterday:
>
>     TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line: 1037)
>

This can happen if we start a new split before completing the previous
split of a bucket, or if the bucket still contains tuples left over
from the previous split.  I see a problem in the code below:

_hash_expandtable(Relation rel, Buffer metabuf)
{
..
if (H_NEEDS_SPLIT_CLEANUP(oopaque))
{
/* Release the metapage lock. */
_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

hashbucketcleanup(rel, old_bucket, buf_oblkno, start_oblkno, NULL,
  metap->hashm_maxbucket, metap->hashm_highmask,
  metap->hashm_lowmask, NULL,
  NULL, true, NULL, NULL);
..
}

Here we shouldn't be accessing the meta page after releasing the lock,
as concurrent activity can change these values.  This can be fixed by
storing these values in local variables before releasing the lock and
passing the local variables to hashbucketcleanup().  I will send a patch
shortly.  However, I wanted to verify that this is the reason why you
are seeing the problem.  I could not connect to the database you
provided.
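
A minimal sketch of the fix described above, not the actual patch: copy
the metapage fields into local variables while the lock is still held,
then use only the copies afterwards.  The MetaSketch and SplitParams
types and both helper functions are hypothetical stand-ins for
illustration; only the field names hashm_maxbucket, hashm_highmask and
hashm_lowmask come from the code quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the metapage fields read in _hash_expandtable. */
typedef struct MetaSketch
{
    uint32_t hashm_maxbucket;
    uint32_t hashm_highmask;
    uint32_t hashm_lowmask;
} MetaSketch;

/* Private copy of those fields, taken while the metapage lock is held. */
typedef struct SplitParams
{
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} SplitParams;

/* Call this BEFORE releasing the lock; use only the result afterwards. */
static SplitParams
snapshot_meta(const MetaSketch *metap)
{
    SplitParams p;

    assert(metap != NULL);
    p.maxbucket = metap->hashm_maxbucket;
    p.highmask = metap->hashm_highmask;
    p.lowmask = metap->hashm_lowmask;
    return p;
}

/*
 * Simulates another backend changing the shared metapage once the lock
 * has been released.  The new values are illustrative only.
 */
static void
simulate_concurrent_split(MetaSketch *metap)
{
    metap->hashm_maxbucket += 1;
    metap->hashm_lowmask = metap->hashm_highmask;
    metap->hashm_highmask = (metap->hashm_highmask << 1) | 1;
}
```

The snapshot keeps the values that were valid under the lock, so a
cleanup routine given the SplitParams copy is unaffected by whatever
simulate_concurrent_split() later does to the shared structure.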

> Statement was
>
>     update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
>
> It can be reproduced with the data directory (Debian stretch amd64) I've
> put here:
>
>     http://ansel.ydns.eu/~andreas/_hash_splitbucket_guts.tar.xz (12 MB)
>
> Backtrace below.  The cluster hasn't suffered any crashes before this
> incident.
>

How should I connect to this database?  If I use the user fdw
mentioned in pg_hba.conf (changed authentication method to trust in
pg_hba.conf), it says the user doesn't exist.  Can you create a user
in the database which I can use?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts

From
Amit Kapila
Date:
On Sat, Dec 3, 2016 at 6:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Dec 3, 2016 at 2:06 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> Hi,
>>
>> the new hash index code on 11003eb failed an assertion yesterday:
>>
>>     TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line: 1037)
>
> _hash_expandtable(Relation rel, Buffer metabuf)
> {
> ..
> if (H_NEEDS_SPLIT_CLEANUP(oopaque))
> {
> /* Release the metapage lock. */
> _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> hashbucketcleanup(rel, old_bucket, buf_oblkno, start_oblkno, NULL,
>   metap->hashm_maxbucket, metap->hashm_highmask,
>   metap->hashm_lowmask, NULL,
>   NULL, true, NULL, NULL);
> ..
> }
>
> Here we shouldn't be accessing meta page after releasing the lock as
> concurrent activity can change these values.  This can be fixed by
> storing these values in local variables before releasing the lock and
> passing local variables in hashbucketcleanup().  I will send patch
> shortly.
>

Please find attached a patch to fix the above code.  If this is the
cause of the problem you are seeing, it won't repair your existing
database, as it already contains some tuples in the wrong bucket.  Can
you please re-run the test to see if you can reproduce the problem?


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts

From
Andreas Seltenreich
Date:
Amit Kapila writes:

> How should I connect to this database?  If I use the user fdw
> mentioned in pg_hba.conf (changed authentication method to trust in
> pg_hba.conf), it says the user doesn't exist.  Can you create a user
> in the database which I can use?

There is also a superuser "postgres" and an unprivileged user "smith"
you should be able to login with.  You could also start postgres in
single-user mode to bypass the authentication altogether.

Amit Kapila writes:

> Please find attached patch to fix above code.  Now, if this is the
> reason of the problem you are seeing, it won't fix your existing
> database as it already contains some tuples in the wrong bucket.  Can
> you please re-run the test to see if you can reproduce the problem?

Ok, I'll do testing with the patch applied.

Btw, I also find entries like the following in the logging database:

ERROR:  could not read block 2638 in file "base/16384/17256": read only 0 of 8192 bytes

…with the relfilenode being a hash index.  I usually ignore these, as they
naturally start occurring after a recovery following an unrelated crash.
But since 11003eb, they also occur when the cluster has not yet suffered
a crash.
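
For readers unfamiliar with this message: "read only 0 of 8192 bytes"
means a read at the block's offset returned no data, which is what
happens when the requested block lies at or past the end of the file.
A small illustrative sketch using plain POSIX pread() follows.  This is
not PostgreSQL's storage manager code; read_block_sketch, is_short_read
and BLOCK_SIZE_SKETCH are made-up names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE_SKETCH 8192  /* matches the "8192 bytes" in the message */

/* Read one fixed-size block; returns the byte count pread() reports. */
static ssize_t
read_block_sketch(int fd, unsigned int blocknum, char *buffer)
{
    off_t offset = (off_t) blocknum * BLOCK_SIZE_SKETCH;

    assert(buffer != NULL);
    return pread(fd, buffer, BLOCK_SIZE_SKETCH, offset);
}

/* A successful read that delivered fewer bytes than a full block. */
static bool
is_short_read(ssize_t nbytes)
{
    return nbytes >= 0 && nbytes < BLOCK_SIZE_SKETCH;
}
```

On a file holding exactly one block, reading block 0 returns a full
block, while reading block 1 returns 0 bytes and counts as a short read.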

regards,
Andreas



Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts

From
Amit Kapila
Date:
On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Amit Kapila writes:
>
>> How should I connect to this database?  If I use the user fdw
>> mentioned in pg_hba.conf (changed authentication method to trust in
>> pg_hba.conf), it says the user doesn't exist.  Can you create a user
>> in the database which I can use?
>
> There is also a superuser "postgres" and an unprivileged user "smith"
> you should be able to login with.  You could also start postgres in
> single-user mode to bypass the authentication altogether.
>

Thanks.  I have checked and found that my above speculation seems to
be right, which means that the old bucket contains tuples from a
previous split.  At the location of the Assert, I printed the values of
the old bucket, the new bucket and the actual bucket to which the tuple
belongs; the result is below.

regression=# update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
ERROR:  wrong bucket, old bucket:37, new bucket:549, actual bucket:293

So what the above means is that the tuple should belong to either
bucket 37 or 549, but it actually belongs to 293.  Both 293 and 549 are
buckets that were split from bucket 37 (you can verify that using the
calculation in _hash_expandtable).  I have checked the code again and
couldn't find any other cause except the one I mentioned in my
previous mail.  So, let us wait for the results of your new test run.
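
The bucket arithmetic can be checked with a small sketch.  It assumes,
following the calculation in _hash_expandtable, that the bucket being
split is the new bucket number masked with the low mask in effect at
the time.  hashkey_to_bucket is modeled on the usual linear-hashing
mapping; both function names are made up for this illustration.  The
low mask of 255 used for bucket 293 is an assumption, based on the
index having between 256 and 511 buckets when that split happened.

```c
#include <assert.h>
#include <stdint.h>

/* Bucket that a newly added bucket is split from. */
static uint32_t
split_source_bucket(uint32_t new_bucket, uint32_t lowmask)
{
    return new_bucket & lowmask;
}

/* Map a hash value to a bucket number for the current table size. */
static uint32_t
hashkey_to_bucket(uint32_t hashkey, uint32_t maxbucket,
                  uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket = hashkey & highmask;

    /* Buckets beyond maxbucket do not exist yet; fall back to the low mask. */
    if (bucket > maxbucket)
        bucket &= lowmask;
    return bucket;
}
```

With lowmask 511, 549 & 511 gives 37; with lowmask 255, 293 & 255 also
gives 37, so both buckets descend from bucket 37.  With the values from
the backtrace (maxbucket 549, highmask 1023, lowmask 511), a hash value
whose low ten bits are 293 maps to bucket 293, which is neither the old
nor the new bucket of the current split.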

> Amit Kapila writes:
>
>> Please find attached patch to fix above code.  Now, if this is the
>> reason of the problem you are seeing, it won't fix your existing
>> database as it already contains some tuples in the wrong bucket.  Can
>> you please re-run the test to see if you can reproduce the problem?
>
> Ok, I'll do testing with the patch applied.
>
> Btw, I also find entries like following in the logging database:
>
> ERROR:  could not read block 2638 in file "base/16384/17256": read only 0 of 8192 bytes
>
> …with the relfilenode being a hash index.  I usually ignore these, as they
> naturally start occurring after a recovery following an unrelated crash.
> But since 11003eb, they also occur when the cluster has not yet suffered
> a crash.
>

Hmm, I am not sure if this is related to the previous problem, but it
could be.  Is it possible to get the operation and/or call stack for
the above failure?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts

From
Robert Haas
Date:
On Fri, Dec 2, 2016 at 10:04 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Here we shouldn't be accessing meta page after releasing the lock as
>> concurrent activity can change these values.  This can be fixed by
>> storing these values in local variables before releasing the lock and
>> passing local variables in hashbucketcleanup().  I will send patch
>> shortly.
>
> Please find attached patch to fix above code.

Committed.  I don't know either whether this will fix things for
Andreas, but it's certainly a bug fix in its own right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Short reads in hash indexes (was: [sqlsmith] Failed assertion in _hash_splitbucket_guts)

From
Andreas Seltenreich
Date:
Amit Kapila writes:

> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>> Amit Kapila writes:
>>
>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>> Ok, I'll do testing with the patch applied.

Good news: the assertion hasn't fired since the patch is in.

However, these are still getting logged:

smith=# select * from state_report where sqlstate = 'XX001';
-[ RECORD 1 ]------------------------------------------------------------------------------
count    | 10
sqlstate | XX001
sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}

> Hmm, I am not sure if this is related to previous problem, but it
> could be.  Is it possible to get the operation and or callstack for
> above failure?

Ok, will turn the elog into an assertion to get at the backtraces.
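
One common way to do this in C, sketched here with the standard
assert() rather than PostgreSQL's own macros, is to assert on a negated
string literal.  The literal is never NULL, so the condition is always
false, and the message text shows up in the failure report.  The
function name short_read_condition is made up for illustration.

```c
#include <assert.h>

/*
 * A string literal is a non-null pointer, so negating it always yields 0.
 * Writing assert(!"short read of block") at the error site would therefore
 * abort unconditionally and leave a core file to take a backtrace from.
 */
static int
short_read_condition(void)
{
    return !"short read of block";
}
```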

regards,
Andreas



Re: [sqlsmith] Short reads in hash indexes

From
Andreas Seltenreich
Date:
Andreas Seltenreich writes:

> Amit Kapila writes:
>
>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>>> Amit Kapila writes:
>>>
>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>> Ok, I'll do testing with the patch applied.
>
> Good news: the assertion hasn't fired since the patch is in.

Meh, it fired again today after being silent for 100e6 queries :-/
I guess I need to add some confidence qualification on such statements.
Maybe sigmas as they do at CERN…

> smith=# select * from state_report where sqlstate = 'XX001';
> -[ RECORD 1 ]------------------------------------------------------------------------------
> count    | 10
> sqlstate | XX001
> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
> hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>
>> Hmm, I am not sure if this is related to previous problem, but it
>> could be.  Is it possible to get the operation and or callstack for
>> above failure?
>
> Ok, will turn the elog into an assertion to get at the backtraces.

Doing so on top of 4212cb7, I caught the backtrace below.  Query was:

--8<---------------cut here---------------start------------->8---
set max_parallel_workers_per_gather = 0;
select  count(1) from
       public.hash_name_heap as ref_2
       join public.rtest_emplog as sample_1
              on (ref_2.random = sample_1.who);
--8<---------------cut here---------------end--------------->8---

I've put the data directory where it can be reproduced here:
   http://ansel.ydns.eu/~andreas/hash_index_short_read.tar.xz (12MB)

regards,
Andreas

TRAP: FailedAssertion("!(!"short read of block")", File: "md.c", Line: 782)
#2  0x00000000007f7f11 in ExceptionalCondition (conditionName=conditionName@entry=0x9a1ae9 "!(!\"short read of block\")",
errorType=errorType@entry=0x83db3d "FailedAssertion", fileName=fileName@entry=0x946a9a "md.c",
lineNumber=lineNumber@entry=782) at assert.c:54
#3  0x00000000006fb305 in mdread (reln=<optimized out>, forknum=<optimized out>, blocknum=4702,
buffer=0x7fe97e7e1280 "\"") at md.c:782
#4  0x00000000006d0ffa in ReadBuffer_common (smgr=0x2af7408, relpersistence=<optimized out>,
forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=4702, mode=RBM_NORMAL, strategy=<optimized out>,
hit=0x7ffde9df11cf "") at bufmgr.c:890
#5  0x00000000006d1a20 in ReadBufferExtended (reln=0x2fd10d8, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=4702,
mode=mode@entry=RBM_NORMAL, strategy=strategy@entry=0x0) at bufmgr.c:664
#6  0x00000000006d1b74 in ReadBuffer (blockNum=<optimized out>, reln=<optimized out>) at bufmgr.c:596
#7  ReleaseAndReadBuffer (buffer=buffer@entry=87109984, relation=<optimized out>, blockNum=<optimized out>) at
bufmgr.c:1540
#8  0x00000000004c047b in index_fetch_heap (scan=scan@entry=0x5313160) at indexam.c:469
#9  0x00000000004c05ee in index_getnext (scan=scan@entry=0x5313160, direction=direction@entry=ForwardScanDirection) at
indexam.c:565
#10 0x00000000005f9b71 in IndexNext (node=node@entry=0x5311c48) at nodeIndexscan.c:105
#11 0x00000000005ec492 in ExecScanFetch (recheckMtd=0x5f9af0 <IndexRecheck>, accessMtd=0x5f9b30 <IndexNext>,
node=0x5311c48) at execScan.c:95
#12 ExecScan (node=0x5311c48, accessMtd=0x5f9b30 <IndexNext>, recheckMtd=0x5f9af0 <IndexRecheck>) at execScan.c:145
#13 0x00000000005e4da8 in ExecProcNode (node=node@entry=0x5311c48) at execProcnode.c:427
#14 0x00000000006014f9 in ExecNestLoop (node=node@entry=0x53110a8) at nodeNestloop.c:174
#15 0x00000000005e4cf8 in ExecProcNode (node=node@entry=0x53110a8) at execProcnode.c:476
#16 0x0000000000601436 in ExecNestLoop (node=node@entry=0x5310e00) at nodeNestloop.c:123
#17 0x00000000005e4cf8 in ExecProcNode (node=node@entry=0x5310e00) at execProcnode.c:476
#18 0x0000000000601436 in ExecNestLoop (node=node@entry=0x530f698) at nodeNestloop.c:123
#19 0x00000000005e4cf8 in ExecProcNode (node=node@entry=0x530f698) at execProcnode.c:476
#20 0x00000000005e0e9e in ExecutePlan (dest=0x603a4a8, direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x530f698,
estate=0x46bc008) at execMain.c:1568
#21 standard_ExecutorRun (queryDesc=0x3475168, direction=<optimized out>, count=0) at execMain.c:338
#22 0x00000000007029f8 in PortalRunSelect (portal=portal@entry=0x2561e18, forward=forward@entry=1 '\001', count=0,
count@entry=9223372036854775807, dest=dest@entry=0x603a4a8) at pquery.c:946
#23 0x0000000000703f3e in PortalRun (portal=portal@entry=0x2561e18, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x603a4a8, altdest=altdest@entry=0x603a4a8,
completionTag=completionTag@entry=0x7ffde9df18b0 "") at pquery.c:787
#24 0x0000000000700d5b in exec_simple_query (query_string=0x4685258 ) at postgres.c:1094
#25 PostgresMain (argc=<optimized out>, argv=argv@entry=0x256f5a8, dbname=0x256f580 "regression",
username=<optimized out>) at postgres.c:4069
#26 0x000000000046daf2 in BackendRun (port=0x25645a0) at postmaster.c:4274
#27 BackendStartup (port=0x25645a0) at postmaster.c:3946
#28 ServerLoop () at postmaster.c:1704
#29 0x0000000000699d28 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x25425c0) at postmaster.c:1312
#30 0x000000000046ec96 in main (argc=4, argv=0x25425c0) at main.c:228



Re: [sqlsmith] Short reads in hash indexes

From
Amit Kapila
Date:
On Thu, Dec 8, 2016 at 2:38 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Andreas Seltenreich writes:
>
>> Amit Kapila writes:
>>
>>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>>>> Amit Kapila writes:
>>>>
>>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>>> Ok, I'll do testing with the patch applied.
>>
>> Good news: the assertion hasn't fired since the patch is in.
>
> Meh, it fired again today after being silent for 100e6 queries :-/
> I guess I need to add some confidence qualification on such statements.
> Maybe sigmas as they do at CERN…
>
>> smith=# select * from state_report where sqlstate = 'XX001';
>> -[ RECORD 1 ]------------------------------------------------------------------------------
>> count    | 10
>> sqlstate | XX001
>> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
>> hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>>
>>> Hmm, I am not sure if this is related to previous problem, but it
>>> could be.  Is it possible to get the operation and or callstack for
>>> above failure?
>>
>> Ok, will turn the elog into an assertion to get at the backtraces.
>
> Doing so on top of 4212cb7, I caught the backtrace below.  Query was:
>

Thanks for the report, I will look into it.  I think this one is quite
similar to what Jeff has reported [1].

[1] -
https://www.postgresql.org/message-id/CAMkU%3D1ydfriLCOriJ%3DAxtF%3DhhBOUUcWtf172vquDrj%3D3T7yXmg%40mail.gmail.com


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [sqlsmith] Short reads in hash indexes

From
Amit Kapila
Date:
On Thu, Dec 8, 2016 at 2:38 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Andreas Seltenreich writes:
>
>> Amit Kapila writes:
>>
>>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
>>>> Amit Kapila writes:
>>>>
>>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>>> Ok, I'll do testing with the patch applied.
>>
>> Good news: the assertion hasn't fired since the patch is in.
>
> Meh, it fired again today after being silent for 100e6 queries :-/
> I guess I need to add some confidence qualification on such statements.
> Maybe sigmas as they do at CERN…
>

This assertion can be reproduced with Jeff's test as well, and a fix
for it has been posted [1].

>> smith=# select * from state_report where sqlstate = 'XX001';
>> -[ RECORD 1 ]------------------------------------------------------------------------------
>> count    | 10
>> sqlstate | XX001
>> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read only 0 of 8192 bytes
>> hosts    | {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>>
>>> Hmm, I am not sure if this is related to previous problem, but it
>>> could be.  Is it possible to get the operation and or callstack for
>>> above failure?
>>
>> Ok, will turn the elog into an assertion to get at the backtraces.
>
> Doing so on top of 4212cb7, I caught the backtrace below.  Query was:
>
> --8<---------------cut here---------------start------------->8---
> set max_parallel_workers_per_gather = 0;
> select  count(1) from
>        public.hash_name_heap as ref_2
>        join public.rtest_emplog as sample_1
>               on (ref_2.random = sample_1.who);
> --8<---------------cut here---------------end--------------->8---
>
> I've put the data directory where it can be reproduced here:
>
>     http://ansel.ydns.eu/~andreas/hash_index_short_read.tar.xz (12MB)
>

This can happen when a modified buffer is not marked dirty: the index
page from which we deleted the tuples is never flushed, whereas vacuum
has already removed the corresponding heap tuples.  The next access to
that hash index page brings back the old copy of the page, which still
contains the tuples that vacuum was supposed to delete.  Following
those tuples gives wrong information about the heap tuples, and trying
to access the deleted heap tuples can produce the short read problem.
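
The effect of the missing dirty mark can be illustrated with a toy
buffer model.  This is a simplified sketch, not PostgreSQL's buffer
manager: PageSketch, BufferSketch and all function names are
hypothetical, and a "page" is reduced to a tuple count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: only the number of index tuples it holds. */
typedef struct PageSketch
{
    int ntuples;
} PageSketch;

/* Toy buffer: an in-memory copy of a page plus its dirty flag. */
typedef struct BufferSketch
{
    PageSketch page;
    bool dirty;
} BufferSketch;

/* Bring the on-disk copy into the buffer. */
static void
load_page(BufferSketch *buf, const PageSketch *disk)
{
    buf->page = *disk;
    buf->dirty = false;
}

/* Eviction writes the page back only if it was marked dirty. */
static void
evict_page(BufferSketch *buf, PageSketch *disk)
{
    if (buf->dirty)
        *disk = buf->page;
    buf->dirty = false;
}

/* Delete one tuple in memory, optionally forgetting the dirty mark. */
static void
delete_one_tuple(BufferSketch *buf, bool mark_dirty)
{
    assert(buf->page.ntuples > 0);
    buf->page.ntuples--;
    if (mark_dirty)
        buf->dirty = true;
}

/* Tuple count seen after delete, eviction and a fresh read of the page. */
static int
tuples_after_reload(int initial, bool mark_dirty)
{
    PageSketch disk = { initial };
    BufferSketch buf;

    load_page(&buf, &disk);
    delete_one_tuple(&buf, mark_dirty);
    evict_page(&buf, &disk);
    load_page(&buf, &disk);
    return buf.page.ntuples;
}
```

Starting from three tuples, the reload sees two when the buffer was
marked dirty and three when it was not: the deleted index tuple
reappears and still points at a heap tuple that vacuum has removed.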

Can you please try with the patch posted on the hash index thread [1]
to see if you can reproduce any of these problems?

[1] - https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [sqlsmith] Short reads in hash indexes

From
Andreas Seltenreich
Date:
Amit Kapila writes:

> Can you please try with the patch posted on hash index thread [1] to
> see if you can reproduce any of these problems?
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com

I'm no longer seeing either the failed assertions or the short reads
since these patches went in.

regards,
Andreas



Re: [HACKERS] [sqlsmith] Short reads in hash indexes

From
Amit Kapila
Date:
On Fri, Dec 30, 2016 at 3:45 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Amit Kapila writes:
>
>> Can you please try with the patch posted on hash index thread [1] to
>> see if you can reproduce any of these problems?
>>
>> [1] - https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com
>
> I'm no longer seeing the failed assertions nor short reads since these
> patches are in.
>

Thanks for the confirmation!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com