Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts
Msg-id CAA4eK1KkuKGAQRnWW61Sq+Oi+ycZrfM5aMnrNn3jtCdffRfNDQ@mail.gmail.com
In response to Re: [sqlsmith] Failed assertion in _hash_splitbucket_guts  (Andreas Seltenreich <seltenreich@gmx.de>)
Responses Short reads in hash indexes (was: [sqlsmith] Failed assertion in _hash_splitbucket_guts)
List pgsql-hackers
On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
> Amit Kapila writes:
>
>> How should I connect to this database?  If I use the user fdw
>> mentioned in pg_hba.conf (changed authentication method to trust in
>> pg_hba.conf), it says the user doesn't exist.  Can you create a user
>> in the database which I can use?
>
> There is also a superuser "postgres" and an unprivileged user "smith"
> you should be able to login with.  You could also start postgres in
> single-user mode to bypass the authentication altogether.
>

Thanks.  I have checked, and my earlier speculation seems to be right:
the old bucket contains tuples left over from a previous split.  At
the location of the Assert, I printed the old bucket, the new bucket,
and the actual bucket to which the tuple belongs; the result is
below.

regression=# update public.hash_i4_heap set seqno = public.hash_i4_heap.random;
ERROR:  wrong bucket, old bucket:37, new bucket:549, actual bucket:293

This means that the tuple should belong to either bucket 37 or bucket
549, but it actually belongs to bucket 293.  Both 293 and 549 are
buckets that were split from bucket 37 (you can verify that using the
calculation in _hash_expandtable).  I have checked the code again and
couldn't find any reason other than the one I mentioned in my previous
mail.  So, let us wait for the results of your new test run.
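
For anyone who wants to check the bucket arithmetic, here is a rough
Python sketch of it.  It is only an illustration, not the hash AM code:
the function names are made up, the mask values in the usage lines are
assumed for a table whose maxbucket is 549, and the first function
derives the low mask from the new bucket number instead of reading it
from the metapage.

```python
def old_bucket_for_split(new_bucket: int) -> int:
    """Bucket that new_bucket is split from (hypothetical helper).

    The old bucket is new_bucket & lowmask.  Here lowmask is derived
    as (largest power of two <= new_bucket) - 1, so the result is
    new_bucket with its highest set bit cleared.
    """
    if new_bucket < 1:
        raise ValueError("bucket 0 is never created by a split")
    lowmask = (1 << (new_bucket.bit_length() - 1)) - 1
    return new_bucket & lowmask


def hashkey_to_bucket(hashkey: int, maxbucket: int,
                      highmask: int, lowmask: int) -> int:
    """Map a hash key to a bucket number using the two masks."""
    bucket = hashkey & highmask
    if bucket > maxbucket:
        # That bucket does not exist yet, so fall back to the bucket
        # it would later be split from.
        bucket &= lowmask
    return bucket


# Both 293 and 549 come out as children of bucket 37.
print(old_bucket_for_split(293), old_bucket_for_split(549))

# Assumed state while 37 is being split into 549:
# maxbucket = 549, highmask = 1023, lowmask = 511.
# A hash key whose low ten bits are 293 maps to neither 37 nor 549.
print(hashkey_to_bucket(293, maxbucket=549, highmask=1023, lowmask=511))
```

A tuple that maps to 293 should have been moved out of bucket 37 when
293 was created, so finding it in 37 during the later split into 549 is
consistent with leftovers from the earlier split.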

> Amit Kapila writes:
>
>> Please find attached patch to fix above code.  Now, if this is the
>> reason of the problem you are seeing, it won't fix your existing
>> database as it already contains some tuples in the wrong bucket.  Can
>> you please re-run the test to see if you can reproduce the problem?
>
> Ok, I'll do testing with the patch applied.
>
> Btw, I also find entries like following in the logging database:
>
> ERROR:  could not read block 2638 in file "base/16384/17256": read only 0 of 8192 bytes
>
> …with relfilenode being a hash index.  I usually ignore these as they
> naturally start occurring after a recovery because of an unrelated crash.
> But since 11003eb, they also occur when the cluster has not yet suffered
> a crash.
>

Hmm, I am not sure whether this is related to the previous problem,
but it could be.  Is it possible to get the operation and/or call
stack for the above failure?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


