Thread: Add a filed to PageHeaderData

Add a filed to PageHeaderData

From
Soroosh Sardari
Date:
Dear Hackers

I wanted to add a char array with length of 20 to PageHeaderData in include/storage/bufpage.h.
Surprisingly regression test failed on rangetypes test!

The diff of resulted and expected file is :

*** 968,974 ****
  select count(*) from test_range_spgist where ir -|- int4range(100,500);
   count
  -------
!      5
  (1 row)
 
  -- now check same queries using a bulk-loaded index
--- 968,974 ----
  select count(*) from test_range_spgist where ir -|- int4range(100,500);
   count
  -------
!      2
  (1 row)
 
  -- now check same queries using a bulk-loaded index

======================================================================

Any help appreciated.


Soroosh Sardari

Re: Add a filed to PageHeaderData

From
Soroosh Sardari
Date:


On Mon, Jun 23, 2014 at 10:23 AM, Soroosh Sardari <soroosh.sardari@gmail.com> wrote:
Dear Hackers

I wanted to add a char array with length of 20 to PageHeaderData in include/storage/bufpage.h.
Surprisingly regression test failed on rangetypes test!

The diff of resulted and expected file is :

*** 968,974 ****
  select count(*) from test_range_spgist where ir -|- int4range(100,500);
   count
  -------
!      5
  (1 row)
 
  -- now check same queries using a bulk-loaded index
--- 968,974 ----
  select count(*) from test_range_spgist where ir -|- int4range(100,500);
   count
  -------
!      2
  (1 row)
 
  -- now check same queries using a bulk-loaded index

======================================================================

Any help appreciated.


Soroosh Sardari


Is there any rule for adding a field to PageHeaderData?
 

Re: Add a filed to PageHeaderData

From
Greg Stark
Date:
On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:
> Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

The only rule I can think of is that you should bump some version
numbers such as the page format version and probably the catalog
version. But that's probably irrelevant to your problem. It sounds
like you have a bug in your code but you haven't posted enough
information to say much more.



-- 
greg



Re: Add a filed to PageHeaderData

From
Andres Freund
Date:
On 2014-06-24 01:58:32 -0700, Greg Stark wrote:
> On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
> <soroosh.sardari@gmail.com> wrote:
> > Is there any rule for adding a field to PageHeaderData?
> 
> Not really. It's a pretty internal thing, not something we expect
> people to be doing all the time.

I'd actually say that 99% of the things that need it are not going to
happen because we don't want to break on disk compatibility.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Add a filed to PageHeaderData

From
Pavan Deolasee
Date:
On Tue, Jun 24, 2014 at 2:28 PM, Greg Stark <stark@mit.edu> wrote:
On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:
> Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

The only rule I can think of is that you should bump some version
numbers such as the page format version and probably the catalog
version. But that's probably irrelevant to your problem. It sounds
like you have a bug in your code but you haven't posted enough
information to say much more.


Out of curiosity, I actually tried adding a char[20] field in the page header because just like you I thought this should be completely internal, as long as the field is added before the pd_linp[] field. But I get the same failure that OP is reporting. I wonder if its a bug in gist index build, though I could not spot anything at the first glance. FWIW changing the char[] from 20 to 22 or 24 does not cause any failure in rangetypes test. So I am thinking its some alignment issue (mine is a 64 bit build)

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: Add a filed to PageHeaderData

From
Soroosh Sardari
Date:



On Tue, Jun 24, 2014 at 1:34 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Tue, Jun 24, 2014 at 2:28 PM, Greg Stark <stark@mit.edu> wrote:
On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:
> Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

The only rule I can think of is that you should bump some version
numbers such as the page format version and probably the catalog
version. But that's probably irrelevant to your problem. It sounds
like you have a bug in your code but you haven't posted enough
information to say much more.


Out of curiosity, I actually tried adding a char[20] field in the page header because just like you I thought this should be completely internal, as long as the field is added before the pd_linp[] field. But I get the same failure that OP is reporting. I wonder if its a bug in gist index build, though I could not spot anything at the first glance. FWIW changing the char[] from 20 to 22 or 24 does not cause any failure in rangetypes test. So I am thinking its some alignment issue (mine is a 64 bit build)

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee



I check this problem with a virgin source code of postgresql-9.3.2. So the bug is not for my codes.
As Pavan said, may be some alignment issues cause this problem.
By the way, following code has two different output and it is weird.

drop table if exists test_range_spgist;
create table test_range_spgist(ir int4range);
create index test_range_spgist_idx on test_range_spgist using spgist (ir);
insert into test_range_spgist select int4range(g, g+10) from
generate_series(1,590) g;


SET enable_seqscan    = t;
SET enable_indexscan  = f;
SET enable_bitmapscan = f;

select * from test_range_spgist where ir -|- int4range(100,500);

SET enable_seqscan    = f;
SET enable_indexscan  = t;
SET enable_bitmapscan = f;

select * from test_range_spgist where ir -|- int4range(100,500);


Regards,
Soroosh

Re: Add a filed to PageHeaderData

From
Abhijit Menon-Sen
Date:
At 2014-06-24 14:21:24 +0430, soroosh.sardari@gmail.com wrote:
>
> By the way, following code has two different output and it is weird.

I get the same output from both queries with both 9.3.4 and HEAD:
   ir     
-----------[90,100)[500,510)
(2 rows)

If you're reporting a problem, please make some effort to provide enough
details to reproduce it. From your mail I could guess that you tried it
on 9.3.2, but please try not to make people guess.

-- Abhijit



Re: Add a filed to PageHeaderData

From
Kevin Grittner
Date:
Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

> I check this problem with a virgin source code of
> postgresql-9.3.2. So the bug is not for my codes.

> By the way, following code has two different output and it is
> weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4.  Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Hooks Docu - list of hooks

From
geohas
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi to all!

I am searching for a documentation of hooks in PG, all i found was a
presentation in the wiki and some modules from 2ndQuadrant and petere on
github.  The last three weeks i was reading the source code to get some
information.

Is there a list of possible hooks, or maybe a little docu or overview?
Especially hooks to catch Insert, Update and Delete Stmts and SubQuerys.

It would help a lot to finish / write a log into Tables Module.



Please excuse my mail, if there was a similar question on the list, i
subscribed today and a simple search in the archive showd no results.

regards

geohas

PS: I've an excuse for my bad english - i'am austrian ;)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTqVetAAoJEJFGMlQe7wR/ae0H/Rkt0G5d6kspgWhPyN/aIWIS
wTYKDdxDWt+EeyuCg7SWx/UxJLW22wnWKxmLjvfkT+/ibkCv5qmYRLMOh+cvH0O9
AimWP7fZX+VpYSfpmm/SuvuwUM3OQiM3iwU6MIpu4XfrulAD3F94/aafNp3D2jBK
Fz/J/Sjmr9LN/YBuE99i6asUJG669m4ISsmMpNwXPAh3wv+A3sN0dhvDCFJ11iCL
hIXqktMpm60iI5sIQUPUjgSTHFTj3aGuKtX3OCWPM4CHoaHwDNtq1klHeuiLSb3y
enjMW4tvTWtPw8DIkEgpatn8gsJvXVIjfsZPiTsp8HbN2evhkYxsgfV89R8usRU=
=vA51
-----END PGP SIGNATURE-----




Re: Add a filed to PageHeaderData

From
Soroosh Sardari
Date:



On Tue, Jun 24, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

> I check this problem with a virgin source code of
> postgresql-9.3.2. So the bug is not for my codes.

> By the way, following code has two different output and it is
> weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4.  Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


wow, it's arch-dependent.
in the 32-bit compiled of PG9.3.2 the code has same output and in 64-bit binary of same code output is different!!

The problem is not about the sql code I posted in the last email. Problem could be different in any architecture,
In 32-bit or 64-bit architecture adding a char array of length 20 to PageHeaderData cause error in regression test.

Regards,
Soroosh

Re: Add a filed to PageHeaderData

From
Andres Freund
Date:
On 2014-06-24 15:23:54 +0430, Soroosh Sardari wrote:
> On Tue, Jun 24, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> 
> > Soroosh Sardari <soroosh.sardari@gmail.com> wrote:
> >
> > > I check this problem with a virgin source code of
> > > postgresql-9.3.2. So the bug is not for my codes.
> >
> > > By the way, following code has two different output and it is
> > > weird.
> >
> > I can confirm that I see the difference in 9.3.2, and that I don't
> > see the difference in 9.3.4.  Upgrade.
> >
> > http://www.postgresql.org/support/versioning/
> >
> > There's really no point in reporting a possible bug on a version
> > with known bugs which have already had fixes published.
> >
> > --
> > Kevin Grittner
> > EDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
> >
> 
> 
> wow, it's arch-dependent.
> in the 32-bit compiled of PG9.3.2 the code has same output and in 64-bit
> binary of same code output is different!!
> 
> The problem is not about the sql code I posted in the last email. Problem
> could be different in any architecture,
> In 32-bit or 64-bit architecture adding a char array of length 20 to
> PageHeaderData cause error in regression test.

You likely didn't adapt SizeOfPageHederData.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Hooks Docu - list of hooks

From
Abhijit Menon-Sen
Date:
At 2014-06-24 12:49:17 +0200, lists@hasibether.at wrote:
>
> Is there a list of possible hooks, or maybe a little docu or overview?

The best I found was "git grep _hook_type" and then read the code to
understand when and why the hook was called.

> Especially hooks to catch Insert, Update and Delete Stmts and
> SubQuerys.
> 
> It would help a lot to finish / write a log into Tables Module.

Look at how pgaudit does it: https://github.com/2ndQuadrant/pgaudit

The code has comments about how the various available hooks are used.
(I was planning to implement a bgwriter that wrote log messages to a
table, which sounds a bit like what you want to do.)

-- Abhijit



Re: Hooks Docu - list of hooks

From
geohas
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 24/06/14 12:59, Abhijit Menon-Sen wrote:
> At 2014-06-24 12:49:17 +0200, lists@hasibether.at wrote:
>>
>> Is there a list of possible hooks, or maybe a little docu or overview?
>
> The best I found was "git grep _hook_type" and then read the code to
> understand when and why the hook was called.
>
>> Especially hooks to catch Insert, Update and Delete Stmts and
>> SubQuerys.
>>
>> It would help a lot to finish / write a log into Tables Module.
>
> Look at how pgaudit does it: https://github.com/2ndQuadrant/pgaudit
I already tried pgaudit ;), one of the best examples, it helped me much.
>
>
> The code has comments about how the various available hooks are used.
> (I was planning to implement a bgwriter that wrote log messages to a
> table, which sounds a bit like what you want to do.)
The module i'm thinking of, working on, is a bit inspired from pgaudit
and petere's pg_trashcan.
It should copy every created table in a "shadow"-schema with extra
columns for record on / record off and Userid (this is already working ;)).
In case of a drop statement it should rename the table in the shadow
schema XXX-droped-Date.

Now i am trying to catch the planned Stmts, ...
It should work without triggers - because the shadow schema should only
be visible for user postgres.

regards
geohas

>
>
> -- Abhijit
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTqWQ4AAoJEJFGMlQe7wR/8CEIAJihWVGc//dDHGF9lDtMo3Ds
v1Xhd5U9n1tLL/Cx0/cqnslKctdfSCY2I/ptjNSDFO8U/YdUjNdPf4nYvxn0gjKR
n8VuC61BDr6qHFQvlJE7GLv2hs2GCxFM5dEgnV7foJjT18C/VgnSRFulJzxU87EZ
8uKG53+CM9ERDa5P9py9jyvrJJvIAXk9AAfevU9g+jimwK9OntwkC7ZfyVWEDwfr
x7LDyrzhge/EIco01pzJSimuVd0BPvTQ8V7XUTpy25xS+D8968wE8eRBaMWXH0b2
KR5lju+sz+SyVQKildcyExOEQWN3PgVmST5USAy9cAzPIuic+yR+qsa5H2VRTFI=
=ZYct
-----END PGP SIGNATURE-----




Re: Add a filed to PageHeaderData

From
Soroosh Sardari
Date:
<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On Tue, Jun 24, 2014 at 3:27 PM,
AndresFreund <span dir="ltr"><<a href="mailto:andres@2ndquadrant.com"
target="_blank">andres@2ndquadrant.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px
0px0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="h5">On 2014-06-24 15:23:54
+0430,Soroosh Sardari wrote:<br /> > On Tue, Jun 24, 2014 at 2:40 PM, Kevin Grittner <<a
href="mailto:kgrittn@ymail.com">kgrittn@ymail.com</a>>wrote:<br /> ><br /> > > Soroosh Sardari <<a
href="mailto:soroosh.sardari@gmail.com">soroosh.sardari@gmail.com</a>>wrote:<br /> > ><br /> > > > I
checkthis problem with a virgin source code of<br /> > > > postgresql-9.3.2. So the bug is not for my
codes.<br/> > ><br /> > > > By the way, following code has two different output and it is<br /> >
>> weird.<br /> > ><br /> > > I can confirm that I see the difference in 9.3.2, and that I don't<br
/>> > see the difference in 9.3.4.  Upgrade.<br /> > ><br /> > > <a
href="http://www.postgresql.org/support/versioning/"
target="_blank">http://www.postgresql.org/support/versioning/</a><br/> > ><br /> > > There's really no
pointin reporting a possible bug on a version<br /> > > with known bugs which have already had fixes
published.<br/> > ><br /> > > --<br /> > > Kevin Grittner<br /> > > EDB: <a
href="http://www.enterprisedb.com"target="_blank">http://www.enterprisedb.com</a><br /> > > The Enterprise
PostgreSQLCompany<br /> > ><br /> ><br /> ><br /> > wow, it's arch-dependent.<br /> > in the 32-bit
compiledof PG9.3.2 the code has same output and in 64-bit<br /> > binary of same code output is different!!<br />
><br/> > The problem is not about the sql code I posted in the last email. Problem<br /> > could be different
inany architecture,<br /> > In 32-bit or 64-bit architecture adding a char array of length 20 to<br /> >
PageHeaderDatacause error in regression test.<br /><br /></div></div>You likely didn't adapt SizeOfPageHederData.<br
/><divclass=""><div class="h5"><br /> Greetings,<br /><br /> Andres Freund<br /><br /> --<br />  Andres Freund        
           <a href="http://www.2ndQuadrant.com/" target="_blank">http://www.2ndQuadrant.com/</a><br />  PostgreSQL
Development,24x7 Support, Training & Services<br /></div></div></blockquote></div><br /><br /><br />#define
SizeOfPageHeaderData(offsetof(PageHeaderData, pd_linp))<br /><br /></div><div class="gmail_extra">I think ,the macro
doesnot need any change!<br /></div></div> 

Re: Add a filed to PageHeaderData

From
Pavan Deolasee
Date:

On Tue, Jun 24, 2014 at 3:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
> Soroosh Sardari <soroosh.sardari@gmail.com> wrote:
>
> > I check this problem with a virgin source code of
> > postgresql-9.3.2. So the bug is not for my codes.
>
> > By the way, following code has two different output and it is
> > weird.
>
> I can confirm that I see the difference in 9.3.2, and that I don't
> see the difference in 9.3.4.  Upgrade.
>
> http://www.postgresql.org/support/versioning/
>
> There's really no point in reporting a possible bug on a version
> with known bugs which have already had fixes published.
>

FWIW I can reproduce this on HEAD with the attached patch. I could reproduce this on a 64-bit Ubuntu as well as 64-bit Mac OSX. Very confusing it is because I tried with various values for N in char[N] array and it fails for N=20. Other values I tried are 4, 12, 22, 24 and the test passes for all of them. The logic for trying other values is to see if pd_linp[] starting on un-aligned boundary can trigger the issue. But there seem to be no correlation.

postgres=# select version();

PostgreSQL 9.5devel on x86_64-apple-darwin13.2.0, compiled by Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit

postgres=# -- test SP-GiST index that's been built incrementally

postgres=# create table test_range_spgist(ir int4range);
postgres=# create index test_range_spgist_idx on test_range_spgist using spgist (ir);
postgres=# insert into test_range_spgist select int4range(g, g+10) from generate_series(1,586) g;
INSERT 0 586

postgres=# SET enable_seqscan    = t;
postgres=# SET enable_indexscan  = f;
postgres=# SET enable_bitmapscan = f;

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
    ir    
-----------
[90,100)
[500,510)
(2 rows)

postgres=# SET enable_seqscan    = f;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
    ir    
-----------
 [90,100)
 [500,510)
(2 rows)

At this point, both rows are visible via index scan as well as seq scan. 

postgres=# insert into test_range_spgist select int4range(g, g+10) from generate_series(587,587) g;
INSERT 0 1

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
    ir    
----------
 [90,100)
(1 row)

Ouch. The second row somehow disappeared.

postgres=# SET enable_seqscan    = t;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
    ir    
-----------
 [90,100)
 [500,510)
(2 rows)

So the last INSERT suddenly makes one row disappear via the index scan though its still reachable via seq scan. I tried looking at the SP-Gist code but clearly I don't understand it a whole lot to figure out the issue, if one exists.

Thanks,
Pavan
Attachment
On 06/24/2014 08:48 PM, Pavan Deolasee wrote:
> FWIW I can reproduce this on HEAD with the attached patch. I could
> reproduce this on a 64-bit Ubuntu as well as 64-bit Mac OSX. Very confusing
> it is because I tried with various values for N in char[N] array and it
> fails for N=20. Other values I tried are 4, 12, 22, 24 and the test passes
> for all of them. The logic for trying other values is to see if pd_linp[]
> starting on un-aligned boundary can trigger the issue. But there seem to be
> no correlation.
>
> postgres=# select version();
>
> PostgreSQL 9.5devel on x86_64-apple-darwin13.2.0, compiled by Apple LLVM
> version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit
>
> postgres=# -- test SP-GiST index that's been built incrementally
>
> postgres=# create table test_range_spgist(ir int4range);
> postgres=# create index test_range_spgist_idx on test_range_spgist using
> spgist (ir);
> postgres=# insert into test_range_spgist select int4range(g, g+10) from
> generate_series(1,586) g;
> INSERT 0 586
>
> postgres=# SET enable_seqscan    = t;
> postgres=# SET enable_indexscan  = f;
> postgres=# SET enable_bitmapscan = f;
>
> postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
>      ir
> -----------
> [90,100)
> [500,510)
> (2 rows)
>
> postgres=# SET enable_seqscan    = f;
> postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
>      ir
> -----------
>   [90,100)
>   [500,510)
> (2 rows)
>
> At this point, both rows are visible via index scan as well as seq scan.
>
> postgres=# insert into test_range_spgist select int4range(g, g+10) from
> generate_series(587,587) g;
> INSERT 0 1
>
> postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
>      ir
> ----------
>   [90,100)
> (1 row)
>
> Ouch. The second row somehow disappeared.
>
> postgres=# SET enable_seqscan    = t;
> postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
>      ir
> -----------
>   [90,100)
>   [500,510)
> (2 rows)
>
> So the last INSERT suddenly makes one row disappear via the index scan
> though its still reachable via seq scan. I tried looking at the SP-Gist
> code but clearly I don't understand it a whole lot to figure out the issue,
> if one exists.

Yeah, I can reproduce this. It doesn't seem to be related to the padding 
or alignment at all. The padding just happens to move tuples around so 
that [500, 510) is picked as an SP-GiST inner node.

The real bug is in spg_range_quad_inner_consistent(), for the adjacent 
operator. Things go wrong when:

The scan key is [100, 500)
The prev centroid is [500, 510)
The current centroid is [544, 554).

The row that should match but isn't returned, [500, 510) is equal to the 
previous centroid. It's in quadrant 3 from the current centroid, but 
spg_range_quad_inner_consistent() incorrectly concludes that it doesn't 
need to scan that quadrant.

The function compares the scan key's upper bound with the the previous 
centroid's lower bound and the current centroid's lower bound:

> /*
>  * Check if upper bound of argument is not in a
>  * quadrant we visited in the previous step.
>  */
> cmp1 = range_cmp_bounds(typcache, &upper, &prevLower);
> cmp2 = range_cmp_bounds(typcache, ¢roidLower,
>             &prevLower);
> if ((cmp2 < 0 && cmp1 > 0) || (cmp2 > 0 && cmp1 < 0))
>     which2 = 0;

The idea is that if the scan key's upper bound doesn't fall between the 
prev and current centroid's lower bounds, there is no match.
  *   *    * PL   X    CL

X = scan key's upper bound: 500)
PL = prev centroid's lower bound [500
CL = current centroid's lower bound [500

This is wrong. X < PL, but it's still nevertheless adjacent to it.

I'll take a closer look tomorrow...

(The "if (which2) ..." block after the code I quoted above also looks 
wrong - it seems to be comparing the argument's lower bound when it 
should be comparing the upper bound according to the comment. )

- Heikki




On 06/24/2014 11:22 PM, Heikki Linnakangas wrote:
> The real bug is in spg_range_quad_inner_consistent(), for the adjacent
> operator. Things go wrong when:
>
> The scan key is [100, 500)
> The prev centroid is [500, 510)
> The current centroid is [544, 554).
>
> The row that should match but isn't returned, [500, 510) is equal to the
> previous centroid. It's in quadrant 3 from the current centroid, but
> spg_range_quad_inner_consistent() incorrectly concludes that it doesn't
> need to scan that quadrant.
>
> The function compares the scan key's upper bound with the the previous
> centroid's lower bound and the current centroid's lower bound:
>
>> /*
>>   * Check if upper bound of argument is not in a
>>   * quadrant we visited in the previous step.
>>   */
>> cmp1 = range_cmp_bounds(typcache, &upper, &prevLower);
>> cmp2 = range_cmp_bounds(typcache, ¢roidLower,
>>             &prevLower);
>> if ((cmp2 < 0 && cmp1 > 0) || (cmp2 > 0 && cmp1 < 0))
>>      which2 = 0;
>
> The idea is that if the scan key's upper bound doesn't fall between the
> prev and current centroid's lower bounds, there is no match.
>
>     *   *    *
>    PL   X    CL
>
> X = scan key's upper bound: 500)
> PL = prev centroid's lower bound [500
> CL = current centroid's lower bound [500
>
> This is wrong. X < PL, but it's still nevertheless adjacent to it.
>
> I'll take a closer look tomorrow...
>
> (The "if (which2) ..." block after the code I quoted above also looks
> wrong - it seems to be comparing the argument's lower bound when it
> should be comparing the upper bound according to the comment. )

I came up with the attached. There were several bugs:

* The "if (which2) { ... }"  block was broken. It compared the
argument's lower bound against centroid's upper bound, while it was
supposed to compare the argument's upper bound against the centroid's
lower bound (the comment was right). Also, it clear bits in the "which1"
variable, while it was supposed to clear bits in "which2". ISTM it was
copy-pasted from the if (which1) { ... }" block just before it, but not
modified.

* If the argument's upper bound was equal to the centroid's lower bound,
we descended to both halves (= all quadrants). That's unnecessary.
Imagine that the argument is (x, 500), and the centroid is (500, y), so
that the bounds are equal. The adjacent ranges that we're trying to find
would be of form [500, z), which are to the right of the centroid. There
is no need to visit the left quadrants. This won't lead to incorrect
query results, but slows down queries unnecessarily.

* In the case that argument's lower bound is adjacent to the centroid's
upper bound, we also don't need to visit all quadrants. Per similar
reasoning as previous point.

* The code where we compare the previous centroid with the current
centroid should match the code where we compare the current centroid
with the argument. The point of that code is to redo the calculation
done in the previous level, to see if we were supposed to traverse left
or right (or up or down), and if we actually did. If we moved in the
different direction, then we know there are no matches for bound.

Those could be fixed with quite small adjustments, but I think the code
needs some refactoring. It's complicated as it is, it's very difficult
to understand all the cases and comparisons. Case in point: the patch
was written by Alexander, reviewed by Jeff, and committed by me, and we
all missed the above bugs. So, I propose the attached.

I also wrote the attached little white-box testing tool for this. It
allows you to call spg_range_quad_inner_consistent with the adjacent
strategy, and pass the exact argument, centroid and prev centroid ranges
you want. It prints out the result of which quadrants to visit.

- Heikki


Attachment

Re: Add a filed to PageHeaderData

From
Soroosh Sardari
Date:



On Tue, Jun 24, 2014 at 10:18 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

On Tue, Jun 24, 2014 at 3:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
> Soroosh Sardari <soroosh.sardari@gmail.com> wrote:
>
> > I check this problem with a virgin source code of
> > postgresql-9.3.2. So the bug is not for my codes.
>
> > By the way, following code has two different output and it is
> > weird.
>
> I can confirm that I see the difference in 9.3.2, and that I don't
> see the difference in 9.3.4.  Upgrade.
>
> http://www.postgresql.org/support/versioning/
>
> There's really no point in reporting a possible bug on a version
> with known bugs which have already had fixes published.
>

FWIW I can reproduce this on HEAD with the attached patch. I could reproduce this on a 64-bit Ubuntu as well as 64-bit Mac OSX. Very confusing it is because I tried with various values for N in char[N] array and it fails for N=20. Other values I tried are 4, 12, 22, 24 and the test passes for all of them. The logic for trying other values is to see if pd_linp[] starting on un-aligned boundary can trigger the issue. But there seem to be no correlation.

postgres=# select version();

PostgreSQL 9.5devel on x86_64-apple-darwin13.2.0, compiled by Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit

postgres=# -- test SP-GiST index that's been built incrementally

postgres=# create table test_range_spgist(ir int4range);
postgres=# create index test_range_spgist_idx on test_range_spgist using spgist (ir);
postgres=# insert into test_range_spgist select int4range(g, g+10) from generate_series(1,586) g;
INSERT 0 586

postgres=# SET enable_seqscan    = t;
postgres=# SET enable_indexscan  = f;
postgres=# SET enable_bitmapscan = f;

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);

    ir    
-----------
[90,100)
[500,510)
(2 rows)

postgres=# SET enable_seqscan    = f;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);

    ir    
-----------
 [90,100)
 [500,510)
(2 rows)

At this point, both rows are visible via index scan as well as seq scan. 

postgres=# insert into test_range_spgist select int4range(g, g+10) from generate_series(587,587) g;
INSERT 0 1

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
    ir    
----------
 [90,100)
(1 row)

Ouch. The second row somehow disappeared.

postgres=# SET enable_seqscan    = t;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);

    ir    
-----------
 [90,100)
 [500,510)
(2 rows)

So the last INSERT suddenly makes one row disappear via the index scan though its still reachable via seq scan. I tried looking at the SP-Gist code but clearly I don't understand it a whole lot to figure out the issue, if one exists.


Thanks,
Pavan


Is there any plug in to examine each page of spgist index?
Unfortunately pageinspect only work for btree index.
On Wed, Jun 25, 2014 at 10:39 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I came up with the attached. There were several bugs:


I tested for the original bug report and patch definitely fixes that. I don't feel qualified enough with SP-Gist to really comment on the other bugs you reported and presumably fixed in the patch.

Thanks,
Pavan 
On Wed, Jul 2, 2014 at 11:11 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Wed, Jun 25, 2014 at 10:39 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I came up with the attached. There were several bugs:


I tested for the original bug report and patch definitely fixes that. I don't feel qualified enough with SP-Gist to really comment on the other bugs you reported and presumably fixed in the patch.


Heikki, did you get chance to commit your patch? IMHO we should get the bug fix in before minor releases next week. My apologies if you've already committed it and I've missed the commit message.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On Wed, Jul 16, 2014 at 1:34 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> Heikki, did you get chance to commit your patch? IMHO we should get the bug
> fix in before minor releases next week. My apologies if you've already
> committed it and I've missed the commit message.
FWIW, this patch has not been committed yet. I am not seeing any
recent update on src/backend/utils/adt/rangetypes_spgist.c.
-- 
Michael



On 07/16/2014 08:30 AM, Michael Paquier wrote:
> On Wed, Jul 16, 2014 at 1:34 PM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>> Heikki, did you get chance to commit your patch? IMHO we should get the bug
>> fix in before minor releases next week. My apologies if you've already
>> committed it and I've missed the commit message.
> FWIW, this patch has not been committed yet. I am not seeing any
> recent update on src/backend/utils/adt/rangetypes_spgist.c.

Thanks for the reminder, committed now.

- Heikki