Thread: Re: [HACKERS] Cube extension point support // GSoC'13

Re: [HACKERS] Cube extension point support // GSoC'13

From
Heikki Linnakangas
Date:
On 12.07.2013 14:57, Stas Kelvich wrote:
> Hello.
>
> here is a patch adding to cube extension support for compressed representation of point cubes. If cube is a point,
i.e.has coincident lower left and upper right corners, than only one corner is stored. First bit of the cube header
indicateswhether the cube is point or not. Few moments: 
>
> * Patch preserves binary compatibility with old indices
> * All functions that create cubes from user input, check whether it is a point or not
> * All internal functions that can return cubes takes care of all cases where a cube might become a point

Great!

cube_is_point() needs to still handle old-style points. An NDBOX without
the point-flag set, where the ll and ur coordinates for each dimension
are the same, still needs to be considered a point. Even if you are
careful to never construct such structs in the code, they can be present
on-disk if you have upgraded from an earlier version with pg_upgrade.
Same in cube_out().

> * Added tests for checking correct point behavior

You'll need to adjust all the expected output files, not only cube_1.out.

> Also this patch includes adapted Alexander Korotkov's patch with kNN-based ordering operator, which he wrote for
postgresql-9.0beta1with knngist patch. More info there
http://www.postgresql.org/message-id/AANLkTimhFaq6hCibRnk0tlcQMIyhYWHwAQ2ZD87wbH86@mail.gmail.com

To make review easier, it would be better to keep that as a separate
patch, actually. Could you split it up again, please?

- Heikki


Re: [HACKERS] Cube extension point support // GSoC'13

From
Stas Kelvich
Date:
Hello

There is new version of patch. I have separated ordering operators to different patch
(https://commitfest.postgresql.org/action/patch_view?id=1243),fixed formatting issues and implemented backward
compatibilitywith old-style points in cube_is_point() and cube_out(). 

Also comparing output files I've discovered that this four files is combination of two types of different behavior:

1) SELECT '-1e-700'::cube AS cube;
can be (0) or (-0)

2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS cube;
 can be (1e+027) or (1e+27)

On my system (OSX) it is second option in both situations. I've also tested it on FreeBSD 9.0 and Ubuntu 12.04 with the
sameresults. So is there some ideas how can I reproduce such results? 

Stas.



On Sep 16, 2013, at 10:48 AM, Heikki Linnakangas wrote:

> On 12.07.2013 14:57, Stas Kelvich wrote:
>> Hello.
>>
>> here is a patch adding to cube extension support for compressed representation of point cubes. If cube is a point,
i.e.has coincident lower left and upper right corners, than only one corner is stored. First bit of the cube header
indicateswhether the cube is point or not. Few moments: 
>>
>> * Patch preserves binary compatibility with old indices
>> * All functions that create cubes from user input, check whether it is a point or not
>> * All internal functions that can return cubes takes care of all cases where a cube might become a point
>
> Great!
>
> cube_is_point() needs to still handle old-style points. An NDBOX without the point-flag set, where the ll and ur
coordinatesfor each dimension are the same, still needs to be considered a point. Even if you are careful to never
constructsuch structs in the code, they can be present on-disk if you have upgraded from an earlier version with
pg_upgrade.Same in cube_out(). 
>
>> * Added tests for checking correct point behavior
>
> You'll need to adjust all the expected output files, not only cube_1.out.
>
>> Also this patch includes adapted Alexander Korotkov's patch with kNN-based ordering operator, which he wrote for
postgresql-9.0beta1with knngist patch. More info there
http://www.postgresql.org/message-id/AANLkTimhFaq6hCibRnk0tlcQMIyhYWHwAQ2ZD87wbH86@mail.gmail.com
>
> To make review easier, it would be better to keep that as a separate patch, actually. Could you split it up again,
please?
>
> - Heikki


Attachment

Re: [HACKERS] Cube extension point support // GSoC'13

From
Robert Haas
Date:
On Tue, Sep 24, 2013 at 9:07 AM, Stas Kelvich <stas.kelvich@gmail.com> wrote:
> Hello
>
> There is new version of patch. I have separated ordering operators to different patch
(https://commitfest.postgresql.org/action/patch_view?id=1243),fixed formatting issues and implemented backward
compatibilitywith old-style points in cube_is_point() and cube_out(). 
>
> Also comparing output files I've discovered that this four files is combination of two types of different behavior:
>
> 1) SELECT '-1e-700'::cube AS cube;
> can be (0) or (-0)
>
> 2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS cube;
>  can be (1e+027) or (1e+27)
>
> On my system (OSX) it is second option in both situations. I've also tested it on FreeBSD 9.0 and Ubuntu 12.04 with
thesame results. So is there some ideas how can I reproduce such results? 

Heikki, are you going to review this further for this CommitFest?

Is there a prerequisite patch that hasn't been committed yet?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Cube extension point support // GSoC'13

From
Heikki Linnakangas
Date:
On 09.10.2013 21:07, Robert Haas wrote:
> On Tue, Sep 24, 2013 at 9:07 AM, Stas Kelvich<stas.kelvich@gmail.com>  wrote:
>> Hello
>>
>> There is new version of patch. I have separated ordering operators to different patch
(https://commitfest.postgresql.org/action/patch_view?id=1243),fixed formatting issues and implemented backward
compatibilitywith old-style points in cube_is_point() and cube_out(). 
>>
>> Also comparing output files I've discovered that this four files is combination of two types of different behavior:
>>
>> 1) SELECT '-1e-700'::cube AS cube;
>> can be (0) or (-0)
>>
>> 2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS cube;
>>   can be (1e+027) or (1e+27)
>>
>> On my system (OSX) it is second option in both situations. I've also tested it on FreeBSD 9.0 and Ubuntu 12.04 with
thesame results. So is there some ideas how can I reproduce such results? 
>
> Heikki, are you going to review this further for this CommitFest?

Sorry, I didn't realize the ball was in my court.

I went through the patch now, kibitzing over some minor style issues.
Attached is a new version.

This seems good for commit except for two things:

1. The alternative expected output files still need to be updated. Stas
couldn't find a system where some of those file were used. One option is
to simply commit the patch as is, and see if the buildfarm goes red. If
it doesn't, we can simply remove the alternative files - they are not
used on any supported platform. If some animals go red, we'll get the
required diff from the buildfarm output and apply. So this isn't a
show-stopper.

2. I didn't understand this change:

> @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
>  Datum
>  g_cube_compress(PG_FUNCTION_ARGS)
>  {
> -    PG_RETURN_DATUM(PG_GETARG_DATUM(0));
> +    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
> +    PG_RETURN_POINTER(entry);
>  }
>
>  Datum
>  g_cube_decompress(PG_FUNCTION_ARGS)
>  {
>      GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
> -    NDBOX       *key = DatumGetNDBOX(PG_DETOAST_DATUM(entry->key));
> -
> -    if (key != DatumGetNDBOX(entry->key))
> -    {
> -        GISTENTRY  *retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
> -
> -        gistentryinit(*retval, PointerGetDatum(key),
> -                      entry->rel, entry->page,
> -                      entry->offset, FALSE);
> -        PG_RETURN_POINTER(retval);
> -    }
>      PG_RETURN_POINTER(entry);
>  }
>

What did the removed code do, and why isn't it needed anymore?

> Is there a prerequisite patch that hasn't been committed yet?

No.

- Heikki

Attachment

Re: [HACKERS] Cube extension point support // GSoC'13

From
Alexander Korotkov
Date:
On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
2. I didn't understand this change:

@@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
 Datum
 g_cube_compress(PG_FUNCTION_ARGS)
 {
-       PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+       GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+       PG_RETURN_POINTER(entry);
 }

 Datum
 g_cube_decompress(PG_FUNCTION_ARGS)
 {
        GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
-       NDBOX      *key = DatumGetNDBOX(PG_DETOAST_DATUM(entry->key));
-
-       if (key != DatumGetNDBOX(entry->key))
-       {
-               GISTENTRY  *retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
-
-               gistentryinit(*retval, PointerGetDatum(key),
-                                         entry->rel, entry->page,
-                                         entry->offset, FALSE);
-               PG_RETURN_POINTER(retval);
-       }
        PG_RETURN_POINTER(entry);
 }


What did the removed code do, and why isn't it needed anymore?

I don't understand this place even more generally. What decompress function do is to detoast key and return new gist entry with it if needed. But can GiST key ever be toasted? My experiments show that answer is no, it can't. When too long key is passed then GiST falls during inserting key into page. And I didn't find any code doing GiST key toast in git.
However taking care about key toast is sequentially repeated in GiST related code. For example, in contrib/intarray/_int.h

#define SIGLENINT  63 /* >122 => key will toast, so very slow!!! */

Any ideas?

------
With best regards,
Alexander Korotkov.

Re: [HACKERS] Cube extension point support // GSoC'13

From
Heikki Linnakangas
Date:
On 11.10.2013 17:39, Alexander Korotkov wrote:
> On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangas<hlinnakangas@vmware.com
>> wrote:
>
>> 2. I didn't understand this change:
>>
>>   @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
>>>   Datum
>>>   g_cube_compress(PG_FUNCTION_**ARGS)
>>>   {
>>> -       PG_RETURN_DATUM(PG_GETARG_**DATUM(0));
>>> +       GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
>>> +       PG_RETURN_POINTER(entry);
>>>   }
>>>
>>>   Datum
>>>   g_cube_decompress(PG_FUNCTION_**ARGS)
>>>   {
>>>          GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
>>> -       NDBOX      *key = DatumGetNDBOX(PG_DETOAST_**DATUM(entry->key));
>>> -
>>> -       if (key != DatumGetNDBOX(entry->key))
>>> -       {
>>> -               GISTENTRY  *retval = (GISTENTRY *)
>>> palloc(sizeof(GISTENTRY));
>>> -
>>> -               gistentryinit(*retval, PointerGetDatum(key),
>>> -                                         entry->rel, entry->page,
>>> -                                         entry->offset, FALSE);
>>> -               PG_RETURN_POINTER(retval);
>>> -       }
>>>          PG_RETURN_POINTER(entry);
>>>   }
>>>
>>>
>> What did the removed code do, and why isn't it needed anymore?
>
> I don't understand this place even more generally. What decompress function
> do is to detoast key and return new gist entry with it if needed. But can
> GiST key ever be toasted? My experiments show that answer is no, it can't.
> When too long key is passed then GiST falls during inserting key into page.
> And I didn't find any code doing GiST key toast in git.

I guess it can't happen. Or is it possible that a toasted value that
came from disk will be passed to these functions, without detoasting
them somewhere along the way? Not sure.

In any case, I've committed this patch now. I left out the changes to
compress and decompress functions, as that's unrelated to the patch at
hand. I ran the regression test on a Windows VM, and updated the
expected output file that the Windows build used. That leaves two
alternative expected output files unmodified - the buildfarm will tell
us whether they're still needed.

Thanks and congrats on your first committed patch, Stas! :-)

- Heikki


Re: [HACKERS] Cube extension point support // GSoC'13

From
Alexander Korotkov
Date:
On Mon, Oct 21, 2013 at 11:06 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 11.10.2013 17:39, Alexander Korotkov wrote:
On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangas<hlinnakangas@vmware.com
wrote:

2. I didn't understand this change:

  @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
  Datum
  g_cube_compress(PG_FUNCTION_**ARGS)
  {
-       PG_RETURN_DATUM(PG_GETARG_**DATUM(0));

+       GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+       PG_RETURN_POINTER(entry);
  }

  Datum
  g_cube_decompress(PG_FUNCTION_**ARGS)

  {
         GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
-       NDBOX      *key = DatumGetNDBOX(PG_DETOAST_**DATUM(entry->key));

-
-       if (key != DatumGetNDBOX(entry->key))
-       {
-               GISTENTRY  *retval = (GISTENTRY *)
palloc(sizeof(GISTENTRY));
-
-               gistentryinit(*retval, PointerGetDatum(key),
-                                         entry->rel, entry->page,
-                                         entry->offset, FALSE);
-               PG_RETURN_POINTER(retval);
-       }
         PG_RETURN_POINTER(entry);
  }


What did the removed code do, and why isn't it needed anymore?

I don't understand this place even more generally. What decompress function
do is to detoast key and return new gist entry with it if needed. But can
GiST key ever be toasted? My experiments show that answer is no, it can't.
When too long key is passed then GiST falls during inserting key into page.
And I didn't find any code doing GiST key toast in git.

I guess it can't happen. Or is it possible that a toasted value that came from disk will be passed to these functions, without detoasting them somewhere along the way? Not sure.
 
I will ask Teodor about it. When situation will be completely clear we should cleanup confusing comments.
Also, too long GiST key shouldn't cause GiST to fall in its internals. We should add check somewhere before.

In any case, I've committed this patch now. I left out the changes to compress and decompress functions, as that's unrelated to the patch at hand. I ran the regression test on a Windows VM, and updated the expected output file that the Windows build used. That leaves two alternative expected output files unmodified - the buildfarm will tell us whether they're still needed.

Thanks and congrats on your first committed patch, Stas! :-)

Stas, congratulations from me too! Now, it's time to rebase another patches against head. :)

------
With best regards,
Alexander Korotkov.
 

Re: [HACKERS] Cube extension point support // GSoC'13

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Mon, Oct 21, 2013 at 11:06 PM, Heikki Linnakangas <
> hlinnakangas@vmware.com> wrote:
>> I guess it can't happen. Or is it possible that a toasted value that came
>> from disk will be passed to these functions, without detoasting them
>> somewhere along the way? Not sure.

> I will ask Teodor about it. When situation will be completely clear we
> should cleanup confusing comments.

I believe the reason GIST has compress/decompress functions is not for
TOAST (they predate that, if memory serves), but to allow the on-disk
representation of an index entry to be different from the data type's
normal representation in other ways --- think lossy storage in particular.

It's possible that the code in cube's decompress/decompress functions is
unnecessary because cube doesn't actually do any such compression; the
code might've been copy-pasted from somewhere else without adequate
thought about whether it was useful for cubes.  I'd want to see some
discussion about why it's okay to change it before we do so, though.
In any case it seems separate from the claimed purpose of this patch
and thus better done in a different patch.

            regards, tom lane


Re: [HACKERS] Cube extension point support // GSoC'13

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I believe the reason GIST has compress/decompress functions is not for
> TOAST (they predate that, if memory serves), but to allow the on-disk
> representation of an index entry to be different from the data type's
> normal representation in other ways --- think lossy storage in particular.

My understanding of the use case for those functions is to do with
storing a different data type in the index upper nodes and in the index
leafs. It should be possible to do that in a non-lossy way, so that you
would implement compress/decompress and not declare the RECHECK bits.

Then again I'm talking from 8.3 era memories of when I tried to
understand GiST enough to code the prefix extension.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: [HACKERS] Cube extension point support // GSoC'13

From
Alexander Korotkov
Date:
On Tue, Oct 22, 2013 at 2:00 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I believe the reason GIST has compress/decompress functions is not for
> TOAST (they predate that, if memory serves), but to allow the on-disk
> representation of an index entry to be different from the data type's
> normal representation in other ways --- think lossy storage in particular.

My understanding of the use case for those functions is to do with
storing a different data type in the index upper nodes and in the index
leafs. It should be possible to do that in a non-lossy way, so that you
would implement compress/decompress and not declare the RECHECK bits.

Then again I'm talking from 8.3 era memories of when I tried to
understand GiST enough to code the prefix extension.

Actually, I mean purpose of this particular decompress function implementation, not compress/decompress in general. I understand that in general compress/decompress can do useful job.

------
With best regards,
Alexander Korotkov.