Thread: Re: [HACKERS] Cube extension point support // GSoC'13
On 12.07.2013 14:57, Stas Kelvich wrote: > Hello. > > here is a patch adding to cube extension support for compressed representation of point cubes. If cube is a point, i.e.has coincident lower left and upper right corners, than only one corner is stored. First bit of the cube header indicateswhether the cube is point or not. Few moments: > > * Patch preserves binary compatibility with old indices > * All functions that create cubes from user input, check whether it is a point or not > * All internal functions that can return cubes takes care of all cases where a cube might become a point Great! cube_is_point() needs to still handle old-style points. An NDBOX without the point-flag set, where the ll and ur coordinates for each dimension are the same, still needs to be considered a point. Even if you are careful to never construct such structs in the code, they can be present on-disk if you have upgraded from an earlier version with pg_upgrade. Same in cube_out(). > * Added tests for checking correct point behavior You'll need to adjust all the expected output files, not only cube_1.out. > Also this patch includes adapted Alexander Korotkov's patch with kNN-based ordering operator, which he wrote for postgresql-9.0beta1with knngist patch. More info there http://www.postgresql.org/message-id/AANLkTimhFaq6hCibRnk0tlcQMIyhYWHwAQ2ZD87wbH86@mail.gmail.com To make review easier, it would be better to keep that as a separate patch, actually. Could you split it up again, please? - Heikki
Hello There is new version of patch. I have separated ordering operators to different patch (https://commitfest.postgresql.org/action/patch_view?id=1243),fixed formatting issues and implemented backward compatibilitywith old-style points in cube_is_point() and cube_out(). Also comparing output files I've discovered that this four files is combination of two types of different behavior: 1) SELECT '-1e-700'::cube AS cube; can be (0) or (-0) 2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS cube; can be (1e+027) or (1e+27) On my system (OSX) it is second option in both situations. I've also tested it on FreeBSD 9.0 and Ubuntu 12.04 with the sameresults. So is there some ideas how can I reproduce such results? Stas. On Sep 16, 2013, at 10:48 AM, Heikki Linnakangas wrote: > On 12.07.2013 14:57, Stas Kelvich wrote: >> Hello. >> >> here is a patch adding to cube extension support for compressed representation of point cubes. If cube is a point, i.e.has coincident lower left and upper right corners, than only one corner is stored. First bit of the cube header indicateswhether the cube is point or not. Few moments: >> >> * Patch preserves binary compatibility with old indices >> * All functions that create cubes from user input, check whether it is a point or not >> * All internal functions that can return cubes takes care of all cases where a cube might become a point > > Great! > > cube_is_point() needs to still handle old-style points. An NDBOX without the point-flag set, where the ll and ur coordinatesfor each dimension are the same, still needs to be considered a point. Even if you are careful to never constructsuch structs in the code, they can be present on-disk if you have upgraded from an earlier version with pg_upgrade.Same in cube_out(). > >> * Added tests for checking correct point behavior > > You'll need to adjust all the expected output files, not only cube_1.out. > >> Also this patch includes adapted Alexander Korotkov's patch with kNN-based ordering operator, which he wrote for postgresql-9.0beta1with knngist patch. More info there http://www.postgresql.org/message-id/AANLkTimhFaq6hCibRnk0tlcQMIyhYWHwAQ2ZD87wbH86@mail.gmail.com > > To make review easier, it would be better to keep that as a separate patch, actually. Could you split it up again, please? > > - Heikki
Attachment
On Tue, Sep 24, 2013 at 9:07 AM, Stas Kelvich <stas.kelvich@gmail.com> wrote: > Hello > > There is new version of patch. I have separated ordering operators to different patch (https://commitfest.postgresql.org/action/patch_view?id=1243),fixed formatting issues and implemented backward compatibilitywith old-style points in cube_is_point() and cube_out(). > > Also comparing output files I've discovered that this four files is combination of two types of different behavior: > > 1) SELECT '-1e-700'::cube AS cube; > can be (0) or (-0) > > 2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS cube; > can be (1e+027) or (1e+27) > > On my system (OSX) it is second option in both situations. I've also tested it on FreeBSD 9.0 and Ubuntu 12.04 with thesame results. So is there some ideas how can I reproduce such results? Heikki, are you going to review this further for this CommitFest? Is there a prerequisite patch that hasn't been committed yet? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09.10.2013 21:07, Robert Haas wrote: > On Tue, Sep 24, 2013 at 9:07 AM, Stas Kelvich<stas.kelvich@gmail.com> wrote: >> Hello >> >> There is new version of patch. I have separated ordering operators to different patch (https://commitfest.postgresql.org/action/patch_view?id=1243),fixed formatting issues and implemented backward compatibilitywith old-style points in cube_is_point() and cube_out(). >> >> Also comparing output files I've discovered that this four files is combination of two types of different behavior: >> >> 1) SELECT '-1e-700'::cube AS cube; >> can be (0) or (-0) >> >> 2) Amount of zeros in exponent of floating point, i.e. SELECT '1e27'::cube AS cube; >> can be (1e+027) or (1e+27) >> >> On my system (OSX) it is second option in both situations. I've also tested it on FreeBSD 9.0 and Ubuntu 12.04 with thesame results. So is there some ideas how can I reproduce such results? > > Heikki, are you going to review this further for this CommitFest? Sorry, I didn't realize the ball was in my court. I went through the patch now, kibitzing over some minor style issues. Attached is a new version. This seems good for commit except for two things: 1. The alternative expected output files still need to be updated. Stas couldn't find a system where some of those file were used. One option is to simply commit the patch as is, and see if the buildfarm goes red. If it doesn't, we can simply remove the alternative files - they are not used on any supported platform. If some animals go red, we'll get the required diff from the buildfarm output and apply. So this isn't a show-stopper. 2. I didn't understand this change: > @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS) > Datum > g_cube_compress(PG_FUNCTION_ARGS) > { > - PG_RETURN_DATUM(PG_GETARG_DATUM(0)); > + GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); > + PG_RETURN_POINTER(entry); > } > > Datum > g_cube_decompress(PG_FUNCTION_ARGS) > { > GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); > - NDBOX *key = DatumGetNDBOX(PG_DETOAST_DATUM(entry->key)); > - > - if (key != DatumGetNDBOX(entry->key)) > - { > - GISTENTRY *retval = (GISTENTRY *) palloc(sizeof(GISTENTRY)); > - > - gistentryinit(*retval, PointerGetDatum(key), > - entry->rel, entry->page, > - entry->offset, FALSE); > - PG_RETURN_POINTER(retval); > - } > PG_RETURN_POINTER(entry); > } > What did the removed code do, and why isn't it needed anymore? > Is there a prerequisite patch that hasn't been committed yet? No. - Heikki
Attachment
On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
#define SIGLENINT 63 /* >122 => key will toast, so very slow!!! */
2. I didn't understand this change:@@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)
Datum
g_cube_compress(PG_FUNCTION_ARGS)
{
- PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+ GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+ PG_RETURN_POINTER(entry);
}
Datum
g_cube_decompress(PG_FUNCTION_ARGS)
{
GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
- NDBOX *key = DatumGetNDBOX(PG_DETOAST_DATUM(entry->key));
-
- if (key != DatumGetNDBOX(entry->key))
- {
- GISTENTRY *retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
-
- gistentryinit(*retval, PointerGetDatum(key),
- entry->rel, entry->page,
- entry->offset, FALSE);
- PG_RETURN_POINTER(retval);
- }
PG_RETURN_POINTER(entry);
}
What did the removed code do, and why isn't it needed anymore?
I don't understand this place even more generally. What decompress function do is to detoast key and return new gist entry with it if needed. But can GiST key ever be toasted? My experiments show that answer is no, it can't. When too long key is passed then GiST falls during inserting key into page. And I didn't find any code doing GiST key toast in git.
However taking care about key toast is sequentially repeated in GiST related code. For example, in contrib/intarray/_int.h
#define SIGLENINT 63 /* >122 => key will toast, so very slow!!! */
Any ideas?
------
With best regards,
Alexander Korotkov.
With best regards,
Alexander Korotkov.
On 11.10.2013 17:39, Alexander Korotkov wrote: > On Fri, Oct 11, 2013 at 5:56 PM, Heikki Linnakangas<hlinnakangas@vmware.com >> wrote: > >> 2. I didn't understand this change: >> >> @@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS) >>> Datum >>> g_cube_compress(PG_FUNCTION_**ARGS) >>> { >>> - PG_RETURN_DATUM(PG_GETARG_**DATUM(0)); >>> + GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); >>> + PG_RETURN_POINTER(entry); >>> } >>> >>> Datum >>> g_cube_decompress(PG_FUNCTION_**ARGS) >>> { >>> GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); >>> - NDBOX *key = DatumGetNDBOX(PG_DETOAST_**DATUM(entry->key)); >>> - >>> - if (key != DatumGetNDBOX(entry->key)) >>> - { >>> - GISTENTRY *retval = (GISTENTRY *) >>> palloc(sizeof(GISTENTRY)); >>> - >>> - gistentryinit(*retval, PointerGetDatum(key), >>> - entry->rel, entry->page, >>> - entry->offset, FALSE); >>> - PG_RETURN_POINTER(retval); >>> - } >>> PG_RETURN_POINTER(entry); >>> } >>> >>> >> What did the removed code do, and why isn't it needed anymore? > > I don't understand this place even more generally. What decompress function > do is to detoast key and return new gist entry with it if needed. But can > GiST key ever be toasted? My experiments show that answer is no, it can't. > When too long key is passed then GiST falls during inserting key into page. > And I didn't find any code doing GiST key toast in git. I guess it can't happen. Or is it possible that a toasted value that came from disk will be passed to these functions, without detoasting them somewhere along the way? Not sure. In any case, I've committed this patch now. I left out the changes to compress and decompress functions, as that's unrelated to the patch at hand. I ran the regression test on a Windows VM, and updated the expected output file that the Windows build used. That leaves two alternative expected output files unmodified - the buildfarm will tell us whether they're still needed. Thanks and congrats on your first committed patch, Stas! :-) - Heikki
On Mon, Oct 21, 2013 at 11:06 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
------
With best regards,
Alexander Korotkov.
On 11.10.2013 17:39, Alexander Korotkov wrote:2. I didn't understand this change:
@@ -422,24 +439,14 @@ g_cube_union(PG_FUNCTION_ARGS)Datum
g_cube_compress(PG_FUNCTION_**ARGS)
{
- PG_RETURN_DATUM(PG_GETARG_**DATUM(0));g_cube_decompress(PG_FUNCTION_**ARGS)
+ GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+ PG_RETURN_POINTER(entry);
}
Datum- NDBOX *key = DatumGetNDBOX(PG_DETOAST_**DATUM(entry->key));
{
GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
-
- if (key != DatumGetNDBOX(entry->key))
- {
- GISTENTRY *retval = (GISTENTRY *)
palloc(sizeof(GISTENTRY));
-
- gistentryinit(*retval, PointerGetDatum(key),
- entry->rel, entry->page,
- entry->offset, FALSE);
- PG_RETURN_POINTER(retval);
- }
PG_RETURN_POINTER(entry);
}What did the removed code do, and why isn't it needed anymore?
I don't understand this place even more generally. What decompress function
do is to detoast key and return new gist entry with it if needed. But can
GiST key ever be toasted? My experiments show that answer is no, it can't.
When too long key is passed then GiST falls during inserting key into page.
And I didn't find any code doing GiST key toast in git.
I guess it can't happen. Or is it possible that a toasted value that came from disk will be passed to these functions, without detoasting them somewhere along the way? Not sure.
I will ask Teodor about it. When situation will be completely clear we should cleanup confusing comments.
Also, too long GiST key shouldn't cause GiST to fall in its internals. We should add check somewhere before.
In any case, I've committed this patch now. I left out the changes to compress and decompress functions, as that's unrelated to the patch at hand. I ran the regression test on a Windows VM, and updated the expected output file that the Windows build used. That leaves two alternative expected output files unmodified - the buildfarm will tell us whether they're still needed.
Thanks and congrats on your first committed patch, Stas! :-)
Stas, congratulations from me too! Now, it's time to rebase another patches against head. :)
With best regards,
Alexander Korotkov.
Alexander Korotkov <aekorotkov@gmail.com> writes: > On Mon, Oct 21, 2013 at 11:06 PM, Heikki Linnakangas < > hlinnakangas@vmware.com> wrote: >> I guess it can't happen. Or is it possible that a toasted value that came >> from disk will be passed to these functions, without detoasting them >> somewhere along the way? Not sure. > I will ask Teodor about it. When situation will be completely clear we > should cleanup confusing comments. I believe the reason GIST has compress/decompress functions is not for TOAST (they predate that, if memory serves), but to allow the on-disk representation of an index entry to be different from the data type's normal representation in other ways --- think lossy storage in particular. It's possible that the code in cube's decompress/decompress functions is unnecessary because cube doesn't actually do any such compression; the code might've been copy-pasted from somewhere else without adequate thought about whether it was useful for cubes. I'd want to see some discussion about why it's okay to change it before we do so, though. In any case it seems separate from the claimed purpose of this patch and thus better done in a different patch. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > I believe the reason GIST has compress/decompress functions is not for > TOAST (they predate that, if memory serves), but to allow the on-disk > representation of an index entry to be different from the data type's > normal representation in other ways --- think lossy storage in particular. My understanding of the use case for those functions is to do with storing a different data type in the index upper nodes and in the index leafs. It should be possible to do that in a non-lossy way, so that you would implement compress/decompress and not declare the RECHECK bits. Then again I'm talking from 8.3 era memories of when I tried to understand GiST enough to code the prefix extension. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Oct 22, 2013 at 2:00 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
------
With best regards,
Alexander Korotkov.
Tom Lane <tgl@sss.pgh.pa.us> writes:My understanding of the use case for those functions is to do with
> I believe the reason GIST has compress/decompress functions is not for
> TOAST (they predate that, if memory serves), but to allow the on-disk
> representation of an index entry to be different from the data type's
> normal representation in other ways --- think lossy storage in particular.
storing a different data type in the index upper nodes and in the index
leafs. It should be possible to do that in a non-lossy way, so that you
would implement compress/decompress and not declare the RECHECK bits.
Then again I'm talking from 8.3 era memories of when I tried to
understand GiST enough to code the prefix extension.
Actually, I mean purpose of this particular decompress function implementation, not compress/decompress in general. I understand that in general compress/decompress can do useful job.
------
With best regards,
Alexander Korotkov.