Thread: [QUESTION/PROPOSAL] loose quadtree in spgist

[QUESTION/PROPOSAL] loose quadtree in spgist

From
Peter Griggs
Date:
Hello, I wanted some guidance/suggestions about creating an spgist extension. For context, i am a grad student doing research that involves comparing the performance of different indexes for spatial data. We've built a system that uses Postgres and one of the data structures we want to use is a loose quadtree, but there is no implementation of this data structure in spgist. The reason why I think this is pretty do-able is that it is quite similar to a quadtree on boxes, which is implemented in src/backend/utils/adt/geo_spgist.c.

Additionally, I found by grepping through the repo for the existing functions in spgist/box_ops operator class that several catalog files need to be updated to reflect a new operator class in spgist. The files that I believe need to be changed to create a new
spgist_loose_box_ops operator class are:

src/include/catalog/pg_amop.dat
src/include/catalog/pg_amproc.dat
src/include/catalog/pg_opclass.dat
src/include/catalog/pg_opfamily.dat


I've poked around quite a bit in the spgist code and have tried making minimal changes to geo_spgist.c, but I haven't done any development on postgres before, so i'm running into some issues that I couldn't find help with on the postgres slack, by searching the mailing list, or by scouring the development wikis. For example, I wanted to just print out some data to see what quadrant a box is being placed into in the geo_spgist.c code. I understand that printing to stdout won't work in postgres, but I thought that I could possibly write some data to the logfile. I tried updating a function to use both elog and ereport and re-built the code. However, I can't get anything to print out to the logfile no matter what I try. Does anyone have tips for printing out and debugging in general for postgres development?


Any tips or guidance would be much appreciated. Also, if there's a different route I should go to turn this into a proposal for a patch please let me know. I'm new to postgres dev.

Best,
Peter


Re: [QUESTION/PROPOSAL] loose quadtree in spgist

From
Tomas Vondra
Date:
On Tue, Jan 07, 2020 at 11:33:31AM -0500, Peter Griggs wrote:
>Hello, I wanted some guidance/suggestions about creating an spgist
>extension. For context, i am a grad student doing research that involves
>comparing the performance of different indexes for spatial data. We've
>built a system that uses Postgres and one of the data structures we want to
>use is a loose quadtree, but there is no implementation of this data
>structure in spgist. The reason why I think this is pretty do-able is that
>it is quite similar to a quadtree on boxes, which is implemented in
>src/backend/utils/adt/geo_spgist.c.
>
>Additionally, I found by grepping through the repo for the existing
>functions in spgist/box_ops operator class that several catalog files need
>to be updated to reflect a new operator class in spgist. The files that I
>believe need to be changed to create a new
>spgist_loose_box_ops operator class are:
>
>src/include/catalog/pg_amop.dat
>src/include/catalog/pg_amproc.dat
>src/include/catalog/pg_opclass.dat
>src/include/catalog/pg_opfamily.dat
>

You should probably try using CREATE OPERATOR CLASS command [1], not
modify the catalogs directly. That's only necessary for built-in index
types (i.e. available right after initdb). But you mentioned you're
working on an extension, so the command is the right thing to do (after
all, you don't know OIDs of objects from the extension).

[1] https://www.postgresql.org/docs/current/sql-createopclass.html

>
>I've poked around quite a bit in the spgist code and have tried making
>minimal changes to geo_spgist.c, but I haven't done any development on
>postgres before, so i'm running into some issues that I couldn't find help
>with on the postgres slack, by searching the mailing list, or by scouring
>the development wikis.

Well, learning the ropes may take a bit of time, and pgsql-hackers is
probably the right place to ask ...

>For example, I wanted to just print out some data to
>see what quadrant a box is being placed into in the geo_spgist.c code. I
>understand that printing to stdout won't work in postgres, but I thought
>that I could possibly write some data to the logfile. I tried updating a
>function to use both elog and ereport and re-built the code. However, I
>can't get anything to print out to the logfile no matter what I try. Does
>anyone have tips for printing out and debugging in general for postgres
>development?
>

Well, elog/ereport are the easiest approach (it's what I'd do), and they
do about the same thing. The main difference is that ereport allows
translations of messages to other languages, while elog is for internal
things that should not happen (unexpected errors, ...). For debugging
just use elog(), I guess.

It's hard to say why you're not getting anything logged, because you
haven't shown us any code. My guess is that you're uring log level that
is not high enough to make it into the log file.

The default config in postgresql.conf says

   log_min_messages = warning

which means the level has to be at least WARNING to make it into the
file. So either WARNING, ERROR, LOG, FATAL, PANIC. So for example

   elog(INFO, "test message");

won't do anything, but

   elog(LOG, "test message");

will write stuff to the log file. If you use WARNING, you'll actually
get the message on the client console (well, there's client_min_messages
but you get the idea).

>
>Any tips or guidance would be much appreciated. Also, if there's a
>different route I should go to turn this into a proposal for a patch
>please let me know. I'm new to postgres dev.
>

A general recommendation is to show snippets of code, so that people on
this list actually can help without too much guessing what you're doing.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: [QUESTION/PROPOSAL] loose quadtree in spgist

From
Peter Griggs
Date:
Thank you for the tips Tomas, I really appreciate it. You're definitely right that I should include code snippets, so here's the code i'm trying to change.

In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c, I only added some elog statements to see the quadrant that a box is placed into using the current code. getQuadrant is called several times by the spg_box_quad_picksplit function, which is used when inserting into the quadtree. With this change, I can still build postgres but when I try to trigger the code, nothing gets printed to my logfile. Here's my process for trying to trigger this code:

1. delete the current postgres installation by removing /usr/local/pgsql
2. re-build from source by following documentation
3. create a database with a table that has two columns: (id int, b box)
4. insert some boxes into the table and build an index on it using "CREATE INDEX box_quad_idx ON quad USING spgist(b);"

And here's the function I modified:

/*
 * Calculate the quadrant
 * 
 * The quadrant is 8 bit unsigned integer with 4 least bits in use.
 * This function accepts BOXes as input.  They are not casted to
 * RangeBoxes, yet.  All 4 bits are set by comparing a corner of the box.
 * This makes 16 quadrants in total.
 */
static uint8
getQuadrant(BOX *centroid, BOX *inBox)
{uint8		quadrant = 0;
elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);elog(LOG, "BOX (maxx, maxy) = (%d, %d)\n", centroid->high.x, centroid->high.y);
if (inBox->low.x > centroid->low.x)	quadrant |= 0x8;
if (inBox->high.x > centroid->high.x)	quadrant |= 0x4;
if (inBox->low.y > centroid->low.y)	quadrant |= 0x2;
if (inBox->high.y > centroid->high.y)	quadrant |= 0x1;
elog(LOG, "Quadrant bitvector value is: %d\n", quadrant);
return quadrant;
}





On Tue, Jan 7, 2020 at 5:56 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Tue, Jan 07, 2020 at 11:33:31AM -0500, Peter Griggs wrote:
>Hello, I wanted some guidance/suggestions about creating an spgist
>extension. For context, i am a grad student doing research that involves
>comparing the performance of different indexes for spatial data. We've
>built a system that uses Postgres and one of the data structures we want to
>use is a loose quadtree, but there is no implementation of this data
>structure in spgist. The reason why I think this is pretty do-able is that
>it is quite similar to a quadtree on boxes, which is implemented in
>src/backend/utils/adt/geo_spgist.c.
>
>Additionally, I found by grepping through the repo for the existing
>functions in spgist/box_ops operator class that several catalog files need
>to be updated to reflect a new operator class in spgist. The files that I
>believe need to be changed to create a new
>spgist_loose_box_ops operator class are:
>
>src/include/catalog/pg_amop.dat
>src/include/catalog/pg_amproc.dat
>src/include/catalog/pg_opclass.dat
>src/include/catalog/pg_opfamily.dat
>

You should probably try using CREATE OPERATOR CLASS command [1], not
modify the catalogs directly. That's only necessary for built-in index
types (i.e. available right after initdb). But you mentioned you're
working on an extension, so the command is the right thing to do (after
all, you don't know OIDs of objects from the extension).

[1] https://www.postgresql.org/docs/current/sql-createopclass.html

>
>I've poked around quite a bit in the spgist code and have tried making
>minimal changes to geo_spgist.c, but I haven't done any development on
>postgres before, so i'm running into some issues that I couldn't find help
>with on the postgres slack, by searching the mailing list, or by scouring
>the development wikis.

Well, learning the ropes may take a bit of time, and pgsql-hackers is
probably the right place to ask ...

>For example, I wanted to just print out some data to
>see what quadrant a box is being placed into in the geo_spgist.c code. I
>understand that printing to stdout won't work in postgres, but I thought
>that I could possibly write some data to the logfile. I tried updating a
>function to use both elog and ereport and re-built the code. However, I
>can't get anything to print out to the logfile no matter what I try. Does
>anyone have tips for printing out and debugging in general for postgres
>development?
>

Well, elog/ereport are the easiest approach (it's what I'd do), and they
do about the same thing. The main difference is that ereport allows
translations of messages to other languages, while elog is for internal
things that should not happen (unexpected errors, ...). For debugging
just use elog(), I guess.

It's hard to say why you're not getting anything logged, because you
haven't shown us any code. My guess is that you're uring log level that
is not high enough to make it into the log file.

The default config in postgresql.conf says

   log_min_messages = warning

which means the level has to be at least WARNING to make it into the
file. So either WARNING, ERROR, LOG, FATAL, PANIC. So for example

   elog(INFO, "test message");

won't do anything, but

   elog(LOG, "test message");

will write stuff to the log file. If you use WARNING, you'll actually
get the message on the client console (well, there's client_min_messages
but you get the idea).

>
>Any tips or guidance would be much appreciated. Also, if there's a
>different route I should go to turn this into a proposal for a patch
>please let me know. I'm new to postgres dev.
>

A general recommendation is to show snippets of code, so that people on
this list actually can help without too much guessing what you're doing.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020

Re: [QUESTION/PROPOSAL] loose quadtree in spgist

From
Tom Lane
Date:
Peter Griggs <petergriggs33@gmail.com> writes:
> In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c,
> I only added some elog statements to see the quadrant that a box is placed
> into using the current code. getQuadrant is called several times by the
> spg_box_quad_picksplit function, which is used when inserting into the
> quadtree. With this change, I can still build postgres but when I try to
> trigger the code, nothing gets printed to my logfile.

Perhaps you're looking in the wrong logfile.  elog(LOG) should definitely
produce output unless you're using very strange settings.

Another possibility is that the specific test case you're using doesn't
actually reach this function.  I'm not totally sure, but I think that
SPGiST might not call the datatype-specific choose or picksplit functions
until it's got more than one index page's worth of data.

>     elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);

A couple thoughts here:

* From memory, the x and y values of a BOX are float8, so don't you want
to use %g or %f instead of %d?

* You don't want to end an elog with \n, that'll just add extra blank
lines to the log.

            regards, tom lane



Re: [QUESTION/PROPOSAL] loose quadtree in spgist

From
Peter Griggs
Date:
As an update, you were totally right Tom, SPGIST loads all of the tuples onto the root page which doesn't call picksplit until there's a page's worth of data. I just wasn't inserting enough tuples to see my elog values appear in the log, but now I am and they do!

The hint from Tomas before to use the CREATE OPERATOR CLASS command was spot on. That documentation lead me to this page (https://www.postgresql.org/docs/11/xindex.html), which looked like the sql I need to include in the extension to get the new loose quadtree index to build. What I did was create a "loose_quadtree" folder for my extension in the /contrib folder and followed the format of another extension by including a Makefile, loose_quadtree.control file, loose_quadtree.sql, and my loose_quadtree.c file, in which I just put copies of the box quadtree functions from geo_spgist.c with a bit of extra logging code. Now, I can build the extension with make and then run 'CREATE EXTENSION loose_quadtree;', then index using 'spgist(b loose_quadtree_ops)' operator class! Now, I have to actually figure out how to change the logic within the functions from geo_spgist.c.

I was wondering if you had advice on how to handle implementing insertion for the loose quadtree index. For some background, a loose quadtree is similar to a quadtree over boxes, except that the length of a side becomes k*L where k>1. Throughout this, I assume that our space is a square (take the minimum bounding square of all of the boxes). Usually, a value of K=2 is used. Since, each loose quadtree cell is 2x its normal size, a given level can hold any object that has a radius of <=1/4 of the cell side length, regardless of the object's position. We can do a bit of math and figure out what level an object should be inserted into the tree in O(1) time. I'm including a picture of the level selection algorithm below, but its just a re-formulation of what i've said above. My overall question is how to do this in spgist. From what I understand in the spgist insertion algorithm, the level selection would should done in the choose() function because choose() is called when we are trying to insert a leaf tuple into a inner tuple that has one or more levels under it. Currently, it seems like the spg_box_quad_choose() function descends recursively into a quadtree node. What I would like to do is to have it jump straight to the level it wants to insert into using the loose quadtree level selection algorithm and then find which cell it should add to by comparing its center coordinates.

[Following image from: Thatcher Ulrich. Loose octrees. In Mark DeLoura, editor, Game Programming Gems, pages 444–453. Charles River Media, 2000]

Screen Shot 2020-01-14 at 6.15.09 PM.png

Best,
Peter





On Wed, Jan 8, 2020 at 3:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Griggs <petergriggs33@gmail.com> writes:
> In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c,
> I only added some elog statements to see the quadrant that a box is placed
> into using the current code. getQuadrant is called several times by the
> spg_box_quad_picksplit function, which is used when inserting into the
> quadtree. With this change, I can still build postgres but when I try to
> trigger the code, nothing gets printed to my logfile.

Perhaps you're looking in the wrong logfile.  elog(LOG) should definitely
produce output unless you're using very strange settings.

Another possibility is that the specific test case you're using doesn't
actually reach this function.  I'm not totally sure, but I think that
SPGiST might not call the datatype-specific choose or picksplit functions
until it's got more than one index page's worth of data.

>       elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);

A couple thoughts here:

* From memory, the x and y values of a BOX are float8, so don't you want
to use %g or %f instead of %d?

* You don't want to end an elog with \n, that'll just add extra blank
lines to the log.

                        regards, tom lane


--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020
Attachment

Re: [QUESTION/PROPOSAL] loose quadtree in spgist

From
Peter Griggs
Date:
Hi, I was wondering if someone could help me understand what the "allTheSame" attribute in the spgChooseIn struct is.
Does it mean that the inner tuple contains either only inner tuples or only leaf nodes? Or is it saying that the tuples in an inner tuple are all in the same quadrant?

This is a code snippet from /src/include/access/spgist.h:
/*
 * Argument structs for spg_choose method
 */
typedef struct spgChooseIn
{
   Datum datum; /* original datum to be indexed */
   Datum leafDatum; /* current datum to be stored at leaf */
   int level; /* current level (counting from zero) */

   /* Data from current inner tuple */
   bool allTheSame; /* tuple is marked all-the-same? */
   bool hasPrefix; /* tuple has a prefix? */
   Datum prefixDatum; /* if so, the prefix value */
   int nNodes; /* number of nodes in the inner tuple */
   Datum   *nodeLabels; /* node label values (NULL if none) */
} spgChooseIn;

On Thu, Jan 16, 2020 at 12:32 AM Peter Griggs <petergriggs33@gmail.com> wrote:
As an update, you were totally right Tom, SPGIST loads all of the tuples onto the root page which doesn't call picksplit until there's a page's worth of data. I just wasn't inserting enough tuples to see my elog values appear in the log, but now I am and they do!

The hint from Tomas before to use the CREATE OPERATOR CLASS command was spot on. That documentation lead me to this page (https://www.postgresql.org/docs/11/xindex.html), which looked like the sql I need to include in the extension to get the new loose quadtree index to build. What I did was create a "loose_quadtree" folder for my extension in the /contrib folder and followed the format of another extension by including a Makefile, loose_quadtree.control file, loose_quadtree.sql, and my loose_quadtree.c file, in which I just put copies of the box quadtree functions from geo_spgist.c with a bit of extra logging code. Now, I can build the extension with make and then run 'CREATE EXTENSION loose_quadtree;', then index using 'spgist(b loose_quadtree_ops)' operator class! Now, I have to actually figure out how to change the logic within the functions from geo_spgist.c.

I was wondering if you had advice on how to handle implementing insertion for the loose quadtree index. For some background, a loose quadtree is similar to a quadtree over boxes, except that the length of a side becomes k*L where k>1. Throughout this, I assume that our space is a square (take the minimum bounding square of all of the boxes). Usually, a value of K=2 is used. Since, each loose quadtree cell is 2x its normal size, a given level can hold any object that has a radius of <=1/4 of the cell side length, regardless of the object's position. We can do a bit of math and figure out what level an object should be inserted into the tree in O(1) time. I'm including a picture of the level selection algorithm below, but its just a re-formulation of what i've said above. My overall question is how to do this in spgist. From what I understand in the spgist insertion algorithm, the level selection would should done in the choose() function because choose() is called when we are trying to insert a leaf tuple into a inner tuple that has one or more levels under it. Currently, it seems like the spg_box_quad_choose() function descends recursively into a quadtree node. What I would like to do is to have it jump straight to the level it wants to insert into using the loose quadtree level selection algorithm and then find which cell it should add to by comparing its center coordinates.

[Following image from: Thatcher Ulrich. Loose octrees. In Mark DeLoura, editor, Game Programming Gems, pages 444–453. Charles River Media, 2000]

Screen Shot 2020-01-14 at 6.15.09 PM.png

Best,
Peter





On Wed, Jan 8, 2020 at 3:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Griggs <petergriggs33@gmail.com> writes:
> In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c,
> I only added some elog statements to see the quadrant that a box is placed
> into using the current code. getQuadrant is called several times by the
> spg_box_quad_picksplit function, which is used when inserting into the
> quadtree. With this change, I can still build postgres but when I try to
> trigger the code, nothing gets printed to my logfile.

Perhaps you're looking in the wrong logfile.  elog(LOG) should definitely
produce output unless you're using very strange settings.

Another possibility is that the specific test case you're using doesn't
actually reach this function.  I'm not totally sure, but I think that
SPGiST might not call the datatype-specific choose or picksplit functions
until it's got more than one index page's worth of data.

>       elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);

A couple thoughts here:

* From memory, the x and y values of a BOX are float8, so don't you want
to use %g or %f instead of %d?

* You don't want to end an elog with \n, that'll just add extra blank
lines to the log.

                        regards, tom lane


--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020


--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020
Attachment