Re: [QUESTION/PROPOSAL] loose quadtree in spgist - Mailing list pgsql-hackers

From Peter Griggs
Subject Re: [QUESTION/PROPOSAL] loose quadtree in spgist
Date
Msg-id CACEwj4q91T-Z3Cc=9-Z3EwnPSacvAponrwX3tBG_6aefKFTrJw@mail.gmail.com
Whole thread Raw
In response to Re: [QUESTION/PROPOSAL] loose quadtree in spgist  (Peter Griggs <petergriggs33@gmail.com>)
List pgsql-hackers
Hi, I was wondering if someone could help me understand what the "allTheSame" attribute in the spgChooseIn struct is.
Does it mean that the inner tuple contains either only inner tuples or only leaf nodes? Or is it saying that the tuples in an inner tuple are all in the same quadrant?

This is a code snippet from /src/include/access/spgist.h:
/*
 * Argument structs for spg_choose method
 */
typedef struct spgChooseIn
{
   Datum datum; /* original datum to be indexed */
   Datum leafDatum; /* current datum to be stored at leaf */
   int level; /* current level (counting from zero) */

   /* Data from current inner tuple */
   bool allTheSame; /* tuple is marked all-the-same? */
   bool hasPrefix; /* tuple has a prefix? */
   Datum prefixDatum; /* if so, the prefix value */
   int nNodes; /* number of nodes in the inner tuple */
   Datum   *nodeLabels; /* node label values (NULL if none) */
} spgChooseIn;

On Thu, Jan 16, 2020 at 12:32 AM Peter Griggs <petergriggs33@gmail.com> wrote:
As an update, you were totally right Tom, SPGIST loads all of the tuples onto the root page which doesn't call picksplit until there's a page's worth of data. I just wasn't inserting enough tuples to see my elog values appear in the log, but now I am and they do!

The hint from Tomas before to use the CREATE OPERATOR CLASS command was spot on. That documentation lead me to this page (https://www.postgresql.org/docs/11/xindex.html), which looked like the sql I need to include in the extension to get the new loose quadtree index to build. What I did was create a "loose_quadtree" folder for my extension in the /contrib folder and followed the format of another extension by including a Makefile, loose_quadtree.control file, loose_quadtree.sql, and my loose_quadtree.c file, in which I just put copies of the box quadtree functions from geo_spgist.c with a bit of extra logging code. Now, I can build the extension with make and then run 'CREATE EXTENSION loose_quadtree;', then index using 'spgist(b loose_quadtree_ops)' operator class! Now, I have to actually figure out how to change the logic within the functions from geo_spgist.c.

I was wondering if you had advice on how to handle implementing insertion for the loose quadtree index. For some background, a loose quadtree is similar to a quadtree over boxes, except that the length of a side becomes k*L where k>1. Throughout this, I assume that our space is a square (take the minimum bounding square of all of the boxes). Usually, a value of K=2 is used. Since, each loose quadtree cell is 2x its normal size, a given level can hold any object that has a radius of <=1/4 of the cell side length, regardless of the object's position. We can do a bit of math and figure out what level an object should be inserted into the tree in O(1) time. I'm including a picture of the level selection algorithm below, but its just a re-formulation of what i've said above. My overall question is how to do this in spgist. From what I understand in the spgist insertion algorithm, the level selection would should done in the choose() function because choose() is called when we are trying to insert a leaf tuple into a inner tuple that has one or more levels under it. Currently, it seems like the spg_box_quad_choose() function descends recursively into a quadtree node. What I would like to do is to have it jump straight to the level it wants to insert into using the loose quadtree level selection algorithm and then find which cell it should add to by comparing its center coordinates.

[Following image from: Thatcher Ulrich. Loose octrees. In Mark DeLoura, editor, Game Programming Gems, pages 444–453. Charles River Media, 2000]

Screen Shot 2020-01-14 at 6.15.09 PM.png

Best,
Peter





On Wed, Jan 8, 2020 at 3:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Griggs <petergriggs33@gmail.com> writes:
> In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c,
> I only added some elog statements to see the quadrant that a box is placed
> into using the current code. getQuadrant is called several times by the
> spg_box_quad_picksplit function, which is used when inserting into the
> quadtree. With this change, I can still build postgres but when I try to
> trigger the code, nothing gets printed to my logfile.

Perhaps you're looking in the wrong logfile.  elog(LOG) should definitely
produce output unless you're using very strange settings.

Another possibility is that the specific test case you're using doesn't
actually reach this function.  I'm not totally sure, but I think that
SPGiST might not call the datatype-specific choose or picksplit functions
until it's got more than one index page's worth of data.

>       elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);

A couple thoughts here:

* From memory, the x and y values of a BOX are float8, so don't you want
to use %g or %f instead of %d?

* You don't want to end an elog with \n, that'll just add extra blank
lines to the log.

                        regards, tom lane


--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020


--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: Michael Paquier
Date:
Subject: Adding one CheckTableNotInUse() for REINDEX CONCURRENTLY