Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiSTindexes from gevel - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiSTindexes from gevel
Date
Msg-id CAEepm=1Vf6edWc33qF5us5Vw4J8QS-3LYtFzFVKYscs6c9+B7g@mail.gmail.com
Whole thread Raw
In response to [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes fromgevel  (Alexey Chernyshov <a.chernyshov@postgrespro.ru>)
Responses Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiSTindexes from gevel
List pgsql-hackers
On Sat, Jul 22, 2017 at 12:05 AM, Alexey Chernyshov
<a.chernyshov@postgrespro.ru> wrote:
> the following patch transfers functionality from gevel module
> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
> GIN indexes.
>
> Functions added:
>  - gist_stat(text) - shows statistics on GiST Tree
>  - gist_tree(text) - shows GiST tree
>  - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
>  - gist_print(text) - prints objects stored in GiST tree
>  - spgist_stat(text) - shows statistics on SP-GiST
>  - spgist_print(text) - prints objects stored in index
>  - gin_value_count() - originally gin_stat(text) - prints estimated counts
> for index values
>  - gin_stats() - originally gin_statpage(text) - shows statistics
>  - gin_count_estimate(text, tsquery) - shows number of indexed rows matched
> query
>
> Tests also transferred, docs for new functions are added. I run pgindent
> over the code, but the result is different from those I expected, so I leave
> pgindented one.
> The patch is applicable to the commit
> 866f4a7c210857aa342bf901558d170325094dde.

Hi Alexey,

This patch still applies but doesn't build after commits 2cd70845 and c6293249.

ginfuncs.c:91:47: error: invalid type argument of ‘->’ (have
‘FormData_pg_attribute’)          st->index->rd_att->attrs[st->attnum]->attbyval,
...several similar errors...

For example, that expression needs to be changed to:
          TupleDescAttr(st->index->rd_att, attnum)->attbyval

Here is some superficial review since I'm here:

+/*
+ * process_tuples
+ * Retrieves the number of TIDs in a tuple.
+ */
+static void
+process_tuple(FuncCallContext *funcctx, GinStatState * st, IndexTuple itup)

"process_tuples" vs "process_tuple"

+ /* do no distiguish various null category */

"do not distinguish various null categories"

+ st->nulls[0] = (st->category == GIN_CAT_NORM_KEY) ? false : true;

That's a long way to write (st->category != GIN_CAT_NORM_KEY)!

+ * We scaned the whole page, so we should take right page

"scanned"

+/*
+ * refind_position
+ * Refinds a previois position, at returns it has correctly
+ * set offset and buffer is locked.
+ */

"previous", s/, at returns/.  On return/

+ memset(st->nulls, false, 2 * sizeof(*st->nulls));

"false" looks strange here where an int is expected.  The size
expression is weird.  I think you should just write:

st->nulls[0] = false;
st->nulls[1] = false;

Investigating the types more closely, I see that 'nulls' is declared
like this (in struct GinStatState):

+ char nulls[2];

Then later you do this:

+ htuple = heap_form_tuple(funcctx->attinmeta->tupdesc,
+     st->dvalues,
+     st->nulls);

But that's not OK, heap_form_tuple takes bool *.  bool and char are
not interchangeable (they may have different sizes on some platforms,
among other problems, even though usually they are both 1 byte).  So I
think you should declare it as "bool nulls[2]".

Meanwhile in TypeStorage you have a member "bool *nulls", which you
then initialise like so:

+ st->nulls = (char *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls));

That cast is wrong.

+/*
+ * gin_count_estimate
+ * Outputs number of indexed rows matched query. It doesn't touch heap at
+ * all.

Maybe "... number of indexed rows matching query."?

+ if (attnum < 0 || attnum >= st->index->rd_att->natts)
+     elog(ERROR, "Wrong column's number");
+ st->attnum = attnum;

Maybe "invalid column number" or "invalid attribute number"?

+ elog(ERROR, "Column type is not a tsvector");

s/C/c/ (according to convention).

+ * Prints various stat about index internals.

s/stat/stats/

+ elog(ERROR, "Relation %s.%s is not an index",

s/R/r/

+ elog(ERROR, "Index %s.%s has wrong type",

s/I/i/

+      <function>gin_value_count</function> prints estimated counts for each
+      indexed values, single-argument function will return result for a first
+      column of index. For example:

"... for each indexed value.  The single argument version will return
results for the first column of an index.  For example:"

+      <function>gin_count_estimate</function> outputs number of indexed
+     rows matched query. It doesn't touch heap at all. For example:

"... outputs the number of indexed rows matched by a query. ..."

+      <function>gist_print</function> prints objects stored in
+      <acronym>GiST</acronym> tree, works only if objects in index have
+      textual representation (<function>type_out</function> functions should be
+      implemented for the given object type). It's known to work with R-tree
+      <acronym>GiST</acronym> based index. Note, in example below, objects are
+      of type box. For example:

"... prints objects stored in a GiST tree.  it works only if the
objects in the index have textual representation (type_out functions
should be implemented for the given object type).  It's known to work
with R-tree GiST-based indexes.  Note: in the example below, objects
are of type box.  For example:"

+      <function>gin_value_count</function> prints estimated counts for each
+      indexed value for a given column. For example:

Maybe "for a given column (starting from zero)"?

+      <function>spgist_print</function> prints objects stored in
+      <acronym>SP-GiST</acronym> tree, works only if objects in index have
+      textual representation (<function>type_out</function> functions should
+      be implemented for the given object type).

"... tree.  It works only if ..."

+         Note 1. in example below we used quad_point_ops which uses point
+         for leaf and prefix value, but doesn't use node_label at all.
+         Use type  'int' as dummy type for prefix or/and node_label.

"... in the example ..."

A few random whitespace changes:

- (errmsg("must be superuser to use raw page functions"))));
+ (errmsg("must be superuser to use raw page functions")))
+ );

-   opaq->flags, GIN_META)));
+   opaq->flags, GIN_META))
+ );

- flags[nflags++] = DirectFunctionCall1(to_hex32, Int32GetDatum(flagbits));
+ flags[nflags++] = DirectFunctionCall1(to_hex32,
+  Int32GetDatum(flagbits));

-   (GIN_DATA | GIN_LEAF | GIN_COMPRESSED))));
+   (GIN_DATA | GIN_LEAF | GIN_COMPRESSED)))
+ );

--
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Rafia Sabih
Date:
Subject: Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE
Next
From: Thomas Munro
Date:
Subject: Re: [HACKERS] additional contrib test suites