Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id a976fff4-29a9-c799-5c2d-19a371031325@postgrespro.ru
Whole thread Raw
In response to Re: WIP: Covering + unique indexes.  (Andres Freund <andres@anarazel.de>)
Responses Re: WIP: Covering + unique indexes.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
30.03.2017 22:11, Andres Freund
Any objection from reviewers to push both patches?

First of all, I want to thank you and Robert for reviewing this patch.
Your expertise in postgres subsystems is really necessary for features like this.
I just wonder, why don't you share your thoughts and doubts till the "last call".

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index f2eda67..59029b9 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -120,6 +120,7 @@ blhandler(PG_FUNCTION_ARGS)	amroutine->amclusterable = false;	amroutine->ampredlocks = false;	amroutine->amcanparallel = false;
+	amroutine->amcaninclude = false;
That name doesn't strike me as very descriptive.

The feature is "index with included columns", it uses keyword "INCLUDE".
So the name looks good to me.
Any suggestions?
+      <term><literal>INCLUDE</literal></term>
+      <listitem>
+       <para>
+        An optional <literal>INCLUDE</> clause allows a list of columns to be
+        specified which will be included in the non-key portion of the index.
+        Columns which are part of this clause cannot also exist in the
+        key columns portion of the index, and vice versa. The
+        <literal>INCLUDE</> columns exist solely to allow more queries to benefit
+        from <firstterm>index-only scans</> by including certain columns in the
+        index, the value of which would otherwise have to be obtained by reading
+        the table's heap. Having these columns in the <literal>INCLUDE</> clause
+        in some cases allows <productname>PostgreSQL</> to skip the heap read
+        completely. This also allows <literal>UNIQUE</> indexes to be defined on
+        one set of columns, which can include another set of columns in the
+       <literal>INCLUDE</> clause, on which the uniqueness is not enforced.
+        It's the same with other constraints (PRIMARY KEY and EXCLUDE). This can
+        also can be used for non-unique indexes as any columns which are not required
+        for the searching or ordering of records can be used in the
+        <literal>INCLUDE</> clause, which can slightly reduce the size of the index.
+        Currently, only the B-tree access method supports this feature.
+        Expressions as included columns are not supported since they cannot be used
+        in index-only scans.
+       </para>
+      </listitem>
+     </varlistentry>
This could use some polishing.
Definitely. But do you have any specific proposals?

+/*
+ * Reform index tuple. Truncate nonkey (INCLUDE) attributes.
+ */
+IndexTuple
+index_truncate_tuple(Relation idxrel, IndexTuple olditup)
+{
+	TupleDesc   itupdesc = RelationGetDescr(idxrel);
+	Datum       values[INDEX_MAX_KEYS];
+	bool        isnull[INDEX_MAX_KEYS];
+	IndexTuple	newitup;
+	int indnatts = IndexRelationGetNumberOfAttributes(idxrel);
+	int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+	Assert(indnatts <= INDEX_MAX_KEYS);
+	Assert(indnkeyatts > 0);
+	Assert(indnkeyatts < indnatts);
+
+	index_deform_tuple(olditup, itupdesc, values, isnull);
+
+	/* form new tuple that will contain only key attributes */
+	itupdesc->natts = indnkeyatts;
+	newitup = index_form_tuple(itupdesc, values, isnull);
+	newitup->t_tid = olditup->t_tid;
+
+	itupdesc->natts = indnatts;
Uh, isn't this a *seriously* bad idea?  If index_form_tuple errors out,
this'll corrupt the tuple descriptor.
Initial reasoning was something like this:
> Maybe it would be better to modify index_form_tuple() to accept a new
> argument with a number of attributes, then you can just Assert that
> this number is never higher than the number of attributes in the
> TupleDesc.
Good point.
I agree that this function is a bit strange. I have to set 
tupdesc->nattrs to support compatibility with index_form_tuple().
I didn't want to add neither a new field to tupledesc nor a new 
parameter to index_form_tuple(), because they are used widely.

But I haven't considered the possibility of index_form_tuple() failure.
Fixed in this version of the patch. Now it creates a copy of tupledesc to pass it to index_form_tuple.

Maybe also rename the function to index_build_key_tuple()?
We'd discussed with other reviewers, they suggested index_truncate_tuple() instead of index_reform_tuple().
I think that this name reflects the essence of the function clear enough and don't feel like renaming it again.

@@ -368,7 +370,7 @@ systable_beginscan(Relation heapRelation,		{			int			j;
-			for (j = 0; j < irel->rd_index->indnatts; j++)
+			for (j = 0; j < IndexRelationGetNumberOfAttributes(irel); j++)
 			{				if (key[i].sk_attno == irel->rd_index->indkey.values[j])				{
@@ -376,7 +378,7 @@ systable_beginscan(Relation heapRelation,					break;				}			}
-			if (j == irel->rd_index->indnatts)
+			if (j == IndexRelationGetNumberOfAttributes(irel))				elog(ERROR, "column is not in index");		}
Not that it matters overly much, but why are we doing this for all
attributes, rather than just key attributes?
Since we don't use included columns for system indexes, there is no difference. I've just tried to minimize code changes here.
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -600,7 +600,7 @@ boot_openrel(char *relname)		 relname, (int) ATTRIBUTE_FIXED_PART_SIZE);	boot_reldesc = heap_openrv(makeRangeVar(NULL, relname, -1), NoLock);
-	numattr = boot_reldesc->rd_rel->relnatts;
+	numattr = RelationGetNumberOfAttributes(boot_reldesc);	for (i = 0; i < numattr; i++)	{		if (attrtypes[i] == NULL)
That seems a bit unrelated.
I've replaced all the references to relnatts with macro, primarily to ensure that I won't miss anything that should use only key attributes.
@@ -2086,7 +2086,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr,							  is_validated,							  RelationGetRelid(rel),	/* relation */							  attNos,	/* attrs in the constraint */
-							  keycount, /* # attrs in the constraint */
+							  keycount, /* # key attrs in the constraint */
+							  keycount, /* # total attrs in the constraint */							  InvalidOid,		/* not a domain constraint */							  InvalidOid,		/* no associated index */							  InvalidOid,		/* Foreign key fields */
It doesn't quite seem right to me to store this both in pg_index and
pg_constraint.

Initially, I did to provide pg_get_constraintdef_worker() with info about included columns.
Maybe it can be solved in some other way, but for now it is a tested and working implementation.

 
+opt_c_include:	INCLUDE '(' columnList ')'			{ $$ = $3; }
+			 |		/* EMPTY */						{ $$ = NIL; }
+		;
+opt_include:		INCLUDE '(' index_including_params ')'			{ $$ = $3; }
+			 |		/* EMPTY */						{ $$ = NIL; }
+		;
+
+index_including_params:	index_elem						{ $$ = list_make1($1); }
+			| index_including_params ',' index_elem		{ $$ = lappend($1, $3); }
+		;
+
Why do we have multiple different definitions of this?

Hm,
columnList contains entries of columnElem type and index_including_params works with index_elem.
Is there a way they can be combined?

+			keytup = index_truncate_tuple(wstate->index, oitup);
The code in _bt_split previously claimed that it's the only place doing
truncation...
To be exact, it claimed that regarding insertion of new values, not index build.
> It's the only point in insertion process, where we perform truncation


Other comments about code format, spelling and comments are fixed in attached patches.
Thank you again for reviewing.
-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: PATCH: pageinspect / add page_checksum andbt_page_items(bytea)
Next
From: Tom Lane
Date:
Subject: Re: Allow to specify #columns in heap/index_form_tuple