Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id 57050BFE.7020805@postgrespro.ru
Whole thread Raw
In response to Re: WIP: Covering + unique indexes.  (Peter Geoghegan <pg@heroku.com>)
Responses Re: WIP: Covering + unique indexes.
Re: WIP: Covering + unique indexes.
List pgsql-hackers
06.04.2016 03:05, Peter Geoghegan:
> On Tue, Apr 5, 2016 at 1:31 PM, Peter Geoghegan<pg@heroku.com>  wrote:
>> My new understanding: The extra "included" columns are stored in the
>> index, but do not affect its sort order at all. They are no more part
>> of the key than, say, the heap TID that the key points to. They are
>> just "payload".

It was really long and complicated discussion. I'm glad that finally we
are in agreement about the patch.
Anyway, I think all mentioned questions will be very helpful for the
future work on b-tree.

> Noticed a few issues following another pass:
>
> * tuplesort.c should handle the CLUSTER case in the same way as the
> btree case. No?
Yes, I just missed that cluster uses index sort.  Fixed.

> * Why have a RelationGetNumberOfAttributes(indexRel) call in
> tuplesort_begin_index_btree() at all now?
Fixed.
> * This critical section is unnecessary, because this happens during
> index builds:
>
> +               if (indnkeyatts != indnatts && P_ISLEAF(opageop))
> +               {
> +                       /*
> +                        * It's essential to truncate High key here.
> +                        * The purpose is not just to save more space
> on this particular page,
> +                        * but to keep whole b-tree structure
> consistent. Subsequent insertions
> +                        * assume that hikey is already truncated, and
> so they should not
> +                        * worry about it, when copying the high key
> into the parent page
> +                        * as a downlink.
> +                        * NOTE It is not crutial for reliability in present,
> +                        * but maybe it will be that in the future.
> +                        * NOTE this code will be changed by the
> "btree compression" patch,
> +                        * which is in progress now.
> +                        */
> +                       keytup = index_reform_tuple(wstate->index, oitup,
> +
>                   indnatts, indnkeyatts);
> +
> +                       /*  delete "wrong" high key, insert keytup as
> P_HIKEY. */
> +                       START_CRIT_SECTION();
> +                       PageIndexTupleDelete(opage, P_HIKEY);
> +
> +                       if (!_bt_pgaddtup(opage,
> IndexTupleSize(keytup), keytup, P_HIKEY))
> +                               elog(ERROR, "failed to rewrite
> compressed item in index \"%s\"",
> +                                       RelationGetRelationName(wstate->index));
> +                       END_CRIT_SECTION();
> +               }
>
> Note that START_CRIT_SECTION() promotes any ERROR to PANIC, which
> isn't useful here, because we have no buffer lock held, and nothing
> must be WAL-logged.
>
> * Think you forgot to update spghandler(). (You did not add a test for
> just that one AM, either)
Fixed.
> * I wonder why this restriction needs to exist:
>
> +               else
> +                       elog(ERROR, "Expressions are not supported in
> included columns.");
>
> What does not supporting it buy us? Was it just that the pg_index
> representation is more complicated, and you wanted to put it off?
>
> An error like this should use ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED ..., btw.
Yes, you get it right. It was a bit complicated to implement and I
decided to delay it to the next patch.
errmsg is fixed.

> * I would like to see index_reform_tuple() assert that the new,
> truncated index tuple is definitely <= the original (I worry about the
> 1/3 page restriction issue). Maybe you should also change the name of
> index_reform_tuple(), per David.
Is it possible that the new tuple, containing less attributes than the
old one, will have a greater size?
Maybe you can give an example?
I think that  Assert(indnkeyatts <= indnatts); covers this kind of errors.
I do not mind to rename this function, but what name would be better?
index_truncate_tuple()?

> * There is some stray whitespace within RelationGetIndexAttrBitmap().
> I think you should have updated it with code, though. I don't think
> it's necessary for HOT updates to work, but I think it could be
> necessary so that we don't need to get a row lock that blocks
> non-conflict foreign key locking (see heap_update() callers). I think
> you need to be careful for non-key columns within the loop in
> RelationGetIndexAttrBitmap(), basically, because it seems to still go
> through all columns. UPSERT also must call this code, FWIW.
>
> * I think that a similar omission is also made for the replica
> identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
> on how this patch interacts with logical decoding, I guess.

Good point. Indexes are everywhere in the code.
I missed that RelationGetIndexAttrBitmap() is used not only for REINDEX.
I'll discuss it with Theodor and send an updated patch tomorrow.

> * Valgrind shows an error with an aggregate statement I tried:
>
> 2016-04-05 17:01:31.129 PDT 12310 LOG:  statement: explain analyze
> select count(*) from ab  where b > 5 group by a, b;
> ==12310== Invalid read of size 4
> ==12310==    at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
> ==12310==    by 0x656615: match_clause_to_index (indxpath.c:2144)
> ==12310==    by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
> ==12310==    by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
> ==12310==    by 0x658054: create_index_paths (indxpath.c:269)
> ==12310==    by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
> ==12310==    by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
> ==12310==    by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
> ==12310==    by 0x64D93B: make_one_rel (allpaths.c:170)
> ==12310==    by 0x66876C: query_planner (planmain.c:246)
> ==12310==    by 0x669FBA: grouping_planner (planner.c:1666)
> ==12310==    by 0x66D0C9: subquery_planner (planner.c:751)
> ==12310==    by 0x66D3DA: standard_planner (planner.c:300)
> ==12310==    by 0x66D714: planner (planner.c:170)
> ==12310==    by 0x6FD692: pg_plan_query (postgres.c:798)
> ==12310==    by 0x59082D: ExplainOneQuery (explain.c:350)
> ==12310==  Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
> ==12310==    at 0x4C2AB80: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==12310==    by 0x81B7FA: AllocSetAlloc (aset.c:853)
> ==12310==    by 0x81D257: palloc (mcxt.c:907)
> ==12310==    by 0x4B6F65: RelationGetIndexScan (genam.c:94)
> ==12310==    by 0x4C135D: btbeginscan (nbtree.c:431)
> ==12310==    by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
> ==12310==    by 0x4B7C5A: index_beginscan (indexam.c:222)
> ==12310==    by 0x4B73D1: systable_beginscan (genam.c:379)
> ==12310==    by 0x7E8CF9: ScanPgRelation (relcache.c:341)
> ==12310==    by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
> ==12310==    by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
> ==12310==    by 0x4A4D37: relation_open (heapam.c:1118)
> ==12310==
> {
>     <insert_a_suppression_name_here>
>     Memcheck:Addr4
>     fun:match_clause_to_indexcol
>     fun:match_clause_to_index
>     fun:match_clauses_to_index
>     fun:match_restriction_clauses_to_index
>     fun:create_index_paths
>     fun:set_plain_rel_pathlist
>     fun:set_rel_pathlist
>     fun:set_base_rel_pathlists
>     fun:make_one_rel
>     fun:query_planner
>     fun:grouping_planner
>     fun:subquery_planner
>     fun:standard_planner
>     fun:planner
>     fun:pg_plan_query
>     fun:ExplainOneQuery
> }
>
> Separately, I tried "make installcheck-tests TESTS=index_including"
> from Postgres + Valgrind, with Valgrind's --track-origins option
> enabled (as it was above). I recommend installing Valgrind, and making
> sure that the patch shows no errors. I didn't actually find a Valgrind
> issue from just using your regression tests (nor did I find an issue
> from separately running the regression tests with
> CLOBBER_CACHE_ALWAYS, FWIW).
>
Thank you for advice.
Another miss of index->ncolumns to index->nkeycolumns replacement in
match_clause_to_index. Fixed.
I also updated couple of typos in documentation.

Thank you again for the detailed review.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: WIP: Failover Slots
Next
From: Andres Freund
Date:
Subject: Re: [COMMITTERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server