Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id 20160814171131.21390.75752.pgcf@coridan.postgresql.org
Whole thread Raw
In response to Re: WIP: Covering + unique indexes.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: WIP: Covering + unique indexes.
List pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, failed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi hackers!

I've read the patch and here is my code review.

==========PURPOSE============
I've used this feature from time to time with MS SQL. From my experience INCLUDE is a 'sugar on top' feature. 
Some MS SQL classes do not even mention INCLUDE despite it's there from 2005 (though classes do not mention lots of
importantthings, so it's not kind of valuable indicator).
 
But those who use it, use it whenever possible. For example, system view with recommended indices rarely list one
withoutINCLUDE columns.
 
So, this feature is very important from perspective of converting MS SQL DBAs to PostgreSQL. This is how I see it.

========SUGGESTIONS==========
0. Index build is broken. This script
https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sqlSEGFAULTs and may cause
situationwhen you cannot insert anything into table (I think drop of index would help, but didn't tested this)
 
1. I think MS SQL syntax INCLUDE instead of INCLUDING would be better (for a purpose listed above)
2. Empty line added in ruleutils.c. Is it for a reason?
3. Now we have indnatts and indnkeyatts instead of indnatts. I think it is worth considering renaming indnatts to
somethingdifferent from old name. Someone somewhere could still suppose it's a number of keys.
 

========PERFORMANCE==========
Due to suggestion number 0 I could not measure performance of index build. Index crashes when there's more than 1.1
millionof rows in a table.
 
Performance test script is here
https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
Test scenario is following:
1. Create table, then create index, then add data.
2. Make a query touching data in INCLUDING columns.
This scenario is tested against table with:
A. Table with index, that do not contain touched columns, just PK.
B. Index with all columns in index.
C. Index with PK in keys and INCLUDING all other columns.

Tests were executed 5 times on Ubuntu VM under Hyper-V i5 2500 CPU, 16 Gb of RAM, SSD disk.
Time to insert 10M rows:
A. AVG 110 seconds STD 4.8
B. AVG 121 seconds STD 2.0
C. AVG 111 seconds STD 5.7
Inserts to INCLUDING index is almost as fast as inserts to index without extra columns.

Time to run SELECT query:
A. AVG 2864 ms STD 794
B. AVG 2329 ms STD 84
C. AVG 2293 ms STD 58
Selects with INCLUDING columns is almost as fast as with full index.

Index size (deterministic measure, STD = 0)
A. 317 MB
B. 509 MB
C. 399 MB
Index size is in the middle between full index and minimal index.

I think this numbers agree with expectation from the feature.

========CONCLUSION==========
This patch brings useful and important feature. Build shall be repaired; other my suggestions are only suggestions.



Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author

pgsql-hackers by date:

Previous
From: Ryan Murphy
Date:
Subject: Patch: initdb: "'" for QUOTE_PATH (non-windows)
Next
From: Tom Lane
Date:
Subject: Re: [NOVICE] numeric data type upper limit.