Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers

From Nikita Glukhov
Subject Re: Avoid full GIN index scan when possible
Date
Msg-id b53614eb-6f9f-8c5c-9df8-f703b0b102b6@postgrespro.ru
Whole thread Raw
In response to Re: Avoid full GIN index scan when possible  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Avoid full GIN index scan when possible
List pgsql-hackers
On 26.12.2019 4:59, Alexander Korotkov wrote:

 I've tried to add patch #4 to comparison, but I've catch assertion failure.

TRAP: FailedAssertion("key->includeNonMatching", File: "ginget.c", Line: 1340)
There simply should be inverted condition in the assertion: 
Assert(!key->includeNonMatching);

I have looked at v9 patch, and here is my review:

1. I agree with NULL-flag handling simplifications in ginNewScanKey(),
ginScanKeyAddHiddenEntry() extraction.

2. I also agree that usage of nrequired/nadditional in keyGetItem() is a more
natural solution to implement exclusion keys than my previous attempt of doing
that in scanGetKey().

But there are some questions:

Can we avoid referencing excludeOnly flag keyGetItem() by replacing these
references with !nrequired?

Maybe it would be better to move the whole block of keyGetItem() code
starting from the first loop over required keys and ending before the loop over
additional keys inside 'if (key->nrequired) { ... }'?

Can we avoid introducing excludeOnly flag by reusing searchMode and/or by
moving the initialization of nrequired/nadditional into ginNewScanKey()?


3. The following two times repeated NULL-filtering check looks too complicated
and needs to be refactored somehow:

-	res = key->triConsistentFn(key);
+	if (key->excludeOnly &&
+		key->nuserentries < key->nentries &&
+		key->scanEntry[key->nuserentries]->queryCategory == GIN_CAT_NULL_KEY &&
+		key->entryRes[key->nuserentries] == GIN_TRUE)
+		res = GIN_FALSE;
+	else
+		res = key->triConsistentFn(key);

For example, a special consistentFn() can be introduced for such NOT_NULL
scankeys.  Or even a hidden separate one-entry scankey with a trivial
consistentFn() can be added instead of adding hidden entry.


4. forcedRecheck flag that was previously used for discarded empty ALL scankeys
is removed now.  0-entry exclusion keys can appear instead, and their
consistentFn() simply returns constant value.  Could this lead to tangible
overhead in some cases (in comparison to forcedRecheck flag)?


5. A hidden GIN_CAT_EMPTY_QUERY is added only for the first empty ALL-scankey,
NULLs in other columns are filtered out with GIN_CAT_NULL_KEY.  This looks like
asymmetric, and it leads to accelerations is some cases and slowdowns in others
(depending on NULL fractions and their correlations in columns).

The following test shows a significant performance regression of v9:

insert into t select array[i], NULL, NULL from generate_series(1, 1000000) i;
                                      |        Query time, ms           WHERE condition            | master |   v8   |    v9
---------------------------------------+--------+--------+---------a @> '{}'                             |    224 |    213 |    212a @> '{}' and b @> '{}'               |     52 |     57 |    255a @> '{}' and b @> '{}' and c @> '{}' |     51 |     58 |    290


In the older version of the patch I tried to do the similar things (initialize
only one NOT_NULL entry for the first column), but refused to do this in v8.

So, to avoid slowdowns relative to master, I can offer simply to add 
GIN_CAT_EMPTY_QUERY entry for each column with empty ALL-keys if there are 
no normal keys.


-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Libpq support to connect to standby server as priority
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] Block level parallel vacuum