Re: regression coverage gaps for gist and hash indexes - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: regression coverage gaps for gist and hash indexes |
Date | |
Msg-id | 69ee05f7-8168-cce5-7f2d-d887f0aaf8bd@gmail.com Whole thread Raw |
In response to | Re: regression coverage gaps for gist and hash indexes (Alexander Lakhin <exclusion@gmail.com>) |
List | pgsql-hackers |
Hi, 31.03.2023 17:00, Alexander Lakhin wrote: > 31.03.2023 15:55, Tom Lane wrote: >> See also the thread about bug #16329 [1]. Alexander promised to look >> into improving the test coverage in this area, maybe he can keep an >> eye on the WAL logic coverage too. > > Yes, I'm going to analyze that area too. Maybe it'll take more time > (a week or two) if I encounter some bugs there (for now I observe anomalies > with gist__int_ops), but I will definitely try to improve the gist testing. After 2+ weeks of researching I'd like to summarize my findings. 1) The checking query proposed in [1] could be improved by adding the restriction "tgk.v = brute.v" to the condition: WHERE tgk.k >> point(brute.min - 1, 0) AND tgk.k << point(brute.max + 1, 0) Otherwise that query gives a false positive after insert into test_gist_killtuples values(point(505, 0)); The similar improved condition could be placed in hash_index_killtuples.sql. Yet another improvement for the checking query could be done with the replacement: min(k <-> point(0, 0)), max(k <-> point(0, 0)) -> min(k <-> point(0, k[1])), max(p <-> point(0, k[1])) ... It doesn't change the query plan dramatically, but the query becomes more universal (it would work for points with any non-negative integer x). 2) I've checked clang`s scan-build notices related to gist as I planned [2], namely: Logic error Branch condition evaluates to a garbage value src/backend/access/gist/gistutil.c gistCompressValues 606 Logic error Dereference of null pointer src/backend/access/gist/gist.c gistFindCorrectParent 1099 Logic error Dereference of null pointer src/backend/access/gist/gist.c gistdoinsert 671 Logic error Dereference of null pointer src/backend/access/gist/gist.c gistfinishsplit 1339 Logic error Dereference of null pointer src/backend/access/gist/gist.c gistplacetopage 340 Logic error Dereference of null pointer src/backend/access/gist/gistbuildbuffers.c gistPushItupToNodeBuffer 366 Logic error Result of operation is garbage or undefined src/backend/access/gist/gistbuildbuffers.c gistRelocateBuildBuffersOnSplit 677 Logic error Result of operation is garbage or undefined src/backend/access/gist/gistutil.c gistchoose 463 Unused code Dead assignment src/backend/access/gist/gist.c gistdoinsert 843 And found that all of them (except for the last one, that doesn't worth fixing, IMO) are false positives (I can present detailed explanations if it could be of interest.) So I see no grounds here to build new tests on. 3) To date I found other anomalies more or less related to gist: fillfactor is ignored for sorted index build mode, which is effectively default now [3] amcheck verification for gist is not yet ready to use [4] (and the latest patch doesn't apply to the current HEAD) bug #17888: Incorrect memory access in gist__int_ops for an input array with many elements [5] 4) I've constructed some tests, that provide full coverage for gistFindCorrectParent(), reach for "very rare situation", and for gistfixsplit(), but all these tests execute concurrent queries, so they can't be implemented as simple regression tests. Moreover, I could not find any explicit problems when reaching those places (I used the checking query from [1] in absence of other means to check gist indexes), so I see no value in developing (not to speak of committing) these tests for now. I'm going to further explore the gist behavior in those dark corners, but it looks like a long-term task, so I think it shouldn't delay the gist coverage improvement already proposed. 5) 02.04.2023 20:50, Andres Freund wrote: > Looks like the test in [1] could be made a lot cheaper by changing effective_cache_size > for just that test: The effective_cache_size is accounted only when buffering = auto, but in that test we have buffering = on, so changing it wouldn't help there. While looking at gist-related tests, I've noticed an incorrect comment in index_including_gist.sql: * 1.1. test CREATE INDEX with buffered build It's incorrect exactly because with the default effective_cache_size the buffered build mode is not enabled for that index size (I've confirmed this with the elog(LOG,..) placed inside gistInitBuffering()). So I'd like to propose the patch attached, that: a) demonstrates the bug #16329: With 8e5eef50c reverted, I get: **00:00:00:11.179 1587838** Valgrind detected 1 error(s) during execution of "CREATE INDEX tbl_gist_idx ON tbl_gist using gist (c4) INCLUDE (c1,c2,c3) WITH (buffering = on);" b) makes the comment in index_including_gist.sql correct c) increases a visible test coverage a little, in particular: Function 'gistBuffersReleaseBlock' -Lines executed:66.67% of 9 +Lines executed:100.00% of 9 d) doesn't increase the test duration significantly: without valgrind I see difference: 84 ms -> 93 ms, under vagrind: 13513 ms -> 14511 ms Thus, I think, it's worth to split the activity related to gist testing improvement to finalizing/accepting the already-emerging patches and to background research/anomaly findings, which could inspire further enhancements in this area. [1] https://www.postgresql.org/message-id/20230331231300.4kkrl44usvy2pmkv%40awork3.anarazel.de [2] https://www.postgresql.org/message-id/cad7055f-0d76-cc31-71d5-f8b600ebb116%40gmail.com [3] https://www.postgresql.org/message-id/fbbfe5dc-3dfa-d54a-3a94-e2bee37b85d8%40gmail.com [4] https://www.postgresql.org/message-id/885cfb61-26e9-e7c1-49a8-02b3fb12b497%40gmail.com [5] https://www.postgresql.org/message-id/17888-f72930e6b5ce8c14@postgresql.org Best regards, Alexander
Attachment
pgsql-hackers by date: