Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables
Date
Msg-id 30491.1583882178@sss.pgh.pa.us
Whole thread Raw
In response to [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit"setting is relatively small for large tables  (Adé <ade.hey@gmail.com>)
Responses Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit"setting is relatively small for large tables  (Adé <ade.hey@gmail.com>)
List pgsql-hackers
=?UTF-8?B?QWTDqQ==?= <ade.hey@gmail.com> writes:
> Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
> has a relatively low setting.
> ...
> Attached is SQL to test and observe this issue and also attached is a patch
> I want to eventually submit to a commitfest.

I took a brief look at this.  It seems like what you're actually trying
to accomplish is to ensure that entryLoadMoreItems's "stepright" path
is taken, instead of re-descending the index from the root.  Okay,
I can see why that'd be a big win, but why are you tying it to the
dropItem behavior?  It should apply any time we're iterating this loop
more than once.  IOW, it seems to me like the logic about when to step
right is just kinda broken, and this is a band-aid rather than a full fix.
The performance hit is worse for fuzzy-search mode because it will
iterate the loop more (relative to the amount of work done elsewhere),
but there's still a potential for wasted work.

Actually, a look at the code coverage report shows that the
not-step-right path is never taken at all in our standard regression
tests.  Maybe that just says bad things about the tests' coverage, but
now I'm wondering whether we could just flush that code path altogether,
and assume that we should always step right at this point.

[ cc'ing heikki and alexander, who seem to have originated that code
at 626a120656a75bf4fe64b1d0d83c23cb38d3771a.  The commit message says
it saves a lot of I/O, but I'm wondering if this report disproves that.
In any case the lack of test coverage is pretty damning. ]

While we're here, what do you think about the comment in the other
code branch just above:

        /* XXX: shouldn't we apply the fuzzy search limit here? */

I'm personally inclined to suspect that the fuzzy-search business has
got lots of bugs, which haven't been noticed because (a) it's so squishily
defined that one can hardly tell whether a given result is buggy or not,
and (b) nearly nobody uses it anyway (possibly not unrelated to (a)).
As somebody who evidently is using it, you're probably more motivated
to track down bugs than the rest of us.

            regards, tom lane



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Next
From: "movead.li@highgo.ca"
Date:
Subject: Re: Re: Asynchronous Append on postgres_fdw nodes.