Re: Rethinking the implementation of ts_headline() - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Rethinking the implementation of ts_headline() |
Date | |
Msg-id | c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com Whole thread Raw |
In response to | Re: Rethinking the implementation of ts_headline() (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Rethinking the implementation of ts_headline()
|
List | pgsql-hackers |
Hi,
19.01.2023 19:13, Tom Lane wrote:
19.01.2023 19:13, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:Anyway, I don't think this needs to stop your current patch.Many thanks for looking at it!
I've found that starting from commit 5a617d75 this query:
SELECT ts_headline('english', 'To be, or not to be', to_tsquery('english', 'or'));
invokes a valgrind-detected error:
==00:00:00:03.950 3241424== Invalid read of size 1
==00:00:00:03.950 3241424== at 0x8EE2A9: TS_execute_locations_recurse (tsvector_op.c:2046)
==00:00:00:03.950 3241424== by 0x8EE220: TS_execute_locations (tsvector_op.c:2017)
==00:00:00:03.950 3241424== by 0x77EAC4: prsd_headline (wparser_def.c:2677)
==00:00:00:03.950 3241424== by 0x94536C: FunctionCall3Coll (fmgr.c:1173)
==00:00:00:03.950 3241424== by 0x778648: ts_headline_byid_opt (wparser.c:322)
==00:00:00:03.950 3241424== by 0x9440F5: DirectFunctionCall3Coll (fmgr.c:836)
==00:00:00:03.950 3241424== by 0x778763: ts_headline_byid (wparser.c:343)
==00:00:00:03.950 3241424== by 0x4AC9F2: ExecInterpExpr (execExprInterp.c:752)
==00:00:00:03.950 3241424== by 0x4AEDFE: ExecInterpExprStillValid (execExprInterp.c:1838)
==00:00:00:03.950 3241424== by 0x636A7E: ExecEvalExprSwitchContext (executor.h:344)
==00:00:00:03.950 3241424== by 0x63E92D: evaluate_expr (clauses.c:4843)
==00:00:00:03.950 3241424== by 0x63DA53: evaluate_function (clauses.c:4345)
...
(Initially I had encountered an asan-detected heap-buffer-overflow with a
more informative document.)
But the less-verbose call:
SELECT ts_headline('', '');
discovers a different error even on 5a617d75~1:
==00:00:00:04.113 3139158== Conditional jump or move depends on uninitialised value(s)
==00:00:00:04.113 3139158== at 0x77B44F: mark_fragment (wparser_def.c:2100)
==00:00:00:04.113 3139158== by 0x77E2F2: mark_hl_words (wparser_def.c:2519)
==00:00:00:04.113 3139158== by 0x77E891: prsd_headline (wparser_def.c:2610)
==00:00:00:04.113 3139158== by 0x944B68: FunctionCall3Coll (fmgr.c:1173)
==00:00:00:04.113 3139158== by 0x778648: ts_headline_byid_opt (wparser.c:322)
==00:00:00:04.113 3139158== by 0x9438F1: DirectFunctionCall3Coll (fmgr.c:836)
==00:00:00:04.113 3139158== by 0x7787B6: ts_headline (wparser.c:352)
==00:00:00:04.113 3139158== by 0x4AC9F2: ExecInterpExpr (execExprInterp.c:752)
==00:00:00:04.113 3139158== by 0x4AEDFE: ExecInterpExprStillValid (execExprInterp.c:1838)
==00:00:00:04.113 3139158== by 0x50BD0C: ExecEvalExprSwitchContext (executor.h:344)
==00:00:00:04.113 3139158== by 0x50BD84: ExecProject (executor.h:378)
==00:00:00:04.113 3139158== by 0x50BFBB: ExecResult (nodeResult.c:136)
==00:00:00:04.113 3139158==
I've reproduced it on REL9_4_STABLE (REL9_4_15) and it looks like the defect
in mark_hl_words():
int bestb = -1,
beste = -1;
int bestlen = -1;
int pose = 0,
...
if (highlight == 0)
{
while (hlCover(prs, query, &p, &q))
{
...
if (bestlen < 0)
{
curlen = 0;
for (i = 0; i < prs->curwords && curlen < min_words; i++)
{
if (!NONWORDTOKEN(prs->words[i].type))
curlen++;
pose = i;
}
bestb = 0;
beste = pose;
}
...
// here we have bestb == 0, beste == 0, but no prs->words in this case
for (i = bestb; i <= beste; i++)
{
if (prs->words[i].item)
prs->words[i].selected = 1;
exists since 2a0083ede.
(Sorry for the distraction.)
Best regards,
Alexander
pgsql-hackers by date: