Re: Rethinking the implementation of ts_headline() - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Rethinking the implementation of ts_headline()
Date
Msg-id c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com
Whole thread Raw
In response to Re: Rethinking the implementation of ts_headline()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Rethinking the implementation of ts_headline()
List pgsql-hackers
Hi,

19.01.2023 19:13, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Anyway, I don't think this needs to stop your current patch.
Many thanks for looking at it!

I've found that starting from commit 5a617d75 this query:
SELECT ts_headline('english', 'To be, or not to be', to_tsquery('english', 'or'));

invokes a valgrind-detected error:
==00:00:00:03.950 3241424== Invalid read of size 1
==00:00:00:03.950 3241424==    at 0x8EE2A9: TS_execute_locations_recurse (tsvector_op.c:2046)
==00:00:00:03.950 3241424==    by 0x8EE220: TS_execute_locations (tsvector_op.c:2017)
==00:00:00:03.950 3241424==    by 0x77EAC4: prsd_headline (wparser_def.c:2677)
==00:00:00:03.950 3241424==    by 0x94536C: FunctionCall3Coll (fmgr.c:1173)
==00:00:00:03.950 3241424==    by 0x778648: ts_headline_byid_opt (wparser.c:322)
==00:00:00:03.950 3241424==    by 0x9440F5: DirectFunctionCall3Coll (fmgr.c:836)
==00:00:00:03.950 3241424==    by 0x778763: ts_headline_byid (wparser.c:343)
==00:00:00:03.950 3241424==    by 0x4AC9F2: ExecInterpExpr (execExprInterp.c:752)
==00:00:00:03.950 3241424==    by 0x4AEDFE: ExecInterpExprStillValid (execExprInterp.c:1838)
==00:00:00:03.950 3241424==    by 0x636A7E: ExecEvalExprSwitchContext (executor.h:344)
==00:00:00:03.950 3241424==    by 0x63E92D: evaluate_expr (clauses.c:4843)
==00:00:00:03.950 3241424==    by 0x63DA53: evaluate_function (clauses.c:4345)
...
(Initially I had encountered an asan-detected heap-buffer-overflow with a
more informative document.)

But the less-verbose call:
SELECT ts_headline('', '');

discovers a different error even on 5a617d75~1:
==00:00:00:04.113 3139158== Conditional jump or move depends on uninitialised value(s)
==00:00:00:04.113 3139158==    at 0x77B44F: mark_fragment (wparser_def.c:2100)
==00:00:00:04.113 3139158==    by 0x77E2F2: mark_hl_words (wparser_def.c:2519)
==00:00:00:04.113 3139158==    by 0x77E891: prsd_headline (wparser_def.c:2610)
==00:00:00:04.113 3139158==    by 0x944B68: FunctionCall3Coll (fmgr.c:1173)
==00:00:00:04.113 3139158==    by 0x778648: ts_headline_byid_opt (wparser.c:322)
==00:00:00:04.113 3139158==    by 0x9438F1: DirectFunctionCall3Coll (fmgr.c:836)
==00:00:00:04.113 3139158==    by 0x7787B6: ts_headline (wparser.c:352)
==00:00:00:04.113 3139158==    by 0x4AC9F2: ExecInterpExpr (execExprInterp.c:752)
==00:00:00:04.113 3139158==    by 0x4AEDFE: ExecInterpExprStillValid (execExprInterp.c:1838)
==00:00:00:04.113 3139158==    by 0x50BD0C: ExecEvalExprSwitchContext (executor.h:344)
==00:00:00:04.113 3139158==    by 0x50BD84: ExecProject (executor.h:378)
==00:00:00:04.113 3139158==    by 0x50BFBB: ExecResult (nodeResult.c:136)
==00:00:00:04.113 3139158==

I've reproduced it on REL9_4_STABLE (REL9_4_15) and it looks like the defect
in mark_hl_words():
        int                     bestb = -1,
                                beste = -1;
        int                     bestlen = -1;
        int                     pose = 0,
...
        if (highlight == 0)
        {
                while (hlCover(prs, query, &p, &q))
                {
...
                if (bestlen < 0)
                {
                        curlen = 0;
                        for (i = 0; i < prs->curwords && curlen < min_words; i++)
                        {
                                if (!NONWORDTOKEN(prs->words[i].type))
                                        curlen++;
                                pose = i;
                        }
                        bestb = 0;
                        beste = pose;
                }
...
// here we have bestb == 0, beste == 0, but no prs->words in this case
        for (i = bestb; i <= beste; i++)
        {
                if (prs->words[i].item)
                        prs->words[i].selected = 1;

exists since 2a0083ede.
(Sorry for the distraction.)

Best regards,
Alexander

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)
Next
From: Kirk Wolak
Date:
Subject: Re: Schema variables - new implementation for Postgres 15