Thread: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
[sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Andreas Seltenreich
Date:
Hi,

the following statement triggers an assertion in tsearch:

select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
-- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

regards,
Andreas
Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Tom Lane
Date:
Andreas Seltenreich <seltenreich@gmx.de> writes:
> the following statement triggers an assertion in tsearch:
> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

Confirmed here. I notice that the output of array_to_tsvector() is
already fishy in this example:

regression=# select array_to_tsvector('{smith,smith,smith}'::text[]);
    array_to_tsvector
-------------------------
 'smith' 'smith' 'smith'
(1 row)

Shouldn't those have been merged together? You certainly don't get
results like that from other tsvector-producing operations:

regression=# select to_tsvector('smith smith smith');
  to_tsvector
---------------
 'smith':1,2,3
(1 row)

regression=# select 'smith smith smith'::tsvector;
 tsvector
----------
 'smith'
(1 row)

However, that does not seem to be the proximate cause of the crash
in ts_delete, because this non-duplicated case still crashes:

select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), '{smith,smith}'::text[]);

It kinda looks like you need more than one deletion request for
the first entry in the sorted tsvector, because for example
{smith,foo,toolbox} works but not {smith,too,toolbox}.

I'm thinking there are two distinct bugs here.

regards, tom lane
Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Tom Lane
Date:
I wrote:
> I'm thinking there are two distinct bugs here.

Actually, make that three bugs. I was so focused on the crashing
that I failed to notice that ts_delete wasn't producing sane answers
even when it didn't crash:

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{smith,smith}'::text[]);
      ts_delete
---------------------
 'smith' 'foo' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{smith,foo}'::text[]);
   ts_delete
---------------
 'smith' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{bar,smith}'::text[]);
      ts_delete
---------------------
 'smith' 'foo' 'bar'
(1 row)

The non-array version is no better:

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'smith'::text);
      ts_delete
---------------------
 'smith' 'foo' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'foo'::text);
   ts_delete
---------------
 'smith' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'bar'::text);
      ts_delete
---------------------
 'smith' 'foo' 'bar'
(1 row)

I'm not sure if ts_delete takes its second argument as verbatim lexemes
or normalizes them first, but none of these words are changed by
to_tsvector, so either way it seems to fail to delete stuff it should.

regards, tom lane
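In every result above only 'foo', the middle entry, is ever deleted. That is the pattern a binary search produces when it runs over unsorted input, and a later message in this thread attributes the failures to tsvector_bsearch being applied to an unsorted tsvector. A minimal sketch of the effect, assuming plain strcmp ordering as a stand-in for the real lexeme comparison; find_word is a hypothetical helper, not a function from tsvector_op.c:

```c
#include <string.h>

/*
 * Plain binary search over an array of C strings, using strcmp order.
 * Returns the index of key, or -1 if the search never reaches it.
 * The answer is only trustworthy when words[] is sorted in strcmp order;
 * on unsorted input the search can walk away from an entry that is present.
 * (Hypothetical helper for illustration only.)
 */
static int
find_word(const char *const *words, int count, const char *key)
{
    int lo = 0;
    int hi = count;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, words[mid]);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid;           /* key would sort before words[mid] */
        else
            lo = mid + 1;       /* key would sort after words[mid] */
    }
    return -1;
}
```

Called on the array {"smith", "foo", "bar"} in that order, this search finds "foo" at index 1 but returns -1 for both "smith" and "bar". On the sorted array {"bar", "foo", "smith"} all three are found.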
Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Thomas Munro
Date:
On Thu, Aug 4, 2016 at 9:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andreas Seltenreich <seltenreich@gmx.de> writes:
>> the following statement triggers an assertion in tsearch:
>
>> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
>> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
>
> Confirmed here. I notice that the output of array_to_tsvector() is
> already fishy in this example:
>
> regression=# select array_to_tsvector('{smith,smith,smith}'::text[]);
>     array_to_tsvector
> -------------------------
>  'smith' 'smith' 'smith'
> (1 row)
>
> Shouldn't those have been merged together? You certainly don't get
> results like that from other tsvector-producing operations:
>
> regression=# select to_tsvector('smith smith smith');
>   to_tsvector
> ---------------
>  'smith':1,2,3
> (1 row)
> regression=# select 'smith smith smith'::tsvector;
>  tsvector
> ----------
>  'smith'
> (1 row)
>
> However, that does not seem to be the proximate cause of the crash
> in ts_delete, because this non-duplicated case still crashes:
>
> select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), '{smith,smith}'::text[]);
>
> It kinda looks like you need more than one deletion request for
> the first entry in the sorted tsvector, because for example
> {smith,foo,toolbox} works but not {smith,too,toolbox}.
>
> I'm thinking there are two distinct bugs here.

The assertion in tsvector_delete_by_indices fails because its counting
algorithm doesn't expect indices_to_delete to contain multiple
references to the same index. Maybe that could be fixed by
uniquifying in tsvector_delete_arr before calling it, but since
tsvector_delete_by_indices already qsorts its input, it should be able
to handle duplicates cheaply. I was thinking something like this:

     for (i = j = k = 0; i < tsv->size; i++)
     {
+        bool        drop_lexeme = false;
+
         /*
          * Here we should check whether current i is present in
          * indices_to_delete or not. Since indices_to_delete is already sorted
-         * we can advance it index only when we have match.
+         * we can advance it index only when we have match. We do this
+         * repeatedly, in case indices_to_delete contains duplicate references
+         * to the same index.
          */
-        if (k < indices_count && i == indices_to_delete[k])
+        while (k < indices_count && i == indices_to_delete[k])
         {
+            drop_lexeme = true;
             k++;
-            continue;
         }
+        if (drop_lexeme)
+            continue;

But that doesn't seem to be enough, there is something else wrong here
resulting in garbage output, maybe related to the failure to merge the
tsvector...

-- 
Thomas Munro
http://www.enterprisedb.com
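The loop change proposed in this message can be tried in isolation. The following is a minimal standalone sketch, not the PostgreSQL code: keep_undeleted and compare_int are hypothetical names, plain int positions stand in for WordEntry items, and the copying of lexeme data is left out. It only shows that draining duplicate references with a while loop keeps the k == indices_count invariant intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* qsort comparator for ints (hypothetical helper) */
static int
compare_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical stand-in for the loop in tsvector_delete_by_indices.
 * Writes into kept[] every position 0..size-1 that is not listed in
 * indices_to_delete and returns how many positions were kept.
 * indices_to_delete is sorted in place and may contain duplicates.
 */
static int
keep_undeleted(int size, int *indices_to_delete, int indices_count, int *kept)
{
    int i, j, k;

    qsort(indices_to_delete, indices_count, sizeof(int), compare_int);

    for (i = j = k = 0; i < size; i++)
    {
        bool drop_lexeme = false;

        /* consume every reference to position i, not just the first one */
        while (k < indices_count && i == indices_to_delete[k])
        {
            drop_lexeme = true;
            k++;
        }
        if (drop_lexeme)
            continue;

        kept[j++] = i;
    }

    /* holds as long as every requested index lies in [0, size) */
    assert(k == indices_count);
    return j;
}
```

With size 3 and the deletion list {0, 0}, the function keeps positions 1 and 2 and the assertion holds. With the original single if test, k would stop at 1 and the assertion would fail, which matches the reported TRAP.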
Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Noah Misch
Date:
On Wed, Aug 03, 2016 at 05:52:44PM -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking there are two distinct bugs here.
>
> Actually, make that three bugs. I was so focused on the crashing
> that I failed to notice that ts_delete wasn't producing sane answers
> even when it didn't crash:

[Action required within 72 hours. This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
in advance of shipping 9.6rc1 next week. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
Re: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Robert Haas
Date:
On Thu, Aug 4, 2016 at 12:14 AM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Aug 03, 2016 at 05:52:44PM -0400, Tom Lane wrote:
>> I wrote:
>> > I'm thinking there are two distinct bugs here.
>>
>> Actually, make that three bugs. I was so focused on the crashing
>> that I failed to notice that ts_delete wasn't producing sane answers
>> even when it didn't crash:
>
> [Action required within 72 hours. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> in advance of shipping 9.6rc1 next week. Consequently, I will appreciate your
> efforts toward speedy resolution. Thanks.

Action within 72 hours now seems inadequate; we are scheduled to wrap
rc1 on Monday. We need someone to either fix these bugs very very
soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK
to ship rc1 with these known defects, or postpone the planned release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Action within 72 hours now seems inadequate; we are scheduled to wrap
> rc1 on Monday. We need someone to either fix these bugs very very
> soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK
> to ship rc1 with these known defects, or postpone the planned release.

Given the time of year, I'd not be surprised if Oleg and Teodor are on
vacation. In view of the time pressure, I'll take a whack at fixing this.

I think that Thomas Munro's suggestion is good as far as fixing the Assert
failure is concerned. I do not know where the other problems are, but
maybe I can find them ...

regards, tom lane
Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> The assertion in tsvector_delete_by_indices fails because its counting
> algorithm doesn't expect indices_to_delete to contain multiple
> references to the same index. Maybe that could be fixed by
> uniquifying in tsvector_delete_arr before calling it, but since
> tsvector_delete_by_indices already qsorts its input, it should be able
> to handle duplicates cheaply.

I poked at this and realized that that's not sufficient. If there are
duplicates in indices_to_delete, then the initial estimate

    tsout->size = tsv->size - indices_count;

is wrong because indices_count is an overestimate of how many lexemes
will be removed. And because the calculation "dataout = STRPTR(tsout)"
depends on tsout->size, we can't just wait till later to get it right.

We could possibly initialize tsout->size = tsv->size (the maximum
possible value), thereby ensuring that the WordEntry array doesn't
overlap the dataout area; compute the correct tsout->size in the loop;
and then memmove the data area into place to collapse out wasted space.

But I think it might be simpler and better-performant just to de-dup
the indices_to_delete array after qsort'ing it; that would certainly
win for the case of indices_count == 1.

The other problems I noted with failure to delete items seem to stem
from the fact that tsvector_delete_arr relies on tsvector_bsearch to
find items, but the input tsvector is not sorted (never mind de'duped)
by array_to_tsvector. This seems like simple brain fade in
array_to_tsvector, as AFAICS that's a required property of tsvectors.

regards, tom lane
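The de-dup-after-qsort idea preferred in this message can be sketched as follows. This is a minimal standalone illustration under stated assumptions, not the committed fix: sort_and_unique, output_size, and cmp_int_asc are hypothetical helpers, and a plain int array stands in for indices_to_delete. The point is that once duplicates are collapsed, the count of distinct indices gives an exact output size up front, so nothing that depends on that size has to be corrected later.

```c
#include <stdlib.h>

/* qsort comparator for ints (hypothetical helper) */
static int
cmp_int_asc(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Sort indices[] and collapse duplicates in place.
 * Returns the number of distinct indices, which occupy the front of
 * the array afterwards.  A single index needs neither step.
 */
static int
sort_and_unique(int *indices, int count)
{
    int i, n;

    if (count <= 1)
        return count;

    qsort(indices, count, sizeof(int), cmp_int_asc);

    n = 1;
    for (i = 1; i < count; i++)
    {
        /* keep an index only if it differs from the last one kept */
        if (indices[i] != indices[n - 1])
            indices[n++] = indices[i];
    }
    return n;
}

/*
 * Exact number of entries left after deletion, assuming every index
 * lies in [0, input_size).  Stand-in for the tsout->size computation.
 */
static int
output_size(int input_size, int *indices, int count)
{
    return input_size - sort_and_unique(indices, count);
}
```

For an input of three entries and the deletion list {0, 0}, output_size returns 2, whereas subtracting the raw indices_count would give 1.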