Memory bug in dsnowball_lexize - Mailing list pgsql-hackers

From Mark Dilger
Subject Memory bug in dsnowball_lexize
Msg-id CAE-h2TrW-5ocMg8ma_0iUcqnD6n8qN9JJ+sAqp=dN2oYjaKdDw@mail.gmail.com
List pgsql-hackers

Hackers,

In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
malloc (not palloc) to allocate memory, and on memory exhaustion
returns NULL rather than throwing an exception.  In this same
file, 'replace_s' calls 'create_s' and if it gets back NULL, returns
the error code -1.  Otherwise, it sets z->p to the allocated
memory.

In src/backend/snowball/libstemmer/api.c, 'SN_set_current' calls
'replace_s' and returns whatever 'replace_s' returned, which in
the case of memory exhaustion will be -1.

In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.

I checked one of the stemmers, stem_ISO_8859_1_english.c,
and it treats z->p as an array without checking whether it is
NULL.  This will crash the backend in the above error case.
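The failure path described above can be illustrated with a small standalone sketch. It is not the Snowball or PostgreSQL source: every toy_* name is hypothetical, toy_replace_s is a much simplified stand-in for 'replace_s', and toy_simulate_oom is a test hook added only to force the out-of-memory branch. The point it tries to show is that a caller which checks the return value never indexes a NULL z->p.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real Snowball or PostgreSQL code. */
typedef unsigned char symbol;

struct toy_env
{
    symbol *p;                  /* working buffer, NULL until allocated */
    int     l;                  /* number of symbols stored in p */
};

/* Test hook (an assumption of this sketch): pretend malloc failed. */
static int toy_simulate_oom = 0;

/* Mirrors 'create_s' as described: malloc-based, NULL on exhaustion. */
static symbol *toy_create_s(size_t size)
{
    if (toy_simulate_oom)
        return NULL;
    return malloc(size > 0 ? size : 1);
}

/* Mirrors 'replace_s' as described: -1 if the allocation failed. */
static int toy_replace_s(struct toy_env *z, const symbol *s, int len)
{
    symbol *buf;

    assert(z != NULL && len >= 0);
    buf = toy_create_s((size_t) len);
    if (buf == NULL)
        return -1;
    memcpy(buf, s, (size_t) len);
    free(z->p);
    z->p = buf;
    z->l = len;
    return 0;
}

/* Mirrors 'SN_set_current' as described: passes the result through. */
static int toy_set_current(struct toy_env *z, int size, const symbol *s)
{
    return toy_replace_s(z, s, size);
}

/* A caller that checks the result before any code touches z->p. */
static int toy_lexize(struct toy_env *z, const char *word)
{
    if (toy_set_current(z, (int) strlen(word), (const symbol *) word) != 0)
        return -1;              /* a backend would presumably raise an error here */

    /* Stand-in for the stemmer: indexes z->p, so it must not run on NULL. */
    if (z->l > 0 && z->p[z->l - 1] == 's')
        z->l--;
    return 0;
}
```

With toy_simulate_oom set, toy_lexize returns -1 and leaves z->p untouched. Dropping the check in toy_lexize would let the stand-in stemmer read through a NULL pointer, which is the crash scenario described above.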

There is something else weird here, though.  The call to
'SN_set_current' is wrapped in a memory context switch, along
with a call to the stemmer, as if the caller expects any allocated
memory to be palloc'd, which it is not, given the underlying code's
use of malloc and calloc.
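To make the concern concrete, here is a toy model of a memory context. It is only a sketch: toy_context, toy_switch_to, toy_palloc and toy_reset are hypothetical names, not PostgreSQL's MemoryContext API, and the sketch takes at face value the description above that the library allocates with plain malloc. Under that assumption, switching the current context has no effect on what the library allocates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy context; NOT PostgreSQL's MemoryContext. */
struct toy_chunk
{
    struct toy_chunk *next;     /* header placed in front of each payload */
};

struct toy_context
{
    struct toy_chunk *chunks;   /* list of allocations owned by this context */
    int               nchunks;
};

static struct toy_context *toy_current = NULL;

/* Make cx the current context and return the previous one. */
static struct toy_context *toy_switch_to(struct toy_context *cx)
{
    struct toy_context *old = toy_current;

    toy_current = cx;
    return old;
}

/* palloc-like: the allocation is recorded in the current context. */
static void *toy_palloc(size_t size)
{
    struct toy_chunk *c;

    assert(toy_current != NULL);
    c = malloc(sizeof(struct toy_chunk) + size);
    if (c == NULL)
        abort();                /* the toy simply aborts on exhaustion */
    c->next = toy_current->chunks;
    toy_current->chunks = c;
    toy_current->nchunks++;
    return (void *) (c + 1);    /* payload is used as raw bytes in this sketch */
}

/* Free everything the context tracks; returns the number of chunks freed. */
static int toy_reset(struct toy_context *cx)
{
    int freed = 0;

    while (cx->chunks != NULL)
    {
        struct toy_chunk *next = cx->chunks->next;

        free(cx->chunks);
        cx->chunks = next;
        freed++;
    }
    cx->nchunks = 0;
    return freed;
}
```

If code running between toy_switch_to(&dict) and the switch back calls toy_palloc once and malloc once, toy_reset(&dict) frees only the first allocation. The malloc'd block is invisible to the context and must be freed by hand, which is why the context switch looks pointless if the library really does call malloc directly.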

There is a comment higher up in dict_snowball.c that seems to
wave its hands at all this, or perhaps it is documenting
something else entirely.  In any event, I find the documentation
about dictCtx insufficient to explain why this memory handling
is correct.

mark


