Thread: BUG #18875: COPY BINARY tsvector FROM file leads to misaligned memory access

From: PG Bug reporting form

The following bug has been logged on the website:

Bug reference:      18875
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 17.4
Operating system:   Ubuntu 24.04
Description:

The following script, executed against a build with sanitizers enabled:
CREATE TABLE test_tsvector(t text, a tsvector);
COPY test_tsvector FROM '.../src/test/regress/data/tsearch.data';
COPY BINARY test_tsvector TO '/tmp/t.data';
COPY BINARY test_tsvector FROM '/tmp/t.data';

triggers a runtime error:
2025-04-02 17:23:25.502 UTC [1721608] LOG:  statement: COPY BINARY test_tsvector FROM '/tmp/t.data';
tsvector.c:90:59: runtime error: member access within misaligned address 0x52500005a23c for type 'const struct WordEntryIN', which requires 8 byte alignment
0x52500005a23c: note: pointer points here
  04 00 00 00 04 20 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^
    #0 0x5e62469fe827 in compareentry .../src/backend/utils/adt/tsvector.c:90
    #1 0x5e62469fe85c in WordEntryCMP .../src/backend/utils/adt/tsvector.c:173
    #2 0x5e6246a025bd in tsvectorrecv .../src/backend/utils/adt/tsvector.c:514
    #3 0x5e6246af3bdc in ReceiveFunctionCall .../src/backend/utils/fmgr/fmgr.c:1715
    #4 0x5e6245c759b4 in CopyReadBinaryAttribute .../src/backend/commands/copyfromparse.c:2048
    #5 0x5e6245c7cd0c in CopyFromBinaryOneRow .../src/backend/commands/copyfromparse.c:1139
    #6 0x5e6245c79428 in NextCopyFrom .../src/backend/commands/copyfromparse.c:890
    #7 0x5e6245c6e429 in CopyFrom .../src/backend/commands/copyfrom.c:1149
    #8 0x5e6245c669e5 in DoCopy .../src/backend/commands/copy.c:306
    #9 0x5e62466463fd in standard_ProcessUtility .../src/backend/tcop/utility.c:738
...

Reproduced on REL_10_STABLE .. master.


PG Bug reporting form <noreply@postgresql.org> writes:
> The following script, executed against a build with sanitizers enabled:
> CREATE TABLE test_tsvector(t text, a tsvector);
> COPY test_tsvector FROM '.../src/test/regress/data/tsearch.data';
> COPY BINARY test_tsvector TO '/tmp/t.data';
> COPY BINARY test_tsvector FROM '/tmp/t.data';

> triggers a runtime error:
> 2025-04-02 17:23:25.502 UTC [1721608] LOG:  statement: COPY BINARY test_tsvector FROM '/tmp/t.data';
> tsvector.c:90:59: runtime error: member access within misaligned address 0x52500005a23c for type 'const struct WordEntryIN', which requires 8 byte alignment

Hmm.  This is evidently because of the type pun involved: WordEntryCMP
is supposed to compare WordEntry structs, but it's turning around and
using compareentry, which compares WordEntryIN structs.  And those are
larger and more strictly aligned.  Now compareentry doesn't access
anything outside the WordEntry part, but it's theoretically possible
that the compiler could generate load instructions that depend on the
larger alignment.  Given the lack of field reports, that's evidently
not happening on any platforms where it would matter.  But still we
ought to clean it up.
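
The alignment mismatch can be checked by hand.  The following is a
minimal sketch, assuming simplified stand-in structs (DemoWordEntry and
DemoWordEntryIN are hypothetical names, not the real PostgreSQL
definitions) and assuming a typical 64-bit ABI where the bit-field
struct is 4-byte aligned and the pointer-bearing struct is 8-byte
aligned.  It tests the address from the sanitizer report against both
alignments:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Simplified stand-ins (assumed layout), not the real PostgreSQL structs. */
typedef struct
{
    unsigned int haspos:1,
                 len:11,
                 pos:20;
} DemoWordEntry;

typedef struct
{
    DemoWordEntry   entry;      /* must be first */
    unsigned short *pos;        /* the pointer member raises the alignment */
    int             poslen;
} DemoWordEntryIN;

/* Nonzero if addr is a multiple of alignment. */
static int
demo_is_aligned(uint64_t addr, uint64_t alignment)
{
    return addr % alignment == 0;
}

/* Checks the address from the report against 4- and 8-byte alignment. */
static void
demo_check_reported_address(void)
{
    const uint64_t addr = UINT64_C(0x52500005a23c);

    /* A struct is never less strictly aligned than its first member. */
    assert(alignof(DemoWordEntryIN) >= alignof(DemoWordEntry));

    assert(demo_is_aligned(addr, 4));   /* fine for a 4-byte-aligned struct */
    assert(!demo_is_aligned(addr, 8));  /* not enough for an 8-byte-aligned one */
}
```

The address ends in 0x3c (60), which is a multiple of 4 but not of 8,
so it is valid for the smaller struct while violating the alignment the
report states for WordEntryIN.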

ISTM this coding is basically backwards: compareentry should be coded
to work on WordEntry structs, and then if it's used to compare
WordEntry structs that are embedded in WordEntryIN there's no problem.
And then we don't need the WordEntryCMP wrapper at all.
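
A rough sketch of that direction, not the actual patch: the comparator
below takes pointers to the small struct, so callers holding the larger
struct pass the address of the embedded member and no cast to the
larger type is needed.  The struct definitions, the names
demo_compare_entry and demo_usage, and the memcmp-based ordering are
assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins (assumed layout), not the real PostgreSQL structs. */
typedef struct
{
    unsigned int haspos:1,
                 len:11,
                 pos:20;
} DemoWordEntry;

typedef struct
{
    DemoWordEntry   entry;      /* must be first */
    unsigned short *pos;
    int             poslen;
} DemoWordEntryIN;

/*
 * Compare the lexemes two entries point to inside buf.  Operates on the
 * small struct only, so it never assumes the larger struct's alignment.
 */
static int
demo_compare_entry(const DemoWordEntry *a, const DemoWordEntry *b,
                   const char *buf)
{
    size_t  minlen = (a->len < b->len) ? a->len : b->len;
    int     cmp = memcmp(buf + a->pos, buf + b->pos, minlen);

    /* On a common prefix, the shorter lexeme sorts first. */
    if (cmp == 0 && a->len != b->len)
        cmp = (a->len < b->len) ? -1 : 1;
    return cmp;
}

/* Exercises the comparator with bare entries and with embedded ones. */
static void
demo_usage(void)
{
    const char     *buf = "applebanana";
    DemoWordEntry   e1 = {0, 5, 0};             /* "apple" */
    DemoWordEntry   e2 = {0, 6, 5};             /* "banana" */
    DemoWordEntryIN in1 = {{0, 5, 0}, NULL, 0}; /* "apple" */

    assert(demo_compare_entry(&e1, &e2, buf) < 0);
    /* Embedded entry: pass the member's address, no type pun needed. */
    assert(demo_compare_entry(&in1.entry, &e2, buf) < 0);
    assert(demo_compare_entry(&e1, &in1.entry, buf) == 0);
}
```

Because the comparator only ever sees the small struct, the same
function serves both arrays of bare entries and entries embedded in the
larger struct, and a separate wrapper becomes unnecessary.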

Will fix, thanks for the report!

            regards, tom lane