Thread: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15943
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 12beta2
Operating system:   Ubuntu 18.04
Description:

When running under valgrind (sitting on the master branch):
printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql

I get the following error:
==00:00:00:12.885 19648== Syscall param write(buf) points to uninitialised
byte(s)
==00:00:00:12.885 19648==    at 0x5054281: write (write.c:27)
==00:00:00:12.885 19648==    by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:12.885 19648==    by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:12.885 19648==    by 0x2690F1: SlruSelectLRUPage (slru.c:1095)
==00:00:00:12.885 19648==    by 0x269777: SimpleLruZeroPage (slru.c:269)
==00:00:00:12.885 19648==    by 0x31DABA: asyncQueueAddEntries
(async.c:1379)
==00:00:00:12.885 19648==    by 0x31F130: PreCommit_Notify (async.c:845)
==00:00:00:12.885 19648==    by 0x273C19: CommitTransaction (xact.c:2128)
==00:00:00:12.885 19648==    by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:12.885 19648==    by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:12.885 19648==    by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:12.885 19648==    by 0x540545: PostgresMain (postgres.c:4256)
==00:00:00:12.885 19648==  Address 0xf4f95bf is in a rw- anonymous segment
==00:00:00:12.885 19648==  Uninitialised value was created by a stack
allocation
==00:00:00:12.885 19648==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Increasing the count to 8192 produces another one, slightly different
stack:
==00:00:00:29.999 19648== Syscall param write(buf) points to uninitialised
byte(s)
==00:00:00:29.999 19648==    at 0x5054281: write (write.c:27)
==00:00:00:29.999 19648==    by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:29.999 19648==    by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:29.999 19648==    by 0x26A1BD: SimpleLruTruncate (slru.c:1234)
==00:00:00:29.999 19648==    by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2022)
==00:00:00:29.999 19648==    by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:29.999 19648==    by 0x5403D1: PostgresMain (postgres.c:4176)
==00:00:00:29.999 19648==    by 0x4B16D9: BackendRun (postmaster.c:4431)
==00:00:00:29.999 19648==    by 0x4B481B: BackendStartup
(postmaster.c:4122)
==00:00:00:29.999 19648==    by 0x4B4B32: ServerLoop (postmaster.c:1704)
==00:00:00:29.999 19648==    by 0x4B5F70: PostmasterMain
(postmaster.c:1377)
==00:00:00:29.999 19648==    by 0x406428: main (main.c:228)
==00:00:00:29.999 19648==  Address 0xf4fb5bf is in a rw- anonymous segment
==00:00:00:29.999 19648==  Uninitialised value was created by a stack
allocation
==00:00:00:29.999 19648==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

(It's also reproduced on REL_12_BETA3, REL9_6_15.)


Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Andres Freund
Date:
Hi,

On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote:
>The following bug has been logged on the website:
>
>Bug reference:      15943
>Logged by:          Alexander Lakhin
>Email address:      exclusion@gmail.com
>PostgreSQL version: 12beta2
>Operating system:   Ubuntu 18.04
>Description:
>
>When running under valgrind (sitting on the master branch):
>printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql
>
>I get the following error:
>==00:00:00:12.885 19648== Syscall param write(buf) points to
>uninitialised
>byte(s)
>==00:00:00:12.885 19648==    at 0x5054281: write (write.c:27)
>==00:00:00:12.885 19648==    by 0x2687F0: SlruPhysicalWritePage
>(slru.c:851)
>==00:00:00:12.885 19648==    by 0x268E95: SlruInternalWritePage
>(slru.c:543)
>==00:00:00:12.885 19648==    by 0x2690F1: SlruSelectLRUPage
>(slru.c:1095)
>==00:00:00:12.885 19648==    by 0x269777: SimpleLruZeroPage
>(slru.c:269)
>==00:00:00:12.885 19648==    by 0x31DABA: asyncQueueAddEntries
>(async.c:1379)
>==00:00:00:12.885 19648==    by 0x31F130: PreCommit_Notify
>(async.c:845)
>==00:00:00:12.885 19648==    by 0x273C19: CommitTransaction
>(xact.c:2128)
>==00:00:00:12.885 19648==    by 0x274A10: CommitTransactionCommand
>(xact.c:2921)
>==00:00:00:12.885 19648==    by 0x53BD7D: finish_xact_command
>(postgres.c:2597)
>==00:00:00:12.885 19648==    by 0x53E2F3: exec_simple_query
>(postgres.c:1256)
>==00:00:00:12.885 19648==    by 0x540545: PostgresMain
>(postgres.c:4256)
>==00:00:00:12.885 19648==  Address 0xf4f95bf is in a rw- anonymous
>segment
>==00:00:00:12.885 19648==  Uninitialised value was created by a stack
>allocation
>==00:00:00:12.885 19648==    at 0x31D967: asyncQueueAddEntries
>(async.c:1306)
>
>Increasing the count to 8192 produces another one, slightly different
>stack:
>==00:00:00:29.999 19648== Syscall param write(buf) points to
>uninitialised
>byte(s)
>==00:00:00:29.999 19648==    at 0x5054281: write (write.c:27)
>==00:00:00:29.999 19648==    by 0x2687F0: SlruPhysicalWritePage
>(slru.c:851)
>==00:00:00:29.999 19648==    by 0x268E95: SlruInternalWritePage
>(slru.c:543)
>==00:00:00:29.999 19648==    by 0x26A1BD: SimpleLruTruncate
>(slru.c:1234)
>==00:00:00:29.999 19648==    by 0x31D5A5: asyncQueueAdvanceTail
>(async.c:2022)
>==00:00:00:29.999 19648==    by 0x31F231: ProcessCompletedNotifies
>(async.c:1132)
>==00:00:00:29.999 19648==    by 0x5403D1: PostgresMain
>(postgres.c:4176)
>==00:00:00:29.999 19648==    by 0x4B16D9: BackendRun
>(postmaster.c:4431)
>==00:00:00:29.999 19648==    by 0x4B481B: BackendStartup
>(postmaster.c:4122)
>==00:00:00:29.999 19648==    by 0x4B4B32: ServerLoop
>(postmaster.c:1704)
>==00:00:00:29.999 19648==    by 0x4B5F70: PostmasterMain
>(postmaster.c:1377)
>==00:00:00:29.999 19648==    by 0x406428: main (main.c:228)
>==00:00:00:29.999 19648==  Address 0xf4fb5bf is in a rw- anonymous
>segment
>==00:00:00:29.999 19648==  Uninitialised value was created by a stack
>allocation
>==00:00:00:29.999 19648==    at 0x31D967: asyncQueueAddEntries
>(async.c:1306)
>
>(It's also reproduced on REL_12_BETA3, REL9_6_15.)

That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Alexander Lakhin
Date:
Hello Andres,

08.08.2019 13:33, Andres Freund wrote:
> Hi,
>
> On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      15943
>> Logged by:          Alexander Lakhin
>> Email address:      exclusion@gmail.com
>> PostgreSQL version: 12beta2
>> Operating system:   Ubuntu 18.04
>> Description:        
>>
>> When running under valgrind (sitting on the master branch):
>> printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql
>>
> That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.
>
With "--track-origins=yes --read-var-info=yes" I get:

==00:00:00:30.800 24110== Syscall param write(buf) points to
uninitialised byte(s)
==00:00:00:30.801 24110==    at 0x5054281: write (write.c:27)
==00:00:00:30.801 24110==    by 0x2687F0: SlruPhysicalWritePage (slru.c:851)
==00:00:00:30.801 24110==    by 0x268E95: SlruInternalWritePage (slru.c:543)
==00:00:00:30.801 24110==    by 0x2690F1: SlruSelectLRUPage (slru.c:1095)
==00:00:00:30.801 24110==    by 0x269777: SimpleLruZeroPage (slru.c:269)
==00:00:00:30.801 24110==    by 0x31DABA: asyncQueueAddEntries
(async.c:1381)
==00:00:00:30.801 24110==    by 0x31F130: PreCommit_Notify (async.c:845)
==00:00:00:30.801 24110==    by 0x273C19: CommitTransaction (xact.c:2128)
==00:00:00:30.801 24110==    by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:30.801 24110==    by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:30.801 24110==    by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:30.801 24110==    by 0x540545: PostgresMain (postgres.c:4256)
==00:00:00:30.801 24110==  Address 0xf4f95bf is in a rw- anonymous segment
==00:00:00:30.801 24110==  Uninitialised value was created by a stack
allocation
==00:00:00:30.801 24110==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

and
==00:00:00:48.177 24110== Syscall param write(buf) points to
uninitialised byte(s)
==00:00:00:48.177 24110==    at 0x5054281: write (write.c:27)
==00:00:00:48.177 24110==    by 0x2687F0: SlruPhysicalWritePage (slru.c:851)
==00:00:00:48.177 24110==    by 0x268E95: SlruInternalWritePage (slru.c:543)
==00:00:00:48.177 24110==    by 0x26A1BD: SimpleLruTruncate (slru.c:1234)
==00:00:00:48.177 24110==    by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2024)
==00:00:00:48.177 24110==    by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:48.177 24110==    by 0x5403D1: PostgresMain (postgres.c:4176)
==00:00:00:48.177 24110==    by 0x4B16D9: BackendRun (postmaster.c:4431)
==00:00:00:48.177 24110==    by 0x4B481B: BackendStartup (postmaster.c:4122)
==00:00:00:48.177 24110==    by 0x4B4B32: ServerLoop (postmaster.c:1704)
==00:00:00:48.178 24110==    by 0x4B5F70: PostmasterMain (postmaster.c:1377)
==00:00:00:48.178 24110==    by 0x406428: main (main.c:228)
==00:00:00:48.178 24110==  Address 0xf4fb5bf is in a rw- anonymous segment
==00:00:00:48.178 24110==  Uninitialised value was created by a stack
allocation
==00:00:00:48.178 24110==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
asyncQueueAddEntries fixes the issue.
More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
place fixes it too (the patch is attached).
So it seems, the padding in AsyncQueueEntry has an alibi.

Best regards,
Alexander


Attachment

Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Alvaro Herrera
Date:
On 2019-Aug-08, Alexander Lakhin wrote:

> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
> asyncQueueAddEntries fixes the issue.
> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
> place fixes it too (the patch is attached).

Maybe VALGRIND_MAKE_MEM_DEFINED?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Andres Freund
Date:
Hi,

On August 8, 2019 12:43:32 PM EDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>On 2019-Aug-08, Alexander Lakhin wrote:
>
>> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
>> asyncQueueAddEntries fixes the issue.
>> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the
>same
>> place fixes it too (the patch is attached).
>
>Maybe VALGRIND_MAKE_MEM_DEFINED?

Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable?

I think all we need here is another suppression in
src/tools/valgrind.supp.  We have not insisted on zeroing
pad bytes that get sent to disk in the places already
enumerated in valgrind.supp, so why would we apply a
different rule to async.c?

            regards, tom lane



Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Andres Freund
Date:
Hi,

On August 8, 2019 2:41:42 PM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> Seems there's no reason to not zero initialize memory here? But
>perhaps just by initializing the stack variable?
>
>I think all we need here is another suppression in
>src/tools/valgrind.supp.  We have not insisted on zeroing
>pad bytes that get sent to disk in the places already
>enumerated in valgrind.supp, so why would we apply a
>different rule to async.c?

Well, with a lot of the other case there's a lot of sources for such uninitialized data. Here there probably is just
one?If it weren't such a game of whack a mole, I'd even consider properly initializing the other places. And
initializingthe stack data here ought to be cheap in this case? 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

From
Alexander Lakhin
Date:
Hello hackers,
> 08.08.2019 13:33, Andres Freund wrote:
>> On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      15943
>>> Logged by:          Alexander Lakhin
>>> Email address:      exclusion@gmail.com
>>> PostgreSQL version: 12beta2
>>> Operating system:   Ubuntu 18.04
>>> Description:        
>>>
>>> When running under valgrind (sitting on the master branch):
>>> printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql
>>>
>> That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.
> Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
> asyncQueueAddEntries fixes the issue.
> More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
> place fixes it too (the patch is attached).
> So it seems, the padding in AsyncQueueEntry has an alibi.
Andres, you're right. The padding is found guilty (inside the qe.data
itself).
Please look at the patch addressing the cause of the problem (inspired
by btree_bit.c).
But if a valgrind suppression is preferred, look at the alternate patch.

Best regards,
Alexander

Attachment