Thread: BUG #15217: Valgrind - writing uninitialised byte(s) in FileWrite
The following bug has been logged on the website: Bug reference: 15217 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 10.4 Operating system: Ubuntu 16.04 Description: Hello, When running PostgreSQL (REL_10_STABLE) under valgrind I am getting the following error messages sporadically: ==00:03:26:22.952 1623== Syscall param write(buf) points to uninitialised byte(s) ==00:03:26:22.952 1623== at 0x4E4A4A0: __write_nocancel (syscall-template.S:84) ==00:03:26:22.952 1623== by 0x72F0EF: FileWrite (fd.c:1773) ==00:03:26:22.952 1623== by 0x7310D1: BufFileDumpBuffer (buffile.c:324) ==00:03:26:22.952 1623== by 0x7313F1: BufFileFlush (buffile.c:466) ==00:03:26:22.952 1623== by 0x7313F1: BufFileRead (buffile.c:372) ==00:03:26:22.952 1623== by 0x894958: ltsReadBlock (logtape.c:248) ==00:03:26:22.952 1623== by 0x894A34: ltsReadFillBuffer (logtape.c:275) ==00:03:26:22.952 1623== by 0x89CB9F: mergeruns (tuplesort.c:2683) ==00:03:26:22.952 1623== by 0x89CB9F: tuplesort_performsort (tuplesort.c:1823) ==00:03:26:22.952 1623== by 0x6414C2: ExecSort (nodeSort.c:117) ==00:03:26:22.952 1623== by 0x6296F8: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x6296F8: fetch_input_tuple (nodeAgg.c:695) ==00:03:26:22.952 1623== by 0x62BBBA: agg_retrieve_direct (nodeAgg.c:2347) ==00:03:26:22.952 1623== by 0x62BBBA: ExecAgg (nodeAgg.c:2158) ==00:03:26:22.952 1623== by 0x638AE6: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x638AE6: gather_merge_readnext (nodeGatherMerge.c:634) ==00:03:26:22.952 1623== by 0x638E5C: gather_merge_init (nodeGatherMerge.c:468) ==00:03:26:22.952 1623== by 0x638E5C: gather_merge_getnext (nodeGatherMerge.c:536) ==00:03:26:22.952 1623== by 0x638E5C: ExecGatherMerge (nodeGatherMerge.c:250) ==00:03:26:22.952 1623== Address 0xfa8a27e is 6,350 bytes inside a block of size 8,256 client-defined ==00:03:26:22.952 1623== at 0x88FC20: palloc (mcxt.c:872) ==00:03:26:22.952 1623== by 0x7312CF: makeBufFile (buffile.c:107) ==00:03:26:22.952 1623== by 0x7312CF: BufFileCreateTemp (buffile.c:175) ==00:03:26:22.952 1623== by 0x894B14: LogicalTapeSetCreate (logtape.c:390) ==00:03:26:22.952 1623== by 0x89B2F2: inittapes (tuplesort.c:2410) ==00:03:26:22.952 1623== by 0x89B2F2: puttuple_common (tuplesort.c:1622) ==00:03:26:22.952 1623== by 0x89C4CE: tuplesort_putheaptuple (tuplesort.c:1397) ==00:03:26:22.952 1623== by 0x6414A0: ExecSort (nodeSort.c:111) ==00:03:26:22.952 1623== by 0x6296F8: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x6296F8: fetch_input_tuple (nodeAgg.c:695) ==00:03:26:22.952 1623== by 0x62BBBA: agg_retrieve_direct (nodeAgg.c:2347) ==00:03:26:22.952 1623== by 0x62BBBA: ExecAgg (nodeAgg.c:2158) ==00:03:26:22.952 1623== by 0x638AE6: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x638AE6: gather_merge_readnext (nodeGatherMerge.c:634) ==00:03:26:22.952 1623== by 0x638E5C: gather_merge_init (nodeGatherMerge.c:468) ==00:03:26:22.952 1623== by 0x638E5C: gather_merge_getnext (nodeGatherMerge.c:536) ==00:03:26:22.952 1623== by 0x638E5C: ExecGatherMerge (nodeGatherMerge.c:250) ==00:03:26:22.952 1623== by 0x6296F8: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x6296F8: fetch_input_tuple (nodeAgg.c:695) ==00:03:26:22.952 1623== by 0x62BBBA: agg_retrieve_direct (nodeAgg.c:2347) ==00:03:26:22.952 1623== by 0x62BBBA: ExecAgg (nodeAgg.c:2158) ==00:03:26:22.952 1623== Uninitialised value was created by a heap allocation ==00:03:26:22.952 1623== at 0x88FC20: palloc (mcxt.c:872) ==00:03:26:22.952 1623== by 0x894DC2: LogicalTapeWrite (logtape.c:476) ==00:03:26:22.952 1623== by 0x895D81: writetup_heap (tuplesort.c:3757) ==00:03:26:22.952 1623== by 0x89AEE9: dumpbatch (tuplesort.c:3100) ==00:03:26:22.952 1623== by 0x89AEE9: dumptuples (tuplesort.c:2972) ==00:03:26:22.952 1623== by 0x89CA0B: tuplesort_performsort (tuplesort.c:1822) ==00:03:26:22.952 1623== by 0x6414C2: ExecSort (nodeSort.c:117) ==00:03:26:22.952 1623== by 0x6296F8: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x6296F8: fetch_input_tuple (nodeAgg.c:695) ==00:03:26:22.952 1623== by 0x62BBBA: agg_retrieve_direct (nodeAgg.c:2347) ==00:03:26:22.952 1623== by 0x62BBBA: ExecAgg (nodeAgg.c:2158) ==00:03:26:22.952 1623== by 0x638AE6: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x638AE6: gather_merge_readnext (nodeGatherMerge.c:634) ==00:03:26:22.952 1623== by 0x638E5C: gather_merge_init (nodeGatherMerge.c:468) ==00:03:26:22.952 1623== by 0x638E5C: gather_merge_getnext (nodeGatherMerge.c:536) ==00:03:26:22.952 1623== by 0x638E5C: ExecGatherMerge (nodeGatherMerge.c:250) ==00:03:26:22.952 1623== by 0x6296F8: ExecProcNode (executor.h:250) ==00:03:26:22.952 1623== by 0x6296F8: fetch_input_tuple (nodeAgg.c:695) ==00:03:26:22.952 1623== by 0x62BBBA: agg_retrieve_direct (nodeAgg.c:2347) ==00:03:26:22.952 1623== by 0x62BBBA: ExecAgg (nodeAgg.c:2158) ==00:03:26:22.952 1623== { <insert_a_suppression_name_here> Memcheck:Param write(buf) fun:__write_nocancel fun:FileWrite fun:BufFileDumpBuffer fun:BufFileFlush fun:BufFileRead fun:ltsReadBlock fun:ltsReadFillBuffer fun:mergeruns fun:tuplesort_performsort fun:ExecSort fun:ExecProcNode fun:fetch_input_tuple fun:agg_retrieve_direct fun:ExecAgg fun:ExecProcNode fun:gather_merge_readnext fun:gather_merge_init fun:gather_merge_getnext fun:ExecGatherMerge } Is this something that needs further investigation (and fix) or it just looks like a false positive (and should be added to valgrind.supp)?
On Tue, May 29, 2018 at 9:02 PM, PG Bug reporting form <noreply@postgresql.org> wrote: > Is this something that needs further investigation (and fix) or it just > looks like a false positive (and should be added to valgrind.supp)? This looks like the same harmless warning that we suppressed for parallel sort in Postgres 11 (see commits 9fafa413ac6 and de6428afe13). This happens to be a serial sort in Postgres 10, but the fact that it's still possible there isn't surprising. What's your work_mem setting? Can you show EXPLAIN ANALYZE output for the query? -- Peter Geoghegan
This looks like the same harmless warning that we suppressed for parallel sort in Postgres 11 (see commits 9fafa413ac6 and de6428afe13). This happens to be a serial sort in Postgres 10, but the fact that it's still possible there isn't surprising. What's your work_mem setting? Can you show EXPLAIN ANALYZE output for the query?
Thanks for the answer!
The work_mem setting is 4MB. To get EXPLAIN ANALYZE is rather difficult as I get these errors while running pg_tcpds for hours, but I'll try to catch the query. I will also try to play with work_mem and to apply the aforementioned patches to REL_10_STABLE.
Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I've managed to reproduce this warning with pg_tpch.30.05.2018 08:54, Peter Geoghegan wrote:This looks like the same harmless warning that we suppressed for parallel sort in Postgres 11 (see commits 9fafa413ac6 and de6428afe13). This happens to be a serial sort in Postgres 10, but the fact that it's still possible there isn't surprising. What's your work_mem setting? Can you show EXPLAIN ANALYZE output for the query?Thanks for the answer!
The work_mem setting is 4MB. To get EXPLAIN ANALYZE is rather difficult as I get these errors while running pg_tcpds for hours, but I'll try to catch the query. I will also try to play with work_mem and to apply the aforementioned patches to REL_10_STABLE.
The "problematic" query: https://github.com/tvondra/pg_tpch/blob/master/dss/templates/10.sql
EXPLAIN ANALYZE:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=178881.96..178882.01 rows=20 width=202) (actual time=41708.427..41708.994 rows=20 loops=1)
-> Sort (cost=178881.96..179020.08 rows=55249 width=202) (actual time=41706.775..41706.828 rows=20 loops=1)
Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC
Sort Method: top-N heapsort Memory: 34kB
-> Finalize GroupAggregate (cost=170313.58..177411.81 rows=55249 width=202) (actual time=27586.165..40109.437 rows=37948 loops=1)
Group Key: customer.c_custkey, nation.n_name
-> Gather Merge (cost=170313.58..176260.79 rows=46040 width=202) (actual time=27570.718..33645.648 rows=37948 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=169313.56..169946.61 rows=23020 width=202) (actual time=26845.343..34327.273 rows=12649 loops=3)
Group Key: customer.c_custkey, nation.n_name
-> Sort (cost=169313.56..169371.11 rows=23020 width=182) (actual time=26801.917..27406.584 rows=38258 loops=3)
Sort Key: customer.c_custkey, nation.n_name
Sort Method: external sort Disk: 7016kB
-> Hash Join (cost=35074.91..165598.19 rows=23020 width=182) (actual time=8882.471..24071.174 rows=38258 loops=3)
Hash Cond: (customer.c_nationkey = nation.n_nationkey)
-> Nested Loop (cost=35073.35..165314.18 rows=23020 width=164) (actual time=8833.355..23146.172 rows=38258 loops=3)
-> Merge Join (cost=35072.92..42628.04 rows=23461 width=156) (actual time=8823.262..13619.941 rows=19073 loops=3)
Merge Cond: (customer.c_custkey = orders.o_custkey)
-> Parallel Index Scan using customer_pkey on customer (cost=0.42..6742.42 rows=62500 width=152) (actual time=13.463..1101.789 rows=50000 loops=3)
-> Materialize (cost=35072.48..35354.01 rows=56306 width=12) (actual time=8795.265..10510.505 rows=57105 loops=3)
-> Sort (cost=35072.48..35213.24 rows=56306 width=12) (actual time=8790.529..9667.322 rows=57105 loops=3)
Sort Key: orders.o_custkey
Sort Method: external merge Disk: 1256kB
-> Bitmap Heap Scan on orders (cost=1197.56..29667.15 rows=56306 width=12) (actual time=273.899..6733.885 rows=57218 loops=3)
Recheck Cond: ((o_orderdate >= '1993-07-01'::date) AND (o_orderdate < '1993-10-01 00:00:00'::timestamp without time zone))
Rows Removed by Index Recheck: 430095
Heap Blocks: exact=15937 lossy=8327
-> Bitmap Index Scan on idx_orders_orderdate (cost=0.00..1183.49 rows=56306 width=0) (actual time=225.114..225.114 rows=57218 loops=3)
Index Cond: ((o_orderdate >= '1993-07-01'::date) AND (o_orderdate < '1993-10-01 00:00:00'::timestamp without time zone))
-> Index Scan using idx_lineitem_orderkey on lineitem (cost=0.43..5.19 rows=4 width=20) (actual time=0.372..0.448 rows=2 loops=57218)
Index Cond: (l_orderkey = orders.o_orderkey)
Filter: (l_returnflag = 'R'::bpchar)
Rows Removed by Filter: 2
-> Hash (cost=1.25..1.25 rows=25 width=30) (actual time=31.890..31.890 rows=25 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=30) (actual time=6.582..8.616 rows=25 loops=3)
Planning time: 689.897 ms
Execution time: 41835.301 ms
(39 rows)
(The work_mem setting was decreased to 1MB.)
Valgrind logs of the three backends affected are attached.
After applying the changes from 9fafa413ac6 and de6428afe13 I don't see this warning.
Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company