Thread: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

[BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

skoposov@cmu.edu

Date:

29 June 2017, 19:16:37

The following bug has been logged on the website:

Bug reference:      14722
Logged by:          Sergey Koposov
Email address:      skoposov@cmu.edu
PostgreSQL version: 9.5.7
Operating system:   Debian 7.11, x86_64
Description:

Hi, 

I have a very large table (40e9 records) that I'm trying to create the index
on and I am getting a segmentation fault that could be traced as far as I
understand to a 32 bit int overflow in tuplesort_heap_siftup

Here are the commands leading to the crash: 

wsdb=# set maintenance_work_mem to '70GB';
                            
 
SET
wsdb=# create index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));
                            
 

----

Importantly the table has already been sorted by q3c_ang2ipix(ra,dec) !

--

Here is the table info:
wsdb=# explain select * from  cgonzal.vvv_single_ks_sorted;                                     QUERY PLAN
                      
 
---------------------------------------------------------------------------------------Seq Scan on vvv_single_ks_sorted
(cost=0.00..968967342.13 rows=43362626913 
width=72)
(1 row)

wsdb=# \d cgonzal.vvv_single_ks_sorted Table "cgonzal.vvv_single_ks_sorted"Column  |       Type       | Modifiers 
---------+------------------+-----------objid   | bigint           | ra      | double precision | dec     | double
precision| x       | real             | y       | real             | chip    | integer          | mag     | real
    | e_mag   | real             | class   | integer          | frameid | bigint           | zp      | double precision
|obj_id  | bigint           | 
 

--------


Here is the gdb full stacktrace:
(gdb) bt full
#0  0x0000000000914cf8 in tuplesort_heap_siftup (state=0x23503f8,
checkIndex=1 '\001') at tuplesort.c:3014       j = -1879048193       memtuples = 0x7fb283aa1048       tuple =
0x7fba03aa0fd0      i = 1207959551       n = 1342177275
 
#1  0x000000000091430a in dumptuples (state=0x23503f8, alltuples=0 '\000')
at tuplesort.c:2648       __func__ = "dumptuples"
#2  0x00000000009120a3 in puttuple_common (state=0x23503f8,
tuple=0x7ffe420fefc0) at tuplesort.c:1468       __func__ = "puttuple_common"
#3  0x0000000000911d85 in tuplesort_putindextuplevalues (state=0x23503f8,
rel=0x7fd040f3b8e0, self=0x234ba34, values=0x7ffe420ff360,    isnull=0x7ffe420ff340 "") at tuplesort.c:1321
oldcontext= 0x23340b8       stup = {tuple = 0x7fbf040f6ae8, datum1 = 4710889527840951089, 
isnull1 = 0 '\000', tupindex = 0}       original = 4710889527840951089       tuple = 0x7fbf040f6ae8
#4  0x00000000004d26dd in _bt_spool (btspool=0x234cba0, self=0x234ba34,
values=0x7ffe420ff360, isnull=0x7ffe420ff340 "") at nbtsort.c:192
No locals.
#5  0x00000000004cba67 in btbuildCallback (index=0x7fd040f3b8e0,
htup=0x234ba30, values=0x7ffe420ff360, isnull=0x7ffe420ff340 "",    tupleIsAlive=1 '\001', state=0x7ffe420ff550) at
nbtree.c:179      buildstate = 0x7ffe420ff550
 
#6  0x0000000000525d8e in IndexBuildHeapRangeScan
(heapRelation=0x7fd040f32f78, indexRelation=0x7fd040f3b8e0,
indexInfo=0x2348308,    allow_sync=1 '\001', anyvisible=0 '\000', start_blockno=0,
numblocks=4294967295, callback=0x4cba0a <btbuildCallback>,    callback_state=0x7ffe420ff550) at index.c:2591
tupleIsAlive= 1 '\001'       is_system_catalog = 0 '\000'       checking_uniqueness = 0 '\000'       scan = 0x234b9e8
   heapTuple = 0x234ba30       values = {4710889527840951089, 9472000, 36863416, 1089733344, 
140730006762416, 9195433, 140730006762448, 140532419658520, 140730006762528,
         140532419658464, 140730006762448, 9261444, 1976, 140532419658520,
4999282, 128, 36962306, 17179869199, 140730006762544, 9473335, 37029384,          37020152, 18288211008, 9498080,
37029368,37020152, 
140730006762592, 9478487, 140730006762624, 37029384, 64, 37020152}       isnull =
"\000\314\366@\320\177\000\000'Z\216\000\000\000\000\000\030\003\364@\320\177\000\000\310A3\002\000\000\000"
reltuples= 1342177279       predicate = 0x0       slot = 0x2348e08       estate = 0x2358448
 
---Type <return> to continue, or q <return> to quit---       econtext = 0x2358558       snapshot = 0xd366e0
OldestXmin= 1148880660       root_blkno = 16570089       root_offsets = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,          34, 35, 36, 37, 38, 39, 40, 41, 42,
43,44, 45, 46, 47, 48, 49, 
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
         69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 0 <repeats 210
times>}       __func__ = "IndexBuildHeapRangeScan"
#7  0x0000000000525556 in IndexBuildHeapScan (heapRelation=0x7fd040f32f78,
indexRelation=0x7fd040f3b8e0, indexInfo=0x2348308, allow_sync=1 '\001',    callback=0x4cba0a <btbuildCallback>,
callback_state=0x7ffe420ff550)at 
index.c:2162
No locals.
#8  0x00000000004cb979 in btbuild (fcinfo=0x7ffe420ff5d0) at nbtree.c:121       heap = 0x7fd040f32f78       index =
0x7fd040f3b8e0      indexInfo = 0x2348308       result = 0x234be28       reltuples = 6.9529861680561111e-310
buildstate= {isUnique = 0 '\000', haveDead = 0 '\000', heapRel = 
0x7fd040f32f78, spool = 0x234cba0, spool2 = 0x0, indtuples = 1342177278}       __func__ = "btbuild"
#9  0x00000000008e8a13 in OidFunctionCall3Coll (functionId=338, collation=0,
arg1=140532419604344, arg2=140532419639520, arg3=36995848) at fmgr.c:1649       flinfo = {fn_addr = 0x4cb854 <btbuild>,
fn_oid= 338, fn_nargs = 3, 
fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002',          fn_extra = 0x0, fn_mcxt = 0x23340b8, fn_expr
=0x0}      fcinfo = {flinfo = 0x7ffe420ff980, context = 0x0, resultinfo = 0x0, 
fncollation = 0, isnull = 0 '\000', nargs = 3, arg = {140532419604344,            140532419639520, 36995848,
140532419656080,68756505104, 128, 
13, 17179869199, 140730006763184, 9472170, 128, 36023424, 140730006763152,
           17189342519, 140532419627520, 36913336, 36023424, 37017440,
37017424, 36913336, 140730006763200, 512, 1108342496, 25769803839,            140730006763248, 9473335,
140532419653944,36023424, 
26878146304, 6912158, 140532419653928, 36023424, 140730006763296, 9478487,
           140730006763296, 140532419653944, 0, 36023424, 140730006763328,
9230214, 672953898141726960, 140532419653944, 140730006763792, 9231509,            10999411261461, 140532419639520,
70458938492543,156684292, 0, 
18446744069414584320, 65536, 0, 140532419654472, 140532419654744,            672953910093598724, 16405, 0, 0, 0, 0, 0,
0,0, 0, 0, 0, 
140532419656112, 140532419656056, 140532419657096, 37019880, 37027816,
37028344,            37028368, 37028392, 37028416, 37028584, 0, 0, 0, 0, 0, 0,
37028560, 0, 0, 0, 0, 36491160, 0, 0, 0, 13854912, 140532419640320,
8626848200,            36633224, 8589934592, 140730006763808, 6798261,
672953909936914436, 13854912},          argnull =

"\000\000\000\000\000\000\000\000\240h\323\000\000\000\000\000`\371\017B\376\177\000\000\032\274g\000\000\000\000\000\240h\323\000\
000\000\000\000\023\000\000\000\016\000\000\000\300\371\017B\376\177\000\000\aۍ",
'\000' <repeats 13 times>, "hD\224\000\000\000\000\000p<\224\000\000\
000\000\000\331\a\000\000\016\000\000\000\260\371\017B"}       result = 42949672962       __func__ =
"OidFunctionCall3Coll"
#10 0x00000000005252a3 in index_build (heapRelation=0x7fd040f32f78,
indexRelation=0x7fd040f3b8e0, indexInfo=0x2348308, isprimary=0 '\000', 
---Type <return> to continue, or q <return> to quit---   isreindex=0 '\000') at index.c:2025       procedure = 338
stats = 0x234cfec       save_userid = 10       save_sec_context = 0       save_nestlevel = 2       __func__ =
"index_build"
#11 0x0000000000523f98 in index_create (heapRelation=0x7fd040f32f78,
indexRelationName=0x234b8e8 "vvv_single_ks_sorted_q3c_ang2ipix_idx",    indexRelationId=156684292, relFileNode=0,
indexInfo=0x2348308,
indexColNames=0x234b638, accessMethodObjectId=403, tableSpaceId=0,    collationObjectId=0x234bdf8,
classObjectId=0x234be10,
coloptions=0x234be28, reloptions=0, isprimary=0 '\000', isconstraint=0
'\000',    deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0
'\000', skip_build=0 '\000', concurrent=0 '\000', is_internal=0 '\000',    if_not_exists=0 '\000') at index.c:1100
heapRelationId = 156673270       pg_class = 0x7fd040f81208       indexRelation = 0x7fd040f3b8e0       indexTupDesc =
0x23486c8      shared_relation = 0 '\000'       mapped_relation = 0 '\000'       is_exclusion = 0 '\000'
                                                                                            [120/270]       namespaceId
=16842       i = 1       relpersistence = 112 'p'       __func__ = "index_create"
 
#12 0x00000000005e9d27 in DefineIndex (relationId=156673270, stmt=0x23485f8,
indexRelationId=0, is_alter_table=0 '\000', check_rights=1 '\001',    skip_build=0 '\000', quiet=0 '\000') at
indexcmds.c:607      indexRelationName = 0x234b8e8 
"vvv_single_ks_sorted_q3c_ang2ipix_idx"       accessMethodName = 0x2348930 "btree"       typeObjectId = 0x234b780
collationObjectId= 0x234bdf8       classObjectId = 0x234be10       accessMethodId = 403       namespaceId = 16842
tablespaceId= 0       indexColNames = 0x234b638       rel = 0x7fd040f32f78       indexRelation = 0x23340b8       tuple
=0x7fd040f39b30
 
---Type <return> to continue, or q <return> to quit---       accessMethodForm = 0x7fd040f39ba8       amcanorder = 1
'\001'      amoptions = 2785       reloptions = 0       coloptions = 0x234be28       indexInfo = 0x2348308
numberOfAttributes= 1       limitXmin = 0       old_snapshots = 0x7fd040f32f78       address = {classId = 36997560,
objectId= 0, objectSubId = 
36995848}       n_old_snapshots = 0       heaprelid = {relId = 1108343952, dbId = 32766}       heaplocktag =
{locktag_field1= 4657712, locktag_field2 = 0, 
locktag_field3 = 1108347536, locktag_field4 = 32766, locktag_type = 0
'\000',          locktag_lockmethodid = 0 '\000'}       lockmode = 5       snapshot = 0x2348308       i = 0
__func__= "DefineIndex"
 
#13 0x00000000007ab5ec in ProcessUtilitySlow (parsetree=0x230c138,    queryString=0x230b268 "create index on
cgonzal.vvv_single_ks_sorted
(q3c_ang2ipix(ra,dec));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,    dest=0x230c4d8, completionTag=0x7ffe42100420
"")at utility.c:1259       stmt = 0x23485f8       relid = 156673270       lockmode = 5       save_exception_stack =
0x7ffe421002e0      save_context_stack = 0x0       local_sigjmp_buf = {{__jmpbuf = {0, 8080871256505359237, 4657712, 
140730006768272, 0, 0, 8080871325866564485, -8081285932728411259},            __mask_was_saved = 0, __saved_mask =
{__val= {64, 36632424, 
140730006765464, 140730006765472, 13829056, 8192, 36973152, 4657712, 5,                140730006765328, 9476353, 64, 0,
36973248,13829056, 
64}}}}       isTopLevel = 1 '\001'       isCompleteQuery = 1 '\001'       needCleanup = 0 '\000'       commandCollected
=0 '\000'       address = {classId = 0, objectId = 0, objectSubId = 13829056}       secondaryObject = {classId = 0,
objectId= 0, objectSubId = 0}       __func__ = "ProcessUtilitySlow"
 
#14 0x00000000007aaa16 in standard_ProcessUtility (parsetree=0x230c138, 
---Type <return> to continue, or q <return> to quit---   queryString=0x230b268 "create index on
cgonzal.vvv_single_ks_sorted
(q3c_ang2ipix(ra,dec));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,    dest=0x230c4d8, completionTag=0x7ffe42100420
"")at utility.c:892       isTopLevel = 1 '\001'       __func__ = "standard_ProcessUtility"
 
#15 0x00000000007a9beb in ProcessUtility (parsetree=0x230c138,    queryString=0x230b268 "create index on
cgonzal.vvv_single_ks_sorted
(q3c_ang2ipix(ra,dec));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,    dest=0x230c4d8, completionTag=0x7ffe42100420
"")at utility.c:334
 
No locals.
#16 0x00000000007a8e07 in PortalRunUtility (portal=0x2278798,
utilityStmt=0x230c138, isTopLevel=1 '\001', dest=0x230c4d8,    completionTag=0x7ffe42100420 "") at pquery.c:1183
active_snapshot_set= 1 '\001'       __func__ = "PortalRunUtility"
 
#17 0x00000000007a8fae in PortalRunMulti (portal=0x2278798, isTopLevel=1
'\001', dest=0x230c4d8, altdest=0x230c4d8,
completionTag=0x7ffe4210042[50/270]   at pquery.c:1314       stmt = 0x230c138       active_snapshot_set = 0 '\000'
stmtlist_item = 0x230c488
 
#18 0x00000000007a85c2 in PortalRun (portal=0x2278798,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x230c4d8,
altdest=0x230c4d8,    completionTag=0x7ffe42100420 "") at pquery.c:812       save_exception_stack = 0x7ffe42100560
save_context_stack = 0x0       local_sigjmp_buf = {{__jmpbuf = {0, 8080871256352267141, 4657712, 
140730006768272, 0, 0, 8080871256442444677, -8081285932000961659},            __mask_was_saved = 0, __saved_mask =
{__val= {3432, 9356099, 
36745776, 13, 0, 140730006766512, 9477730, 36624768, 88, 0, 36750640, 88,                9359107, 36750552, 36750640,
0}}}}      result = 0 '\000'       nprocessed = 32766       saveTopTransactionResourceOwner = 0x22ef878
saveTopTransactionContext= 0x22ef768       saveActivePortal = 0x0       saveResourceOwner = 0x22ef878
savePortalContext= 0x0       saveMemoryContext = 0x22ef768       __func__ = "PortalRun"
 
#19 0x00000000007a2ac3 in exec_simple_query (query_string=0x230b268 "create
index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));")
[29/270]   at postgres.c:1104       parsetree = 0x230c138       portal = 0x2278798
---Type <return> to continue, or q <return> to quit---       snapshot_set = 0 '\000'       commandTag = 0xa4fc46
"CREATEINDEX"       completionTag = 
"\000\004\020B\376\177\000\000\243b\217\000\000\000\000\000p\004\020B\376\177\000\000\000
\000\000D\000\000\000p\004\020B\376\1
77\000\000\252i\217\000\000\000\000\000\002\000\000\000\002\000\000\000J\000\000\000\000\000\000"       querytree_list
=0x230c458       plantree_list = 0x230c4a8       receiver = 0x230c4d8       format = 0       dest = DestRemote
oldcontext= 0x22ef768       parsetree_list = 0x230c1e8       parsetree_item = 0x230c1c8       save_log_statement_stats
=0 '\000'       was_logged = 0 '\000'       isTopLevel = 1 '\001'       msec_str = 
"\260\004\020B\376\177\000\000\177:\217\000\000\000\000\000\006\000\000\000D\000\000\000h\262\060\002\000\000\000"
__func__ = "exec_simple_query"
 
#20 0x00000000007a69b2 in PostgresMain (argc=1, argv=0x225a220,
dbname=0x225a0d8 "wsdb", username=0x225a0b8 "postgres") at postgres.c:4051       query_string = 0x230b268 "create index
on
cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));"       firstchar = 81       input_message = {data = 0x230b268
"createindex on 
cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));", len = 69, maxlen =
1024,          cursor = 69}       local_sigjmp_buf = {{__jmpbuf = {0, 8080871256293546885, 4657712,
140730006768272, 0, 0, 8080871256322907013, -8081285930984760443},            __mask_was_saved = 1, __saved_mask =
{__val= {0, 36017624, 0, 
0, 0, 0, 1024, 0, 30064771199, 140730006767152, 9473335, 36150008,                36017624, 30064771088, 36150008,
36149992}}}}      send_ready_for_query = 0 '\000'       __func__ = "PostgresMain"
 
#21 0x0000000000732973 in BackendRun (port=0x22a3050) at postmaster.c:4255       av = 0x225a220       maxac = 2
ac= 1       secs = 552065929       usecs = 554900       i = 1       __func__ = "BackendRun"
 
#22 0x0000000000732106 in BackendStartup (port=0x22a3050) at
postmaster.c:3929       bn = 0x22a3230
---Type <return> to continue, or q <return> to quit---       pid = 0       __func__ = "BackendStartup"
#23 0x000000000072ea84 in ServerLoop () at postmaster.c:1699       port = 0x22a3050       i = 4       rmask = {fds_bits
={128, 0 <repeats 15 times>}}       selres = 1       now = 1498750719       readmask = {fds_bits = {248, 0 <repeats 15
times>}}      nSockets = 8       last_lockfile_recheck_time = 1498750679       last_touch_time = 1498750679
__func__= "ServerLoop"
 
#24 0x000000000072e100 in PostmasterMain (argc=3, argv=0x2259310) at
postmaster.c:1307       opt = -1       status = 0       userDoption = 0x227ad40 "/mnt/bigdata/pgdata9.5"
listen_addr_saved= 1 '\001'       i = 64       output_config_variable = 0x0       __func__ = "PostmasterMain"
 
#25 0x000000000068ecda in main (argc=3, argv=0x2259310) at main.c:228       do_check_root = 1 '\001'


---- 

From a quick look of the code it looks to me that the reason for the bug is
the 32 bit int overflow in the j=2*i+1 calculation inside the
tuplesort_heap_siftup leading to negative values of j. 


Regards,         Sergey Koposov



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

29 June 2017, 19:34:40

On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> I have a very large table (40e9 records) that I'm trying to create the index
> on and I am getting a segmentation fault that could be traced as far as I
> understand to a 32 bit int overflow in tuplesort_heap_siftup

Can your print the Tuplesortstate (the variable "state") within GDB,
and post it here?


-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Sergey Koposov

Date:

29 June 2017, 19:50:31

On Thu, 2017-06-29 at 09:34 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> > I have a very large table (40e9 records) that I'm trying to create the index
> > on and I am getting a segmentation fault that could be traced as far as I
> > understand to a 32 bit int overflow in tuplesort_heap_siftup
> 
> Can your print the Tuplesortstate (the variable "state") within GDB,
> and post it here?

Here it is (it is a different run, as I closed the previous session,
but the bug is 100% reproduceable). 

Program received signal SIGSEGV, Segmentation fault.
0x0000000000914cf8 in tuplesort_heap_siftup (state=0x234ffe8,
checkIndex=1 '\001') at tuplesort.c:3014
3014                            HEAPCOMPARE(&memtuples[j], &memtuples[j
+ 1]) > 0)
(gdb) print (state)
$1 = (Tuplesortstate *) 0x234ffe8
(gdb) print (*state)
$2 = {status = TSS_BUILDRUNS, nKeys = 1, randomAccess = 0 '\000',
bounded = 0 '\000', boundUsed = 0 '\000', bound = 0, availMem =
-6442450776,  allowedMem = 75161927680, maxTapes = 262144, tapeRange = 262143,
sortcontext = 0x234e5c8, tapeset = 0x7fbf032a3048,  comparetup = 0x918d8a <comparetup_index_btree>, copytup = 0x919bbc
<copytup_index>, writetup = 0x91a243 <writetup_index>,  readtup = 0x91a31f <readtup_index>, memtuples =
0x7fb283aa1048,
memtupcount = 1342177275, memtupsize = 1342177279, growmemtuples = 0
'\000',  currentRun = 0, mergeactive = 0x7fc80aab9048 "", mergenext =
0x7fc80a23e048, mergelast = 0x7fc809ebc048, mergeavailslots =
0x7fc809dbb048,  mergeavailmem = 0x7fc8096ba048, mergefreelist = 0, mergefirstfree = 0,
Level = 1, destTape = 0, tp_fib = 0x7fc8095b9048, tp_runs =
0x7fc8094b8048,  tp_dummy = 0x7fc8093b7048, tp_tapenum = 0x7fc8081b7048, activeTapes =
0, result_tape = -1, current = 0, eof_reached = 0 '\000', markpos_block
= 0,  markpos_offset = 0, markpos_eof = 0 '\000', tupDesc = 0x0, sortKeys =
0x2350288, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate =
0x0,  heapRel = 0x7fd040f32f78, indexRel = 0x7fd040f3b8e0, enforceUnique = 0
'\000', hash_mask = 0, datumType = 0, datumTypeLen = 0,  datumTypeByVal = 0 '\000', ru_start = {tv = {tv_sec = 0,
tv_usec= 0},
 
ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0,        tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0,
ru_idrss= 0,
 
ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock =
0,      ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0,
ru_nvcsw = 0, ru_nivcsw = 0}}}


Regards,         Sergey


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

29 June 2017, 20:00:41

On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> From a quick look of the code it looks to me that the reason for the bug is
> the 32 bit int overflow in the j=2*i+1 calculation inside the
> tuplesort_heap_siftup leading to negative values of j.

It seems likely that the explanation is as simple as that. This
happens during run generation with replacement selection. All versions
are affected, but version 9.6+ is dramatically less likely to be
affected, because replacement selection was all but killed in Postgres
9.6.

This is an oversight in commit 263865a. The fix is to use a variable
that won't overflow in tuplesort_heap_siftup() -- this is probably a
one-liner, because when the variable overflows today, the correct
behavior would be for control to break out of the loop that declares
the overflowing variable "j", and, I don't see any similar problem in
other heap maintenance routines. It's a very isolated problem.

I could write a patch.

-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Sergey Koposov

Date:

29 June 2017, 20:50:36

On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> > From a quick look of the code it looks to me that the reason for the bug is
> > the 32 bit int overflow in the j=2*i+1 calculation inside the
> > tuplesort_heap_siftup leading to negative values of j.
> 
> It seems likely that the explanation is as simple as that. This
> happens during run generation with replacement selection. All versions
> are affected, but version 9.6+ is dramatically less likely to be
> affected, because replacement selection was all but killed in Postgres
> 9.6.
> 
> This is an oversight in commit 263865a. The fix is to use a variable
> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> one-liner, because when the variable overflows today, the correct
> behavior would be for control to break out of the loop that declares
> the overflowing variable "j", and, I don't see any similar problem in
> other heap maintenance routines. It's a very isolated problem.
> 
> I could write a patch.

For the time being I've just changed the type of i,j from int to long
(or int64) and I am running the index creation now. I let you submit a
patch -- thanks in advance. 
I also noticed that the availMem variable was negative in the printout
of the TupleSortState.  availMem =-6442450776, 
I don't know whether that's an issue on its own or was caused by the
(i,j) overflow. (availMem seems to be int64 variable though). 
 Sergey


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

29 June 2017, 21:10:13

On Thu, Jun 29, 2017 at 10:50 AM, Sergey Koposov <skoposov@cmu.edu> wrote:
> For the time being I've just changed the type of i,j from int to long
> (or int64) and I am running the index creation now. I let you submit a
> patch -- thanks in advance.

That's more or less what I had in mind.

>  I also noticed that the availMem variable was negative in the printout
> of the TupleSortState.
>  availMem =-6442450776,
> I don't know whether that's an issue on its own or was caused by the
> (i,j) overflow. (availMem seems to be int64 variable though).

That's definitely allowed to go negative, which is why it's int64.
That's about 6GB of memory, though, which seems unusually large.

-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

29 June 2017, 21:59:12

On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> Here are the commands leading to the crash:
>
> wsdb=# set maintenance_work_mem to '70GB';
>
> SET
> wsdb=# create index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));
>
>
> ----
>
> Importantly the table has already been sorted by q3c_ang2ipix(ra,dec) !

BTW, given the exact details of what you're doing here, using only 1MB
of maintenance_work_mem will probably make the operation complete
sooner.


-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Sergey Koposov

Date:

29 June 2017, 22:56:02

On Thu, 2017-06-29 at 11:59 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> > Here are the commands leading to the crash:
> >
> > wsdb=# set maintenance_work_mem to '70GB';
> >
> > SET
> > wsdb=# create index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));
> >
> >
> > ----
> >
> > Importantly the table has already been sorted by q3c_ang2ipix(ra,dec) !
> 
> BTW, given the exact details of what you're doing here, using only 1MB
> of maintenance_work_mem will probably make the operation complete
> sooner.
> 
> 

Thank you. It does seem to be ~ 50% faster. 
It is somewhat unfortunate that maintenance_work_mem is
counterproductive in this scenario, given that indexing an ordered table
is a natural step in clustering a table...  

Sergey



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Sergey Koposov

Date:

05 July 2017, 22:06:19

On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM,  <skoposov@cmu.edu> wrote:
> > From a quick look of the code it looks to me that the reason for the bug is
> > the 32 bit int overflow in the j=2*i+1 calculation inside the
> > tuplesort_heap_siftup leading to negative values of j.
> 
> It seems likely that the explanation is as simple as that. This
> happens during run generation with replacement selection. All versions
> are affected, but version 9.6+ is dramatically less likely to be
> affected, because replacement selection was all but killed in Postgres
> 9.6.
> 
> This is an oversight in commit 263865a. The fix is to use a variable
> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> one-liner, because when the variable overflows today, the correct
> behavior would be for control to break out of the loop that declares
> the overflowing variable "j", and, I don't see any similar problem in
> other heap maintenance routines. It's a very isolated problem.
> 
> I could write a patch.

Just to avoid being forgotten, I attach a trivial patch against 9.5
branch as well as have created a commitfest submission
https://commitfest.postgresql.org/14/1189/

The script below allows to reproduce the bug (segfault) and test that
the patch fixes it:  (>~70 GB of RAM are needed and 100+GB of disk
space)
---
create table xx3 as 
 select generate_series as a
    from generate_series(0,(1.5*((1::bigint)<<31))::bigint);
set maintenance_work_mem to '70GB';
create index on xx3(a);
----

Hopefully somebody can take care of patching other PG branches. 

         Sergey

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

x.patch

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Tom Lane

Date:

06 July 2017, 01:03:56

Sergey Koposov <skoposov@cmu.edu> writes:
> On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
>> This is an oversight in commit 263865a. The fix is to use a variable
>> that won't overflow in tuplesort_heap_siftup() -- this is probably a
>> one-liner, because when the variable overflows today, the correct
>> behavior would be for control to break out of the loop that declares
>> the overflowing variable "j", and, I don't see any similar problem in
>> other heap maintenance routines. It's a very isolated problem.
>> 
>> I could write a patch.

> Just to avoid being forgotten, I attach a trivial patch against 9.5
> branch as well as have created a commitfest submission
> https://commitfest.postgresql.org/14/1189/

I don't like s/int/int64/g as a fix for this.  That loop is probably
a hot spot, and this fix is going to be expensive on any machine where
int64 isn't the native word width.  How about something like this instead:

-        int            j = 2 * i + 1;
+        int            j;

+        if (unlikely(i > INT_MAX / 2))
+            break;        /* if j would overflow, we're done */
+        j = 2 * i + 1;    if (j >= n)        break;

        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Andres Freund

Date:

06 July 2017, 01:14:14

On 2017-07-05 18:03:56 -0400, Tom Lane wrote:
> Sergey Koposov <skoposov@cmu.edu> writes:
> > On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> >> This is an oversight in commit 263865a. The fix is to use a variable
> >> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> >> one-liner, because when the variable overflows today, the correct
> >> behavior would be for control to break out of the loop that declares
> >> the overflowing variable "j", and, I don't see any similar problem in
> >> other heap maintenance routines. It's a very isolated problem.
> >> 
> >> I could write a patch.
> 
> > Just to avoid being forgotten, I attach a trivial patch against 9.5
> > branch as well as have created a commitfest submission
> > https://commitfest.postgresql.org/14/1189/
> 
> I don't like s/int/int64/g as a fix for this.  That loop is probably
> a hot spot, and this fix is going to be expensive on any machine where
> int64 isn't the native word width.  How about something like this instead:
> 
> -        int            j = 2 * i + 1;
> +        int            j;
> 
> +        if (unlikely(i > INT_MAX / 2))
> +            break;        /* if j would overflow, we're done */
> +        j = 2 * i + 1;
>         if (j >= n)
>             break;

Isn't an added conditional likely going to be more costly than the
s/32/64/ bit calculations on the majority of machines pg runs on? I'm
quite doubtful that it's worth catering for the few cases where that's
really slow.

- Andres


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

06 July 2017, 01:19:57

On Wed, Jul 5, 2017 at 3:14 PM, Andres Freund <andres@anarazel.de> wrote:
> Isn't an added conditional likely going to be more costly than the
> s/32/64/ bit calculations on the majority of machines pg runs on? I'm
> quite doubtful that it's worth catering for the few cases where that's
> really slow.

I doubt that myself. Especially prior to Postgres 10, where merging
will have tuplesort_heap_insert() as the bottleneck.

In Postgres 10, tuplesort external sort run merging became much faster
following commit 24598337c8d. It might be noticeable if such a machine
were using Postgres 10 already, and really had something to lose, but
that seems very unlikely.

-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

06 July 2017, 01:35:30

On Wed, Jul 5, 2017 at 3:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I doubt that myself. Especially prior to Postgres 10, where merging
> will have tuplesort_heap_insert() as the bottleneck.

BTW, I do of course understand that the reporter isn't talking about
merging, but rather is talking about run generation using replacement
selection. Replacement selection is more or less obsolete, and was
more or less removed in Postgres 9.6. I don't think that its needs
should be given much weight here; this is mostly about merging.

I wanted to make merging use its own dedicated function within the
patch that became 24598337c8d, and leave tuplesort_heap_insert() and
tuplesort_heap_siftup() as things used only by replacement selection +
Top-N heap sort, but Heikki overruled me on that. He might have been
right about that. I'm not sure, and never took the time to follow up
with it.

tuplesort_heap_replace_top(), which is where this logic lives now that
tuplesort_heap_siftup() was broken up, certainly is a red-hot code
path where individual instructions could matter. I've looked at the
disassembly of the function in the past, which is rare for me.

-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Heikki Linnakangas

Date:

12 July 2017, 16:15:10

On 07/06/2017 01:14 AM, Andres Freund wrote:
> On 2017-07-05 18:03:56 -0400, Tom Lane wrote:
>> I don't like s/int/int64/g as a fix for this.  That loop is probably
>> a hot spot, and this fix is going to be expensive on any machine where
>> int64 isn't the native word width.  How about something like this instead:
>>
>> -        int            j = 2 * i + 1;
>> +        int            j;
>>
>> +        if (unlikely(i > INT_MAX / 2))
>> +            break;        /* if j would overflow, we're done */
>> +        j = 2 * i + 1;
>>         if (j >= n)
>>             break;
>
> Isn't an added conditional likely going to be more costly than the
> s/32/64/ bit calculations on the majority of machines pg runs on? I'm
> quite doubtful that it's worth catering for the few cases where that's
> really slow.

Another option to use "unsigned int", on the assumption that UINT_MAX >= 
INT_MAX * 2 + 1. And to eliminate that assumption, we can use (UINT_MAX 
- 1) / 2 as the maximum size of the memtuples array, rather than INT_MAX.

- Heikki



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Tom Lane

Date:

12 July 2017, 18:42:04

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 07/06/2017 01:14 AM, Andres Freund wrote:
>> On 2017-07-05 18:03:56 -0400, Tom Lane wrote:
>>> I don't like s/int/int64/g as a fix for this.  That loop is probably
>>> a hot spot, and this fix is going to be expensive on any machine where
>>> int64 isn't the native word width.  How about something like this instead:

> Another option to use "unsigned int", on the assumption that UINT_MAX >= 
> INT_MAX * 2 + 1.

Ah, that seems like a fine idea.

> And to eliminate that assumption, we can use (UINT_MAX 
> - 1) / 2 as the maximum size of the memtuples array, rather than INT_MAX.

Uh ... what assumption?  That's certainly true on any twos-complement
machine.  Besides, if you're worried about hypothetical portability
issues, I'm not sure it's any better to assume that (UINT_MAX - 1) / 2
fits in a signed int.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

12 July 2017, 18:47:57

On Wed, Jul 12, 2017 at 8:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>> INT_MAX * 2 + 1.
>
> Ah, that seems like a fine idea.

Works for me.

-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Tom Lane

Date:

12 July 2017, 19:14:54

Peter Geoghegan <pg@bowt.ie> writes:
> On Wed, Jul 12, 2017 at 8:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>>> INT_MAX * 2 + 1.

>> Ah, that seems like a fine idea.

> Works for me.

I'll go make it so, unless Heikki's already on it?
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Heikki Linnakangas

Date:

12 July 2017, 19:20:53

On 07/12/2017 07:14 PM, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
>> On Wed, Jul 12, 2017 at 8:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>>>> INT_MAX * 2 + 1.
>
>>> Ah, that seems like a fine idea.
>
>> Works for me.
>
> I'll go make it so, unless Heikki's already on it?

I'm not. Thanks!

>> And to eliminate that assumption, we can use (UINT_MAX
>> - 1) / 2 as the maximum size of the memtuples array, rather than INT_MAX.
>
> Uh ... what assumption?  That's certainly true on any twos-complement
> machine.  Besides, if you're worried about hypothetical portability
> issues, ...

Right, it's a hypothetical portability issue. The assumption we're 
making is that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system 
where it's not true, but I don't know what the C standards say about that.

> ... I'm not sure it's any better to assume that (UINT_MAX - 1) / 2
> fits in a signed int.

Well, you could do Min(INT_MAX, (UINT_MAX - 1 / 2). Or just add a 
StaticAssertion for it. Or just note in a comment that we're making that 
assumption.

- Heikki



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Tom Lane

Date:

12 July 2017, 19:46:56

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 07/12/2017 07:14 PM, Tom Lane wrote:
>> Uh ... what assumption?  That's certainly true on any twos-complement
>> machine.  Besides, if you're worried about hypothetical portability
>> issues, ...

> Right, it's a hypothetical portability issue. The assumption we're 
> making is that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system 
> where it's not true, but I don't know what the C standards say about that.

Actually, at the top of the loop we have i < n <= INT_MAX, so the
assumption only needs to be UINT_MAX >= INT_MAX * 2 - 1.

My copy of the C99 draft says, among other things

6.2.5  Types
      [#6] For each of  the  signed  integer  types,  there  is  a      corresponding   (but   different)   unsigned
integer type      (designated with the keyword unsigned) that  uses  the  same      amount  of  storage (including sign
information)and has the      same  alignment  requirements.
 
      [#9] The range of nonnegative values  of  a  signed  integer      type  is  a  subrange  of the corresponding
unsignedinteger      type, and the representation of the same value in each  type      is the same.
 

6.2.6.2  Integer types
      [#1]  For  unsigned  integer types other than unsigned char,      the bits of the object representation shall be
divided into      two  groups:  value bits and padding bits (there need not be      any of the latter).  If there are N
value  bits,  each  bit      shall  represent  a different power of 2 between 1 and 2N-1,      so  that  objects  of
that type  shall   be   capable   of      representing  values  from  0  to  2N-1  using a pure binary
representation; this  shall   be   known   as   the   value      representation.    The   values  of  any  padding
bits are      unspecified.37)
 
      [#2]  For  signed  integer  types,  the  bits  of the object      representation shall be divided  into  three
groups: value      bits, padding bits, and the sign bit.  There need not be any      padding bits; there shall be
exactlyone sign bit.  Each bit      that  is  a  value bit shall have the same value as the same      bit  in  the
object representation  of  the  corresponding      unsigned  type (if there are M value bits in the signed type
andN in the unsigned type, then M<=N).  If the sign bit  is      zero,  it shall not affect the resulting value.  If
thesign      bit is one, then the value shall be modified in one  of  the      following ways:
 
        -- the corresponding value with sign bit 0 is negated;
        -- the sign bit has the value -2N;
        -- the sign bit has the value 1-2N.
      [#3]  The  values of any padding bits are unspecified.

Both 6.2.5.9 and 6.2.6.2.2 constrain INT_MAX to be <= UINT_MAX.
In principle, you could have a conforming implementation in which
they were the same, and the sign bit of a signed integer was treated
as a useless padding bit in an unsigned integer.  I can't imagine
anyone actually building it that way though; what would be the point?
But as long as there aren't wasted bits in an unsigned int, these
rules require INT_MAX to be <= UINT_MAX/2, AFAICS.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

12 July 2017, 19:48:29

On Wed, Jul 12, 2017 at 9:20 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Uh ... what assumption?  That's certainly true on any twos-complement
>> machine.  Besides, if you're worried about hypothetical portability
>> issues, ...
>
>
> Right, it's a hypothetical portability issue. The assumption we're making is
> that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system where it's not
> true, but I don't know what the C standards say about that.

Intuitively, it seems very likely to be true, since two's complement
arithmetic is already assumed by Postgres, and addition and
multiplication work the same way at the instruction level for unsigned
and signed (two's complement) integers. In order for this to break,
int and unsigned int would have to have different widths.

> Well, you could do Min(INT_MAX, (UINT_MAX - 1 / 2). Or just add a
> StaticAssertion for it. Or just note in a comment that we're making that
> assumption.

I like the idea of a StaticAssertion().

-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Tom Lane

Date:

12 July 2017, 19:53:08

Peter Geoghegan <pg@bowt.ie> writes:
> On Wed, Jul 12, 2017 at 9:20 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Right, it's a hypothetical portability issue. The assumption we're making is
>> that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system where it's not
>> true, but I don't know what the C standards say about that.

> Intuitively, it seems very likely to be true, since two's complement
> arithmetic is already assumed by Postgres,

... right.  There haven't been any non-twos-complement machines in the
wild for probably 50 years, and even if there were, this would be *way*
down the list of problems you'd have to fix to get Postgres to run on
one of them.  I think a comment is plenty sufficient.
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

12 July 2017, 20:16:02

On Wed, Jul 12, 2017 at 6:15 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Another option to use "unsigned int", on the assumption that UINT_MAX >=
> INT_MAX * 2 + 1. And to eliminate that assumption, we can use (UINT_MAX - 1)
> / 2 as the maximum size of the memtuples array, rather than INT_MAX.

FWIW, memtupcount is allowed to go negative. It just won't in this
function, per the assertion. So the fix should be specifically scoped
to only the one function that is affected. That's a good idea anyway,
of course.

-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

Heikki Linnakangas

Date:

12 July 2017, 20:42:55

On 07/12/2017 08:16 PM, Peter Geoghegan wrote:
> On Wed, Jul 12, 2017 at 6:15 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>> INT_MAX * 2 + 1. And to eliminate that assumption, we can use (UINT_MAX - 1)
>> / 2 as the maximum size of the memtuples array, rather than INT_MAX.
>
> FWIW, memtupcount is allowed to go negative.

Huh, say what?

- Heikki



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Peter Geoghegan

Date:

12 July 2017, 20:44:33

On Wed, Jul 12, 2017 at 10:42 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> FWIW, memtupcount is allowed to go negative.
>
>
> Huh, say what?

Well, there are defenses against that case, both in assertions and in
code like tuplesort_heap_delete_top(). In that sense, it's allowed to
go negative.

-- 
Peter Geoghegan

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bitoverflow

From

David Gould

Date:

15 July 2017, 00:17:45

On Wed, 12 Jul 2017 12:53:08 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> ... right.  There haven't been any non-twos-complement machines in the
> wild for probably 50 years, and even if there were, this would be *way*
> down the list of problems you'd have to fix to get Postgres to run on
> one of them.

Not quite 50 years. In 1979 had the "pleasure" of working at Bechtel on a
Univac 1110. Univac 1100 seris are ones-complement (with both positive
and negative zero!) with 36 bit longs, 18 bit ints and depending on character
mode either 9 bit ASCII or 6 bit FIELDDATA chars. 

Not even one year. UNISYS are still marketing this architecture as the UNISYS
ClearPath IX series, you can order one today.

Apparently it is true in computing that nothing ever dies.

These were actually fascinatingly weird machines, almost everything is
different from what we are used to:

https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series

Still, I think it is safe to wait until someone actually pays for a
postgresql port before considering ones-complement issues.

-dg

-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow

From

Tom Lane

Date:

15 July 2017, 00:45:30

David Gould <daveg@sonic.net> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... right.  There haven't been any non-twos-complement machines in the
>> wild for probably 50 years, and even if there were, this would be *way*
>> down the list of problems you'd have to fix to get Postgres to run on
>> one of them.

> Not quite 50 years. In 1979 had the "pleasure" of working at Bechtel on a
> Univac 1110. Univac 1100 seris are ones-complement (with both positive
> and negative zero!) with 36 bit longs, 18 bit ints and depending on character
> mode either 9 bit ASCII or 6 bit FIELDDATA chars.

Fun stuff.  Other than the ones-complement choice, this smells quite a bit
like the PDP-10 gear I used to use back when.

> Not even one year. UNISYS are still marketing this architecture as the UNISYS
> ClearPath IX series, you can order one today.

The recent models claim to be pure Intel though; if they're still
supporting the 1100 architecture, it must be through emulation.

> Still, I think it is safe to wait until someone actually pays for a
> postgresql port before considering ones-complement issues.

Yeah.  I actually suspect that the weird word size would be a much bigger
headache for us than the ones-complement business.  The other small
problem is that as far as I could find, they never went past a 24-bit
address space, which would make for at most 64MB worth of memory (with
9-bit "bytes").  In principle you could probably run modern Postgres
with so little RAM, but it wouldn't be of any real use.
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs