Seg fault when processing large SPI cursor (PG9.13) - Mailing list pgsql-hackers

From Fields, Zachary J. (MU-Student)
Subject Seg fault when processing large SPI cursor (PG9.13)
Date
Msg-id CF524B464816BB469FEBFD9AD660E92E3EF88ACF@CH1PRD0102MB112.prod.exchangelabs.com
Responses Re: Seg fault when processing large SPI cursor (PG9.13)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I'm working on PostgreSQL 9.13 (waiting for admin to push upgrades next week); in the meanwhile, I was curious whether there are any known bugs regarding large cursor fetches, or if I am to blame.

My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, the DB seg faults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work great.

I have watched top, and system memory usage climbs to 97.6% (from approx. 30 million records onward, then sways up and down), but it ultimately crashes when I try to get past the 48,889,856th record. I have tried odd and various block sizes: anything greater than 2^17 crashes at the fetch that would surpass 48,889,856 records, 2^16 hits the same sweet spot, and anything less than 2^16 actually crashes slightly earlier (noted in the comments in the code below).

To me it appears to be an obvious memory leak; the question is who caused it. I would typically assume I am to blame (and I may be), but the code is so simple (shown below) that I can't see how it could be me, unless I am misusing the SPI interface (which is totally possible).

Here is the code segment that is crashing...

    // Cursor variables
    const char *cursor_name = NULL;   // Postgres will self-assign a name
    const int arg_count = 0;          // No arguments will be passed
    Oid *arg_types = NULL;            // n/a
    Datum *arg_values = NULL;         // n/a
    const char *null_args = NULL;     // n/a
    bool read_only = true;            // read_only allows for optimization
    const int cursor_opts = CURSOR_OPT_NO_SCROLL;  // default cursor options
    bool forward = true;
    //const long fetch_count = FETCH_ALL;
    //const long fetch_count = 1048576; // 2^20 - last processed = 48,234,496
    //const long fetch_count = 524288;  // 2^19 - last processed = 48,758,784
    //const long fetch_count = 262144;  // 2^18 - last processed = 48,758,784
    const long fetch_count = 131072;    // 2^17 - last processed = 48,889,856
    //const long fetch_count = 65536;   // 2^16 - last processed = 48,889,856
    //const long fetch_count = 32768;   // 2^15 - last processed = 48,857,088
    //const long fetch_count = 16384;   // 2^14 - last processed = 48,791,552
    //const long fetch_count = 8192;    // 2^13 - last processed = 48,660,480
    //const long fetch_count = 4096;    // 2^12 - last processed = 48,398,336
    //const long fetch_count = 2048;    // 2^11
    //const long fetch_count = 1024;    // 2^10
    //const long fetch_count = 512;     // 2^9
    //const long fetch_count = 256;     // 2^8
    //const long fetch_count = 128;     // 2^7
    //const long fetch_count = 64;      // 2^6
    //const long fetch_count = 32;      // 2^5
    //const long fetch_count = 16;      // 2^4
    //const long fetch_count = 8;       // 2^3
    //const long fetch_count = 4;       // 2^2
    //const long fetch_count = 2;       // 2^1
    //const long fetch_count = 1;       // 2^0

    unsigned int i, j, end, stored;
    unsigned int result_counter = 0;
    float8 l1_norm;
    bool is_null = true;
    bool nulls[4];
    Datum result_tuple_datum[4];
    HeapTuple new_tuple;
    MemoryContext function_context;

    ResultCandidate *candidate, **candidates, *top, *free_candidate = NULL;
    KSieve<ResultCandidate *> sieve(result_cnt_);

    /*********************/
    /** Init SPI_cursor **/
    /*********************/

    // Connect to SPI
    if ( SPI_connect() != SPI_OK_CONNECT ) { return; }

    // Prepare and open SPI cursor
    Portal signature_cursor = SPI_cursor_open_with_args(cursor_name, sql_stmt_,
        arg_count, arg_types, arg_values, null_args, read_only, cursor_opts);

    do {
        // Fetch rows for processing (populates SPI_processed and SPI_tuptable)
        SPI_cursor_fetch(signature_cursor, forward, fetch_count);

        /************************/
        /** Process SPI_cursor **/
        /************************/

        // Iterate cursor and perform calculations
        for ( i = 0 ; i < SPI_processed ; ++i ) {
            // Transfer columns to work array
            for ( j = 1 ; j < 4 ; ++j ) {
                result_tuple_datum[j-1] = SPI_getbinval(SPI_tuptable->vals[i],
                                              SPI_tuptable->tupdesc, j, &is_null);
                nulls[j-1] = is_null;
            }

            // Special handling for final column
            Datum raw_double_array = SPI_getbinval(SPI_tuptable->vals[i],
                                         SPI_tuptable->tupdesc, 4, &is_null);
            nulls[3] = is_null;
            if ( is_null ) {
                l1_norm = FLT_MAX;
                result_tuple_datum[3] = PointerGetDatum(NULL);
            } else {
                // Transform binary into double array
                ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
                l1_norm = meanAbsoluteError(signature_,
                              (double *)ARR_DATA_PTR(pg_double_array),
                              (ARR_DIMS(pg_double_array))[0], 0);
                result_tuple_datum[3] = Float8GetDatum(l1_norm);
            }

            // Create and test candidate
            if ( free_candidate ) {
                candidate = free_candidate;
                free_candidate = NULL;
            } else {
                candidate = (ResultCandidate *)palloc(sizeof(ResultCandidate));
            }
            (*candidate).lat = DatumGetFloat8(result_tuple_datum[0]);
            (*candidate).null_lat = nulls[0];
            (*candidate).lon = DatumGetFloat8(result_tuple_datum[1]);
            (*candidate).null_lon = nulls[1];
            (*candidate).orientation = DatumGetFloat8(result_tuple_datum[2]);
            (*candidate).null_orientation = nulls[2];
            (*candidate).rank = l1_norm;
            (*candidate).null_rank = nulls[3];

            // Run candidate through sieve
            top = sieve.top();
            if ( !sieve.siftItem(candidate) ) {
                // Free non-filtered candidates
                free_candidate = candidate;
            } else if ( sieve.size() == result_cnt_ ) {
                // Free non-filtered candidates
                free_candidate = top;
            }
        }
        result_counter += i;
    } while ( SPI_processed );

    SPI_finish();

Is there an obvious error I'm overlooking, or is there a known bug (PG9.13) for large fetch sizes?

Thanks,
Zak

P.S. KSieve is POD encapsulating an array that has been allocated with palloc().
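P.P.S. One thing I have been wondering about, but have not yet verified, is whether I need to release per-batch allocations myself: the SPI docs provide SPI_freetuptable() for the tuple table that each SPI_cursor_fetch() allocates, and DatumGetArrayTypeP() can palloc a detoasted copy of the array that would otherwise live until SPI_finish(). A sketch of what I mean (same loop shape as above; whether this accounts for the growth I'm seeing is unverified):

```
do {
    SPI_cursor_fetch(signature_cursor, forward, fetch_count);

    for ( i = 0 ; i < SPI_processed ; ++i ) {
        Datum raw_double_array = SPI_getbinval(SPI_tuptable->vals[i],
                                     SPI_tuptable->tupdesc, 4, &is_null);
        if ( !is_null ) {
            /* DatumGetArrayTypeP() may palloc a detoasted copy; if it
             * did, free it once we're done with it. */
            ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
            /* ... meanAbsoluteError() etc. as above ... */
            if ( (Pointer) pg_double_array != DatumGetPointer(raw_double_array) )
                pfree(pg_double_array);
        }
    }
    result_counter += i;

    /* Each SPI_cursor_fetch() builds a fresh tuple table; free the
     * current one before fetching the next batch. */
    SPI_freetuptable(SPI_tuptable);
} while ( SPI_processed );
```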
