Seg fault when processing large SPI cursor (PG9.13) - Mailing list pgsql-hackers
From: Fields, Zachary J. (MU-Student)
Subject: Seg fault when processing large SPI cursor (PG9.13)
Date:
Msg-id: CF524B464816BB469FEBFD9AD660E92E3EF88ACF@CH1PRD0102MB112.prod.exchangelabs.com
Responses: Re: Seg fault when processing large SPI cursor (PG9.13) (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
I'm working on PostgreSQL 9.13 (waiting for admin to push upgrades next week); in the meantime, I was curious whether there are any known bugs regarding large cursor fetches, or if I am to blame.

My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, the DB seg faults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work great.

I have watched top, and system memory usage gets up to 97.6% (from approx. 30 million records onward, then sways up and down), but the backend ultimately crashes when I try to get past the 48,889,856th record. I have tried odd and various block sizes: anything greater than 2^17 crashes at the fetch that would have surpassed 48,889,856 records, 2^16 hits the same sweet spot, and anything less than 2^16 actually crashes slightly earlier (noted in comments in the code below).

To me, it appears to be an obvious memory leak; the question is who caused it. I would typically assume I am to blame (and I may be), but the code is so simple (shown below) that I can't see how it could be me, unless I am misusing pgsql (which is totally possible).

Here is the code segment that is crashing...

    // Cursor variables
    const char *cursor_name = NULL;               // Postgres will self-assign a name
    const int arg_count = 0;                      // No arguments will be passed
    Oid *arg_types = NULL;                        // n/a
    Datum *arg_values = NULL;                     // n/a
    const char *null_args = NULL;                 // n/a
    bool read_only = true;                        // read_only allows for optimization
    const int cursor_opts = CURSOR_OPT_NO_SCROLL; // default cursor options
    bool forward = true;
    //const long fetch_count = FETCH_ALL;
    //const long fetch_count = 1048576; // 2^20 - last processed = 48,234,496
    //const long fetch_count = 524288;  // 2^19 - last processed = 48,758,784
    //const long fetch_count = 262144;  // 2^18 - last processed = 48,758,784
    const long fetch_count = 131072;    // 2^17 - last processed = 48,889,856
    //const long fetch_count = 65536;   // 2^16 - last processed = 48,889,856
    //const long fetch_count = 32768;   // 2^15 - last processed = 48,857,088
    //const long fetch_count = 16384;   // 2^14 - last processed = 48,791,552
    //const long fetch_count = 8192;    // 2^13 - last processed = 48,660,480
    //const long fetch_count = 4096;    // 2^12 - last processed = 48,398,336
    //const long fetch_count = 2048;    // 2^11
    //const long fetch_count = 1024;    // 2^10
    //const long fetch_count = 512;     // 2^9
    //const long fetch_count = 256;     // 2^8
    //const long fetch_count = 128;     // 2^7
    //const long fetch_count = 64;      // 2^6
    //const long fetch_count = 32;      // 2^5
    //const long fetch_count = 16;      // 2^4
    //const long fetch_count = 8;       // 2^3
    //const long fetch_count = 4;       // 2^2
    //const long fetch_count = 2;       // 2^1
    //const long fetch_count = 1;       // 2^0

    unsigned int i, j, end, stored;
    unsigned int result_counter = 0;
    float8 l1_norm;
    bool is_null = true;
    bool nulls[4];
    Datum result_tuple_datum[4];
    HeapTuple new_tuple;
    MemoryContext function_context;

    ResultCandidate *candidate, **candidates, *top, *free_candidate = NULL;
    KSieve<ResultCandidate *> sieve(result_cnt_);

    /*********************/
    /** Init SPI_cursor **/
    /*********************/

    // Connect to SPI
    if ( SPI_connect() != SPI_OK_CONNECT ) { return; }

    // Prepare and open SPI cursor
    Portal signature_cursor = SPI_cursor_open_with_args(cursor_name, sql_stmt_, arg_count, arg_types, arg_values, null_args, read_only, cursor_opts);

    do {
        // Fetch rows for processing (populates SPI_processed and SPI_tuptable)
        SPI_cursor_fetch(signature_cursor, forward, fetch_count);

        /************************/
        /** Process SPI_cursor **/
        /************************/

        // Iterate cursor and perform calculations
        for (i = 0 ; i < SPI_processed ; ++i) {
            // Transfer columns to work array
            for ( j = 1 ; j < 4 ; ++j ) {
                result_tuple_datum[j-1] = SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, j, &is_null);
                nulls[j-1] = is_null;
            }

            // Special handling for final column
            Datum raw_double_array = SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 4, &is_null);
            nulls[3] = is_null;
            if ( is_null ) {
                l1_norm = FLT_MAX;
                result_tuple_datum[3] = PointerGetDatum(NULL);
            } else {
                // Transform binary into double array
                ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
                l1_norm = meanAbsoluteError(signature_, (double *)ARR_DATA_PTR(pg_double_array), (ARR_DIMS(pg_double_array))[0], 0);
                result_tuple_datum[3] = Float8GetDatum(l1_norm);
            }

            // Create and test candidate
            if ( free_candidate ) {
                candidate = free_candidate;
                free_candidate = NULL;
            } else {
                candidate = (ResultCandidate *)palloc(sizeof(ResultCandidate));
            }
            (*candidate).lat = DatumGetFloat8(result_tuple_datum[0]);
            (*candidate).null_lat = nulls[0];
            (*candidate).lon = DatumGetFloat8(result_tuple_datum[1]);
            (*candidate).null_lon = nulls[1];
            (*candidate).orientation = DatumGetFloat8(result_tuple_datum[2]);
            (*candidate).null_orientation = nulls[2];
            (*candidate).rank = l1_norm;
            (*candidate).null_rank = nulls[3];

            // Run candidate through sieve
            top = sieve.top();
            if ( !sieve.siftItem(candidate) ) {
                // Free non-filtered candidates
                free_candidate = candidate;
            } else if ( sieve.size() == result_cnt_ ) {
                // Free non-filtered candidates
                free_candidate = top;
            }
        }
        result_counter += i;
    } while ( SPI_processed );

    SPI_finish();

Is there an obvious error I'm overlooking, or is there a known bug (PG9.13) for large fetch sizes?

Thanks,
Zak

P.S. KSieve is POD encapsulating an array that has been allocated with palloc().