Thread: [BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

[BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

From
Sergey Konoplev
Date:
Hi,

PostgreSQL 9.2.4
Linux 2.6.32

listen_addresses =3D '*'
max_connections =3D 550
shared_buffers =3D 35GB
work_mem =3D 256MB
maintenance_work_mem =3D 1GB
shared_preload_libraries =3D 'pg_stat_statements'
bgwriter_delay =3D 10ms
bgwriter_lru_multiplier =3D 10.0
effective_io_concurrency =3D 32
wal_level =3D hot_standby
synchronous_commit =3D off
checkpoint_segments =3D 1024
checkpoint_timeout =3D 1h
checkpoint_completion_target =3D 0.9
checkpoint_warning =3D 5min
max_wal_senders =3D 3
wal_keep_segments =3D 2048
hot_standby =3D on
max_standby_streaming_delay =3D 5min
hot_standby_feedback =3D on
effective_cache_size =3D 133GB
log_destination =3D 'stderr'
logging_collector =3D on
log_directory =3D '/var/log/pgsql'
log_filename =3D 'postgresql-%Y-%m-%d.log'
log_truncate_on_rotation =3D on
log_rotation_age =3D 1d
log_rotation_size =3D 0
log_checkpoints =3D on
log_line_prefix =3D '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits =3D on
log_statement =3D 'ddl'
log_timezone =3D 'W-SU'
track_activity_query_size =3D 4096
autovacuum_max_workers =3D 5
autovacuum_naptime =3D 5s
autovacuum_vacuum_scale_factor =3D 0.05
autovacuum_analyze_scale_factor =3D 0.05
autovacuum_vacuum_cost_delay =3D 5ms
datestyle =3D 'iso, dmy'
timezone =3D 'W-SU'
lc_messages =3D 'en_US.UTF-8'
lc_monetary =3D 'ru_RU.UTF-8'
lc_numeric =3D 'ru_RU.UTF-8'
lc_time =3D 'ru_RU.UTF-8'
default_text_search_config =3D 'pg_catalog.russian'
pg_stat_statements.track =3D all

Yesterday we faced a segfault on a simple PL/Python function's call:

CREATE OR REPLACE FUNCTION test_test()
RETURNS TABLE("=D0=98=D0=B4=D0=B5=D0=BD=D1=82=D0=B8=D1=84=D0=B8=D0=BA=D1=82=
=D0=BE=D1=80" int, "=D0=93=D0=BE=D1=81. =D0=BD=D0=BE=D0=BC=D0=B5=D1=80" var=
char) AS
$BODY$
import time
cars =3D plpy.prepare("""
    select id, ident_code from cars limit 100
""")

for car in plpy.execute(cars):
    return car;
$BODY$
LANGUAGE plpythonu VOLATILE
COST 100
ROWS 1000;

It has happened just once and we didn't manage to reproduce this
problem. We have a huge amount of PL/Python code in the database and
have never had this kind of problem before.

Here is the backtrace from the coredump:

(gdb) bt f
#0  0x00000000007381d7 in pfree (pointer=3D0x7eff56a141a0) at mcxt.c:659
        header =3D 0x7eff56a14190
#1  0x00007eff56a14927 in PLy_output_tuple_funcs (arg=3D0x1ad6960,
desc=3D0x19d9520) at plpy_typeio.c:199
        i =3D <value optimized out>
        __func__ =3D "PLy_output_tuple_funcs"
#2  0x00007eff56a0f155 in PLy_function_build_args
(fcinfo=3D0x7fffa59cb6b0, proc=3D0x1ad6940) at plpy_exec.c:416
        desc =3D 0x19d9520
        save_exception_stack =3D 0x7fffa59cb420
        save_context_stack =3D 0x7fffa59cb630
        local_sigjmp_buf =3D {{
            __jmpbuf =3D {28236528, 5106765649720773924, 1, 28238888,
140735971899056, 140735971900064, -5106611986378781404,
-4962882071449170652},
            __mask_was_saved =3D 0,
            __saved_mask =3D {
              __val =3D {0 <repeats 16 times>}
            }
          }}
        arg =3D 0x0
        args =3D 0x7eff62979ef0
        i =3D <value optimized out>
        __func__ =3D "PLy_function_build_args"
#3  0x00007eff56a0f437 in PLy_exec_function (fcinfo=3D0x7fffa59cb6b0,
proc=3D0x1ad6940) at plpy_exec.c:59
        save_exception_stack =3D 0x7fffa59cb560
        save_context_stack =3D 0x7fffa59cb630
        local_sigjmp_buf =3D {{
            __jmpbuf =3D {28236528, 5106765649720773924, 1, 28238888,
140735971899056, 140735971900064, -5106611986395558620,
-4962882071897174748},
            __mask_was_saved =3D 0,
            __saved_mask =3D {
              __val =3D {13483862011252313380, 0, 0, 0, 192, 11739232,
601, 26726112, 139635135180228, 139635135180228, 7570912, 0, 26987960,
0, 8192, 8388608}
            }
          }}
        rv =3D <value optimized out>
        plargs =3D 0x0
        plrv =3D 0x0
        plerrcontext =3D {
          previous =3D 0x7fffa59cbaa0,
          callback =3D 0x7375b6 <AllocSetContextCreate+54>,
          arg =3D 0x1ad6940
        }
        __func__ =3D "PLy_exec_function"
#4  0x00007eff56a0ff04 in plpython_call_handler (fcinfo=3D0x7fffa59cb6b0)
at plpy_main.c:236
        funcoid =3D <value optimized out>
        proc =3D <value optimized out>
        save_exception_stack =3D 0x7fffa59cbc70
        save_context_stack =3D <value optimized out>
        local_sigjmp_buf =3D {{
            __jmpbuf =3D {28236528, 5106765649720773924, 1, 28238888,
140735971899056, 140735971900064, -5106611986208912092,
-4962882072010945244},
            __mask_was_saved =3D 0,
            __saved_mask =3D {
              __val =3D {0, 0, 281474976710656, 1043, 28238888,
3324408277, 28238888, 3324408277, 26985416, 0, 0, 28238920, 5790802, 0,
0, 0}
            }
          }}
---Type <return> to continue, or q <return> to quit---
        retval =3D <value optimized out>
        exec_ctx =3D 0x19bcdb8
        plerrcontext =3D {
          previous =3D 0x0,
          callback =3D 0x7eff56a0ffb0 <plpython_error_callback>,
          arg =3D 0x1aee428
        }
        __func__ =3D "plpython_call_handler"
#5  0x0000000000587fd5 in ExecMakeTableFunctionResult
(funcexpr=3D0x1aee428, econtext=3D0x1aedaf0, expectedDesc=3D0x1aee1e0,
randomAccess=3D0 '\000') at execQual.c:2156
        result =3D 5967294
        tupstore =3D 0x0
        tupdesc =3D 0x0
        funcrettype =3D 2249
        returnsTuple =3D 1 '\001'
        returnsSet =3D 1 '\001'
        fcinfo =3D {
          flinfo =3D 0x1aee448,
          context =3D 0x0,
          resultinfo =3D 0x7fffa59cba60,
          fncollation =3D 0,
          isnull =3D 0 '\000',
          nargs =3D 0,
          arg =3D {25859072, 0, 1043, 0 <repeats 20 times>, 7489698, 0,
0, 0, 64, 140735971899536, 4586382, 1, 2, 139635337082480, 7471000, 0,
0, 0, 420013441810432, 3324408277, 0,
            4294967295, 7433724, 25962000, 2, 8, 7482628, 0,
140735971899532, 0, 25962288, 8589934592, 25962040, 25962200, 25962176,
25962144, 8589934592, 4479650889751, 40, 25856768,
            1043, 4294967295, 100, 0, 7570165, 0, 25962928, 25952256,
25962288, 0, 140735971899960, 25962456, 5991898, 0, 26567760,
140735971852288, 7372282, 8589934594, 26567760, 1043,
            26567760, 140735971899792, 26567760, 140735971899856,
7372282, 26786736, 26567760, 1043, 26567760, 140735971899856, 7378619,
6981568, 140735971899856, 2, 0, 139635337081568,
            287762808832, 1125891316908032, 0, 6981568, 281483566645432},
          argnull =3D "\002", '\000' <repeats 16 times>"\224, \212\001",
'\000' <repeats 12 times>, "\023\004", '\000' <repeats 65 times>
        }
        fcusage =3D {
          fs =3D 0x0,
          save_f_total_time =3D {
            tv_sec =3D 0,
            tv_usec =3D 0
          },
          save_total =3D {
            tv_sec =3D 0,
            tv_usec =3D 0
          },
          f_start =3D {
            tv_sec =3D 0,
            tv_usec =3D 0
          }
        }
        rsinfo =3D {
          type =3D T_ReturnSetInfo,
          econtext =3D 0x1aedaf0,
          expectedDesc =3D 0x1aee1e0,
          allowedModes =3D 11,
          returnMode =3D SFRM_ValuePerCall,
          isDone =3D ExprSingleResult,
---Type <return> to continue, or q <return> to quit---
          setResult =3D 0x0,
          setDesc =3D 0x0
        }
        tmptup =3D {
          t_len =3D 0,
          t_self =3D {
            ip_blkid =3D {
              bi_hi =3D 0,
              bi_lo =3D 0
            },
            ip_posid =3D 0
          },
          t_tableOid =3D 0,
          t_data =3D 0x18d4b35
        }
        callerContext =3D 0x19bc3c8
        oldcontext =3D <value optimized out>
        direct_function_call =3D 1 '\001'
        first_time =3D 1 '\001'
        __func__ =3D "ExecMakeTableFunctionResult"
#6  0x0000000000599151 in FunctionNext (node=3D0x1aed9e0) at
nodeFunctionscan.c:65
        slot =3D <value optimized out>
        estate =3D <value optimized out>
        direction =3D ForwardScanDirection
        tuplestorestate =3D <value optimized out>
#7  0x0000000000588c3e in ExecScanFetch (node=3D0x1aed9e0,
accessMtd=3D0x5990e0 <FunctionNext>, recheckMtd=3D0x598e50
<FunctionRecheck>) at execScan.c:82
        estate =3D <value optimized out>
#8  ExecScan (node=3D0x1aed9e0, accessMtd=3D0x5990e0 <FunctionNext>,
recheckMtd=3D0x598e50 <FunctionRecheck>) at execScan.c:132
        econtext =3D 0x1aedaf0
        qual =3D 0x0
        projInfo =3D 0x0
        isDone =3D <value optimized out>
        resultSlot =3D <value optimized out>
#9  0x00000000005817f8 in ExecProcNode (node=3D0x1aed9e0) at
execProcnode.c:425
        result =3D <value optimized out>
        __func__ =3D "ExecProcNode"
#10 0x0000000000580722 in ExecutePlan (queryDesc=3D0x18d3990,
direction=3D<value optimized out>, count=3D0) at execMain.c:1395
        slot =3D <value optimized out>
        current_tuple_count =3D 0
#11 standard_ExecutorRun (queryDesc=3D0x18d3990, direction=3D<value
optimized out>, count=3D0) at execMain.c:303
        estate =3D 0x1aed8d0
        operation =3D CMD_SELECT
        dest =3D 0x18c2f20
        sendTuples =3D <value optimized out>
        oldcontext =3D 0x19bc2b8
#12 0x00007eff57fb2a4b in pgss_ExecutorRun (queryDesc=3D0x18d3990,
direction=3DForwardScanDirection, count=3D0) at pg_stat_statements.c:719
        save_exception_stack =3D 0x7fffa59cbe00
        save_context_stack =3D 0x0
        local_sigjmp_buf =3D {{
            __jmpbuf =3D {25985856, 5106765649720642852, 25964320,
26032528, 1, 25964272, -5106611986443793116, -4962879123083490012},
            __mask_was_saved =3D 0,
---Type <return> to continue, or q <return> to quit---
            __saved_mask =3D {
              __val =3D {5773925, 26032528, 28235984, 0, 25985856,
25964096, 5770657, 0, 28162928, 0, 25964320, 26032528, 1, 25964272,
7625096, 26782512}
            }
          }}
#13 0x00000000006568f7 in PortalRunSelect (portal=3D0x18c8340,
forward=3D<value optimized out>, count=3D0, dest=3D0x18c2f20) at pquery.c:9=
44
        queryDesc =3D 0x18d3990
        direction =3D ForwardScanDirection
        nprocessed =3D <value optimized out>
        __func__ =3D "PortalRunSelect"
#14 0x0000000000657b98 in PortalRun (portal=3D0x18c8340,
count=3D9223372036854775807, isTopLevel=3D1 '\001', dest=3D0x18c2f20,
altdest=3D0x18c2f20, completionTag=3D0x7fffa59cbf10 "") at pquery.c:788
        save_exception_stack =3D 0x7fffa59cc050
        save_context_stack =3D 0x0
        local_sigjmp_buf =3D {{
            __jmpbuf =3D {25964320, 5106769150708288804, 25985856,
26785952, 2, 25964272, -5106611986450084572, 5106765901172706596},
            __mask_was_saved =3D 0,
            __saved_mask =3D {
              __val =3D {7493551, 25964272, 26715728, 9017635, 25985856,
9017635, 80, 25856832, 25985856, 26785952, 2, 25964272, 7570597,
8762301, 2, 26785680}
            }
          }}
        result =3D <value optimized out>
        nprocessed =3D <value optimized out>
        saveTopTransactionResourceOwner =3D 0x197cff0
        saveTopTransactionContext =3D 0x197cee0
        saveActivePortal =3D 0x0
        saveResourceOwner =3D 0x197cff0
        savePortalContext =3D 0x0
        saveMemoryContext =3D 0x197cee0
        __func__ =3D "PortalRun"
#15 0x000000000065425b in exec_simple_query (query_string=3D0x198ab30 "
select * from test_test();") at postgres.c:1046
        parsetree =3D 0x198b790
        portal =3D 0x18c8340
        snapshot_set =3D <value optimized out>
        commandTag =3D <value optimized out>
        completionTag =3D
"\000\000\000\000\377\177\000\000\024\370\237\245\377\177\000\000p\277\234\=
245\377\177\000\000\060K\215\001\000\000\000\000P\277\234\245\377\177\000\0=
00\024\370\237\245\377\177\000\000\220\277\234\245\377\177\000\000\200\301\=
234\245\377\177\000"
        querytree_list =3D <value optimized out>
        plantree_list =3D 0x18c2ef0
        receiver =3D 0x18c2f20
        format =3D 0
        dest =3D DestRemote
        oldcontext =3D 0x197cee0
        parsetree_list =3D 0x198b8c0
        parsetree_item =3D 0x198b8a0
        save_log_statement_stats =3D 0 '\000'
        was_logged =3D <value optimized out>
        isTopLevel =3D <value optimized out>
        msec_str =3D
"\034\000\000\000\000\000\000\000\060\253\230\001\000\000\000\000\001\000\0=
00\000\000\000\000\000\201\200\200\200\200\200\200\200"
        __func__ =3D "exec_simple_query"
#16 0x00000000006550b0 in PostgresMain (argc=3D<value optimized out>,
argv=3D<value optimized out>, dbname=3D0x18aa4d0 "transport",
username=3D<value optimized out>) at postgres.c:3959
        query_string =3D 0x198ab30 "  select * from test_test();"
        firstchar =3D <value optimized out>
---Type <return> to continue, or q <return> to quit---
        input_message =3D {
          data =3D 0x198ab30 "  select * from test_test();",
          len =3D 29,
          maxlen =3D 1024,
          cursor =3D 29
        }
        local_sigjmp_buf =3D {{
            __jmpbuf =3D {140735971901792, 5106768996678765860, 1, 1,
-9187201950435737471, 0, -5106611986527679196, 5106765900793253156},
            __mask_was_saved =3D 1,
            __saved_mask =3D {
              __val =3D {0, 0, 0, 0, 0, 139635321201376, 139635337349552,
0, 4294967295, 0, 1, 11688832, 0, 9259542123273814145, 13970, 0}
            }
          }}
        send_ready_for_query =3D 0 '\000'
        __func__ =3D "PostgresMain"
#17 0x0000000000611991 in BackendRun () at postmaster.c:3614
        ac =3D 1
        secs =3D 439980748
        usecs =3D 431501
        i =3D <value optimized out>
        av =3D 0x18aa5e0
        maxac =3D <value optimized out>
#18 BackendStartup () at postmaster.c:3304
        bn =3D 0x18cf0a0
        pid =3D 0
#19 ServerLoop () at postmaster.c:1367
        rmask =3D {
          fds_bits =3D {8, 0 <repeats 15 times>}
        }
        selres =3D <value optimized out>
        readmask =3D {
          fds_bits =3D {56, 0 <repeats 15 times>}
        }
        nSockets =3D 6
        now =3D <value optimized out>
        last_touch_time =3D 1386663421
        __func__ =3D "ServerLoop"
#20 0x00000000006145a1 in PostmasterMain (argc=3D<value optimized out>,
argv=3D<value optimized out>) at postmaster.c:1127
        opt =3D <value optimized out>
        status =3D <value optimized out>
        userDoption =3D <value optimized out>
        listen_addr_saved =3D 1 '\001'
        i =3D <value optimized out>
        __func__ =3D "PostmasterMain"
#21 0x00000000005b3540 in main (argc=3D5, argv=3D0x18a7680) at main.c:199

Any thoughts?

--=20
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

From
Alvaro Herrera
Date:
Sergey Konoplev escribió:

> Yesterday we faced a segfault on a simple PL/Python function's call:
>
> CREATE OR REPLACE FUNCTION test_test()
> RETURNS TABLE("Идентификтор" int, "Гос. номер" varchar) AS
> $BODY$
> import time
> cars = plpy.prepare("""
>     select id, ident_code from cars limit 100
> """)

What's the likelihood that table "cars" was being modified concurrently?


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

From
Sergey Konoplev
Date:
On Tue, Dec 10, 2013 at 1:17 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> Yesterday we faced a segfault on a simple PL/Python function's call:
>>
>> CREATE OR REPLACE FUNCTION test_test()
>> RETURNS TABLE("=E9=C4=C5=CE=D4=C9=C6=C9=CB=D4=CF=D2" int, "=E7=CF=D3. =
=CE=CF=CD=C5=D2" varchar) AS
>> $BODY$
>> import time
>> cars =3D plpy.prepare("""
>>     select id, ident_code from cars limit 100
>> """)
>
> What's the likelihood that table "cars" was being modified concurrently?

It is rather high. Probably even very high.

--=20
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

From
Tom Lane
Date:
Sergey Konoplev <gray.ru@gmail.com> writes:
> On Tue, Dec 10, 2013 at 1:17 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> What's the likelihood that table "cars" was being modified concurrently?

> It is rather high. Probably even very high.

This doesn't smell like that's an issue though ...

Just eyeballing the code, I don't see how set-returning plpython functions
work at all.  Ever.  It looks like they allocate a bunch of stuff in the
SPI procedure context and expect it to still be there on the next call.
Why isn't this code falling over in any assert-enabled build?

            regards, tom lane

Re: [BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

From
Sergey Konoplev
Date:
Just want to remind about this issue.

On Tue, Dec 10, 2013 at 2:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sergey Konoplev <gray.ru@gmail.com> writes:
>> On Tue, Dec 10, 2013 at 1:17 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> What's the likelihood that table "cars" was being modified concurrently?
>
>> It is rather high. Probably even very high.
>
> This doesn't smell like that's an issue though ...
>
> Just eyeballing the code, I don't see how set-returning plpython functions
> work at all.  Ever.  It looks like they allocate a bunch of stuff in the
> SPI procedure context and expect it to still be there on the next call.
> Why isn't this code falling over in any assert-enabled build?
>
>                         regards, tom lane

^^^

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

Re: [BUG] Segmentation fault in pfree in PLy_output_tuple_funcs

From
Tom Lane
Date:
Sergey Konoplev <gray.ru@gmail.com> writes:
> Just want to remind about this issue.

My preliminary theory about this was all wrong, so I don't know what's
going on.  Unless you can reproduce it there's not much we can do.

            regards, tom lane