Re: [HACKERS] Postgres Speed or lack thereof - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Postgres Speed or lack thereof
Date
Msg-id 199903151428.JAA12005@candle.pha.pa.us
Whole thread Raw
In response to Re: [HACKERS] Postgres Speed or lack thereof  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Postgres Speed or lack thereof
List pgsql-hackers
Tom, I assume this has all been corrected, right?




> I wrote:
> > As for where the time is going: "top" showed that the time for the
> > INSERT ops was almost all going into backend CPU time.  My guess is
> > that most of the cost is in parsing and planning the INSERT
> > statements.
> 
> Having an idle hour this evening, I thought it'd be interesting to build
> a backend with profiling enabled, so as to confirm or deny the above
> guess.  It seems that indeed a lot of time is being wasted, but where
> it's being wasted might surprise you!
> 
> I profiled a backend built from this morning's current CVS sources,
> being driven by a single psql run reading a series of exactly 5000
> INSERT commands.  The first few are:
> 
> INSERT INTO "fehistory_1" values (64,14,252,'D','-','-','s',1,4027,NULL,'S',1,4088,NULL,32,'Thu Jan 07 15:15:00 1999
EST','FriJan 08 08:30:38 1999 EST',4027,32,NULL,NULL,46,NULL);
 
> INSERT INTO "fehistory_1" values (65,14,252,'P','-','-','S',1,4086,NULL,NULL,NULL,NULL,NULL,32,'Fri Jan 08 16:00:00
1999EST',NULL,NULL,NULL,NULL,NULL,45,NULL);
 
> INSERT INTO "fehistory_1" values (66,14,135,'P','-','-','S',1,13619,NULL,'s',1,12967,NULL,100,'Fri Jan 08 02:00:00
1999EST',NULL,NULL,NULL,NULL,NULL,44,NULL);
 
> 
> and it goes on for a long while in that vein :-).  Total runtime was
> 278.09 CPU seconds in the backend, 307 seconds elapsed realtime.  The
> table being inserted into has no indexes and was initially empty.
> 
> The profiler is GNU gprof, which provides both flat and structured
> profile data.  The top few entries in the flat profile are:
> 
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls  ms/call  ms/call  name    
>  34.96     97.21    97.21                             _mcount
>   7.18    117.19    19.98  8047816     0.00     0.00  tree_cut
>   6.56    135.43    18.24   962187     0.02     0.02  recv
>   3.55    145.31     9.88  8234278     0.00     0.00  tree_insert
>   3.49    155.02     9.71  4292947     0.00     0.01  malloc
>   3.31    164.22     9.20  1965056     0.00     0.01  SearchSysCache
>   3.24    173.23     9.01  4291775     0.00     0.01  free
>   2.44    180.01     6.78  6171667     0.00     0.00  tree_concatenate
>   2.17    186.04     6.03  8234069     0.00     0.00  tree_delete
>   1.07    189.02     2.98    10250     0.29     0.29  write
>   1.05    191.94     2.92                             _psort_cmp
>   0.95    194.57     2.63  2983301     0.00     0.00  newNode
>   0.92    197.14     2.57  4251762     0.00     0.00  OrderedElemPop
>   0.88    199.60     2.46  2746172     0.00     0.00  comphash
>   0.82    201.88     2.28  4252495     0.00     0.01  AllocSetAlloc
>   0.77    204.02     2.14  1965056     0.00     0.00  SearchSysCacheTuple
>   0.73    206.05     2.03   265000     0.01     0.01  yylex
>   0.69    207.96     1.91                             $$remU
>   0.68    209.86     1.90  4252500     0.00     0.00  OrderedElemPushHead
>   0.62    211.58     1.72  4247473     0.00     0.00  palloc
>   0.60    213.26     1.68    25004     0.07     0.07  send
>   0.56    214.81     1.55  1965056     0.00     0.00  CatalogCacheComputeHashIndex
>   0.55    216.33     1.52     5000     0.30     1.07  yyparse
>   0.54    217.82     1.49  4251762     0.00     0.01  AllocSetFree
>   0.53    219.29     1.47  4252494     0.00     0.00  MemoryContextAlloc
>   0.51    220.71     1.42  4252500     0.00     0.00  OrderedElemPushInto
>   0.49    222.07     1.36  3430844     0.00     0.00  strncmp
>   0.48    223.41     1.34  4205327     0.00     0.00  OrderedSetGetHead
>   0.47    224.73     1.32                             elog
>   0.45    225.99     1.26  1964976     0.00     0.00  DLRemove
>   0.43    227.19     1.20  2510857     0.00     0.00  strcmp
>   0.43    228.38     1.19  4200327     0.00     0.00  OrderedElemGetBase
>   0.42    229.56     1.18  4245740     0.00     0.01  PortalHeapMemoryAlloc
>   0.38    230.62     1.06  4252500     0.00     0.00  OrderedElemPush
>   0.37    231.66     1.04  4205327     0.00     0.00  AllocSetGetFirst
>   0.36    232.66     1.00  1965145     0.00     0.00  DLAddHead
> 
> _mcount is part of the profiling apparatus and can be ignored (although
> the fact that it's so high up implies that we're executing an awful lot
> of short routine calls, because mcount runs once per entry and exit of
> profilable routines).  tree_cut, tree_insert, tree_concatenate, and
> tree_delete are subroutines of malloc/free.  Altogether, malloc/free
> and friends accounted for 61.39 seconds out of the 180.88 non-overhead
> CPU seconds in this run.
> 
> In other words, we're spending a third of our time mallocing and freeing
> memory.  A tad high, what?
> 
> Actually, it's worse than that, because AllocSetAlloc,
> PortalHeapMemoryAlloc, AllocSetFree, and all of the OrderedElemXXX
> routines represent our own bookkeeping layer atop malloc/free.
> That's another 18.66 seconds spent in these top routines, which means
> that we are real close to expending half the backend's runtime on
> memory bookkeeping.  This needs work.
> 
> The other thing that jumps out here is the unreasonably high position of
> recv(), which is called 962187 times.  The script being read by psql was
> only 957186 characters.  Evidently we're invoking a kernel recv() call
> once per character read from the frontend.  I suspect this is an
> inefficiency introduced by Magnus Hagander's recent rewrite of backend
> libpq (see, I told you there was a reason for using stdio ;-)).  We're
> gonna have to do something about that, though it's not as critical as
> the memory-allocation issue.  It also appears that send() is now being
> invoked multiple times per backend reply, which is going to create
> inefficiencies outside the backend (ie, multiple packets per reply).
> On a test case with a lot of SELECTs that would show up more than it
> does here.
> 
> 
> Moving on to the dynamic profile, the most interesting items are:
> 
> index % time    self  children    called     name
> -----------------------------------------------
>                 0.05  118.06       1/1           DoBackend [7]
> [8]     65.3    0.05  118.06       1         PostgresMain [8]
>                 0.04   57.64    5000/5000        pg_exec_query [9]
>                 0.01   39.83    5000/5000        CommitTransactionCommand [11]
>                 0.02   18.86    5001/5001        ReadCommand [27]
>                 0.01    1.18    5000/5000        StartTransactionCommand [118]
>                 0.01    0.36    5001/5001        ReadyForQuery [190]
>                 0.04    0.00    5001/5001        EnableNotifyInterrupt [385]
>                 0.04    0.00    5000/5000        strspn [388]
>                 0.00    0.01       1/1           InitPostgres [495]
>                 0.00    0.00    5000/548573      strlen [172]
>                 0.00    0.00    5001/93349       memset [366]
>                 0.00    0.00       1/1           read_pg_options [601]
>                 0.00    0.00       2/2           pq_putint [637]
>                 0.00    0.00       1/15002       pq_putnchar [123]
>                 0.00    0.00       2/10013       getenv [241]
>                 0.00    0.00       1/1           SetPgUserName [683]
>                 0.00    0.00       1/1           pq_init [687]
>                 0.00    0.00       1/1           proc_exit [704]
>                 0.00    0.00       1/1           pq_close [721]
>                 0.00    0.00       1/5002        getpgusername [521]
>                 0.00    0.00       1/2           FindExec [722]
>                 0.00    0.00       5/8           getopt [728]
>                 0.00    0.00    5001/5001        DisableNotifyInterrupt [761]
>                 0.00    0.00       9/34          pqsignal [806]
>                 0.00    0.00       2/5           atoi [837]
>                 0.00    0.00       1/2           SetProcessingMode [868]
>                 0.00    0.00       1/10002       TransactionFlushEnabled [749]
>                 0.00    0.00       1/1           sigsetjmp [934]
> 
> ("self" is the amount of time (CPU sec) spent directly in the named function,
> while "children" is the amount of time spent in its callees, pro-rated
> by the number of calls.  For example, strlen was actually measured to
> use 0.47 sec in the whole run, but since PostgresMain called it less
> than 1% of all the times it was called, PostgresMain gets blamed for
> less than 0.01 sec of that total.)
> 
> The interesting thing here is the relative times indicated for
> pg_exec_query, CommitTransactionCommand, and ReadCommand.  We already
> found out why ReadCommand is so slow (recv() per character) ... but why
> is CommitTransactionCommand so high, when I already discovered that
> doing only one commit doesn't help much?  Patience, we'll get there.
> 
> -----------------------------------------------
>                 0.03   57.61    5000/5000        pg_exec_query [9]
> [10]    31.9    0.03   57.61    5000         pg_exec_query_dest [10]
>                 0.09   39.12    5000/5000        pg_parse_and_plan [13]
>                 0.00   17.37    5000/5000        ProcessQuery [36]
>                 0.02    0.60    5000/5000        SetQuerySnapshot [159]
>                 0.01    0.34    5000/5000        CommandCounterIncrement [196]
>                 0.02    0.05   10000/4291775     free [22]
> -----------------------------------------------
>                 0.09   39.12    5000/5000        pg_exec_query_dest [10]
> [13]    21.7    0.09   39.12    5000         pg_parse_and_plan [13]
>                 0.02   19.28    5000/5000        parser [26]
>                 0.03   17.63    5000/5000        planner [34]
>                 0.00    1.98    5000/5000        QueryRewrite [90]
>                 0.02    0.05   10000/4292947     malloc [21]
>                 0.02    0.05   10000/4291775     free [22]
>                 0.00    0.02    5000/497779      lappend [91]
>                 0.01    0.00    5000/5000        IsAbortedTransactionBlockState [529]
>                 0.00    0.00    5000/72779       length [342]
> 
> In other words, parsing, planning, and executing an INSERT each take
> about the same amount of time.
> 
> -----------------------------------------------
>                 0.01   39.83    5000/5000        PostgresMain [8]
> [11]    22.0    0.01   39.83    5000         CommitTransactionCommand [11]
>                 0.02   39.81    5000/5000        CommitTransaction [12]
> -----------------------------------------------
>                 0.02   39.81    5000/5000        CommitTransactionCommand [11]
> [12]    22.0    0.02   39.81    5000         CommitTransaction [12]
>                 0.00   38.12    5000/5000        AtCommit_Memory [16]
>                 0.01    1.10    5000/5000        RecordTransactionCommit [122]
>                 0.00    0.30    5000/5000        AtCommit_Locks [211]
>                 0.02    0.10    5000/5000        AtEOXact_portals [286]
>                 0.03    0.07    5000/5000        DestroyTempRels [305]
>                 0.03    0.00    5000/5000        _lo_commit [425]
>                 0.01    0.00    5000/5000        AtCommit_Notify [522]
>                 0.01    0.00    5000/5000        CloseSequences [523]
>                 0.01    0.00    5000/5000        RelationPurgeLocalRelation [531]
>                 0.00    0.00    5000/10000       AtCommit_Cache [750]
> -----------------------------------------------
>                 0.00   38.12    5000/5000        CommitTransaction [12]
> [16]    21.1    0.00   38.12    5000         AtCommit_Memory [16]
>                 0.01   38.11    5000/5000        EndPortalAllocMode [17]
>                 0.00    0.00    5000/10552       MemoryContextSwitchTo [747]
> -----------------------------------------------
>                 0.01   38.11    5000/5000        AtCommit_Memory [16]
> [17]    21.1    0.01   38.11    5000         EndPortalAllocMode [17]
>                 0.54   37.55    5000/5000        AllocSetReset [18]
>                 0.01    0.00    5000/10000       PortalHeapMemoryGetVariableMemory [449]
>                 0.01    0.00    5000/5000        FixedStackPop [528]
>                 0.00    0.00    5000/51434       MemoryContextFree [367]
> -----------------------------------------------
>                 0.54   37.55    5000/5000        EndPortalAllocMode [17]
> [18]    21.1    0.54   37.55    5000         AllocSetReset [18]
>                 1.47   32.51 4200327/4251762     AllocSetFree [20]
>                 1.04    2.53 4205327/4205327     AllocSetGetFirst [64]
> 
> In other words, essentially *all* of the CPU time spent in
> CommitTransaction is spent freeing memory.  That's probably why
> ganging the transactions doesn't help --- it's the same number of
> memory blocks getting allocated and freed.
> 
>                 0.02   19.28    5000/5000        pg_parse_and_plan [13]
> [26]    10.7    0.02   19.28    5000         parser [26]
>                 0.02   13.88    5000/5000        parse_analyze [41]
>                 1.52    3.81    5000/5000        yyparse [54]
>                 0.01    0.01    5000/5000        init_io [474]
>                 0.00    0.01    5000/420496      pstrdup [124]
>                 0.01    0.00    5000/5000        parser_init [533]
> 
> Thomas might know why parse_analyze is taking so much time compared to
> the rest of the parsing machinery...
> 
> 
> I won't bother the list with the rest of the profile, although I'll be
> happy to send it to anyone who wants to see all of it.  Our immediate
> performance problems seem pretty clear, however: time to rethink memory
> management.
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] pg_dump again
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] SPI_prepare() doesn't work well ?