Re: Berkeley DB... - Mailing list pgsql-hackers

From: Mike Mascari
Subject: Re: Berkeley DB...
Date:
Msg-id: 392EC6F6.91F4537D@mascari.com
In response to: RE: Berkeley DB... ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>)
List: pgsql-hackers
Tom Lane wrote:
> 
> Hannu Krosing <hannu@tm.ee> writes:
> >> As Vadim points out in his comparison
> >> of COPY vs. INSERT, something is *wrong* with the time it takes
> >> for PostgreSQL to parse, plan, rewrite, and optimize.
> 
> We might have part of the story in the recently noticed fact that
> each insert/update query begins by doing a seqscan of pg_index.
> 
> I have done profiles of INSERT in the past and not found any really
> spectacular bottlenecks (but I was looking at a test table with no
> indexes, so I failed to see the pg_index problem :-().  Last time
> I did it, I had these top profile entries for inserting 100,000 rows
> of 30 columns apiece:
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  30.08    290.79   290.79                             _mcount
>   6.48    353.46    62.67 30702766     0.00     0.00  AllocSetAlloc
>   5.27    404.36    50.90   205660     0.25     0.25  write
>   3.06    433.97    29.61 30702765     0.00     0.00  MemoryContextAlloc
>   2.74    460.45    26.48   100001     0.26     0.74  yyparse
>   2.63    485.86    25.41 24300077     0.00     0.00  newNode
>   2.22    507.33    21.47  3900054     0.01     0.01  yylex
>   1.63    523.04    15.71 30500751     0.00     0.00  PortalHeapMemoryAlloc
>   1.31    535.68    12.64  5419526     0.00     0.00  hash_search
>   1.18    547.11    11.43  9900000     0.00     0.00  expression_tree_walker
>   1.01    556.90     9.79  3526752     0.00     0.00  SpinRelease
> 
> While the time spent in memory allocation is annoying, that's only about
> ten mallocs per parsed data expression, so it's unlikely that we will be
> able to improve on it very much.  (We could maybe avoid having *three*
> levels of subroutine call to do an alloc, though ;-).)  Unless you are
> smarter than the flex and bison guys you are not going to be able to
> improve on the lex/parse times either.  The planner isn't even showing
> up for a simple INSERT.  Not much left, unless you can figure out how
> to write and commit a tuple with less than two disk writes.
> 
> But, as I said, this was a case with no indexes to update.
> 
> I intend to do something about caching pg_index info ASAP in the 7.1
> cycle, and then we can see how much of a difference that makes...
> 
>                         regards, tom lane

It will be interesting to see the speed difference between the
100,000 inserts above and the same inserts PREPARE'd using
Karel Zak's PREPARE patch. Perhaps a generic query cache could
be used to skip the parse/plan/rewrite/optimize stages when the
identical query text is submitted to the database more than
once? I suppose the cached plans could then be discarded
whenever a DDL statement or a VACUUM ANALYZE is executed? The
old Berkeley Postgres docs spoke about cached query plans *and*
results (as well as 64-bit OIDs, amongst other things). I'm
looking forward to when the 7.1 branch occurs... :-)
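
For illustration, prepared execution might look something like
this (a hypothetical sketch: the PREPARE/EXECUTE/DEALLOCATE
syntax, statement name, and table here are my own invention, and
Karel's actual patch may well differ):

    -- Parsing, planning, rewriting, and optimizing happen once,
    -- at PREPARE time.
    PREPARE ins_test (integer, text) AS
        INSERT INTO test_table (id, name) VALUES ($1, $2);

    -- Each EXECUTE reuses the stored plan; only the parameter
    -- values change, so the per-row overhead should drop sharply.
    EXECUTE ins_test(1, 'first row');
    EXECUTE ins_test(2, 'second row');

    -- Discard the stored plan when finished.
    DEALLOCATE ins_test;

A generic query cache would do much the same thing behind the
scenes, keying on the query string rather than on an explicit
statement name.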

Mike Mascari

