Re: Berkeley DB... - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Berkeley DB...
Date
Msg-id 2130.959491684@sss.pgh.pa.us
Whole thread Raw
In response to RE: Berkeley DB...  ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>)
List pgsql-hackers
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>>>> Well, I've dropped index but INSERTs still take 70 sec and 
>>>> COPY just 1sec -:(((

Mebbe so, but you can't blame it all on parse/plan overhead.

I did some experimentation on this with current sources, using a test
case of inserting 100,000 rows of 16 columns (the regression "tenk1"
table's contents repeated 10 times).  Each test was started with a
freshly created empty table.  The initial runs were done with all
postmaster options except -F defaulted.  All numbers are wall-clock time
in seconds; the "+" column is the time increase from the previous case:

load via COPY, fsync off:
0 indexes    24.45s
1 index        48.88s        + 24.43
2 indexes    62.65s        + 13.77
3 indexes    96.84s        + 34.19
4 indexes    134.09s        + 37.25

load via INSERTs, fsync off, one xact (begin/end around all inserts):
0 indexes    194.95s
1 index        247.21s        + 52.26
2 indexes    269.69s        + 22.48
3 indexes    307.33s        + 37.64
4 indexes    352.72s        + 45.39

load via INSERTs, fsync off, separate transaction for each insert:
0 indexes    236.53s
1 index        295.96s        + 59.43
2 indexes    323.40s        + 27.44
[ got bored before doing 3/4 index cases ... ]

load via INSERTs, fsync on, separate transactions:
0 indexes    5189.99s
[ don't want to know how long it will take with indexes :-( ]

So while the parse/plan overhead looks kinda bad next to a bare COPY,
it's not anything like a 70:1 penalty.  But an fsync per insert is
that bad and worse.

I then recompiled with -pg to learn more about where the time was going.
One of the useful places to look at is calls to FileSeek, since mdread,
mdwrite, and mdextend all call it.  To calibrate these numbers, the
table being created occupies 2326 pages and the first index is 343
pages.

Inserts (all in 1 xact), no indexes:               0.00    0.00       1/109528      init_irels [648]               0.00
  0.00      85/109528      mdread [592]               0.01    0.00    2327/109528      mdextend [474]
0.01   0.00    2343/109528      mdwrite [517]               0.23    0.00  104772/109528      _mdnblocks [251]
 
[250]    0.0    0.24    0.00  109528         FileSeek [250]
Inserts (1 xact), 1 index:               0.00    0.00       1/321663      init_irels [649]               0.00    0.00
2667/321663      mdextend [514]               0.10    0.00   55478/321663      mdread [277]               0.11    0.00
58096/321663      mdwrite [258]               0.38    0.00  205421/321663      _mdnblocks [229]
 
[213]    0.1    0.60    0.00  321663         FileSeek [213]
COPY, no indexes:               0.00    0.00       1/109527      init_irels [431]               0.00    0.00
84/109527     mdread [404]               0.00    0.00    2327/109527      mdextend [145]               0.00    0.00
2343/109527     mdwrite [178]               0.07    0.00  104772/109527      _mdnblocks [77]
 
[83]     0.0    0.07    0.00  109527         FileSeek [83]
COPY, 1 index:               0.00    0.00       1/218549      init_irels [382]               0.00    0.00
2667/218549     mdextend [220]               0.07    0.00   53917/218549      mdread [106]               0.08    0.00
56542/218549     mdwrite [99]               0.14    0.00  105422/218549      _mdnblocks [120]
 
[90]     0.0    0.30    0.00  218549         FileSeek [90]

The extra _mdnblocks() calls for the inserts/1index case seem to be from
the pg_index scans in ExecOpenIndices (which is called 100000 times in
the inserts case but just once in the COPY case).  We know how to fix
that.  Otherwise the COPY and INSERT paths seem to be pretty similar as
far as actual I/O calls go.  The thing that jumps out here, however, is
that it takes upwards of 50000 page reads and writes to prepare a
343-page index.  Most of the write calls turn out to be from
BufferReplace, which is pretty conclusive evidence that the default
setting of -B 64 is not enough for this example; we need more buffers.

At -B 128, inserts/0index seems about the same, inserts/1index traffic is               0.00    0.00       1/270331
init_irels [637]               0.01    0.00    2667/270331      mdextend [510]               0.06    0.00
29798/270331     mdread [354]               0.06    0.00   32444/270331      mdwrite [277]               0.40    0.00
205421/270331     _mdnblocks [229]
 
[223]    0.1    0.52    0.00  270331         FileSeek [223]
At -B 256, inserts/1index traffic is               0.00    0.00       1/221849      init_irels [650]               0.00
  0.00    2667/221849      mdextend [480]               0.01    0.00    5556/221849      mdread [513]
0.01   0.00    8204/221849      mdwrite [460]               0.37    0.00  205421/221849      _mdnblocks [233]
 
[240]    0.0    0.40    0.00  221849         FileSeek [240]
At -B 512, inserts/1index traffic is               0.00    0.00       1/210788      init_irels [650]               0.00
  0.00      25/210788      mdread [676]               0.00    0.00    2667/210788      mdextend [555]
0.00   0.00    2674/210788      mdwrite [564]               0.27    0.00  205421/210788      _mdnblocks [248]
 
[271]    0.0    0.28    0.00  210788         FileSeek [271]

So as long as the -B setting is large enough to avoid thrashing, there
shouldn't be much penalty to making an index.  I didn't have time to run
the COPY cases but I expect they'd be about the same.

Bottom line is that where I/O costs are concerned, the parse/plan
overhead for INSERTs is insignificant except for the known problem
of wanting to rescan pg_index for each INSERT.  The CPU overhead is
significant, at least if you're comparing no-fsync performance ...
but as I commented before, I doubt we can do a whole lot better in
that area for simple INSERTs.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Regression test failure on 7.0-STABLE
Next
From: Tatsuo Ishii
Date:
Subject: Re: Regression test failure on 7.0-STABLE