Re: [HACKERS] Postgres Speed or lack thereof - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: [HACKERS] Postgres Speed or lack thereof |
| Date | |
| Msg-id | 7986.917207692@sss.pgh.pa.us |
| In response to | Re: [HACKERS] Postgres Speed or lack thereof (Vadim Mikheev <vadim@krs.ru>) |
| List | pgsql-hackers |
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> In other words, we're spending a third of our time mallocing and freeing
>> memory. A tad high, what?

> I added some debug code to palloc/pfree and it shows that for
> INSERT:
> 1. 80% of allocations are made for <= 32 bytes.
> 2. pfree is used for 25% of them only (others are freed
> after statement/transaction is done).

I had suspected the former (because most of the allocs are parser nodes)
and knew the latter from profile call counts.

> Note that our mmgr adds 16 bytes to each allocation
> (+ some bytes in malloc) - a great overhead, yes?

Youch ... for a list-node-sized request, that's a lot of overhead.

I did some more profiling this weekend, after Magnus Hagander and I
repaired libpq's little problem with calling recv() once per input byte.
Unsurprisingly, memory allocation is still 50% of backend CPU time in my
test consisting of a long series of INSERT statements. It's less than
that, but still an interesting number, in two other test scenarios I
set up:

(1) SELECT the whole contents of a 6500-row, 38-column table; repeat
5 times. Memory allocation takes 8.74 of 54.45 backend CPU sec, or 16%.

(2) Load a database from a pg_dump script. Script size is 1.3MB. This
is of course mostly CREATE TABLE, COPY from stdin, CREATE INDEX. Memory
allocation takes 11.08 of 38.25 sec = 29%.

I believe that the reason memory alloc is such a large percentage of the
INSERT scenario is that this is the only one where lots of SQL parsing
is going on. In the SELECT scenario, most of the palloc/pfree traffic
is temporary field value representations inside printtup (more about
that below). In the database load scenario, it seems that btree index
building is the biggest hit; the heaviest callers of palloc are:

                0.00    0.01    10642/438019     _bt_searchr [86]
                0.00    0.01    13531/438019     heap_formtuple [149]
                0.00    0.01    13656/438019     CopyFrom [24]
                0.00    0.01    14196/438019     textin [247]
                0.01    0.01    18275/438019     datetime_in [42]
                0.01    0.03    32595/438019     bpcharin [182]
                0.01    0.03    38882/438019     index_formtuple [79]
                0.01    0.03    39028/438019     _bt_formitem [212]
                0.07    0.16   204564/438019     _bt_setsortkey [78]
[88]     1.3    0.14    0.34   438019            palloc [88]

> I added code to allocate a few big (16K-64K) blocks of memory for
> these small allocations to speed up palloc by skiping
> AllocSetAlloc/malloc. New code don't free allocated memory (to make
> bookkeeping fast) but keeping in mind 2. above and memory overhead it
> seems as appropriate thing to do. These code also speed up freeing
> when statement/transaction is done, because of only a few blocks have
> to be freed now.

This is pretty much what I had in mind, but I think that it is not OK
to leave everything to be freed at statement end. In particular I've
just been looking at printtup(), the guts of SELECT output, and I see
that each field value is palloc'd (by a type-specific output routine)
and then pfree'd by printtup after emitting it. If you make the pfrees
all no-ops then this means the temporary space needed for SELECT is at
least equal to the total volume of output data ... not good for huge
tables. (COPY in and out also seem to palloc/pfree each field value
that goes past.)

However, I think we could get around this by making more use of memory
contexts. The space allocated inside printtup could be palloc'd from a
"temporary" context that gets flushed every so often (say, after every
few hundred tuples of output). All the pallocs done inside the parser
could go to a "statement" context that is flushed at the end of each
statement.
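For concreteness, here is a minimal sketch of the big-block scheme being
discussed: small requests are carved out of a few large malloc'd blocks,
individual frees are no-ops, and the whole pool is released in one cheap
pass when the context is reset. The names BlockPool, pool_alloc and
pool_reset are invented for illustration and are not the backend's mmgr
routines; real code would also have to handle oversized requests and
alignment more carefully.

    #include <stdlib.h>

    #define POOL_BLOCK_SIZE (32 * 1024)     /* grab memory 32K at a time */

    typedef struct PoolBlock
    {
        struct PoolBlock *next;
        size_t      used;                   /* bytes handed out from data[] */
        char        data[POOL_BLOCK_SIZE];
    } PoolBlock;

    typedef struct BlockPool
    {
        PoolBlock  *blocks;                 /* singly linked list, newest first */
    } BlockPool;

    /*
     * Hand out space from the newest block; malloc another big block when it
     * fills up.  Assumes requests are small (the <= 32 byte case above).
     */
    static void *
    pool_alloc(BlockPool *pool, size_t size)
    {
        void       *result;

        size = (size + 7) & ~(size_t) 7;    /* keep results 8-byte aligned */

        if (pool->blocks == NULL ||
            pool->blocks->used + size > POOL_BLOCK_SIZE)
        {
            PoolBlock  *blk = (PoolBlock *) malloc(sizeof(PoolBlock));

            if (blk == NULL)
                return NULL;
            blk->used = 0;
            blk->next = pool->blocks;
            pool->blocks = blk;
        }

        result = pool->blocks->data + pool->blocks->used;
        pool->blocks->used += size;
        return result;
    }

    /*
     * Freeing an individual chunk is a no-op in this scheme; instead the
     * whole pool is given back at once, e.g. at end of statement, or every
     * few hundred tuples for a printtup-style "temporary" context.
     */
    static void
    pool_reset(BlockPool *pool)
    {
        PoolBlock  *blk = pool->blocks;

        while (blk != NULL)
        {
            PoolBlock  *next = blk->next;

            free(blk);
            blk = next;
        }
        pool->blocks = NULL;
    }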
It'd probably also be a good idea to have two kinds of memory contexts,
the current kind where each block is individually free-able and the new
kind where we only keep track of how to free the whole context's
contents at once. That way, any particular routine that was going to
allocate a *large* chunk of memory would have a way to free that chunk
explicitly when done with it (but would have to remember which context
it got it from in order to do the free). For the typical
list-node-sized request, we wouldn't bother with the extra bookkeeping.

I think that palloc() could be defined as always giving you the
bulk-allocated kind of chunk. pfree() would become a no-op and
eventually could be removed entirely. Anyone who wants the other kind
of chunk would call a different pair of routines, where the controlling
memory context has to be named at both alloc and free time. (This
approach gets rid of your concern about the cost of finding which kind
of context a block has been allocated in inside of pfree; we expect the
caller to know that.)

Getting this right is probably going to take some thought and work, but
it looks worthwhile from a performance standpoint. Maybe for 6.6?

> It shows that we should get rid of system malloc/free and do
> all things in mmgr itself - this would allow us much faster
> free memory contexts at statement/transaction end.

I don't think we can or should stop using malloc(), but we can ask it
for large blocks and do our own allocations inside those blocks --- was
that what you meant?

			regards, tom lane
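A minimal sketch of how the two-kind interface described above might
look. Apart from palloc/pfree, which the post itself names, every
identifier here (MemoryContext, MemoryContextReset,
MemoryContextAllocChunk, MemoryContextFreeChunk, temp_cxt) is a
placeholder chosen for illustration of the proposal, not a description
of any actual backend release.

    #include <stddef.h>

    typedef struct MemoryContextData *MemoryContext;

    /*
     * Bulk-allocated chunks: minimal per-chunk bookkeeping, no individual
     * free.  Space comes back only when the owning context is reset, so
     * pfree() can become a no-op and eventually disappear.
     */
    extern void *palloc(size_t size);                  /* allocate in current context */
    extern void MemoryContextReset(MemoryContext cxt); /* release everything at once */

    /*
     * Individually free-able chunks, for routines that grab a *large* piece
     * of memory and want to give it back explicitly.  The caller names the
     * controlling context at both alloc and free time, so the free routine
     * never has to figure out which context a pointer belongs to.
     */
    extern void *MemoryContextAllocChunk(MemoryContext cxt, size_t size);
    extern void MemoryContextFreeChunk(MemoryContext cxt, void *chunk);

    /*
     * Sketch of printtup-style use of a "temporary" context:
     *
     *     for each output tuple
     *         build the field values with palloc() in temp_cxt;
     *         every few hundred tuples: MemoryContextReset(temp_cxt);
     */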