Thread: Space management for PGresult
In space management for PGresult of libpq, the block size of PGresult is always PGRESULT_DATA_BLOCKSIZE(2048bytes). Therefore, when a large result of query is received, malloc is executed many times. My proposal is to enlarge the size of the block whenever the block is allocated. The size of first block is PGRESULT_DATA_BLOCKSIZE. And the size of the following blocks will be doubled until it reaches PGRESULT_MAX_DATA_BLOCKSIZE. PGRESULT_MAX_DATA_BLOCKSIZE is new constants. I think that 2Mbytes is good enough for this constants. The test result is the following: Test SQL: select * from accounts; (It is pgbench's table. scale factor is 10.) The number of malloc calls at pqResultAlloc: 8.1.0 : 80542 patched: 86 Execution time: 8.1.0 : 6.80 sec patched: 6.73 sec regards, --- Atsushi Ogawa
Attachment
Atsushi Ogawa <atsushi.ogawa@gmail.com> writes: > The number of malloc calls at pqResultAlloc: > 8.1.0 : 80542 > patched: 86 > Execution time: > 8.1.0 : 6.80 sec > patched: 6.73 sec This hardly seems worth adding any complexity for ... regards, tom lane
Tom Lane wrote: > Atsushi Ogawa <atsushi.ogawa@gmail.com> writes: > > The number of malloc calls at pqResultAlloc: > > 8.1.0 : 80542 > > patched: 86 > > > Execution time: > > 8.1.0 : 6.80 sec > > patched: 6.73 sec > > This hardly seems worth adding any complexity for ... What about memory usage? Is there a notorious difference? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > Tom Lane wrote: > > Atsushi Ogawa <atsushi.ogawa@gmail.com> writes: > > > The number of malloc calls at pqResultAlloc: > > > 8.1.0 : 80542 > > > patched: 86 > > > > > Execution time: > > > 8.1.0 : 6.80 sec > > > patched: 6.73 sec > > > > This hardly seems worth adding any complexity for ... > > What about memory usage? Is there a notorious difference? Well, I measured memory usage by attached patch. An allocated(bytes) is total amounts of allocated memory by pqResultAlloc. An unused(bytes) is total amounts of PGresult->spaceLeft. (1)accounts table (4 columns, 1,000,000 tuples) malloc calls allocated(bytes) unused(bytes) execution time -------------------------------------------------------------------------- 8.1.0 80,542 164,950,016 2,946,402 6.80 sec. patched 86 161,478,656 177,650 6.73 sec. (2)another teble (50 columns, 100,000 tuples) malloc calls allocated(bytes) unused(bytes) execution time -------------------------------------------------------------------------- 8.1.0 55,557 113,780,736 8,561,518 6.26 sec. patched 86 104,855,552 83,307 6.21 sec. The unused memory increases when the number of columns increases. The tuple size of PGresult is proportional to the number of columns. getAnotherTuple() at fe-protocol3.c: -------------------------------------------------------------------------- conn->curTuple = (PGresAttValue *) pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE); -------------------------------------------------------------------------- regards, --- Atsushi Ogawa
Attachment
Atsushi Ogawa <atsushi.ogawa@gmail.com> writes: > (1)accounts table (4 columns, 1,000,000 tuples) > malloc calls allocated(bytes) unused(bytes) execution time > -------------------------------------------------------------------------- > 8.1.0 80,542 164,950,016 2,946,402 6.80 sec. > patched 86 161,478,656 177,650 6.73 sec. This hardly seems credible --- your patch would result in more wasted memory, not less. It looks to me like the instrumentation you added assumes that extra space in a malloc block will never be used later, which of course is not true ... regards, tom lane
On Wed, 2005-11-23 at 16:21 +0900, Atsushi Ogawa wrote: > In space management for PGresult of libpq, the block size of PGresult > is always PGRESULT_DATA_BLOCKSIZE(2048bytes). Therefore, when a large > result of query is received, malloc is executed many times. > > My proposal is to enlarge the size of the block whenever the block is > allocated. The size of first block is PGRESULT_DATA_BLOCKSIZE. And the > size of the following blocks will be doubled until it reaches > PGRESULT_MAX_DATA_BLOCKSIZE. > > PGRESULT_MAX_DATA_BLOCKSIZE is new constants. I think that 2Mbytes is > good enough for this constants. > > The test result is the following: > > Test SQL: > select * from accounts; (It is pgbench's table. scale factor is 10.) > > The number of malloc calls at pqResultAlloc: > 8.1.0 : 80542 > patched: 86 > > Execution time: > 8.1.0 : 6.80 sec > patched: 6.73 sec > What this highlights for me is that we have (IMHO) a strange viewpoint on allocating result memory, not an optimization issue. We really ought to be streaming the result back to the user, not downloading it all into a massive client side chunk of memory. It ought to be possible to do this with very low memory, and would probably have the side-effect of reducing time-to-first-row. Then we wouldn't have a memory allocation issue at all. Consider what will happen if you do "select * from too_big_table". We'll just run for ages, then blow memory and fail. (That's what it used to do, does it still? he asks lazily). Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > We really ought to be streaming the result back to the user, not > downloading it all into a massive client side chunk of memory. Have you been paying any attention to the multiple previous discussions of that point? (Latest was on pgsql-interfaces within the past week.) regards, tom lane
On Thu, 2005-11-24 at 12:32 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > We really ought to be streaming the result back to the user, not > > downloading it all into a massive client side chunk of memory. > > Have you been paying any attention to the multiple previous discussions > of that point? (Latest was on pgsql-interfaces within the past week.) Clearly not. Thanks for the heads up. Best Regards, Simon Riggs