Re: [HACKERS] strange behaviour on pooled alloc - Mailing list pgsql-hackers
From | jwieck@debis.com (Jan Wieck) |
---|---|
Subject | Re: [HACKERS] strange behaviour on pooled alloc |
Date | |
Msg-id | m109BWi-000EBPC@orion.SAPserv.Hamburg.dsh.de |
In response to | Re: [HACKERS] strange behaviour on pooled alloc (Bruce Momjian <maillist@candle.pha.pa.us>) |
List | pgsql-hackers |
Bruce Momjian wrote:

> > The strange behaviour now is that depending on the blocksize
> > and the limit for block/single allocation I use for the pools,
> > the portals_p2 regression test fails or not.
> > [...]
> > I have absolutely no clue what's going on here. Anyone an
> > idea how to track this down?
>
> My recommendation is to apply the fix and let others debug it. Someone
> will find the cause. Just give them a reproducible test case. In many
> cases, more eyes or another OS shows the error much clearer.

A new version of the AllocSet...() functions has been committed. palloc() is now a macro. The memory-eating problem of COPY FROM, INSERT ... SELECT, and UPDATE on a table that has constraints is fixed (new file nodes/freefuncs.c).

The settings in aset.c aren't optimal for now, because the settings currently in place cause the portals_p2 test to fail (at least here). Some information for those who want to take a look at it follows.

Reproducing the bug:

After the regression tests have been run, the bug can be reproduced by running only portals_p2.sql. To trigger the error, the postmaster must be started with -B64 (the default), and at least one environment variable that causes psql to send a SET on connection (e.g. PGDATESTYLE) must be set.

If -B is greater than 64, AllocSetAlloc() puts the allocation for the buffer reference counts in the execution state (EState) into its own malloc() area instead of a small-chunk block, and the problem disappears.

If ALLOC_BLOCK_SIZE (in aset.c) is changed to 8192, the problem also disappears.

If none of the mentioned environment variables is set, the BEGIN from the regression test is the first command sent to the backend, and the problem disappears too. But adding a simple BEGIN; END; to the top of the test makes it appear again, so the cause isn't in the variable-setting code.

Guesses:

The symptom is that, with many portals open on a big table, rows that are there don't show up.
Each cursor declaration results in its own ExecutorStart(), where the buffer reference counts are saved into the newly created execution state and reset to zero. Later, in ExecutorEnd(), the saved counts are restored. The disappearing rows might have to do with unpinned buffers that are expected to be pinned. Since the failure depends on whether the allocation for the saved reference counts is taken from a block or allocated separately, I think some counts get corrupted by code somewhere else. It also depends on the block size, which is one more hint that the damage comes from elsewhere, because the refcount area must then share a block with some other allocation.

I'll keep on debugging, but I would very much appreciate it if someone could help.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #