Re: 7.4 beta 1 getting out of swap - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: 7.4 beta 1 getting out of swap |
Msg-id | 8826.1060906670@sss.pgh.pa.us |
Responses | Re: 7.4 beta 1 getting out of swap; Re: 7.4 beta 1 getting out of swap |
List | pgsql-hackers |
Bertrand Petit <elrond@phoe.frmug.org> writes:
> And I just got another one, much simpler, that failed the same
> way with the same data set:
> UPDATE rimdb_atitles SET aka_title=convert(byte_title,charset,'UTF8');

[ where rimdb_atitles has an index on column "attribs varchar[]" ]

Uh-huh. Actually, any large insert or update on that table will run out of memory, I bet. The problem appears to be due to the newly-added support for indexing array columns --- array_cmp() leaks memory, which is verboten for index support operators.

At first I thought this would be an easy fix --- just rewrite array_cmp to not depend on deconstruct_array, as array_eq already does not. I soon found that that only reduced the speed of leakage, however.

The real problem comes from the fact that array_eq and array_cmp expect to be able to save information across calls using flinfo->fn_extra. While this works to some extent, the btree routines generate a new scankey --- with a nulled fn_extra --- for every index AM call. btree knows to delete the scankey when it's done, but it doesn't know anything about deleting what fn_extra points to. (Even if it did, there is additional leakage inside equality_oper(), which would be very difficult to clean up directly.)

Quite aside from the memory leak problem, it's annoying to think that the array element information will be looked up again on every btree operation. That seems expensive.

I can think of a number of ways we might attack this, but none seem especially attractive ---

1. Have the index AMs create and switch into a special memory context for each call, rather than running in the main execution context. I am not sure this is workable at all, since the AMs tend to think they can create data structures that will live across calls (for example a btree lookup stack). It'd be the most general solution, if we could make it work.

2. Modify the index AMs so that the comparison function FmgrInfo is preserved across a whole query.
   I think this requires changes to the index AM API (index_insert for instance has no provision for sharing data across multiple calls). Messy, and would likely mean an initdb. It would probably be the fastest answer though, since lookups wouldn't need to be done more than once per query.

3. Set up a long-lived cache internal to the array functions that can translate element type OID to the needed lookup data, and won't leak memory across repeated calls. This is not the fastest or most general solution, but it seems the most localized and safest fix.

Has anyone got some other ideas?

regards, tom lane
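The difference between the fn_extra approach and option 3 can be modeled in a small standalone C program. This is a sketch under stated assumptions, not PostgreSQL code. FakeFmgrInfo, ElemInfo, lookup_elem_info(), cmp_with_fn_extra(), get_cached_elem_info() and the index_am_call_*() drivers are hypothetical names. malloc stands in for allocation in a backend memory context, and the "lookup" only fills in a placeholder value. The model gives each simulated index AM call a fresh scankey whose fn_extra starts out NULL, as described above. As a result, the fn_extra version repeats the lookup and strands one allocation per call, while a cache keyed by element type OID and living for the whole session does the lookup once per type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real definitions. */
typedef unsigned int Oid;

typedef struct ElemInfo {
    Oid elemtype;              /* element type this entry describes */
    int cmp_proc;              /* placeholder for the looked-up comparison info */
    struct ElemInfo *next;     /* chain link, used only by the long-lived cache */
} ElemInfo;

/* Stand-in for an FmgrInfo: only the fn_extra slot matters here. */
typedef struct FakeFmgrInfo {
    void *fn_extra;
} FakeFmgrInfo;

static long lookup_count = 0;  /* how many times the "expensive" lookup ran */

/* Pretend catalog lookup: allocates a fresh entry every time it is called. */
static ElemInfo *lookup_elem_info(Oid elemtype)
{
    ElemInfo *info = malloc(sizeof *info);
    if (info == NULL)
        abort();
    lookup_count++;
    info->elemtype = elemtype;
    info->cmp_proc = (int) elemtype;  /* placeholder value */
    info->next = NULL;
    return info;
}

/* fn_extra pattern: cache the lookup in the caller-supplied FmgrInfo. */
static int cmp_with_fn_extra(FakeFmgrInfo *flinfo, Oid elemtype)
{
    ElemInfo *info = flinfo->fn_extra;

    if (info == NULL) {
        info = lookup_elem_info(elemtype);
        flinfo->fn_extra = info;
    }
    /* this model assumes one FmgrInfo is only ever used for one element type */
    assert(info->elemtype == elemtype);
    return info->cmp_proc;
}

/* One simulated index AM call: a fresh scankey whose fn_extra starts out NULL. */
static void index_am_call_leaky(Oid elemtype)
{
    FakeFmgrInfo *scankey = calloc(1, sizeof *scankey);
    if (scankey == NULL)
        abort();
    (void) cmp_with_fn_extra(scankey, elemtype);
    free(scankey);  /* the scankey is freed, but what fn_extra points to leaks */
}

/* Option 3 (sketch): session-lifetime cache keyed by element type OID. */
static ElemInfo *elem_cache = NULL;

static ElemInfo *get_cached_elem_info(Oid elemtype)
{
    for (ElemInfo *e = elem_cache; e != NULL; e = e->next)
        if (e->elemtype == elemtype)
            return e;

    ElemInfo *e = lookup_elem_info(elemtype);  /* first use of this type */
    e->next = elem_cache;
    elem_cache = e;
    return e;
}

/* Same simulated call, but the comparison info comes from the long-lived cache. */
static void index_am_call_cached(Oid elemtype)
{
    FakeFmgrInfo *scankey = calloc(1, sizeof *scankey);  /* still fresh per call */
    if (scankey == NULL)
        abort();
    (void) get_cached_elem_info(elemtype)->cmp_proc;
    free(scankey);  /* nothing else to free: the cache entry is meant to live on */
}
```

With an arbitrary element type OID, a thousand calls to index_am_call_leaky() perform a thousand lookups and leave a thousand unreachable ElemInfo allocations behind. A thousand calls to index_am_call_cached() perform a single lookup. The cache grows only with the number of distinct element types seen, which matches the "won't leak memory across repeated calls" property option 3 is after. It still pays for a search per call, which is why it is not the fastest option.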