Thread: [BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
[BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
From
james+postgres@carbocation.com
Date:
The following bug has been logged on the website:

Bug reference:      14654
Logged by:          James C
Email address:      james+postgres@carbocation.com
PostgreSQL version: 9.5.5
Operating system:   Linux
Description:

I am running a mirror of PubMed, which contains about 27 million articles. One of my tables (article) includes a unique ID for each article, the article title, a ts_vector of the abstract, some timestamps, and a numeric column. There are (unsurprisingly) about 27 million rows in this table. Because the abstracts contain scientific terms, the number of lexemes is quite large.

With some prototyping on a local machine with 64G of RAM, I found queries to work quite well with the ts_vector column having a statistics target of 10,000. The production box has about 12G of RAM, but initially I didn't notice any issues. Today, I attempted to execute 'ANALYZE article;'. However, the OOM killer kept taking down postgres. I reviewed numerous system settings and postgres settings, all of which seemed to properly give about 10G+ of RAM to postgres, and which should have constrained postgres to using less than that limit.

RhodiumToad in Freenode#postgres helped me troubleshoot, and suggested I try reducing the statistics target on the ts_vector column. Dropping from 10,000 to 1,000 permitted ANALYZE to run without consuming large amounts of RAM.

It seems that ANALYZE on a ts_vector column can consume 300 * (statistics target) * (size of data in field), which in my case ended up being well above 10 gigabytes. I wonder if this might be considered a bug (either in code or in documentation), as this memory usage seems not to obey other limits, or at least wasn't documented in a way that might have helped me guess at the underlying problem.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
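The reporter's 300 * (statistics target) * (size of data in field) estimate can be sketched as back-of-envelope arithmetic; the 300 factor is PostgreSQL's internal row-sampling multiplier for ANALYZE, while the 4 kB average detoasted tsvector size used below is an illustrative assumption, not a measured figure:

```c
/* ANALYZE samples 300 * statistics_target rows for a column. If every
 * sampled tsvector stays detoasted in memory for the whole run, peak
 * usage grows with both the target and the average tsvector size. */

#define ANALYZE_SAMPLE_FACTOR 300L  /* PostgreSQL's internal sampling factor */

long analyze_sample_rows(long stats_target)
{
    return ANALYZE_SAMPLE_FACTOR * stats_target;
}

/* Rough worst-case memory (in GB) if no per-row free happens.
 * avg_detoasted_bytes is an assumption about the data set. */
double analyze_detoast_gb(long stats_target, long avg_detoasted_bytes)
{
    return analyze_sample_rows(stats_target) * (double)avg_detoasted_bytes
           / (1024.0 * 1024.0 * 1024.0);
}
```

With a statistics target of 10,000 that is 3,000,000 sampled rows; at an assumed ~4 kB per detoasted tsvector the total comes to roughly 11 GB, in the same range as the reporter's observation. Dropping the target to 1,000 cuts the sample (and the held memory) by a factor of ten.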
Re: [BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
From
Heikki Linnakangas
Date:
On 05/14/2017 11:06 PM, james+postgres@carbocation.com wrote:
> It seems that ANALYZE on a ts_vector column can consume 300 * (statistics
> target) * (size of data in field), which in my case ended up being well
> above 10 gigabytes. I wonder if this might be considered a bug (either in
> code, or of documentation), as this memory usage seems not to obey other
> limits, or at least wasn't documented in a way that might have helped me
> guess at the underlying problem.

Yes, I can see that happening here too. The problem seems to be that the analyze function detoasts every row in the sample. Tsvectors can be very large, so it adds up.

That's pretty easy to fix: the analyze function needs to free the detoasted copies as it goes. But in order to do that, it needs to make copies of all the lexemes stored in the hash table, instead of pointing directly to the detoasted copies.

Patch attached. I think this counts as a bug, and we should backport this.

- Heikki
Attachment
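The pattern the patch applies can be sketched roughly as follows. This is not the actual compute_tsvector_stats() code: the growable array is a hypothetical stand-in for the lexeme hash table, and a malloc'd string stands in for a detoasted tsvector value.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for palloc + memcpy: copy a string into freshly owned memory. */
char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p) strcpy(p, s);
    return p;
}

/* Stand-in for the lexeme hash table. The key point is that it must OWN
 * its keys: if it merely stored pointers into the detoasted tsvector,
 * freeing that buffer per row would leave dangling pointers behind. */
struct lexeme_table {
    char **keys;
    size_t n, cap;
};

/* Insert a lexeme, COPYING it so it no longer points into the caller's
 * detoasted buffer -- the extra copy the patch introduces. */
int table_add(struct lexeme_table *t, const char *lexeme)
{
    if (t->n == t->cap) {
        size_t cap = t->cap ? t->cap * 2 : 8;
        char **k = realloc(t->keys, cap * sizeof *k);
        if (!k) return -1;
        t->keys = k;
        t->cap = cap;
    }
    if ((t->keys[t->n] = copy_string(lexeme)) == NULL)
        return -1;
    t->n++;
    return 0;
}

/* Process one sampled row: pull the lexemes out of the detoasted value,
 * then free that value immediately, instead of keeping every sampled
 * row's detoasted copy alive until ANALYZE finishes. */
int process_row(struct lexeme_table *t, const char *raw_value)
{
    char *detoasted = copy_string(raw_value);  /* stands in for detoasting */
    if (!detoasted) return -1;
    for (char *s = strtok(detoasted, " "); s; s = strtok(NULL, " ")) {
        if (table_add(t, s) < 0) {
            free(detoasted);
            return -1;
        }
    }
    free(detoasted);  /* the per-row free the patch adds */
    return 0;
}
```

The trade-off is visible in the sketch: each lexeme now costs an extra allocation and copy, but memory held at any moment is bounded by one detoasted row plus the table, rather than by the whole sample.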
Re: [BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Yes, I can see that happening here too. The problem seems to be that the
> analyze-function detoasts every row in the sample. Tsvectors can be very
> large, so it adds up.
>
> That's pretty easy to fix, the analyze function needs to free the
> detoasted copies as it goes. But in order to do that, it needs to make
> copies of all the lexemes stored in the hash table, instead of pointing
> directly to the detoasted copies.
>
> Patch attached. I think this counts as a bug, and we should backport this.

+1. I didn't test the patch, but it looks sane to the eyeball.

			regards, tom lane
Re: [BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
From
Heikki Linnakangas
Date:
On 07/12/2017 06:30 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Yes, I can see that happening here too. The problem seems to be that the
>> analyze-function detoasts every row in the sample. Tsvectors can be very
>> large, so it adds up.
>>
>> That's pretty easy to fix, the analyze function needs to free the
>> detoasted copies as it goes. But in order to do that, it needs to make
>> copies of all the lexemes stored in the hash table, instead of pointing
>> directly to the detoasted copies.
>>
>> Patch attached. I think this counts as a bug, and we should backport this.
>
> +1. I didn't test the patch, but it looks sane to the eyeball.

Ok, committed. In some quick testing on my laptop, the extra palloc+pfree adds about 10% overhead in the worst-case scenario, where every tsvector in the sample consists of totally unique lexemes. That's a bit unfortunate, but it's a lot better than consuming gigabytes of memory.

- Heikki