The following bug has been logged on the website:
Bug reference: 14654
Logged by: James C
Email address: james+postgres@carbocation.com
PostgreSQL version: 9.5.5
Operating system: Linux
Description:
I am running a mirror of PubMed, which contains about 27 million articles.
One of my tables (article) includes a unique ID for each article, the
article title, a tsvector of the abstract, some timestamps, and a numeric
column. There are (unsurprisingly) about 27 million rows in this table.
Because the abstracts contain scientific terms, the number of lexemes is
quite large.
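For reference, a minimal sketch of that layout (apart from the table name
article, every column name below is a guess for illustration, not the real
schema):

    CREATE TABLE article (
        article_id    bigint PRIMARY KEY,   -- unique ID per article
        title         text,
        abstract_tsv  tsvector,             -- tsvector built from the abstract
        created_at    timestamptz,
        updated_at    timestamptz,
        some_metric   numeric
    );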
With some prototyping on a local machine with 64G of RAM, I found queries to
work quite well with the tsvector column having a default statistics target of
10,000. The production box has about 12G of RAM, but initially I didn't
notice any issues.
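For reference, a target of 10,000 can be reached either per column or via the
global default; my setup amounted to something like the following (abstract_tsv
is a placeholder, not the real column name):

    -- per column:
    ALTER TABLE article ALTER COLUMN abstract_tsv SET STATISTICS 10000;
    -- or globally, in postgresql.conf:
    --   default_statistics_target = 10000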
Today, I attempted to execute 'ANALYZE article;'. However, the OOM killer
kept taking down postgres. I reviewed numerous system and postgres settings,
all of which appeared to give postgres roughly 10G+ of RAM and should have
constrained it to using less than that limit.
RhodiumToad in Freenode#postgres helped me troubleshoot and suggested I try
reducing the statistics target on the tsvector column. Dropping it from
10,000 to 1,000 permitted ANALYZE to run without consuming large amounts of
RAM.
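Concretely, that amounted to something like this (again, abstract_tsv stands
in for the real column name; lowering default_statistics_target instead would
have the same effect):

    ALTER TABLE article ALTER COLUMN abstract_tsv SET STATISTICS 1000;
    ANALYZE article;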
It seems that ANALYZE on a tsvector column can consume memory on the order of
300 * (statistics target) * (size of the data in the field), which in my case
ended up being well above 10 gigabytes. I wonder if this might be considered a
bug (either in code or in documentation), as this memory usage seems not to
obey other limits, or at least wasn't documented in a way that might have
helped me guess at the underlying problem.
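To put rough numbers on that (the per-row size below is a made-up figure,
chosen only to illustrate the scale): ANALYZE samples 300 * (statistics
target) rows, so

    rows sampled     = 300 * 10,000      = 3,000,000
    memory (approx.) = 3,000,000 * ~4 kB ~= 12 GB   (if each detoasted
                                                     tsvector averages ~4 kB)
    at a target of 1,000: 300,000 * ~4 kB ~= 1.2 GB

which lines up with ANALYZE blowing past the ~10G available here, yet fitting
comfortably once the target was lowered.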