Thread: Thousands of schemas and ANALYZE goes out of memory
Hi everyone,

We have two postgresql 9.0 databases (32-bit) with more than 10,000 schemas.
When we try to run ANALYZE in those databases we get errors like this (after
a few hours):

2012-09-14 01:46:24 PDT ERROR:  out of memory
2012-09-14 01:46:24 PDT DETAIL:  Failed on request of size 421.
2012-09-14 01:46:24 PDT STATEMENT:  analyze;

(Note that we do have plenty of memory available for postgresql:
shared_buffers=2048MB, work_mem=128MB, maintenance_work_mem=384MB,
effective_cache_size=3072MB, etc.)

We have other similar databases with fewer than 10,000 schemas, and ANALYZE
works fine on them (they run on similar machines and configs). For now, we
have had to create shell scripts that run ANALYZE per schema, table by table.
It works that way, so at least we have an alternative solution.

But what exactly causes the out of memory? Is postgresql trying to run
everything in a single transaction? Maybe this could be improved in future
releases. Please let me know what you think.

Thanks in advance,
Hugo
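P.S. In case it helps, the per-table script is essentially the sketch below
(the database name is a placeholder and our real script has more error
handling):

#!/bin/sh
# Sketch of the per-table ANALYZE driver; DB name is a placeholder.
DB=mydb

# List every user table as schema.table, then ANALYZE each one in its own
# psql invocation. Slow, but it completes without running out of memory.
psql -d "$DB" -At -c "
    SELECT quote_ident(schemaname) || '.' || quote_ident(tablename)
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" |
while read -r tbl; do
    psql -d "$DB" -q -c "ANALYZE $tbl;"
done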
"Hugo <Nabble>" <hugo.tech@gmail.com> writes: > We have two postgresql 9.0 databases (32-bits) with more than 10,000 > schemas. When we try to run ANALYZE in those databases we get errors like > this (after a few hours): > 2012-09-14 01:46:24 PDT ERROR: out of memory > 2012-09-14 01:46:24 PDT DETAIL: Failed on request of size 421. > 2012-09-14 01:46:24 PDT STATEMENT: analyze; I doubt that the number of *schemas* is a big deal here, but the number of *tables* might well be. How many? Also, 9.0.what? regards, tom lane
> I doubt that the number of *schemas* is a big deal here, but the number
> of *tables* might well be.  How many?  Also, 9.0.what?

Each schema has 22 tables, so we can count at least 22 x 10,000 = 220,000
tables. The postgresql version is 9.0.7-1.

Regards,
Hugo
On Mon, Oct 1, 2012 at 12:52 PM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Hi everyone,
>
> We have two postgresql 9.0 databases (32-bit)

Why 32 bits?  Is that what your hardware is?

> with more than 10,000
> schemas. When we try to run ANALYZE in those databases we get errors like
> this (after a few hours):
>
> 2012-09-14 01:46:24 PDT ERROR:  out of memory
> 2012-09-14 01:46:24 PDT DETAIL:  Failed on request of size 421.
> 2012-09-14 01:46:24 PDT STATEMENT:  analyze;
>
> (Note that we do have plenty of memory available for postgresql:
> shared_buffers=2048MB, work_mem=128MB, maintenance_work_mem=384MB,
> effective_cache_size=3072MB, etc.)

That might be the problem.  I think with 32 bits, you only have 2GB of
address space available to any given process, and you just allowed
shared_buffers to grab all of it.

Cheers,

Jeff
> Why 32 bits?  Is that what your hardware is?

The business started in 2005 and we have been using 32 bits since then. We
have several machines, each with a remote replica database (WAL shipping)
configured, and changing this to 64 bits is going to be a lot of work, let
alone the downtime of each server (pg_dump + pg_restore). But we will
probably do this in the future after we finish some priorities.

> That might be the problem.  I think with 32 bits, you only have 2GB of
> address space available to any given process, and you just allowed
> shared_buffers to grab all of it.

The address space for 32 bits is 4GB. We just tried to reach a balance in
the configuration, and it seems to be working (except for the ANALYZE
command when the number of schemas/tables is huge).

Some questions I have:

1) Is there any reason to run the ANALYZE command in a single transaction?
2) Is there any difference between running ANALYZE on the whole database
and running it per schema, table by table?

Thanks for all the help,
Hugo
On Tue, Oct 02, 2012 at 10:38:38AM -0700, Hugo <Nabble> wrote:
> > That might be the problem.  I think with 32 bits, you only have 2GB of
> > address space available to any given process, and you just allowed
> > shared_buffers to grab all of it.
>
> The address space for 32 bits is 4GB. We just tried to reach a balance in
> the configuration, and it seems to be working (except for the ANALYZE
> command when the number of schemas/tables is huge).

Are you sure about that? You don't say what OS you are using, but on Linux
3GB is normal and on Windows 2GB. Here are some nice diagrams:

http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory

In my experience it's better to keep shared_buffers around your working set
size and let the kernel cache the rest as needed. Setting shared_buffers to
1GB will give your server much more breathing space for large operations
like the one you are asking about. Note that, unlike the way some other
database servers work, shared_buffers is the *minimum* postgres will use,
not the maximum.

> Some questions I have:
>
> 1) Is there any reason to run the ANALYZE command in a single transaction?
> 2) Is there any difference between running ANALYZE on the whole database
> and running it per schema, table by table?

I don't think it does everything in a single transaction, though I can
imagine that if you try to analyse the whole database it uses up more
memory to track the work it has to do. With 220,000 tables I imagine this
could add up.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts. -- Arthur Schopenhauer
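P.S. For what it's worth, checking and changing it is just something like
this (a sketch; remember that shared_buffers only changes with a full server
restart, not a reload):

psql -Atc "SHOW config_file;"      # where the active postgresql.conf lives
psql -Atc "SHOW shared_buffers;"   # current value

# Edit that file, e.g.:  shared_buffers = 1024MB
# then restart the server; a reload/SIGHUP is not enough for this setting.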
On Tue, Oct 2, 2012 at 10:38 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
>> That might be the problem.  I think with 32 bits, you only have 2GB of
>> address space available to any given process, and you just allowed
>> shared_buffers to grab all of it.
>
> The address space for 32 bits is 4GB.

I had thought the highest bit was not usable, but maybe that was just
a Windows thing.

> We just tried to reach a balance in
> the configuration, and it seems to be working (except for the ANALYZE
> command when the number of schemas/tables is huge).
>
> Some questions I have:
>
> 1) Is there any reason to run the ANALYZE command in a single transaction?

I don't know how the transactionality of analyze works.  I was
surprised to find that I could even run it in an explicit transaction
block; I thought it would behave like vacuum and create index
concurrently in that regard.

However, I think that would not solve your problem.  When I run
analyze on each of 220,000 tiny tables by name within one session
(using autocommit, so each in its own transaction), it does run about 4
times faster than just doing a database-wide vacuum which covers those
same tables.  (Maybe this is the lock/resource manager issue that has
been fixed for 9.3?)  But it takes the same amount of memory.  It is
only by closing the connection periodically that I can reduce the peak
memory usage.

So I think the memory is going to syscache, catcache, and/or stats
collector, which I think are non-transactional, live for the duration
of the backend, and have no way to evict least-recently-used members
once the caches get too large.  Also, some parts of them seem to have
N^2 performance, albeit with a very low constant.

> 2) Is there any difference between running ANALYZE on the whole database
> and running it per schema, table by table?

I can't think of any important ones, unless there are some things you
forget to analyze that way.

Does auto analyze have the same problem as a manual analyze does?
Probably not, unless your tables become eligible simultaneously.

Cheers,

Jeff
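P.S. For anyone who wants to reproduce something like this test, a sketch
(names and exact commands here are illustrative, not the script I actually
used; scale the count down if you just want to see the shape of the curve):

createdb analyze_test

# 220,000 tiny, index-less tables
seq 1 220000 | sed 's/.*/CREATE TABLE t_& (c int);/' | psql -d analyze_test -q

# Analyze them by name within a single session: autocommit, so each
# ANALYZE runs in its own transaction, but the connection stays open.
seq 1 220000 | sed 's/.*/ANALYZE t_&;/' | psql -d analyze_test -q

# Compare against one unqualified, database-wide run
psql -d analyze_test -q -c "ANALYZE;"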
On Wed, Oct 3, 2012 at 10:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Oct 2, 2012 at 10:38 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
>>> That might be the problem.  I think with 32 bits, you only have 2GB of
>>> address space available to any given process, and you just allowed
>>> shared_buffers to grab all of it.
>>
>> The address space for 32 bits is 4GB.
>
> I had thought the highest bit was not usable, but maybe that was just
> a Windows thing.

Losing the highest bit is common when something treats a number as signed,
but the <4GB memory issue isn't that. It's because other areas of memory
need to be addressable (devices claim certain areas), and all that has to
be factored into the total 4GB addressable space. Usually you get >3GB,
somewhere around the 3.5GB mark.

ChrisA
On Tue, Oct 2, 2012 at 5:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I don't know how the transactionality of analyze works.  I was
> surprised to find that I could even run it in an explicit transaction
> block; I thought it would behave like vacuum and create index
> concurrently in that regard.
>
> However, I think that would not solve your problem.  When I run
> analyze on each of 220,000 tiny tables by name within one session
> (using autocommit, so each in its own transaction), it does run about 4
> times faster than just doing a database-wide vacuum which covers those
> same tables.  (Maybe this is the lock/resource manager issue that has
> been fixed for 9.3?)

For the record, the culprit that causes "analyze;" of a database with
a large number of small objects to be quadratic in time is
"get_tabstat_entry", and it is not fixed for 9.3.

Cheers,

Jeff
We use Debian linux 32-bit, so the addressable space available really does
seem to be in the 3.0GB to 3.5GB range. Last night I decreased
shared_buffers from 2GB to 1GB and tried the global ANALYZE again. It went
out of memory after 3 hours 40 minutes. That database has 12,197 schemas
with 22 tables each, which means 268,334 tables. Should I keep reducing
shared_buffers and trying again? We don't plan to run the ANALYZE every
week, so we can keep shared_buffers high most of the time and tweak it when
needed. I will try again tonight with ~512MB and see what happens. Please
let me know if you have ideas.

Thanks again!
Hugo
"Hugo <Nabble>" <hugo.tech@gmail.com> writes: > We use Debian linux 32-bits, so the addressable space available really seems > to be in the 3.0Gb to 3.5Gb range. Last night I decreased the shared_buffers > from 2Gb to 1Gb and tried the global ANALYZE again. It went out of memory > after 3 hours 40 minutes. That database has 12,197 schemas with 22 tables > each, which means 268,334 tables. It sure sounds to me like you're using the schema for what ought to be a high-order key column in a much smaller number of tables ... having that many tables is simply never going to perform well. We can play whack-a-mole on individual bottlenecks all day, but that's never going to fix the fundamental problem. Some of the bottlenecks will be at kernel or filesystem level and thus out of our reach, anyway. regards, tom lane
On Thu, Oct 4, 2012 at 10:54 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> We use Debian linux 32-bit, so the addressable space available really does
> seem to be in the 3.0GB to 3.5GB range. Last night I decreased
> shared_buffers from 2GB to 1GB and tried the global ANALYZE again. It went
> out of memory after 3 hours 40 minutes. That database has 12,197 schemas
> with 22 tables each, which means 268,334 tables. Should I keep reducing
> shared_buffers and trying again? We don't plan to run the ANALYZE every
> week, so we can keep shared_buffers high most of the time and tweak it when
> needed. I will try again tonight with ~512MB and see what happens. Please
> let me know if you have ideas.

Your analyzes must be taking more memory than mine did; probably your
tables have indexes and my dummy ones did not.

But you said you had shell scripts to analyze table by table, and that
seems like the solution (until you can upgrade to 64-bit), as it is both
faster and more memory-efficient to do it that way.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> For the record, the culprit that causes "analyze;" of a database with
> a large number of small objects to be quadratic in time is
> "get_tabstat_entry", and it is not fixed for 9.3.

I was a bit surprised by this assertion, as I'd thought that tabstats
were flushed to the collector at transaction end, and thus that the
internal transaction boundaries in a VACUUM or ANALYZE should prevent
the tabstats table from getting unreasonably large.  However, a look
at the code shows that pgstat_report_stat() is only called when the main
loop in postgres.c is about to wait for client input.

We could build a lot of infrastructure to try to index the tabstat
arrays more efficiently ... or we could just do something like the
attached.

It appears that the next tallest mole in the VACUUM case is
CleanupTempFiles.  This workload is not creating any temp files, I hope,
so the implication is that have_pending_fd_cleanup is getting set by
FileSetTransient (probably from blind writes).  We might want to revisit
how that works --- particularly since I see no reason that there would
be any actually-blind writes in this example.  But in any case, that
innocent-looking flag setting can result in a lot of work.

			regards, tom lane

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 14d1c08..a5d00fc 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 251,256 ****
--- 251,257 ----
  		{
  			PopActiveSnapshot();
  			CommitTransactionCommand();
+ 			pgstat_report_stat(false);
  		}
  	}
  }
*************** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 1071,1080 ****
  	relation_close(onerel, NoLock);

  	/*
! 	 * Complete the transaction and free all temporary memory used.
  	 */
  	PopActiveSnapshot();
  	CommitTransactionCommand();

  	/*
  	 * If the relation has a secondary toast rel, vacuum that too while we
--- 1072,1083 ----
  	relation_close(onerel, NoLock);

  	/*
! 	 * Complete the transaction and free all temporary memory used.  Also,
! 	 * flush table-access statistics to the stats collector after each table.
  	 */
  	PopActiveSnapshot();
  	CommitTransactionCommand();
+ 	pgstat_report_stat(false);

  	/*
  	 * If the relation has a secondary toast rel, vacuum that too while we
Is there a reason the following patch wasn't applied?

---------------------------------------------------------------------------

On Thu, Oct 4, 2012 at 07:14:31PM -0400, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > For the record, the culprit that causes "analyze;" of a database with
> > a large number of small objects to be quadratic in time is
> > "get_tabstat_entry", and it is not fixed for 9.3.
>
> I was a bit surprised by this assertion, as I'd thought that tabstats
> were flushed to the collector at transaction end, and thus that the
> internal transaction boundaries in a VACUUM or ANALYZE should prevent
> the tabstats table from getting unreasonably large.  However, a look
> at the code shows that pgstat_report_stat() is only called when the main
> loop in postgres.c is about to wait for client input.
>
> We could build a lot of infrastructure to try to index the tabstat
> arrays more efficiently ... or we could just do something like the
> attached.
>
> It appears that the next tallest mole in the VACUUM case is
> CleanupTempFiles.  This workload is not creating any temp files, I hope,
> so the implication is that have_pending_fd_cleanup is getting set by
> FileSetTransient (probably from blind writes).  We might want to revisit
> how that works --- particularly since I see no reason that there would
> be any actually-blind writes in this example.  But in any case, that
> innocent-looking flag setting can result in a lot of work.
>
> 			regards, tom lane
>
> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index 14d1c08..a5d00fc 100644
> *** a/src/backend/commands/vacuum.c
> --- b/src/backend/commands/vacuum.c
> *************** vacuum(VacuumStmt *vacstmt, Oid relid, b
> *** 251,256 ****
> --- 251,257 ----
>   		{
>   			PopActiveSnapshot();
>   			CommitTransactionCommand();
> + 			pgstat_report_stat(false);
>   		}
>   	}
>   }
> *************** vacuum_rel(Oid relid, VacuumStmt *vacstm
> *** 1071,1080 ****
>   	relation_close(onerel, NoLock);
>
>   	/*
> ! 	 * Complete the transaction and free all temporary memory used.
>   	 */
>   	PopActiveSnapshot();
>   	CommitTransactionCommand();
>
>   	/*
>   	 * If the relation has a secondary toast rel, vacuum that too while we
> --- 1072,1083 ----
>   	relation_close(onerel, NoLock);
>
>   	/*
> ! 	 * Complete the transaction and free all temporary memory used.  Also,
> ! 	 * flush table-access statistics to the stats collector after each table.
>   	 */
>   	PopActiveSnapshot();
>   	CommitTransactionCommand();
> + 	pgstat_report_stat(false);
>
>   	/*
>   	 * If the relation has a secondary toast rel, vacuum that too while we

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes:
> Is there a reason the following patch wasn't applied?

I don't think anybody ever did the legwork to verify it was a good idea.
In particular, it'd be good to check if sending a tabstat message for each
table adds noticeable overhead.

			regards, tom lane
On Wed, May 13, 2015 at 06:10:26PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Is there a reason the following patch wasn't applied?
>
> I don't think anybody ever did the legwork to verify it was a good idea.
> In particular, it'd be good to check if sending a tabstat message for each
> table adds noticeable overhead.

OK, I will mark it as closed then.  Thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, May 13, 2015 at 06:10:26PM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Is there a reason the following patch wasn't applied?

>> I don't think anybody ever did the legwork to verify it was a good idea.
>> In particular, it'd be good to check if sending a tabstat message for each
>> table adds noticeable overhead.

> OK, I will mark it as closed then.  Thanks.

I don't know that it should be closed exactly --- if we don't do this,
we should do something else about the performance issue.  Maybe put it
on TODO?

			regards, tom lane
On Wed, May 13, 2015 at 06:23:58PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, May 13, 2015 at 06:10:26PM -0400, Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> Is there a reason the following patch wasn't applied?
>
> >> I don't think anybody ever did the legwork to verify it was a good idea.
> >> In particular, it'd be good to check if sending a tabstat message for each
> >> table adds noticeable overhead.
>
> > OK, I will mark it as closed then.  Thanks.
>
> I don't know that it should be closed exactly --- if we don't do this,
> we should do something else about the performance issue.  Maybe put it
> on TODO?

Thanks, TODO added:

	Reduce memory use when analyzing many tables

(This is part of my clean-up of old issues in preparation for 9.5 feature
freeze.)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
On Wed, May 13, 2015 at 3:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Is there a reason the following patch wasn't applied?
>
> I don't think anybody ever did the legwork to verify it was a good idea.
> In particular, it'd be good to check if sending a tabstat message for each
> table adds noticeable overhead.
>
> 			regards, tom lane
Sorry, I thought you had already committed this change.
It definitely fixes the worst CPU bottleneck. A manual ANALYZE of a database with 210,000 tables (with no table name specified) goes from 1280 seconds to 120 seconds. It does not meaningfully change the memory usage, as tabstat is not the memory bottleneck; CacheMemoryContext is. The backend takes about 1.5GB either with this patch or without it.
I'm having trouble coming up with an adverse scenario to test for performance regressions. Perhaps a platform where IPC is pathologically slow? I don't have one of those at hand, nor do I even know what such a platform might be. And presumably such a platform would be unsuited to running a very large database anyway.
By the way, this issue can now be partially worked around using "vacuumdb -Z -j8". The quadratic CPU issue is avoided because, with the parallel option, it issues a series of "ANALYZE table" commands rather than one unqualified command for the whole database. And the memory usage originally reported is ameliorated because each backend gets its own address space. The total amount of memory used remains the same, but the 32-bit limit doesn't apply to the aggregate, only to each process.
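For example (connection options omitted; the job count is only an illustration):

# Analyze-only, with 8 parallel connections. Per-table ANALYZE commands are
# dispatched across the workers, so the cache growth is spread over 8 backends.
vacuumdb --analyze-only --jobs=8 --dbname=mydb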
Cheers,
Jeff