Thread: CREATE TABLE slowing down significantly over time
I’m facing a problem where running a CREATE TABLE has slowed down significantly over time.
This is problematic because my application needs to routinely create a new schema and create 300 tables in each new schema. In total it takes about 3 minutes, which may not seem like a big deal, but this is time-sensitive because users of our SaaS application are waiting in real time for the schema and its 300 tables to be created.
It used to take about 15 seconds to create those 300 tables in a new schema (when there were only a few schemas, say about 50). It now takes about 3 minutes (and now we have about 200 schemas, with more data but not hugely so).
To debug this problem, I've created a new database on a separate (and dinky) laptop, where running a single test CREATE TABLE command takes about 19 ms.
But on the server with 200+ schemas, this single command takes between 200 and 300 ms.
My test command on psql is:
CREATE TABLE <TheSchemaName>.academicsemesters (
id text NOT NULL,
creationdate timestamp with time zone,
academicsemestername text,
academicyearandsemestername text,
startdate timestamp with time zone,
enddate timestamp with time zone,
isplanningmode boolean NOT NULL,
isclosed boolean NOT NULL,
isactive boolean NOT NULL,
status text,
workflowstubid text,
deleted boolean NOT NULL,
academicyearid text
);
* Any tips anyone can give on what might be the underlying cause of the slowing down of the CREATE TABLE command over time?
* Is the problem caused by the increasing number of schemas?
Thanks in advance,
Aris
"Aris Samad-Yahaya" <aris@quickschools.com> writes:
> I'm facing a problem where running a CREATE TABLE has slowed down
> significantly over time.

System catalog bloat maybe?  What are your vacuuming practices?

			regards, tom lane
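One way to act on Tom's question before guessing: check the catalogs' dead-tuple counts directly. This is an illustrative query, not from the thread, and the `n_dead_tup`/`n_live_tup` columns assume PostgreSQL 8.3 or later:

```sql
-- Rough bloat check: dead-tuple counts and last-vacuum times for
-- the system catalogs (requires 8.3+ for n_live_tup/n_dead_tup).
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE schemaname = 'pg_catalog'
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A large `n_dead_tup` on pg_class/pg_attribute with stale vacuum timestamps would point at exactly the catalog bloat Tom suspects.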
On 8/11/2009 11:15 AM, Aris Samad-Yahaya wrote:
> It used to take about 15 seconds to create those 300 tables in a new
> schema (when there were only a few schemas, say about 50). It now takes
> about 3 minutes (and now we have about 200 schemas, with more data but
> not hugely so).

200 schemas, 300 tables per schema. That's sixty THOUSAND tables.

> * Is the problem caused by the increasing number of schemas?

... and the increasing table count, I expect.

You do batch the table and schema creation into a single transaction, right? If not, do that first, rather than creating each table in a separate transaction (i.e. relying on autocommit).

It may also be worth thinking about the app's design. Is a new schema and 300 new tables for each user really the best way to tackle what you're doing? (It might be, but it's a question worth asking yourself.)

--
Craig Ringer
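Craig's batching suggestion looks like this in practice. A minimal sketch; the schema name and trailing tables are placeholders for the application's real DDL:

```sql
-- Sketch: create the schema and all 300 tables in one transaction,
-- instead of relying on autocommit (one commit per statement).
BEGIN;

CREATE SCHEMA customer_0123;  -- placeholder schema name

CREATE TABLE customer_0123.academicsemesters (
    id text NOT NULL,
    creationdate timestamp with time zone,
    academicsemestername text
    -- ... remaining columns as in the original DDL ...
);

-- ... the other 299 CREATE TABLE statements ...

COMMIT;
```

Besides avoiding per-statement commit overhead, this also means a failure partway through leaves no half-created schema behind.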
Hi Craig,

Yes, we do put the creation of the 300 tables into a single transaction. The difference between putting them in a single transaction and in individual transactions is about 30 seconds out of the 3 minutes.

As for creating 300 individual tables for an account... yes, we tried to think through that issue very hard. It's the SaaS maturity-levels discussion: how much do you separate the data for each account vs. sharing customer data in large common tables? I hear Salesforce puts most everything in giant tables, whereas we've decided to separate customer accounts into separate schemas.
We vacuum analyze nightly, and run a plain vacuum ad hoc (but we're going to schedule that weekly moving forward).

Interesting pointer about system catalog bloat. I tried VACUUM FULL on the system catalog tables (pg_*), and the performance of creating a single table manually improved dramatically (back to what it used to be), but as soon as I created the next schema, performance dropped back to the same level.

So there's a clue there somewhere. Next I will try VACUUM FULL on the entire database.
On Sat, Nov 7, 2009 at 9:58 PM, Aris Samad-Yahaya <aris@quickschools.com> wrote:
> Interesting pointer about system catalog bloat. I tried to vacuum full the
> system catalog tables (pg_*), and the performance for creating a single
> table manually improved dramatically (back to what it used to be), but as
> soon as I created the next schema, the performance went back down to the
> same level.

If you don't run autovac, it's possible you've managed to bloat your pg_catalog tables...

Note that we run similar numbers of tables, as we have 30 tables and about 10 indexes in over 2000 schemas. We did the trick Tom posted:

    alter function pg_table_is_visible(oid) cost 10;

to get faster tab completion and/or \d performance.
On Sat, Nov 7, 2009 at 11:58 PM, Aris Samad-Yahaya <aris@quickschools.com> wrote:
> So there's a clue there somewhere. Next I will try to vacuum full the entire
> database.

And maybe REINDEX, too.

...Robert
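Concretely, the VACUUM FULL + REINDEX pass on the catalogs might look like the sketch below. This is illustrative, not a prescription from the thread: run as superuser, and note these commands take exclusive locks on the catalogs, so expect other sessions to block briefly:

```sql
-- Shrink and reindex the catalogs hit hardest by heavy schema/table
-- churn (pg_class, pg_attribute, pg_type get rows for every table
-- and column created). Takes exclusive locks; run during quiet time.
VACUUM FULL pg_catalog.pg_class;
VACUUM FULL pg_catalog.pg_attribute;
VACUUM FULL pg_catalog.pg_type;

REINDEX TABLE pg_catalog.pg_class;
REINDEX TABLE pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_type;
```

The REINDEX matters because, as noted later in the thread, VACUUM FULL compacts the heap but does not fix index bloat.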
On Sat, Nov 7, 2009 at 11:58 PM, Aris Samad-Yahaya <aris@quickschools.com> wrote:
> So there's a clue there somewhere. Next I will try to vacuum full the entire
> database.

You should really enable autovacuum. You'll probably have to VACUUM FULL and REINDEX to clean everything up, but after that autovacuum should be MUCH more effective than a nightly vacuum run. If you're running some ancient Pg version where autovacuum is not enabled by default, you should also consider upgrading. There are a lot of goodies (including performance enhancements) in newer versions.

...Robert
Why is REINDEX needed? Unless most of the key values get deleted frequently, it shouldn't be. (I am assuming Postgres 8.x and above.)

On Sun, Nov 8, 2009 at 7:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> And maybe REINDEX, too.
>
> ...Robert
On Mon, Nov 9, 2009 at 5:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> You should really enable autovacuum. You'll probably have to VACUUM
> FULL and REINDEX to clean everything up, but after that autovacuum
> should be MUCH more effective than a nightly vacuum run.

Also note that the argument that autovacuum chews up too much IO is moot now that you can set cost delay to 10 to 20 milliseconds. Unless you're running on the hairy edge of maximum IO at all times, autovac should be pretty much unnoticed.
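For reference, the gentle-autovacuum setup Scott describes is a few lines in postgresql.conf. The exact values below are illustrative, not from the thread; only the 10-20 ms cost delay is his suggestion:

```
# postgresql.conf sketch: keep autovacuum on, but throttled so it
# stays unnoticed under normal I/O load.
autovacuum = on
autovacuum_vacuum_cost_delay = 20ms   # 10-20 ms per Scott's suggestion
autovacuum_naptime = 1min             # illustrative default
```

A reload (pg_ctl reload) is enough to pick these up; no restart required.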
On Mon, Nov 9, 2009 at 9:46 AM, Anj Adu <fotographs@gmail.com> wrote:
> Why is reindex needed ?

VACUUM FULL does not fix index bloat, only table bloat.

...Robert
Scott Marlowe wrote:
> Also note that the argument that autovacuum chews up too much IO is
> moot now that you can set cost delay to 10 to 20 milliseconds. Unless
> you're running on the hairy edge of maximum IO at all times, autovac
> should be pretty much unnoticed.

And if you're running on the hairy edge like that, you really need autovacuum whether you think you can afford it or not. Badly maintained tables are also I/O intensive, and it's easy for someone who thinks "I'm too busy to allocate VACUUM time" to end up wasting more resources than it would have taken to just do things right in the first place. I see way too many people who suffer from false economy when it comes to autovacuum planning.

Some comments on this whole discussion:

1) You don't end up with dead rows [auto]vacuum needs to clean up just when you delete things. They show up when you UPDATE things, too--the original row isn't removed until after the new one is written. The overhead isn't as bad on UPDATEs in 8.3 or later, but just because you don't delete doesn't mean you don't need VACUUM to clean up dead stuff.

2) Any time you find yourself considering VACUUM FULL to clean things up, you're probably making a mistake, because the sort of situations it's the only tool to recover from tend to be broader disasters. The guidelines in http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html spell out my feeling here as a tip: "the best way is to use CLUSTER or one of the table-rewriting variants of ALTER TABLE". If you're fighting performance issues because of some amount of background mismanagement with an unknown amount of table garbage in the past, it's quite possible you'll find the reinvigorated performance you get from CLUSTER worth the maintenance cost of needing an exclusive lock for it to run for a while. See http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959 for more information. I run CLUSTER all the time, and every time I think I'm saving time by doing VACUUM FULL/REINDEX instead I regret it as another false economy. Turn on autovacuum, make it run all the time but at a slower average speed if you're concerned about its overhead, and use CLUSTER once to blow away the accumulated bloat from before you were doing the right things.

3) There is a lot of speculation here and no measurements. What I'd be doing in this case is running something like the query at http://wiki.postgresql.org/wiki/Disk_Usage (but without the lines that filter out pg_catalog, because the catalogs are a strong suspect here) regularly while debugging the problem. Measure how big all the catalog tables and indexes are, do your operations that make things better or worse, then measure again. Turn autovacuum on, repeat the test, see if things are different. This type of problem tends to be really easy to quantify.

--
Greg Smith  greg@2ndquadrant.com  Baltimore, MD
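A measurement query along the lines Greg suggests, adapted as a sketch from the wiki's disk-usage query with the pg_catalog filter removed, so before/after comparisons are easy:

```sql
-- Total on-disk size (heap + indexes + TOAST) of each system catalog.
-- Run before and after VACUUM FULL / REINDEX / enabling autovacuum
-- to quantify how much catalog bloat each step removes.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 15;
```

If pg_class, pg_attribute, or their indexes dominate this list and shrink dramatically after maintenance, that confirms catalog bloat as the cause of the slow CREATE TABLE.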