Thread: CREATE TABLE slowing down significantly over time

CREATE TABLE slowing down significantly over time

From
"Aris Samad-Yahaya"
Date:

I’m facing a problem where running a CREATE TABLE has slowed down significantly over time.

 

This is problematic because my application needs to routinely create a new schema with 300 tables in it. In total this takes about 3 minutes, which may not seem like a big deal, but it is time-sensitive because users of our SaaS application are waiting in real time for the schema and its 300 tables to be created.

 

It used to take about 15 seconds to create those 300 tables in a new schema (when there were only a few schemas, say about 50). It now takes about 3 minutes (and now we have about 200 schemas, with more data but not hugely so).

 

To debug this problem, I’ve created a new database on a separate (and dinky) laptop, where running a single test CREATE TABLE command takes about 19 ms.

 

But on the server with 200+ schemas, this single command takes between 200 and 300 ms.

 

My test command on psql is:

CREATE TABLE <TheSchemaName>.academicsemesters (
    id text NOT NULL,
    creationdate timestamp with time zone,
    academicsemestername text,
    academicyearandsemestername text,
    startdate timestamp with time zone,
    enddate timestamp with time zone,
    isplanningmode boolean NOT NULL,
    isclosed boolean NOT NULL,
    isactive boolean NOT NULL,
    status text,
    workflowstubid text,
    deleted boolean NOT NULL,
    academicyearid text
);

 

* Any tips anyone can give on what might be the underlying cause of the slowing down of the CREATE TABLE command over time?

* Is the problem caused by the increasing number of schemas?

 

Thanks in advance,

Aris

Re: CREATE TABLE slowing down significantly over time

From
Tom Lane
Date:
"Aris Samad-Yahaya" <aris@quickschools.com> writes:
> I'm facing a problem where running a CREATE TABLE has slowed down
> significantly over time.

System catalog bloat maybe?  What are your vacuuming practices?

            regards, tom lane

Re: CREATE TABLE slowing down significantly over time

From
Craig Ringer
Date:
On 8/11/2009 11:15 AM, Aris Samad-Yahaya wrote:

> It used to take about 15 seconds to create those 300 tables in a new
> schema (when there were only a few schemas, say about 50). It now takes
> about 3 minutes (and now we have about 200 schemas, with more data but
> not hugely so).

200 schemas, 300 tables per schema. That's sixty THOUSAND tables.

> * Is the problem caused by the increasing number of schemas?

and increasing table count, I expect.

You do batch the table and schema creation into a single transaction,
right? If not, do that first, rather than creating each table in a
separate transaction (ie: relying on autocommit).

It may also be worth thinking about the app's design. Is a new schema
and 300 new tables for each user really the best way to tackle what
you're doing? (It might be, but it's a question worth asking yourself).
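To make the batching point concrete, here's a minimal sketch (the schema
name and the second table are made up for illustration). Wrapping all the
DDL in one BEGIN/COMMIT means a single commit instead of 300:

```sql
-- One transaction for the whole provisioning step, instead of
-- relying on autocommit for each individual CREATE TABLE.
BEGIN;

CREATE SCHEMA customer_0042;

CREATE TABLE customer_0042.academicsemesters (
    id text NOT NULL,
    creationdate timestamp with time zone,
    academicsemestername text
);

CREATE TABLE customer_0042.academicyears (
    id text NOT NULL,
    academicyearname text
);

-- ...and so on for the remaining tables in the schema...

COMMIT;
```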

--
Craig Ringer

Re: CREATE TABLE slowing down significantly over time

From
"Aris Samad-Yahaya"
Date:
Hi Craig,

Yes we do put the creation of the 300 tables into a single transaction. The
difference between putting them in a single transaction and individual
transactions is about 30 seconds over the 3 minutes.

As for the creation of 300 individual tables per account... yes, we have
been thinking through that issue very hard. It's the SaaS maturity levels
discussion: how much do you separate the database for each account, versus
sharing customer data in large common tables? I hear Salesforce puts almost
everything in giant tables, whereas we've decided to separate customer
accounts into separate schemas.

-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Saturday, November 07, 2009 10:48 PM
To: Aris Samad-Yahaya
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] CREATE TABLE slowing down significantly over time

On 8/11/2009 11:15 AM, Aris Samad-Yahaya wrote:

> It used to take about 15 seconds to create those 300 tables in a new
> schema (when there were only a few schemas, say about 50). It now takes
> about 3 minutes (and now we have about 200 schemas, with more data but
> not hugely so).

200 schemas, 300 tables per schema. That's sixty THOUSAND tables.

> * Is the problem caused by the increasing number of schemas?

and increasing table count, I expect.

You do batch the table and schema creation into a single transaction,
right? If not, do that first, rather than creating each table in a
separate transaction (ie: relying on autocommit).

It may also be worth thinking about the app's design. Is a new schema
and 300 new tables for each user really the best way to tackle what
you're doing? (It might be, but it's a question worth asking yourself).

--
Craig Ringer


Re: CREATE TABLE slowing down significantly over time

From
"Aris Samad-Yahaya"
Date:
We vacuum analyze nightly, and vacuum normally ad-hoc (but we're going to
schedule this weekly moving forward).

Interesting pointer about system catalog bloat. I tried to vacuum full the
system catalog tables (pg_*), and the performance for creating a single
table manually improved dramatically (back to what it used to be), but as
soon as I created the next schema, the performance went back down to the
same level.

So there's a clue there somewhere. Next I will try to vacuum full the entire
database.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Saturday, November 07, 2009 10:29 PM
To: Aris Samad-Yahaya
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] CREATE TABLE slowing down significantly over time

"Aris Samad-Yahaya" <aris@quickschools.com> writes:
> I'm facing a problem where running a CREATE TABLE has slowed down
> significantly over time.

System catalog bloat maybe?  What are your vacuuming practices?

            regards, tom lane


Re: CREATE TABLE slowing down significantly over time

From
Scott Marlowe
Date:
On Sat, Nov 7, 2009 at 9:58 PM, Aris Samad-Yahaya <aris@quickschools.com> wrote:
> We vacuum analyze nightly, and vacuum normally ad-hoc (but we're going to
> schedule this weekly moving forward).
>
> Interesting pointer about system catalog bloat. I tried to vacuum full the
> system catalog tables (pg_*), and the performance for creating a single
> table manually improved dramatically (back to what it used to be), but as
> soon as I created the next schema, the performance went back down to the
> same level.
>
> So there's a clue there somewhere. Next I will try to vacuum full the entire
> database.

If you don't run autovac, it's possible you've managed to bloat your
pg_catalog tables...  Note that we run similar numbers of tables, as
we have 30 tables and about 10 indexes in over 2000 schemas.  We did
the trick Tom posted:
alter function pg_table_is_visible(oid) cost 10;
to get faster tab completion and / or \d performance.

Re: CREATE TABLE slowing down significantly over time

From
Robert Haas
Date:
On Sat, Nov 7, 2009 at 11:58 PM, Aris Samad-Yahaya
<aris@quickschools.com> wrote:
> We vacuum analyze nightly, and vacuum normally ad-hoc (but we're going to
> schedule this weekly moving forward).
>
> Interesting pointer about system catalog bloat. I tried to vacuum full the
> system catalog tables (pg_*), and the performance for creating a single
> table manually improved dramatically (back to what it used to be), but as
> soon as I created the next schema, the performance went back down to the
> same level.
>
> So there's a clue there somewhere. Next I will try to vacuum full the entire
> database.

And maybe REINDEX, too.

...Robert

Re: CREATE TABLE slowing down significantly over time

From
Grzegorz Jaśkiewicz
Date:
On Mon, Nov 9, 2009 at 3:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

And maybe REINDEX, too.

Yup, and never mind the mess in the table itself; indexes get bloated
much more quickly than the table, because of their structure.

--
GJ

Re: CREATE TABLE slowing down significantly over time

From
Robert Haas
Date:
On Sat, Nov 7, 2009 at 11:58 PM, Aris Samad-Yahaya
<aris@quickschools.com> wrote:
> We vacuum analyze nightly, and vacuum normally ad-hoc (but we're going to
> schedule this weekly moving forward).
>
> Interesting pointer about system catalog bloat. I tried to vacuum full the
> system catalog tables (pg_*), and the performance for creating a single
> table manually improved dramatically (back to what it used to be), but as
> soon as I created the next schema, the performance went back down to the
> same level.
>
> So there's a clue there somewhere. Next I will try to vacuum full the entire
> database.

You should really enable autovacuum.  You'll probably have to VACUUM
FULL and REINDEX to clean everything up, but after that autovacuum
should be MUCH more effective than a nightly vacuum run.  If you're
running some ancient Pg version where autovacuum is not enabled by
default, you should also consider upgrading.  There are a lot of
goodies (including performance enhancements) in newer versions.
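A sketch of the one-time cleanup being suggested. Which catalogs matter
most is an assumption on my part, but pg_class, pg_attribute, and
pg_depend all grow with every table created. Note VACUUM FULL and REINDEX
take exclusive locks, so run this in a maintenance window:

```sql
-- One-time recovery of already-bloated catalogs (exclusive locks!).
VACUUM FULL pg_catalog.pg_class;
VACUUM FULL pg_catalog.pg_attribute;
VACUUM FULL pg_catalog.pg_depend;

REINDEX TABLE pg_catalog.pg_class;
REINDEX TABLE pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_depend;
```

After that, autovacuum keeps the catalogs from bloating again.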

...Robert

Re: CREATE TABLE slowing down significantly over time

From
Anj Adu
Date:
Why is REINDEX needed? Unless most of the key values get deleted
frequently, this should not be necessary. (I am assuming Postgres 8.x and
above.)

On Sun, Nov 8, 2009 at 7:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Nov 7, 2009 at 11:58 PM, Aris Samad-Yahaya
> <aris@quickschools.com> wrote:
>> We vacuum analyze nightly, and vacuum normally ad-hoc (but we're going to
>> schedule this weekly moving forward).
>>
>> Interesting pointer about system catalog bloat. I tried to vacuum full the
>> system catalog tables (pg_*), and the performance for creating a single
>> table manually improved dramatically (back to what it used to be), but as
>> soon as I created the next schema, the performance went back down to the
>> same level.
>>
>> So there's a clue there somewhere. Next I will try to vacuum full the entire
>> database.
>
> And maybe REINDEX, too.
>
> ...Robert
>

Re: CREATE TABLE slowing down significantly over time

From
Scott Marlowe
Date:
On Mon, Nov 9, 2009 at 5:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Nov 7, 2009 at 11:58 PM, Aris Samad-Yahaya
> <aris@quickschools.com> wrote:
>> We vacuum analyze nightly, and vacuum normally ad-hoc (but we're going to
>> schedule this weekly moving forward).
>>
>> Interesting pointer about system catalog bloat. I tried to vacuum full the
>> system catalog tables (pg_*), and the performance for creating a single
>> table manually improved dramatically (back to what it used to be), but as
>> soon as I created the next schema, the performance went back down to the
>> same level.
>>
>> So there's a clue there somewhere. Next I will try to vacuum full the entire
>> database.
>
> You should really enable autovacuum.  You'll probably have to VACUUM
> FULL and REINDEX to clean everything up, but after that autovacuum
> should be MUCH more effective than a nightly vacuum run.  If you're
> running some ancient Pg version where autovacuum is not enabled by
> default, you should also consider upgrading.  There are a lot of
> goodies (including performance enhancements) in newer versions.

Also note that the argument that autovacuum chews up too much IO is
moot now that you can set cost delay to 10 to 20 milliseconds.  Unless
you're running on the hairy edge of maximum IO at all times, autovac
should be pretty much unnoticed.
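In postgresql.conf terms that suggestion looks roughly like this (the
values are illustrative starting points, not tuned recommendations):

```
# postgresql.conf (illustrative values)
autovacuum = on
autovacuum_naptime = 1min
autovacuum_vacuum_cost_delay = 20ms   # throttle autovacuum I/O
```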

Re: CREATE TABLE slowing down significantly over time

From
Robert Haas
Date:
On Mon, Nov 9, 2009 at 9:46 AM, Anj Adu <fotographs@gmail.com> wrote:
> Why is reindex needed ?

VACUUM FULL does not fix index bloat, only table bloat.

...Robert

Re: CREATE TABLE slowing down significantly over time

From
Greg Smith
Date:
Scott Marlowe wrote:
> Also note that the argument that autovacuum chews up too much IO is
> moot now that you can set cost delay to 10 to 20 milliseconds.  Unless
> you're running on the hairy edge of maximum IO at all times, autovac
> should be pretty much unnoticed.
And if you're running on the hairy edge like that, you really need
autovacuum whether you think you can afford it or not.  Badly maintained
tables are also I/O intensive, and it's easy for someone who thinks "I'm
too busy to allocate VACUUM time" to end up wasting more resources than
it would have taken to just do things right in the first place.  I see
way too many people who suffer from false economy when it comes to
autovacuum planning.

Some comments on this whole discussion:

1) You don't end up with dead rows [auto]vacuum needs to clean up just
when you delete things.  They show up when you UPDATE things, too--the
original row isn't removed until after the new one is written.  The
overhead isn't as bad on UPDATEs in 8.3 or later, but just because you
don't delete doesn't mean you don't need VACUUM to clean up dead stuff.

2) Any time you find yourself considering VACUUM FULL to clean things
up, you're probably making a mistake, because the situations where it's
the only recovery tool available tend to be broader disasters.  The
guidelines in
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html
spell out my feeling here as a tip:  "the best way is to use CLUSTER or
one of the table-rewriting variants of ALTER TABLE".  If you're fighting
performance issues because of some amount of background mismanagement
with an unknown amount of table garbage in the past, it's quite possible
you'll find the reinvigorated performance you get from CLUSTER worth the
maintenance cost of needing an exclusive lock for it to run for a
while.  See
http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959
for more information.  I run CLUSTER all the time, and every time I
think I'm saving time by doing VACUUM FULL/REINDEX instead I regret it
as another false economy.  Turn on autovacuum, make it run all the time
but at a slower average speed if you're concerned about its overhead,
and use CLUSTER once to blow away the accumulated bloat from before you
were doing the right things.
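A sketch of the CLUSTER alternative on one of the user tables from this
thread (the schema name and index are hypothetical; CLUSTER rewrites the
table and holds an exclusive lock while it runs):

```sql
-- CLUSTER rewrites the table in index order, discarding dead space,
-- and rebuilds its indexes as a side effect.
CREATE INDEX academicsemesters_id_idx
    ON someschema.academicsemesters (id);
CLUSTER someschema.academicsemesters USING academicsemesters_id_idx;
ANALYZE someschema.academicsemesters;
```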

3) There is a lot of speculation here and no measurements.  What I'd be
doing in this case is running something like the query at
http://wiki.postgresql.org/wiki/Disk_Usage (but without the lines that
filter out pg_catalog because the catalogs are a strong suspect here)
regularly while debugging the problem here.  Measure how big all the
catalog tables and indexes are, do your operations that make things
better or worse, then measure again.  Turn autovacuum on, repeat the
test, see if things are different.  This type of problem tends to be
really easy to quantify.
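For the measuring step, a minimal catalog-size query along those lines (a
sketch; the wiki page's query is more thorough):

```sql
-- Size of each system catalog table, including indexes, biggest first.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 15;
```

Run it before and after each VACUUM FULL / REINDEX / autovacuum change to
see which operation actually moves the numbers.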

--
Greg Smith    greg@2ndquadrant.com    Baltimore, MD