Thread: Can anyone explain this: duplicate dbs.

From: "SpaceBallOne"
Wondering if someone could explain a peculiarity for me:

We have a database which takes 1000ms to perform a certain query on.

If I pg_dump that database, then create a new database (e.g. "tempdb") and load the dump file into it (thus making a duplicate), the same query takes only 190ms !!
Vacuum, vacuum analyse, and vacuum full analyse do not seem to have an impact on these times.

Can anyone explain why this may be occurring and how I might be able to keep the original database running at the same speed as "tempdb"?

Thanks in advance,

Dave.

Re: Can anyone explain this: duplicate dbs.

From: Christopher Kings-Lynne
> Can anyone explain why this may be occurring and how I might be able to
> keep the original database running at the same speed as "tempdb"?

You're not vacuuming anywhere near often enough.  Read up on the database
maintenance section of the manual.  Then, set up contrib/pg_autovacuum
to vacuum your database regularly, or make a cron job that runs "vacuumdb
-a -z -q", say, once an hour.

You can fix the case where you haven't been vacuuming enough with a
one-off VACUUM FULL ANALYZE command, but this will lock tables
exclusively while it does its work.
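
As a sketch (the binary path and schedule are only illustrative; adjust them for your installation), the hourly cron job could look like:

```
# /etc/crontab entry: vacuum (-a: all databases) and analyze (-z), quietly (-q), once an hour
0 * * * *  postgres  /usr/local/pgsql/bin/vacuumdb -a -z -q
```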

Chris

Re: Can anyone explain this: duplicate dbs.

From: Christopher Kings-Lynne
> If I pg_dump that database then create a new database (e.g. "tempdb")
> and upload the dump file (thus making a duplicate) then the same query
> only takes 190ms !!
> Vacuum, vacuum analyse, and vacuum full analyse does not seem to have an
> impact on these times.

Damn, for some reason I didn't read that you had already tried vacuum
full.  In that case I can't explain it, except perhaps you aren't
vacuuming properly, or not vacuuming the right thing, or it's a
disk-cache effect.

Chris

Re: Can anyone explain this: duplicate dbs.

From: John A Meinel
SpaceBallOne wrote:

> Wondering if someone could explain a peculiarity for me:
>
> We have a database which takes 1000ms to perform a certain query on.
>
> If I pg_dump that database then create a new database (e.g. "tempdb")
> and upload the dump file (thus making a duplicate) then the same query
> only takes 190ms !!
> Vacuum, vacuum analyse, and vacuum full analyse does not seem to have
> an impact on these times.
>
> Can anyone explain why this may be occurring and how I might be able
> to keep the original database running at the same speed as "tempdb"?
>
> Thanks in advance,
>
> Dave.

What version of postgres?

There are a few possibilities. If you have a lot of updates to the
table, you can get index bloat. And vacuum doesn't fix indexes. You have
to "REINDEX" to do that. Though REINDEX takes the same lock that VACUUM
FULL does, so you need to be a little careful with it.

Probably better is to do CLUSTER, as it does a REINDEX and a sort, so
your table ends up nicer when you are done.
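
For example (the table and index names here are hypothetical; substitute your own), the two commands would look like this in the 8.0-era syntax:

```sql
-- Rebuild a bloated index; takes an exclusive lock, like VACUUM FULL:
REINDEX INDEX customers_pkey;

-- Or rewrite the whole table in index order, which rebuilds its
-- indexes as a side effect (pre-8.3 syntax: CLUSTER indexname ON tablename):
CLUSTER customers_pkey ON customers;
```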

Also, older versions of postgres had a worse time with index bloat. One
thing that caused a lot of problems is a table that you insert into over
time, so that all the values are incrementing. If you are deleting older
entries, that space won't be re-used because it falls at the back end. I
believe this has been fixed in newer versions.

By the way, I think doing:

CREATE DATABASE tempdb WITH TEMPLATE = originaldb;

Is a much faster way of doing dump and load. I *think* it would recreate
indexes, etc. If it just does a copy it may not show the dump/restore
improvement.

John
=:->



Re: Can anyone explain this: duplicate dbs.

From: "SpaceBallOne"
> What version of postgres?

8.0.2 ... but I think I've seen this before on 7.3 ...

> There are a few possibilities. If you are having a lot of updates to the
> table, you can get index bloat. And vacuum doesn't fix indexes. You have
> to "REINDEX" to do that. Though REINDEX has the same lock that VACUUM
> FULL has, so you need to be a little careful with it.

> Probably better is to do CLUSTER, as it does a REINDEX and a sort, so
> your table ends up nicer when you are done.

Thanks, will try those next time this problem crops up (I just deleted /
recreated the database to speed things up for its users in the office ...
probably should have held off to see if I could find a solution first!).

Yes, the database / table-in-question does have a lot of updates, deletes,
and new rows (relatively speaking for a small business).

Would CLUSTER / REINDEX still have an effect if our queries were done via
sequential scan? This is an old database (as in, built by me when I was just
starting to learn unix / postgres), so the database design is pretty horrible
(little normalisation, no indexes).

Have taken Chris's advice on board too and set up cron to run vacuumdb hourly
instead of my weekly vacuum.

Cheers,

Dave.

Re: Can anyone explain this: duplicate dbs.

From: Christopher Kings-Lynne
> Would CLUSTER / REINDEX still have an effect if our queries were done
> via sequential scan?

SELECTs don't write to the database, so they have no effect at all on
vacuuming/analyzing.  You only need to worry about that with writes.

> This is a old database (as in built by me when i
> was just starting to learn unix / postgres) so the database design is
> pretty horrible (little normalisation, no indexes).

No indexes?  Bloody hell :D

Use EXPLAIN ANALYZE SELECT ... ; on all of your selects to see where
they are slow and where you can add indexes...
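
A sketch of that workflow (the table and column names are hypothetical):

```sql
-- See the actual plan, with real row counts and timings:
EXPLAIN ANALYZE SELECT * FROM jobs WHERE customer_id = 42;

-- If the plan shows a slow Seq Scan on a selective condition, add an index:
CREATE INDEX jobs_customer_id_idx ON jobs (customer_id);

-- Refresh the planner's statistics, then re-check the plan:
ANALYZE jobs;
EXPLAIN ANALYZE SELECT * FROM jobs WHERE customer_id = 42;
```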

Chris

Re: Can anyone explain this: duplicate dbs.

From: John A Meinel
SpaceBallOne wrote:

>> What version of postgres?
>
>
> 8.0.2 ... but I think I've seen this before on 7.3 ...
>
>> There are a few possibilities. If you are having a lot of updates to the
>> table, you can get index bloat. And vacuum doesn't fix indexes. You have
>> to "REINDEX" to do that. Though REINDEX has the same lock that VACUUM
>> FULL has, so you need to be a little careful with it.
>
>
>> Probably better is to do CLUSTER, as it does a REINDEX and a sort, so
>> your table ends up nicer when you are done.
>
>
> Thanks, will try those next time this problem crops up (i just deleted
> / recreated the database to speed things for its users in the office
> ... probably should have held off to see if I could find a solution
> first!).
>
> Yes, the database / table-in-question does have a lot of updates,
> deletes, and new rows (relatively speaking for a small business).
>
> Would CLUSTER / REINDEX still have an effect if our queries were done
> via sequential scan? This is a old database (as in built by me when i
> was just starting to learn unix / postgres) so the database design is
> pretty horrible (little normalisation, no indexes).

Well, my first recommendation is to put in some indexes. :) They are
relatively easy to set up and can drastically improve select performance.

What version of postgres are you using?
What does it say at the end of "VACUUM FULL ANALYZE VERBOSE"? That
should tell you how many free pages were reclaimed and how big your free
space map should be.
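
If that output reports more pages than the free space map can track, the FSM settings in postgresql.conf can be raised (the values below are only illustrative, and in 8.0 changing them requires a server restart):

```
# postgresql.conf -- free space map sizing (PostgreSQL 8.0 era)
max_fsm_pages = 200000     # should exceed the total pages with free space
max_fsm_relations = 1000   # number of tables/indexes tracked
```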

If you only did one VACUUM FULL, you might try another, as it sounds like
your tables weren't properly compacted. I'm pretty sure plain VACUUM only
removes empty pages and marks locations in the free space map so they can
be re-used, while VACUUM FULL will move entries around to create free pages.

It sounds like it didn't do its job properly.

But even so, CLUSTER is still your friend, as it allows you to "presort"
the rows in your tables.

>
> Have taken Chris's advice onboard too and setup cron to do a vacuumdb
> hourly instead of my weekly vacuum.
>
> Cheers,
>
> Dave.
>
>
John
=:->



Re: Can anyone explain this: duplicate dbs.

From: Tom Lane
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> If I pg_dump that database then create a new database (e.g. "tempdb")
>> and upload the dump file (thus making a duplicate) then the same query
>> only takes 190ms !!
>> Vacuum, vacuum analyse, and vacuum full analyse does not seem to have an
>> impact on these times.

> Damn, for some reason I didn't read that you had already tried vacuum
> full.

I'm thinking index bloat, and a PG version too old for vacuum full to
recover any index space.  But without any information about PG version
or EXPLAIN ANALYZE results, we're all just guessing.

            regards, tom lane

Re: Can anyone explain this: duplicate dbs.

From: Tom Lane
John A Meinel <john@arbash-meinel.com> writes:
> By the way, I think doing:

> CREATE DATABASE tempdb WITH TEMPLATE = originaldb;

> Is a much faster way of doing dump and load. I *think* it would recreate
> indexes, etc. If it just does a copy it may not show the dump/restore
> improvement.

CREATE DATABASE just does a physical copy, so it won't do anything at
all for bloat issues.

            regards, tom lane

Re: Can anyone explain this: duplicate dbs.

From: Bruno Wolff III
On Tue, May 24, 2005 at 21:39:15 -0500,
  John A Meinel <john@arbash-meinel.com> wrote:
>
> By the way, I think doing:
>
> CREATE DATABASE tempdb WITH TEMPLATE = originaldb;
>
> Is a much faster way of doing dump and load. I *think* it would recreate
> indexes, etc. If it just does a copy it may not show the dump/restore
> improvement.

You need to be careful when doing this. See section 18.3 of the 8.0 docs
for caveats.