Thread: slow pgsql tables - need to vacuum?

slow pgsql tables - need to vacuum?

From
Dan99
Date:
Hi,

I am having some troubles with a select group of tables in a database
which are acting unacceptably slow.  For example a table with
approximately < 10,000 rows took about 3,500ms to extract a single row
using the following select statement:

SELECT * FROM table WHERE column = 'value'

I have preformed this same test on a number of different tables, only
a few of which have this same problem.  The only common thing that I
can see between these affected tables is the fact that they are dumped
and re-populated every day from an outside source.  The temporary
solution that I have found is to copy all the data into another (brand
new) table using:

SELECT * INTO table2 FROM table

After dropping the old table and renaming the new table to reflect the
old table, I run the same tests as before and find that the table
responds much faster (approx. < 10ms).  My solution is only addressing
the symptoms of the problem and not the actual root cause.  So my
question to everyone is what is really happening?  Why are these
tables becoming so slow.  The idea that I am leaning towards is the
need to regularly vacuum these tables, which has never been done
before.  After switching from other database systems, I am
unaccustomed to the vacuuming concept.  Thus, if you believe I am
correct in my believe that this is causing all my problems, can you
please explain vacuuming to me.  I have attempted to run a simple
"VACUUM VERBOSE" command on the entire database, with little success
(it comes back saying something along the lines of the need to
increase the max_fsm_pages, which I am unfamiliar with).

Sorry for the long post, but any help with the above would be greatly
appreciated.

Thank you,
Daniel

Re: slow pgsql tables - need to vacuum?

From
"Douglas McNaught"
Date:
On Thu, Apr 3, 2008 at 2:34 PM, Dan99 <power919@gmail.com> wrote:
> Hi,
>
>  I am having some troubles with a select group of tables in a database
>  which are acting unacceptably slow.  For example a table with
>  approximately < 10,000 rows took about 3,500ms to extract a single row
>  using the following select statement:
>
>  SELECT * FROM table WHERE column = 'value'
>
>  I have preformed this same test on a number of different tables, only
>  a few of which have this same problem.  The only common thing that I
>  can see between these affected tables is the fact that they are dumped
>  and re-populated every day from an outside source.

You need to ANALYZE the tables after you load them, and make sure you
have indexes on the column you're querying (which it sounds like you
do, but they're not being used because the statistics for the table
are inaccurate).  There may also be a lot of dead tuples which will
further slow down a sequential scan.

Do read up on VACUUM and MVCC in the docs--it's a very important
thing. You will suffer horribly unless you have a working periodic
VACUUM.

Also, are you using TRUNCATE TABLE to clear out before the reload, or
a mass DELETE?  The latter will leave a lot of dead rows, bloating the
table and slowing down scans.  TRUNCATE just deletes the table file
and recreates it empty.

-Doug

Re: slow pgsql tables - need to vacuum?

From
Craig Ringer
Date:
Douglas McNaught wrote:

> You need to ANALYZE the tables after you load them, and make sure you
> have indexes on the column you're querying (which it sounds like you
> do, but they're not being used because the statistics for the table
> are inaccurate).

I've seen discussion here that made it sound like a REINDEX might also
be required on tables with really high data churn (ie when VACUUM /
VACUUM FULL are run a lot) - if you're not dropping and re-creating the
indexes for better bulk load performance anyway, of course. Am I just
confused, or can this sometimes be necessary?

--
Craig Ringer

Re: slow pgsql tables - need to vacuum?

From
Gregory Stark
Date:
"Craig Ringer" <craig@postnewspapers.com.au> writes:

> Douglas McNaught wrote:
>
>> You need to ANALYZE the tables after you load them, and make sure you
>> have indexes on the column you're querying (which it sounds like you
>> do, but they're not being used because the statistics for the table
>> are inaccurate).
>
> I've seen discussion here that made it sound like a REINDEX might also
> be required on tables with really high data churn (ie when VACUUM /
> VACUUM FULL are run a lot) - if you're not dropping and re-creating the
> indexes for better bulk load performance anyway, of course. Am I just
> confused, or can this sometimes be necessary?

If VACUUM FULL is run a lot, definitely. VACUUM FULL actually makes your
indexes *worse* not better.

If VACUUM is being run regularly enough that VACUUM FULL isn't necessary then
probably not. But there are usage patterns where it can still be necessary. If
you load lots of sequential records then delete all but one for a time period,
for example. That ends up leaving a page holding just one record which VACUUM
can't clean up.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: slow pgsql tables - need to vacuum?

From
Dan99
Date:
On Apr 5, 6:36 pm, d...@mcnaught.org ("Douglas McNaught") wrote:
> On Thu, Apr 3, 2008 at 2:34 PM, Dan99 <power...@gmail.com> wrote:
> > Hi,
>
> >  I am having some troubles with a select group of tables in a database
> >  which are acting unacceptably slow.  For example a table with
> >  approximately < 10,000 rows took about 3,500ms to extract a single row
> >  using the following select statement:
>
> >  SELECT * FROM table WHERE column = 'value'
>
> >  I have preformed this same test on a number of different tables, only
> >  a few of which have this same problem.  The only common thing that I
> >  can see between these affected tables is the fact that they are dumped
> >  and re-populated every day from an outside source.
>
> You need to ANALYZE the tables after you load them, and make sure you
> have indexes on the column you're querying (which it sounds like you
> do, but they're not being used because the statistics for the table
> are inaccurate).  There may also be a lot of dead tuples which will
> further slow down a sequential scan.
>
> Do read up on VACUUM and MVCC in the docs--it's a very important
> thing. You will suffer horribly unless you have a working periodic
> VACUUM.
>
> Also, are you using TRUNCATE TABLE to clear out before the reload, or
> a mass DELETE?  The latter will leave a lot of dead rows, bloating the
> table and slowing down scans.  TRUNCATE just deletes the table file
> and recreates it empty.
>
> -Doug
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general

Unfortunately, I did not design this database (or the website for that
matter) and am only maintaining it.  As a result of the inexperience
of the website designer, there are no indexes in any of the tables and
it would be rather difficult to put them in after the fact since this
is a live website.  Does TRUNCATE TABLE keep all necessary table
information such as indexes, constraints, triggers, rules, and
privileges? Currently a mass DELETE is being used to remove the data.
Since VACUUM has never been done on the tables before, should a VACUUM
FULL be done first?  If so, approximately how long does a VACUUM FULL
take on a database with 25 tables each having anywhere form 1,000 to
50,000 rows?  The reason I ask is because this is a live website, and
any down time is very inconvenient.  Also, would it be sufficient
(after the first VACUUM FULL) to simply use a VACUUM ANALYZE after the
tables are repopulated (ie. every night)?

Your help is much appreciated.
Daniel

Re: slow pgsql tables - need to vacuum?

From
"Douglas McNaught"
Date:
On Mon, Apr 7, 2008 at 9:51 AM, Dan99 <power919@gmail.com> wrote:

>  Unfortunately, I did not design this database (or the website for that
>  matter) and am only maintaining it.  As a result of the inexperience
>  of the website designer, there are no indexes in any of the tables and
>  it would be rather difficult to put them in after the fact since this
>  is a live website.

Indexes can be created online with no downtime.  They do block some
operations.  If you're running 8.2 or 8.3, you can use CREATE INDEX
CONCURRENTLY which takes longer but doesn't block normal operations.
Otherwise, pick a time when activity is minimal to do your CREATE
INDEX.

>  Does TRUNCATE TABLE keep all necessary table
>  information such as indexes, constraints, triggers, rules, and
>  privileges? Currently a mass DELETE is being used to remove the data.

Read the docs.  It may depend on your version of Postgres.  See below
for docs location.

>  Since VACUUM has never been done on the tables before, should a VACUUM
>  FULL be done first?  If so, approximately how long does a VACUUM FULL
>  take on a database with 25 tables each having anywhere form 1,000 to
>  50,000 rows?  The reason I ask is because this is a live website, and
>  any down time is very inconvenient.  Also, would it be sufficient
>  (after the first VACUUM FULL) to simply use a VACUUM ANALYZE after the
>  tables are repopulated (ie. every night)?

If you have the extra disk space, CLUSTER is supposed to be better
than VACUUM FULL, but you need an index to cluster the table on.

If you use TRUNCATE, the VACUUM is not necessary but an ANALYZE would be useful.

I don't think you ever said what version you're running--that would be
helpful.  "SELECT version();" at the psql prompt will tell you the
server version.

I highly recommend referring to the docs for your version of Postgres at:

http://www.postgresql.org/docs/

if you have any questions about the above commands.

-Doug

Re: slow pgsql tables - need to vacuum?

From
Alan Hodgson
Date:
On Monday 07 April 2008, Dan99 <power919@gmail.com> wrote:
> Does TRUNCATE TABLE keep all necessary table
> information such as indexes, constraints, triggers, rules, and
> privileges?

Yes. It does require an exclusive lock on the table very briefly, though,
which DELETE does not.

> Currently a mass DELETE is being used to remove the data.

And that's why the table is bloating. Especially if you aren't VACUUMing it
before loading the new data.

> Since VACUUM has never been done on the tables before, should a VACUUM
> FULL be done first?  If so, approximately how long does a VACUUM FULL
> take on a database with 25 tables each having anywhere form 1,000 to
> 50,000 rows?

Honestly, you'd be better off dumping and reloading the database. With that
little data, it would be pretty quick. Although, VACUUM is pretty fast on
tables with no indexes.

> The reason I ask is because this is a live website, and
> any down time is very inconvenient.  Also, would it be sufficient
> (after the first VACUUM FULL) to simply use a VACUUM ANALYZE after the
> tables are repopulated (ie. every night)?

If you do a TRUNCATE and then a fresh load, a simple ANALYZE is sufficient.

You really should create some indexes though. Right now your queries are
looping through the whole table for every SELECT. The only reason you're
not dying is your tables are small enough to completely fit in memory, and
presumably your query load is fairly low.

--
Alan

Re: slow pgsql tables - need to vacuum?

From
Dan99
Date:
On Apr 7, 11:14 am, ahodg...@simkin.ca (Alan Hodgson) wrote:
> On Monday 07 April 2008, Dan99 <power...@gmail.com> wrote:
>
> > Does TRUNCATE TABLE keep all necessary table
> > information such as indexes, constraints, triggers, rules, and
> > privileges?
>
> Yes. It does require an exclusive lock on the table very briefly, though,
> which DELETE does not.
>
> > Currently a mass DELETE is being used to remove the data.
>
> And that's why the table is bloating. Especially if you aren't VACUUMing it
> before loading the new data.
>
> > Since VACUUM has never been done on the tables before, should a VACUUM
> > FULL be done first?  If so, approximately how long does a VACUUM FULL
> > take on a database with 25 tables each having anywhere form 1,000 to
> > 50,000 rows?
>
> Honestly, you'd be better off dumping and reloading the database. With that
> little data, it would be pretty quick. Although, VACUUM is pretty fast on
> tables with no indexes.
>
> > The reason I ask is because this is a live website, and
> > any down time is very inconvenient.  Also, would it be sufficient
> > (after the first VACUUM FULL) to simply use a VACUUM ANALYZE after the
> > tables are repopulated (ie. every night)?
>
> If you do a TRUNCATE and then a fresh load, a simple ANALYZE is sufficient.
>
> You really should create some indexes though. Right now your queries are
> looping through the whole table for every SELECT. The only reason you're
> not dying is your tables are small enough to completely fit in memory, and
> presumably your query load is fairly low.
>
> --
> Alan
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general

A new website and hence a new database is planed for the near future,
so It is good that I am learning all this now.  How do indexes work
and what columns should I put them on in a given table?  Technically,
what is the difference between a VACUUM and VACUUM FULL?  I know I can
probably get all this information from the docs, but I hope you guys
can bear with me just a little bit longer :)

Thanks,
Daniel