Thread: Performance problems - Indexes and VACUUM

Performance problems - Indexes and VACUUM

From
"Josh Berkus"
Date:
Tom, Folks:

I am having a rather interesting time getting performance out of my
database.   I'd really appreciate some feedback from the list on this.

As you may recall, I've gotten around Postgres' lack of rowset-returning
stored procedures by constructing "pointer tables" which simply hold
lists of primary keys related to the user's current search.  This is an
excellent approach for a browser-based application, and I have since
used this idea on other databases, even one that supports stored
procedures.
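
For illustration, a minimal sketch of the pointer-table idea (all names here are made up, not the actual schema):

```sql
-- Hypothetical pointer table: one row per primary key matched by a search.
CREATE TABLE search_pointers (
    query_id  integer     NOT NULL,  -- identifies the user's search
    source    varchar(32) NOT NULL,  -- which data table the key refers to
    row_key   integer     NOT NULL   -- primary key of the matched row
);
CREATE INDEX idx_search_pointers ON search_pointers (query_id, source);

-- A search stores its matching keys:
INSERT INTO search_pointers
SELECT 42, 'clients', client_id FROM clients WHERE last_name LIKE 'B%';

-- The application then pages through results by joining on the pointers:
SELECT c.*
FROM clients c JOIN search_pointers p ON p.row_key = c.client_id
WHERE p.query_id = 42 AND p.source = 'clients';
```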

However, this means that I clear all of these pointer tables on a
periodic basis (how frequently depends on usage). Just clearing the
records didn't work, because of the Postgres "padded index" problem
where eventually the indexes on these tables become full of deleted
rows.  Which gives me problem 1:

1. INDEXES: I discovered, the hard way, a peculiar problem.  If you drop
and re-create a table within the same transaction (in a function, for
example) the indexes do not get dropped completely.  Doing this to
several tables, I had the disturbing experience of seeing incorrect rows
in response to some queries.  Specifically dropping each of the indexes,
dropping the tables, re-creating the tables, and re-creating the indexes
seems to work.  However, this seems to me to indicate a potential
problem with DDL commands within transactions.
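
The sequence that seems to work, sketched with the same hypothetical names as above:

```sql
-- Explicitly drop each index first, then the table,
-- then recreate the table, then recreate the indexes.
DROP INDEX idx_search_pointers;
DROP TABLE search_pointers;
CREATE TABLE search_pointers (
    query_id  integer     NOT NULL,
    source    varchar(32) NOT NULL,
    row_key   integer     NOT NULL
);
CREATE INDEX idx_search_pointers ON search_pointers (query_id, source);
```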

The second problem is giving me severe grief right now:

2. I have a very complex view designed for browsing client information.
This view involves 2 other views, and two custom aggregates which are
based on sub-queries (could only do it in Postgres!).  The query plan is
as long as this e-mail, but thanks to optimization and good indexing it
runs in about 2 seconds right after a VACUUM.
    Unfortunately, 6 hours after a VACUUM, the query bogs down.  The query
plan does not seem to have changed much, but somehow what took 50% of
the processor for 2 seconds at 8:30 AM flattens the processor for a full
45 seconds at 3:30 PM.
    Once VACUUM can be run in the background, I suppose that this can be
dealt with, but until then does anyone have any suggestions?
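
One interim workaround is to vacuum just the heavily-updated tables between the nightly full runs, rather than vacuuming the whole database (table names hypothetical):

```sql
-- Targeted vacuums are much cheaper than a full-database VACUUM and
-- refresh the planner statistics for the tables that churn the most.
VACUUM ANALYZE search_pointers;
VACUUM ANALYZE clients;
```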

-Josh Berkus



______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco


Re: Performance problems - Indexes and VACUUM

From
"Kusuma"
Date:
Who is this?
----- Original Message -----
From: Josh Berkus <josh@agliodbs.com>
To: <pgsql-sql@postgresql.org>
Sent: Wednesday, October 17, 2001 8:59 AM
Subject: [SQL] Performance problems - Indexes and VACUUM





Re: Performance problems - Indexes and VACUUM

From
Tom Lane
Date:
"Josh Berkus" <josh@agliodbs.com> writes:
> 1. INDEXES: I discovered, the hard way, a peculiar problem.  If you drop
> and re-create a table within the same transaction (in a function, for
> example) the indexes do not get dropped completely.

Kinda hard to believe; even if the old indexes were still around, they
wouldn't be considered to apply to the new table.  I think the problem
is something else.  Can you provide a reproducible example of what
you're seeing?

> runs in about 2 seconds right after a VACUUM.
>     Unfortunately, 6 hours after a VACUUM, the query bogs down.

What has been changing in the meantime?
        regards, tom lane


Re: Performance problems - Indexes and VACUUM

From
Tom Lane
Date:
"Josh Berkus" <josh@agliodbs.com> writes:
> This is on 7.1.2 (SuSE 7.2, ReiserFS, PG built from source).  Explicitly
> dropping the indexes before dropping the tables seems to have solved the
> problem.  My guess, without understanding the guts of the thing at all,
> is that the transactional nature of the drop and re-create causes the
> indexes not to be fully cleared before they are re-built.  Maybe it's
> even a reaction to the journaling file system.

I don't believe a single word of that explanation ... whatever is going
on here, that ain't it.  A new table is going to have a new OID, and
so will its indexes; there is no way that Postgres will confuse it with
the old one, even if bits of the old one were still hanging around
somehow (which I don't believe either).

One thing to think about, if you are dropping and recreating tables in
a plpgsql function, is that you probably need to do it with EXECUTE
to avoid plan caching.
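
In plpgsql terms, that advice means something like the following sketch (names hypothetical; 7.1-era quoting conventions):

```sql
CREATE FUNCTION reset_pointers() RETURNS integer AS '
BEGIN
    -- EXECUTE forces the statements to be planned at run time, so a plan
    -- cached from a previous call cannot reference the dropped table''s OID.
    EXECUTE ''DROP TABLE search_pointers'';
    EXECUTE ''CREATE TABLE search_pointers (
                  query_id integer, source varchar(32), row_key integer)'';
    EXECUTE ''CREATE INDEX idx_search_pointers
                  ON search_pointers (query_id, source)'';
    RETURN 1;
END;
' LANGUAGE 'plpgsql';
```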

> BTW, any issues with PostgreSQL and DMA disk access?

Not unless your kernel or hardware are broken.  But I have seen cases
where hardware problems (eg, bogus DMA controller) manifested themselves
only as database errors.  Evidently Postgres was pushing the disk harder
than anything else on the system, so it was more likely to get bit by
a sporadic hardware booboo.
        regards, tom lane


Re: Performance problems - Indexes and VACUUM

From
"Josh Berkus"
Date:
Tom,

> Kinda hard to believe; even if the old indexes were still around, they
> wouldn't be considered to apply to the new table.  I think the problem
> is something else.  Can you provide a reproducible example of what
> you're seeing?

Wish I could; it only seems to happen on the production machine ... that
is, you need a heavy load of daily use to make it happen.  But, to
reduce the issue to its essentials:

1. Create a "pointer table" as I discussed in the last e-mail.
2. Run a bunch of queries that will store several thousand records in
this pointer table, referencing the PK's of more than one data table.
3. In a function, drop the table and re-create it and its indexes.
4. In the same function, reset the sequence you use to identify each
unique user-query to 1.
5. Perform some queries using the pointer tables; some of the references
will mysteriously point to the wrong rows in the data tables.  Some will
work correctly.

This is on 7.1.2 (SuSE 7.2, ReiserFS, PG built from source).  Explicitly
dropping the indexes before dropping the tables seems to have solved the
problem.  My guess, without understanding the guts of the thing at all,
is that the transactional nature of the drop and re-create causes the
indexes not to be fully cleared before they are re-built.  Maybe it's
even a reaction to the journaling file system.

BTW, any issues with PostgreSQL and DMA disk access?

> > runs in about 2 seconds right after a VACUUM.
> >  Unfortunately, 6 hours after a VACUUM, the query bogs down.
>
> What has been changing in the meantime?

Lots of data edits and adds.  This particularly seems to happen on days
where the users are changing dozens to hundreds of records that affect
one of the custom aggregate subqueries.

I'm not surprised things slow down in these circumstances; it's just the
amount of slowdown -- 25 to 1 over a mere 6 hours -- that surprised me.
But the more we talk about this, the more I think I should stop bugging
you and let you finish 7.2 so I can just do background VACUUMing.

-Josh Berkus



Re: Performance problems - Indexes and VACUUM

From
"Josh Berkus"
Date:
Tom,

> I don't believe a single word of that explanation ... whatever is going
> on here, that ain't it.  A new table is going to have a new OID, and
> so will its indexes; there is no way that Postgres will confuse it with
> the old one, even if bits of the old one were still hanging around
> somehow (which I don't believe either).

You're the expert.  All I know for a fact is that when I didn't
explicitly drop the indexes, I got weird results; when I did explicitly
drop them, I didn't.

The whole system is complex enough that the problem is hard to reproduce
without reproducing the whole system (which I'd be happy to do for you,
only it contains confidential data).  From the sound of it, I'm the only
one who's encountered this.

> One thing to think about, if you are dropping and recreating tables in
> a plpgsql function, is that you probably need to do it with EXECUTE
> to avoid plan caching.

OK.  Will do.  Thanks.  This may also cure the intermittent
index/whatever issue.

> Not unless your kernel or hardware are broken.  But I have seen cases
> where hardware problems (eg, bogus DMA controller) manifested themselves
> only as database errors.  Evidently Postgres was pushing the disk harder
> than anything else on the system, so it was more likely to get bit by
> a sporadic hardware booboo.

Thanks!

-Josh
