Thread: Performance Question

Performance Question

From
"Tim Perdue"
Date:
Please email directly to tim@dmcity.net.


Some of you may have followed the thread about my mailing archive system
that uses PostgreSQL.

The database is growing quickly and I want to make sure the performance is
going to be OK, so here's a question:

I have a table with records like this:

fld_mail_list (int) | etc | etc | fld_mail_body
      1                x x x     <----big TEXT Field ----->
      2                x x x     <----big TEXT Field ----->
      1                x x x     <----big TEXT Field ----->
      4                x x x     <----big TEXT Field ----->
      1                x x x     <----big TEXT Field ----->
      1                x x x     <----big TEXT Field ----->
      7                x x x     <----big TEXT Field ----->
      1                x x x     <----big TEXT Field ----->
      1                x x x     <----big TEXT Field ----->
      7                x x x     <----big TEXT Field ----->
      1                x x x     <----big TEXT Field ----->
      8                x x x     <----big TEXT Field ----->

Essentially, I have several different mailing lists being archived into one
big table, and the key to accessing them is the fld_mail_list column.

Am I going to run into huge performance problems on this? Should each
mailing list be archived in its own table??

When I do a

SELECT * FROM tbl_mail_archive WHERE fld_mail_list=1 AND
fld_mail_body~~'%keyword%';

am I going to get killed with a performance hit?? It's running OK now, with
the table at 20MB, but I have 100MB more worth of letters to drop into the
table.....  8-)
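The layout described above might correspond to a schema like this (column names are taken from the post; everything else, including the types of the unnamed columns, is assumed). Note that `~~` is simply PostgreSQL's operator spelling of `LIKE`:

```sql
-- Minimal sketch of the table described above; types are assumptions.
CREATE TABLE tbl_mail_archive (
    fld_mail_list  int,   -- which mailing list the message belongs to
    fld_mail_body  text   -- full message body (the big TEXT field)
);

-- The search query, written with LIKE, which is equivalent to ~~ :
SELECT * FROM tbl_mail_archive
 WHERE fld_mail_list = 1
   AND fld_mail_body LIKE '%keyword%';
```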





Re: [SQL] Performance Question

From
Viorel Anghel
Date:
On Wed, Feb 03, 1999 at 08:25:25PM -0600, Tim Perdue wrote:
> Please email directly to tim@dmcity.net.
hmm, and CC to list, we also want to know ;-)

[snip]

> am I going to get killed with a performance hit?? It's running OK now, with
> the table at 20MB, but I have 100MB more worth of letters to drop into the
> table.....  8-)

1. What does "OK now" mean? Just out of curiosity, on which machine, and
what is the average time for such a query?

2. Why don't you load the remaining 100MB and tell us what happens?

--
        ___
       <o-o>    Viorel ANGHEL <vang@altavista.net>
       [`-']
       -"-"-

Re: Performance Question

From
Tom Lane
Date:
"Tim Perdue" <tim@directricity.com> writes:
> Am I going to run into huge performance problems on this? Should each
> mailing list be archived in its own table??
> When I do a
> SELECT * FROM tbl_mail_archive WHERE fld_mail_list=1 AND
> fld_mail_body~~'%keyword%';
> am I going to get killed with a performance hit??

Should be OK as long as you make an index on fld_mail_list (and don't
forget to vacuum regularly).
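A sketch of that advice in SQL (index name is made up; table and column names come from the original post):

```sql
-- Index the list key so WHERE fld_mail_list = N can avoid a full scan.
CREATE INDEX idx_mail_list ON tbl_mail_archive (fld_mail_list);

-- Reclaim dead rows and refresh planner statistics regularly.
VACUUM ANALYZE tbl_mail_archive;
```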

Of course, maintaining that index is not zero-cost.  The appropriate
thing to ask is what your usage patterns will be.  If you frequently
make searches across multiple mailing lists, then you undoubtedly
want to do it as you show above.  If you never (or hardly ever) do that,
you might as well keep each mailing list in its own table and live with
having to do multiple SELECTs when you do want to look across multiple
lists.
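The multiple-SELECT approach in the per-table design could look something like this (the per-list table names are hypothetical):

```sql
-- One table per mailing list; a cross-list search becomes a UNION.
SELECT * FROM tbl_list_general WHERE fld_mail_body LIKE '%keyword%'
UNION ALL
SELECT * FROM tbl_list_hackers WHERE fld_mail_body LIKE '%keyword%';
```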

> It's running OK now, with the table at 20MB, but I have 100MB more
> worth of letters to drop into the table.....  8-)

I suspect your real problem is going to be that searching 100MB with
"fld_mail_body~~'%keyword%'" is going to be dog-slow.  I think you
are going to want a full-text index if you expect to do that a lot.

There is a simple all-Postgres FTI in the contrib part of the
distribution, but I think it'd probably run out of steam long before you
got to 100MB.  What I'd probably do in your situation is to use
Glimpse (http://glimpse.cs.arizona.edu/) for the text index.
That'd likely mean storing the message bodies in separate files outside
the database proper, and keeping only the file names in the database
rows.  (But that'd get rid of the message-over-8K problem, so it isn't
all bad...)
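Under that scheme, the table might shrink to metadata plus a pathname (a sketch only; the column for the file path is invented for illustration):

```sql
-- Bodies live on disk where Glimpse indexes them; the database
-- keeps only the list key and a pointer to the file.
CREATE TABLE tbl_mail_archive (
    fld_mail_list  int,   -- which mailing list
    fld_body_file  text   -- path to the message body on disk
);
```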

            regards, tom lane

Re: Performance Question

From
"Tim Perdue, The Des Moines City.net"
Date:
I did finally get most of the data dropped into the Postgres database. It's
now sitting at 55,000 email messages (100MB).

Even with a huge mix of records, search time is not impacted as far as I can
tell. Queries come back just as quickly as they did before.

I did discover that indexes on the text fields are not being used when I use
a Regex operator. So I dropped the index with no performance hit.

Thanks to everyone for their input,

Tim Perdue
geocrawler.com