Thread: Re: Questions about indexes?

Re: Questions about indexes?

From: Josh Berkus

Ryan,

I am crossing this discussion to the PGSQL-PERFORMANCE list, which is the
proper place for it.   Anyone interested, please follow us there!

>>>Ryan Bradetich said:
 > the table would look like:
 > 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
 > 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
 > 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
 > 2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.
 > etc...
 >
 > So I do not need the anomaly to be part of the index, I only need it to
 >
 > I agree with you that I would not normally add the anomaly to the
 > index, except for the unique row requirement.  Thinking about it now,
 > maybe I should guarantee unique rows via a check constraint...
 >
 > Thanks for making me think about this in a different way!

First off, I'm not clear on why a duplicate anomaly would necessarily be
invalid, given the above.  Not useful, certainly, but legitimate real data.

I realize that you have performance considerations to work with.  However, I
must point out that your database design is not *at all* normalized ... and
that normalizing it might solve some of your indexing problems.

A normalized structure would look something like:

CREATE TABLE operations (
    id serial not null primary key,
    host_id int not null,
    timeoccurred timestamp not null default now(),
    category varchar(5) not null,
    constraint operations_unq unique (host_id, timeoccurred, category)
);

CREATE TABLE anomalies (
    id serial not null primary key,
    operation_id int not null references operations(id) on delete cascade,
    anomaly text
);

This structure would have two advantages for you:
1) the operations table would be *much* smaller than what you have now, as you
would not be repeating rows for each anomaly.
2) In neither table would you be including the anomaly text in an index ...
thus reducing index size tremendously.
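
To illustrate, a report against this layout is just a two-table join; the
filter values below are purely illustrative:

    SELECT o.host_id, o.timeoccurred, o.category, a.anomaly
      FROM operations o
      JOIN anomalies a ON a.operation_id = o.id
     WHERE o.host_id = 1
       AND o.category = 'p101';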

As a warning, though: you may find that, for insert speed, referential
integrity is better maintained at the middleware layer.  We've had some
complaints about slow FK enforcement in Postgres.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Performance Baseline Script

From: "Keith Bottner"

Does there currently exist any kind of script that can be run on
Postgres to conduct a complete feature coverage test with varying
dataset sizes to compare performance between functionality changes?

The interest of course is to get a baseline of performance and then to
see how manipulating internal algorithms, changing vacuum frequency, or
placing the WAL logs on a separate physical disk affects the performance
of the various features with various dataset sizes.

If not, how many people would be interested in such a script being
written?

Keith Bottner
kbottner@istation.com

"Vegetarian - that's an old Indian word meaning 'lousy hunter.'" - Andy
Rooney


Re: Performance Baseline Script

From: "Josh Berkus"

Keith,

> Does there currently exist any kind of script that can be run on
> Postgres to conduct a complete feature coverage test with varying
> dataset sizes to compare performance between functionality changes?

No.

> If not, how many people would be interested in such a script being
> written?


Just about everyone on this list, as well as Advocacy and Hackers, to
judge by the current long-running thread on the topic, which has
meandered across several lists.

-Josh Berkus

Re: Questions about indexes?

From: Ryan Bradetich

Josh,

Posting to the performance list as requested :)  The reason I originally
posted to the hackers list was that I was curious about the contents of the
index and how they worked.... but now this thread is more about
performance, so this list is more appropriate.


On Tue, 2003-02-18 at 10:37, Josh Berkus wrote:
> Ryan,
>
> I am crossing this discussion to the PGSQL-PERFORMANCE list, which is the
> proper place for it.   Anyone interested, please follow us there!
>
> >>>Ryan Bradetich said:
>  > the table would look like:
>  > 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
>  > 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
>  > 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
>  > 2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.
>  > etc...
>  >
>  > So I do not need the anomaly to be part of the index, I only need it to
>  >
>  > I agree with you that I would not normally add the anomaly to the
>  > index, except for the unique row requirement.  Thinking about it now,
>  > maybe I should guarantee unique rows via a check constraint...
>  >
>  > Thanks for making me think about this in a different way!
>
> First off, I'm not clear on why a duplicate anomaly would necessarily be
> invalid, given the above.  Not useful, certainly, but legitimate real data.

Duplicate anomalies are not invalid in general; they are only invalid if they
are for the same system, same category, and from the same compliance report.

ie.  This is bad:
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.

This is ok:
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.

The only reason duplicate data would occur is if the same compliance
report were entered into the database twice (i.e. the host did not
generate a new compliance report, there is a bug in the data-stuffing
script, etc.).  Daniel Kalchev from the pgsql-hackers list had a good idea
that I am investigating, which is to have the archive script be responsible
for preventing duplicate entries in the database, thus eliminating the need
for an index to do this.
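
A minimal sketch of that idea against the existing single-table layout (the
table and column names here are assumptions, not the real schema) would be a
conditional insert in the stuffer script:

    INSERT INTO history (host_id, timeoccurred, category, anomaly)
    SELECT 1, 'Mon Feb 17 00:34:24 MST 2003', 'p101',
           'user x has an invalid shell.'
    WHERE NOT EXISTS (
        SELECT 1 FROM history
         WHERE host_id = 1
           AND timeoccurred = 'Mon Feb 17 00:34:24 MST 2003'
           AND category = 'p101'
           AND anomaly = 'user x has an invalid shell.');

Note this is only airtight if a single loader runs at a time; concurrent
loaders would still need a unique index (or the md5 approach) as a backstop.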

The whole reason I decided not to allow duplicate entries into the
archive table is so that when I generate reports, I do not have to put the
distinct qualifier on the queries, which eliminates the sort and speeds
up the queries.  The entire purpose of the index was to maintain good
data integrity in the archive tables for reporting purposes.  If I can
enforce the data integrity another way (i.e. data-stuffer scripts, an index
on an md5 hash of the data, etc.) then I can drop this huge index and be
happy :)

> I realize that you have performance considerations to work with.  However, I
> must point out that your database design is not *at all* normalized ... and
> that normalizing it might solve some of your indexing problems.

Please do point out these design errors!  I am always interested in
learning more about normalization since I do not have any formal DBA
training, and most of my knowledge is from reading books, mailing lists,
and experimenting :)


> A normalized structure would look something like:
>
> CREATE TABLE operations (
>     id serial not null primary key,
>     host_id int not null,
>     timeoccurred timestamp not null default now(),
>     category varchar(5) not null,
>     constraint operations_unq unique (host_id, timeoccurred, category)
> );
>
> CREATE TABLE anomalies (
>     id serial not null primary key,
>     operation_id int not null references operations(id) on delete cascade,
>     anomaly text
> );
>
> This structure would have two advantages for you:
> 1) the operations table would be *much* smaller than what you have now, as you
> would not be repeating rows for each anomaly.

I agree the operations table would be smaller, but you have also added
some data by breaking it up into 2 tables.  You have an oid (8 bytes) +
operations.id (4 bytes) + anomalies.id (4 bytes) + operation_id (4
bytes) + tuple overhead[2] (36 bytes).

The anomalies.id and operation_id will be duplicated for all
~85 Million rows[1], but we did remove the host_id (4 bytes), timestamp
(8 bytes) and category (~5 bytes) ... for a savings of 9 bytes / row.

My quick estimation shows this saves ~730 MB in the table and the index,
for a combined total of 1.46 GB (the reason the index savings on the
anomalies table is not larger is explained further in my response to point 2).

So to gain any space savings from breaking the tables up, the total size
of the operations table + primary index + operations_unq index must be
< 1.46 GB.

The operations table contains oid (8 bytes) + host_id (4 bytes) +
timestamp (8 bytes) + category (~5 bytes) + tuple overhead[2] (36
bytes).  Also since every category is duplicated in either the primary
index or the operations_unq index, the index sizes will be approximately
the size of the main table.

So 730 MB / (61 bytes) == ~ 12.5 Million rows.

A quick calculation of hosts * categories * data points show that we
could have a maximum of ~ 12 million entries[3] in the operation table,
so this would save some space :)
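
Spelling that arithmetic out in one place:

    table savings:  9 bytes/row * ~85.2 million rows    ~= 730 MB
    index savings:  roughly the same again              ~= 730 MB
    combined total                                      ~= 1.46 GB
    break-even:     730 MB / ~61 bytes per operations row ~= 12.5 million rows
                    (the operations indexes get the other ~730 MB)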


> 2) In neither table would you be including the anomaly text in an index ...
> thus reducing index size tremendously.

Unless I implement Daniel's method of verifying the uniqueness at the data
insertion point, I will still need to guarantee the anomaly is unique
for the given operation_id.  If I misread the table schema, would you
please point it out to me .. I'm probably being dense :)

Also, I do not understand why the anomalies table needs the id column for
the primary key.  Wouldn't the operation_id and the anomaly form the
primary key?  We could save 8 bytes (4 from table + 4 from index) * ~85
Million rows by removing this column.
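
In DDL terms, the slimmer table would look something like this (a sketch only;
note the long anomaly text would still end up in the primary key index):

    CREATE TABLE anomalies (
        operation_id int not null references operations(id) on delete cascade,
        anomaly text not null,
        primary key (operation_id, anomaly)
    );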


> As a warning, though:  you may find that for insert speed the referential
> integrity key is better maintained at the middleware layer.   We've had some
> complaints about slow FK enforcement in Postgres.


Thanks, I will keep this in mind.  Although the inserts are usually done
in a batch job ... so interactive speed is generally not an issue ...
but faster is always better :)

Also I am curious ... With the table more normalized, I now need to
perform a join when selecting data.... I realize there will be fewer
pages to read from disk (which is good!) when doing this join, but I
will be interested to see how the join affects the interactive performance
of the queries.... something to test :)


Thanks for looking at this, Josh, and providing input.  Hopefully by
explaining my figuring and thinking you can see what I am looking at ...
and point out additional flaws in my methods :)  Unfortunately I still
do not think I can remove the anomaly field from the index yet, even by
normalizing the tables like you did.

I think Daniel has the correct answer by moving the unique constraint
check out into the stuffer script, or by performing some method of
indexing on a hash as I proposed at the beginning of the thread.

I have figured out a way to reduce my md5 index size by half again, and
have developed a method for dealing with collisions too.  I am planning
on running some benchmarks against the current method, with the tables
normalized as Josh suggested, and using the md5 hash I am working on. My
benchmarks will be fairly simple ... average time to insert X number of
valid inserts and average time to insert X number of duplicate inserts
along with disk space usage.  If anyone is interested I am willing to
post the results to this list ... and if anyone has some other benchmark
suggestions they would like to see, feel free to let me know.

Thanks much for all the help and insight!

- Ryan

[1]     select count(anomaly) from history;
     count
     ----------
      85221799

[2]     I grabbed the 36 bytes overhead / tuple from this old FAQ I found
    at  http://www.guides.sk/postgresql_docs/faq-english.html
    I did not look at what the current value is today.

[3]     This was a rough calculation of a maximum; I do not believe we are
at this maximum, so the space savings is most likely larger.

--
Ryan Bradetich <rbradetich@uswest.net>


Re: Performance Baseline Script

From: Tom Lane

"Josh Berkus" <josh@agliodbs.com> writes:
> Keith,
>> Does there currently exist any kind of script that can be run on
>> Postgres to conduct a complete feature coverage test with varying
>> dataset sizes to compare performance between functionality changes?

> No.

If you squint properly, OSDB (http://osdb.sourceforge.net/) might be
thought to do this, or at least be a starting point for it.

            regards, tom lane

Re: Performance Baseline Script

From: Justin Clift

Tom Lane wrote:
> "Josh Berkus" <josh@agliodbs.com> writes:
>
>>Keith,
>>
>>>Does there currently exist any kind of script that can be run on
>>>Postgres to conduct a complete feature coverage test with varying
>>>dataset sizes to compare performance between functionality changes?
>
>>No.
>
> If you squint properly, OSDB (http://osdb.sourceforge.net/) might be
> thought to do this, or at least be a starting point for it.

As a side thought, I just today found out that the TPC organisation
provides all kinds of code freely for building/running the TPC-x tests.
I didn't know that before.

Sure, we can't submit results yet, but we might at least be able to run
the tests and see if anything interesting turns up.  People have talked
about the TPC tests before, but I'm not sure if anyone has really looked at
making them runnable on PostgreSQL for everyone yet.

Regards and best wishes,

Justin Clift


>             regards, tom lane

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


Re: Performance Baseline Script

From: Andrew Sullivan

On Thu, Feb 20, 2003 at 01:45:25AM +1030, Justin Clift wrote:

> As a side thought, just today found out that the TPC organisation
> provides all kinds of code freely for building/running the TPC-x tests.

Yes, but be careful what you mean there.

It is _not_ a TPC test unless it is run under tightly-controlled and
-audited conditions as specified by TPC.  And that effectively means
you have to pay them.  In other words, it's not a TPC test unless you
can get it published by them.

That doesn't mean you can't run a test "based on the documents
provided by the TPC for test definition _x_".  Just make sure you
walk on the correct side of the intellectual property line, or you'll
be hearing from their lawyers.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Questions about indexes?

From: Josh Berkus

Ryan,

> Posting to the performance list as requested :)  The reason I originally
> posted to the hackers list was that I was curious about the contents of the
> index and how they worked.... but now this thread is more about
> performance, so this list is more appropriate.

*shrug* I just figured that you didn't know about the performance list ...
also, I'm doing my bit to reduce traffic on -hackers.

> Duplicate anomalies are not invalid in general; they are only invalid if they
> are for the same system, same category, and from the same compliance report.
>
> ie.  This is bad:
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
>
> This is ok:
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.

OK.  Given the necessity of verifying anomaly uniqueness, my suggestions
below change somewhat.

> Please do point out these design errors!  I am always interested in
> learning more about normalization since I do not have any formal DBA
> training, and most of my knowledge is from reading books, mailing lists,
> and experimenting :)

"Practical Issues in Database Management" and "Database Design for Mere
Mortals" are useful.  Me, I learned through 5 years of doing the wrong thing
and finding out why it was wrong ...

> I agree the operations table would be smaller, but you have also added
> some data by breaking it up into 2 tables.  You have an oid (8 bytes) +
> operations.id (4 bytes) + anomalies.id (4 bytes) + operation_id (4
> bytes) + tuple overhead[2] (36 bytes).

Yes, and given your level of traffic, you might have to use 8-byte id fields.
But if disk space is your main issue, then I'd suggest swapping the category
code out to a "categories" table, allowing you to use an int4 or even an int2
as the category_id in the Operations table.   This would save you 6-8 bytes
per row in Operations.
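
A sketch of that change (names are illustrative):

    CREATE TABLE categories (
        category_id int2 not null primary key,
        category varchar(5) not null unique
    );

    -- operations then stores the small id instead of the varchar code:
    CREATE TABLE operations (
        id serial not null primary key,
        host_id int not null,
        timeoccurred timestamp not null default now(),
        category_id int2 not null references categories(category_id),
        constraint operations_unq unique (host_id, timeoccurred, category_id)
    );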

> > 2) In neither table would you be including the anomaly text in an index
> > ... thus reducing index size tremendously.
>
> Unless I implement Daniel's method of verifying the uniqueness at the data
> insertion point, I will still need to guarantee the anomaly is unique
> for the given operation_id.  If I misread the table schema, would you
> please point it out to me .. I'm probably being dense :)
>
> Also, I do not understand why the anomalies table needs the id column for
> the primary key.  Wouldn't the operation_id and the anomaly form the
> primary key?  We could save 8 bytes (4 from table + 4 from index) * ~85
> Million rows by removing this column.

As I said above, I didn't understand why you needed to check anomaly
uniqueness.  Given that you do, I'd suggest that you do the above.

Of course, you also have another option.  You could check uniqueness on the
operation_id and the md5 of the anomaly field.  This would be somewhat
awkward, and would still require that you have a separate primary key for the
anomalies table.   But the difference between an index on an up-to-1024
character field and an index on an md5 string might make it worth it,
particularly when it comes to inserting new rows.

In other words, test:
1) dropping the anomaly_id as suggested above,
2) adding an anomaly_md5 column to the anomalies table,
3) making operation_id, anomaly_md5 your primary key, and
4) writing a BEFORE trigger that calculates the md5 of any incoming anomalies
and adds it to that column.

It's worth testing, since a unique index on a 1024-character field for 85
million records could be very slow.
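
A rough sketch of steps 2-4, assuming an md5() function is available in the
backend (if not, the hash can be computed by the stuffer script instead); the
trigger, constraint, and column names are only illustrative:

    ALTER TABLE anomalies ADD COLUMN anomaly_md5 char(32);

    CREATE OR REPLACE FUNCTION anomalies_set_md5() RETURNS trigger AS '
    BEGIN
        -- hash the (possibly very long) anomaly text down to 32 characters
        NEW.anomaly_md5 := md5(NEW.anomaly);
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER anomalies_md5_trg
        BEFORE INSERT OR UPDATE ON anomalies
        FOR EACH ROW EXECUTE PROCEDURE anomalies_set_md5();

    -- uniqueness is then enforced on the short hash rather than the long
    -- text (this pair could equally serve as the primary key)
    ALTER TABLE anomalies
        ADD CONSTRAINT anomalies_unq UNIQUE (operation_id, anomaly_md5);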

> Thanks, I will keep this in mind.  Although the inserts are usually done
> in a batch job ... so interactive speed is generally not an issue ...
> but faster is always better :)

In a transaction, I hope.

> Also I am curious ... With the table more normalized, I now need to
> perform a join when selecting data.... I realize there will be fewer
> pages to read from disk (which is good!) when doing this join, but I
> will be interested to see how the join affects the interactive performance
> of the queries.... something to test :)

I'll look forward to seeing the results of your test.

>  If anyone is interested I am willing to
> post the results to this list ... and if anyone has some other benchmark
> suggestions they would like to see, feel free to let me know.

We're always interested in benchmarks.


--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Performance Baseline Script

From: Justin Clift

Andrew Sullivan wrote:
> On Thu, Feb 20, 2003 at 01:45:25AM +1030, Justin Clift wrote:
>
>>As a side thought, just today found out that the TPC organisation
>>provides all kinds of code freely for building/running the TPC-x tests.
>
> Yes, but be careful what you mean there.
>
> It is _not_ a TPC test unless it is run under tightly-controlled and
> -audited conditions as specified by TPC.  And that effectively means
> you have to pay them.  In other words, it's not a TPC test unless you
> can get it published by them.
>
> That doesn't mean you can't run a test "based on the documents
> provided by the TPC for test definition _x_".  Just make sure you
> walk on the correct side of the intellectual property line, or you'll
> be hearing from their lawyers.

Good point.

What I was thinking about was that we could likely get the "test suite"
of code that the TPC publishes and ensure that it works with PostgreSQL.

The reason I'm thinking of this is that if any of the existing PostgreSQL
support companies (or an alliance of them) decided to become a member
of the TPC in order to submit results, then the barrier to entry would
be that bit lower, and there would be people around at that stage who'd
already gained good experience with the test suite(s).

:-)

Regards and best wishes,

Justin Clift

> A


--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi



Re: Questions about indexes?

From: Curt Sampson

On Wed, 19 Feb 2003, Ryan Bradetich wrote:

> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
> 2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.

If you're going to normalize this a bit, you should start looking at
the data that are repeated and trying to get rid of the repetitions.
First of all, the timestamp is repeated a lot; you might move that to a
separate table and just use a key into that table. But you might even
do better with multiple columns: combine the timestamp and host ID into
one table to get a "host report instance" and replace both those columns
with just that. If host-id/timestamp/category triplets are frequently
repeated, you might even consider combining the three into another
table, and just using an ID from that table with each anomaly.

But the biggest space and time savings would come from normalizing the
anomalies themselves, because there's a huge amount of repetition there.  If
you're able to change the format to something like:

    invalid shell for user: x
    invalid shell for user: y
    expired password for user: y
    improper owner for file: /foo

You can split those error messages off into another table:

    anomaly_id | anomaly
    -----------+------------------------------------------------
             1 | invalid shell for user
             2 | expired password for user
             3 | improper owner for file

And now your main table looks like this:

    host_id | timestamp                    | ctgr | anomaly_id | datum
    --------+------------------------------+------+------------+------
          1 | Mon Feb 17 00:34:24 MST 2003 | p101 |          1 | x
          1 | Mon Feb 17 00:34:24 MST 2003 | p101 |          1 | y
          1 | Mon Feb 17 00:34:24 MST 2003 | p101 |          2 | y
          2 | Mon Feb 17 00:34:24 MST 2003 | f101 |          3 | /foo
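
In schema terms, that might look something like this (table and column names
are just for illustration), with the original message text recoverable by a
join and a concatenation:

    CREATE TABLE anomaly_types (
        anomaly_id serial not null primary key,
        anomaly text not null unique
    );

    CREATE TABLE history (
        host_id int not null,
        timeoccurred timestamp not null,
        category varchar(5) not null,
        anomaly_id int not null references anomaly_types(anomaly_id),
        datum text
    );

    SELECT h.host_id, h.timeoccurred, h.category,
           t.anomaly || ': ' || h.datum AS message
      FROM history h
      JOIN anomaly_types t ON t.anomaly_id = h.anomaly_id;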

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC