Re: Table partitioning for maximum speed? - Mailing list pgsql-general

From greg@turnstep.com
Subject Re: Table partitioning for maximum speed?
Date
Msg-id 932750af8aa6e5a52cde4825837bd87e@biglumber.com
Whole thread Raw
In response to Table partitioning for maximum speed?  (Jeff Boes <jboes@nexcerpt.com>)
Responses Re: Table partitioning for maximum speed?  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> That's important, because we can afford more application
> machines as load increases, but we can't as easily
> upgrade the database server.

Two ideas come to mind:

One way to speed things up is to convert the entire checksum. Consider
what a md5 checksum really is: a text string representing a hexadecimal
number. Storing it as TEXT or CHAR is not as good as storing it as a
number directly. Have your application convert it to a decimal number,
and then store the checksum as type NUMERIC in the database. This gives
an immediate speed boost. Next, use partial indexes to speed things up
even further. How many partial indexes you want to create depends on your
ratio of selects to updates, and how important each is to you. Some quick
statistical analysis I did showed that for 10 indexes, the magic number
is somewhere around 3.402 x 10 ^ 37. In other words:

CREATE TABLE md5check (id SERIAL, md5 NUMERIC);

CREATE INDEX md5_i0 ON md5check (md5) WHERE
  md5 <= 34000000000000000000000000000000000000;
CREATE INDEX md5_i1 ON md5check (md5) WHERE
  md5 >  34000000000000000000000000000000000000 AND
  md5 <= 68000000000000000000000000000000000000;
CREATE INDEX md5_i2 ON md5check (md5) WHERE
  md5 >  68000000000000000000000000000000000000 AND
  md5 <= 102000000000000000000000000000000000000;
...
CREATE INDEX md5_i10 ON md5check (md5) WHERE
  md5 > 340000000000000000000000000000000000000;

On my test table with 1/2 million rows, I saw a speed up from
.16 msec (using TEXT only) to .09 msec. The more partial indexes
you create, the faster things will go. Just remember to put the
upper and lower boundary indexes in place to catch everything.

Aside: if you are merely testing for the existence of the row,
you can pull back a constant instead of the whole row:

SELECT 1 FROM md5check WHERE md5 = ?


Another way to speed things up is to break the checksum up into parts
so that we can use one of the "normal" datatypes: specifically, BIGINT.
Divide the 32 character checksum into four pieces, convert each piece
to a decimal number, and store each in its own BIGINT column. The good
news with this way is that you only need an index on one of the columns.
Even at 7 million plus, the number of matches of 1/4 of the checksum
characters is small enough to not need additional indexes.

CREATE TABLE md5check
  (id SERIAL, md1 BIGINT, md2 BIGINT, md3 BIGINT, md4 BIGINT);

CREATE INDEX md5_i1 ON md5check(md1);

You can also add partial indexes to this as well, for maximum speed.


- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200310101135

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE/ht6DvJuQZxSWSsgRAjvyAJ9ndadWAgJIm84dc/kB8RABEIzIbwCg1UJL
2VUQeQU+LMgXnumOoMT6kWk=
=PeUQ
-----END PGP SIGNATURE-----



pgsql-general by date:

Previous
From: Oliver Elphick
Date:
Subject: Re: Humor me: Postgresql vs. MySql (esp. licensing)
Next
From: Bruno Wolff III
Date:
Subject: Re: Table partitioning for maximum speed?