Re: Table partitioning for maximum speed? - Mailing list pgsql-general
From | greg@turnstep.com |
---|---|
Subject | Re: Table partitioning for maximum speed? |
Date | |
Msg-id | 932750af8aa6e5a52cde4825837bd87e@biglumber.com Whole thread Raw |
In response to | Table partitioning for maximum speed? (Jeff Boes <jboes@nexcerpt.com>) |
Responses |
Re: Table partitioning for maximum speed?
|
List | pgsql-general |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > That's important, because we can afford more application > machines as load increases, but we can't as easily > upgrade the database server. Two ideas come to mind: One way to speed things up is to convert the entire checksum. Consider what a md5 checksum really is: a text string representing a hexadecimal number. Storing it as TEXT or CHAR is not as good as storing it as a number directly. Have your application convert it to a decimal number, and then store the checksum as type NUMERIC in the database. This gives an immediate speed boost. Next, use partial indexes to speed things up even further. How many partial indexes you want to create depends on your ratio of selects to updates, and how important each is to you. Some quick statistical analysis I did showed that for 10 indexes, the magic number is somewhere around 3.402 x 10 ^ 37. In other words: CREATE TABLE md5check (id SERIAL, md5 NUMERIC); CREATE INDEX md5_i0 ON md5check (md5) WHERE md5 <= 34000000000000000000000000000000000000; CREATE INDEX md5_i1 ON md5check (md5) WHERE md5 > 34000000000000000000000000000000000000 AND md5 <= 68000000000000000000000000000000000000; CREATE INDEX md5_i2 ON md5check (md5) WHERE md5 > 68000000000000000000000000000000000000 AND md5 <= 102000000000000000000000000000000000000; ... CREATE INDEX md5_i10 ON md5check (md5) WHERE md5 > 340000000000000000000000000000000000000; On my test table with 1/2 million rows, I saw a speed up from .16 msec (using TEXT only) to .09 msec. The more partial indexes you create, the faster things will go. Just remember to put the upper and lower boundary indexes in place to catch everything. Aside: if you are merely testing for the existence of the row, you can pull back a constant instead of the whole row: SELECT 1 FROM md5check WHERE md5 = ? Another way to speed things up is to break the checksum up into parts so that we can use one of the "normal" datatypes: specifically, BIGINT. Divide the 32 character checksum into four pieces, convert each piece to a decimal number, and store each in its own BIGINT column. The good news with this way is that you only need an index on one of the columns. Even at 7 million plus, the number of matches of 1/4 of the checksum characters is small enough to not need additional indexes. CREATE TABLE md5check (id SERIAL, md1 BIGINT, md2 BIGINT, md3 BIGINT, md4 BIGINT); CREATE INDEX md5_i1 ON md5check(md1); You can also add partial indexes to this as well, for maximum speed. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200310101135 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE/ht6DvJuQZxSWSsgRAjvyAJ9ndadWAgJIm84dc/kB8RABEIzIbwCg1UJL 2VUQeQU+LMgXnumOoMT6kWk= =PeUQ -----END PGP SIGNATURE-----
pgsql-general by date: