Thread: Table partitioning for maximum speed?

Table partitioning for maximum speed?

From
Jeff Boes
Date:
I'm sure this is a concept that's been explored here. I have a table
(fairly simple, just two columns, one of which is a 32-digit checksum)
with several million rows (currently, about 7 million). About a million
times a day we do

   select * from my_table where md5 = ?

to verify presence or absence of the row, and base further processing on
that information.

The idea bandied about now is to partition this table into 16 (or 256,
or ...) chunks by first digit (or 2, or ...). In the simplest case, this
would mean:

create table my_table_0 as select * from my_table where md5 like '0%';

create table my_table_1 as select * from my_table where md5 like '1%';

...

create table my_table_f as select * from my_table where md5 like 'f%';


Then change the code to examine the checksum and create a query to the
appropriate table based on the first digit.

Obviously, this is conceptually similar to what the index on the "md5"
column is supposed to do for us. However, partitioning moves just a
little of the processing load off the database server and onto the
machine running the application. That's important, because we can afford
more application machines as load increases, but we can't as easily
upgrade the database server.

Will a query against a table of 0.5 million rows beat a query against a
table of 7 million rows by a margin that makes it worth the hassle of
supporting 15 "extra" tables?

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
            ...Nexcerpt... Extend your Expertise


Re: Table partitioning for maximum speed?

From
Bruno Wolff III
Date:
On Thu, Oct 09, 2003 at 18:37:19 +0000,
  Jeff Boes <jboes@nexcerpt.com> wrote:
> I'm sure this is a concept that's been explored here. I have a table
> (fairly simple, just two columns, one of which is a 32-digit checksum)
> with several million rows (currently, about 7 million). About a million
> times a day we do
>
>   select * from my_table where md5 = ?
>
> to verify presence or absence of the row, and base further processing on
> that information.
>
> The idea bandied about now is to partition this table into 16 (or 256,
> or ...) chunks by first digit (or 2, or ...). In the simplest case, this
> would mean:

If there is an index on the checksum column, then you shouldn't get
much of a speed up by partitioning the data.
If you don't have an index on the checksum, it sounds like you should.

Re: Table partitioning for maximum speed?

From
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> That's important, because we can afford more application
> machines as load increases, but we can't as easily
> upgrade the database server.

Two ideas come to mind:

One way to speed things up is to convert the entire checksum. Consider
what a md5 checksum really is: a text string representing a hexadecimal
number. Storing it as TEXT or CHAR is not as good as storing it as a
number directly. Have your application convert it to a decimal number,
and then store the checksum as type NUMERIC in the database. This gives
an immediate speed boost. Next, use partial indexes to speed things up
even further. How many partial indexes you want to create depends on your
ratio of selects to updates, and how important each is to you. Some quick
statistical analysis I did showed that for 10 indexes, the magic number
is somewhere around 3.402 x 10 ^ 37. In other words:

CREATE TABLE md5check (id SERIAL, md5 NUMERIC);

CREATE INDEX md5_i0 ON md5check (md5) WHERE
  md5 <= 34000000000000000000000000000000000000;
CREATE INDEX md5_i1 ON md5check (md5) WHERE
  md5 >  34000000000000000000000000000000000000 AND
  md5 <= 68000000000000000000000000000000000000;
CREATE INDEX md5_i2 ON md5check (md5) WHERE
  md5 >  68000000000000000000000000000000000000 AND
  md5 <= 102000000000000000000000000000000000000;
...
CREATE INDEX md5_i10 ON md5check (md5) WHERE
  md5 > 340000000000000000000000000000000000000;

On my test table with 1/2 million rows, I saw a speed up from
.16 msec (using TEXT only) to .09 msec. The more partial indexes
you create, the faster things will go. Just remember to put the
upper and lower boundary indexes in place to catch everything.

Aside: if you are merely testing for the existence of the row,
you can pull back a constant instead of the whole row:

SELECT 1 FROM md5check WHERE md5 = ?


Another way to speed things up is to break the checksum up into parts
so that we can use one of the "normal" datatypes: specifically, BIGINT.
Divide the 32 character checksum into four pieces, convert each piece
to a decimal number, and store each in its own BIGINT column. The good
news with this way is that you only need an index on one of the columns.
Even at 7 million plus, the number of matches of 1/4 of the checksum
characters is small enough to not need additional indexes.

CREATE TABLE md5check
  (id SERIAL, md1 BIGINT, md2 BIGINT, md3 BIGINT, md4 BIGINT);

CREATE INDEX md5_i1 ON md5check(md1);

You can also add partial indexes to this as well, for maximum speed.


- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200310101135

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE/ht6DvJuQZxSWSsgRAjvyAJ9ndadWAgJIm84dc/kB8RABEIzIbwCg1UJL
2VUQeQU+LMgXnumOoMT6kWk=
=PeUQ
-----END PGP SIGNATURE-----



Re: Table partitioning for maximum speed?

From
Bruno Wolff III
Date:
Please keep discussions on the list so that others may learn from or comment
on the suggested solutions.

On Fri, Oct 10, 2003 at 11:27:50 -0400,
  Jeff Boes <jboes@nexcerpt.com> wrote:
> Bruno Wolff III wrote:
>
> >On Thu, Oct 09, 2003 at 18:37:19 +0000,
> > Jeff Boes <jboes@nexcerpt.com> wrote:
> >
> >
> >>
> >>The idea bandied about now is to partition this table into 16 (or 256,
> >>or ...) chunks by first digit (or 2, or ...). In the simplest case, this
> >>would mean:
> >>
> >>
> >
> >If there is an index on the checksum column, then you shouldn't get
> >much of a speed up by partitioning the data.
> >If you don't have an index on the checksum, it sounds like you should.
> >
> >
> Yes, the table has:
>
>    Table "public.link_checksums"
> Column  |     Type      | Modifiers
> ---------+---------------+-----------
> md5     | character(32) | not null
> link_id | integer       | not null
> Indexes: ix_link_checksums_pk primary key btree (md5)

In that event I would expect that you might only save a few disk accesses
by having a btree with fewer levels.

If the query is slow, it might be doing a sequential search because of
a type mismatch. You can use explain to double check what plan is being
used.

Re: Table partitioning for maximum speed?

From
Jeff Boes
Date:
Bruno Wolff III wrote:

>On Fri, Oct 10, 2003 at 11:27:50 -0400,
>  Jeff Boes <jboes@nexcerpt.com> wrote:
>
>
>>Yes, the table has:
>>
>>   Table "public.link_checksums"
>>Column  |     Type      | Modifiers
>>---------+---------------+-----------
>>md5     | character(32) | not null
>>link_id | integer       | not null
>>Indexes: ix_link_checksums_pk primary key btree (md5)
>>
>>
>
>In that event I would expect that you might only save a few disk accesses
>by having a btree with fewer levels.
>
>If the query is slow, it might be doing a sequential search because of
>a type mismatch. You can use explain to double check what plan is being
>used.
>
>

Actually, the query is *not* slow; but since we executing it a million
times a day, any savings we can realize will add up in a hurry. For
example, yesterday this query resulted in the following stats:

    'count' => 814621,
    'avg' => '0.009',
    'time' => '7674.932'

That is, we executed it 814,621 times, for a total (wallclock) time
spent waiting of 7,674 seconds (obviously, we have multiple backends
executing). So, even if we can cut this by only 0.004, that would result
in a savings of almost an hour.

So, again: will front-loading the work by mapping the original query to
16 (or 256) different queries by examining the first digit save us
anything? (My colleague who came up with this idea thinks so, since the
calculation will be done on a box other than the database host, and even
one disk access saved per query would outweigh the calculation.)

Will having 15 (or 255) additional tables make the cache behave
differently? Is there overhead associated with having another 15 (or
255) tables?

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise



Re: Table partitioning for maximum speed?

From
Alvaro Herrera
Date:
On Fri, Oct 10, 2003 at 04:30:20PM -0000, greg@turnstep.com wrote:

> One way to speed things up is to convert the entire checksum. Consider
> what a md5 checksum really is: a text string representing a hexadecimal
> number. Storing it as TEXT or CHAR is not as good as storing it as a
> number directly. Have your application convert it to a decimal number,
> and then store the checksum as type NUMERIC in the database.

A subsequent idea is that with NUMERIC or other variable length fields
you are wasting time, space and cache hits anyway.  It would be probably
faster to create a custom datatype, with fixed length for the exact size
of an MD5 sum.  With suitable input and output functions and all the
operators you need, you will likely gain some additional performance
boost.

IIRC, Manfred Koizar developed a fixed-width char datatype for Shridar
Daitankhar (sp?) maybe a year ago.  It is probably a good starting
point.  Look for it in the pgsql-performance archives.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Endurecerse, pero jamás perder la ternura" (E. Guevara)

Re: Table partitioning for maximum speed?

From
Bruno Wolff III
Date:
On Fri, Oct 10, 2003 at 13:40:17 -0400,
  Jeff Boes <jboes@nexcerpt.com> wrote:
>
> So, again: will front-loading the work by mapping the original query to
> 16 (or 256) different queries by examining the first digit save us
> anything? (My colleague who came up with this idea thinks so, since the
> calculation will be done on a box other than the database host, and even
> one disk access saved per query would outweigh the calculation.)

This could potentially save you some disk accesses. If the index is mostly
in cache now, you might not get a noticable benefit.

> Will having 15 (or 255) additional tables make the cache behave
> differently? Is there overhead associated with having another 15 (or
> 255) tables?

I am not sure whether or not you would use significantly more space
by having multiple tables.

The data format change suggested by someone else may be worth trying
as well. In addition to their suggestions, you might experiment with
keeping the hash in either 4 ints or 2 bigints. If you use bigints,
you could probably just use an index on one of the bigints and have
only a small chance of finding more than one row that matches.

Re: Table partitioning for maximum speed?

From
Vivek Khera
Date:
>>>>> "JB" == Jeff Boes <jboes@nexcerpt.com> writes:

JB> Will a query against a table of 0.5 million rows beat a query against
JB> a table of 7 million rows by a margin that makes it worth the hassle
JB> of supporting 15 "extra" tables?

I think you'll be better off with a single table, as you won't have
contention for the index pages in the cache.

One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
index bloat, which will make them not fit in cache.  Just check the
size of your index in the pg_class table, and when it gets big,
reindex (assuming you do lots of updates/inserts to the table).

Your table splitting solution sounds like something I'd do if I were
forced to use mysql ;-)


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Table partitioning for maximum speed?

From
Greg Stark
Date:
Bruno Wolff III <bruno@wolff.to> writes:

> The data format change suggested by someone else may be worth trying
> as well. In addition to their suggestions, you might experiment with
> keeping the hash in either 4 ints or 2 bigints. If you use bigints,
> you could probably just use an index on one of the bigints and have
> only a small chance of finding more than one row that matches.

This sounds good to me too.

You would have to experiment to see if the 4x int format is faster than the 2x
bigint or vice versa. I suspect the 4x int format is way faster, if you have
few enough collisions (like single digit) it would probably be the best.

Using native fixed-size datatypes that fit in a Datum and have assembly
instructions for comparison should be a big win over a variable sized datatype
that has to be dereferenced from a pointer and then put through complex
functions to handle comparison.

--
greg

Re: Table partitioning for maximum speed?

From
Joe Conway
Date:
Bruno Wolff III wrote:
> The data format change suggested by someone else may be worth trying
> as well. In addition to their suggestions, you might experiment with
> keeping the hash in either 4 ints or 2 bigints. If you use bigints,
> you could probably just use an index on one of the bigints and have
> only a small chance of finding more than one row that matches.
>

This is an interesting idea. Alternatively just use bytea and store the
16 bytes directly (i.e. no hex or base64 encoding). There is b-tree
index support for bytea.

Joe



Re: Table partitioning for maximum speed?

From
Joe Conway
Date:
Jeff Boes wrote:
> Obviously, this is conceptually similar to what the index on the "md5"
> column is supposed to do for us. However, partitioning moves just a
> little of the processing load off the database server and onto the
> machine running the application. That's important, because we can afford
> more application machines as load increases, but we can't as easily
> upgrade the database server.
>
> Will a query against a table of 0.5 million rows beat a query against a
> table of 7 million rows by a margin that makes it worth the hassle of
> supporting 15 "extra" tables?
>

I don't think 16 tables on the same server will help, but if you already
have your app tier physically separate from the database tier, you could
partition your data to more than one database server based on the first
byte of the md5 column. I designed and built something similar a few
years ago. We never got to the point where we really needed that kind of
scalability, but it worked pretty well in (limited) testing.

Joe


Re: Table partitioning for maximum speed?

From
Jean-Luc Lachance
Date:
BULL.

How many times does PG have to scan the whole table because of MVCC?
At least with partitioning there is a fighting chance that that won't be
necessary.
Queries that involve the field on which the table is partitioned execute
faster by an order of magnitude.
It also helps with vaccuming as PG can vaccum only one partition at a
time.
I have 17M row table where all records get frequently updated over a
year.
I would do my own partitioning with inheritance if it was not broken.
Partitioning would be a BIG plus in my book. So would visibility of
records but that is another fight.

JLL

Vivek Khera wrote:
>
> >>>>> "JB" == Jeff Boes <jboes@nexcerpt.com> writes:
>
> JB> Will a query against a table of 0.5 million rows beat a query against
> JB> a table of 7 million rows by a margin that makes it worth the hassle
> JB> of supporting 15 "extra" tables?
>
> I think you'll be better off with a single table, as you won't have
> contention for the index pages in the cache.
>
> One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> index bloat, which will make them not fit in cache.  Just check the
> size of your index in the pg_class table, and when it gets big,
> reindex (assuming you do lots of updates/inserts to the table).
>
> Your table splitting solution sounds like something I'd do if I were
> forced to use mysql ;-)
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D.                Khera Communications, Inc.
> Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
> AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

Re: Table partitioning for maximum speed?

From
Jean-Luc Lachance
Date:
Jean-Luc Lachance wrote:
>
> BULL.
>
> How many times does PG have to scan the whole table because of MVCC?
> At least with partitioning there is a fighting chance that that won't be
> necessary.
> Queries that involve the field on which the table is partitioned execute
> faster by an order of magnitude.
> It also helps with vaccuming as PG can vaccum only one partition at a
> time.
> I have 17M row table where all records get frequently updated over a
> year.
> I would do my own partitioning with inheritance if it was not broken.
> Partitioning would be a BIG plus in my book. So would visibility of
> records but that is another fight.


I meant to say visibility of record in the index.


>
> JLL
>
> Vivek Khera wrote:
> >
> > >>>>> "JB" == Jeff Boes <jboes@nexcerpt.com> writes:
> >
> > JB> Will a query against a table of 0.5 million rows beat a query against
> > JB> a table of 7 million rows by a margin that makes it worth the hassle
> > JB> of supporting 15 "extra" tables?
> >
> > I think you'll be better off with a single table, as you won't have
> > contention for the index pages in the cache.
> >
> > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> > index bloat, which will make them not fit in cache.  Just check the
> > size of your index in the pg_class table, and when it gets big,
> > reindex (assuming you do lots of updates/inserts to the table).
> >
> > Your table splitting solution sounds like something I'd do if I were
> > forced to use mysql ;-)
> >
> > --
> > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > Vivek Khera, Ph.D.                Khera Communications, Inc.
> > Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
> > AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >       joining column's datatypes do not match
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

Re: Table partitioning for maximum speed?

From
"David Busby"
Date:
Is this partitioning like the schemas mentioned here:
http://www.postgresql.org/docs/current/static/ddl-schemas.html?  Would those
help and increase performance?

/B

----- Original Message -----
From: "Jean-Luc Lachance" <jllachan@nsd.ca>
To: "Vivek Khera" <khera@kcilink.com>
Cc: <pgsql-general@postgresql.org>
Sent: Friday, October 10, 2003 14:23
Subject: Re: [GENERAL] Table partitioning for maximum speed?


> BULL.
>
> How many times does PG have to scan the whole table because of MVCC?
> At least with partitioning there is a fighting chance that that won't be
> necessary.
> Queries that involve the field on which the table is partitioned execute
> faster by an order of magnitude.
> It also helps with vaccuming as PG can vaccum only one partition at a
> time.
> I have 17M row table where all records get frequently updated over a
> year.
> I would do my own partitioning with inheritance if it was not broken.
> Partitioning would be a BIG plus in my book. So would visibility of
> records but that is another fight.
>
> JLL
>
> Vivek Khera wrote:
> >
> > >>>>> "JB" == Jeff Boes <jboes@nexcerpt.com> writes:
> >
> > JB> Will a query against a table of 0.5 million rows beat a query
against
> > JB> a table of 7 million rows by a margin that makes it worth the hassle
> > JB> of supporting 15 "extra" tables?
> >
> > I think you'll be better off with a single table, as you won't have
> > contention for the index pages in the cache.
> >
> > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> > index bloat, which will make them not fit in cache.  Just check the
> > size of your index in the pg_class table, and when it gets big,
> > reindex (assuming you do lots of updates/inserts to the table).
> >
> > Your table splitting solution sounds like something I'd do if I were
> > forced to use mysql ;-)
> >
> > --
> > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > Vivek Khera, Ph.D.                Khera Communications, Inc.
> > Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
> > AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index scan if
your
> >       joining column's datatypes do not match
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


Re: Table partitioning for maximum speed?

From
Vivek Khera
Date:
>>>>> "JL" == Jean-Luc Lachance <jllachan@nsd.ca> writes:


JL> BULL.
JL> How many times does PG have to scan the whole table because of MVCC?
JL> At least with partitioning there is a fighting chance that that won't be
JL> necessary.

Huh?  His specific query was "WHERE md5 = '....'".  Why on earth would
that force a sequence scan if it were an indexed column?  Heck, the
btree index should rule out 15/16ths of the rows after the first
character comparison.


Re: Table partitioning for maximum speed?

From
Jean-Luc Lachance
Date:
No.  There is a big difference between schemas and partitioned tables.

The closest thing would be a bunch of tables enherited from a base
table.
Each tables having a common field all the same for a specific table.

The problem is that the planner cannot be aware of this and make better
use of the implicit key.

JLL


David Busby wrote:
>
> Is this partitioning like the schemas mentioned here:
> http://www.postgresql.org/docs/current/static/ddl-schemas.html?  Would those
> help and increase performance?
>
> /B
>
> ----- Original Message -----
> From: "Jean-Luc Lachance" <jllachan@nsd.ca>
> To: "Vivek Khera" <khera@kcilink.com>
> Cc: <pgsql-general@postgresql.org>
> Sent: Friday, October 10, 2003 14:23
> Subject: Re: [GENERAL] Table partitioning for maximum speed?
>
> > BULL.
> >
> > How many times does PG have to scan the whole table because of MVCC?
> > At least with partitioning there is a fighting chance that that won't be
> > necessary.
> > Queries that involve the field on which the table is partitioned execute
> > faster by an order of magnitude.
> > It also helps with vaccuming as PG can vaccum only one partition at a
> > time.
> > I have 17M row table where all records get frequently updated over a
> > year.
> > I would do my own partitioning with inheritance if it was not broken.
> > Partitioning would be a BIG plus in my book. So would visibility of
> > records but that is another fight.
> >
> > JLL
> >
> > Vivek Khera wrote:
> > >
> > > >>>>> "JB" == Jeff Boes <jboes@nexcerpt.com> writes:
> > >
> > > JB> Will a query against a table of 0.5 million rows beat a query
> against
> > > JB> a table of 7 million rows by a margin that makes it worth the hassle
> > > JB> of supporting 15 "extra" tables?
> > >
> > > I think you'll be better off with a single table, as you won't have
> > > contention for the index pages in the cache.
> > >
> > > One thing to do is to reindex reasonably often (for PG < 7.4) to avoid
> > > index bloat, which will make them not fit in cache.  Just check the
> > > size of your index in the pg_class table, and when it gets big,
> > > reindex (assuming you do lots of updates/inserts to the table).
> > >
> > > Your table splitting solution sounds like something I'd do if I were
> > > forced to use mysql ;-)
> > >
> > > --
> > > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > > Vivek Khera, Ph.D.                Khera Communications, Inc.
> > > Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
> > > AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 9: the planner will ignore your desire to choose an index scan if
> your
> > >       joining column's datatypes do not match
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 8: explain analyze is your friend
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

Re: Table partitioning for maximum speed?

From
Jean-Luc Lachance
Date:
I was replying to the general comment that PG cannot profit from having
patitioned tables.

Vivek Khera wrote:
>
> >>>>> "JL" == Jean-Luc Lachance <jllachan@nsd.ca> writes:
>
> JL> BULL.
> JL> How many times does PG have to scan the whole table because of MVCC?
> JL> At least with partitioning there is a fighting chance that that won't be
> JL> necessary.
>
> Huh?  His specific query was "WHERE md5 = '....'".  Why on earth would
> that force a sequence scan if it were an indexed column?  Heck, the
> btree index should rule out 15/16ths of the rows after the first
> character comparison.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org