Thread: PostgreSQL Indexing versus MySQL
I currently have a client with a database that must hold 125 million records, with about 250 fields in total.
The database has been normalized and indexed appropriately.
If any of you have worked with MySQL, you will have discovered that indexing is very limited. You can only have one index file per table. The indexing process actually creates a full copy of the original table, and once you get above 2 indexes with 125 million records, it is extremely slow.
Should I even bother trying PostgreSQL to resolve this issue?
We can generate the same indexes in MS SQL and Oracle in a fraction of the amount of time when held up to MySQL.
Thanks
Chris.
I had this same problem not long ago, and it caused me to go with PostgreSQL over MySQL on a project. The problem you're describing does not exist in PostgreSQL. Creating a new index on a table does not require recreating the other indexes. MySQL also has the same problem if you add or drop columns: all indexes need to be recreated. Again, not a problem in PostgreSQL.

You'll find times similar to Oracle and MS SQL (I've never directly compared them, but they feel about the same).

Regards,
Adam Ruth

On Feb 11, 2004, at 1:58 PM, Chris Fossenier wrote:
> The indexing process actually creates a full copy of the original
> table and once you get above 2 indexes with 125 million records, it is
> extremely slow.
>
> Should I even bother trying PostgreSQL to resolve this issue?
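To illustrate Adam's point, a minimal sketch (the table and index names here are hypothetical, not from the thread): each CREATE INDEX builds only the new index from the base table, leaving existing indexes untouched.

```sql
CREATE TABLE people (id int8, last_name text, city text);

CREATE INDEX people_last_name_idx ON people (last_name);

-- Building this second index scans only the 'people' heap;
-- people_last_name_idx is not rebuilt:
CREATE INDEX people_city_idx ON people (city);

-- Likewise, adding a column does not force any index rebuilds:
ALTER TABLE people ADD COLUMN zip text;
```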
Chris Fossenier wrote:
> We can generate the same indexes in MS SQL and Oracle in a fraction of the
> amount of time when held up to MySQL.

This is a known MySQL-specific problem. I know of no other database that handles index creation in such an inefficient way. Creating an index in PostgreSQL requires scanning the entire main table and sorting the resulting key set ... I don't see how to do it better. Every index is stored in its own (set of) file(s).

During index creation, an exclusive lock on the table is required by CREATE INDEX, so reindexing your entire DB is not your preferred operation during peak access times of your webserver. But I think professional DBAs don't torture Oracle that way either.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
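The "every index is stored in its own file(s)" point can be checked from SQL; a minimal sketch, assuming a table and index with these hypothetical names exist (the relfilenode is the numeric file name under the database's data directory):

```sql
-- Each table and each index has its own on-disk file(s),
-- identified by its relfilenode:
SELECT relname, relfilenode
  FROM pg_class
 WHERE relname IN ('orders', 'orders_customer_idx');
```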
Jan,

Thanks for the reply. Is there any way to control your index file locations? Can I place my Postgres indexes on different mount points?

I haven't put Postgres to the full test yet for importing/indexing, but I was concerned about the time it took to perform some sample queries that MySQL seemed to handle more efficiently.

I would also like view support, but let me know if you have any insight into my first question.

Thanks,
Chris.

-----Original Message-----
From: Jan Wieck [mailto:JanWieck@Yahoo.com]
Sent: Sunday, February 15, 2004 4:23 PM
To: Chris Fossenier
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL Indexing versus MySQL

> During index creation, an exclusive lock on the table is required by
> CREATE INDEX, so reindexing your entire DB is not your preferred
> operation during peak access times of your webserver.
On Sun, 15 Feb 2004, Chris Fossenier wrote:
> Thanks for the reply. Is there any way to control your index file
> locations? Can I place my Postgres indexes on different mount points?

Yes, but there's no clean SQL command line interface to do it. You have to find the oid of the index, shut down the database, move the index, link to it with 'ln -s' and restart the database. Should you reindex, or recreate the index, you'll have to remember to go back and do this all over again for the new index's OID.

> I haven't put Postgres to the full test yet for importing/indexing, but
> I was concerned about the time it took to perform some sample queries
> that MySQL seemed to handle more efficiently.

Five (possible) issues here.

1: Postgresql is built to handle massive parallel load. If you're testing a single thread, MySQL will often win by a fair margin. Put a real load on it, say 10% write, 90% read, with 100 clients, and you'll see a win in favor of Postgresql.

2: Postgresql has a really neat feature that allows users to define their own types. Yeah, cool. But it means that the parser is not much smarter about coercing an int4 to an int8 than it is about coercing a custom type (hex, foobar, etc...) from one to another. What this means to you, the user, is that:

create table test (id int8, info text);
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan. Why? Because the default integer type is int4, and your id field is int8. Cast the value to int8, and watch it use an index scan:

select * from test where id=cast(456 as int8);

3: You might not have a representative data set. Testing postgresql with 1,000 rows when you're really gonna have 1,000,000 is not good either. It behaves differently for differently sized data sets, because things that are cheap on small data sets might be expensive on large ones, and vice versa (though not usually the vice versa so much).

4: You haven't run analyze and/or vacuum full, and/or haven't been running regular vacuums enough, and/or don't have enough fsm slots set up to handle the amount of tuples you are turning over if you are running regular vacuums enough.

5: You haven't tuned your installation for the hardware it's on. Postgresql tends to favor I/O over CPU, and believes it is being installed on a P100 with 64 Meg of ram if not told otherwise. The reason is obvious: if you install a database tuned for a small computer on a big machine, it may not be real fast, but it will run. If you install a database tuned for a monster machine on a small machine, the database may not start.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html has some real good information on performance tuning.

> I would also like view support but let me know if you have any insight
> into my first question.

Views have been fully featured since around 7.1 or so, a couple years back:

http://www.postgresql.org/docs/7.3/static/sql-createview.html
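The "find the oid of the index" step Scott describes can be done with a catalog query; a minimal sketch with a hypothetical index name (on 7.1 and later the on-disk file is named after the relfilenode, which sits next to the oid in pg_class, so it is worth selecting both):

```sql
-- Find the numeric file name of an index so it can be moved to
-- another mount point and symlinked back (with the server shut down):
SELECT oid, relfilenode
  FROM pg_class
 WHERE relname = 'orders_customer_idx';
```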
At 04:14 PM 2/17/2004 -0700, scott.marlowe wrote:
>Cast the value to int8, and watch it use an index scan:
>
>select * from test where id=cast(456 as int8);

Actually, won't

select * from test where id='456'

use the index?

I'm curious if this works in all cases - e.g. postgresql figures the best cast for text to whatever, even for relevant custom types?
Lincoln Yeoh wrote:
> Actually, won't
>   select * from test where id='456'
> use the index?

'456' is not text; it is a quoted literal of (yet) unknown type, and it is ideally kept that way long enough to know that it should be an int8.

Jan
On Wed, 18 Feb 2004, Lincoln Yeoh wrote:
> Actually, won't
>   select * from test where id='456'
> use the index?

It works, I just like writing things in the most self-documenting manner possible, since one day somebody else may look at:

select * from test where id='456'

and go, "hey, that's just an int, no need for the quotes" and take them out, not knowing what they do. cast(456 as int8) is pretty obvious; '456' is much more subtle.
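A quick way to compare the three forms discussed here is EXPLAIN against Scott's `test` table; a minimal sketch (the exact plans depend on your PostgreSQL version and data - the seq-scan behaviour described in this thread applies to the 7.x releases current at the time):

```sql
-- Unadorned literal: planned as int4, cannot use the int8 index
-- on 7.x servers, so expect a Seq Scan:
EXPLAIN SELECT * FROM test WHERE id = 456;

-- Both of these let the planner match the int8 index:
EXPLAIN SELECT * FROM test WHERE id = cast(456 AS int8);
EXPLAIN SELECT * FROM test WHERE id = '456';
```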
At 10:35 AM 2/18/2004 -0700, scott.marlowe wrote:
>cast(456 as int8) is pretty obvious, '456' is much more subtle.

Well, whenever I see:

select * from test where id=$var

I think "possible SQL injection". I'd rather see bind variables be used, or something similar, and when you use those, the quotes are automatic. With your method, if the column is altered to int4 or something else, it won't use the index (unless a postgresql dev adds stuff).

'456'::int8 seems to work too, but I suppose that's a Postgresql-ism.