Re: PostgreSQL Indexing versus MySQL - Mailing list pgsql-general

From scott.marlowe
Subject Re: PostgreSQL Indexing versus MySQL
Date
Msg-id Pine.LNX.4.33.0402171602170.31696-100000@css120.ihs.com
In response to Re: PostgreSQL Indexing versus MySQL  ("Chris Fossenier" <chris@engenuit.com>)
Responses Re: PostgreSQL Indexing versus MySQL  (Lincoln Yeoh <lyeoh@pop.jaring.my>)
List pgsql-general
On Sun, 15 Feb 2004, Chris Fossenier wrote:

> Jan,
>
> Thanks for the reply. Is there anyway to control your index file locations?
> Can I place my Postgres indexes on different mount points?

Yes, but there's no clean SQL command-line interface for it.  You have
to find the oid of the index, shut down the database, move the index's
file, link to it with 'ln -s', and restart the database.  Should you
reindex or recreate the index, you'll have to remember to go back and do
this all over again for the new index's OID.
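
Something along these lines will find the file to move (a sketch only;
the index name test_id_idx is made up, and the mv/ln steps have to happen
with the postmaster stopped):

select d.oid as db_oid, c.relfilenode
  from pg_class c, pg_database d
 where c.relname = 'test_id_idx'
   and d.datname = current_database();
-- the file lives at $PGDATA/base/<db_oid>/<relfilenode>; move it and symlink:
--   mv $PGDATA/base/<db_oid>/<relfilenode> /otherdisk/<relfilenode>
--   ln -s /otherdisk/<relfilenode> $PGDATA/base/<db_oid>/<relfilenode>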

> I haven't put Postgres to the full test yet for importing/indexing but I was
> concerned on the time it took to perform some sample queries that MySQL
> seemed to handle more efficiently.

Five (possible) issues here.

1:  Postgresql is built to handle massive parallel load.  If you're
testing a single thread, MySQL will often win by a fair margin.  Put a
real load on it, say 10% write, 90% read, with 100 clients, and you'll see
a win in favor of Postgresql.

2:  Postgresql has a really neat feature that allows users to define their
own types.  Yeah, cool.  But it means that the parser is not much
smarter about coercing an int4 to an int8 than it is about coercing a
custom type (hex, foobar, etc...) from one to another.  What this means
to you, the user, is that:

create table test (id int8, info text);
create index test_id_idx on test (id);  -- so the cast example below has an index to use
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan.  Why?  Because the default integer type
is int4, and your id field is int8.  Cast the value to int8, and watch it
use an index scan:

select * from test where id=cast(456 as int8);
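
If you want to watch it happen, explain both forms (this assumes the
test_id_idx index from the example above; the comments describe the plan
you should see, not pasted output):

explain select * from test where id=456;                -- seq scan on test
explain select * from test where id=cast(456 as int8);  -- index scan using test_id_idx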

3:  You might not have a representative data set.  Testing Postgresql with
1,000 rows when you're really gonna have 1,000,000 is not good either.  It
behaves differently for different-sized data sets because things that
are cheap on small data sets might be expensive on large ones, and vice
versa (though not usually the vice versa so much).

4:  You haven't run analyze and/or vacuum full, and/or haven't been
running regular vacuums often enough, and/or don't have enough fsm slots
set up to handle the number of tuples you are turning over even if you
are running regular vacuums often enough.
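
For example (the fsm settings are just the knobs to look at, not
recommended values):

vacuum analyze;          -- reclaim dead tuples and refresh planner stats
-- vacuum verbose (on 7.4) reports how many fsm page slots it needs;
-- compare that against these postgresql.conf settings:
show max_fsm_pages;
show max_fsm_relations;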

5:  You haven't tuned your installation for the hardware it's on.
Postgresql tends to favor I/O over CPU, and believes it is being installed
on a P100 with 64 megs of RAM if not told otherwise.  The reason is
obvious: if you install a database tuned for a small computer on a big
machine, it may not be real fast, but it will run.  If you install a
database tuned for a monster machine on a small machine, the database may
not start.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

has some real good information on performance tuning.
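
Roughly, the knobs most people start with on 7.3/7.4 are shared_buffers,
sort_mem and effective_cache_size in postgresql.conf.  You can see what
you're currently running with like so (the units in the comments are the
pre-8.0 ones):

show shared_buffers;         -- number of 8k shared buffers
show sort_mem;               -- per-sort memory, in kB
show effective_cache_size;   -- planner's guess at the OS cache, in 8k pages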

> I would like view support but let me know if you have any insight into my
> first question.

Views have been fully featured since around 7.1 or so, a couple years
back:

http://www.postgresql.org/docs/7.3/static/sql-createview.html
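
A quick sketch (table and column names made up):

create view big_orders as
  select order_id, customer, total
    from orders
   where total > 1000;

select * from big_orders where customer = 'bob';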

