Thread: PostgreSQL Indexing versus MySQL
I currently have a client with a database that must hold 125 million records, with about 250 fields in total.
The database has been normalized and indexed appropriately.
If any of you have worked with MySQL, you will have discovered that indexing is very limited. You can only have one index file per table. The indexing process actually creates a full copy of the original table, and once you get above 2 indexes with 125 million records, it is extremely slow.
Should I even bother trying PostgreSQL to resolve this issue?
We can generate the same indexes in MS SQL and Oracle in a fraction of the amount of time when held up to MySQL.
Thanks
Chris.
I had this same problem not long ago, and it caused me to go with PostgreSQL over MySQL on a project. The problem you're describing does not exist in PostgreSQL. Creating a new index on a table does not require recreating the other indexes. MySQL also has the same problem if you add or drop columns: all indexes need to be recreated. Again, not a problem in PostgreSQL.

You'll find times similar to Oracle and MS SQL (I've never directly compared them, but they feel about the same).

Regards,
Adam Ruth

On Feb 11, 2004, at 1:58 PM, Chris Fossenier wrote:
> The indexing process actually creates a full copy of the original
> table and once you get above 2 indexes with 125 million records, it is
> extremely slow.
>
> Should I even bother trying PostgreSQL to resolve this issue?
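To illustrate Adam's point, a minimal sketch (the table and index names here are hypothetical, not from the thread): each CREATE INDEX builds only the new index from the base table, leaving existing indexes untouched.

```sql
CREATE TABLE people (id int8, last_name text, city text);

CREATE INDEX people_last_name_idx ON people (last_name);

-- Building this second index scans only the 'people' heap;
-- people_last_name_idx is not rebuilt:
CREATE INDEX people_city_idx ON people (city);

-- Likewise, adding a column does not force any index rebuilds:
ALTER TABLE people ADD COLUMN zip text;
```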
Chris Fossenier wrote:
> We can generate the same indexes in MS SQL and Oracle in a fraction of the
> amount of time when held up to MySQL.

This is a known MySQL-specific problem. I know of no other database that handles index creation in such an inefficient way. Creating an index in PostgreSQL requires scanning the entire main table and sorting the resulting key set ... I don't see how to do it better. Every index is stored in its own (set of) file(s).

During index creation, an exclusive lock on the table is required by CREATE INDEX, so reindexing your entire DB is not your preferred operation during peak access times of your webserver. But I think professional DBAs don't torture Oracle that way either.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
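The "every index is stored in its own file(s)" point can be checked from SQL; a minimal sketch, assuming a table and index with these hypothetical names exist (the relfilenode is the numeric file name under the database's data directory):

```sql
-- Each table and each index has its own on-disk file(s),
-- identified by its relfilenode:
SELECT relname, relfilenode
  FROM pg_class
 WHERE relname IN ('orders', 'orders_customer_idx');
```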
Jan,

Thanks for the reply. Is there any way to control your index file locations? Can I place my Postgres indexes on different mount points?

I haven't put Postgres to the full test yet for importing/indexing, but I was concerned about the time it took to perform some sample queries that MySQL seemed to handle more efficiently.

I would also like view support, but let me know if you have any insight into my first question.

Thanks,
Chris.

-----Original Message-----
From: Jan Wieck [mailto:JanWieck@Yahoo.com]
Sent: Sunday, February 15, 2004 4:23 PM
To: Chris Fossenier
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL Indexing versus MySQL

> During index creation, an exclusive lock on the table is required by
> CREATE INDEX, so reindexing your entire DB is not your preferred
> operation during peak access times of your webserver.
On Sun, 15 Feb 2004, Chris Fossenier wrote:
> Thanks for the reply. Is there any way to control your index file
> locations? Can I place my Postgres indexes on different mount points?

Yes, but there's no clean SQL command line interface to do it. You have to find the oid of the index, shut down the database, move the index, link to it with 'ln -s' and restart the database. Should you reindex, or recreate the index, you'll have to remember to go back and do this all over again for the new index's OID.

> I haven't put Postgres to the full test yet for importing/indexing, but
> I was concerned about the time it took to perform some sample queries
> that MySQL seemed to handle more efficiently.

Five (possible) issues here.

1: Postgresql is built to handle massive parallel load. If you're testing a single thread, MySQL will often win by a fair margin. Put a real load on it, say 10% write, 90% read, with 100 clients, and you'll see a win in favor of Postgresql.

2: Postgresql has a really neat feature that allows users to define their own types. Yeah, cool. But it means that the parser is not much smarter about coercing an int4 to an int8 than it is about coercing a custom type (hex, foobar, etc...) from one to another. What this means to you, the user, is that:

create table test (id int8, info text);
<insert 10,000 rows>
select * from test where id=456;

will result in a sequential scan. Why? Because the default integer type is int4, and your id field is int8. Cast the value to int8, and watch it use an index scan:

select * from test where id=cast(456 as int8);

3: You might not have a representative data set. Testing postgresql with 1,000 rows when you're really gonna have 1,000,000 is not good either. It behaves differently for differently sized data sets, because things that are cheap on small data sets might be expensive on large ones, and vice versa (though not usually the vice versa so much).

4: You haven't run analyze and/or vacuum full, and/or haven't been running regular vacuums enough, and/or don't have enough fsm slots set up to handle the amount of tuples you are turning over if you are running regular vacuums enough.

5: You haven't tuned your installation for the hardware it's on. Postgresql tends to favor I/O over CPU, and believes it is being installed on a P100 with 64 Meg of ram if not told otherwise. The reason is obvious: if you install a database tuned for a small computer on a big machine, it may not be real fast, but it will run. If you install a database tuned for a monster machine on a small machine, the database may not start.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html has some real good information on performance tuning.

> I would also like view support but let me know if you have any insight
> into my first question.

Views have been fully featured since around 7.1 or so, a couple years back:

http://www.postgresql.org/docs/7.3/static/sql-createview.html
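The "find the oid of the index" step Scott describes can be done with a catalog query; a minimal sketch with a hypothetical index name (on 7.1 and later the on-disk file is named after the relfilenode, which sits next to the oid in pg_class, so it is worth selecting both):

```sql
-- Find the numeric file name of an index so it can be moved to
-- another mount point and symlinked back (with the server shut down):
SELECT oid, relfilenode
  FROM pg_class
 WHERE relname = 'orders_customer_idx';
```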
At 04:14 PM 2/17/2004 -0700, scott.marlowe wrote:
>Cast the value to int8, and watch it use an index scan:
>
>select * from test where id=cast(456 as int8);

Actually, won't

select * from test where id='456'

use the index?

I'm curious if this works in all cases - e.g. postgresql figures the best cast for text to whatever, even for relevant custom types?
Lincoln Yeoh wrote:
> Actually, won't
>   select * from test where id='456'
> use the index?

'456' is not text; it is a quoted literal of (yet) unknown type, and it is ideally kept that way long enough to know that it should be an int8.

Jan
On Wed, 18 Feb 2004, Lincoln Yeoh wrote:
> Actually, won't
>   select * from test where id='456'
> use the index?

It works, I just like writing things in the most self-documenting manner possible, since one day somebody else may look at:

select * from test where id='456'

and go, "hey, that's just an int, no need for the quotes" and take them out, not knowing what they do. cast(456 as int8) is pretty obvious; '456' is much more subtle.
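A quick way to compare the three forms discussed here is EXPLAIN against Scott's `test` table; a minimal sketch (the exact plans depend on your PostgreSQL version and data - the seq-scan behaviour described in this thread applies to the 7.x releases current at the time):

```sql
-- Unadorned literal: planned as int4, cannot use the int8 index
-- on 7.x servers, so expect a Seq Scan:
EXPLAIN SELECT * FROM test WHERE id = 456;

-- Both of these let the planner match the int8 index:
EXPLAIN SELECT * FROM test WHERE id = cast(456 AS int8);
EXPLAIN SELECT * FROM test WHERE id = '456';
```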
At 10:35 AM 2/18/2004 -0700, scott.marlowe wrote:
>cast(456 as int8) is pretty obvious, '456' is much more subtle.

Well, whenever I see:

select * from test where id=$var

I think "possible SQL injection". I'd rather see bind variables be used, or something similar, and when you use those, the quotes are automatic. With your method, if the column is altered to int4 or something else, it won't use the index (unless a postgresql dev adds stuff).

'456'::int8 seems to work too, but I suppose that's a Postgresql-ism.