Thread: Insight into indexes? (or inverting then externally)
So far as I can tell indexes in PostgreSQL are fairly opaque things. The planner may or may not decide to use them, exactly what they contain is largely unknown (unless I want to read the source), how PostgreSQL uses them and in which cases a btree index is better than a hash index. Foo. If I've managed to just skip over a manual section please whack me with a 2x4. If I haven't then any links would be appreciated. I'm considering how best to provide a fast index on a 2.5 million row table (all the voters in Minnesota). Since I can't see what PostgreSQL is doing with the indexes I have to wonder if I can eke out some more performance by just indexing the record's id numbers (a usually numeric char(10) - blame the MN Secretary of State for the "mostly" part) in PostgreSQL and then sticking an inverted index into say... BerkeleyDB. Records might look like where it's obvious that the ids store nicely.. lname.db 'Jore' => '12345678901234567890' So before I jump down this rabbit hole is there anything I can do to PostgreSQL to see what is in the indexes and if there is any way to tweak them? And when is a hash index preferrable to a btree index? Or that other index type that I don't remember the name of. Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200 1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230 200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200 1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232 200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22
Joshua, what kind of queries do you need ? Do you need to perform full text search ? Oleg On Wed, 5 Jun 2002, Joshua b. Jore wrote: > So far as I can tell indexes in PostgreSQL are fairly opaque things. The > planner may or may not decide to use them, exactly what they contain is > largely unknown (unless I want to read the source), how PostgreSQL uses > them and in which cases a btree index is better than a hash index. > > Foo. If I've managed to just skip over a manual section please whack me > with a 2x4. If I haven't then any links would be appreciated. I'm > considering how best to provide a fast index on a 2.5 million row table > (all the voters in Minnesota). Since I can't see what PostgreSQL is doing > with the indexes I have to wonder if I can eke out some more performance > by just indexing the record's id numbers (a usually numeric char(10) - > blame the MN Secretary of State for the "mostly" part) in PostgreSQL and > then sticking an inverted index into say... BerkeleyDB. > > Records might look like where it's obvious that the ids store nicely.. > > lname.db > 'Jore' => '12345678901234567890' > > So before I jump down this rabbit hole is there anything I can do to > PostgreSQL to see what is in the indexes and if there is any way to tweak > them? And when is a hash index preferrable to a btree index? Or that other > index type that I don't remember the name of. > > Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200 > 1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230 > 200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200 > 1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232 > 200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22 > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
What's the performance like? I have a similar table (about 3 million names, and I get great (sub second) performance when searching on a name doing a select distinct name where name like 'smith john%'. Have you run an EXPLAIN on the query you are doing? Is it even using your indexes? -Pete Oleg Bartunov wrote: >Joshua, > >what kind of queries do you need ? Do you need to perform full text search ? > > Oleg >On Wed, 5 Jun 2002, Joshua b. Jore wrote: > >>So far as I can tell indexes in PostgreSQL are fairly opaque things. The >>planner may or may not decide to use them, exactly what they contain is >>largely unknown (unless I want to read the source), how PostgreSQL uses >>them and in which cases a btree index is better than a hash index. >> >>Foo. If I've managed to just skip over a manual section please whack me >>with a 2x4. If I haven't then any links would be appreciated. I'm >>considering how best to provide a fast index on a 2.5 million row table >>(all the voters in Minnesota). Since I can't see what PostgreSQL is doing >>with the indexes I have to wonder if I can eke out some more performance >>by just indexing the record's id numbers (a usually numeric char(10) - >>blame the MN Secretary of State for the "mostly" part) in PostgreSQL and >>then sticking an inverted index into say... BerkeleyDB. >> >>Records might look like where it's obvious that the ids store nicely.. >> >>lname.db >>'Jore' => '12345678901234567890' >> >>So before I jump down this rabbit hole is there anything I can do to >>PostgreSQL to see what is in the indexes and if there is any way to tweak >>them? And when is a hash index preferrable to a btree index? Or that other >>index type that I don't remember the name of. >> >>Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200 >>1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230 >>200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200 >>1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232 >>200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22 >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 2: you can get off all lists at once with the unregister command >> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >> > > Regards, > Oleg >_____________________________________________________________ >Oleg Bartunov, sci.researcher, hostmaster of AstroNet, >Sternberg Astronomical Institute, Moscow University (Russia) >Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >phone: +007(095)939-16-83, +007(095)939-23-83 > > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster >
On Wed, 5 Jun 2002 09:07:37 -0500 (CDT) "Joshua b. Jore" <josh@greentechnologist.org> wrote: > So before I jump down this rabbit hole is there anything I can do to > PostgreSQL to see what is in the indexes and if there is any way to tweak > them? Exactly what would you like to "tweak"? I'm not really sure what the problem is... > And when is a hash index preferrable to a btree index? Never (if you know of a situation in which hash has a significant advantage over btree, let me know -- I haven't been able to find one). The current hash index code doesn't handle multiple columns, unique indexes, tends to deadlock under heavy concurrent access, and is VERY slow to create. > Or that other index type that I don't remember the name of. You're probably thinking of rtree, which is for non-scalar data (and thus, isn't an option for you AFAICT). Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC