Thread: 50 MB Table
I have a 50 MB +- table in postgres. The data is normalized so there's not much I can do about the size. The tuples are about 512 bytes so there's a pile of 'em. I need searching on of several fields, a couple in particular are text fields that needs 'LIKE'. The problem is, the thing is way too slow. So, I was wondering, before I go hunting for some other solution, could anyone here point me to some ways to (hand) optimize the searching in postgres? Different indexes, hashing and LIKE? I'm not sure where to go with this. The basic criteria are: - sizes of indexes, etc, is not an issue. There's lot's of room on the box. - the data is basically static so a read-only (if such a thing) is fine. - it needs to be FAST cheers jb
JB wrote: > I have a 50 MB +- table in postgres. The data is normalized so there's > not much I can do about the size. The tuples are about 512 bytes so > there's a pile of 'em. I need searching on of several fields, a couple > in particular are text fields that needs 'LIKE'. The problem is, the > thing is way too slow. So, I was wondering, before I go hunting for some > other solution, could anyone here point me to some ways to (hand) > optimize the searching in postgres? Different indexes, hashing and LIKE? > I'm not sure where to go with this. > > The basic criteria are: > - sizes of indexes, etc, is not an issue. There's lot's of room on the > box. > - the data is basically static so a read-only (if such a thing) is > fine. > - it needs to be FAST > > cheers > jb > > ************ It sounds as if you have several different kinds of information encoded in a single column using special words or letter combinations. This is a violation of the ideal that data items should be "atomic." You should make a catalog of all the things that you want to be able to say about each tuple, and design a relational schema in which atomic assertion is given its own column (attribute). Then you will be able to create indices on each, and you won't have to use LIKE in your WHERE clauses. Paul
Thanks for taking the time to reply. I think that I wasn't as clear as I could be. This table is normalized and as far as I understand, what I'm doing with it is not extraordinary. The schema is basically... CREATE TABLE info ( lastname char(50), street_name char(50), street_number char(5), ... (a bunch of other stuff that works fine with '=') ); CREATE INDEX nx_info1 ON info (lastname); CREATE INDEX nx_info2 ON info (street_name); The select is as simple as this in most cases... SELECT * FROM info WHERE street_name LIKE 'MAIN%'; .,,the table about 50MB worth, about 70,000 records. I have an index on 'lastname' and 'street_name' and I need to search on each of these with 'LIKE'. So I was wondering about ways to speed this up. It's very slow. It takes about 20 seconds for the above query. I even uppercased all the names, hoping tht would help. I wondered if I'd used the wrong index type (btree), or if there were some flags that would help. Is there a way to bust the indexes out alpha on the first letter say, or some other such scheme. BTW the machine is RH6.1 with 128mb ram, 27 GB, P350, no X and no users (except me ;) Paul Condon wrote: > > JB wrote: > > > I have a 50 MB +- table in postgres. The data is normalized so there's > > not much I can do about the size. The tuples are about 512 bytes so > > there's a pile of 'em. I need searching on of several fields, a couple > > in particular are text fields that needs 'LIKE'. The problem is, the > > thing is way too slow. So, I was wondering, before I go hunting for some > > other solution, could anyone here point me to some ways to (hand) > > optimize the searching in postgres? Different indexes, hashing and LIKE? > > I'm not sure where to go with this. > > > > The basic criteria are: > > - sizes of indexes, etc, is not an issue. There's lot's of room on the > > box. > > - the data is basically static so a read-only (if such a thing) is > > fine. > > - it needs to be FAST > > > > cheers > > jb > > > > ************ > > It sounds as if you have several different kinds of information encoded in > a single column using special words or letter combinations. This is a > violation of the ideal that data items should be "atomic." You should make > a catalog of all the things that you want to be able to say about each > tuple, and design a relational schema in which atomic assertion is given > its own column (attribute). Then you will be able to create indices on > each, and you won't have to use LIKE in your WHERE clauses. > > Paul -- I'm in direct contact with many advanced fun CONCEPTS. Paul Condon wrote: > > JB wrote: > > > I have a 50 MB +- table in postgres. The data is normalized so there's > > not much I can do about the size. The tuples are about 512 bytes so > > there's a pile of 'em. I need searching on of several fields, a couple > > in particular are text fields that needs 'LIKE'. The problem is, the > > thing is way too slow. So, I was wondering, before I go hunting for some > > other solution, could anyone here point me to some ways to (hand) > > optimize the searching in postgres? Different indexes, hashing and LIKE? > > I'm not sure where to go with this. > > > > The basic criteria are: > > - sizes of indexes, etc, is not an issue. There's lot's of room on the > > box. > > - the data is basically static so a read-only (if such a thing) is > > fine. > > - it needs to be FAST > > > > cheers > > jb > > > > ************ > > It sounds as if you have several different kinds of information encoded in > a single column using special words or letter combinations. This is a > violation of the ideal that data items should be "atomic." You should make > a catalog of all the things that you want to be able to say about each > tuple, and design a relational schema in which atomic assertion is given > its own column (attribute). Then you will be able to create indices on > each, and you won't have to use LIKE in your WHERE clauses. > > Paul > > ************ -- I'm in direct contact with many advanced fun CONCEPTS.
On Mon, 6 Mar 2000, JB wrote: > [SNIP] > CREATE TABLE info ( > lastname char(50), > street_name char(50), > street_number char(5), > ... (a bunch of other stuff that works fine with '=') > ); > > CREATE INDEX nx_info1 ON info (lastname); > CREATE INDEX nx_info2 ON info (street_name); > > The select is as simple as this in most cases... > > SELECT * FROM info WHERE street_name LIKE 'MAIN%'; > [SNIP] might want to try CLUSTERing the table based on one of the indexes. that'll put everything in order ( physically ) and should speed it up a bit. id also suggest grabbing more ram while its (relatively) inexpensive. > [SNIP] --- Howie <caffeine@toodarkpark.org> URL: http://www.toodarkpark.org "You do not have the right to free health care. That would be nice, but from the looks of public housing, we're just not interested in health care."
The example you give, LIKE 'MAIN%', should be able to use the index, but I'm not that expert on the internals of PostgreSQL. I was speaking from a general knowledge of SQL. I was supposing you were trying to do something, such as, LIKE '%BLVD%'. In this latter case, an index will bring you no benefit. Perhaps it is working faster than it would have, had there been no index. Have you tried dropping the index and seeing if the performance gets worse? To make this test more powerful, choose a pattern that matches very few rows in your table. JB wrote: > Thanks for taking the time to reply. I think that I wasn't as clear as I > could be. This table is normalized and as far as I understand, what I'm > doing with it is not extraordinary. The schema is basically... > > CREATE TABLE info ( > lastname char(50), > street_name char(50), > street_number char(5), > ... (a bunch of other stuff that works fine with '=') > ); > > CREATE INDEX nx_info1 ON info (lastname); > CREATE INDEX nx_info2 ON info (street_name); > > The select is as simple as this in most cases... > > SELECT * FROM info WHERE street_name LIKE 'MAIN%'; > > .,,the table about 50MB worth, about 70,000 records. I have an index on > 'lastname' and 'street_name' and I need to search on each of these with > 'LIKE'. So I was wondering about ways to speed this up. It's very slow. > It takes about 20 seconds for the above query. I even uppercased all the > names, hoping tht would help. I wondered if I'd used the wrong index > type (btree), or if there were some flags that would help. Is there a > way to bust the indexes out alpha on the first letter say, or some other > such scheme. BTW the machine is RH6.1 with 128mb ram, 27 GB, P350, no X > and no users (except me ;) > > Paul Condon wrote: > > > > JB wrote: > > > > > I have a 50 MB +- table in postgres. The data is normalized so there's > > > not much I can do about the size. The tuples are about 512 bytes so > > > there's a pile of 'em. I need searching on of several fields, a couple > > > in particular are text fields that needs 'LIKE'. The problem is, the > > > thing is way too slow. So, I was wondering, before I go hunting for some > > > other solution, could anyone here point me to some ways to (hand) > > > optimize the searching in postgres? Different indexes, hashing and LIKE? > > > I'm not sure where to go with this. > > > > > > The basic criteria are: > > > - sizes of indexes, etc, is not an issue. There's lot's of room on the > > > box. > > > - the data is basically static so a read-only (if such a thing) is > > > fine. > > > - it needs to be FAST > > > > > > cheers > > > jb > > > > > > ************ > > > > It sounds as if you have several different kinds of information encoded in > > a single column using special words or letter combinations. This is a > > violation of the ideal that data items should be "atomic." You should make > > a catalog of all the things that you want to be able to say about each > > tuple, and design a relational schema in which atomic assertion is given > > its own column (attribute). Then you will be able to create indices on > > each, and you won't have to use LIKE in your WHERE clauses. > > > > Paul > > -- > I'm in direct contact with many advanced fun CONCEPTS. > > Paul Condon wrote: > > >
> CREATE INDEX nx_info1 ON info (lastname); > CREATE INDEX nx_info2 ON info (street_name); > > The select is as simple as this in most cases... > > SELECT * FROM info WHERE street_name LIKE 'MAIN%'; > > .,,the table about 50MB worth, about 70,000 records. I have an index on > 'lastname' and 'street_name' and I need to search on each of these with > 'LIKE'. So I was wondering about ways to speed this up. It's very slow. > It takes about 20 seconds for the above query. I even uppercased all the I think, that psql can not use any indicces when doing queries like this. The indices would work if you would compare with "=". Actually I even do not even know, how one culd create a simple index helping for searches like this. Marten
I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it occurred to me that I could use a function for this thing. Since i'm only looking at the first part of the string I use... SELECT * FROM info WHERE substring(street_name from 1 to 4) = 'MAIN' ...and the query completes in under 2 secs. Stupidly obvious now but I thought I would share this blinding insight with the list ;) Thanks again for all your help. cheers jb Paul Condon wrote: > > The example you give, LIKE 'MAIN%', should be able to use the index, but I'm > not that expert on the internals of PostgreSQL. I was speaking from a general > knowledge of SQL. I was supposing you were trying to do something, such as, > LIKE '%BLVD%'. In this latter case, an index will bring you no benefit. > > Perhaps it is working faster than it would have, had there been no index. Have > you tried dropping the index and seeing if the performance gets worse? To make > this test more powerful, choose a pattern that matches very few rows in your > table. > > JB wrote: > > > Thanks for taking the time to reply. I think that I wasn't as clear as I > > could be. This table is normalized and as far as I understand, what I'm > > doing with it is not extraordinary. The schema is basically... > > > > CREATE TABLE info ( > > lastname char(50), > > street_name char(50), > > street_number char(5), > > ... (a bunch of other stuff that works fine with '=') > > ); > > > > CREATE INDEX nx_info1 ON info (lastname); > > CREATE INDEX nx_info2 ON info (street_name); > > > > The select is as simple as this in most cases... > > > > SELECT * FROM info WHERE street_name LIKE 'MAIN%'; > > > > .,,the table about 50MB worth, about 70,000 records. I have an index on > > 'lastname' and 'street_name' and I need to search on each of these with > > 'LIKE'. So I was wondering about ways to speed this up. It's very slow. > > It takes about 20 seconds for the above query. I even uppercased all the > > names, hoping tht would help. I wondered if I'd used the wrong index > > type (btree), or if there were some flags that would help. Is there a > > way to bust the indexes out alpha on the first letter say, or some other > > such scheme. BTW the machine is RH6.1 with 128mb ram, 27 GB, P350, no X > > and no users (except me ;) > > > > Paul Condon wrote: > > > > > > JB wrote: > > > > > > > I have a 50 MB +- table in postgres. The data is normalized so there's > > > > not much I can do about the size. The tuples are about 512 bytes so > > > > there's a pile of 'em. I need searching on of several fields, a couple > > > > in particular are text fields that needs 'LIKE'. The problem is, the > > > > thing is way too slow. So, I was wondering, before I go hunting for some > > > > other solution, could anyone here point me to some ways to (hand) > > > > optimize the searching in postgres? Different indexes, hashing and LIKE? > > > > I'm not sure where to go with this. > > > > > > > > The basic criteria are: > > > > - sizes of indexes, etc, is not an issue. There's lot's of room on the > > > > box. > > > > - the data is basically static so a read-only (if such a thing) is > > > > fine. > > > > - it needs to be FAST > > > > > > > > cheers > > > > jb > > > > > > > > ************ > > > > > > It sounds as if you have several different kinds of information encoded in > > > a single column using special words or letter combinations. This is a > > > violation of the ideal that data items should be "atomic." You should make > > > a catalog of all the things that you want to be able to say about each > > > tuple, and design a relational schema in which atomic assertion is given > > > its own column (attribute). Then you will be able to create indices on > > > each, and you won't have to use LIKE in your WHERE clauses. > > > > > > Paul > > -- If everything is coming your way then you're in the wrong lane.
> I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it > occurred to me that I could use a function for this thing. Since i'm > only looking at the first part of the string I use... > > SELECT * FROM info WHERE substring(street_name from 1 to 4) = 'MAIN' > > ...and the query completes in under 2 secs. Stupidly obvious now but I > thought I would share this blinding insight with the list ;) It was not so obvious to me. I was struggling with the same thing. This is a good workaround for the case of "searchstring%". I wonder if the LIKE code could be enhanced to perform the query in the same way. > > Thanks again for all your help. > > cheers > jb > > Paul Condon wrote: > > > > The example you give, LIKE 'MAIN%', should be able to use the index, but I'm > > not that expert on the internals of PostgreSQL. I was speaking from a general > > knowledge of SQL. I was supposing you were trying to do something, such as, > > LIKE '%BLVD%'. In this latter case, an index will bring you no benefit. > > > > Perhaps it is working faster than it would have, had there been no index. Have > > you tried dropping the index and seeing if the performance gets worse? To make > > this test more powerful, choose a pattern that matches very few rows in your > > table. > > > > JB wrote: > > > > > Thanks for taking the time to reply. I think that I wasn't as clear as I > > > could be. This table is normalized and as far as I understand, what I'm > > > doing with it is not extraordinary. The schema is basically... > > > > > > CREATE TABLE info ( > > > lastname char(50), > > > street_name char(50), > > > street_number char(5), > > > ... (a bunch of other stuff that works fine with '=') > > > ); > > > > > > CREATE INDEX nx_info1 ON info (lastname); > > > CREATE INDEX nx_info2 ON info (street_name); > > > > > > The select is as simple as this in most cases... > > > > > > SELECT * FROM info WHERE street_name LIKE 'MAIN%'; > > > > > > .,,the table about 50MB worth, about 70,000 records. I have an index on > > > 'lastname' and 'street_name' and I need to search on each of these with > > > 'LIKE'. So I was wondering about ways to speed this up. It's very slow. > > > It takes about 20 seconds for the above query. I even uppercased all the > > > names, hoping tht would help. I wondered if I'd used the wrong index > > > type (btree), or if there were some flags that would help. Is there a > > > way to bust the indexes out alpha on the first letter say, or some other > > > such scheme. BTW the machine is RH6.1 with 128mb ram, 27 GB, P350, no X > > > and no users (except me ;) > > > > > > Paul Condon wrote: > > > > > > > > JB wrote: > > > > > > > > > I have a 50 MB +- table in postgres. The data is normalized so there's > > > > > not much I can do about the size. The tuples are about 512 bytes so > > > > > there's a pile of 'em. I need searching on of several fields, a couple > > > > > in particular are text fields that needs 'LIKE'. The problem is, the > > > > > thing is way too slow. So, I was wondering, before I go hunting for some > > > > > other solution, could anyone here point me to some ways to (hand) > > > > > optimize the searching in postgres? Different indexes, hashing and LIKE? > > > > > I'm not sure where to go with this. > > > > > > > > > > The basic criteria are: > > > > > - sizes of indexes, etc, is not an issue. There's lot's of room on the > > > > > box. > > > > > - the data is basically static so a read-only (if such a thing) is > > > > > fine. > > > > > - it needs to be FAST > > > > > > > > > > cheers > > > > > jb > > > > > > > > > > ************ > > > > > > > > It sounds as if you have several different kinds of information encoded in > > > > a single column using special words or letter combinations. This is a > > > > violation of the ideal that data items should be "atomic." You should make > > > > a catalog of all the things that you want to be able to say about each > > > > tuple, and design a relational schema in which atomic assertion is given > > > > its own column (attribute). Then you will be able to create indices on > > > > each, and you won't have to use LIKE in your WHERE clauses. > > > > > > > > Paul > > > > > -- > If everything is coming your way then you're in the wrong lane. > > ************ >
I've posted this as a suggestion to the HACKERS list. I'd love to make a patch but just don't have the time right now to sort through the code. cheers jim Bruce Bantos wrote: > > > I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it > > occurred to me that I could use a function for this thing. Since i'm > > only looking at the first part of the string I use... > > > > SELECT * FROM info WHERE substring(street_name from 1 to 4) = 'MAIN' > > > > ...and the query completes in under 2 secs. Stupidly obvious now but I > > thought I would share this blinding insight with the list ;) > > It was not so obvious to me. I was struggling with the same thing. This is a > good workaround for the case of "searchstring%". I wonder if the LIKE code > could be enhanced to perform the query in the same way. > -- If everything is coming your way then you're in the wrong lane.
Correct me if I'm wrong, but it may be slightly faster to use regular expressions for this type of application. Instead of doing (WHERE string LIKE '%MAIN%') you would use (WHERE string ~ 'MAIN'). Or perhaps the difference in speed would be negligible? Also, I believe there there would be a good case for including a new type of index, for strings, which would be some kind of tree structure, based on the first N characters of the string? Where the first branch of the tree is the first character, second brand would be the second character, and so on. Similar to a binary tree I guess. So that searches of type (WHERE string LIKE 'MAIN%') would benefit greatly on large tables. Andrew.
hmm, I have a 900MB Table with 6.500.000 rows, one varchar field indexed (about 250MB index) running PG 6.5.3 on a dual PII 350 with 512MB RAM. A query via PHP3 (with a like statement) is processed in about 2 seconds (returning about 500 rows) or 30 seconds when returning 30.000 rows. The number of rows returned depends on the user input of course. /mike http://www.libranet.com/petition.html
After much futzing and fiddling I've determined, umm, well very little regarding my problem but a little more about PG. The LIKE query does use the index if and only if, the '%' is at the tail of the string and there is only one '%' in the string. So LIKE 'MAIN%' does use the index but LIKE '%MAIN%' won't. This was verified on the HACKERS list. This means that the 'substring' query that I used should be a heck of a lot slower. EXPLAIN bears this out where the 'substring' query looked at 15,000 rows +- and the LIKE 'MAIN%' used 30 rows. In practise tho, I still wait for the LIKE query for a lot longer than the 'substring' one. The problem, I think I can safely say, is not here. Which leaves me with the setup, implementation, memory usage, OS, permissions, etc. Sorry for all the traffic and confusion. I'll report back when I figure this out. cheers jb -- This is the first age that's paid much attention to the future, which is a little ironic since we may not have one. -- Arthur Clarke
>I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it >occurred to me that I could use a function for this thing. Since i'm >only looking at the first part of the string I use... > >SELECT * FROM info WHERE substring(street_name from 1 to 4) = 'MAIN' > >...and the query completes in under 2 secs. Stupidly obvious now but I >thought I would share this blinding insight with the list ;) if that case, try: SELECT * FROM info WHERE street_name ~ '^MAIN'; One trick with string indexes in PG is they must be 'front-anchored', which I thought "LIKE 'MAIN%'" was. But maybe it's not. This solution is a little more flexible than substring, and may give the speed you desire. (I'm not absolutely sure this query won't have the same performance problems your first query did, but it typically works for us. If you do try this, I'd appreciate knowing whether it works for you or not) -- Karl DeBisschop <kdebisschop@alert.infoplease.com> 617.542.6500x2332 (Fax: 617.956.2696) Information Please / Family Education Network http://www.infoplease.com - your source for FREE online reference