Thread: 50 MB Table

50 MB Table

From: JB
I have a 50 MB (plus or minus) table in postgres. The data is normalized
so there's not much I can do about the size. The tuples are about 512
bytes, so there's a pile of 'em. I need to search on several fields; a
couple in particular are text fields that need 'LIKE'. The problem is,
the thing is way too slow. So, before I go hunting for some other
solution, could anyone here point me to some ways to (hand) optimize the
searching in postgres? Different indexes, hashing and LIKE? I'm not sure
where to go with this.

The basic criteria are:
 - the sizes of indexes, etc., are not an issue. There's lots of room on
the box.
 - the data is basically static, so a read-only setup (if there is such a
thing) is fine.
 - it needs to be FAST

cheers
jb

Re: [GENERAL] 50 MB Table

From: Paul Condon
JB wrote:

> I have a 50 MB (plus or minus) table in postgres. The data is normalized
> so there's not much I can do about the size.
> [SNIP]

It sounds as if you have several different kinds of information encoded in
a single column using special words or letter combinations. This is a
violation of the ideal that data items should be "atomic." You should make
a catalog of all the things that you want to be able to say about each
tuple, and design a relational schema in which each atomic assertion is given
its own column (attribute). Then you will be able to create indices on
each, and you won't have to use LIKE in your WHERE clauses.
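
For instance, a made-up sketch of the idea (not your actual schema, just
an illustration):

-- One column encoding several facts forces pattern matching:
--   street  char(80)   -- e.g. '123 MAIN ST APT 4'
-- Giving each atomic fact its own column lets each be indexed and
-- matched with '=':
CREATE TABLE address (
    street_number  char(5),
    street_name    char(50),
    street_type    char(10),   -- 'ST', 'BLVD', ...
    unit           char(10)
);

CREATE INDEX nx_address_name ON address (street_name);
CREATE INDEX nx_address_type ON address (street_type);

-- What would have been LIKE '%BLVD%' becomes an equality test:
SELECT * FROM address WHERE street_type = 'BLVD';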

Paul


Re: [GENERAL] 50 MB Table

From: JB
Thanks for taking the time to reply. I think that I wasn't as clear as I
could be. This table is normalized and as far as I understand, what I'm
doing with it is not extraordinary. The schema is basically...

CREATE TABLE info (
  lastname char(50),
  street_name char(50),
  street_number char(5),
  ... (a bunch of other stuff that works fine with '=')
);

CREATE INDEX nx_info1 ON info (lastname);
CREATE INDEX nx_info2 ON info (street_name);

The select is as simple as this in most cases...

SELECT * FROM info WHERE street_name LIKE 'MAIN%';

...the table is about 50MB worth, about 70,000 records. I have an index
on 'lastname' and on 'street_name', and I need to search on each of these
with 'LIKE'. So I was wondering about ways to speed this up. It's very
slow; it takes about 20 seconds for the above query. I even uppercased
all the names, hoping that would help. I wondered if I'd used the wrong
index type (btree), or if there were some flags that would help. Is there
a way to bust the indexes out alphabetically on the first letter, say, or
some other such scheme? BTW the machine is RH6.1 with 128MB RAM, 27 GB
disk, P350, no X and no users (except me ;)

Paul Condon wrote:
>
> It sounds as if you have several different kinds of information encoded in
> a single column using special words or letter combinations.
> [SNIP]

--
I'm in direct contact with many advanced fun CONCEPTS.


Re: [GENERAL] 50 MB Table

From: Howie
On Mon, 6 Mar 2000, JB wrote:

> [SNIP]
> CREATE TABLE info (
>   lastname char(50),
>   street_name char(50),
>   street_number char(5),
>   ... (a bunch of other stuff that works fine with '=')
> );
>
> CREATE INDEX nx_info1 ON info (lastname);
> CREATE INDEX nx_info2 ON info (street_name);
>
> The select is as simple as this in most cases...
>
> SELECT * FROM info WHERE street_name LIKE 'MAIN%';
> [SNIP]
You might want to try CLUSTERing the table based on one of the indexes.
That'll put everything in order (physically) and should speed it up a
bit.
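
A minimal sketch, using the street_name index from your earlier message
(the CLUSTER syntax has changed across PostgreSQL versions, so check the
docs for yours):

-- Physically reorder the table by the street_name index so rows with
-- the same prefix land on neighboring disk pages.
CLUSTER nx_info2 ON info;

-- CLUSTER rewrites the table, so refresh the planner's statistics.
VACUUM ANALYZE info;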

I'd also suggest grabbing more RAM while it's (relatively) inexpensive.

> [SNIP]

---
Howie <caffeine@toodarkpark.org>   URL: http://www.toodarkpark.org
"You do not have the right to free health care.  That would be nice, but
 from the looks of public housing, we're just not interested in health care."


Re: [GENERAL] 50 MB Table

From: Paul Condon
The example you give, LIKE 'MAIN%', should be able to use the index, but I'm
not an expert on the internals of PostgreSQL; I was speaking from a general
knowledge of SQL. I was supposing you were trying to do something such as
LIKE '%BLVD%'. In that latter case, an index will bring you no benefit.

Perhaps it is working faster than it would have, had there been no index. Have
you tried dropping the index and seeing if the performance gets worse? To make
this test more powerful, choose a pattern that matches very few rows in your
table.
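
Concretely, the test might look like this (using the index name from your
earlier message; EXPLAIN also shows whether the planner considers the
index at all):

-- Check the plan with the index in place.
EXPLAIN SELECT * FROM info WHERE street_name LIKE 'MAIN%';

-- Drop the index, then compare the plan and the wall-clock time.
DROP INDEX nx_info2;
EXPLAIN SELECT * FROM info WHERE street_name LIKE 'MAIN%';

-- Recreate it when the test is done.
CREATE INDEX nx_info2 ON info (street_name);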


JB wrote:

> Thanks for taking the time to reply. I think that I wasn't as clear as I
> could be. This table is normalized and as far as I understand, what I'm
> doing with it is not extraordinary.
> [SNIP]


Re: [GENERAL] 50 MB Table

From: Marten Feldtmann
> CREATE INDEX nx_info1 ON info (lastname);
> CREATE INDEX nx_info2 ON info (street_name);
>
> The select is as simple as this in most cases...
>
> SELECT * FROM info WHERE street_name LIKE 'MAIN%';
> [SNIP]

 I think that PostgreSQL cannot use any indices for queries like this.
The indices would work if you compared with "=".

 Actually, I don't even know how one could create a simple index that
would help searches like this.

 Marten





Re: [GENERAL] 50 MB Table

From: JB
I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it
occurred to me that I could use a function for this thing. Since I'm
only looking at the first part of the string I use...

SELECT * FROM info WHERE substring(street_name from 1 for 4) = 'MAIN'

...and the query completes in under 2 secs. Stupidly obvious now but I
thought I would share this blinding insight with the list ;)
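
A related thought I haven't tried: on versions that allow indexes on
function results, a functional index on the same prefix might make the
comparison indexable (the index name here is made up, and the WHERE
clause has to use the exact same expression as the index):

CREATE INDEX nx_info3 ON info (substr(street_name, 1, 4));

SELECT * FROM info WHERE substr(street_name, 1, 4) = 'MAIN';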

Thanks again for all your help.

cheers
jb

Paul Condon wrote:
>
> The example you give, LIKE 'MAIN%', should be able to use the index, but I'm
> not an expert on the internals of PostgreSQL.
> [SNIP]

--
If everything is coming your way then you're in the wrong lane.

Re: [GENERAL] 50 MB Table

From: Bruce Bantos
> I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it
> occurred to me that I could use a function for this thing. Since i'm
> only looking at the first part of the string I use...
>
> SELECT * FROM info WHERE substring(street_name from 1 for 4) = 'MAIN'
>
> ...and the query completes in under 2 secs. Stupidly obvious now but I
> thought I would share this blinding insight with the list ;)

It was not so obvious to me. I was struggling with the same thing. This is a
good workaround for the case of "searchstring%". I wonder if the LIKE code
could be enhanced to perform the query in the same way.
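
In the meantime, the rewrite can be done by hand for a fixed prefix, as a
range predicate the btree can scan ('MAIO' here is just 'MAIN' with its
last character bumped up one; this assumes plain ASCII ordering):

SELECT * FROM info
WHERE street_name >= 'MAIN'
  AND street_name <  'MAIO';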

> [SNIP]


Re: [GENERAL] 50 MB Table

From: JB
I've posted this as a suggestion to the HACKERS list. I'd love to make a
patch but just don't have the time right now to sort through the code.

cheers
jim

Bruce Bantos wrote:
>
> > [SNIP]
> It was not so obvious to me. I was struggling with the same thing. This is a
> good workaround for the case of "searchstring%". I wonder if the LIKE code
> could be enhanced to perform the query in the same way.
>
--
If everything is coming your way then you're in the wrong lane.

RE: [GENERAL] 50 MB Table

From: Andrew Snow
Correct me if I'm wrong, but it may be slightly faster to use regular
expressions for this type of application.  Instead of doing
(WHERE string LIKE '%MAIN%') you would use (WHERE string ~ 'MAIN').  Or
perhaps the difference in speed would be negligible?



Also, I believe there would be a good case for including a new type of
index for strings: some kind of tree structure based on the first N
characters of the string, where the first branch of the tree is the
first character, the second branch is the second character, and so on,
similar to a binary tree, I guess. Searches of the type
(WHERE string LIKE 'MAIN%') would then benefit greatly on large tables.



Andrew.



Re: [GENERAL] 50 MB Table

From: m.malicky@landesmuseum-linz.ac.at
Hmm, I have a 900MB table with 6,500,000 rows, one varchar field
indexed (about a 250MB index), running PG 6.5.3 on a dual PII 350
with 512MB RAM.
A query via PHP3 (with a LIKE statement) is processed in about 2
seconds (returning about 500 rows), or 30 seconds when returning
30,000 rows. The number of rows returned depends on the user
input, of course.

/mike


http://www.libranet.com/petition.html

Re: [GENERAL] 50 MB Table

From: JB
After much futzing and fiddling I've determined, umm, well, very little
regarding my problem but a little more about PG. The LIKE query does use
the index if and only if the '%' is at the tail of the string and there
is only one '%' in the string. So LIKE 'MAIN%' does use the index but
LIKE '%MAIN%' won't. This was verified on the HACKERS list. This means
that the 'substring' query I used should be a heck of a lot slower, and
EXPLAIN bears this out: the 'substring' query looked at about 15,000
rows while the LIKE 'MAIN%' plan touched about 30. In practice, though,
I still wait a lot longer for the LIKE query than for the 'substring'
one. The problem, I can safely say, is not here, which leaves me with
the setup, implementation, memory usage, OS, permissions, etc.
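
For anyone following along, the comparison was just EXPLAIN over both
forms, reading the estimated row counts (the output format varies by
version, so I won't paste it here):

EXPLAIN SELECT * FROM info WHERE street_name LIKE 'MAIN%';
EXPLAIN SELECT * FROM info WHERE substring(street_name from 1 for 4) = 'MAIN';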

Sorry for all the traffic and confusion. I'll report back when I figure
this out.

cheers
jb

--
This is the first age that's paid much attention to the future, which is
a little ironic since we may not have one.
        -- Arthur Clarke

Re: [GENERAL] 50 MB Table

From: Karl DeBisschop
>I've been shuffling thru the 'LIKE' code in pg, but whilst perusing, it
>occurred to me that I could use a function for this thing. Since i'm
>only looking at the first part of the string I use...
>
>SELECT * FROM info WHERE substring(street_name from 1 for 4) = 'MAIN'
>
>...and the query completes in under 2 secs. Stupidly obvious now but I
>thought I would share this blinding insight with the list ;)

In that case, try:

SELECT * FROM info WHERE street_name ~ '^MAIN';

One trick with string indexes in PG is that the match must be
'front-anchored', which I thought "LIKE 'MAIN%'" was.  But maybe it's not.

This solution is a little more flexible than substring, and may give
the speed you desire.

(I'm not absolutely sure this query won't have the same performance
problems your first query did, but it typically works for us.  If you
do try this, I'd appreciate knowing whether it works for you or not)

--
Karl DeBisschop <kdebisschop@alert.infoplease.com>
617.542.6500x2332 (Fax: 617.956.2696)

Information Please / Family Education Network
http://www.infoplease.com  - your source for FREE online reference