Thread: Re: postgres slower than grep?

Re: postgres slower than grep?

From
Allan Engelhardt
Date:
I could only be botherd to try a million lines, but my results from 7.1.2 are below.

Basically:

1. I find about 50% database storage overhead in this case.  That's not completely silly, considering this is
structureddata, but seems a little high.  I don't know the internal structures well enough to really see what's
happening.

2. Why would it be faster than grep?  This has to match structured data, in this case varchar, and not just bytes.  It
hasto worry about transactions and logs, not just a stream of data.  Besides, in my tests it is not *that* slow (3 sec,
comparedwith 1/2).  Dunno what's up with your system. 

3. As you said: With an index it rocks, easily beating grep.  Use an index - it's your friend :-)


Allan.




[playpen]$ wc -l /tmp/input.txt
1000001 /tmp/input.txt
[playpen]$ ls -lh /tmp/input.txt
-rw-rw-r--    1 allane   allane       124M Jul  9 22:12 input.txt
[playpen]$ psql -f create_table.sql test
CREATE
[playpen]$ time psql -c "COPY clients FROM '/tmp/input.txt';" test

COPY

real 1m26.543s
user 0m0.000s
sys 0m0.010s
[playpen]$ du -sh $PGDATA/base/963549/
179M /var/lib/pgsql/data/base/963549
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
 test
 clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)


real 0m24.281s
user 0m0.010s
sys 0m0.000s
[playpen]$vacuumdb -a; vacuumdb -a --analyze
...
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
 clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)


real 0m3.808s
user 0m0.010s
sys 0m0.000s
[playpen]$ time grep '1234567' /tmp/input.txt
...
real 0m0.505s
user 0m0.150s
sys 0m0.360s
[playpen]$ dropdb test
DROP DATABASE
[playpen]$ createdb -E LATIN1 test
CREATE DATABASE
[playpen]$ psql -f create_table.sql test
CREATE
[playpen]$ time psql -c "COPY clients FROM '/tmp/input.txt';" test
COPY

real 1m23.927s
user 0m0.010s
sys 0m0.000s
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
 clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)


real 0m23.934s
user 0m0.010s
sys 0m0.010s
[playpen]$vacuumdb -a; vacuumdb -a --analyze
...
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
 clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)


real 0m3.796s
user 0m0.010s
sys 0m0.000s


[playpen]$ psql -c 'CREATE INDEX clients_idx_tel ON clients (tel);' test
CREATE
[playpen]$ vacuumdb --analyze test
VACUUM
[playpen]$ time psql -c 'SELECT * FROM clients WHERE tel=1234567;' test
 clientid | firstname | surname | area | neigh | dimos | tel
----------+-----------+---------+------+-------+-------+-----
(0 rows)


real 0m0.189s
user 0m0.010s
sys 0m0.000s

Spiros Ioannou wrote:

> Hello all,
>
> before you start reading, have in mind that this is not post to advertise
> one db over another, I just want to know what happens, that's why I did
> the comparisons with other db.
> to the point:
>
> This is my table:
>
>            Table "table1"
>  Attribute |    Type     | Modifier
> -----------+-------------+----------
>  ClientID  | integer     |
>  firstname | varchar(5)  |
>  surname   | varchar(22) |
>  area      | varchar(3)  |
>  neigh     | varchar(27) |
>  dimos     | varchar(50) |
>  tel       | varchar(7)  |
>
> The Facts:
> ----------
> The table exported to a text file is about 330MB, and contains about 5
> milion lines.
>
> The same table takes 670MB in postgres. Why?
> (I tried mysql, and it uses 340MB).
>
> issuing the following query:
> select * from table1 where tel='7485842';
> takes about 1min and 40 seconds to complete. (query has one result)
>
> explain of the above returns:
> Seq Scan on table1  (cost=0.00..147835.01 rows=23 width=76)
>
> issuing a grep in the text file takes about 25 seconds! *(for the whole grep
> to finish - worse case)*
>
> issuing the same query in mysql takes about 25 seconds.
>
> -to test the disk speed, I doubled the size of the text file by copying
>  it twice and the same grep query took 51 seconds (as expected)
>
> when creating an index the query completes of course in no time
>
> yes , i tried vacuum
>
> Postgres is version 7.0
> system is solaris 2.7
> hard disk is 10000rpm, ultraSCSI
> cpu is UltraSparcIIi 333Mhz
> physical memory is 384MB
>
> and now the questions
> ---------------------
> -shouldn't postgres be at least as fast as the grep?
> -shouldn't the table data file be smaller to reduce disk activity? Why is it
>  double as the mysql same data file or the text file? I also noticed that
>  an index file for column "tel" is about 130MB large and the mysql's one
>  was 64MB, is this normal?
>
> Thank you for your time,
>
> Spiros  Ioannou
> e-mail:sivann@kill-9.gr
> ---------------------------------------
> Image Video & Multimedia Systems Lab.
> Department of Electrical & Computer Eng.
> National Technical University of Athens


Re: Re: postgres slower than grep?

From
"Mitch Vincent"
Date:
> 1. I find about 50% database storage overhead in this case.  That's not
completely silly, considering this is structured data, but seems a little
high.  I don't know >the internal structures well enough to really see
what's happening.

Hmm, the PG docs say to expect data stored in the database to take up %600
(or so) more space..

see: http://postgresql.bteg.net/docs/faq-english.html#4.7

> 2. Why would it be faster than grep?  This has to match structured data,
in this case varchar, and not just bytes.  It has to worry about
transactions and logs, not >just a stream of data.  Besides, in my tests it
is not *that* slow (3 sec, compared with 1/2).  Dunno what's up with your
system.

Sure, I'd expect grep to find a string in a semi-large text file faster than
PostgreSQL -- there is a hell of a lot less overhead with grep! :-)

> 3. As you said: With an index it rocks, easily beating grep.  Use an
index - it's your friend :-)

Yep yep!

-Mitch