Re: A speed comparison with sqlite - Mailing list pgsql-advocacy
From | drh@hwaci.com |
---|---|
Subject | Re: A speed comparison with sqlite |
Date | |
Msg-id | 2454484.483031911258676100307556033850405637744713689@hwaci.com Whole thread Raw |
In response to | A speed comparison with sqlite (Shane Ambler <pgsql@Sheeky.Biz>) |
List | pgsql-advocacy |
> A guy using RealBasic (GUI development IDE) which uses sqlite as it's > inbuilt db engine wanted to generate a series of unique codes and > decided sql will help (wanting 30 million codes starts to rule out ram > based solutions) > > This is a 7 character alphanumeric code. > > His program generated 30M codes in about 15 minutes into the sqlite db > file without any duplicate checks. > > Select distinct(pincode) from codes; returned the results (that is to > his client not out to file) after 22 hours and he was after a faster > solution. > > Using a unique index to check as he went, inserts dropped from 18,000 > per second to about 200 a second after 8 hours (without completing) > > (the following times are taken from psql's timing option) > > With a P4 3Ghz - 80GB IDE drive - running XP pro - pg 8.2.6 - > ... > The same hardware running FreeBSD 7.0RC1 GENERIC Kernel - > > create the table and insert 30M codes in 5.6 mins > > select count(distinct(pincode)) from codes; took 5.9 minutes > (29,993,279 unique codes of the 30M created) > > setting output to a text file - select distinct(pincode) from codes took > 9.7 minutes to send them all out to file. > > It would appear that the method of locating distinct results is the key > here - postgresql sorts the distinct column allowing it to identify a > duplicate or new value quicker - sqlite returns the distinct columns in > the order they are inserted - so it must be scanning all previous > entries in the output to ensure distinct. > Come on, guys. SQLite does have performance issues with sorting large tables. It is something we are working on. But it isn't *that* bad. I reran the test with no special tricks and SQLite took 15 minutes. Still 3x slower than PostgreSQL, I admit. But it is within a factor of 3.... Here is the output I get when I create 2^25 random 7-character strings, then did a CREATE TABLE t2 AS SELECT DISTINCT * FROM t1: 2008-01-18 23:29:22: i=16 2008-01-18 23:29:22: i=32 2008-01-18 23:29:22: i=64 2008-01-18 23:29:22: i=128 2008-01-18 23:29:22: i=256 2008-01-18 23:29:22: i=512 2008-01-18 23:29:22: i=1024 2008-01-18 23:29:22: i=2048 2008-01-18 23:29:22: i=4096 2008-01-18 23:29:22: i=8192 2008-01-18 23:29:22: i=16384 2008-01-18 23:29:22: i=32768 2008-01-18 23:29:23: i=65536 2008-01-18 23:29:23: i=131072 2008-01-18 23:29:24: i=262144 2008-01-18 23:29:25: i=524288 2008-01-18 23:29:27: i=1048576 2008-01-18 23:29:31: i=2097152 2008-01-18 23:29:40: i=4194304 2008-01-18 23:29:59: i=8388608 2008-01-18 23:30:37: i=16777216 2008-01-18 23:31:54: start the distinct... 2008-01-18 23:46:46: finish the distinct... So you can see, the SELECT distinct took about 15 minutes. The TCL script used to run this test is: sqlite db test.db db eval { CREATE TABLE t1(pincode TEXT); INSERT INTO t1 VALUES(randstr(7,7)); } proc msg {txt} { set now [db one {SELECT datetime('now')}] puts "$now: $txt" flush stdout } for {set i 1} {$i<30000000} {incr i $i} { msg "i=$i" flush stdout db eval { INSERT INTO t1 SELECT randstr(7,7) FROM t1; } } msg "start the distinct..." db eval { CREATE TABLE t2 AS SELECT distinct(pincode) FROM t1; } msg "finish the distinct..." -- D. Richard Hipp <drh@hwaci.com>
pgsql-advocacy by date: