Thread: Performance problem with large resultsets (byte array 2200)
Hi
I'm having a problem with bad performance when retrieving many rows where each row has a 2200 long byte array (called template_compressed) and a 50 character varying (called uniqueid)
Is there a better datatype than bytea when it is important to fetch the information from the database?
Would it be better to store a reference in the database and then store the 2200 big template in a separate file?
What is the suggested size threshold for when it is better to store in file than in database?
What parts need to be configured and how should they be configured given the environment described at the bottom of this email?
(short version: DualCore P4 2.8 GHz, 1 GB RAM, 2 x 80 GB SATA RAID1)
What is the expected performance when an application on the same server is querying the database and doing data retrieval with this large recordset?
Are the use of cursors recommended when dealing with large recordsets? (currently cursors are not used)? Would the performance differ and if so, in what direction?
How can i decrease the difference in duration between the explain analyze time and the duration when actually retrieving the data from postgresql to the application?
When I do an explain analyze on the same query I get Total runtime: 3306.216 ms. (explain analyze output is available below)
This is the output from the postgresql-log file after enabling logging of slow queries (1second).
LOG: duration: 202927.174 ms statement: SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC
I have created an index on all columns which may be part of the where clause.
explain analyze SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC;
--
Victor Adolfsson
Chief Technology Officer
+46 733 404 623
Optimum Biometric Labs AB
Campus Gräsvik 5
37175 Karlskrona
Sweden
I'm having a problem with bad performance when retrieving many rows where each row has a 2200 long byte array (called template_compressed) and a 50 character varying (called uniqueid)
Is there a better datatype than bytea when it is important to fetch the information from the database?
Would it be better to store a reference in the database and then store the 2200 big template in a separate file?
What is the suggested size threshold for when it is better to store in file than in database?
What parts need to be configured and how should they be configured given the environment described at the bottom of this email?
(short version: DualCore P4 2.8 GHz, 1 GB RAM, 2 x 80 GB SATA RAID1)
What is the expected performance when an application on the same server is querying the database and doing data retrieval with this large recordset?
Are the use of cursors recommended when dealing with large recordsets? (currently cursors are not used)? Would the performance differ and if so, in what direction?
How can i decrease the difference in duration between the explain analyze time and the duration when actually retrieving the data from postgresql to the application?
When I do an explain analyze on the same query I get Total runtime: 3306.216 ms. (explain analyze output is available below)
This is the output from the postgresql-log file after enabling logging of slow queries (1second).
LOG: duration: 202927.174 ms statement: SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC
I have created an index on all columns which may be part of the where clause.
explain analyze SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=36.335..3265.844 rows=63902 loops=1)
-> Index Scan Backward using idx_features_datecreated on features f (cost=0.00..3380.64 rows=79249 width=44) (actual time=1.090..1488.601 rows=79264 loops=1)
Index Cond: (datecreated > '1980-01-01'::date)
-> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=79264)
Index Cond: ( p.id = f.person_id)
Filter: ((website_id = 11) AND ((uniqueid)::text <> 'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND (birthdate > (now() - '53 years'::interval)))
Total runtime: 3306.216 ms
(7 rows)
explain analyze yet another time
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=0.355..1123.840 rows=63902 loops=1)
-> Index Scan Backward using idx_features_datecreated on features f (cost=0.00..3380.64 rows=79249 width=44) (actual time= 0.072..97.846 rows=79264 loops=1)
Index Cond: (datecreated > '1980-01-01'::date)
-> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1 width=19) (actual time=0.009..0.010 rows=1 loops=79264)
Index Cond: (p.id = f.person_id)
Filter: ((website_id = 11) AND ((uniqueid)::text <> 'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND (birthdate > (now() - '53 years'::interval)))
Total runtime: 1163.758 ms
(7 rows)
\d person;
Table "public.person"
Column | Type | Modifiers
--------------+--------------------------+-----------------------------------------------------
id | integer | not null default nextval('person_id_seq'::regclass)
uniqueid | character varying(50) |
website_id | integer |
datecreated | timestamp with time zone | default now()
gender | character varying(1) | default 'U'::character varying
birthdate | date |
category_id | integer |
city | character varying(100) |
zipcode | character varying(20) |
state | character varying(100) |
country_iso2 | character varying(2) |
Indexes:
"person_pkey" PRIMARY KEY, btree (id)
"idx_person_birthdate" btree (birthdate)
"idx_person_category_id" btree (category_id)
"idx_person_city" btree (city)
"idx_person_country_iso2" btree (country_iso2)
"idx_person_datecreated" btree (datecreated)
"idx_person_gender" btree (gender)
"idx_person_state" btree (state)
"idx_person_uniqueid" btree (uniqueid)
"idx_person_website_id" btree (website_id)
"idx_person_zipcode" btree (zipcode)
Foreign-key constraints:
"person_website_id_fkey" FOREIGN KEY (website_id) REFERENCES website(id)
obl_db2=# \d features;
Table "public.features"
Column | Type | Modifiers
---------------------+--------------------------+-------------------------------------------------------
id | integer | not null default nextval('features_id_seq'::regclass)
datecreated | timestamp with time zone |
template | bytea |
person_id | integer |
templateversion_id | integer |
template_compressed | bytea |
Indexes:
"features_pkey" PRIMARY KEY, btree (id)
"idx_features_datecreated" btree (datecreated)
"idx_features_person_id" btree (person_id)
Foreign-key constraints:
"features_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(id) ON DELETE CASCADE
"features_templateversion_id_fkey" FOREIGN KEY (templateversion_id) REFERENCES templateversion(id)
[victor@optimum02 ~]$ free
total used free shared buffers cached
Mem: 1034772 973988 60784 0 111080 361464
-/+ buffers/cache: 501444 533328
Swap: 2031608 88 2031520
uname -a
Linux optimum02 2.6.18-1.2257.fc5smp #1 SMP Fri Dec 15 16:33:51 EST 2006 i686 i686 i386 GNU/Linux
[victor@optimum02 ~]$ psql --version
psql (PostgreSQL) 8.2.0
contains support for command-line editing
select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.0 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
cat /proc/sys/kernel/shmall
2097152
cat /proc/sys/kernel/shmmni
4096
cat /proc/sys/kernel/shmmax
134217728
From the postgresql.conf file, the following settings have been changed to these values
shared_buffers = 96MB
effective_cache_size = 512MB
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=36.335..3265.844 rows=63902 loops=1)
-> Index Scan Backward using idx_features_datecreated on features f (cost=0.00..3380.64 rows=79249 width=44) (actual time=1.090..1488.601 rows=79264 loops=1)
Index Cond: (datecreated > '1980-01-01'::date)
-> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=79264)
Index Cond: ( p.id = f.person_id)
Filter: ((website_id = 11) AND ((uniqueid)::text <> 'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND (birthdate > (now() - '53 years'::interval)))
Total runtime: 3306.216 ms
(7 rows)
explain analyze yet another time
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=0.355..1123.840 rows=63902 loops=1)
-> Index Scan Backward using idx_features_datecreated on features f (cost=0.00..3380.64 rows=79249 width=44) (actual time= 0.072..97.846 rows=79264 loops=1)
Index Cond: (datecreated > '1980-01-01'::date)
-> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1 width=19) (actual time=0.009..0.010 rows=1 loops=79264)
Index Cond: (p.id = f.person_id)
Filter: ((website_id = 11) AND ((uniqueid)::text <> 'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND (birthdate > (now() - '53 years'::interval)))
Total runtime: 1163.758 ms
(7 rows)
\d person;
Table "public.person"
Column | Type | Modifiers
--------------+--------------------------+-----------------------------------------------------
id | integer | not null default nextval('person_id_seq'::regclass)
uniqueid | character varying(50) |
website_id | integer |
datecreated | timestamp with time zone | default now()
gender | character varying(1) | default 'U'::character varying
birthdate | date |
category_id | integer |
city | character varying(100) |
zipcode | character varying(20) |
state | character varying(100) |
country_iso2 | character varying(2) |
Indexes:
"person_pkey" PRIMARY KEY, btree (id)
"idx_person_birthdate" btree (birthdate)
"idx_person_category_id" btree (category_id)
"idx_person_city" btree (city)
"idx_person_country_iso2" btree (country_iso2)
"idx_person_datecreated" btree (datecreated)
"idx_person_gender" btree (gender)
"idx_person_state" btree (state)
"idx_person_uniqueid" btree (uniqueid)
"idx_person_website_id" btree (website_id)
"idx_person_zipcode" btree (zipcode)
Foreign-key constraints:
"person_website_id_fkey" FOREIGN KEY (website_id) REFERENCES website(id)
obl_db2=# \d features;
Table "public.features"
Column | Type | Modifiers
---------------------+--------------------------+-------------------------------------------------------
id | integer | not null default nextval('features_id_seq'::regclass)
datecreated | timestamp with time zone |
template | bytea |
person_id | integer |
templateversion_id | integer |
template_compressed | bytea |
Indexes:
"features_pkey" PRIMARY KEY, btree (id)
"idx_features_datecreated" btree (datecreated)
"idx_features_person_id" btree (person_id)
Foreign-key constraints:
"features_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(id) ON DELETE CASCADE
"features_templateversion_id_fkey" FOREIGN KEY (templateversion_id) REFERENCES templateversion(id)
[victor@optimum02 ~]$ free
total used free shared buffers cached
Mem: 1034772 973988 60784 0 111080 361464
-/+ buffers/cache: 501444 533328
Swap: 2031608 88 2031520
uname -a
Linux optimum02 2.6.18-1.2257.fc5smp #1 SMP Fri Dec 15 16:33:51 EST 2006 i686 i686 i386 GNU/Linux
[victor@optimum02 ~]$ psql --version
psql (PostgreSQL) 8.2.0
contains support for command-line editing
select version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.0 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
cat /proc/sys/kernel/shmall
2097152
cat /proc/sys/kernel/shmmni
4096
cat /proc/sys/kernel/shmmax
134217728
From the postgresql.conf file, the following settings have been changed to these values
shared_buffers = 96MB
effective_cache_size = 512MB
--
Victor Adolfsson
Chief Technology Officer
+46 733 404 623
Optimum Biometric Labs AB
Campus Gräsvik 5
37175 Karlskrona
Sweden
"Victor Adolfsson" <victor@optimumbiometrics.com> writes: > I'm having a problem with bad performance when retrieving many rows where > each row has a 2200 long byte array (called template_compressed) and a 50 > character varying (called uniqueid) 2200 bytes is (just) long enough to trigger toasting of the entries, and I suppose it's the extra time to fetch a toasted value that's hurting. As long as the total row width is under 8K, you could disable toasting by ALTER TABLE ... SET STORAGE PLAIN on all the variable-width columns of the table. Note that this wouldn't in itself de-toast rows already stored; so you'd have to dump/truncate/reload the table before seeing any performance benefit. regards, tom lane