Home > mailing lists

Performance problem with large resultsets (byte array 2200) - Mailing list pgsql-general

From	Victor Adolfsson
Subject	Performance problem with large resultsets (byte array 2200)
Date	July 31, 2007 10:21:55
Msg-id	9a6439440707310256j1474c172g4f0ba96fbb4c30c2@mail.gmail.com Whole thread Raw
Responses	Re: Performance problem with large resultsets (byte array 2200)
List	pgsql-general

Tree view

Hi

I'm having a problem with bad performance when retrieving many rows where each row has a 2200 long byte array (called template_compressed) and a 50 character varying (called uniqueid)

Is there a better datatype than bytea when it is important to fetch the information from the database?

Would it be better to store a reference in the database and then store the 2200 big template in a separate file?

What is the suggested size threshold for when it is better to store in file than in database?

What parts need to be configured and how should they be configured given the environment described at the bottom of this email?
(short version: DualCore P4 2.8 GHz, 1 GB RAM, 2 x 80 GB SATA RAID1)

What is the expected performance when an application on the same server is querying the database and doing data retrieval with this large recordset?

Are the use of cursors recommended when dealing with large recordsets? (currently cursors are not used)? Would the performance differ and if so, in what direction?

How can i decrease the difference in duration between the explain analyze time and the duration when actually retrieving the data from postgresql to the application?

When I do an explain analyze on the same query I get Total runtime: 3306.216 ms. (explain analyze output is available below)

This is the output from the postgresql-log file after enabling logging of slow queries (1second).
LOG: duration: 202927.174 ms statement: SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC

I have created an index on all columns which may be part of the where clause.

explain analyze SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC;

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=36.335..3265.844 rows=63902 loops=1)
   -> Index Scan Backward using idx_features_datecreated on features f (cost=0.00..3380.64 rows=79249 width=44) (actual time=1.090..1488.601 rows=79264 loops=1)
         Index Cond: (datecreated > '1980-01-01'::date)
   -> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1 width=19) (actual time=0.019..0.020 rows=1 loops=79264)
         Index Cond: ( p.id = f.person_id)
         Filter: ((website_id = 11) AND ((uniqueid)::text <> 'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND (birthdate > (now() - '53 years'::interval)))
Total runtime: 3306.216 ms
(7 rows)

explain analyze yet another time
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=0.355..1123.840 rows=63902 loops=1)
   -> Index Scan Backward using idx_features_datecreated on features f (cost=0.00..3380.64 rows=79249 width=44) (actual time= 0.072..97.846 rows=79264 loops=1)
         Index Cond: (datecreated > '1980-01-01'::date)
   -> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1 width=19) (actual time=0.009..0.010 rows=1 loops=79264)
         Index Cond: (p.id = f.person_id)
         Filter: ((website_id = 11) AND ((uniqueid)::text <> 'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND (birthdate > (now() - '53 years'::interval)))
Total runtime: 1163.758 ms
(7 rows)

\d person;
                                     Table "public.person"
    Column    |           Type           |                      Modifiers
--------------+--------------------------+-----------------------------------------------------
id           | integer                  | not null default nextval('person_id_seq'::regclass)
uniqueid     | character varying(50)    |
website_id   | integer                  |
datecreated | timestamp with time zone | default now()
gender       | character varying(1)     | default 'U'::character varying
birthdate    | date                     |
category_id | integer                  |
city         | character varying(100)   |
zipcode      | character varying(20)    |
state        | character varying(100)   |
country_iso2 | character varying(2)     |
Indexes:
    "person_pkey" PRIMARY KEY, btree (id)
    "idx_person_birthdate" btree (birthdate)
    "idx_person_category_id" btree (category_id)
    "idx_person_city" btree (city)
    "idx_person_country_iso2" btree (country_iso2)
    "idx_person_datecreated" btree (datecreated)
    "idx_person_gender" btree (gender)
    "idx_person_state" btree (state)
    "idx_person_uniqueid" btree (uniqueid)
    "idx_person_website_id" btree (website_id)
    "idx_person_zipcode" btree (zipcode)
Foreign-key constraints:
    "person_website_id_fkey" FOREIGN KEY (website_id) REFERENCES website(id)

obl_db2=# \d features;
                                        Table "public.features"
       Column        |           Type           |                       Modifiers
---------------------+--------------------------+-------------------------------------------------------
id                  | integer                  | not null default nextval('features_id_seq'::regclass)
datecreated         | timestamp with time zone |
template            | bytea                    |
person_id           | integer                  |
templateversion_id | integer                  |
template_compressed | bytea                    |
Indexes:
    "features_pkey" PRIMARY KEY, btree (id)
    "idx_features_datecreated" btree (datecreated)
    "idx_features_person_id" btree (person_id)
Foreign-key constraints:
    "features_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(id) ON DELETE CASCADE
    "features_templateversion_id_fkey" FOREIGN KEY (templateversion_id) REFERENCES templateversion(id)

[victor@optimum02 ~]$ free
             total       used       free     shared    buffers     cached
Mem:       1034772     973988      60784          0     111080     361464
-/+ buffers/cache:     501444     533328
Swap:      2031608         88    2031520

uname -a
Linux optimum02 2.6.18-1.2257.fc5smp #1 SMP Fri Dec 15 16:33:51 EST 2006 i686 i686 i386 GNU/Linux

[victor@optimum02 ~]$ psql --version
psql (PostgreSQL) 8.2.0
contains support for command-line editing

select version();
                                                version
-------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.0 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)

cat /proc/sys/kernel/shmall
2097152
cat /proc/sys/kernel/shmmni
4096
cat /proc/sys/kernel/shmmax
134217728

From the postgresql.conf file, the following settings have been changed to these values
shared_buffers = 96MB
effective_cache_size = 512MB

--
Victor Adolfsson
Chief Technology Officer

+46 733 404 623
Optimum Biometric Labs AB
Campus Gräsvik 5
37175 Karlskrona
Sweden

pgsql-general by date:

From: Nis Jørgensen
Date: 31 July 2007, 07:57:28
Subject: Re: alter table table add column

From: Viatcheslav Kalinin
Date: 31 July 2007, 11:35:26
Subject: Polymorphic functions' weird behavior

Performance problem with large resultsets (byte array 2200) - Mailing list pgsql-general

Previous

Next