Thread: Performance trouble finding records through related records

Performance trouble finding records through related records

From

sverhagen

Date:

01 March 2011, 20:30:24

Hi, appreciated mailing list. Thanks already for taking your time for my
performance question. Regards, Sander.


===POSTGRESQL VERSION AND ORIGIN===

PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4
(Ubuntu 4.2.4-1ubuntu3)
Installed using "apt-get install postgresql-8.3"


===A DESCRIPTION OF WHAT YOU ARE TRYING TO ACHIEVE===

Query involving tables events_events and events_eventdetails. There is any
number of events_eventdetails records for each events_events record.

There may be multiple records in events_events that have the same value for
their transactionId, which is available in one of their events_eventdetails
records.

We want a total query that returns events_events records that match
condition I. or II., sorted by datetime descending, first 50.

Condition I.
All events_events records for which an events_eventdetails records that
matches the following conditions:
- Column keyname (in events_eventdetails) equals "customerId", and
- Column value (in events_eventdetails) equals 598124, or more precisely
substring(customerDetails.value,0,32)='598124'

Condition II.
All events_events records that have a same value for in one of their
events_eventdetails records with keyname 'transactionId' as any of the
resulting events_events records of condition I.

In other words: I want all events for a certain customerId, and all events
with the same transactionId as those.

The total query's time should be of the magnitude 100ms, but currently is of
the magnitude 1min.

JUST FOR THE PURPOSE OF EXPERIMENT I've now a denormalized copy of
transactionId as a column in the events_events records. Been trying queries
on those, with no improvements.

I am not seeking WHY my query is too slow, rather trying to find a way to
get it faster :-)


===THE EXACT TEXT OF THE QUERY YOU RAN===

The total query:

SELECT events1.id, events1.transactionId, events1.dateTime FROM
events_events events1
JOIN events_eventdetails customerDetails
ON events1.id = customerDetails.event_id
AND customerDetails.keyname='customer_id'
AND substring(customerDetails.value,0,32)='598124'
WHERE events1.eventtype_id IN

(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
UNION
SELECT events2.id, events2.transactionId, events2.dateTime FROM
events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
    ON substring(details2_transKey.value,0,32) =
substring(details2_transValue.value,0,32)
    AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
    ON details2_transValue.event_id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN

(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
ORDER BY dateTime DESC LIMIT 50


===THE EXACT OUTPUT OF THAT QUERY===

The exactly correct and desired output is as follows:

    id    |            transactionid            |          datetime
----------+-------------------------------------+----------------------------
 16336643 |                                     | 2011-03-01 11:10:38.648+01
 16336642 |                                     | 2011-03-01 11:10:35.629+01
 16336641 |                                     | 2011-03-01 11:10:35.625+01
 16336637 |                                     | 2011-03-01 11:09:53.306+01
 16336634 |                                     | 2011-03-01 11:09:14.027+01
 16336633 | 26eaeb24-7a93-4c9a-99f9-bd3b77f9636 | 2011-03-01 11:09:14.004+01
 16336632 | 26eaeb24-7a93-4c9a-99f9-bd3b77f9636 | 2011-03-01 11:09:13.925+01
 16336631 |                                     | 2011-03-01 11:09:13.873+01
 16336630 |                                     | 2011-03-01 11:09:13.741+01
 16336626 |                                     | 2011-03-01 11:09:08.931+01
 16336625 |                                     | 2011-03-01 11:09:01.811+01
 16336624 | 2037f235-89d2-402a-90eb-3bcf40d633c | 2011-03-01 11:09:01.771+01
 16336623 | 2037f235-89d2-402a-90eb-3bcf40d633c | 2011-03-01 11:09:01.729+01
 16336611 |                                     | 2011-03-01 11:08:08.63+01
 16336610 |                                     | 2011-03-01 11:08:02.805+01
 16336609 |                                     | 2011-03-01 11:08:02.801+01
 16336606 |                                     | 2011-03-01 11:07:55.324+01
 16336602 |                                     | 2011-03-01 11:07:38.63+01
 16336600 |                                     | 2011-03-01 11:07:34.561+01
 16336599 |                                     | 2011-03-01 11:07:34.547+01
 16336595 |                                     | 2011-03-01 11:07:24.471+01
 16336594 | e3235ae5-9f40-4ceb-94bf-61ed99b793a | 2011-03-01 11:07:24.445+01
 16336593 | e3235ae5-9f40-4ceb-94bf-61ed99b793a | 2011-03-01 11:07:24.373+01
 16336591 |                                     | 2011-03-01 11:07:24.268+01
 16336590 |                                     | 2011-03-01 11:07:24.065+01
 16336583 |                                     | 2011-03-01 11:06:43.63+01
 16336582 |                                     | 2011-03-01 11:06:36.977+01
 16336581 |                                     | 2011-03-01 11:06:36.973+01
 16336575 |                                     | 2011-03-01 11:06:18.637+01
 16336573 |                                     | 2011-03-01 11:06:16.728+01
 16336572 |                                     | 2011-03-01 11:06:16.723+01
 16336569 |                                     | 2011-03-01 11:06:06.662+01
 16336568 | 3519e6de-bc8f-4641-a686-c088e459b47 | 2011-03-01 11:06:06.639+01
 16336567 | 3519e6de-bc8f-4641-a686-c088e459b47 | 2011-03-01 11:06:06.569+01
 16336566 |                                     | 2011-03-01 11:06:06.526+01
 16336565 |                                     | 2011-03-01 11:06:06.359+01
 16336561 |                                     | 2011-03-01 11:05:58.868+01
 16336560 |                                     | 2011-03-01 11:05:50.80+01
 16336559 | fbd5c7b2-6035-45cf-9222-a3d9f77d9ae | 2011-03-01 11:05:50.767+01
 16336558 | fbd5c7b2-6035-45cf-9222-a3d9f77d9ae | 2011-03-01 11:05:50.724+01
 16336550 |                                     | 2011-03-01 11:05:33.631+01
 16336549 |                                     | 2011-03-01 11:05:28.313+01
 16336548 |                                     | 2011-03-01 11:05:28.309+01
 16336545 |                                     | 2011-03-01 11:05:20.86+01
 16336541 |                                     | 2011-03-01 11:03:23.626+01
 16336539 |                                     | 2011-03-01 11:03:18.623+01
 16336538 |                                     | 2011-03-01 11:03:18.613+01
 16336535 |                                     | 2011-03-01 11:03:08.553+01
 16336534 | db9dcfb9-870d-42f6-947f-9a22fb49592 | 2011-03-01 11:03:08.531+01
 16336533 | db9dcfb9-870d-42f6-947f-9a22fb49592 | 2011-03-01 11:03:08.457+01
(50 rows)
Time: 51442.732 ms

Only problem is the time it took!


===WHAT PROGRAM YOU'RE USING TO CONNECT===

For this debugging purposes I'm using a mix of psql (on the target system)
and pgAdmin (from my Windows machine). The final application involves a Java
web application running in Tomcat 5.5.


===CHANGES MADE TO THE SETTINGS IN THE POSTGRESQL.CONF FILE===

    # This specific configuration is made on basis of postgresql.conf of
PostgreSQL 8.3
    #
    # Memory usage:             typical: maximum:
    # - Application server         1,2GB    1,5GB
    # - Shared buffers             1,0GB    1,0GB
    # - Work memory                ?        2,2GB
    #   ((>>connections) * work_mem)
    #   (only counting relevant application server connections*)
    # - Query cache                ?        2,0GB
    # -------------------------------------------
    #   Total                               6,7GB
    #

    # tailor maximum connections to what our application server requires
application
    # server to use 30 + 8 at most few left over for i.e. psql thingies
    max_connections = 40

    # 1/2 up to 3/4 of total memory, used by all connections and such
    shared_buffers = 1024MB

    temp_buffers = 64MB

    max_prepared_transactions = 25

    # sorting large sets of queries (such as events, event details)
    # might require some memory to be able to be done in RAM
    # the total work memory is max_connections * work_mem
    # dynamically addressed
    work_mem = 75MB

    # been told that no one should want to use this anymore
    # there are no known disadvantages of not writing full pages
    full_page_writes = off

    wal_buffers = 8MB

    # larger numbers of segments to write logs into makes each of them easier,
    # thus checkpointing becoming less of a resource hog
    checkpoint_segments = 32

    # longer time to do the same amount of work,
    # thus checkpointing becoming less of a resource hog
    checkpoint_timeout = 10min

    # longer time to do the same amount of work,
    # thus checkpointing becoming less of a resource hog
    checkpoint_completion_target = 0.8

    # 1/2 up to 3/4 of total memory
    # used to keep query results for reuse for a certain amount of time
    # dynamically addressed
    effective_cache_size = 2048MB

    # the higher the value, the better the query planner optimizes for
performance
    default_statistics_target = 100

    # constraint_exclusion not retained over dump/restore otherwise
    constraint_exclusion = on

    log_destination = 'syslog'

    # log queries if they take longer than this value (in ms)
    log_min_duration_statement = 300

    # auto-vacuum less often, thus less of a performance hog
    autovacuum_vacuum_threshold = 500

    # analyze less often, thus less of a performance hog
    autovacuum_analyze_threshold = 500


===OPERATING SYSTEM AND VERSION===

Ubuntu 8.04.4 LTS \n \l
Linux 2.6.24-23-server
Software RAID I (MDADM)


===FOR PERFORMANCE QUESTIONS: WHAT KIND OF HARDWARE===

INTEL CR2 DUO DT 1.8G 800F 2M 775P TY(G)
SEAGATE 80G 3.5" SATA 7KRPM 8M(G)
8G DDR2-800 RAM


===FULL TABLE AND INDEX SCHEMA===

    CREATE TABLE events_events
    (
      id bigserial NOT NULL,
      carparkid bigint,
      cleared boolean NOT NULL,
      datetime timestamp with time zone,
      identity character varying(255),
      generatedbystationid bigint,
      eventtype_id bigint NOT NULL,
      relatedstationid bigint,
      processingstatus character varying(255) NOT NULL,
      transactionid character varying(36),
      CONSTRAINT events_events_pkey PRIMARY KEY (id),
      CONSTRAINT fk88fe3effa0559276 FOREIGN KEY (eventtype_id)
          REFERENCES events_event_types (id) MATCH SIMPLE
          ON UPDATE NO ACTION ON DELETE NO ACTION
    )
    WITH (OIDS=FALSE);
    ALTER TABLE events_events OWNER TO postgres;

    CREATE INDEX events_events_cleared_eventtype_id_datetime_ind ON
events_events USING btree (cleared, eventtype_id, datetime);
    CREATE INDEX events_events_cleared_ind ON events_events USING btree
(cleared);
    CREATE INDEX events_events_datetime_cleared_ind ON events_events USING
btree (datetime, cleared) WHERE NOT cleared;
    CREATE INDEX events_events_datetime_eventtype_id_ind ON events_events USING
btree (datetime, eventtype_id);
    CREATE INDEX events_events_datetime_ind ON events_events USING btree
(datetime);
    CREATE INDEX events_events_eventtype_id_datetime_ind ON events_events USING
btree (eventtype_id, datetime);
    CREATE INDEX events_events_eventtype_id_ind ON events_events USING btree
(eventtype_id);
    CREATE INDEX events_events_identity_ind ON events_events USING btree
(identity);
    CREATE INDEX events_events_not_cleared_ind ON events_events USING btree
(cleared) WHERE NOT cleared;
    CREATE INDEX events_events_processingstatus_new ON events_events USING
btree (processingstatus) WHERE processingstatus::text = 'NEW'::text;
    CREATE INDEX events_events_relatedstation_eventtype_datetime_desc_ind ON
events_events USING btree (relatedstationid, eventtype_id, datetime);


    CREATE TABLE events_eventdetails
    (
      id bigserial NOT NULL,
      keyname character varying(255) NOT NULL,
      "value" text NOT NULL,
      event_id bigint NOT NULL,
      listindex integer,
      CONSTRAINT events_eventdetails_pkey PRIMARY KEY (id),
      CONSTRAINT events_eventdetails_event_id_fk FOREIGN KEY (event_id)
          REFERENCES events_events (id) MATCH SIMPLE
          ON UPDATE NO ACTION ON DELETE CASCADE,
      CONSTRAINT events_eventdetails_event_id_key UNIQUE (event_id, keyname,
listindex)
    )
    WITH (OIDS=FALSE);
    ALTER TABLE events_eventdetails OWNER TO postgres;

    CREATE INDEX events_eventdetails_event_id_ind ON events_eventdetails USING
btree (event_id);
    CREATE INDEX events_eventdetails_keyname_ind ON events_eventdetails USING
btree (keyname);
    CREATE INDEX events_eventdetails_substring_ind ON events_eventdetails USING
btree (keyname, "substring"(value, 0, 32));
    CREATE INDEX events_eventdetails_value_ind ON events_eventdetails USING
btree ("substring"(value, 0, 32));

    Many partitions, approx. 50. E.g.:
    CREATE OR REPLACE RULE events_eventdetails_insert_customer_id AS ON INSERT
TO events_eventdetails WHERE new.keyname::text = 'customer_id'::text DO
INSTEAD INSERT INTO events_eventdetails_customer_id (id, keyname, value,
event_id, listindex) VALUES (new.id, new.keyname, new.value, new.event_id,
new.listindex);


    CREATE TABLE events_eventdetails_customer_id
    (
    -- Inherited:   id bigint NOT NULL DEFAULT
nextval('events_eventdetails_id_seq'::regclass),
    -- Inherited:   keyname character varying(255) NOT NULL,
    -- Inherited:   "value" text NOT NULL,
    -- Inherited:   event_id bigint NOT NULL,
    -- Inherited:   listindex integer,
      CONSTRAINT events_eventdetails_customer_id_pkey PRIMARY KEY (id),
      CONSTRAINT events_eventdetails_customer_id_event_id_fk FOREIGN KEY
(event_id)
          REFERENCES events_events (id) MATCH SIMPLE
          ON UPDATE NO ACTION ON DELETE CASCADE,
      CONSTRAINT events_eventdetails_customer_id_keyname_check CHECK
(keyname::text = 'customer_id'::text)
    )
    INHERITS (events_eventdetails)
    WITH (OIDS=FALSE);
    ALTER TABLE events_eventdetails_customer_id OWNER TO postgres;

    CREATE INDEX events_eventdetails_customer_id_event_id_ind ON
events_eventdetails_customer_id USING btree (event_id);
    CREATE INDEX events_eventdetails_customer_id_substring_ind ON
events_eventdetails_customer_id USING btree (keyname, "substring"(value, 0,
32));


===TABLE METADATA===

    select count(*) from events_events; --> 3910163
    select count(*) from events_eventdetails; --> 30216033
    select count(*) from events_eventdetails_customer_id; (single partition)
--> 2976101


===EXPLAIN ANALYZE===

Limit  (cost=36962467348.39..36962467348.51 rows=50 width=52) (actual
time=58765.029..58765.078 rows=50 loops=1)
  ->  Sort  (cost=36962467348.39..37251140933.75 rows=115469434145 width=52)
(actual time=58765.023..58765.042 rows=50 loops=1)
        Sort Key: events1.datetime
        Sort Method:  top-N heapsort  Memory: 19kB
        ->  Unique  (cost=31971961433.07..33126655774.52 rows=115469434145
width=52) (actual time=58764.565..58764.844 rows=145 loops=1)
              ->  Sort  (cost=31971961433.07..32260635018.43
rows=115469434145 width=52) (actual time=58764.564..58764.652 rows=222
loops=1)
                    Sort Key: events1.id, events1.transactionid,
events1.datetime
                    Sort Method:  quicksort  Memory: 29kB
                    ->  Append  (cost=0.00..3256444445.93 rows=115469434145
width=52) (actual time=0.304..58763.738 rows=222 loops=1)
                          ->  Nested Loop  (cost=0.00..148161.10 rows=10345
width=52) (actual time=0.303..2.781 rows=145 loops=1)
                                ->  Append  (cost=0.00..21312.39 rows=15312
width=8) (actual time=0.236..0.738 rows=187 loops=1)
                                      ->  Index Scan using
events_eventdetails_substring_ind on events_eventdetails customerdetails
(cost=0.00..457.37 rows=113 width=8) (actual time=0.077..0.077 rows=0
loops=1)
                                            Index Cond: (((keyname)::text =
'customer_id'::text) AND ("substring"(value, 0, 32) = '598124'::text))
                                      ->  Index Scan using
events_eventdetails_customer_id_substring_ind on
events_eventdetails_customer_id customerdetails  (cost=0.00..20855.02
rows=15199 width=8) (actual time=0.158..0.530 rows=187 loops=1)
                                            Index Cond: (((keyname)::text =
'customer_id'::text) AND ("substring"(value, 0, 32) = '598124'::text))
                                ->  Index Scan using events_events_pkey on
events_events events1  (cost=0.00..8.27 rows=1 width=52) (actual
time=0.009..0.009 rows=1 loops=187)
                                      Index Cond: (events1.id =
customerdetails.event_id)
                                      Filter: (events1.eventtype_id = ANY

('{100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108}'::bigint[]))
                          ->  Merge Join  (cost=369560509.82..2101601943.38
rows=115469423800 width=52) (actual time=58760.353..58760.806 rows=77
loops=1)
                                Merge Cond: (customerdetails.event_id =
details2_transvalue.event_id)
                                ->  Sort  (cost=24111.00..24149.28
rows=15312 width=8) (actual time=0.644..0.710 rows=187 loops=1)
                                      Sort Key: customerdetails.event_id
                                      Sort Method:  quicksort  Memory: 24kB
                                      ->  Append  (cost=0.00..23046.64
rows=15312 width=8) (actual time=0.130..0.461 rows=187 loops=1)
                                            ->  Index Scan using
events_eventdetails_substring_ind on events_eventdetails customerdetails
(cost=0.00..457.37 rows=113 width=8) (actual time=0.021..0.021 rows=0
loops=1)
                                                  Index Cond:
(((keyname)::text = 'customer_id'::text) AND ("substring"(value, 0, 32) =
'598124'::text))
                                            ->  Index Scan using
events_eventdetails_customer_id_substring_ind on
events_eventdetails_customer_id customerdetails  (cost=0.00..22589.27
rows=15199 width=8) (actual time=0.107..0.319 rows=187 loops=1)
                                                  Index Cond:
(((keyname)::text = 'customer_id'::text) AND ("substring"(value, 0, 32) =
'598124'::text))
                                ->  Materialize
(cost=369536398.82..388389165.24 rows=1508221314 width=60) (actual
time=56515.482..58227.360 rows=986788 loops=1)
                                      ->  Sort
(cost=369536398.82..373306952.10 rows=1508221314 width=60) (actual
time=56515.478..57357.833 rows=986788 loops=1)
                                            Sort Key:
details2_transvalue.event_id
                                            Sort Method:  external merge
Disk: 69416kB
                                            ->  Merge Join
(cost=1181483.03..31350423.76 rows=1508221314 width=60) (actual
time=42137.760..51804.819 rows=986808 loops=1)
                                                  Merge Cond:
(("substring"(details2_transkey.value, 0, 32)) =
("substring"(details2_transvalue.value, 0, 32)))
                                                  ->  Sort
(cost=908652.98..909781.59 rows=451445 width=127) (actual
time=25898.797..27330.921 rows=658637 loops=1)
                                                        Sort Key:
("substring"(details2_transkey.value, 0, 32))
                                                        Sort Method:
external merge  Disk: 85584kB
                                                        ->  Hash Join
(cost=621670.67..866252.84 rows=451445 width=127) (actual
time=8238.959..15256.168 rows=658637 loops=1)
                                                              Hash Cond:
(details2_transkey.event_id = events2.id)
                                                              ->  Append
(cost=16092.38..208184.56 rows=668175 width=83) (actual
time=383.062..3180.853 rows=658638 loops=1)
                                                                    ->
Bitmap Heap Scan on events_eventdetails details2_transkey
(cost=16092.38..208184.56 rows=668175 width=83) (actual
time=383.060..2755.386 rows=658638 loops=1)

Recheck Cond: ((keyname)::text = 'transactionId'::text)
                                                                          ->
Bitmap Index Scan on events_eventdetails_keyname_ind  (cost=0.00..15925.33
rows=668175 width=0) (actual time=274.388..274.388 rows=658909 loops=1)

Index Cond: ((keyname)::text = 'transactionId'::text)
                                                              ->  Hash
(cost=548042.97..548042.97 rows=2641946 width=52) (actual
time=7855.242..7855.242 rows=3711961 loops=1)
                                                                    ->
Bitmap Heap Scan on events_events events2  (cost=75211.99..548042.97
rows=2641946 width=52) (actual time=1024.685..4581.588 rows=3711961 loops=1)

Recheck Cond: (eventtype_id = ANY

('{100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108}'::bigint[]))
                                                                          ->
Bitmap Index Scan on events_events_eventtype_id_ind  (cost=0.00..74551.50
rows=2641946 width=0) (actual time=983.354..983.354 rows=3712003 loops=1)

Index Cond: (eventtype_id = ANY

('{100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108}'::bigint[]))
                                                  ->  Sort
(cost=272830.05..274500.49 rows=668175 width=83) (actual
time=16238.940..16958.522 rows=986808 loops=1)
                                                        Sort Key:
("substring"(details2_transvalue.value, 0, 32))
                                                        Sort Method:
external sort  Disk: 61784kB
                                                        ->  Result
(cost=16092.38..208184.56 rows=668175 width=83) (actual
time=391.124..4336.367 rows=658638 loops=1)
                                                              ->  Append
(cost=16092.38..208184.56 rows=668175 width=83) (actual
time=391.104..3130.494 rows=658638 loops=1)
                                                                    ->
Bitmap Heap Scan on events_eventdetails details2_transvalue
(cost=16092.38..208184.56 rows=668175 width=83) (actual
time=391.103..2713.520 rows=658638 loops=1)

Recheck Cond: ((keyname)::text = 'transactionId'::text)
                                                                          ->
Bitmap Index Scan on events_eventdetails_keyname_ind  (cost=0.00..15925.33
rows=668175 width=0) (actual time=283.327..283.327 rows=658909 loops=1)

Index Cond: ((keyname)::text = 'transactionId'::text)
Total runtime: 58869.397 ms

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3405914.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Performance trouble finding records through related records

From

Andy Colson

Date:

01 March 2011, 23:41:33

On 03/01/2011 06:14 PM, sverhagen wrote:
> Hi, appreciated mailing list. Thanks already for taking your time for my
> performance question. Regards, Sander.
>
>
> ===POSTGRESQL VERSION AND ORIGIN===
>
> PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4
> (Ubuntu 4.2.4-1ubuntu3)
> Installed using "apt-get install postgresql-8.3"
>
>
> ===A DESCRIPTION OF WHAT YOU ARE TRYING TO ACHIEVE===
>
> Query involving tables events_events and events_eventdetails. There is any
> number of events_eventdetails records for each events_events record.
>
> There may be multiple records in events_events that have the same value for
> their transactionId, which is available in one of their events_eventdetails
> records.
>
> We want a total query that returns events_events records that match
> condition I. or II., sorted by datetime descending, first 50.
>
> Condition I.
> All events_events records for which an events_eventdetails records that
> matches the following conditions:
> - Column keyname (in events_eventdetails) equals "customerId", and
> - Column value (in events_eventdetails) equals 598124, or more precisely
> substring(customerDetails.value,0,32)='598124'
>
> Condition II.
> All events_events records that have a same value for in one of their
> events_eventdetails records with keyname 'transactionId' as any of the
> resulting events_events records of condition I.
>
> In other words: I want all events for a certain customerId, and all events
> with the same transactionId as those.
>
> The total query's time should be of the magnitude 100ms, but currently is of
> the magnitude 1min.
>
> JUST FOR THE PURPOSE OF EXPERIMENT I've now a denormalized copy of
> transactionId as a column in the events_events records. Been trying queries
> on those, with no improvements.
>
> I am not seeking WHY my query is too slow, rather trying to find a way to
> get it faster :-)
>

<much snippage>

First off, excellent detail.

Second, your explain analyze was hard to read... but since you are not really interested in your posted query, I wont
worryabout looking at it... but... have you seen: 

http://explain.depesz.com/

Its nice.

And last, to my questions:

SELECT events1.id, events1.transactionId, events1.dateTime FROM
events_events events1
JOIN events_eventdetails customerDetails
ON events1.id = customerDetails.event_id
AND customerDetails.keyname='customer_id'
AND substring(customerDetails.value,0,32)='598124'
WHERE events1.eventtype_id IN

(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
UNION
SELECT events2.id, events2.transactionId, events2.dateTime FROM
events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
    ON substring(details2_transKey.value,0,32) =
substring(details2_transValue.value,0,32)
    AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
    ON details2_transValue.event_id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN

(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
ORDER BY dateTime DESC LIMIT 50


If you run the individual queries, without the union, are the part's slow too?

Looked like your row counts (the estimate vs the actual) were way off, have you analyzed lately?

I could not tell from the explain analyze if an index was used, but I notice you have a ton of indexes on events_events
table. You have two indexes on the same fields but in reverse order: 

events_events_eventtype_id_datetime_ind (datetime, eventtype_id);
events_events_datetime_eventtype_id_ind (eventtype_id, datetime);

AND both eventtype_id and datetime are in other indexes!  I think you need to review your indexes.  Drop all of them
andadd one or two that are actually useful. 


A useful tool I have found for complex queries is to break them down into smaller sub sets, write sql that get's me
justthose sets, and them add them all back into one main query with subselects: 

select a,b,c,...
from events_events
where
  id in ( select id from details where some subset is needed )
and id not in ( select id frome details where some set is bad )
and id in ( select anotherid from anothertable  where ... )


Its the subselects you need to think about.  Find one that gets you a small set that's interesting somehow.  Once you
getall your little sets, its easy to combine them. 

-Andy

Re: Performance trouble finding records through related records

From

sverhagen

Date:

02 March 2011, 20:12:45

Thanks for your help already!
Hope you're up for some more :-)

Andy Colson wrote:
>
> First off, excellent detail.
>
> Second, your explain analyze was hard to read... but since you are not
> really interested in your posted query, I wont worry about looking at
> it... but... have you seen:
>
> http://explain.depesz.com/
>

Thanks for that. Using it below :-)

Andy Colson wrote:
>
> If you run the individual queries, without the union, are the part's slow
> too?
>

Only problem is the second part. So that part can safely be isolated. Also
the following does not play a role at this point: WHERE events2.eventtype_id
IN
(100,103,105,...

Then I went ahead and denormalized the transactionId on both ends, so that
both events_events records and events_eventdetails records have the
transactionId (or NULL). That simplifies the query to this:

    SELECT events_events.* FROM events_events WHERE transactionid IN (
        SELECT transactionid FROM events_eventdetails customerDetails
        WHERE customerDetails.keyname='customer_id'
        AND substring(customerDetails.value,0,32)='1957'
        AND transactionid IS NOT NULL
    ) ORDER BY id LIMIT 50;

To no avail. Also changing the above WHERE IN into implicit/explicit JOIN's
doesn't make more than a marginal difference. Should joining not be very
efficient somehow?

http://explain.depesz.com/s/Pnb

The above link nicely shows the hotspots, but I am at a loss now as how to
approach them.

Andy Colson wrote:
>
> Looked like your row counts (the estimate vs the actual) were way off,
> have you analyzed lately?
>

Note sure what that means.
Isn't all the maintenance nicely automated through my config?

Andy Colson wrote:
>
> I could not tell from the explain analyze if an index was used, but I
> notice you have a ton of indexes on events_events table.
>

Yes, a ton of indexes, but still not the right one :-)

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3407330.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Performance trouble finding records through related records

From

Andy Colson

Date:

02 March 2011, 23:04:54

On 03/02/2011 06:12 PM, sverhagen wrote:
> Thanks for your help already!
> Hope you're up for some more :-)
>
>
> Andy Colson wrote:
>>
>> First off, excellent detail.
>>
>> Second, your explain analyze was hard to read... but since you are not
>> really interested in your posted query, I wont worry about looking at
>> it... but... have you seen:
>>
>> http://explain.depesz.com/
>>
>
> Thanks for that. Using it below :-)
>
>
> Andy Colson wrote:
>>
>> If you run the individual queries, without the union, are the part's slow
>> too?
>>
>
> Only problem is the second part. So that part can safely be isolated. Also
> the following does not play a role at this point: WHERE events2.eventtype_id
> IN
> (100,103,105,...
>
> Then I went ahead and denormalized the transactionId on both ends, so that
> both events_events records and events_eventdetails records have the
> transactionId (or NULL). That simplifies the query to this:
>
>     SELECT events_events.* FROM events_events WHERE transactionid IN (
>         SELECT transactionid FROM events_eventdetails customerDetails
>         WHERE customerDetails.keyname='customer_id'
>         AND substring(customerDetails.value,0,32)='1957'
>         AND transactionid IS NOT NULL
>     ) ORDER BY id LIMIT 50;
>
> To no avail. Also changing the above WHERE IN into implicit/explicit JOIN's
> doesn't make more than a marginal difference. Should joining not be very
> efficient somehow?
>
> http://explain.depesz.com/s/Pnb
>
> The above link nicely shows the hotspots, but I am at a loss now as how to
> approach them.
>
>
> Andy Colson wrote:
>>
>> Looked like your row counts (the estimate vs the actual) were way off,
>> have you analyzed lately?
>>
>
> Note sure what that means.
> Isn't all the maintenance nicely automated through my config?
>
>

In the explain analyze you'll see stuff like:
Append  (cost=0.00..3256444445.93 rows=115469434145 width=52) (actual time=0.304..58763.738 rows=222 loops=1)

This is taken from your first email.  Red flags should go off when the row counts are not close.  The first set is the
planner'sguess.  The second set is what actually happened.  The planner thought there would be 115,469,434,145 rows..
butturned out to only be 222.  That's usually caused by bad stats. 

> Isn't all the maintenance nicely automated through my config?
>

I'd never assume.  But the numbers in the plan you posted:

> http://explain.depesz.com/s/Pnb

look fine to me (well, the row counts), and I didnt look to much at that plan in the first email, so we can probably
ignoreit. 


> Andy Colson wrote:
>>
>> I could not tell from the explain analyze if an index was used, but I
>> notice you have a ton of indexes on events_events table.
>>
>
> Yes, a ton of indexes, but still not the right one :-)

But... many indexes will slow down update/inserts.  And an index on an unselective field can cause more problems than
itwould help.  Especially if the stats are off.  If PG has lots and lots of options, it'll take longer to plan querys
too. If it picks an index to use, that it thinks is selective, but in reality is not, you are in for a world of hurt. 

For your query, I think a join would be the best bet, can we see its explain analyze?

-Andy

Re: Performance trouble finding records through related records

From

sverhagen

Date:

03 March 2011, 05:19:29

Andy Colson wrote:
>
> For your query, I think a join would be the best bet, can we see its
> explain analyze?
>


Here is a few variations:


SELECT events_events.* FROM events_events WHERE transactionid IN (
    SELECT transactionid FROM events_eventdetails customerDetails
    WHERE customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='1957'
    AND transactionid IS NOT NULL
) ORDER BY id LIMIT 50;

-- http://explain.depesz.com/s/Pnb


explain analyze SELECT events_events.* FROM events_events,
events_eventdetails customerDetails
    WHERE events_events.transactionid = customerDetails.transactionid
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='1957'
    AND customerDetails.transactionid IS NOT NULL
ORDER BY id LIMIT 50;

-- http://explain.depesz.com/s/rDh


explain analyze SELECT events_events.* FROM events_events
JOIN events_eventdetails customerDetails
    ON events_events.transactionid = customerDetails.transactionid
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='1957'
    AND customerDetails.transactionid IS NOT NULL
ORDER BY id LIMIT 50;

-- http://explain.depesz.com/s/6aB


Thanks for your efforts!

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3407689.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Performance trouble finding records through related records

From

Andy Colson

Date:

03 March 2011, 10:55:51

On 3/3/2011 3:19 AM, sverhagen wrote:
>
> Andy Colson wrote:
>>
>> For your query, I think a join would be the best bet, can we see its
>> explain analyze?
>>
>
>
> Here is a few variations:
>
>
> SELECT events_events.* FROM events_events WHERE transactionid IN (
>     SELECT transactionid FROM events_eventdetails customerDetails
>     WHERE customerDetails.keyname='customer_id'
>     AND substring(customerDetails.value,0,32)='1957'
>     AND transactionid IS NOT NULL
> ) ORDER BY id LIMIT 50;
>
> -- http://explain.depesz.com/s/Pnb
>
>
> explain analyze SELECT events_events.* FROM events_events,
> events_eventdetails customerDetails
>     WHERE events_events.transactionid = customerDetails.transactionid
>     AND customerDetails.keyname='customer_id'
>     AND substring(customerDetails.value,0,32)='1957'
>     AND customerDetails.transactionid IS NOT NULL
> ORDER BY id LIMIT 50;
>
> -- http://explain.depesz.com/s/rDh
>
>
> explain analyze SELECT events_events.* FROM events_events
> JOIN events_eventdetails customerDetails
>     ON events_events.transactionid = customerDetails.transactionid
>     AND customerDetails.keyname='customer_id'
>     AND substring(customerDetails.value,0,32)='1957'
>     AND customerDetails.transactionid IS NOT NULL
> ORDER BY id LIMIT 50;
>
> -- http://explain.depesz.com/s/6aB
>
>
> Thanks for your efforts!
>

Huh.  Pretty much exactly the same.  I'm sorry but I think I'm at my
limit.  I'm not sure why the nested loop takes so long, or how to get it
to use something different.

-Andy

Re: Performance trouble finding records through related records

From

Robert Haas

Date:

07 March 2011, 15:26:00

On Thu, Mar 3, 2011 at 9:55 AM, Andy Colson <andy@squeakycode.net> wrote:
>> explain analyze SELECT events_events.* FROM events_events
>> JOIN events_eventdetails customerDetails
>>        ON events_events.transactionid = customerDetails.transactionid
>>        AND customerDetails.keyname='customer_id'
>>        AND substring(customerDetails.value,0,32)='1957'
>>        AND customerDetails.transactionid IS NOT NULL
>> ORDER BY id LIMIT 50;
>>
>> -- http://explain.depesz.com/s/6aB
>>
>>
>> Thanks for your efforts!
>>
>
> Huh.  Pretty much exactly the same.  I'm sorry but I think I'm at my limit.
>  I'm not sure why the nested loop takes so long, or how to get it to use
> something different.

The join condition is showing up in the explain output as:

Join Filter: ((events_events.transactionid)::text =
(customerdetails.transactionid)::text)

Now why is there a cast to text there on both sides?  Do those two
columns have exactly the same datatype?  If not, you probably want to
fix that, as it can make a big difference.

Also, how many rows are there in events_events and how many in
events_eventdetails?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Performance trouble finding records through related records

From

Merlin Moncure

Date:

07 March 2011, 16:27:36

On Wed, Mar 2, 2011 at 6:12 PM, sverhagen <sverhagen@wps-nl.com> wrote:
> Thanks for your help already!
> Hope you're up for some more :-)
>
>
> Andy Colson wrote:
>>
>> First off, excellent detail.
>>
>> Second, your explain analyze was hard to read... but since you are not
>> really interested in your posted query, I wont worry about looking at
>> it... but... have you seen:
>>
>> http://explain.depesz.com/
>>
>
> Thanks for that. Using it below :-)
>
>
> Andy Colson wrote:
>>
>> If you run the individual queries, without the union, are the part's slow
>> too?
>>
>
> Only problem is the second part. So that part can safely be isolated. Also
> the following does not play a role at this point: WHERE events2.eventtype_id
> IN
> (100,103,105,...
>
> Then I went ahead and denormalized the transactionId on both ends, so that
> both events_events records and events_eventdetails records have the
> transactionId (or NULL). That simplifies the query to this:
>
>        SELECT events_events.* FROM events_events WHERE transactionid IN (
>                SELECT transactionid FROM events_eventdetails customerDetails
>                WHERE customerDetails.keyname='customer_id'
>                AND substring(customerDetails.value,0,32)='1957'
>                AND transactionid IS NOT NULL
>        ) ORDER BY id LIMIT 50;

8.3? try converting the above to WHERE EXISTS or (even better) a JOIN...

merlin

Re: Performance trouble finding records through related records

From

sverhagen

Date:

08 March 2011, 07:17:51

Hi. Thanks for your response.


Robert Haas wrote:
>
> Join Filter: ((events_events.transactionid)::text =
> (customerdetails.transactionid)::text)
>
> Now why is there a cast to text there on both sides?  Do those two
> columns have exactly the same datatype?  If not, you probably want to
> fix that, as it can make a big difference.
>

Good question. I seem not able to get rid of that, even though these are
same type:

    =# \d events_events
    Table "public.events_events"
            Column        |           Type           | Modifiers
    ----------------------+--------------------------+----------
    [snip]
     transactionid        | character varying(36)    | not null
    [snip]

    =# \d events_eventdetails
    Table "public.events_eventdetails"
        Column     |          Type          | Modifiers
    ---------------+------------------------+----------
    [snip]
     transactionid | character varying(36)  | not null
    [snip]

(These columns allowing null or not is just something I've been playing with
to no avail too.)



Robert Haas wrote:
>
> Also, how many rows are there in events_events and how many in
> events_eventdetails?
>

select count(*) from events_events; --> 3910163
select count(*) from events_eventdetails; --> 30216033
select count(*) from events_eventdetails_customer_id; (single partition) -->
2976101


--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3413801.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Performance trouble finding records through related records

From

sverhagen

Date:

08 March 2011, 07:31:51

Merlin Moncure-2 wrote:
>
>
> 8.3? try converting the above to WHERE EXISTS or (even better) a JOIN...
>
>

Thanks for that. But in my Mar 03, 2011; 10:19am post I already broke it
down to the barebones with some variations, among which JOIN. The EXISTS IN
variation was so poor that I left that one out.

Best regards, Sander.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3413814.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Performance trouble finding records through related records

From

sverhagen

Date:

10 March 2011, 10:10:52

Hi, all. I've done some further analysis, found a form that works if I split
things over two separate queries (B1 and B2, below) but still trouble when
combining (B, below).

This is the full pseudo-query: SELECT FROM A UNION SELECT FROM B ORDER BY
dateTime DESC LIMIT 50
In that pseudo-query:
        - A is fast (few ms). A is all events for the given customer
        - B is slow (1 minute). B is all events for the same transactions as
all events for the given customer

Zooming in on B it looks originally as follows:

SELECT events2.id, events2.transactionId, events2.dateTime FROM
events_events events2
JOIN events_eventdetails details2_transKey
        ON events2.id = details2_transKey.event_id
        AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
        ON substring(details2_transKey.value,0,32) =
substring(details2_transValue.value,0,32)
        AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
        ON details2_transValue.event_id = customerDetails.event_id
        AND customerDetails.keyname='customer_id'
        AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN (100,103,105,... et cetera ...)


The above version of B is tremendously slow.

The only fast version I've yet come to find is as follows:
        - Do a sub-query B1
        - Do a sub-query B2 with the results of B1

B1 looks as follows:
Works very fast (few ms)
http://explain.depesz.com/s/7JS

SELECT substring(details2_transValue.value,0,32)
FROM events_eventdetails_customer_id customerDetails
JOIN only events_eventdetails details2_transValue
USING (event_id)
        WHERE customerDetails.keyname='customer_id'
        AND substring(customerDetails.value,0,32)='49'
        AND details2_transValue.keyname='transactionId'


B2 looks as follows:
Works very fast (few ms)
http://explain.depesz.com/s/jGO

SELECT events2.id, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
ON events2.id = details2_transKey.event_id
        AND details2_transKey.keyname='transactionId'
        AND substring(details2_transKey.value,0,32) IN (... results of B1
...)
        AND events2.eventtype_id IN

(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)

The combined version of B works slow again (3-10 seconds):
http://explain.depesz.com/s/9oM

SELECT events2.id, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
ON events2.id = details2_transKey.event_id
        AND details2_transKey.keyname='transactionId'
        AND substring(details2_transKey.value,0,32) IN (
                SELECT substring(details2_transValue.value,0,32)
                FROM events_eventdetails_customer_id customerDetails
                JOIN only events_eventdetails details2_transValue
                USING (event_id)
                        WHERE customerDetails.keyname='customer_id'
                        AND substring(customerDetails.value,0,32)='49'
                        AND details2_transValue.keyname='transactionId')
        AND events2.eventtype_id IN

(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)

At the moment I see not other conclusion than to offer B1 and B2 to the
database separately, but it feels like defeat :-|


--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3423334.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.