Thread: Performance trouble finding records through related records
Hi, appreciated mailing list. Thanks already for taking your time for my performance question. Regards, Sander.

===POSTGRESQL VERSION AND ORIGIN===

PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
Installed using "apt-get install postgresql-8.3"

===A DESCRIPTION OF WHAT YOU ARE TRYING TO ACHIEVE===

Query involving tables events_events and events_eventdetails. There is any number of events_eventdetails records for each events_events record.

There may be multiple records in events_events that have the same value for their transactionId, which is available in one of their events_eventdetails records.

We want a total query that returns events_events records that match condition I. or II., sorted by datetime descending, first 50.

Condition I.
All events_events records for which an events_eventdetails record matches the following conditions:
- Column keyname (in events_eventdetails) equals "customerId", and
- Column value (in events_eventdetails) equals 598124, or more precisely substring(customerDetails.value,0,32)='598124'

Condition II.
All events_events records that have the same value in one of their events_eventdetails records with keyname 'transactionId' as any of the resulting events_events records of condition I.

In other words: I want all events for a certain customerId, and all events with the same transactionId as those.

The total query's time should be of the magnitude 100ms, but currently is of the magnitude 1min.

JUST FOR THE PURPOSE OF EXPERIMENT I now have a denormalized copy of transactionId as a column in the events_events records. I have been trying queries on those, with no improvements.
I am not seeking WHY my query is too slow, rather trying to find a way to get it faster :-)

===THE EXACT TEXT OF THE QUERY YOU RAN===

The total query:

SELECT events1.id, events1.transactionId, events1.dateTime
FROM events_events events1
JOIN events_eventdetails customerDetails
    ON events1.id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events1.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
UNION
SELECT events2.id, events2.transactionId, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
    ON substring(details2_transKey.value,0,32) = substring(details2_transValue.value,0,32)
    AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
    ON details2_transValue.event_id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
ORDER BY dateTime DESC LIMIT 50

===THE EXACT OUTPUT OF THAT QUERY===

The exactly correct and desired output is as follows:

    id    |            transactionid            |          datetime
----------+-------------------------------------+----------------------------
 16336643 |                                     | 2011-03-01 11:10:38.648+01
 16336642 |                                     | 2011-03-01 11:10:35.629+01
 16336641 |                                     | 2011-03-01 11:10:35.625+01
 16336637 |                                     | 2011-03-01 11:09:53.306+01
 16336634 |                                     | 2011-03-01 11:09:14.027+01
 16336633 | 26eaeb24-7a93-4c9a-99f9-bd3b77f9636 | 2011-03-01 11:09:14.004+01
 16336632 | 26eaeb24-7a93-4c9a-99f9-bd3b77f9636 | 2011-03-01 11:09:13.925+01
 16336631 |                                     | 2011-03-01 11:09:13.873+01
 16336630 |                                     | 2011-03-01 11:09:13.741+01
 16336626 |                                     | 2011-03-01 11:09:08.931+01
 16336625 |                                     | 2011-03-01 11:09:01.811+01
 16336624 | 2037f235-89d2-402a-90eb-3bcf40d633c | 2011-03-01 11:09:01.771+01
 16336623 | 2037f235-89d2-402a-90eb-3bcf40d633c | 2011-03-01 11:09:01.729+01
 16336611 |                                     | 2011-03-01 11:08:08.63+01
 16336610 |                                     | 2011-03-01 11:08:02.805+01
 16336609 |                                     | 2011-03-01 11:08:02.801+01
 16336606 |                                     | 2011-03-01 11:07:55.324+01
 16336602 |                                     | 2011-03-01 11:07:38.63+01
 16336600 |                                     | 2011-03-01 11:07:34.561+01
 16336599 |                                     | 2011-03-01 11:07:34.547+01
 16336595 |                                     | 2011-03-01 11:07:24.471+01
 16336594 | e3235ae5-9f40-4ceb-94bf-61ed99b793a | 2011-03-01 11:07:24.445+01
 16336593 | e3235ae5-9f40-4ceb-94bf-61ed99b793a | 2011-03-01 11:07:24.373+01
 16336591 |                                     | 2011-03-01 11:07:24.268+01
 16336590 |                                     | 2011-03-01 11:07:24.065+01
 16336583 |                                     | 2011-03-01 11:06:43.63+01
 16336582 |                                     | 2011-03-01 11:06:36.977+01
 16336581 |                                     | 2011-03-01 11:06:36.973+01
 16336575 |                                     | 2011-03-01 11:06:18.637+01
 16336573 |                                     | 2011-03-01 11:06:16.728+01
 16336572 |                                     | 2011-03-01 11:06:16.723+01
 16336569 |                                     | 2011-03-01 11:06:06.662+01
 16336568 | 3519e6de-bc8f-4641-a686-c088e459b47 | 2011-03-01 11:06:06.639+01
 16336567 | 3519e6de-bc8f-4641-a686-c088e459b47 | 2011-03-01 11:06:06.569+01
 16336566 |                                     | 2011-03-01 11:06:06.526+01
 16336565 |                                     | 2011-03-01 11:06:06.359+01
 16336561 |                                     | 2011-03-01 11:05:58.868+01
 16336560 |                                     | 2011-03-01 11:05:50.80+01
 16336559 | fbd5c7b2-6035-45cf-9222-a3d9f77d9ae | 2011-03-01 11:05:50.767+01
 16336558 | fbd5c7b2-6035-45cf-9222-a3d9f77d9ae | 2011-03-01 11:05:50.724+01
 16336550 |                                     | 2011-03-01 11:05:33.631+01
 16336549 |                                     | 2011-03-01 11:05:28.313+01
 16336548 |                                     | 2011-03-01 11:05:28.309+01
 16336545 |                                     | 2011-03-01 11:05:20.86+01
 16336541 |                                     | 2011-03-01 11:03:23.626+01
 16336539 |                                     | 2011-03-01 11:03:18.623+01
 16336538 |                                     | 2011-03-01 11:03:18.613+01
 16336535 |                                     | 2011-03-01 11:03:08.553+01
 16336534 | db9dcfb9-870d-42f6-947f-9a22fb49592 | 2011-03-01 11:03:08.531+01
 16336533 | db9dcfb9-870d-42f6-947f-9a22fb49592 | 2011-03-01 11:03:08.457+01
(50 rows)

Time: 51442.732 ms

Only problem is the time it took!

===WHAT PROGRAM YOU'RE USING TO CONNECT===

For these debugging purposes I'm using a mix of psql (on the target system) and pgAdmin (from my Windows machine). The final application involves a Java web application running in Tomcat 5.5.

===CHANGES MADE TO THE SETTINGS IN THE POSTGRESQL.CONF FILE===

# This specific configuration is made on basis of postgresql.conf of PostgreSQL 8.3
#
# Memory usage:            typical:   maximum:
# - Application server     1,2GB      1,5GB
# - Shared buffers         1,0GB      1,0GB
# - Work memory            ?          2,2GB
#   ((>>connections) * work_mem)
#   (only counting relevant application server connections*)
# - Query cache            ?          2,0GB
# -------------------------------------------
# Total                               6,7GB
#
# tailor maximum connections to what our application server requires:
# application server to use 30 + 8 at most, few left over for i.e.
# psql thingies
max_connections = 40

# 1/2 up to 3/4 of total memory, used by all connections and such
shared_buffers = 1024MB
temp_buffers = 64MB
max_prepared_transactions = 25

# sorting large sets of queries (such as events, event details)
# might require some memory to be able to be done in RAM
# the total work memory is max_connections * work_mem
# dynamically addressed
work_mem = 75MB

# been told that no one should want to use this anymore
# there are no known disadvantages of not writing full pages
full_page_writes = off

wal_buffers = 8MB

# larger numbers of segments to write logs into makes each of them easier,
# thus checkpointing becoming less of a resource hog
checkpoint_segments = 32

# longer time to do the same amount of work,
# thus checkpointing becoming less of a resource hog
checkpoint_timeout = 10min

# longer time to do the same amount of work,
# thus checkpointing becoming less of a resource hog
checkpoint_completion_target = 0.8

# 1/2 up to 3/4 of total memory
# used to keep query results for reuse for a certain amount of time
# dynamically addressed
effective_cache_size = 2048MB

# the higher the value, the better the query planner optimizes for performance
default_statistics_target = 100

# constraint_exclusion not retained over dump/restore otherwise
constraint_exclusion = on

log_destination = 'syslog'

# log queries if they take longer than this value (in ms)
log_min_duration_statement = 300

# auto-vacuum less often, thus less of a performance hog
autovacuum_vacuum_threshold = 500

# analyze less often, thus less of a performance hog
autovacuum_analyze_threshold = 500

===OPERATING SYSTEM AND VERSION===

Ubuntu 8.04.4 LTS \n \l
Linux 2.6.24-23-server
Software RAID I (MDADM)

===FOR PERFORMANCE QUESTIONS: WHAT KIND OF HARDWARE===

INTEL CR2 DUO DT 1.8G 800F 2M 775P TY(G)
SEAGATE 80G 3.5" SATA 7KRPM 8M(G)
8G DDR2-800 RAM

===FULL TABLE AND INDEX SCHEMA===

CREATE TABLE events_events
(
  id bigserial NOT NULL,
  carparkid bigint,
  cleared boolean NOT NULL,
  datetime timestamp with time zone,
  identity character varying(255),
  generatedbystationid bigint,
  eventtype_id bigint NOT NULL,
  relatedstationid bigint,
  processingstatus character varying(255) NOT NULL,
  transactionid character varying(36),
  CONSTRAINT events_events_pkey PRIMARY KEY (id),
  CONSTRAINT fk88fe3effa0559276 FOREIGN KEY (eventtype_id)
      REFERENCES events_event_types (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
) WITH (OIDS=FALSE);
ALTER TABLE events_events OWNER TO postgres;

CREATE INDEX events_events_cleared_eventtype_id_datetime_ind ON events_events USING btree (cleared, eventtype_id, datetime);
CREATE INDEX events_events_cleared_ind ON events_events USING btree (cleared);
CREATE INDEX events_events_datetime_cleared_ind ON events_events USING btree (datetime, cleared) WHERE NOT cleared;
CREATE INDEX events_events_datetime_eventtype_id_ind ON events_events USING btree (datetime, eventtype_id);
CREATE INDEX events_events_datetime_ind ON events_events USING btree (datetime);
CREATE INDEX events_events_eventtype_id_datetime_ind ON events_events USING btree (eventtype_id, datetime);
CREATE INDEX events_events_eventtype_id_ind ON events_events USING btree (eventtype_id);
CREATE INDEX events_events_identity_ind ON events_events USING btree (identity);
CREATE INDEX events_events_not_cleared_ind ON events_events USING btree (cleared) WHERE NOT cleared;
CREATE INDEX events_events_processingstatus_new ON events_events USING btree (processingstatus) WHERE processingstatus::text = 'NEW'::text;
CREATE INDEX events_events_relatedstation_eventtype_datetime_desc_ind ON events_events USING btree (relatedstationid, eventtype_id, datetime);

CREATE TABLE events_eventdetails
(
  id bigserial NOT NULL,
  keyname character varying(255) NOT NULL,
  "value" text NOT NULL,
  event_id bigint NOT NULL,
  listindex integer,
  CONSTRAINT events_eventdetails_pkey PRIMARY KEY (id),
  CONSTRAINT events_eventdetails_event_id_fk FOREIGN KEY (event_id)
      REFERENCES events_events (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT events_eventdetails_event_id_key UNIQUE (event_id, keyname, listindex)
) WITH (OIDS=FALSE);
ALTER TABLE events_eventdetails OWNER TO postgres;

CREATE INDEX events_eventdetails_event_id_ind ON events_eventdetails USING btree (event_id);
CREATE INDEX events_eventdetails_keyname_ind ON events_eventdetails USING btree (keyname);
CREATE INDEX events_eventdetails_substring_ind ON events_eventdetails USING btree (keyname, "substring"(value, 0, 32));
CREATE INDEX events_eventdetails_value_ind ON events_eventdetails USING btree ("substring"(value, 0, 32));

Many partitions, approx. 50. E.g.:

CREATE OR REPLACE RULE events_eventdetails_insert_customer_id AS
    ON INSERT TO events_eventdetails
    WHERE new.keyname::text = 'customer_id'::text
    DO INSTEAD INSERT INTO events_eventdetails_customer_id (id, keyname, value, event_id, listindex)
    VALUES (new.id, new.keyname, new.value, new.event_id, new.listindex);

CREATE TABLE events_eventdetails_customer_id
(
  -- Inherited: id bigint NOT NULL DEFAULT nextval('events_eventdetails_id_seq'::regclass),
  -- Inherited: keyname character varying(255) NOT NULL,
  -- Inherited: "value" text NOT NULL,
  -- Inherited: event_id bigint NOT NULL,
  -- Inherited: listindex integer,
  CONSTRAINT events_eventdetails_customer_id_pkey PRIMARY KEY (id),
  CONSTRAINT events_eventdetails_customer_id_event_id_fk FOREIGN KEY (event_id)
      REFERENCES events_events (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT events_eventdetails_customer_id_keyname_check CHECK (keyname::text = 'customer_id'::text)
) INHERITS (events_eventdetails)
WITH (OIDS=FALSE);
ALTER TABLE events_eventdetails_customer_id OWNER TO postgres;

CREATE INDEX events_eventdetails_customer_id_event_id_ind ON events_eventdetails_customer_id USING btree (event_id);
CREATE INDEX events_eventdetails_customer_id_substring_ind ON events_eventdetails_customer_id USING btree (keyname, "substring"(value, 0, 32));

===TABLE METADATA===

select count(*) from
events_events; --> 3910163
select count(*) from events_eventdetails; --> 30216033
select count(*) from events_eventdetails_customer_id; (single partition) --> 2976101

===EXPLAIN ANALYZE===

Limit (cost=36962467348.39..36962467348.51 rows=50 width=52) (actual time=58765.029..58765.078 rows=50 loops=1)
  -> Sort (cost=36962467348.39..37251140933.75 rows=115469434145 width=52) (actual time=58765.023..58765.042 rows=50 loops=1)
       Sort Key: events1.datetime
       Sort Method: top-N heapsort Memory: 19kB
       -> Unique (cost=31971961433.07..33126655774.52 rows=115469434145 width=52) (actual time=58764.565..58764.844 rows=145 loops=1)
            -> Sort (cost=31971961433.07..32260635018.43 rows=115469434145 width=52) (actual time=58764.564..58764.652 rows=222 loops=1)
                 Sort Key: events1.id, events1.transactionid, events1.datetime
                 Sort Method: quicksort Memory: 29kB
                 -> Append (cost=0.00..3256444445.93 rows=115469434145 width=52) (actual time=0.304..58763.738 rows=222 loops=1)
                      -> Nested Loop (cost=0.00..148161.10 rows=10345 width=52) (actual time=0.303..2.781 rows=145 loops=1)
                           -> Append (cost=0.00..21312.39 rows=15312 width=8) (actual time=0.236..0.738 rows=187 loops=1)
                                -> Index Scan using events_eventdetails_substring_ind on events_eventdetails customerdetails (cost=0.00..457.37 rows=113 width=8) (actual time=0.077..0.077 rows=0 loops=1)
                                     Index Cond: (((keyname)::text = 'customer_id'::text) AND ("substring"(value, 0, 32) = '598124'::text))
                                -> Index Scan using events_eventdetails_customer_id_substring_ind on events_eventdetails_customer_id customerdetails (cost=0.00..20855.02 rows=15199 width=8) (actual time=0.158..0.530 rows=187 loops=1)
                                     Index Cond: (((keyname)::text = 'customer_id'::text) AND ("substring"(value, 0, 32) = '598124'::text))
                           -> Index Scan using events_events_pkey on events_events events1 (cost=0.00..8.27 rows=1 width=52) (actual time=0.009..0.009 rows=1 loops=187)
                                Index Cond: (events1.id = customerdetails.event_id)
                                Filter: (events1.eventtype_id = ANY ('{100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108}'::bigint[]))
                      -> Merge Join (cost=369560509.82..2101601943.38 rows=115469423800 width=52) (actual time=58760.353..58760.806 rows=77 loops=1)
                           Merge Cond: (customerdetails.event_id = details2_transvalue.event_id)
                           -> Sort (cost=24111.00..24149.28 rows=15312 width=8) (actual time=0.644..0.710 rows=187 loops=1)
                                Sort Key: customerdetails.event_id
                                Sort Method: quicksort Memory: 24kB
                                -> Append (cost=0.00..23046.64 rows=15312 width=8) (actual time=0.130..0.461 rows=187 loops=1)
                                     -> Index Scan using events_eventdetails_substring_ind on events_eventdetails customerdetails (cost=0.00..457.37 rows=113 width=8) (actual time=0.021..0.021 rows=0 loops=1)
                                          Index Cond: (((keyname)::text = 'customer_id'::text) AND ("substring"(value, 0, 32) = '598124'::text))
                                     -> Index Scan using events_eventdetails_customer_id_substring_ind on events_eventdetails_customer_id customerdetails (cost=0.00..22589.27 rows=15199 width=8) (actual time=0.107..0.319 rows=187 loops=1)
                                          Index Cond: (((keyname)::text = 'customer_id'::text) AND ("substring"(value, 0, 32) = '598124'::text))
                           -> Materialize (cost=369536398.82..388389165.24 rows=1508221314 width=60) (actual time=56515.482..58227.360 rows=986788 loops=1)
                                -> Sort (cost=369536398.82..373306952.10 rows=1508221314 width=60) (actual time=56515.478..57357.833 rows=986788 loops=1)
                                     Sort Key: details2_transvalue.event_id
                                     Sort Method: external merge Disk: 69416kB
                                     -> Merge Join (cost=1181483.03..31350423.76 rows=1508221314 width=60) (actual time=42137.760..51804.819 rows=986808 loops=1)
                                          Merge Cond: (("substring"(details2_transkey.value, 0, 32)) = ("substring"(details2_transvalue.value, 0, 32)))
                                          -> Sort (cost=908652.98..909781.59 rows=451445 width=127) (actual time=25898.797..27330.921 rows=658637 loops=1)
                                               Sort Key: ("substring"(details2_transkey.value, 0, 32))
                                               Sort Method: external merge Disk: 85584kB
                                               -> Hash Join (cost=621670.67..866252.84 rows=451445 width=127) (actual time=8238.959..15256.168 rows=658637 loops=1)
                                                    Hash Cond: (details2_transkey.event_id = events2.id)
                                                    -> Append (cost=16092.38..208184.56 rows=668175 width=83) (actual time=383.062..3180.853 rows=658638 loops=1)
                                                         -> Bitmap Heap Scan on events_eventdetails details2_transkey (cost=16092.38..208184.56 rows=668175 width=83) (actual time=383.060..2755.386 rows=658638 loops=1)
                                                              Recheck Cond: ((keyname)::text = 'transactionId'::text)
                                                              -> Bitmap Index Scan on events_eventdetails_keyname_ind (cost=0.00..15925.33 rows=668175 width=0) (actual time=274.388..274.388 rows=658909 loops=1)
                                                                   Index Cond: ((keyname)::text = 'transactionId'::text)
                                                    -> Hash (cost=548042.97..548042.97 rows=2641946 width=52) (actual time=7855.242..7855.242 rows=3711961 loops=1)
                                                         -> Bitmap Heap Scan on events_events events2 (cost=75211.99..548042.97 rows=2641946 width=52) (actual time=1024.685..4581.588 rows=3711961 loops=1)
                                                              Recheck Cond: (eventtype_id = ANY ('{100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108}'::bigint[]))
                                                              -> Bitmap Index Scan on events_events_eventtype_id_ind (cost=0.00..74551.50 rows=2641946 width=0) (actual time=983.354..983.354 rows=3712003 loops=1)
                                                                   Index Cond: (eventtype_id = ANY ('{100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108}'::bigint[]))
                                          -> Sort (cost=272830.05..274500.49 rows=668175 width=83) (actual time=16238.940..16958.522 rows=986808 loops=1)
                                               Sort Key: ("substring"(details2_transvalue.value, 0, 32))
                                               Sort Method: external sort Disk: 61784kB
                                               -> Result (cost=16092.38..208184.56 rows=668175 width=83) (actual time=391.124..4336.367 rows=658638 loops=1)
                                                    -> Append (cost=16092.38..208184.56 rows=668175 width=83) (actual time=391.104..3130.494 rows=658638 loops=1)
                                                         -> Bitmap Heap Scan on events_eventdetails details2_transvalue (cost=16092.38..208184.56 rows=668175 width=83) (actual time=391.103..2713.520 rows=658638 loops=1)
                                                              Recheck Cond: ((keyname)::text = 'transactionId'::text)
                                                              -> Bitmap Index Scan on events_eventdetails_keyname_ind (cost=0.00..15925.33 rows=668175 width=0) (actual time=283.327..283.327 rows=658909 loops=1)
                                                                   Index Cond: ((keyname)::text = 'transactionId'::text)
Total runtime: 58869.397 ms

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3405914.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On 03/01/2011 06:14 PM, sverhagen wrote:
> Hi, appreciated mailing list. Thanks already for taking your time for my
> performance question. Regards, Sander.
>
>
> ===POSTGRESQL VERSION AND ORIGIN===
>
> PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4
> (Ubuntu 4.2.4-1ubuntu3)
> Installed using "apt-get install postgresql-8.3"
>
>
> ===A DESCRIPTION OF WHAT YOU ARE TRYING TO ACHIEVE===
>
> Query involving tables events_events and events_eventdetails. There is any
> number of events_eventdetails records for each events_events record.
>
> There may be multiple records in events_events that have the same value for
> their transactionId, which is available in one of their events_eventdetails
> records.
>
> We want a total query that returns events_events records that match
> condition I. or II., sorted by datetime descending, first 50.
>
> Condition I.
> All events_events records for which an events_eventdetails records that
> matches the following conditions:
> - Column keyname (in events_eventdetails) equals "customerId", and
> - Column value (in events_eventdetails) equals 598124, or more precisely
> substring(customerDetails.value,0,32)='598124'
>
> Condition II.
> All events_events records that have a same value for in one of their
> events_eventdetails records with keyname 'transactionId' as any of the
> resulting events_events records of condition I.
>
> In other words: I want all events for a certain customerId, and all events
> with the same transactionId as those.
>
> The total query's time should be of the magnitude 100ms, but currently is of
> the magnitude 1min.
>
> JUST FOR THE PURPOSE OF EXPERIMENT I've now a denormalized copy of
> transactionId as a column in the events_events records. Been trying queries
> on those, with no improvements.
>
> I am not seeking WHY my query is too slow, rather trying to find a way to
> get it faster :-)
>
<much snippage>

First off, excellent detail.

Second, your explain analyze was hard to read...
but since you are not really interested in your posted query, I won't worry about looking at it... but... have you seen:

http://explain.depesz.com/

It's nice.

And last, to my questions:

SELECT events1.id, events1.transactionId, events1.dateTime
FROM events_events events1
JOIN events_eventdetails customerDetails
    ON events1.id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events1.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
UNION
SELECT events2.id, events2.transactionId, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
    ON substring(details2_transKey.value,0,32) = substring(details2_transValue.value,0,32)
    AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
    ON details2_transValue.event_id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN (100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
ORDER BY dateTime DESC LIMIT 50

If you run the individual queries, without the union, are the parts slow too?

Looked like your row counts (the estimate vs the actual) were way off, have you analyzed lately?
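To isolate the two halves, each SELECT can be timed on its own, for instance like this (a sketch only; the long eventtype_id IN (...) list from the original query is left out here for brevity and would need to be pasted back in):

```sql
-- First half of the UNION, run standalone under EXPLAIN ANALYZE
-- (or with psql's \timing) to see whether it alone is slow.
EXPLAIN ANALYZE
SELECT events1.id, events1.transactionId, events1.dateTime
FROM events_events events1
JOIN events_eventdetails customerDetails
    ON events1.id = customerDetails.event_id
    AND customerDetails.keyname = 'customer_id'
    AND substring(customerDetails.value, 0, 32) = '598124'
ORDER BY events1.dateTime DESC
LIMIT 50;
```

The second half can be timed the same way; whichever half dominates the runtime is the one worth rewriting.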
I could not tell from the explain analyze if an index was used, but I notice you have a ton of indexes on events_events table. You have two indexes on the same fields but in reverse order:

events_events_datetime_eventtype_id_ind (datetime, eventtype_id);
events_events_eventtype_id_datetime_ind (eventtype_id, datetime);

AND both eventtype_id and datetime are in other indexes! I think you need to review your indexes. Drop all of them and add one or two that are actually useful.

A useful tool I have found for complex queries is to break them down into smaller subsets, write sql that gets me just those sets, and then add them all back into one main query with subselects:

select a,b,c,...
from events_events
where id in ( select id from details where some subset is needed )
and id not in ( select id from details where some set is bad )
and id in ( select anotherid from anothertable where ... )

It's the subselects you need to think about. Find one that gets you a small set that's interesting somehow. Once you get all your little sets, it's easy to combine them.

-Andy
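Applied to the tables from this thread, the subselect pattern might look something like the following. This is only a sketch, not a tested query: it assumes the denormalized transactionid column from the poster's experiment exists on events_events, and condition I and II each become one small subselect:

```sql
SELECT e.id, e.transactionid, e.datetime
FROM events_events e
WHERE e.id IN (
        -- set 1 (condition I): events carrying the customer_id detail
        SELECT d.event_id
        FROM events_eventdetails d
        WHERE d.keyname = 'customer_id'
          AND substring(d.value, 0, 32) = '598124'
    )
   OR e.transactionid IN (
        -- set 2 (condition II): transactionids of those same events
        SELECT e2.transactionid
        FROM events_events e2
        JOIN events_eventdetails d2 ON d2.event_id = e2.id
        WHERE d2.keyname = 'customer_id'
          AND substring(d2.value, 0, 32) = '598124'
          AND e2.transactionid IS NOT NULL
    )
ORDER BY e.datetime DESC
LIMIT 50;
```

The point of the pattern is that each subselect is small and independently testable before being combined.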
Thanks for your help already!
Hope you're up for some more :-)

Andy Colson wrote:
>
> First off, excellent detail.
>
> Second, your explain analyze was hard to read... but since you are not
> really interested in your posted query, I wont worry about looking at
> it... but... have you seen:
>
> http://explain.depesz.com/
>

Thanks for that. Using it below :-)

Andy Colson wrote:
>
> If you run the individual queries, without the union, are the parts slow
> too?
>

Only problem is the second part. So that part can safely be isolated. Also the following does not play a role at this point: WHERE events2.eventtype_id IN (100,103,105,...

Then I went ahead and denormalized the transactionId on both ends, so that both events_events records and events_eventdetails records have the transactionId (or NULL). That simplifies the query to this:

SELECT events_events.* FROM events_events WHERE transactionid IN (
    SELECT transactionid FROM events_eventdetails customerDetails
    WHERE customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='1957'
    AND transactionid IS NOT NULL
) ORDER BY id LIMIT 50;

To no avail. Also changing the above WHERE IN into implicit/explicit JOINs doesn't make more than a marginal difference. Should joining not be very efficient somehow?

http://explain.depesz.com/s/Pnb

The above link nicely shows the hotspots, but I am at a loss now as how to approach them.

Andy Colson wrote:
>
> Looked like your row counts (the estimate vs the actual) were way off,
> have you analyzed lately?
>

Not sure what that means. Isn't all the maintenance nicely automated through my config?

Andy Colson wrote:
>
> I could not tell from the explain analyze if an index was used, but I
> notice you have a ton of indexes on events_events table.
>

Yes, a ton of indexes, but still not the right one :-)
On 03/02/2011 06:12 PM, sverhagen wrote:
> Thanks for your help already!
> Hope you're up for some more :-)
>
>
> Andy Colson wrote:
>>
>> First off, excellent detail.
>>
>> Second, your explain analyze was hard to read... but since you are not
>> really interested in your posted query, I wont worry about looking at
>> it... but... have you seen:
>>
>> http://explain.depesz.com/
>>
>
> Thanks for that. Using it below :-)
>
>
> Andy Colson wrote:
>>
>> If you run the individual queries, without the union, are the parts slow
>> too?
>>
>
> Only problem is the second part. So that part can safely be isolated. Also
> the following does not play a role at this point: WHERE events2.eventtype_id
> IN
> (100,103,105,...
>
> Then I went ahead and denormalized the transactionId on both ends, so that
> both events_events records and events_eventdetails records have the
> transactionId (or NULL). That simplifies the query to this:
>
> SELECT events_events.* FROM events_events WHERE transactionid IN (
>     SELECT transactionid FROM events_eventdetails customerDetails
>     WHERE customerDetails.keyname='customer_id'
>     AND substring(customerDetails.value,0,32)='1957'
>     AND transactionid IS NOT NULL
> ) ORDER BY id LIMIT 50;
>
> To no avail. Also changing the above WHERE IN into implicit/explicit JOINs
> doesn't make more than a marginal difference. Should joining not be very
> efficient somehow?
>
> http://explain.depesz.com/s/Pnb
>
> The above link nicely shows the hotspots, but I am at a loss now as how to
> approach them.
>
>
> Andy Colson wrote:
>>
>> Looked like your row counts (the estimate vs the actual) were way off,
>> have you analyzed lately?
>>
>
> Not sure what that means.
> Isn't all the maintenance nicely automated through my config?
>

In the explain analyze you'll see stuff like:

Append (cost=0.00..3256444445.93 rows=115469434145 width=52) (actual time=0.304..58763.738 rows=222 loops=1)

This is taken from your first email.
Red flags should go off when the row counts are not close. The first set is the planner's guess. The second set is what actually happened. The planner thought there would be 115,469,434,145 rows... but it turned out to only be 222. That's usually caused by bad stats.

> Isn't all the maintenance nicely automated through my config?
>

I'd never assume. But the numbers in the plan you posted:

> http://explain.depesz.com/s/Pnb

look fine to me (well, the row counts), and I didn't look too much at that plan in the first email, so we can probably ignore it.

> Andy Colson wrote:
>>
>> I could not tell from the explain analyze if an index was used, but I
>> notice you have a ton of indexes on events_events table.
>>
>
> Yes, a ton of indexes, but still not the right one :-)

But... many indexes will slow down update/inserts. And an index on an unselective field can cause more problems than it would help. Especially if the stats are off. If PG has lots and lots of options, it'll take longer to plan queries too. If it picks an index to use, that it thinks is selective, but in reality is not, you are in for a world of hurt.

For your query, I think a join would be the best bet, can we see its explain analyze?

-Andy
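When the estimates are that far off, refreshing the statistics by hand is the first thing to try. A sketch (table names are the ones from this thread; the statistics target of 500 is just an illustrative value, and per-column SET STATISTICS is available in 8.3):

```sql
-- Refresh planner statistics for the two tables in the query.
ANALYZE events_events;
ANALYZE events_eventdetails;

-- If estimates stay poor, raise the statistics target for the most
-- skewed column and re-analyze.
ALTER TABLE events_eventdetails ALTER COLUMN keyname SET STATISTICS 500;
ANALYZE events_eventdetails;
```

Note that the poster's config raised autovacuum_analyze_threshold to 500, which makes autovacuum analyze *less* often, so stale stats are plausible here.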
Andy Colson wrote:
>
> For your query, I think a join would be the best bet, can we see its
> explain analyze?
>

Here are a few variations:

SELECT events_events.* FROM events_events WHERE transactionid IN (
    SELECT transactionid FROM events_eventdetails customerDetails
    WHERE customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='1957'
    AND transactionid IS NOT NULL
) ORDER BY id LIMIT 50;

-- http://explain.depesz.com/s/Pnb

explain analyze SELECT events_events.* FROM events_events,
events_eventdetails customerDetails
WHERE events_events.transactionid = customerDetails.transactionid
AND customerDetails.keyname='customer_id'
AND substring(customerDetails.value,0,32)='1957'
AND customerDetails.transactionid IS NOT NULL
ORDER BY id LIMIT 50;

-- http://explain.depesz.com/s/rDh

explain analyze SELECT events_events.* FROM events_events
JOIN events_eventdetails customerDetails
ON events_events.transactionid = customerDetails.transactionid
AND customerDetails.keyname='customer_id'
AND substring(customerDetails.value,0,32)='1957'
AND customerDetails.transactionid IS NOT NULL
ORDER BY id LIMIT 50;

-- http://explain.depesz.com/s/6aB

Thanks for your efforts!
On 3/3/2011 3:19 AM, sverhagen wrote:
>
> Andy Colson wrote:
>>
>> For your query, I think a join would be the best bet, can we see its
>> explain analyze?
>>
>
> Here are a few variations:
>
> SELECT events_events.* FROM events_events WHERE transactionid IN (
>     SELECT transactionid FROM events_eventdetails customerDetails
>     WHERE customerDetails.keyname='customer_id'
>     AND substring(customerDetails.value,0,32)='1957'
>     AND transactionid IS NOT NULL
> ) ORDER BY id LIMIT 50;
>
> -- http://explain.depesz.com/s/Pnb
>
> explain analyze SELECT events_events.* FROM events_events,
> events_eventdetails customerDetails
> WHERE events_events.transactionid = customerDetails.transactionid
> AND customerDetails.keyname='customer_id'
> AND substring(customerDetails.value,0,32)='1957'
> AND customerDetails.transactionid IS NOT NULL
> ORDER BY id LIMIT 50;
>
> -- http://explain.depesz.com/s/rDh
>
> explain analyze SELECT events_events.* FROM events_events
> JOIN events_eventdetails customerDetails
> ON events_events.transactionid = customerDetails.transactionid
> AND customerDetails.keyname='customer_id'
> AND substring(customerDetails.value,0,32)='1957'
> AND customerDetails.transactionid IS NOT NULL
> ORDER BY id LIMIT 50;
>
> -- http://explain.depesz.com/s/6aB
>
> Thanks for your efforts!
>

Huh. Pretty much exactly the same. I'm sorry but I think I'm at my limit. I'm not sure why the nested loop takes so long, or how to get it to use something different.

-Andy
On Thu, Mar 3, 2011 at 9:55 AM, Andy Colson <andy@squeakycode.net> wrote:
>> explain analyze SELECT events_events.* FROM events_events
>> JOIN events_eventdetails customerDetails
>>     ON events_events.transactionid = customerDetails.transactionid
>>     AND customerDetails.keyname='customer_id'
>>     AND substring(customerDetails.value,0,32)='1957'
>>     AND customerDetails.transactionid IS NOT NULL
>> ORDER BY id LIMIT 50;
>>
>> -- http://explain.depesz.com/s/6aB
>>
>> Thanks for your efforts!
>
> Huh.  Pretty much exactly the same.  I'm sorry but I think I'm at my limit.
> I'm not sure why the nested loop takes so long, or how to get it to use
> something different.

The join condition is showing up in the explain output as:

    Join Filter: ((events_events.transactionid)::text =
                  (customerdetails.transactionid)::text)

Now why is there a cast to text there on both sides?  Do those two
columns have exactly the same datatype?  If not, you probably want to
fix that, as it can make a big difference.

Also, how many rows are there in events_events and how many in
events_eventdetails?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
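[One way to answer Robert's datatype question directly from the catalogs, rather than eyeballing `\d` output, is to compare the declared types of both columns. This is only a sketch following the table and column names used in the thread; it assumes both tables live in the default search path:]

```sql
-- Compare the resolved type of transactionid on both sides of the join.
-- If data_type or character_maximum_length differ, the planner must cast,
-- which can defeat index usage on the join column.
SELECT table_name, column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name IN ('events_events', 'events_eventdetails')
  AND column_name = 'transactionid';
```

(Note: on a varchar-to-varchar join the `::text` casts in the plan are expected, since varchar borrows text's comparison operators; the check above only rules out a genuine mismatch.)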
On Wed, Mar 2, 2011 at 6:12 PM, sverhagen <sverhagen@wps-nl.com> wrote:
> Thanks for your help already!
> Hope you're up for some more :-)
>
> Andy Colson wrote:
>>
>> First off, excellent detail.
>>
>> Second, your explain analyze was hard to read... but since you are not
>> really interested in your posted query, I won't worry about looking at
>> it... but... have you seen:
>>
>> http://explain.depesz.com/
>>
>
> Thanks for that. Using it below :-)
>
> Andy Colson wrote:
>>
>> If you run the individual queries, without the union, are the parts slow
>> too?
>>
>
> Only the second part is a problem, so that part can safely be isolated.
> The following also does not play a role at this point:
>
>     WHERE events2.eventtype_id IN (100,103,105,...
>
> Then I went ahead and denormalized the transactionId on both ends, so
> that both events_events records and events_eventdetails records have the
> transactionId (or NULL). That simplifies the query to this:
>
> SELECT events_events.* FROM events_events WHERE transactionid IN (
>     SELECT transactionid FROM events_eventdetails customerDetails
>     WHERE customerDetails.keyname='customer_id'
>     AND substring(customerDetails.value,0,32)='1957'
>     AND transactionid IS NOT NULL
> ) ORDER BY id LIMIT 50;

8.3?  Try converting the above to WHERE EXISTS or (even better) a JOIN...

merlin
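[For illustration, the WHERE EXISTS rewrite Merlin suggests would look roughly like this, applied to the denormalized query quoted above. This is a sketch using the thread's own table and column names; whether the 8.3 planner actually produces a better plan for it is exactly what would need testing:]

```sql
-- Correlated EXISTS form of the "transactionid IN (subquery)" query.
SELECT events_events.*
FROM events_events
WHERE EXISTS (
    SELECT 1
    FROM events_eventdetails customerDetails
    WHERE customerDetails.keyname = 'customer_id'
      AND substring(customerDetails.value, 0, 32) = '1957'
      AND customerDetails.transactionid = events_events.transactionid
)
ORDER BY id
LIMIT 50;
```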
Hi. Thanks for your response.

Robert Haas wrote:
>
> Join Filter: ((events_events.transactionid)::text =
> (customerdetails.transactionid)::text)
>
> Now why is there a cast to text there on both sides?  Do those two
> columns have exactly the same datatype?  If not, you probably want to
> fix that, as it can make a big difference.
>

Good question. I seem unable to get rid of that, even though these are the
same type:

=# \d events_events
            Table "public.events_events"
        Column        |           Type           | Modifiers
----------------------+--------------------------+-----------
 [snip]
 transactionid        | character varying(36)    | not null
 [snip]

=# \d events_eventdetails
        Table "public.events_eventdetails"
    Column     |          Type          | Modifiers
---------------+------------------------+-----------
 [snip]
 transactionid | character varying(36)  | not null
 [snip]

(Whether these columns allow NULL or not is just something I've been
playing with, also to no avail.)

Robert Haas wrote:
>
> Also, how many rows are there in events_events and how many in
> events_eventdetails?
>

select count(*) from events_events;                   --> 3910163
select count(*) from events_eventdetails;             --> 30216033
select count(*) from events_eventdetails_customer_id; --> 2976101
                                        (single partition)

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3413801.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Merlin Moncure-2 wrote:
>
> 8.3?  Try converting the above to WHERE EXISTS or (even better) a JOIN...
>

Thanks for that. But in my Mar 03, 2011, 10:19am post I already broke it
down to the bare bones with some variations, among them a JOIN. The EXISTS
variation performed so poorly that I left it out.

Best regards, Sander.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3413814.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Hi, all. I've done some further analysis. I found a form that works if I
split things over two separate queries (B1 and B2, below), but still have
trouble when combining them (B, below).

This is the full pseudo-query:

    SELECT FROM A
    UNION
    SELECT FROM B
    ORDER BY dateTime DESC LIMIT 50

In that pseudo-query:
- A is fast (a few ms). A is all events for the given customer.
- B is slow (1 minute). B is all events for the same transactions as all
  events for the given customer.

Zooming in on B, it originally looks as follows:

SELECT events2.id, events2.transactionId, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
    ON substring(details2_transKey.value,0,32) =
       substring(details2_transValue.value,0,32)
    AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
    ON details2_transValue.event_id = customerDetails.event_id
    AND customerDetails.keyname='customer_id'
    AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN (100,103,105,... et cetera ...)

The above version of B is tremendously slow.

The only fast version I've found so far is as follows:
- Do a sub-query B1
- Do a sub-query B2 with the results of B1

B1 looks as follows. Works very fast (a few ms):
http://explain.depesz.com/s/7JS

SELECT substring(details2_transValue.value,0,32)
FROM events_eventdetails_customer_id customerDetails
JOIN only events_eventdetails details2_transValue USING (event_id)
WHERE customerDetails.keyname='customer_id'
  AND substring(customerDetails.value,0,32)='49'
  AND details2_transValue.keyname='transactionId'

B2 looks as follows. Works very fast (a few ms):
http://explain.depesz.com/s/jGO

SELECT events2.id, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
    AND substring(details2_transKey.value,0,32) IN (... results of B1 ...)
    AND events2.eventtype_id IN
(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)

The combined version of B is slow again (3-10 seconds):
http://explain.depesz.com/s/9oM

SELECT events2.id, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
    AND substring(details2_transKey.value,0,32) IN (
        SELECT substring(details2_transValue.value,0,32)
        FROM events_eventdetails_customer_id customerDetails
        JOIN only events_eventdetails details2_transValue USING (event_id)
        WHERE customerDetails.keyname='customer_id'
          AND substring(customerDetails.value,0,32)='49'
          AND details2_transValue.keyname='transactionId')
    AND events2.eventtype_id IN
(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)

At the moment I see no other conclusion than to offer B1 and B2 to the
database separately, but it feels like defeat :-|

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-trouble-finding-records-through-related-records-tp3405914p3423334.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
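[One way to keep the two-step B1/B2 approach inside the database, without the planner flattening B1 into B2 as happens in the combined form above, is to materialize B1's result in a temporary table first. This is only a sketch under the thread's own table names; PostgreSQL 8.3 has no CTEs, so a temp table plus ANALYZE is the usual stand-in, and whether the planner then keeps the fast B2 plan is an assumption to verify with EXPLAIN ANALYZE:]

```sql
BEGIN;

-- Step B1: collect the customer's transaction ids into a temp table.
CREATE TEMP TABLE trans_ids ON COMMIT DROP AS
SELECT DISTINCT substring(details2_transValue.value,0,32) AS trans_id
FROM events_eventdetails_customer_id customerDetails
JOIN only events_eventdetails details2_transValue USING (event_id)
WHERE customerDetails.keyname='customer_id'
  AND substring(customerDetails.value,0,32)='49'
  AND details2_transValue.keyname='transactionId';

-- Give the planner real row counts for the small id set.
ANALYZE trans_ids;

-- Step B2: join against the materialized ids instead of the subquery.
SELECT events2.id, events2.dateTime
FROM events_events events2
JOIN events_eventdetails details2_transKey
    ON events2.id = details2_transKey.event_id
    AND details2_transKey.keyname='transactionId'
JOIN trans_ids
    ON substring(details2_transKey.value,0,32) = trans_ids.trans_id
WHERE events2.eventtype_id IN (100,103,105 /* ... full list as above ... */);

COMMIT;
```

This mimics offering B1 and B2 separately, but in a single round of statements, so it may feel less like defeat while giving the planner the same easy job.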