Thread: From Simple to Complex
Looks like I missed a key sentence in http://www.postgresql.org/docs/9.0/static/ddl-inherit.html which states: "A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children."
I should have realized that as I exploited that "limitation" in three of my tables. Gradually adding those indices now; will report on what kind of difference it makes....
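For the record, what I'm adding basically mirrors the parents' indexes on each child table (guessing at names to match the existing convention), e.g.:

CREATE INDEX thoughts_inserted_idx ON thoughts USING btree (inserted);
CREATE INDEX photos_inserted_idx ON photos USING btree (inserted);
CREATE INDEX places_inserted_idx ON places USING btree (inserted);
-- ...and likewise for music, videos, people, path_people, address_people, facebook_people and sleep...
CREATE INDEX emotions_moment_id_idx ON emotions USING btree (moment_id);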
On Tue, Jan 31, 2012 at 1:22 PM, Alessandro Gagliardi <alessandro@path.com> wrote:
My slow query today is somewhat more complex than yesterday's, but I'm
hopeful it can be improved. Here's the query:
SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted > 'today' AND moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;
As you'll see below, moments is inherited by a number of other tables
and the purpose of relname is to see which one. Meanwhile, emotions
inherits feedback.
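(To illustrate what I mean by that: each row's tableoid identifies the child table it physically lives in, so a minimal sketch of just that part, using the regclass cast instead of the pg_class join, would be something like:

SELECT moments.tableoid::regclass AS relname, COUNT(*) -- the regclass cast renders the child table's name
FROM moments
WHERE inserted > 'today'
GROUP BY 1;

but the full query above is what I actually need, since it also joins in emotions.)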
Here's the Full Table and Index Schema:
CREATE TABLE moments
(
moment_id character(24) NOT NULL DEFAULT to_char(now(), 'JHH24MISSUS'::text),
block_id character(24) NOT NULL,
inserted timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT moments_pkey PRIMARY KEY (moment_id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX moments_block_id_idx
ON moments
USING btree
(block_id );
CREATE INDEX moments_inserted_idx
ON moments
USING btree
(inserted );
CREATE TABLE feedback
(
feedback_id character(24) NOT NULL,
user_id character(24) NOT NULL,
moment_id character(24) NOT NULL,
created timestamp without time zone,
inserted timestamp without time zone NOT NULL DEFAULT now(),
lnglat point,
CONSTRAINT feedback_pkey PRIMARY KEY (feedback_id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX feedback_lnglat_idx
ON feedback
USING gist
(lnglat );
CREATE INDEX feedback_moment_id_idx
ON feedback
USING btree
(moment_id );
CREATE TABLE emotions
(
-- Inherited from table feedback: feedback_id character(24) NOT NULL,
-- Inherited from table feedback: user_id character(24) NOT NULL,
-- Inherited from table feedback: moment_id character(24) NOT NULL,
-- Inherited from table feedback: created timestamp without time zone,
-- Inherited from table feedback: inserted timestamp without time zone NOT NULL DEFAULT now(),
emotion character varying NOT NULL,
-- Inherited from table feedback: lnglat point,
CONSTRAINT emotions_pkey PRIMARY KEY (feedback_id )
)
INHERITS (feedback)
WITH (
OIDS=FALSE
);
CREATE INDEX emotions_emotion_idx
ON emotions
USING btree
(emotion );
Here's the results from EXPLAIN ANALYZE:
"Sort (cost=309717.70..309718.43 rows=1460 width=94) (actual
time=60462.534..60462.544 rows=25 loops=1)"
" Sort Key: pg_class.relname, emotions.emotion"
" Sort Method: quicksort Memory: 20kB"
" -> HashAggregate (cost=309697.24..309702.35 rows=1460 width=94)
(actual time=60462.457..60462.476 rows=25 loops=1)"
" -> Hash Join (cost=133154.62..308963.70 rows=489024
width=94) (actual time=26910.488..60031.589 rows=194642 loops=1)"
" Hash Cond: (public.moments.tableoid = pg_class.oid)"
" -> Hash Join (cost=133144.72..307119.96 rows=489024
width=34) (actual time=26909.984..59434.137 rows=194642 loops=1)"
" Hash Cond: (public.moments.moment_id = emotions.moment_id)"
" -> Append (cost=0.00..114981.64 rows=119665
width=29) (actual time=883.153..21696.939 rows=357565 loops=1)"
" -> Seq Scan on moments (cost=0.00..0.00
rows=1 width=104) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on thoughts moments
(cost=0.00..38856.88 rows=44388 width=29) (actual
time=883.150..9040.959 rows=115436 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on photos moments
(cost=0.00..29635.78 rows=194 width=29) (actual
time=5329.700..5827.447 rows=116420 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on music moments
(cost=0.00..9371.88 rows=19070 width=29) (actual time=354.147..383.266
rows=37248 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on people moments
(cost=0.00..5945.26 rows=27 width=29) (actual time=185.393..185.393
rows=0 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on places moments
(cost=0.00..24551.03 rows=54961 width=29) (actual
time=5224.044..5324.517 rows=85564 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on videos moments
(cost=0.00..981.31 rows=734 width=29) (actual time=21.075..28.735
rows=2897 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on facebook_people moments
(cost=0.00..10.84 rows=80 width=104) (actual time=0.001..0.001 rows=0
loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on address_people moments
(cost=0.00..10.84 rows=80 width=104) (actual time=0.005..0.005 rows=0
loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on path_people moments
(cost=0.00..5606.79 rows=30 width=29) (actual time=211.166..211.166
rows=0 loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Seq Scan on sleep moments
(cost=0.00..11.05 rows=100 width=104) (actual time=0.002..0.002 rows=0
loops=1)"
" Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
" -> Hash (cost=79292.49..79292.49 rows=4059496
width=55) (actual time=25757.998..25757.998 rows=4058642 loops=1)"
" Buckets: 262144 Batches: 4 Memory Usage: 75211kB"
" -> Seq Scan on emotions
(cost=0.00..79292.49 rows=4059496 width=55) (actual
time=0.012..15969.981 rows=4058642 loops=1)"
" -> Hash (cost=8.88..8.88 rows=292 width=68) (actual
time=0.487..0.487 rows=319 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 28kB"
" -> Seq Scan on pg_class (cost=0.00..8.88
rows=292 width=68) (actual time=0.013..0.234 rows=319 loops=1)"
"Total runtime: 60601.612 ms"
Postgres version: still 9.0.5
History: N/A (This is the first time I've run this query.)
Hardware: 1.7 GB Cache and other things you'd expect from a Ronin
instance of a Heroku Postgres database.
Maintenance Setup: What Heroku does. As before, vacuum should not be
relevant as there are no deletes or even updates (just inserts and
selects)
WAL Configuration: I still don't know. Heroku hosts the database on
Amazon's servers, so maybe that answers the question?
GUC Settings: As per yesterday's discussion, I reduced
random_page_cost to 2. Other than that, it's all default.
Bonus question: If that was too simple, here's something even more
complex I'd like to do: I have another table that inherits feedback
called "comments". Ideally, rather than an "emotion" column coming
out, I would like to have a "feedback_type" column that would be
either the value in the emotion column of the emotions table, or
"comment" if it's from the comments table. I'm thinking I'm going to
have to simply put that together on the client, but if I can do that
in a single query (that doesn't take an hour to run) that would be
super cool. But that's definitely secondary compared to getting the
above query to run faster.
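In case it clarifies what I mean, here is a rough (untested) sketch of the kind of single query I'm imagining; it assumes the comments table carries the same inherited feedback_id and moment_id columns as emotions:

SELECT relname, feedback_type, COUNT(feedback_id) FROM pg_class, moments
JOIN (SELECT feedback_id, moment_id, emotion AS feedback_type FROM emotions
      UNION ALL
      SELECT feedback_id, moment_id, 'comment' FROM comments  -- tag comment rows with a literal
     ) AS typed_feedback USING (moment_id)
WHERE moments.inserted > 'today' AND moments.tableoid = pg_class.oid
GROUP BY relname, feedback_type ORDER BY relname, feedback_type;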
Thank you very much for any help!
-Alessandro Gagliardi
I changed the query a bit so the results would not change over the course of the day, to:

SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;

Adding the indices means that I am now doing index scans instead of seq scans but it doesn't seem to help with speed. Here are the new EXPLAIN ANALYZE results:

"Sort (cost=174432.85..174433.58 rows=1460 width=94) (actual time=73440.079..73440.088 rows=25 loops=1)"
"  Sort Key: pg_class.relname, emotions.emotion"
"  Sort Method: quicksort Memory: 20kB"
"  ->  HashAggregate (cost=174412.39..174417.50 rows=1460 width=94) (actual time=73437.905..73437.940 rows=25 loops=1)"
"        ->  Merge Join (cost=27888.98..172032.86 rows=1586355 width=94) (actual time=65563.027..72763.848 rows=245917 loops=1)"
"              Merge Cond: (emotions.moment_id = public.moments.moment_id)"
"              ->  Index Scan using emotions_moment_id_idx on emotions (cost=0.00..135759.78 rows=4077358 width=55) (actual time=1.283..43894.799 rows=3841095 loops=1)"
"              ->  Sort (cost=27888.98..28083.07 rows=388184 width=89) (actual time=16556.348..17384.537 rows=521025 loops=1)"
"                    Sort Key: public.moments.moment_id"
"                    Sort Method: quicksort Memory: 60865kB"
"                    ->  Hash Join (cost=9.90..20681.81 rows=388184 width=89) (actual time=2.612..4309.131 rows=396594 loops=1)"
"                          Hash Cond: (public.moments.tableoid = pg_class.oid)"
"                          ->  Append (cost=0.00..19216.22 rows=388184 width=29) (actual time=2.066..2851.885 rows=396594 loops=1)"
"                                ->  Seq Scan on moments (cost=0.00..0.00 rows=1 width=104) (actual time=0.002..0.002 rows=0 loops=1)"
"                                      Filter: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using thoughts_inserted_idx on thoughts moments (cost=0.00..6146.96 rows=136903 width=29) (actual time=2.063..606.584 rows=130884 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using photos_inserted_idx on photos moments (cost=0.00..4975.46 rows=109900 width=29) (actual time=1.542..836.063 rows=128286 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using music_inserted_idx on music moments (cost=0.00..3102.69 rows=40775 width=29) (actual time=0.756..308.031 rows=41176 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using people_inserted_idx on people moments (cost=0.00..4.07 rows=1 width=29) (actual time=0.015..0.015 rows=0 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using places_inserted_idx on places moments (cost=0.00..4125.65 rows=96348 width=29) (actual time=0.066..263.853 rows=92756 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Bitmap Heap Scan on videos moments (cost=29.56..835.20 rows=3660 width=29) (actual time=3.122..87.889 rows=3492 loops=1)"
"                                      Recheck Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                      ->  Bitmap Index Scan on videos_inserted_idx (cost=0.00..29.37 rows=3660 width=0) (actual time=0.696..0.696 rows=3492 loops=1)"
"                                            Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Seq Scan on facebook_people moments (cost=0.00..1.04 rows=1 width=104) (actual time=0.040..0.040 rows=0 loops=1)"
"                                      Filter: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using address_people_inserted_idx on address_people moments (cost=0.00..4.06 rows=1 width=29) (actual time=0.017..0.017 rows=0 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using path_people_inserted_idx on path_people moments (cost=0.00..17.03 rows=593 width=29) (actual time=1.758..1.758 rows=0 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using sleep_inserted_idx on sleep moments (cost=0.00..4.06 rows=1 width=29) (actual time=0.012..0.012 rows=0 loops=1)"
"                                      Index Cond: ((inserted >= '2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <= '2012-01-31 00:00:00'::timestamp without time zone))"
"                          ->  Hash (cost=8.88..8.88 rows=292 width=68) (actual time=0.520..0.520 rows=334 loops=1)"
"                                Buckets: 1024 Batches: 1 Memory Usage: 29kB"
"                                ->  Seq Scan on pg_class (cost=0.00..8.88 rows=292 width=68) (actual time=0.007..0.257 rows=334 loops=1)"
"Total runtime: 73511.072 ms"

Please let me know if there is any way to make this more efficient.

Thank you,
-Alessandro
I just got a pointer on presenting EXPLAIN ANALYZE in a more human-friendly fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S
From this it looks like the bottleneck happens when Postgres does an Index Scan using emotions_moment_id_idx on emotions before filtering on moments.inserted, so I thought I'd try filtering on emotions.inserted instead, but that only made it worse. At the same time, I noticed that "FROM pg_class, moments WHERE moments.tableoid = pg_class.oid" tends to run a bit faster than "FROM pg_class JOIN moments ON moments.tableoid = pg_class.oid". So I tried:
SELECT relname, emotion, COUNT(feedback_id)
FROM pg_class, moments, emotions
WHERE moments.tableoid = pg_class.oid
AND emotions.inserted > 'yesterday'
AND moments.inserted BETWEEN 'yesterday' AND 'today'
AND emotions.moment_id = moments.moment_id
GROUP BY relname, emotion
ORDER BY relname, emotion;
That was a bit faster, but still very slow. Here's the EXPLAIN: http://explain.depesz.com/s/ZdF
On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi <alessandro@path.com> wrote:
I changed the query a bit so the results would not change over the course of the day, to:
SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;
Final update on this thread: since it is only necessary for me to get a rough ratio of the distribution (and not the absolute count), I refactored the query to include a subquery that samples from the moments table, thus:

SELECT moment_id, block_id FROM moments WHERE inserted BETWEEN 'yesterday' AND 'today' ORDER BY RANDOM() LIMIT 10000;

I also took advantage of another table called blocks that happens to contain the moment_type as well (thus making it so I don't need to reference pg_class). The final query looks like:
SELECT moment_type, emotion, COUNT(feedback_id)
FROM (SELECT moment_id, block_id
FROM moments
WHERE inserted BETWEEN 'yesterday' AND 'today'
ORDER BY RANDOM() LIMIT 10000) AS sample_moments
JOIN blocks USING (block_id)
JOIN emotions USING (moment_id)
GROUP BY moment_type, emotion
ORDER BY moment_type, emotion
The explain is at http://explain.depesz.com/s/lYh
Interestingly, increasing the limit does not seem to increase the runtime in a linear fashion. When I run it with a limit of 60000 I get a runtime of 14991 ms. But if I run it with a limit of 70000 I get a runtime of 77744 ms. I assume that that's because I'm hitting a memory limit and paging out. Is that right?
On Wed, Feb 1, 2012 at 11:19 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
> I assume that that's because I'm hitting a memory limit and paging out. Is that right?

Hard to say. More likely your query plan changes at that point. Run the queries with "explain analyze" in front of them to find out.
LIMIT 65536; Total query runtime: 14846 ms. - http://explain.depesz.com/s/I3E
LIMIT 69632: Total query runtime: 80141 ms. - http://explain.depesz.com/s/9hp
So it looks like when the limit crosses a certain threshold (somewhere north of 2^16), Postgres decides to do a Seq Scan instead of an Index Scan.
I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5? Actually 60K should be plenty for my purposes anyway.
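(If I do try that, I assume I can just override it for my session and re-run the LIMIT-70000 version to see whether the plan flips back to an index scan, along these lines:)

SET random_page_cost = 1.5;  -- session-only; reverts when I disconnect
EXPLAIN ANALYZE
SELECT moment_type, emotion, COUNT(feedback_id)
FROM (SELECT moment_id, block_id
      FROM moments
      WHERE inserted BETWEEN 'yesterday' AND 'today'
      ORDER BY RANDOM() LIMIT 70000) AS sample_moments
JOIN blocks USING (block_id)
JOIN emotions USING (moment_id)
GROUP BY moment_type, emotion
ORDER BY moment_type, emotion;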
On Wed, Feb 1, 2012 at 10:35 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Hard to say. More likely your query plan changes at that point. Run the queries with "explain analyze" in front of them to find out.
On Wed, Feb 1, 2012 at 11:48 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
> I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5?

It's important to set random_page_cost according to more than just one query, but yeah, at this point it's likely a good idea to set it closer to 1.0. You're on Heroku, right? If so, something closer to 1.0 is likely called for: 1.2 to 1.4 or so. If you've got other queries you can test the change on, all the better.
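If you want the new setting to stick once you've found a value you like, something along these lines should work (the database name below is a placeholder; a plain SET lets you experiment per-session first):

SET random_page_cost = 1.3;  -- current session only, good for comparing plans
ALTER DATABASE my_app_db SET random_page_cost = 1.3;  -- placeholder db name; applies to new connections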
Possibly. What does
psql > show work_mem;
say?
Bob Lunney
On Wed, Feb 1, 2012 at 11:04 AM, Bob Lunney <bob_lunney@yahoo.com> wrote:
> Possibly. What does
> psql > show work_mem;
> say?

100MB
On Wed, Feb 1, 2012 at 12:48 PM, Alessandro Gagliardi <alessandro@path.com> wrote:
> So it looks like when the limit crosses a certain threshold (somewhere north of 2^16), Postgres decides to do a Seq Scan instead of an Index Scan.

Also, is effective_cache_size set to a reasonable value?

merlin
On Thu, Feb 2, 2012 at 6:52 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Also, is effective_cache_size set to a reasonable value?

Yeah, it's 1530000kB.