Thread: query issue
Hi,

I have postgres 10 running on an RDS instance.

I have the query below:

select * from "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
where (("itemType" not in ('WELCOME_POST', 'UPLOAD_CONTACTS', 'BROADCAST_POST'))
  and ("userId" = '5d230d67bd99c5001b1ae757' and "is_deleted" in (true, false)))
order by "score" asc, "updatedAt" desc
limit 10;

The explain plan is given below:

QUERY PLAN
Limit  (cost=11058.03..11058.05 rows=10 width=1304) (actual time=6105.283..6105.293 rows=10 loops=1)
  ->  Sort  (cost=11058.03..11065.36 rows=2935 width=1304) (actual time=6105.281..6105.283 rows=10 loops=1)
        Sort Key: score, "updatedAt" DESC
        Sort Method: top-N heapsort  Memory: 36kB
        ->  Bitmap Heap Scan on "UserFeedItems"  (cost=131.33..10994.60 rows=2935 width=1304) (actual time=26.245..6093.680 rows=3882 loops=1)
              Recheck Cond: ("userId" = '5d230d67bd99c5001b1ae757'::text)
              Filter: ((is_deleted = ANY ('{t,f}'::boolean[])) AND ("itemType" <> ALL ('{WELCOME_POST,UPLOAD_CONTACTS,BROADCAST_POST}'::text[])))
              Rows Removed by Filter: 1
              Heap Blocks: exact=3804
              ->  Bitmap Index Scan on "userId"  (cost=0.00..130.60 rows=2956 width=0) (actual time=24.835..24.836 rows=3885 loops=1)
                    Index Cond: ("userId" = '5d230d67bd99c5001b1ae757'::text)
Planning time: 20.928 ms
Execution time: 6108.610 ms

My table structure is as follows:

CREATE TABLE "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
(
    _id text COLLATE pg_catalog."default" NOT NULL DEFAULT uuid_generate_v4(),
    "userId" text COLLATE pg_catalog."default" NOT NULL,
    "itemType" text COLLATE pg_catalog."default" NOT NULL,
    payload jsonb NOT NULL,
    score numeric NOT NULL,
    "generalFeedItemId" text COLLATE pg_catalog."default",
    "createdAt" timestamp without time zone NOT NULL DEFAULT (now())::timestamp without time zone,
    "createdBy" text COLLATE pg_catalog."default",
    "updatedAt" timestamp without time zone NOT NULL DEFAULT (now())::timestamp without time zone,
    "updatedBy" text COLLATE pg_catalog."default",
    is_deleted boolean DEFAULT false,
    "isRead" boolean NOT NULL DEFAULT false,
    CONSTRAINT "UserFeedItems_pkey" PRIMARY KEY (_id)
)
WITH (OIDS = FALSE)
TABLESPACE pg_default;

CREATE INDEX "UserFeedItems_id"
    ON "op_KFDaBAZDSXc4YYts9"."UserFeedItems" USING btree
    (_id COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

-- Index: UserFeedItems_itemType_userId_isdeleted_score_updatedAt
-- DROP INDEX "op_KFDaBAZDSXc4YYts9"."UserFeedItems_itemType_userId_isdeleted_score_updatedAt";
CREATE INDEX "UserFeedItems_itemType_userId_isdeleted_score_updatedAt"
    ON "op_KFDaBAZDSXc4YYts9"."UserFeedItems" USING btree
    ("itemType" COLLATE pg_catalog."default" ASC NULLS LAST,
     "userId" COLLATE pg_catalog."default" ASC NULLS LAST,
     is_deleted ASC NULLS LAST,
     score ASC NULLS LAST,
     "updatedAt" DESC NULLS FIRST)
    TABLESPACE pg_default;

-- Index: score
-- DROP INDEX "op_KFDaBAZDSXc4YYts9".score;
CREATE INDEX score
    ON "op_KFDaBAZDSXc4YYts9"."UserFeedItems" USING btree
    (score ASC NULLS LAST)
    TABLESPACE pg_default;

-- Index: updatedat
-- DROP INDEX "op_KFDaBAZDSXc4YYts9".updatedat;
CREATE INDEX updatedat
    ON "op_KFDaBAZDSXc4YYts9"."UserFeedItems" USING btree
    ("updatedAt" DESC NULLS FIRST)
    TABLESPACE pg_default;

-- Index: userId
-- DROP INDEX "op_KFDaBAZDSXc4YYts9"."userId";
CREATE INDEX "userId"
    ON "op_KFDaBAZDSXc4YYts9"."UserFeedItems" USING btree
    ("userId" COLLATE pg_catalog."default" text_pattern_ops ASC NULLS LAST)
    TABLESPACE pg_default;

My question is: the first time I run this query it takes around 42 seconds to complete, but after a few minutes it completes in 2-3 seconds.

I tried creating indexes on the columns score and "updatedAt" DESC separately, but found no satisfactory solution.

So please help me understand what exactly I am missing here.

Regards,
Atul
On Tue, 15 Jun 2021 16:12:11 +0530
Atul Kumar <akumar14871@gmail.com> wrote:

> Hi,
>
> I have postgres 10 running on RDS instance.
>
> I have query below:
[...]
>
> So my doubt is initially when I run this query it takes around 42
> seconds to complete but later after few minutes it completes in 2-3
> seconds.
>
> I tried to create indexes on table for columns score & "updatedAt"
> DESC seperately but found no proper satisfied solution.
>
> So please help me telling what I am exactly missing here ?

The worst part of your plan is the Bitmap Heap Scan, where the plan actually
fetches the rows from the table. The bitmap index scan and sort are fast;
there's not much to do about them.

This query needs to fetch 3882 rows from your table. So either the fetching
part of the plan is really, really slow (IO/CPU bound), or the simple filter,
on only ~4k rows, is really slow (CPU bound).

You might want to avoid "SELECT *" and only specify the fields you really need.
Try first with only "SELECT _id", just to compare. You have an average row size
of 1.3kB that the executor needs to fetch and carry all the way to the result
set. This can cost a lot of useless IO and CPU.

You might want to tweak random_page_cost/seq_page_cost/effective_cache_size to
find out if an index scan would do a better job, but I'm unsure because I lack
information about your data and system.

Regards,
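The suggested comparison could look like this (a sketch reusing the schema, predicates, and ordering from the original post; only the select list changes):

```sql
-- Same query, but fetching only the narrow _id column instead of every
-- field of the ~1.3kB-wide rows. If this is much faster than SELECT *,
-- the row-fetching/carrying cost dominates, not the index lookup.
SELECT _id
FROM "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
WHERE "itemType" NOT IN ('WELCOME_POST', 'UPLOAD_CONTACTS', 'BROADCAST_POST')
  AND "userId" = '5d230d67bd99c5001b1ae757'
  AND is_deleted IN (true, false)
ORDER BY score ASC, "updatedAt" DESC
LIMIT 10;
```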
hi,

I have an RDS instance with 2GB of RAM, 1 CPU, instance class - t2.small.

If you need any more info please let me know.

And as you suggested, I need to tweak
random_page_cost/seq_page_cost/effective_cache_size. So please suggest which
parameter values I need to increase or decrease, as I am not well versed with
these parameters.

Regards.

On 6/15/21, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> On Tue, 15 Jun 2021 16:12:11 +0530
> Atul Kumar <akumar14871@gmail.com> wrote:
>
>> Hi,
>>
>> I have postgres 10 running on RDS instance.
>>
>> I have query below:
> [...]
>>
>> So my doubt is initially when I run this query it takes around 42
>> seconds to complete but later after few minutes it completes in 2-3
>> seconds.
>>
>> I tried to create indexes on table for columns score & "updatedAt"
>> DESC seperately but found no proper satisfied solution.
>>
>> So please help me telling what I am exactly missing here ?
>
> The worst part of your plan is the Bitmap Heap Scan, where the plan is
> actually fetching the rows from the table. The bitmap index scan and
> sort are fast. There's not much to do about them.
>
> This query need to fetch 3882 rows from your table. So either the
> fetching part of the plan is really, really slow (IO/CPU bound), or the
> simple filter, on only ~4k, is really slow (CPU bound).
>
> You might want to avoid "SELECT *" and only specify the fields you
> really need. Try first with only "SELECT _id", just to compare. You have
> an average row size of 1.3k that the executor need to fetch and carry
> all the way to the result set. This can cost a lot of useless IO and CPU.
>
> You might want to tweak
> random_page_cost/seq_page_cost/effective_cache_size to find out if an
> index scan would do a better job, but I'm unsure because I lack of
> informations about your data and system.
>
> Regards,
>
On Tue, 15 Jun 2021 19:16:41 +0530
Atul Kumar <akumar14871@gmail.com> wrote:

> hi,
>
> I have an RDS instance with 2GB of RAM, 1 CPU, instance class - t2.small.
>
> If you need any more info please let me know.
>
> and as you shared I need to tweak
> random_page_cost/seq_page_cost/effective_cache_size So please suggest
> which parameter value I need to increase or decrease as I am known
> well with these parameters.

First, did you test with "SELECT _id" instead of "SELECT *"?

About random_page_cost/effective_cache_size, the fine manual already gives
some explanations and tips:
https://www.postgresql.org/docs/current/runtime-config-query.html

With such a small setup, I'm not sure what you can expect though. What is the
concurrency? How many rows in total? The table size?
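These cost parameters can be tried per session before changing anything globally. A hypothetical experiment, with illustrative values only (1.1 is a common random_page_cost for SSD-backed storage; the effective_cache_size guess is based on the 2GB instance):

```sql
-- Session-only experiment; RESET afterwards so nothing persists.
SET random_page_cost = 1.1;        -- default is 4.0; lower favours index scans
SET effective_cache_size = '1GB';  -- rough share of the 2GB RAM acting as cache

-- Re-run the plan to see whether the planner now chooses an index scan
-- instead of the bitmap heap scan, and whether it is actually faster.
EXPLAIN (ANALYZE, BUFFERS)
SELECT _id
FROM "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
WHERE "itemType" NOT IN ('WELCOME_POST', 'UPLOAD_CONTACTS', 'BROADCAST_POST')
  AND "userId" = '5d230d67bd99c5001b1ae757'
ORDER BY score ASC, "updatedAt" DESC
LIMIT 10;

RESET random_page_cost;
RESET effective_cache_size;
```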
Hi,

Please find below the details you asked for:

Relation size: 1986 MB
Row count: 1407721

We have removed a few indexes. The query plan is now:

QUERY PLAN
Limit  (cost=0.43..5529.03 rows=10 width=37) (actual time=0.974..12911.087 rows=10 loops=1)
  Output: items._id
  Buffers: shared hit=4838 read=3701
  ->  Subquery Scan on items  (cost=0.43..1622646.30 rows=2935 width=37) (actual time=0.972..12911.078 rows=10 loops=1)
        Output: items._id
        Buffers: shared hit=4838 read=3701
        ->  Index Scan using sort on "op_KFDaBAZDSXc4YYts9"."UserFeedItems"  (cost=0.43..1622616.95 rows=2935 width=50) (actual time=0.970..12911.070 rows=10 loops=1)
              Output: "UserFeedItems"._id, "UserFeedItems".score, "UserFeedItems"."updatedAt"
              Filter: (("UserFeedItems".is_deleted = ANY ('{t,f}'::boolean[])) AND ("UserFeedItems"."userId" = '5d230d67bd99c5001b1ae757'::text) AND ("UserFeedItems"."itemType" <> ALL ('{WELCOME_POST,UPLOAD_CONTACTS,BROADCAST_POST}'::text[])))
              Rows Removed by Filter: 15478
              Buffers: shared hit=4838 read=3701
Planning time: 100.949 ms
Execution time: 12930.302 ms

Please share your suggestions.
Regards,
Atul

On 6/15/21, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> On Tue, 15 Jun 2021 19:16:41 +0530
> Atul Kumar <akumar14871@gmail.com> wrote:
>
>> hi,
>>
>> I have an RDS instance with 2GB of RAM, 1 CPU, instance class - t2.small.
>>
>> If you need any more info please let me know.
>>
>> and as you shared I need to tweak
>> random_page_cost/seq_page_cost/effective_cache_size So please suggest
>> which parameter value I need to increase or decrease as I am known
>> well with these parameters.
>
> First, did you test with "SELECT _id" instead of "SELECT *" ?
>
> About rand_page_costs/effective_cache_size, the fine manual already give
> some explanations and tips:
> https://www.postgresql.org/docs/current/runtime-config-query.html
>
> With such a low setup, I'm not sure what you can expect though. What is
> the concurrency? How many lines in total? The table size?
>
On Wed, 16 Jun 2021 at 18:29, Atul Kumar <akumar14871@gmail.com> wrote:
> QUERY PLAN
> Limit  (cost=0.43..5529.03 rows=10 width=37) (actual
> time=0.974..12911.087 rows=10 loops=1)
>   Output: items._id
>   Buffers: shared hit=4838 read=3701
>   ->  Subquery Scan on items  (cost=0.43..1622646.30 rows=2935
> width=37) (actual time=0.972..12911.078 rows=10 loops=1)
>         Output: items._id
>         Buffers: shared hit=4838 read=3701
>         ->  Index Scan using sort on
> "op_KFDaBAZDSXc4YYts9"."UserFeedItems"  (cost=0.43..1622616.95
> rows=2935 width=50) (actual time=0.970..12911.070 rows=10 loops=1)
>               Output: "UserFeedItems"._id, "UserFeedItems".score,
> "UserFeedItems"."updatedAt"
>               Filter: (("UserFeedItems".is_deleted = ANY
> ('{t,f}'::boolean[])) AND ("UserFeedItems"."userId" =
> '5d230d67bd99c5001b1ae757'::text) AND ("UserFeedItems"."itemType" <>
> ALL ('{WELCOME_POST,UPLOAD_CONTACTS,BROADCAST_POST}'::text[])))
>               Rows Removed by Filter: 15478
>               Buffers: shared hit=4838 read=3701
> Planning time: 100.949 ms
> Execution time: 12930.302 ms

It seems to me that this system is pretty slow on I/O. I imagine if you do:

SET track_io_timing = ON;

then run EXPLAIN (ANALYZE, BUFFERS) on the query, you'll see that most of the
time is spent doing I/O.

If you're unable to make I/O faster then you might want to upgrade to a
machine that's more likely to be able to keep the working set of your
database in memory.

From looking at your earlier queries:

> ->  Bitmap Heap Scan on "UserFeedItems"
> (cost=131.33..10994.60 rows=2935 width=1304) (actual
> time=26.245..6093.680 rows=3882 loops=1)
>       Recheck Cond: ("userId" = '5d230d67bd99c5001b1ae757'::text)
>       Filter: ((is_deleted = ANY ('{t,f}'::boolean[])) AND
> ("itemType" <> ALL
> ('{WELCOME_POST,UPLOAD_CONTACTS,BROADCAST_POST}'::text[])))
>       Rows Removed by Filter: 1
>       Heap Blocks: exact=3804

The bitmap index scan matched 3804 pages and there are only 3882 rows.
That's an average of just over 1 tuple per page.
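Concretely, the I/O check described above might be run like this (track_io_timing can be set per session, no restart needed; the query is the one from the original post):

```sql
-- With track_io_timing on, EXPLAIN (ANALYZE, BUFFERS) adds lines such as
-- "I/O Timings: read=..." under the plan nodes, showing how much of the
-- runtime was spent waiting on block reads.
SET track_io_timing = ON;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
WHERE "itemType" NOT IN ('WELCOME_POST', 'UPLOAD_CONTACTS', 'BROADCAST_POST')
  AND "userId" = '5d230d67bd99c5001b1ae757'
  AND is_deleted IN (true, false)
ORDER BY score ASC, "updatedAt" DESC
LIMIT 10;
```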
If the table was clustered by that index then that might help speed up your
query, but since cluster order is not maintained, it'll likely drift out of
order over time and the query will just become slow again.

If you had been using at least PostgreSQL 12 then you could have looked into
using partitioning. Partitioning does exist before 12, but it became much
better in that version. Partitioning might help you here if you partitioned
by HASH ("userId"), as you might average a few more matched rows per page in
the bitmap scan and reduce the number of pages that need to be read from
disk. I'm not really sure how many partitions you'd have to make to get a
meaningful improvement there though. That'll depend on how many users there
are and how big the rows are.

There are also some pretty bad design choices in your table. You seem to
have lots of IDs which are TEXT fields. It's fairly normal practice in
databases not to do that, and to use something like INT or BIGINT instead.
Using TEXT is pretty bad for a few reasons: 1) it makes your data bigger and
reduces cache hit ratios; 2) variable-length fields at the start of a table
are not so great, as tuple deforming becomes slower because there is no fixed
offset into columns that come after a variable-length field.

Anyway, there's quite a lot you could do here to make this query run faster.

David
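The hash-partitioning idea above could be sketched as follows. This is a hypothetical illustration for PostgreSQL 12+, not something from the thread: the partition count of 8 and the abbreviated column list are placeholders, and PostgreSQL requires any primary key on a partitioned table to include the partition key, hence (_id, "userId").

```sql
-- Partitioned parent: rows for a given "userId" always hash to the same
-- partition, so each user's rows are packed into fewer heap pages.
CREATE TABLE "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
(
    _id         text    NOT NULL DEFAULT uuid_generate_v4(),
    "userId"    text    NOT NULL,
    score       numeric NOT NULL,
    "updatedAt" timestamp without time zone NOT NULL DEFAULT now(),
    -- ... remaining columns as in the original table definition ...
    PRIMARY KEY (_id, "userId")   -- partition key must be part of the PK
) PARTITION BY HASH ("userId");

-- One child table per hash remainder.
CREATE TABLE "op_KFDaBAZDSXc4YYts9"."UserFeedItems_p0"
    PARTITION OF "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE "op_KFDaBAZDSXc4YYts9"."UserFeedItems_p1"
    PARTITION OF "op_KFDaBAZDSXc4YYts9"."UserFeedItems"
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... and so on for remainders 2 through 7 ...
```

The query itself would not change; the planner prunes to the single partition matching the "userId" hash, and the bitmap heap scan touches proportionally fewer pages.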