Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8F837C1@BPXM15GP.gisp.nec.co.jp |
In response to | Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) (Haribabu Kommi <kommi.haribabu@gmail.com>) |
Responses | Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) |
List | pgsql-hackers |
Thanks for your review.

According to the discussion in the Custom-Scan API thread, I moved all the supplemental facilities (like bms_to/from_string) into the main patch, so you no longer need to apply the ctidscan and postgres_fdw patches for testing. (I'll submit the revised one later.)

> 1. memcpy(dest, tuple, HEAPTUPLESIZE);
>    + memcpy((char *)dest + HEAPTUPLESIZE,
>    +        tuple->t_data, tuple->t_len);
>
> For a normal tuple these two addresses are different, but in the case of
> ccache it is contiguous memory.
> Better to write a comment that even if the memory is contiguous, the two
> structures are still treated as distinct.

OK, I put a source code comment as follows (a minimal sketch of the whole copy logic appears after these answers):

    /*
     * Even though we put the body of HeapTupleHeaderData just after
     * HeapTupleData, usually, there is no guarantee that both data
     * structures are located at contiguous memory addresses.
     * So, we explicitly adjust tuple->t_data to point to the area just
     * behind itself, to reference the HeapTuple on the columnar-cache
     * like regular ones.
     */

> 2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
>
> t_len is already maxaligned, so there is no problem in using it again. The
> required length calculation differs from function to function.
> For example, in the part of the same function below, the same t_len is used
> directly. It doesn't cause any problem, but it may cause some confusion.

Once I tried to trust that t_len is well aligned; however, an Assert() macro told me that is not a right assumption. See heap_compute_data_size(): it computes the length of the tuple body and adjusts alignment according to the "attalign" value of pg_attribute, which is not usually the same as sizeof(Datum).

> 3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
>    + if (pchunk != NULL && pchunk != cchunk)
>    +     ccache_merge_chunk(ccache, pchunk);
>    + pchunk = cchunk;
>
> The merge_chunk is called only when the heap tuples are spread across two
> cache chunks. Actually, one cache chunk can accommodate one or more heap
> pages. It needs some other way of handling.

I adjusted the logic to merge the chunks as follows:

Once a tuple is vacuumed from a chunk, it also checks whether the chunk can be merged with its child leafs. A chunk has up to two child leafs; the left one has a smaller ctid than the parent, and the right one has a greater ctid. It means that a chunk without a right child in the left sub-tree, or a chunk without a left child in the right sub-tree, is a neighbor of the chunk being vacuumed. In addition, if the vacuumed chunk does not have either (or both) of its children, it can be merged with its parent node. I modified ccache_vacuum_tuple() to merge chunks during the t-tree walk-down, if the vacuumed chunk has enough free space. (A hedged sketch of this merge test is at the end of this message.)

> 4. for (i=0; i < 20; i++)
>
> Better to replace this magic number with a meaningful macro.

On reflection, there is no good reason why we should construct multiple ccache_entries at once, so I adjusted the code to create a new ccache_entry on demand, to track a columnar-cache being acquired.

> 5. "columner" is present in the sgml file. Correct it.

Sorry, fixed it.

> 6. The "max_cached_attnum" value is documented as 128 by default, but in
> the code it is set to 256.

Sorry, fixed it.
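For reference, here is a minimal, self-contained sketch of the copy logic from items (1) and (2) above. It is illustrative only: ccache_alloc() is a hypothetical allocator standing in for the columnar-cache chunk allocator, and copy_tuple_into_cache() is not the actual function name in the patch.

    #include "postgres.h"
    #include "access/htup_details.h"

    extern void *ccache_alloc(Size size);       /* hypothetical allocator */

    static HeapTuple
    copy_tuple_into_cache(HeapTuple tuple)
    {
        /* MAXALIGN is needed: t_len itself is not guaranteed to be aligned */
        uint32      required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
        HeapTuple   dest = (HeapTuple) ccache_alloc(required);

        /* copy the HeapTupleData header, then the tuple body behind it */
        memcpy(dest, tuple, HEAPTUPLESIZE);
        memcpy((char *) dest + HEAPTUPLESIZE, tuple->t_data, tuple->t_len);

        /*
         * Even though the body is placed just behind HeapTupleData here,
         * contiguity is not guaranteed in general, so point t_data at the
         * copied body explicitly.
         */
        dest->t_data = (HeapTupleHeader) ((char *) dest + HEAPTUPLESIZE);

        return dest;
    }

The MAXALIGN() on t_len matters because, as noted above, heap_compute_data_size() aligns per attribute according to attalign, so the stored t_len is not necessarily MAXALIGN'ed.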
Also, I ran a benchmark of cache_scan in the case where this module performs most effectively. The table t1 is declared as follows:

    create table t1 (a int, b float, c float, d text, e date, f char(200));

Its width is almost 256 bytes/record and it contains 4 million records, so the total table size is almost 1GB.

* 1st trial - it takes longer than a sequential scan because of the columnar-cache construction:

postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=200791.62..200791.64 rows=1 width=0) (actual time=63105.036..63105.037 rows=1 loops=1)
   ->  Custom Scan (cache scan) on t1  (cost=0.00..200741.62 rows=20000 width=0) (actual time=7.397..62832.728 rows=400000 loops=1)
         Filter: ((a % 10) = 5)
         Rows Removed by Filter: 3600000
 Planning time: 214.506 ms
 Total runtime: 64629.296 ms
(6 rows)

* 2nd trial - it is much faster than a sequential scan because of no disk access:

postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=67457.53..67457.54 rows=1 width=0) (actual time=7833.313..7833.313 rows=1 loops=1)
   ->  Custom Scan (cache scan) on t1  (cost=0.00..67407.53 rows=20000 width=0) (actual time=0.154..7615.914 rows=400000 loops=1)
         Filter: ((a % 10) = 5)
         Rows Removed by Filter: 3600000
 Planning time: 1.019 ms
 Total runtime: 7833.761 ms
(6 rows)

* 3rd trial - turn off cache_scan, so the planner chooses the built-in SeqScan:

postgres=# set cache_scan.enabled = off;
SET
postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=208199.08..208199.09 rows=1 width=0) (actual time=59700.810..59700.810 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..208149.08 rows=20000 width=0) (actual time=715.489..59518.095 rows=400000 loops=1)
         Filter: ((a % 10) = 5)
         Rows Removed by Filter: 3600000
 Planning time: 0.630 ms
 Total runtime: 59701.104 ms
(6 rows)

The reason for such an extreme result is that I adjusted the system page cache usage to constrain disk-cache hits at the operating-system level, so the sequential scan is dominated by disk access performance in this case. On the other hand, the columnar cache was able to hold the whole set of records, because it omits caching of unreferenced columns.

* GUCs
shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 400

[kaigai@iwashi backend]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7986       7839        146          0          2        572
-/+ buffers/cache:       7265        721
Swap:         8079        265       7814

Please don't throw me stones. :-) The primary purpose of this extension is to demonstrate the usage of the custom-scan interface and heap_page_prune_hook().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> -----Original Message-----
> From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> Sent: Monday, February 24, 2014 12:42 PM
> To: Kohei KaiGai
> Cc: Kaigai, Kouhei (海外, 浩平); Tom Lane; PgHacker; Robert Haas
> Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
> table scan?)
>
> On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > Hello,
> >
> > The attached patch is a revised one for the cache-only scan module
> > on top of the custom-scan interface. Please check it.
>
> Thanks for the revised patch. Please find some minor comments:
>
> 1. memcpy(dest, tuple, HEAPTUPLESIZE);
>    + memcpy((char *)dest + HEAPTUPLESIZE,
>    +        tuple->t_data, tuple->t_len);
>
> For a normal tuple these two addresses are different, but in the case of
> ccache it is contiguous memory.
> Better to write a comment that even if the memory is contiguous, the two
> structures are still treated as distinct.
>
> 2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
>
> t_len is already maxaligned, so there is no problem in using it again. The
> required length calculation differs from function to function.
> For example, in the part of the same function below, the same t_len is used
> directly. It doesn't cause any problem, but it may cause some confusion.
>
> 3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
>    + if (pchunk != NULL && pchunk != cchunk)
>    +     ccache_merge_chunk(ccache, pchunk);
>    + pchunk = cchunk;
>
> The merge_chunk is called only when the heap tuples are spread across two
> cache chunks. Actually, one cache chunk can accommodate one or more heap
> pages. It needs some other way of handling.
>
> 4. for (i=0; i < 20; i++)
>
> Better to replace this magic number with a meaningful macro.
>
> 5. "columner" is present in the sgml file. Correct it.
>
> 6. The "max_cached_attnum" value is documented as 128 by default, but in
> the code it is set to 256.
>
> I will start regression and performance tests. I will inform you once I
> finish.
>
> Regards,
> Hari Babu
>
> Fujitsu Australia
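To illustrate the merge rule described in the answer to item (3) above, here is a hedged sketch of the neighbor test performed during the T-tree walk-down. The ccache_chunk layout and the helper ccache_merge_chunk_sketch() are assumptions made for this example, not the actual cache_scan definitions.

    #include "postgres.h"

    typedef struct ccache_chunk
    {
        struct ccache_chunk *left;   /* subtree with smaller ctids */
        struct ccache_chunk *right;  /* subtree with greater ctids */
        uint32      usage;           /* bytes currently in use */
        uint32      limit;           /* capacity of this chunk */
    } ccache_chunk;

    /* hypothetical helper: move src's tuples into dst and unlink src */
    extern void ccache_merge_chunk_sketch(ccache_chunk *dst,
                                          ccache_chunk *src);

    /*
     * After vacuuming tuples from a chunk, try to merge it with a
     * neighboring chunk: a left child with no right subtree holds the
     * adjacent smaller ctid range, and a right child with no left
     * subtree holds the adjacent greater one.  Merge only if the
     * combined usage fits in this chunk.
     */
    static void
    try_merge_after_vacuum(ccache_chunk *chunk)
    {
        if (chunk->left != NULL && chunk->left->right == NULL &&
            chunk->usage + chunk->left->usage <= chunk->limit)
            ccache_merge_chunk_sketch(chunk, chunk->left);

        if (chunk->right != NULL && chunk->right->left == NULL &&
            chunk->usage + chunk->right->usage <= chunk->limit)
            ccache_merge_chunk_sketch(chunk, chunk->right);
    }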