Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8F8382D@BPXM15GP.gisp.nec.co.jp |
In response to | Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses | Re: contrib/cache_scan (Re: What's needed for cache-only table scan?) |
List | pgsql-hackers |
Sorry, the previous one still has "columner" in the sgml files.
Please see the attached one, instead.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Tuesday, March 04, 2014 12:35 PM
> To: Haribabu Kommi; Kohei KaiGai
> Cc: Tom Lane; PgHacker; Robert Haas
> Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
> table scan?)
>
> Thanks for your review.
>
> According to the discussion in the Custom-Scan API thread, I moved all the
> supplemental facilities (like bms_to/from_string) into the main patch, so
> you no longer need to apply the ctidscan and postgres_fdw patches for
> testing. (I'll submit the revised one later.)
>
> > 1. memcpy(dest, tuple, HEAPTUPLESIZE);
> > +  memcpy((char *)dest + HEAPTUPLESIZE,
> > +         tuple->t_data, tuple->t_len);
> >
> > For a normal tuple these two addresses are different, but in the case of
> > ccache it is continuous memory. Better to write a comment noting that
> > even though it is continuous memory, it is treated as two separate
> > structures.
> >
> OK, I put a source code comment as follows:
>
>     /*
>      * Even though we put the body of HeapTupleHeaderData just after
>      * HeapTupleData, usually there is no guarantee that both data
>      * structures are located at continuous memory addresses.
>      * So, we explicitly adjust tuple->t_data to point at the area just
>      * behind itself, to reference HeapTuples on the columnar cache
>      * the same way as regular ones.
>      */
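>
> For reference, the whole copy then looks roughly like the sketch below.
> This is only an illustration of the layout; the helper name is made up,
> and it uses palloc() for brevity rather than the cache's own allocator:
>
>     /* Copy a HeapTuple so that header and body sit back-to-back. */
>     static HeapTuple
>     copy_tuple_continuous(HeapTuple tuple)
>     {
>         HeapTuple   dest = palloc(HEAPTUPLESIZE + MAXALIGN(tuple->t_len));
>
>         memcpy(dest, tuple, HEAPTUPLESIZE);
>         memcpy((char *) dest + HEAPTUPLESIZE, tuple->t_data, tuple->t_len);
>         /* point t_data at the body placed just behind the header struct */
>         dest->t_data = (HeapTupleHeader) ((char *) dest + HEAPTUPLESIZE);
>
>         return dest;
>     }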
>
> > 2. + uint32  required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
> >
> > t_len is already MAXALIGNed, so there is no problem in applying it again,
> > but the required-length calculation differs from function to function.
> > For example, in the part of the same function below, the same t_len is
> > used directly. It doesn't cause any problem, but it may cause some
> > confusion.
> >
> I once tried to trust that t_len is well aligned; however, an Assert()
> macro told me that is not a right assumption. See heap_compute_data_size():
> it computes the length of the tuple body and adjusts alignment according
> to the "attalign" value in pg_attribute, which is not usually the same as
> sizeof(Datum).
>
> > 3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
> > +   if (pchunk != NULL && pchunk != cchunk)
> > +       ccache_merge_chunk(ccache, pchunk);
> > +   pchunk = cchunk;
> >
> > ccache_merge_chunk() is called only when the heap tuples are spread
> > across two cache chunks. Actually, one cache chunk can accommodate one
> > or more heap pages; it needs some other way of handling.
> >
> I adjusted the logic to merge the chunks as follows:
>
> Once a tuple is vacuumed from a chunk, we also check whether the chunk
> can be merged with its child leafs. A chunk has up to two child leafs;
> the left one has a smaller ctid than the parent, and the right one a
> greater ctid. It means a chunk without a right child in the left sub-tree,
> or a chunk without a left child in the right sub-tree, is a neighbor of
> the chunk being vacuumed. In addition, if the vacuumed chunk does not have
> either (or both) of its children, it can be merged with its parent node.
> I modified ccache_vacuum_tuple() to merge chunks during the T-tree
> walk-down, if the vacuumed chunk has enough free space.
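>
> In rough pseudo-C, the walk-down merge condition reads like the sketch
> below. The chunk structure, is_leaf() and CCACHE_CHUNK_SIZE are only
> illustrative names here, not the actual definitions in the patch:
>
>     /* After vacuuming "chunk", try to absorb an adjacent child leaf. */
>     static void
>     try_merge_children(ccache_head *ccache, ccache_chunk *chunk)
>     {
>         /* a leaf left child covers the ctid range just below this one */
>         if (chunk->left != NULL && is_leaf(chunk->left) &&
>             chunk->usage + chunk->left->usage <= CCACHE_CHUNK_SIZE)
>             ccache_merge_chunk(ccache, chunk->left);
>
>         /* a leaf right child covers the ctid range just above this one */
>         if (chunk->right != NULL && is_leaf(chunk->right) &&
>             chunk->usage + chunk->right->usage <= CCACHE_CHUNK_SIZE)
>             ccache_merge_chunk(ccache, chunk->right);
>     }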
>
> > 4. for (i=0; i < 20; i++)
> >
> > Better to replace this magic number with a meaningful macro.
> >
> I rethought that there is no good reason why we should construct multiple
> ccache_entries at once, so I adjusted the code to create a new ccache_entry
> on demand, to track a columnar cache being acquired.
>
> > 5. "columner" is present in the sgml file. Correct it.
> >
> Sorry, fixed it.
>
> > 6. The document says the default of "max_cached_attnum" is 128, but the
> > code sets it to 256.
> >
> Sorry, fixed it.
>
> Also, I ran a benchmark of cache_scan in the case where this module
> performs most effectively.
>
> The table t1 is declared as follows:
>
>     create table t1 (a int, b float, c float, d text, e date, f char(200));
>
> Its width is almost 256 bytes per record, and it contains 4 million
> records, so the total table size is almost 1GB.
>
> * 1st trial - it takes longer than a sequential scan because of the
>   columnar-cache construction:
>
> postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
>                                  QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=200791.62..200791.64 rows=1 width=0)
>    (actual time=63105.036..63105.037 rows=1 loops=1)
>    ->  Custom Scan (cache scan) on t1
>        (cost=0.00..200741.62 rows=20000 width=0)
>        (actual time=7.397..62832.728 rows=400000 loops=1)
>          Filter: ((a % 10) = 5)
>          Rows Removed by Filter: 3600000
>  Planning time: 214.506 ms
>  Total runtime: 64629.296 ms
> (6 rows)
>
> * 2nd trial - it is much faster than a sequential scan because of no disk
>   access:
>
> postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
>                                  QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=67457.53..67457.54 rows=1 width=0)
>    (actual time=7833.313..7833.313 rows=1 loops=1)
>    ->  Custom Scan (cache scan) on t1
>        (cost=0.00..67407.53 rows=20000 width=0)
>        (actual time=0.154..7615.914 rows=400000 loops=1)
>          Filter: ((a % 10) = 5)
>          Rows Removed by Filter: 3600000
>  Planning time: 1.019 ms
>  Total runtime: 7833.761 ms
> (6 rows)
>
> * 3rd trial - turn off cache_scan, so the planner chooses the built-in
>   SeqScan:
>
> postgres=# set cache_scan.enabled = off;
> SET
> postgres=# explain analyze select count(*) from t1 where a % 10 = 5;
>                                  QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=208199.08..208199.09 rows=1 width=0)
>    (actual time=59700.810..59700.810 rows=1 loops=1)
>    ->  Seq Scan on t1  (cost=0.00..208149.08 rows=20000 width=0)
>        (actual time=715.489..59518.095 rows=400000 loops=1)
>          Filter: ((a % 10) = 5)
>          Rows Removed by Filter: 3600000
>  Planning time: 0.630 ms
>  Total runtime: 59701.104 ms
> (6 rows)
>
> The reason for such an extreme result: I adjusted the system page cache
> usage to constrain disk cache hits at the operating system level, so the
> sequential scan is dominated by disk access performance in this case. On
> the other hand, the columnar cache was able to host all of the records,
> because it omits caching of unreferenced columns.
>
> * GUCs
>   shared_buffers = 512MB
>   shared_preload_libraries = 'cache_scan'
>   cache_scan.num_blocks = 400
>
> [kaigai@iwashi backend]$ free -m
>              total       used       free     shared    buffers     cached
> Mem:          7986       7839        146          0          2        572
> -/+ buffers/cache:       7265        721
> Swap:         8079        265       7814
>
> Please don't throw stones at me. :-)
> The primary purpose of this extension is to demonstrate usage of the
> custom-scan interface and heap_page_prune_hook().
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>
>
> > -----Original Message-----
> > From: Haribabu Kommi [mailto:kommi.haribabu@gmail.com]
> > Sent: Monday, February 24, 2014 12:42 PM
> > To: Kohei KaiGai
> > Cc: Kaigai, Kouhei(海外, 浩平); Tom Lane; PgHacker; Robert Haas
> > Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for
> > cache-only table scan?)
> >
> > On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > > Hello,
> > >
> > > The attached patch is a revised one for the cache-only scan module on
> > > top of the custom-scan interface. Please check it.
> >
> > Thanks for the revised patch. Please find some minor comments.
> >
> > 1. memcpy(dest, tuple, HEAPTUPLESIZE);
> > +  memcpy((char *)dest + HEAPTUPLESIZE,
> > +         tuple->t_data, tuple->t_len);
> >
> > For a normal tuple these two addresses are different, but in the case of
> > ccache it is continuous memory. Better to write a comment noting that
> > even though it is continuous memory, it is treated as two separate
> > structures.
> >
> > 2. + uint32  required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);
> >
> > t_len is already MAXALIGNed, so there is no problem in applying it again,
> > but the required-length calculation differs from function to function.
> > For example, in the part of the same function below, the same t_len is
> > used directly. It doesn't cause any problem, but it may cause some
> > confusion.
> >
> > 3. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, &ctid);
> > +   if (pchunk != NULL && pchunk != cchunk)
> > +       ccache_merge_chunk(ccache, pchunk);
> > +   pchunk = cchunk;
> >
> > ccache_merge_chunk() is called only when the heap tuples are spread
> > across two cache chunks. Actually, one cache chunk can accommodate one
> > or more heap pages; it needs some other way of handling.
> >
> > 4. for (i=0; i < 20; i++)
> >
> > Better to replace this magic number with a meaningful macro.
> >
> > 5. "columner" is present in the sgml file. Correct it.
> >
> > 6. The document says the default of "max_cached_attnum" is 128, but the
> > code sets it to 256.
> >
> > I will start regress and performance tests. I will inform you once I
> > finish.
> >
> > Regards,
> > Hari Babu
> >
> > Fujitsu Australia
Attachment