|
|||
|
I've decided I want to become more proficient in C, so the project is
chose is a ruby memcache C extension. I found the following site with a hash algorithm that seems to be an improvement on the default on, can anyone think of a reason why not to use this? http://www.last.fm/user/RJ/journal/2007/04/10/392555/ Also, since the extension isn't going to be thread safe (I'd have to create a new native thread for each instance of the class I think if I wanted to do that), I was thinking about the best way to structure it would be to use global variables for the C struct's that can only be created once. Then in the initialize function I can make sure to only call the C function mc_new() once. For example: struct memcache *mc = NULL; static VALUE class_initialize(VALUE self) { if (mc == NULL) rb_raise(); mc = mc_new(); return self; } Also, I have a particular need for caching database queries, where buffer the returned rows and then send the whole thing to memcache. Would it be more efficient to implement a string buffer in C rather then use an array or concating strings in ruby? I'm thinking about adding this as an additional feature, where you create a buffer, fill it up, then when you want to do a memcache add/set, you tell it to add/set the contents of the buffer. Thoughts/feedback appreciated. Chris |
|
|
||||
|
||||
|
|
|
|||
|
snacktime wrote the following on 17.07.2007 21:45 :
> I've decided I want to become more proficient in C, so the project is > chose is a ruby memcache C extension. I found the following site with > a hash algorithm that seems to be an improvement on the default on, > can anyone think of a reason why not to use this? > > http://www.last.fm/user/RJ/journal/2007/04/10/392555/ - very poor distribution with few servers, - no control of weight between servers (ie if I have a memcache server with 2GB and one with 128MB, nothing prevents the second from getting 10 times as much access as the first one...). This algorithm benefits very large server farms with identical server configurations where having poor performance over a small part of the cache isn't a big deal. If you want both good controlled distribution and high cache hits with low number of servers, on first thought I'll implement a double query system where cache clients are responsible for moving data accross servers based on an old configuration and a new one. When the hit rate ratio on the new configuration is good and bad on the old one, deactivate queries on the old configuration. Assuming 2 gets and a put to memcache servers are faster than direct access to the data, this is probably a win. With that kind of system you could even handle server failures automatically by creating new configurations on the fly (and using hit ratio stats from the two previous configuration to decide which one is the better new 'old one'). But then again it's only a first thought :-) Lionel. |
|
|||
|
On 7/17/07, snacktime <snacktime@gmail.com> wrote:
> I've decided I want to become more proficient in C, so the project is > chose is a ruby memcache C extension. I found the following site with > a hash algorithm that seems to be an improvement on the default on, > can anyone think of a reason why not to use this? > > http://www.last.fm/user/RJ/journal/2007/04/10/392555/ For what it's worth, there is a pure Ruby implementation of this idea. It's the MemCacheWithConsistentHashing gem. Info can be found in this thread: http://groups.google.com/group/acts_...1d22ce0fc9ccec -- Chris Wanstrath http://errfree.com // http://errtheblog.com |
|
|||
|
On Jul 17, 2007, at 12:45, snacktime wrote:
> I've decided I want to become more proficient in C, so the project is > chose is a ruby memcache C extension. I found the following site with > a hash algorithm that seems to be an improvement on the default on, > can anyone think of a reason why not to use this? > > http://www.last.fm/user/RJ/journal/2007/04/10/392555/ A consistent hashing library is completely orthogonal from a memcached library. You'd be better off wrapping apr_memcache or libmemcache than writing yet another C memcached client. > Also, since the extension isn't going to be thread safe (I'd have to > create a new native thread for each instance of the class I think if I > wanted to do that), I was thinking about the best way to structure it > would be to use global variables for the C struct's that can only be > created once. Then in the initialize function I can make sure to > only call the C function mc_new() once. Only one thread can be in C code at a time (unless you call rb_thread_select), so writing ruby-thread-safe C libraries is probably easier than you suspect. > Also, I have a particular need for caching database queries, where > buffer the returned rows and then send the whole thing to memcache. > Would it be more efficient to implement a string buffer in C rather > then use an array or concating strings in ruby? I'm thinking about > adding this as an additional feature, where you create a buffer, fill > it up, then when you want to do a memcache add/set, you tell it to > add/set the contents of the buffer. It will be a better use of your time to use an existing memcached library and optimize your entire task using benchmarks and profiling. memcached overhead is probably not going to be your bottleneck. -- Poor workers blame their tools. Good workers build better tools. The best workers get their tools to do the work for them. -- Syndicate Wars |
|
|||
|
Lionel Bouton wrote the following on 17.07.2007 22:18 :
> snacktime wrote the following on 17.07.2007 21:45 : >> I've decided I want to become more proficient in C, so the project is >> chose is a ruby memcache C extension. I found the following site with >> a hash algorithm that seems to be an improvement on the default on, >> can anyone think of a reason why not to use this? >> >> http://www.last.fm/user/RJ/journal/2007/04/10/392555/ > > - very poor distribution with few servers, > - no control of weight between servers (ie if I have a memcache server > with 2GB and one with 128MB, nothing prevents the second from getting > 10 times as much access as the first one...). > > > This algorithm benefits very large server farms with identical server > configurations where having poor performance over a small part of the > cache isn't a big deal. > > If you want both good controlled distribution and high cache hits with > low number of servers, on first thought I'll implement a double query > system where cache clients are responsible for moving data accross > servers based on an old configuration and a new one. > When the hit rate ratio on the new configuration is good and bad on > the old one, deactivate queries on the old configuration. > Assuming 2 gets and a put to memcache servers are faster than direct > access to the data, this is probably a win. > > With that kind of system you could even handle server failures > automatically by creating new configurations on the fly (and using hit > ratio stats from the two previous configuration to decide which one is > the better new 'old one'). > > But then again it's only a first thought :-) > > Lionel. > Second thought. Instead of choosing the server based on comparison of hash values with a simple integer comparis, you could compute a XOR distance between the hash value of the key and the hash value of the server. Then you can adjust the distance by a weigth. Example (using 8bit hash keys for readability): key1 as a hash key of 0x01010101 server1 as a hash key of 0x11000111 server2 as a hash key of 0x10011011 dist(key1, server1) = 3 dist(key1, server2) = 5 server1 wins (if equality, give priority to the first server in the list) Now if you want to use weigth, you'll have to adjust the distances by adding a correction factor. You can compute this correction factor by first computing the distribution of distances between 0 and size(hash). Then you can derive the probability of random_distance < another_random_distance + correction_factor by comparing 1 (sum of all distance probabilities) with 1 - the probabilities of each possible distance that is < correction_factor. You now have a mapping of correction_factor -> relative weigth, inverse the map and you can use it to give relative weights to servers (this obviously works better if you have long hashes as you can't have huge weights with small hashes and the more the weight is near the max allowed, the less perfect the distribution amongst servers is). This has the advantages of: - working great with additions/removal of servers (has good has the solution in the original URL), - supports weight between servers, - not ideal distribution, but orders of magnitude better than the other algorithm. Lionel. |
|
|||
|
On 7/17/07, Eric Hodel <drbrain@segment7.net> wrote:
> On Jul 17, 2007, at 12:45, snacktime wrote: > > > I've decided I want to become more proficient in C, so the project is > > chose is a ruby memcache C extension. I found the following site with > > a hash algorithm that seems to be an improvement on the default on, > > can anyone think of a reason why not to use this? > > > > http://www.last.fm/user/RJ/journal/2007/04/10/392555/ > > A consistent hashing library is completely orthogonal from a > memcached library. > > You'd be better off wrapping apr_memcache or libmemcache than writing > yet another C memcached client. > I am wrapping libmemcache, but it looked like it was easy enough to provide your own hashing algorithm. This is actually just part of some other work I'm doing on a query caching layer in activerecord, and memcache was actually taking a significant amount of time in the caching layer, plus I wanted to hone my C skills anyways. Chris |
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| SYDNEY, AUSTRALIA Data Mining Training and Workshop, Early-Bird Deadline Extension till May 5th | lisas@salford-systems.com | Newsgroup comp.soft-sys.sas | 0 | 05-02-2006 05:01 PM |
| Re: 2 periods in extension of Proc export | Guido T | Newsgroup comp.soft-sys.sas | 1 | 11-25-2005 04:00 AM |
| Re: 2 periods in extension of Proc export | Arild S | Newsgroup comp.soft-sys.sas | 0 | 11-24-2005 09:50 AM |
| 2 periods in extension of Proc export | Hari | Newsgroup comp.soft-sys.sas | 0 | 11-24-2005 09:34 AM |