Which is faster/better for caching, File System or Memcached?

https://www.devze.com · 2023-01-05 09:17 · source: web

I don't think it's clear to me yet: is it faster to read things from a file or from memcached? Why?


Memcached is faster, but its memory is limited. An HDD is huge, but I/O is slow compared to memory. You should put the hottest items in memcached, and everything else can go into cache files.
(Or man up and invest some money in more memory like these guys :)

For some benchmarks see: Cache Performance Comparison (File, Memcached, Query Cache, APC)

In theory:

Read 1 MB sequentially from memory       250,000 ns
Disk seek                             10,000,000 ns

http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf


There are quite a few different aspects that might favour one or the other:

  • Do you need/want to share this data between multiple servers? The filesystem is local; memcached is accessed over a network.
  • How large are the items you're caching? The filesystem tends to be better for large objects.
  • How many memcached requests might there be per page? TCP connection setup and teardown might take more time than a simple filesystem stat() on the local machine.

I would suggest you look at your use case and do some profiling of both approaches. If you can get away with using the filesystem, I would. Adding memcached adds another layer of complexity and potential points of failure (memcached client/server).

For what it's worth, the other comments about disk vs. memory performance may well be academic: if the filesystem data is accessed regularly, it will likely be sitting in the OS disk cache anyway.
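As a minimal sketch of the filesystem half of that profiling (the memcached side would be timed the same way with whatever client library you use; file name and payload here are made up for illustration):

```python
import os
import tempfile
import timeit

# Micro-benchmark of repeated reads of a small cache file. After the first
# read the data usually comes from the OS page cache, so this measures the
# hot path, which is the common case for a frequently hit cache entry.
payload = b"cached page fragment " * 50
fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)

def read_cache_file():
    with open(path, "rb") as f:
        return f.read()

data = read_cache_file()
elapsed = timeit.timeit(read_cache_file, number=1000)
print(f"1000 file-cache reads took {elapsed:.4f}s")
os.unlink(path)
```

Run the equivalent loop against your memcached client on the same box, and the comparison reflects your deployment rather than someone else's benchmark.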


"Faster" cannot be judged without context. For example, accessing data in memcached on a remote server can be "slower" due to network latency. On the other hand, reading data from a remote server's memory over a 10 Gb network can be "faster" than reading the same data from a local disk.

The main difference between caching on the filesystem and using memcached is that memcached is a complete caching solution: it has LRU eviction, an expiration concept (data freshness), and some high-level operations like cas/inc/dec/append/prepend/replace.
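To make the LRU-plus-expiration idea concrete, here is a toy sketch in Python (purely illustrative; memcached's real implementation is slab-allocated C, and the class below is invented for this answer):

```python
import time
from collections import OrderedDict

class TinyLRUCache:
    """Toy illustration of the LRU eviction + TTL expiration that a full
    caching solution like memcached provides out of the box."""

    def __init__(self, maxsize=2):
        self.maxsize = maxsize
        self._data = OrderedDict()  # key -> (expires_at, value)

    def set(self, key, value, ttl=60):
        self._data.pop(key, None)
        self._data[key] = (time.monotonic() + ttl, value)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:
            del self._data[key]  # stale: drop and report a miss
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value
```

With a plain cache file you would have to build all of this (plus locking) yourself.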

Memcached is also easy to deploy and monitor. (How would we distinguish "cache" workload on the filesystem from, say, the kernel's? Can we calculate the total amount of cached data? See its distribution? Do capacity planning? And so on.)

There are also some hybrid systems, like cachelot. Basically, it's memcached that can be embedded right into the application, so the cache is accessible without any syscalls or network I/O.


In fact, it is not as simple as "reading from memory is much faster than reading from HDD". As you know, memcached is accessed over a TCP connection; if you open a new connection each time you want to get or set something on the memcached server (as many programmers do), it will probably perform worse than a file cache. You should create the memcached client once, as a static object, and reuse it. Secondly, modern OSes cache frequently used files in memory, which can make file caches faster than memcached operations, which are actually TCP round trips.
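A sketch of that reuse pattern (the client class here is a hypothetical stand-in for a real library such as pymemcache; the point is constructing the client once per process, not the API details):

```python
import functools

class MemcachedClient:
    """Hypothetical stand-in for a real memcached client; a real one would
    open a TCP connection to the server in __init__."""

    def __init__(self, host="localhost", port=11211):
        self.address = (host, port)

@functools.lru_cache(maxsize=None)
def get_client():
    # Constructed on first call only; every later call returns the same
    # object, so TCP setup/teardown is paid once instead of per request.
    return MemcachedClient()
```

Calling get_client() from every request handler then reuses one connection for the life of the process, which is the difference the paragraph above is describing.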


Cache Type | Cache Gets/sec
Array Cache | 365000
APC Cache | 98000
File Cache | 27000
Memcached Cache (TCP/IP) | 12200
MySQL Query Cache (TCP/IP) | 9900
MySQL Query Cache (Unix Socket) | 13500
Selecting from table (TCP/IP) | 5100
Selecting from table (Unix Socket) | 7400

Source:
https://surniaulula.com/os/unix/memcached-vs-disk-cache/

Source of my source :)
https://www.percona.com/blog/2006/08/09/cache-performance-comparison/


You're being awfully vague on the details, and I believe the answer you're looking for depends on the situation. To my knowledge, very few things are better than the alternative all of the time.

Obviously it wouldn't be faster to read things off the filesystem (assuming it's a hard drive). Even an SSD will be noticeably slower than in-memory reads. The reason is that HDDs and filesystems are built for capacity, not speed, while DDR memory is particularly fast for exactly that reason.

Good caching means keeping frequently accessed data in memory and the less frequently accessed data on disk (persistent storage). That way the common case is vastly improved by your caching implementation. That's your goal. Make sure you have a good understanding of your ideal caching policy; that will require extensive benchmarking and testing.


It depends on whether the cache is stored locally. Memcached can store data across a network, which isn't necessarily faster than a local disk.


If that file is stored on disk and accessed frequently, there is a high probability of finding it in RAM (as a recently accessed file), or am I missing something? Yes, the first read will be from the disk, which is awfully slow, but what about the subsequent reads? Assuming the file is hot and getting lots of reads, those should be even faster than memcached, as each is a pure RAM read.
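A quick way to see the page-cache effect yourself (a rough sketch; timings are machine-dependent, and since the file is written just before the reads, even the "first" read here may already be warm — for a true cold read you would need to drop the OS caches first):

```python
import os
import tempfile
import time

# Write a 1 MiB file, then time two consecutive reads. On most systems the
# repeat read is served entirely from the OS page cache.
payload = os.urandom(1 << 20)
fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)

def timed_read():
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return data, time.perf_counter() - start

first_data, first_t = timed_read()
second_data, second_t = timed_read()  # typically page-cache speed
os.unlink(path)
print(f"first read: {first_t:.6f}s, repeat read: {second_t:.6f}s")
```

If the repeat read times land in the same ballpark as a memcached get on the same host, that supports the point above: for hot files, the OS is already doing the memory caching for you.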
