Reliable distributed computing systems and applications have become the cornerstone of prominent businesses, especially for automating and managing mission-critical business processes and delivering services to customers. As developers and system administrators of these systems and applications, you are expected to provide all kinds of information technology (IT) solutions that keep them running as efficiently as possible.
This includes designing, testing, and implementing strategies for system and application performance, reliability, availability, and scalability, so that end users get a satisfying level of service. Caching is one of the most basic yet effective application delivery techniques you can rely on. Before we go any further, let’s briefly look at what caching is, where and how it can be applied, and what its benefits are.
What is Caching or Content Caching?
Caching (or content caching) is a widely used technique of storing copies of data in a temporary storage location (also known as a cache) so that the data can be accessed more quickly than when it is retrieved from its original storage. Depending on the type and purpose of caching, the data stored in a cache may include files or fragments of files (such as HTML files, scripts, images, and documents), database operations or records, API calls, DNS records, and so on.
A cache can be implemented in hardware or software. Software-based caching (the focus of this article) may be implemented at different layers of an application stack.
Caching can be applied on the client side (at the application presentation layer), for example, browser caching or app caching (offline mode). Most, if not all, modern browsers ship with an implementation of an HTTP cache. You may have been told to “clear your cache” when using a web application, so that you see the latest data or content instead of an old copy the browser stored locally.
Another example of client-side caching is DNS caching, which happens at the operating system (OS) level: the OS or web browser temporarily stores information about previous DNS lookups.
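To make the idea concrete, here is a minimal Python sketch of what a DNS cache does: it keeps each lookup result only for that record's time-to-live (TTL), after which the next lookup has to go back to the resolver. The `TTLCache` class, the injectable clock, and the `192.0.2.10` address (a reserved documentation address) are illustrative assumptions, not how any particular OS resolver is implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Cache whose entries expire after a per-entry time-to-live (TTL), like a DNS cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry time)

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # never looked up: the resolver must be queried
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: forget it so the next lookup is fresh
            return None
        return address


# A fake clock makes expiry easy to demonstrate.
now = [0.0]
dns_cache = TTLCache(clock=lambda: now[0])
dns_cache.put("www.example.com", "192.0.2.10", ttl_seconds=300)
print(dns_cache.get("www.example.com"))  # 192.0.2.10
now[0] = 301.0
print(dns_cache.get("www.example.com"))  # None (TTL elapsed)
```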
Caching can also be implemented at the network level, either in a LAN or a WAN, via proxies. A common example of this type of caching is found in CDNs (Content Delivery Networks), which are globally distributed networks of web proxy servers.
Thirdly, you can implement caching at the origin or backend server(s). Server-level caching takes several forms, including:
- webserver caching (for caching of images, documents, scripts, and so on).
- application caching or memoization (used to cache the results of reading files from disk, fetching data from other services or processes, requesting data from an API, etc.).
- database caching (to provide in-memory access to frequently used data such as requested database rows, query results, and other operations).
Note that cached data can be stored in any storage system, including a database, a file, or system memory, but the cache should be a faster medium than the primary source. For this reason, in-memory caching is the most effective and most commonly used form of caching.
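Application-level caching (memoization) can be as simple as wrapping an expensive function with Python's standard `functools.lru_cache` decorator, which keeps results in the process's memory. In this sketch, `load_user_profile` is a hypothetical stand-in for a slow read from disk, another service, or an API.

```python
import time
from functools import lru_cache

calls = {"count": 0}  # how many times the slow source was actually read


@lru_cache(maxsize=128)
def load_user_profile(user_id: int) -> str:
    """Hypothetical slow lookup (disk, another service, or an API)."""
    calls["count"] += 1
    time.sleep(0.05)  # simulate the cost of the original source
    return f"profile-{user_id}"


load_user_profile(1)  # miss: runs the function body
load_user_profile(1)  # hit: returned from the in-process cache
print(calls["count"])                  # 1
print(load_user_profile.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```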
Why Use Caching?
Caching offers numerous benefits including the following:
- At the database level, caching can bring read latency for cached data down to microseconds. You can also use a write-back cache to improve write performance: data is written to memory first and written to disk or main storage later, at specified intervals. However, this puts data integrity at risk; for example, if the system crashes before the data is committed to main storage, those writes are lost.
- At the application level, a cache can store frequently read data within the application process itself, reducing data lookup times from seconds down to microseconds, especially when the data would otherwise be fetched over the network.
- Across the application and server as a whole, caching reduces server load, latency, and network bandwidth usage because cached data is served directly to clients, which improves response times and delivery speeds.
- Caching also improves content availability, especially via CDNs, among many other benefits.
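The write-back trade-off described above can be sketched in a few lines of Python. This is a simplified, single-threaded illustration (the `WriteBackCache` class and the dict standing in for "disk" are hypothetical): writes only touch memory, and anything not yet flushed would be lost if the process crashed.

```python
from typing import Dict, Set


class WriteBackCache:
    """Writes go to memory first; dirty entries reach storage only on flush()."""

    def __init__(self, storage: Dict[str, str]) -> None:
        self.storage = storage              # stand-in for the disk / main storage
        self.memory: Dict[str, str] = {}
        self.dirty: Set[str] = set()        # keys written but not yet persisted

    def write(self, key: str, value: str) -> None:
        self.memory[key] = value            # fast path: no storage I/O
        self.dirty.add(key)

    def read(self, key: str) -> str:
        if key in self.memory:
            return self.memory[key]
        value = self.storage[key]           # miss: load from storage and keep a copy
        self.memory[key] = value
        return value

    def flush(self) -> None:
        """A real system would run this at intervals; here it is called explicitly."""
        for key in self.dirty:
            self.storage[key] = self.memory[key]
        self.dirty.clear()


disk: Dict[str, str] = {}
wb = WriteBackCache(disk)
wb.write("order:1", "paid")
print(disk.get("order:1"))  # None: a crash at this point would lose the write
wb.flush()
print(disk.get("order:1"))  # paid
```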
In this article, we will review some of the top open-source tools for implementing server-side caching in Linux, covering both application/database caches and caching proxy servers.
1. Redis
Redis (short for REmote DIctionary Server) is a free and open-source, fast, high-performance, and flexible distributed in-memory computing system that can be used from most, if not all, programming languages.
It is an in-memory data structure store that works as a caching engine, an in-memory database with optional on-disk persistence, and a message broker. Although it is developed and tested on Linux (the recommended deployment platform) and OS X, Redis also works on other POSIX systems such as *BSD, without any external dependencies.
Redis supports numerous data structures such as strings, hashes, lists, sets, sorted sets, bitmaps, streams, and more, which lets programmers pick the data structure best suited to a specific problem. It supports atomic operations on these data structures, such as appending to a string, pushing elements to a list, incrementing a value in a hash, computing set intersections, and more.
Its key features include Redis master-slave replication (which is asynchronous by default), high availability and automatic failover offered using Redis Sentinel, Redis cluster (you can scale horizontally by adding more cluster nodes) and data partitioning (distributing data among multiple Redis instances). It also features support for transactions, Lua scripting, a range of persistence options, and encryption of client-server communication.
Although it can persist data on disk, Redis delivers its best performance when the dataset fits in memory. You can, however, use it alongside an on-disk database such as MySQL or PostgreSQL. For example, you can keep small, very write-heavy data in Redis and leave the rest of the data in the on-disk database.
Redis provides several security features. Its “protected mode” keeps Redis instances from being accessed from external networks. It also supports client-server authentication (where a password is configured on the server and provided by the client) and TLS on all communication channels, including client connections, replication links, and the Redis Cluster bus protocol.
Redis has many use cases, including database caching, full-page caching, user session data management, API response storage, Publish/Subscribe messaging, message queues, and more. These apply to games, social networking applications, RSS feeds, real-time data analytics, user recommendations, and so on.
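For the database-caching use case, a common approach is the cache-aside pattern: read from the cache first, fall back to the database on a miss, and invalidate the cached copy when the data changes. The sketch below uses plain Python dicts as hypothetical stand-ins for Redis and an on-disk database such as MySQL, so it runs without any server; with a real Redis client, the dict operations would become Redis `GET`, `SET` (typically with an expiry), and `DEL` commands.

```python
from typing import Dict, Optional

# Hypothetical stand-ins: one dict plays Redis, the other the on-disk database.
redis_like_cache: Dict[str, str] = {}
database: Dict[str, str] = {"user:42": "Alice"}
db_reads = 0


def get_user(key: str) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    global db_reads
    cached = redis_like_cache.get(key)
    if cached is not None:
        return cached                    # cache hit
    db_reads += 1
    value = database.get(key)            # cache miss: read the source of truth
    if value is not None:
        redis_like_cache[key] = value    # keep a copy for subsequent reads
    return value


def update_user(key: str, value: str) -> None:
    """Write to the database first, then invalidate the cached copy."""
    database[key] = value
    redis_like_cache.pop(key, None)


print(get_user("user:42"), db_reads)  # Alice 1
print(get_user("user:42"), db_reads)  # Alice 1 (served from the cache)
update_user("user:42", "Alicia")
print(get_user("user:42"), db_reads)  # Alicia 2
```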
2. Memcached
Memcached is a free and open-source, simple yet powerful, distributed memory object caching system. It is an in-memory key-value store for small chunks of data such as results of database calls, API calls, or page rendering. It runs on Unix-like operating systems including Linux and OS X and also on Microsoft Windows.
As a developer tool, it is intended to speed up dynamic web applications by caching content, which reduces the load on the on-disk database; in effect, it acts as short-term memory for applications. By default, it evicts items using a Least Recently Used (LRU) policy. Client APIs are available for the most popular programming languages.
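The Least Recently Used (LRU) eviction policy can be illustrated with Python's `collections.OrderedDict`. This is a textbook LRU for illustration only; Memcached's real implementation is more elaborate (it manages memory in slab classes and, in recent versions, uses a segmented LRU).

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity key-value cache that evicts the least recently used item."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def set(self, key: str, value: str) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


lru = LRUCache(capacity=2)
lru.set("a", "1")
lru.set("b", "2")
lru.get("a")            # "a" becomes the most recently used
lru.set("c", "3")       # capacity exceeded: "b" is evicted
print(list(lru.items))  # ['a', 'c']
```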
Memcached supports strings as its only data type. It has a client-server architecture, where half of the logic happens on the client side and the other half on the server side. Importantly, clients know how to pick which server to write an item to or read it from, and what to do if they cannot connect to a server.
Although Memcached is a distributed caching system and thus supports clustering, its servers are disconnected from each other (i.e., they are unaware of each other). This means there is no replication support as there is in Redis. Each server simply knows how to store and fetch items, and manages when to evict or reuse memory. You can increase the available memory by adding more servers.
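Because the servers are unaware of each other, the client alone decides which server owns a key. Many Memcached client libraries do this with consistent hashing, so adding or removing a server remaps only a fraction of the keys; some use simple modulo hashing instead. Here is a minimal consistent-hash ring under that assumption; the host names, the number of virtual points per server, and the use of MD5 are illustrative choices, not any specific client's behavior.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Consistent-hash ring mapping keys to servers; each server gets several virtual points."""

    def __init__(self, servers: List[str], points_per_server: int = 100) -> None:
        self._ring: List[Tuple[int, str]] = []
        for server in servers:
            for i in range(points_per_server):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


servers = ["cache1:11211", "cache2:11211", "cache3:11211"]  # hypothetical hosts
ring = HashRing(servers)
print(ring.server_for("session:alice"))  # always the same server for this key
```

Adding a fourth server only moves keys onto the new server; keys that stay put keep hitting the same warm cache.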
It supports authentication and encryption via TLS as of Memcached 1.5.13, but this feature is still in the experimental phase.
3. Apache Ignite
Apache Ignite is also a free and open-source, horizontally scalable, distributed in-memory key-value store, cache, and multi-model database system that provides powerful processing APIs for computing on distributed data. It is also an in-memory data grid that can be used either purely in memory or with Ignite native persistence. It runs on UNIX-like systems such as Linux, and also on Windows.
It features multi-tier storage, complete SQL support, ACID (Atomicity, Consistency, Isolation, Durability) transactions across multiple cluster nodes (supported only at the key-value API level), co-located processing, and machine learning. It supports integration with third-party databases, including any RDBMS (such as MySQL, PostgreSQL, and Oracle Database) or NoSQL store.
It is important to note that although Ignite works as an SQL data store, it is not a full SQL database. It handles constraints and indexes differently from traditional databases: it supports primary and secondary indexes, but only primary indexes are used to enforce uniqueness. It also does not support foreign key constraints.
Ignite also supports security by letting you enable authentication on the server and provide user credentials on clients. It also supports SSL socket communication to secure connections among all Ignite nodes.
Ignite has many use cases, including caching, system workload acceleration, real-time data processing, and analytics. It can also be used as a graph-centric platform.
4. Couchbase Server
Couchbase Server is also an open-source, distributed, NoSQL document-oriented engagement database that stores data as items in a key-value format. It works on Linux and other operating systems such as Windows and Mac OS X. It uses a feature-rich, document-oriented query language called N1QL, which provides powerful querying and indexing services to support sub-millisecond operations on data.
Its notable features are a fast key-value store with managed cache, purpose-built indexers, a powerful query engine, scale-out architecture (multi-dimensional scaling), big data and SQL integration, full-stack security, and high-availability.
Couchbase Server comes with native multi-instance cluster support, where a cluster manager tool coordinates all node activities and provides a simple cluster-wide interface to clients. Importantly, you can add, remove, or replace nodes as required, with no downtime. It also supports data replication across the nodes of a cluster, and selective data replication across data centers.
It implements security through TLS using dedicated Couchbase Server ports, different authentication mechanisms (using either credentials or certificates), role-based access control (to check each authenticated user for the system-defined roles they are assigned), auditing, logs, and sessions.
Its use cases include a unified programming interface, full-text search, parallel query processing, document management, indexing, and much more. It is specifically designed to provide low-latency data management for large-scale interactive web, mobile, and IoT applications.
5. Hazelcast IMDG
Hazelcast IMDG (In-Memory Data Grid) is an open-source, lightweight, fast, and extendable in-memory data grid middleware that provides elastically scalable distributed in-memory computing. Hazelcast IMDG runs on Linux, Windows, Mac OS X, and any other platform with Java installed. It supports a wide variety of flexible, language-native data structures such as Map, Set, List, MultiMap, RingBuffer, and HyperLogLog.
Hazelcast is peer-to-peer and supports simple scalability, cluster setup (with options to gather statistics, monitor via the JMX protocol, and manage the cluster with useful utilities), distributed data structures and events, data partitioning, and transactions. It also provides redundancy by keeping backups of each data entry on multiple members. To scale your cluster, simply start another instance; data and backups are automatically and evenly rebalanced.
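Partitioning with backups can be pictured with a simplified model. Hazelcast splits data into a fixed number of partitions (271 by default), each owned by one member with backups on other members. The sketch below assumes MD5 as the key hash and a round-robin owner assignment purely for illustration; Hazelcast's real hashing and partition table differ, and the member names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

PARTITION_COUNT = 271  # Hazelcast's default number of partitions


def partition_id(key: str) -> int:
    """Map a key to a partition (MD5 is an illustrative stand-in for Hazelcast's own hashing)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PARTITION_COUNT


def build_partition_table(members: List[str], backup_count: int = 1) -> Dict[int, Tuple[str, List[str]]]:
    """Assign each partition an owner (round-robin) and backups on the following members."""
    backup_count = min(backup_count, len(members) - 1)  # never back up onto the owner itself
    table: Dict[int, Tuple[str, List[str]]] = {}
    for pid in range(PARTITION_COUNT):
        owner = pid % len(members)
        backups = [members[(owner + i) % len(members)] for i in range(1, backup_count + 1)]
        table[pid] = (members[owner], backups)
    return table


members = ["member-a", "member-b", "member-c"]  # hypothetical cluster members
table = build_partition_table(members)
owner, backups = table[partition_id("customer:7")]
print(owner, backups)  # the backup always lives on a different member than the owner
```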
It provides a collection of useful APIs to access the CPUs in your cluster for maximum processing speed. It also offers distributed implementations of a large number of developer-friendly interfaces from Java such as Map, Queue, ExecutorService, Lock, and JCache.
Its security features include authentication of cluster members and clients, and access control checks on client operations, via JAAS-based security features. It also allows intercepting socket connections and remote operations executed by clients, encrypting socket-level communication between cluster members, and enabling SSL/TLS socket communication. However, according to the official documentation, most of these security features are offered only in the Enterprise version.
Its most popular use case is distributed in-memory caching and data storage, but it can also be deployed for web session clustering, as a NoSQL replacement, and for parallel processing, easy messaging, and much more.
6. Mcrouter
Mcrouter is a free and open-source Memcached protocol router for scaling Memcached deployments, developed and maintained by Facebook. Its features include support for the Memcached ASCII protocol, flexible routing, multi-cluster support, multi-level caches, connection pooling, multiple hashing schemes, prefix routing, replicated pools, production traffic shadowing, online reconfiguration, and destination health monitoring with automatic failover.
Additionally, it supports cold cache warm-up, rich stats and debug commands, a reliable delete stream, quality of service, large values, and broadcast operations, and it comes with IPv6 and SSL support.
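Mcrouter itself is configured with JSON route definitions; the following Python model only illustrates the ideas behind prefix routing and replicated pools. Keys are routed to a pool by prefix, writes to a replicated pool go to every server in it, and reads are answered by the first replica that has the key. The pool names, prefixes, and dict-based "servers" are hypothetical.

```python
from typing import Dict, List, Optional

Server = Dict[str, str]  # each dict stands in for one Memcached instance

# Hypothetical pools: "sessions" is replicated across two servers.
pools: Dict[str, List[Server]] = {
    "sessions": [{}, {}],
    "default": [{}],
}
prefix_routes = {"session:": "sessions"}  # key prefix -> pool name


def pool_for(key: str) -> List[Server]:
    for prefix, pool_name in prefix_routes.items():
        if key.startswith(prefix):
            return pools[pool_name]
    return pools["default"]


def mc_set(key: str, value: str) -> None:
    for server in pool_for(key):  # replicate the write to every server in the pool
        server[key] = value


def mc_get(key: str) -> Optional[str]:
    for server in pool_for(key):  # answer from the first replica that has the key
        if key in server:
            return server[key]
    return None


mc_set("session:abc", "user42")
mc_set("page:/home", "<html>...</html>")
pools["sessions"][0].clear()     # simulate losing one replica's contents
print(mc_get("session:abc"))     # user42, still served by the second replica
```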
It is being used at Facebook and Instagram as a core component of cache infrastructure, to handle almost 5 billion requests per second at peak.
7. Varnish Cache
Varnish Cache is an open-source, flexible, modern, and multi-purpose web application accelerator that sits between web clients and an origin server. It runs on all modern Linux, FreeBSD, and Solaris (x86 only) platforms. It is an excellent caching engine and content accelerator that you can deploy in front of a web server such as NGINX or Apache, where it listens on the default HTTP port, receives client requests and forwards them to the web server, and delivers the web server’s response to the client.
Acting as a middleman between clients and origin servers, Varnish Cache offers several benefits, the most fundamental being caching web content in memory to reduce your web server load and improve delivery speeds to clients.
After receiving an HTTP request from a client, Varnish forwards it to the backend web server. Once the web server responds, Varnish caches the content in memory and delivers the response to the client. When a client requests the same content again, Varnish serves it from the cache, speeding up the application’s response. If it can’t serve the content from the cache, the request is forwarded to the backend, and the response is cached and delivered to the client.
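The hit/miss flow described above can be modeled in a few lines of Python. This toy `CachingProxy` is a sketch under simplifying assumptions, not Varnish: it has no expiry (TTL) and caches only `200` responses, whereas Varnish decides what to cache and for how long according to its VCL rules. The `origin` function is a hypothetical backend.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body)


class CachingProxy:
    """Toy reverse proxy: serve from cache on a hit, otherwise forward to the backend and cache."""

    def __init__(self, backend: Callable[[str], Response]) -> None:
        self.backend = backend
        self.cache: Dict[str, Response] = {}
        self.hits = 0
        self.misses = 0

    def handle(self, path: str) -> Response:
        if path in self.cache:
            self.hits += 1
            return self.cache[path]       # hit: no backend round trip
        self.misses += 1
        response = self.backend(path)     # miss: forward the request to the origin
        if response[0] == 200:            # simplification: cache only successful responses
            self.cache[path] = response
        return response


def origin(path: str) -> Response:
    """Hypothetical backend web server (e.g. NGINX or Apache behind the proxy)."""
    return (404, "not found") if path == "/missing" else (200, f"content of {path}")


proxy = CachingProxy(origin)
proxy.handle("/index.html")      # miss: fetched from the origin, then cached
proxy.handle("/index.html")      # hit: served from memory
print(proxy.hits, proxy.misses)  # 1 1
```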
Varnish features VCL (Varnish Configuration Language), a flexible domain-specific language used to configure how requests are handled and more, and Varnish Modules (VMODs), which are extensions for Varnish Cache.
Security-wise, Varnish Cache supports logging, request inspection, throttling, and authentication and authorization via VMODs, but it lacks native support for SSL/TLS. You can enable HTTPS for Varnish Cache using an SSL/TLS proxy such as Hitch or NGINX.
You can also use Varnish Cache as a web application firewall, DDoS attack defender, hotlinking protector, load balancer, integration point, single sign-on gateway, authentication and authorization policy mechanism, quick fix for unstable backends, and HTTP request router.
8. Squid Caching Proxy
Squid is another free, open-source, outstanding, and widely used proxy and caching solution for Linux. It is feature-rich web proxy cache server software that provides proxy and cache services for popular network protocols including HTTP, HTTPS, and FTP. It also runs on other UNIX platforms and Windows.
Just like Varnish Cache, it receives requests from clients and passes them to specified backend servers. When the backend server responds, it stores a copy of the content in a cache and passes it to the client. Future requests for the same content will be served from the cache, resulting in faster content delivery to the client. So it optimizes the data flow between client and server to improve performance and caches frequently-used content to reduce network traffic and save bandwidth.
Squid comes with features such as distributing the load over intercommunicating hierarchies of proxy servers and producing data about web usage patterns (e.g., statistics about the most-visited sites). It also enables you to analyze, capture, block, replace, or modify the messages being proxied.
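In a cache hierarchy, a child proxy that misses asks a parent cache before going to the origin server, so one origin fetch can serve many downstream caches. In Squid, peers are declared with the `cache_peer` directive and can be queried with ICP; the Python sketch below models only the parent relationship, and the `CacheNode` class, node names, and URL are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class CacheNode:
    """A proxy cache that asks its parent (or the origin) on a miss and keeps a copy."""

    def __init__(self, name: str, parent: Optional["CacheNode"], origin: Callable[[str], str]) -> None:
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, str] = {}

    def fetch(self, url: str) -> str:
        if url in self.store:
            return self.store[url]
        # Miss: go up the hierarchy if there is a parent, otherwise to the origin server.
        content = self.parent.fetch(url) if self.parent else self.origin(url)
        self.store[url] = content
        return content


origin_requests: List[str] = []


def origin(url: str) -> str:
    origin_requests.append(url)  # record every request that reaches the origin
    return f"body of {url}"


parent = CacheNode("parent", None, origin)
child_a = CacheNode("child-a", parent, origin)
child_b = CacheNode("child-b", parent, origin)
child_a.fetch("http://example.com/logo.png")
child_b.fetch("http://example.com/logo.png")  # child-b misses, but the parent already has it
print(len(origin_requests))  # 1
```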
It also supports security features such as rich access control, authorization, and authentication, SSL/TLS support, and activity logging.
9. NGINX
NGINX (pronounced “engine-x”) is an open-source, high-performance, full-featured, and very popular consolidated solution for setting up web infrastructure. It is an HTTP server, a reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server.
NGINX offers basic caching capabilities where cached content is stored in a persistent cache on disk. The fascinating part about content caching in NGINX is that it can be configured to deliver stale content from its cache when it can’t fetch fresh content from the origin servers.
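In NGINX, this stale-content behavior is enabled with the `proxy_cache_use_stale` directive (for example, with its `error` and `timeout` parameters). The following Python sketch only models the logic: fresh entries are served directly, stale entries are refreshed from the origin, and if the origin is unreachable the stale copy is returned instead of an error. The class name, the 60-second TTL, and the simulated clock and origin are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Serve fresh entries normally; if the origin fails, fall back to an expired copy."""

    def __init__(self, fetch: Callable[[str], str], ttl: float, clock: Callable[[], float]) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (body, time stored)

    def get(self, key: str) -> str:
        entry = self.entries.get(key)
        if entry and self.clock() - entry[1] < self.ttl:
            return entry[0]              # fresh hit
        try:
            body = self.fetch(key)       # missing or stale: ask the origin
        except ConnectionError:
            if entry:
                return entry[0]          # origin down: serve the stale copy
            raise                        # nothing cached to fall back on
        self.entries[key] = (body, self.clock())
        return body


now = [0.0]
origin_up = [True]


def origin(key: str) -> str:
    if not origin_up[0]:
        raise ConnectionError("origin unreachable")
    return f"{key} @ t={now[0]}"


cache = StaleOnErrorCache(origin, ttl=60, clock=lambda: now[0])
print(cache.get("/news"))  # /news @ t=0.0 (fetched and cached)
now[0] = 120.0             # the entry is now stale...
origin_up[0] = False       # ...and the origin is failing
print(cache.get("/news"))  # /news @ t=0.0 (stale copy served instead of an error)
```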
NGINX offers a multitude of security features to secure your web systems. These include SSL termination, restricting access with HTTP basic authentication, authentication based on the subrequest result, JWT authentication, restricting access to proxied HTTP resources, restricting access by geographical location, and much more.
It is commonly deployed as a reverse proxy, load balancer, SSL terminator/security gateway, application accelerator/content cache, and API gateway in an application stack. It is also used for streaming media.
10. Apache Traffic Server
Last but not least, we have Apache Traffic Server, an open-source, fast, scalable, and extensible caching proxy server with support for HTTP/1.1 and HTTP/2. It is designed to improve network efficiency and performance by caching frequently accessed content at the edge of a network, for enterprises, ISPs (Internet Service Providers), backbone providers, and more.
It supports both forward and reverse proxying of HTTP/HTTPS traffic, and may be configured to run in either or both modes simultaneously. It features persistent caching, plugin APIs, support for ICP (Internet Cache Protocol) and ESI (Edge Side Includes), Keep-Alive, and more.
In terms of security, Traffic Server lets you control client access by configuring which clients are allowed to use the proxy cache, and supports SSL termination both for connections between clients and itself and for connections between itself and the origin server. It also supports authentication and basic authorization via a plugin, logging (of every request it receives and every error it detects), and monitoring.
Traffic Server can be used as a web proxy cache, forward proxy, reverse proxy, transparent proxy, load balancer, or in a cache hierarchy.
Concluding Remarks
Caching is one of the most beneficial and long-established web content delivery technologies that is primarily designed to increase the speed of web sites or applications. It helps to reduce your server load, latency, and network bandwidth because cached data is served to clients, thus improving application response time and delivery speeds to clients.
In this article, we reviewed the top open-source caching tools to use on Linux systems. If you know of other open-source caching tools not listed here, please share them with us via the feedback form below. You can also share your thoughts about this article with us.