Here at CooliceHost, we will answer one of the most frequently asked questions: how does ZFS handle read and write caching?
ZFS is a sophisticated file system with numerous features like pooled storage, data scrubbing, enormous capacity, and much more. However, one of the best aspects of ZFS is how it caches reads and writes. ZFS uses system memory to provide layered data caching.
The first level of caching in ZFS is the Adaptive Replacement Cache (ARC). When all of the ARC's space is consumed, ZFS moves the most recently and frequently accessed data to the Level 2 Adaptive Replacement Cache (L2ARC).
There is some confusion about the role of the ARC and L2ARC, as well as the ZIL (ZFS Intent Log) and SLOG (separate log).
The ARC and its extension, the L2ARC, are simply read caches. They exist to accelerate reads so that the system does not have to sift through sluggish spinning drives every time it needs to find data. The ZIL, on the other hand, is a log and not a write cache (although people often refer to it as one), even when it resides on a separate log device (SLOG). This post will explain what all these acronyms mean and how their implementation may benefit your server. First, we'll talk about writes, specifically the ZIL and SLOG.
Writes
When ZFS gets a write request, it does not immediately begin writing to disk; instead, it caches the writes in RAM and sends them out in Transaction Groups (TXGs) at predefined intervals (5 seconds by default). This is why ZFS is known as a transactional file system.
That improves performance, since writes reach the disks in larger, better-structured batches that spinning drives handle more easily. It also improves data consistency by preventing partial writes: if a power outage interrupts a TXG, its data is discarded rather than partially committed.
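As a rough illustration, on OpenZFS on Linux the TXG flush interval is exposed as the zfs_txg_timeout module parameter; the path below is Linux-specific, and the default may differ between platforms and versions:

```
# Current TXG flush interval in seconds (OpenZFS on Linux)
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Change it until reboot; persist via /etc/modprobe.d/zfs.conf if desired
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout
```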
To grasp the fundamentals of how ZFS handles writes, you must first understand the distinction between synchronous and asynchronous writes. So let’s get started with that.
Asynchronous Writes
Asynchronous writes: data is quickly cached in RAM and presented to the client as complete, then written to disk later.
When a write request arrives from the client, it goes straight to RAM, and the server immediately informs the client that the write is complete. Having stored the data in RAM, the server keeps accepting requests without writing anything to the disks until the next transaction group is flushed to disk as a whole. Asynchronous writes feel fast to the end user because the data only has to land in high-speed RAM to be considered done.
What is the problem? Although the on-disk data will remain consistent after a power outage, everything in the pending transaction group is lost, because it exists only in volatile memory. Synchronous writes are meant to close that gap, at the cost of performance.
Synchronous Writes (without a separate logging device)
A synchronous write must be acknowledged as committed to persistent storage before it is considered complete.
When the client issues a synchronous write request, the data still goes to RAM first, much like an asynchronous write, but the server will not confirm that the write is complete until it has been logged to the ZFS Intent Log (ZIL). Once the ZIL has been updated, the write is committed and confirmed. By default, the ZIL lives inside your storage pool, which means the drive heads must physically move to both update the ZIL and store the data in the pool, lowering performance further. Waiting on slower storage media (HDDs) creates performance problems, particularly with small random writes.
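A minimal way to feel this difference yourself, assuming a dataset mounted at the hypothetical path /tank/data, is to compare plain writes with writes forced to be synchronous (note that /dev/zero data compresses away if compression is enabled, so treat this as a rough illustration only):

```
# Asynchronous writes: complete as soon as they land in RAM
dd if=/dev/zero of=/tank/data/testfile bs=4k count=10000

# Synchronous writes (O_SYNC): each block waits on the in-pool ZIL,
# which is far slower on spinning disks
dd if=/dev/zero of=/tank/data/testfile bs=4k count=10000 oflag=sync
```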
ZFS’s answer to synchronous write slowdowns and data loss is to place the ZIL on a separate, faster, and persistent storage device (SLOG), usually an SSD.
Synchronous Writes with a SLOG
Client synchronous write requests are logged significantly faster in the ZIL if it lives on an SSD. If the data in RAM vanishes due to a power outage, the system checks the ZIL on the next boot, replays it, and recovers the writes that had not yet been committed.
Alternatively, the data may be written straight to its final location on disk, with the ZIL only recording a pointer to it. Once the next TXG goes through, the pool's metadata is updated to point to the data's correct location. Until then, the metadata does not identify where the data is, so after a power outage the server consults the ZIL to find it. One thing to keep in mind about SLOGs is that it is advisable to mirror them, so they can fulfill their role of guaranteeing data consistency after a power outage even if one SLOG device fails.
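As a sketch, assuming a pool named tank and two spare SSDs (the device paths are placeholders), a mirrored SLOG can be attached and verified like this:

```
# Attach a mirrored SLOG (log vdev) to the pool "tank"
zpool add tank log mirror /dev/disk/by-id/nvme-SSD-A /dev/disk/by-id/nvme-SSD-B

# Confirm the log vdev appears in the pool layout
zpool status tank
```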
So how much does a SLOG help performance?
The influence of a SLOG on performance depends on the application. There will be a significant improvement for small IO, and there may be a moderate improvement for sequential IO as well. It comes in handy for workloads that issue many synchronous writes, such as database servers or VM hosting. However, the prime goal of the SLOG is not to improve speed but to preserve data that might otherwise be lost in a power outage. For mission-critical applications, losing the 5 seconds of data that would have gone out in the next transaction group can be highly costly. That's also why a SLOG isn't a literal cache; it's a log, as the name implies. The SLOG is written constantly but only read back after a crash or power outage.
If the 5 seconds of data you might lose is critical, you can force all writes to be synchronous in exchange for a performance hit. If none of the data is mission-critical, sync can be disabled, so all writes use RAM as a cache and you risk losing a transaction group. The default, standard sync, lets the application decide on each write whether it should be synchronous.
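This behavior is controlled per dataset by the sync property; the pool and dataset names below are placeholders:

```
# Check and change how writes are treated on a given dataset
zfs get sync tank/data           # standard by default: the application decides
zfs set sync=always tank/data    # force every write through the ZIL/SLOG
zfs set sync=disabled tank/data  # treat all writes as asynchronous (riskier)
```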
An informal rule of thumb when selecting a device for a SLOG is to pick drives that perform well at a queue depth of one. With a conventional SSD there might be a performance reduction, because synchronous writes do not arrive in the large batches that most SSDs handle best. Thanks to their high speed at low queue depth and built-in power-loss protection that lets in-flight writes finish during an outage, Intel Optane drives are often among the best choices for a SLOG. Power-loss protection (capacitor- or battery-backed) is critical if you want your SLOG to serve its purpose of protecting data.
Reads
ZFS caches reads in system RAM the same way it caches writes. The read cache is called the Adaptive Replacement Cache (ARC). It is a modified version of IBM's ARC and, thanks to the more advanced algorithms it uses, it is smarter than the average read cache.
ARC
The ARC works by storing the most recently and most frequently used data in RAM. Unlike the ZIL, it is a literal cache: the data held in the ARC in memory is also present in the storage pool on disk. It sits in the ARC purely to speed up read performance, which it does exceptionally well. A big ARC can consume a lot of RAM, but it will release memory as other applications require it, and its size can be tuned to whatever you believe is best for your machine.
The ARC splits its space between most-recently-used and most-frequently-used data, shifting the balance by assigning more room to one or the other whenever a cold hit occurs. A cold hit happens when data that was cached earlier, but has since been pushed out to let the ARC store new data, is requested again. ZFS keeps track of what was in the cache even after eviction, which is how cold hits are identified. Data that has not been used for a while, or not as often as newer data, is evicted as new data arrives.
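On OpenZFS on Linux, the kernel exposes these counters under /proc (the path is platform-specific), so you can peek at the MRU/MFU balance and the ghost-list hits that correspond to cold hits:

```
# Inspect ARC hit counters, including MRU/MFU and ghost ("cold") hits
grep -E '^(hits|misses|mru_hits|mfu_hits|mru_ghost_hits|mfu_ghost_hits) ' /proc/spl/kstat/zfs/arcstats
```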
The more RAM your machine has, the better, as it simply improves read performance. Due to motherboard RAM slots and budget constraints, a bigger ARC has physical as well as financial limits. And, regrettably, no matter how hard you try, downloading extra RAM is impossible. If your ARC is full but its hit rate still isn't high enough, and your machine already has a lot of RAM, you should think about adding a level 2 ARC (L2ARC).
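To judge whether that is the case, most OpenZFS installs ship an arc_summary tool that reports ARC size and hit rates; on Linux the ARC can also be capped via a module parameter (the 16 GiB value below is just an example):

```
# Report ARC size, target size, and hit rates
arc_summary

# Cap the ARC at 16 GiB on OpenZFS/Linux (value in bytes; example only)
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
```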
L2ARC
The L2ARC lives on an SSD rather than in significantly faster RAM, but it is still much quicker than spinning disks. Therefore, when the ARC hit rate is low, adding an L2ARC may provide some performance improvement: instead of seeking data on HDDs, the system will look for it in RAM and then on an SSD. An L2ARC is commonly considered when the ARC hit rate is below roughly 90% on a machine that already has 64+ GB of RAM.
The L2ARC only fills up as your ARC overflows and clears room for new data that its algorithm judges more vital; the evicted data's new home is the L2ARC. Consequently, the L2ARC can take a long time to fill up.
* L2ARC does not need to be mirrored in the same way that SLOG does because all of the data stored by L2ARC is still present in the pool.
Furthermore, the L2ARC requires a considerable amount of RAM to track what data is stored on it. Therefore, boosting RAM is often preferable before contemplating an L2ARC. SSDs are more expensive than HDDs but significantly less expensive than RAM, so there is a price/performance trade-off to consider when making this choice.
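As a sketch, using the same hypothetical pool name and a placeholder device path, a cache device is added, monitored, and (if needed) removed like this:

```
# Add an SSD as an L2ARC (cache) device to the pool "tank"
zpool add tank cache /dev/disk/by-id/nvme-SSD-C

# Watch the cache device fill and serve reads over time
zpool iostat -v tank 5

# Cache devices can be removed at any time; their data still lives in the pool
zpool remove tank /dev/disk/by-id/nvme-SSD-C
```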
Summary
- Synchronous writes must be logged to the ZIL before confirmation goes back to the client. The ZIL is part of your storage pool by default.
- SLOG is a separate device on which the ZIL runs. It may increase performance for some applications. Its prime purpose is to prevent data from vanishing during a power outage.
- The ARC is a read cache held in RAM; it stores data there to improve read performance.
- L2ARC is an ARC extension frequently found on SSDs. Its purpose is to extend the size of the ARC while avoiding the physical constraints of adding extra RAM.