TrueNAS ZFS - practical drive selection

Questions like: "I have eight drives what RAID is best to use?" or "what RAID-Z configuration is best". It's also common for a customer to buy a NAS server and ask us to configure TrueNAS on it because it's cool. Of course there is no problem too I think it is cool, we are happy to configure and support them.

 

It's just that these servers are often selected, well, not necessarily optimally. If you decide on a simple network drive such as a small QNAP or Synology and its performance is not crucial, then sure: you multiply the capacity of a disk by the number of disks, subtract one (the spare), and you have the capacity of the array. End of calculation, nothing simpler.
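If you prefer to see that rule of thumb as code, here is a minimal Python sketch; the eight 4 TB drives are just example values, and real usable space will always be a bit lower.

    # Naive "capacity times disks minus one drive" rule of thumb for a
    # simple RAID 5-style NAS. Example values only.
    def simple_nas_capacity_tb(disk_count: int, disk_size_tb: float) -> float:
        return (disk_count - 1) * disk_size_tb

    # Eight 4 TB drives -> about 28 TB of usable space.
    print(simple_nas_capacity_tb(8, 4.0))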

 

Planning a ZFS-based file server like TrueNAS gets a little more complicated. And if, in addition, we plan to use the server intensively, because we intend to keep virtual machine disks on it, or databases, or video editing material, then the matter gets more complicated still.

 

Sure, we can buy an all-flash array, pack it with the fastest NVMe drives, a pile of RAM and a decent processor, and it will work as expected. What's more, sometimes the requirements are so demanding that there is no other way. It's just that if we also need our array to have a capacity of tens or hundreds of TB, such a solution will cost several bags of money.

 

But suppose you come up with the idea of a so-called hybrid solution, that is, hard drives supplemented by SSDs, and you want to do it optimally, spending fewer bags of money by using the advantages of ZFS. This is when the selection and configuration of drives can get complicated, and the answer to the question "What drives do I need for my NAS?" will be, who would have expected it: "it depends".

 

I also wanted to point out that you should not get too attached to the specific numbers, because some of them change depending on the model and over time. It's more about orders of magnitude and comparisons. The parameters given are more or less up to date as of August 2023. I wanted to focus on conveying the differences and selection mechanisms rather than the specific numbers.

 

Disk controller with HBA for TrueNAS (ZFS).

Starting with the basics: before we start connecting drives, we need something to connect them to.
Most servers come equipped with hardware RAID controllers by default. Well, and that's cool. Cool until we want to use ZFS. ZFS needs to communicate directly with the disks, so we need either a disk controller built into the board or a controller with HBA (host bus adapter) functionality. If we connect the disks through a controller acting as a hardware RAID, then we:

  • unnecessarily duplicate RAID functionality (ZFS will handle that itself);
  • make it difficult or impossible to access S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), the mechanism that provides the system with information about drive status, failures and even anticipated failures before they happen, which is important information. S.M.A.R.T. also makes it possible to read the drive's serial number, and ZFS identifies drives precisely by their serial numbers (see the sketch after this list);
  • make disk replacement harder: with an HBA we power the server down (or not), swap the disk and configure the rest through the TrueNAS administration page. With a RAID controller in the way, we have to configure the replacement not only in TrueNAS but also in the RAID controller, often with the server turned off, through some menu available at server startup;
  • give up speed: hardware controllers are faster in HBA mode than when performing the RAID function, believe me;
  • risk data loss: if the disk controller uses a built-in cache and for whatever reason we do not notice that the battery backing that cache is running out, we risk losing data in the event of a power failure.
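As a side note, here is a minimal Python sketch of pulling that S.M.A.R.T. information (serial number, overall health) with smartctl from smartmontools, which TrueNAS ships. The device name /dev/da0 is only an example; behind a hardware RAID controller this kind of direct query often simply does not work.

    # Minimal sketch: read drive identity (incl. serial number) and overall
    # health via smartctl. The device name is an example; adjust to your system.
    import subprocess

    def smart_summary(device: str) -> str:
        # "-i" prints identity info (model, serial), "-H" prints overall health.
        result = subprocess.run(
            ["smartctl", "-i", "-H", device],
            capture_output=True, text=True, check=False,
        )
        return result.stdout

    if __name__ == "__main__":
        print(smart_summary("/dev/da0"))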

 

Types of storage media

As for the storage media themselves, tradition still honours them with the name "disks", even though most of them are no longer disks at all. Likewise the term "hard disk", coined in contrast to the once-soft floppy disks, no longer seems particularly adequate. But sticking to this nomenclature: drives, as you probably know, divide into platter drives, where data is stored on spinning magnetic platters, commonly known as HDDs, and solid-state drives, the so-called SSDs.

 

HDDs 

HDDs have a good capacity-to-price ratio. At the moment there is no indication that, in the world of disk arrays focused on large or huge capacities, platter drives are going away. As for performance, their current limit is around 150 MB/s, and the largest commercially available models are around 20 TB.

 

SSDs

SSDs are significantly faster than platter drives, especially when it comes to random access. In linear operations, reading large amounts of contiguous data, platter drives still manage somehow, but when reading small amounts of random data, the inertia of their mechanics drags performance down significantly. Currently, the capacities and prices of SSDs make them by far the first choice for everyday applications such as desktops and laptops. As for the performance of modern SSDs, we can talk about up to 7000 MB/s, with capacities reaching around 8 TB.

 

Media interfaces

Once we have something to connect our drives to and we know what kinds of drives exist, the question remains how to connect them, that is, what drive interfaces are available to us, because we are about to find out how much they matter.

 

SATA

Let's start with SATA (Serial Advanced Technology Attachment), popular and probably the oldest interface still in common use. While for platter drives the SATA limit of 600 MB/s is sufficient, for SSDs capable of 7000 MB/s it is already a bottleneck.

 

SAS

The next interface still used in disk arrays is SAS (Serial Attached SCSI). Until the proliferation of SSDs it was the market leader in high-performance server storage. SAS drives are faster than SATA drives, if only because of their platter speeds of 15k RPM compared to 5.4k or 7.2k RPM for SATA. However, it is a dying technology.

 

Nearline SAS (NL-SAS) – Multipath

However, there is an exception. Nearline SAS is a solution little known outside the disk array world, sitting halfway between SATA and SAS. It has the bandwidth, capacity and mechanical design of SATA drives, i.e. 7.2k RPM rotational speed and large capacities, so its price is not inflated the way it is with SAS. "Why bother, then?" you will ask. "Surely it won't be cheaper than SATA?" Well, it won't, but from SAS it has borrowed the controller and the communication protocol. "And what is the point of such a combination if it is neither faster nor cheaper?" you will ask.

 

Well, first of all, it has better-developed error reporting and fault prediction and a more efficient interface, but let's treat that as a bit of blah blah blah. The most interesting feature is support for multipath. It allows the same disk to be controlled from two different controllers. Not at the same time, of course. It's all about enabling a high-availability architecture with full redundancy. TrueNAS X-series and M-series arrays offer a redundant controller using exactly this multipath functionality. As a result, almost anything can fail, RAM, a processor, a power supply, even a disk controller, and the standby controller will still be able to keep operating on exactly the same disks.

 

M.2

The next common interface used today is M.2. These are the flat modules that look a bit like RAM sticks. Nowadays they are connected either directly through dedicated ports on the motherboard (not all motherboards have them), straight into a PCIe slot via an appropriate adapter, or alternatively, through an adapter, into a regular SATA port.

 

M.2 SATA

The M.2 SATA variant is less common. It does not bring much to the table, because even though it is an SSD, it uses SATA communication, so the 600 MB/s limit still applies, which is nothing to get excited about. I mention it mainly so you don't get tricked into buying one.

 

M.2 NVMe – PCIe

The M.2 NVMe version, however, is a completely different matter. It is an absolute game changer. Connected straight to the PCIe bus, it lets SSD technology spread its wings. Depending on the PCIe version, it delivers throughputs of several GB/s. Interestingly, these drives, despite being much faster, are not more expensive than SSDs with a SATA interface. A technical note: as of August 2023, TrueNAS does not yet support S.M.A.R.T. for NVMe drives. Another note: not all boards support booting from NVMe drives, especially older server boards, so if you plan to use a small NVMe drive for the system, which is a great idea, check first whether the board can boot from such a drive.

 

U.3

On the horizon we can already see the first drives with the U.3 interface, which should allow the full capabilities of SSDs to be exploited through connectors that look like SATA and, more importantly, are backward compatible. But we haven't had to deal with them yet, so we'll see how it goes.

 

Which drives for a NAS?

As for the specific drives I would recommend for use in file servers: always aim for a series designed for NAS. Whether it is Seagate IronWolf, IronWolf Pro, WD Red or WD Red Pro matters a little less than the fact that it is a NAS series at all.
For the record, WD Red differs from the Pro version in that, among other things, it uses SMR technology, so it is slightly slower.

 

One thing you should not do is put consumer-series drives in your NAS. There are known cases, staying with the same brand, of WD Blue drives dying like flies one after another after a few months of use in a NAS.

 

Drives recommended by TrueNAS

As for the recommendations of TrueNAS itself, it recommends WD Red Pro series drives. The exception is the X and M product series, where the high-availability architecture requires NL-SAS drives, and the WD Ultrastar series is recommended.

 

Recently there has been a bit of a stir, because there have already been several publicized cases of WD drives that, after three years of use, start reporting a warning stating that the drive is already three years old and recommending replacement. I'll admit this is quite strange, because WD Red drives are precisely drives designed for continuous load. Even after six years, I would consider the failure of such a drive more the exception than the norm.

 

Incidentally, WD Red Pro drives are also recommended by companies such as Synology.

 

vDev

Moving on to the organization of physical disks in ZFS, I will just mention that the basic organizational unit in ZFS is the vDev. Each vDev can consist of one or more disks. What is very important is that redundancy is provided at the vDev level, so it is within a vDev that disks are combined into a mirror or RAID-Z, but more on that later.

 

As I mentioned at the beginning, in ZFS a disk can perform one of five roles (vDev types) in addition, of course, to just storing data.
I discuss how vDevs work in much more detail in the video [TrueNAS - ZFS why it's awesome]. Here I wanted to focus on the practical selection of drives for specific vDevs. It rarely makes sense to use all of them. Often it doesn't even make sense to use any of them. But let's take them one by one.

 

Disks for vDev Data

It is the Data vDev that stores the data, organized, for example, in one of the RAID-Z levels.

 

Roughly speaking, the capacity of such a vDev is the number of disks minus the number of parity disks (RAID-Z1: one, RAID-Z2: two, RAID-Z3: three), multiplied by the size of the disks. Nothing out of the ordinary, but keep in mind that if you plan to expand your pool later with another data vDev, it must be organized the same way. If you already have a vDev in your pool organized in RAID-Z3, then the next vDev should preferably be the same size, and it certainly must be organized in RAID-Z3 as well.
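For illustration, the same arithmetic as a minimal Python sketch; the drive counts and sizes are example values, and real usable space will be lower due to ZFS overhead (metadata, padding, reserved space).

    # Rough usable-capacity estimate for a RAID-Z data vDev, ignoring ZFS overhead.
    def raidz_capacity_tb(disk_count: int, disk_size_tb: float, parity: int) -> float:
        # parity = 1 for RAID-Z1, 2 for RAID-Z2, 3 for RAID-Z3
        return (disk_count - parity) * disk_size_tb

    # Example: eight 8 TB drives in RAID-Z2 -> about 48 TB of raw usable space.
    print(raidz_capacity_tb(8, 8.0, parity=2))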

 

We can expand a pool almost without limit by adding further identical vDevs, which will only make the pool faster. Once a data vDev has been attached, in practice it can no longer be detached from the pool.

 

Disks for Hot Spare

Beyond the redundancy provided by RAID-Z, it is also worth having spare drives, i.e. Hot Spares. If one of the disks fails, a spare automatically replaces the damaged disk and the array is rebuilt without anyone's intervention. Obviously, in this case the spare drives must be identical to those making up the Data vDev.

 

Disks for vDev Cache

The Cache vDev (L2ARC) is an extension of the cache held in RAM: anything that no longer fits in RAM but remains useful gets pushed out to L2ARC and fetched from there when needed. Of course, such a solution only makes sense if the L2ARC disk is faster than the data disks themselves. If our array is made up of SSDs, adding L2ARC disks will actually slow its performance down.

 

On the other hand, using an oversized L2ARC drive is not a good idea either, because each data block on the L2ARC drive has its own 88-byte address entry in RAM. So, for example, if we connect a 480 GB L2ARC disk filled with data blocks of 4 KB each, that will tie up about 10 GB of RAM just for the L2ARC addresses.
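A quick back-of-the-envelope check of that figure, assuming the 88 bytes per cached block quoted above (the exact header size varies between ZFS versions):

    # RAM overhead of L2ARC headers: one ~88-byte entry per cached block.
    def l2arc_ram_overhead_gb(l2arc_size_gb: float, block_size_kb: float,
                              header_bytes: int = 88) -> float:
        blocks = (l2arc_size_gb * 1024**3) / (block_size_kb * 1024)
        return blocks * header_bytes / 1024**3

    # 480 GB of L2ARC filled with 4 KB blocks -> roughly 10 GB of RAM.
    print(round(l2arc_ram_overhead_gb(480, 4), 1))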

 

So it is recommended that the L2ARC be 5 to 20 times the size of our RAM.

 

These drives do not need to be redundant in any way, because the data in L2ARC is secondary: it still exists in the original pool, and after each reboot the cache is rebuilt from scratch anyway. The disks can be attached and detached live without consequences.

 

Disks for vDev SLOG (separate intent log)

The next interesting vDev in ZFS is the SLOG. It is used when we rely on synchronous writes, that is, when we ask the server to acknowledge a write only after the data has actually been stored in non-volatile form. A pinch of theory is needed here. ZFS, for optimization reasons, writes data to the disks in ordered batches, every 5 seconds. Between arriving at the server and being written out, the data resides in RAM, so an unexpected server restart would lose any data not yet written to the disks. If we switch to synchronous mode, i.e. guaranteed writes, ZFS starts recording this data on the fly, between write batches. The only problem is that this temporary data is written to the ZIL (ZFS intent log), which by default lives on the same disks as the data. As a result, we end up writing practically the same data to our pool twice, which can load or even overload our server. And this is where the SLOG comes into play: a separate vDev dedicated to storing the ZIL.

 

So SLOG only makes sense if we use synchronous operations. And we should use them for all important data.

 

The size of the SLOG may be surprising, but in practice it only has to hold as much data as can reach the server in five seconds. Even with a really large disk array fed by a 10 Gb link filled to the brim, that's about 1.25 GB/s, so the maximum amount of data that can hit the SLOG in five seconds is 6.25 GB. Taking that into account, 16 GB, yes, 16 GB, is the recommended size for most configurations.
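The same estimate as a minimal Python sketch, assuming the default 5-second batch interval and a fully saturated link:

    # Rough SLOG sizing: it only needs to absorb what the network can deliver
    # between ZFS write batches (about 5 seconds by default).
    def slog_size_gb(link_gbit: float, commit_interval_s: float = 5.0) -> float:
        gbytes_per_s = link_gbit / 8          # Gbit/s -> GB/s (approximately)
        return gbytes_per_s * commit_interval_s

    # A fully saturated 10 Gb link -> about 6.25 GB per 5-second window,
    # which is why a 16 GB SLOG is plenty for most setups.
    print(slog_size_gb(10))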

 

Using a higher capacity can, at most, extend the life of such a disk. Keep in mind that every piece of data written in synchronous mode lands on this disk before being written to the actual data disks.

 

A SLOG may not make sense if the primary array consists of SSD or NVMe drives; it could itself become the bottleneck.
A SLOG can be added and removed at will while the system is running.

 

Disks for vDev Metadata

The next way to speed up our disk array is to use a Metadata vDev, based on SSD technology, of course. Other names for the same solution are Fusion Pools or ZFS allocation classes, but it's all the same thing. Once attached to our pool, it will store metadata, that is, for example, information about where data is located on the disks.

 

This solution works well if the pool on our array is hit with a large number of small random reads. In random access to small pieces of data it can take as much as half of the load off the data disks, which ultimately speeds such operations up significantly.

 

An important note: once added, a Metadata vDev can no longer be removed, because it stores an integral part of the data. It follows that such a vDev absolutely must have redundancy, because if you lose it, you also lose the data. If you attach more than one Metadata vDev, ZFS will try to balance the load between them. If it runs out of space on them, it will simply start writing metadata to the pool's primary disks again.

 

Disks for vDev Dedup

If, on the other hand, we conclude that we have a lot of repetitive data and need to use deduplication, the deduplication metadata will by default go to the same disks as the data. This can overload our disks. In that case, let's attach a Dedup vDev. Every incoming data block is checked against all blocks already stored on our disks, which is a very heavy workload. For the solution to make sense, the fastest possible NVMe drives are recommended.

 

Once attached, a Dedup vDev cannot be detached, because it contains the checksums of the data blocks and their addresses. Consequently, if we lose the Dedup vDev, we lose our data, so here too redundancy is a must.

 

The size of such a vDev is estimated at about 1 GB for every 1 TB of deduplicated data.
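As a trivial sketch of that rule of thumb (the 50 TB figure is only an example; real requirements depend on block size and how well the data deduplicates):

    # Rule-of-thumb dedup vDev sizing quoted above: ~1 GB per 1 TB of data.
    def dedup_vdev_size_gb(deduped_data_tb: float, gb_per_tb: float = 1.0) -> float:
        return deduped_data_tb * gb_per_tb

    # Example: 50 TB of deduplicated data -> roughly a 50 GB dedup vDev.
    print(dedup_vdev_size_gb(50))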

 

Drives for the system

Every NAS server, in addition to the data disks and the other supporting disks I've already talked about here, must also have disks for the operating system. There are no major requirements for size or speed, because only the configuration and some logs are stored on those disks, none of our data. In practice, 16 GB disks are sufficient in most cases.

 

Because of the necessary redundancy, the optimal solution is two of the smallest SSDs available, which nowadays probably means 256 GB or even 500 GB. In practice, platter drives and USB sticks are discouraged.

 

Although, as a curiosity, I will say that one of our TrueNAS boxes, with a small pool of several TB pushing hundreds of GB per day, runs happily on two 16 GB USB sticks. Absolutely not something I would recommend for a production solution, but it fulfils its role perfectly as small, handy, non-critical data storage. Well, and using USB leaves more bays free for data drives.

 

Links

https://www.truenas.com/docs/core/gettingstarted/corehardwareguide/#minimum-hardware-requirements

https://www.truenas.com/docs/core/coretutorials/storage/pools/poolcreate/#vdev-layout

 

 

If you would like to learn more about TrueNAS, write to us. We will tell you how it works and why it is worth it.