Every community I care about is dead

  • 0 Posts
  • 25 Comments
Joined 2 years ago
Cake day: June 12th, 2023


  • Mirrored vdevs allow growth by adding a pair at a time, yes. Healing works with mirrors because each of the two disks in a mirror is supposed to hold the same data as the other. When a read or scrub happens and a block fails its checksum on one disk, ZFS replaces that bad block with the other disk’s good copy.

    Many ZFS’ers swear by mirrored vdevs because they give you the best performance, they’re more flexible, and resilvering from a failed mirror disk is an order of magnitude faster than resilvering from a failed RAIDZ - leaving less time for a second disk failure. The big downside is that they eat 50% of your disk capacity. I personally run mirrored vdevs because it’s more flexible for a small home NAS, and I make up for some of the disk inefficiency by being able to buy any-size disks on sale and throw them in whenever I see a good price.
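
    To put rough numbers on the capacity trade-off, here is a minimal sketch. The disk sizes are made up, and it ignores metadata and padding overhead, so treat the results as ballpark figures only.

```python
def mirror_pool_usable_tb(mirror_pairs):
    """Usable space of a pool of 2-way mirrors: each pair holds as much as its smaller disk."""
    return sum(min(a, b) for a, b in mirror_pairs)


def raidz_usable_tb(disks, parity):
    """Rough usable space of one RAIDZ vdev: every disk counts as the smallest one."""
    return (len(disks) - parity) * min(disks)


# Hypothetical home NAS that grew one pair at a time (sizes in TB).
pool = [(4, 4), (8, 8)]
pool.append((14, 14))  # a new pair bought on sale

print("mirrors:", mirror_pool_usable_tb(pool), "TB usable")
print("raidz2 :", raidz_usable_tb([4, 4, 8, 8, 14, 14], parity=2), "TB usable")
```

    With mixed disk sizes like these, the mirrors come out ahead because RAIDZ is limited by its smallest member. With six identical disks, RAIDZ2 would give more usable space than mirrors.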


  • The main problem with self-healing is that ZFS needs to have access to two copies of data, usually solved by having 2+ disks. When you expose an mdadm device ZFS will only perceive one disk and one copy of data, so it won’t try to store 2 copies of data anywhere. Underneath, mdadm will be storing the two copies of data, so any healing would need to be handled by mdadm directly instead. ZFS normally auto-heals when it reads data and when it scrubs, but in this setup mdadm would need to start the healing process through whatever measures it has (probably just scrubbing?)
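
    A toy model of why the number of copies ZFS can see matters. This is not how ZFS is implemented; `ToyVdev` and its methods are invented for illustration, and a single SHA-256 checksum stands in for ZFS block checksums.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class ToyVdev:
    """Hypothetical stand-in for a vdev that stores `copies` replicas of one block."""

    def __init__(self, data: bytes, copies: int):
        self.copies = [bytearray(data) for _ in range(copies)]
        self.expected = checksum(data)  # checksum is stored separately from the data

    def read(self) -> bytes:
        # Find any replica that still matches the stored checksum.
        good = next(
            (bytes(c) for c in self.copies if checksum(bytes(c)) == self.expected),
            None,
        )
        if good is None:
            # Corruption is detected, but there is nothing to repair from.
            raise IOError("checksum error: no intact copy to heal from")
        # Overwrite every damaged replica with the good one (the "self-heal").
        for c in self.copies:
            if checksum(bytes(c)) != self.expected:
                c[:] = good
        return good


mirror = ToyVdev(b"family photos", copies=2)
mirror.copies[0][0] ^= 0xFF          # simulate bitrot on the first disk
print(mirror.read())                  # served from the good copy, bad copy repaired

single = ToyVdev(b"family photos", copies=1)  # what ZFS sees on top of one mdadm device
single.copies[0][0] ^= 0xFF
try:
    single.read()
except IOError as err:
    print("single device:", err)
```

    The second case is the mdadm situation: ZFS notices the bad checksum but only knows about one copy, so any repair has to come from the layer underneath.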


  • ZFS can grow if it has extra space on the disk. The obvious answer is that you should really be using RAIDZ2 instead if you are going with ZFS, but I assume you don’t like the inflexibility of RAIDZ resizing. RAIDZ expansion has been merged into OpenZFS, but it will probably take a year or so to actually land in a release. RAIDZ2 could still be an option if you aren’t planning on growing before it lands. I don’t have much experience with mdadm, but my guess is that with mdadm+ZFS, features like self-healing won’t work because ZFS isn’t aware of the RAID at a low level. I would expect it to be slightly janky in a lot of ways compared to RAIDZ, and if you still want to try it you may become the foremost expert on the combination.


  • ZFS without redundancy isn’t ideal, since redundancy is better in every scenario, but it’s still a modern filesystem with a lot of good features, just like BTRFS. The main problem will be that it can detect data corruption but not heal it automatically. Transparent compression, snapshotting, data checksums, copy-on-write (power loss resiliency), and reflinking are modern features of both ZFS/BTRFS, and BTRFS additionally offers offline deduplication, meaning you can deduplicate any data block that exists twice in your pool without paying the massive resource cost that ZFS deduplication requires. ZFS is the more mature of the two, and I would use that if you’ve already got ZFS tooling set up on your machine.

    Note that the TrueNAS forums spread a lot of FUD about ZFS, but ZFS without redundancy is ok. I would take anything alarmist from there with a grain of salt. BTRFS and ZFS both store 2 copies of all metadata by default, so metadata bitrot will be auto-healed at the filesystem level when it’s read or scrubbed, even on a single disk.

    Edit: As for write amplification, just use ashift=12 and don’t worry too much about it.
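
    For context on that number: ashift is the base-2 logarithm of the sector size ZFS assumes, so ashift=12 means 4096-byte sectors. The sketch below is a simplification that only models rounding a write up to whole sectors; the helper names are made up.

```python
def sector_size(ashift: int) -> int:
    """ashift is log2 of the sector size the pool allocates in."""
    return 1 << ashift


def allocated_bytes(payload: int, ashift: int) -> int:
    """Round a write up to a whole number of sectors (simplified model)."""
    sector = sector_size(ashift)
    return -(-payload // sector) * sector  # ceiling division


for ashift in (9, 12):
    print(
        f"ashift={ashift}: sector={sector_size(ashift)} B, "
        f"a 5000 B write occupies {allocated_bytes(5000, ashift)} B"
    )
```

    Matching ashift to the drive’s physical sector size is the point: a pool using 512-byte sectors on a drive with 4K physical sectors can force the drive to read-modify-write whole sectors for small writes.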


  • ZFS doesn’t eat your SSD endurance. If anything it is the best option, since you can enable ZSTD compression to shrink reads/writes, and reads will often come from the RAM-based ARC instead of your SSDs. ZFS is also practically allergic to rewriting data that already exists in the pool, so once something is written it should never cost a write again - especially if you’re using OpenZFS 2.2 or above which has reflinking.

    My guess is you were reading about SLOG devices, which do need heavier endurance as they replicate every write coming into your HDD array (every synchronous write, anyway). SLOG devices are only useful in HDD pools, and even then they’re not a must-have.

    IMO just throw in whatever is cheapest or has your desired performance. Modern SSD write endurance is way better than it used to be and even if you somehow use it all up after a decade, the money you save by buying a cheaper one will pay for the replacement.
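
    If you want to sanity-check endurance for your own workload, the arithmetic is simple. The numbers below are placeholders, not specs for any real drive; plug in the TBW rating from your SSD’s datasheet and your own daily write volume.

```python
def years_until_rated_endurance(rated_tbw: float, gb_written_per_day: float) -> float:
    """Years until total writes reach the drive's rated TBW (terabytes written)."""
    tb_per_year = gb_written_per_day * 365 / 1000
    return rated_tbw / tb_per_year


# Hypothetical drive rated for 600 TBW, with 50 GB written per day.
print(f"{years_until_rated_endurance(600, 50):.1f} years")
```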

    I would also recommend using ZFS or BTRFS on the data drive, even without redundancy. These filesystems store checksums of all data so you know if anything has bitrot when you scrub it. XFS/Ext4/etc store your data but they have no idea if it’s still good or not.


  • Oh I didn’t realize. I wonder why they would do that? Either way it’s not a huge deal - the main problem with TrueNAS Scale is that you actively cannot install Docker onto it because it will conflict with the normal TrueNAS Scale app system. There are technically ways to get Docker working on a TrueNAS Scale system but they’re unsupported and likely to break frequently on updates. Debian and OpenMediaVault should behave similarly in terms of getting Docker set up.


  • I would only recommend TrueNAS Scale if you want an easy way to manage a ZFS pool. I wouldn’t use it for anything related to containers or logic as it’s very inflexible and only supports specific usecases through its weird app system. Debian is my usual choice for basically everything, but it will be fully DIY. OpenMediaVault is a good turnkey option based on Debian that’s similar to TrueNAS Scale, except it allows you to run plain Docker and other custom usecases. I also like Proxmox (based on Debian) a lot but it’s a bit too advanced if you don’t need its hypervisor functionality.



  • Yeah, you’ve got it about right. PCIe Gen 3 x4 is 8 GT/s per lane × 4 lanes, which works out to roughly 4 GB/s (a bit under after encoding overhead), and that is your bottleneck. Hard drives might be closer to ~200-250MB/s each depending on your specific model. That M.2 -> SATA thing seems like it’s more geared towards SATA SSDs with how few ports it has - I wouldn’t be surprised if you could find something with more ports available if needed, or at least for a cheaper price.

    Also as you note, RAID0 will be the fastest config but depending on your RAID configuration or workload you’ll probably be getting less than max bandwidth out of each drive anyway.
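
    The bandwidth math, as a quick sketch. It uses the PCIe Gen 3 line rate (8 GT/s per lane, 128b/130b encoding) and ignores protocol overhead, so real throughput will be somewhat lower. The per-drive speeds are the rough HDD figures from above, not measurements.

```python
def pcie_gen3_gb_per_s(lanes: int) -> float:
    """Approximate PCIe Gen 3 bandwidth in GB/s for a given lane count."""
    gt_per_s = 8.0          # transfers per second per lane, in billions
    encoding = 128 / 130    # 128b/130b line encoding
    return gt_per_s * encoding / 8 * lanes  # divide by 8: bits -> bytes


def drives_to_saturate(link_gb_per_s: float, drive_mb_per_s: float) -> float:
    """How many drives at full sequential speed it takes to fill the link."""
    return link_gb_per_s * 1000 / drive_mb_per_s


link = pcie_gen3_gb_per_s(4)
print(f"Gen 3 x4 link: ~{link:.2f} GB/s")
for speed in (200, 250):
    print(f"{speed} MB/s drives needed to saturate it: ~{drives_to_saturate(link, speed):.0f}")
```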





  • The main problem is just getting TrueNAS access to the physical disks via IOMMU groups and passthrough. HBA cards are a super easy way to get a dedicated IOMMU group that has all your drives attached, so it’s common for people to use them in these sorts of setups. If you can pull your normal SATA controller down into the TrueNAS VM without messing anything else up on the host layer, it will work the same way as an HBA card for all TrueNAS cares.

    (As far as I know, SATA controller hubs are usually an all-at-once passthrough, so if your host system is running off some part of this controller, it probably won’t work to unhook it from the host and give it to the guest.)


  • This is a fairly common setup and it’s not too complex - learning more about Proxmox and TrueNAS/ZFS individually will probably be easiest.

    Usually:

    • Proxmox on bare metal

    • TrueNAS Core/Scale in a VM

    • Pass the HBA PCI card through to TrueNAS and set up your ZFS pool there

    • If you run your app stack through Docker, set up a minimal Debian/Alpine host VM (you can technically use Docker under an LXC but experienced people keep saying it causes problems eventually and I’ll take their word for it)

    • If you run your app stack through LXCs, just set them up through Proxmox normally

    • Set up an NFS share through TrueNAS, and connect your app stack to that NFS share

    • (Optional): Just run your ZFS pool on Proxmox itself and skip TrueNAS





  • I would go custom and use hardware that you can re-configure and re-use in the future. If you pick up a Synology now and wind up feeling restricted by it in 2 years, it might become useless e-waste. If you have anything lying around, put that to use while you’re getting your feet wet - you probably don’t know what hardware configurations you’ll end up wanting in a year, and you don’t want to underbuy/overbuy.

    You can also test self-hosting without any real hardware by spinning up a VM and passing in “fake” hard-drives to it. Try setting up a RAID6 in this fashion and see what happens. After you’ve played around enough you can just export all your Docker data etc onto real hardware.
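
    One way to get “fake” drives is plain image files. A minimal sketch, assuming your hypervisor can attach raw image files as virtual disks (most can); the file names, count, and sizes here are arbitrary.

```python
import tempfile
from pathlib import Path


def make_fake_disks(directory, count: int, size_bytes: int):
    """Create `count` zero-filled image files to attach to a test VM as disks."""
    paths = []
    for i in range(count):
        path = Path(directory) / f"fake-disk-{i}.img"
        with open(path, "wb") as f:
            # Extending with truncate() produces a sparse file on most
            # filesystems, so it takes almost no real space until written to.
            f.truncate(size_bytes)
        paths.append(path)
    return paths


# Example: four 1 GiB "disks" in a throwaway directory.
with tempfile.TemporaryDirectory() as scratch:
    for disk in make_fake_disks(scratch, count=4, size_bytes=1024**3):
        print(disk.name, disk.stat().st_size, "bytes")
```

    Four of these is enough to try a RAID6 or RAIDZ2 layout, pull a “disk”, and watch how the rebuild behaves without risking real data.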

    I haven’t used any of the prebuilt things so I’m not sure how user-friendly they are compared to normal solutions, but I’d find it hard to believe that they offer anything truly unique in terms of being accessible for normies. Assuming you’re going to be the only one taking care of the NAS administration, there’s likely an accessible webUI for every public service you want to offer to your friends/family.