• 0 Posts
  • 40 Comments
Joined 2 years ago
Cake day: June 14th, 2023

  • For RAID that’s pretty much it as far as I know, but I’m pretty sure it can be a lot simpler and more flexible using some of the newer storage systems out there like LVM, ZFS, and maybe BTRFS (strictly speaking LVM is a volume manager rather than a filesystem, but it plays in the same space). I can’t pretend I’m super up to date on all the latest technologies, but I know they can do some really incredible stuff. I’m not familiar enough to recommend one, but it might be worth looking into what they can do for you if your NAS supports them. From what I understand they don’t use traditional RAID at all, although they can emulate it; instead they treat disks as JBOD (just a bunch of disks) and use their own strategies to spread filesystems and redundancy across them in safer, far more flexible ways. They don’t care much about disk sizes, they’ll handle mixed shapes and sizes, and pools can generally be expanded and contracted pretty freely. ZFS in particular is really heavily used for this and supports some impressively complicated layouts.
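
    To give a rough idea of what that looks like in practice, here’s a minimal ZFS sketch (the pool name and device paths are placeholders, and whether you can do this at all depends on your NAS exposing ZFS):

```bash
# Create a pool from two mirrored disks (device names are placeholders)
zpool create tank mirror /dev/sda /dev/sdb

# Later, grow the pool by adding a second mirrored pair;
# ZFS stripes data across the two mirrors automatically
zpool add tank mirror /dev/sdc /dev/sdd

zpool status tank
```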


  • At the end of the day it doesn’t matter much whether they’re in 2x 2-bay units or 1x 4-bay unit that’s backing itself up. Having them on separate NASes might give a little extra redundancy and safety, but the backup software is what’s doing the heavy lifting here, and it shouldn’t really matter whether it’s talking to two different disks/arrays on the same machine or on two machines (as long as the NAS allows you to split the 4 drives into 2 separate arrays, which in my experience they do).


  • I don’t know what kind of data this is, but when you say the whole household’s data is going to be on it, I want to take a moment to point out that while RAID1 is redundant, it is NOT a backup. Both drives will happily delete, overwrite, corrupt, or encrypt all your data as quickly as you can blink the moment something tells them to, and they’ll do it simultaneously to both “redundant” copies of your data. It also won’t help if your power supply blows up and nukes both drives at once. RAID1 only guards against the hardware failure of a single disk, nothing else. While that failure mode is quite common (and using RAID actually increases the risk of it), it’s important to remember that it’s also not the only cause of data loss.

    If any of this data is important and irreplaceable, consider whether you’d be better off spending your additional future budget on setting up another pair of drives to maintain continuous backups. There are a variety of simple tools that can create incremental, time-machine-like backups from one set of hard drives to another while using a minimal amount of additional space (I use this rock-solid script based on rsync, but there are literally dozens of backup tools that do almost exactly the same thing, often using rsync under the hood themselves). This still won’t help you if, say, your house burns down with both drive arrays inside it, but it’s an improvement over a single huge RAID NAS, and it gives you the option to roll back to a known-good snapshot or restore a file that was deleted or corrupted long ago without you noticing.
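
    For a sense of how little magic is involved, here’s a bare-bones sketch of the rsync --link-dest technique most of those tools are built around (this is not that exact script, and the paths are placeholders):

```bash
#!/usr/bin/env bash
# Incremental "time-machine" style snapshots: unchanged files are hard-linked
# against the previous snapshot, so each run only stores what changed.
set -euo pipefail

SRC="/mnt/nas/data/"            # placeholder source
DEST="/mnt/backup/snapshots"    # placeholder backup drive
NEW="$DEST/$(date +%Y-%m-%d_%H%M%S)"

mkdir -p "$DEST"
LATEST="$(ls -1d "$DEST"/*/ 2>/dev/null | tail -n 1 || true)"

if [ -n "$LATEST" ]; then
  rsync -a --delete --link-dest="$LATEST" "$SRC" "$NEW"
else
  rsync -a "$SRC" "$NEW"
fi
```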

    To answer your original question, it generally isn’t possible to do what you’re asking. You might be able to get away with creating the array as RAID1+0 and pretending that half the drives (one mirror member from each pair) have failed, but that means your two existing disks would effectively be running as a bare RAID0 stripe with no mirrors, and a failure of EITHER one would lose all your data until you get the second two drives installed. That’s super sketchy and would be tricky to even set up. You can’t do it the other way around (treating your two existing drives as one complete mirror pair), because then the other half of the stripe would be missing entirely, and a stripe with a missing member isn’t usable at all; if that happens on a live array, you lose the whole array too. Despite having 4 drives, RAID1+0 is technically still only singly redundant. Any single failure can be tolerated, but two failures can make the whole array unrecoverable if they happen to be the wrong two (both drives of the same mirror pair, leaving only the other pair intact), and due to striping it really is unrecoverable: only small chunks of each file will be present on the surviving mirror pair.

    In almost all cases, changing the geometry of the array means rebuilding it from scratch, and you usually need some form of temporary storage to do that. The good news is that if you decide to add 2 drives to an existing 2-drive RAID1 setup, you have 4 drives of 4 TB each, and you can’t possibly have more than 4 TB of data, because your existing two drives are a RAID1 with only 4 TB of capacity between them. You can probably use 3 of those drives to set up a 4-drive RAID 1+0 with one drive marked as missing, after copying all the data from your RAID1 array onto drive #4 temporarily. Once the 3-drive array is up, copy the data back onto it. Finally, slot drive #4 into the NAS as well, treating it as a “new” drive replacing the “failed” one, and the array should sync over all the stripes it needs and bring it into the array properly. This is all definitely possible with Linux’s built-in software RAID tools (I’ve done stupider things), but whether your specific NAS box will let you do this successfully is something I can’t promise.
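
    With plain Linux mdadm the dance looks roughly like this. It’s only a sketch with placeholder device names; a NAS appliance may not expose any of it:

```bash
# 1. Copy everything off the old RAID1 onto drive #4 (mounted at /mnt/spare here)
rsync -a /mnt/raid1/ /mnt/spare/

# 2. Retire the old RAID1 so its two members can be reused
umount /mnt/raid1
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda1 /dev/sdb1

# 3. Build a 4-member RAID10 from 3 real drives plus one deliberately missing slot
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 missing

# 4. Make a filesystem, copy the data back, then hand drive #4 over to the array
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/raid10
rsync -a /mnt/spare/ /mnt/raid10/
mdadm /dev/md0 --add /dev/sdd1    # the array rebuilds the missing mirror onto it
```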

    It’s important to keep in mind this is all sketchy as hell (remember what I said about backups and asking whether this data was irreplaceable? yeah. don’t stop thinking about that), but technically it should work.

    Edit to add: Another perspective is that once you get your 2 additional drives, you can pair up your NAS drives and your backup drives as two RAID0s to extend them. A pair of 4 TB drives in RAID0 gives you the 8 TB of storage you ultimately want. A second RAID0 pair gives you another 8 TB that you can use to make regular backups of the primary. Again you need to do some array rebuilding, but this time you have an already-existing backup, so you don’t even have to worry about dancing around creating deliberately degraded arrays. Yes, the risk of a single drive failure taking down one of the RAID0 arrays is much higher, but that’s what your backups are for. If a single drive fails, you either lose the primary array (which sucks, but you still have all your backups on the other RAID0 safe and sound) or you lose the backup (not a big deal, because the primary is still happy and healthy, and once you fix the backup array you can start making new backups again). Either way, you’re now relying on an actual backup strategy to keep your data safe instead of relying on RAID1, which is not a backup. The only things you lose are continuous uptime if the primary array does fail, and RAID1’s ability to read from both drives at once to theoretically increase read speed. But in my opinion the advantages of a proper scheduled backup outweigh that.
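
    The setup side of that is pretty simple with mdadm (placeholder device names again, and the backup script name is made up):

```bash
# Primary 8 TB array from the first pair of 4 TB drives
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1

# Backup 8 TB array from the second pair
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc1 /dev/sdd1

# Then schedule the snapshot script from earlier against the backup array,
# e.g. a nightly crontab entry (script path is hypothetical):
# 0 3 * * * /usr/local/bin/snapshot-backup.sh
```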


  • It’s mostly a relic from an older time, though it can still be useful for more traditional services and situations that struggle with sharing public IPs. In theory, things like multiple IP addresses (and IPv6’s nearly unlimited address space) could be used to make things simpler: you wouldn’t need reverse proxies, NAT, or port forwarding (all of which were once viewed as excessive complexity, if not outright ugly hacks, rather than the virtual necessity they are today).

    Each service would have its own dedicated public IP, you’d connect them up with IP routing the way the kernel gods intended, and everything would be straightforward, clear, and happy. If such a quantity of IPs were freely available, this would indeed be a simpler life in many ways. And yet it’s such a distant fantasy now that it’s understandable (though a little funny) to hear you describe it as “additional complexity” when, depending on how you look at it, the opposite is true…

    From a modern perspective, you’re absolutely right. The tables have really been turned: we’ve taken the shortage of IP addresses in stride and built elaborate systems of tools and layers of abstraction that turn those IP-shortage lemons into lemonade. The way we’ve virtualized connections through featureful, easily configurable software layers (private IP ranges, IP masquerading, proxies, tunnels) gives us immense flexibility and reliable security. Most software now natively supports handling multiple services on a single IP or even a single port, and in some cases requires it. That was not always the case.
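
    The “many services on one IP and port” part is mostly just a little reverse-proxy boilerplate these days. A minimal, hypothetical nginx sketch (hostnames, backend ports, and paths are all placeholders):

```bash
# Two services sharing one public IP and one port via name-based routing
sudo tee /etc/nginx/conf.d/two-services.conf >/dev/null <<'EOF'
server {
    listen 80;
    server_name cloud.example.org;
    location / { proxy_pass http://127.0.0.1:8080; }
}
server {
    listen 80;
    server_name media.example.org;
    location / { proxy_pass http://127.0.0.1:8096; }
}
EOF
sudo nginx -t && sudo systemctl reload nginx
```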

    It’s sort of like the divide between hardware RAID and software RAID. Once upon a time, software RAID was slow, messy, confusing, unreliable, and distinctly inferior to “true” hardware RAID, which was plug-and-play with powerful configuration options. Nobody would willingly use software RAID if they had any other choice; the best RAID cards sold for thousands of dollars, and motherboards advertised how much hardware RAID they had built in. But over time, as CPUs and software became faster and more capable, the tide turned, and people started to realize that hardware RAID was actually the one that left you tied to an expensive proprietary controller that could fail or become obsolete and leave your array a difficult-to-migrate, difficult-to-recover mess, whereas software RAID was reliable, predictable, and upgradable, supported a wide variety of disk types and layouts, performed solidly, and was generally far nicer to work with. It became the more common configuration and found its way into almost every OS. You can now set up software RAID simply by clicking an option in a menu, even in Windows, and it basically works flawlessly without any additional thought.

    Times change, we adapt to the technologies that are most common and that work the best in the situations we’re using them in, and we make them better until they’re not just a last resort anymore, but become a first choice, while the old way becomes a confusing anachronism. That’s what multiple public IPs have become nowadays, for most purposes.



  • cecilkorik@lemmy.ca to Selfhosted@lemmy.world · Plex now want to SELL your personal data

    I just want to tell my mom “install this app on your tv and log in”

    I mean, if I didn’t know better, I’d start to suspect that the large multimedia corporations building walled gardens of apps in closed Smart TV ecosystems don’t really want you to be able to easily tell your mom how to watch shit for free. I mean they’ll let you, if you really insist on having that app available, but someone will have to pay THEM money instead first (and probably let them spy on you). That’s their racket.

    The reason Plex can do it is because they make money doing shitty stuff like this to their users, so they can use that money to open these doors into SmartTV-land. The root of the problem is that your SmartTV itself (and your mom’s) is a locked-down proprietary piece of shit, designed exclusively for shoving all the proprietary content these media companies develop down your throat, and there are few convenient workarounds available to us, because of course they make workarounds as inconvenient as possible.

    Unless you’re willing to ditch everything proprietary and insist on open technology for everything, which is hard on its own, you’re going to end up with a janky mix of proprietary and open systems that always require some compromises, because the proprietary stuff forces us to compromise. It’s literally a “this is why we can’t have nice things” situation.



  • I trust the community, but not blindly. I trust those who have a proven track record, and I proxy that trust through them whenever possible. I trust the standards and quality of the Debian organization, and by extension I trust the packages they maintain and curate. If I have to install something from source that’s outside a major distribution, my trust is reduced. I might do some cursory research on the history of the project and the people behind it, I might look closer at the code. Or I might not. A lot of software doesn’t require much trust. For a web app running under its own limited user on a well-secured and up-to-date VPS or VM, in the unlikely event it turned out to be a malicious backdoor, it’s simply an annoyance to be purged: under its own limited user, there’s not that much it can do and it can’t really hide. If I’m off the beaten track with something that requires more trust, something security-related, something I’m going to run as root, or something that will be a core part of my network, I’ll go further. Maybe I “audit” it in the sense that I check the bug tracker and the CVE history to understand how seriously they take potential security issues.
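
    For what it’s worth, the “its own limited user” part is only a couple of commands; this is just a sketch, and the user name, paths, and start command are made up:

```bash
# Create a dedicated, unprivileged system user for the app
sudo useradd --system --create-home --home-dir /srv/webapp --shell /usr/sbin/nologin webapp
sudo chown -R webapp:webapp /srv/webapp

# Run the app strictly as that user; it can only touch what it owns
sudo -u webapp /srv/webapp/bin/start-server --port 8080
```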

    Yeah, if some malicious software I ran, which I didn’t think required a lot of trust, happens to have snuck in a bunch of 0-day exploits, gets root access, gets into the rest of my network, and starts persistently injecting itself into my hardware, then I’m going to have a really bad day, probably followed by a really bad year. That’s a given. It’s a risk that is always present. I’m a single guy homelabbing a bunch of fun stuff; I’m no match for a sophisticated and likely targeted nation-state-level attack, and I’m never going to be. On the other hand, if I get hacked and ransomwared along with 10,000 other people through some compromised project that I trusted a little too much, at least I’ll consider myself in good company, give the hackers credit where credit is due, and try to learn from the experience. But I will say they’d better be really sneaky, really quick, and very sophisticated, because I’m not stupid either, and I pay pretty close attention to changes to my network and to any new software I’m running in particular.



  • I’ve moved to an “infrastructure as code” approach, not using any fancy tools in particular, primarily just bash shell scripts. Basically almost everything I set up or do gets documented via shell scripts. I write them as I go while I’m learning to install something new, and before I commit to something new I take extra care to make sure the scripts are idempotent, so that when I want to make any changes, all I need to do is add them to the appropriate script and re-run it.

    The idempotent part sometimes takes some effort, but it’s not as hard as it seems, particularly if you don’t mind that a re-run sometimes wastes time redoing things that are already done, or spits out the occasional harmless error message about them, though I try to minimize that where I can. The consequences of doing too much on a re-run are rarely serious. Yeah, sometimes the scripts can break, but as long as they fail properly (set -euo pipefail) it’s usually pretty obvious how to fix it and it won’t leave too much of a mess.
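
    A toy example of the style, just to show the shape of it (package names and paths are arbitrary):

```bash
#!/usr/bin/env bash
# Every step checks before it acts, so re-running the whole script is always safe.
set -euo pipefail

# Install a package only if it isn't already installed
dpkg -s nginx >/dev/null 2>&1 || sudo apt-get install -y nginx

# Create a directory (mkdir -p is naturally idempotent)
sudo mkdir -p /srv/myapp

# Append a config line only if it isn't already present
grep -qxF 'AllowTcpForwarding no' /etc/ssh/sshd_config \
  || echo 'AllowTcpForwarding no' | sudo tee -a /etc/ssh/sshd_config >/dev/null

# Enabling a service is idempotent by itself
sudo systemctl enable --now nginx
```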

    Doing this has transformed my homelab from a mess of unknowable higgledy-piggledy spaghetti-services that was always teetering one small failure away from total collapse and frantic rebuilding, into something repeatable and reproducible that I can actually … wait for it … test. Just firing up a Linux ISO in a VM is all I need to test everything I’m doing in a perfect sandbox, and I can throw it away when I’m done with no regrets. Plus it makes rolling out new servers, and more importantly decommissioning old ones, a breeze: you know exactly what’s on them and how it was set up, because it was all in your scripts. Combined with good data backups (which are also set up in the scripts) and restores (which I also test with scripts), it really takes the drama and stress out of migrations and even hardware failures.

    Yeah there are probably easier ways to accomplish what I’m doing using some of the technologies like terraform, ansible and nix/flake that people have mentioned, and I’ve dabbled with those, but for me, the shell script approach strikes a nice balance of not just documenting but also learning the process myself so that I understand enough of what it’s doing to effectively debug it when something goes wrong, and it works on almost everything and in most cases requires no installation or setup. Bash is everywhere. I even have an infrastructure-as-code setup for my Steam Deck to install stuff and get it set up the way I want.


  • Literally any old PC is likely fine. It may be slow, it may struggle or even fail with some of the very complex software (perhaps you will encounter timeouts, or you will spend so much time waiting for memory to swap in or out to disk that it won’t be worth using) but you can run Linux itself on a potato and if your machine isn’t powerful enough, maybe you can get a second one and run different stuff on each, or just scale down your expectations and don’t try to self-host LITERALLY everything just because you can. Certain services are very intense, others will run on a very small piece of a potato.


  • It’s aggressively privacy-first in some ways. It doesn’t do any self-updating, which could be considered phoning home, so you have to make sure you have a way to keep it updated, through a package manager or otherwise. There’s a separate update monitor if you want that, for Windows at least. I tend to dial back the anti-fingerprinting a bit because it just makes browsing frustrating for me. I understand the risk of fingerprinting, and it’s good that they do everything they can to avoid being fingerprinted, but it doesn’t strike the right balance for me. Particularly forcing light mode: I absolutely fucking loathe getting light blasted unexpectedly into my eyeballs, and I always have. The biggest mistake technology ever made, in my opinion, was trying to pretend an actively illuminated screen was paper and making it blinding white.

    I’ve so far resisted the urge to enable DRM. If something won’t show me stuff without DRM I’m willing to just say I don’t want to watch it.

    And obviously, as per the topic, I turn on sync, which is not enabled by default; turning it on is easy, and leaving it off by default is itself a sensible choice. Honestly it’s mostly sensible defaults.



  • I wouldn’t stress about it. People are overly delicate with their hard drives in my experience. They’re surprisingly sturdy and failure tends to be pretty random. There might be a slight statistical correlation in failure rates with minor vibration, but anecdotally I’ve got drives that vibrate the hell out of themselves (probably due to some other manufacturing defect) and have lasted decades with no errors, and plenty that fail completely for no perceptible reason at all. Spinning disks are just inherently unreliable, not that any storage technology is perfectly reliable. This is why backups are never optional.




  • I have been constantly asking myself why there isn’t something like this, and wondering whether maybe I was missing something about the seemingly immense complexity of doing this on a small scale.

    Now there is something like this.

    I don’t love PHP, but I also don’t love having dozens of separate passwords, keys, certificates and other nonsense to keep track of like I’m doing now. I don’t mind using PHP to get around that if I can.


  • Nextcloud file sync is a convenient centralized solution but it’s not designed for performance. Nothing about Nextcloud is designed for performance. It’s an “everything and the kitchen sink” multi-user cloud solution. That is nice for a lot of reasons. Nextcloud Sync is essentially a drop-in replacement for Google Drive or OneDrive or Dropbox that multiple people can use and that’s awesome. It works the same way as those tools, which is a blessing and a curse.

    Nextcloud fills exactly the role you SAY you want: “All I want is a simple file sync setup like onedrive but without the microsoft.” That’s what it is. It has its role, and it’s good at that role. But I don’t think it’s what you’re actually asking for, and it’s not supposed to be, because in the details you’re describing something totally different.

    If you want performant sync of just files, SyncThing is made for this. It has better conflict resolution. It has better decentralized connectivity and doesn’t need a server with a public IP. It takes a very different approach to configuration: the work is front-loaded, and it takes a fair bit of effort to get devices talking to each other. It’s not suitable for the same things Nextcloud Sync is, but once you have it set up it’s rock-solid reliable and blazing fast.

    Personally I use both SyncThing and Nextcloud Sync, for different purposes in different situations. Nextcloud Sync takes care of my Windows documents and pictures, I use it to share photos with my family, and I use it to sync one of the factors for my password vault. It works fine for all of that.

    I also use SyncThing for large data sets that require higher performance. I have almost 400 GB of shared program data (and game data/saved games), some of which I sync with SyncThing to multiple workstations in different parts of the country. It can deal with complex simultaneous usage that sometimes causes conflicts, and it supports fine-tuning sync strategies and ignore rules using configuration dotfiles. It’s a great tool and I couldn’t live without it. But I use both; they each have their place.
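
    The ignore rules are just a dotfile in the root of each synced folder, something like this sketch (the patterns are only examples, and the folder path is a placeholder):

```bash
# Write a minimal Syncthing ignore file into a synced folder
cat > ~/Sync/.stignore <<'EOF'
// comments start with double slashes
// (?d) marks patterns Syncthing may delete when removing a directory
(?d).DS_Store
(?d)Thumbs.db
*.tmp
EOF
```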


  • Nextcloud Notes or Joplin (never mind all the other features Nextcloud provides) tick most of your boxes. They’re more productivity-focused than privacy-focused and don’t do “zero knowledge” encryption the way you’re describing, but I don’t really understand the point of that when you’re self-hosting and the server belongs to you anyway. The federation may leave you wanting more, and the collaboration might not be “real time” enough for you either. If you can build something better, by all means go for it.