I’m a little teapot 🫖

  • 0 Posts
  • 20 Comments
Joined 2 years ago
Cake day: September 27th, 2023


  • You’re best off splitting routing and WiFi across separate hardware. Buy yourself a used Ruckus Unleashed R550/R650 or R510/R610 for WiFi, depending on how much you want to spend, then run routing on whatever hardware is fit for purpose. I usually slap OPNsense on something like a Dell Wyse 5070 (J5005) mini PC; any mini PC with a PCIe slot will let you build a 1/2.5/10GbE router with open software. Chinese N100 router boxes are cheap now too, or you could reuse an old mini PC of some kind.

    I don’t like rolling my own router on ARM boards anymore: router distro support for them is unreliable, and the J5005 pulls <10W anyway.


  • seaQueue@lemmy.world to Selfhosted@lemmy.world: HDD randomly unmounting
    5 months ago

    I’m not sure how to get the N from session history, nor how to check my session history…

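    journalctl --list-boots will list all boots (sessions) stored in the journal; N is the number in the first column (0 is the current boot, -1 the one before it, and so on).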
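
    A minimal Python sketch of finding N and pulling that boot's kernel log, assuming the first whitespace-separated field of each --list-boots line is the boot offset (newer systemd versions print an IDX header line, which the parser skips). The helper names are hypothetical, and read_kernel_log only works on a host running systemd.

```python
import subprocess


def boot_offsets(list_boots_output: str) -> list[int]:
    """Return the relative boot offsets found in `journalctl --list-boots` text."""
    offsets = []
    for line in list_boots_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            offsets.append(int(fields[0]))
        except ValueError:
            # Header lines such as "IDX BOOT ID ..." are not numeric; skip them.
            continue
    return offsets


def kernel_log_command(offset: int) -> list[str]:
    """Build the argv equivalent to `journalctl -k -bN` for one boot."""
    return ["journalctl", "-k", f"--boot={offset}"]


def read_kernel_log(offset: int) -> str:
    """Run journalctl for the given boot and return its kernel messages."""
    result = subprocess.run(
        kernel_log_command(offset), capture_output=True, text=True, check=True
    )
    return result.stdout
```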

    The output is from yesterday, when the device stopped working correctly.

    I’m not familiar with linux kernel, but I can see there is definitely something wrong…

    The HDD (old) is attached to a USB hub (new), I tried switching port of the hub but the same issue happened again, if I try to mount it with sudo mount /mnt/2tb, it says it is already mounted:

    Those messages tell you what’s happening: there’s an unrecoverable error on the USB bus connecting the hard drive, and it causes filesystem errors when writes fail. To diagnose it, lose the hub first and connect the drive directly to the Pi, then try replacing the cable that attaches the drive if the error still occurs. I’d also check with people in the RPi community in case there are known issues with USB on your model; there may be some Pi-specific USB firmware settings you can change to improve reliability.

    You can also try disabling UASP for the drive in case falling back to BOT (Bulk-Only Transport) stabilizes the connection. You’ll lose some performance, but that helps with some USB storage bridges.
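
    One way to do that on Linux is the usb-storage.quirks=VID:PID:u kernel parameter, where the u flag tells the kernel to ignore UAS for that device so it falls back to BOT. On a Pi it goes on the single line in cmdline.txt, whose path depends on the OS release. The sketch below builds the entry from an lsusb line, assuming the standard "Bus NNN Device NNN: ID vvvv:pppp Description" format, and merges it into an existing command line (several quirk entries share one parameter, separated by commas). The helper names and the example IDs are hypothetical; reboot after editing for the change to take effect.

```python
import re

# Matches the "ID vvvv:pppp" field of a standard lsusb line.
LSUSB_ID = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")
PREFIX = "usb-storage.quirks="


def uas_entry(lsusb_line: str) -> str:
    """Build a VID:PID:u quirk entry (u = ignore UAS) from one lsusb line."""
    match = LSUSB_ID.search(lsusb_line)
    if match is None:
        raise ValueError(f"no vendor:product ID in {lsusb_line!r}")
    vendor, product = (part.lower() for part in match.groups())
    return f"{vendor}:{product}:u"


def add_quirk(cmdline: str, entry: str) -> str:
    """Merge a quirk entry into a kernel command line, reusing any existing parameter."""
    params = cmdline.split()
    for i, param in enumerate(params):
        if param.startswith(PREFIX):
            entries = param[len(PREFIX):].split(",")
            if entry not in entries:
                entries.append(entry)
            params[i] = PREFIX + ",".join(entries)
            break
    else:
        # No usb-storage.quirks parameter yet, so append a new one.
        params.append(PREFIX + entry)
    return " ".join(params)
```

    For example, uas_entry("Bus 002 Device 003: ID 1234:ABCD Example Bridge") gives "1234:abcd:u".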

    Some USB storage bridges are just unreliable under Linux and crash under load; your last option is to buy another drive enclosure that’s tested and known to work correctly. I went through about five USB/NVMe enclosures looking for one that worked properly; that whole space is a compatibility mess.


  • seaQueue@lemmy.world to Selfhosted@lemmy.world: HDD randomly unmounting
    6 months ago

    Don’t just look at sdb hits in the log. Open up that entire boot in journalctl's kernel-only mode (journalctl -k -bN, where N is the boot's offset as listed by journalctl --list-boots) and find the context surrounding the drive dropping and reconnecting.

    You’ll probably find that something caused a USB bus reset or a similar event before the drive dropped and reconnected. If you find nothing like that, try switching power supplies for the HDD and/or switching USB ports until the drive sits under a different USB root port (run lsusb -t after each swap to see which root port it’s attached beneath). You might have a neighboring USB device on the same root port that’s causing issues for the other devices attached to it (it happens; USB devices and drivers sometimes behave badly).
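
    To see which devices share a root port with the drive, here is a rough Python sketch that groups the drivers in lsusb -t output by root bus. It assumes the usual layout where root hubs start with "/:" and every interface line carries a Driver= field; formatting differs between usbutils versions, so treat the parser as a starting point. uas and usb-storage are the kernel drivers for UAS and BOT storage; the function names are hypothetical.

```python
import re
from collections import defaultdict

BUS_RE = re.compile(r"Bus (\d+)")
DRIVER_RE = re.compile(r"Driver=([^,\s]+)")


def drivers_by_root_bus(lsusb_tree: str) -> dict[int, list[str]]:
    """Group the driver of every interface under the root bus it hangs off."""
    tree: dict[int, list[str]] = defaultdict(list)
    bus = None
    for line in lsusb_tree.splitlines():
        if line.startswith("/:"):
            # A new root hub starts here; later lines belong to it.
            match = BUS_RE.search(line)
            bus = int(match.group(1)) if match else None
            continue
        match = DRIVER_RE.search(line)
        if bus is not None and match:
            # Strip port counts such as "hub/4p" down to the driver name.
            tree[bus].append(match.group(1).split("/")[0])
    return dict(tree)


def neighbours_of(tree: dict[int, list[str]], driver: str) -> dict[int, list[str]]:
    """Return, per root bus, the other drivers sharing that bus with `driver`."""
    return {
        bus: [d for d in drivers if d != driver]
        for bus, drivers in tree.items()
        if driver in drivers
    }
```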

    Always look at the context of the event when you’re troubleshooting a failure like this rather than drilling down on the device messages alone. Most of the time the real cause of the issue preceded the symptom by a little while.
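
    A minimal sketch of that habit in Python: find the first line mentioning the symptom (for example the drive's [sdb] tag) and pull out the bus-level events in the lines before it. The regular expression is a hypothetical starting point that matches common kernel messages such as USB resets, disconnects, over-current warnings and xHCI "HC died"; tune it to what your own logs show.

```python
import re

# Hypothetical patterns for bus-level trouble; adjust to your own kernel log.
BUS_EVENT = re.compile(
    r"usb \S+: (?:reset|USB disconnect)|over-current|HC died", re.IGNORECASE
)


def context_before(lines: list[str], symptom: str, window: int = 20) -> list[str]:
    """Return up to `window` lines preceding the first line containing `symptom`."""
    for index, line in enumerate(lines):
        if symptom in line:
            return lines[max(0, index - window):index]
    return []


def likely_causes(lines: list[str], symptom: str, window: int = 20) -> list[str]:
    """Keep only the bus-level events that appear shortly before the symptom."""
    return [line for line in context_before(lines, symptom, window) if BUS_EVENT.search(line)]
```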

  • Buy external drives. Don’t run them in RAID; use one to store backups and plug it in once or twice a week to copy data to it.

    The secret to RAID is that it doesn’t buy you data protection; it buys you uptime to access data while a device in the array is failed. This is most valuable to businesses that can’t afford the downtime that recovery from a backup incurs. Even the most paranoid RAID will fail sooner or later due to hardware or software failure, so as a home user with a limited budget you’re far better off having one offline backup that you can recover data from when that happens.

    Back up only data you can’t afford to lose (e.g. don’t back up downloaded data that can easily be replaced, like a game or movie collection) and your backups will be much more manageably sized, so you won’t need to spend as much on your backup drive. If a backup disk is too much for your budget, you can always take advantage of a cloud backup plan: Backblaze PC backup has no limit on the size of your backups and only charges something like ~$60/yr.
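
    The weekly copy can be as simple as the Python sketch below, which copies new or changed files (judged by size and modification time) to the backup drive and skips excluded names such as a games folder. It is one-way, keeps no history and never deletes anything from the destination; the function name and the exclude patterns are illustrative assumptions, not a replacement for a proper backup tool.

```python
import fnmatch
import shutil
from pathlib import Path


def backup(source: Path, dest: Path, exclude: list[str]) -> list[Path]:
    """Copy new or changed files from source to dest, skipping excluded names.

    Returns the relative paths that were copied on this run.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        # Skip the file if any path component matches an exclude pattern.
        if any(fnmatch.fnmatch(part, pat) for part in rel.parts for pat in exclude):
            continue
        dst = dest / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Allow 2 s of slack: some filesystems (e.g. FAT32) store coarse mtimes.
            if s.st_size == d.st_size and abs(s.st_mtime - d.st_mtime) <= 2:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

    A weekly run might look like backup(Path("/home/me"), Path("/mnt/backup/home"), ["Games", "Movies", "*.iso"]), with the paths and patterns adjusted to your own layout.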

    Edit: It’s also worth thinking about what kind of data you’re storing and splitting that data across multiple devices if possible. If you’re storing bulk data where performance isn’t critical, like backups from other machines or a movie collection, you can pay a much lower price by buying a hard drive instead of flash. Even if only some of your data needs fast flash, you can still use a cheaper HDD for bulk data and buy a smaller flash drive for performance-sensitive tasks. When I build a NAS I split my data into two pools: one bulk pool of HDDs and one much smaller fast pool of flash storage. Put performance-critical data on flash and bulk storage on HDDs; this lets you spend less on bulk and still have fast storage for the tasks that require it. A 512GB or 1TB SSD alongside a 4TB, 6TB or 8TB HDD is significantly cheaper than a 4TB or 8TB SSD.

    Shop eBay for refurbished storage; it’ll be significantly cheaper than buying brand-new drives.

  • It depends on the SSD; the one I linked is fine for casual home server use. You’re unlikely to see a heavy enough write workload for endurance to become an issue. That’s an enterprise drive, btw; it certainly wasn’t cheap when it was brand new, and I doubt running a couple of VMs will wear it out quickly. (I’ve had a few of those in service at home for 3-4 years with no problems.)

    Consumer drives are more of a problem: their write endurance is considerably lower than most enterprise parts. You can blow through a cheap consumer SSD’s endurance in mere months with a hypervisor workload, so I’d strongly recommend using enterprise drives where possible.

    It’s always worth looking at drive datasheets when you’re considering a drive and comparing its rated endurance over the warranty period to your expected usage. The drive linked above has a rated endurance of about 2PB (~3 DWPD, or about 2TB/day, over 3 years), so you shouldn’t have any problems there. See https://www.sandisk.com/content/dam/sandisk-main/en_us/assets/resources/enterprise/data-sheets/cloudspeed-eco-genII-sata-ssd-datasheet.pdf
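
    The datasheet arithmetic is easy to check yourself: rated endurance spread over the warranty period gives a daily write budget, and dividing that budget by the drive's capacity gives DWPD. A small Python sketch (the function names are illustrative):

```python
DAYS_PER_YEAR = 365


def tb_per_day(endurance_tb: float, years: float) -> float:
    """Average daily writes that use up the rated endurance over the warranty."""
    return endurance_tb / (years * DAYS_PER_YEAR)


def dwpd(endurance_tb: float, capacity_tb: float, years: float) -> float:
    """Drive writes per day: the daily write budget divided by drive capacity."""
    return tb_per_day(endurance_tb, years) / capacity_tb
```

    Plugging in the numbers above, 2PB (2000TB) over 3 years works out to roughly 1.8TB of writes per day; the DWPD figure then depends on the capacity of the drive you buy.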

    Retired older-generation parts and old stock are basically the only home server storage I buy now; the value for your money is tremendous, and most drives are lightly used at most.

    Edit: some select consumer SSDs can work fairly well with ZFS too, but they tend to be higher-endurance parts with more built-in over-provisioning. Samsung 850 or 860 Pros were popular for a while due to their tremendous endurance (the 512GB 850s often lasted something like 10PB+ of writes before failure, thanks to good old high-endurance MLC flash), but it’s a lot safer to just buy retired enterprise parts now that they’re available cheaply. There are some gotchas that come with using high-endurance consumer drives, like poor sync write performance due to the lack of PLP (power-loss protection), but you’ll still see far better performance than an HDD.

  • If I had to guess, there was a code change in the PVE kernel or in their integrated ZFS module that led to a performance regression for your use case. I don’t really have any feedback there; PVE ships a modified version of an older kernel (6.2?), so something could have been backported into that tree that caused the regression. Same deal with ZFS: whichever version the PVE folks are shipping could have introduced a regression as well.

    Your best bet is to raise an issue with the PVE folks after identifying which kernel version introduced the regression. Do a binary search between the current version and the last known-good version, from before this started occurring, to determine exactly when the issue began; then you can open an issue describing the regression.
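
    The binary search itself is simple. A Python sketch, assuming an ordered list of kernel (or ZFS) versions whose first entry is known good and whose last entry is known bad, and a regression that persists once introduced. In practice each is_bad call is a manual step: boot that version, rerun your workload and record whether the slowdown shows up. With n versions this needs only about log2(n) such tests. The function name is hypothetical.

```python
from collections.abc import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first version in `versions` that shows the regression.

    Assumes versions[0] is known good, versions[-1] is known bad, and every
    version after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid  # regression already present: look earlier
        else:
            good = mid  # still fine: look later
    return versions[bad]
```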

    Or just throw a cheap SSD at the problem and move on, that’s what I’d do here. Something like this should outlast the machine you put it in.

    Edit: the Samsung 863a also pops up cheaply from time to time; it has good endurance and PLP. Basically, just search fleaBay for SATA drives with capacities of 400/480GB, 800/960GB, 1.6/1.92TB or 3.2/3.84TB and check their datasheets for endurance info and PLP capability. Anything in the 400/800/1600/3200GB sequence is a higher-endurance model with more over-provisioning (usually referred to as mixed use). Those often have 3 DWPD or 5 DWPD ratings and are a safe bet if you have a write-heavy workload.
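
    The capacity rule of thumb can be written down as a small lookup. A Python sketch, with the capacity sets taken straight from the sequences above; it only flags likely candidates, so the datasheet still has the final say on DWPD and PLP. The function name is illustrative.

```python
# Capacity sequences (in GB) from the rule of thumb above.
MIXED_USE_GB = {400, 800, 1600, 3200}
LOWER_OP_GB = {480, 960, 1920, 3840}


def likely_endurance_class(capacity_gb: int) -> str:
    """Guess a SATA SSD's endurance class from its advertised capacity."""
    if capacity_gb in MIXED_USE_GB:
        return "mixed use (more over-provisioning, higher endurance)"
    if capacity_gb in LOWER_OP_GB:
        return "standard (less over-provisioning)"
    return "unknown: check the datasheet"
```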

  • distcc, maybe Gluster. Run a Docker Swarm setup on PVE or something.

    Models like those are a little hard to exploit well because of the limited network bandwidth between them. Other mini PC models that have a PCIe slot are fun because you can jam high-speed networking into them along with NVMe, then do rapid failover between machines with very little impact when one goes offline.

    If you do want to bump your bandwidth per machine, you might be able to repurpose the WLAN M.2 slot for a 2.5GbE port, but you’ll likely have to hang the module out the back through a serial port cutout or something. Aquantia-based USB adapters work well too; those can provide 5GbE fairly stably.

    Edit: Oh, you’re talking about the larger desktop EliteDesk G1, not the USFF tiny machines. Yeah, you can jam whatever half-height cards into these you want; go wild.


  • Bus issues usually. Having a disk (or 4) drop out of a ZFS filesystem regularly isn’t a good time.

    If you can find a combination of enclosure, driver/firmware and USB port that gives you a reliable connection to the drive, then USB is just another storage bus. It’s generally not recommended because that combination (enclosure, chipset, firmware, driver, port) varies so much from situation to situation, but if you know how to address the pitfalls it can usually work fine.