Admiral Patrick

I’m surprisingly level-headed for being a walking knot of anxiety.

Ask me anything.

Special skills include: Knowing all the “na na na nah nah nah na” parts of the Three’s Company theme.

I also develop Tesseract UI for Lemmy/Sublinks

Avatar by @SatyrSack@feddit.org

  • 4 Posts
  • 155 Comments
Joined 3 years ago
Cake day: June 6th, 2023

  • Basically, the only thing you want to put behind a challenge is the paths/virtual hosts for the web frontends.

    Anything under /api/v3/ is client-to-server API (i.e. how your clients talk to your instance) and needs to be obstruction-free; otherwise, clients/apps won’t be able to use the API. Same for /pictrs, since that proxies through Lemmy and is a de facto API endpoint (even though it’s a separate component).

    Federation traffic also needs to be exempt, but it’s matched not by routes but by the HTTP Accept request header and the request method.

    Looking at the Nginx proxy config, there’s this mapping which tells Nginx how to route inbound requests:

    nginx_internal.conf: https://raw.githubusercontent.com/LemmyNet/lemmy-ansible/main/templates/nginx_internal.conf

        map "$request_method:$http_accept" $proxpass {
        # If no explicit match exists below, send traffic to lemmy-ui
            default "http://lemmy-ui:1234/";
    
        # GET/HEAD requests that accept ActivityPub or Linked Data JSON should go to lemmy.
            #
            # These requests are used by Mastodon and other fediverse instances to look up profile information,
            # discover site information and so on.
            "~^(?:GET|HEAD):.*?application\/(?:activity|ld)\+json" "http://lemmy:8536/";
    
            # All non-GET/HEAD requests should go to lemmy
            #
        # Rather than calling out POST, PUT, DELETE, PATCH, CONNECT and all the verbs manually,
        # we simply negate the GET|HEAD pattern from above and accept all possible $http_accept values
        "~^(?!(GET|HEAD)).*:" "http://lemmy:8536/";
    }

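    The map result then feeds a proxy_pass directive in the server block. Roughly like this simplified sketch (not the verbatim upstream config; the port and header set here are illustrative):

```nginx
server {
    listen 8536;

    location / {
        # Route to lemmy or lemmy-ui based on the map above
        proxy_pass $proxpass;

        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```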
  • I also run (well, ran) a local registry. It ended up being more trouble than it was worth.

    Would you have to docker load them all when rebuilding a host?

    Only if you want to ensure you bring the replacement stack back up with the exact same version of everything or need to bring it up while you’re offline. I’m bad about using the :latest tag so this is my way of version-controlling. I’ve had things break (cough Authelia cough) when I moved it to another server and it pulled a newer image that had breaking config changes.

    For me, it’s about having everything I need on hand to quickly move a service or restore it from a backup. It also depends on what your needs are and the challenges you’re trying to overcome. When I started doing this style of deployment, I had slow, unreliable, and heavily data-capped internet. Even when my connection was up, pulling a bunch of images was time-consuming and ate away at my measly satellite data cap. Being able to rebuild stuff offline was a hard requirement back then. That’s no longer a limitation, but I like the way this works, so I’ve stuck with it.

    Everything a service (or stack of services) needs is all in my deploy directory which looks like this:

    /apps/{app_name}/
        docker-compose.yml
        .env
        build/
            Dockerfile
            {build assets}
        data/
            {app_name}
            {app2_name}  # If there are multiple applications in the stack
            ...
        conf/                   # If separate from the app data
            {app_name}
            {app2_name}
            ...
        images/
            {app_name}-{tag}-{arch}.tar.gz
            {app2_name}-{tag}-{arch}.tar.gz
    

    When I run backups, I tar.gz the whole base {app_name} folder, which includes the deploy file, data, config, and dumps of its service images, and pipe that over SSH to my backup server (rsync also works for this). The only ones I handle differently are those with in-stack databases that need a consistent snapshot.
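    That flow can be sketched like so (the app name and paths below are made-up stand-ins, and it creates a demo folder just so the commands run end-to-end):

```shell
# Minimal sketch of the per-app backup flow. BASE/APP are placeholders.
set -eu
BASE="/tmp/apps"        # stand-in for /apps
APP="demo"              # stand-in for {app_name}

# Stand-in app folder (deploy file + data) so the sketch is runnable
mkdir -p "${BASE}/${APP}/data"
printf 'services: {}\n' > "${BASE}/${APP}/docker-compose.yml"

# Archive the whole app folder: deploy file, data, config, image dumps
tar -C "${BASE}" -czf "/tmp/${APP}-backup.tar.gz" "${APP}"

# To stream it straight to a backup server instead of writing locally:
#   tar -C "${BASE}" -czf - "${APP}" | ssh backup-host "cat > /backups/${APP}.tar.gz"
```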

    When I pull new images to update the stack, I move the old images and docker save the now current ones. The old images get deleted after the update is considered successful (so usually within 3-5 days).

    A local registry would work, but you would have to re-tag all of the pre-made images to your registry (e.g. docker tag library/nginx docker.example.com/nginx) in order to push them to it. That makes updates more involved and was a frequent cause of me running 2+ year old versions of some images.

    Plus, you’d need the registry server and any infrastructure it needs (DNS, file server, reverse proxy, etc.) before you could bootstrap anything else. And if you’re deploying your stack to a different environment outside your own, your registry server might not be available.

    Bottom line is I am a big fan of using Docker to make my complex stacks easy to port around, backup, and restore. There’s many ways to do that, but this is what works best for me.


  • Yep. I’ve got a bunch of apps that work offline, so I back up the currently deployed version of the image in case of hardware or other failure that requires me to re-deploy it. I also have quite a few custom-built images that take a while to build, so having a backup of the built image is convenient.

    I structure my Docker-based apps into dedicated folders with all of their config and data directories inside a main container directory so everything is kept together. I also make an images directory which holds backup dumps of the images for the stack.

    • Backup: docker save {image}:{tag} | gzip -9 > ./images/{image}-{tag}-{arch}.tar.gz
    • Restore: docker load < ./images/{image}-{tag}-{arch}.tar.gz

    It will back up/restore with the image and tag used during the save step. The load step accepts a gzipped tar, so you don’t even need to decompress it first. My older dumps don’t have the architecture in the filename, but I’ve started adding it now that I have a mix of amd64 and arm64.
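    If you want to dump every image a compose stack references in one go, something like this sketch works (the mangle helper is my illustration, not a Docker feature, and docker compose config --images needs Compose v2):

```shell
# Sketch: dump all images referenced by a compose stack.
# mangle() flattens "repo/name:tag" into a filesystem-safe name.
mangle() { printf '%s' "$1" | tr '/:' '--'; }

# Uncomment on a host with Docker Compose v2 available:
# for img in $(docker compose config --images); do
#     docker save "$img" | gzip -9 > "./images/$(mangle "$img").tar.gz"
# done
```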





  • I’m now running 9 of the Dell equivalents to those, and they’re doing well. They average 15-20 watts at normal load and usually no more than 30-35 watts running full tilt. 5 of them are unprovisioned, but I got a good deal on them at $25 each, so I couldn’t pass them up :shrug:.

    Attempting to cable-manage the power bricks for more than 1 of these is the worst part of using them. The only life pro tip I can offer is to ditch the power bricks and buy a 65W USB-C power delivery adapter that’s in the “wall wart” style and also one of the USB-C to Lenovo power adapter cords. Those make cable management so much better.

    Wall Wart

    Adapter Cable (these are for my Dells but they make them for most brands/styles)


  • I downgraded from used enterprise gear to those ultra-small-form-factor PCs. They sip power well enough on their own that I haven’t really bothered tuning anything. I suppose I could cap the frequency with cpufrequtils and set the governor to conservative rather than ondemand (I do this with my battery-powered RasPi projects), but I’m not sure how much difference that would make for my servers.

    In the past, I had Docker Swarm set up with automation to collapse the swarm down to a single machine (powering the others down and back on with WoL), but that was more trouble than it was worth. Under average load, the USFF PCs run at about 15 watts and don’t usually peak above 30 unless they’re rebooting or doing something very heavy. Even transcoding doesn’t break 20 watts since I’m using hardware acceleration.

    The biggest power savings I found that was worth the effort was to just get rid of the enterprise gear, switch from VMs to Docker containers where possible, and get rid of stuff I’m not using (or only run it on-demand).

    The only remaining enterprise power suck I have left is my managed switch. It’s a 2005-era dinosaur that’s loud and power hungry, but it’s been a workhorse I’m having a hard time parting with.


  • Let me first say that I love the idea of this phone and it breaking free of the “tall, skinny rectangle” form factor. The physical keyboard is a huge draw for me as well. However, there are some things on the software side that are definitely making me wary.

    While it offers a screen for viewing and responding to messages, the Communicator doesn’t provide access to addictive social media apps or games. Instead, the company partnered with the maker of an Android launcher, Niagara Launcher, to provide access to messaging apps and productivity tools like Gmail, Telegram, WhatsApp, and Slack.

    I don’t understand why it would limit apps. $499 is a lot to spend on a secondary device, and I don’t know that I’d want to EDC two devices. That’s a lie. I know I wouldn’t want to everyday-carry two phones, because I did that for work and absolutely hated it.

    Most apps work fine on smaller screens. I’ve been daily driving a Cat S22 Flip with a portrait-oriented 480x640 screen for over a year, and most apps scale just fine.

    I’ve at least heard of Niagara Launcher. Is that saying the only way to use those apps is through the launcher’s integration? That sounds shitty.

    The company is teasing the possibility of integrating AI applications with this button

    Dear god, no.

    The phone’s standout feature is its Signal Light, a light-up button on the side of the device that can be customized with different colors and light patterns to indicate when you’ve received messages from certain people, groups, or apps

    So, a feature that has existed for years but was taken away from us? My old OnePlus had a customizable RGB light which could be configured the same way. It was really handy, and I hated the “always on” display that replaced it. I could tell from the color and pattern what kind of notification it was without having to preview it which was nice as it didn’t stress me out with a need to reply.

    I want to like this, but it seems like they’re being very opinionated on how you actually use it. Maybe we’ll get lucky and it’ll be bootloader unlockable and LineageOS can save it from the shitty decisions of the manufacturer.

    Edit: Submitted a question/ticket to their support to ask. Every time a promising-looking device is announced, I always ask. The answer is usually either “What? What do you mean?” or “No”. One of these days, there will hopefully be a manufacturer that doesn’t equate Android with Google.


  • Like you’re thinking: put HAProxy on your OpenWRT router.

    That’s what I do. The HAProxy setup is kind of “dumb” L4/TCP-only rather than L7 (HTTP/S), since I wanted all of my logic in the Nginx services. The main thing HAProxy does is, like you’re looking for, put the SPOF alongside the other unavoidable SPOF (the router); it also wraps the requests in Proxy Protocol so the downstream Nginx services see the correct client IP.

    Flow is basically:

    LAN/WAN/VPN -> HAProxy -> Two Nginx Instances -> Apps
    

    With HAProxy in the router, it also lets me set internal DNS records for my apps to my router’s LAN IP.
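    For illustration, the HAProxy side of that flow can look something like this (names and IPs here are made up, not my exact config):

```haproxy
# TCP (L4) passthrough with Proxy Protocol so Nginx sees real client IPs
frontend https_in
    bind :443
    mode tcp
    default_backend nginx_pool

backend nginx_pool
    mode tcp
    balance roundrobin
    # send-proxy-v2 wraps each connection in Proxy Protocol v2
    server nginx1 192.168.1.10:443 check send-proxy-v2
    server nginx2 192.168.1.11:443 check send-proxy-v2
```

    The matching Nginx side then needs proxy_protocol on its listen directive (e.g. listen 443 ssl proxy_protocol;) plus set_real_ip_from for the HAProxy address so the client IP is actually used.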





  • I’ve got bot detection set up in Nginx on my VPS which used to return 444 (Nginx’s non-standard code for “close the connection and waste no more resources processing it”), but I recently started piping that traffic to Nepenthes to return gibberish data for them to train on.

    I documented a rough guide in the comment here. Of relevance to you are the two .conf files at the bottom. In deny-disallowed.conf, change the return 301 ... line to return 444.

    I also use a firewall and fail2ban on the VPS to block bad actors, overly aggressive scrapers, password brute-forcing, etc., and the link between the VPS and my homelab equipment never sees that traffic.

    In the case of a DDoS, I’ve done the following:

    • Enable aggressive rate limits in Nginx (it may be slow for everyone but it’s still up)
    • Just stop either Wireguard or Nginx on the VPS until the storm blows over. (Crude, but useful to avoid bandwidth overages if you’re charged for inbound traffic.)
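    The rate limiting in the first bullet is just Nginx’s limit_req; a minimal example (the zone name and numbers are arbitrary, tune to taste):

```nginx
# In the http block: shared zone keyed by client IP
limit_req_zone $binary_remote_addr zone=ddos:10m rate=2r/s;

server {
    # ...
    location / {
        # Allow short bursts, reject the rest with 429 instead of the default 503
        limit_req zone=ddos burst=10 nodelay;
        limit_req_status 429;
    }
}
```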

    Granted, I’m not running anything mission-critical, just some services for friends and family, so I can deal with a little downtime.



  • I have never used it, so take this with a grain of salt, but last I read, the free tier doesn’t let you secure traffic between yourself and Cloudflare with your own certs, which implies they can decrypt and read that traffic. What, if anything, they do with that capability, I don’t know. I just don’t trust my hosted assets to be secured with certs/keys I don’t control.

    There are other things CF can do (bot detection, DDoS protection, etc.), but if you just want to avoid exposing your home IP, a cheap VPS running Nginx can work the same way as a CF tunnel. Set up Wireguard on the VPS and point the backends in your Nginx config at your home assets over that tunnel. If the VPS is the “server” side of the WG tunnel, you don’t have to open any local ports on your router at all. I’ve been doing that, originally with OpenVPN, since before CF tunnels were ever offered as a service.
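    A rough sketch of that topology on the VPS side (addresses are examples, keys are placeholders):

```ini
# /etc/wireguard/wg0.conf on the VPS ("server" side)
[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = <vps-private-key>

[Peer]
# Home server; it dials out to the VPS, so no inbound home ports are needed.
# On the home side, set PersistentKeepalive = 25 to keep the NAT mapping alive.
PublicKey = <home-public-key>
AllowedIPs = 10.0.0.2/32
```

    The Nginx upstreams on the VPS then just point at 10.0.0.2.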

    Edit: You don’t even need WG, really. If you set up a persistent SSH tunnel and forward/bind a port to your VPS, you can tunnel the traffic over that.
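    To sketch that: a reverse forward (-R) binds a port on the VPS that tunnels back home. The hostname and ports below are placeholders, and the snippet only prints the command:

```shell
# Reverse tunnel sketch: expose local port 80 on the VPS's loopback:8080,
# where the VPS's Nginx can proxy_pass to it. Host/ports are placeholders.
CMD="ssh -N -T \
  -o ServerAliveInterval=30 \
  -o ExitOnForwardFailure=yes \
  -R 127.0.0.1:8080:127.0.0.1:80 \
  user@vps.example.com"
printf '%s\n' "$CMD"   # run on the home server; wrap in autossh/systemd to keep it persistent
```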


  • So, I set this up recently and agree with all of your points about the actual integration being glossed over.

    I already had bot detection set up in my Nginx config, so adding Nepenthes just meant changing the behavior of that. Previously, I returned either 404 or 444 to those requests; now it redirects them to Nepenthes.

    Rather than trying to do rewrites and pretend the Nepenthes content is under my app’s URL namespace, I just do a redirect which the bot crawlers tend to follow just fine.

    There are several parts to this to keep my config sane. Each of them lives in an include file.

    • An include file that looks at the user agent, compares it to a list of bot UA regexes, and sets a variable to either 0 or 1. By itself, that include file doesn’t do anything more than set that variable. This allows me to have it as a global config without having it apply to every virtual host.

    • An include file that performs the action if a variable is set to true. This has to be included in the server portion of each virtual host where I want the bot traffic to go to Nepenthes. If this isn’t included in a virtual host’s server block, then bot traffic is allowed.

    • A virtual host where the Nepenthes content is presented. I run a subdomain (content.mydomain.xyz). You could also do this as a path off of your protected domain, but this works for me and keeps my already complex config from getting any worse. Plus, it was easier to integrate into my existing bot config. Had I not already had that, I would have run it off of a path (and may go back and do that when I have time to mess with it again).

    The map-bot-user-agents.conf is included in the http section of Nginx and applies to all virtual hosts. You can either include this in the main nginx.conf or at the top (above the server section) in your individual virtual host config file(s).

    The deny-disallowed.conf is included individually in each virtual host’s server section. Even though the bot detection is global, if a virtual host’s server section does not include the action file, then nothing is done.

    Files

    map-bot-user-agents.conf

    Note that I’m treating Google’s crawler the same as an AI bot because…well, it is. They’re abusing their search position by double-dipping on the crawler so you can’t opt out of being crawled for AI training without also preventing it from crawling you for search engine indexing. Depending on your needs, you may need to comment that out. I’ve also commented out the Python requests user agent. And forgive the mess at the bottom of the file. I inherited the seed list of user agents and haven’t cleaned up that massive regex one-liner.

    # Map bot user agents
    ## Sets the $ua_disallowed variable to 0 or 1 depending on the user agent. Non-bot UAs are 0, bots are 1
    
    map $http_user_agent $ua_disallowed {
        default 		0;
        "~PerplexityBot"	1;
        "~PetalBot"		1;
        "~applebot"		1;
        "~compatible; zot"	1;
        "~Meta"		1;
        "~SurdotlyBot"	1;
        "~zgrab"		1;
        "~OAI-SearchBot"	1;
        "~Protopage"	1;
        "~Google-Test"	1;
        "~BacklinksExtendedBot" 1;
        "~microsoft-for-startups" 1;
        "~CCBot"		1;
        "~ClaudeBot"	1;
        "~VelenPublicWebCrawler"	1;
        "~WellKnownBot"	1;
        #"~python-requests"	1;
        "~bitdiscovery"	1;
        "~bingbot"		1;
        "~SemrushBot" 	1;
        "~Bytespider" 	1;
        "~AhrefsBot" 	1;
        "~AwarioBot"	1;
    #    "~Poduptime" 	1;
        "~GPTBot" 		1;
        "~DotBot"	 	1;
        "~ImagesiftBot"	1;
        "~Amazonbot"	1;
        "~GuzzleHttp" 	1;
        "~DataForSeoBot" 	1;
        "~StractBot"	1;
        "~Googlebot"	1;
        "~Barkrowler"	1;
        "~SeznamBot"	1;
        "~FriendlyCrawler"	1;
        "~facebookexternalhit" 1;
        "~*(?i)(80legs|360Spider|Aboundex|Abonti|Acunetix|^AIBOT|^Alexibot|Alligator|AllSubmitter|Apexoo|^asterias|^attach|^BackDoorBot|^BackStreet|^BackWeb|Badass|Bandit|Baid|Baiduspider|^BatchFTP|^Bigfoot|^Black.Hole|^BlackWidow|BlackWidow|^BlowFish|Blow|^BotALot|Buddy|^BuiltBotTough|
    ^Bullseye|^BunnySlippers|BBBike|^Cegbfeieh|^CheeseBot|^CherryPicker|^ChinaClaw|^Cogentbot|CPython|Collector|cognitiveseo|Copier|^CopyRightCheck|^cosmos|^Crescent|CSHttp|^Custo|^Demon|^Devil|^DISCo|^DIIbot|discobot|^DittoSpyder|Download.Demon|Download.Devil|Download.Wonder|^dragonfl
    y|^Drip|^eCatch|^EasyDL|^ebingbong|^EirGrabber|^EmailCollector|^EmailSiphon|^EmailWolf|^EroCrawler|^Exabot|^Express|Extractor|^EyeNetIE|FHscan|^FHscan|^flunky|^Foobot|^FrontPage|GalaxyBot|^gotit|Grabber|^GrabNet|^Grafula|^Harvest|^HEADMasterSEO|^hloader|^HMView|^HTTrack|httrack|HTT
    rack|htmlparser|^humanlinks|^IlseBot|Image.Stripper|Image.Sucker|imagefetch|^InfoNaviRobot|^InfoTekies|^Intelliseek|^InterGET|^Iria|^Jakarta|^JennyBot|^JetCar|JikeSpider|^JOC|^JustView|^Jyxobot|^Kenjin.Spider|^Keyword.Density|libwww|^larbin|LeechFTP|LeechGet|^LexiBot|^lftp|^libWeb|
    ^likse|^LinkextractorPro|^LinkScan|^LNSpiderguy|^LinkWalker|msnbot|MSIECrawler|MJ12bot|MegaIndex|^Magnet|^Mag-Net|^MarkWatch|Mass.Downloader|masscan|^Mata.Hari|^Memo|^MIIxpc|^NAMEPROTECT|^Navroad|^NearSite|^NetAnts|^Netcraft|^NetMechanic|^NetSpider|^NetZIP|^NextGenSearchBot|^NICErs
    PRO|^niki-bot|^NimbleCrawler|^Nimbostratus-Bot|^Ninja|^Nmap|nmap|^NPbot|Offline.Explorer|Offline.Navigator|OpenLinkProfiler|^Octopus|^Openfind|^OutfoxBot|Pixray|probethenet|proximic|^PageGrabber|^pavuk|^pcBrowser|^Pockey|^ProPowerBot|^ProWebWalker|^psbot|^Pump|python-requests\/|^Qu
    eryN.Metasearch|^RealDownload|Reaper|^Reaper|^Ripper|Ripper|Recorder|^ReGet|^RepoMonkey|^RMA|scanbot|SEOkicks-Robot|seoscanners|^Stripper|^Sucker|Siphon|Siteimprove|^SiteSnagger|SiteSucker|^SlySearch|^SmartDownload|^Snake|^Snapbot|^Snoopy|Sosospider|^sogou|spbot|^SpaceBison|^spanne
    r|^SpankBot|Spinn4r|^Sqworm|Sqworm|Stripper|Sucker|^SuperBot|SuperHTTP|^SuperHTTP|^Surfbot|^suzuran|^Szukacz|^tAkeOut|^Teleport|^Telesoft|^TurnitinBot|^The.Intraformant|^TheNomad|^TightTwatBot|^Titan|^True_Robot|^turingos|^TurnitinBot|^URLy.Warning|^Vacuum|^VCI|VidibleScraper|^Void
    EYE|^WebAuto|^WebBandit|^WebCopier|^WebEnhancer|^WebFetch|^Web.Image.Collector|^WebLeacher|^WebmasterWorldForumBot|WebPix|^WebReaper|^WebSauger|Website.eXtractor|^Webster|WebShag|^WebStripper|WebSucker|^WebWhacker|^WebZIP|Whack|Whacker|^Widow|Widow|WinHTTrack|^WISENutbot|WWWOFFLE|^
    WWWOFFLE|^WWW-Collector-E|^Xaldon|^Xenu|^Zade|^Zeus|ZmEu|^Zyborg|SemrushBot|^WebFuck|^MJ12bot|^majestic12|^WallpapersHD)" 1;
    
    }
    
    
    deny-disallowed.conf
    # Deny disallowed user agents
    if ($ua_disallowed) { 
        # This redirects them to the Nepenthes domain. So far, pretty much all the bot crawlers have been happy to accept the redirect and crawl the tarpit continuously 
    	return 301 https://content.mydomain.xyz/;
    }
    


  • I’ve had pretty good experience with Nextcloud’s instant upload. The only time I’ve had it shit the bed was ages ago when it would occasionally get stuck on a conflict, but that hasn’t happened in a long time. Pretty much all of my image folders (camera/DCIM, Screenshots, Downloads) get synced. The only annoying thing was when apps would suddenly change where they download to and I’d have to reconfigure yet another sync folder, but I can’t really fault NC for that.

    Mine is set to upload and keep a local copy, and to only do a one-way sync (phone to NC). Not sure if that causes fewer issues than a two-way sync or deleting the local copy after upload?