

Maybe not with just if statements. But with a heuristic system I bet any site that runs a tar pit will be caught out very quickly.
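To give a sense of what I mean by a heuristic system, here is a toy sketch in Python; the stats and thresholds are all invented for illustration, not anything a real crawler is known to use:

```python
# Toy per-domain tarpit heuristic (names and thresholds are made up).
# Idea: a domain that serves bytes unusually slowly and keeps producing
# near-duplicate generated pages gets flagged and dropped from the crawl queue.

def looks_like_tarpit(avg_bytes_per_sec: float,
                      duplicate_page_ratio: float,
                      pages_fetched: int) -> bool:
    if pages_fetched < 50:            # not enough samples to judge yet
        return False
    if avg_bytes_per_sec < 1024:      # responses are being slow-walked
        return True
    if duplicate_page_ratio > 0.9:    # endless near-identical junk pages
        return True
    return False
```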
When I worked in the U.S. I was well above $160k.
When you look at leaks you can see $500k or more for principal engineers. Look at the Valve lawsuit information. https://www.theverge.com/2024/7/13/24197477/valve-employs-few-hundred-people-payroll-redacted
Meta is paying $400k BASE for AI research engineers, with stock options on top, which in my experience are an additional 300% to 600%, vesting over 2 to 4 years. This is for H1B workers, who traditionally are paid less.
Once you get to principal and staff level engineering positions compensation opens up a lot.
https://h1bdata.info/index.php?em=meta+platforms+inc&job=&city=&year=all+years
ROI does not matter when companies are telling investors that they might be first to AGI. Investors go crazy over this. At least they will until the AI bubble pops.
I support people resisting if they want by setting up tar pits. But it’s a hobby and isn’t really doing much.
The sheer amount of resources going into this is beyond what people think.
That, and a competent engineer can probably write something on the BEAM VM that can handle a crap ton of parallel connections. Six figures of them, maybe? Being slow-walked means low CPU use, which means more green threads.
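Not BEAM, but a rough sketch of the same shape in Python's asyncio (hosts, ports, and limits are all invented): connections that are being slow-walked mostly sit parked in the event loop, costing a little memory and almost no CPU.

```python
import asyncio

async def fetch(host: str, path: str = "/") -> bytes:
    # Plain HTTP/1.0 request over a raw socket, just to illustrate the concurrency.
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
    await writer.drain()
    try:
        # A slow-walked response just parks this coroutine until the timeout fires.
        return await asyncio.wait_for(reader.read(1_000_000), timeout=15)
    finally:
        writer.close()
        await writer.wait_closed()

async def crawl(hosts: list[str], concurrency: int = 10_000) -> None:
    sem = asyncio.Semaphore(concurrency)

    async def bounded(host: str) -> None:
        async with sem:
            try:
                await fetch(host)
            except (OSError, asyncio.TimeoutError):
                pass  # tarpit or dead host: give up cheaply and move on

    await asyncio.gather(*(bounded(h) for h in hosts))
```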
I see your point, but I think you underestimate the skill of coders. You make sure your timeout is inclusive of JavaScript run times. Maybe set a memory limit too. Like imagine you wanted to scrape the internet. You could solve all these tarpits. Any capable coder could. Now imagine a team of 20 of the best coders money can buy, each paid €500,000. They can certainly do the same.
Like I see the appeal of running a tar pit. But like I don’t see how they can “trap” anyone but script kiddies.
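For the non-JavaScript case the defence really is that small. A minimal sketch, assuming the requests library, with the limits picked arbitrarily:

```python
import time
import requests

def capped_get(url: str, max_bytes: int = 2_000_000, deadline_s: float = 20.0) -> bytes | None:
    """Fetch a page with a wall-clock deadline and a hard cap on body size,
    so a tarpit that drips bytes forever just gets abandoned."""
    start = time.monotonic()
    body = bytearray()
    try:
        # timeout=(connect, read) bounds each individual socket wait;
        # the deadline below bounds the whole download.
        with requests.get(url, timeout=(5, 10), stream=True) as r:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                body.extend(chunk)
                if len(body) > max_bytes or time.monotonic() - start > deadline_s:
                    return None  # looks like a tarpit or junk; move on
    except requests.RequestException:
        return None
    return bytes(body)
```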
Fair. But I haven’t seen any anti-ai-scraper tarpits that do that. The ones I’ve seen mostly just pipe 10MB of /dev/urandom out there.
Also I assume that the programmers working at AI companies are not literally mentally deficient. They certainly would add .timeout(10) or whatever to their scrapers. They probably have something more dynamic than that.
They want to reduce the bandwidth usage. Not increase it!
I mean they’re synced super fast to every file system. It works really well. Wayyy wayyy faster than nextcloud too. You can access them on that file system. If you want to “directly” access them you can always use the fuse driver. This being said there isn’t really a need to because all the files just are synced to your file system.
Yah, that term isn't an official term. I just meant it in the sense of an IPv6 prefix. Without knowing more about how your router firewall works / is set up I can't be too specific.
But in general, the way things work with IP addresses is that your ISP provides you with a block of IPv6 addresses. This block is the prefix/first part of any given IPv6 address on your network. Each host uses that prefix and generates a suffix that it appends to it in order to form a full, globally routable IPv6 address.
By default most hosts use the IPv6 privacy extensions to generate random suffixes and cycle through them. This is good for privacy but bad for hosting a public service. You need to turn off the privacy extensions so that the second half of the IPv6 address stays static.
Next up you need to write a firewall rule to allow traffic to that globally routable IPv6 address. In an IPv6 system the router does not intercept or rewrite the packets like it does with IPv4. So all a router does is act as a firewall saying “Yup outside hosts can or can’t make inbound connections to certain hosts/ports”
The trick with a consumer IPv6 address space is that just like IPv4 addresses given to your router, the IPv6 prefix can change randomly.
It would be annoying to have to update the firewall rule every time this happened. That’s why the idea of masking matters. You tell the firewall “ignore the prefix part of the address in this rule; just allow or deny based on the static suffix.”
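To make the masking idea concrete, here is a small Python sketch using the stdlib ipaddress module; the addresses are invented examples from the 2001:db8::/32 documentation range. A firewall rule with a suffix mask is doing conceptually the same comparison.

```python
import ipaddress

SUFFIX_MASK = (1 << 64) - 1  # keep only the low 64 bits: the host's stable suffix
WANTED_SUFFIX = int(ipaddress.IPv6Address("::1234:5678:9abc:def0"))

def matches_suffix(addr: str) -> bool:
    """True if the address ends in the wanted suffix, whatever the ISP prefix is."""
    return int(ipaddress.IPv6Address(addr)) & SUFFIX_MASK == WANTED_SUFFIX

# Same host, two different ISP-delegated prefixes: both match.
print(matches_suffix("2001:db8:aaaa:1:1234:5678:9abc:def0"))  # True
print(matches_suffix("2001:db8:bbbb:2:1234:5678:9abc:def0"))  # True
```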
The way to write such rules is different on different firewalls. Most consumer devices don’t have a way to configure such things. Even professional networking equipment mostly makes you use the cli to manage such things.
I hope this helps.
I’m glad you got it working with IPv4. For the record though, the way to do such a thing in the future is to think in IPv6. In IPv6 there is no NAT or port forwarding. Even when the host is exposed with a public address, you still need to set an appropriate rule in your router firewall.
On the host itself you need to use public IPv6 addresses. Then on the router firewall you set a firewall rule with an appropriate delegation mask allowing traffic to the specified port.
It’s different than IPv4 but once you learn IPv6 it’s easy.
Yes. The left side of the : in the volume is the path on the host. You can see this directory on the host. The right side of the : is where that directory is mounted inside the docker container.
All you need to do is to interact with the directory on the host.
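If it helps to see that same host-path-to-container-path mapping spelled out programmatically, here is a sketch using the Python docker SDK; the image name and paths are just placeholders.

```python
# Equivalent of `docker run -v /srv/media:/data jellyfin/jellyfin`
# using the Python docker SDK (pip install docker).
import docker

client = docker.from_env()
client.containers.run(
    "jellyfin/jellyfin",  # placeholder image
    detach=True,
    volumes={
        "/srv/media": {"bind": "/data", "mode": "rw"},  # host path -> container path
    },
)
```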
You should use volumes over bind mounts. You just move your media into the volume location on the local host and they will show up in docker. You should never need to ssh or sftp into the container.
There is a lot here but I think the most important thing is that docker containers should always be disposable. Don’t put any data into the container ever.
All of your data and configuration should live in volumes. Mapping local disk into the container is all you really need.
By doing this you make updating any given docker container as easy as pulling the newest tagged version of the container. If you are using docker and not podman you can use tools like Watchtower to do this automatically.
As for what distro, it depends on your goals. Do you want to learn and improve your skills? Stick with Fedora or Rocky or Debian or openSUSE. I recommend learning the command line as you go, but if you want a nice UI, openSUSE has YaST, which is a very robust tool.
If you want to just have a home NAS but don’t want to learn, that’s a different question. In that case, if you’re getting a proprietary NAS anyway you could just get one that supports docker (like Synology) and kill 2 birds with 1 stone.
https://codeberg.org/forgejo/discussions/issues/67
The biggest issue is that they require you to sign over your rights as they pertain to copyright.
That means even if you submit MIT or GPL licensed code they can just instantly say “we relicense this code as proprietary” and there is nothing anyone can do.
They rejected a bunch of valid PRs, including the one linked here, because the author refused to assign their copyright to the Gitea corporation.
Right now Forgejo is a drop in replacement. This article is them announcing that Forgejo will eventually not be one.
Because Gitea is fully the victim of corporate capture. Any PRs that make Gitea better in a way that would reduce the main corporate “sponsor’s” profit are rejected.
The company has a conflict of interest with the community and it shows. Forgejo is sponsored by a non-profit open source cooperative.
+1 for Seafile. They put out a docker image that works well. It has the fastest sync I’ve ever seen and it has good clients.
No problem. It should be wayyy faster than sshfs for the record. Both NFS and WireGuard are best in class tools.
NFS over WireGuard is probably going to be the best when it comes to encrypted file shares without the need to set up Kerberos. Just set up the WireGuard tunnel and export over those IPs.
I understand. But do you see how what you wrote could be seen as toxic? Intent is nice, but what and how you write really determines the tone of a community.
No need to be toxic here. You don’t need to put people down. We’re all learning here together. Hey, we are all learning more about how reverse proxies and forwarded headers work together right now, including you.
We should aim to be an open welcoming community.
It’s government reporting data. If you find a better source I say go for it. But I used that data for salary negotiations in the past successfully.
I’m not talking about take home. I’m talking about total annual compensation including things like RSU payouts etc.
Even if we throw out the ones you doubt there are many $300k to $400k entries with the AI researcher title. If we add annualized RSU payouts we easily hit over $500k.
At this point though you are free to doubt me.