• 0 Posts
  • 70 Comments
Joined 1 year ago
cake
Cake day: July 14th, 2023

help-circle



  • What exactly are you trusting a cert provider with and what are the security implications?

    End users trust the cert provider. The cert provider has a process that they use to determine if they can trust you.

    What attack vectors do you open yourself up to when trusting a certificate authority with your websites’ certificates?

    You’re not really trusting them with your certificates. You don’t give them your private key or anything like that, and the certs are visible to anyone navigating to your website.

    Your new vulnerabilities are basically limited to what you do for them - any changes you make to your domain’s DNS config, or anything you host, etc. - and depend on that introducing a vulnerability of its own. You also open a new phishing attack vector, where someone might contact you, posing as the certificate authority, and ask you to make a change that would introduce a vulnerability.

    In what way could it benefit security and/or privacy to utilize a paid service?

    For most use cases, as far as I know, it doesn’t.

    LetsEncrypt doesn’t offer EV or OV certificates, which you may need for your use case. However, these are mostly relevant at the enterprise level. Maybe you have a storefront and want an EV cert?

    LetsEncrypt also only offers community support, and if you set something up wrong you could be less secure.

    Other CAs may offer services that enhance privacy and security, as well, like scanning your site to confirm your config is sound… but the core offering isn’t really going to be different (aside from LE having intentionally short renewal periods), and theoretically you could get those same services from a different vendor.






  • Reverse proxies aren’t DNS servers.

    The DNS server will be configured to know that your domain, e.g., example.com or *.example.com, is a particular IP, and when someone navigates to that URL it tells them the IP, which they then send a request to.

    The reverse proxy runs on that IP; it intercepts and analyzes the request. This can be as simple as transparently forwarding jellyfin.example.com to the specific IP (could even be an internal IP address on the same machine - I use Traefik to expose Docker network IPs that aren’t exposed at the host level) and port, but they can also inspect and rewrite headers and other request properties and they can have different logic depending on the various values.

    Your router is likely handling the .local “domain” resolution and that’s what you’ll need to be concerned with when configuring AdGuard.


  • Is it possible to force a corruption if a disk clone is attempted?

    Anything that corrupts a single file would work. You could certainly change your own disk cloning binaries to include such functionality, but if someone were accessing your data directly via their own OS, that wouldn’t be effective. I don’t know of a way to circumvent that last part other than ensuring that the data isn’t left on disk when you’re done. For example, you could use a ramdisk instead of non-volatile storage. You could delete or intentionally corrupt the volume when you unmount it. You could split the file, storing half on your USB flash drive and keeping the other half on your PC. You could XOR the file with contents of another file (e.g., one on your USB flash drive instead of on your PC) and then XOR it again when you need to access it.

    What sort of attack are you trying to protect from here?

    If the goal is plausible deniability, then it’s worth noting that VeraCrypt volumes aren’t identifiable as distinct from random data. So if you have a valid reason for having a big block of random data on disk, you could say that’s what the file was. Random files are useful because they are not compressible. For example, you could be using those files to test: network/storage media performance or compression/hash/backup&restore/encrypt&decrypt functions. You could be using them to have a repeatable set of random values to use in a program (like using a seed, but without necessarily being limited to using a PRNG to generate the sequence).

    If that’s not sufficient, you should look into hidden volumes. The idea is that you take a regular encrypted volume, whose free space, on disk, looks just like random data, you store your hidden volume within the free space. The hidden volume gets its own password. Then, you can mount the volume using the first password and get visibility into a “decoy” set of files or use the second password to view your “hidden” files. Note that when mounting it to view the decoy files, any write operations will have a chance of corrupting the hidden files. However, you can supply both passwords to mount it in a protected mode, allowing you to change the decoy files and avoid corrupting the hidden ones.


  • It sounds like you want these files to be encrypted.

    Someone already suggested encrypting them with GPG, but maybe you want the files themselves to also be isolated, even while their data is encrypted. In that case, consider an encrypted volume. I assume you’re familiar with LUKS - you can encrypt a partition with a different password and disable auto-mount pretty easily. But if you’d rather use a file-based volume, then check out VeraCrypt - it’s a FOSS-ish [1], cross-platform tool that provides this capability. The official documentation is very Windows-focused - the ArchLinux wiki article is a pretty useful Linux focused alternative.

    Normal operation is that you use a file to store the volume, which can be “dynamic” with a max size or can be statically sized (you can also directly encrypt a disk partition, but you could do that with LUKS, too). Then, before you can access the files - read or write - you have to enter the password, supply the encryption key, etc., in order to unlock it.

    Someone without the password but with permission to modify the file will be capable of corrupting it (which would prevent you from accessing every protected file), but unless they somehow got access to the password they wouldn’t be able to view or modify the protected files.

    The big advantage over LUKS is ease of creating/mounting file-based volumes and portability. If you’re concerned about another user deleting your encrypted volume, then you can easily back it up without decrypting it. You can easily load and access it on other systems, too - there are official, stable apps on Windows and Mac, though you’ll need admin access to run them. On Android and iOS options are a bit more slim - EDS on Android and Disk Decipher on iOS. If you’re copying a volume to a Linux system without VeraCrypt installed, you’ll likely still be able to mount it, as dm-crypt has support for VeraCrypt volumes.

    • 1 - It’s based on TrueCrypt, which has some less free restrictions, e.g., c. Phrase "Based on TrueCrypt, freely available at http://www.truecrypt.org/" must be displayed by Your Product (if technically feasible) and contained in its documentation.”

  • reasonable expectations and uses for LLMs.

    LLMs are only ever going to be a single component of an AI system. We’ve only had LLMs with their current capabilities for a very short time period, so the research and experimentation to find optimal system patterns, given the capabilities of LLMs, has necessarily been limited.

    I personally believe it’s possible, but we need to get vendors and managers to stop trying to sprinkle “AI” in everything like some goddamn Good Idea Fairy.

    That’s a separate problem. Unless it results in decreased research into improving the systems that leverage LLMs, e.g., by resulting in pervasive negative AI sentiment, it won’t have a negative on the progress of the research. Rather the opposite, in fact, as seeing which uses of AI are successful and which are not (success here being measured by customer acceptance and interest, not by the AI’s efficacy) is information that can help direct and inspire research avenues.

    LLMs are good for providing answers to well defined problems which can be answered with existing documentation.

    Clarification: LLMs are not reliable at this task, but we have patterns for systems that leverage LLMs that are much better at it, thanks to techniques like RAG, supervisor LLMs, etc…

    When the problem is poorly defined and/or the answer isn’t as well documented or has a lot of nuance, they then do a spectacular job of generating bullshit.

    TBH, so would a random person in such a situation (if they produced anything at all).

    As an example: how often have you heard about a company’s marketing departments over-hyping their upcoming product, resulting in unmet consumer expectation, a ton of extra work from the product’s developers and engineers, or both? This is because those marketers don’t really understand the product - either because they don’t have the information, didn’t read it, because they got conflicting information, or because the information they have is written for a different audience - i.e., a developer, not a marketer - and the nuance is lost in translation.

    At the company level, you can structure a system that marketers work within that will result in them providing more correct information. That starts with them being given all of the correct information in the first place. However, even then, the marketer won’t be solving problems like a developer. But if you ask them to write some copy to describe the product, or write up a commercial script where the product is used, or something along those lines, they can do that.

    And yet the marketer role here is still more complex than our existing AI systems, but those systems are already incorporating patterns very similar to those that a marketer uses day-to-day. And AI researchers - academic, corporate, and hobbyists - are looking into more ways that this can be done.

    If we want an AI system to be able to solve problems more reliably, we have to, at minimum:

    • break down the problems into more consumable parts
    • ensure that components are asked to solve problems they’re well-suited for, which means that we won’t be using an LLM - or even necessarily an AI solution at all - for every problem type that the system solves
    • have a feedback loop / review process built into the system

    In terms of what they can accept as input, LLMs have a huge amount of flexibility - much higher than what they appear to be good at and much, much higher than what they’re actually good at. They’re a compelling hammer. System designers need to not just be aware of which problems are nails and which are screws or unpainted wood or something else entirely, but also ensure that the systems can perform that identification on their own.



  • If you use that docker compose file, I recommend you comment out the build section and uncomment the image section in the lemmy service.

    I also recommend you use a reverse proxy and Docker networks rather than exposing the postgres instance on port 5433, but if you aren’t familiar with Docker networks you can leave it as is for now. If you’re running locally and don’t open that port in your router’s firewall, it’s a non-issue unless there’s an attacker on your LAN, but given that you’re not gaining anything from exposing it (unless you need to connect to the DB directly regularly - as a one off you could temporarily add the port mapping), it doesn’t make sense to increase your attack surface for no benefit.


  • The idea that someone does this willingly implies that the user knows the implications of their choice, which most of the Fediverse doesn’t seem to do

    The terms of service for lemmy.world, which you must agree to upon sign-up, make reference to federating. If you don’t know what that means, it’s your responsibility to look it up and understand it. I assume other instances have similar sign-up processes. The source code to Lemmy is also available, meaning that a full understanding is available to anyone willing to take the time to read through the code, unlike with most social media companies.

    What sorts of implications of the choice to post to Lemmy do you think that people don’t understand, that people who post to Facebook do understand?

    If the implied license was enough, Facebook and all the other companies wouldn’t put these disclaimers in their terms of service.

    It’s not an implied license. It’s implied permission. And if you post content to a website that’s hosting and displaying such content, it’s obvious what’s about to happen with it. Please try telling a judge that you didn’t understand what you were doing, sued without first trying to delete or file a DMCA notice, and see if that judge sides with you.

    Many companies have lengthy terms of service with a ton of CYA legalese that does nothing. Even so, an explicit license to your content in the terms of service does do something - but that doesn’t mean that you’re infringing copyright without it. If my artist friend asks me to take her art piece to a copy shop and to get a hundred prints made for her, I’m not infringing copyright then, either, nor is the copy shop. If I did that without permission, on the other hand, I would be. If her lawyer got wind of this and filed a suit against me without checking with her and I showed the judge the text saying “Hey hedgehog, could you do me a favor and…,” what do you think he’d say?

    Besides, Facebook does things that Lemmy instances don’t do. Facebook’s codebase isn’t open, and they’d like to reserve the ability to do different things with the content you submit. Facebook wants to be able to do non-obvious things with your content. Facebook is incorporated in California and has a value in the hundreds of billions, but Lemmy instances are located all over the world and I doubt any have a value even in the millions.



  • I haven’t personally used any of these, but looking them over, Tipi looks the most encouraging to me, followed by Yunohost, based largely on the variety of apps available but also because it looks like Tipi lets you customize the configuration much more. Freedom Box doesn’t seem to list the apps in their catalog at all and their site seems basically useless, so I ruled it out on that basis alone.


  • I am trying to avoid having to having an open port 22

    If you’re working locally you don’t need an open port.

    If you’re on a different machine but on the same network, you don’t need to expose port 22 via your router’s firewall. If you use key-based auth and disable password-based auth then this is even safer.

    If you want access remotely, then you still don’t have to expose port 22 as long as you have a vpn set up.

    That said, you don’t need to use a terminal to manage your docker containers. I use Portainer to manage all but my core containers - Traefik, Authelia, and Portainer itself - which are all part of a single docker compose file. Portainer stacks accept docker compose files so adding and configuring applications is straightforward.

    I’ve configured around 50 apps on my server using Docker Compose with Portainer but have only needed to modify the Dockerfile itself once, and that was because I was trying to do something that the original maintainer didn’t support.

    Now, if you’re satisfied with what’s available and with how much you can configure it without using Docker, then it’s fine to avoid it. I’m just trying to say that it’s pretty straightforward if you focus on just understanding the important parts, mainly:

    • docker compose
    • docker networks
    • docker volumes

    If you decide to go that route, I recommend TechnoTim’s tutorials on Youtube. I personally found them helpful, at least.


  • This is a very surface level overview of the frameworks it covers. The title is a bit of a reach, as it wouldn’t give anyone enough information to make a more educated decision about which framework to use.

    Are you the author? I think it could be improved by including:

    • metrics - number of apps that use each, number of job offerings, github stars
    • who backs each project, and how much can we trust them to continue developing it in a way that’s friendly to developers
    • for React specifically, a bit more info on the prominent frameworks - Next.js, Vite, Gatsby, CRA/CRACO, or ejected CRA - since the difference between them is substantial
    • a high level description of the use case that the framework is designed for, as well as use cases where it isn’t well suited or has drawbacks.
    • how does the development experience differ? Is there a lengthy build step? Does it offer hot reloading? Does it come with a built-in linter or integrate easily with one?
    • Does it have a bundled testing framework, and how does that compare to other offerings? For example, CRA comes with jest and it can be a pain to configure jest to properly handle all of your dependencies - it doesn’t use the same build pipeline as your app and will fail if you’re using newer dependencies that use import statements instead of module.exports and you don’t individually configure each one. Vitest, by contrast, uses the same build pipeline as Vite.
    • Ease of writing unit tests, component tests, and e2e tests (even if that means pulling in another library)
    • ease of use with or without typescript
    • some more substantial example apps per framework, like a to-do list that uses a simple API (preferably the same API in all cases). Currently the examples don’t even show what the code looks like with basic styling

    If you are the author, I saw your article on Typescript and would also like to say that you can configure your linter to not warn about using any. There’s even a no-implicit-any rule that you can use if you only want explicit any types but don’t want, for example, responses from API calls to have that type by default.


  • I’m not addressing anything Gitea has specifically done here (I’m not informed enough on the topic to have an educated opinion yet), but just this specific part of your comment:

    And they also demand a CLA from contributors now, which is directly against the idea of FOSS.

    Proprietary software is antithetical to FOSS, but CLAs themselves are not, and were endorsed by RMS as far back as 2002:

    In contrast, I think it is acceptable to … release under the GPL, but sell alternative licenses permitting proprietary extensions to their code. My understanding is that all the code they release is available as free software, which means they do not develop any proprietary softwre; that’s why their practice is acceptable. The FSF will never do that–we believe our terms should be the same for everyone, and we want to use the GPL to give others an incentive to develop additional free software. But what they do is much better than developing proprietary software.

    If contributors allow an entity to relicense their contributions, that enables the entity to write proprietary software that includes those contributions. One way to ensure they have that freedom is to require contributors to sign a CLA that allows relicensing, so clearly CLAs can enable behavior antithetical to FOSS… but they can also enable FOSS development by generating another revenue stream. And many CLAs don’t allow relicensing (e.g., Apache’s).

    Many FOSS companies require contributors to sign CLAs. For example, the FSF has required them since 2005 at least, and its CLA allows relicensing. They explain why, but that explanation doesn’t touch on why license reassignment is necessary.

    Even if a repo requires contributors sign a CLA, nobody’s four freedoms are violated, and nobody who modifies such software is forced to sign a CLA when they share their changes with the community - they can share their changes on their own repo, or submit them to a fork that doesn’t require a CLA, or only share the code with users who purchase the software from them. All they have to do is adhere to the license that the project was under.

    The big issue with CLAs is that they’re asymmetrical (as opposed to DCOs, which serve a similar purpose). That’s understandably controversial, but it’s not inherently a FOSS issue.

    Some of the same arguments against the SSPL (which is not considered FOSS because it is so copyleft that it’s impractical) being considered FOSS could be similarly made in favor of CLAs. Not in favor of signing them as a developer, mind you, but in favor of considering projects that use them to be aligned FOSS principles.