The catarrhine yerba mate enjoyer who invented a perpetual motion machine, by dreaming at night and devouring its own dreams through the day.

Кўис кредис ессе, Беллум?

  • 1 Post
  • 193 Comments
Joined 3 years ago
Cake day: April 9th, 2021






  • It isn’t “Hangul” that is saving the language, but the fact that it’s getting an orthography. That orthography could theoretically be in any writing system - not just Latin or Arabic (both already exist for Cia-Cia, contrary to what the video claims), but even a native one or Cyrillic or even, dunno, the Cherokee syllabary.

    Abidin looks informed on the matter; the same cannot be said about whoever produced this video. I’ll highlight a few issues.

    [0:33] - pretty much all languages are “syllable-based”. They organise sounds into syllables. The video is likely trying to convey that it’s a CV (consonant, vowel, repeat) language, unlike, say, Russian or English (that cram quite a lot of consonants in a single syllable).

    [0:36] The video is trying to use “transliterated” as a posh synonym for “spelled”, but the two are not the same thing. Transliteration is converting text from one script to another; for example, “Quis credis esse, Bellum?” (Latin, using the Latin script) → “Кўис кредис ессе, Беллум?” (Latin, using the Cyrillic script instead) is transliteration.

    And you can spell pretty much any language in any writing system. The association between graphemes and sounds (or phonemes) is arbitrary.

    You might say “but the Latin alphabet doesn’t have a letter for /ɓ/!” - well, it doesn’t have a letter for /ʃ/ either. Italian handled it by spelling it ⟨sci⟩, English as ⟨sh⟩, Polish as ⟨sz⟩, Portuguese kind of repurposed ⟨x⟩. And the current Latin spelling for Cia-Cia - that you can check here - handled /ɓ/ just fine, using an approach similar to the Hangul one.


  • Let’s go simpler: what if your instance was allowed to copy the fed/defed lists from other instances, and use them (alongside simple Boolean logic plus if/then statements) to automatically decide who you’re going to federate/defederate with? That would enable caracoles and fedifams for admins who so desire, but also enable other organically grown relations.

    For example. Let’s say that you just joined the federation. And there are three instances that you somewhat trust:

    • Alice - it defederates only really problematic instances.
    • Bob and Charlie - both are a bit prone to defederate other instances on a whim, but when both defed the same instance it’s usually problematic.

    Then you could set up your defederation rules like this:

    • if Alice defed it, then defed it too.
    • else, if (Bob defed it) and (Charlie defed it), then defed it too.
    • else, federate with it.

    Of course, that would require distinguishing between manual and automatic fed/defed. You’d only be able to use the manual fed/defed lists from other instances to create your automatic rules; this avoids deadlocks like “Alice is blocking it because Bob is blocking it, and Bob is blocking it because Alice is doing it”.
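
    The three rules above could be sketched like this. It’s only an illustration: the `Instance` class, the `manual_defed` field and the example domains are made up for this sketch, and no existing fediverse software is claimed to expose such an API.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical instance that publishes its *manual* defed list."""
    name: str
    manual_defed: set[str] = field(default_factory=set)


def should_defederate(target: str, alice: Instance, bob: Instance, charlie: Instance) -> bool:
    # Rule 1: Alice only defeds really problematic instances, so follow her.
    if target in alice.manual_defed:
        return True
    # Rule 2: Bob and Charlie defed on a whim, but are trusted when they agree.
    if target in bob.manual_defed and target in charlie.manual_defed:
        return True
    # Rule 3: otherwise, federate.
    return False


# Made-up example data (assumption, not real instances).
alice = Instance("alice", {"spam.example"})
bob = Instance("bob", {"drama.example", "troll.example"})
charlie = Instance("charlie", {"troll.example", "meh.example"})

targets = ["spam.example", "drama.example", "troll.example", "nice.example"]
decisions = {t: should_defederate(t, alice, bob, charlie) for t in targets}
```

    Since only the manual lists are read, your own automatic decisions never feed back into somebody else’s rules, which is what prevents the Alice/Bob deadlock.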



  • The source that I’ve linked mentions semantic embedding; so does further literature on the internet. However, the operations are still being performed with the vectors resulting from the tokens themselves, with said embedding playing a secondary role.

    This is evident for example through excerpts like

    The token embeddings map a token ID to a fixed-size vector with some semantic meaning of the tokens. These brings some interesting properties: similar tokens will have a similar embedding (in other words, calculating the cosine similarity between two embeddings will give us a good idea of how similar the tokens are).

    Emphasis mine. A similar conclusion (that the LLM is still handling the tokens, not their meaning) can be reached by analysing the hallucinations that your typical LLM bot outputs, and asking why that hallu is there.
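
    For reference, the cosine similarity mentioned in that excerpt is easy to compute. A minimal sketch; the three-dimensional vectors below are invented for illustration and don’t come from any real model:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" (assumption): real ones have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
# With these toy vectors, "cat" ends up closer to "dog" than to "car".
```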

    What I’m proposing is deeper than that. It’s to use the input tokens (i.e. morphemes) only to retrieve the sememes (units of meaning; further info here) that they’re conveying, then discard the tokens themselves, and perform the operations solely on the sememes. Then for the output you translate the sememes obtained by the transformer into morphemes=tokens again.
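
    A toy sketch of that pipeline, just to show the shape of the idea. Everything here is hypothetical: the lexicon, the sememe labels and the `process` placeholder are invented, and a real system would need to pick the right sememe from context instead of using a one-to-one lookup.

```python
# Hypothetical lexicon: several tokens can point to the same sememe.
TOKEN_TO_SEMEME = {
    "big": "LARGE",
    "large": "LARGE",
    "cat": "CAT",
    "feline": "CAT",
    "sleeps": "SLEEP",
}

# One output token chosen per sememe (arbitrary choice for this sketch).
SEMEME_TO_TOKEN = {"LARGE": "big", "CAT": "cat", "SLEEP": "sleeps"}


def to_sememes(tokens: list[str]) -> list[str]:
    """Input step: keep only the meaning, discard the tokens."""
    return [TOKEN_TO_SEMEME[t] for t in tokens]


def process(sememes: list[str]) -> list[str]:
    """Placeholder for the transformer; it would only ever see sememes."""
    return list(sememes)


def to_tokens(sememes: list[str]) -> list[str]:
    """Output step: translate sememes back into tokens."""
    return [SEMEME_TO_TOKEN[s] for s in sememes]


a = to_sememes(["big", "cat", "sleeps"])
b = to_sememes(["large", "feline", "sleeps"])
# Different tokens, same meaning: both inputs are identical after the first step.
same_meaning = a == b
output = to_tokens(process(b))
```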

    I believe that this would have two big benefits:

    1. The amount of data necessary to “train” the LLM will decrease. Perhaps by orders of magnitude.
    2. A major type of hallucination will go away: self-contradiction (for example: states that A exists, then that A doesn’t exist).

    And it might be an additional layer, but the whole approach is considerably simpler than what’s being done currently - pretending that the tokens themselves have some intrinsic value, then playing whack-a-mole with situations where the token and the contextually assigned value (by the human using the LLM) differ.

    [This could even go deeper, handling a pragmatic layer beyond the tokens/morphemes and the units of meaning/sememes. It would be closer to what @[email protected] understood from my other comment, as it would then deal with the intent of the utterance.]


  • Not quite. I’m focusing on chatbots like Bard, ChatGPT and the like, and their technology (LLM, or large language model).

    At the core those LLMs work like this: they pick words, split them into “tokens”, and then perform a few operations on those tokens, across multiple layers. But at the end of the day they still work with the words themselves, not with the meaning being encoded by those words.

    What I want is an LLM that assigns multiple meanings for those words, and performs the operations above on the meaning itself. In other words the LLM would actually understand you, not just chain words.


  • Complexity does not mean sophistication when it comes to AI and never has and to treat it as such is just a forceful way to make your ideas come true without putting in the real effort.

    It’s a bit off-topic, but what I really want is a language model that assigns semantic values to the tokens, and handles those values instead of directly working with the tokens themselves. That would probably be far less complex than current state-of-the-art LLMs, but way more sophisticated, and require far less data for “training”.









  • It’s less complicated than it looks. The text is just a poorly written mess, full of options (Fedora vs. Ubuntu, repo vs. no repo, stable vs. beta), and they’re explaining how to do this through the terminal alone because the interface that you have might be different from what they expect. And because copy-pasting commands is faster.

    Can’t I just download a file and install it? I’m on Ubuntu.

    Yes, you can! In fact, the instructions include this option; it’s under “Installing the app without the Mullvad repository”. It’s a bad idea though; then you don’t get automatic updates.

    A better way to do this is to tell your system “I want software from this repository”, so each time that they make a new version of the program, yours gets updated.

    but I have no idea what I’m doing here.

    I’ll copy-paste their commands to do so, and explain what each does.

    sudo curl -fsSLo /usr/share/keyrings/mullvad-keyring.asc https://repository.mullvad.net/deb/mullvad-keyring.asc
    echo "deb [signed-by=/usr/share/keyrings/mullvad-keyring.asc arch=$( dpkg --print-architecture )] https://repository.mullvad.net/deb/stable $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/mullvad.list
    sudo apt update
    sudo apt install mullvad-vpn
    

    The first command boils down to “download this keyring from the internet”. The keyring is a necessary file to know if you’re actually getting your software from Mullvad instead of PoopySoxHaxxor69. If you wanted, you could download it manually and then move it to the /usr/share/keyrings directory, but… it’s more work, come on.

    The second command tells your system that you want software from repository.mullvad.net. I don’t use Ubuntu but there’s probably some GUI to do it for you.

    The third command boils down to “hey, Ubuntu, update the list of packages for me”.

    The fourth one installs the software.


  • (I don’t care about the USA enough to discuss its specificities. I’ll talk about fascism.)

    It’s a mistake to conflate two enemies. Even if you hate both for the same reasons, once you conflate them, you lose the ability to fight against at least one of them.

    And what the video describes as “friendly fascism” has barely anything to do with fascism. And it already has a name - plutocracy, or “government of the rich”.

    Once you disregard witch hunters and their brainfarts, fascism has a rather consistent bundle of traits:

    0. “strength through union”
    1. conflation between a government, its population, and a “nation”
    2. persecution of minorities as “harming our unity”
    3. a “strong leader” taking decisions for you
    4. hate against separatist movements
    5. emphasis on traditional values
    6. a discourse of a “glorious past” to return to
    7. usage of force to silence dissidence

    By far #0 is the most important trait of fascism, as the others come from it. In the meantime plutocracy (or “friendly fascism”) would fit #3, arguably #7. And the contempt for liberalism and electoral politics appears for different reasons for both - ideological and pragmatic respectively.


    Once you make this distinction, this video becomes especially interesting to watch, as it allows you to notice how one of your enemies is using the other to kill you with a borrowed knife.