Matrix Orthogonalization Improves Memory in Recurrent Models

(ayushtambde.com)

62 points | by at2005 9 hours ago

10 comments

phkahler an hour ago
If it can be made orthogonal, can you go a step further and diagonalize it? The storage and performance improvement from that would be huge.
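On the diagonalization question, a general linear algebra fact applies: a real orthogonal matrix is normal, so it can be diagonalized, but usually only over the complex numbers, with every eigenvalue on the unit circle. Below is a minimal sketch using a 2x2 rotation as the simplest case. The angle, the helper names, and the step count are illustrative assumptions, and nothing here is taken from the linked article.

```python
import cmath
import math

def rotation(theta):
    """2x2 rotation matrix, the simplest non-trivial real orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def is_eigenpair(m, lam, v, tol=1e-12):
    """Check that m @ v equals lam * v up to a small tolerance."""
    mv = matvec(m, v)
    return all(abs(mv[i] - lam * v[i]) < tol for i in range(len(v)))

def power_apply(m, v, k):
    """Apply m to v k times, as a linear recurrence h_t = m h_{t-1} would."""
    for _ in range(k):
        v = matvec(m, v)
    return v

theta = 0.7  # illustrative angle (assumption)
R = rotation(theta)

# Known eigenpairs of a rotation: eigenvalues exp(+i*theta) and exp(-i*theta),
# with complex eigenvectors (1, -i) and (1, i).
eigenpairs = [
    (cmath.exp(1j * theta), [1, -1j]),
    (cmath.exp(-1j * theta), [1, 1j]),
]

# In the eigenbasis, k steps of the recurrence collapse to one scalar power.
k = 50
lam, v = eigenpairs[0]
direct = power_apply(R, v, k)
diagonal = [lam ** k * x for x in v]
```

The catch is that the diagonal form is complex: each real 2x2 rotation block becomes a conjugate pair of unit-modulus phases, so the saving comes from replacing a dense matrix product with an elementwise one, not from avoiding complex arithmetic.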
BirbSingularity 8 hours ago
I can't help but think of orthogonal frequency-division multiplexing and its use in encoding data on multiple carrier frequencies. It makes me wonder what other parallels we will discover between digital transmission technology and cross-domain stuff like this.
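For anyone unfamiliar with the term, the "orthogonal" in OFDM means the sampled subcarriers have zero inner product with one another over a symbol period, which is the same notion of orthogonality as for the rows of an orthogonal (or unitary) matrix. Below is a minimal sketch, assuming ideal carriers and an illustrative symbol length of N = 8 samples; the function names are hypothetical.

```python
import cmath

N = 8  # samples per symbol (illustrative assumption)

def subcarrier(k, n_samples=N):
    """Samples of the k-th complex subcarrier over one symbol period."""
    return [cmath.exp(2j * cmath.pi * k * n / n_samples) for n in range(n_samples)]

def inner(a, b):
    """Complex inner product: sum of a[n] times the conjugate of b[n]."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

# Gram matrix of all N subcarriers: N on the diagonal, 0 everywhere else.
gram = [[inner(subcarrier(j), subcarrier(k)) for k in range(N)] for j in range(N)]
```

Dividing each subcarrier by sqrt(N) would make the Gram matrix the identity, which is the statement that the normalized DFT matrix is unitary.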
- dapperdrake 7 hours ago
  Not even cross-domain. (Nor cross-co-domain.)
  Trigonometric polynomials are also polynomials. And linear spaces are all "the same". That is what the definition is for. Even the transpose-mapping is linear.
- chimpanzee2 3 hours ago
  I have this strange sensation, which I can't quite put into words, that we are on the brink of unveiling an entirely new paradigm of AI, or perhaps of combining AI with classical algorithms so that they rapidly iterate with each other (and with sensor data) in a way that will instantly 10x or 100x current capabilities.
  Anyone else feel this?
  - digdugdirk 2 hours ago
    I think part of it is the feeling of false understanding that comes from using LLMs regularly. They let you operate at a higher conceptual level, and they paper over enough of the actual details that your conceptual model might not actually be correct.
    I'm a mechanical engineer by training, and I get similar vibes from the parallels I see between LLM training and metallurgy. I could probably put together a formal concept for these vibes at this point, but is there actually a "there" there? I have no idea. And it would take me years to dive in and gain the deep understanding required to know whether I'm just experiencing my own brand of AI psychosis.
    It's a brave new world, that's for sure.
    - seanhunter an hour ago
      Andrej Karpathy said something along the lines of “while you can use LLMs to outsource some of your thinking, you can’t use them to outsource your understanding”.
  - cyanydeez 3 hours ago
    no. we're approaching a sigmoid. AI is a bloated carcass and we're tweaking the size of the models and the speed they'll run at on smaller hardware.
    I think to feel what you're feeling, you've bought into "all we need is more context". I think evolution demonstrates that's not really true.
    - chimpanzee2 2 hours ago
      would you really bet that this is it? there is nothing beyond this?
      reminds me of the famous anecdote of a 19th-century physics professor who said "there is nothing left to be discovered in physics, only minor corrections"
      then came Einstein...
      - seanhunter 25 minutes ago
        That wasn’t just a physics professor; that was William Thomson, aka Lord Kelvin (the dude the temperature unit is named after and one of the most important mathematical physicists of the 19th century [1]), who also said that heavier-than-air flight was impossible only a few years before the Wright Brothers flew (and presumably in spite of having at least once in his lifetime seen a bird). Proof that you can be both very smart and simultaneously a bit of a jackass.
        [1] https://en.wikipedia.org/wiki/Lord_Kelvin
harveyrook an hour ago
Now I’m wondering what the eigenspace of an LLM is. If I take a set of LLMs with the same number of parameters, then what are the eigenvectors? Do they have different personalities?