Alan Dao's personal blog
AI Researcher
Up until now, everyone has been using “Rotary Position Embedding” (RoPE) as the default method for positional encoding. However, exactly how and why RoPE makes things “better” is still a little unexplored.
Luckily, there is a paper, Round and Round We Go! What makes Rotary Positional Encodings useful?, that addresses this specific issue. I found a few of its results quite interesting.
1. RoPE does not necessarily decay activations with distance: 🔗In the original RoFormer paper, the authors analyzed how RoPE exhibits some level of decay as the context length increases.
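To make the discussion concrete, here is a minimal NumPy sketch of RoPE (the function name and setup are my own illustration, not code from either paper). It shows the defining property that the query–key score depends only on the relative offset between positions; whether that score actually decays with the offset depends on the vectors themselves, which is what the result above is about.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply Rotary Position Embedding to a vector x at position pos.

    x has even dimension d; each pair (x[2i], x[2i+1]) is rotated
    by the angle pos * theta_i, with theta_i = base**(-2i/d).
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
d = 64
q = rng.normal(size=d)
k = rng.normal(size=d)

# The score <rope(q, m), rope(k, n)> depends only on the offset m - n:
s1 = rope(q, 10) @ rope(k, 7)     # offset 3
s2 = rope(q, 110) @ rope(k, 107)  # offset 3 again, shifted by 100
print(np.isclose(s1, s2))  # True
```

Note that nothing in this construction forces the score to shrink as the offset grows; any decay comes from how the query and key contents interact with the per-pair frequencies.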
We rebranded llama3-s as “Ichigo”, with a cute UI just like the one below. If you are coming from Singapore Techweek, you can also visit the Homebrew blog post in the new Announcement Post.
I will update this blog post once our paper comes out.
With the recent release of cutting-edge models like Gemma 9B, Llama 3.1, etc., we are in an era where a model as small as 8B parameters can match the performance of ChatGPT 3.5 or ChatGPT 4 (according to the LMSYS ranking). Strangely, the general vibe in communities like r/LocalLLaMA does not seem to agree.
But why does my Llama 3.1 seem, well, dumb? 🔗Or at the very least, nowhere near ChatGPT 3.
This dates back to the post Multi Modal Tokenizing With Chameleon. Since then, I have been working with my team at Homebrew Research to build something new. We wanted to give the community something original, not simply a replication of Chameleon (the vision modality).
So we decided to work on a model that can handle sound: one you can talk to, one you can give commands to. A Llama model that can listen!
The Error of Death 🔗Have you been constantly battling VRAM usage while fine-tuning LLMs, repeatedly running into the error below?
RuntimeError: CUDA error: out of memory. The above is the destroyer of joy, the sudden stop of happiness, and the most dreadful error you might face when trying to train an AI model, or more specifically an LLM (which I assume is the most VRAM-intensive of the bunch).
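To see why this error is so hard to avoid, here is a rough back-of-envelope estimate of the VRAM that full fine-tuning needs. The byte counts are my own simplified assumptions for a typical mixed-precision Adam setup (and they ignore activations, which add even more), not exact numbers from any framework.

```python
def finetune_vram_gb(n_params):
    """Rough VRAM estimate (GB) for full fine-tuning with mixed-precision Adam.

    Assumed per-parameter costs (simplified, activations excluded):
      2 bytes  fp16 weights
      2 bytes  fp16 gradients
      4 bytes  fp32 master weights
      8 bytes  Adam moments (two fp32 states)
    """
    bytes_per_param = 2 + 2 + 4 + 8
    return n_params * bytes_per_param / 1024**3

# Under these assumptions, an 8B-parameter model needs on the order of
# a hundred-plus GB for weights, gradients, and optimizer state alone,
# far beyond a single consumer GPU.
print(round(finetune_vram_gb(8e9), 1))
```

This is exactly why techniques like LoRA, quantization, and gradient checkpointing exist: they attack different terms of this per-parameter budget.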