Research
🦩 DeepMind releases Flamingo, an 80-billion-parameter visual language model (VLM) that sets a new few-shot state of the art on 16 tasks, including visual question answering (VQA), hateful content classification, and captioning.
🕞 OpenAI released DALL·E 2 this month, the sequel to the original text-to-image generation model. This time, a prior produces a CLIP image embedding from the text caption, and a decoder then generates the image from that embedding.
Check out the interactive demo by OpenAI and a technical overview of the paper by AssemblyAI, including some intuition on CLIP embeddings.
🤓 Stanford researchers are working toward a theoretical understanding of why batch norm works in convex optimization and deep neural networks. In a paper released last March, the team analyzed its strong empirical results through the lens of convex duality.
📈 Boris Dayma shared his findings on Twitter, rather than arXiv, after training large transformers for 2,000+ hours. Check out some of his tips, including: don't use bias in dense layers; use GeLU or Swish as the activation instead of SmeLU; and NormFormer is more stable than Sandwich-LN.
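The first two tips above can be sketched in a few lines. This is a hypothetical minimal example (not Dayma's actual code), showing a transformer-style MLP sub-block with bias-free dense layers and a GeLU activation, written with plain NumPy so the pieces are explicit:

```python
import numpy as np

def gelu(x):
    # GeLU, tanh approximation (as used in GPT-style models).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_block(x, w_in, w_out):
    # Tip 1: no bias terms in the dense layers (only weight matrices).
    # Tip 2: GeLU as the nonlinearity between them.
    return gelu(x @ w_in) @ w_out

# Toy usage: batch of 4 tokens with hidden size 8, 4x MLP expansion.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_in = 0.02 * rng.standard_normal((8, 32))
w_out = 0.02 * rng.standard_normal((32, 8))
y = mlp_block(x, w_in, w_out)  # shape (4, 8)
```

In frameworks like PyTorch the same choice is just `nn.Linear(d_in, d_out, bias=False)` followed by `nn.GELU()`.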