Open-Source AI Model Updates: Qwen, Phi-4, NVIDIA

  • Qwen Series Expansion: Qwen released Qwen3, a new family of dense and Mixture-of-Experts (MoE) models ranging from 0.6B to 235B parameters, alongside Qwen2.5-Omni, an any-to-any multimodal model available in 3B and 7B versions (a minimal loading sketch follows this list).
  • Microsoft Phi-4 Reasoning Models: Microsoft introduced Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, a series of reasoning models at different scales, indicating a focus on efficient and scalable reasoning capabilities.
  • NVIDIA's Contribution to Reasoning and Speech: NVIDIA released new datasets for Chain-of-Thought (CoT) reasoning, along with parakeet-tdt-0.6b-v2, a compact 600M-parameter automatic speech recognition (ASR) model (a transcription sketch follows this list).
  • Multimodal UI Parsing with UI-TARS-1.5: ByteDance unveiled UI-TARS-1.5, a native multimodal agent model for parsing and interacting with user interfaces, signaling progress in AI's ability to understand and operate GUIs.
  • On-Device Object Tracking with EdgeTAM: Meta introduced EdgeTAM, an on-device object tracking model built as an efficient variant of SAM 2, highlighting progress in edge-deployable vision models.
  • Text-to-Speech and Code Generation Models: Nari released Dia, a 1.6B text-to-speech model; JetBrains released Mellum models (base and SFT) for coding; and Tesslate released UIGEN-T2-7B, a text-to-frontend-code model, showcasing generative AI across diverse applications. According to additional sources, these models are part of a larger collection that also includes VLMs, multimodal learning resources, and models for image/video understanding, depth estimation, and document AI.
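The Qwen3 checkpoints are published on Hugging Face, so a standard transformers workflow should apply. Below is a minimal sketch, assuming the transformers library (with accelerate installed for device_map), the Qwen/Qwen3-0.6B checkpoint, and an illustrative prompt; generation settings are placeholders, not recommendations.

```python
# Minimal sketch: loading a small Qwen3 checkpoint with Hugging Face transformers.
# Model id, prompt, and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # smallest dense model in the Qwen3 family
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain Mixture-of-Experts models."}]
# Extra kwargs such as enable_thinking are forwarded to the chat template;
# Qwen3's template uses this flag to toggle its "thinking" mode.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```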
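parakeet-tdt-0.6b-v2 is distributed through NVIDIA NeMo. A minimal transcription sketch follows, assuming the nemo_toolkit[asr] package is installed; the local audio file path is a placeholder, and the exact return type of transcribe() varies across NeMo versions.

```python
# Minimal sketch: transcribing an audio file with parakeet-tdt-0.6b-v2 via NVIDIA NeMo.
# The audio path is an illustrative assumption; input is expected as 16 kHz mono WAV.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)
transcriptions = asr_model.transcribe(["sample.wav"])
# Recent NeMo versions return hypothesis objects with a .text field;
# older versions may return plain strings.
print(transcriptions[0].text)
```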