Microsoft Unveils Fara-7B Agentic Model Built on Qwen for Computer Use

Microsoft has unveiled Fara-7B, its first agentic small language model designed specifically for computer use. At only 7 billion parameters, it achieves state-of-the-art performance within its size class and remains competitive with much larger, more resource-intensive systems. Built on the Qwen 2.5-VL architecture, Fara-7B is a multimodal decoder-only language model that takes screenshots and text context as inputs and directly predicts thoughts and actions with grounded arguments to accomplish high-level user tasks.

Fara-7B was trained on 145,000 synthetic trajectories generated through the Magentic-One framework using FaraGen, a novel synthetic data generation system that proposes diverse tasks drawn from frequently used websites, generates multiple solution attempts, and filters successful trajectories through multiple verifiers, at a cost of roughly $1 per trajectory. The model perceives the browser through screenshots while recording its internal reasoning and state history as text, enabling it to plan and execute complex goals such as booking restaurants, applying for jobs, planning trips, and purchasing shopping lists.

A robust post-training safety approach, incorporating open-source and in-house synthetic datasets, teaches the model to recognize critical points that require user permission or involve sensitive information. The agent is trained to refuse harmful tasks and undergoes automated red teaming to assess risks including grounding failures, jailbreaks, harmful content, and copyright violations. Critical safety checkpoints include entering personal information, completing purchases, placing phone calls, sending emails, submitting applications, and signing into accounts.

Fara-7B outperforms other computer use agent models of comparable size on benchmarks including WebVoyager, Online-Mind2Web, and WebTailBench, a new benchmark developed to better capture web tasks under-represented in existing benchmarks.
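The perceive-think-act loop and the critical-point safety behavior described above can be sketched as follows. Everything here is illustrative: the function names, the action schema, and the stubbed model call are assumptions for exposition, not Microsoft's actual interface.

```python
# Hypothetical sketch of a Fara-7B-style agent step: screenshots plus textual
# history go in, a thought and a grounded action come out, and actions at
# "critical points" halt for user permission instead of executing autonomously.
from dataclasses import dataclass, field

# Action categories the article lists as critical safety checkpoints.
CRITICAL_ACTIONS = {"enter_personal_info", "complete_purchase", "place_call",
                    "send_email", "submit_application", "sign_in"}

@dataclass
class AgentState:
    history: list = field(default_factory=list)  # textual reasoning/state log

def predict_action(screenshot: bytes, history: list) -> dict:
    """Stand-in for the model call. The real model consumes the screenshot and
    text context and emits a thought plus an action with grounded arguments
    (e.g. pixel coordinates)."""
    return {"thought": "The checkout button is visible; finish the order.",
            "action": "complete_purchase", "args": {"x": 512, "y": 300}}

def step(screenshot: bytes, state: AgentState) -> dict:
    pred = predict_action(screenshot, state.history)
    state.history.append(pred["thought"])  # reasoning is recorded textually
    if pred["action"] in CRITICAL_ACTIONS:
        # Critical point: pause and ask the user rather than acting.
        return {"status": "awaiting_user_permission", "pending": pred}
    return {"status": "executed", "action": pred}
```

The key design point is that the permission gate sits outside the model: even if the model proposes a sensitive action, the harness refuses to execute it without explicit user approval.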
The model is efficient enough to run on-device: quantized versions require as little as 5.8GB of storage for the Q4_K_S variant, making it deployable on consumer-grade hardware with 12GB of GPU memory. Microsoft has released Fara-7B as an open-weight model on Microsoft Foundry and HuggingFace, democratizing access to advanced agentic capabilities, and has also released WebTailBench to support future research in computer use agents.
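The propose/attempt/filter idea behind FaraGen, described earlier, can be sketched in miniature. All functions below are toy stand-ins under stated assumptions; the real pipeline proposes far more diverse tasks and uses substantive verifiers.

```python
# Minimal sketch of a FaraGen-style pipeline: propose tasks from websites,
# roll out several solution attempts per task, and keep only trajectories
# that every verifier accepts. Stubs replace the real task proposer,
# agent rollouts, and verifiers.
import random

def propose_tasks(sites):
    # Toy proposer: one task per site (the real system derives many).
    return [f"book a table on {site}" for site in sites]

def attempt(task, seed):
    # Stand-in rollout: a trajectory is a list of (thought, action) steps.
    rng = random.Random(seed)
    steps = [("observe page", "click")] * rng.randint(1, 3)
    return {"task": task, "steps": steps, "reached_goal": rng.random() > 0.5}

def verifiers(traj):
    # Multiple independent checks; all must pass for the trajectory to count.
    return [traj["reached_goal"], len(traj["steps"]) > 0]

def generate(sites, attempts_per_task=4):
    kept = []
    for task in propose_tasks(sites):
        for seed in range(attempts_per_task):
            traj = attempt(task, seed)
            if all(verifiers(traj)):
                kept.append(traj)
    return kept
```

Generating multiple attempts per task and filtering with several verifiers trades compute for quality, which is how the pipeline reaches usable trajectories at roughly $1 each.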

Why it matters:

  • Demonstrates that small, efficient language models can rival much larger frontier models on computer use tasks, enabling on-device deployment and reducing computational costs
  • Introduces scalable synthetic data generation methodology (FaraGen) that addresses the critical gap in high-quality computer use agent training datasets

Key Points

  • Fara-7B is a 7-billion-parameter agentic model built on Qwen 2.5-VL and trained on 145,000 synthetic trajectories for web-based task automation
  • The model uses multimodal inputs (screenshots + text) to predict actions with grounded arguments for complex tasks like booking, job applications, and shopping
  • Incorporates safety mechanisms with critical point recognition to halt at sensitive actions requiring user permission
  • Outperforms comparable-sized models on WebVoyager, Online-Mind2Web, and the new WebTailBench benchmark
  • Runs efficiently on consumer hardware, with quantized versions requiring as little as 5.8GB of storage
  • Open-weight model released on Microsoft Foundry and HuggingFace with accompanying WebTailBench dataset



Source: analyticsindiamag.com

Original Publish Date: 27/11/2025

Entities: Microsoft, Microsoft Research, Qwen, Magentic-One, FaraGen, HuggingFace, Microsoft Foundry