Microsoft AI, the tech giant’s dedicated research division, has released three new foundational models capable of generating text, voice, and images. The move signals Microsoft’s intent to compete directly with other major AI labs, including its own partner OpenAI, by building out a proprietary stack of multimodal models.
Quick Facts
- Three new foundational AI models announced
- Capable of text, voice, and image generation
- Priced to compete directly with market leaders
A Closer Look at Microsoft’s New AI Toolkit
The new models, developed by the MAI Superintelligence team led by Microsoft AI CEO Mustafa Suleyman, are designed for practical and efficient use. The suite includes:
- MAI-Transcribe-1: A speech-to-text model that transcribes audio across 25 different languages. According to Microsoft, it operates 2.5 times faster than its existing Azure Fast service.
- MAI-Voice-1: An audio-generating model that can produce 60 seconds of audio in a single second and allows for the creation of custom voices.
- MAI-Image-2: A model designed for image generation.
All three models are now available on Microsoft Foundry, with the transcription and voice models also accessible on MAI Playground, the company’s new platform for testing large language models.
A Dual Strategy: Building In-House While Backing OpenAI
This launch highlights Microsoft’s two-pronged approach to AI. While the company has invested over $13 billion in OpenAI and integrated its technology across its products, a recently renegotiated partnership has allowed it to pursue its own in-house research more aggressively.
In a blog post, Suleyman emphasized the company’s unique philosophy. “At Microsoft AI, we’re building Humanist AI. We have a distinct view when creating our AI models — putting humans at the center, optimizing for how people actually communicate, training for practical use.”
A key selling point for the new models is their cost-effectiveness. MAI-Transcribe-1 starts at $0.36 per hour, MAI-Voice-1 at $22 per 1 million characters, and MAI-Image-2 at $5 for 1 million text input tokens. This pricing structure positions them as more affordable alternatives to offerings from Google and OpenAI.
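To make those starting prices concrete, here is a minimal Python sketch that turns the quoted rates into rough cost estimates. It assumes pricing scales linearly with usage, that the transcription rate applies per hour of audio, and that no volume tiers or minimums apply. The function and constant names are illustrative, not part of any Microsoft API, and the image estimate covers only text input tokens, since that is the only image rate quoted here.

```python
# Rough cost estimator based on the starting prices quoted in the article.
# Assumptions (not confirmed by the source): linear pricing, no volume
# tiers or minimum charges, and the transcription rate is per hour of audio.

TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36        # MAI-Transcribe-1
VOICE_USD_PER_MILLION_CHARS = 22.0          # MAI-Voice-1
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0    # MAI-Image-2, text input only


def transcription_cost(audio_hours: float) -> float:
    """Estimated USD cost of transcribing the given hours of audio."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated USD cost of synthesizing speech from this many characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_input_cost(text_input_tokens: int) -> float:
    """Estimated USD cost of the text input tokens sent to the image model."""
    return text_input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS


if __name__ == "__main__":
    # Hypothetical monthly workload for a small startup.
    print(f"100 hours of audio:      ${transcription_cost(100):.2f}")
    print(f"250,000 characters:      ${voice_cost(250_000):.2f}")
    print(f"2,000,000 input tokens:  ${image_input_cost(2_000_000):.2f}")
```

Under these assumptions, the hypothetical workload comes to about $36 for transcription, $5.50 for voice, and $10 for image prompts. Actual bills would depend on Microsoft's published pricing terms.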
What Microsoft’s AI Push Means for MENA’s Tech Ecosystem
For the MENA region’s rapidly growing tech scene, Microsoft’s move offers significant new possibilities. The availability of powerful, lower-cost foundational models from a major enterprise player could lower the barrier to entry for startups in Riyadh, Dubai, and Cairo looking to build AI-powered products.
The new tools, particularly MAI-Transcribe-1 with its multi-language support, present opportunities for developing more sophisticated and localized services for the Arabic-speaking market. Increased competition among AI model providers also gives regional founders and developers more choice, more flexibility, and potentially lower operating costs. That could accelerate the integration of advanced AI capabilities into local business solutions and support the ambitious digital transformation goals of countries across the region.
About Microsoft AI
Microsoft AI is the tech giant’s dedicated research and development division for artificial intelligence. Led by CEO Mustafa Suleyman, the division focuses on building foundational models and integrating AI across Microsoft’s products and services, guided by a “Humanist AI” approach.
Source: TechCrunch


