Abu Dhabi’s Technology Innovation Institute (TII) has expanded its open-source AI portfolio with the release of Falcon Perception and Falcon OCR, vision models that can identify, segment, and read objects described in natural language. The new systems challenge conventional computer vision approaches: built on a compact early-fusion architecture, they match or outperform significantly larger counterparts, including systems built by Meta.
Quick Facts
- TII launched open-source Falcon Perception and Falcon OCR models.
- Falcon Perception outperforms Meta’s SAM 3 in mask quality.
- New models use a highly efficient early-fusion Transformer architecture.
Reimagining Computer Vision Architecture
Most vision AI systems use separate modules for seeing an image and for understanding its context, a design choice that adds complexity and limits interaction between text and visual features. TII departs from this convention with an early-fusion Transformer design. In the Falcon models, image and text tokens interact from the first processing layer instead of being siloed.
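To make the contrast concrete, here is a minimal sketch of what "interacting from the first layer" means at the token level. It is a toy illustration, not TII's implementation: the function names are hypothetical, and it assumes every token in the joint sequence may attend to every other token, which the article does not specify.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (modality, value), e.g. ("img", "patch_0")


def build_joint_sequence(image_patches: List[str], text_words: List[str]) -> List[Token]:
    """Early fusion: place image and text tokens in one sequence up front."""
    return [("img", p) for p in image_patches] + [("txt", w) for w in text_words]


def cross_modal_pairs(sequence: List[Token], fused: bool) -> int:
    """Count (query, key) pairs in a single layer that span both modalities.

    fused=True  -> any token may attend to any other (assumed full attention).
    fused=False -> tokens attend only within their own modality (siloed modules).
    """
    count = 0
    for q_modality, _ in sequence:
        for k_modality, _ in sequence:
            if fused and q_modality != k_modality:
                count += 1
    return count


seq = build_joint_sequence(["patch_0", "patch_1", "patch_2"], ["red", "mug"])
print(cross_modal_pairs(seq, fused=True))   # 12
print(cross_modal_pairs(seq, fused=False))  # 0
```

With three image patches and two words, the fused layer already has 12 image-text interactions available, while the siloed setup has none until a later merging stage.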
The architecture uses a “Chain-of-Perception” methodology. The model resolves each segmentation instance in a fixed sequence: first its position, then its size, and finally its pixel mask. Because position and size are settled before the mask is produced, the mask is generated with that context already in place, which improves accuracy in crowded or complex scenes.
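The ordering can be illustrated with a small sketch. The code below does not run a model; it takes a toy binary mask and emits the three outputs in the position, size, mask order described above. The function name and the output format are assumptions made for illustration only.

```python
from typing import List, Tuple

Mask = List[List[int]]  # toy binary mask: 1 = object pixel, 0 = background


def chain_of_perception_steps(mask: Mask) -> List[Tuple[str, object]]:
    """Emit per-instance outputs in the order: position, size, mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]

    # Step 1: where the object is (center of its bounding box, in pixels).
    center = ((left + right) / 2, (top + bottom) / 2)
    # Step 2: how large it is (bounding-box width and height, in pixels).
    size = (right - left + 1, bottom - top + 1)
    # Step 3: the pixel mask itself, produced last.
    return [("position", center), ("size", size), ("mask", mask)]


toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
for name, value in chain_of_perception_steps(toy_mask)[:2]:
    print(name, value)
# position (1.5, 1.5)
# size (2, 2)
```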
Beating Global Giants on Efficiency
The practical results of this architectural shift are striking. Falcon Perception, a model with just 0.6 billion parameters, scored 68.0 Macro-F1 on the SA-Co benchmark, ahead of Meta’s SAM 3 at 62.3. The Falcon model showed particularly strong gains in identifying food and drink and sports, and in complex attribute recognition.
To further test AI vision models against compositional prompts, TII introduced PBench, a new diagnostic benchmark covering objects, attributes, text-reading, and spatial constraints. On PBench, Falcon Perception scored a 57.0 average Macro-F1, pulling ahead of SAM 3 (44.4) and even the massive Qwen3-VL-30B (52.7).
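For readers unfamiliar with the metric, Macro-F1 is generally the unweighted average of per-category F1 scores, so small categories count as much as large ones. The sketch below shows that generic definition; the category names and counts are invented for illustration, and the exact matching rules used by SA-Co and PBench are not described here.

```python
from typing import Dict, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-category F1 scores."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts per category, not benchmark data.
example_counts = {
    "food": (8, 2, 2),    # F1 = 0.8
    "sports": (5, 5, 0),  # F1 = 0.667
    "text": (1, 0, 3),    # F1 = 0.4
}
print(round(macro_f1(example_counts), 3))  # 0.622
```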
Despite its compact size, Falcon Perception outperforms the 2-billion and 9-billion parameter Moondream models across most reasoning tasks. TII’s system is between 3 and 15 times more parameter-efficient than the models it outperforms, a meaningful advantage for deploying AI at scale.
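The 3x to 15x range lines up with a simple parameter-count ratio against the two Moondream models. The short check below uses only the sizes quoted in this article; reading "parameter-efficient" as a plain size ratio is an assumption, since the article does not define the term.

```python
FALCON_PERCEPTION_PARAMS_B = 0.6  # billions, as quoted in the article

# Moondream sizes in billions, as quoted in the article.
moondream_params_b = {"Moondream (2B)": 2.0, "Moondream (9B)": 9.0}

size_ratios = {
    name: params / FALCON_PERCEPTION_PARAMS_B
    for name, params in moondream_params_b.items()
}
for name, ratio in size_ratios.items():
    print(f"{name}: {ratio:.1f}x the size of Falcon Perception")
# Moondream (2B): 3.3x the size of Falcon Perception
# Moondream (9B): 15.0x the size of Falcon Perception
```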
High-Speed Document Processing with Falcon OCR
Alongside the core perception model, TII launched Falcon OCR. With just 300 million parameters, it is built to deliver high document-reading accuracy.
The OCR model achieved 80.3% accuracy on the olmOCR benchmark and an 88.64 overall score on OmniDocBench. When tested on a single Nvidia A100-80GB GPU in a standard serving configuration, Falcon OCR processed 5,825 tokens per second, equivalent to roughly 2.9 images per second.
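The two throughput figures imply an average output length per image, which can be worked out directly. This is back-of-the-envelope arithmetic on the numbers above; the per-hour figure assumes the quoted rate is sustained, which the article does not claim.

```python
TOKENS_PER_SECOND = 5825  # as quoted for a single A100-80GB
IMAGES_PER_SECOND = 2.9   # as quoted ("roughly")

# Average number of tokens produced per image at these rates.
tokens_per_image = TOKENS_PER_SECOND / IMAGES_PER_SECOND
# Images per hour if the quoted rate held steady (an assumption).
images_per_hour = IMAGES_PER_SECOND * 3600

print(round(tokens_per_image))  # 2009
print(round(images_per_hour))   # 10440
```

In other words, each image yields roughly 2,000 tokens of output on average in this configuration.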
TII noted an ongoing limitation in presence calibration, where the model occasionally struggles to recognize that a described object is absent, but early reinforcement learning tests have already produced an 8-point improvement. The research team has made all models, code, benchmarks, and a public playground freely available to developers.
About Technology Innovation Institute (TII)
The Technology Innovation Institute (TII) is the global applied research center of Abu Dhabi’s Advanced Technology Research Council (ATRC). Operating at the intersection of discovery science and advanced technology, TII focuses on building applied research capabilities across multiple domains, driving the UAE’s position as a premier global hub for artificial intelligence and software engineering.
Source: Middle East AI News


