VibeThinker-1.5B

A New Benchmark for Efficient Small Models

VibeThinker-1.5B is an open-source reasoning model developed by WeiboAI (Sina Artificial Intelligence Lab), featuring 1.5 billion parameters. Despite its compact size, it delivers outstanding performance on mathematical and code reasoning tasks, surpassing some ultra-large models, such as DeepSeek-R1, on math reasoning benchmarks.

Built on the innovative Spectrum-to-Signal Principle (SSP) training framework, VibeThinker achieves reasoning capabilities comparable to models with tens of billions of parameters, all while maintaining remarkably low training costs.

Core Highlights

Small Yet Powerful

With only 1.5B parameters, VibeThinker demonstrates exceptional mathematical and logical reasoning capabilities, proving that intelligence doesn't always require massive scale.

Innovative Training Architecture

Employs the Spectrum-to-Signal Principle (SSP): a two-stage optimization that combines supervised fine-tuning (the Spectrum Phase) with reinforcement learning (the Signal Phase), enabling diverse exploration followed by precise refinement.

Ultra-Low Training Cost

Requires only about 3,900 GPU hours of training (around $7,800 USD, roughly $2 per GPU hour), an exceptional performance-to-cost ratio compared with larger models.

Open and Extensible

Released under MIT open-source license, freely available for fine-tuning and commercial deployment, fostering innovation and research.

Inference-Optimized

Designed to run efficiently on resource-constrained edge devices and research environments, making advanced AI accessible to more users.

Developed by WeiboAI

Created by Sina's Artificial Intelligence Lab (WeiboAI), leveraging years of expertise in natural language processing and machine learning research.

Technical Innovation: Spectrum-to-Signal Principle

VibeThinker's core innovation lies in its unique two-stage training methodology:

Spectrum Phase

Diversity & Exploration

Generates diverse reasoning paths and solution approaches, encouraging the model to explore multiple possibilities rather than converging prematurely on a single answer (a data-selection sketch follows the list below).

  • Supervised Fine-Tuning (SFT) on diverse reasoning examples
  • Encourages creative problem-solving approaches
  • Builds a rich foundation of reasoning patterns
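
The document doesn't detail how the Spectrum Phase assembles its SFT data, so here is a minimal, hypothetical sketch of one plausible ingredient: greedily keeping only solution traces that differ enough from those already kept, so the fine-tuning set spans distinct reasoning paths rather than near-duplicates. The function names and the n-gram/Jaccard overlap measure are illustrative assumptions, not WeiboAI's released pipeline.

    # Hypothetical sketch: keep a diverse subset of sampled solution traces for SFT.
    # The overlap threshold and helper functions are illustrative assumptions.

    def ngram_set(text: str, n: int = 4) -> set[tuple[str, ...]]:
        """Bag of word n-grams used as a cheap similarity signature."""
        tokens = text.split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def jaccard(a: set, b: set) -> float:
        """Jaccard overlap between two n-gram sets (0 = disjoint, 1 = identical)."""
        return len(a & b) / max(len(a | b), 1)

    def select_diverse(solutions: list[str], max_overlap: float = 0.5) -> list[str]:
        """Greedily keep solutions whose overlap with every kept one stays low,
        so the SFT set covers distinct reasoning paths instead of near-duplicates."""
        kept, kept_sigs = [], []
        for sol in solutions:
            sig = ngram_set(sol)
            if all(jaccard(sig, s) <= max_overlap for s in kept_sigs):
                kept.append(sol)
                kept_sigs.append(sig)
        return kept

    # Example: near-duplicate derivations collapse to one; a distinct path survives.
    paths = [
        "factor the quadratic then solve for each root",
        "factor the quadratic then solve for each root and check",
        "complete the square and take square roots on both sides",
    ]
    print(select_diverse(paths))  # keeps the 1st and 3rd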

Signal Phase

Refinement & Optimization

Through MaxEnt-Guided Policy Optimization (MGPO), reinforces correct reasoning signals and concentrates optimization on high-uncertainty samples, refining the model's accuracy (a weighting sketch follows the list below).

  • Reinforcement Learning (RL) with targeted feedback
  • Concentrates on challenging, high-uncertainty cases
  • Converges on optimal reasoning strategies
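
As a rough illustration of "concentrate on high-uncertainty cases", the sketch below assumes a binary pass/fail reward: each problem's recent pass rate p is scored with the binary entropy H(p) = -p·ln p - (1-p)·ln(1-p), which peaks at p = 0.5 (maximum uncertainty) and vanishes at p = 0 or 1, and those entropies are normalized into sampling/loss weights. The function names are hypothetical; this is not WeiboAI's released MGPO code.

    import math

    def pass_rate_entropy(pass_rate: float) -> float:
        """Binary entropy H(p) in nats: maximal at p = 0.5 (the problems the
        model is most uncertain about), zero when p = 0 or p = 1."""
        p = min(max(pass_rate, 1e-6), 1.0 - 1e-6)  # clamp away from log(0)
        return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

    def mgpo_style_weights(pass_rates: list[float]) -> list[float]:
        """Normalize per-problem entropies into weights so RL updates
        concentrate on high-uncertainty problems."""
        ents = [pass_rate_entropy(p) for p in pass_rates]
        total = sum(ents) or 1.0
        return [e / total for e in ents]

    # Problems solved 0%, 50%, and 95% of the time in recent rollouts:
    # the 50% problem dominates; the never-solved and already-solved ones fade.
    print(mgpo_style_weights([0.0, 0.5, 0.95]))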

This "Diversity + Refinement" mechanism enables the model to think like humans: first diverge, then converge—exploring multiple approaches before settling on the optimal solution.

Performance Benchmarks

VibeThinker-1.5B demonstrates competitive performance across multiple reasoning benchmarks:

Benchmark          Score   Comparison
AIME24             80.3    Leading among similar-sized models
AIME25             74.4    Approaching large-model performance
HMMT25             50.4    Strong mathematical logic capability
LiveCodeBench v6   51.1    Outperforms some 10B+ models

Strengths

VibeThinker-1.5B outperforms DeepSeek-R1 (671B parameters) on math reasoning benchmarks such as AIME24/25 and HMMT25, showcasing the potential of "small models, big intelligence."

Considerations

Performance on general-knowledge Q&A and encyclopedic tasks remains slightly behind that of ultra-large models, as expected for a specialized reasoning model.

Application Scenarios

Mathematical Problem Solving

Automated math tutoring systems, competition problem solvers, and educational platforms

Programming Education

Code generation, debugging assistance, and automated programming instruction

Scientific Research

Algorithm analysis, symbolic reasoning, and computational research assistance

Edge AI Applications

Lightweight local AI deployment on resource-constrained devices

Educational Tools

Competition preparation, homework assistance, and interactive learning systems

Logic Reasoning Platforms

Automated theorem proving, logical inference, and decision support systems

Model Information

Parameter Scale: 1.5 Billion
Model Type: Dense LLM
Training Framework: Spectrum-to-Signal (SFT + RL)
Open Source License: MIT License
Development Organization: WeiboAI (Sina AI Lab)
Deployment Platform: Hugging Face / GitHub (loading sketch below)
Focus Areas: Math · Logic · Code Reasoning
Training Cost: ~3,900 GPU hours (~$7,800 USD)
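
Since the model ships via Hugging Face, a minimal inference sketch with the transformers library follows. The repository id WeiboAI/VibeThinker-1.5B and the generation settings are assumptions to verify against the official model card.

    # Minimal inference sketch using Hugging Face transformers.
    # The repository id below is an assumption -- verify it on the model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Reasoning models are usually prompted through their chat template.
    messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # A generous max_new_tokens leaves room for step-by-step reasoning.
    output = model.generate(input_ids, max_new_tokens=1024)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

For memory-constrained edge deployment, the same load call can be combined with quantization (for example, transformers' BitsAndBytesConfig with load_in_4bit=True).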

Conclusion

VibeThinker-1.5B represents a groundbreaking approach: use intelligent training methods, not blind parameter stacking. In an era of model miniaturization, reasoning enhancement, and efficient training, it stands as an important milestone for next-generation high-performance open-source models.

Developed by WeiboAI (Sina Artificial Intelligence Lab), VibeThinker demonstrates that exceptional reasoning capabilities can be achieved with compact models through innovative training methodologies. This opens new possibilities for deploying advanced AI in resource-constrained environments and democratizing access to powerful reasoning models.

Key Takeaway: VibeThinker shows that, with the right training approach, small models can match or even exceed the reasoning performance of models hundreds of times larger, all while maintaining practical deployment costs and accessibility.
