Blog · 2026-03-25 · 10 min read

Cursor Composer 2: What Does the New AI Coding Model Change? (Benchmarks, Pricing & Real-World Workflow)

Cursor has launched Composer 2 with a large gain on CursorBench (61.3 vs. 44.2 for Composer 1.5) and a Terminal-Bench 2.0 score of 61.7, at a price roughly 86% lower than Composer 1.5. This post looks at what actually changed, the new pricing, and how to organize your workflow to get the most out of long-horizon agentic coding.

Cursor Composer 2: Quick Recap

On March 18, 2026, Cursor officially launched Composer 2, an AI coding model fine-tuned specifically for Cursor and optimized for agentic workflows inside the IDE.

Rather than simply wrapping Claude or GPT, Composer is Cursor's in-house model with its own dedicated training. Composer 2 is the second major version, and its fundamental change is continued pretraining plus reinforcement learning on long-horizon coding tasks.

What Actually Improved

1. Long-Horizon Task Training

This is the most important improvement. From Cursor's official blog:

"From this base, we train on long-horizon coding tasks through reinforcement learning. Composer 2 is able to solve challenging tasks requiring hundreds of actions."

What this means in practice: Composer 1.x often lost track after 10-15 steps of a complex task. Composer 2 maintains context and coherence better when a task requires:

Refactoring several related files
Debugging across multiple layers (frontend → backend → DB)
Building a feature from scratch, including tests

2. 200k-Token Context Window

The 200k context window lets Composer 2 "see" more of a large codebase in a single pass, which directly affects the quality of multi-file work.
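
To get a feel for what 200k tokens covers, the sketch below estimates whether a set of source files would fit. It assumes roughly 4 characters per token, which is only a rule of thumb and not the tokenizer Composer 2 actually uses; `estimate_tokens`, `fits_in_context`, and the `reserve` parameter are hypothetical names introduced for illustration.

```python
# Rough check of whether a set of source files fits in a 200k-token window.
# ASSUMPTION: ~4 characters per token is a rule of thumb, not the real
# tokenizer; treat the result as an order-of-magnitude estimate only.
CONTEXT_WINDOW_TOKENS = 200_000
CHARS_PER_TOKEN = 4  # hypothetical average, tune for your own code


def estimate_tokens(text: str) -> int:
    """Return a crude token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(files: dict[str, str], reserve: int = 20_000) -> bool:
    """True if all files fit while leaving `reserve` tokens for the conversation."""
    total = sum(estimate_tokens(body) for body in files.values())
    return total <= CONTEXT_WINDOW_TOKENS - reserve


# Two made-up files totalling 600,000 characters (about 150k estimated tokens).
sample = {"a.ts": "x" * 400_000, "b.ts": "y" * 200_000}
print(fits_in_context(sample))
```

If the estimate comes out well above the window, that is a hint to point the model at a subdirectory rather than the whole repository.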

3. Continued Pretraining Base

This is the first time Cursor has used a continued pretraining run, which provides a stronger base for scaling reinforcement learning. That is the technical reason the benchmark improvements are larger than in previous versions.

Benchmarks: The Actual Numbers

The data below is verified against the Cursor blog and VentureBeat:

CursorBench (Cursor's internal coding benchmark)

Model	CursorBench Score
Composer 2	61.3
Composer 1.5	44.2
Composer 1	38.0

→ A 38.7% relative increase over Composer 1.5 (17.1 points).

Terminal-Bench 2.0 (Agent terminal evaluation)

Model	Score
GPT-5.4	75.1
Composer 2	61.7
Claude Opus 4.6	58.0

→ Composer 2 beats Claude Opus 4.6 and ranks second in this comparison, behind GPT-5.4.
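
The percentage figures quoted for these benchmarks are relative gains, not point differences. As a quick check, the arithmetic can be reproduced from the table values; `relative_gain` is a helper name introduced here for illustration.

```python
def relative_gain(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100


# Scores copied from the CursorBench and Terminal-Bench 2.0 tables.
cursorbench = {"Composer 2": 61.3, "Composer 1.5": 44.2, "Composer 1": 38.0}
terminal_bench = {"GPT-5.4": 75.1, "Composer 2": 61.7, "Claude Opus 4.6": 58.0}

vs_composer_15 = relative_gain(cursorbench["Composer 2"], cursorbench["Composer 1.5"])
vs_opus = relative_gain(terminal_bench["Composer 2"], terminal_bench["Claude Opus 4.6"])

print(f"CursorBench vs Composer 1.5:    +{vs_composer_15:.1f}%")  # +38.7%
print(f"Terminal-Bench vs Opus 4.6:     +{vs_opus:.1f}%")         # +6.4%
```

In absolute terms the gaps are 17.1 and 3.7 points respectively, so it is worth stating which of the two measures a quoted number refers to.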

SWE-bench Multilingual

Composer 2 scores 73.7 on this benchmark, which measures software engineering tasks across multiple programming languages.

Benchmark Limitations: What the Benchmarks Don't Tell You

A good benchmark score does not guarantee good results on your specific codebase and workflow. CursorBench measures Cursor-specific tasks, so it is highly relevant. But:

If you mostly work in data science notebooks → these benchmarks are less predictive
If you work on embedded systems or low-level code → Terminal-Bench is the more relevant one
Real-world performance depends on how you prompt and structure tasks

Pricing: The Specific Numbers

The pricing below is verified against the Cursor blog (March 2026):

Composer 2 Standard

Input: $0.50 / 1M tokens
Output: $2.50 / 1M tokens
Cache-read: $0.20 / 1M tokens

Composer 2 Fast (Default)

Input: $1.50 / 1M tokens
Output: $7.50 / 1M tokens
Cache-read: $0.35 / 1M tokens
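
To see what these rates mean for a real session, the sketch below prices one long agentic run on each tier. The token counts are invented for illustration, and the code assumes cache-read tokens are billed separately from fresh input tokens; `Rates` and `session_cost` are hypothetical names, not part of any Cursor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens, copied from the price lists above."""
    input: float
    output: float
    cache_read: float


STANDARD = Rates(input=0.50, output=2.50, cache_read=0.20)
FAST = Rates(input=1.50, output=7.50, cache_read=0.35)


def session_cost(rates: Rates, input_tok: int, output_tok: int, cache_tok: int) -> float:
    """Cost in USD. ASSUMPTION: cached reads are billed apart from fresh input."""
    per_million = 1_000_000
    return (
        input_tok * rates.input
        + output_tok * rates.output
        + cache_tok * rates.cache_read
    ) / per_million


# Hypothetical usage for one long agentic session.
usage = dict(input_tok=2_000_000, output_tok=300_000, cache_tok=5_000_000)
print(f"Standard: ${session_cost(STANDARD, **usage):.2f}")
print(f"Fast:     ${session_cost(FAST, **usage):.2f}")
```

With these made-up numbers the session costs $2.75 on Standard and $7.00 on Fast, so the tier choice matters most for output-heavy runs, where Fast is three times the Standard rate.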

Comparison With Composer 1.5

According to VentureBeat, Composer 2 Standard costs roughly 86% less than Composer 1.5. The Fast tier is still cheaper than many other fast models on the market.

Subscription Plans

Plan	Price
Hobby	Free
Pro	$20/month
Pro+	$60/month
Ultra	$200/month

Composer usage draws from a standalone usage pool, so it does not count against your GPT/Claude quota.

Cursor is making Fast the default. If you use Cursor and don't see an option to choose the tier, check Settings → Models.

Real-World Workflow: End-to-End Feature Build

Here is how to organize a workflow that makes the most of Composer 2's long-horizon capabilities:

Step 1: Repo Scan + Plan

Don't jump straight into code. Let Composer 2 understand the codebase first:

Opening prompt:
"Scan the /src directory and give me:
1. Main architecture patterns you see
2. How state management works
3. Where the API layer lives
4. Existing test patterns

I want to add [FEATURE]. Before we start, ask me clarifying questions."

Composer 2's 200k context window helps it handle this codebase scan more effectively.

Step 2: Multi-File Edit + Tests

Once you have a plan, break it into clear goals:

"Now let's implement [FEATURE]. 
Goals:
1. Create the service layer in /src/services/
2. Add API endpoint in /src/api/
3. Update types in /src/types/
4. Write unit tests for the service

Constraints:
- Follow existing patterns in /src/services/user.service.ts
- Don't change existing API contracts
- All new functions must have JSDoc

Start with Goal 1, pause and show me before moving to Goal 2."

Why checkpoints matter: a long-horizon model can still drift if the task runs too long without a feedback loop. A checkpoint after each goal helps keep it on course.
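
If you reuse this goal-and-checkpoint structure often, it can help to generate the prompt from a small data structure instead of retyping it. The following is a minimal sketch of that idea; `TaskPlan` and `build_prompt` are hypothetical helpers, the feature name is made up, and nothing here calls Cursor itself, it only produces text to paste into the chat.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPlan:
    """A feature broken into ordered goals plus shared constraints."""
    feature: str
    goals: list[str]
    constraints: list[str] = field(default_factory=list)


def build_prompt(plan: TaskPlan) -> str:
    """Render a goal/constraint prompt that asks for a pause after each goal."""
    lines = [f"Now let's implement {plan.feature}.", "Goals:"]
    lines += [f"{i}. {goal}" for i, goal in enumerate(plan.goals, start=1)]
    if plan.constraints:
        lines += ["", "Constraints:"]
        lines += [f"- {c}" for c in plan.constraints]
    # The checkpoint instruction is always last so it is hard to miss.
    lines += ["", "Start with Goal 1. After each goal, pause and show me the diff before moving on."]
    return "\n".join(lines)


plan = TaskPlan(
    feature="CSV export",  # hypothetical feature name
    goals=[
        "Create the service layer in /src/services/",
        "Write unit tests for the service",
    ],
    constraints=["Don't change existing API contracts"],
)
prompt = build_prompt(plan)
print(prompt)
```

Keeping goals in a list also makes it easy to re-send only the remaining goals after a checkpoint.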

Step 3: Runtime Debug + Re-plan

When you hit a runtime error:

"I got this error: [paste error + stack trace]
Context: This happens when [describe scenario]
File locations involved: [list files]

Don't just fix this error. First explain what caused it, then propose the fix, wait for my approval."

This pattern matters for long-horizon tasks: have Composer explain first rather than auto-fix blindly. It helps you keep your own understanding of the codebase.

Best Practices for Agentic Coding

1. Break Tasks Into Goals + Checkpoints

❌ Bad:    "Refactor this entire module to use new auth system"
✅ Good:   "Goal 1: Update auth middleware. Pause. Show diff. I approve. Goal 2: Update route handlers..."

2. Provide Clear Constraints

❌ Vague:  "Write good code"
✅ Clear:  "Follow patterns in /examples/. Max function length 50 lines. No any types. Error handling required."

3. Use Summaries for Long Runs

After 20+ back-and-forth turns, context quality can degrade. Use:

"Before we continue, summarize:
1. What we've built so far
2. What's left
3. Any decisions we made about patterns/trade-offs"

Then start a fresh context with this summary as the prefix.
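
The summarize-then-restart routine can be sketched as two small helpers: one that decides when to ask for a summary, and one that turns the summary into the opening message of a new session. The threshold of 20 comes from the rule of thumb above; `should_summarize` and `fresh_context_prompt` are hypothetical names, and the turn count is something you would track yourself.

```python
SUMMARY_REQUEST = (
    "Before we continue, summarize:\n"
    "1. What we've built so far\n"
    "2. What's left\n"
    "3. Any decisions we made about patterns/trade-offs"
)
TURN_THRESHOLD = 20  # rule of thumb from the text, not a hard limit


def should_summarize(turns: int, threshold: int = TURN_THRESHOLD) -> bool:
    """True once the conversation is long enough to warrant a summary."""
    return turns >= threshold


def fresh_context_prompt(summary: str, next_goal: str) -> str:
    """Build the first message of a new session, prefixed with the summary."""
    return (
        "Summary of the previous session:\n"
        f"{summary.strip()}\n\n"
        f"Next goal: {next_goal}"
    )


if should_summarize(turns=23):
    print(SUMMARY_REQUEST)  # send this, then paste the model's answer below
    opening = fresh_context_prompt("Service layer done; tests pending.", "Write unit tests")
    print(opening)
```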

Should You Switch? Decision Guide

Switch to Composer 2 If:

✅ You regularly build multi-file features with many dependencies
✅ You debug complex runtime issues across multiple layers
✅ Long context and coherence matter to your workflow
✅ You currently use Composer 1.5 (Composer 2 is much cheaper and higher quality)

Not Necessary Yet If:

❌ 90% of your workflow is autocomplete and single-line suggestions
❌ Your tasks are short and don't need long-horizon reasoning
❌ You are already happy with the Claude/GPT integration in Cursor

Conclusion

Composer 2 is a substantive improvement, not just marketing. The benchmark gains (38.7% higher than Composer 1.5 on CursorBench, 6.4% higher than Claude Opus 4.6 on Terminal-Bench 2.0), combined with the roughly 86% price cut, make this an upgrade with clear ROI for developers already using Cursor.

More important than token counts is how you structure long-horizon tasks. A better model does not mean you can dump an entire task into one messy prompt.

Try it: build a complete feature with Composer 2 using an explicit multi-step plan, then compare the time taken and the number of fixes needed against your previous workflow.