build-ai · 2026-03-13 · 20 min

AI Voice Journal: Build a Voice Journal with Whisper + OpenAI in 1 Hour

Record your thoughts by voice and let AI transcribe them, summarize them, and analyze the mood. This tutorial walks through building it from scratch with Python and the OpenAI API.

Voice journaling is the most natural way to capture your thoughts: it is faster than typing and you don't need to sit at a desk. Combined with AI, each recording becomes a structured entry with a transcript, a summary, a mood, and key insights.

📌 TL;DR: What You End Up With

Input: An audio file (MP3/WAV/M4A) or a live recording from the mic
Output: Transcript text + AI summary + mood analysis + key themes
Storage: One JSON file per day, easy to export and search
Tech stack: Python + OpenAI Whisper + GPT-4o
Build time: ~1-2 hours

App Architecture

Audio source (file or mic)
         ↓
Whisper API: Speech → Text (transcript)
         ↓
GPT-4o: Analyze transcript
  - Summary (3-5 sentences)
  - Mood: positive/neutral/negative + intensity
  - Key themes: [list]
  - Action items: [if any]
  - Notable quotes: [memorable quotes]
         ↓
JSON storage + display

Setup

python -m venv voice-journal
source voice-journal/bin/activate

pip install openai python-dotenv pyaudio

Note: wave is part of the Python standard library, so it does not need to be installed with pip.

Note for macOS: pyaudio needs portaudio: brew install portaudio

.env file:

OPENAI_API_KEY=sk-proj-your-key-here
JOURNAL_DIR=./journal_entries
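
The two settings above can be checked once at startup so that a missing key fails with a clear message instead of an authentication error later. This is a minimal sketch: `load_settings` is a hypothetical helper, not part of the tutorial's modules, and it assumes `load_dotenv()` has already copied the `.env` values into the environment.

```python
import os
from pathlib import Path
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two settings and fail early if the API key is missing.

    Hypothetical helper for illustration; call load_dotenv() first in a real app.
    """
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
    # Same default directory as the JOURNAL_DIR line in .env
    journal_dir = Path(env.get("JOURNAL_DIR", "./journal_entries"))
    return {"api_key": api_key, "journal_dir": journal_dir}
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching your real environment.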

Module 1: Transcribe Audio

# transcriber.py
from pathlib import Path
from typing import Optional

import openai
from dotenv import load_dotenv

load_dotenv()  # load OPENAI_API_KEY from .env before creating the client
client = openai.OpenAI()

def transcribe_audio(audio_path: str, language: Optional[str] = "vi") -> dict:
    """
    Transcribe an audio file with Whisper.
    language: "vi" for Vietnamese, "en" for English, None to auto-detect
    """
    audio_path = Path(audio_path)

    if not audio_path.exists():
        raise FileNotFoundError(f"File not found: {audio_path}")

    # Check file size (Whisper limit: 25MB)
    file_size_mb = audio_path.stat().st_size / (1024 * 1024)
    if file_size_mb > 25:
        raise ValueError(f"File too large: {file_size_mb:.1f}MB (max 25MB)")

    print(f"🎙️ Transcribing: {audio_path.name} ({file_size_mb:.1f}MB)")

    with open(audio_path, 'rb') as audio_file:
        kwargs = {
            "model": "whisper-1",
            "file": audio_file,
            # verbose_json adds language, duration, and segment-level timestamps
            "response_format": "verbose_json",
        }
        if language:
            kwargs["language"] = language

        transcript = client.audio.transcriptions.create(**kwargs)

    # segments may be missing or None depending on the response
    segments = getattr(transcript, "segments", None) or []

    return {
        "text": transcript.text,
        "language": transcript.language,
        "duration_seconds": transcript.duration,
        "segments": [
            {"text": seg.text, "start": seg.start, "end": seg.end}
            for seg in segments
        ]
    }
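
The 25MB check above rejects long recordings outright. One way around it, sketched here for WAV input only, is to split the file into shorter pieces with the standard-library `wave` module and transcribe each piece in turn. `split_wav` and the `_partNNN` naming are assumptions for illustration; MP3 and M4A files would need a different tool.

```python
import wave
from pathlib import Path


def split_wav(path: str, chunk_seconds: int = 240) -> list[str]:
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Hypothetical helper. 240 s of 44.1 kHz mono 16-bit audio is about 20MB,
    which stays under the 25MB limit checked in transcribe_audio.
    """
    src = Path(path)
    chunk_paths = []
    with wave.open(str(src), "rb") as wf:
        params = wf.getparams()
        frames_per_chunk = wf.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = wf.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = src.with_name(f"{src.stem}_part{index:03d}.wav")
            with wave.open(str(out_path), "wb") as out:
                # Copy channels, sample width, and rate; the frame count
                # in the header is corrected when the file is closed
                out.setparams(params)
                out.writeframes(frames)
            chunk_paths.append(str(out_path))
            index += 1
    return chunk_paths
```

Each returned path can be passed to `transcribe_audio`, and the resulting `text` values joined in order. Cutting at fixed intervals can split a word in half, so a real implementation might prefer to cut at silences.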

Module 2: Recording from the Mic (Optional)

# recorder.py
import os
import tempfile
import wave
from datetime import datetime

import pyaudio

def record_audio(duration_seconds: int = 60, sample_rate: int = 44100) -> str:
    """
    Record from the microphone and save the result as a WAV file.
    Returns: path to a temp WAV file
    """
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1

    audio = pyaudio.PyAudio()
    sample_width = audio.get_sample_size(FORMAT)  # read before terminate()

    print(f"🔴 Recording... ({duration_seconds} seconds)")
    print("Press Ctrl+C to stop early")

    stream = audio.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=sample_rate,
        input=True,
        frames_per_buffer=CHUNK
    )

    frames = []
    try:
        for _ in range(int(sample_rate / CHUNK * duration_seconds)):
            # Keep recording even if the input buffer overflows
            data = stream.read(CHUNK, exception_on_overflow=False)
            frames.append(data)
    except KeyboardInterrupt:
        print("\n⏹️ Recording stopped")
    finally:
        # Always release the audio device
        stream.stop_stream()
        stream.close()
        audio.terminate()

    # Save to the system temp directory (portable, unlike a hard-coded /tmp)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    temp_path = os.path.join(tempfile.gettempdir(), f"voice_journal_{timestamp}.wav")

    with wave.open(temp_path, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))

    print(f"✅ Audio saved: {temp_path}")
    return temp_path
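
Because the recorder writes uncompressed WAV, it is worth estimating how long a recording can run before it hits the 25MB limit that `transcribe_audio` enforces. The arithmetic is sample rate × channels × bytes per sample × seconds. The helper names below are hypothetical, and the 44-byte WAV header is ignored.

```python
WHISPER_LIMIT_BYTES = 25 * 1024 * 1024  # same 25MB threshold as transcribe_audio


def wav_bytes(duration_seconds: float, sample_rate: int = 44100,
              channels: int = 1, sample_width: int = 2) -> int:
    """Approximate size of uncompressed PCM audio (header ignored)."""
    return int(duration_seconds * sample_rate * channels * sample_width)


def max_duration_seconds(sample_rate: int = 44100, channels: int = 1,
                         sample_width: int = 2) -> float:
    """Longest recording, in seconds, that stays under the size limit."""
    return WHISPER_LIMIT_BYTES / (sample_rate * channels * sample_width)


print(f"{wav_bytes(60) / 1024 / 1024:.1f} MB per minute at 44.1 kHz")  # 5.0 MB
print(f"{max_duration_seconds() / 60:.1f} min max at 44.1 kHz")        # 5.0 min
print(f"{max_duration_seconds(16000) / 60:.1f} min max at 16 kHz")     # 13.7 min
```

With the defaults in `record_audio`, a journal entry longer than about five minutes would be rejected. Passing a lower `sample_rate` such as 16000, which is usually enough for speech, roughly triples that budget.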

Module 3: AI Analysis

# journal_analyzer.py
import json

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # so this module also works when imported on its own
client = OpenAI()

def analyze_journal_entry(transcript: str) -> dict:
    """
    Analyze the transcript and return structured insights.
    """
    prompt = f"""You are an AI journal assistant. Analyze the journal entry below and return JSON with exactly the structure shown. Write the summary and descriptions in the same language as the journal entry.

JOURNAL ENTRY:
{transcript}

Return JSON (EXACTLY the format below, with no extra text):
{{
  "summary": "3-5 sentence summary of the main content",
  "mood": {{
    "label": "positive|neutral|negative|mixed",
    "score": <integer from 1 to 10>,
    "description": "short description of the mood"
  }},
  "key_themes": ["theme 1", "theme 2", "theme 3"],
  "action_items": ["to-do 1", "to-do 2"],
  "notable_quotes": ["memorable quote"],
  "word_count": <number of words in the transcript>,
  "language": "Vietnamese|English|Mixed"
}}"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
        response_format={"type": "json_object"}  # ask for a valid JSON object
    )

    return json.loads(response.choices[0].message.content)
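
JSON mode produces syntactically valid JSON, but it does not guarantee that every key from the prompt is present or has the expected type. A small normalization step, sketched below, lets the display code index into the result safely. `normalize_analysis` and its fallback values (`"neutral"`, a score of 5) are assumptions for illustration, not part of the original modules.

```python
def normalize_analysis(raw: dict) -> dict:
    """Fill in missing keys and coerce types so later code can index safely.

    Hypothetical helper; the fallback label and score are arbitrary choices.
    """
    mood = raw.get("mood") if isinstance(raw.get("mood"), dict) else {}

    label = str(mood.get("label", "neutral")).lower()
    if label not in {"positive", "neutral", "negative", "mixed"}:
        label = "neutral"

    try:
        score = int(mood.get("score", 5))
    except (TypeError, ValueError):
        score = 5
    score = max(1, min(10, score))  # clamp to the 1-10 range from the prompt

    def as_list(value) -> list:
        # Accept a list, a single value, or nothing at all
        if isinstance(value, list):
            return [str(v) for v in value]
        return [] if value in (None, "") else [str(value)]

    return {
        "summary": str(raw.get("summary", "")),
        "mood": {
            "label": label,
            "score": score,
            "description": str(mood.get("description", "")),
        },
        "key_themes": as_list(raw.get("key_themes")),
        "action_items": as_list(raw.get("action_items")),
        "notable_quotes": as_list(raw.get("notable_quotes")),
    }
```

One way to use it is to wrap the return value of `analyze_journal_entry` before saving or printing the entry.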

Module 4: Storage & Main Runner

# voice_journal.py
import json
import os
import sys
from datetime import datetime
from pathlib import Path

from dotenv import load_dotenv

from transcriber import transcribe_audio
from journal_analyzer import analyze_journal_entry

load_dotenv()
JOURNAL_DIR = Path(os.getenv("JOURNAL_DIR", "./journal_entries"))
JOURNAL_DIR.mkdir(parents=True, exist_ok=True)

def save_entry(transcript_data: dict, analysis: dict, audio_path: str) -> str:
    """Save a journal entry with its full metadata."""

    timestamp = datetime.now()
    entry = {
        "id": timestamp.strftime("%Y%m%d_%H%M%S"),
        "date": timestamp.isoformat(),
        "audio_source": str(audio_path),
        "transcript": transcript_data["text"],
        "transcript_meta": {
            "language": transcript_data.get("language"),
            "duration_seconds": transcript_data.get("duration_seconds")
        },
        "analysis": analysis
    }

    # One file per day: journal_entries/2026-03-13.json
    date_file = JOURNAL_DIR / f"{timestamp.strftime('%Y-%m-%d')}.json"

    entries = []
    if date_file.exists():
        # Read with the same encoding used for writing
        with open(date_file, encoding='utf-8') as f:
            entries = json.load(f)

    entries.append(entry)

    with open(date_file, 'w', encoding='utf-8') as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)

    return str(date_file)

def process_audio_file(audio_path: str):
    """Main workflow: audio → transcript → analysis → save."""

    print("🎙️ AI Voice Journal")
    print("="*50)

    # Step 1: Transcribe
    transcript_data = transcribe_audio(audio_path)
    print(f"✅ Transcript: {len(transcript_data['text'].split())} words")

    # Step 2: Analyze
    print("🤖 Analyzing with AI...")
    analysis = analyze_journal_entry(transcript_data["text"])

    # Step 3: Save
    saved_path = save_entry(transcript_data, analysis, audio_path)

    # Display (format the duration only when it is available)
    duration = transcript_data.get('duration_seconds')
    duration_text = f"{float(duration):.0f}s" if duration is not None else "?"

    print("\n" + "="*50)
    print(f"📖 ENTRY: {datetime.now().strftime('%d/%m/%Y %H:%M')}")
    print(f"⏱️  Duration: {duration_text}")
    print(f"\n📝 TRANSCRIPT:\n{transcript_data['text'][:500]}{'...' if len(transcript_data['text']) > 500 else ''}")
    print(f"\n💡 SUMMARY:\n{analysis['summary']}")
    print(f"\n😊 MOOD: {analysis['mood']['label']} ({analysis['mood']['score']}/10)")
    print(f"   → {analysis['mood']['description']}")

    if analysis.get('key_themes'):
        print(f"\n🏷️  THEMES: {', '.join(analysis['key_themes'])}")

    if analysis.get('action_items'):
        print("\n✅ ACTION ITEMS:")
        for item in analysis['action_items']:
            print(f"   - {item}")

    print(f"\n💾 Saved: {saved_path}")

if __name__ == "__main__":
    if len(sys.argv) > 1:
        process_audio_file(sys.argv[1])
    else:
        print("Usage: python voice_journal.py <audio_file>")
        print("Example: python voice_journal.py recording.mp3")
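
Since each day is stored as a JSON list of entries, reading the journal back is a matter of loading every file and filtering. The sketch below does a plain keyword match, not semantic search. `load_entries` and `search_entries` are hypothetical helpers that assume the entry layout written by `save_entry`.

```python
import json
from pathlib import Path


def load_entries(journal_dir: str = "./journal_entries") -> list[dict]:
    """Load every entry from the per-day JSON files, oldest first."""
    entries = []
    # File names are YYYY-MM-DD.json, so sorting by name sorts by date
    for day_file in sorted(Path(journal_dir).glob("*.json")):
        with open(day_file, encoding="utf-8") as f:
            entries.extend(json.load(f))
    return entries


def search_entries(entries: list[dict], keyword: str) -> list[dict]:
    """Case-insensitive match against transcript, summary, and key themes."""
    needle = keyword.casefold()
    hits = []
    for entry in entries:
        analysis = entry.get("analysis", {})
        haystack = " ".join([
            entry.get("transcript", ""),
            analysis.get("summary", ""),
            " ".join(analysis.get("key_themes", [])),
        ]).casefold()
        if needle in haystack:
            hits.append(entry)
    return hits
```

For example, `search_entries(load_entries(), "api")` would return every entry whose transcript, summary, or themes mention "api".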

Test Run

# With an existing file
python voice_journal.py my_recording.mp3

# Sample output:
# 🎙️ Transcribing: my_recording.mp3 (2.1MB)
# ✅ Transcript: 347 words
# 🤖 Analyzing with AI...
# 
# 📖 ENTRY: 13/03/2026 14:30
# ⏱️ Duration: 87s
# 
# 📝 TRANSCRIPT:
# Today I finished the API integration project...
# 
# 💡 SUMMARY:
# This entry records the feelings after completing a big project...
# 
# 😊 MOOD: positive (8/10)
#    → Excited and satisfied, with a sense of achievement
# 
# 🏷️ THEMES: work, achievement, planning
# 
# ✅ ACTION ITEMS:
#    - Review the code with the team on Monday
#    - Write documentation for the API

Extensions

Search entries: Use embeddings for semantic search across all entries
Weekly digest: Have AI combine all of the week's entries into a report
Telegram bot: Send a voice message via Telegram and get the analysis back right away
Web UI: A simple frontend to view and search entries
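
As a starting point for the weekly digest idea, the numbers can be computed locally before any model is asked to write the report. The sketch below groups entries by ISO week and reports the average mood score and the most frequent themes. `weekly_stats` is a hypothetical helper that assumes the entry layout written by `save_entry`.

```python
from collections import Counter
from datetime import datetime


def weekly_stats(entries: list[dict]) -> dict:
    """Group entries by ISO week; return entry count, average mood, top themes."""
    weeks: dict[str, dict] = {}
    for entry in entries:
        # "date" is the ISO timestamp stored by save_entry
        year, week, _ = datetime.fromisoformat(entry["date"]).isocalendar()
        bucket = weeks.setdefault(
            f"{year}-W{week:02d}",
            {"count": 0, "scores": [], "themes": Counter()},
        )
        bucket["count"] += 1
        analysis = entry.get("analysis", {})
        score = analysis.get("mood", {}).get("score")
        if isinstance(score, (int, float)):
            bucket["scores"].append(score)
        bucket["themes"].update(analysis.get("key_themes", []))

    return {
        key: {
            "entries": b["count"],
            "avg_mood": round(sum(b["scores"]) / len(b["scores"]), 1)
            if b["scores"] else None,
            "top_themes": [theme for theme, _ in b["themes"].most_common(3)],
        }
        for key, b in sorted(weeks.items())
    }
```

The resulting dictionary, together with that week's summaries, could then be passed to GPT-4o to produce the written digest.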

Read more:

Build your first AI chatbot: AI Chatbot Guide
Use RAG for a personal knowledge base: RAG Guide
AI API fundamentals: AI API Guide
Multimodal AI (Whisper is one example): Multimodal AI Guide
