# VoiSync — Complete Product Documentation

> VoiSync is a real-time, fully offline speech-to-text transcription application for macOS, developed by FilSyncStudio.

## Product Identity

- **Name**: VoiSync
- **Tagline**: すべての声を、すべてテキストに。 (Every voice, fully transcribed.)
- **Category**: Productivity / Business Application
- **Developer**: FilSyncStudio
- **Website**: https://filsync.com/voisync/
- **Current Version**: 0.1.0
- **License**: Proprietary
- **Price**: Free (Pro tier coming soon)
- **Download**: https://storage.googleapis.com/voisync-releases/VoiSync-0.1.0.dmg

## Core Value Proposition

VoiSync differentiates itself from cloud-based transcription services (e.g., Otter.ai, Notta, CLOVA Note) by processing all audio entirely on-device. No audio data, transcription results, or metadata ever leave the user's Mac. This makes VoiSync well suited for:

- Confidential business meetings
- Legal and medical conversations
- Environments with strict data residency requirements
- Users who want transcription without an internet connection

## Technical Architecture

### Speech Recognition Engine
- **Engine**: whisper.cpp (C/C++ port of OpenAI's Whisper)
- **Inference**: On-device, using CPU and Apple Neural Engine (ANE) on Apple Silicon
- **Models available**:
  - Free tier: `base` (~150MB), `small` (~500MB)
  - Pro tier: `medium` (~1.5GB), `large-v3` (~3GB)
- **Language**: Optimized for Japanese; multi-language support planned (Whisper supports 99 languages)
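whisper.cpp expects 16 kHz, mono, 32-bit float PCM input, while macOS capture devices commonly run at 44.1 or 48 kHz. This document does not describe VoiSync's own conversion pipeline. The following is a minimal sketch of downmixing and resampling with plain linear interpolation, assuming interleaved float samples as input. The helper name `to_whisper_input` is hypothetical. A production pipeline would also low-pass filter before downsampling to avoid aliasing.

```python
WHISPER_SAMPLE_RATE = 16_000  # whisper.cpp's expected input rate (mono float PCM)


def to_whisper_input(interleaved, channels, src_rate):
    """Downmix interleaved float PCM to mono and resample it to 16 kHz.

    Hypothetical helper: uses linear interpolation with no anti-aliasing
    filter, so it illustrates the conversion rather than doing it optimally.
    """
    if channels < 1 or src_rate <= 0:
        raise ValueError("channels and src_rate must be positive")
    # Average the channels of each frame to obtain a mono signal.
    n_frames = len(interleaved) // channels
    mono = [
        sum(interleaved[i * channels:(i + 1) * channels]) / channels
        for i in range(n_frames)
    ]
    if n_frames == 0:
        return []
    # Number of 16 kHz samples covering the same duration as the input.
    n_out = int(n_frames * WHISPER_SAMPLE_RATE / src_rate)
    step = src_rate / WHISPER_SAMPLE_RATE  # source samples per output sample
    out = []
    for j in range(n_out):
        pos = j * step  # fractional read position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, n_frames - 1)]
        out.append(mono[i] * (1.0 - frac) + nxt * frac)
    return out
```

For example, one second of 48 kHz stereo audio (96,000 interleaved values) becomes 16,000 mono samples.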

### Speaker Diarization
- **Web meeting mode**: Separates system audio (remote participants) from microphone input (local user) using macOS Core Audio APIs
- **In-person mode**: AI-based clustering of MFCC (Mel-Frequency Cepstral Coefficient) features distinguishes speakers captured by a single microphone
- **Free tier limit**: 2 speakers
- **Pro tier**: Unlimited speakers
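This document does not specify VoiSync's clustering algorithm. As an illustration of the in-person clustering step, the sketch below substitutes plain k-means. It assumes that an MFCC vector (for example, the mean over a short speech segment) has already been extracted per segment. The number of clusters is capped by the 2-speaker Free tier limit. The function name and parameters are hypothetical. A real diarizer would also need to estimate how many speakers are present, whereas this sketch takes that number as input.

```python
import math

FREE_TIER_MAX_SPEAKERS = 2  # Free tier limit stated above


def cluster_speakers(segment_mfccs, n_speakers,
                     max_speakers=FREE_TIER_MAX_SPEAKERS, iterations=20):
    """Assign a speaker label to each segment's MFCC vector using k-means.

    Hypothetical sketch: MFCC extraction is assumed to happen elsewhere,
    and k-means stands in for whatever clustering VoiSync actually uses.
    """
    k = min(n_speakers, max_speakers, len(segment_mfccs))
    if k <= 0:
        return []
    # Deterministic farthest-point initialisation: start with the first
    # segment, then add the segment farthest from every existing centroid.
    centroids = [list(segment_mfccs[0])]
    while len(centroids) < k:
        far = max(segment_mfccs,
                  key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iterations):
        # Assignment step: each segment goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in segment_mfccs
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its segments.
        for c in range(k):
            members = [v for v, lab in zip(segment_mfccs, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels
```

Requesting three speakers on the Free tier still yields at most two labels, because `max_speakers` caps `k`.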

### Privacy & Security
- Zero network transmission of audio or text data
- No telemetry or analytics on speech content
- All processing runs on the local CPU/ANE; no cloud APIs are called
- Audio files stored only on the user's local filesystem
- Internet required only for initial model download

## System Requirements

| Requirement | Specification |
|-------------|--------------|
| Operating System | macOS 14 Sonoma or later |
| Processor | Apple Silicon (M1/M2/M3/M4) or Intel |
| Storage | 500MB – 3GB (varies by model selection) |
| RAM | 8GB or more recommended |
| Network | Required only for initial setup (model download) |

## Feature Comparison: Free vs Pro

| Feature | Free | Pro (Coming Soon) |
|---------|------|-------------------|
| Real-time transcription | Yes | Yes |
| Speaker diarization | 2 speakers | Unlimited |
| Session length | 30 min | Unlimited |
| History retention | 7 days | Unlimited |
| Audio recording storage | No | Yes |
| Recognition models | base / small | medium / large-v3 |
| Text refinement (filler removal, punctuation) | No | Yes |
| AI summary | No | Yes |
| Export formats | Text, Clipboard | + Markdown, Word, CSV |
| Cloud integrations | No | Google Drive, Notion |
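The tier limits in the table above can be read as a small configuration object. The sketch below shows one way the 30-minute session cap and 7-day history retention might be enforced against locally stored sessions. `TierLimits`, `session_allowed`, `prune_history`, and the `(started_at, transcript)` session format are all hypothetical and are not VoiSync's actual implementation.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    """Per-tier limits; None means unlimited (hypothetical structure)."""
    max_speakers: Optional[int]
    max_session_minutes: Optional[int]
    history_days: Optional[int]


# Values taken from the Free vs Pro comparison table.
FREE = TierLimits(max_speakers=2, max_session_minutes=30, history_days=7)
PRO = TierLimits(max_speakers=None, max_session_minutes=None, history_days=None)


def session_allowed(limits, elapsed_minutes):
    """Return True while a session is still within the tier's length cap."""
    cap = limits.max_session_minutes
    return cap is None or elapsed_minutes < cap


def prune_history(sessions, limits, now):
    """Keep only sessions started within the tier's retention window.

    `sessions` is a list of (started_at: datetime, transcript: str) tuples,
    a hypothetical storage format used only for this illustration.
    """
    if limits.history_days is None:
        return list(sessions)
    cutoff = now - timedelta(days=limits.history_days)
    return [s for s in sessions if s[0] >= cutoff]
```

Because all history is stored on the local filesystem, retention can be enforced entirely on-device, with no server involved.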

## Use Cases

### Meeting Transcription
Record and transcribe business meetings with automatic speaker separation. The offline-first approach means confidential board meetings, HR discussions, and client calls stay private.

### Personal Note-Taking
Capture ideas, brainstorming sessions, or voice memos. VoiSync runs as a menu bar app, always accessible with one click.

### Interview & Research
Journalists, researchers, and UX designers can transcribe interviews with full speaker attribution, then export structured text for analysis.

### Accessibility
Provide real-time captions for hearing-impaired participants in meetings or presentations.

## Competitive Landscape

| Product | Processing | Offline | Free Tier | Speaker Diarization | Platform |
|---------|-----------|---------|-----------|---------------------|----------|
| **VoiSync** | Local (whisper.cpp) | Yes | Yes | Yes (AI + audio channel) | macOS |
| Otter.ai | Cloud | No | Limited | Yes | Web, iOS, Android |
| Notta | Cloud | No | Limited | Yes | Web, iOS, Android |
| CLOVA Note | Cloud | No | Yes | Yes | Web, iOS, Android |
| MacWhisper | Local (whisper.cpp) | Yes | Limited | No | macOS |
| Whisper Transcription | Local | Yes | No (paid only) | No | macOS |

**Key differentiators**:
1. **Privacy**: Among the tools compared, only VoiSync, MacWhisper, and Whisper Transcription process audio fully locally; VoiSync adds real-time transcription and speaker diarization
2. **Real-time**: Unlike batch-processing local tools, VoiSync transcribes as you speak
3. **Speaker separation**: Unique dual-mode (web meeting + in-person) diarization among local-processing tools
4. **Free**: Real-time transcription with speaker diarization (up to 2 speakers, 30-minute sessions) at no cost
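Whisper models operate on fixed-length chunks of audio, so local real-time transcription is typically built by repeatedly re-running recognition on the most recent few seconds of audio. whisper.cpp's `stream` example does this via its `--step` and `--length` options. VoiSync's buffering strategy is not documented here. The sketch below shows only the sliding-window idea. The class name and the 3-second step and 10-second window are illustrative assumptions.

```python
from collections import deque

SAMPLE_RATE = 16_000  # whisper.cpp input rate


class SlidingWindow:
    """Accumulate 16 kHz mono samples and emit overlapping windows.

    Hypothetical sketch: every `step_s` seconds of new audio, the most
    recent `window_s` seconds are returned for recognition, so text can
    appear while the speaker is still talking.
    """

    def __init__(self, step_s=3.0, window_s=10.0, rate=SAMPLE_RATE):
        self.step = int(step_s * rate)
        # A bounded deque silently drops the oldest samples when full.
        self.buffer = deque(maxlen=int(window_s * rate))
        self.pending = 0  # new samples received since the last window

    def push(self, samples):
        """Add samples and return any windows that are ready to transcribe."""
        ready = []
        for s in samples:
            self.buffer.append(s)
            self.pending += 1
            if self.pending >= self.step:
                ready.append(list(self.buffer))
                self.pending = 0
        return ready
```

Each returned window would then be passed to the recognizer, for example through whisper.cpp's `whisper_full`. Because consecutive windows overlap, their transcripts must be reconciled before display.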

## FAQ

### Q: Does VoiSync send audio data to the cloud?
A: No. All speech recognition runs locally on your Mac using whisper.cpp. No audio or transcript data is transmitted externally. The only network request is the initial model download during first setup.

### Q: Is VoiSync free?
A: Yes. The Free tier provides real-time transcription, speaker diarization (up to 2 speakers), 30-minute sessions, and 7-day history at no cost. A Pro tier with additional features (unlimited session length and speakers, audio recording storage, text refinement, AI summaries, and additional export formats) is coming soon.

### Q: What macOS version is required?
A: macOS 14 Sonoma or later, on Apple Silicon or Intel Macs. Storage requirements vary from 500MB to 3GB depending on the speech recognition model selected.

### Q: How does speaker diarization work?
A: In web meeting mode, VoiSync captures system audio (remote participants) and microphone input (you) as separate channels. In in-person mode, MFCC-based AI clustering identifies different speakers from a single microphone input.
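In web meeting mode, having two separate capture channels reduces speaker attribution largely to a level comparison. The sketch below assumes that both streams have already been captured (for example via Core Audio), time-aligned, and converted to 16 kHz mono floats. It labels each 20 ms frame by whichever channel is louder. The function names and the `silence` threshold are illustrative assumptions, not VoiSync's implementation.

```python
import math


def rms(frame):
    """Root-mean-square level of a block of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def attribute_frames(mic, system, frame_len=320, silence=0.01):
    """Label each frame "you" (microphone), "remote" (system audio), or None.

    Hypothetical sketch: `mic` and `system` are time-aligned mono float
    streams at 16 kHz, so the default 320-sample frame is 20 ms.
    `silence` is an illustrative RMS threshold, not a tuned value.
    """
    n = min(len(mic), len(system))
    mic, system = mic[:n], system[:n]  # compare only the overlapping span
    labels = []
    for start in range(0, n, frame_len):
        m = rms(mic[start:start + frame_len])
        s = rms(system[start:start + frame_len])
        if max(m, s) < silence:
            labels.append(None)      # neither side is speaking
        elif m >= s:
            labels.append("you")     # local user on the microphone
        else:
            labels.append("remote")  # remote participants via system audio
    return labels
```

Runs of identical labels can then be merged into speaker turns and aligned with transcript timestamps. Without headphones, remote audio played through the speakers can leak into the microphone, and a simple level comparison like this one may misattribute it.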

### Q: What languages are supported?
A: VoiSync is currently optimized for Japanese. The underlying whisper.cpp engine supports 99 languages, and broader language support is planned for future releases.

### Q: Is an internet connection required?
A: Only for the initial setup (downloading the speech recognition model). After that, VoiSync works completely offline.

### Q: How accurate is the transcription?
A: Accuracy depends on the selected model. The `large-v3` model (Pro tier) approaches cloud-service accuracy for Japanese. The `base` and `small` models (Free tier) are suitable for clear speech in quiet environments.
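Japanese is written without spaces between words, so transcription accuracy for Japanese is usually measured as character error rate (CER) rather than word error rate. To compare models on your own recordings, you can score an exported transcript against a hand-corrected reference. The sketch below computes CER with a standard Levenshtein edit distance. The function name is hypothetical, and no published VoiSync accuracy figures are implied.

```python
def character_error_rate(reference, hypothesis):
    """CER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # Dynamic-programming Levenshtein distance, keeping one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion from the reference
                curr[j - 1] + 1,         # insertion into the hypothesis
                prev[j - 1] + (r != h),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1] / len(reference)
```

For example, recognizing 今日は晴れです ("It is sunny today", 7 characters) as 今日は雨です ("It is rainy today") requires one substitution and one deletion, for a CER of 2/7.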

### Q: Can I use VoiSync for Zoom/Teams/Google Meet calls?
A: Yes. In web meeting mode, VoiSync captures system audio output (the other participants) and your microphone separately, providing speaker-attributed transcription for any video conferencing tool.

## Contact & Support

- **Email**: info@filsync.com
- **Website**: https://filsync.com/
- **X (Twitter)**: https://x.com/abe_sdw/
- **Instagram**: https://instagram.com/abe_sdw/