# VoiSync: Complete Product Documentation

> VoiSync is a real-time, fully offline speech-to-text transcription application for macOS, developed by FilSyncStudio.

## Product Identity

- **Name**: VoiSync
- **Tagline**: すべての声を、すべてテキストに。 (Every voice, fully transcribed.)
- **Category**: Productivity / Business Application
- **Developer**: FilSyncStudio
- **Website**: https://filsync.com/voisync/
- **Current Version**: 0.1.0
- **License**: Proprietary
- **Price**: Free (Pro tier coming soon)
- **Download**: https://storage.googleapis.com/voisync-releases/VoiSync-0.1.0.dmg

## Core Value Proposition

VoiSync differentiates itself from cloud-based transcription services (e.g., Otter.ai, Notta, CLOVA Note) by processing all audio entirely on-device. No audio data, transcription results, or metadata ever leave the user's Mac. This makes VoiSync especially well suited for:

- Confidential business meetings
- Legal and medical conversations
- Environments with strict data residency requirements
- Users who want transcription without an internet connection

## Technical Architecture

### Speech Recognition Engine

- **Engine**: whisper.cpp (C/C++ port of OpenAI's Whisper)
- **Inference**: On-device, using the CPU and, on Apple Silicon, the Apple Neural Engine (ANE)
- **Models available**:
  - Free tier: `base` (~150MB), `small` (~500MB)
  - Pro tier: `medium` (~1.5GB), `large-v3` (~3GB)
- **Language**: Optimized for Japanese; multi-language support planned (Whisper supports 99 languages)

### Speaker Diarization

- **Web meeting mode**: Separates system audio (remote participants) from microphone input (local user) using macOS Core Audio APIs
- **In-person mode**: Identifies distinct speakers from a single microphone input using AI-based clustering of MFCC (Mel-Frequency Cepstral Coefficient) features
- **Free tier limit**: 2 speakers
- **Pro tier**: Unlimited speakers

### Privacy & Security

- Zero network transmission of audio or text data
- No telemetry or analytics on speech content
- All processing uses the local CPU/ANE; no cloud API calls
- Audio files stored only on the user's local filesystem
- Internet required only for the initial model download

## System Requirements

| Requirement | Specification |
|-------------|---------------|
| Operating System | macOS 14 Sonoma or later |
| Processor | Apple Silicon (M1/M2/M3/M4) or Intel |
| Storage | 500MB to 3GB (varies by model selection) |
| RAM | 8GB or more recommended |
| Network | Required only for initial setup (model download) |

## Feature Comparison: Free vs Pro

| Feature | Free | Pro (Coming Soon) |
|---------|------|-------------------|
| Real-time transcription | Yes | Yes |
| Speaker diarization | 2 speakers | Unlimited |
| Session length | 30 min | Unlimited |
| History retention | 7 days | Unlimited |
| Audio recording storage | No | Yes |
| Recognition models | `base` / `small` | `medium` / `large-v3` |
| Text refinement (filler removal, punctuation) | No | Yes |
| AI summary | No | Yes |
| Export formats | Text, Clipboard | + Markdown, Word, CSV |
| Cloud integrations | No | Google Drive, Notion |

## Use Cases

### Meeting Transcription

Record and transcribe business meetings with automatic speaker separation. Because VoiSync works offline, confidential board meetings, HR discussions, and client calls stay private.

### Personal Note-Taking

Capture ideas, brainstorming sessions, or voice memos. VoiSync runs as a menu bar app, always one click away.

### Interview & Research

Journalists, researchers, and UX designers can transcribe interviews with full speaker attribution, then export structured text for analysis.

### Accessibility

Provide real-time captions for hearing-impaired participants in meetings or presentations.
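The in-person diarization mode described under Speaker Diarization groups speech segments by voice characteristics extracted as MFCC features. This document does not specify VoiSync's actual clustering algorithm, so the Python below is only a minimal sketch under stated assumptions: each speech segment has already been reduced to a mean MFCC vector (feature extraction is omitted), and plain k-means with k = 2, matching the Free tier's two-speaker limit, stands in for the "AI clustering". The function `cluster_speakers` and the toy feature values are hypothetical illustrations, not part of VoiSync.

```python
# Hypothetical sketch: k-means clustering of precomputed per-segment
# mean-MFCC vectors. This is NOT VoiSync's actual diarization algorithm.
import math
from typing import List, Sequence

Vector = Sequence[float]


def _distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_speakers(segments: List[Vector], k: int = 2, max_iter: int = 50) -> List[int]:
    """Return a speaker-cluster index (0..k-1) for each segment's feature vector."""
    if not segments:
        return []
    k = min(k, len(segments))

    # Deterministic farthest-point initialization: start from the first segment,
    # then add the segment farthest from every centroid chosen so far.
    centroids: List[List[float]] = [list(segments[0])]
    while len(centroids) < k:
        farthest = max(segments, key=lambda s: min(_distance(s, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: label each segment with its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _distance(s, centroids[j])) for s in segments]
        if new_labels == labels:
            break  # converged: no segment changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member segments.
        for j in range(k):
            members = [s for s, label in zip(segments, labels) if label == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = _mean(members)
    return labels


# Toy 3-dimensional "mean MFCC" vectors for two voices (real MFCC vectors
# often have 13 or more coefficients).
segments = [
    [12.0, -3.1, 4.2],
    [11.8, -2.9, 4.0],
    [-5.0, 6.5, -1.2],
    [12.3, -3.3, 4.5],
    [-4.7, 6.9, -0.8],
]
print(cluster_speakers(segments))  # [0, 0, 1, 0, 1]
```

k-means is used here only because it is the simplest clustering method to illustrate; the farthest-point initialization keeps the result deterministic. Cluster indices are arbitrary, so an application would map them to display labels such as "Speaker 1" and "Speaker 2".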
## Competitive Landscape

| Product | Processing | Offline | Free Tier | Speaker Diarization | Platform |
|---------|-----------|---------|-----------|---------------------|----------|
| **VoiSync** | Local (whisper.cpp) | Yes | Yes | Yes (AI + audio channel) | macOS |
| Otter.ai | Cloud | No | Limited | Yes | Web, iOS, Android |
| Notta | Cloud | No | Limited | Yes | Web, iOS, Android |
| CLOVA Note | Cloud | No | Yes | Yes | Web, iOS, Android |
| MacWhisper | Local (whisper.cpp) | Yes | Limited | No | macOS |
| Whisper Transcription | Local | Yes | No (paid) | No | macOS |

**Key differentiators**:

1. **Privacy**: Of the products above, only VoiSync, MacWhisper, and Whisper Transcription process audio fully locally; VoiSync adds real-time transcription and speaker diarization.
2. **Real-time**: Unlike batch-processing local tools, VoiSync transcribes as you speak.
3. **Speaker separation**: VoiSync is the only local-processing tool listed with dual-mode (web meeting + in-person) diarization.
4. **Free**: Real-time transcription with speaker diarization (up to 2 speakers) at no cost.

## FAQ

### Q: Does VoiSync send audio data to the cloud?

A: No. All speech recognition runs locally on your Mac using whisper.cpp. No audio or transcript data is transmitted externally. The only network request is the initial model download during first setup.

### Q: Is VoiSync free?

A: Yes. The Free tier provides real-time transcription, speaker diarization (2 speakers), 30-minute sessions, and 7-day history at no cost. A Pro tier with additional features (unlimited session length, audio recording storage, text refinement, AI summaries, and additional export formats) is coming soon.

### Q: What macOS version is required?

A: macOS 14 Sonoma or later, on Apple Silicon or Intel Macs. Storage requirements range from 500MB to 3GB depending on the selected speech recognition model.

### Q: How does speaker diarization work?

A: In web meeting mode, VoiSync captures system audio (remote participants) and microphone input (you) as separate channels. In-person mode instead uses MFCC-based AI clustering to distinguish speakers from a single microphone input.

### Q: What languages are supported?

A: VoiSync is currently optimized for Japanese. The underlying whisper.cpp engine supports 99 languages, and broader language support is planned for future releases.

### Q: Is an internet connection required?

A: Only for the initial setup (downloading the speech recognition model). After that, VoiSync works completely offline.

### Q: How accurate is the transcription?

A: Accuracy depends on the selected model. The `large-v3` model (Pro tier) approaches cloud-service accuracy for Japanese. The `base` and `small` models (Free tier) are suitable for clear speech in quiet environments.

### Q: Can I use VoiSync for Zoom/Teams/Google Meet calls?

A: Yes. In web meeting mode, VoiSync captures system audio output (the other participants) and your microphone separately, providing speaker-attributed transcription for any video conferencing tool.

## Contact & Support

- **Email**: info@filsync.com
- **Website**: https://filsync.com/
- **X (Twitter)**: https://x.com/abe_sdw/
- **Instagram**: https://instagram.com/abe_sdw/