questions: support@xosum.am
Abstract
Xosum.am is a cloud-based speech-to-text system designed for Armenian
language transcription. The platform supports real-time recording and file-
based transcription, with the ability to process multiple audio formats, including
MP3 and WAV. It is optimized for noise-robust speech recognition, allowing
transcription in varied acoustic environments. The system supports
long-duration files of up to 8 hours each, with processing times significantly
shorter than the audio duration. Xosum.am operates across all modern web
browsers on both desktop and mobile devices without restrictions, requiring
only an internet connection. A flexible pricing model offers subscription-based
access with additional transcription hours available for purchase. Additionally,
the system integrates with Telegram via a bot that enables transcription and
summarization of Armenian voice messages. This paper provides an overview
of the system architecture, functionalities, and future development roadmap.
1. Introduction
Speech-to-text technology is increasingly utilized in accessibility, data
processing, and workflow automation. Xosum.am provides an Armenian-
language transcription solution, leveraging cloud-based infrastructure for
scalability and accessibility. The system supports multiple input methods and
offers a robust speech recognition model optimized for real-world noise
conditions. By eliminating the need for dedicated hardware or software
installations, Xosum.am ensures accessibility across devices with minimal user-
side requirements. The following sections outline the system architecture, core
functionalities, and future development plans.
2. System Architecture
Xosum.am is delivered as a web application at app.xosum.am. The platform
consists of the following architectural components:
2.1 Frontend
Web-based user interface supporting real-time recording, file uploads,
and transcription management.
Designed for compatibility across all modern desktop and mobile web
browsers.
Includes transcription history and user account management
functionalities.
2.2 Backend
Cloud-hosted speech processing engine implementing noise-robust
speech recognition models.
Supports simultaneous processing of multiple files.
Optimized for high-performance computing with reduced transcription
turnaround times.
2.3 Database
Stores transcriptions and associated metadata for retrieval.
Secure authentication and account-based access to transcription history.
2.4 Authentication Module
Google-based authentication for streamlined and secure user login.
Enables subscription management and usage tracking.
3. Core Functionalities
3.1 Speech-to-Text Conversion Modes
Xosum.am provides two primary methods for speech input:
1. Live Recording Mode
Speech is recorded directly via the browser and transcribed in real time.
The resulting text is displayed for immediate review and editing.
2. File Upload Mode
Users can upload audio files in multiple formats, including MP3, WAV, and
other common codecs.
The system processes files up to 8 hours in duration, with processing
times shorter than the audio duration.
Supports concurrent file processing without user-side queuing constraints.
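The upload constraints above can be illustrated with a small pre-upload check. This is a sketch only: the function name check_upload, the extension set, and the idea of validating before upload are assumptions made for illustration. The paper states only that MP3, WAV, and other common formats are accepted and that files may be up to 8 hours long.

```python
from pathlib import Path

# Assumed for illustration: the paper names MP3 and WAV explicitly and
# mentions "other common codecs" without listing them.
ALLOWED_EXTENSIONS = {".mp3", ".wav"}
MAX_DURATION_SECONDS = 8 * 60 * 60  # 8-hour limit stated in the paper


def check_upload(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if duration_seconds <= 0:
        problems.append("duration must be positive")
    elif duration_seconds > MAX_DURATION_SECONDS:
        problems.append("file exceeds the 8-hour limit")
    return problems
```

Returning a list of problems rather than raising on the first one lets a user interface report every issue with a file at once.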
3.2 Noise-Robust Speech Recognition
The transcription engine employs noise-robust speech recognition models,
enabling high accuracy even in environments with background noise. This
enhances usability in settings such as meetings, lectures, and outdoor
recordings.
3.3 Simultaneous File Processing
The system supports parallel processing of multiple files, ensuring that users
can submit multiple transcription requests without waiting for sequential
completion.
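From a client's point of view, parallel submission might look like the following sketch. It does not describe the platform's backend: transcribe is a hypothetical stand-in for a call to the transcription service, and the thread pool and its max_workers value are assumptions chosen only to show requests running side by side rather than one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(path: str) -> str:
    """Hypothetical stand-in for a call to the transcription service."""
    return f"transcript of {path}"


def transcribe_all(paths: list[str], max_workers: int = 4) -> dict[str, str]:
    """Submit every file at once and collect the results keyed by path."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(paths, pool.map(transcribe, paths)))
```

For example, transcribe_all(["meeting.mp3", "lecture.wav"]) returns one transcript per file without the caller waiting for the first file to finish before the second starts.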
3.4 Cross-Platform Accessibility
Xosum.am is accessible from both desktop and mobile devices without
platform-specific constraints. The web application is optimized for all modern
browsers, ensuring compatibility across Windows, macOS, Linux, Android, and
iOS. No additional software or extensions are required beyond an active
internet connection.
3.5 Transcription History Management
Users can access a history of previous transcriptions for reference and
download.
Stored transcripts remain available within user accounts, facilitating
workflow continuity.
3.6 User Authentication & Account Management
Google authentication is integrated for secure login.
User accounts track transcription usage and available processing hours.
3.7 Telegram Bot for Voice Message Transcription
A Telegram bot is available, allowing users to submit Armenian voice messages
for automatic transcription and summarization. This feature is designed for
users who rely on voice-based communication within messaging applications.
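The flow of such a bot can be sketched as a function that turns an incoming update into a reply. This is an illustration under stated assumptions, not the bot's actual code: transcribe and summarize are hypothetical callables, and the step that downloads the audio from Telegram is omitted. The field names (message.voice.file_id, message.chat.id) follow the Telegram Bot API Update object, and the returned dictionary mirrors the chat_id and text parameters of the sendMessage method.

```python
from typing import Callable, Optional


def handle_update(
    update: dict,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
) -> Optional[dict]:
    """Build a reply for a Telegram update that carries a voice message.

    `transcribe` and `summarize` are hypothetical callables supplied by the
    caller; `transcribe` receives the Telegram file_id of the voice message.
    Returns None for updates that contain no voice message.
    """
    message = update.get("message") or {}
    voice = message.get("voice")
    if not voice:
        return None
    transcript = transcribe(voice["file_id"])
    summary = summarize(transcript)
    return {
        "chat_id": message["chat"]["id"],
        "text": f"Transcript:\n{transcript}\n\nSummary:\n{summary}",
    }
```

Passing the two processing steps in as arguments keeps the handler testable without network access or a live speech model.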
4. Pricing Model
Xosum.am operates on a hybrid pricing model, offering both subscription-
based and on-demand transcription options.
4.1 Subscription Plans
Users can opt for a monthly or yearly subscription, providing a fixed
number of transcription hours.
Subscription tiers accommodate varying usage needs, from occasional to
high-volume transcription requirements.
4.2 Additional Transcription Hours
Users can purchase additional transcription hours at a discounted rate.
There are no limits on the number of additional hours a user can acquire.
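The hybrid model can be made concrete with a small accounting sketch. Everything here is hypothetical: the class name HourBalance, the minute-level granularity, and the rule that usage draws on the subscription allowance before purchased hours are assumptions, since the paper does not specify how balances are tracked.

```python
from dataclasses import dataclass


@dataclass
class HourBalance:
    """Hypothetical account balance, kept in whole minutes."""

    subscription_minutes: int  # allowance included in the current plan
    extra_minutes: int = 0     # additional hours purchased on demand

    def purchase(self, hours: int) -> None:
        """Add purchased hours; the paper states there is no upper limit."""
        if hours <= 0:
            raise ValueError("hours must be positive")
        self.extra_minutes += hours * 60

    def charge(self, minutes: int) -> None:
        """Deduct usage, drawing on the plan allowance first (assumed order)."""
        if minutes < 0:
            raise ValueError("minutes must not be negative")
        if minutes > self.subscription_minutes + self.extra_minutes:
            raise ValueError("insufficient balance")
        from_plan = min(minutes, self.subscription_minutes)
        self.subscription_minutes -= from_plan
        self.extra_minutes -= minutes - from_plan
```

As a worked case, a plan with 600 minutes plus 2 purchased hours (120 minutes) that is charged 650 minutes ends with 0 plan minutes and 70 purchased minutes remaining.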
5. Cloud Infrastructure and Performance
Xosum.am leverages a scalable cloud infrastructure, ensuring:
High availability with minimal downtime.
Optimized speech processing algorithms to minimize transcription
turnaround times.
Secure storage with encryption to protect user data.
The system supports unlimited transcription requests, making it suitable for
both individual users and large-scale institutional use cases.
6. Future Roadmap
Future development efforts aim to expand the platform’s capabilities beyond
standard speech-to-text transcription. Planned enhancements include:
YouTube Link-Based Transcription
Users will be able to input a YouTube link and receive a full automated transcription of the video.
Automated Subtitle Generation for YouTube
The system will generate synchronized subtitles for YouTube videos.
Subtitle Support for TikTok and Instagram Reels
Subtitle processing will be extended to short-form video content, including TikTok and Instagram Reels.
Multilingual Subtitle and Transcription Translation
The system will support subtitle translation for videos, enabling broader accessibility.
General Transcription Translation
Users will have the option to translate full transcriptions into multiple languages.
These enhancements will position Xosum.am as a comprehensive speech and
media processing platform, extending beyond speech-to-text conversion into
multilingual and multimedia applications.
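As an illustration of what synchronized subtitle output could involve, the sketch below renders timestamped segments in the widely used SubRip (SRT) layout. The choice of SRT, the (start, end, text) segment shape, and both function names are assumptions; the paper does not state which subtitle format the planned features will produce.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS,mmm, the timestamp style used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Each cue consists of a sequence number, a timing line, and the subtitle text, so a transcription engine that emits segment boundaries already has everything this format needs.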
7. Conclusion
Xosum.am is a scalable and noise-robust Armenian speech-to-text
application, designed for both real-time and file-based transcription. The
system supports multiple input formats, long-duration files, parallel
processing, and cross-platform accessibility. With flexible pricing options
and Telegram bot integration, it provides a versatile solution for individuals and
organizations requiring Armenian-language transcription services.
Future developments will focus on YouTube and social media subtitle
generation, as well as multilingual transcription translation, further expanding
the platform’s capabilities in speech and video processing.