AppTek Company Overview

Recent Academic Research and Publications

Dynamic Acoustic Model Architecture Optimization in Training for ASR

June 2025

Jingjing Xu, Zijian Yang, Albert Zeyer, Eugen Beck, Ralf Schlueter, Hermann Ney

Architecture design is inherently complex. Existing approaches rely on either handcrafted rules, which demand extensive empirical expertise, or automated methods like neural architecture search, which are computationally intensive. In this paper, we introduce DMAO, an architecture optimization framework that employs a grow-and-drop strategy to automatically reallocate parameters during training. This reallocation shifts resources from less-utilized areas to those parts of the model where they are most beneficial. Notably, DMAO only introduces negligible training overhead at a given model complexity. We evaluate DMAO through experiments with CTC on LibriSpeech, TED-LIUM-v2 and Switchboard datasets. The results show that, using the same amount of training resources, our proposed DMAO consistently improves WER by up to 6% relatively across various architectures, model sizes, and datasets. Furthermore, we analyze the pattern of parameter redistribution and uncover insightful findings.

Company Overview

Home / Company Overview

About AppTek.ai

Company History and Timeline

Recent Academic Research and Publications

Dynamic Acoustic Model Architecture Optimization in Training for ASR

Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization

Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions

Chunked Attention-Based Encoder-Decoder Model for Streaming Speech Recognition

Climategpt: Towards ai synthesizing interdisciplinary research on climate change

Improving Language Model Integration for Neural Machine Translation

Take the Hint: Improving Diacritization with Partially-Diacritized Text

Improving And Analyzing Neural Speaker Embeddings for ASR

Self-Normalized Importance Sampling for Neural Language Modeling

Efficient Training of Neural Transducer for Speech Recognition

Automatic Learning of Subword Dependent Model Scales

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

On Sampling-Based Training Criteria for Neural Language Modeling

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech

Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

Librispeech Transducer Model with Internal Language Model Prior Correction

A New Training Pipeline for an Improved Neural Transducer

Early Stage LM Integration Using Local and Global Log-Linear Combination

Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition without Length Bias

LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

Language Modeling with Deep Transformers

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

AI and ML Technologies to Bridge the Language Gap

Find us on Social Media:









ABOUT APPTEK.ai

SEARCH APPTEK.AI

SITEMAP

LATEST NEWS

LATEST BLOG POSTS