A: I grew up in Tehran, Iran, with a strong interest in math and physics as a high school student, so I chose my major at university accordingly. I got my bachelor’s degree in electrical engineering in Iran, and then moved to Germany to continue my studies, in Stuttgart to begin with and then in Aachen. I find that the proportion of women in engineering and sciences in my home country is much higher than here in Germany, or in Europe from what I can tell. I think the ratio of women in the field is considerably higher in developing countries, such as Iran, but also in China and India, because the work prospects are much better with studies in STEM (science, technology, engineering, mathematics), as opposed to finance or social studies, so women are more motivated to pursue them.
What drew me to electrical engineering specifically was my fascination with certain electrical medical tools and the idea that you can use such tools to improve the medical sector and, as a result, the lives of many people. I did not opt for studies in biomedical engineering because I wanted to first have solid grounding in the field of electrical engineering, but I did get to look at biomedical engineering applications in my master’s thesis in Stuttgart, which was on the topic of medical image and signal processing in cooperation with Siemens.
A: My PhD is on the topic of neural sequence-to-sequence modeling for language and speech translation. In other words, I aimed to use neural networks to model sequence-to-sequence applications, specifically for text-to-text and speech-to-text translation. I do like to try new things in general, but although my PhD topic sounds different to electrical engineering at first, it is not so distant from what I did before. I worked on optical character recognition (OCR) during my BSc and really liked it, and my MSc thesis was on signal processing, so I always had a hand in machine learning methods, albeit with different applications.
While in Stuttgart, I worked at the Max Planck Institute for Intelligent Systems alongside Prof. Bernhard Schölkopf, who is a big name in the machine learning field. His main focus, however, is on causal structures that underlie statistical dependencies, which was too theoretical for my liking. I wanted to work on something with more practical applications, so I looked for other options when it came to my PhD. In the end, I joined the human language technology (HLT) group at RWTH Aachen University. It was not high on my list city-wise, but I was very impressed by Prof. Hermann Ney’s lab – this is how I got to Aachen and started my career in HLT.
A: To me, spoken and written language is one of the main features of human intelligence and of course also a natural means of communication between people. Developing intelligent systems that can understand and generate natural languages is important to the science and engineering domains and a specific objective in human language technology. What could be more important to work on than communication among different nations and societies?
I was also intrigued by the topic because there are many unsolved problems in relation to language. Most of the tasks in the HLT field are difficult to solve in an automatic fashion. When you see such a list of unsolved problems to work on, it is exciting to know that your work can have a small impact on the entire research community.
I aimed to make some contributions towards this goal with the topic of my PhD. One of them was rethinking the design of attention-based sequence-to-sequence models, in particular how to overcome specific problems associated with these models by introducing and developing new models. Another contribution was to reconsider the cascaded pipeline for speech translation, which had traditionally been achieved by integrating automatic speech recognition (ASR) with machine translation (MT) systems. Instead, the idea was to work on a direct speech translation system that provides a direct translation from audio in one language to text in another, without going through ASR. I developed new ideas on how to explore different types of training data and how to benefit from ASR and MT in an end-to-end fashion.
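To make the contrast between the cascaded and the direct approach concrete, here is a minimal Python sketch. It is only an illustration of the two interfaces, not the models from the PhD: `toy_asr`, `toy_mt`, `toy_direct` and the German-to-English word list are hypothetical stand-ins, and a real system would use trained neural networks in their place.

```python
from typing import Callable, List

Audio = List[float]  # toy stand-in for an audio signal (assumption)


def make_cascade(asr: Callable[[Audio], str],
                 mt: Callable[[str], str]) -> Callable[[Audio], str]:
    """Cascaded speech translation: audio -> transcript -> translation."""
    def translate(audio: Audio) -> str:
        transcript = asr(audio)   # step 1: speech recognition
        return mt(transcript)     # step 2: text-to-text translation
    return translate


def toy_asr(audio: Audio) -> str:
    # Hypothetical recognizer: returns a fixed transcript for any non-empty input.
    return "guten morgen" if audio else ""


def toy_mt(text: str) -> str:
    # Hypothetical word-by-word translator backed by a tiny lexicon.
    lexicon = {"guten": "good", "morgen": "morning"}
    return " ".join(lexicon.get(word, word) for word in text.split())


def toy_direct(audio: Audio) -> str:
    # Direct speech translation: one model maps audio straight to target text,
    # with no intermediate transcript.
    return "good morning" if audio else ""


cascaded = make_cascade(toy_asr, toy_mt)
print(cascaded([0.1, 0.2]))    # goes through the intermediate transcript
print(toy_direct([0.1, 0.2]))  # no transcript involved
```

In the cascade, any mistake in the transcript is passed on to the translation step, which is one reason a single end-to-end model is attractive.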
A: I finished my PhD in 2020, even though the graduation ceremony did not take place until recently. It was a major milestone in my educational journey and an incredibly valuable experience for me in every sense. It wasn’t easy – research in any field usually comes with overwhelming challenges. It can get discouraging to work hard and not always achieve the result you hoped for. Even more so because the process is quite long in Germany, as you have a lot of other duties to attend to while you are working on your research: you participate in teaching, projects, proposal writing, supervision of bachelor’s and master’s students, etc. The experience helped me to be more mature and patient; to act professionally and not emotionally; to be able to have harsh and difficult conversations without taking them personally.
A: During my PhD, I was the only woman among a group of 25-30 other PhD candidates, so it was definitely a very male-dominated lab that I worked in. It would have probably been nicer to have another woman around, but it wasn’t a problem for me. I was never made to feel uncomfortable. In fact, one of the nicest things about the lab was its professional and friendly environment. We all helped each other; it was a collaborative space.
Having said this, gender imbalance is a fact in the field of science and engineering. But this does not mean women should not follow their dreams. To me, it is key to be passionate about what you do and never lose sight of why you chose that field in the first place. If you want to work at the forefront of novel technologies, facing challenges is a given in your career path, regardless of gender. My advice to other women who want to embark on a career in this field is to be positive, motivated and eager to learn new skills.
A: The hat, the wagon and the fountain are all part of a tradition at RWTH Aachen University.
The hat is very special. Once you complete your PhD, your colleagues make a big hat for you, decorated with ornaments on top. The ornaments themselves are related to stories you shared with them along the way, perhaps something to do with a conference you presented at, a funny incident and so on. Mine has photos from different occasions like conferences, business trips, playing cards together, etc., plus the logos of companies I collaborated with. There are also some speech bubbles with funny or memorable conversations we had with our professor or among us. The ornaments on the top relate to models I worked on, my hobbies, even a hilarious one related to my fashion style!
There are drinks and snacks involved as part of the celebration of course, and then your colleagues carry you on a wagon and throw you into a fountain here in Aachen. There are two fountains where this ceremony takes place; mine was the smaller, shallower one, but the water was still freezing cold. I think this tradition expresses a feeling of freedom and relief, which is exactly what I am experiencing right now!
A: I had the opportunity to work at AppTek part-time while I was still carrying out my PhD, and I joined as a full-time machine-learning scientist once I’d completed it. This was very important to me career-wise, as PhD graduates typically only have academic experience to show, but I had the opportunity to gain some industrial experience too by the time I completed my PhD.
At AppTek I have been working on machine translation, both text-to-text and speech-to-text, and on speech recognition, as machine learning skills are required in both. I like the company very much because it offers a professional, yet super-friendly environment to work in. There are five main science groups, ASR, MT, Text-To-Speech (TTS), dubbing, and natural language understanding (NLU), and I have spent a short while in four of them. I am now part of the dubbing team, which is the latest product the company is working on for the video localization sector.
All the teams are formed by people with advanced scientific backgrounds and comprehensive hands-on experience, closely working together. It is this collaboration that is the force behind the company’s achievements in a very competitive market. You can easily see the progress made and it is very inspiring. It is also the main reason I wanted to stay in the company after I had finished my PhD. We also conduct research and we keep publishing papers, which is not a given elsewhere – the work really involves a mix of everything, which makes AppTek a great company to work at.
A: One of the industry verticals AppTek has chosen to focus on is media and entertainment. This encouraged us to combine our scientific and technological advancements in ASR, MT and TTS in order to develop a fully automated video dubbing pipeline.
Automatic video dubbing is the process of automatically revoicing a video for new audiences in a target language. The goal is to not only transcribe and then translate what is spoken, but also present it in a similar voice to the original speakers, maintain the correct gender, and have the result look natural by matching the output with the actors’ body language and lip movements.
It is a difficult task. It combines a number of challenging aspects that all need to be addressed simultaneously, which makes it more challenging than the sum of its parts. You cannot just string the three core technologies together and expect to build a dubbing pipeline. All the components need to be integrated and this integration is not an easy one.
Currently, I am working on MT models that are best suited for a dubbing pipeline, especially to meet the requirements of the TTS component, for instance with respect to length. I am also working on a diacritizer for our Arabic system, which is also needed to generate the synthetic voice in the target language, so it is added as another feature of our MT.
There are many such potential research areas through which we can improve the dubbing technology, as it is relatively new and plenty of challenges remain. To name a few, there is a lot that can still be done with respect to prosody alignment, isometric MT, emotion-aware TTS, speech separation, speech placement, lip-syncing and most importantly language support/coverage. I am looking forward to working more on many of these areas. I think one of the next milestones in the dubbing pipeline will inevitably involve lip synchronisation with the help of computer vision technology. That would be a really cool milestone to work on!
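The length requirement mentioned above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the MT system returns several candidate translations and that character count is a rough proxy for spoken duration; the function names and example sentences are hypothetical, and this is not a description of AppTek's actual method.

```python
from typing import Sequence


def length_ratio(source: str, candidate: str) -> float:
    """Character-length ratio of a candidate translation to the source."""
    return len(candidate) / max(len(source), 1)


def pick_by_length(source: str, candidates: Sequence[str],
                   target_ratio: float = 1.0) -> str:
    """Return the candidate whose length ratio is closest to target_ratio."""
    return min(candidates,
               key=lambda c: abs(length_ratio(source, c) - target_ratio))


source = "Guten Morgen, wie geht es dir?"
candidates = [
    "Good morning, how are you doing today?",
    "Good morning, how are you?",
    "Morning!",
]
print(pick_by_length(source, candidates))
```

A translation of similar length to the source is easier to fit into the original speaker's time slot when the synthetic voice is generated, which is the intuition behind length-aware or isometric MT.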
A: I really like this quote! It is true for me as well. It reminds me of cooking actually. As a younger scientist, in the first year(s) of my PhD, I was told what to do and I also had many ideas of my own, sometimes big and complex ones. The PhD and science in general have taught me that things are typically more complicated than we first assume. But, in my own experience, something as hard as a PhD can become easy if it is divided into smaller pieces. It is important to be able to find the correct recipe which works best for you along the way and, more importantly, motivates you further to build on it and come up with subsequent recipes for your research.
AppTek.ai is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U), large language models (LLMs) and text-to-speech (TTS). The AppTek platform delivers industry-leading solutions for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s solutions cover a wide array of languages/dialects, channels, domains and demographics.