Simulated air traffic control has been a desired feature for pilot training for many years. Becoming a copilot in a multi-crew passenger aircraft requires only 240 hours of flight time.
Surprisingly to many, flight time does not always mean time in an airplane; most of the hours can be in a flight simulator. The multi-crew pilot license aims to qualify a person with no previous experience, and often with limited English language skills, in much less time than traditional pilot training.
The Link Trainer, the best-known early flight simulation device, was first released for sale in 1929. In the nine decades since its release, flight simulators have reached a level of sophistication and realism that would be unimaginable to the inventor of the Link Trainer. The capabilities of a multi-million dollar device can easily convince those of us who don't fly professionally that what we are experiencing is real. And of course, realism is crucial to representative simulation training.
Simulated Air Traffic Control Environment – The Missing Link
As sophisticated as flight simulators might be, the industry has long recognized that the lack of a simulated air traffic control environment (SATCE) and associated radio communications is detrimental to training. Having the instructor role-play ATC increases workload and detracts from the primary roles of teaching and assessing. Furthermore, a simulator instructor cannot be familiar with the differing rules of air traffic control worldwide. Nor can the instructor effectively simulate the often busy and complex radio environment.
SATCE Components
ARINC Spec 439A specifies guidelines for a simulated air traffic control environment (SATCE) for flight simulators. It is a comprehensive document, but in summary, SATCE includes the following key components:
- A target generator – Realistically simulates the movements of the background/surrounding aircraft, following local air traffic control procedures and using realistic aircraft and airline performance profiles. The visual systems and appropriate devices, e.g., TCAS, display the generated aircraft targets.
- An ATC and radio communications subsystem – Generates realistic ATC instructions based on the unfolding traffic situation. Produces and manages the transmission of radio communications from simulated air traffic controllers and background traffic.
- Ownship management – The ownship is the flight simulator attached to the SATCE. Features include the capability for pilots to interact with the simulated controllers. Speech recognition is the desired method for pilot-to-controller communications. In the ownship management function, the simulated air traffic controllers issue instructions to the ownship. The ATC commands take into account the training plan, the air traffic situation, and local airport and airspace procedures.
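To make the relationship between the three components concrete, here is a minimal sketch in Python. It is not taken from ARINC 439A; every class, method, callsign, and the simplified phraseology is a hypothetical illustration. It assumes the speech recognizer has already turned the pilot's transmission into text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Aircraft:
    callsign: str
    altitude_ft: int


@dataclass
class TargetGenerator:
    """Hypothetical target generator: holds the background traffic."""
    traffic: list[Aircraft] = field(default_factory=list)

    def aircraft_near(self, altitude_ft: int, margin_ft: int = 1000) -> list[Aircraft]:
        # Background aircraft within margin_ft of the given altitude.
        return [a for a in self.traffic if abs(a.altitude_ft - altitude_ft) < margin_ft]


@dataclass
class AtcSubsystem:
    """Hypothetical ATC subsystem: turns the traffic picture into an instruction."""
    targets: TargetGenerator

    def instruction_for(self, ownship: Aircraft, requested_ft: int) -> str:
        conflicts = self.targets.aircraft_near(requested_ft)
        if conflicts:
            # Illustrative wording only, not real ATC phraseology.
            return (f"{ownship.callsign}, maintain {ownship.altitude_ft}, "
                    f"traffic {conflicts[0].callsign}")
        return f"{ownship.callsign}, climb and maintain {requested_ft}"


@dataclass
class OwnshipManager:
    """Hypothetical ownship management: links the pilot's request to ATC."""
    ownship: Aircraft
    atc: AtcSubsystem

    def handle_request(self, recognized_text: str) -> str:
        # Assumes ASR output shaped like "request 12000" (an invented format).
        requested_ft = int(recognized_text.split()[-1])
        return self.atc.instruction_for(self.ownship, requested_ft)


targets = TargetGenerator([Aircraft("AZA123", 12000)])
manager = OwnshipManager(Aircraft("N123AB", 8000), AtcSubsystem(targets))
print(manager.handle_request("request 12000"))
print(manager.handle_request("request 15000"))
```

The point of the sketch is the data flow: the instruction issued to the ownship depends on the traffic the target generator is simulating, so the pilot hears a response consistent with what the visual system and TCAS display.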
SATCE and Automatic Speech Recognition
Of everything in this much-simplified description, speech recognition is the most challenging.
Believing that automatic speech recognition (ASR) is a solved problem is understandable. This belief may especially ring true if you are fortunate enough to be a native English speaker with clear pronunciation. Dictating a text on your phone or using the dictation feature in Microsoft Word is remarkably accurate for most. Smart assistants are part of our daily routine, and speech recognition is a standard feature in most modern cars.
When did you last call your bank or cable company and not have to interact using speech recognition? The practicality and accuracy improvements of speech recognition, even in just the last ten years, are stunning. However, ASR in SATCE is not dictation, and there are several challenges to solve before SATCE vendors can deliver a seamless speech interface.
The Challenges of an Acoustic Model
An acoustic model is a file that contains statistical representations of each of the distinct sounds that make up a word. This earlier article further explains the importance of the acoustic model and its impact on speech recognition accuracy.
Training an acoustic model uses audio recordings. For best ASR results, a model for a native US English speaking target audience will use audio from native US English speakers. In other words, a model optimized for US English will not work well for French-accented English. The model also has to consider additional factors affecting ASR accuracy.
Factors to Be Considered
- Sampling rate match – Expect poor performance when an acoustic model created from 8 kHz, 8-bit audio is used with an end-use microphone that has a different sampling rate. Desktop microphones will typically use a 16 kHz sampling rate. Increasing the sampling rate and resolution is an effective way to reduce errors. However, there are diminishing returns as sampling rates increase. As always, there are tradeoffs: a higher sampling rate requires higher network bandwidth.
- Speaking rate match – A match between the speaking rate of the user and the general speaking rate of the speech in the training audio is desirable for lower error rates. However, it is not uncommon for users to vary significantly in the speed at which they speak. For example, air traffic controllers increase their speaking rate as air traffic congestion increases. The best ASR technology will include automatic speaking rate adaptation.
- Domain-specific data – The language of international aviation is English. The community commonly uses “words” that are not in a dictionary. These are frequently beacon names and airline callsigns, e.g., Alitalia. Adding such words to an ASR lexicon is not tricky. However, ASR accuracy may suffer if the acoustic model is not aware of those words and, more importantly, of how they sound when combined with other terms, i.e., the transition between words can be problematic for speech recognition. One final issue on this topic is the impact on the ASR lexicon. A lexicon contains the phonetic spelling of all words used by the ASR. There may be many ways to pronounce and phonetically spell a word. Commonly, the phonetic spelling is not apparent without asking local experts.
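Two of the factors above lend themselves to a short illustration. The Python sketch below computes the raw, uncompressed PCM bitrate for a given audio format, checks whether a microphone matches the rate a model was trained on, and shows a lexicon that stores more than one phonetic spelling per word. The function names are hypothetical, the 16-bit resolution for the 16 kHz case is an assumption, and the phonetic spellings for "Alitalia" are invented placeholders rather than verified pronunciations.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw (uncompressed) PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def sample_rate_matches(model_rate_hz: int, mic_rate_hz: int) -> bool:
    """True when the microphone rate equals the rate the model was trained on."""
    return model_rate_hz == mic_rate_hz


# Hypothetical lexicon: each word maps to one or more phonetic spellings.
# The phoneme symbols below are illustrative placeholders only.
LEXICON = {
    "alitalia": [
        ["AE", "L", "IH", "T", "AE", "L", "IY", "AH"],
        ["AA", "L", "IY", "T", "AA", "L", "Y", "AH"],
    ],
}


def pronunciations(word: str) -> list:
    """Return every known phonetic spelling, or an empty list if unknown."""
    return LEXICON.get(word.lower(), [])


print(pcm_bitrate_kbps(8000, 8))    # 8 kHz, 8-bit mono
print(pcm_bitrate_kbps(16000, 16))  # 16 kHz, assumed 16-bit mono
print(sample_rate_matches(8000, 16000))
print(len(pronunciations("Alitalia")))
```

Under these assumptions, moving from 8 kHz, 8-bit audio to 16 kHz, 16-bit audio quadruples the raw bitrate (64 to 256 kbit/s), which is the bandwidth tradeoff described above. An empty result from the lexicon lookup is the cue to ask a local expert for the phonetic spelling before the word is used in training.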
Pilot Training, SATCE, and the Acoustic Model Problems
Pilot training presents an extreme example of the acoustic model challenges. One factor links the many problems: user variation.
The goal of SATCE is to contribute to the full range of pilot training. Both trainee and experienced pilots will interact with the speech recognition component.
Trainee pilots unfamiliar with the expected terminology will hesitate and use unexpected words and phrases. They are often non-native English speakers with strong accents, many only beginning their English language learning. Given the global demand for pilots, the variation in user accents is extensive. If you recall, a US English acoustic model does not work well for French-accented English speakers.
Although not as extreme, experienced pilots also present problems. With experience comes confidence and, in turn, an increase in speaking rates. The implementation of ICAO English certification reduces the accent problem, but not completely.
As the use of SATCE becomes more extensive, we can expect SATCE support for more airports and airspaces. Domain-specific data will grow. The number of non-dictionary words and aviation phrases will also increase.
Are You Stuck?
There are additional challenges that go beyond acoustic models. Defining supported terminology is the most prominent. You can learn more here. Developing acoustic models and having a cost-effective and timely way to manage them is paramount. If you are stuck with an unsuitable model, you are also stuck with a SATCE that will contribute to negative training and often unresolvable user frustration.