Articulate audio

The overriding principle of speech intelligibility is that louder does not mean clearer, especially when the pro also has to deal with signal-to-noise ratios, distortion and reflected or reverberant sound, says Phil Ward.

Here are some vowels and consonants, to get started: ISO, EN, BS, ANSI, ASA. Between them, they represent the international, European, British and American standards organisations that rule everything. They issue complex regulations, and naturally these cover mission-critical applications, like voice alarm (VA).

But anyone who has ever been cornered by an over-enthusiastic hi-fi buff can understand the overriding principle of speech intelligibility: up goes the sound pressure; out of the window goes your attention. Louder does not mean clearer.

If this happens at a party, two of the other key factors come into play: signal-to-noise ratio and distortion. If it happens at a party in a cave (yes, I have), the final factor is present: the ratio of direct to reflected or reverberant sound.

AV professionals of every kind should be aware of these values, but only those who work in VA have to measure up to the standards. Among those, it’s the specialists in transportation who have the toughest job. And the worst application is road tunnelling, more than any other kind of tunnelling, because these are innately noisy and reflective environments prone to evacuation courtesy of high-speed vehicles.

Way back, Bell Telephone Laboratories devised the Articulation Index. In 1997, this became the revised Speech Intelligibility Index (SII). For practical application in electro-acoustic PA systems, this was modified into the Speech Transmission Index (STI), last revised around 2005. It’s a decimal scale from 0.0 to 1.0, where 0 to 0.3 is Bad; 0.3 to 0.45 is Poor; 0.45 to 0.6 is Fair; 0.6 to 0.75 is Good; and 0.75 to 1.0 is Excellent.
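As a rough illustration, the five bands quoted above can be turned into a small lookup. This is a minimal sketch rather than anything taken from the standard: the function name `sti_rating` is hypothetical, and assigning a boundary value such as 0.45 to the higher band is an assumption, since the text does not say which side the edges fall on.

```python
def sti_rating(sti: float) -> str:
    """Map an STI value (0.0 to 1.0) to the qualitative band quoted in the article."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0.0 and 1.0")
    # Upper edge of each band. Assumption: a value exactly on an edge
    # is placed in the higher band.
    bands = [(0.30, "Bad"), (0.45, "Poor"), (0.60, "Fair"), (0.75, "Good")]
    for upper_edge, label in bands:
        if sti < upper_edge:
            return label
    return "Excellent"


if __name__ == "__main__":
    for value in (0.20, 0.35, 0.50, 0.70, 0.90):
        print(f"STI {value:.2f}: {sti_rating(value)}")
```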

If you like, it’s almost a barometer of your chances of survival. As computer processing has advanced, the calibration has expanded to a matrix of 108 grades and superseded the RASTI scale that many will remember.

Nowadays the STI can be measured on site by handy portable devices such as the Bedrock SM50 STIPA meter (by Embedded Acoustics), or the NTi Audio Acoustilyzer AL1. The method used is a test-signal procedure called STIPA, which means Speech Transmission Index for Public Address. Prior to this stage, STI can be predicted in design software like Computer Aided Theatre Technique’s CATT-Acoustic (check out the company’s animated demos featuring ‘Oskar the Acoustician’); EASE; and the Danish option, ODEON.

The pattern used in the test is not real speech. It’s a mathematically calculated abstract pattern of frequencies intended to represent all of the layers in the speech spectrum of most languages, from 125Hz to 8kHz, via 250Hz, 500Hz, 1kHz, 2kHz and 4kHz – a kind of robot Esperanto.
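To make the band structure concrete, here is a deliberately simplified sketch. The seven centre frequencies come from the paragraph above. The mapping from an apparent signal-to-noise ratio to a 0-to-1 transmission index (clip to plus or minus 15 dB, then scale linearly) is the commonly described core of STI. Everything else is an assumption for illustration: the function names are hypothetical, the bands are averaged with equal weight, and the band weightings, masking corrections and modulation-depth measurements that a real meter applies are left out.

```python
# Octave-band centre frequencies named in the text: 125 Hz doubling up to 8 kHz.
OCTAVE_BANDS_HZ = [125 * 2**i for i in range(7)]


def transmission_index(snr_db: float) -> float:
    """Convert an apparent signal-to-noise ratio in dB to a 0..1 transmission index."""
    clipped = max(-15.0, min(15.0, snr_db))  # values beyond +/-15 dB saturate
    return (clipped + 15.0) / 30.0


def simplified_sti(snr_by_band_db: dict) -> float:
    """Equal-weight mean of the per-band transmission indices.

    Assumption: equal weighting. Real STI uses band-specific weights
    and further corrections that are omitted here.
    """
    missing = [f for f in OCTAVE_BANDS_HZ if f not in snr_by_band_db]
    if missing:
        raise ValueError(f"no value supplied for bands: {missing}")
    indices = [transmission_index(snr_by_band_db[f]) for f in OCTAVE_BANDS_HZ]
    return sum(indices) / len(indices)


if __name__ == "__main__":
    # Hypothetical per-band figures, not measured data.
    example = {f: 3.0 for f in OCTAVE_BANDS_HZ}
    print(OCTAVE_BANDS_HZ)
    print(round(simplified_sti(example), 2))
```

Because the index saturates at plus or minus 15 dB, piling on more level once the signal is well clear of the noise adds nothing to the score, which is the arithmetic behind "louder does not mean clearer".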

“The commissioning certificate identifies how many tests must be done in each area that is ‘acoustically distinguishable’, according to EN54 Part 32,” reveals Neil Voce, head of business development at Sussex-based PA/VA manufacturer, ASL.

“It’s useful, but of course in the first place your system must have the headroom to reproduce the peaks cleanly, as well as deploying the right speaker types for each separate location.”

If called upon by the commissioning specifier, expensive further tests of a more empirical, human nature can be carried out using 20 people who listen to the system throughout the site and complete a word-scoring test.

“It broadcasts a phonetically balanced word sequence,” explains Voce, “and your listeners write them all down on a chart. It’s rare these days, due to cost, but human experience is often superior to the algorithm used in the STIPA method. I’ve heard 0.35 on the STI – a clear fail – while being able to understand the VA messaging perfectly clearly. It’s usually some kind of acoustic anomaly in the building that defeats the machine.”
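The scoring side of such a listening test is simple to sketch. The version below assumes each listener's chart is a list of words in the order they were broadcast, and that a word counts only if it matches exactly, ignoring case and stray spaces. The function names and the sample words are hypothetical, and the scoring rules of any particular test procedure may differ.

```python
def word_score(broadcast: list, written: list) -> float:
    """Percentage of broadcast words one listener wrote down correctly."""
    if not broadcast:
        raise ValueError("the broadcast word list is empty")
    # Compare position by position; words the listener left off count as wrong.
    correct = sum(
        1
        for sent, heard in zip(broadcast, written)
        if sent.strip().lower() == heard.strip().lower()
    )
    return 100.0 * correct / len(broadcast)


def panel_mean_score(broadcast: list, charts: list) -> float:
    """Mean word score across every listener's chart."""
    if not charts:
        raise ValueError("no listener charts supplied")
    return sum(word_score(broadcast, chart) for chart in charts) / len(charts)


if __name__ == "__main__":
    words = ["ship", "fan", "bead", "thin"]  # hypothetical word list
    charts = [
        ["ship", "van", "bead", "thin"],
        ["ship", "fan", "bead", "fin"],
    ]
    print(panel_mean_score(words, charts))
```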

Using the usual techniques

Wherever possible, the usual acoustic absorption techniques can be applied to improve intelligibility, as for all sound reproduction systems.

Ditto: the control of the ambience using quieter ventilation, well-oiled door hinges and so on.

Beyond this, the pro audio industry has been introducing highly directional loudspeakers and dedicated signal processing to help solve intelligibility issues. The latest offering from Duran Audio, the Dutch pioneer acquired by Harman a few years ago, is the Asymmetric Boundary Flare horn (ABF-260), which claims a high front-to-back directivity ratio and relatively low signal distortion. It can be found in a lot of tunnels.

“Intelligibility in a theatre or classroom is very different to that required in a train station,” comments Nick Screen, director of transportation solutions at Harman.

“Most of these systems are now used for both public address and VA announcements. A minimum STI of 0.5 is required with the maximum background noise present – a challenge in some buildings. In a tunnel, 0.45 STI with noise is far more realistic.”
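A commissioning check against those figures might look like the sketch below. The two targets are the ones Screen quotes (0.5 in buildings, 0.45 in tunnels). The rest is assumed for illustration: the names are hypothetical, the readings are invented, and judging each area by its single worst reading is an assumption. The applicable standard may specify a different way of combining the readings taken in one area.

```python
# Minimum STI targets quoted in the text, with background noise present.
TARGET_STI = {"building": 0.50, "tunnel": 0.45}


def failing_areas(readings: dict, environment: str) -> list:
    """Return the names of areas whose worst STI reading falls below the target.

    Assumption: an area is judged on its lowest reading.
    """
    target = TARGET_STI[environment]
    failures = []
    for area, values in readings.items():
        if not values:
            raise ValueError(f"no readings recorded for {area!r}")
        if min(values) < target:
            failures.append(area)
    return sorted(failures)


if __name__ == "__main__":
    # Hypothetical station survey: area name -> STI readings taken there.
    survey = {"concourse": [0.55, 0.52], "platform 1": [0.48, 0.61]}
    print(failing_areas(survey, "building"))
```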

Duran Audio has also introduced its own STI measurement and filter optimisation tool called the OpSTImizer, a PC-based STIPA program that enables the pre-filtering of speech signals according to conditions.

“For complex spaces it is important that the ray tracing method is used when predicting STI values,” adds Screen.

“Statistical models simply do not work for complex spaces and can provide misleading results.”

With the last revision of the STI, it was noted that sound systems running at the levels needed in hostile environments – typically above 75dBA – scored significantly lower on the scale.

This was because the revised standard, in the interests of greater accuracy, took account of auditory masking: the effects of background noise on the signal.

Now part of the STI ‘sweep’, masking has forced designers to adopt new techniques. “With a perfect loudspeaker in an anechoic room the highest STI you can achieve at 110dB is just below 0.7,” says Screen. “So for systems that need to run at high volumes to ensure that you have a sufficient signal-to-noise ratio, things get a lot more complicated for the system designer…”

“Sound masking is the new frontier of intelligibility,” agrees Voce, “because you need to reduce it over distance for many acoustic zones. You can calculate this using the Radius of Distraction.”

Enough. In medicine there is also the Speech Confusion Index, for dysarthric speakers. Let’s not go there either.
