What the Max story says about safety oversight today

Yesterday the US Federal Aviation Administration joined most of the rest of the aviation world in grounding the Boeing 737 Max series of aircraft, the very latest version of the established 737 series. What took it so long?

The Max entered service in May 2017, and by early March this year it had suffered two fatal crashes within five months. That is extraordinary for a new commercial airliner today.

Evidence from the preliminary report on the earlier of the two accidents suggests a technical failure precipitated it. The first event, in October 2018, involved a nearly-new 737 Max 8 belonging to Indonesian carrier Lion Air, which crashed into the sea near Jakarta within about 10 minutes of take-off. The second accident, on 10 March this year, involved an Ethiopian Airlines aircraft of the same type, which plunged into the ground within six minutes of take-off from Addis Ababa. Pilots of both aircraft radioed that they were having trouble controlling their height, and this was evident on flight-tracking systems.

The FAA issued its grounding order on 13 March. This was three days after the Ethiopian crash, two days after China, Ethiopia and Singapore had banned Max operations, and a day after the influential European Aviation Safety Agency – and many other states – had done the same.

Does this demonstrate that there are different safety standards – or safety philosophies – in different countries? Or does it suggest that the relationship – in this case – between the safety regulator and the manufacturer is too close?

On 12 March, resisting calls to ground the aircraft, the FAA said: “Thus far, our review shows no systemic performance issues and provides no basis to order grounding the aircraft.”

The next day it stated: “The FAA is ordering the temporary grounding of Boeing 737 MAX aircraft operated by U.S. airlines or in U.S. territory. The agency made this decision as a result of the data gathering process and new evidence collected at the site [of the Ethiopian crash] and analyzed today. This evidence, together with newly refined satellite data available to FAA this morning, led to this decision.”

The safety principle behind aircraft design, for more than half a century, has been that all systems should “fail safe”. This means that any one critical system or piece of equipment, if it fails, will not directly cause an accident. This is achieved either by multiplexing critical systems so there is backup if one of them fails, or by ensuring that the failure does not render the aircraft unflyable.
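
To make the redundancy idea concrete, here is a minimal, purely illustrative Python sketch of how a triplex arrangement tolerates one faulty unit: three independent readings are compared and the middle value is used, so a single wild reading cannot drive the system on its own. The function name, values and the whole scheme are invented for illustration; real flight-control voting logic involves persistence checks, rate limits and lane monitoring.

```python
# Illustrative only: a triplex "voter" that tolerates one faulty sensor.
# Names and values are invented; this is not any manufacturer's logic.

def select_mid_value(reading_a: float, reading_b: float, reading_c: float) -> float:
    """Return the middle of three readings, so one wild value is out-voted."""
    return sorted([reading_a, reading_b, reading_c])[1]

# One hypothetical angle-of-attack vane has failed high (reading 40 deg),
# but the mid-value select still returns a figure from the healthy pair.
print(select_mid_value(4.8, 5.1, 40.0))   # -> 5.1
```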

The preliminary report from the Indonesian accident investigator NTSC suggests that a factor in the sequence of events leading to it was a faulty angle of attack (AoA) sensor. This device, says the report, sent false signals to a new stall protection system unique to the Max series of 737s, known as the manoeuvring characteristics augmentation system (MCAS). According to the report, these signals wrongly indicated a very high AoA, and the MCAS triggered the horizontal stabiliser to trim the aircraft nose-down. Finally, the crew seems not to have known how to counteract this nose-down control demand.

The implication of the NTSC report – not the final verdict – is that the MCAS was not designed according to fail safe principles: a single unit failed, causing a software-controlled automatic system to motor the powerful horizontal stabiliser to pitch the aircraft nose-down, and it kept on doing this until the crew could not overcome the pitch-down force with elevator.
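
By contrast, the architecture the report implies can be caricatured as an automatic function fed by a single input: if that one input is wrong, the function keeps acting on it. The sketch below is a deliberately simplified, hypothetical illustration of that single-point-of-failure pattern, not a description of the real MCAS software; the threshold, trim increment and sensor value are all invented.

```python
# Hypothetical single-sensor architecture (NOT the real MCAS logic):
# the automatic trim function trusts one angle-of-attack input, so a
# single stuck-high sensor produces repeated nose-down trim commands.

AOA_TRIGGER_DEG = 15.0       # invented activation threshold
TRIM_STEP_NOSE_DOWN = -0.5   # invented trim increment per activation

def auto_trim_command(single_aoa_reading_deg: float, current_trim: float) -> float:
    """Apply nose-down trim whenever the single AoA input looks too high."""
    if single_aoa_reading_deg > AOA_TRIGGER_DEG:
        return current_trim + TRIM_STEP_NOSE_DOWN
    return current_trim

trim = 0.0
faulty_sensor_deg = 40.0     # sensor stuck at a falsely high value
for _ in range(5):           # each pass trims the aircraft further nose-down
    trim = auto_trim_command(faulty_sensor_deg, trim)
print(trim)                  # -> -2.5, and it would keep going
```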

At that point disaster could still have been prevented if the crew had been familiar with the MCAS, or with the drill for a runaway stabiliser trim. But the MCAS would not have been expected to trigger at climb speeds during departure. The result was that in this case the crew failed to act as the final backup safety system.

In the months immediately following the Indonesian crash, some pilot associations in the USA whose members operate the Max publicly claimed that ignorance of the MCAS’s very existence was widespread among Max-qualified pilots, and that many pilots assumed a runaway trim could be dealt with in exactly the same way as on all the earlier 737 marques. Actually the drill is quite different for the Max, as Boeing and the FAA have pointed out. There is more detail on the MCAS in the preceding item in this blog – “This shouldn’t happen these days”.

Somehow, therefore, many 737 Max pilots in Boeing’s home territory had found themselves un-briefed on a system that was unique to the Max. They cited a lack of detail in the flight crew operations manual (FCOM), which described the system’s function but did not give it a name. US pilots who converted to the Max were all 737 type-rated and had flown the NG marque, but their conversion course to the Max consisted of computer-based learning, with no simulator time.

This ignorance among US pilots was soon corrected because the issue got plenty of intra-industry publicity, so if a US carrier crew suffered an MCAS malfunction they would have known to apply the runaway trim checklist and select the STAB TRIM switches to CUT OUT. Was this confidence about US crew knowledge the reason the FAA was able to maintain its sang-froid over grounding for longer than the rest?

On the other hand it is not a good principle to use a pilot as the back-up for a system that is not fail-safe.

In the 1990s there were several serious fatal accidents to 737s caused by what became known as “rudder hard-over”. This was a sudden, uncommanded move of the rudder to one extreme or the other, rendering the aircraft out of control, and unrecoverable if it happened at low altitude. The problem was ultimately solved by redesigning the rudder power control unit, for which there was no backup, thus no fail-safe.

If a Boeing product has a fault the responsibility is Boeing’s, but it is equally the FAA’s. The FAA is the safety overseer, and should satisfy itself that all critical systems are fail-safe and that the manufacturer has proven this through testing.

If America has an image it is that of the can-do, the entrepreneurial risk-taker. Why would Boeing or the FAA be different? One of the FAA’s stated values is this: “Innovation is our signature. We foster creativity and vision to provide solutions beyond today’s boundaries.”

The world has benefited from the USA’s risk-taking culture, which has driven some aviation advances faster than they would have occurred in more risk-averse cultures like that of Western Europe. An example is the massive extension of ETOPS (extended-range twin-engine operations) with the arrival on the market of the Boeing 777, which ultimately drove the four-engined Airbus A340 out of the market and influenced the early close-down of the A380 line. Boeing and the FAA took the risk together, and together they got away with it.

Is the 737 Max going to prove to be the one Boeing didn’t get away with? Time will tell.

But it is certain that Boeing will find a fix that will get the Max back in the sky. And although this episode, if it runs the course it seems likely to follow, will damage Boeing, the damage will be far from terminal. The company has an unbreakable brand name by virtue of being so good for so long, but trust will have suffered.

In the world at large, the art and science of safety oversight is changing dramatically. Technology is advancing so fast that the traditional system of close oversight by the regulator cannot work without stifling innovation, so “Performance-Based Regulation” (PBR) is the new watchword. Basically this means that the regulator prescribes what performance and reliability objectives a system or piece of equipment should meet, and the manufacturer has to prove to the regulator that it meets them. This is fine, providing that the regulator insists on the testing and the proof, and has the expertise and resources to carry out the oversight.

Although a lack of oversight resources in the FAA seems unlikely, it would be a global disaster if it occurred. The same would be true of other national aviation agencies (NAAs) in countries where aviation manufacturing takes place.

That risk of under-resourcing NAAs is a serious worry for the future, because all the signs are that most countries consider it a very low political priority, especially at a time of budget austerity.


7 thoughts on “What the Max story says about safety oversight today”

  1. There is a vast difference between a rudder hard-over, which crews are not trained to deal with and have no means of countering, and a stab trim runaway, the actions for which are a required memory item on the 737. As a professional pilot I don’t think it unreasonable to expect that pilots of the 737 should be able to react to MCAS by shutting the stab trim off.

    I think it’s also worth noting that pilots transitioning from the 320 CEO to the NEO are getting all of their training via iPad as well. Frankly, it’s all that’s required.

    There is no doubt that the MAX differences course should have addressed MCAS and the changes in how a stab trim runaway can be disconnected. But even without that training the Lion Air crew should have been able to handle it. Other crews flying that same airplane did. And the Ethiopian crew that had been trained on MCAS has absolutely no excuse.

    The real issue isn’t a problem with the MAX. It’s a problem with those crews who for reasons unknown were unable to do what they certainly should have been able to do.


  2. Would the consequences have been different if the signal had been sent to the elevators instead of the stabiliser, taking into consideration that the aircraft was flying at low altitude and low speed, and the instinctive reaction of the flight crew to act on the primary flight controls?

    Also worth considering is the greater effect the larger stabiliser has compared with the elevators at low speed and low altitude.


  3. Very interesting. As an ex-design signatory from the helicopter industry, I find it alarming that Boeing and the FAA should somehow manage to certify a safety-critical system that is neither duplex nor triplex, and then not feel it necessary to detail the function, workings, failure modes and corrective actions in the Pilots’ Operating Handbook. This is exactly the sort of issue that an old-fashioned Technical Director, with a reporting line to the Board independent from the Programme Team, would have looked for and prevented.

    (Duplex tells you that the sensors disagree, but not which one is in error – still it alerts you to look into the issue. Triplex should leave you with two sensors that agree and which are therefore (probably) correct, allowing one lane to be disabled. Fully unstable aircraft (such as modern fighters) tend to have quadruplex systems to provide the necessary redundancy in their primary control systems).

