What the Max story says about safety oversight today

Yesterday the US Federal Aviation Administration joined most of the rest of the aviation world in grounding the Boeing 737 Max series of aircraft, the very latest version of the established 737 series. What took it so long?

Having entered service in May 2017, by early March this year the Max had suffered two fatal crashes within five months. This is extraordinary for a new commercial airliner today.

Evidence from the preliminary report on the earlier of the two accidents suggests a technical failure precipitated it. The first event, in October 2018, involved a nearly-new 737 Max 8 belonging to Indonesian carrier Lion Air. It crashed into the sea near Jakarta within about 10min of take-off. The second accident, on 10 March this year, involved an Ethiopian Airlines aircraft of the same type, and it plunged into the ground within six minutes of take-off from Addis Ababa. Pilots of both aircraft radioed that they were having trouble controlling the aircraft’s height, and this was evident on flight tracking systems.

The FAA issued its grounding order on 13 March. This was three days after the Ethiopian crash, two days after China, Ethiopia and Singapore had banned Max operations, and a day later than the influential European Aviation Safety Agency – and many other states – had done the same.

Does this demonstrate that there are different safety standards – or safety philosophies – in different countries? Or does it suggest that the relationship – in this case – between the safety regulator and the manufacturer is too close?

On 12 March, resisting calls to ground the aircraft, the FAA said: “Thus far, our review shows no systemic performance issues and provides no basis to order grounding the aircraft.”

The next day it stated: “The FAA is ordering the temporary grounding of Boeing 737 MAX aircraft operated by U.S. airlines or in U.S. territory. The agency made this decision as a result of the data gathering process and new evidence collected at the site [of the Ethiopian crash] and analyzed today. This evidence, together with newly refined satellite data available to FAA this morning, led to this decision.”

The safety principle behind aircraft design, for more than half a century, has been that all systems should “fail safe”. This means that any one critical system or piece of equipment, if it fails, will not directly cause an accident. This is achieved either by multiplexing critical systems so there is backup if one of them fails, or by ensuring that the failure does not render the aircraft unflyable.
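
As a rough illustration of the multiplexing idea, and not a description of how any real aircraft system is built, the sketch below votes across three redundant sensor readings and flags any channel that disagrees with the median. The function name, the three-channel layout and the 5.0-degree disagreement threshold are all assumptions made for the example.

```python
from statistics import median


def vote(readings, max_disagreement=5.0):
    """Return (voted_value, suspect_indices) for redundant sensor readings.

    Hypothetical example: the threshold and the channel count are
    illustrative assumptions, not values from any real aircraft.
    """
    if len(readings) < 3:
        raise ValueError("median voting needs at least three channels")
    # The median ignores a single wild reading, high or low.
    voted = median(readings)
    # Any channel far from the voted value is flagged as suspect.
    suspects = [i for i, r in enumerate(readings)
                if abs(r - voted) > max_disagreement]
    return voted, suspects


# One channel fails high; the voted value still follows the healthy pair.
value, suspects = vote([4.8, 5.1, 22.5])
print(value, suspects)
```

With only one channel there is nothing to vote against, so a single bad reading passes straight through to whatever acts on it. That is the weakness the fail-safe principle is meant to design out.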

The preliminary report from the Indonesian accident investigator NTSC suggests that a factor in the sequence of events leading to it was a faulty angle of attack (AoA) sensor. This device, says the report, sent false signals to a new stall protection system unique to the Max series of 737s, known as the manoeuvring characteristics augmentation system (MCAS). According to the report, these signals wrongly indicated a very high AoA, and the MCAS triggered the horizontal stabiliser to trim the aircraft nose-down. Finally, the crew seems not to have known how to counteract this nose-down control demand.

The implication of the NTSC report – not the final verdict – is that the MCAS was not designed according to fail-safe principles: a single unit failed, causing a software-controlled automatic system to motor the powerful horizontal stabiliser to pitch the aircraft nose-down, and it kept on doing this until the crew could not overcome the pitch-down force with elevator.

At that point disaster could still have been prevented if the crew had been familiar with the MCAS, or with the drill for a runaway stabiliser trim. But the MCAS would not have been expected to trigger at climb speeds during departure. The result was that in this case the crew failed to act as the final backup safety system.

In the months immediately following the Indonesian crash, some pilot associations in the USA whose members operate the Max publicly claimed that there was widespread ignorance among Max-qualified pilots of the very existence of the MCAS, and that many assumed a runaway trim could be dealt with in exactly the same way as on all the earlier 737 marques. Actually the drill is quite different for the Max, as Boeing and the FAA have pointed out. There is more detail on the MCAS in the preceding item in this blog – “This shouldn’t happen these days”.

Somehow, therefore, many 737 Max pilots in Boeing’s home territory had found themselves un-briefed on a system that was unique to the Max. They claimed lack of detail in the flight crew operations manual (FCOM), which described the system’s function but did not give it a name. US pilots who converted to the Max were all 737 type-rated and had flown the NG marque, but their conversion course to the Max consisted of computer-based learning, with no simulator time.

This ignorance among US pilots was soon corrected because the issue got plenty of intra-industry publicity, so if a US carrier crew had suffered an MCAS malfunction it would have known to apply the runaway trim checklist and select the STAB TRIM switches to CUT OUT. Was this confidence about US crew knowledge the reason the FAA was able to maintain its sang-froid over grounding for longer than the rest?

On the other hand, it is not a good principle to use a pilot as the back-up for a system that is not fail-safe.

In the 1990s there were several serious fatal accidents to 737s caused by what became known as “rudder hard-over”. This was a sudden, uncommanded move of the rudder to one extreme or the other, rendering the aircraft out of control, and unrecoverable if it happened at low altitude. The problem was ultimately solved by redesigning the rudder power control unit, for which there was no backup, thus no fail-safe.

If a Boeing product has a fault the responsibility is Boeing’s, but it is equally the FAA’s. The FAA is the safety overseer, and should satisfy itself that all critical systems are fail-safe and that the manufacturer has proven this through testing.

If America has an image it is that of the can-do, the entrepreneurial risk-taker. Why would Boeing or the FAA be different? One of the FAA’s stated values is this: “Innovation is our signature. We foster creativity and vision to provide solutions beyond today’s boundaries.”

The world has benefited from the USA’s risk-taking culture, which has driven some aviation advances faster than they would have occurred in other more risk-averse cultures like that of Western Europe. An example of this is the massive extension of ETOPS (extended-range twin-engine operations) with the arrival on the market of the Boeing 777, which ultimately drove the four-engined Airbus A340 out of the market and influenced the early close-down of the A380 line. Boeing and the FAA took the risk together, and together they got away with it.

Is the 737 Max going to prove to be the one Boeing didn’t get away with? Time will tell.

But it is certain Boeing will find a fix that will get the Max back in the sky. And although this episode, if it runs the course it seems likely to follow, will damage Boeing, the damage will be far from terminal. The company has an unbreakable brand name by virtue of being so good for so long, but trust will have suffered.

In the world at large, the art and science of safety oversight is changing dramatically. Technology is advancing so fast that the traditional system of close oversight by the regulator cannot work without stifling innovation, so “Performance-Based Regulation” (PBR) is the new watchword. Basically this means that the regulator prescribes what performance and reliability objectives a system or piece of equipment should meet, and the manufacturer has to prove to the regulator that it meets them. This is fine, providing that the regulator insists on the testing and the proof, and has the expertise and resources to carry out the oversight.

Although lack of oversight resources in the FAA seems unlikely, it would be a global disaster if it occurred. The same would be true of other national aviation agencies (NAAs) in countries where aviation manufacturing takes place.

That risk of under-resourcing NAAs is a serious worry for the future, because all the signs are that most countries consider it a very low political priority, especially at a time of budget austerity.

 

Flydubai FZ981

A Boeing 737-800 attempts to land in windy weather in the small hours of the morning at Rostov-on-Don, Russia, on a runway approach notorious for its windshear.

The crew fails to stabilise the aircraft on its first approach either because of windshear, or because it fails to make visual contact with the runway lights in time for a safe landing, and decides to climb away and circle, waiting for an improvement in the weather.

On its second attempt to approach the same runway – 22 – using a category 1 instrument landing system for guidance, it crashes short of the runway. There was no emergency call.

But this is no ordinary crash of the type that would have occurred if the crew – now under pressure to land because fuel is getting low – had made the decision to continue the descent through decision height, despite not being able to see the runway. If that had been the case, large sections of the aircraft would have remained intact.

This aircraft hit the ground about 300m short of the runway 22 threshold with such force it was shattered into tiny pieces which were scattered across the airfield. How could that happen?

Information from flight tracking service Flightradar24 suggests that the crew also abandoned this second approach, climbing away, but that the aircraft then disappeared from tracking.

On 17 November 2013 a Tatarstan Airlines Boeing 737-500, en route from Moscow to Kazan, abandoned a poorly executed night approach at its destination airport, applying full power for a go-around. The nose pitched up to 25deg and the speed rapidly dropped because of the steep climb. The crew, becoming disorientated, pushed the nose down hard, putting the aircraft into a dive at an angle of 75deg just before impact. The aircraft was shattered.

On 12 May 2010 an Afriqiyah Airways Airbus A330-200 carried out a go-around from the approach to Tripoli airport’s runway 09 at dawn. The crew lost control because of disorientation and the aircraft crashed. There was one survivor among the 104 on board.

There have been many documented cases of crews nearly losing control when carrying out an all-engines-operating go-around.

This does not pretend to be the definitive answer to what happened to Flydubai flight FZ981 on 19 March, but it does pose the question as to what kind of event could cause the wreckage to be so badly fragmented.

 

Sinai A321 crash

The Russian Metrojet aircraft lost in north-central Sinai today was a leased Airbus A321 that entered service 18 years ago. Its reported passenger load was 224 people, which means its cabin was full or nearly full.

It had left the southern Sinai coastal resort town of Sharm el-Sheikh heading north for its destination, St Petersburg in Russia. Its route took it across Sinai – where the weather was good – and it would have continued northward over Cyprus and Turkey.

According to commercial flight tracking service Flightradar24, the aircraft was seen to suffer a disturbance which caused rapid variations in its speed and height, reducing the speed to 60kt at one point, which would put it into a deep stall condition unless the crew acted rapidly to recover speed again. Then the aircraft developed a high rate of descent, about 5,000ft per minute, and the position, height and speed information from the aircraft’s transponder was lost.

Flightradar24’s information about the Germanwings aircraft lost in the French Alps earlier this year proved to be highly accurate. Ahead of official information from the investigators, it became evident that the A320 had begun what looked like a deliberate descent to impact, and so it subsequently proved.

In this case the information is more complex because of the apparent speed and height variations that preceded the fatal descent.

The Egyptian authorities have been quick to rule out terrorist action in the form of sabotage or a missile strike, but it is too soon to rule anything out. Sharm el-Sheikh is an important Egyptian tourist resort, and any suggestion of security breaches affecting travellers there would be harmful to trade.

The aircraft was cruising at 31,000ft, a height at which it would be safe from the kind of man-portable missiles that terrorists in the area could obtain fairly easily, but it was 2,000ft lower than the Malaysia Airlines Boeing 777 that was shot down over eastern Ukraine last year by a more powerful ground-launched missile.

Early information suggests the aircraft came down in one piece and broke up on impact, making the missile strike theory less likely. On-board sabotage, however, does not have to break an aircraft up in order to damage its controllability.

So at this point it is certain that the aircraft suffered a serious upset during the cruise, but there is no indication why that occurred.

That A400M fatal crash

The big military transport aircraft, not long off the production line and bound soon for the Turkish Air Force, crashed shortly after take-off from Seville San Pablo airfield.

Airbus Military said four of its test crew were killed and two severely injured. All six are Spanish.

It was a warm day with good conditions. So why?

My struggle with this tragic event is that it is such a surprise. The A400M is a heavily-tested type, not just airborne-tested but tried and stressed for years on the manufacturer’s “Iron Bird” racks. There should be no surprises.

Nowadays new Boeings and Airbuses don’t crash during a normal take-off unless something really unusual and therefore unexpected goes wrong. What was it?

They’ll soon tell us.

The aeroplane is a good one and will do well. Airbus Military will survive this. The families are the ones I feel sorry for.