Max to the skies again

After nearly two years of grounding, Boeing’s 737 Max series has been cleared by the US Federal Aviation Administration to carry fare-paying passengers once again.

This is the first step in a redemption process for one of the world’s truly great engineering companies. Like a boxer who dropped his guard for just a second, Boeing has taken a punch that has knocked it to the canvas, and the referee has started counting.

Now, air traveller reaction is nervously awaited. Will the public believe claims by the FAA and Boeing that, together, they have confined to history the flaws that caused the 737 Max fatal crashes in 2018 and 2019?

The FAA – blamed along with the manufacturer for the lapses in design oversight that led to the two accidents – has declared the aircraft safe to operate in America. One by one, other national aviation authorities (NAAs) are expected to follow suit.

Oversight of the type’s rehabilitation continues to be the FAA’s responsibility, but decisions on the systems and software changes applied to the Max have been made by multinational teams. Bodies formed to decide what changes were needed – and then to see them implemented – included the Joint Authorities Technical Review (JATR), representing nine nations plus the European Union Aviation Safety Agency (EASA), and the Joint Operations Evaluation Board (JOEB).

The relationship between the FAA and Boeing was much criticised in the accident investigations and the JATR review process. For that reason, the reaction of EASA to the Max’s clearance to fly is seen as critical.

Not only is EASA the agency that oversees safety in the region containing the largest group of aerospace industries outside America, but its contribution to the JATR recommendations made clear EASA was not happy with the FAA’s former piecemeal approach to certifying critical changes applied to the 737 Max.

Its opprobrium was directed particularly at the FAA’s approval of the flawed Manoeuvring Characteristics Augmentation System (MCAS), unique to the Max, and not used in earlier marques of 737. It recommended “a comprehensive integrated system-level analysis” of the MCAS, and of its integration into the total system-of-systems that constitutes a modern aircraft (for more detail, see “The Failures and the Fixes” section following this article).

So it was with heartfelt relief that Boeing heard EASA’s executive director, Patrick Ky, report on Max progress to the European Parliament Transport Committee on 29 October. Ky told them: “We are fully confident that, given all the work that has been performed, and the assessments which have been done, the aircraft can be returned safely to service.” Ky’s statement suggests EASA will re-certificate the 737 Max in Europe soon after the FAA’s announcement.

Meanwhile, out in the real world, Covid-19’s near-immobilisation of commercial air transport worldwide has rendered the Max’s long grounding almost invisible to the media and the public. Because of the far lower level of air travel activity, the airlines have been able to live without the 387 Maxes already delivered to them, and also without the additional 450 that have rolled off Boeing’s Renton, Washington production line since then. The latter are all in storage, awaiting any updates not already incorporated, and ultimate delivery.

Although clearance to fly has now been delivered, even in the USA the airlines will not instantly be re-launching their already-owned 737 Max fleets. The status of all the proposed software and hardware modifications to the type will not have been confirmed until the moment the FAA signs it all off.

American Airlines has said it hopes to start getting its Max fleet airborne before the end of December.

Once the FAA has done that, getting the Max fleet ready for the sky will be an aircraft-by-aircraft, crew-by-crew process. In many airframes, a knowledge of what changes were coming has enabled a great deal of the work to be done. But also, because of the hardware and software changes to the Max, the crews have to be trained to use the new systems.

Incidentally, while the Max series was grounded, the FAA decided to order some additional modifications – completely unrelated to the crashes – to bring the type fully in line with modern safety regulations. For example, one of these involves the re-routeing and separation of wiring looms that the 737 had previously been allowed to sidestep under “grandfather” rules.

The number of lessons for manufacturers and regulators to learn from this aerospace drama is legion.

The failures and the fixes

The failures

Just a reminder: the 737 Max series fleet was grounded in March last year as a result of findings from the investigations into the Lion Air and Ethiopian Airlines fatal crashes, respectively in October 2018 and March 2019.

The primary causal factor of the Lion Air Max crash was erroneous triggering of its manoeuvring characteristics augmentation system (MCAS) by a faulty angle of attack (AoA) sensor, according to the Indonesian final accident report. It is at the MCAS that Boeing’s corrective efforts have mostly been directed.

In both the accidents, the aircraft’s AoA sensor that feeds data to the MCAS wrongly indicated a very high AoA soon after take-off. The system reacted by providing nose-down stabilizer rotation that took the pilots by surprise. They did not understand the reason it kicked in, and their efforts to reverse the strong nose-down pitch did not succeed. Both these events occurred soon after take-off, and because the MCAS kept repeating the nose-down stabilizer in response to the continued erroneous high AoA sensor signal, the loss of height quickly resulted in impact with the surface.

During the examination of all the issues arising from the accidents, the JOEB was aware there were solutions to the situation in which the crews found themselves. But the fact that two crews in different regions of the world were so confused by what the MCAS was doing that they lost control had totally eclipsed pilot failings as the main issue.

MCAS was designed to trigger only in a specific flight configuration that causes the Max’s centre of lift to move slightly further forward, delivering a slight nose-up moment that can be countered by flight controls. This configuration is a combination of relatively low airspeed, flaps up, with the aircraft being flown manually. In the case of the Lion Air and Ethiopian flights, the pilots decided to continue to fly the aircraft manually during the early climb, rather than engaging the autopilot, so this precise flight configuration was encountered as soon as the flaps were fully retracted.

With flaps up, and still at a fairly low airspeed, the aircraft would be at a high angle of attack, and not far above the stall. FAA regulations require that, in the proximity to the stall, one of the “feel” cues to the pilots is that there should be a linear increase in the required control column force versus elevator displacement response, but the Max’s aerodynamics in this configuration had negated this effect, and MCAS was designed to restore that pilot cue automatically.

The JATR decided that MCAS’ fatal design weakness, above all, was that it was triggered by a single AoA sensor with no backup in case the unit had a fault or suffered damage. It seems Boeing and the FAA had overlooked that possibility, and had not explored the potential effects of erroneous inputs. Their excuse at the time was that the system was not seen as a critical one, rather as a refinement.

The fixes

The 737 Max had always been fitted with two AoA vanes, but originally only one was wired up to MCAS, and there was no flight deck indication of a disparity between the two sensors if a difference developed, which could have warned the pilots of a potential vane fault.

The hardware fix agreed by the JATR was that both AoA sensors would now feed into the MCAS, there would be an automatic comparison between them, and if there was more than a small disparity the MCAS would be locked out completely, because the aircraft can be flown without it.

The software fix also ensures that – now – the MCAS only operates once per high AoA event, so the repeated nose-down pitch demand by the stabilizers that led to the two accidents would not occur. In addition, the two flight control computers (FCC) now continuously cross-monitor each other.
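The logic described in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration of the principle, not Boeing's implementation: the class name `McasMonitor`, the two threshold values, and the choice to require both vanes to read high are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical illustrative thresholds; not taken from Boeing or the FAA.
DISAGREE_LIMIT_DEG = 5.0   # maximum tolerated difference between the two vanes
HIGH_AOA_DEG = 14.0        # AoA at or above which a "high AoA event" is assumed


@dataclass
class McasMonitor:
    locked_out: bool = False        # set permanently once the vanes disagree
    fired_this_event: bool = False  # True after one activation in the current event

    def step(self, aoa_left: float, aoa_right: float) -> bool:
        """Return True if one nose-down trim command should be issued now."""
        if self.locked_out:
            return False
        # Compare the two sensors; lock the system out on a large disparity.
        if abs(aoa_left - aoa_right) > DISAGREE_LIMIT_DEG:
            self.locked_out = True
            return False
        # Assumption: both vanes must read high for the event to count.
        if min(aoa_left, aoa_right) < HIGH_AOA_DEG:
            self.fired_this_event = False  # event over, re-arm for the next one
            return False
        # Operate only once per high-AoA event, never repeatedly.
        if self.fired_this_event:
            return False
        self.fired_this_event = True
        return True
```

In this sketch a single faulty vane reading, say 22° against 4°, produces no trim command at all and disables the function for the rest of the flight, while a genuine high-AoA condition produces exactly one command until the angle of attack has dropped back below the threshold.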

After the hardware and software changes, the final improvements – overseen by the multinational JOEB – are to pilot training and cockpit drills for the Max series.

Now, even if they are coming to the Max from the very similar 737NG series, pilots must undergo a one-off training session in a Max full flight simulator. This involves recovery from a full stall, dealing with a runaway stabilizer, practising manual trimming at high speeds (and therefore high trim loads), and crew cooperation on all these exercises.

Non-normal checklists have now been completely revised, and contain updated procedures that concentrate particularly on the operation of the horizontal stabilisers and trim controls, both in normal operation and in the case of all potential faults. The drills deal with runaway stabilizer, speed-trim failure, stabilizer out of trim, stabilizer trim inoperative, airspeed unreliable, altitude disagree, and AoA disagree.

Computer based training (CBT), containing video of crew exercises using the real controls, teaches drills for the following: airspeed unreliable, runaway stabilizer, the speed trim system, trim controls, and differences between the autopilot flight director system (AFDS) in the NG series and the Max series.

Testing the changes

Boeing and the FAA say they have put in 391,000 engineering and test hours developing the solutions, which have then been tried for 1,847 hours in simulators and for 3,000 airborne hours in the real aircraft.

The Max crux

Boeing, the FAA, and national aviation authorities (NAAs) from several other countries, met in Dallas on 23 May to consider the future of the 737 Max series of aircraft.

It is impossible to overstate how important this meeting is. The way civil aircraft manufacturing does business, not just in America, but all over the world, is under scrutiny.

Detail gradually emerging from Boeing and the FAA following the two 737 Max fatal crashes has upset such basic assumptions about the way modern aviation works that industry veterans – whose initial reaction was that this was just a case of finding a fix and getting the Max airborne again – are, only now, fully realising it’s not.

Like the Looney Tunes cartoon characters who ran over a cliff they didn’t know was there, we didn’t begin to fall until we looked down.

Let’s examine the proposal that all airliners nowadays are massively computerized, so adding some digital controls to the good old 737 to make it a Max is just bringing the 737 marque up to date.

After all, digital controls work on other types like Airbuses and Boeing’s own 777 and 787, and they are safe, so why not on the 737?

Back to basics.

All modern commercial airliners are supposed to be designed, in the first place, so they fly easily and intuitively, and have a natural aerodynamic stability within their flight envelope. That should hold true with or without computer control.

Designing an aircraft to be fly-by-wire, rather than conventionally controlled, can provide additional safeguards, but the airframe itself should still fly naturally.

Applying a digital solution to an airframe-related flight characteristic that is undesirable is a different matter entirely; but that is what Boeing chose to do when it installed the Manoeuvring Characteristics Augmentation System (MCAS) in the new Max.

The fact – revealed by the fatal accidents – that the MCAS could be triggered when it was not needed, and what consequences might follow its triggering, appears not to have been examined in any depth by Boeing or the FAA.

The fundamental questions for the FAA – and the foreign NAAs – are these: is the Max, as a simple airframe without digital corrections, sufficiently stable within its flight envelope to satisfy the regulators it is worthy of certification?

If not, is a digital fix sufficient to cover the undesirable flight characteristics lurking in a corner of its flight envelope? How reliable does the fix have to be to win approval?…and how can its reliability be proven?

For three decades the aviation world has agreed to operate a regime whereby the NAAs in countries where aircraft are manufactured all use the same standards when they certificate a new aircraft. So when the FAA certificated the 737 Max, the rest of the world accepted the FAA’s judgement and did not insist – as in the bad old days of the 1970s and before – on re-certificating it country by country.

What if, in this case, the FAA re-certificates the MCAS-modified Max, but foreign NAAs do not? The European Cockpit Association today has called on the European Union Aviation Safety Agency to scrutinize any FAA approvals, and EASA has pledged to do so. Is this “back to the bad old days”?

At the end of the Dallas meeting Boeing had this to say: “We appreciate the FAA’s leadership…in bringing global regulators together to share information and discuss the safe return to service of the 737 MAX….Once we have addressed the information requests from the FAA, we will be ready to schedule a certification test flight and submit final certification documentation.”

Industry speculation as to when the FAA will be ready to approve return to service varies massively, from a week to many months. These seers also seem to be preparing themselves for disagreement between the FAA and foreign NAAs.

This is the point at which you dare not look down.

 

What the Max story says about safety oversight today

Yesterday the US Federal Aviation Administration joined most of the rest of the aviation world in grounding the Boeing 737 Max series of aircraft, the very latest version of the established 737 series. What took it so long?

Having entered service in May 2017, by early March this year the Max had suffered two fatal crashes within five months. This is extraordinary for a new commercial airliner today.

Evidence from the preliminary report on the earlier of the two accidents suggests a technical failure precipitated it. The first event, in October 2018, involved a nearly-new 737 Max 8 belonging to Indonesian carrier Lion Air. It crashed into the sea near Jakarta within about 10min of take-off. The second accident, on 10 March this year, involved an Ethiopian Airlines aircraft of the same type, and it plunged into the ground within six minutes of take-off from Addis Ababa. Pilots of both aircraft radioed that they were having trouble controlling the aircraft’s height, and this was evident on flight tracking systems.

The FAA issued its grounding order on 13 March. This was three days after the Ethiopian crash, two days after China, Ethiopia and Singapore had banned Max operations, and a day later than the influential European Aviation Safety Agency – and many other states – had done the same.

Does this demonstrate that there are different safety standards – or safety philosophies – in different countries? Or does it suggest that the relationship – in this case – between the safety regulator and the manufacturer is too close?

On 12 March, resisting calls to ground the aircraft, the FAA said: “Thus far, our review shows no systemic performance issues and provides no basis to order grounding the aircraft.”

The next day it stated: “The FAA is ordering the temporary grounding of Boeing 737 MAX aircraft operated by U.S. airlines or in U.S. territory. The agency made this decision as a result of the data gathering process and new evidence collected at the site [of the Ethiopian crash] and analyzed today. This evidence, together with newly refined satellite data available to FAA this morning, led to this decision.”

The safety principle behind aircraft design, for more than half a century, has been that all systems should “fail safe”. This means that any one critical system or piece of equipment, if it fails, will not directly cause an accident. This is achieved either by multiplexing critical systems so there is backup if one of them fails, or by ensuring that the failure does not render the aircraft unflyable.

The preliminary report from the Indonesian accident investigator NTSC suggests that a factor in the sequence of events leading to it was a faulty angle of attack (AoA) sensor. This device, says the report, sent false signals to a new stall protection system unique to the Max series of 737s, known as the manoeuvring characteristics augmentation system (MCAS). According to the report, these signals wrongly indicated a very high AoA, and the MCAS triggered the horizontal stabiliser to trim the aircraft nose-down. Finally, the crew seems not to have known how to counteract this nose-down control demand.

The implication of the NTSC report – not the final verdict – is that the MCAS was not designed according to fail safe principles: a single unit failed, causing a software-controlled automatic system to motor the powerful horizontal stabiliser to pitch the aircraft nose-down, and it kept on doing this until the crew could not overcome the pitch-down force with elevator.

At that point disaster could still have been prevented if the crew had been familiar with the MCAS, or with the drill for a runaway stabiliser trim. But the MCAS would not have been expected to trigger at climb speeds during departure. The result was that in this case the crew failed to act as the final backup safety system.

In the months immediately following the Indonesian crash some pilot associations in the USA whose members operate the Max publicly claimed that there was a widespread ignorance among Max-qualified pilots of the very existence of the MCAS, and that many assumed a runaway trim could be dealt with in exactly the same way as it was for all the earlier 737 marques. Actually the drill is quite different for the Max, as Boeing and the US Federal Aviation Administration (FAA) have pointed out. There is more detail on the MCAS in the preceding item in this blog – “This shouldn’t happen these days”.

Somehow, therefore, many 737 Max pilots in Boeing’s home territory had found themselves un-briefed on a system that was unique to the Max. They claimed lack of detail in the flight crew operations manual (FCOM), which described the system’s function but did not give it a name. US pilots who converted to the Max were all 737 type-rated and had flown the NG marque, but their conversion course to the Max consisted of computer-based learning, with no simulator time.

This ignorance among US pilots was soon corrected because the issue got plenty of intra-industry publicity, so if a US carrier crew suffered an MCAS malfunction it would have known to apply the runaway trim checklist, and select the STAB TRIM switches to CUT OUT. Was this confidence about US crew knowledge the reason the FAA was able to maintain its sang-froid over grounding for longer than the rest?

On the other hand it is not a good principle to use a pilot as the back-up for a system that is not fail-safe.

In the 1990s there were several serious fatal accidents to 737s caused by what became known as “rudder hard-over”. This was a sudden, uncommanded move of the rudder to one extreme or the other, rendering the aircraft out of control, and unrecoverable if it happened at low altitude. The problem was ultimately solved by redesigning the rudder power control unit, for which there was no backup, thus no fail-safe.

If a Boeing product has a fault the responsibility is Boeing’s, but it is equally the FAA’s. The FAA is the safety overseer, and should satisfy itself that all critical systems are fail-safe and that the manufacturer has proven this through testing.

If America has an image it is that of the can-do, the entrepreneurial risk-taker. Why would Boeing or the FAA be different? One of the FAA’s stated values is this: “Innovation is our signature. We foster creativity and vision to provide solutions beyond today’s boundaries.”

The world has benefited from the USA’s risk-taking culture which has driven some aviation advances faster than they would have occurred in other more risk-averse cultures like that of Western Europe. An example of this is the massive extension of ETOPS (extended-range twin-engine operations) with the arrival on the market of the Boeing 777, which ultimately drove the four-engined Airbus A340 out of the market and influenced the early close-down of the A380 line. Boeing and the FAA took the risk together, and together they got away with it.

Is the 737 Max going to prove to be the one Boeing didn’t get away with? Time will tell.

But it is certain Boeing will find a fix that will get the Max back in the sky. And although this episode, if it runs the course it seems likely to follow, will damage Boeing, the damage will be far from terminal. The company has an unbreakable brand name by virtue of being so good for so long, but trust will have suffered.

In the world at large, the art and science of safety oversight is changing dramatically. Technology is advancing so fast that the traditional system of close oversight by the regulator cannot work without stifling innovation, so “Performance-Based Regulation” (PBR) is the new watchword. Basically this means that the regulator prescribes what performance and reliability objectives a system or piece of equipment should meet, and the manufacturer has to prove to the regulator that it meets them. This is fine, providing that the regulator insists on the testing and the proof, and has the expertise and resources to carry out the oversight.

Although lack of oversight resources in the FAA seems unlikely, it would be a global disaster if it occurred. The same would be true of other national aviation agencies (NAAs) in countries where aviation manufacturing takes place.

That risk of under-resourcing NAAs is a serious worry for the future, because all the signs are that most countries consider it a very low political priority, especially at a time of budget austerity.

 

This shouldn’t happen these days

In the last five years, statistics for fatal accidents to commercial passenger jets were so low they looked set to prove that a permanent zero fatal accident target was achievable.

Technology is accepted to be the main contributor to these remarkable safety performance improvements. The superb engineering and smart systems in the latest jets made them as different from their predecessors as today’s generation of automobiles is from cars of the 1970s.

But, on 29 October 2018, Lion Air flight JT610 crashed only about 12min after take-off from Jakarta, Indonesia. The aircraft was a Boeing 737 Max 8 that was delivered by the manufacturer to the airline less than three months before, one of 11 of this new marque in its fleet.

That was a shock, but when on 10 March this year another almost new 737 Max 8 also crashed within a few minutes of take-off from Addis Ababa, Ethiopia under circumstances that appear similar, a chill went through the entire aviation community.

Ethiopian Airlines has grounded its 737 Max fleet, Singapore has banned Max operations in its airspace, and the Chinese aviation authority CAAC has grounded all Maxes registered there – almost sixty of them. And on 12 March Australia, Ireland, France, Germany and the UK added themselves to the rapidly growing list of those who had banned operation of the type. Late on 12 March the biggest blow fell: European Union body the European Aviation Safety Agency banned all 737 Max 8s and 9s from its skies except to fly, empty, to maintenance bases. The agency argued that it cannot be ruled out that the Ethiopian accident was caused by the same failure as that which appears to have caused the Lion Air crash. And, shortly before midnight, India joined the doubters.

Now Latin America has begun a wave of groundings and, as a result, by the end of the Western European day on 12 March more than a third of all Maxes in service around the world had been affected by effective groundings. There has never been an event like this, where the original certificating authority has declared an aircraft airworthy but much of the rest of the world has decided it is not so confident.

Back to the accident issues. The two take-off airports couldn’t have been more different, one at sea level, the other at an elevation of more than 7,000ft, but in both cases it was daylight and the weather conditions were benign.

Both aircraft were seen to dive to impact.

The Indonesian investigator (NTSC) issued a preliminary factual report that doesn’t pretend to provide a verdict on the cause of the Lion Air crash, but suggests that a factor in the sequence of events leading to it was a faulty angle of attack (AoA) sensor. This device, says the report, sent false signals to a new stall protection system unique to the Max series of 737s, known as the manoeuvring characteristics augmentation system (MCAS). According to the report, these signals wrongly indicated a very high AoA, and the MCAS triggered the horizontal stabiliser to trim the aircraft nose-down. The crew seems not to have known how to counteract this nose-down control demand.

The NTSC did, however, provide fine detail about malfunctions on the same airframe on the previous day (28 October), when almost exactly the same sequence of events occurred, including the signal from the faulty AoA sensor to the MCAS. But on that occasion the captain stopped the nose-down stabiliser trim rotation by selecting the STAB TRIM switches to CUT OUT, and then proceeded safely to the scheduled destination.

Some pilot associations in the USA whose members operate the Max have professed publicly that there was a widespread ignorance among Max-qualified pilots of the very existence of the MCAS, and also among them was an assumption that a runaway trim could be dealt with in exactly the same way as it was for all the earlier 737 marques. Actually the drill is different for the Max, as Boeing and the US Federal Aviation Administration (FAA) have pointed out.

The MCAS was developed for the Max because its more powerful engines are heavier and fitted further forward than those on earlier marques, affecting the aircraft’s centre of gravity and thus its behaviour at low speeds approaching the stall, so the manufacturer wanted to boost stall protection. It looks as if Boeing had either not foreseen the potential effect of a false high AoA indicator input to the MCAS, or it had failed to warn pilots clearly what that effect could be and how to react. The FAA also, it appears, had not anticipated this.

After the Lion Air crash the FAA put out an emergency airworthiness directive requiring operators of the Max to make clear to pilots the procedures for dealing with a runaway stabiliser trim. Boeing maintained that information was already available.

Pilots converting from earlier 737 marques to the Max are not required to undergo a new full type rating course or simulator sessions, because all 737s are deemed to have sufficient commonality to operate under the same type rating. Thus 737-rated pilots being prepared for the Max are required only to undergo a brief academic “differences course”. For example Southwest Airlines pilots had done their differences course entirely online, and American Airlines the same.

On 11 March, a day after the Ethiopian crash, the FAA revealed it has required Boeing to solve the software problem – and if applicable the hardware – that at present means that a false AoA input can trigger the MCAS stall protection when it is not needed, effectively causing a stabiliser pitch trim runaway. Meanwhile it has declared that the 737 Max series is airworthy.

But if it were to be found that there is a common cause of these two Max crashes – whatever that cause is determined to be – the implications for the manufacturer and the airlines are significant, given the massive size of the order book for 737 Max series aircraft.