NASA Johnson Space Center Oral History Project
Edited Oral History Transcript

David C. McGill
Interviewed by Rebecca Wright
Houston, TX – 22 May 2015

Wright: Today is May 22, 2015. This oral history session is being conducted with David McGill in Houston, Texas, as part of the JSC Oral History Project and for JSC’s Knowledge Management Office. Interviewer is Rebecca Wright, assisted by Jennifer Ross-Nazzal. Thank you for coming in and spending your Friday morning with us.

McGill: Thank you.

Wright: I’d like for you to start. If you would just briefly describe your background about how you became a part of NASA Johnson Space Center.

McGill: I graduated from Lamar University over in Beaumont [Texas] in ’68. There’s an interesting story that got me here. I’d been interested in the space program since we had one, and in electronics since I was four I think. That’s as far back as I can remember. But at the time I was interviewing I had several interviews with people like Western Electric up on the east coast. The head of the EE department—I have an electrical engineering degree—really felt like when companies came in for interviews their roster should be full. I saw that Philco was coming in, and I said, “I don’t really want to build refrigerators, air conditioners, it’s not really what I’m interested in.” But he believed that they ought to fill up, and so just because he asked me to, I went over and signed up.

It wasn’t till then I found out that they actually had the contract out here, and I said, “Whoa, that’s quite different than I thought.” So I certainly accepted an opportunity to come over and interview with them. When I found out how deeply rooted they were in the US manned space program I just couldn’t resist.

Wright: It was a good idea then, wasn’t it?

McGill: So that’s what got me over here in June of ’68. Now actually the first project that I worked on people probably don’t remember. But at that time the United States really had two manned space programs, the one everybody knows about where we were heading to the Moon, but the one they’ve forgotten about that was called Manned Orbiting Lab [MOL]. It was an Air Force program.

The first thing I got assigned to—mainly because I was the junior guy on the team, so the junior guy always gets to do all the travel—we were integrating together the systems that were to upgrade the Air Force remote tracking stations and also generally building the first version of what we now call the Blue Cube or the Satellite Test Center [Onizuka Air Force Station, California]. Those were certainly going to be used for a variety of things the Air Force wanted to do in space. But the real driver to upgrade those was the Manned Orbiting Lab.

That caused me to spend most of my time on the west coast. Did make it to a few of the tracking stations. Didn’t get to go to the Seychelles Islands one, which I always regret not having put a bug in that system before it got shipped. But it was a very interesting time. I look back on it, and if you got to pick a career path, I think that would be a great starting place for somebody who wants to design this stuff, because it was largely designed when I joined the team, and my job was to sit out there and try to get it all integrated together so it would work.

It certainly leaves indelible impressions on you about the implications of your design decisions. It really did start me out thinking you better design this stuff to integrate or you’re going to have huge problems when you try to put it back together and make it play. So I look back on that, it certainly was not by design, but I feel like I was very fortunate to have started out my career in aerospace by trying to integrate a system that somebody else mostly designed.

That lasted for about a year. The country then decided that the smart strategic position was to declare space civilian. So they canceled the Manned Orbiting Lab, the Air Force project, and put all the focus back on the civilian side and going to the Moon, and that’s what put me back on the Houston side, working on the NASA things. That’s a long answer to your question about how I got started in all of this.

Fortunately back in Houston I was given the opportunity to work on a variety of projects. To set the stage a little more I think of what ought to be our key topical thread today is looking at the architecture of the [Mission] Control Center, how it’s evolved. Probably there are not many of us around anymore that remember what the first one really looked like. Remember, this was a thing that was built 50 years ago, mid ’60s timeframe, no such thing as personal computers, no such thing as smartphones. A computer was a very big thing and it didn’t do very much.

The initial Control Center, which was in place when I got here, by that time it was about ’69. In fact I actually watched the Moon landing from Kodiak, Alaska. I was up there at the tracking station at the time. Got fussed at because we decided we’d swing the big antenna around and see if we could hear the LM [Lunar Module] on the Moon. The station commander didn’t think much of our plan. Almost had it too.

The Control Center, initial one, was driven by several interesting things. First, I think some of the best innovative engineering I’ve seen in my career was done by the people that built it. Unfortunately I can’t take credit for any of it. But we had nothing. There was virtually nothing in the commercial marketplace to help us get there. So it was all very clever custom hardware design. Even things today that you can’t imagine as being some hybrid of electronics and mechanical things. Like one of my favorite stories is the initial display system that drove those small monitors in the workstations. The first one was a thing we called the converter slide file. The way the thing worked is there was a box about 2 feet long and maybe 10 inches wide. In it was a linear tray that held 35-millimeter slides. Each of those slides was mounted in a little metallic frame that had little dents in it that coded the address of that slide.

On those slides was the background for the displays. Background meaning it might say pump A or bus B or whatever the name of that field was going to be. The way the thing worked is the computer sent an address to this box that told it which of the slides in there (1,024, I think it was) it should pull up. This little robotic thing would run down there and grab the right one and pull it up out of the tray and move it up to the front and insert it in a backlit optical gate at the end of the box. So the backgrounds on those displays were coming off of prebuilt 35-millimeter monochrome slides.

Then the dynamic information in there where you had to say pump A was either on or off or some bus voltage was at a particular value was filled in by a device. I believe it was Stromberg-Carlson that built them. But it was called a Charactron tube. It was basically a double-ended vacuum tube that from one end of it you would fire an electron beam, and inside the envelope of the tube was a metallic plate that actually had the alphanumeric character set etched in it. So you defocused the beam slightly and hit the character you were interested in and then you took that resulting set of electrons that are making it through the hole that outlines that character, and repositioned it on the other end of the tube to position it in the right place on the screen.

There were two 945-line cameras, 945 lines was a big deal in those days. The standard broadcast system in the US was 525 lines, so we called it high resolution. Today it’s very low res. There were two cameras. One of them was looking at the converter slide file, the background, the other one was looking at the output of this Charactron tube. Then you mixed the video between the two and the end result is you get the display with the static part and the dynamic part. I don’t know who actually developed that, but I thought that was one of the more clever things I had ever seen.
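[Editor’s illustration: the mixing of a static background with separately positioned dynamic characters can be sketched in a few lines of Python. This is only a text-grid analogy of the analog video mixing described above, not a reconstruction of the actual equipment. The function name `mix_layers`, the field labels, and the values are all made up for the example.]

```python
def mix_layers(background, dynamic):
    """Overlay dynamic characters on a static background, cell by cell.

    Both layers are lists of equal-length strings. A space in the dynamic
    layer means "nothing drawn here", so the background shows through.
    """
    mixed = []
    for bg_row, dyn_row in zip(background, dynamic):
        row = "".join(d if d != " " else b for b, d in zip(bg_row, dyn_row))
        mixed.append(row)
    return mixed


# Static layer: field names, standing in for the prebuilt slide (hypothetical labels).
background = [
    "PUMP A:     ",
    "BUS B :    V",
]
# Dynamic layer: values positioned to land in the blank areas of the slide.
dynamic = [
    "        ON  ",
    "        28  ",
]

for line in mix_layers(background, dynamic):
    print(line)
```

[The point of the sketch is the alignment problem: the dynamic values only make sense if they land exactly in the gaps the slide designer left for them.]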

Wright: What’s the timeframe for all this, to make this work?

McGill: We were using it certainly in the ’68, ’69 timeframe. About the time I came over to the NASA side was when we were recognizing that raster scan alphanumeric display things were starting to come along. Certainly not like they are today, but there were fully pure electronic ways to produce those displays. Here in Houston Philco was involved in trying to do some of that, mainly because the Houston operation here was actually a branch office from the Western Development Labs out on the west coast.

The Western Development Labs was heavily involved in DoD [Department of Defense] contracts, including the MOL one I mentioned earlier. They were under DoD contract seeking ways to produce better display technology for a variety of contracts, including the updates to these ground stations that I mentioned earlier. There was already some pure electronic raster scan alphanumeric display technology starting to be developed here, and it was considered prudent to get away from this very difficult to manage electromechanical display thing. There was only one guy on the face of the Earth that could actually make that thing work, with all the electronic and mechanical alignments that you had to do to get everything to fit just right on the screen. It was very labor-intensive. Not to mention the fact that there was a whole team of people that built all these 35-millimeter slides. Obviously you had to think about everything you want on the display well in advance, and they had to go design it. It had to be very carefully laid out so everything was going to fit back together. Then go through the photographic process to produce the slide itself. It was very labor-intensive.

Wright: Are any of those slides still around?

McGill: I don’t know. I wish I’d captured one of those converter slide file boxes.

Wright: I have to find out.

McGill: There might be one hidden under the floor over there in 30 someplace. There’s all sorts of stuff over there. But they were very interesting. If you look around, they’re about two feet long, look like they’re made out of aluminum. I believe Aeronutronic on the west coast built them, and since they did predominantly DoD contracts, it looks like they’d take a direct hit. They’re very rugged boxes.

In fact when I came over to interview they were tinkering with trying to build full electronic displays. One of the things I recall seeing when I came over here was the prototyping on that. That led to them awarding a contract. Actually Philco bid on it themselves, but they were very careful about organizational conflict of interest, and since Philco was the integrating contractor for the Control Center, they walled off the people who were bidding the display.

I think mostly for fear of appearing to be a little biased, they actually picked the bid from Hazeltine [Corporation], who was getting into the display business at that time, as opposed to their own. Actually I think Philco probably had a better design. But nonetheless that device was called the DTE, the Digital Television Equipment. It was a really big deal. It replaced all the mechanical parts of the thing. It allowed us to generate those displays fully electronically.

It was a little bit challenging. In fact I think it was a little too challenging. Those displays, again going back and thinking of technology, in that timeframe we didn’t have solid-state memory like we have today. The only way you really could store things was in core memory. To use core memory to store the elements to produce displays really caused them to have to go extremely fast with the core memory. Probably too fast. One of the issues we fought for years with the Hazeltine DTEs was we would burn up the core drivers periodically, because we were really trying to push them too hard.

But it worked very well. Each one of those devices produced eight channels. We call them a display cluster. The way that mapped between the mainframe computers and the workstation display devices is the mainframe computers would send—we retained the old converter slide file communication format because we didn’t want to rewrite the software in the mainframe. It would tell the Hazeltine device what display it was seeking to turn on in the old converter slide file format. So it didn’t actually pull up a slide anymore, the backgrounds were stored on a local disk.

Then the dynamic stuff of course was periodically changing, was driven out of the mainframe to fill it all in. The composite video then was sent eventually to the workstations. What we had was the eight-channel clusters, the display generators, a big video switch matrix, and then a bunch of workstations.

We had a TV guide. There was one channel that you could pull up and see what is already up and running. If you wanted to view a display that’s already up and running, then when you selected it the video switch matrix simply routed the display that’s already up and populated to your workstation. If you requested a display that was not running, assuming there was an available channel in the system, then the system would activate that display and connect your workstation monitor to the output of that display.
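[Editor’s illustration: the request logic just described (reuse a display that is already running, otherwise activate it on a free channel) can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not the actual Control Center software. The class and method names are hypothetical, and the model never releases a channel, which the real system presumably did.]

```python
from dataclasses import dataclass, field

CHANNELS_PER_CLUSTER = 8  # from the transcript: eight channels per display cluster


@dataclass
class DisplaySwitch:
    """Toy model of display clusters feeding a video switch matrix."""
    num_clusters: int
    # display name -> (cluster index, channel index) for displays already running
    running: dict = field(default_factory=dict)
    # workstation id -> display name currently routed to its monitor
    routes: dict = field(default_factory=dict)

    def tv_guide(self):
        """List the displays that are already up, like the 'TV guide' channel."""
        return sorted(self.running)

    def _free_channel(self):
        """Return the first unused (cluster, channel) pair, or None if all are busy."""
        used = set(self.running.values())
        for cluster in range(self.num_clusters):
            for channel in range(CHANNELS_PER_CLUSTER):
                if (cluster, channel) not in used:
                    return (cluster, channel)
        return None

    def request(self, workstation, display):
        """Route a display to a workstation, activating it first if necessary."""
        if display not in self.running:
            slot = self._free_channel()
            if slot is None:
                return None  # no available channel in the system
            self.running[display] = slot
        self.routes[workstation] = display
        return self.running[display]


switch = DisplaySwitch(num_clusters=1)
print(switch.request("console-1", "pump status"))  # activates on a free channel
print(switch.request("console-2", "pump status"))  # routed to the same channel
print(switch.tv_guide())
```

[The limitation McGill goes on to describe falls straight out of this model: the number of distinct displays that can be up at once is capped at eight times the number of clusters, no matter how many workstations are attached.]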

You can see there were some fairly significant limitations. You only got eight channels per cluster. I think we got up to 15 clusters, but we never used all 15 of them for one flight activity. We eventually grew into two systems and we would allocate some clusters on one side and some on the other.

There was a limited number of concurrent displays that could be up in the system. But it was still an incredible maintenance and operation advantage over the original mechanical system.

That system existed over there into probably the early ’80s. I’m thinking around ’82 or ’83. Probably ’82.

Wright: About 20 years, give or take.

McGill: About 20 years, yes. So it served us well. At NASA we try to milk everything we can out of anything we spend the taxpayers’ money on. They don’t believe that but we really do. Interestingly enough, when it came time that they were clearly worn out, we had to replace them—

Wright: Let me ask you. How do you know they were worn out? Were there some indications that you were starting to have problems with them?

McGill: We use the term wearout, but it really has a bunch of things underneath it. If it’s a commercial product, which we use predominantly today, wearout probably means the vendor doesn’t want to support it anymore. They’ve gone three versions later, and they don’t want to mess with that anymore, so we can’t get any vendor support on it.

Back in those days when things were much more hardware-oriented and mechanical in some cases in nature, wearout could mean actually the motors were giving up. The reliability maintainability group would monitor all the equipment, still does. We could see by where we were seeing failures occurring on a lognormal curve that they used that we would declare it in wearout. It’s becoming such a maintenance problem that it’s more cost-effective to replace it.

The other type of wearout on things that are more pure electronic in nature is you can’t get the parts anymore. So if parts fail you may not be able to replace them. That’s basically where we were with the Hazeltine DTE stuff. The electronics that it was built with were becoming very very difficult to procure and parts give up and you have to replace them. The real driver was the growing maintenance cost associated with it. Interestingly enough, when it came time to do that I got asked to do the study to figure out how we ought to replace it.

By that time there was lots of display technology floating around. There were numerous vendors building a variety of displays, certainly things had gone from monochrome to color, much higher resolution, full graphics, which the original DTE could not do. It could do very limited graphics.

But interestingly enough, when I walked into doing that study I was predisposed to think obviously we’re going to go buy something. It wasn’t until I dug into the details of that that I realized that was the wrong recommendation to make because the interface to that display system was so custom. Remember I said earlier the way it was communicated with went all the way back to the mechanical converter slide files. Of course there was no such thing out there in any vendor’s product, absolutely not.

When I looked at the end result of taking some commercial product and modifying it so it could fit into the system, compared to the cost of designing from scratch a functional replacement for the Hazeltine DTE—and by the way, by that time with all the solid-state technology we had, it was trivial, absolutely trivial design—the recommendation I brought forward was let’s just functionally replace it. Let’s just build another functional equivalent to the Hazeltine DTEs. We called it DGE, Display Generation Equipment. It took two racks of equipment in a cluster for the DTE, and we put it in about five circuit boards, the DGE, to give you some comparison of how much technology had reduced that, it turned into an absolutely trivial problem. There was certainly no component that was under any stress at all in the DGE.

Let me step back and say another couple things about the original Control Center architecture. It was certainly dominated by having to custom-build everything. Almost all the functions done in the Control Center originally were done in custom-built hardware. We had computers, but they were so slow by today’s standards, most of the high speed work, like the telemetry processing, they just couldn’t keep up. So we had very custom-designed frame synchronizers and multiplexers and decommutation equipment, and a whole variety of things were all done with actually a fairly substantial group of engineers, myself included, that did custom hardware design.

When I did the study for the replacement—the other thing that dominated that Control Center I should mention, and it may have just been me because obviously I was a real neophyte at that time, but there was no vision of life after Apollo. We were commissioned to go to the Moon, that’s what we were fully focused on. We weren’t thinking about any kind of a follow-up project. Obviously there were a few things said in public about doing other things. But our mission was to get to the Moon, and so I guess all of us assumed that once we pulled it off they were going to dissolve NASA and we were all going to go find something else to do.

But we didn’t think about it very much. Certainly a plan beyond Apollo was not a driver for the design of the original Control Center. It was rather tactical in focus. Not that I think the people who designed could have come up with anything much different even if they had been trying to build a system that had a 20-year life expectancy. But nonetheless that really was not a consideration.

By the time we got into the early Shuttle timeframe and I was replacing the display system—

Wright: What do you consider to be the early Shuttle timeframe?

McGill: Probably about ’82. We had started flying, but barely, and I was asked to go do the study. That triggered a couple of interesting things. As I said, I wound up recommending a custom-built replacement. Although by that time moving away from custom hardware seemed to be the appropriate thing to do, the circumstances around that display system were such that it was more cost-effective for the government to go ahead and replace it functionally with a custom design.

When I looked at the world around I said, “Everybody else is doing things a lot different than we are.” The original architecture for the facility was a computer-centric architecture. I’ll explain myself. I see big systems as falling into three categories. There’s those that are computer-centric, there are those that are data-centric, the heart of the thing may be a large database, and those that are by today’s terminology network-centric. Networks didn’t exist back when the Apollo Control Center was built.

When I did the study to replace the display system I looked. I said, “The world seems to be going to network-oriented distributed processing systems. Displays are no longer generated in clustered display generators. They’re generated in workstations.” Even though the task I was asked to look at was only to replace the display generation stuff, it became clear to me that we really didn’t have the right architecture on the floor.

Understand that from the original Control Center up until the early Shuttle timeframe the architecture did not change, but we’d swapped out every piece in the thing. It wasn’t that we had mid ’60s vintage equipment in it. There might have been a little of that lying around, but mostly all the key elements had been upgraded at least once, maybe twice. Mainframe computers were probably two models later by that time. As I mentioned, we replaced the display system twice, because it was originally this electromechanical converter slide file thing, then the DTE, and now we were going to put DGE in there.

Even though the pieces were changing, and certainly we had a large effort to swap almost all the pieces out for early Shuttle to make it easier to maintain, it became very clear to me we really had the wrong architecture. It wasn’t just a matter of modernizing the elements of the system. The system structure was not correct. I’ll explain why I came to that conclusion, because this is really I think the important topic today, about architectures and why they get picked.

Some of what goes on there actually is outside of what [NASA Procedural Requirements] 7123 talks about and some of the documented [NASA] systems engineering processes [and Requirements], which I think is important for people to understand. The thing that was wrong with that facility, we had gotten rid of a lot of the labor-intensive hard configuration items. We no longer were making 35-millimeter slides for the displays. Originally all the event lights on those consoles were hardwired to drivers, computers turned the lights on and off.

Every time we tried to change the configuration of the facility you had to move all those wires around, so we had cross-connect cabinets, and we had a herd of people then that had to go in there and check to make sure the right light lit when the particular bit was set. It took thousands of hours to configure the system, which really wasn’t a consideration in Apollo at all.

We had started moving away from that. We’d gotten rid of the 35-millimeter slides. We had done a project we called CONIS, the Console Interface System, which provided a bus-oriented communication out into the workstation to turn lights on and off. So it eliminated a lot of moving wires around and checking that. We still had to do a lot of testing to make sure everything was hooked up right, but it made it more of a configuration message that was sent to the console to turn a light on or turn a light off.

We were making real progress on chiseling away at the labor-intensive nature of configuring the facility. But we were sitting there and I looked at our architecture and I said, “Nobody does business that way anymore. Everybody’s going to network-oriented distributed processing.” At least that’s what I thought at that time. It should be remembered that the Shuttle program early on was trying to get to flying about 24 flights a year. They envisioned being on orbit all the time with a Shuttle and sometimes having two up there. This was all before [Space Shuttle] Challenger [accident, STS-51L] and we decided maybe that was too aggressive.

But when we looked at that facility, even where we were with the things we had done to make it easier to configure, the best we could do was nine flights a year. The system, the way it was architected, was still hard-configured, was still not very flexible. It took weeks to months to instantiate a different mission configuration in the building.

Here we had a program that wanted to fly two dozen times or so a year. We couldn’t do it, absolutely could not do it. So we really couldn’t meet the program’s requirements. I concluded the basic way we had it structured left over from Apollo was not the way systems were going. So our ability to reach out and touch commercial products would be seriously limited because we had a structure that wasn’t consistent with the way the rest of the world was going.

There were a whole lot of factors that got me to thinking that we really ought to go do something about it. Interestingly enough, Philco, probably Philco-Ford by that time, agreed with me. They set aside some of the company money to launch a study in 1983 to go look at that, which we did totally over on our side of the house. We certainly weren’t hiding it from NASA, but I don’t know that I advertised that we were looking at replacing it, mainly because the paint wasn’t even dry on all those pieces that we had swapped out in early Shuttle, and it didn’t seem like a real good idea to go over and tell them, “You know all that stuff we just spent a lot of your money on, we think it’s wrong.”

Interestingly enough, in ’83 NASA came to the same conclusion independent of us. It was one of those phenomena that’s hard to explain. But they realized that we had this brand-new set of stuff on the floor and we really weren’t structured right. That eventually led into what should we do, and there were a lot of years that went by. Some of it technical, much of it concern from a programmatic standpoint. I’ll whiz through some of that. But the conclusion there was that we really wanted a net-centric architecture, not a computer-centric architecture. We wanted to distribute the computing so that we had some unknown number of workstations, each of which was a computer, and all the application layer would run out there. So we had no limitation in the system design then as to how many workstations we could have. Eventually we’d run out of room in the building I guess. But the system didn’t really care.

The distribution on the local network was somewhat predigested data but not carried all the way to where it was analyzed. That was also very appealing to the flight control community. One of the things they disliked about the system early on was to get any analytical application produced, it had to get coded and placed in the mainframe computer, competing with everything else that was going on in the mainframe computer. It was not uncommon for it to take a year and a half to two years to get some new application in there, because it had to get fully defined in the requirements. Had to go off and code it carefully and test it to the nth degree because it was going to run on a machine that all the critical functions were running in.

They didn’t like that. The operation community tended to be looking at a little shorter horizon than what it was typically taking to get new functions into the system.

Giving them each their own workstation they could play with was quite appealing to them. Eliminating their dependency on the mainframe computer.

Wright: A true own domain.

McGill: Yes. It not only had some strong technical basis to go that way but there was a pretty significant emotional driver in the community to get away from that kind of a restriction.

Wright: I was going to ask you earlier about the transitions from one to the other, because users had to make those transitions. So this was going to be a transition that they were probably going to embrace.

McGill: Yes, it was. The transition itself is probably a fascinating topic all by itself, as you might imagine just from what I’ve said so far. Apollo was the first generation architecture, the one we’re talking about going to here is the second generation architecture, and we’re on the third one now, which I’ll describe in a little while.

It was a very radical change. There were many many things that were fundamental in the new architecture that didn’t exist in any form in the previous one. It was right on the leading edge of applying local area networking technology to something as demanding as real-time telemetry processing. In fact I think I jumped the gun a little bit. A little too ahead of the curve. We had a history of very centralized computing going to fully distributed computing. So, it was a very radical change.

I do recall some of the senior NASA management telling me they weren’t at all sure I could figure out how to pull off a transition from one to the other, because Shuttle was already flying, and the Shuttle Program wasn’t going to stand down for six months or a year while we played with the system.

In fact one of the difficulties in trying to figure out how to do that was the lack of real estate. [Building] 30M [Mission Operations Wing] was all we had initially, and it was full. Enough floor space to put basically another system alongside of it to transition to didn’t exist. I looked at options about trying to stage it over at Building 17. I looked at trying to get trailers and move everybody out of 30 into trailers in the parking lot to free up as much space as possible. Nothing was going to work.

Fortunately in preparation for Space Station, when we got to thinking about it, we went and built 30S [Station Operations Wing]. So all of a sudden we had some real estate to work with. Otherwise we’d have never pulled it off.

Knowing we had to transition rather seamlessly was a major system level design driver. I’m going to go back and recap some of the things that are really important considerations that you can’t find written in Level A requirements, that you don’t really find in the classically documented engineering process.

I’ll explain exactly what I mean by that. We were looking for a system that could be configured quickly. The closest I could come to that is I did write a Level A that said we had to be able to transform the system from one activity to another in an hour or less, which seemed incredible at the time. Today we do it all the time. That one could be a real requirement. Many of the things we were after are rooted more in what I call needs, goals, and objectives that in engineering terminology can’t be requirements, because you can’t test them.

If I tell you I want a system that can remain viable for 20 years, you’re going to tell me that’s not a good requirement. I’m not going to wait around for 20 years to try to sell this thing off to you. Those kinds of desirements are difficult to deal with in the classical systems engineering process.

We had a lot of those. Many of the things that we wanted. For example, we wanted to have fully portable software. How many times do you have to port it before you can demonstrate you complied with that requirement? How many years will that take? It’s not a testable requirement. Very difficult to test if at all.

Most of the real targets, the behavior that we wanted out of the new system, really weren’t deeply rooted in standard engineering processes.

Wright: So you were rocking that boat, weren’t you?

McGill: Yes. I still am. Which is one of the things that I hope comes across in this material, because I’d like for people to understand there’s a lot more to consider when you’re building these systems than simply meeting the Level A functional requirements.

The other things that were important to us at that time is I was convinced that we were on the brink of being able to eliminate all the custom hardware in the facility—we almost did it, didn’t quite—but that there was enough out there in the marketplace for us to go buy, not software but hardware. Which eliminated an awful lot of the cost of ownership. As you can imagine when it was locally designed and built hardware, then we had to do all our own maintenance, board level maintenance. You couldn’t call a tech to come in from the factory and do it, we were the factory. There was a huge labor force just to keep all that custom hardware working, not to mention the labor force it took to design it in the first place. I thought that the technology had gotten to the point that we could put the new system together largely with commercial hardware.

Even on the software side, I felt like we could go with commercial operating systems. In particular I wanted to pick one that had fairly wide industry support, because recognize, hardware does wear out, and you have to replace it, and you don’t want to rewrite all your software every time you do it, because remember, software portability was one of my targets as well.

We went with POSIX [Portable Operating System Interface] as the standard that we were going to use—today we would call it UNIX probably—because many vendors were supporting it. I knew that whatever hardware models I went and bought, within seven years or so, I was going to replace them. They were going to be in wearout, the vendor didn’t want to support them anymore.

I wanted to buy some insulation against the hardware, because the downside to going and buying COTS [commercial off-the-shelf] hardware is you can’t fully control your own destiny. If you design and build it here, you can maintain it as long as you want to, as long as you’re willing to pay the annual price to do it, because you’re sustaining it yourself. You’re not at the mercy of the vendor to do that.

Knowing that when you go COTS, you’ve got to be prepared to swap that stuff out on a fairly regular basis, the right thing to do is to put all the custom intelligence into the system in a way that buys you as much insulation against the underlying hardware platforms as you can get, so you can replace them whenever it’s prudent to do so, and you don’t have to go rebuild all the software.

We picked a POSIX kind of an operating system structure. Which I was told by many people there’s no way you can do real-time processing on POSIX, but that’s turned out to not be true. Although they did scare the hell out of me, I’ll tell you, they really did. Because some of the people who were telling me that I had a lot of respect for.

Wright: Was that in house or industry people telling you that it wouldn’t be able to do real-time processing?

McGill: Really both. There weren’t very many places that that kind of real-time processing was being done at all. Even going back, the original, the real-time operating system we ran in the IBM mainframe computers, IBM built it for us here. Eventually they marketed it and called it RTX. But most of the time computers were checkwriters. They were being used in the financial marketplace but not flying spacecraft. They weren’t rich with the kinds of resources that you really needed.

The UNIX kind of OSs were really designed to predominantly support a software development environment. It did things, for example if you had 50 users logged on the machine trying to do software development and compilations, it sliced up the machine evenly across everybody trying to use it. Really didn’t make any provision for prioritization of tasks. It was in the basic personality of those operating systems to not give you a rich set of features to handle the fact that some things are more time-critical than others.

There was an element of truth in what they were saying. I still felt like we could put enough horsepower behind it and make it work anyway. The benefit of getting something that was pretty much industrywide supported outweighed the difficulty that I might have in getting it all to play right.

We were trying to accomplish a lot of things there. There was a fairly long list of targets, many of which you could not represent in the documented requirement set, because there was no reasonable way to go test the system at delivery time to prove that you met that. So, most of the drivers were really hung out over here on the side.

But we did produce it. As I mentioned earlier, and I’ll give you a couple of examples, I knew from the beginning that I had to put something together that we could transition. We were not going to be between programs. We were going to be in the middle of a program. We couldn’t stop supporting the program. There were features designed into the system that would allow us to incrementally morph from the old architecture to the new one. Maybe that’s a slight overstatement.

I’ll give you some examples. In the old original architecture we had some frontend processors. We called them TPCs in those days, telemetry preprocessing computers. They did the first level of decommutation on the downlinks for example. They decommutated the downlinks for us and ordered all the parameters up in a buffer. The buffer then was sent through a very custom set of communication stuff that we called the MBI, the multibus interface, that provided the communication path between the frontend processors and the IBM mainframe computers. Then the IBM mainframe computers did all the magic to the data. They were responsible for doing all of the computations for analytical purposes, generating all the displays, and driving the display system.

The first step in the transition was to replace that very custom communications thing between the telemetry processing computers in the front end and the IBM mainframe computers with a local area network. Fortunately we could go get Series/1s, which was an IBM product that we could hook up to the mainframe computers, and with a little magic in them we could make them learn how to talk on a network. We went and built a device called the NDD, the network data driver, very cleverly named, that functioned like the TPC did, but it knew how to talk to a network.

We switched first so we started trafficking the data between the frontend processors and the central processor across a local area network. That did two things. One, it put the network in place so now we could start talking about hanging workstations off of that network. But it also eliminated one of the very custom pieces of equipment that was hidden in the middle of it. Since it was transitional, the functional allocations between the processing on the front end and the mainframe didn’t change. The frontend processor produced what I called an NDM, a network data message, that was raw in that it was not calibrated—it was the actual bits that came down from the spacecraft—but it was decommutated. We pack those things in there as tight as we can, because we’re always very bandwidth-limited on the RF [radio frequency] links. There’s actually a bitfield in there. You got to sort it all back out to identify the individual parameters.

The NDM was a raw form of the message. It had been decommutated and the parameters identified, but they were not calibrated. Trafficked them over to the mainframe. It did the same job it had done before where it handled all of the polynomial calibration on the parameters as well as did all the computations with them.
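The two steps described here, decommutating a packed frame into raw parameter values and then applying a polynomial calibration, can be sketched roughly as follows. This is only an illustrative Python sketch: the `ParameterDef` layout, the field names, and the example frame are hypothetical and are not taken from the actual TPC, NDM, or PTM formats.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ParameterDef:
    """Hypothetical description of one parameter packed into a downlink frame."""
    name: str
    bit_offset: int                # first bit of the field, counted from the frame start
    bit_length: int                # number of bits the field occupies
    coefficients: Sequence[float]  # calibration polynomial, lowest order first


def decommutate(frame: bytes, defs: Sequence[ParameterDef]) -> Dict[str, int]:
    """Pull each parameter's raw counts out of the packed frame (the raw, NDM-like form)."""
    total_bits = len(frame) * 8
    as_int = int.from_bytes(frame, "big")
    raw = {}
    for d in defs:
        shift = total_bits - d.bit_offset - d.bit_length
        raw[d.name] = (as_int >> shift) & ((1 << d.bit_length) - 1)
    return raw


def calibrate(raw: Dict[str, int], defs: Sequence[ParameterDef]) -> Dict[str, float]:
    """Apply each parameter's polynomial to its raw counts (the processed, PTM-like form)."""
    calibrated = {}
    for d in defs:
        x = raw[d.name]
        value = 0.0
        for c in reversed(d.coefficients):  # Horner's rule, highest order first
            value = value * x + c
        calibrated[d.name] = value
    return calibrated
```

In this sketch the raw and calibrated dictionaries can both be kept, which mirrors the point that flight controllers wanted the raw values as well as the processed ones.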

It was expected at that time by me that eventually that message would go away. That was an interim step because eventually I was going to move all the calibration functions over to the frontend processors. Remember, the mainframes were going to go away, at least in my plan.

It turned out by the time we got to that point in the transition where I was going to pull the plug on the NDMs, I had already replaced them with what I called a PTM, a processed telemetry message. And so it carried the fully calibrated version of it, calibrated in the frontend computers. When I got ready to get rid of the NDMs, the flight control community said, “No, no, no, no, no.” They wanted the raw values too, so I said, “Okay, we’ll just let them run in there.” But that was supposed to be a transition-only element in the design. It turned out it stayed around all the way through Shuttle. But who cares? We already had it in there.

But, we did things in a particular way so we could incrementally shift things over. The actual transition from old to new was incremental as well. We had both systems up and running in the building. We would first switch over and go do some routine Shuttle on-orbit ops with the new system, but we flew all the critical phase things like ascents and entries with the old system, until finally we had a high enough comfort factor that the new system actually worked that we were willing to fly an ascent with the new system and declare transition complete at that point.

Wright: Do you remember what mission that was?

McGill: No, I do not. It’s probably well documented someplace and probably remembered by other people. But it was probably in the ’95 timeframe by the time we actually pulled off the transition.

It was incremental in two ways. One, we didn’t ask the flight control team to show up in a different place one day and just pull the plug on the old one. But it was also incremental in the way we designed the system so we could morph functions over from one to the other so that things appeared to be under control all the way. But it did influence the design. Certainly as I mentioned the NDMs stayed around for a very long time. They’re not there now because we’re not flying Shuttle. But still even on the Station side we have RIMs [Raw Information Messages] and PIMs [Processed Information Messages] which mirror the old NDMs and PTMs, the raw telemetry messages and the processed ones. There are artifacts of having to do that transition in what we now call the legacy system because we’re fixing to replace it again.

Again you can’t write those in a requirements document. One of the key things in really understanding this architectural engineering is figuring out how you’re going to get there from here. Obviously if you can’t get there from here you’ve got a failed project. But also to understand what it’s going to mean to try to put an architecture in place that you think you can live with for 20 years. I’m going to say some more about some of the elements of that that I think are very important.

You’re not going to get a chance to replace the architecture very often. That takes a lot of money. Seldom do the planets align and congressional—

Wright: Benevolence?

McGill: Yes. I was looking for a word. Allow you to do that sort of thing. Longevity is a very very important element in producing a design at the architectural level. Let me define what I think architecture means. That’s an often used and mostly abused word I think.

To me the systems architecture is talking about its behavior, especially its behavior in the face of change traffic. A system level design, you can hand me a stack of functional requirements, Level A requirements, things the system has to do. I could go draw you up probably half a dozen different structures that can meet every one of those requirements. In fact some of them would be compute-centric, some of them might be data-centric, some of them might be network-centric. But they functionally will do everything you’ve asked me to do. But over time they’re not going to behave the same way. One structure is going to be able to capitalize on things like network bandwidth or speeds just becoming better and better and better without us spending any money on them, the technology is growing, computers are becoming faster, they have more memory in them.

Some of those structures would lend themselves to that kind of plug-and-play swapout better than others would. To me when you talk about the architecture of a system, you’re really talking about how well is it going to behave itself for the next 15 or 20 years. Some of that change traffic you probably can see. Some of it you cannot. A key element in a long-living architecture is building in some very carefully thought out modularity and flexibility. We like to call them pinch points.

What you really want to do is decouple things in clever ways. As I mentioned before, we have what we call ER [equipment replacement] program. We typically try to swap out the hardware stuff about every seven years. Some things live a little longer than others. You know over the course of architecture that’s going to last you 20 years you’re going to change all those pieces out several times. How do you do that without disrupting the rest of the system? Because you don’t do them all at the same time. The money is not there. Each fiscal year you’ve got budget to change out a few things. You have to have a certain modularity so that you can actually sustain the system, you can swap those things out without disrupting operations, without forcing a massive redesign on other parts of the system.

The modularity is also driven by things like if you look out there you can see that certain technologies are going to advance because the commercial marketplace is pouring money into them. A common mistake, an example of it I think was in early Station. There was thinking initially that they would go ahead and run a 300-megabit downlink on the Ku band side of Station early on. Remember, that would get downlinked to White Sands, New Mexico. The question in my mind was how are you planning on getting that from White Sands, New Mexico, to Houston, Texas? Remember, back in those days, communication satellites were all you had.

The people I was quizzing on that said, “Well, there are 50-megabit-per-second transponders on communication satellites today. By the time we need them they’ll be 300-megabit.”

I said, “No. The commercial marketplace is spending their money trying to figure out how to slice up one 50-megabit transponder to service a half a dozen different customers. They’re not spending any money trying to build 300-megabit transponders. They have no market for it. If you take that position you’re probably going to wind up having to build and deploy your own wideband communication satellite, which I don’t think you’ve got in the budget right now.”

It turned out the problem got solved eventually because everybody started sticking fiber in the ground. Now we have lots of bandwidth from coast to coast in the US with all of the fiber optics that’s in the ground which is what allowed us to step the Ku band rate up recently to 300-meg and we’re fixing to go to 600-meg. Satellite communications is not the way we get it in; we get it in with fiber.

One of the key things is when you look out there you can’t just assume everything’s going to get better. The speed of light is not going to change. If you’re going to count on something improving you better have identified the non-NASA commercial drivers that are going to make that happen, because we don’t want to have to go invent all the stuff ourselves. We want to be able to buy it. Some things, you can see those drivers, and you can count on them, they will happen naturally, and by the time you get to the next ER cycle you’ll have better things available you can go buy. Some things will not. Some things are not going to get better with time because nobody wants it but you.

Wright: Can you give me some idea? How were you able to keep up with what all the industry was doing to give you those pieces of information to be used? Because you didn’t have a lot of the access to information like we do in 2015 back then.

McGill: We didn’t have Google. You couldn’t Google things like we can now. But still there were enough periodicals floating around. You could pretty well tell where the networking industry was going, where—I won’t call them personal computers but much more personal than a mainframe certainly—workstation class machines were going. You could look at the trends and the projections in the semiconductor industry about what they thought would happen periodically in terms of the speed of processors. There were plenty of indicators out there to give you clues about which things were going to improve for you.

It also gives you some insight into—when you’re putting the thing together, and certainly the second generation architecture is an example of it, the things that you would be willing to be right on the hairy edge because they’re going to be self-improving, they’ll fix themselves for you. The initial instantiation of the network-based second architecture system barely had enough bandwidth on the local area network. In fact it really didn’t. I compromised the design some to make it work, but it was right on the hairy edge. I knew from what was going on in the industry that you wait a few years and that’s going to get really fast. Then by the time we get ready to replace it, that won’t be an issue anymore. You’re willing to take the chance there because you know the situation is going to improve.

There are other categories of things where you might say, “I’m willing to be wasteful in this category because there’s a benefit. The category I’m being wasteful in will become less and less of an issue because the marketplace will fix it.” For example when we initially put together workstations the amount of memory that was in those things was very limited. Amount of processing power was very limited. And I was willing to get on the hairy edge on the processing power to do a whole lot of graphics in there because I knew that was going to fix itself with time. Without us spending any money those processors were going to get really fast.

You can make decisions where some things you can project out with a fair amount of accuracy. You may not get the timeline exactly right, but you can tell where things are going to go just because the industry is driving that way. Not that you’re foolish enough to think you’re going to drive them. So, you can make decisions that say that we can be a little at risk here because it’s going to fix itself with time. I can be wasteful over here because there’s a benefit to being able to display data more graphically, for example, even though I’m really taxing the processor a lot. But that’ll fix itself with time.

Some of those things are fairly easy to see. But there’s certainly a category of them that are not so easy to see. We’re sitting here today looking at exactly that kind of a future. As I mentioned, we’re on the brink—I say on the brink. We actually started doing simulations for certification Monday this week on the new system. It’s rolling into operations. Within a matter of a few months it will be flying the Space Station for us.

It’s the same MCC. We only have one. Now we call it MCCS because we’ve integrated some of the peripheral functions in with it too, so it’s Mission Control Center Systems. But MCC-21 is a project. It’s the latest one where we’re pushing the thing up to the third generation architecture.

Why do we do that? What’s different about the third generation architecture compared to the second generation one? What were the reasons why we thought a change was necessary? I’m going to throw a term in I think nobody in the world uses but me but I like it. I firmly believe in the phenomenon that I’ll call architectural wearout or architectural obsolescence.

Obviously you can go use lognormal curves or some of the other things that are used to predict when equipment is going to wear out. But what architectural wearout really means is when you look at the state, the configuration of the system you have on the floor, and the amount of money you’ve got invested in it, and what it would take to replace it, and the operational requirements that you’re seeing now, you’re trapped behind the power curve. The world has shifted enough on you that all of a sudden you can’t really support it well. I mentioned even going to the second generation architecture some of those things.

Shuttle wanted to fly 24 times a year. We couldn’t do it. The architecture wouldn’t support it. So I would contend that coming out of the Apollo era going into Shuttle we were in architectural wearout. The one we had on the floor was dead. We could not get it to go where we needed it to.

Today we’re seeing a variety of things that are shifting. First off, we believe when we look out there with commercial players involved we look at much more diversity in the kinds of missions we think we’re going to fly. We were used to having one or maybe two single big programs that went on for 20 years. Now we think we’re going to do one-offs. In fact we already have. EFT-1 [Exploration Flight Test-1] was a good example of it.

We see the need for a different kind of flexibility. We also see the need for the capability to be a viable participant in geographically distributed operations. Where today if you look at our history MOD [Mission Operations Directorate] was it. We did the whole mission start to finish. The Control Center that’s on the floor was designed as a closed shop Control Center. Now we did stick Band-Aids on the side of it to support international partners, but that really wasn’t a driver when they designed that one. In fact Larry [Lawrence S.] Bourgeois was our MOD rep to the ISS [International Space Station] Program early on. He was quite emphatic that we were not going to do element ops. If the Japanese [Japan Aerospace Exploration Agency, JAXA] wanted to play they could come sit in our building. If ESA [European Space Agency] wanted to play they could come sit in our building. We all know that’s not the way it came down. But the system was not designed to support geographically distributed operations, and so it’s been a little bit of a stretch to make it do it. But we see our future as necessitating that all the way around. If we go off and do some sort of a Mars mission, you know that’s going to be an international event. It’s not all going to fly out of [Building] 30. Other geographic locations are going to have certain responsibilities in that.

We know in the interplay we already are participating in with commercial vendors that that’s the nature of the beast. The new MCC-21 architecture is explicitly designed to support that, which was not a factor in the second generation. In fact the Agency did us a huge favor—coincidental but I accept a gift anywhere. They had decided about the time we were looking at needing to do this to collapse out all of the Center level network domains and go to NDC, which is the NASA Data Center domain.

Prior to that Marshall [Space Flight Center, Huntsville, Alabama] had their little network and JSC had their little network. There were cross-connects across the Agency, but it was not one domain. By going to NDC, which is NASA-wide, one domain, that gave us a very convenient mechanism to exchange data between here and other Centers.

Most of the participants that interact with NASA, even if they’re not a NASA Center, have NDC accounts and have access to NDC. That gave us a very convenient mechanism to build on. We changed our strategy not only from MOD flying the whole mission to being willing to work with other operational participants, but also, even on the JSC campus, in the role of MOD versus IRD [Information Resources Directorate].

When I looked out there I saw several interesting things. The Agency had invested significantly in what we call our corporate backbone, e-mail and WebCAD and all the things that go on to make the organization work, where certainly not very many years ago the capability of those systems was not suitable for flying manned spacecraft. The criticality was too high. Reliability was not there.

But today with the investments the Agency had made, they were very very close to the kinds of performance that we were used to for flying spacecraft. Certainly for the less critical side of our operation they were there. So, it seemed economically prudent to jump on their bandwagon. Plus the fact that they had provided us the very convenient mechanism for exchanging data with other NASA Centers.

We split our architecture. At the very top level of the new architecture we have a side that we call the high side and a side we call the moderate side. The terminology used in there, we have MCE, which is the Mission-Critical Environment, and the MSE, which is the Mission Support Environment. We split it that way probably for two reasons. There’s a balancing act, which there seems to always be. One is, as I said, the MSE side is sitting on the NDC domain. So it exposes those services in ways that it’s very easy for us to give accounts on the system to people at other Centers. It’s there.

But the downside to NDC is, remember, it’s hooked up to the Internet, and people need that to do business. So it’s a little bit public. So to try to get the best balance between making our services available outside of our little world and yet providing the kind of security mechanisms that we really need to protect ourselves, we split the system.

MCE, the Mission-Critical Environment, can be isolated away from the MSE, so that if we’re having a very bad day and NASA goes under attack, that does not compromise the vehicle or the crew. We have a safe haven. We have the right switches that we can throw, and we will not be corrupted by whatever might be going on on the more public network.

Of course at the expense of some of that intercenter traffic, but if you were in hunker-down mode, that’s probably not the number one consideration. So we split it that way to get the best balance we could with guaranteeing that we provide the level of security that the country expects from us to manage these very expensive spacecraft, but at the same time build that element into the architecture that was missing from the second architecture to conveniently support interactions with people that don’t live here.

The other thing that we changed in the system that’s a bit subtle, but very important, is we decided since the first one I mentioned was a compute-centric architecture, the second one was a network-centric architecture, we decided to go with a data-centric architecture. The reason is again even back when we did the second one we were flying Shuttle. We did real-time operations. The system is really designed to be a time now real-time system. That’s what we did mostly. Now the engineering evaluator guys worked in a different timeframe but they’d come get stored data and go off and analyze it, but it was a time now system. When we looked out in the future with the desire to go play with asteroids and go to Mars and flight time delays were going to get very long through the laws of physics, I can’t fix that. All of a sudden it didn’t look like time now was the right answer. It looked more like a TiVo kind of a thing where you needed to be able to slide a little slider up and down and say, “Well, here’s what it was doing yesterday, here’s what we think it’s doing about right now, here’s what we think it’ll be doing tomorrow based on our spacecraft models.” All of that obviously implies that I’m interacting with storage, with a database.
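The "slider" idea, asking what a parameter looked like at some chosen time rather than only right now, amounts to a lookup against time-ordered storage. A minimal Python sketch follows, assuming a simple in-memory store; the `ParameterHistory` class and its methods are hypothetical and do not describe how MCC-21 actually stores data. Model-based predictions could, under the same assumption, be recorded with future timestamps.

```python
import bisect
from typing import List, Optional, Tuple


class ParameterHistory:
    """Hypothetical time-ordered store for one parameter's samples."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[float] = []

    def record(self, t: float, value: float) -> None:
        """Insert a sample, keeping the store sorted by time."""
        i = bisect.bisect_right(self._times, t)
        self._times.insert(i, t)
        self._values.insert(i, value)

    def value_at(self, t: float) -> Optional[Tuple[float, float]]:
        """Return (sample_time, value) for the latest sample at or before t."""
        i = bisect.bisect_right(self._times, t)
        if i == 0:
            return None  # nothing recorded that early
        return self._times[i - 1], self._values[i - 1]
```

Moving the slider then just means calling `value_at` with a different time; "time now" becomes one query among many rather than the only view the system offers.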

We believe the best way to posture for the kinds of missions we think we want to fly at least—again this is speculative, it’s not really written in the Level As, but planning for the future—is to switch to a data-centric system. So, we have a thing we call Ops History. We gave it that name because that solves a real problem that we looked at. I probably should have said some of the changes are not always because the world has changed around you. Sometimes you’re changing things because you say, “Boy, I wish I could have done that the first time but the technology wouldn’t support it but now it does, so I want to change that.” Some of that is just flat lessons learned saying, “Boy, that wasn’t a good idea, was it?”

One of the things that I felt like we missed the mark on in the second generation architecture was a lot of data got generated in the system that the system common services did not manage for the user. They were just left on their own. They could write over their files, and they did occasionally. That was a real mistake. There really was no complete auditable data set that was kept by the system for them. When we put the data-centric element in it we called it Ops History. We think we have largely fixed that where it’s a very complete set of everything that was available while the mission was going. The raw data, all of the computations that were done, all the analytical results are stored in there, and configuration-managed for users automatically by the system.

Part of that was lessons learned. Maybe a little technology in there, although file systems were pretty good by the time we built the second one. Really we didn’t go to the third of the three categories just so we could play with all three of them eventually. We went there because we think that is the structure that best suits our future. Some of it is speculative. But we wanted a posture so that if a new administration might suggest that we could go play with a Mars moon then we were positioned to be able to do that. Our system can support that quite conveniently.

Other changes that we made architecturally. As I slightly mentioned the downside of opening it up is security. As long as you’re air-gapped the way we were basically, you don’t worry about security too much. Nobody could get in, the door is locked. When you start making the services available outside of your controlled access areas, however, you better worry about security.

We made security in the system a number one priority in the new architecture. Security is not a Band-Aid that we stuck on the side of it after the fact, which is what it was largely in the second generation architecture. It is a primary part of the way the new architecture is put together. And it goes beyond just making sure bad guys don’t get into our system. It deals with again another artifact of where we think we’re going. There are different levels of sensitivity of the data. For example on ISS there’s some elements in the downlink that are considered ITAR [International Traffic in Arms Regulations]-sensitive that we don’t just give away to a foreign national unless there’s been some agreement made.

There are other elements in the downlink that are medical private. For example any spacesuit data. There’s legislation that we have to guarantee we don’t distribute that inappropriately. But in general in the second generation architecture all of that was dealt with procedurally. You’re not supposed to look at that, so don’t. But there was nothing in the system that really prohibited somebody from looking at something they weren’t supposed to.

With the new system we’ve built all of those data protection mechanisms into the heart of the system. Not only that, really I don’t think we could get away with being grandfathered like we were anymore anyhow. But even more importantly, if we’re going to start dealing with commercial vendors’ data, they consider it proprietary. They’re not going to want to play with us if we can’t protect their data.

The new system not only has new security to protect us from the outside, it has security on the inside to make sure that we are fully capable of handling a variety of different data types, and even though we have multiple concurrent things going on in the building they don’t get shared with people who are not supposed to see them. There are a variety of new features in the new system. Also some of them that are more subtle. In the second generation system I mentioned the process we call recon, reconfiguration. Every time we get a new vehicle, every time we fly another mission, we wind up having to reconfigure for it. We don’t like to do anything twice.

It’s a significant process even with a soft-configured system like we have now to regenerate all of the file products necessary to understand how to interpret the downlinks and calibrate things right. Unfortunately in the second generation system part of that process was all the way over on the client side of the network, simply because the network bandwidth was not enough. That’s the compromise I mentioned earlier. I could not fully process the data and make the client recon-insensitive. In the end, we fixed that.

The new scheme that we’re using to traffic messaging inside of MCC-21 totally removes the client side of those messages from the recon process. They don’t require any recon products. All fully self-interpreted messages. There’s also another element in this that has yet to happen, although I think we’re on the brink. I have this vision that I’ve had for a while that’s difficult to make happen, but to use the cliche OneNASA, a much more integrated ability across the Agency of the Centers to interact with each other.

We designed the mechanisms that we’re using internally where they work equally well externally. The one I was just talking about, our internal messaging scheme, is very well suited to share data with another Center. It’s very efficient. Like I said the recipient Center does not have to know anything about recon for the vehicle. It also has security mechanisms in it.
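The difference between a recon-dependent client and a self-interpreted message can be sketched in a few lines. This is only an illustration under stated assumptions: the transcript does not describe the actual TIM layout, so the JSON encoding, the `RECON_TABLE` contents, the parameter name, and the function names below are all hypothetical.

```python
import json
import struct

# Hypothetical recon product: the mission-specific table that says how to
# interpret and calibrate one raw downlink parameter. Not a real MCC file.
RECON_TABLE = {
    0x0101: {"name": "CABIN_PRESS", "scale": 0.01, "offset": 0.0, "units": "psia"},
}


def decode_with_recon(raw: bytes, table: dict) -> dict:
    """Interpret a raw (id, counts) pair. This step needs the recon table."""
    param_id, counts = struct.unpack(">HH", raw)
    entry = table[param_id]
    value = counts * entry["scale"] + entry["offset"]
    return {"name": entry["name"], "value": value, "units": entry["units"]}


def build_tagged_message(raw: bytes, table: dict) -> str:
    """Server side: apply recon once, then ship a message that carries its own tags."""
    return json.dumps(decode_with_recon(raw, table))


def read_tagged_message(message: str) -> dict:
    """Client side: no recon table in sight, the message explains itself."""
    return json.loads(message)


raw = struct.pack(">HH", 0x0101, 1470)
message = build_tagged_message(raw, RECON_TABLE)
reading = read_tagged_message(message)
print(reading["name"], reading["value"], reading["units"])
```

Only `build_tagged_message` touches the recon table. A recipient, whether a local client or another Center, would need just `read_tagged_message`, which is the property described above.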

Also even the second generation system is technically what’s called a service-oriented architecture. To explain that a little bit, obviously it’s network-oriented, so things actually happen based on IP [Internet Protocol] addresses for source and client, but you don’t have to know the numerical IP address. It’s a little more like the way you surf the Internet where I know this URL [Uniform Resource Locator] for a Web site and I stick it in there. Actually that gets resolved in a domain name server somewhere where that name is matched up with an IP address which connects you to port 80 on that server out there someplace.

You don’t know that, but that’s what’s really going on. We do that inside even our second generation system, but on a much larger scale than what a DNS does, a Domain Name Server does, because we actually rendezvous not only just with IP addresses but port numbers on the machine. So one physical machine under one IP address could be vending a lot of different services.

Our whole internal architecture on the second generation system was really service-oriented. It did not connect with things by knowing IP addresses. It connected with things by knowing the name of a service or the name of a parameter, and it got automatically connected.
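A minimal sketch of that kind of name-based rendezvous follows. It assumes a simple in-memory registry; the class, the service names, and the address (taken from the reserved documentation range 192.0.2.0/24) are hypothetical and are not drawn from the Control Center software.

```python
from typing import Dict, NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int


class ServiceRegistry:
    """Maps a service or parameter name to the (host, port) that vends it."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Endpoint] = {}

    def register(self, name: str, host: str, port: int) -> None:
        self._endpoints[name] = Endpoint(host, port)

    def resolve(self, name: str) -> Endpoint:
        try:
            return self._endpoints[name]
        except KeyError:
            raise LookupError(f"no provider registered for {name!r}") from None


registry = ServiceRegistry()
# One physical machine, one IP address, two different services on two ports.
registry.register("telemetry/cabin_pressure", "192.0.2.10", 5001)
registry.register("trajectory/state_vector", "192.0.2.10", 5002)

cabin = registry.resolve("telemetry/cabin_pressure")
state = registry.resolve("trajectory/state_vector")
print(cabin, state)
```

The client asks only for a name. Unlike a plain DNS lookup, the answer includes the port as well as the address, so a single host can vend many services.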

Those mechanisms were largely built in the second generation system to work inside a system. They were not well suited for a wide area network. They were okay for a local area network. When we generated those functions in MCC-21 they were built where they can work quite well on a wide area network. It sets the stage for interactions between us and Marshall or us and KSC [Kennedy Space Center, Florida] to where we can take some of the techniques that we have proven over the now decades here to work well for us and push them out in the communications between Centers.

We’re expecting some of that to occur going into EM-1 [Exploration Mission-1]. Some of the work we’re doing with Marshall folks and figuring out how we’re going to interact with them. They’re building the rocket, the SLS [Space Launch System]. It looks like they’re going to accept our TIM format, Tagged Information Message is the name we gave to this new way we traffic data, as an input. We’ve agreed to provide them software to read it. There’s an opportunity to save some money there but also to start a process of standardizing the way the centers will interact with each other.

Wright: Yay!

McGill: Yes. Maybe my dream of pulling the Agency together, which I think the Agency would be incredibly powerful if the Centers could work together more. I can’t solve the politics but I can certainly work the systems engineering side of the thing to try to put mechanisms in place that allow that to happen if the desire is there.

Wright: You could build the road for them to go down.

McGill: That’s right. We’re doing that. There’s a lot of it built in. Some of it is a natural artifact of opening the thing up to the outside. But some of it is making very clever choices about how things are done so that we already have the solutions in hand for the problems that have not been placed on the table quite yet but we think are there. You can see there’s a list of categories of things that need to be thought about when you’re fixing to rearchitect one of these systems. Which leads me to one of the questions I saw that you had on your list that I think is an interesting question. What are the big challenges?

To me the biggest challenge, certainly if you’re working at the system architecture level, is communications. Let me explain what I mean by that. There’s probably not a large percentage of the community that’s running around figuring out where you want to be 20 years from now. It’s just the way life is. The majority of the people don’t think past Thursday next week. Building large systems is very much a team sport. It takes a lot of people to do it that range all the way from the architects at the top to the software developers and procurement organizations. There’s a large number of people involved, and there’s decisions being made all up and down this hierarchy.

As I mentioned, there’s a fairly well documented process in systems engineering, 7123 for NASA, but there are many many descriptions of the engineering process out there. It really talks about how you have the set of requirements somebody wrote, you take those requirements, and a smart engineer does some designs and produces some sort of a design spec, and then that design spec becomes the requirements to the next guy. All of a sudden this thing decomposes all the way down to things that can actually be implemented.

That’s all true. But what’s not in there are the things that defy being represented in those requirements. That’s the reason I mentioned that several times. There is a set of drivers that need to go into the engineering process at every one of those levels that is cognizant of where we’re trying to go with this. These are not things that we can consider requirements because you can’t test to them. But they’re things that cause me—of the five different ways I could satisfy these requirements—cause me to pick a particular one over the other four. Those decisions are made across the entire team.

As you decompose one of these things it grows geometrically, and you can quickly wind up with 1,000 of these things going on, and one person can’t possibly be cognizant of all the decisions being made. So, the challenge is how you take those difficult to represent things that can’t really be requirements and communicate them across the team so this whole array of decisions that are being made on a daily basis really favor where you want to go with it.

I don’t have a really great answer for that. I would love to challenge someone who might read this someday to think about a great way to blend that into documented engineering methodology. It may have to do with a set of categorical questions that might get written down and evolved with time, where as design decisions are made you go back and challenge them against these questions.

Well, how will your design react if suddenly we have a mission that is going to involve three countries to go fly it? How are you going to tolerate that? How is your system going to respond to all of a sudden wide area networking is twice as fast and half as much money as it is today? Can you take advantage of that?

Those kinds of questions possibly could be documented and used as a test and some guidance that when you get down into the ranks of the people building it to help them steer their decision process a little bit to favor the things that you think are important out there. The actual set of questions varies depending on the perception, but it may be a better way to try to communicate that across the entire team so that all those decisions they’re making that definitely affect the behavior of this thing, the architecture, are made smartly.

That’s a challenge I guess to anybody that’s reading this. Think about how you think you ought to do that. If you believe what I said about architecture as a layer above a system design, if you want to see it that way, an architecture is all wrapped up in behavior and how well this thing may or may not behave itself over a period of time in the future, then the real question is how do you get the whole team to understand your vision of what that looks like, and the decisions that they’re going to have to make to make them consistent with helping to make that a reality.

To me that’s the biggest challenge. Now of course there’s plenty of challenges in this business all over the place. If you’re designing hardware how you get it to go fast enough. But I think the one that really is the opportunity to break some new ground, do some good things for the Agency and maybe aerospace in general, is to figure out how to better address those somewhat elusive characteristics that a good system versus a bad system has to exhibit. To me a good system is one that lasts a long time. Replaced all the pieces in it, but the architecture is still there.

I’m sure there will be a fourth generation out there at some point. Hopefully it’ll be 20 years from now or so. Maybe less. Maybe by that time someone will have a very clever way to have written an annex to 7123 to put all these factors into the consideration in a more formal and organized way than simply using the missionary process that we do today where you’re running around telling people and hoping that they understood what you said and go back to their office and think that way.

Wright: Let me ask you too, when you’re talking about challenges, through the course of our conversation you mentioned about for MOD it was pretty much all their domain.

McGill: Closed shop.

Wright: Then were there a lot of cultural challenges when you started broadening that? Because all of a sudden now it wasn’t a controlled factor as much as it was a communicated factor.

McGill: There is some.

Wright: Or is some of that going away now?

McGill: It was surprisingly little actually. I think because it unfolded incrementally. There was some shock like that. Certainly when we started Station and it became an International Space Station. I mentioned that the MOD position early on was no element ops. If they want to play they come here. That changed gradually where we wound up with an ESA Control Center, a JAXA Control Center, and the Russians sitting in Russia. They won’t come here. We’re interacting with those people. That happened over time. I think it was not as much of a culture shock because it happened over time. The more recent element came when we canceled the Constellation Program and the administration’s decision was to put more emphasis on commercial vendors, which I personally think is a great idea. The reality, I think, of “you can’t own it all yourself anymore” started settling in.

You really had two choices. Either you got with the new program or you went and found yourself something else to do. I think it added another layer on modifying the culture to recognize that our future was dependent on our willingness to be a bit part player rather than to own the whole show ourselves.

Wright: Interesting way of looking at that.

McGill: The other side of it, I don’t think MOD was ever in a land grab expedition. It was never that way. This is probably self-centered because I’ve always lived in MOD. But we stepped up to do whatever it took to go fly the mission. Many times we thought it was somebody else’s charter but they didn’t seem to want to do it. It wasn’t that we were out trying to assimilate the world or take things over. It simply evolved that way that we wound up doing it end to end to get it done.

We sit here today for example. We, FOD [Flight Operations Directorate], have equipment positioned at KSC and positioned at White Sands. We don’t really want to manage equipment at those places. We think they should have put it out there for us but they didn’t do it. I don’t think we got there because we were being self-centered. I think it was a natural evolution process. The nature of the culture of the organization is we’re going to get it done. You can do your job, or we’ll do your job for you if you don’t want to do it. But we’re going to get it done. That’s always been the way MOD saw things. I think our success over the years demonstrates that we really do mean it.

Wright: You took that full responsibility and continued the accountability.

McGill: It wasn’t ever that there was a decision made that we were unwilling to share. It was more one focused on the mission objective and the importance of pulling it off in a very effective way that caused us to do the things that we had done. I think that it was less of a culture shock in the strict sense of it but a realization okay, maybe the world out there really does want to play now. We certainly see viable commercial vendors coming along. That hasn’t been true for very long.

It’s really more of an adaptation I think to the situation as we see it. I don’t think there was an awful lot of culture shock. There’s still a few people that are territorial. But that’s always going to be the case. People are very proud of what their organization knows how to do and don’t believe anybody else can possibly know how to do it.

Wright: When you’re responsible for the safety and maintenance of not only the Station but the people that are on it, it makes you feel a little—

McGill: We take that job very seriously. Yes. One of the things I’ve discovered over the years. I’ll say a couple things about it. You walk into things thinking, “Oh, everybody does business the same way we do.” We found out that’s not really true. Give you some examples. Even in the US space program, we had been responsible for these vehicles from the time they left the pad until they landed, splashed down, from end to end. Things like understanding how to do an ascent design, which is a very complex problem that the flight dynamics people take on, to tune all those factors that have to be tuned to make sure you insert this thing exactly in the orbital window you want to put it in, most other parts of NASA have no idea how to do.

The reason is the science community out at JPL [Jet Propulsion Laboratory, Pasadena, California] or Goddard [Space Flight Center, Greenbelt, Maryland], they go buy launch services from somebody. That launch provider figures out how to do that and delivers their payload to them already on orbit. You don’t find knowledge in the Agency on that side of the thing except here, because that was not part of their charter to do that.

Interestingly enough, the way that came down, which people think I’m crazy when I say this, but it’s actually true. Remember when we started this whole thing out it was Mercury. We didn’t fly them out of here. Mercury was flown out of KSC. The start of Gemini was flown out of KSC. The decision to move mission ops here put us into Gemini before we were going to start flying them out of here. We’d already built the Control Center. It was online but wasn’t primed for the flight for Gemini III I believe.

It just so happens at the time that we lifted off Gemini III and of course television was pretty primitive back in those days, the cameras were very insensitive and you had to have huge lighting to see anything, when they fired all the theatrical lights up down there, they brought the power down on the LPS, the Launch Processing System, in KSC. Right at first motion, we were already online. KSC went down. We said, “That’s okay, we got it.” That set the stage for why we took control of the vehicle at first motion. There was no other science in it, nothing else manned or unmanned launched anywhere in the world hands it off to the operations center at first motion. Everyplace else hands it off on orbit.

But people should also remember back in those days from one mission to another things were very dynamic. We were told to get to the Moon and get there quick, and each mission had specific new things that we weren’t real sure we knew how to deal with. So, there was an attitude well, if it works, we’ll just keep doing that. When we took it over from KSC to here on first motion, that worked just fine. That just set the stage, we’ve been doing it that way ever since.

But an artifact of that is we find that the knowledge that exists within MOD is unique compared to other space missions that are flown by NASA. Certainly we find more subtle indications of that. We have crews. The other missions flown by NASA don’t. Obviously the Russians do, Chinese do, some of them. But most of those missions don’t try to sustain what I’ll call a synchronous forward link, a hard forward link between the ground and the spacecraft, all the time. They don’t have anybody to talk to and we do.

We find very subtle differences in how we want to do business operating through TDRSS [Tracking and Data Relay Satellite System] for example or the Deep Space Network or a ground station compared to what anybody else wants to do, because we have people on board to talk to, and they want to be able to talk to us. There are some very subtle things in there that are unique simply because it’s a manned program. I think there’s reason for MOD as an organization to be extremely proud of the breadth of missions that they know how to deal with. You won’t find that anywhere else.

It just so happens that normally the way the functions are allocated they don’t all exist in one place like they do here. Self-serving but it’s true. We are I think a great ally for a commercial vendor that’s trying to get into the business or another country that wants to participate in some great adventure, because we really know a lot of things, and we’ve got a lot of years of learning where the landmines are and where not to step. There’s a great deal of experienced knowledge that exists within the organization that is relevant to pulling these things off.

Wright: Although spacecraft may change and missions may change, that basic knowledge of what—

McGill: The laws of physics do not change. They are the same here as they are in Goddard, as they are in Moscow. There’s certain common factors underneath all this. The nature of the business makes it very dangerous. You have to deal with very large amounts of power that you try to keep under control, but can get away from you very easily. If you’re going to take something fairly massive and accelerate it up to 17,500 miles an hour, it’s going to take a lot of power.

Until somebody can come up with antigravity paint or something like that, this is just a very risky business. You’re dealing with huge amounts of power that you’ve got to control all the way down to the fifteenth decimal place. It’s a funny combination of precision and power that has to always be balanced out. It will get away from you very very quickly.

Wright: Thinking about some of the words that you shared of making sure that the communication of the systems on the ground can tell you what’s going on so far away so that you can prevent and/or anticipate any issues.

McGill: MOD in our classic role I’ve always thought of as being schizophrenic. Our primary job is to go fly a mission. So the programs have defined a set of detailed objectives that they want for this mission and the job is to go fly a mission and produce those results. However, experience will tell you that the spacecraft don’t always behave right. Things break. You wind up in a troubleshooting mode. You wind up doing onboard system management and taking corrective actions mainly to try to protect your ability to meet all those mission objectives.

Because things don’t go to plan—they never do, I don’t think they ever have—you have to be very very fleet of foot, very agile in your ability to replan to try to accomplish everything the program asks you to accomplish, even though it didn’t happen the way it was supposed to happen. It’s always been amazing to me. We spend an awful lot of effort planning these things out. We spend a lot of time running simulations, practicing not only the nominal but even more so the off-nominal. So often the off-nominal case that occurs is not one we practiced. There’s so many more things that can go wrong than go right that I guess just statistically that’s the way it is.

Wright: Are there some times that you remember back through all these years that maybe not just through the transition but the systems that you thought were going to work so well glitched for some reason that you had no idea? You were talking about the incremental changes and how you saved the ascent and descent for the Shuttle as the end, knowing everything was as good as it could go before you went to those critical steps. But was there troubleshooting along the way during those? Or did much of what you planned fall into place the way that you had perceived it and planned it?

McGill: No system is ever perfect. Certainly if it’s new it’s very imperfect. We will go into operations with MCC-21 with a lot of flaws, many of them known and some of them unknown. It’s always a calculated risk. We’ve never in the history of the Control Center had a system on the floor that had no known anomalies documented. Our DR [Discrepancy Reporting] database, DRTS [Discrepancy Reporting and Tracking System], or now we use a different system, but whatever it is has always got thousands of things in it, most of which are known to us, we have operational work-arounds. You’re used to when you fire up your PC, [Microsoft] Word is not flawless, PowerPoint is not flawless. We learn how to work around them and get the job done anyway.

Obviously the newer the system is, the more likely there are unknowns. Our best protection, the way we try to get enough of a warm fuzzy to sign a CoFR [Certification of Flight Readiness] that we’re willing to stand up to the program and say we’re ready to go fly, is we try to log as many hours as we can testing and running simulations so the probability of having encountered those anomalies is as high as we can make it. Doesn’t keep us from discovering something for the first time when we’re on orbit, but the only way you can mitigate that, I think, is to try, to the extent possible, to have more than one way to do things.

You try to make sure that you put in place operational work-arounds that are inherent in the way the thing works. If it can’t get it done the way I wanted it to, it may not be real convenient but there’s another way to do it. To the extent that you can anticipate those kinds of things and have the money to build alternate paths in the system, that’s about the best you can do. But I think we’ve had a history of deploying fairly well-behaved systems.

It was established very early in the program that we were going to operate to a level of “two nines five” reliability. That’s the probability of success or reliability number that says, “Well, what’s the probability of you being able to do your critical functions?” The number we established decades ago was 99.5 percent. So there’s a half percent chance we’ll fail by the way, which we haven’t yet, but we’re due.

It turns out that if you do that then soon as you say two nines five reliability, that means that for every critical function you do you have to be one fault-tolerant. You can’t get there any other way. For all of those things like being able to command the spacecraft, being able to process trajectory and know where it is, being able to view the telemetry, being able to communicate with the crew, all those critical functions, whenever we’re in critical or complex phase, depending on whether you want to use a Shuttle or Station term, we guarantee that we’re two-stringed. We have two separate independent processing strings up. If one of them fails, we switch to the other one. Of course that doesn’t protect you any for common cause problems where both of them have the same flaw in them. But typically even if it is a design flaw, coding flaw in the software, those things that are just flat logic mistakes get discovered very early in the testing process.
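The arithmetic behind two-stringing can be sketched as follows. It assumes the two strings fail independently, which is exactly the assumption that common cause problems violate, and the 0.95 single-string figure is an illustrative number rather than a measured one.

```python
def parallel_reliability(string_reliability: float, strings: int = 2) -> float:
    """Probability that at least one of several independent strings works."""
    return 1.0 - (1.0 - string_reliability) ** strings


def min_string_reliability(target: float, strings: int = 2) -> float:
    """Lowest per-string reliability that still meets the target when the
    strings are independent and any one of them is enough."""
    return 1.0 - (1.0 - target) ** (1.0 / strings)


TARGET = 0.995  # "two nines five"

single = 0.95  # illustrative single-string reliability, not a real figure
dual = parallel_reliability(single)
floor = min_string_reliability(TARGET)

print(f"one string at {single}: meets target? {single >= TARGET}")
print(f"two strings at {single}: {dual:.4f}, meets target? {dual >= TARGET}")
print(f"minimum per-string reliability for two strings: {floor:.4f}")
```

Under these assumptions a single string at 0.95 falls short of 0.995, while two such strings reach 0.9975. Each string would only need to be about 0.929 reliable for the pair to meet the target.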

The ones that’ll tend to propagate long enough down the timeline to get you past all the simulations and into operations are things that require a very particular time-ordered sequence of events before they can trigger. It’s so unlikely, that’s why you haven’t seen it before. Generally if you switch over and you do it again, the time order is not the same, and the problem probably won’t surface, even if it really is an implementation problem, a design problem, because you reset the roll of the dice.

It’s amazing how often we have seen cases like that where we’re sure after the fact, go back and look at it, we actually have got a code problem in there we got to go find. Sometimes they’re very difficult to find because trying to reconstruct them, to get them to happen again, is hard when they’re improbable and very conditional. We do a lot of logging in the system to give us as much data as possible to go back and look to see exactly what was going on when this happened, to try to reconstruct it.

Generally, another interesting design consideration today, I think, is that the systems don’t have to be perfect. They have to get the job done. I’m a firm believer in what I call fault-tolerant software. You can implement it in a particular way where yes, there’s a bug in the code, but it’ll still get the job done. Maybe it forces a restart, the computer comes back up again and gets the job done, or maybe you switch over to an alternate path and it gets the job done. At the bottom line, you get the job done, and that’s good enough. That may mean you still want to go look and see if you can figure out why the heck it did that, or maybe not. Maybe if it does it so rarely you just don’t want to spend the money even trying to fix it. Nobody asked us to build a perfect system. They asked us to build a tool to go fly missions successfully.
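One way to picture the restart-or-switch idea is the sketch below. It is a generic pattern offered as an assumption, not the Control Center implementation; `run_fault_tolerant` and the simulated faults are hypothetical names made up for the illustration.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_fault_tolerant(paths: Sequence[Callable[[], T]], restarts_per_path: int = 1) -> T:
    """Try each path in order. Restart a failing path a limited number of
    times before switching over to the next one."""
    last_error: Optional[Exception] = None
    for path in paths:
        for _attempt in range(restarts_per_path + 1):
            try:
                return path()
            except Exception as error:  # log and keep going in a real system
                last_error = error
    raise RuntimeError("all paths failed") from last_error


calls = {"primary": 0}


def flaky_primary() -> str:
    """Fails once, like a bug that needs a rare time-ordered sequence."""
    calls["primary"] += 1
    if calls["primary"] == 1:
        raise TimeoutError("simulated one-time timing fault")
    return "command sent via primary"


def broken_primary() -> str:
    raise ValueError("simulated persistent bug")


def alternate() -> str:
    return "command sent via alternate"


after_restart = run_fault_tolerant([flaky_primary, alternate])
after_switch = run_fault_tolerant([broken_primary, alternate])
print(after_restart)
print(after_switch)
```

In the first call the restart is enough, because the fault does not repeat. In the second the primary never recovers and the alternate path gets the job done.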

Wright: On a personal/professional reflection, you’ve spent a good part of your life in creating systems that are beginning as a vision but yet turn into something very practical and reliable. Any point during that time period you thought about going to do something else? I know you started our conversation, before we started recording, that you had retired. But what happened there?

McGill: Yes. I was very happily retired. I retired in 2003. I’d done 35 years and three days. It turns out you only retire on a Friday. Did you know that?

Wright: No, I didn’t.

McGill: I didn’t either. But apparently it’s true. There were a lot of things at the time. There wasn’t much dynamics going here. The second generation system was supporting perfectly. No one was suggesting we needed to do anything about it. I had been off actually doing commercial projects. One of the other guys who was at Lockheed and I teamed up and we started a little company within the company. We were doing things for a variety of commercial companies. I’ll keep them nameless. Doing pretty well at it, but the corporation didn’t much care for us out doing commercial work. They thought themselves in the aerospace business, and I think they were a little afraid of us.

We started backing out of that somewhat because it was not smiled upon by the corporation. It was a lull. There was going to be a change in the retirement policy that the company had about that time. It seemed like the right time. I went ahead and pulled the trigger and retired. I did occasionally go do a little consulting work either back for Lockheed or other places like DARPA [Defense Advanced Research Projects Agency], but really wasn’t looking for anything like a full-time job at all.

When we got into 2006 the Constellation was flaring up big time. Lockheed Martin out here was obviously a major bidder on Orion. They were using a number of the local people to try to work on that proposal effort. Because of the way NASA chose to run that competition that required them to firewall them away. There was a fair amount of work that needed to be covered over on the NASA side that there really wasn’t any resources to cover. That resulted in me getting a phone call saying was I at all interested in doing anything. I got to thinking about it. I said, “Well, engineers always demand symmetry.” That’s a flaw in the engineering mentality. Going beyond low Earth orbit when I started matched up with going beyond low Earth orbit at the end appealed to me. I think once you get the stuff in your blood you can’t get rid of it anyway. So I said, “Well, sure.”

This was in February 2006 and I agreed to work the balance of that fiscal year over at NASA to help them work the Constellation issues so the rest of the Lockheed guys could go off and design Orions. Of course that was 2006 and I’m still here. They haven’t chased me off yet. But the fascination is still there. The opportunities are still there for us to go do great things.

Wright: The technology keeps changing for you to adapt to.

McGill: It does. You can’t get bored. You’re just not watching if you get bored. The tools that we have to work with today are constantly in a state of flux. Many of them very relevant to what we’re trying to do. Also it’s been beneficial that the world at large is much more space-oriented than certainly they were in the ’60s. You have things like the CCSDS [Consultative Committee for Space Data Systems]. It’s an international organization that develops standards for major functions that people need in space. Their objective is if you create a standard that’s internationally accepted then commercial companies will go design products to those standards because they have a marketplace. That’s worked, and that’s provided some tremendous capabilities for us that are not good for anybody that isn’t doing space but they’re commercial products.

What’s available as a tool set to go build these things with is changing a lot. Just the basic technologies, how fast networks can run, compute power that we have in these workstations is phenomenal. There’s such a change in what we can do, the sophistication of the spacecraft. Orion is almost capable of taking care of itself. It has a very powerful set of fault detection and isolation capability on it. I can’t imagine how someone could get bored. They’re just not paying attention. I’ll stay on as long as they’ll let me probably.

Wright: I think one of the questions that I think Blair had sent over was if you were leaving or you were sharing those things for new people coming up, for the next generation coming up. What are some of those aspects or those—I don’t want to use the word lessons learned?

McGill: He’s keying on a comment I made to him when I was standing outside, so you know what triggered this in the first place. Another factor in 2006 when I got the call is I felt like I was fortunate that the taxpayers invested a huge amount of money in my education. Obviously not everything worked exactly great the first time. You learn from that, but it’s very expensive. I looked at it, and I said, “I think it’s payback time. The country invested a lot in me learning what I learned over the years. I think they got something for their money but also it’s time to give back.” Toward the end of last fiscal year it looked like because of the budget pressures for example that probably I was not going to be able to continue to do this. Lockheed was really getting squeezed.

To that end one of the areas that is very important to us and not well covered with the resource base is the communication subsystems, which I happen to know a bit about. I went off and decided I’d teach communication school, because I wanted to make sure if I left I left as much behind as I could. I really feel that way. I’m obviously not going to be able to do this forever, and the next generation needs to pick it up and run with the ball.

One of the guys in the session who was an absolute master with a video recorder and an editor, he videoed three of the four sessions. They’re huge files, about two hours each. It teaches you how communication systems work. We stuck those out there on servers and a lot of people have watched them. I hope they’re getting something out of them. But one of the things that triggered Blair is I feel very strongly that what I know was paid for by American taxpayers, and as much of it as possible needs to get left behind for the next group to take and run with. I really feel that way. This I see as an opportunity to do some of that. There are other topics probably that are like that. But the important thing is that the space program continue to flourish.

Wright: When you do finally leave, what do you feel like is going to be, if you had to pick one or two, the best contributions that you felt like you’ve been able to make?

McGill: I certainly do think that the evolution of the architecture of the Control Center is one. I’m not going to take credit for the MCC-21 but I’ve had my finger in the pie on that one big time. That’s the next generation that owns that one. They need to get credit for that one.

I think that has benefited certainly the Agency at large and the manned program in particular. I feel like it’s been very worthwhile for both sides. Certainly it’s been very good for me. I’ve been able to practice the thing I’m passionate about. Life gets no better than that.

Wright: Rewarding.

McGill: If you get to do what you want to do. I think probably the next wave above that is a topic that I mentioned several times here. It’s not finished. There’s lots to be learned and figured out about how to go do these big systems. There’s opportunities for people to think about it, come up with clever ideas and document them. We’re nowhere near finished in any one of these threads, including the methodologies.

I like to leave that challenge behind for some of these bright people. One of the huge advantages of working out here is this is the smartest bunch of people you’re going to find anywhere on the face of the Earth. I know that to be true because I’ve been a lot of places. You’re not going to find a talent base anywhere close to this anywhere. They’re really really sharp people out here that are capable of great innovation.

The other thing I’d like to leave behind is the challenge to pick that ball up and run with it. There’s plenty of field left ahead of you.

Wright: I know. I think sometimes the generation feels like there’s nothing new to explore or to do.

McGill: It is not true. It is definitely not true. In fact if anything the rate at which things are evolving is accelerating. The opportunities are growing. It’s not simply working with new products. It’s even challenging old techniques.

I had one opportunity in my career that occurred in ’91 when it actually happened I guess, rolling into 1992. The situation we were in was early Space Station. Space Station needed a Control Center. We were using a very classical engineering methodology they call waterfall to go create one. The program was quite unstable. I think twice we came within one vote in Congress of canceling the whole damn program. About the time we’d get through the front end and get a system design spec put together, the whole thing would flip over and you’d go back and start over again. We were executing completely in compliance with the contract, doing what I thought was pretty good work per the contract, but producing nothing of value that was going to go on the floor and go fly missions, because you couldn’t get that far.

I think the reality of that was we were spending a lot of money. The cost to complete was basically not changing from one year to the next, and the schedule was just incrementally sliding out. I had a history of running around complaining about bad processes. I guess that was the reason why they said, “Okay, go fix it.”

They gave me the opportunity in ’91 to go overhaul what was at that time the MSC contract. They didn’t put any bounds on me about changing the statement of work and the DRL [Data Requirements List] and anything else I thought needed to be worked on. I did that and changed a lot of things. In fact some people said I changed everything but the carpet on the floor, which might have been pretty close to the truth. But went to what today we would call an agile process. If I’d thought of that term back then I’d have used it. It’s a perfect term. I called it incremental because I wasn’t smart enough to think of agile.

What it really said is hey, the world is too dynamic. The reason we can’t deliver on this is things don’t hold still long enough to use the long duration waterfall process. We got to build this thing incrementally. We got to eat this big cookie one little bite at a time, even though we’ve never seen the whole cookie at once. It’s all hidden behind a curtain somewhere. I reshuffled everything around so we could have a systems engineering level that was very heads up, keeping track of all the change traffic, and an implementing organization that were very heads down, that were taking each one of these delivery increments defined and going off and doing that without paying any attention to what’s happening to the budgets or the schedules or the politics or anything else going on.

So we shifted to that. It was interesting, because when we made the change we gave the program back $70 million, I think was the number, and we trimmed two years off the schedule. But in reality that’s an inaccurate statement, because like I said, the cost to complete was staying the same, and the schedules were slipping. But the one that was baseline—and we delivered by the way. We met those budgetary commitments and schedule commitments.

The reason I bring that up is it is useful and sometimes very appropriate to be willing to challenge the well rooted processes that are in place. If you can build a compelling case and it makes good sense for why you really ought to do it a different way, then people will listen to you. Obviously NASA contracts accepted all that before we were allowed to go do it. There’s plenty of ways to cause progress to occur. Some of them are deep-rooted technically. Some are more just how it is you go about doing it. But the important thing I think is always not just to say, “Well, it’s been done this way for 20 years, that’s obviously got to be the right way to do it.” The right thing to do is look at the situation around you right then because that’s what’s changing constantly.

The operational requirements are changing. The technologies are changing. The budget situations are changing. The national priorities are changing. You’re really pressed to find something that’s not changing. If you’re not willing to be very agile, very fleet of foot, you’re probably not going to succeed. It’s all about being able to adapt. That means adapt the processes, adapt the systems, adapt all the things that we’re about to succeed.

I got asked soon after we built the thing over there, after I redid the whole contract, the Air Force was on some kind of a summit thing in Colorado Springs. They were trying to figure out what to do with their systems in Cheyenne Mountain was what they were about. I guess somebody had mentioned what I’d been doing here in Houston, so they asked me to come up there and tell them about how it was we built the new Control Center here. I had a chart [in my presentation that showed the context the system engineering process must operate in]. Basically it was showing that the operations guys are in a state of flux too. One day to the next they’re trying to figure out what the heck they’re going to do.

Even made the comment that I could remember when I would have been the first in the line to go say, “Well, if those ops guys can write a decent ops concept I can sure build a system that would meet up with it.” Now the recognition that they would love to be able to do that but their world won’t hold still either. This whole chart was oriented around if you’re going to do good engineering these days you’re going to have to be constantly aware of all that change traffic out there, and modifying your plan to match up with it.

It was interesting to look around the room, because I didn’t know anybody in the room, I don’t think. You could easily see where the ops guys were sitting in the back, they’re all standing up and cheering. You could see the systems integrator bunch that the Air Force had hired over there because they’re saying, “Well, doesn’t that cause requirements creep?” I remember answering that question saying, “Well, I guess. But what I deliver the ops community likes. So I don’t guess it matters, does it? It satisfies their need. I don’t know whether the requirements crept or not.” Because we started doing it before they were all written down. The requirements were evolving too. I don’t know what requirements creep means anymore.

There are a lot of comfortable things that an engineering community can hide behind like a well formed set of Level As. It’s not in my requirements. I’m not going to do that. Those kinds of things are inappropriate today. The world is not that stable.

Wright: I’m going to ask you one more question and then we’ll let you go. Don’t want to take all your day. You’ve seen a lot of human spaceflight. Is there a time during all these years that you’ve worked around this community that you feel that’s the one you were the gladdest you were around? Special mission or that transition or the new Center opening? Whatever. Is there something that just really sticks out in your mind of all the things that you put up?

McGill: There are probably a lot of those. But interestingly enough, today is probably the one. The reason is I had the advantage of working with some incredibly bright people. I see them taking the reins. I see them doing things that I can’t do, that I don’t know how to do. They’re more contemporary in their knowledge of the technologies, and very talented people. To me there’s nothing more rewarding than that, seeing what a bright future our program has with those kind of people who have a lot of years ahead of them to be able to contribute like that.

Today is probably as good as it could possibly be. Of course you look back and you see some of the things that have happened over time. Certainly the day that we really started transitioning the second generation architecture in place was quite fulfilling to me since I was the thing’s daddy. You always like to see your baby succeed. Certainly if you look at some of those events back in Apollo, every one of those missions was very risky, very challenging. The organization and the tools that were available for them to work with helped us pull off some rather remarkable things I think.

You have trouble finding a year where you can’t find something to be very proud of. I don’t think any exceeds today. Just watching what the team is capable of doing is incredibly rewarding.

Wright: What a great time for you to learn even more.

McGill: It is. One of the things that—I guess you can teach an old dog new tricks. When I came out of retirement, there’s been a lot of things learned since then that I’ve learned. I like to learn. That’s another major pleasure that I get out of this thing. There’s so much in this business, so much complexity in it. Every one of those rocks has got a gem underneath it. You turn it over and find, “Wow, I think I’m going to learn more about that.” There’s some really really interesting things out there to dig into.

Wright: Maybe when MCC-21 is up and running you can come back and visit with us and tell us about the end of that saga.

McGill: I will. We should be totally flying by the end of the year with it. So it’s not far down.

Wright: I look forward to talking with you again then.

McGill: So far we’ve got enough testing with the simulations. The basic behavior that we’re looking for I think is actually there. It’s not finished. These things never are. We’ve been working on the Control Center for 50 years, never have finished it. It’s not so much a point, it’s a vector. You’re really pointing it in a direction, putting it on a vector. Now you continue to build on that off in that direction. There are a lot of things in there. There’s some features that I would have liked to have seen get in there probably day one, but that’s okay.

One of the fascinating things about building it in the first place is with all the change traffic the front office recognized that maybe it was time to do something. They told a few of us that, “Okay, if you had a clean sheet of paper how would you do the Control Center?” Clean sheet exercise.

We all head out for a week. I say all. It was four or five of us head out for a week in an office up there. I remember telling them. I said, “Well, we’ll ask for everything. They’ll give us about 50 percent. We got to be willing to accept about half of what we’re asking for.” That’s the way it usually is. So, we put everything in it that we wanted, and they gave it to us all. They told us, “Just go for it.” The front office very much supported our desire to make wholesale changes to the way the thing was put together and move it in a significantly different direction than where it was set to go.

You really can’t ask for better than that. Most of it got in there, even though there were certainly some things that absolutely had to happen, other things could be put off and it wouldn’t hurt. But everything that had to happen is in there, and a lot of the things that didn’t have to happen right now are also in there. It’s difficult to figure out exactly how you measure the success or failure of one of these things. But I’d declare it a success.

Wright: The future will be fun seeing how it all comes about.

McGill: Yes. I’m excited about, like I said, the work we’re doing so far for EM-1 and 2 because it is more intercenter. Seeing the great spirit of cooperation and working with the other Centers, I think moving down that path of OneNASA, if you want to use a cliché, can actually happen, where we can really pool the different resources at the Centers into one enterprise objective and make it all fit together rather seamlessly.

We can get that started. That’s probably not going to finish while I’m still around. But that’s another nice challenge for the next wave to go take on to pull all the Centers together.

Wright: Exciting times in the future.

McGill: It is. Absolutely. Absolutely.

Wright: Thank you for giving up your whole morning. I appreciate it.

McGill: You’re very welcome. Thank you for the opportunity.

Wright: I’ve enjoyed it. Thank you.

[End of interview]

Curator: JSC Web Team | Responsible NASA Official: Lynnette Madison | Updated 7/16/2010