Thursday, March 31, 2011

How To Steal Like An Artist (And 9 Other Things Nobody Told Me)
http://www.austinkleon.com/2011/03/30/how-to-steal-like-an-artist-and-9-other-things-nobody-told-me/


Former Teen Stock Swindler Gets 3 Years for New Hack
http://feeds.wired.com/~r/wired/index/~3/LALw3YujGuU/?currentPage=all




Microsoft's Odd Couple
http://www.vanityfair.com/business/features/2011/05/paul-allen-201105?printable=true&currentPage=all

Photo caption: LIFE 2.0. Paul Allen in his office at the headquarters of his venture-capital firm, Vulcan, in Seattle, Washington.

Adapted from Idea Man, by Paul Allen, to be published this month by Portfolio, a member of the Penguin Group (USA) Inc.; © 2011 by the author.

My high school in Seattle, Lakeside, seemed conservative on the surface, but it was educationally progressive. We had few rules and lots of opportunities, and all my schoolmates seemed passionate about something. But the school was also cliquish. There were golfers and tennis players, who carried their rackets wherever they went, and in the winter most everyone went skiing. I'd never done any of these things, and my friends were the boys who didn't fit into the established groups. Then, in the fall of my 10th-grade year, my passion found me.

My honors-geometry teacher was Bill Dougall, the head of Lakeside's science and math departments. A navy pilot in World War II, Mr. Dougall had an advanced degree in aeronautical engineering, and another in French literature from the Sorbonne. In our school's best tradition, he believed that book study wasn't enough without real-world experience. He also realized that we'd need to know something about computers when we got to college. A few high schools were beginning to train students on traditional mainframes, but Mr. Dougall wanted something more engaging for us. In 1968 he approached the Lakeside Mothers Club, which agreed to use the proceeds from its annual rummage sale to lease a teleprinter terminal for computer time-sharing, a brand-new business at the time.

On my way to math class in McAllister Hall, I stopped by for a look. As I approached the small room, the faint clacking got louder. I opened the door and found three boys squeezed inside. There was a bookcase and a worktable with piles of manuals, scraps from notebooks, and rolled-up fragments of yellow paper tape. The students were clustered around an overgrown electric typewriter, mounted on an aluminum-footed pedestal base: a Teletype Model ASR-33 (for Automatic Send and Receive). It was linked to a GE-635, a General Electric mainframe computer in a distant, unknown office.

The Teletype made a terrific racket, a mix of low humming, the Gatling gun of the paper-tape punch, and the ka-chacko-whack of the printer keys. The room's walls and ceiling were lined with white corkboard for soundproofing. But though it was noisy and slow, a dumb remote terminal with no display screen or lowercase letters, the ASR-33 was also state-of-the-art. I was transfixed. I sensed that you could do things with this machine.

That year, 1968, would be a watershed in matters digital. In March, Hewlett-Packard introduced the first programmable desktop calculator. In June, Robert Dennard won a patent for a one-transistor cell of dynamic random-access memory, or DRAM, a new and cheaper method of temporary data storage. In July, Robert Noyce and Gordon Moore co-founded Intel Corporation. In December, at the legendary "mother of all demos" in San Francisco, the Stanford Research Institute's Douglas Engelbart showed off his original versions of a mouse, a word processor, e-mail, and hypertext. Of all the epochal changes in store over the next two decades, a remarkable number were seeded over those 10 months: cheap and reliable memory, a graphical user interface, a "killer" application, and more.

It's hard to convey the excitement I felt when I sat down at the Teletype. With my program written out on notebook paper, I'd type it in on the keyboard with the paper-tape punch turned on. Then I'd dial into the G.E. computer, wait for a beep, log on with the school's password, and hit the Start button to feed the paper tape through the reader, which took several minutes.

At last came the big moment. I'd type "RUN," and soon my results printed out at 10 characters per second—a glacial pace next to today's laser printers, but exhilarating at the time. It would be quickly apparent whether my program worked; if not, I'd get an error message. In either case, I'd quickly log off to save money. Then I'd fix any mistakes by advancing the paper tape to the error and correcting it on the keyboard while simultaneously punching a new tape—a delicate maneuver nowadays handled by a simple click of a mouse and a keystroke. When I achieved a working program, I'd secure it with a rubber band and stow it on a shelf.

Soon I was spending every lunchtime and free period around the Teletype with my fellow aficionados. Others might have found us eccentric, but I didn't care. I had discovered my calling. I was a programmer.

One day early that fall, I saw a gangly, freckle-faced eighth-grader edging his way into the crowd around the Teletype, all arms and legs and nervous energy. He had a scruffy-preppy look: pullover sweater, tan slacks, enormous saddle shoes. His blond hair went all over the place. You could tell three things about Bill Gates pretty quickly. He was really smart. He was really competitive; he wanted to show you how smart he was. And he was really, really persistent. After that first time, he kept coming back. Many times he and I would be the only ones there.

Bill came from a family that was prominent even by Lakeside standards; his father later served as president of the state bar association. I remember the first time I went to Bill's big house, a block or so above Lake Washington, feeling a little awed. His parents subscribed to Fortune, and Bill read it religiously. One day he showed me the magazine's special annual issue and asked me, "What do you think it's like to run a Fortune 500 company?" I said I had no idea. And Bill said, "Maybe we'll have our own company someday." He was 13 years old and already a budding entrepreneur.

Where I was curious to study everything in sight, Bill would focus on one task at a time with total discipline. You could see it when he programmed—he'd sit with a marker clenched in his mouth, tapping his feet and rocking, impervious to distraction. He had a unique way of typing, sort of a six-finger, sideways scrabble. There's a famous photograph of Bill and me in the computer room not long after we first met. I'm seated on a hard-back chair at the teleprinter in my dapper green corduroy jacket and turtleneck. Bill is standing to my side in a plaid shirt, his head cocked attentively, eyes trained on the printer as I type. He looks even younger than he actually was. I look like an older brother, which was something Bill didn't have.

Getting with the Program

When Bill got the news that he'd been accepted at Harvard University, he wasn't surprised; he'd been riding high since scoring near the top in the Putnam Competition, where he'd tested his math skills against college undergraduates around the country. I offered a word to the wise: "You know, Bill, when you get to Harvard, there are going to be some people a lot better in math than you are."

"No way," he said. "There's no way!"

And I said, "Wait and see."

I was decent in math, and Bill was brilliant, but by then I spoke from my experience at Washington State. One day I watched a professor cover the blackboard with a maze of partial differential equations, and they might as well have been hieroglyphics from the Second Dynasty. It was one of those moments when you realize, I just can't see it. I felt a little sad, but I accepted my limitations. I was O.K. with being a generalist.

For Bill it was different. When I saw him again over Christmas break, he seemed subdued. I asked him about his first semester, and he said glumly, "I have a math professor who got his Ph.D. at 16." The course was purely theoretical, and the homework load ranged up to 30 hours a week. Bill put everything into it and got a B. When it came to higher mathematics, he might have been one in a hundred thousand students or better. But there were people who were one in a million or one in 10 million, and some of them wound up at Harvard. Bill would never be the smartest guy in that room, and I think that hurt his motivation. He eventually switched his major to applied math.

Through the spring semester of 1974, Bill kept urging me to move to Boston. We could find work together as programmers, he said; some local firms sounded interested. We'd come up with some exciting project. In any case, we'd have fun. Why not give it a try?

Drifting at Washington State, I was ready to take a flier. I mailed my résumé to a dozen computer companies in the Boston area and got a $12,500 job offer from Honeywell. If Boston didn't work out, I could always return to school. In the meantime, I'd sample a new part of the country, and my girlfriend, Rita, had agreed to join me. We had grown more serious and wanted to live together as a trial run for marriage. Plus, Bill would be there. At a minimum, we could put our heads together on the weekends.

Rita and I had come to New England knowing two people. One was a brilliant, troubled Lakesider who would insinuate that he was working for the Mafia. Then there was Bill. Rita had roasted a chicken one night for dinner and couldn't take her eyes off him. "Did you see that?" she said after he'd left. "He ate his chicken with a spoon. I have never in my life seen anyone eat chicken with a spoon." When Bill was thinking hard about something, he paid no heed to social convention. Once, he offered Rita fashion advice—basically, to buy all your clothes in the same style and colors and save time by not having to match them. For Bill, that meant any sweater that went with tan slacks.

Each time I brought an idea to Bill, he would pop my balloon. "That would take a bunch of people and a lot of money," he'd say. Or "That sounds really complicated. We're not hardware gurus, Paul," he'd remind me. "What we know is software." And he was right. My ideas were ahead of their time or beyond our scope or both. It was ridiculous to think that two young guys in Boston could beat IBM on its own turf. Bill's reality checks stopped us from wasting time in areas where we had scant chance of success.

So when the right opportunity surfaced, as it did that December, it got my full attention: an open invitation by the MITS company, in Albuquerque, to build a programming language for their new Altair microcomputer, intended for the hobbyist market.

Some have suggested that our Altair BASIC was remarkable because we created it without ever seeing an Altair or even a sample Intel 8080, the microprocessor it would run on. What we did was unprecedented, but what is less well understood is that we had no choice. The Altair was little more than a bare-bones box with a C.P.U.-on-a-chip inside. It had no hard drive, no floppy disk, no place to edit or store programs.

We moved into Harvard's Aiken Computation Lab, on Oxford Street, a one-story concrete building with an under-utilized time-sharing system. The clock was ticking on us from the start. Bill had told Ed Roberts, MITS's co-founder and C.E.O., that our BASIC was nearly complete, and Ed said he'd like to see it in a month or so, when in point of fact we didn't even have an 8080 instruction manual.

In building our homegrown BASIC, we borrowed bits and pieces of our design from previous versions, a long-standing software tradition. Languages evolve; ideas blend together; in computer technology, we all stand on others' shoulders. As the weeks passed, we got immersed in the mission—as far as we knew, we were building the first native high-level programming language for a microprocessor. Occasionally we wondered if some group at M.I.T. or Stanford might beat us, but we'd quickly regain focus. Could we pull it off? Could we finish this thing and close the deal in Albuquerque? Yeah, we could! We had the energy and the skill, and we were hell-bent on seizing the opportunity.

We worked till all hours, with double shifts on weekends. Bill basically stopped going to class. Monte Davidoff, a Harvard freshman studying advanced math who had joined us, overslept his one-o'clock French section. I neglected my job at Honeywell, dragging into the office at noon. I'd stay until 5:30, and then it was back to Aiken until three or so in the morning. I'd save my files, crash for five or six hours, and start over. We'd break for dinner at Harvard House of Pizza or get the pupu platter at Aku Aku, a local version of Trader Vic's. I had a weakness for their egg rolls and butterflied shrimp.

I'd occasionally catch Bill grabbing naps at his terminal during our late-nighters. He'd be in the middle of a line of code when he'd gradually tilt forward until his nose touched the keyboard. After dozing for an hour or two, he'd open his eyes, squint at the screen, blink twice, and resume precisely where he'd left off—a prodigious feat of concentration.

Working so closely together, the three of us developed a strong camaraderie. Because our program ran on top of the multi-user TOPS-10 operating system, we could all work simultaneously. We staged nightly competitions to squeeze a sub-routine—a small portion of code within a program that performs a specific task—into the fewest instructions, taking notepads to separate corners of the room and scrawling away. Then someone would say, "I can do it in nine." And someone else would call out, "Well, I can do it in five!"

A few years ago, when I reminisced with Monte about those days, he compared programming to writing a novel—a good analogy, I thought, for our approach to Altair BASIC. At the beginning we outlined our plot, the conceptual phase of the coding. Then we took the big problem and carved it into its component chapters, from the hundreds of sub-routines to their related data structures, before putting all the parts back together.

By late February, eight weeks after our first contact with MITS, the interpreter (which would save space by executing one snippet of code at a time) was done. Shoehorned into about 3,200 bytes, roughly 2,000 lines of code, it was one tight little BASIC—stripped down, for sure, but robust for its size. No one could have beaten the functionality and speed crammed into that tiny footprint of memory: "The best piece of work we ever did," as Bill told me recently. And it was a true collaboration. I'd estimate that 45 percent of the code was Bill's, 30 percent Monte's, and 25 percent mine, excluding my development tools.

All things considered, it was quite an achievement for three people our age. If you checked that software today, I believe it would stack up against anything written by our old mentors. Bill and I had grown into crack programmers. And we were just getting started.

As I got ready to go to Albuquerque, Bill began to worry. What if I'd screwed up one of the numbers used to represent the 8080 instructions in the macro assembler? Our BASIC had tested out fine on my simulator on the PDP-10, but we had no sure evidence that the simulator itself was flawless. A single character out of place might halt the program cold when it ran on the real chip. The night before my departure, after I knocked off for a few hours of sleep, Bill stayed up with the 8080 manual and triple-checked my macros. He was bleary-eyed the next morning when I stopped by en route to Logan Airport to pick up the fresh paper tape he'd punched out. The byte codes were correct, Bill said. As far as he could tell, my work was error-free.

The flight was uneventful up until the plane's final descent, when it hit me that we'd forgotten something: a bootstrap loader, the small sequence of instructions to tell the Altair how to read the BASIC interpreter and then stick it into memory. A loader was a necessity for microprocessors in the pre-ROM era; without one, that yellow tape in my briefcase would be worthless. I felt like an idiot for not thinking of it at Aiken, where I could have coded it without rushing and simulated and debugged it on the PDP-10.
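A bootstrap loader's whole job fits in a few lines. Here is a minimal Python sketch of the idea, not the actual 8080 routine, which was a handful of hand-entered machine-code bytes; the function names and memory layout here are invented for illustration:

```python
def bootstrap_load(read_tape_byte, memory, load_addr, length):
    """Copy `length` bytes from the tape reader into memory,
    starting at `load_addr`, then return the entry address.

    `read_tape_byte` stands in for polling the Teletype's
    tape-reader port; on the real Altair, the CPU would jump
    to the returned address to start the loaded program.
    """
    for offset in range(length):
        memory[load_addr + offset] = read_tape_byte()
    return load_addr

# Toy demonstration: "load" a 4-byte program from a fake tape.
tape = iter([0x3E, 0x07, 0x76, 0x00])   # arbitrary byte values
memory = [0] * 256
entry = bootstrap_load(lambda: next(tape), memory, load_addr=0, length=4)
```

The real loader had to be toggled into memory by hand on the front panel, which is why keeping it tiny mattered so much.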

Now time was short. Minutes before landing, I grabbed a steno pad and began scribbling the loader code in machine language—no labels, no symbols, just a series of three-digit numbers in octal (base 8), the lingua franca for Intel's chips. Each number represented one byte, a single instruction for the 8080; I knew most of them by heart. "Hand assembly" is a famously laborious process, even in small quantities. I finished the program in 21 bytes—not my most concise work, but I was too rushed to strive for elegance.
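Octal's appeal was that each three-digit group maps directly onto one 8-bit value, so a hand-assembled listing is just a column of three-digit numbers. A quick Python illustration of the conversion (the helper function is mine, not period tooling; the opcodes shown are genuine 8080 instructions):

```python
def assemble_octal(listing):
    """Turn a hand-written list of three-digit octal strings,
    one byte per entry, into the raw bytes that would be
    toggled in on the front panel."""
    return bytes(int(code, 8) for code in listing)

# Octal 076 is hex 0x3E, the 8080's MVI A (load immediate into
# the accumulator); octal 166 is 0x76, the HLT instruction.
program = assemble_octal(["076", "000", "166"])  # MVI A,0 ; HLT
```

Three octal digits cover 000 through 377, exactly the 0-255 range of a byte, which is why the notation was such a natural fit for hand assembly.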

I came out of the terminal sweating and dressed in my professional best, a tan Ultrasuede jacket and tie. Ed Roberts was supposed to pick me up, so I stood there for 10 minutes looking for someone in a business suit. Not far down the entryway to the airport, a pickup truck pulled up and a big, burly, jowly guy—six feet four, maybe 280 pounds—climbed out. He had on jeans and a short-sleeved shirt with a string tie, the first one I'd seen outside of a Western. He came up to me, and in a booming southern accent he asked, "Are you Paul Allen?" His wavy black hair was receding at the front.

I said, "Yes, are you Ed?"

He said, "Come on, get in the truck."

As we bounced over the city's sunbaked streets, I wondered how all this was going to turn out. I'd expected a high-powered executive from some cutting-edge entrepreneurial firm, like the ones clustered along Route 128, the high-tech beltway around Boston. The reality had a whole different vibe. (On a later trip to Albuquerque, I came down from a plane and got hit in the head by tumbleweed on the tarmac. I wasn't in Massachusetts anymore.)

Ed said, "Let's go over to MITS so you can see the Altair." He drove into a low-rent commercial area by the state fairgrounds and stopped at a one-story strip mall. With its brick façade and big plate-glass windows, the Cal-Linn Building might have looked modern in 1955. A beauty salon occupied one storefront around the corner. I followed Ed through a glass door and into a light industrial space that housed MITS's engineering and manufacturing departments. As I passed an assembly line of a dozen or so weary-looking workers, stuffing kit boxes with capacitors and Mylar circuit boards, I understood why Ed was so focused on getting a BASIC. He had little interest in software, which he referred to as variable hardware, but he knew that the Altair's sales wouldn't keep expanding unless it could do something useful.

When I arrived, there were only two or three assembled computers in the whole plant; everything else had gone out the door. Ed led me to a messy workbench, where I found a sky-blue metal box with ALTAIR 8800 stenciled on a charcoal-gray front panel. Modeled after a popular minicomputer, with rows of toggle switches for input and flashing red L.E.D.'s for output, the Altair was 7 inches high by 18 inches wide. It seemed fantastic that such a small box could contain a general-purpose computer with a legitimate C.P.U.

Hovering over the computer was Bill Yates, a sallow, taciturn string bean of a man with wire-rimmed glasses—Stan Laurel to Ed's Oliver Hardy. He was running a memory test to make sure the machine would be ready for me, with the cover flipped up so I could see inside. Plugged into slots on the Altair bus—an Ed Roberts innovation that was to become the industry standard—were seven 1K static-memory cards. It might have been the only microcomputer in the world with that much random-access memory, more than enough for my demo. The machine was hooked up to a Teletype with a paper-tape reader. All seemed in order.

It was getting late, and Ed suggested that we put off the BASIC trial to the next morning. "How about dinner?" he said. He took me to a three-dollar buffet at a Mexican place called Pancho's, where you got what you paid for. Afterward, back in the truck, a yellow jacket flew in and stung me on the neck. And I thought, This is all kind of surreal. Ed said he'd drop me at the hotel that he'd booked for me, which I'd thought would be something like a Motel 6. I'd brought only $40; I was chronically low on cash, and it would be years before I'd have a credit card. I blanched when Ed pulled up to the Sheraton, the nicest hotel in town, and escorted me to the reception desk.

"Checking in?" the clerk said. "That will be $50."

It was one of the more embarrassing moments of my life. "Ed, I'm sorry about this," I stammered, "but I don't have that kind of cash."

He just looked at me for a minute; I guess I wasn't what he'd been expecting, either. Then he said, "That's O.K., we'll put it on my card."

The following morning, with Ed and Bill Yates hanging over my shoulder, I sat at the Altair console and toggled in my bootstrap loader on the front panel's switches, byte by byte. Unlike the flat plastic keys on the PDP-8, the Altair's were thin metal switches, tough on the fingers. It took about five minutes, and I hoped no one noticed how nervous I was. This isn't going to work, I kept thinking.

I entered my 21st instruction, set the starting address, and pressed the Run switch. The machine's lights took on a diffused red glow as the 8080 executed the loader's multiple steps—at least that much seemed to be working. I turned on the paper-tape reader, and the Teletype chugged as it pulled our BASIC interpreter through. At 10 characters per second, reading the tape took seven minutes. (People grabbed coffee breaks while computers loaded paper tape in those days.) The MITS guys stood there silently. At the end I pressed Stop and reset the address to 0. My index finger poised over the Run switch once again …

To that point, I couldn't be sure of anything. Any one of a thousand things might have gone wrong in the simulator or the interpreter, despite Bill's double-checking. I pressed Run. There's just no way this is going to work.

The Teletype's printer clattered to life. I gawked at the uppercase characters; I couldn't believe it.

But there it was: MEMORY SIZE?

"Hey," said Bill Yates, "it printed something!" It was the first time he or Ed had seen the Altair do anything beyond a small memory test. They were flabbergasted. I was dumbfounded. We all gaped at the machine for a few seconds, and then I typed in the total number of bytes in the seven memory cards: 7168.

"OK," the Altair spit back. Getting this far told me that 5 percent of our BASIC was definitely working, but we weren't yet home free. The acid test would be a standard command that we'd used as a midterm exam for our software back in Cambridge. It relied on Bill's core coding and Monty's floating-point math and even my "crunch" code, which condensed certain words (like "PRINT") into a single character. If it worked, the lion's share of our BASIC was good to go. If it didn't, we'd failed.

I typed in the command: PRINT 2+2.

The machine's response was instantaneous: 4. That was a magical moment. Ed exclaimed, "Oh my God, it printed '4'!" He'd gone into debt and bet everything on a fully functioning microcomputer, and now it looked as though his vision would come true.

"Let's try a real program," I said, trying to sound nonchalant. Yates pulled out a book called 101 BASIC Computer Games, a slim volume that DEC had brought out in 1973. The text-based Lunar Lander program, created long before computers had graphics capability, was just 35 lines long. Still, I thought it might build Ed's confidence. I typed in the program. Yates launched his lunar module and, after a few tries, settled it safely on the moon's surface. Everything in our BASIC had worked.

Ed said, "I want you to come back to my office." Through a flimsy-looking doorway, I took a seat in front of his desk and the biggest orange glass ashtray I had ever seen. Ed was a chain-smoker who'd take two or three puffs, stub the cigarette out, and light the next one. He'd go through half a pack in a single conversation.

"You're the first guys who came in and showed us something," he said. "We want you to draw up a license so we can sell this with the Altair. We can work out the terms later." I couldn't stop grinning. Once back at the hotel, I called Bill, who was thrilled with the news. We were in business now, for real; in Harvard parlance, we were golden. I hardly needed a plane to fly back to Boston.

Micro-manager

In the life of any company, a few moments stand out. Signing that original BASIC contract was a big one for Bill and me. Now our partnership needed a name. We considered Allen & Gates, but it sounded too much like a law firm. My next idea: Micro-Soft, for microprocessors and software. While the typography would be in flux over the next year or so (including a brief transition as Micro Soft), we both knew instantly that the name was right. Micro-Soft was simple and straightforward. It conveyed just what we were about.

From the time we'd started together in Massachusetts, I'd assumed that our partnership would be a 50-50 proposition. But Bill had another idea. "It's not right for you to get half," he said. "You had your salary at MITS while I did almost everything on BASIC without one back in Boston. I should get more. I think it should be 60-40."

At first I was taken aback. But as I pondered it, Bill's position didn't seem unreasonable. I'd been coding what I could in my spare time, and feeling guilty that I couldn't do more, but Bill had been instrumental in packing our software with "more features per byte of memory than any other BASIC we know," as I'd written for Computer Notes. All in all, I thought, a 60-40 split might be fair.

A short time later, we licensed BASIC to NCR for $175,000. Even with half the proceeds going to Ed Roberts, that single fee would pay five or six programmers for a year.

Bill's intensity was nonstop, and when he asked me for a walk-and-talk one day, I knew something was up. We'd gone a block when he cut to the chase: "I've done most of the work on BASIC, and I gave up a lot to leave Harvard," he said. "I deserve more than 60 percent."

"How much more?"

"I was thinking 64-36."

Again, I had that moment of surprise. But I'm a stubbornly logical person, and I tried to consider Bill's argument objectively. His intellectual horsepower had been critical to BASIC, and he would be central to our success moving forward—that much was obvious. But how to calculate the value of my Big Idea—the mating of a high-level language with a microprocessor—or my persistence in bringing Bill to see it? What were my development tools worth to the "property" of the partnership? Or my stewardship of our product line, or my day-to-day brainstorming with our programmers? I might have haggled and offered Bill two points instead of four, but my heart wasn't in it. So I agreed. At least now we can put this to bed, I thought.

Our formal partnership agreement, signed on February 3, 1977, had two other provisions of note. Paragraph 8 allowed an exemption from business duties for "a partner who is a full-time student," a clause geared to the possibility that Bill might go back for his degree. And in the event of "irreconcilable differences," paragraph 12 stated, Bill could demand that I withdraw from the partnership.

Later, after our relationship changed, I wondered how Bill had arrived at the numbers he'd proposed that day. I tried to put myself in his shoes and reconstruct his thinking, and I concluded that it was just this simple: What's the most I can get? I think Bill knew that I would balk at a two-to-one split, and that 64 percent was as far as he could go. He might have argued that the numbers reflected our contributions, but they also exposed the differences between the son of a librarian and the son of a lawyer. I'd been taught that a deal was a deal and your word was your bond. Bill was more flexible; he felt free to renegotiate agreements until they were signed and sealed. There's a degree of elasticity in any business dealing, a range for what might seem fair, and Bill pushed within that range as hard and as far as he could.

Microsoft was a high-stress environment because Bill drove others as hard as he drove himself. He was growing into the taskmaster who would prowl the parking lot on weekends to see who'd made it in. People were already busting their tails, and it got under their skin when Bill hectored them into doing more. Bob Greenberg, a Harvard classmate of Bill's whom we'd hired, once put in 81 hours in four days, Monday through Thursday, to finish part of the Texas Instruments BASIC. When Bill touched base toward the end of Bob's marathon, he asked him, "What are you working on tomorrow?"

Bob said, "I was planning to take the day off."

And Bill said, "Why would you want to do that?" He genuinely couldn't understand it; he never seemed to need to recharge.

Our company was still small in 1978, and Bill and I worked hand in glove as the decision-making team. My style was to absorb all the data I could to make the best-informed decision possible, sometimes to the point of over-analysis. Bill liked to hash things out in intense, one-on-one discussions; he thrived on conflict and wasn't shy about instigating it. A few of us cringed at the way he'd demean people and force them to defend their positions. If what he heard displeased him, he'd shake his head and say sarcastically, "Oh, I suppose that means we'll lose the contract, and then what?" When someone ran late on a job, he had a stock response: "I could code that in a weekend!"

And if you hadn't thought through your position or Bill was just in a lousy mood, he'd resort to his classic put-down: "That's the stupidest fucking thing I've ever heard!"

Good programmers take positions and stick to them, and it was common to see them square off in some heated disagreement over coding architecture. But it was tough not to back off against Bill, with his intellect and foot tapping and body rocking; he came on like a force of nature. The irony was that Bill liked it when someone pushed back and drilled down with him to get to the best solution. He wouldn't pull rank to end an argument. He wanted you to overcome his skepticism, and he respected those who did. Even relatively passive people learned to stand their ground and match their boss decibel for decibel. They'd get right into his face: "What are you saying, Bill? I've got to write a compiler for a language we've never done before, and it needs a whole new set of run-time routines, and you think I can do it over the weekend? Are you kidding me?"

I saw this happen again and again. If you made a strong case and were fierce about it, and you had the data behind you, Bill would react like a bluffer with a pair of threes. He'd look down and mutter, "O.K., I see what you mean," then try to make up. Bill never wanted to lose talented people. "If this guy leaves," he'd say to me, "we'll lose all our momentum."

Some disagreements came down to Bill and me, one-on-one, late at night. According to one theory, we'd installed real doors in all the offices to keep our arguments private. If that was true, it didn't work; you could hear our voices up and down the eighth floor. As longtime partners, we had a unique dynamic. Bill couldn't intimidate me intellectually. He knew I was on top of technical issues—often better informed than he, because research was my bailiwick. And unlike the programmers, I could challenge Bill on broader strategic points. I'd hear him out for 10 minutes, look him straight in the eye, and say, "Bill, that doesn't make sense. You haven't considered x and y and z."

Bill craved closure, and he would hammer away until he got there; on principle, I refused to yield if I didn't agree. And so we'd go at it for hours at a stretch, until I became nearly as loud and wound up as Bill. I hated that feeling. While I wouldn't give in unless convinced on the merits, I sometimes had to stop from sheer fatigue. I remember one heated debate that lasted forever, until I said, "Bill, this isn't going anywhere. I'm going home."

And Bill said, "You can't stop now—we haven't agreed on anything yet!"

"No, Bill, you don't understand. I'm so upset that I can't speak anymore. I need to calm down. I'm leaving."

Bill trailed me out of his office, into the corridor, out to the elevator bank. He was still getting in the last word—"But we haven't resolved anything!"—as the elevator door closed between us.

I was Mr. Slow Burn, like Walter Matthau to Bill's Jack Lemmon. When I got mad, I stayed mad for weeks. I don't know if Bill noticed the strain on me, but everyone else did. Some said Bill's management style was a key ingredient in Microsoft's early success, but that made no sense to me. Why wouldn't it be more effective to have civil and rational discourse? Why did we need knock-down, drag-out fights?

Why not just solve the problem logically and move on?

Logging Off

As we grew, our need for more help became glaring. Neither Bill nor I had a lot of experience as managers, and both of us had other areas of responsibility—Bill in sales, I in software development. Steve Wood had filled in admirably as general manager, but he, too, was a programmer by background. Bill came to see that we needed someone to help him run the business side of things, just as I ran technology. He chose Steve Ballmer, a Harvard classmate who'd worked in marketing at Procter & Gamble and was now studying at Stanford's business school. Bill sold him hard to me: "Steve's a super-smart guy, and he's got loads of energy. He'll help us build the business, and I really trust him."

I had run into Steve a few times at Harvard, where he and Bill were close. The first time we met face-to-face, I thought, This guy looks like an operative for the N.K.V.D. He had piercing blue eyes and a genuine toughness (though, as I got to know him better, I found a gentler side as well). Steve was someone who wouldn't back down easily, a necessity for working well with Bill. In April 1980, shortly before leaving town on a business trip, I agreed that we should offer him up to 5 percent of the company, because Bill felt certain that Steve wouldn't leave Stanford unless he got equity.

A few days later, after returning from my trip, I got a copy of Bill's letter to Steve. (Someone had apparently found it in the office's Datapoint word-processing system, and it made the rounds.) Programmers like Gordon Letwin were furious that Bill was giving a piece of the company to someone without a technical background. I was angry for another reason: Bill had offered Steve 8.75 percent of the company, considerably more than what I'd agreed to.

It was bad enough that Bill had chosen to override me on a partnership issue we'd specifically discussed. It was worse that he'd waited till I was away to send the letter. I wrote him to set out what I had learned, and concluded, "As a result of discovering these facts I am no longer interested in employing Mr. Ballmer, and I consider the above points a major breach of faith on your part."

Bill knew that he'd been caught and couldn't bluster his way out of it. Unable to meet my eyes, he said, "Look, we've got to have Steve. I'll make up the extra points from my share." I said O.K., and that's what he did.

It began in the summer of 1982 with an itch behind my knees at the Oregon Shakespeare Festival, where my parents had taken us to see nine plays in seven days when I was in junior high. This was not like a rash you got from the wrong soap; it was an agony that had me clawing at myself.

After the itching stopped, the night sweats began. Then, in August, I became aware of a tiny, hard bump on the right side of my neck, near my collarbone. Over the next several weeks, it grew to the size of a pencil eraser tip. It didn't hurt, and I didn't know that any lump near the lymph nodes was a warning sign. I felt as bulletproof as most people under 30; I took my health for granted.

On September 25, doctors at the Swedish Medical Center, in downtown Seattle, performed a biopsy. After I came out of anesthesia, my surgeon entered my room looking grim. "Mr. Allen," he said, "I took out as much as I could, but our initial diagnosis is lymphoma."

Then, good news: they'd caught my disease in Stage 1-A, before it had spread. Early-stage Hodgkin's lymphoma is one of the most curable cancers; I'd drawn a scary card, but hardly the worst. I began a six-week course of radiation, five days a week. Halfway through therapy, my white-cell count dropped so low that they had to stop for several weeks. But by then the tumor was shrinking. There was no guarantee of a cure, and I still felt sick and debilitated, but I began to be encouraged.

After resuming the radiation, I was in Bill's office one day talking about MS-DOS revenues. Our flat-fee strategy had helped establish us in several markets, but I thought we'd held on to it for too long. A case in point: We'd gotten a fee of $21,000 for the license for Applesoft BASIC. After sales of more than a million Apple II's, that amounted to two cents per copy. "If we want to maximize revenue," I said, "we have to start charging royalties for DOS."

Bill replied as though he were speaking to a not-so-bright child: "How do you think we got the market share we have today?" Then Steve came by to weigh in on Bill's side with his usual intensity; it would have been two on one, except I was approximately half a person at the time. (Microsoft later switched to per-copy licensing, a move that would add billions of dollars in revenue.)

Not long after that incident, I told Steve that I might start my own company. I told Bill that my days as a full-time executive at Microsoft were probably numbered, and that I thought I'd be happier on my own.

One evening in late December 1982, I heard Bill and Steve speaking heatedly in Bill's office and paused outside to listen in. It was easy to get the gist of the conversation. They were bemoaning my recent lack of production and discussing how they might dilute my Microsoft equity by issuing options to themselves and other shareholders. It was clear that they'd been thinking about this for some time.

Unable to stand it any longer, I burst in on them and shouted, "This is unbelievable! It shows your true character, once and for all." I was speaking to both of them, but staring straight at Bill. Caught red-handed, they were struck dumb. Before they could respond, I turned on my heel and left.

I replayed their dialogue in my mind while driving home, and it felt more and more heinous to me. I helped start the company and was still an active member of management, though limited by my illness, and now my partner and my colleague were scheming to rip me off. It was mercenary opportunism, plain and simple. That evening, a chastened Steve Ballmer called my house and asked my sister Jody if he could come over. "Look, Paul," he said after we sat down together, "I'm really sorry about what happened today. We were just letting off steam. We're trying to get so much stuff done, and we just wish you could contribute even more. But that stock thing isn't fair. I wouldn't have anything to do with it, and I'm sure Bill wouldn't, either."

I told Steve that the incident had left a bad taste in my mouth. A few days later, I received a six-page, handwritten letter from Bill. Dated December 31, 1982, the last day of our last full year together at Microsoft, it contained an apology for the conversation I'd overheard. And it offered a revealing, Bill's-eye view of our partnership: "During the last 14 years we have had numerous disagreements. However, I doubt any two partners have ever agreed on as much both in terms of specific decisions and their general idea of how to view things."

Bill was right. Our great string of successes had married my vision to his unmatched aptitude for business. But that was beside the point. Once I was diagnosed with Hodgkin's, my decision became simpler. If I were to relapse, it would be pointless—if not hazardous—to return to the stresses at Microsoft. If I continued to recover, I now understood that life was too short to spend it unhappily.

Bill's letter was a last-ditch effort to get me to stay, and I knew he believed he had logic on his side. But it didn't change anything. My mind was made up.

In January, I met with Bill one final time as a Microsoft executive. As he sat down with me on the couch in his office, I knew that he'd try to make me feel guilty and obliged to stay. But once he saw he couldn't change my mind, Bill tried to cut his losses. When Microsoft incorporated, in 1981, our old partnership agreement was nullified, and with it his power to force me to accept a buyout based on "irreconcilable differences." Now he tried a different tack, one he'd hinted at in his letter. "It's not fair that you keep your stake in the company," he said. He made a lowball offer for my stock: five dollars a share.

When Vern Raburn, the president of our consumer products division, left to go to Lotus Development, the Microsoft board had voted to buy back his stock for three dollars a share, which ultimately cost him billions of dollars. I knew that Bill hoped to pressure me to sell mine the same way. But I was in a different position from Vern, who'd jumped to Lotus in apparent violation of his employment agreement. I was a co-founder, and I wasn't leaving to join a competitor. "I'm not sure I'm willing to sell," I countered, "but I wouldn't even discuss less than $10 a share."

"No way," Bill said, as I'd suspected he would. Our talk was over. As it turned out, Bill's conservatism worked to my advantage. If he'd been willing to offer something close to my asking price, I would have sold way too soon.

On February 18, 1983, my resignation became official. I retained my seat on the board and was subsequently voted vice-chairman—as a tribute to my contributions, and in the hope that I would continue to add value to the company I'd helped create.



Ex-cops get long terms in Katrina killing - US news - Crime & courts - msnbc.com

http://www.msnbc.msn.com/id/42358217/ns/us_news-crime_and_courts/



ERIC - Education Resources Information Center

http://www.eric.ed.gov/ERICWebPortal/digitization/digitization.jsp



The Coming Right-Brain Economy: Daniel H. Pink Says the MFA Is the New MBA

http://www.eric.ed.gov:80/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ792609&ERICExtSearch_SearchType_0=no&accno=EJ792609



Wired 13.02: START

http://www.wired.com/wired/archive/13.02/start.html?pg=4

Wired 13.02: Revenge of the Right Brain

http://www.wired.com/wired/archive/13.02/brain.html



the white boards: Design Thinking by Scott Hutchinson

http://thewhiteboards.blogspot.com/2011/03/design-thinking-by-scott-hutchinson.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2Fzxwlc+%28the+white+boards%29



Tuesday, March 29, 2011

The History of Adobe Illustrator




Huffington Post: Report: Big Banks Save Billions As Homeowners Suffer



The Tap Room, The Royce and the Best Bar Snack Ever - Chow Balla

Kidnapped Journalists: The New Danger in the Middle East - The Daily Beast

The Daily Beast

The Daily Dish | By Andrew Sullivan

The Instant iPad App - Recovering Journalist

Content That Works -Spec Page

Content That Works -HOME

I Need a Great Story: Home

IBM Archives: Nicholas deB. Katzenbach

This is the guy Dad worked with in San Francisco on the anti-trust case.

IBM Archives: 1981

IBM Archives: Valuable resources on IBM's history

IBM Archives: Interactive history

IBM Archives: 1982

Information Technology Corporate Histories Collection

National CSS

Nomad

The Birth of the Relational Model - Part 2

The Birth of the Relational Model - Part 2 (back to Part 1)


Note - Applied Information Science has copied this article from its source at http://www.intelligententerprise.com/9811/frm_online2.shtml for fear that the original posting may be removed. Certain sections have been emphasized and/or commented on by the editor.


Last month I began my retrospective review of Codd's first two relational papers -- "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks" (IBM Research Report RJ599, August 19, 1969) and "A Relational Model of Data for Large Shared Data Banks" (CACM 13, June 1970). In particular, I took a detailed look at the first section of the first paper. Just to remind you, that paper had six sections overall:


1. A Relational View of Data
2. Some Linguistic Aspects
3. Operations on Relations
4. Expressible, Named, and Stored Relations
5. Derivability, Redundancy, and Consistency
6. Data Bank Control.

Some Linguistic Aspects

Codd opened this section with the following crucial observation: "The adoption of a relational view of data ... permits the development of a universal retrieval sublanguage based on the second-order predicate calculus." (Note the phrase "second-order;" the 1969 paper explicitly permitted relations to be defined on domains having relations as elements. I'll come back to this point when I discuss the 1970 paper in detail.)

It was Codd's very great insight that a database could be thought of as a set of relations, that a relation in turn could be thought of as a set of propositions (assumed by convention to be true), and hence, that the language and concepts of logic could be directly applied to the problem of data access and related problems. In this section of the paper, he sketched the salient features of an access language based on such concepts. These features include the following: The language would be set level, and the emphasis would be on data retrieval (though update operations would also be included). Also, the language would not be computationally complete; it was meant to be a "sublanguage," to be "[embedded] in a variety of host languages.... Any [computational] functions needed can be defined in [the host language] and invoked [from the sublanguage]."

Personally, I've never been entirely convinced that factoring out data access into a separate "sublanguage" was a good idea, but it's certainly been with us (in the shape of embedded SQL) for a good while now. In this connection, incidentally, it's interesting to note that with the addition in 1996 of the PSM feature (Persistent Stored Modules) to the SQL standard, SQL has now become a computationally complete language in its own right, meaning that a host language is no longer logically necessary (with SQL, that is).
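Codd's picture of a data sublanguage embedded in a host language is recognizably the shape of SQL programming today. Here is a minimal sketch using Python's standard-library sqlite3 module; the table name, columns, and values are invented purely for illustration:

```python
import sqlite3

# The host language (Python) supplies general computation; the data
# sublanguage (SQL) handles retrieval -- exactly the split Codd describes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("e1", "d1"), ("e2", "d1"), ("e3", "d2")])

target_dept = "d1"  # a value computed in the host language
rows = conn.execute("SELECT name FROM emp WHERE dept = ?",
                    (target_dept,)).fetchall()
```

The parameter marker is the seam between the two languages: any value the host computes can be handed to the sublanguage for retrieval.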

Codd also wrote, "Some deletions may be triggered by others if deletion dependencies ... are declared." In other words, Codd already had in mind in 1969 the possibility of triggered "referential actions" such as CASCADE DELETE (and in the 1970 paper, he extended this notion to include UPDATE referential actions as well). Also, the language would provide symmetric exploitation. That is, the user would be able to access a given relation using any combination of its attributes as knowns and the remaining ones as unknowns. "This is a system feature missing from many current information systems." Quite right. But we take it as a sine qua non now, at least in the relational world. (The object world doesn't seem to think it's so important, for some reason.)
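Symmetric exploitation is easy to sketch: nothing in the relational view privileges one attribute as "the" lookup key. A small illustration in Python, with an invented relation and attribute names:

```python
def retrieve(relation, **knowns):
    """Rows of `relation` whose attributes match every supplied known."""
    return [row for row in relation
            if all(row[attr] == value for attr, value in knowns.items())]

emp = [
    {"emp": "e1", "dept": "d1", "city": "Seattle"},
    {"emp": "e2", "dept": "d1", "city": "Portland"},
    {"emp": "e3", "dept": "d2", "city": "Seattle"},
]

# Any combination of attributes can serve as the knowns;
# the remaining attributes come back as the unknowns.
by_dept = retrieve(emp, dept="d1")
by_city_and_dept = retrieve(emp, city="Seattle", dept="d2")
```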

Operations on Relations

This section of the paper provides definitions of certain relational operations; in other words, it describes what later came to be called the manipulative part of the relational model. Before getting into the definitions, however, Codd states: "Most users would not be directly concerned with these operations. Information systems designers and people concerned with data bank control should, however, be thoroughly familiar with [them]." (Italics added.) How true! In my experience, regrettably, people who should be thoroughly familiar with these operations are all too often not so.

The operations Codd defines are permutation, projection, join, tie, and composition (the 1970 paper added restriction, which I'll cover here for convenience). It's interesting to note that the definitions for restriction and join are rather different from those usually given today and that two of the operations, tie and composition, are now rarely considered.

Throughout what follows, the symbols X, Y, ... (and so on) denote either individual attributes or attribute combinations, as necessary. Also, I'll treat the definition of join at the end, for reasons that will become clear in a moment.

Permutation. Reorder the attributes of a relation, left to right. (As I noted last month, relations in the 1969 paper had a left-to-right ordering to their attributes. By contrast, the 1970 paper states that permutation is intended purely for internal use because the left-to-right ordering of attributes is -- or should be -- irrelevant so far as the user is concerned.)

Projection. More or less as understood today (although the syntax is different; in what follows, I'll use the syntax R{X} to denote the projection of R over X). Note: The name "projection" derives from the fact that a relation of degree n can be regarded as representing points in n-dimensional space, and projecting that relation over m of its attributes (m<n) can be seen as projecting those points on to the corresponding m axes.

Tie. Given a relation A{X1,X2,...,Xn}, the tie of A is the restriction of A to just those rows in which A.Xn = A.X1 (using "restriction" in its modern sense, not in the special sense defined below).

Composition. Given relations A{X,Y} and B{Y,Z}, the composition of A with B is the projection on X and Z of a join of A with B (the reason I say "a" join, not "the" join, is explained below). Note: The natural composition is the projection on X and Z of the natural join.

Restriction. Given relations A{X,Y} and B{Y}, the restriction of A by B is defined to be the maximal subset of A such that A{Y} is a subset -- not necessarily a proper subset -- of B.
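These definitions translate directly into executable form. The following sketch models relations as lists of attribute-to-value dicts (my encoding, not Codd's notation), with small invented relations to exercise each operation:

```python
def project(rel, attrs):
    """R{X}: projection with duplicate elimination."""
    return {tuple((a, r[a]) for a in attrs) for r in rel}

def tie(rel, first, last):
    """Tie of A{X1,...,Xn}: the rows in which A.Xn = A.X1."""
    return [r for r in rel if r[last] == r[first]]

def restrict(a, b, y_attrs):
    """Restriction of A by B: maximal subset of A whose Y-projection
    is a subset of B's Y-projection."""
    b_y = project(b, y_attrs)
    return [r for r in a if tuple((att, r[att]) for att in y_attrs) in b_y]

def compose(a, b, x_attrs, y_attrs, z_attrs):
    """Natural composition of A{X,Y} with B{Y,Z}: the X,Z projection
    of the natural join over Y."""
    joined = [{**ra, **rb} for ra in a for rb in b
              if all(ra[att] == rb[att] for att in y_attrs)]
    return project(joined, x_attrs + z_attrs)

A = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 2, "y": "a"}]
B = [{"y": "a", "z": 10}, {"y": "c", "z": 30}]

restricted = restrict(A, B, ["y"])               # keeps only the y == "a" rows
composed   = compose(A, B, ["x"], ["y"], ["z"])  # (x, z) pairs via shared y

T = [{"x1": 1, "x2": 5, "x3": 1}, {"x1": 1, "x2": 5, "x3": 2}]
tied = tie(T, "x1", "x3")                        # keeps the row with x3 == x1
```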

Codd also says "all of the usual set operations are [also] applicable ... [but] the result may not be a relation." In other words, definitions of the specifically relational versions of Cartesian product, union, intersection, and difference still lay in the future when Codd was writing his 1969 paper.

Now let's get to the definition of join. Given relations A{X,Y} and B{Y,Z}, the paper defines a join of A with B to be any relation C{X,Y,Z} such that C{X,Y} = A and C{Y,Z} = B. Note, therefore, that A and B can be joined (or "are joinable") only if their projections on Y are identical -- that is, only if A{Y} = B{Y}, a condition one might have thought unlikely to be satisfied in practice. Also note that if A and B are joinable, then many different joins can exist (in general). The well known natural join -- called the linear natural join in the paper, in order to distinguish it from another kind called a cyclic join -- is an important special case, but it's not the only possibility.

Oddly, however, the definition given in the paper for the natural join operation doesn't require A and B to be joinable in the foregoing special sense! In fact, that definition is more or less the same as the one we use today.
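The joinability condition and the natural join can themselves be sketched in the same style (lists of dicts again; an illustrative encoding, not the paper's):

```python
def project(rel, attrs):
    """Duplicate-eliminating projection."""
    return {tuple((a, r[a]) for a in attrs) for r in rel}

def joinable(a, b, y_attrs):
    """Codd's 1969 condition: A and B are joinable iff A{Y} = B{Y}."""
    return project(a, y_attrs) == project(b, y_attrs)

def natural_join(a, b, y_attrs):
    """The (linear) natural join of A{X,Y} and B{Y,Z} over the shared Y."""
    return [{**ra, **rb} for ra in a for rb in b
            if all(ra[att] == rb[att] for att in y_attrs)]

A = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
B = [{"y": "a", "z": 10}, {"y": "b", "z": 20}]

# A{Y} = B{Y} = {a, b}, so no row of either operand is lost in the join:
# projecting the join back over {X,Y} and {Y,Z} recovers A and B exactly.
ok = joinable(A, B, ["y"])
C = natural_join(A, B, ["y"])
```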

Let me try to explain where that rather restrictive "joinability" notion comes from. Codd begins his discussion of joins by asking the important question: Under what circumstances does the join of two relations preserve all the information in those two relations? And he shows that the property of "joinability" is sufficient to ensure that all information is thus preserved (because no row of either operand is lost in the join). Further, he also shows that if A and B are "joinable" and either A.X is functionally dependent on A.Y or B.Z is functionally dependent on B.Y, then the natural join is the only join possible (though he doesn't actually use the functional dependence terminology -- that also lay in the future). In other words, what Codd is doing here is laying some groundwork for the all-important theory of nonloss decomposition (which, of course, he elaborated on in subsequent papers).

Remarkably, Codd also gives an example that shows he was aware back in 1969 of the fact that some relations can't be nonloss-decomposed into two projections but can be nonloss-decomposed into three! This example was apparently overlooked by most of the paper's original readers; at any rate, it seemed to come as a surprise to the research community when that same fact was rediscovered several years later. Indeed, it was that rediscovery that led to Ronald Fagin's invention of the "ultimate" normal form, 5NF, also known as projection-join normal form (PJNF).
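The phenomenon is easy to demonstrate. The instance below is the standard textbook example of such a relation (not necessarily the exact tuples printed in the 1969 paper): it decomposes losslessly into its three binary projections, but joining any two of them produces a spurious row.

```python
def natural_join(a, b):
    """Natural join of two lists of dicts over whatever attributes they share."""
    shared = set(a[0]) & set(b[0])
    return [{**ra, **rb} for ra in a for rb in b
            if all(ra[k] == rb[k] for k in shared)]

def proj(rel, attrs):
    """Duplicate-eliminating projection, returned as rows again."""
    return [dict(t) for t in {tuple(sorted((a, r[a]) for a in attrs)) for r in rel}]

def as_set(rel):
    return {tuple(sorted(r.items())) for r in rel}

R = [
    {"s": "s1", "p": "p1", "j": "j2"},
    {"s": "s1", "p": "p2", "j": "j1"},
    {"s": "s2", "p": "p1", "j": "j1"},
    {"s": "s1", "p": "p1", "j": "j1"},
]

sp = proj(R, ["s", "p"])
pj = proj(R, ["p", "j"])
js = proj(R, ["j", "s"])

two_way   = natural_join(sp, pj)                    # lossy: a spurious row appears
three_way = natural_join(natural_join(sp, pj), js)  # recovers R exactly
```

The two-way join manufactures the row (s2, p1, j2), which is not in R; only the third projection filters it back out.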

Expressible, Named, and Stored Relations

According to Codd, three collections of relations are associated with a data bank: expressible, named, and stored sets. An expressible relation can be designated by means of an expression of the data access language (which is assumed to support the operations described in the previous section); a named relation has a user-known name; and a stored relation is directly represented in physical storage somehow.

I do have a small complaint here (with 20/20 hindsight, once again): It seems a little unfortunate that Codd used the term stored relation the way he did. Personally, I would have divided the expressible relations into two kinds, base relations and derivable ones; I would have defined a derivable relation to be an expressible one the value of which, at all times, is derived according to some relational expression from other expressible relations, and a base relation to be an expressible relation that's not derivable in this sense. In other words, the base relations are the "given" ones; the derivable ones are the rest. And then I would have made it very clear that base and stored relations are not necessarily the same thing. (See Figure 1.) As it is, the paper effectively suggests that base and stored relations are the same thing (basically because it doesn't even bother to mention base relations as a separate category at all).





Figure 1. Kinds of relations.

It's true that base relations are essentially the same as stored relations in most SQL products today. In other words, most people think of base relations as mapping very directly to physical storage in those products. But there's no logical requirement for that mapping to be so simple; indeed, the distinction between model and implementation dictates that we say nothing about physical storage at all in the model. To be more specific, the degree of variation allowed between base and stored relations should be at least as great as that allowed between views and base relations; the only logical requirement is that it must be possible to obtain the base relations somehow from those that are physically stored (and then the derivable ones can be obtained, too).

As I already indicated, however, most products today provide very little support for this idea; that is, most products today provide much less data independence than relational technology is theoretically capable of. And this fact is precisely why we run into the notorious denormalization issue. Of course, denormalization is sometimes necessary (for performance reasons), but it should be done at the physical storage level, not at the logical or base relation level. Because most systems today essentially equate stored and base relations, however, there is much confusion over this simple point; furthermore, denormalization usually has the side effect of corrupting an otherwise clean logical design, with well-known undesirable consequences.

Enough of this griping. Codd goes on to say, "If the traffic on some unnamed but expressible relation grows to significant proportions, then it should be given a name and thereby included in the named set." In other words, make it a view! So Codd was already talking about the idea of views as "canned queries" back in 1969.
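In modern terms, "give the expressible relation a name and include it in the named set" is CREATE VIEW. A minimal sketch using Python's stdlib sqlite3, with an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, budget INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("e1", "d1", 10), ("e2", "d1", 10), ("e3", "d2", 20)])

# A frequently used expressible relation, promoted to the named set:
# the "canned query" now has a user-known name.
conn.execute("CREATE VIEW dept_budget AS "
             "SELECT DISTINCT dept, budget FROM emp")

rows = conn.execute(
    "SELECT dept, budget FROM dept_budget ORDER BY dept").fetchall()
```

Programs written against dept_budget keep working even if emp is later restructured, which is exactly the logical-data-independence argument Codd makes next.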

"Decisions regarding which relations belong in the named set are based ... on the logical needs of the community of users, and particularly on the ever-increasing investment in programs using relations by name as a result of past membership ... in the named set." Here Codd is saying that views are the mechanism for providing logical data independence -- in particular, the mechanism for ensuring that old programs continue to run as the database evolves. And he continues, "On the other hand, decisions regarding which relations belong in the stored set are based ... on ... performance requirements ... and changes that take place in these [requirements]." Here Codd is drawing a very sharp distinction between the logical and physical levels.

Derivability, Redundancy, and Consistency

In this section, Codd begins to address some of the issues that later came to be included in the integrity part of the relational model. A relation is said to be derivable if and only if it's expressible in the sense of the previous section. (Note that this definition of derivability is not quite the same as the one I was advocating above because -- at least tacitly -- it includes the base relations.) A set of relations is then said to be strongly redundant if it includes at least one relation that's derivable (in Codd's sense) from other relations in the set.

The 1970 paper refines this definition slightly, as follows: A set of relations is strongly redundant if it includes at least one relation that has a projection -- possibly the identity projection, meaning the projection over all attributes -- that's derivable from other relations in the set. (I've taken the liberty of simplifying Codd's definition somewhat, although, of course, I've tried to preserve his original intent.)

Codd then observes that the named relations probably will be strongly redundant in this sense, because they'll probably include both base relations and views derived from those base relations. (What the paper actually says is that "[such redundancy] may be employed to improve accessibility of certain kinds of information which happen to be in great demand;" this is one way of saying that views are a useful shorthand.) However, the stored relations will usually not be strongly redundant. Codd elaborates on this point in the 1970 paper: "If ... strong redundancies in the named set are directly reflected ... in the stored set ... then, generally speaking, extra storage space and update time are consumed [though there might also be] a drop in query time for some queries and in load on the central processing units."

Personally, I would have said the base relations should definitely not be strongly redundant, but the stored ones might be (depending -- as always at the storage level -- on performance considerations).

Codd says that, given a set of relations, the system should be informed of any redundancies that apply to that set, so that it can enforce consistency; the set will be consistent if it conforms to the stated redundancies. I should point out, however, that this definition of consistency certainly doesn't capture all aspects of integrity, nor does the concept of strong redundancy capture all possible kinds of redundancy. As a simple counterexample, consider a database containing just one relation, for example EMP { EMP#, DEPT#, BUDGET }, in which the following functional dependencies are satisfied:

EMP# -> DEPT#
DEPT# -> BUDGET

This database certainly involves some redundancy, but it isn't "strongly" redundant according to the definition.
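A quick executable check of the counterexample (my encoding of the relation, with invented data): both functional dependencies hold, so each department's budget is recorded once per employee in that department, yet only one relation is involved, so the "strong redundancy" definition cannot see it.

```python
def satisfies_fd(rel, lhs, rhs):
    """True iff the functional dependency lhs -> rhs holds in `rel`."""
    seen = {}
    for row in rel:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

emp = [
    {"emp": "e1", "dept": "d1", "budget": 10},
    {"emp": "e2", "dept": "d1", "budget": 10},  # d1's budget stored again
    {"emp": "e3", "dept": "d2", "budget": 20},
]

fd1 = satisfies_fd(emp, ["emp"], ["dept"])     # EMP# -> DEPT#
fd2 = satisfies_fd(emp, ["dept"], ["budget"])  # DEPT# -> BUDGET
```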

I should explain why Codd uses the term strong redundancy. He does so to distinguish it from another kind, also defined in the paper, which he calls weak redundancy. I omit the details here, however, because unlike just about every other concept introduced in the first two papers, this particular notion doesn't seem to have led to anything very significant. (In any case, the example given in the paper doesn't even conform to Codd's own definition.) The interested reader is referred to the original paper for the specifics.

Data Bank Control

This, the final section of the 1969 paper, offers a few remarks on protocols for what to do if inconsistencies are discovered. "The generation of an inconsistency ... could be logged internally, so that if it were not remedied within some reasonable time ... the system could notify the security officer [sic]. Alternatively, the system could [inform the user] that such and such relations now need to be changed to restore consistency ... Ideally, [different system actions] should be possible ... for different subcollections of relations."

This concludes my discussion of the 1969 paper. Next time, I'll turn my attention to that paper's more famous successor, the 1970 paper.


The Birth of the Relational Model - Part 1

The Birth of the Relational Model - Part 1 (go to Part 2)


Note - Applied Information Science has copied this article from its source at http://www.intelligententerprise.com/9811/frm_online2.shtml for fear that the original posting may be removed. Certain sections have been emphasized and/or commented on by the editor.


A look back at Codd's original papers and how the relational model has evolved over time in today's leading database systems

It was thirty years ago today / Dr. Edgar showed the world the way ...
-- with apologies to Lennon and McCartney


Excuse the poetic license, but it was 30 years ago, near enough, that Dr. Edgar F. Codd began work on what would become The Relational Model of Data. In 1969, he published the first in a brilliant series of highly original papers describing that work -- papers that changed the world as we know it. Since that time, of course, many people have made contributions (some of them quite major) to database research in general and relational database research in particular; however, none of those later contributions has been as significant or as fundamental as Codd's original work. A hundred years from now, I'm quite sure, database systems will still be based on Codd's relational foundation.

The First Two Papers

As I already mentioned, Codd's first relational paper, "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks," was published in 1969. Unfortunately, that paper was an IBM Research Report; as such, it carried a Limited Distribution Notice and therefore wasn't seen by as many people as it might have been. (Indeed, it's since become something of a collector's item.) But the following year a revised version of that first paper was published in Communications of the ACM, and that version was much more widely disseminated and received much more attention (at least in the academic community). Indeed, that 1970 version, "A Relational Model of Data for Large Shared Data Banks," is usually credited with being the seminal paper in the field, though that characterization is perhaps a little unfair to its predecessor.

Those first two papers of Codd's are certainly unusual in one respect: They stand up very well to being read -- and indeed, repeatedly reread -- nearly 30 years later! How many papers can you say that of? At the same time, it has to be said that they're not particularly easy to read. The writing is terse and a little dry, the style theoretical and academic, the notation and examples rather mathematical in tone. As a consequence, I'm sure I'm right in saying that to this day, only a tiny percentage of database professionals have actually read them. So I thought it would be interesting and useful to devote a short series of articles to a careful, unbiased, retrospective review and assessment of Codd's first two papers.

As I began to get involved in writing that review, however, I came to realize that it would be better not to limit myself to just the first two papers, but rather to take a look at all of Codd's early relational publications. Over the next few months, therefore, I plan to consider the following important papers of Codd's in addition to the two already mentioned: "Relational Completeness of Data Base Sublanguages;" "A Data Base Sublanguage Founded on the Relational Calculus;" "Further Normalization of the Data Base Relational Model;" "Interactive Support for Nonprogrammers: The Relational and Network Approaches;" and "Extending the Relational Database Model to Capture More Meaning."

One last preliminary remark: I don't mean to suggest that Codd's early papers got every last detail exactly right, or that Codd himself foresaw every last implication of his ideas. Indeed, it would be quite surprising if this were the case! Minor mistakes and some degree of confusion are normal and natural when a major invention first sees the light of day; think of the telephone, the automobile, or television (or even computers themselves; do you remember the prediction that three computers would be sufficient to serve all of the computing needs of the United States?). Be that as it may, I will, of course, be liberally applying the "20/20 hindsight" principle in what follows. Indeed, I think it's interesting to see how certain aspects of the relational model have evolved over time.

Codd's Fundamental Contributions

For reference purposes, let me briefly summarize Codd's major contributions here. (I limit myself to relational contributions only! It's not as widely known as it ought to be, but the fact is that Codd deserves recognition for original work in at least two other areas as well -- namely, multiprogramming and natural language processing. Details of those other contributions are beyond the scope of this article, however.)

Probably Codd's biggest overall achievement was to make database management into a science; he put the field on solid scientific footing by providing a theoretical framework (the relational model) within which a variety of important problems could be attacked in a scientific manner. In other words, the relational model really serves as the basis for a theory of data. Indeed, the term "relational theory" is preferable in some ways to the term "relational model," and it might have been nice if Codd had used it. But he didn't.

Codd thus introduced a welcome and sorely needed note of clarity and rigor into the database field. To be more specific, he introduced not only the relational model in particular, but the whole idea of a data model in general. He stressed the importance of the distinction (regrettably still widely misunderstood) between model and implementation. He saw the potential of using the ideas of predicate logic as a foundation for database management and defined both a relational algebra and a relational calculus as a basis for dealing with data in relational form. In addition, he defined (informally) what was probably the first relational language, "Data Sublanguage ALPHA;" introduced the concept of functional dependence and defined the first three normal forms (1NF, 2NF, 3NF); and defined the key notion of essentiality.

The 1969 Paper

Now I want to focus on the 1969 paper (although I will also mention points where the thinking in the 1970 paper seems to augment or supersede that of the 1969 version). The 1969 paper --which was, to remind you, entitled "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks" -- contains an introduction and the following six sections:

1. A Relational View of Data

2. Some Linguistic Aspects

3. Operations on Relations

4. Expressible, Named, and Stored Relations

5. Derivability, Redundancy, and Consistency

6. Data Bank Control.

The paper's focus is worthy of note. As both the title and abstract suggest, the focus is not so much on the relational model per se as it is on the provision of a means of investigating, in a precise and scientific manner, certain notions of data redundancy and consistency. Indeed, the term "the relational model" doesn't appear in the paper at all, although the introduction does speak of "a relational view ... (or model) of data." The introduction also points out that the relational "view" enjoys several advantages over "the graph (or network) model presently in vogue: It provides a means of describing data in terms of its natural structure only (that is, all details having to do with machine representation are excluded); it also provides a basis for constructing a high-level retrieval language with Œmaximal [sic] data independence'" (that is, independence between application programs and machine data representation -- what we would now call, more specifically, physical data independence). Note the term "retrieval language," by the way; the 1970 paper replaced it with the term "data language," but the emphasis throughout the first two papers was always heavily on query rather than update. In addition, the relational view permits a clear evaluation of the scope and limitations of existing database systems as well as the relative merits of "competing data representations within a single system." (In other words, it provides a basis for an attack on the logical database design problem.) Note the numerous hints here of interesting developments to come.

A Relational View of Data

Essentially, this section of Codd's paper is concerned with what later came to be called the structural part of the relational model; that is, it discusses relations per se (and briefly mentions keys), but it doesn't get into the relational operations at all (what later came to be called the manipulative part of the model).

The paper's definition of relation is worth examining briefly. That definition runs more or less as follows: "Given sets S1, S2, ..., Sn (not necessarily distinct), R is a relation on those n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on. We shall refer to Sj as the jth domain of R ... R is said to have degree n." (And the 1970 paper adds: "More concisely, R is a subset of the Cartesian product of its domains.")

Although mathematically respectable, this definition can be criticized from a database standpoint -- here comes the 20/20 hindsight! -- on a couple of counts. First, it doesn't clearly distinguish between domains on the one hand and attributes, or columns, on the other. It's true that the paper does introduce the term attribute later, but it doesn't define it formally and or use it consistently. (The 1970 paper does introduce the term active domain to mean the set of values from a given domain actually appearing in the database at any given time, but this concept isn't the same as attribute, either.) As a result, there has been much confusion in the industry over the distinction between domains and attributes, and such confusions persist to the present day. (In fairness, I should add that the first edition of my book An Introduction to Database Systems (Addison-Wesley, 1975) was also not very clear on the domain-vs.-attribute distinction.)

The 1969 paper later gives an example that -- at least from an intuitive standpoint -- stumbles over this very confusion. The example involves a relation called PART with (among others) two columns called QUANTITY_ON_HAND and QUANTITY_ON_ORDER. It seems likely that these two columns would be defined on the same domain in practice, but the example clearly says they're not. (It refers to them as distinct domains and then says those domains "correspond to what are commonly called ... attributes.")

Note, too, that the definition of relation specifies that the domains (and therefore attributes) of a relation have an ordering, left to right. The 1970 paper does say that users shouldn't have to deal with "domain-ordered relations" as such but rather with "their domain-unordered counterparts" (which it calls relationships), but that refinement seems to have escaped the attention of certain members of the database community -- including, very unfortunately, the designers of SQL, in which the columns of a table definitely do have a left-to-right ordering.

Codd then goes on to define a "data bank" (which we would now more usually call a database, of course) to be "a collection of time-varying relations ... of assorted degrees," and states that "each [such] relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples." Here, unfortunately, we run smack into the historical confusion between relation values and relation variables. In mathematics (and indeed in Codd's own definition), a relation is simply a value, and there's just no way it can vary over time; there's no such thing as a "time-varying relation." But we can certainly have variables -- relation variables, that is -- whose values are relations (different values at different times), and that's really what Codd's "time-varying relations" are.

A failure to distinguish adequately between these two distinct concepts has been another rich source of subsequent confusion. For this reason, I would have preferred to couch the discussions in the remainder of this series of columns in terms of relation values and variables explicitly, rather than in terms of just relations -- time-varying or otherwise. Unfortunately, however, this type of approach turned out to involve too much rewriting and (worse) restructuring of the material I needed to quote and examine from Codd's own papers, so I reluctantly decided to drop the idea. I seriously hope no further confusions arise from that decision!

Back to the 1969 paper. The next concept introduced is the crucial one --very familiar now, of course -- of a key (meaning a unique identifier). A key is said to be nonredundant if every attribute it contains is necessary for the purpose of unique identification; that is, if any attribute were removed, what would be left wouldn't be a unique identifier any more. ("Key" here thus means what we now call a superkey, and a "nonredundant" key is what we now call a candidate key -- candidate keys being "nonredundant," or irreducible, by definition.)

Incidentally, the 1970 paper uses the term primary key in place of just key. Observe, therefore, that "primary key" in the 1970 paper does not mean what the term is usually taken to mean nowadays, because: a) it doesn't have to be nonredundant, and b) a given relation can have any number of them. However, the paper does go on to say that if a given relation "has two or more nonredundant primary keys, one of them is arbitrarily selected and called the primary key."

The 1970 paper also introduces the term foreign key. (Actually, the 1969 paper briefly mentions the concept too, but it doesn't use the term.) However, the definition is unnecessarily restrictive, in that -- for some reason -- it doesn't permit a primary key (or candidate key? or superkey?) to be a foreign key. The relational model as now understood includes no such restriction.

Well, that's all I have room for this month. At least I've laid some groundwork for what's to come, but Codd's contributions are so many and so varied that there's no way I can deal with them adequately in just one or two articles. It's going to be a fairly lengthy journey.


The Birth of the Relational Model - Part 1 (go to Part 2)


Note: Applied Information Science has copied this article from its source at http://www.intelligententerprise.com/9811/frm_online2.shtml for fear that the original posting may be removed. Certain sections have been emphasized and/or commented on by the editor.


A look back at Codd's original papers and how the relational model has evolved over time in today's leading database systems

It was thirty years ago today / Dr. Edgar showed the world the way ...
-- with apologies to Lennon and McCartney


Excuse the poetic license, but it was 30 years ago, near enough, that Dr. Edgar F. Codd began work on what would become The Relational Model of Data. In 1969, he published the first in a brilliant series of highly original papers describing that work -- papers that changed the world as we know it. Since that time, of course, many people have made contributions (some of them quite major) to database research in general and relational database research in particular; however, none of those later contributions has been as significant or as fundamental as Codd's original work. A hundred years from now, I'm quite sure, database systems will still be based on Codd's relational foundation.

The First Two Papers

As I already mentioned, Codd's first relational paper, "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks," was published in 1969. Unfortunately, that paper was an IBM Research Report; as such, it carried a Limited Distribution Notice and therefore wasn't seen by as many people as it might have been. (Indeed, it's since become something of a collector's item.) But the following year a revised version of that first paper was published in Communications of the ACM, and that version was much more widely disseminated and received much more attention (at least in the academic community). Indeed, that 1970 version, "A Relational Model of Data for Large Shared Data Banks," is usually credited with being the seminal paper in the field, though that characterization is perhaps a little unfair to its predecessor. Those first two papers of Codd's are certainly unusual in one respect: They stand up very well to being read -- and indeed, repeatedly reread -- nearly 30 years later! How many papers can you say that of? At the same time, it has to be said that they're not particularly easy to read. The writing is terse and a little dry, the style theoretical and academic, the notation and examples rather mathematical in tone. As a consequence, I'm sure I'm right in saying that to this day, only a tiny percentage of database professionals have actually read them. So I thought it would be interesting and useful to devote a short series of articles to a careful, unbiased, retrospective review and assessment of Codd's first two papers.

As I began to get involved in writing that review, however, I came to realize that it would be better not to limit myself to just the first two papers, but rather to take a look at all of Codd's early relational publications. Over the next few months, therefore, I plan to consider the following important papers of Codd's in addition to the two already mentioned: "Relational Completeness of Data Base Sublanguages;" "A Data Base Sublanguage Founded on the Relational Calculus;" "Further Normalization of the Data Base Relational Model;" "Interactive Support for Nonprogrammers: The Relational and Network Approaches;" and "Extending the Relational Database Model to Capture More Meaning."

One last preliminary remark: I don't mean to suggest that Codd's early papers got every last detail exactly right, or that Codd himself foresaw every last implication of his ideas. Indeed, it would be quite surprising if this were the case! Minor mistakes and some degree of confusion are normal and natural when a major invention first sees the light of day; think of the telephone, the automobile, or television (or even computers themselves; do you remember the prediction that three computers would be sufficient to serve all of the computing needs of the United States?). Be that as it may, I will, of course, be liberally applying the "20/20 hindsight" principle in what follows. Indeed, I think it's interesting to see how certain aspects of the relational model have evolved over time.

Codd's Fundamental Contributions

For reference purposes, let me briefly summarize Codd's major contributions here. (I limit myself to relational contributions only! It's not as widely known as it ought to be, but the fact is that Codd deserves recognition for original work in at least two other areas as well -- namely, multiprogramming and natural language processing. Details of those other contributions are beyond the scope of this article, however.)

Probably Codd's biggest overall achievement was to make database management into a science; he put the field on solid scientific footing by providing a theoretical framework (the relational model) within which a variety of important problems could be attacked in a scientific manner. In other words, the relational model really serves as the basis for a theory of data. Indeed, the term "relational theory" is preferable in some ways to the term "relational model," and it might have been nice if Codd had used it. But he didn't.

Codd thus introduced a welcome and sorely needed note of clarity and rigor into the database field. To be more specific, he introduced not only the relational model in particular, but the whole idea of a data model in general. He stressed the importance of the distinction (regrettably still widely misunderstood) between model and implementation. He saw the potential of using the ideas of predicate logic as a foundation for database management and defined both a relational algebra and a relational calculus as a basis for dealing with data in relational form. In addition, he defined (informally) what was probably the first relational language, "Data Sublanguage ALPHA;" introduced the concept of functional dependence and defined the first three normal forms (1NF, 2NF, 3NF); and defined the key notion of essentiality.

The 1969 Paper

Now I want to focus on the 1969 paper (although I will also mention points where the thinking in the 1970 paper seems to augment or supersede that of the 1969 version). The 1969 paper -- which was, to remind you, entitled "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks" -- contains an introduction and the following six sections:

1. A Relational View of Data

2. Some Linguistic Aspects

3. Operations on Relations

4. Expressible, Named, and Stored Relations

5. Derivability, Redundancy, and Consistency

6. Data Bank Control.

The paper's focus is worthy of note. As both the title and abstract suggest, the focus is not so much on the relational model per se as it is on the provision of a means of investigating, in a precise and scientific manner, certain notions of data redundancy and consistency. Indeed, the term "the relational model" doesn't appear in the paper at all, although the introduction does speak of "a relational view ... (or model) of data." The introduction also points out that the relational "view" enjoys several advantages over "the graph (or network) model presently in vogue: It provides a means of describing data in terms of its natural structure only (that is, all details having to do with machine representation are excluded); it also provides a basis for constructing a high-level retrieval language with 'maximal [sic] data independence'" (that is, independence between application programs and machine data representation -- what we would now call, more specifically, physical data independence). Note the term "retrieval language," by the way; the 1970 paper replaced it with the term "data language," but the emphasis throughout the first two papers was always heavily on query rather than update. In addition, the relational view permits a clear evaluation of the scope and limitations of existing database systems as well as the relative merits of "competing data representations within a single system." (In other words, it provides a basis for an attack on the logical database design problem.) Note the numerous hints here of interesting developments to come.

A Relational View of Data

Essentially, this section of Codd's paper is concerned with what later came to be called the structural part of the relational model; that is, it discusses relations per se (and briefly mentions keys), but it doesn't get into the relational operations at all (what later came to be called the manipulative part of the model).

The paper's definition of relation is worth examining briefly. That definition runs more or less as follows: "Given sets S1, S2, ..., Sn (not necessarily distinct), R is a relation on those n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on. We shall refer to Sj as the jth domain of R ... R is said to have degree n." (And the 1970 paper adds: "More concisely, R is a subset of the Cartesian product of its domains.")
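Codd's papers contain no executable examples, but the definition above translates almost word for word into code. Here is a minimal sketch in Python (domain contents and names are illustrative, not drawn from the papers): a relation of degree n is simply a set of n-tuples contained in the Cartesian product of its domains.

```python
from itertools import product

# Three hypothetical domains S1, S2, S3 (illustrative values only).
S1 = {"P1", "P2"}      # part numbers
S2 = {"Nut", "Bolt"}   # part names
S3 = {0, 1, 2}         # quantities

# A relation R of degree 3: a set of 3-tuples whose jth element
# is drawn from the jth domain.
R = {("P1", "Nut", 2), ("P2", "Bolt", 0)}

# "More concisely, R is a subset of the Cartesian product of its domains."
assert R <= set(product(S1, S2, S3))

# Every tuple has the same degree as the relation.
degree = 3
assert all(len(t) == degree for t in R)
```

Note that R is a set: there are no duplicate tuples and (in the mathematical definition, at least) no ordering among them.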

Although mathematically respectable, this definition can be criticized from a database standpoint -- here comes the 20/20 hindsight! -- on a couple of counts. First, it doesn't clearly distinguish between domains on the one hand and attributes, or columns, on the other. It's true that the paper does introduce the term attribute later, but it doesn't define it formally or use it consistently. (The 1970 paper does introduce the term active domain to mean the set of values from a given domain actually appearing in the database at any given time, but this concept isn't the same as attribute, either.) As a result, there has been much confusion in the industry over the distinction between domains and attributes, and such confusions persist to the present day. (In fairness, I should add that the first edition of my book An Introduction to Database Systems (Addison-Wesley, 1975) was also not very clear on the domain-vs.-attribute distinction.)

The 1969 paper later gives an example that -- at least from an intuitive standpoint -- stumbles over this very confusion. The example involves a relation called PART with (among others) two columns called QUANTITY_ON_HAND and QUANTITY_ON_ORDER. It seems likely that these two columns would be defined on the same domain in practice, but the example clearly says they're not. (It refers to them as distinct domains and then says those domains "correspond to what are commonly called ... attributes.")
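The distinction the example stumbles over can be made concrete. In this Python sketch (attribute and domain names are illustrative), the two quantity columns are distinct attributes that nevertheless draw their values from one and the same domain:

```python
# One shared domain: the pool of permissible values.
QUANTITY = set(range(1000))   # nonnegative integers below 1000 (illustrative)

# The PART relation, with rows written as attribute-name -> value mappings.
# QUANTITY_ON_HAND and QUANTITY_ON_ORDER are two different attributes,
# but both are defined on the single QUANTITY domain.
PART = [
    {"PART_NO": "P1", "QUANTITY_ON_HAND": 10, "QUANTITY_ON_ORDER": 5},
    {"PART_NO": "P2", "QUANTITY_ON_HAND": 0,  "QUANTITY_ON_ORDER": 25},
]

for row in PART:
    assert row["QUANTITY_ON_HAND"] in QUANTITY
    assert row["QUANTITY_ON_ORDER"] in QUANTITY
```

The domain is a type-like pool of values; the attribute is a named use of that pool within one relation. Conflating the two is exactly the confusion Date describes.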

Note, too, that the definition of relation specifies that the domains (and therefore attributes) of a relation have an ordering, left to right. The 1970 paper does say that users shouldn't have to deal with "domain-ordered relations" as such but rather with "their domain-unordered counterparts" (which it calls relationships), but that refinement seems to have escaped the attention of certain members of the database community -- including, very unfortunately, the designers of SQL, in which the columns of a table definitely do have a left-to-right ordering.
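The difference between a domain-ordered relation and its domain-unordered counterpart is easy to see in code. A rough Python sketch (attribute names invented for illustration):

```python
# Domain-ordered (the 1970 paper's "relation"): position carries the meaning.
ordered_tuple = ("P1", "Nut", 2)

# Domain-unordered counterpart (the paper's "relationship"):
# attribute names, not positions, carry the meaning.
unordered_tuple = {"PNO": "P1", "PNAME": "Nut", "QTY": 2}

# Writing the named form in a different order changes nothing...
assert unordered_tuple == {"QTY": 2, "PNO": "P1", "PNAME": "Nut"}

# ...whereas permuting the positional form yields a different tuple.
assert ("Nut", "P1", 2) != ordered_tuple
```

SQL tables behave like the first form: the columns have a defined left-to-right order, which is precisely the refinement Date says the SQL designers missed.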

Codd then goes on to define a "data bank" (which we would now more usually call a database, of course) to be "a collection of time-varying relations ... of assorted degrees," and states that "each [such] relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples." Here, unfortunately, we run smack into the historical confusion between relation values and relation variables. In mathematics (and indeed in Codd's own definition), a relation is simply a value, and there's just no way it can vary over time; there's no such thing as a "time-varying relation." But we can certainly have variables -- relation variables, that is -- whose values are relations (different values at different times), and that's really what Codd's "time-varying relations" are.

A failure to distinguish adequately between these two distinct concepts has been another rich source of subsequent confusion. For this reason, I would have preferred to couch the discussions in the remainder of this series of columns in terms of relation values and variables explicitly, rather than in terms of just relations -- time-varying or otherwise. Unfortunately, however, this type of approach turned out to involve too much rewriting and (worse) restructuring of the material I needed to quote and examine from Codd's own papers, so I reluctantly decided to drop the idea. I seriously hope no further confusions arise from that decision!
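The value-vs.-variable distinction maps directly onto immutable values and rebindable names. A small Python sketch (names illustrative): the relation values are frozen sets and cannot change; "updating" the database means binding the relation variable to a new value.

```python
# A relation VALUE: an immutable set of tuples. It cannot "vary over time."
v1 = frozenset({("P1", "Nut"), ("P2", "Bolt")})

# A relation VARIABLE: a name currently bound to some relation value.
parts = v1

# "Inserting" a tuple produces a NEW value and rebinds the variable;
# the old value is untouched.
parts = parts | {("P3", "Screw")}

assert ("P3", "Screw") in parts
assert ("P3", "Screw") not in v1   # the original value never varied
```

Codd's "time-varying relations" are, in this reading, the variable `parts`, not the values `v1` and its successors.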

Back to the 1969 paper. The next concept introduced is the crucial one -- very familiar now, of course -- of a key (meaning a unique identifier). A key is said to be nonredundant if every attribute it contains is necessary for the purpose of unique identification; that is, if any attribute were removed, what would be left wouldn't be a unique identifier any more. ("Key" here thus means what we now call a superkey, and a "nonredundant" key is what we now call a candidate key -- candidate keys being "nonredundant," or irreducible, by definition.)
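The superkey-vs.-candidate-key distinction can be checked mechanically. A rough Python sketch (relation and attribute names are illustrative): a superkey uniquely identifies every tuple; a candidate key is a superkey from which no attribute can be dropped.

```python
def is_superkey(rows, attrs):
    """True if the given attributes uniquely identify every tuple."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A superkey that is irreducible (Codd's "nonredundant" key):
    dropping any one attribute destroys uniqueness."""
    if not is_superkey(rows, attrs):
        return False
    return all(not is_superkey(rows, [a for a in attrs if a != x])
               for x in attrs)

rows = [
    {"PNO": "P1", "PNAME": "Nut",  "CITY": "London"},
    {"PNO": "P2", "PNAME": "Bolt", "CITY": "Paris"},
    {"PNO": "P3", "PNAME": "Nut",  "CITY": "Paris"},
]

assert is_superkey(rows, ["PNO", "PNAME"])           # a superkey...
assert not is_candidate_key(rows, ["PNO", "PNAME"])  # ...but reducible
assert is_candidate_key(rows, ["PNO"])               # irreducible
```

In the 1969 paper's vocabulary, `["PNO", "PNAME"]` is a "key" and `["PNO"]` is a "nonredundant key."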

Incidentally, the 1970 paper uses the term primary key in place of just key. Observe, therefore, that "primary key" in the 1970 paper does not mean what the term is usually taken to mean nowadays, because: a) it doesn't have to be nonredundant, and b) a given relation can have any number of them. However, the paper does go on to say that if a given relation "has two or more nonredundant primary keys, one of them is arbitrarily selected and called the primary key."

The 1970 paper also introduces the term foreign key. (Actually, the 1969 paper briefly mentions the concept too, but it doesn't use the term.) However, the definition is unnecessarily restrictive, in that -- for some reason -- it doesn't permit a primary key (or candidate key? or superkey?) to be a foreign key. The relational model as now understood includes no such restriction.
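A foreign key is, at bottom, an inclusion constraint between projections, and a key can perfectly well play both roles at once. A Python sketch (relation names EMP and EMP_DETAILS are invented for illustration):

```python
def fk_holds(referencing, fk_attrs, referenced, key_attrs):
    """Every foreign-key value in the referencing relation must appear
    as a key value in the referenced relation (inclusion of projections)."""
    fk_values  = {tuple(r[a] for a in fk_attrs)  for r in referencing}
    key_values = {tuple(r[a] for a in key_attrs) for r in referenced}
    return fk_values <= key_values

EMP = [{"ENO": "E1"}, {"ENO": "E2"}]

# EMP_DETAILS's primary key ENO is simultaneously a foreign key into EMP --
# the arrangement the 1970 paper's definition would forbid, but the model
# as now understood allows.
EMP_DETAILS = [{"ENO": "E1", "BIO": "..."}, {"ENO": "E2", "BIO": "..."}]

assert fk_holds(EMP_DETAILS, ["ENO"], EMP, ["ENO"])
```

A row referencing a nonexistent employee would make `fk_holds` return False, which is exactly the violation a foreign-key constraint guards against.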

Well, that's all I have room for this month. At least I've laid some groundwork for what's to come, but Codd's contributions are so many and so varied that there's no way I can deal with them adequately in just one or two articles. It's going to be a fairly lengthy journey.


The Birth of the Relational Model - Part 2