Welcome to the Datatron 205 and 220 Blog

This blog is a companion to T J Sawyer's Web Page that outlines the history of the Burroughs Datatron 205 and Paul Kimpel's incredible 205 and 220 emulators. Please visit those sites and return here to post any comments.

Sunday, August 16, 2015

Magnetic Tape for the retro-205 Emulator

Nothing in the popular concept of computing during the 1950s and 1960s is more iconic than that of a computer room with a line of tall magnetic tape drives, their reels jerking clockwise and counter-clockwise, or spinning rapidly in rewind. Tape has changed a lot over the intervening decades, increasing dramatically in capacity, speed, and reliability, but is now relegated almost entirely to a role as a backup medium. Prior to the widespread availability of inexpensive disk storage, however, tape was mass storage, and a large number of computer applications were designed around its capabilities and limitations.

This is Paul Kimpel again, with another update on the retro-205 emulator. Emulator version 0.05 was released on 15 August and is available from the hosting site at http://www.phkimpel.us/ElectroData/205/webUI/D205.html. The open-source project is hosted on GitHub at https://github.com/pkimpel/retro-205/.

The main feature of this new release is implementation of magnetic tape for the 205. Magnetic tape is required for both Knuth's MEASY assembler and Algol-58 compiler, so completion of this feature has been necessary in order to proceed with reconstruction of those two pieces of software.


Overview of 205 Magnetic Tape


The concept of magnetic storage was originally developed by Vladimir Poulsen of Denmark in 1899. Magnetic tape was derived from magnetic wire recording, both developed by the Germans during the 1920s and 1930s. In the early 1950s, audio recording techniques were extended to magnetic tape video recording, to support the burgeoning television industry. Magnetic tape for data storage started to appear about the same time.

ElectroData was an early pioneer in developing magnetic tape for data storage, producing a design for the Stanford Research Institute's ERMA project in the early 1950s. ERMA was a highly-successful attempt to automate customer checking account operations for Bank of America, producing, among other things, Magnetic Ink Character Recognition and the high-speed reading and sorting of checking account drafts. The ElectroData tape units were used in the prototype computer system SRI built for ERMA, but design and implementation of the production systems was awarded to General Electric, who never quite managed to turn that golden opportunity into a successful computer systems business.

By the time ElectroData had come out with the Datatron model 203 computer, several of their larger customers, particularly those in the insurance business, encouraged ElectroData to develop magnetic tape capability for the system. The result was the Datatron 204, which was a 203 with additional circuitry to support magnetic tape operations.

Initially, magnetic tape support included a Model 543 Tape Control Unit and Model 544 Tape Storage Units (or "DataReaders"). Later, this was extended to include the Model 560 DataFile, which although implemented using magnetic tape technology, actually behaved more like a very slow disk unit. The Tape Control Unit could support up to 10 DataReader and DataFile units in any combination.

Physical and Recording Characteristics

The DataReader was a transport of the open-reel type. It used a 0.75-inch wide tape with a plastic base, wound on metal reels. Each reel could hold up to 2500 feet of tape. Data was recorded in six tracks -- four data tracks encoded as BCD digits, an even-parity track, and a control track that indicated the start of a block. The tape physically supported 12 tracks, however, so data was recorded in two "lanes" along the length of the tape. The 205's tape-search instruction (MTS, 42) provided the ability to switch between lanes; read and write instructions operated on only the currently-selected lane.

Data was recorded on the tape in fixed, 20-word blocks. A full reel of tape could store 10,000 of these blocks. Each word in a block consisted of 11 data digits (including sign) plus a spacer digit -- the same format as on the memory drum. The tape moved at 60 inches/second, recording at 100 bits/inch. In addition to data words, each block had a header that contained the address of that block, numbered 0-9999. A block on tape required 2.4 inches for data, 0.08 inch for the header, and 0.3 inch for an inter-block gap, for a total of 2.78 inches. Read time was 46 ms per block, plus 6 ms start time to bring the tape up to speed at the start of an operation.

The Tape Control Unit supported reading or writing up to 100 blocks with a single instruction. The fixed-size, 20-word block format allowed a drive to overwrite existing blocks on a tape, something that was not practical with the variable-length block scheme of IBM tape drives that later became the de facto standard.

Based on the block numbers recorded in the block headers, the Tape Control Unit could search forward or backward for a specified block number in a specified lane. Standard read and write operations tied up the Processor for their duration, but tape search proceeded independently and asynchronously of the Processor.

Only one read, write, or search operation could take place at a time, although multiple drives could be rewinding simultaneously. Attempting to initiate a read, write, or search while another search was in progress, or to initiate an operation against a drive that was not ready or in rewind, set the overflow toggle in the Processor to indicate the requested operation could not be performed. Programs were required to test for this condition after a tape operation, the same as for any other operation that sets overflow.

Conditioning Tapes

To support block search and overwrite, tapes needed to be formatted, or "conditioned" before use. Conditioning was performed off-line by the Tape Control Unit in concert with one of the drives. The Processor could not use the tape subsystem while conditioning was taking place.

Conditioning worked by first driving the tape forward and writing a constant magnetic pattern on it. When the end of the tape was reached (indicated by a perforated section of tape), the tape reversed direction, and the Tape Control Unit read back the constant pattern, looking for flawed areas on the tape. When an unflawed section long enough to hold a block was detected, the Control Unit wrote a block header with the appropriate block number in it. This process continued until the block number counted down to zero or the beginning of the tape was reached. Only one lane at a time could be conditioned -- the entire process had to be repeated to condition the other lane. Once conditioned, a tape could be written upon and reused any number of times.

The 205 Operating Procedures manual contains a good description of the DataReader and Control Unit, along with descriptions of the steps necessary to load, unload, and condition tapes.

The retro-205 emulator does not require conditioning of tape images, thus does not implement the manual conditioning features of the Tape Control Unit.

Tape Data Transfer and Buffering

The speed of data transfer to and from tape was a real problem for the 205. At 100 digits/inch and 60 inches/second, a word was transferred every two milliseconds to or from the tape drive. The high-speed loops on the memory drum (with a maximum access time of 1.68ms) were fast enough to handle this data rate, but main memory (with a maximum access time of 16.8ms) was not. Obviously, some sort of buffering was needed, and it had to handle I/O operations up to 100 blocks in length.

Discounting the 6ms startup time for tape motion, a 20-word block could be read or written in 46ms. That is a lot longer than the 16.8ms required for a main memory access, but 20 consecutive main-memory accesses would require 336ms. The Block-to-Loop and Block-from-Loop instructions, however, can transfer 20 words to or from main memory and a high-speed loop in a maximum of 1.1 drum revolutions, or about 18.5ms.

Thus, the designers of the magnetic tape subsystem decided to use the high-speed loops as buffers between the tape and main memory and use the existing blocking mechanism to rapidly transfer data between main memory and the loop buffers. Two buffers were needed, so that one could be transferring a block of data to or from the tape, while the other one could be transferring another block from or to main memory.

The designers chose to use the 4000 and 5000 loops for these buffers, in such a way that data transfer always ended with the 4000 loop. Thus if a even number of blocks were to be transferred, the transfer began with the 5000 loop; if an odd number were to be transferred, it began with the 4000 loop.

During a read operation, one buffer would be filling a block from tape while the other buffer was transferring the previous block to main memory. During a write, one buffer would be sending a block to tape while the other buffer was loading the next block from main memory. The buffers would switch roles at the end of each block, alternating between servicing main memory and servicing the tape drive.

As a special case, tape instructions that used a main memory address of 8000 or higher read or stored data the 4000 and 5000 loops directly, and did not transfer data to or from main memory.


Magnetic Tape Instructions


The Datatron 205 Processor supports four instructions that operate on magnetic tape:
  • 0 nnup 40 aaaa: read nn blocks into address aaaa
  • 0 nnup 50 aaaa: write nn blocks from address aaaa
  • 0 llup 42 bbbb: search on lane ll for block bbbb
  • 0 xxup 52 xxxx: rewind unit u
where:
  • aaaa is the memory address at which data transfer into or out of main memory will begin. If the address is 8000 or greater, then data transfer will take place directly to or from the 4000 and 5000 loops and main memory will not be accessed. It is not possible to read or write directly to the 6000 and 7000 loop addresses.
  • bbbb is the zero-relative block address sought by a search command. The system will automatically determine whether a forward or backward search is necessary.
  • ll is the lane number on which to search. On a DataReader unit, which supports only two lanes, only the low-order bit of this value is relevant. DataFile units have 100 lanes, so two full digits are required for a lane number on those drives.
  • nn is the number of consecutive blocks to read or write. A value of 00 indicates 100 blocks.
  • p is the standard breakpoint digit in 205 instructions.
  • u is the logical unit number to which the operation will be directed. Each drive has a selector switch on its panel that designated the unit number to which it will respond.
  • x indicates digits in an instruction word that are ignored.
Note that the lane can be selected only by a search command. Thereafter, read and write instructions will use the last lane selected. Also note that based on the value of the sign digit, the low-order four digits of each instruction word could be modified by the contents of the B register in the usual manner.


Magnetic Tape User Interface


This section describes the user interface for the Control Unit and DataReader as presently implemented in the retro-205 emulator. The DataFile has not yet been implemented in the emulator.

Release 0.05 of the emulator supports three tape drives, identified as MTA, MTD, and MTE. A facility to control the number and kind of peripheral devices on the system is planned for a future release.

Magnetic Tape Control Unit

The Control Unit is the interface between the drives and the 205 Processor and memory. It controlled all tape operations except rewind, which was handled locally by the drive. Since there was only one Control Unit per system, there could be only one tape operation (except rewind) active at a time.

In the emulator, the Control Unit has a small window with a number of lamps, two buttons, and a toggle switch.

Magnetic Tape Control Unit
The physical Control Unit had two additional panels on either end, plus additional buttons and switches, as can be seen from Figure 9-3 in the 205 Operating Procedures manual. The lamps on the two additional panels reflected the status of circuits involved with physical recording on tape, parity generation and checking, and maintenance of the unit. The additional switches and buttons dealt with manipulation of the T register and manual calibration of tapes. None of these have any meaning in the context of the emulator, and thus have been omitted. Of the remaining panels:
  • The two columns of lamps comprising the BZ register in the left-most panel indicate the currently-selected lane for the currently-selected tape unit. The lane is a decimal number displayed in BCD. For DataReader units, which have only two lanes, only the bottom lamp in the second column is relevant. DataFile units have up to 100 lanes, so both columns of lamps are used with those drives.
  • The column of lamps comprising the Z register in the left-most panel indicates the currently selected tape unit. The unit number is also displayed in BCD. A value of zero indicates unit number 10.
  • The four-digit T register in the center panel is used for two purposes: 
    • During search operations, the T register holds the BCD representation of the target block number. Once the desired block is located on the tape, the contents of the T register will be rotated one digit to the right. 
    • During read and write operations, the low-order digit position holds the selected unit number. The next two digits hold the number of blocks to be read or written -- these digits will be counted down as blocks are read or written. The high-order digit will always be zero during read or write operations.
  • The right-most panel holds three columns of five lamps, but only four of these lamps are used by the emulator; the remaining lamps were for maintenance purposes:
    • The S lamp indicates a search operation is in progress.
    • The R lamp indicates a write (record) operation is in progress.
    • The B lamp indicates backward tape motion.
    • The F lamp indicates forward tape motion.
The CLEAR button clears the internal state of the Control Unit, including all of the lamps on the panels. This button should seldom be needed in the emulator.

The NORMAL/SUPPRESS B switch controls how sign digits of incoming words are interpreted during read operations. In the NORMAL position, sign digits with the second (2) bit set will cause the current value of the Processor's B register to be added to the word before it is stored in memory. The 2-bit is also reset before the resulting word is stored. In the SUPPRESS B position, no action is taken based on the value of the sign digits, and they are stored with their words as-is. SUPPRESS B is normally used when loading programs from tape, to prevent altering the sign digits in words for Cardatron format bands. For normal operations, this switch should be left in the **SUPPRESS B** position.

The DISABLE button stopped all tape movement as long as it was depressed. This could be used, for example, to interrupt a search for a non-existent block. When this button is clicked in the emulator, it aborts the current tape read, write, or search operation. It has no effect on tape rewind.

Magnetic Tape Storage Unit (DataReader)

The DataReader handles open-reel tapes. In the emulator, these units are physically labeled MTA through MTJ. Each unit has a separate window in the browser:

Tape Storage Unit (DataReader)

The POWER lamp, STOP REWIND button, and the ON/OFF and FORWARD/OFF/REVERSE toggle switches from the physical DataReader have been eliminated, as they have no function in the emulator. Two new buttons have been added to enable loading and unloading of tape images in the emulator version of the drive.

The DESIGNATED lamp lights when this unit is currently selected for a tape operation. Except for rewind, it remains lit while the operation is in progress.

Clicking the REWIND button while the unit has a tape mounted and is in local status will rewind the tape to its load point (BOT). The button is inactive when no tape is loaded or the unit is in remote status.

The pull-down DESIGNATE list determines the logical unit number of the drive. This is the unit number that tape instructions in the Processor reference. No two drives should have the same logical number at the same time. If two or more drives are assigned the same number simultaneously, tape operations against that unit number will act as if the unit is not ready. Unit numbers are initially assigned based on the drive's physical identifier, A=1, B=2, etc. [Update 2017-04-15] Unit numbers and switch settings except for the REMOTE/LOCAL switch are now persisted in the system configuration.

The REMOTE/LOCAL switch controls the status of the drive. If a tape is mounted, placing this switch in the REMOTE position will place the drive in remote status and make it ready; otherwise it is in local status and not-ready. Tape operations against a not-ready unit will result in the Processor's Overflow toggle being set.

The NOT READY lamp indicates the remote (ready) or local (not-ready) status of the drive.

The REWIND-READY/REWIND-NOT READY switch determines the status of the drive after a rewind operation completes. In the REWIND-READY position, the drive will be placed in remote (ready) status after rewind completes, provided the LOCAL/REMOTE switch is in the REMOTE position. Otherwise, the drive will be in local (not-ready) status after a rewind.

The NOT WRITE lamp will be lit when the NOT WRITE/WRITE switch is in the NOT WRITE position. This indicates any tape that is loaded is write-protected.

The NOT WRITE/WRITE switch controls whether a tape can be written upon by the drive. In the NOT WRITE position, the NOT WRITE lamp illuminates and the tape is write-protected. Write operations will cause the tape to move the designated number of blocks, but data on the tape will not be overwritten.

The progress meter at the bottom of the window shows the relative amount of tape left on the supply reel. As the tape moves forward, the meter bar will diminish towards the left; as the tape moves backward, the bar will increase towards the right

Above the progress meter is the name of the tape image file currently loaded into the drive, if any. If a blank tape has been loaded, this will read simply "(blank tape)".

Loading and Unloading Tape Image Files

The LOAD button is used to mount a reel of tape, in the form of a tape-image file, into the DataReader. This button is active only when no tape is loaded to the drive. Clicking this button will open a tape-loader dialog box:

Tape Loader Dialog Window

To load a blank tape into the drive, simply click the OK button. To load an existing tape-image file, use the file picker on the window to select the file, then click the OK button. All tape images are treated as if their reels had been calibrated to hold 10,000 blocks (numbered 0-9999). Blank tapes are implicitly calibrated. The emulator does not support manual calibration of tapes, nor tapes containing other than 10,000 blocks.

When a tape is loaded into the drive, a small tape-reel icon will display on the right side of the tape drive window. This icon will rotate as tape commands move the tape in the drive. Below the icon are a set of orange annunciators that indicate the status of the tape image:
  • UNLOADED: no tape is mounted in the drive.
  • AT BOT: a tape is loaded and is positioned at the tape's load point.
  • AT EOT: the tape is now positioned at physical end-of-tape. No further forward motion is possible. Either a backward search command must be executed, or a manual or programmatic rewind must be done.
  • REWINDING: the tape is currently rewinding.
The UNLOAD button dismounts a tape from the drive. This button is inactive unless a tape is mounted, the drive is in local (not-ready) status, and the tape is positioned at its load point (BOT).

If the tape has been written upon while it has been mounted, clicking this button will display a dialog box asking if you want to save the tape image data. If you respond by clicking the dialog's OK button, the emulator will render the contents of the tape as lines of ASCII text and display them in a new, temporary window. You may save the contents of this window to your local file system, or copy/paste the data into another program. When you are finished with the window, simply close it.

Format of Tape Image Files

Tape images are maintained externally as simple ASCII text files. Lines in the file may be delimited by carriage-return followed by line-feed, carriage-return only, or line-feed only. Empty lines (those containing no characters or only spaces) are ignored. Each non-empty line of text in the file represents two blocks, one in each lane. Words in the blocks are represented as hexadecimal numbers with BCD encoding, which you can think of simply as decimal numbers. Words within a block are delimited by commas.

The first 40 comma-delimited numbers on a line will be converted to the tape drive's internal format. The first 20 words will become the contents of lane 0; the second 20 words will become the contents of lane 1. If a line contains fewer than 40 numbers, it will be implicitly extended to 40 with zeroes. If a line contains more than 40 numbers, those after 40 will be ignored. Two adjacent commas imply a zero word between them. Spaces before or after the text between commas are ignored.

When loading an existing tape image file, parsing of the lines in the file proceeds as follows:
  1. Each line is extracted in turn from the file. If the line is empty, it is ignored and the load proceeds to the next line of the file.
  2. The text of the line is separated into individual strings of text at the point they are delimited by commas.
  3. Any leading or trailing spaces in each string are discarded.
  4. Each resulting string is converted to a numeric value by the Javascript parseInt(...,16) function. This function will convert any leading hexadecimal characters (0-9, A-F) to internal numeric form. The number may optionally be prefixed by a hyphen ("-") to indicate it is negative. Any trailing, non-hexadecimal characters in the string are ignored.
  5. Each converted number is reduced to 11 hexadecimal digits by discarding any higher-order digits. If the number was prefixed by a hyphen, the low-order bit of the sign (11th) digit is set to one. Negative values may be indicated either by preceding the number with a hyphen, or setting the low-order bit of the 11th (highest order) digit to one, or both.
If a tape-image file contains fewer than 10,000 non-empty lines, the emulator will extend the drive's internal tape image to 10,000 blocks when it is loaded by appending blocks of zero words in each lane. If a file contains more than 10,000 lines, the extraneous blocks are ignored. A blank tape image initially consists entirely of blocks of zero words.

When unloading a tape image, the emulator will trim all blocks consisting entirely of zero words from the end of the tape-image text it creates. It will also trim zero words from the end of each line. If the blocks from both lanes are entirely zero, the unloaded image will have only a single comma rendered on that line. These trimming actions minimize the amount of redundant text in the resulting tape image file.

This format has been designed to create and maintain tape image files with a simple text editor. Most spreadsheet programs will read and write comma-delimited text files, so they can be used as well to view and edit tape image files. Note that if a block in lane 1 is non-zero, the corresponding block in lane 0 must be padded as necessary in the image file out to a full 20 words.


Other Changes and Enhancements in Release 0.05


In addition to the implementation of magnetic tape, this release contains the following  enhancements and corrections:
  1. Addition and subtraction of words having signs other than zero or one now works properly.
  2. The algorithm that compensates for timer deviation in the setCallback() mechanism has been significantly improved. In addition, separate timing categories have been established for main memory and the high-speed loops.
  3. The handling of blocking instructions for the high-speed loops has been changed to accommodate the way they worked on systems with magnetic tape. In addition, BTn instructions with an operand address of 8000 or greater will now clear the designated loop to zeroes instead of transferring words from the mod-4000 congruent address.
  4. The INPUT SETUP and GENERAL CLEAR buttons have been implemented on the Cardatron Control Unit. GENERAL CLEAR functions the same as the CLEAR buttons on the Control Console and Supervisory Panel. INPUT SETUP presets the Processor registers to execute a card-read instruction for input unit 1 when the Processor is next started or stepped.
  5. Unit numbers for Cardatron output units have changed: the card punch is unit 1 and the line printer is unit 3. These are the unit numbers required by MEASY and the Algol-58 compiler.
  6. On the text panes for Cardatron output units, double-clicking anywhere in the text area will now open a new, temporary window, copy the text from the pane to that new window, and clear the text in the pane. This is an additional method to save line printer or card punch output, and is consistent with the method used on the Flexowriter and paper-tape punch.
  7. The overly-generic END OF SUPPLY and RUNOUT SUPPLY button captions on Cardatron output units have been changed to reflect "paper" for line printers and "cards" for card punch devices.
  8. The ALGOL GLYPHS option on Cardatron output units now works properly.
  9. The large, round knobs on the Control Console and Supervisiory Panel now work differently. Previously, when a knob that was in its last position was clicked, it would jump to its first position. Now, the knob reverses direction, and subsequent clicks move it in that reverse direction. This is more in keeping with how real knobs work.
  10. The Supervisory Panel window is now opened last, to avoid timing problems with the opening of the Cardatron and magnetic tape unit windows.
  11. The ability for the emulator to run off-line in the browser's "application cache" has been reinstated.
  12. Minor changes to the home window were made to accommodate the project's move from Google Code to the GitHub hosting service.
Future plans for the emulator include implementation of the magnetic-tape DataFile unit and a configuration facility to allow users to customize the number and kind of peripheral devices for their 205 emulator.

Saturday, August 8, 2015

Knuth's EASY Assembler

It is difficult to appreciate today, but there was a time when compilers were unheard of and assemblers were held in suspicion. In the late-1940s and early/mid-1950s, Real Programmers wrote raw, seething machine code. At first they did this because assemblers did not yet exist, but later because, well, they were Real Programmers, and things like mnemonic operator codes, symbolic addresses, resolution of forward references, and ease of modification were for sissies.

This is Paul Kimpel again, with an update on the retro-205 emulator and progress towards reconstruction of Donald Knuth's Algol-58 compiler for the ElectroData/Burroughs Datatron 205. This post focuses on the assemblers for the 205 he wrote to develop that compiler.


Origin of the EASY Assembler


In 1960, Knuth contracted with Burroughs Corporation in Pasadena, California, to write an Algol compiler for the 205. The specification for Algol 60 had not yet been published, so this compiler was to be for the preliminary version of the language, known as Algol-58.

The work was to be done that summer, between the time of Knuth's graduation from Case Institute of Technology (now Case Western Reserve University) in Cleveland, Ohio, and his start of graduate studies at the California Institute of Technology in Pasadena. Knuth discusses the project briefly in his Oral History for the Computer History Museum. The transcript from the 1985 Burroughs B5000 Conference, as well as Richard Waychoff's Stories of the B5000... memoir, also mention the project.

At that point in time, compilers had to be written in assembly language, although the B5000 project was to turn that notion on its head just a year or so later. Burroughs supported an assembler for the 205, named "STAR 0". There was also the "Shell" assembler developed by Shell Research. For reasons that are not clear, Knuth decided to use neither of these, and instead to write his own assembler for the 205, calling it EASY, the "Elegant Assembly System for the 205."

In notes for a later variant of this assembler, MEASY (described later in this post), Knuth indicated that the "elegant" in the title should be taken in its mathematical connotation of "short and sweet." Indeed, EASY is short -- the entire program is only 569 lines, of which more than half represent the constant data to initialize the symbol table and Cardatron format bands. The executing code, including the run-time loader the assembler emits, is little more than 250 instructions.

Reconstructing the Algol-58 compiler is a major goal of this project, but in order to do that, we first needed to reconstruct either EASY or MEASY. Eventually I did both of them. I am presently using MEASY for the compiler, but MEASY requires a tape drive. Magnetic tape has only recently been implemented in the retro-205 emulator, and as I write this, has not yet been officially released. When I started working on the assemblers back in March of this year, all I had working were card and paper tape devices. EASY is a card-based assembler, so I started with it, and later used EASY to bootstrap MEASY.


Using EASY


EASY is easy to use. You place its load deck in the reader followed by the source deck to be assembled, press the INPUT SETUP button on the Cardatron Control Unit to initialize the processor to load from cards, press START on the reader, and then either START on the Supervisory Console or CONT on the Control Console. Assembly terminates when EASY senses an END pseudo-instruction in the card deck.

EASY reads its source from cards and punches its output to cards. It does not generate a printed listing. The 205's Cardatron subsystem controls the card equipment, which was typically an IBM 089 collator for input and IBM 523 punch for output. As described in an earlier post, the Cardatron requires a numeric punch in a selected column of each card to specify the format band under which the card will be interpreted. The format band determines how columns from the card will be translated into decimal words before being sent to the 205 memory drum. Thus, EASY requires a "1" punch in the first column of each of its source cards to select input format band 1. That format band is part of the assembler object code, and is loaded to the Cardatron by EASY during its initialization.

Interestingly, each input card generates one output card. The left side of an output card holds the original source text; the right side holds the generated object code. EASY copies the "1" punch into the first column of the output card, but also places a "6" punch in the second column. Format band 6 is hard-wired in the Cardatron. It is used to load all-numeric data, and in particular, object code. Therefore, by changing the setting in the Cardatron Input Unit to select format from column 1, you read the output deck back in as source for another assembly. By changing the format selection to column 2, you load the object code from the deck to run the program. Sweet.

The numeric word in columns 70-80 of an output card is the assembled word to be loaded into memory. The numeric word in columns 59-69 is a card read instruction with a sign digit of 6. The Cardatron reads data backwards from the end of the card, thus under format band 6, the word in columns 70-80 is read first and stored at the effective memory address for the current card-read instruction. Since the second word read from columns 59-60 has a sign of 6, it is interpreted as an instruction to be executed rather than data to be stored. That sign value terminates the data transfer from the current card (ignoring the remainder of the card) and executes the new card-read instruction to read the next card into the effective address of that instruction.

Thus, when reading the output deck under format band 6, each card bootstraps the next one and specifies the address at which the object word on the next card will be loaded. Programs generally begin with a START pseudo-operator to specify the initial assembly address. Since a START operator does not generate a word of object code, the output card for it has the bootstrapping card-read instruction for the next card in columns 70-80, serving to set the address for the word to be read from the next card. Hence, it does not matter what address is used for the initial card-read instruction to load the deck -- the executable instructions embedded in the deck will relocate to the correct address automatically.

For a more detailed example of how loading code with format band 6 works, see this description of a short card-load program that clears the 205 memory drum.


Programming with EASY


Traditionally, assembly languages specify one computer instruction per line. Most assemblers require at least three fields on each line:
  • A field for a symbolic label. This symbol becomes defined as the memory address at which that line is assembled.
  • A field for an operation code. This holds a mnemonic for the type of instruction or data on that line. Many assemblers also have pseudo-instructions to control operation of the assembler. Lines with pseudo-instructions generally do not generate object code.
  • A field for the operand address or addresses used by the instruction. Usually this references the symbolic label for some other line in the program. A major task of an assembler is to assign addresses to these symbolic labels as they are defined in the first column of lines in the program, and to apply those addresses to the operand fields of other lines where the labels are referenced. Most assemblers also support some sort of "address arithmetic" for the operand addresses, e.g., specifying an address so many locations before or after that for a symbolic label.
  • Any remaining text on the line after the operand(s) is generally ignored, and considered to be a comment.
EASY follows this general idea, but uses more columns to simplify the parsing and processing of instructions. In particular, the specification of operand addresses is spread across four fields, as discussed below. With the exception of the START and END pseudo-instructions, each line of input generates one word of assembled object code. There is no provision for "comment lines" as with most assemblers.

Input cards to EASY must have the following format:
1       10        20        30        40        50        60
.........|.........|.........|.........|.........|.........|
1   P LLLLL SCCCCIIAAAA OOOOO YYYYY PP T XXXXXXXXXXXXXXX
Column 1: must contain a "1" to satisfy the Cardatron's requirement for format band selection.

Column 5: Program-point label: Program points can be thought of as local, reusable labels. Programs typically have numerous references to nearby locations, especially in branch instructions to implement control flow. These locations are often referenced only once or twice, so rather than require a globally-unique symbol for each of these briefly-used references, program points provide short labels the programmer can reuse multiple times throughout a program, but in different local scopes. EASY provides nine program-point symbols, 1-9. Placing one of these in column 5 labels that line with that program point. Other lines can then refer to the nearest program point before or after their location, but only the nearest one -- see the discussion for columns 37-38 below. Once a program-point label is reused, earlier definitions of it fall out of scope.

A number of assembly languages for Burroughs machines supported program-point labels. I first encountered them in the assembler for the B3500, and always wondered where they originated. In researching the Shell assembler, I found this footnote in its reference manual:1
The idea of the program point address is apparently that of Melvin Conway of the Case Institute of Technology. It was incorporated into SOAP III, an assembler for the IBM 650, by Donald E. Knuth, also of Case Institute. See Case Institute Report 1005.
This explains why program points were in EASY, but also note that the Shell assembler predated EASY by at least two years.


Columns 7-11: Symbolic location label: the traditional label field for an assembly line. Labels defined in this field must be unique within the program. Labels are limited to five characters, since the 205 stored five alphanumeric characters per word.

Columns 13-23: Numeric word: this field can be used for several purposes, including specifying literal values, relative addresses, and additional digits in instruction words. The exact effect depends on what is specified in other fields on the line. To understand this, the 11-digit 205 instruction word can be thought of as containing four sub-fields: SCCCCIIAAAA, where S is the sign digit, CCCC are the so-called "control digits" used to specify breakpoints and by some I/O instructions for unit selection and control, II is the two-digit operation code, and AAAA is the operand address. Any values in these sub-fields on the card become the default values in corresponding fields of the assembled word, with blanks being interpreted as zeroes. The operation code (II) and operand address (AAAA) fields can be modified by other fields on the line, as discussed in the following.

Columns 25-29: Symbolic operation code: If this field is non-blank, the value of the symbol in that field, modulo 100, replaces the II digits (columns 18-19) in the assembled instruction word. The symbol table in EASY is initialized with a set of operation mnemonics and their corresponding numeric codes. Knuth chose a set of mnemonics somewhat different from those used by other 205 assemblers, apparently to be consistent with the assembler mnemonics for the Burroughs 220, a later machine with a very similar architecture:

EASY & MEASY Assembler Operation Codes

In addition, there are two pseudo-operators, START, the operand address for which specifies the next location at which instructions will be assembled, and END, which indicates the end of the program. The operand address for END specifies the location at which the loader routine will be placed during program load.

Columns 31-35: Symbolic operand address: If this field is non-blank, it modifies the operand address digits (AAAA, columns 20-23, which are most often blank or zero), in the following ways:
  • If the symbolic label has been previously defined (i.e., its address is already known), that address is added to the value for AAAA, modulo 10,000. This is a way to address a location relative to that of a symbolic label. To address relatively in a negative direction, the value in AAAA must be the 10s-complement of the desired relative offset (e.g., 9999 equates to -1).
  • If the symbolic label has not been previously defined (i.e., it's a forward reference), the value in AAAA is replaced by a link to any previous instructions referencing the same forward label. The reason for this forward-reference linkage is that EASY is a one-pass assembler. It does not resolve forward references at assembly time. Instead, such references are linked backwards through the address fields of all those instructions referencing a given forward address, with a zero address terminating the chain. At the end of the assembly, EASY emits a short loader routine and a table of head addresses for the forward-reference chains. Using this table, the loader works backward through the forward-reference chains and resolves the forward references each time the program is loaded.
Columns 37-38: Program-point address: This field is ignored if the symbolic operand address in columns 31-35 is non-blank. Otherwise, the first column in this field is interpreted as a numeric digit, and the second column indicates how that digit will modify the value of AAAA (which again is most often zero), thus:
  • n+: The current location address plus n words is added to the value in AAAA, modulo 10,000.
  • n-: The current location address minus n words is added to the value in AAAA, modulo 10,000.
  • nB: The location address of the immediately-prior program-point label n is added to the value in AAAA, modulo 10,000.
  • nF: The location address of the immediately-next program-point label n will eventually replace AAAA when it is resolved by the loader, as described above for forward symbolic operand addresses.
Column 40: High-speed loop tag: This column, if non-blank, must be in the range 4-7. When present, this digit eventually replaces the high-order digit in the address that is finally resolved into the AAAA field, causing that address to reference a location in one of the four, 20-word, high-speed loops on the 205 memory drum. A common 205 coding practice was to assemble instructions into locations in main memory and then transfer them using one of the "blocking" instructions to a high-speed loop for execution, where they would run almost ten times faster. Use of the loop tag allows instructions to be coded as if they were going to run from main memory, but the tag modifies the operand address to be relative to the loop where the code will actually run.

Columns 42-56: Remarks: These columns are ignored by the assembler, and can be used for comments.

Columns 59-80: Loadable code: These columns are ignored by the assembler on input cards. On output cards, they hold the assembled object word and card-read instruction for loading the assembled program into memory, as described earlier.

 Any data on the card in columns other than the ones cited above is dropped by the Cardatron and never even reaches the 205 memory.

The following snippet of code from EASY (starting at line 183 in the source) illustrates most of the features described above:
1                  4000 START
1     TABLE             STC   HOLD       SYMBOL TABLE
1                       CAD         4F   SEARCH ROUTINE.
1                     4 SLT
1                       STA   EXIT       STORE EXIT.
1                       CAD   HOLD       GENERATE
1                       MUL   10101      STARTING
1                       STC   TEMP       PLACE, AND
1                     3 SLT              SEARCH FOR
1                       STA   TEMP       EMPTY OR
1                       LDB   TEMP       UNUSED PLACE
1   1       1      1900 CAD
1                       CNZ         2F
1                       CAD   HOLD       NOT IN TABLE
1     EXIT
1     10101  1  1  1001
1   3                   CAD   EXIT       IF SYMBOL IN
1                       ADD   TWO        TABLE, ADD 2
1                       STA   TEMP       TO EXIT LINE
1           1      2900 CAD              AND DISPLAY
1     TEMP                               EQUIVALENT


Reconstructing EASY


Fortunately, a listing of the EASY assembler survives in the papers Knuth donated to the Computer History Museum in 2005. Brief documentation for the program, accompanying a shorter listing (lacking the symbol table constants and Cardatron format bands), is also available at that site.

Since the source code existed only in the form of scanned listings, I had to transcribe it manually. Then, because EASY was written in itself, the next problem was to find a way to bootstrap the assembler source into 205 object code so it could be executed.

Both listings cited above include the assembled object code to the right of the source text. Even better, the shorter listing accompanying the documentation showed the forward references resolved to their final addresses. Having the forward references already resolved allowed me to write an off-line script to extract the object code directly from the transcribed text without having to resolve the forward references myself.

After transcribing the assembler source and carefully proofreading it, I extracted the object code from the right-most columns of the card images and reformatted those words into a standard seven-words-per-card deck that can be loaded using Cardatron format band 6. This deck has the same format as the short card-load program cited earlier. Using this load deck, I then assembled EASY with itself and compared the assembly output against my transcription. After a couple of cycles of correction and reassembly, I had an accurate transcription of EASY that would generate itself.

There were several anomalies in the reconstruction process that bear mentioning, though.
  1. The unconditional branch instruction on line 129 (address 1733) is shown on the listing as assembling to 0000 00 0000. It should be 0000 20 4756. This was probably an error introduced during whatever process originally resolved the forward-reference addresses for the listing.
  2. The START pseudo-instruction on line 304 is shown on the listing as "20 START", which would imply the next word should be assembled at location 0020. The effective address is actually 1478, however. I eventually learned that suppression of leading zeroes in numeric fields is something that was done by the IBM card equipment, not the 205 or Cardatron, and those machines suppressed any character that did not have a numeric 1-9 punch in its card code. There is actually a "0+" (or equivalently, "0&") for the program-point address in columns 37-38 on this line. Since those characters do not have a numeric punch in their card encoding,  they were both zero-suppressed on the listing and printed as blanks. The presence of "0+" in the program-point address field changes the effective address of the START instruction from 0020 to the current address plus 20, or 1458+20=1478.
  3. At line 425 (address 2712), the symbol table word for the BF7 operator (code 37, Block From Loop 7000) was shown on the listing as 0 4246 86 0007. The character codes in that word translate to BF6 instead, so I corrected the word to 0 4246 87 0007. This is very likely a day-one bug in the program, but was probably never encountered as BF7 is rarely used.
  4. At lines 558-561 (addresses 1203-1206) in the initialization code for the program, the listing shows four Block To Loop instructions:
        1220 BT4
        1260 BT5
        1240 BT6
        1280 BT7

    These each transfer 20-word blocks from the indicated main memory addresses to the 4000, 5000, 6000, and 7000 high-speed loops, respectively. Of course, when these are executed, they destroy whatever was previously in those loops. Alas, EASY had assembled very critical code and data into each of those loops (see lines 183-250 in the listing), so the initial attempts to run the assembler bombed. These BTn instructions made absolutely no sense, as nothing had been assembled into the 1220-1299 address range. I resolved the problem simply by changing the BTn instructions to BFn, which transferred words from the loops to unused memory addresses instead of the reverse. Later, I learned that MEASY uses all four high-speed loops for its loader, and any code assembled into loop addresses will not be loaded by that routine. My best guess at this point is that the version of EASY in the listing we have had been modified for assembly by MEASY, and that there is a missing piece that took care of loading the loops.
  5. I modified the load deck to replace the HLT 1221 instruction that terminates the loader with an unconditional branch to the program's entry point (address 1200). This allows EASY to begin execution automatically after it is loaded.
  6. I also modified the load deck to impose "reload lockout" on the reading of the last card of the load deck. This allows the source deck to be assembled to follow immediately after the load deck, by preventing the Cardatron from reading the first card of the source deck before EASY has a chance to load the format bands to the Cardatron.
  7. The listing I transcribed did not have the mnemonics or operation codes for the four magnetic tape instructions. I added these to both the transcribed source and to the load deck.
  8. As will become clear from the discussion of the symbol table below, EASY requires that memory be cleared prior to an assembly. It does not do this itself, so the short memory-clear program cited above has been prepended to the load deck mentioned in the next paragraph.

The transcribed source file, load deck, a sample output deck, and an annotated listing are all available from the software repository folder on our GitHub project site.


How EASY Works


The 205 was strictly a word-oriented machine. It had no native character manipulation facilities, and it being rather slow to begin with, free-form parsing of text was quite expensive. What the 205 did have, though, was the Cardatron, which was quite good at extracting columns from cards and transforming them into numeric words for the 205's memory. EASY took heavy advantage of this capability. The numerous fields of its input card format were designed to allow the Cardatron to do essentially all of the parsing of text on a card, converting each of the fields described in the "Programming with EASY" section above to a full word in memory as part of a single card-read instruction.

EASY reads each input card into locations 6005-6018 under control of Cardatron format band 1. You can examine the codes for that format band at memory locations 1400-1428 in the source.

The 6000-series addresses reference one of the 20-word high-speed loops on the memory drum. All loop addresses are congruent modulo 20, which means that 6005 6015, 6135, 6675, 6815, etc., all address the same word on the drum [corrected 2015-12-08]. The layout of words in this loop is designed to facilitate both analysis of the columns on the card and formatting of the output card.

In the following, keep in mind that for both input and output, the Cardatron processes card columns in descending order, but reads and writes to memory in ascending order, so the fields in memory are in reverse order of those on the card:
  • Word 6000: the assembled word, but with its sign digit set to zero. This word will be punched into columns 71-80 on the output card.
  • Word 6001: the sign (in the low-order digit of the word) to be applied to the assembled word This will be punched into column 70 on the output card.
  • Word 6002: current value of the location counter (address of the instruction being assembled). This will be punched into columns 66-69 of the output card as the address field for the following card-read instruction.
  • Word 6003: skeleton card-read instruction (0 0810 44 xxxx) that will be punched into columns 60-69 on the output card. At load time, this will cause the next card in the output deck to be read; the address (xxxx, from word 6002 above) will specify the location into which the assembled word on that card will be stored.
  • Word 6004: a literal 6 applied by the Cardatron output band to the card-read instruction above and punched into column 59. This word is also used to supply a literal 6 elsewhere in the assembler (e.g., at 1732).
  • Word 6005: third word of comment field (columns 52-56).
  • Word 6006: second word of comment field (columns 47-51).
  • Word 6007: first word of comment field ( columns 42-46).
  • Word 6008: high-speed loop tag from column 40 of the card.
  • Word 6009: second program-point address reference character from column 38 on the card (+, -, F, B).
  • Word 6010: first program-point address reference character from column 37 (0-9).
  • Word 6011: symbolic address field from columns 31-15 on the card.
  • Word 6012: symbolic op code field from columns 25-29 on the card.
  • Word 6013: numeric address field from columns 20-23 on the card.
  • Word 6014: numeric op code field from columns 18-19 on the card.
  • Word 6015: numeric control digits from columns 14-17 on the card.
  • Word 6016: numeric sign from column 13 on the card.
  • Word 6017: symbolic label from columns 7-11 on the card.
  • Word 6018: program-point label from column 5 on the card.
  • Word 6019: literal 1600000999. The low-order four digits are used by the TABLE routine during symbol-table search wraparound to reload the B register. The high-order 16 gets punched into columns 1-2 in the output card to be used as Cardatron input format band select digits when the assembled deck is read back in.
Some fields are read numerically by the Cardatron, viz columns 5, 13, 14-17, 18-19, 20-23, 37, and 40. The rest of the fields are read alphanumerically. Blank columns read numerically are interpreted as zeroes. The internal code for a space character is decimal 00, so it is simple for EASY to determine if one of the symbolic fields is blank by checking its corresponding memory word for zero.

The numeric word from columns 13-23 becomes the default value for the assembled word. Literals can be specified simply by coding their value in these columns and leaving the remaining fields on the card blank. Coding a 1 in column 13 sets the sign digit in the assembled word, which for instructions indicates the operand address should be modified by the value of the B register prior to execution of the instruction.

EASY uses the program-point and symbol label fields on the left side of the card to create symbol table entries and detect when a forward reference to a symbol has been satisfied. If the symbolic operation code, symbolic operand address, program-point address reference, or high-speed loop tag are specified, their respective values are applied to the corresponding sub-fields in the assembled word.

The symbol table occupies memory locations 1900-3899, providing for a total of 1000 symbols. The table is in two parts: the first 1000 words hold the alphanumeric symbols, one symbol per word, left-justified over zeroes (i.e., blanks). The corresponding word 1000 locations higher in memory holds the value of the symbol, typically an address or instruction operation code. The table is initialized with the mnemonic operator codes and their numeric equivalents. The mnemonics have 7 added to their symbol value, probably to distinguish them from user-specified symbols having the same name. In effect, this places the operator mnemonics in a separate name space from the other symbols.

Symbols in the table are located by the TABLE routine, which resides in the 4000 and 5000 high-speed loops, using a hashing technique. The assembly routines that handle symbolic labels, mnemonic operation codes, and symbolic operand addresses call TABLE passing the target symbol in the A register. EASY multiplies the target symbol by 1001001001. This produces a 20-digit result in the A and R registers. EASY then extracts the high-order three digits of the R register as the modulus of the hash, producing an initial probe offset into the symbol words in addresses 1900-2899 of the table.

The hash modulus is loaded into the B register, which acts as an index register. The word at location 1900+B is loaded into the A register. If it is zero, that position in the symbol table is presently unused. This also means the target symbol is not in the table, and is therefore undefined. TABLE exits back to the caller one word after the address where it was called, with the target symbol restored to A and the offset to the unused symbol table entry in B.

If the symbol word in the table is not zero, TABLE subtracts the target symbol from it. If the result of that subtraction is zero, the symbol has been found, so TABLE loads the corresponding value of the symbol from 2900+B into A and returns to the caller three words after the address where it was called.

If the target symbol does not match the current symbol table entry, a hash collision has occurred.  TABLE decrements B to test against the prior symbol in the table. The process in the above two paragraphs then repeats until either the symbol is found or an unused symbol table entry is encountered. If B decrements below zero, its value is reset to 999 to continue searching from the top of the table. Alas, if the symbol table fills up, TABLE will loop endlessly if it searches for an undefined symbol. There is no recovery from this situation.

Since words of all zeroes signify an unused entry in the symbol table, EASY will not work properly if at least addresses 1900-3959 in memory are not cleared to zero initially. Ensuring this is something that must be done prior to loading the program.

Program-point labels are stored in memory locations after the symbol table. Backward program points are stored beginning at 3900, indexed by the program-point number. Forward program points are stored beginning at 3950. Forward references to program points are handled in the same manner as for symbolic labels. Only the value word for a program point is stored, as the symbols are just the digits 1-9 and implicitly index the table entries.

Once all the fields of a card have been processed, the values of the words in the 6000 loop define the content of the output card to be punched. Cardatron output format 1 (at locations 1436-1457) defines the mapping of these words to columns on the output card, which is punched by the CWR instruction at location 7018.

A card containing the END pseudo-operator signals the end of input to the assembly. The operand address on this card specifies the memory location where the loader program will be placed when the output cards are loaded for execution.

The loader occupies 33 words plus one word for each forward-reference chain generated during the assembly. The loader is coded in locations 1761-1790 within EASY. The routine at 1740-1760 copies this code, at six words per card, to the output deck. Following the cards for the loader, this routine emits a table of chain descriptor words, again six words per card. As with the cards generated by the assembly process, a seventh word on each card (in columns 4-14) is a card-read instruction that bootstraps the next card in the deck. On the final card, that card-read instruction is replaced by a branch to the start of the loader.

Thus, when an output deck is loaded under format band 6, the assembled words go to their assigned locations, the loader and chain descriptor words go to locations starting with the one specified on the END card, and the last card of the deck automatically branches to the loader routine. After fixing the forward-reference chains, the loader halts with 1221 in the operand address register, and the program is ready to run. At this point the 205 operator would normally use one of the control panels to enter a branch instruction to the program's entry point.

Each chain descriptor word processed by the loader has its sign digit set to 1. The low-order four digits of this word, if non-zero, are the address of the last word in a forward-reference chain. The next four higher-order digits are the forward address to be applied to each of the words in the chain. As mentioned previously, each word in the chain has the address of the prior word in the chain, except for the last word, which has an address field of zero.

The foregoing is a highly condensed description of how EASY operates, with many significant details missing. I was interested in how this assembler worked, so studied it in detail and prepared an annotated listing of the program. If you are interested in a deeper understanding of EASY, you may find that to be a good place to start.


The MEASY Assembler


At some point during the summer of 1960, Knuth rewrote EASY to take advantage of magnetic tape, calling the new assembler MEASY ("Modified EASY Assembly System for the Burroughs 205"). It is not difficult to understand why he did so. The Algol-58 compiler eventually grew to more than 3600 cards, so each assembly with EASY would punch another 3600 cards, plus whatever was required for the loader. For those who remember punched cards, 3600 is almost two boxes worth.

Running under the retro-205 emulator within Google Chrome, EASY assembles the Algol-58 compiler in 41 minutes and punches 3737 cards. To load the compiler so it can be run, all of those cards need to be read back in again, which including the time to resolve forward addresses, takes another 17-18 minutes. Assembling to magnetic tape is faster and eliminates the waste in card stock on each assembly. Loading the assembled compiler from tape takes only a little over four minutes. When you may only get one time slot on the computer per day, this level of time savings could have a significant effect on productivity.

MEASY is not just a clone of EASY that has been patched to output to tape. It uses the same input card format and symbol table structure, and internally many of the coding techniques are similar, but it appears to be a complete rewrite. Like EASY, it is a one-pass assembler and relies on a loader to resolve forward-address references, but the output format is nothing like EASY's, and the loader works entirely differently than the one for cards. MEASY does not punch cards. Instead, it produces a printed listing on the Cardatron's IBM 407 tabulator.

Magnetic tape on the 205 is formatted in fixed, 20-word blocks. The output tape produced by MEASY has the loader routine in the first three blocks, followed by blocks of assembled code. Within a block of assembled code, each assembled word is represented by a pair of words:
  • The second word is the assembled word to be stored in memory.
  • The first word has two four-digit fields. The low-order four digits will be non-zero if assembly of this word resolved a forward-address reference. The value of this field is the address of the last word in the forward-reference chain, with subsequent words in the chain linked through their address fields, as with EASY. The next higher-order four digits are the address at which the second word should be stored. This is also the address that must be applied to any words in the forward-reference chain. 
After the last block of assembled code, there is one additional block with its first word having a value of 9999999999. This signals the end of the program to the loader.
Running MEASY is a little more complicated than EASY. The assembler can be loaded either from cards (if assembled with EASY) or from tape (if assembled with itself). MEASY requires a scratch tape on magnetic tape unit 4 to receive the assembled output.

Loading from cards works the same as for EASY. As with EASY, I modified the load deck to replace the HLT 1221 at the end of the loader with a CUB (30, Change Unconditional and Block) instruction to the assembler's entry point, 1048. This instruction transfers 20 words from main memory to the 7000 loop, starting with the operand address. It then branches to the mod-20 congruent address in the loop (7008 in this case).

Loading from tape requires three steps. First, the three-block loader must be read from tape into memory at address 1000. This can be done with a magnetic tape read instruction of the form 0 0340 40 1000. Second, the loader must be initiated with a CUB to its entry point address at 1045 (0 0000 30 1045). Once the loader completes, it will HLT 7555. The final step is to initiate the assembler with a CUB 1048. This sequence of instructions could be entered manually from the Control Console keyboard, punched onto a paper tape, or punched onto a single card to be read with format band 6.

Reconstructing MEASY was done in a fashion similar to EASY. I transcribed the entire contents of the scanned listing, both source text and assembled object code. From that transcription, I created a card deck of the source text and assembled that with EASY, comparing the assembled output back to my transcription, applying corrections, and reassembling until the transcription and output matched.

Next, I extracted the MEASY object code from the EASY assembly output using an off-line script to produce a loadable card deck. Once magnetic tape devices were implemented in the emulator, I used that load deck to assemble MEASY with itself, this time checking the assembly output listing against the original scan. Once everything agreed, I had an accurate transcription, and executable versions of MEASY that could be loaded either from cards or tape.

As with EASY, the transcribed MEASY source file, assembly output listing, load deck, output deck from assembly with EASY, and output tape image from assembly with itself are all available from the software repository folder on our GitHub project site.

There are two anomalies in these materials. First, I transcribed MEASY using the same format as I had for EASY. The object code is in a different position on the MEASY listing than it is on the EASY listing. Second, the assembly output listing has been post-processed by an off-line script to apply leading-zero suppression to the numeric fields. Zero suppression was done by the IBM 407, and an equivalent facility has not yet been implemented in the emulator. The raw assembly output is also available in the project repository.


Why This Assembler?


If Burroughs had its STAR 0 assembler, and Shell Research had made its much more sophisticated assembler for the 205 available, why would Knuth have gone to the trouble to write an entirely different one on his own? I don't know the answer, but I can speculate a little.

First, it is likely that Knuth had never even seen a 205 prior to arriving in Pasadena early in the summer of 1960, let alone had any direct experience with one. He says in his oral history cited above that he had "learned about the 205 machine language" while still at Case. It's possible he was not aware of the other two assemblers, or had insufficient information on them to want to risk relying on them for such a short-term project.

Second, Knuth already had some experience with assemblers at Case Institute of Technology, and as cited above, had modified the SOAP III assembler for the IBM 650 to support program-point labels. It's possible he had some ideas about how an assembler could be constructed, and wanted to try those. It's also possible he wrote EASY as a warm-up exercise, to validate his understanding of the 205 and refine coding techniques for it. While the 205 and 650 had superficially similar architectures, programming to deal with the 205's high-speed loops was entirely different than that for addressing and flow control in the 650.

Third, it is clear from his comments in the oral history cited above that he had started working on the Algol-58 compiler before he got to Pasadena, and was coding portions of it at least by the time he was driving out to California. In order to do that, he had to have an assembly notation to code, and much of the notation in his draft coding form for the compiler closely resembles that of EASY.

Regardless of his reasons, his implementation of not one, but two 205 assemblers for the Algol compiler project worked, and no doubt helped him meet the project's ambitious schedule. Not only that, it turned out these relatively simple assemblers were quite fast. As Richard Waychoff recalls in his memoir, comparing assembly of Knuth's Algol-58 compiler with that of the 205 FORTRAN compiler he and Lloyd Turner were developing at the same time:
Our compilers were both punched on cards and were the same size. We had written ours in STAR 0, the only assembler that Burroughs supported on the 205. [...] Our compiler took one hour and 45 minutes to assemble. The first week of don’s project he spent in writing his own assembler. He could assemble his compiler in 45 minutes. We were green with envy. I am sure that don used only half the computer time that Lloyd and I used.
Running in the retro-205 emulator under Google Chrome, EASY can assemble its 574-card self in six minutes and 15 seconds. That is a rate of almost 92 cards/minute, which is pretty impressive, considering the assembler is ultimately limited by the 100 card/minute speed of the IBM 523 punch. The assembly of the Algol-58 compiler with EASY cited above -- producing 3737 cards in 41 minutes -- yields a similar rate.

Knuth comments in his notes for the MEASY assembler that it operates at about 120 cards/minute. Our experience with it in the emulator is nearly the same -- it compiles the Algol-58 compiler in approximately 30 minutes.

For whatever reasons drove Knuth to write his own assemblers, we are fortunate today that he did so, and that the source code for them has survived. The most reasonable alternative for him to use instead was probably the STAR 0 assembler. Nothing of that assembler appears to exist any longer -- not even a reference manual -- so if the Algol-58 compiler had been written in STAR 0, we would have been faced with the prospect of reverse-engineering that assembler from thin air.

Regardless, reconstruction of EASY and MEASY has been an interesting sideline to the overall goal of reconstructing of the Algol-58 compiler.


In Conclusion


As mentioned earlier, completion of the MEASY reconstruction has had to wait on implementation of magnetic tape in the retro-205 emulator. That work is now complete, and the next release of the emulator, scheduled shortly, will have tape support. Transcription of the Algol-58 source code was completed in June, and much progress has been made towards making it run.

The next blog posts will detail the magnetic tape implementation and the status of the compiler's reconstruction.




____________
1 Shell Assembly Program for the Burroughs 205 Electronic Data Processing System, Burroughs Corporation Bulletin 3038, March 1960, Part I, page 2. Originally published as EPR Memorandum Report 37 by the Shell Development Company, Exploration and Production Research Division, Houston, Texas, September 1958.