Welcome to the Datatron 205 and 220 Blog

This blog is a companion to T J Sawyer's Web Page that outlines the history of the Burroughs Datatron 205 and Paul Kimpel's incredible 205 and 220 emulators. Please visit those sites and return here to post any comments.

Saturday, November 7, 2015

Knuth's Algol-58 Compiler

In the mid-1950s, a group of computer scientists from Europe and the United States began to lay the groundwork for a new programming language. Their idea was to define a universal notation for the expression of algorithms -- procedures for solving mathematical problems in a series of finite steps. Hence this new language was to be called "Algol" -- not to be confused with the trinary star of the same name in the constellation Perseus.

The focus of this new language was what, at the time, termed "scientific" (i.e., arithmetic) computing, as opposed to "commercial" computing, which was oriented mostly towards business record-keeping and accounting. The objectives set for for the new language were:
  1. It should be as close as possible to standard mathematical notation.
  2. It should be possible to use it to describe computing processes in publications.
  3. It should be mechanically translatable into machine programs.
Design of the language proceeded to the point that a "Preliminary Report" [1] was published in the Communications of the ACM in 1958. This version of the language thus became known as Algol-58 and served as the basis for a number of early compiler implementations. The final ("revised") specification for the language, known as Algol-60, was not published until early 1963 [2].

This is Paul Kimpel again, with an update on the retro-205 emulator project. One of the primary goals of this project has been reconstruction of the Algol-58 compiler that Donald Knuth wrote for the 205. This post reports on our progress towards that goal.


A Tale of Two Compilers


As previously discussed in this blog [3], Knuth entered into a contract with the Burroughs Corporation in 1960 to write a compiler for the 205, which he did largely during the summer of that year at Burroughs in Pasadena, California. The compiler appears to have been finalized and released to customers in late 1960 or early 1961.

Knuth wrote two versions of an assembler to develop the compiler, EASY and MEASY [3]. He saved a MEASY-generated listing of the compiler, and in 2005 donated that listing, along with numerous other documents, to the Computer History Museum in Mountain View, California. Those documents have been scanned, and are available on the museum's web site. The scanned compiler listing can be downloaded as a PDF [4]; a full index to Knuth's documents can be found in [5].

I stumbled across this listing a few years ago and was fascinated both by it and by a similar listing Knuth donated for the Algol-58 ("BALGOL") compiler for the Burroughs 220. Those discoveries stimulated my interest in older computer systems, which evolved into emulating older computer systems, and eventually to this retro-205 emulator project. My main impetus for building a 205 emulator was the idea of reconstructing Knuth's compiler and getting it to work again.

Some years earlier, I also stumbled across Tom Sawyer's excellent web site devoted to the 205 [6]. As I started to research the 205 in preparation for designing my emulator, I contacted Tom, and we struck up an on-line relationship. As Tom discusses on his web site, he actually used a 205 while at the University of Portland (Oregon, USA). When he left Portland, he took with him a number of manuals and other technical documents for the 205, many of which he has generously shared with me during this project.

Tom was able to take something else of interest from Portland -- a listing of a raw-decimal dump from a tape for the "Burroughs Algebraic Compiler." This was just the object code for the compiler; there was no source. Long before I became interested in the 205, Tom had manually transcribed this listing (all 4600 11-digit words of it) and partially disassembled it by hand, as described in his blog post earlier this year [23]. He wrote is own emulator a few years ago and was able to get the transcribed object code to work with that emulator.

The existence of Tom's tape dump was quite a surprise when he revealed it to me a year ago. Of course, the question then naturally arose whether this was the same compiler as the one represented by Knuth's listing. Neither of us had heard of a second Algol compiler for the 205, but that was certainly possible. Thus, determining the relationship between Tom's tape dump and Knuth's listing became another goal for the project to pursue.

The text of Tom's transcription and partial disassembly (exported from his original Excel spreadsheet) [7] and a loadable tape image [8] in the format used by my emulator are available in the project's source code repository.

Tom also has a copy of the Programmers Manual for the Algol-58 compiler. This document does not appear to be otherwise publicly available. Until it can be properly archived, you can access a scan of it in [25].


Reconstructing the Compiler Source


Before we could assemble the compiler represented by Knuth's listing and compare its code to Tom's tape dump, let alone attempt to run that compiler in the emulator, that listing first had to be converted to source code.

Transcribing the Listing

There are basically two ways to convert a scanned listing into usable source code, either manually transcribe the listing to a text file, or use Optical Character Recognition software to convert the page images to a text file. Neither approach is easy, and neither is without its own set of problems. Both generally require a lot of manual proofing and correction. OCR software has come a long way in the last several years, but Knuth's listing is only 66 pages -- slightly more than 3600 lines. I decided that struggling with OCR was not worth the trouble, so simply transcribed it manually.

Since the listing was the product of a MEASY assembly, it had not only the text of the source code, but to the left of the source on each line, the numeric address and word that had been assembled. As with the recovery of the much smaller source code for the EASY and MEASY assemblers, I decided to transcribe the entire line image -- both source text and assembled code.

The reason for transcribing both the source and assembled code is to support validation of the transcription. Assembling the compiler with MEASY will produce another assembly listing, which can be captured as a text file and compared back to the transcribed file. The source code, of course, should match between the two, but most errors in transcription of the source code should result in differences in the assembled code. Since comparing two text files is something that software can easily and reliably do, I was thus hoping to avoid much of the line-by-line proofing that would otherwise be necessary.

Transcription of the compiler listing took about eight weeks, averaging a page or two per day, and was completed in late June of this year. Further work had to wait on debugging of the MEASY assembler, which in turn had to wait on development of magnetic tape support in the emulator. By the second week in July, I was finally ready to try to assemble the compiler.

To begin, I extracted the source code columns from the transcribed file to a separate text file, which became the input card deck to MEASY. I assembled that deck, captured the printer output as a text file, and compared that file to the original transcription using a side-by-side GUI diffing program. This first attempt revealed dozens of mismatches of exactly the sort you would expect -- missing or misspelled identifiers, improper column alignment, typos in the transcribed object code, confusion between letter "O" and digit "0," etc. After applying corrections to both the card deck and original transcription file, I assembled again, which revealed additional mismatches. It took only three such iterations of correction and reassembly over the course of an afternoon to achieve agreement between the transcription and the assembler output.

The project repository has copies of the corrected transcription from the listing [9], the corrected card deck extracted from that transcription [10], and the listing produced by MEASY from assembling that card deck [11]. This is not the original listing produced by MEASY, but one obtained by processing the MEASY output through a script [12] to perform the same type of leading-zero suppression that the IBM 407 would have. The emulator does not presently support leading-zero suppression for Cardatron output devices, but that is planned for a future release.

The repository also has a copy of the magnetic tape image of object code [13] produced by MEASY from assembly of the corrected card deck [10].

The Mysterious Assembly Mismatches

Most of the mismatches between the transcription and assembler output were trivial to understand and correct. There were half a dozen, however, that took a while to figure out. In all of these cases, the operand address in the assembled word did not match the one in the listing, and it was the assembled word from the original listing that did not make sense. For example, consider the instruction at address 0002 (line 608 in the transcribed file):
2 0 0000 15 7012             10 NOR        7
The "2" on the left is the address of the word being assembled, "0 0000 15 7012" is the assembled word, with 15 being the op code for the Normalize (NOR) instruction and 7012 being the operand address. The "10" to the left of "NOR" is an absolute address, which will be added to any symbolic address determined from other fields on the card. The "7" is a "loop tag," which serves to set the high-order digit of the operand address so that it references one of the high-speed memory-drum loops.

Alas, that line was coming out of MEASY looking like this:
2 0 0000 15 7010             10 NOR        7
Which seems right -- with no other symbolic address following the op code, the operand address should reference location 10 with the 7000-loop tag applied to it. But it turns out that the operand address in the scanned listing is correct after all. It had been transcribed properly, but that line in the scanned listing was missing something. What it must have contained was this:
2 0 0000 15 7012             10 NOR     0&
As described in the earlier blog post on EASY and MEASY [3], columns 37-38 of a source card can represent an optional "program point" address. This can be either a reference to a numeric program-point label forward or backward of the current line (nFnB), or a relative address so many words before or after the current line (n&n-). Note that on the IBM card equipment used with the 205, "&" and "+" are the same character, with the kind of print wheel installed in the 407 tabulator determining which glyph prints.

Therefore, for this line in the listing the operand address should be 10 plus the current location (0002) offset by zero, so 0010+0002+0000=0012, which after the loop tag is applied becomes 7012, thus agreeing with the assembled word on the listing.

What happened to the "0&" from the original source card? The answer turned out to be that the 407 ate it. I had to study a manual on the 407 to understand why:
  • Suppression of leading zeroes for numeric output to card devices was done by means of plugboard wiring in the card device, not by the 205 or the Cardatron. 
  • IBM card equipment divided each column into two parts, "zone" punches (the top three rows on a punched card) and "numeric" punches (the bottom nine rows). 
  • Under this scheme, "0" (the third row on a punched card) is a zone punch, not a numeric punch. Similarly, "&" or "+" (the top row on a card) is also a zone punch.
  • Pins on the 407 plugboard designated columns in which zero suppression was to begin. Printing in one of these and its immediate columns to the right was suppressed until the 407 encountered a column that contained a numeric punch. The intent is to replace leading zeroes by blanks, but note that printing will be suppressed for any leading character that does not have a numeric punch.
  • Since the "0" and the "&", do not have numeric punches, and were the leading characters in their field, their printing was suppressed. Not seeing them on the scan, of course I didn't initially transcribe those characters, resulting in the mismatch.
One of the perils associated with trying to recover software from old listings is that you can only recover what you can see, and the data always goes through some sort of transformation between its source medium and the listing. Occasionally that transformation hides or corrupts important information, as it did in this case. Of course, most listings were never intended as long-term archival software storage -- they're just what someone happened to save at the time -- but they are often the only record of the software that has survived, so we must suffer the archival limitations of listings as best we can.

In the rest of this post, I will refer to the compiler transcribed from Knuth's listing as the "recovered" compiler and the one from Tom Sawyer's tape dump as "Tom's" compiler.

The First Compile

Having successfully assembled the source for the recovered compiler and gotten its assembly listing to agree with the transcribed listing, there wasn't much left to do but try to run the compiler and see if it worked.

Tom had written a small program to exercise the compiler in his emulator, and as a first attempt, I decided to try compiling that:
2X=40$
2Y=50$
2Z=60$
2COMMENT HERE IS SOME COMPLICATED STUFF$
2X1=X + Y + 3.14.X + 7.98/Z + (X-Y).(X+Z)$
2X2=X-Y$
2X3=X.Y$
2X4=X/Z$
2STOP$
2FINISH$
The first column of the card has a "2" to select the format band in the Cardatron. The free-field program text is taken from columns 2-72, with columns 73-80 reserved for sequence numbers or identification. The "$" is a substitute for the semicolon (;), which the Cardatron and IBM equipment did not have. Depending on the print wheels installed in the 407 tabulator, "+" may print as "&", "=" may print as "#", "(" may print as "%", and ")" may print as "¤" (termed the "lozenge").

The compiler uses the asterisk (*) to denote exponentiation.  Multiplication is denoted by a period (.). Where no ambiguity would be introduced, the compiler allows the multiplication symbol to be omitted. Thus, the fifth line could have been written as:
2X1=X + Y + 3.14X + 7.98/Z + (X-Y)(X+Z)$
On the morning of 11 July 2015, I loaded the tape produced by assembling the compiler with MEASY and ran the compiler, as described in the next section. I then loaded the card deck for the program above into the reader, pressed the reader's start button, and to my great surprise saw the following start to come out on the printer:
X=40$

Y=50$

Z=60$

COMMENT HERE IS SOME COMPLICATED STUFF$

X1=X + Y + 3.14.X + 7.98/Z + (X-Y).(X+Z)$
                        0071  0 0000 64 7086
                        0072  0 0004 12 6019
                        0073  0 0000 64 7087
                        0074  0 0004 12 6018
                        0075  0 0000 64 7088
                        0076  0 0004 12 6017
                        0077  0 0000 64 6019
                        0078  0 0000 82 7089
                        0079  0 0000 02 5019
                        0080  0 0000 64 7090
                        0081  0 0000 33 0010
                        0082  0 0000 83 6017
                        0083  0 0000 02 5018
                        0084  0 0000 65 6018
                        0085  0 0001 30 0091
                        0086  0 5240 00 0000
                        0087  0 5250 00 0000
                        0088  0 5260 00 0000
                        0089  0 5131 40 0000
                        0090  0 5179 80 0000

X2=X-Y$

X3=X.Y$

X4=X/Z$

STOP$
                        0091  0 0000 80 6019
                        0092  0 0000 02 5017
                        0093  0 0000 64 6017
                        0094  0 0000 80 6019
                        0095  0 0000 82 5017
                        0096  0 0000 80 5018
                        0097  0 0000 80 5019
                        0098  0 0000 80 6018
                        0099  0 0000 80 6019
                        0100  0 0004 12 6016
                        0101  0 0000 65 6018
                        0102  0 0000 80 6019
                        0103  0 0004 12 6015
                        0104  0 0000 64 6018
                        0105  0 0000 82 6019
                        0106  0 0004 12 6014
                        0107  0 0000 64 6019
                        0108  0 0000 33 0010
                        0109  0 0000 83 6017
                        0110  0 0001 30 0111

FINISH$
                        0111  0 0004 12 6013
                        0112  0 0000 08 0137
                        0113  0 0001 30 0114
The compiler was actually working!

The indented, all-numeric lines are object code generated by the compiler. The first four digits are the memory address, followed by the 11-digit word at that address.

Note that the compiled code is output in 20-word blocks. The compiler is generating the code so that it can be loaded into and executed from the 7000-loop on the memory drum. At the end of each block is a CUB (30) block-and-branch instruction to load the next 20-word block of code to the 7000-loop and branch to it. One of these occurs at address 0085, branching to the next block at 0091 (the words after that are floating-point constants referenced within that block); another occurs at 0110, branching to the next block at 0111. That is sophisticated code generation for the 205, as programs run almost ten times as fast from the high-speed loops on the memory drum as they do from main memory.

After reaching this point, the compiler halted with 5250 in the C-register address field. I didn't yet know it, but the compiler hadn't finished, and was waiting for more input, but I stopped there. The next section describes the detailed steps to run the compiler and what we have deduced it was waiting for at the end of the source deck.


Running the Recovered Compiler


MEASY writes its assembled code to magnetic tape unit 4. It prefixes the assembled code on tape with a three-block loader, which assumes the tape will be loaded later from unit 4. To run the program, you load the loader manually to address 1000 and then branch to location 1045. The loader then loads the assembled code from tape, fixes up any forward references and halts with 7555 in the C-register address field. Once the compiler is loaded, you execute it by manually branching to its entry point at address 3040.

To run the recovered compiler in the retro-205 emulator, perform the following steps. See [14] for information on operating the magnetic tape drive and [15] for information on the Cardatron devices.
  1. On the Supervisory Panel, make sure the LOCK/NORMAL and CONTINUOUS/STEP switches are both in the down position.
  2. Load magnetic tape unit 4 with the tape image of the assembled compiler. You can assemble this yourself from [10] or use the tape image in [13]. Place the unit in REMOTE status.
  3. On the Control Console, set the INPUT switch to the KEYBOARD position.
  4. On the Control Console, click the CLEAR button, then the STEP button. The KEYBOARD annunciator on the Console should light.
  5. On your workstation's keyboard, enter 6 0340 40 1000 (without the spaces). This is a magnetic tape instruction to read the three-block loader from unit 4 into memory starting at address 1000. Check your entry in the D register; if you made a mistake, just start over keying the instruction. When finished, press your keyboard's Enter key. The system should execute the read and then halt.
  6. On the Control Console, click CLEAR again, then the CONT button. The KEYBOARD annunciator on the Console should light.
  7. On your workstation's keyboard, enter 6 0000 20 1045 and press the Enter key. This is a branch instruction to the entry point of the loader. The loader should start to execute and read the compiler from the tape. This process takes a little over four minutes.
  8. Once the load has completed, click CLEAR on the Control Console again, then the CONT button again. The KEYBOARD annunciator on the Console should light.
  9. On your workstation's keyboard, enter 6 0000 30 3040 and press the Enter key. This is a block-and-branch instruction to the start of the compiler. The compiler will run briefly to initialize its tables and load the Cardatron format bands, then stop with 44 in the C-register order field, trying the read the first card of the program to be compiled.
  10. Load the card deck to be compiled into Cardatron input unit 1 and press START on that unit. The compiler should start reading cards, and a listing of the program and generated object code should appear on Cardatron output unit 3. The object code output can be suppressed by placing the SKIP switch on the Control Console in the up position. The compiled output code will be punched on Cardatron output unit 2.
  11. If there are no errors, the compiler should halt after processing the program's FINISH$ statement with 5250 in the C-register address field.
This is as far as this version of the compiler can go. If at this point you would press the CONT button on the Control Console to continue past the halt, the system will almost immediately stop again with 44 in the C-register order field, a card-read instruction. What it appears to be trying to do is read an overlay from cards containing a library of standard routines. Alas, we do not have that library.

What the listing Knuth donated to the Computer History Museum appears to represent is the Algol-58 compiler in a fairly late state of development, but not the finished product. It reads cards, parses the source text, and generates what looks to be quite good object code. Without the library, however, it cannot finish its compilation and produce an executable program. This is disappointing, but there is more to the story, as will be revealed below.

The project repository contains a couple of small sample Algol programs, [16] and [18], with corresponding listings produced by this compiler, [17] and [19]. The program and listing in [16] and [17] are the same as the first-compile example above, except that the compile listing in [17] has been annotated to describe the generated instructions and data words. The object code card deck generated by compiling [16] can be viewed in [26].


Running Tom Sawyer's Compiler


It was not difficult to take Tom's transcription of the Burroughs Algebraic Compiler he obtained while at the University of Portland and convert it into the tape-image format used by my retro-205 emulator, then get it to run in that emulator.

The format of that tape image is quite a bit different from the one produced by the MEASY assembler. Tom's tape has a one-block loader on the front, followed by 200 blocks of code for the main compiler, followed by a couple of multi-block overlay segments. All of the object code has had its forward references resolved, so it can be loaded directly to memory and executed. Thus, it loads a lot faster than the tape image produced by MEASY.

To run Tom's compiler in the retro-205 emulator, perform the following steps:
  1.  On the Supervisory Panel, make sure the LOCK/NORMAL and CONTINUOUS/STEP switches are both in the down position.
  2. Load magnetic tape unit 5 with the loadable tape image of the compiler available from [8]. Place the unit in REMOTE status.
  3. Load magnetic tape unit 1 with a blank tape. Place the unit in REMOTE status.
  4. On the Control Console, set the INPUT switch to the KEYBOARD position.
  5. On the Control Console, click the CLEAR button, then the STEP button. The KEYBOARD annunciator on the Console should light.
  6. On your workstation's keyboard, enter 6 0150 40 0000 (without the spaces) and press the Enter key. This is a magnetic tape instruction to read the one-block loader from unit 5 into memory starting at address 0000. The system should execute the read and then halt.
  7. On the Control Console, click CLEAR again, then the CONT button. The KEYBOARD annunciator on the Console should light.
  8. On your workstation's keyboard, enter 6 0000 30 0000 and press the Enter key. This is a block-and-branch instruction to the entry point of the loader. The loader should start to execute and read the compiler from the tape, which will only take a few seconds. The loader will then perform a checksum routine over the loaded code and branch to the compiler's entry point. If the checksum is correct, the compiler will initialize and then stop with 44 in the C-register order field, waiting to read the first card of the source deck.
  9. Load the card deck to be compiled into Cardatron input unit 1 and press START on that unit. The compiler should start reading cards, and a listing of the program and generated object code should appear on Cardatron output unit 3. As with the recovered compiler, the object code output can be suppressed by placing the SKIP switch on the Control Console in the up position. The compiled object code will be written to magnetic tape unit 1.
After processing the FINISH$ statement in the source deck, the compiler will read another 22 blocks from magnetic tape unit 5. This appears to be the first overlay of library routines, similar to the one that is missing for the recovered compiler. Tom's compiler then generates additional object code, apparently emitting I/O routines from the overlay, and finally halts with 7570 in the C-register address field. This appears to signal successful end of compilation.

At least this is what happens for a very simple program, such as [16], that does not make use of library functions. The listing generated by Tom's compiler for that program can be viewed in [20]. It generates the same object code as the recovered compiler for this program, but then generates substantially more after loading the 22-block overlay.

When compiling a more complex program, however, such as [18], things do not turn out as well. The compiler processes the source deck without difficulty, then loads the 22-block overlay and generates additional code. Then it appears to attempt to load one or more additional overlays from tape unit 1, eventually stopping at block 260 on tape unit 5 and going into an endless loop, continuously printing the same object code instruction word. The listing for this compilation can be viewed in [21].

The interesting thing here is that the listing from the University of Portland that Tom transcribed did not have 260 blocks of data, only 230. The remaining blocks would be whatever was on the tape image after block 230, which in my case was words of all zeroes. Tom has confirmed that he had transcribed the entire listing, and after some research has concluded that his listing does not represent the entire tape. The looping behavior when compiling [18] is almost certainly a result of trying to execute the incomplete second overlay.

In sum, we have two versions of the Algol-58 compiler for the 205 that both appear to work, but due to missing or incomplete libraries, cannot generate fully executable programs.


Comparing the Two Compilers


With both compilers working, albeit unable to generate complete programs, the next task was to determine whether and how they are related. To do so, I decided to compare the object code generated by MEASY for the recovered compiler against the first 200 blocks of Tom Sawyer's transcribed tape dump. This represents just the main segment of the compiler, sans overlays, which the recovered compiler did not have.

The tape images for the two compilers have both different formats and different address sequences. Thus the first step was to convert both to a common format and sequence so they could be compared. The code for Tom's compiler is a straightforward memory image, with the words in memory-address sequence and all operand addresses in their final form. The code produced by MEASY consists of two-word pairs, with the first word containing address and forward-reference linkage, and the second word containing the assembled value. See [3] for more information on the format of the tape produced by MEASY. Addresses are in the same sequence as the source code for the compiler.

Preparation of the code for Tom's compiler was simple, as it was already in memory-address order with all addresses resolved. I simply discarded the one-block loader on the front of the tape image, along with the blocks for the overlays at the end of the image. Then I reformatted the remainder to create a text file with one word per line.

Preparation of the code for the recovered compiler was a little more work. Rather than try to fix up the forward-reference linkages and reorder the code myself, I simply cleared the 205's memory and loaded the MEASY object code tape as if I were going to run the compiler. That way, MEASY's loader resolved the forward references and placed the code in memory-address sequence for me. Then I manually entered two 100-block tape write instructions on the Control Console to write all 4000 words of main memory to a blank tape. I unloaded that tape, capturing its image as a text file. Finally, I reformatted that text file so that it also had one word per line.

These steps resulted in two text files of object code having the same format and address sequence. To both files I inserted four-digit numbers in front of each word that indicate their memory addresses. I then compared the two files using a Windows GUI diffing program (Compare It 4.2, http://www.grigsoft.com). The result of that comparison can be viewed in [22]. Note the following about this comparison:
  • The object code for the recovered compiler generated by MEASY is on the left; the object code from Tom Sawyer's transcription is on the right.
  • Sequences of lines that do not match (having a light green background) are surrounded by up to five matching lines (having a white background) on either end.
  • Sequences of more than 10 contiguous matching lines are not shown. Such longer sequences of matching lines are replaced by a horizontal line.
While there are numerous mismatches between these two files, there are also large runs where words at the same addresses match exactly -- 571 words starting at 0438, 481 words starting at 1010, and the big kahuna, 982 lines starting at line 1492 (almost 25% of the program). In total, 3722 of the object code words at the same addresses had the same value -- 93% of the program.

This degree of correspondence between the two sets of object code makes it clear that they represent the same compiler, but at different levels of development. The recovered compiler is obviously the earlier and less-complete one. Tom's compiler appears to be a production release.

Tom has analyzed the differences and reported to me that most of the changes appear to be related to initialization of the compiler and to input/output routines. In particular, his version:
  • Prints a heading line on the 407 at the beginning of compilation. This heading has the date "June 24, 1961."
  • Supports the use of magnetic tape for the compiled object code. The recovered compiler supports only punched cards and paper tape.
  • Supports loading the subroutine libraries from magnetic tape instead of cards.
  • When loading from magnetic tape, the loader computes a checksum of the main program segment. The compiler's initialization code will generate an error message if the checksum is incorrect.
There are no doubt other enhancements and corrections between the two versions that we have not yet uncovered.


Where Do We Go From Here?


It would be nice to get Tom's compiler to the point that it could generate complete, executable programs. Doing so, however, would be a lot like trying to restore a '57 Chevy that has a gutted passenger compartment -- it sort of runs, but you can't go anywhere in it. Significant parts are not available and will need to be recreated from scratch. Unlike a Chevy, though, we do not have any idea what those missing parts were like, other than that they appear to be library routines. They would need to be constructed from their apparent function and the interfaces to them generated by the compiler. What we would end up with would be not so much a reconstruction as a reinvention of those libraries. It is not clear to me this is worth attempting.

A much more achievable goal would be to reverse-engineer Tom's object-code transcription back into source code. Since 93% of the two compilers are identical, 93% of that job is already completed, and Tom's partial disassembly covers portions of the remainder. This would give us complete source code for the finished compiler (less the libraries, of course), and would make it easier to study the changes that took place between the time Knuth's listing was generated and the version that became Tom's tape image was released.

Another goal could be to disassemble as much of the library overlays as we have. This would allow us to understand better how they work and how the compiler uses the libraries. Tom has already done pieces of this during his analysis.

I have not attempted the same type of detailed analysis of the compiler's operation that I did for the EASY assembler [3], but Knuth had already done something similar. Among the papers he donated to the Computer History Museum is a document titled "The Internals of Algol 205" [24]. This describes the inner workings of the compiler at a fairly detailed level, and links that description to flowcharts of the control logic.

In summary, while Tom and I are disappointed that neither of our versions of the Algol-58 compiler are sufficiently complete to produce executable programs, we are both quite satisfied with the degree to which we have been able to reconstruct them and get them to work. Both versions appear to work quite well, given the limits of the source materials we had to start with.

Thus, I am happy to say that I consider this goal of the retro-205 project to have been met.


References


[1] "Preliminary Report - International Algebraic Language," A.J. Perlis, K.Samelson, Communications of the ACM, volume 1, number 12, December 1958, pp. 8-22:
http://www.softwarepreservation.org/projects/ALGOL/report/Algol58_preliminary_report_CACM.pdf

[2] "Revised Report on the Algorithm Language ALGOL 60," P. Naur, ed., Communications of the ACM, volume 6, number 1, January 1963, pp. 1-17:
http://web.eecs.umich.edu/~bchandra/courses/papers/Naure_Algol60.pdf

[3] "Knuth's EASY Assembler":
http://datatron.blogspot.com/2015/08/knuths-easy-assembler.html

[4] Knuth's Algol-58 compiler listing donated to the Computer History Museum:
http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-2-pdf/k-2-u2435-B205-ALGOL-listing.pdf

[5] Index to Knuth documents available at the Computer History Museum: http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/KnuthDigitalArchive-Index.html

[6] Tom Sawyer's Datatron 205 web site:
http://www.tjsawyer.com/B205Home.htm

[7] Tom Sawyer's transcription of the dump of the "Burroughs Algebraic Compiler" tape:
https://github.com/pkimpel/retro-205/blob/master/software/Burroughs-Algebraic-Compiler/Algol58_Tape_Annotated_TomSawyer.txt

[8] Loadable tape image of Tom Sawyer's compiler tape dump transcription:
https://github.com/pkimpel/retro-205/blob/master/software/Burroughs-Algebraic-Compiler/Algol58_Tape_TomSawyer.tape

[9] Corrected transcription of Knuth's compiler listing [4] donated to the Computer History Museum:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/ALGOL-58.measy

[10] Corrected card deck extracted from the transcription [9] of Knuth's compiler listing:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/ALGOL-58.card

[11] Listing of the Algol-58 compiler from assembly of the corrected card deck [10]:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/Algol-58-Listing.lst

[12] Script to apply leading-zero suppressing to MEASY assembly listings:
https://github.com/pkimpel/retro-205/blob/master/software/Tools/MEASY-ListingZeroSuppress.wsf

[13] Magnetic tape image of object code producted by MEASY from assembly of [10]:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/Algol-58-MEASY.tape

[14] "Magnetic Tape for the retro-205 Emulator":
http://datatron.blogspot.com/2015/08/magnetic-tape-for-retro-205-emulator.html

[15] "Using the retro-205 Caratron":
http://datatron.blogspot.com/2015/03/using-retro-205-cardatron.html

[16] Small sample Algol-58 program prepared by Tom Sawyer:
https://github.com/pkimpel/retro-205/blob/master/software/Burroughs-Algebraic-Compiler/Arithmetic1dollar.card

[17] Compile listing produced by the recovered compiler for [16] with object code annotations:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/First-Algol-58-Annotated.lst

[18] Sample Algol-58 program taken from Appendix C of the Burroughs Algebraic Compiler for the 205 Programmer's Manual:
https://github.com/pkimpel/retro-205/blob/master/software/Burroughs-Algebraic-Compiler/Appendix-C-Example.card

[19] Compile listing produced by the recovered compiler for [18]:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/Appendix-C-Compile.lst

[20] Compile listing produced by Tom Sawyer's compiler for [16]:
https://github.com/pkimpel/retro-205/blob/master/software/Burroughs-Algebraic-Compiler/Arithmetic1dollar-Compile.lst

[21] Compile listing produced by Tom Sawyer's compiler for [18]:
https://github.com/pkimpel/retro-205/blob/master/software/Burroughs-Algebraic-Compiler/Appendix-C-Example.lst

[22] Comparison of object code for the recovered compiler and Tom Sawyer's transcription:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/Sawyer-MEASY-DUMP-Delta.pdf

[23] "Compiling Programs on Paul Kimpel's B205 Emulator":
http://datatron.blogspot.com/2015/02/compiling-algol-58-programs-on-paul.html

[24] "The Internals of Algol 205," Donald Knuth, ca. 1960:
http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-2-pdf/k-2-u2435-B205-ALGOL-internals.pdf

[25] The Burroughs Algebraic Compiler for the 205 Programmers Manual, Bulletin 3041, Burroughs Corporation, Pasadena, California, February 1961:
https://www.dropbox.com/s/khk53yk3ts8hu97/3041_Algol58.pdf?dl=0

[26] Object code card deck generated by compiling [16] with the recovered compiler:
https://github.com/pkimpel/retro-205/blob/master/software/Knuth-ALGOL-58/First-Algol-58-Object.card

 

No comments: