Welcome to the Datatron 205 and 220 Blog

This blog is a companion to T J Sawyer's Web Page that outlines the history of the Burroughs Datatron 205 and Paul Kimpel's incredible 205 and 220 emulators. Please visit those sites and return here to post any comments.

Sunday, August 9, 2020

Recovering the 220 BALGOL Compiler

The retro-220 emulator for the Burroughs 220 computer system has been stable for almost two years. The recovered Burroughs Algebraic Compiler, better known as BALGOL, has been stable for nearly as long. Previous posts for this blog ([1], [2]) have discussed the architecture of the 220 and the retro-220 emulator. Other posts ([3], [4]) have discussed the syntax and features of BALGOL, as well as the preliminary version of the Algol programming language, Algol-58, upon which BALGOL was based.

The story that has not yet been told is how I recovered the BALGOL compiler and made it run again.

Compiler Assets and Components

The BALGOL compiler was first released by Burroughs in March 1960, then received a significant update in February 1962. The compiler was distributed to customers as a magnetic tape, termed the "Generator Tape," along with a few small card decks used for bootstrapping and utility functions.

The 220, first shipped in 1958, was one of the last of the major vacuum tube systems, and quickly became obsolete. Considering that the 220 had a highly unusual magnetic tape subsystem, the possibility that any software in machine-readable form may have survived to the present is remote.

There was another form of release for the compiler, however. For some reason, Burroughs generated complete assembly listings of the 1962 compiler, its standard library routines, and a configuration utility, then made sets of copies in 8.5x11-inch format. They must have made a lot of copies, apparently to be given away. Michael Mahon has told me that there was still a stack of these copies in a closet at the Pasadena plant when Burroughs decommissioned their internal 220 systems around 1968. He says he grabbed one of the copies and still has it.

Another person who had a copy and retained it was Donald Knuth. In 2005, Professor Knuth loaned a large number of documents from his files to the Computer History Museum in Mountain View, California. These documents were scanned by the museum, which then made the resulting PDFs publicly available on its web site [5]. Among those documents is Knuth's copy of the BALGOL compiler listings [6]. That PDF is about 17MB in size and comprises 267 pages.

It was the CHM scan of Professor Knuth's copy of the BALGOL listing that I used to recover the compiler source code and restore the compiler to operation.

The compiler source consists of four main parts:
  1. Compiler main module, 4760 lines, 4663 cards.
    This is the one-pass BALGOL translator, which parses the language text and generates machine instructions for the 220.
  2. Compiler overlay module, 2431 lines, 2388 cards.
    This is the second part of the compiler, which merged into the object program any required machine-language routines from the standard library, any optional user-supplied machine language routines on punched cards, and the code for any debugging aids specified in the BALGOL source (tracing, monitoring, symbolic dump). It produced a loadable program on magnetic tape.
  3. Standard library routines, 1466 lines, 1381 cards.
    The standard BALGOL library of 28 subroutines, including math routines, integer/floating conversion routines, formatted I/O (input/output) routines, and debugging support. The standard routines are described in Appendix G of the compiler reference manual cited below.
  4. Generator program, 2215 lines, 1954 cards.
    This was a "sysgen" program that, using the Generator Tape, created customized tapes from which the compiler could be run. It could prepare the compiler for a given computer configuration, update the standard library with new or modified routines, replace the default punched-card I/O interfaces for the compiler and library, and modify some aspects of the way the compiler operated.
The number of "lines" cited above reflects the number of printed lines on the scanned listings. Due to assembler-generated printing (extra spacing, multiple words generated by pseudo operators, etc.), the listings have more lines than would have been input as source code to the assembler. The number of "cards" reflects the number of lines of assembly source code that were extracted from the transcribed listings.

In addition to these listings, there are Burroughs 220 documents on the bitsavers.org web site [7]. The most important of these is the compiler reference manual from March 1963, 220-21017_B220_BALGOL_Mar63.pdf, prepared for the 1962 version of the compiler represented in Professor Knuth's listing. In addition to these documents, the bitsavers.org site has an older version of the reference manual and an incomplete listing of the compiler and library from 1961.

Thus, when starting the compiler recovery project, we had what appeared to be listings of the complete compiler, library, and generator program (this eventually proved to be true), and good documentation for the compiler, library, and generator program. But here's what we did not have:
  1. The source code in machine-readable form.
  2. Assembly software for the source code. The compiler main, overlay, and library were all written for one assembler. The generator program was written for an entirely different assembler.
  3. Documentation in any form for either assembler. Not even their names were known.
  4. Information on how the Generator Tape was formatted.
It therefore appeared that I would need to transcribe all of the text from the scanned listings (totaling almost 11,000 lines of assembly code), reverse-engineer the two assembler notations and write two different assemblers from scratch, assemble the 31 source modules (compiler, overlay, generator, and 28 library routines) into a form that could be used to construct a Generator Tape, somehow deduce what the format of that tape needed to be, combine the assembled modules to construct an initial tape, and from that use the generator to create a loadable compiler tape. Having done all that, I would then be ready to find some BALGOL source code and start debugging the whole mess. At the time, I found this list of tasks to be somewhat daunting.

Recovering the Source Code

I began transcribing the listings in early November 2016, before I started work on the retro-220 emulator at the end of that December. Initial transcription, proofing, and correction of the transcribed text continued through May 2017. I was usually able to transcribe four pages a day, about 180 lines, or proof about double that amount. I found that trying to do more than that at one sitting simply resulted in introducing more errors during transcription or failing to catch them during proofing, so I took my time and tried to work carefully.

The assembly listing for the generator program is in the traditional form, with memory addresses and words of assembled object code on the left, and the text of each line of assembly source on the right. My approach with listings of this type -- probably a common one -- is to transcribe the listing completely, as is, including the addresses, object code words, spacing, and any additional output by the assembler. The purpose of this is to ease the job of proofing the transcription. After extracting card images for the source code from the transcribed text and assembling it, you can capture the assembly listing as a text file and mechanically compare it to the text file from the transcription. Any differences between the two are easily identified. The assembler symbolic instructions and any comments will of course match, but if the assembled addresses and object code words also match, then there is a good probability that the symbolic instructions were transcribed correctly.

Before attempting to assemble the transcribed generator code, I completed a manual proofing pass to catch obvious errors and as many typos in the comments as I could. Next, I extracted a text file of card images from the transcription containing the symbolic instructions and comments in a form that could be input to the assembler discussed below. I wrote scripts to do this. Then I iterated between assembly runs and correction of the transcribed text until the assembler output matched the transcriptions. This is a fairly reliable proofing process, but it is not guaranteed to catch everything.

Proofing the transcription of the remaining modules, which comprise 80% of the lines, was more problematic. These listings appear to be from a first pass of the assembler. They show the memory addresses assigned to each symbolic instruction, but not the words of generated object code. That meant a mechanical match between the transcribed text and assembler output could only detect address assignment errors, not errors in the symbolic instructions. That in turn meant that for these modules I had to do a much more careful job of manual proofing.

In the end, I had to do two complete proofing passes on the compiler, overlay, and library transcriptions -- the second one after I had started to test the compiler with sample programs and was running into serious problems. I also made at least one extra proofing pass on the main tables in the compiler, which are dense and cryptic. The correctness of those tables are absolutely essential to the compiler's operation.

Creating Assemblers from Scratch

Without documentation on either of the assemblers required to reconstruct the compiler, I had to puzzle out for myself both of the symbolic assembly notations and from that design programs to parse each notation and generate the necessary words of object code and data. These programs were not intended to be recreations of the original assemblers, because those assemblers likely had features that were not used in the transcribed source code, and there was no way to know that those features had existed. Instead, these assembly programs were intended to support just whatever was necessary to successfully assemble the transcribed source code, plus whatever additional features I thought (a) might be useful for assembling 220 programs in general, and (b) would not be too difficult to implement.

The BAC-Assembler

The assembler that Burroughs used for the compiler, overlay, and library routines is fairly straightforward, at least as it was used with those modules. It has a classic columnar assembler format, with fields for an optional label, a mnemonic code for the hardware or pseudo instruction, an optional sign digit to be applied to the assembled word, and an operand field consisting of comma-delimited symbolic expressions for addresses and variant fields. The remainder of the source line, delimited by a space after the operands, is ignored and may be used for comments.

The mnemonic codes for the hardware instructions matched those documented in the 220 Operational Characteristics manual (in [7]), except for a few additional ones. These few, however, appeared to be convenience mnemonics for specialized forms of other instructions. For example, from its context, "BZA <address>" appeared to stand for a "branch if the A register is zero" instruction. There is no such instruction for the 220, but there is a "BFA <address>,<sL field>,<digits>" instruction (branch if the <sL field> of the A register matches <digits>). If <sL field> and <digits> are both zero, this is effectively a branch-on-zero instruction. These convenience mnemonics were easy to accommodate in the assembler.

To understand the operand formats, I wrote scripts to extract the operation codes and number of operand fields from the transcribed text and tabulate the frequencies of operation codes and their numbers of operands. This helped identify the operands that were required for each instruction, as well as cases where operand fields are optional.

The tabulation of mnemonic codes also helped identify the pseudo instructions used by the compiler modules. There were several of these, with most used for typical assembler functions such as defining the value of a symbol or label, setting the next memory address to be assigned, constructing data words from constant values, and signaling the end of the program.

The only challenging pseudo was FBGR, which generates the 29 words of digit codes for a format band used by the 220's Cardatron punched-card interface. See Chapter 6 in the Operational Characteristics manual (in [7]). The coding of a format band is quite complex. FBGR considerably eases that task by producing the necessary digit patterns from a list of parameters that are somewhat like COBOL PICTURE clauses. Fortunately, this assembler's formatting notation was very similar to that used in the Shell Assembler for the Burroughs 205 [11] and its Cardatron interface, so I found it relatively easy to implement this pseudo.

I christened the assembler for this symbolic notation the "BAC-Assembler," from the product code that Burroughs assigned to the BALGOL compiler, BAC-220. It is actually a cross-assembler, as it does not run in the 220 environment. The assembler is written in Javascript and runs from a web page in a standard browser. It can be run directly from the retro-220 project's hosting site, or downloaded from the project repository and run locally. Documentation and instructions for its use are in the retro-220 project wiki [12].

Although the BAC-Assembler was developed specifically to assemble the compiler and library modules, it has enough features that it can be used for general 220 programming. It generates absolute machine code that can be loaded into the retro-220 emulator and executed directly. It supports the following output formats:
  • A standard, self-loading Cardatron format band 6 card deck. Format 6 is a hardwired band used to load numeric words of data.
  • A self-loading paper tape image.
  • A "machine language" (ML) card deck. This format is used by the compiler for machine-language routines to be merged with BALGOL programs at compile time, and by the generator program for updating the standard library. It is described in Appendix F of the compiler reference manual.
  • A "media" card deck. This format is used with the generator program's INPUTMEDIA and OUTPUTMEDIA specifications to replace the default Cardatron I/O routines in the compiler and library. This format is documented in Appendix F of the compiler reference manual. See the BALGOL-INPUTMEDIA-OUTPUTMEDIA folder [14] in the retro-220 project repository for an example.
  • A magnetic tape image.
The last three formats were implemented to support reconstruction of the compiler and generator program. The first two were implemented as a convenience for general 220 programming.

The GEN-Assembler

The assembler that Burroughs used for the generator program appeared to be quite a bit more sophisticated than the one used for the compiler and library. The columnar format it used was similar to but not compatible with that of the BAC-Assembler. This assembler supported more complex symbolic expressions as well as a more sophisticated notation for literal operands.

From an analysis of operation codes and operand formats like the one done for the BAC-Assembler, this assembler used the same hardware operation codes, although there were a few additional convenience mnemonics. It is likely that this assembler had a macro processing facility, as the generator source has many instances of a DO operator that generated two instruction words to execute a subroutine call. The cross-assembler discussed below simply implemented DO as a special pseudo operator.

This assembler also had a richer set of pseudo operators compared to the BAC-Assembler. It had equivalents for all of the ones mentioned above for BAC-Assembler, but in addition, ones to assign assembly addresses starting at one location but store the assembled code at another, specify the location of the literal pool, erase the internal symbol table and start a fresh assembly, construct words from multiple sub-fields, fill a range of words in memory with a constant value, and generate sequences of data words from lists of constant values and expressions. It had a FORMAT pseudo for generating Cardatron format band data similar to FBGR for the BAC-Assembler, but fortunately the notation was the same and I was able to transfer the FBGR implementation from BAC-Assembler with almost no change.

Since it was designed to reconstruct the generator, I christened this cross-assembler the "GEN-Assembler." It is also written in Javascript and runs in a standard web browser. Like the BAC-Assembler, it can be run from the retro-220 project's hosting site or downloaded and run locally. The user interface is identical to BAC-Assembler, and it can output object code in the same formats. It is in many ways a more capable assembler than BAC-Assembler, and is at least as well-suited for general 220 programming. Documentation and instructions for its use are also in the retro-220 project wiki [13].

Pre-Loading Cross-Assembler Literal Pools

Something that proved to be necessary with both cross-assemblers was a capability to pre-load the literal pool. A literal pool is a block of data the assembler constructs to hold values that are defined as literals in the operand fields of instructions. Rather than have the programmer manually declare a constant value, give the address of that constant a symbolic name, and then reference that name in operand expressions, literal pools allow the programmer to simply write the literal directly in an operand expression. The assembler automatically allocates storage for the value and essentially assigns the literal itself as the value's symbolic name. Multiple references to the same literal value throughout a program result in addressing the same value in the pool.

For example, the following instruction occurs at address 3182 in the compiler main module:
CAD   +6034037172
The assembler recognizes +6034037172 as a literal, assigns it address 4113 in the literal pool, assembles the value of the literal in that location, and assembles that CAD (Clear and Add) instruction with the pool address, resulting in an object word of 0 0000 10 4113.

A literal pool is not difficult to implement in an assembler, but what proved to be impractical was allocating the pool words at the same physical addresses as the original assembler did. The ordering of literal address assignments was probably some function of the original assembler's symbol-table structure and symbol lookup mechanism. Although assigning different pool addresses would not affect the correctness of the program, it produced undesirable differences in the mechanical matching of the generated assembler listing back to the transcription of the original documents.

I tried a few fairly obvious address assignment schemes without success before deciding that this was a problem not worth trying to solve the same way the original assembler did. Instead, I simply extracted the pool addresses and values from the transcriptions and designed a JSON format to hold them. You then load a file containing this JSON into the cross-assembler at the start of the assembly process to pre-populate the assembler's literal pool table with values at the specified addresses. This is a feature specifically intended for recovering the compiler and generator, and is not something you would expect to use with either cross-assembler for regular 220 programming.

Pre-loading the literal pool proved to be useful for resolving another problem. In the generator program there was one literal expression involving a symbolic value that was a forward reference to a symbol that in turn was defined after the point where the assembler was instructed to place the pool. See the LDB instruction at card sequence 1903, address 4310, with the literal operand "=BUF+1=".
The location of symbol BUF after the location of the pool, and the fact that the value of the symbol depended upon the size of the pool, meant that the literal value was also dependent on size of the pool. I could never get the assembler to assign locations correctly after the point where the pool was emitted to the object code. Since I was more in the business of recovering a compiler than writing an assembler, I decided not to try to fix what amounted to a GEN-Assembler design deficiency and let pre-loading of the pool resolve that problem for me.

Constructing the Generator Tape

After transcribing and proofing the original listings, writing two cross-assemblers, and getting everything to assemble and match back to the transcriptions, the next task was to take the assembled object code and construct a Generator Tape. The Generator Tape cannot be used to run the compiler; rather the generator program constructs a loadable compiler tape from it.

This task was not the most time-consuming part of the project, but it proved to be one of the most complex and challenging ones. Not having access to any documentation or an actual Generator Tape, the only way to approach the problem was to try to deduce from the generator program the layout of the tape and try to reproduce it.

The 220 Magnetic Tape Subsystem

Before discussing the Generator Tape, it will be useful to discuss the 220 magnetic tape subsystem, because it was quite different from the IBM standard that later became dominant, and the compiler took advantage of that subsystem's unique characteristics.

The tapes themselves were 3/4-inch Mylar on 3500-foot reels. The drive operated at 120 inches/second, recording at 208.3 bits/inch. Each 11-digit 220 word was recorded on the tape as 12 four-bit BCD (binary coded decimal) digits, with a "spacer" digit between words. Each six-bit recording frame on the tape held a BCD digit, an even-parity bit, and a control bit used to detect block boundaries. No longitudinal block-parity checking was used.

The tapes were recorded in two parallel, six-bit "lanes," a feature brought forward from the Burroughs 205 tape subsystem. There were two tape heads, but a drive could read or write only one lane at a time. Several tape instructions provided for switching between lanes, a process that took about 70ms.

Blocks could contain 10 to 100 words. Importantly, the design of the block format allowed drives to overwrite existing blocks reliably, something you could not do with the IBM format. Once a block was written, though, that structure was locked down. To overwrite the tape with a different block structure, the entire tape first had to be erased, something the drive could do off line. There were two forms of tape write instruction: MIW was an "initial" write, and only worked on erased tape. MOW was an "overwrite," only worked on existing blocks, and had to write data of exactly the same length as the existing block.

Single tape instructions could read or write up to 10 blocks at a time. The tape control unit also had two forms of search that operated independently of the processor. A "search" considered the first word of a block to be a numeric keyword and could search the tape bidirectionally, stopping at the first block with a keyword greater than or equal to a value provided by the search instruction. This obviously worked best if the keywords on tape were arranged in monotonically ascending order. A "scan" worked only in the forward direction, but would stop when it found a matching value in one of the first 10 words in a block. Due to the overwrite and search capabilities, 220 applications tended to use magnetic tape more like a semi-random device than a purely sequential one.

Because of the difference between the initial and overwrite modes of writing to tape, it was common practice for applications to be designed to work with pre-formatted tapes having blocks of a constant size. Because larger block sizes used tape more efficiently, it was also common to format tapes with the maximum block size.

This was the case with the BALGOL generator and compiler. Both the Generator Tape and the compiler tapes created from it required magnetic tapes that were pre-formatted with 100-word blocks. The BALGOL generator and compiler used both the block overwrite and the search capabilities.

Constructing the Compiler and Overlay

Appendix A in the compiler reference manual has a good description for creating a loadable compiler tape from the Generator Tape. It mentions using a bootstrap card deck, the "Generator Callout program," but this deck was apparently distributed only on cards and we did not have it. The operating instructions for the compiler in Appendix B mentions a similar deck for the "Compiler Callout program," which we did not have either. Fortunately, though, that appendix shows the code for an equivalent paper-tape compiler bootstrap routine:
6 1000 04 0002       PRB   2      READ BOOTSTRAP INTO ADDR 2
0 2008 50 0000 MRW 002 REWIND TAPE 2, SWITCH TO LANE 0
0 2001 52 0000 MNC 0,2,0 READ 10 BLOCKS FROM TAPE 2 TO ADDR 0
From that five-instruction paper-tape bootstrap it was easy to understand both how the compiler loaded and how to construct a similar bootstrap routine for punched cards:
0 0000 39 0000       SOR          DO NOT HALT ON OVERFLOW
0 2008 50 0000 MRW 002 REWIND TAPE 2, SWITCH TO LANE 0
0 2001 52 0000 MNC 0,2,0 READ 10 BLOCKS FROM TAPE 2 TO ADDR 0
From understanding the way the compiler loaded, I could see that the generator program loaded the same way. Both bootstraps read the first ten blocks (1000 words) from tape to memory starting at address 0000 and then branched to address 0002, which is the start of a routine that loads the rest of the program, validates and strips out the checksum words embedded in the code, and then branches to the beginning of the program proper. Thus, these small bootstrap programs were the first clues as to the layout of both the Generator and loadable compiler tapes.

The routine at address 0002 in both the generator and compiler assembly listings is labeled "LOAD." Immediately after that routine is another one labeled "STORE." Inspection of that routine in both programs revealed that it writes the program from memory to tape, calculating and inserting checksum words in the manner expected by the LOAD routine. So here was the second clue -- if we could arrange to get the assembled object code for these programs into memory, all we need to do is execute a branch to the STORE routine in order to get that code written properly to magnetic tape.

The overlay module, as its name implies, is loaded by the compiler to overlay a portion of the compiler's memory. Thus it does not have a load routine of its own, but does have a corresponding (and unlabeled) store routine located at its end, starting at address 4000.

These embedded store routines were sufficient to get the generator, compiler, and overlay written correctly to the Generator Tape. The compiler and overlay are written to lane 0 on the Generator Tape, while the generator is written to lane 1. I eventually wrote a few small 220 programs to partially automate this process by positioning the tapes and calling the internal LOAD/STORE routines within the modules.

The progress thus far was a good start on constructing a Generator Tape, but there were a number of other details that still needed to be resolved, the most significant of which was getting the standard library onto the tape.

Constructing the Standard Library

The Generator Tape as distributed by Burroughs would store the object code for the standard library in the form required by the generator program. Loadable compiler tapes produced by the generator would store the library in the form required by the overlay module to select individual routines and merge them into a program's executable image.

During the process of creating a loadable compiler tape, the generator program has the ability to modify the standard library, allowing sites to build a compiler tape having a customized library. Sites could alter the existing routines, add new library routines of their own, or even delete routines they did not want.

"Modify" is perhaps too generous a word here, because the process is one of total replacement. The complete set of library routines to be on the compiler tape must be fed to the generator program on cards in the "machine language" (ML) format mentioned earlier. Thus, you need the ML card decks for all of the routines that will make up the library, including the ones you are neither modifying nor adding.

The set of object code cards for each library routine must be prefixed by a card with the name of the routine, the type of its return value if any, and a list of the types of its parameters if any. If a library routine references other library routines, the name card must be followed by an "equivalence" card for each referenced routine. Using the equivalence information, an address-coding convention in the ML format allows a library routine to reference locations at relative offsets within other library routines.

To make the process of updating the library easier, the generator supports an option to punch a card deck in ML format containing the object code for the entire library, complete with name and equivalence cards, in the format required for feeding back into the generator on a subsequent run. A site wanting to modify the library could thus start by punching this deck and then modifying only parts of that deck as needed.

I started with the idea of somehow constructing the library structures for the Generator Tape myself from the assembled output of the transcribed library routines. I tried to determine the structure of the library on tape from the generator program, but the handling of the parameter signatures and equivalence information involved a lot of quite complex code, and I had trouble understanding how those structures should be constructed.

But then I had a thought -- instead of trying to construct the library structures for the Generator Tape myself, would it be possible to get the library update feature of the generator program to do it for me? It appeared from looking at the generator and overlay code that the structure of the library on tape was the same for both the Generator Tape and the loadable compiler tapes. The library comes at the end of the Generator Tape, after the overlay module, so inserting the library should be simply a matter of appending the library blocks to the end of the tape.

I decided to try the following scheme:
  • Construct a preliminary Generator Tape with an empty library.
  • Manually construct the complete card deck for updating the library as required by the generator from the assembled output for each routine, inserting the necessary name and equivalence cards.
  • Use the generator program with the preliminary tape to create a compiler tape.
  • During the creation of the compiler tape, use the card deck to replace the (empty) library coming from the preliminary Generator Tape and thus output the library routines from the card deck to the compiler tape.
  • Finally, extract the tape blocks containing the library from that compiler tape and append them to the preliminary Generator Tape, thus creating a final Generator Tape.
To make a long story short, this approach worked. Of course, I had to assemble all 28 of the transcribed library routines individually to get their object code in ML card-deck format, as well as manually prepare their name and equivalence cards.

Fortunately, the information for the names and equivalences is embedded as constant strings in the source code for each of the routines, bracketed by words with special sign digits. For the ARCSIN function, it looks like this:
CNST  40000990000
CNST 90000000000
The 40000990000 word signals the end of the object code in an ML deck (see Appendix F in the compiler reference manual), while the 90000000000 word signals the end of the name/equivalence information. The complete library deck for the ARCSIN function that would be fed to the generator program thus looks like the following. The digits 2 and 6 in the first column allow the generator to distinguish between name/equivalence cards and ML object cards. Note how the 40000990000 word is now the last one on the last card of the deck and the name/equivalence information has been dropped, as the generator will not tolerate that additional data.
2   ARCTAN=1
2   ROMXX=2
2   ERROR=3
The presence of the bracketing words for the name/equivalence information suggests that Burroughs probably had a Generator Tape-builder program that would process the ML decks for the library routines, extract the name and equivalence information, drop the words following the 40000990000 word, and write the remaining code with the necessary library structures directly to the Generator Tape. Not having such a builder program (and not knowing enough to write one), I had to use a much clumsier process, but it worked, and was only a one-time effort.

When speaking of "cards" and "decks" throughout this post, I am of course not referring to actual punched cards. The retro-220 emulator uses plain text files to simulate card decks, with the individual lines in the files simulating individual card images. A sample "deck" for generating a compiler tape from a preliminary Generator Tape with card images for the full standard library is available as a text file in the retro-220 project repository.

Documenting the Process

As I worked through the process of understanding the layout of the Generator Tape and marshaling all of the parts that needed to go into it, manual generation of that tape started to become quite complex. At first I was keeping all of this in my head, but as I got further along, the complexity reached a point where I had to start writing things down.

Those notes evolved into a 12-page document that describes a 24-step procedure for constructing a Generator Tape, starting from only the transcribed source code and a few utility decks in the retro-220 project's GitHub repository [9]. That document is available in the project wiki [10].

It is no longer necessary for anyone to work through that 24-step procedure, unless the Generator Tape needs to be corrected. To make using the compiler, creating customized compiler tapes, or reconstructing the Generator Tape easier, the following files are available in the project repository [9]:
  • Two loadable compiler tape images -- one configured for a system with 5KW of memory (the minimum the compiler will support) and another for 10KW (the maximum the 220 supported).
  • A pre-built Generator Tape image for the generator, compiler, and library in their current state.
  • Files for the various bootstrap and utility card decks.
  • Pre-assembled files of object code for the compiler, overlay, library routines, and generator.
Summarizing from the 12-page document, the layout of the Generator Tape turned out to be the following (all blocks are 100 words in length):
  • Lane 0:
    1. 50 blocks (#1-50) of the compiler main module. These are written in groups of 10 blocks containing 999 program words followed by a checksum word.
    2. Two blocks (#51-52) reserved for compiler patches, also checksummed. These blocks are not used with the standard compiler.
    3. Three blocks (#53-55) containing a copy of words 3996-4295 from the compiler main module, not checksummed. This is part of the compiler's initial symbol table, before entries for the library routines have been inserted. These blocks are used by the generator to rebuild the symbol table when replacing the library, after which the new symbol table containing information for the library routines is written back over blocks 41-43 of the generated compiler tape. This use of blocks #53-55 was the most obtuse part of the tape format to figure out.
    4. Three blocks (#56-58) containing addresses 0000-0399 of the overlay module, not checksummed. This is the object code for the Symbolic Dump diagnostic routine.
    5. 29 blocks (#59-87) containing addresses 0700-3599 of the overlay module, not checksummed. This is the code loaded by the compiler after the end of the user's BALGOL source program has been compiled. The 32 blocks of the overlay module in blocks #56-87 will be checksummed by the generator as one 3200-word unit before being written to a generated compiler tape.
    6. Three blocks (#88-90) for the Library Table, a directory to the library routines.
    7. One block (#91) with a zero as the first word. This appears to be a low-key stopper block for the 220's MTS (Magnetic Tape Search) instruction, which was used to position the tape efficiently to individual library routines.
    8. n blocks (#92 - n+91) for library routines. The first word of each block is a search keyword for the MTS instruction. Each routine starts on a block boundary and may span multiple consecutive blocks.
    9. One block (#n+92) with all nines in the first word. This serves as an end-of-file sentinel to the generator program and as a high-key stopper block for the MTS instruction.
  • Lane 1:
    • 50 blocks of the generator program, checksummed as for the compiler main module above.
A loadable compiler tape has a similar format to the Generator Tape, but with these differences:
  • Lane 0:
    • The initial symbol table for the generator's use in blocks #53-55 is omitted. Therefore, the overlay module (with its checksum word) starts at block #53 and the Library Table starts at #85.
    • Following the all-nines sentinel block at the end of the library, the generator writes four blocks containing the code for the Object Dump and Program Card Loader routines described in Appendix B of the compiler reference manual. These routines are assembled as part of the generator program.
  • Lane 1:
    • Used as workspace by the compiler's Symbolic Dump diagnostic facility.
The tape images in the repository for the retro-220 emulator are plain ASCII text files with a ".tape" file name extension. You can look at these files with a text editor and view the structure of the Generator and compiler tapes. See the wiki page on the magnetic tape subsystem for details on the tape image format.

Debugging the Recovered Compiler

By the time I had finished the emulator, assembled all of the source code, puzzled out the format of the Generator Tape, gotten the generator program working, and created the first loadable compiler tape, it was December 2017. 
Obtaining sample BALGOL source code with which to test the compiler and debug the recovery effort was by far the simplest of the tasks in this project. Section 11 of the 1963 reference manual has four sample programs, apparently written by faculty at Stanford University, including such luminaries as J. G. Herriot and George Forsythe. The 1961 edition of the manual also includes these, plus a fifth sample program. None of them is very long, so I simply transcribed them from the manuals. There were a few typographical errors in the manuals, but these were easy to discover and correct once the compiler was starting to work. I also wrote a couple of small test programs from scratch.

The generator program proved to be very easy to debug and get working, largely because its assembly listing included the object code, which allowed the mechanical matching process discussed earlier to find the significant transcription errors. The biggest problems with the generator were not with the program, but in getting its specification cards properly prepared and getting the various modules of object code arranged on the preliminary Generator Tape so that the program would work properly.

Debugging the compiler and library was another matter. I started compiling some of the test programs, and by setting console switches to cause the compiler to print the object code it was compiling, could see that it was working -- somewhat.

The compiler would be parsing the BALGOL source text and generating what appeared to be correct code, but then would start reporting errors that were not related to the source code, including some strange ones like "INVALID PSEUDO-OP" that suggested internal compiler errors. In other cases, the compiler would go off into never-never land, eventually either halting without explanation or looping continuously.

This behavior was highly distressing, partly because something was clearly wrong with the compiler, or the emulator, or both, but mostly because I had no good idea how to start diagnosing these problems, let alone fixing them. The compiler is an extremely complex piece of software, written very cleverly and efficiently by people who really knew the 220, and I never got to the point of a comprehensive understanding of how the compiler worked internally. Plus, the problems were likely occurring long before any evidence of them was surfacing, resulting in a very murky trail to try to follow.

At this point, I decided to launch the complete second proofing of the transcriptions for the compiler, overlay, and library routines. That effort took several weeks to complete, and found a handful of errors, especially in the all-critical compiler tables. That solved a few problems, but still left a number of mysterious errors.

As always with such situations, the only thing one can do is to start with the evidence at hand and try to work backward from there to the cause. Since most of the evidence I had consisted of error messages produced by the compiler, it seemed logical to start at the point in the code where the error message was being generated and try to see which execution paths led to that point. In most cases this was straightforward to do, either by inspecting the transcribed code in the neighborhood of the error message, or by following subroutine return linkages and inspecting the code in the neighborhood of the originating calls.

Tracing the error messages revealed the cause of the error about half the time. For the rest, I had to try to reason about what the compiler was trying to do at that point and hypothesize what could have gone wrong. In other words, the other half of the problems were solved by making what amounted to lucky guesses and following up on them. I was surprised at how successful I was at this. Eventually I started to get error-free compilations and be able to run the resulting object programs.

One thing that helped in the debugging is that the 220 has a feature, the S Register, which allows you to set a trap address on the console. Before the processor executes an instruction at that address, it halts. From there you can see the contents of the registers and use the console keypad to inspect words in memory. I had also implemented a feature in the emulator that would generate a complete dump of the 220 memory. That proved to be much more convenient for inspecting variables and tables in the compiler than finger-boning the console one word at a time.

In the end, most of the trouble in the compiler was caused by a few subtle typos in the transcription. The following table shows the typos in the compiler main module uncovered by debugging:
 Address Assembly Text Error
 0621NOPV1 CNST  00000100001
 10 should be 01
 0691      CNST  20176000000 17 should be 71
*M    LDB   MODE *M should be *H
      STA   SYMBL
 SYMBL should be SMBL
      BUN   SMBL,BUNZ SMBL should be ASMBL
 2660PRFSW F4241 1210,01,1
 1 should be 0
The overlay module was much cleaner. The only significant problem was at address 0010: "MRD 4 100,OT,BMOD" should have been "MRD 4 100,OT,3,BMOD". MRD is a magnetic tape read instruction; the missing "3" is the number of blocks to read.

These types of problems just go to show that even with careful transcription and multiple careful proofing passes, it is almost impossible to recover software from listings without introducing errors, some of which will have nasty effects and be difficult to find.

Debugging the compiler revealed one serious error in the emulator. The 220 supports two similar instructions, CFA (Compare Field A) and CFR (Compare Field R). These compare either a whole word or a partial-word field between a word in memory and either the A or R register, setting the result in the processor's LOW/EQUAL/HIGH comparison toggles. That result can be acted upon by the 220's BCH/BCL/BCE/BCU conditional branch instructions.

Compare instructions like these are usually easy to implement -- the words in the register and in memory are virtually subtracted, and the comparison toggles are set based on the virtual difference being positive, negative, or zero. This is complicated in the 220, however, by the fact that the sign is a whole digit. The low-order bit in that four-bit digit indicates the arithmetic sign (1=negative), but the other bits have meaning in certain situations.

If the sign digit is part of the field to be compared, then CFA and CFR take the extra sign-digit bits into account and order the sign values in a strange way:

3 < 2 < 1 < 0 < 7 < 6 < 5 < 4 < 8 < 9

In addition, if the sign digit is 1, 2, or 3, the remaining digits in the word or field are compared using their nines-complement value.

To put it simply, I didn't get the implementation of this right the first time, nor the second, nor the third. It got to the point towards the end that, whenever I encountered strange behavior by the compiler, I started looking for CFA or CFR instructions in the code path leading up to the error and examining how those instructions were behaving in that specific case.

You would not think that weird comparisons of signs other than 0 or 1 would have much impact on something like a compiler, but BALGOL does extensive field packing in its internal tables to minimize their size, thus to allow compiling larger programs. The sign digits in these tables are frequently included as parts of fields or as fields on their own, and comparisons involving those fields are surprisingly frequent. The people writing the compiler knew exactly how CFA and CFR were supposed to work, and they took every advantage of the somewhat unorthodox behavior of those instructions. Once I had the comparison logic implemented properly in the emulator, the last of the strange compiler behavior simply disappeared.

Recovery of the BALGOL compiler was completed in mid-July 2018. The compiler and library seem to be working quite well now, but I will not be the least bit surprised if someone eventually finds additional errors in the transcriptions.

Recovering the BALGOL compiler for the 220 has been one of the most difficult and challenging software projects I ever tried to do. The retro-220 emulator, while hardly trivial, was a straightforward development task by comparison. Regardless of the difficulties involved, though, recovering the BALGOL compiler and getting to see it run again has been immensely gratifying.


Compiling from Paper Tape

After finishing the compiler recovery project in the summer of 2018, I started working with Michael Mahon to enable paper tape input and SPO (console typewriter) output for the compiler. Michael has written a 220 emulator for the Apple II [15], based upon his experience at CalTech with a 220 that did not support the Cardatron punched card interface. As with that 220, his emulator does not support punched cards, thus he was unable to run the standard compiler.
The generator program has the ability to replace the I/O routines for the compiler and library using its aforementioned INPUTMEDIA and OUTPUTMEDIA specifications, which are described in Appendix A of the 1963 compiler reference manual. Thus we set out to try to customize the compiler so that it could be used with his emulator.

Appendix A mentions a document issued by Burroughs, "Supplemental Material for Use with the BAC-220 Generator," that purported to show examples of alternate input and output routines, but of course we did not have that. Appendix A describes how to generate a compiler tape with custom I/O routines, and Appendix F describes the calling conventions and "media deck" format for such routines, but actually figuring out how to write the routines was going to be up to us. We eventually were successful, and the results are available in the BALGOL-INPUTMEDIA-OUTPUTMEDIA folder of the retro-220 project's repository [14].

The interesting thing about this episode with respect to the compiler recovery effort is that it uncovered a few bugs in the generator program. These were not, however, due to errors in the transcription -- they appear to be original bugs present in Professor Knuth's copy of the generator listing:
  1. The generator needs to know the location of several items in the compiler and overlay modules so it can adjust their code during the process of generating a compiler tape. It does this by means of symbolic address definitions, but these definitions need to be kept in sync with the actual locations in the compiler and overlay. One of the symbolic definitions, NUMB, was incorrect. It was defined in the generator as "4800+1335," but based on the location of that symbol in the overlay module, should have been "4800+1334." This error did not affect generation of a compiler tape with the default Cardatron I/O routines, but it did affect the code adjustments the generator had to do when replacing the default routines, resulting in words at the wrong memory addresses getting adjusted. This suggests that in the 1962 listings Burroughs distributed, the version of the generator may not quite have matched the versions of the compiler and overlay modules.
  2. The generator is driven largely by a series of small tables that are selected for use in a given run based upon the specifications present in its input card deck. Each table consists of a series of words that encode operations the generator must perform to adjust a specific feature of the compiler. Each of these tables begins with a word indicating the number of words that follow it in the table. One of the tables, F2, had an incorrect length word, 6 instead of 8. This was another case where the error did not affect generation of a compiler with standard I/O routines. When replacing those routines, however, the truncated table resulted the generator failing to delete support for the compiler's standard output completely, which in turn resulted in processor alarms when the overlay module tried to execute a Cardatron instruction on a system without a Cardatron.
  3. Another table, with the point-label "1" following the label PTVER, was missing two entries that adjusted the way the compiler output its heading line at the start of a compilation. The consequence of this was relatively benign, in that the compiler simply printed the wrong heading.

Each of these errors has been corrected in the source for the generator available on the retro-220 project repository, but the original version of the generator, matching the original assembly listing, remains available in the repository's BALGOL-Generator--ORIGINAL sub-folder.

Finally, an Assembler Manual

During a visit to the Charles Babbage Institute in Minneapolis during the Fall of 2019, I discovered a manual for what appears to be the assembler used for the compiler, overlay, and library modules [8]. That document indicates, not surprisingly, that this was a much more fully-featured assembler than is apparent from the 1962 listings. It has more powerful address expressions, additional pseudo instructions (termed "control ops"), provision for overlays and source code libraries, and a relatively sophisticated macro facility (termed "generators").

From inspecting that document, I had done a pretty good job of reverse-engineering its assembly notation, or at least those parts of it that were revealed in the compiler listings. My source card format is a little different, the pseudo op for comments is REMK instead of REM, and there are additional convenience operator mnemonics, but none of that was apparent from the transcribed listings. It would be interesting at some point to go through the document and bring the BAC-Assembler more in line with this specification, but implementing some of the more advanced features, such as the library support and generators, would probably require a complete rewrite.


[3] "Algol-58 -- The International Algebraic Language," https://datatron.blogspot.com/2018/07/algol-58-international-algebraic.html.

[7] Bitsavers archive of Burroughs 220 documents, http://bitsavers.org/pdf/burroughs/electrodata/220/.

[8] "Burroughs 220 Assembler-Compiler," Bulletin 5024, Burroughs Corporation, Data-Processing Systems Group, April 1960. Burroughs Corporation Records, Product Literature (CBI 90), Charles Babbage Institute, University of Minnesota, Minneapolis. https://archives.lib.umn.edu/repositories/3/resources/186. Series 74, box 5, folder 17.

[9] BALGOL section of retro-220 GitHub repository, https://github.com/pkimpel/retro-220/tree/master/software/BALGOL.

[10] "Building the BALGOL Compiler," retro-220 project wiki, https://github.com/pkimpel/retro-220/wiki/BuildingBALGOL.

[12] "Using the BAC-Assembler," retro-220 project wiki, https://github.com/pkimpel/retro-220/wiki/UsingBACAssembler.

[13] "Using the GEN-Assembler," retro-220 project wiki, https://github.com/pkimpel/retro-220/wiki/UsingBACAssembler.

[14] BALGOL-INPUTMEDIA-OUTPUTMEDIA section of the retro-220 GitHub repository, https://github.com/pkimpel/retro-220/tree/master/software/BALGOL/BALGOL-INPUTMEDIA-OUTPUTMEDIA.

[15] "Burroughs 220 Simulator, v1.1," Michael J. Mahon, http://www.michaeljmahon.com/.

Sunday, May 10, 2020

retro-205 Emulator Version 1.03 Released

I am pleased to announce that version 1.03 of the retro-205 emulator has been released. This version introduces two significant enhancements, one to the Flexowriter output device, the other to have multiple, selectable system configurations.

These changes have been committed to the project's source repository on GitHub and uploaded to the project's hosting site from which you can run the emulator. The project wiki also has corresponding updates.

Flexowriter 203/204 Character Encoding

Tom Sawyer requested this enhancement for a project he has been pursuing, but explaining it requires a little background. The original ElectroData Datatron 203 computer (as well as the CEC models 30-201 and 30-202 that preceded it) had only primitive input/output devices -- a high-speed paper-tape reader, a significantly slower paper-tape punch, a Flexowriter electric typewriter, and a decimal keypad that could be used to enter instructions and data into the system via the Control Console.

The paper tape and keypad devices were numeric-only, operating with four-bit binary-coded decimal (BCD) digits. The Flexowriter, however, was an alphanumeric device. It could print signed numeric data directly from BCD digits supplied via the processor's A register, but could also print alphanumerically in mixed case, with the characters specified by pairs of adjacent digits. The assignment of the two codes closely matched the internal mechanism of the Flexowriter, with some codes serving to control non-printing functions, i.e., shifting between upper and lower-case, carriage-return, horizontal tabulation, backspacing, and shifting the ribbon between black and red colors.

When the Datatron 204 was introduced with support for magnetic tape, the same Flexowriter character encoding was carried forward. When the Datatron 205 was introduced with the Cardatron interface to IBM punched-card equipment, however, a different character-encoding scheme was adopted that mapped more directly to the zone/numeric scheme used with punched cards. It was also the case that the IBM equipment supported only upper-case letters and fewer special characters than the Flexowriter. As a result, the 205 used a modified Flexowriter that had been adapted to the Cardatron character set.

Prior to this release, the retro-205 emulator has used the 205's Cardatron character encoding scheme. The project that Tom has been working on involves a program that had been written for the 204, which meant that when it printed alphanumeric data to the Flexowriter, it printed gibberish. He needed the ability to switch the Flexowriter between the 205 and 203/204 character-encoding schemes.

This proved to be not very difficult to do. The character encoding for the Flexowriter can now be changed in two ways:
  1. On the system configuration page (accessed from the emulator's home page, webUI/D205.html), in the Console Unit Selection section, there is a new check box, "Use 203/204 Flexowriter encoding." When this box is ticked, the Flexowriter will use the original encoding for the Datatron 203/204. When the box is not ticked, the device will use the 205 Cardatron encoding. This setting is preserved across emulator restarts. Note that the configuration page can be accessed only when the emulator is in a "power off" state.
  2. On the Flexowriter's browser window, there is a new button below the logo that indicates the encoding scheme currently in effect. This is initialized to the setting from the configuration page when the emulator is started. Clicking the button will toggle between the 203/204 and 205 Cardatron schemes. This button allows the encoding scheme to be changed while the emulator is running, even in the midst of the Flexowriter printing. Changes made using this button are not preserved across an emulator restart -- the setting will revert to that from the configuration page.
Character codes used with the 203/204 are shown starting in Section 3 page 21 of ElectroData Technical Bulletin 3040, Datatron Programming and Coding Manual. Note that there is an error in this table for code 33 -- for lower case it prints the apostrophe (') as shown; for upper case, however, it prints the double-quote (").

Character codes used with the 205, Cardatron, and modified Flexowriter are shown on the Burroughs 220 Pocket Card -- the 205 and 220 used the same Cardatron codes. The codes are also shown in Figure 2 of United States Patent 3,000,556.

When printing in 203/204 mode, the emulator supports all character glyphs except the 1/2 (one-half) symbol, which prints as the crosshatch (#). It supports the control codes for upper- and lower-case, carriage return, horizontal tab (with stops at every eight positions), backspace (overwriting rather than overprinting any existing characters), and the black/red color shift. It ignores the stop code, 07.

Multiple System Configurations

Since version 0.06, the emulator has supported customization of its input/output configuration -- console devices, magnetic tape drives, and Cardatron punched-card equipment. There was only one configuration, however -- if you wanted a different arrangement, you had to modify the configuration and save it, which overwrote any prior arrangement.

This next enhancement also originated from a request by Tom Sawyer. He wanted the ability to store multiple configurations and a way to select easily among them. In particular, he wanted the emulator to support some number of predefined configurations so that users of the emulator could easily choose one.

The emulator now supports multiple configurations. There is a "Default" configuration, which is created automatically when the emulator is first loaded to an instance of a browser. This is equivalent to the single configuration that existed previously, and is initialized to the same settings. When running version 1.03 or later for the first time, any existing single configuration will be transferred to the Default one. You can modify this configuration in any way you wish, but you cannot delete it.

In addition, there is -- initially -- one predefined configuration named "Basic203." This configuration, as its name implies, gives you the input/output devices you would have had on a Datatron 203 -- paper-tape reader and punch, plus the Flexowriter printer with 203/204 character encoding enabled. This is what Tom needed for his 204-vintage program. You can neither modify nor delete predefined configurations, but you can clone a new configuration from it and modify the clone, as discussed below. We may be implementing additional predefined configurations in future releases of the emulator.

You configure the emulator the same way as in the past, by clicking the Configure System button on the emulator home page. This can be done only when the emulator is in a "power off" state. Clicking the button opens the configuration page in a separate window, which has changed somewhat from previous releases:

retro-205 Configuration Page

The buttons have been moved and two new ones added. To the left of the buttons is a new pull-down list that holds the available configurations and shows which one is presently selected. You can also see the new check box for 203/204 Flexowriter character encoding discussed earlier in this post.

The buttons work as follows:
  • CANCEL -- Closes the configuration page without saving any changes you have made. You can accomplish the same thing by clicking the close-window icon on the page's title bar (i.e., the "X" on Microsoft Windows systems).
  • SAVE -- Saves any changes you have made on the page to the currently-selected configuration, makes that configuration the current one for the emulator, and closes the configuration page. An alert box will pop up confirming this fact.
  • DELETE -- Deletes the currently-selected configuration and makes the "Default" configuration the current one. You cannot delete the "Default" configuration or any of the predefined ones.
  • CLONE -- Creates a copy of the currently-selected configuration under a new name and makes the new copy the currently-selected configuration. The page will prompt you for the new name. The page stays open after this so that you can customize the new configuration. All cloned configurations can be modified and deleted later.
Thus, to switch to a different configuration that already exists, you need only select the configuration's name from the pull-down list and click the SAVE button.

There is no fixed upper limit on the number of configurations you can have. The configuration data is stored by the browser on your workstation using a standard mechanism known as LocalStorage. The limiting factor is how much data the browser will allow for an entry in LocalStorage. For most modern browsers, this is a figure in the low megabytes. Most configurations require only 2000-3000 bytes.

In addition to its setting on the configuration page, the name of currently-selected configuration is shown under the blue buttons on the retro-205 emulator home page and in the lower-right corner of the Supervisory Panel window.

There is a second method that can be used to select the configuration under which the emulator will run. You can include the name of the configuration in the query string of the URL that opens the emulator home page, e.g.,
The query string is the portion of the URL that begins with "?". The "config" key is case-insensitive, but the configuration name must be written exactly as it is stated in the configuration list, including case and any internal spacing. This will make the named configuration (if it exists) the current one, exactly as if you had selected it on the System Configuration window. If that configuration name does not exist, the emulator displays an alert and opens with its currently-selected configuration.

The idea behind this second method is that it's an easy way to send someone a link that will run a specific emulator configuration.

No further additional features are planned for the retro-205 emulator at present, but occasionally bugs crop us, and we do have a few items of software that are in various stages of restoration, so we aren't done with this system yet.

Thursday, May 7, 2020

retro-220 Emulator Version 1.02 Released

I am pleased to announce that version 1.02 of the retro-220 emulator has been released. This is a minor release, aggregating a number of corrections and small enhancements that have accumulated since 1.01 was released in October 2018.

The source code and supporting materials for this version of the emulator are available from the project repository. You can download the source from the repository and run it from your own web server. Alternatively, feel free to run it from our hosting site. Please see the Getting Started wiki page for instructions.

Emulator Corrections and Enhancements

The most significant changes involve corrections to the implementation of floating-point arithmetic instructions:
  • The implementation of the "normalizing limiter digit" in the floating add/subtract instructions has been redesigned to operate correctly. This was an unusual feature in the 220, and was the subject of a patent (US 3022006A) awarded to John Alrich, who designed the floating-point circuitry for both the 220 and the 205. In essence, if the high-order (sL=11) digit in the instruction word was between 1 and 7, the processor monitored the number of normalization shifts that took place after the add/subtract operation. If the number of shifts exceeded the value of the digit, the processor halted at the end of the instruction, and from the contents of the C register, the operator could determine the number of shifts that actually occurred. Normalization shifts (with accompanying adjustments to the exponent field) are necessary after floating-point add/subtract when one or more high-order digits of the mantissa are zero. The intent of this feature was to trap loss of precision due to very small differences between the two operands. The limiter digit worked only up to seven digits because after eight shifts the mantissa would be all zeroes, yielding a floating-zero result. For details on the normalization limiter, see the 220 Operating Characteristics manual, pp. 2-18 to 2-22.
  • The final value of the A register for floating multiply when exponent overflow occurred was incorrect and has been fixed.
  • The final value of the A register for floating divide when the D register (memory operand) is not normalized was incorrect and has been fixed.
  • The way that registers were being set up for floating divide and post-divide normalization was being done was incorrect and has been fixed.
Text files loaded into magnetic tape drives as tape images are now validated during the load process. If they do not have the correct format, or contain invalid data, or exceed the maximum capacity of a 3500-foot reel of tape (approximately 729,000 words), the tape image file is rejected.

A couple of hidden user interface features used for internal emulator testing have been implemented. These are described in the repository commit note (1893b8a2) associated with the release.

Tuesday, February 11, 2020

retro-205 Emulator Version 1.02 Released

The retro-205 emulator for the ElectroData/Burroughs Datatron 205 has been stable and operating nicely now for almost three years. I occasionally find small problems to fix, or come across a reference that sheds new light on the operation of the system that prompts a programming change. Enough of these have accumulated that I recently pushed out a new release, version 1.02.

This post describes the changes in 1.02. These changes have been committed to the project's source repository on GitHub and uploaded to the project's hosting site from which you can run the emulator. The project wiki also has a few minor updates.

Application Cache Feature Removed

Support for the browser Application Cache ("appcache") feature has been removed from the emulator. This feature allowed the browser to load the entire emulator into local storage and run it, once the code was loaded, without requiring access to the host web site, or even to a network connection. Alas, this feature was not well-accepted by web developers and was deprecated as a standard a few years ago. Some browsers may no longer support it.

The impact of this on most users should range from minor to none. The browser will still require network access to the host web site, either to load the code or verify that the copies in its cache are still current. Once loaded into the browser and running, the emulator requires no further access to the host site, as before. If you want to run the emulator from a location that cannot access your host site, a possible option is to set up a web server on your local workstation. Only a very simple web server is needed. See "Setting Up a Web Server" in the Getting Started wiki for more information.

Floating Point Issues Corrected

I came across a set of floating-point diagnostic test routines at the Charles Babbage Institute (CBI) during a visit in October 2019. These looked interesting, so I transcribed them and got them to run in the retro-205 emulator. That exercise uncovered one bug in the emulator's implementation of floating-point arithmetic, and one aspect of the implementation that was incorrect, but relatively benign.
  • Floating Divide (FDIV, 83) should set the Overflow Toggle when dividing by zero. It was not doing this, however, when the mantissas of both the divisor and dividend were zero. This bug was due to testing the mantissas of the two operands for zero in the wrong order.
  • The emulator included code to pre-normalize the operands, i.e., shift the mantissa left and decrement the exponent until the high-order digit was non-zero. This was code carried over from the retro-b5500 emulator, which I used as a base for the implementation of floating point in retro-205. In fact, I discovered, the 205 always assumed the operands were normalized and did not adjust them before commencing an operation. All pre-normalization code has therefore been removed from the emulator.
The floating-point diagnostic routines are interesting, and I plan a future blog post to discuss them in more detail.

Breakpoint Switch on Control Console Corrected

In the process of getting the floating-point diagnostic routines to work, I discovered that the Breakpoint switch on the Control Console was not working correctly. Setting the switch to the "4" position did not cause the processor to halt after it executed an instruction having the 4-bit set in the fourth digit of the instruction word. Having the switch in the "2" position worked correctly. Have it in the "1" position also worked, but only accidentally -- a case of two errors canceling each other out. This bug was introduced version 0.04e (July 2015) as part of the change that caused the switch to reverse when it was clicked at either end of its travel, and is now fixed.

Rogue Paper-Tape Reader Brought to Heel

During the floating-point routine testing, I also discovered that the paper-tape reader stayed "hot" after clearing the system. If the device was hanging on a read command, waiting for a paper-tape image file to be loaded, then even though the processor had been halted and the system cleared, the reader would immediately started reading when a paper-tape file was loaded into it. It now detects the system clear operation and does not start reading the tape until specifically commanded to do so.

External Control Instruction Corrected

The External Control instruction (EXC, 71) was implemented in the 205 as a potential means for the system to control external equipment. There were eight external switches in the Model 421 External Switching Unit that could be set from the high-order eight digits (not including sign) of the memory word fetched by the instruction's operand address. See the DATATRON Command List (Bulletin 3030C, which follows the DATATRON 204 Central Computer Handbook at the link) for details.

The change in this release is a detail that Bulletin 3030C did not discuss, viz, if the low-order bit of the operand word's sign bit is set, the computer will halt in the same manner as for the Breakpoint switch, i.e., at the end of the instruction. This detail was discovered in a copy of Bulletin 3031, External Switching Unit Model 421 and Output Selector Unit Model 420 (Preliminary Edition), ElectroData Division of Burroughs Corporation, 1956.

Note that although the emulator implements the EXC instruction, it presently doesn't do anything except set flags corresponding to the switches internally in the Processor. In the future it could be possible to use these flags to control some new mechanism.

Cardatron Reader Format Lockout Corrected

The Cardatron interface to IBM punched-card equipment had a feature when reading cards termed "format lockout" that would prohibit the next card from being read automatically. It was triggered by an 8-punch on the card in the column used to determine the format band that would convert the card punches to 205 memory words. Since the emulator uses ordinary ASCII text files as card decks, the concept of an "8 punch" does not translate well, and instead the emulator uses specific ASCII characters to encode the 8-punch along with another punch in rows 1-7 that indicates the format band to be selected.

Two of those ASCII characters were not represented properly in the code, the grave accent (`) for the 1-8 punch and the apostrophe (') for the 5-8 punch. This problem has been corrected. See the Cardatron wiki for the full list of format lockout codes.

Sunday, October 28, 2018

retro-220 Emulator Version 1.01 Released

I am pleased to announce that version 1.01 of the retro-220 emulator has been released. This is a minor release, consisting of some tuning adjustments and a couple of small features as described below. There has also been some enhancement to the BALGOL compiler, also described below.

The source code and supporting materials for this version of the emulator are available from the project repository. The wiki pages have been updated with the corrections and new features for this release. You can download the source from the repository and run it from your own web server. Alternatively, feel free to run it from our hosting site. Please see the Getting Started wiki page for instructions.

Emulator Enhancements

The internal parameters the emulator uses to compute average lamp intensities on the Control Console have been adjusted in an attempt to produce a more realistic dynamic display while the processor is running. The most noticeable change is that the glow persistence of the neon bulbs has been reduced significantly.

The handling of form-feed characters by the Console printers (SPO and teletype units) has been changed. Previously, a form-feed was simulated by the printer emitting a number of line-feeds based upon the value of an internal line counter and a page size of 66 lines. Now, the printer inserts an ASCII form-feed character into the text. The form-feed appears as a normal new-line in the printer's window on your screen, but if you copy the text of the "paper" portion of the window (either directly or by double-clicking the text to move it to a new, temporary window), the form-feed character will accompany the text when you copy/paste the text into another application or save it to a file.

The magnetic tape load dialog has a new convenience feature. There is an additional checkbox, labeled Set Transport Power ON, which is ticked by default. If the box is still ticked when the OK button is clicked, Transport Power for the drive will automatically switch from Standby to ON. This saves you the trouble of having the click the ON button yourself (which I would frequently forget to do) and will place the drive in a ready state.

Magnetic Tape Load Dialog

BALGOL Enhancements

BALGOL now supports paper-tape input and teletype printer output, for both the compiler and the run-time library. When a compiler generated with paper-tape support is used, the emulator does not need to be configured with any Cardatron devices.

When using paper tape, BALGOL source programs must be prepared with fixed-length, 14-word records. Input data must be prepared with fixed-length, 16-word records. The text of these records should be encoded as single-frame characters (alphanumeric words with a sign digit of 2). Numerically-encoded words are accepted, but the program source will print numerically instead of alphanumerically. See Appendix B in the BALGOL manual for details.

Implementing paper-tape support required developing custom INPUTMEDIA and OUTPUTMEDIA routines for the compiler, as well as custom REED and RITE routines for the run-time library. A card deck for the Generator program was constructed with the necessary directive statements (see Appendix A in the BALGOL manual), the object code of the custom routines, and the object code for the remainder of the library. A few corrections to the Generator program were also necessary, along with preparation of a new Generator tape.

The files for paper-tape support are in the software/BALGOL/BALGOL-INPUTMEDIA-OUTPUTMEDIA/PaperTape-Media/ folder in the project repository. Included in that folder are a magnetic tape image for the paper-tape version of the compiler (configured for 5000 words of memory), a short paper-tape callout program to bootstrap the compiler from that tape, and the Generator deck used to produce the compiler tape. The corrected tape image for the Generator program is also available from the repository.

In addition, a few new BALGOL example programs are available in the software/BALGOL/BALGOL-Examples/ folder in the repository.

Wednesday, August 8, 2018

BALGOL – The Burroughs Algebraic Compiler

The Burroughs Algebraic Compiler for the 220 computer system was an early dialect of the Algol-58 language. Within Burroughs, it was styled as BAC-220, but was more popularly known as Burroughs Algol, or BALGOL.

In the prior post [1], I briefly summarized the movement in the 1950s toward the development of higher-level programming languages, the events leading up to the development of Algol, and an overview of the preliminary version of Algol, now known as Algol-58.

This post describes the origin of the Burroughs compiler, compares the BALGOL language features to those of the Algol-58 Preliminary Report [2], and briefly discusses compiler performance and influence.

Origin of BALGOL

In 1954, Robert S. ("Bob") Barton took a job at Shell Development Company Research in Houston, Texas, working on programming applications. By 1957, he had a talented group of young programmers working with him, including Joel Erdwinn, Clark Oliphint, C. M. Pearcy, and (as a summer employee) David Dahm.

This young gang of four wrote quite an advanced assembler for the ElectroData/Burroughs 205. The assembler was tape-based, but included a macro facility (including maintenance of a macro library on tape), special support for addressing the 205's high-speed memory loops, warnings for invalid adjacent instruction parings, and most usefully, a pseudo-instruction that would take COBOL PICTURE-like strings and generate the obtuse 319-digit patterns that are needed for Cardatron format bands.This became known as the Shell Assembler [15].

In 1959, Barton left Shell to work for the ElectroData division of Burroughs in Pasadena, California. Erdwinn, Oliphint, and Dahm (still as a summer employee) followed him to Pasadena. They became known as the "arthropods," because, well, they came from Shell. Not long after, Barton became the head of the so-called Automatic Programming Group in Pasadena, which was developing compilers and other programming aids.

By this time, the Burroughs 220 had been available to customers for about a year. While the 220 was a decimal, vacuum-tube machine, it had good (if somewhat slow) facilities for scientific and numerical computation. According to a memorandum Tom Sawyer has uncovered in the Burroughs archives at CBI, dated 12 November 1958, from R. A. Mallet of the Scientific Compiler Project, Burroughs had been working on a FORTRAN compiler with the goal that "the Burroughs 220 would be able to use any program written, in FORTRAN, for the IBM 650, 704, 709, and 7070..." The memo goes on to request a change in the way that arithmetic overflow was originally handled by the 220.

When Barton became head of Automatic Programming, he apparently assumed responsibility for this project, and according him in the 1985 B5000 Conference oral history [3, page 50], "It's correct to a certain extent in that the job that I had taken, under generally misleading conditions, called for doing an impossible FORTRAN which would also include conversion of assembly language from the 7090, or whatever the machine was at the time, automatically. I knew it couldn't be done, but that was my responsibility."

Somewhere in this time frame, the focus for the "scientific compiler" shifted from FORTRAN to Algol-58, the Preliminary Report [2] for which had been published in December 1958. It is not clear whether this happened at Barton's instigation or the young team took this direction on their own, but Dahm tells this amusing – and telling – anecdote from the B5000 Conference [3, page 27]:
I remember when I was a summer employee in the summer of 1959, I was working on the [BALGOL] compiler with another fellow named Joel Erdwin [sic], who was the project leader. And we were busily trying to do our version of IAL, and one day Bob Barton came along and he had a FORTRAN manual in his hand. It was a really nice, well-done FORTRAN manual done by IBM. He said, "Why don't you guys do this language because then you wouldn't have to write a manual?" We rejected that as being not a very good reason for picking a language. [Laughter] So, basically, I would say that the decision, that the compiler we would do would be ALGOL as opposed to FORTRAN was made by a summer employee and a project leader. I don't know that anyone else was really involved in making that decision.
One further clue as to the shift in focus is Barton's statement immediately following his complaint over the "impossible FORTRAN" [3, page 50]:
Erdwin [sic] would never have done a FORTRAN. I mean, he'd been going through this kind of educational experience at Shell, and he was not the sort of guy that would waste his time doing FORTRAN. He knew too much about language.
So it appears the decision to do an Algol-58 compiler was Erdwinn's, but probably with Barton's active approval and support. Barton gives Erdwinn full credit for the design of the BALGOL compiler, saying it was "a brilliant job of programming" and "Erdwin's [sic] masterpiece" [3, page 49].

Over the short term, it would probably have been better for Burroughs to have gone forward with a FORTRAN compiler, albeit one with more realistic goals. Over the longer term, though, the choice to go with Algol-58, plus the experience gained from building BALGOL and having customers use it, helped set Burroughs on the path to the B5000/5500, B6500/6700/7700, and later related systems that continue to be produced and sold today as Unisys ClearPath MCP systems.

According to the BALGOL reference manual [4], the compiler was completed and first installed for customer use in March 1960. We have two scans of listings of the compiler, one dated 1 May 1961 [5], citing in the heading [Joel] Erdwinn, [Jack] Merner, [Fran] Crowder, [Joe] Speroni, and [Donald] Knuth as authors. The second listing, dated 1 February 1962 [6], adds the names [David] Dahm, [Clark] Oliphint, Logemann, and [Toni] Schuman.

Erdwinn left Burroughs shortly after the BALGOL compiler was finished to join fast-growing Computer Sciences Corporation (CSC, now DXC Technologies). He spent the bulk of the next two decades building compilers and system software for CSC's customers, particularly compilers for none other than... FORTRAN.

Merner, Oliphint, and Dahm continued with Burroughs and, with Barton, had major roles in the development of the B5000/5500. Dahm is also known for his work on the B5500 Timesharing MCP and CANDE, and on the DMSII data base management system for the B6700/7700.

BALGOL vs. Algol-58

BALGOL for the Burroughs 220 is one of several early implementations of the Algol-58 language described in the 1958 Preliminary Report [2]. Other well-known implementations are System Development Corporation's JOVIAL, the U.S. Navy's NELIAC, and the University of Michigan's MAD. The Preliminary Report was a progress report on the current state of a committee's attempt to develop a universal algorithmic language, not a fully-vetted specification. As such, and given the limited capabilities of computer systems at the time, all implementations based on the Report deviated from it, often significantly.

Of the Algol-58 variants, BALGOL is arguably the one that most closely followed the syntax and semantics put forth in the Preliminary Report. The BALGOL compiler is a single-pass design, intended for fast compilation at the expense of highly-optimized code, and oriented to a compile-and-go job shop environment.

The single-pass design imposes some restrictions on the dialect of Algol-58 BALGOL can support, and on the ordering of some language elements in the program text, as is discussed in the following sections. It drops some features of Algol-58, and implements others in slightly different ways, but adds little except for pragmatic facilities such as FORTRAN-like input/output, an open subroutine, program segmentation and overlay facilities, some diagnostic facilities, and the ability to incorporate machine-language routines into the compiled object code.

BALGOL's reference manual even has a detailed set of transliteration rules [4, Appendix E], as called for in the Preliminary Report, describing the differences between the Report and BALGOL's implementation. Going the Preliminary Report one better, the manual has a complete syntax definition for BALGOL in Backus-Naur Form [4, Appendix D].

The remainder of this section discusses the differences between BALGOL and Algol-58 in detail. It follows the overview of Algol-58 features in the prior post [1].


Where the Algol-58 reference language uses a 63-character set (not counting upper- and lower-case equivalents), Burroughs chose to stick with the 48-character set available with IBM punched-card equipment, which most sites used through the 220's Cardatron subsystem.

Rather than attempt some sort of delimiter scheme to support the word-symbols of Algol-58 (begin, end, if, for, etc.), in BALGOL these symbols are considered to be reserved words. Spaces are significant and must be used as delimiters between reserved words and identifiers.
  • The 26 English letters: A-Z [upper-case only]
  • The decimal digits: 0-9
  • Arithmetic operators: +     .   /   *
  • Relational operators: LSS   LEQ   EQL   GEQ   GTR   NEQ
  • Logical operators: NOT   OR   AND   EQIV [equivalence]   IMPL [implication]
  • Sequential operators: GO,   GO TO,   RETURN,   STOP,   FOR,   IF,   OR,   EITHER IF,   OR IF,   SWITCH,   UNTIL,   OTHERWISE,   OVERLAY
  • Separators: ,   ..   $   =   **   BEGIN   END
  • Brackets: (   )
Compare this to the symbology described for Algol-58 in the prior post [1].

The IBM Type 407 tabulator used as a printer with the Cardatron had interchangeable type wheels. Two common wheel sets were "A" (commercial) and "F" (FORTRAN). Four character glyphs differed between the two, with the commercial set shown on the left below and FORTRAN on the right:
#   =
&   +
%   (
¤   )
The internal character codes for these pairs were the same; they simply printed differently on different 407s. In addition, the 407 does not support the semicolon (;), so BALGOL uses the dollar sign ($) in its place.

Aside from these substitutions, BALGOL adopted the following changes in symbology from the Algol-58 Preliminary Report:
  • The period (.) replaces the small cross (×) for multiplication.
  • The asterisk (*) replaces the up/down arrow brackets () for exponentiation.
  • Mnemonic equivalents replace the relational and logical operators, as shown above.
  • Two adjacent periods (..) replace the colon (:).
  • The equal sign (=) replaces the assignment operator (:=). EQL is used as the relational operator for equality.
  • Two adjacent asterisks (**) replace the subscripted (10) to delimit the scale factor in numeric literals for scientific notation.
  • Square brackets, [ ], are not supported by the 407, so parentheses are used to enclose array subscripts.
There are a number of additional declarators and sequential operators (statements), which will be discussed separately below. The complete list of reserved words for BALGOL is in Appendix C of the reference manual [4]. The identifiers for the routines in the compiler library are also considered to be reserved. The standard set of library routines is listed in Appendix G of the reference manual, although the library (and hence the set of reserved identifiers) could be revised during the compiler generation process.

Data Types and Declarations

BALGOL supports only INTEGER, REAL (FLOATING), and BOOLEAN data types. Unlike some other dialects of Algol-58 (e.g., JOVIAL), there is no provision for records, strings, or other composite types.


Literals are specified the same way as in Algol-58, except that numbers have the general form I.F**±E, where I is a whole integer, .F is a decimal fraction, and **±E is a scaling factor as a power of ten. The fraction and scale factor are optional, as is the sign preceding the exponent of the scaling factor. Unlike Algol-58, a number cannot be expressed as simply a scaling factor without at least an integer prefix to it. In addition, a literal cannot begin or end with a period for the decimal point (.), as this could be confused with the use of the period as a multiply operator. Boolean literals continue to be represented by 0 and 1 for false and true, respectively.

Identifiers and Types

As with Algol-58, BALGOL assumes all identifiers refer to REAL variables by default. A type declaration can specify an identifier to be of type INTEGER or BOOLEAN. Array identifiers within type declaration lists need not be followed by empty subscript lists indicating dimensionality – that is determined by the ARRAY declaration. The parentheses surrounding the entire list of identifiers in a type declaration are also optional in BALGOL.

Identifiers must start with a letter, which may be followed by letters and digits, up to a total length of 50 characters. An identifier may not contain embedded spaces or special characters.

BALGOL also provides a generic type specifier termed a prefix. A type declaration can include an identifier followed by an ellipsis (three adjacent periods, ...). Any identifier not otherwise covered by a type declaration that begins with the same set of characters automatically receives the specified type. Multiple prefixes can be specified with overlapping subsets of the same leading characters. Unless an identifier is fully stated in a type declaration, the longest prefix matching an identifier determines its type. An OTHERWISE clause may specify a globally-default type.
INTEGER I..., J..., K..., L..., M..., N...$
With these declarations, the identifier MATRIX would be of type REAL, not INTEGER. Because BALGOL is a one-pass compiler, type declarations must precede first use of an identifier in an expression or array declaration.

Array Declarations

Array declarations differ from those in Algol-58. The most significant difference is that custom bounds are not supported. Each dimension of an array is declared simply as its length. All indexing is one-relative. Only literal lengths may be specified.

As an extension, BALGOL permits an array declaration to specify initial values for the array. If the dimension list is followed by an equal sign (=) and a list of literal values in parentheses, the compiler will fill the array with those values at the time the program is loaded. For arrays of multiple dimensions, a linear list of values is specified. These are assigned to the array elements in row-major order, starting with the last dimension. If the list of values is shorter that the total number of elements, the remaining elements are initialized to zero.
ARRAY A(9), B(4,12), C(10, 10, 10)$
ARRAY MONTHDAYS(12) = (31,28,31,30,31,30,31,31,30,31,30,31)$

Function Declarations

BALGOL supports in-line function declarations like Algol-58, but with a slightly different syntax. The function identifier must be preceded by the reserved word FUNCTION, and the parameter list delimited from the function body by an equal sign (=). Use of an in-line function is otherwise the same.

Diagnostic Declarations 

BALGOL supports three declarations for its run-time diagnostic facilities:
  • MONITOR specifies a list of variable identifiers and labels. At run time, whenever one of the named variables is modified, or any variable within the scope of a named label is modified, the first five characters of that variable's identifier are printed along with the variable's new value.
  • DUMP specifies a list of variables or labels. When a dump is triggered, the abbreviated names and the values of the variables are printed. Labels are printed with the count of the number of times they have been passed. A dump is triggered by a TRACE declaration.
  • TRACE specifies a list of label identifiers, each optionally followed by a literal integer in parentheses. If an integer (n) is specified, then every n-th time that label is passed in the program, all DUMP declarations in the program (or if execution is within a procedure, all DUMP declarations within that procedure) are triggered, and the values in their variable lists are dumped. If a label in a TRACE list is not followed by an integer, then the corresponding DUMP declarations are triggered each time the label is passed, as if (1) had been specified.

Segmentation and Overlay

BALGOL supports multi-level segmentation of the object code for a program. A segment is simply a compound statement preceded by the reserved word SEGMENT and an identifier that labels the segment. Segments may be nested. The code outside any segment declaration is considered to be a global segment, and is permanently resident while the program is running.

A segment is called into memory using the OVERLAY statement, naming the segment label. Note that this statement does not transfer control to the segment, it merely loads it into memory. A separate GO TO statement must be used to enter the segment, and the segment label cannot be used for this purpose. Typically, the first statement inside the segment's compound statement is given a label, and a manual transfer is made to that label. Similarly, exit from the segment is not automatic, and an explicit transfer of control must be coded to return to the code for some containing segment.

Section 9 of the reference manual [4] gives the rules for overlaying segments and transferring into and out of them.

FINISH Declaration

Every BALGOL program must have a FINISH declaration at its end. This signals to the compiler the end of the BALGOL text and the start of the process to link in any library routines or external machine-language programs. The "$" following FINISH is required.

While the reference manual classifies this as a declaration, it is actually an executable statement, as it causes the compiler to emit a halt instruction with 9669 00 9669 ("XX") displayed in the C register. Pressing START on the Control Console at this point will execute a card or paper-tape read (depending on how the compiler was generated), with the intent of booting the next job in sequence.

Other Declarations

Comment declarations are the same as in Algol-58. After the reserved word COMMENT all text is ignored until the next "$".

The switch declaration of Algol-58 is not supported by BALGOL. Instead there is a more primitive SWITCH statement, as described below.

There are additional declarations to support formatted input and output of data. These are discussed in the section on input-output below.


BALGOL follows the scheme of Algol-58 expressions closely, but more completely defines the resulting type of an expression, and adds a few twists.

Arithmetic Expressions

Arithmetic expressions are the same as in Algol-58, with the following exceptions:
  • Array identifiers are indexed using a list of subscripts in parentheses rather than square brackets.
  • Different characters are used to specify multiplication and exponentiation, as described above under "Symbology."
  • Expressions are evaluated left to right, except that default precedence is defined to be exponentiation first, then multiplication, then division, then addition/subtraction. Parentheses, as usual, can override the default precedence.
  • In certain cases, the multiply operator (.) may be omitted, as in standard mathematical notation. The specific rules are stated in Section 3 of the reference manual [4], but basically the operator may be omitted:
    • between a closing parenthesis on the left and an opening parenthesis on the right
    • between a simple variable or literal and a parenthesis on either side
    • between a closing parenthesis on the left and a subscripted or function variable on the right
    • between a literal on the left and a simple, subscripted, or function variable on the right.
     See the examples below.
  • The type of an arithmetic expression is integral if all of its components are integral and real otherwise. Note that inner expressions may be evaluated as integers while the overall expression may result in being of type real. The compiler will coerce literals and some constant expressions to the appropriate destination type at compile time where possible.
  • The integer precision on the 220 is ten decimal digits plus sign, so all integer operations are effectively truncated on the high end to ten digits, e.g., 5000000001 + 5000000002 = 3. The floating-point precision is eight digits.
Note in the second example below the use of implied multiplication with GAMMA. Note in the fourth example not only the implied multiplication of 2 and SQRT(S), but also that the expression is interpreted as (SIGN(A(R,L))) / (2SQRT(S)) since multiplication has a higher precedence than division.

Boolean Expressions

Boolean expressions operate in BALGOL as in Algol-58 with a few enhancements:
  • Parentheses around an arithmetic relation are not required.
  • Array subscripting is the same as for arithmetic expressions.
  • The Algol-58 relational symbols are replaced by mnemonic operators, as described under "Symbology" above.
  • If a relation is between operations of different types, the compiler will coerce the integer operand to floating point. If the integer operand is a literal, this coercion will occur at compile time.
  •  BALGOL adds a Boolean operator for implication (IMPL). A IMPL B evaluates to true unless A is true and B is false.
  • Evaluation of Boolean expressions is from left to right with this operator precedence: NOT > AND > OR > IMPL > EQIV.


The concept of statements in BALGOL is essentially the same as that in Algol-58. The "$" is used to separate statements, not terminate them. The end of a line or record of program text has no syntactic significance. Statements may be split across lines and multiple statements may be grouped on a line.

Since the BALGOL character set does not include the colon (:), statement labels are indicated by two adjacent periods (..). Labels may be either identifiers or literal integers. Leading zeroes in a numeric label are not significant.

Compound statements are identical to those in Algol-58, except that the terminating END of a compound statement may be labeled to provide an exit point. An END may not be immediately preceded by a "$". If the BEGIN of a compound statement is labeled, the label may optionally be repeated after the corresponding END. Curiously, left and right parentheses can be substituted for BEGIN and END, respectively.

As with Algol-58, declarations may be intermixed with statements. Since BALGOL is a single-pass compiler, however, declarations for an identifier after the identifier has been first referenced (and its type determined at that point) are ignored.

The do macro statement of Algol-58 is not supported by BALGOL.

Assignment Statements

The BALGOL assignment statement is the same as for Algol-58, except that the operator "=" is used instead of ":=". Type coercion works the same – integer expressions are converted before being stored in real variables and real expressions are truncated to integer before being stored in integer variables. A Boolean expression will result in a 0 or 1 being stored in a numeric variable. Numeric expressions cannot be stored in Boolean variables.

BALGOL adds the ability to assign an expression to multiple variables. The stores take place in right-to-left order, starting with the variable immediately to the left of the expression. The expression value is coerced as necessary to a destination variable's type before the store, and that coerced valued is then applied to the next store in sequence. Note that this is multiple assignment of a single expression; embedded assignment within an expression (as in Algol-60 and C) is not supported.

GO TO and SWITCH Statements

GO TO statements are more primitive in BALGOL, since it does not support the Algol-58 switch declaration, and thus there is no concept of a designational expression. GO TO may only reference a label. The word TO may be omitted.

To compensate for the lack of the switch declaration, BALGOL supports a SWITCH statement. This is a classic computed-go-to construct, where the value of an expression selects the branch destination from a list of labels. If the value is zero, control continues in sequence after the SWITCH statement. Values starting at 1 select a label from the switch list. If the value is negative or greater than the number of labels in the switch list, the behavior is undefined (internally, the branch table is built and indexed backwards, and an out-of-range index will result in a branch to some arbitrary instruction).

IF Statements and Alternative Statements

The IF statement works identically in BALGOL as in Algol-58.

The alternative (if either... or if...) statement of Algol-58 is supported in BALGOL, but with a slightly different syntax. Instead of starting with if either, the statement starts with EITHER IF. Otherwise, the chaining of OR IF clauses and terminating END are the same.

BALGOL adds an additional feature to the EITHER IF syntax. Instead of the terminating END, you may write "$ OTHERWISE$" followed by another statement, which of course may be a compound statement. This statement will be executed if none of the others in the skip chain have been selected. Thus BALGOL does have an "else." I found to my great chagrin that the "$" before "OTHERWISE" is absolutely necessary – without it, mysterious cascading errors result.
EITHER IF A > B$ FWD = 1$ OR IF A < B$ FWD = 0 END$


FOR and UNTIL Statements

The FOR statement in BALGOL functions identically to that in Algol-58, but with a slight change in syntax. The iteration list may be composed of individual values or triplets that define a sequence of values, but the triplet form is specified as "(initial, increment, final)" rather than Algol-58's "initial (increment) final".

BALGOL adds a new iteration statement, UNTIL, similar to the modern while statement. It evaluates a Boolean expression and iterates until the expression becomes true. If the expression is initially true, the body of the loop does not execute at all.
FOR I = 1, 3, 5, 7, 11, (13, 7, 99), 101, (103, 1, 125)$
    A(I) = I$

    A = A + 3

STOP Statement

BALGOL supports the STOP statement of Algol-58 with two extensions. First, it is a simple halt, and does not terminate execution of the program. Pressing START on the Control Console will resume the program with the next statement in sequence.

Second, the word STOP may be followed by an optional arithmetic expression. The value of the expression will be displayed in the A register on the Control Console. All STOPs display 0137 00 7310 in the C register.

Procedure Declarations and Procedure Calls

Procedure declarations are somewhat simplified in BALGOL compared to Algol-58, but most of the capability is retained.

Procedure Declarations

The first significant difference for BALGOL is that multiple entry points are not supported, so the procedure heading does not have multiple identifiers each with their own parameter list. No label of the same name within the body of the procedure is required.

Second, the input and output parameter lists are not delimited by the "=:" operator. Instead, parameters are arranged in three groups: input, output, and reference parameters. Input parameters may be expressions or arrays. Output parameters may be variables, including arrays. Reference parameters include formal labels, formal functions and procedures, and identifiers for the INPUT, OUTPUT, and FORMAT declarations described below in the section on input-output. The parameter groups are delimited by "$". Any of the groups may be omitted, and the "$" for any trailing empty groups need not be specified. There is no difference between passing an array as an input parameter or as an output parameter.

Formal array identifiers must be specified with an empty subscript list containing n-1 commas to indicate an array of n dimensions. Formal procedure and function identifiers must be followed by an empty parenthesis pair.

Type declarations for the formal parameters must occur immediately after the BEGIN that introduces the body, not before it as in Algol-58. After the final END of the procedure, the procedure identifier may be stated, but it must be followed by an empty parenthesis pair. If the procedure returns a value like a function, the return value is assigned to the procedure's identifier as for Algol-58, but the identifier must be followed by an empty parenthesis pair in this case as well. See the Simpson's Rule example below for an illustration of the syntax.

A RETURN statement is required to exit the procedure.

Once a procedure has been declared, its identifier is available globally throughout the remainder of the program text and may be referenced, even within the body of another procedure.
PROCEDURE CROUT4 ($ N, A(,), B(), Y(), PIVOT(), DET, EX7$
                  SINGULAR, IP())$
    END CROUT4()$
In the example above, A, B, Y, and PIVOT are arrays, SINGULAR is a label, and IP is a function or a value-returning procedure. There are no input parameters in this case, only output and reference parameters.

Procedure Call

The differences for a BALGOL procedure call follow those for the procedure's declaration. Actual parameters are divided into input, output, and reference groups, and the groups delimited by "$". Expressions and scalar variables in the input group are effectively passed by value. Everything else is passed by reference (address).

Array slices can be passed by leaving some subscript positions empty for the actual array parameter, n empty positions specifying an n-dimensional slice. When passing a procedure or function as an actual parameter, the procedure or function identifier must be followed by an empty pair of parentheses. Unlike Algol-58, the number of parameters for the function is not indicated, and currying of the parameters is not supported.

SUBROUTINE Declarations

BALGOL supports an open SUBROUTINE declaration. This is simply a compound statement that is given a name and can be called as a parameter-less procedure from other points in the program. All access to variables is global. Any declarations within the body of the SUBROUTINE are treated as if they had been placed outside its body and do not create local variables. A subroutine is called using the ENTER statement. Exit from the subroutine is by means of a RETURN statement, as for procedures.
  X = Y + Z$
  Y = A(I,J)*2 + Z$


Intrinsic Functions

BALGOL defines a number of intrinsic functions. These are not part of the compiler's library – they instead generate in-line code. All are considered to be reserved identifiers:
  • MOD(X, MODULUS) – remainder division of X by MODULUS
  • MAX(X1, X2, ..., XN) – maximum value from a list of arithmetic expressions
  • MIN(X1, X2, ..., XN) – minimum value from a list of arithmetic expressions
  • SIGN(X) – sign of the argument expression, -1, 0, +1, as the type of the expression
  • ABS(X) – absolute value of the argument expression, as the type of the expression
  • PCS(N) – returns a Boolean value indicating whether the specified Program Control Switch on the 220 Control Console is on, N = 0-9.
 In addition, BALGOL defines a standard set of mathematical functions in its library. These are described in Appendix G of the reference manual [4].

External Routines

PROCEDURE declarations may be prefixed with the reserved word EXTERNAL. In this case the body of the procedure is omitted. An external routine is one whose body will be supplied in relocatable machine language after the FINISH declaration at the end of the BALGOL text. The compiler will assign storage for the machine language instructions and link them into the object code. Instructions for preparing external machine-language routines and the linkage mechanism are in Appendix F of the reference manual [4].

BALGOL supports an additional type of machine-language linkage, EXTERNAL STATEMENT. This is both a declaration and an executable statement. It declares a label identifier that is matched to the identifier of a machine-language routine that will follow the FINISH declaration for the program. The statement branches to the external machine-language code. This is not a subroutine linkage, though, and BALGOL offers no intrinsic means for returning to the point in the program after the EXTERNAL STATEMENT.

Input-Output Facilities

The Algol-58 Preliminary Report does not address the subject of input and output of data for a program. BALGOL provides input-output capabilities somewhat similar to FORTRAN. The default configuration of the compiler and its library assumes input from punched cards and output to a line printer or a card punch, but alternate versions of the compiler could be generated to use other media, such as paper tape.

Input to a BALGOL program is free-form, with values delimited by one or more spaces or the end of the 80-character card or data record. The input formatter accepts integer and floating-point values. Numbers in scientific notation use "," to delimit the power-of-10 scale factor, not "**" as in BALGOL text. Character strings delimited by "$" may be read into a sequence of integer variables or array elements, five characters per word.

Values from the input medium are mapped to variables within the BALGOL program by means of an INPUT declaration. This specifies a named list of variables that is in turn referenced by a READ statement. Iteration through arrays is supported in much the same way as in FORTRAN:
                     FOR J=(1,1,N)$
                         (M1(I,J), M2(J,I)))$

The INPUT declaration generates a co-routine that supplies the address of the next element in the list on each call to it. The READ statement calls this co-routine and performs the actual formatting and data transfer. It keeps reading from the external medium until the input list is satisfied.

A second form of READ allows it to terminate before satisfying the list and signal that fact to the program:
where ENDED is a Boolean variable that is set to true (1) if the input is interrupted by a signal in the input data stream. The signal is indicated by the 10-character sequence " SENTINEL " starting in the second character position of an input line or record. This signal is ignored if the first form of READ is used.

Output from a BALGOL program is not free-form, but formatted in a way similar to that of FORTRAN. An OUTPUT declaration defines the mapping of program values to the output medium, analogous to an INPUT declaration. It also generates a co-routine and is in turn referenced by a WRITE statement. While an input list may only name variables, an output list may specify its elements as general expressions.

The data is formatted and arranged on the output medium according to the specifications of a FORMAT declaration. This is like a FORTRAN FORMAT, although the formatting phrases are coded somewhat differently. The WRITE statement performs the actual formatting and data transfer, naming the identifiers for an output list and a format. A WRITE statement may also omit the data list and specify only a format identifier if the format contains only literal data.
                       FOR J=(1,1,N)$
                          (M1(I,J), M1(I,J)/M2(I,J)))$
FORMAT F1 (I8, B2, *SUM = *, X12.6, W2, (8F14.8, W0)),
       F2 (B20, *THATS ALL FOLKS*, W2)$

WRITE ($$ F2)$
The BALGOL input-output facilities are described in Section 8 of the reference manual [4].


The prior post [1] exhibited near its end an example Algol-58 procedure for computing an integral using Simpson's Rule. The following is that example procedure transformed into BALGOL, and combined with declarations, procedure calls, and input-output statements to make it a complete program:
2   IBAR = V(B-A)$
2   N = 1$
2   H = (B-A)/2$
2   J = H(F(A) + F(B))$
2 J1..
2   S = 0$
2   FOR K = (1, 1, N)$
2     S = S + F(A + (2K-1)H)$
2   I = J + 4H.S$
2     BEGIN
2     IBAR = I$
2     J = (I+J)/4$
2     N = 2N$
2     H = H/2$
2     GO TO J1
2     END$
2   SIMPS() = I/3$
2   END SIMPS()$
2 FUNCTION TORADS(X) = 3.1415926X/180$
2 FUNCTION DARCTAN(X) = 1/(X*2 + 1)$
2   LOGISTICSIGMOID() = 1/(1 + EXP(-X))$
2 SUM = SIMPS(TORADS(30.0), TORADS(90.0), 0.00001, 2.0$$
2             SIN())$
2 SUM = SIMPS(0.0, 1.0, 1**-5, 2.0$$ DARCTAN())$
2 SUM = SIMPS(0.5, 3.0, 1**-5, 2.0$$ LOGISTICSIGMOID())$
2 FORMAT F1(*SINE INTEGRAL =    *,X10.6,W0),
2        F2(*DARCTAN INTEGRAL = *,X10.6,W0),
2        F3(*LOGISTIC INTEGRAL =*,X10.6,W0)$
The "2"s in the leftmost column of the program text are an artifact of the 220's Cardatron punched-card subsystem, and must be present to select an input format when the cards are read. BALGOL uses "2" for source cards, "6" for machine-language code, and "5" for data cards. The output printed by this program is:
SINE INTEGRAL =       .866025
I have transcribed the example programs from Section 11 of the compiler reference manual [4] and have them running in my retro-220 emulator [7]. I have prepared a few other programs, including the Simpson's Rule example above, and have them running in the emulator as well.

These examples are available from the BALGOL-Examples directory in the emulator's open-source project repository [8]. Each example includes source code, sample data, and compilation listings both with and without the code generated by the compiler. A few examples also include equivalent programs for purpose of comparison, written in the Algol-60 dialects of the Burroughs B5500 or modern Unisys ClearPath MCP ("E-mode") systems.

The emulator is written in Javascript and runs in a standard web browser. It can be run directly from its hosting site at [7], or you can download the files from the project repository and install it on some other web server. For instructions on running the emulator and the compiler, see the introductory blog post [9] and the wiki pages in the project repository.

BALGOL Compiler Performance

BALGOL was designed from the beginning to be a fast, one-pass compiler that generated reasonably good code. It was able to do small local optimizations within statements, and some rather fancy constant folding, but did not attempt to do more global optimizations, such as common expression factoring.

Even though the 220 was not a very fast machine, both the user-friendly nature of BALGOL and the speed of its compilation process gave it an advantage in environments where lots of relatively small programs were being submitted by lots of relatively unsophisticated programmers. For this reason it was quite popular at universities and research institutions. Stanford, Caltech, Georgia Tech, Cornell, Case (now Case Western Reserve), and Stanford Research Institute all had 220s and used BALGOL.

Various claims about the compiler's performance have been made. In his 2013 anecdote [10] on the use of BALGOL at Sanford in the early 1960s, Robert Braden stated "For typical source programs, the BALGOL compiler ran at line printer speed, 150 LPM [lines per minute], close to the card reader speed of 240 CPM [cards per minute]."

In a guest editorial [11] for the May 1977 issue of IEEE Computer devoted to stack machines, David Bulman talks about Bob Barton and the influence of BALGOL on the design of the Burroughs B5000/5500, and says of BALGOL, "About sixty compiler instructions were executed to generate each machine language instruction..." A little later he states, "The fact that Balgol [sic] could be compiled at the read speed of the card reader on the B220 lent some credibility within Burroughs to Barton and his ideas for a high-level language machine" (i.e., the B5000).

I have been unable to confirm Bulman's claim on instructions executed per instruction generated. I implemented an instruction counter in my retro-220 emulator and recorded counts of instructions executed and instructions generated for a few small and medium-size programs. There are some choices on just which instructions should be included (e.g., compiler load time, which takes about seven seconds and generates no instructions, or library linking, which generates lots of instructions without processing any source code), but excluding the library linking process, I mostly see ratios of 400-500:1, not 60:1.

I have also been unable to confirm either Braden's or Bulman's claims on compiler speed. For blank cards, comments, and cards with with relatively little text (say, a trimmed length up to 20 or so characters), the compiler does seem to operate at or near the line printer speed of 150 LPM. For complex statements and declarations, there is a noticeable pause between lines, often 2-3 seconds. The best average speed I have seen is close to 100 CPM, but that is without generating a compilation listing. A more typical average speed with a listing is 80-90 CPM.

The retro-220 emulator is designed to run programs at the speed of a real 220, using instruction times published in its Operational Characteristics manual [12] and the performance of peripheral devices from its Operating Procedures manual [13]. Emulator performance matches predicted performance on some very small benchmarks where it is easy to analyze the instruction timings, but timing is much more difficult to predict on larger, more complex programs that do significant amounts of I/O, such as the BALGOL compiler.

The emulator uses about 80% of an Intel 3.3GHz i3-2120 core when running under Windows 7 in the Firefox 61.0 browser and compiling a non-trivial program. It uses about 60% of a core when executing a program doing no I/O. Most of that processor use is devoted to browser overhead (garbage collection, rendering updates to the control panels and peripheral device displays, etc.), not to emulating 220 instructions, which requires only about 5% of a core. CPU utilization in Google Chrome 68.0 is about 30% less. In any case, the emulator's performance does not appear to be limited by its host environment.

It is entirely possible that the emulator is not regulating its performance correctly and running slower than a real 220, but there is a contradictory report to consider. In a 1960 letter [14] to the Communications of the ACM, Bob Barton briefly described BALGOL (calling it a "nameless compiler") and stated, "1. Compiling Rate: Averages 500 single-address instructions per minute." In the retro-220 emulator, I am seeing instructions generated at rates of 600-700 per minute, compiling the same programs where cards are being read at only 80-100 CPM. So Barton gives us a report of compiler performance that is lower than that observed in emulator.

From this, I conclude that the claim of compiling at card-reader or line-printer speed is somewhat overstated. This should take nothing away from the compiler's performance, which for its time was remarkable, especially considering the intrinsic speed of the 220.

Significance and Impact of BALGOL

As mentioned above, BALGOL had a significant impact within Burroughs on the design of the B5000/5500 computer system that followed the 220. Not only did the compiler give the Burroughs design team in Pasadena the confidence that a system could be designed to support fast, one-pass compilers and use solely higher-level languages (the B5000/5500 never had a resident assembler, and its modern descendants still don't), but many of the data structure techniques it used internally were given hardware support. The I/O and diagnostic features of the language also migrated into the Burroughs Algol-60 compilers that followed BALGOL.

One striking result of the influence and experience of BALGOL upon the B5000/5500 is that Burroughs did considerably better in terms of performance of its one-pass Algol-60 compiler for that machine. It could easily compile at card-reader speed -- using a 1400 CPM reader.

Braden [10] describes the impact, under the guidance of George Forsythe, that BALGOL had during his time at Stanford on the growth of what he terms "computer literacy." The language's features and ease of use, along with the compiler's operational efficiency created a highly-productive, quick-turnaround environment that encouraged people to develop programming skills and begin using them in their studies and research.

Braden also mentions some students who were stimulated by the environment and went on to notable careers in computing and computer science. Two of those students, Roger Moore and Larry Breed, along with Braden, succeeded in cloning the BALGOL compiler from the 220 to an IBM 7090 after Stanford acquired that latter machine. The cloned language was referred to as SUBALGOL. It was bootstrapped in stages so that the compiler was eventually written in itself. It also offered the same degree of operational efficiency on the much-faster 7090 as had been enjoyed on the 220.

The next blog post will discuss the significant effort to recover the BALGOL compiler source code and get it into running condition, as well as what I have learned over the course of this project about the compiler's internal operation.


[1] "Algol-58 – the International Algebraic Language,"
[2] "Preliminary Report—International Algebraic Language," Communications of the ACM, vol. 1 #12, December 1958, pp. 8-22,
[3] "Burroughs B5000 Conference" 1985-09-06, Charles Babbage Institute,
[4] "BAC-220: Burroughs Algebraic Compiler," Revised Edition, 220-21017, Burroughs Corporation, Detroit, Michigan, March 1963,
[5] BALGOL Listing, May 1961,
[6] BALGOL Listing, March 1962,
[7] Burroughs 220 Emulator Hosting Site, http://www.phkimpel.us/Burroughs-220/.
[8] retro-220 Emulator Project Repository, https://github.com/pkimpel/retro-220/.
[9] "Introducing the retro-220 Emulator,"
[10] "Burroughs Algol at Stanford University, 1960-1963," Robert Braden, IEEE Annals of the History of Computing, vol.35 #4, October-December 2013, p.69-73,
[11] "Stack Computers," David M. Bulman, IEEE Computer, vol.10 #5, May 1977, p.14-16,
[12] Operational Characteristics of the Burroughs 220, Burroughs Corporation, Bulletin 5020A, August 1960,
[13] Handbook of Operating Procedures for the Burroughs 220, Burroughs Corporation, Bulletin 5023, November 1969,
[14] "Another (Nameless) Compiler for the Burroughs 220," R. S. Barton, Communcations of the ACM, vol.4 #1, January 1961, p.A11, https://dl.acm.org/citation.cfm?doid=366062.366070.
[15] "The Shell Assembler, Part 1,"