In the prior post [1], I briefly summarized the movement in the 1950s toward the development of higher-level programming languages, the events leading up to the development of Algol, and an overview of the preliminary version of Algol, now known as Algol-58.
This post describes the origin of the Burroughs compiler, compares the BALGOL language features to those of the Algol-58 Preliminary Report [2], and briefly discusses compiler performance and influence.
Origin of BALGOL
In 1954, Robert S. ("Bob") Barton took a job at Shell Development Company Research in Houston, Texas, working on programming applications. By 1957, he had a talented group of young programmers working with him, including Joel Erdwinn, Clark Oliphint, C. M. Pearcy, and (as a summer employee) David Dahm.
This young gang of four wrote quite an advanced assembler for the ElectroData/Burroughs 205. The assembler was tape-based, but included a macro facility (including maintenance of a macro library on tape), special support for addressing the 205's high-speed memory loops, warnings for invalid adjacent instruction parings, and most usefully, a pseudo-instruction that would take COBOL PICTURE-like strings and generate the obtuse 319-digit patterns that are needed for Cardatron format bands.This became known as the Shell Assembler [15].
In 1959, Barton left Shell to work for the ElectroData division of Burroughs in Pasadena, California. Erdwinn, Oliphint, and Dahm (still as a summer employee) followed him to Pasadena. They became known as the "arthropods," because, well, they came from Shell. Not long after, Barton became the head of the so-called Automatic Programming Group in Pasadena, which was developing compilers and other programming aids.
By this time, the Burroughs 220 had been available to customers for about a year. While the 220 was a decimal, vacuum-tube machine, it had good (if somewhat slow) facilities for scientific and numerical computation. According to a memorandum Tom Sawyer has uncovered in the Burroughs archives at CBI, dated 12 November 1958, from R. A. Mallet of the Scientific Compiler Project, Burroughs had been working on a FORTRAN compiler with the goal that "the Burroughs 220 would be able to use any program written, in FORTRAN, for the IBM 650, 704, 709, and 7070..." The memo goes on to request a change in the way that arithmetic overflow was originally handled by the 220.
When Barton became head of Automatic Programming, he apparently assumed responsibility for this project, and according him in the 1985 B5000 Conference oral history [3, page 50], "It's correct to a certain extent in that the job that I had taken, under generally misleading conditions, called for doing an impossible FORTRAN which would also include conversion of assembly language from the 7090, or whatever the machine was at the time, automatically. I knew it couldn't be done, but that was my responsibility."
Somewhere in this time frame, the focus for the "scientific compiler" shifted from FORTRAN to Algol-58, the Preliminary Report [2] for which had been published in December 1958. It is not clear whether this happened at Barton's instigation or the young team took this direction on their own, but Dahm tells this amusing – and telling – anecdote from the B5000 Conference [3, page 27]:
I remember when I was a summer employee in the summer of 1959, I was working on the [BALGOL] compiler with another fellow named Joel Erdwin [sic], who was the project leader. And we were busily trying to do our version of IAL, and one day Bob Barton came along and he had a FORTRAN manual in his hand. It was a really nice, well-done FORTRAN manual done by IBM. He said, "Why don't you guys do this language because then you wouldn't have to write a manual?" We rejected that as being not a very good reason for picking a language. [Laughter] So, basically, I would say that the decision, that the compiler we would do would be ALGOL as opposed to FORTRAN was made by a summer employee and a project leader. I don't know that anyone else was really involved in making that decision.One further clue as to the shift in focus is Barton's statement immediately following his complaint over the "impossible FORTRAN" [3, page 50]:
Erdwin [sic] would never have done a FORTRAN. I mean, he'd been going through this kind of educational experience at Shell, and he was not the sort of guy that would waste his time doing FORTRAN. He knew too much about language.So it appears the decision to do an Algol-58 compiler was Erdwinn's, but probably with Barton's active approval and support. Barton gives Erdwinn full credit for the design of the BALGOL compiler, saying it was "a brilliant job of programming" and "Erdwin's [sic] masterpiece" [3, page 49].
Over the short term, it would probably have been better for Burroughs to have gone forward with a FORTRAN compiler, albeit one with more realistic goals. Over the longer term, though, the choice to go with Algol-58, plus the experience gained from building BALGOL and having customers use it, helped set Burroughs on the path to the B5000/5500, B6500/6700/7700, and later related systems that continue to be produced and sold today as Unisys ClearPath MCP systems.
According to the BALGOL reference manual [4], the compiler was completed and first installed for customer use in March 1960. We have two scans of listings of the compiler, one dated 1 May 1961 [5], citing in the heading [Joel] Erdwinn, [Jack] Merner, [Fran] Crowder, [Joe] Speroni, and [Donald] Knuth as authors. The second listing, dated 1 February 1962 [6], adds the names [David] Dahm, [Clark] Oliphint, Logemann, and [Toni] Schuman.
Erdwinn left Burroughs shortly after the BALGOL compiler was finished to join fast-growing Computer Sciences Corporation (CSC, now DXC Technologies). He spent the bulk of the next two decades building compilers and system software for CSC's customers, particularly compilers for none other than... FORTRAN.
Merner, Oliphint, and Dahm continued with Burroughs and, with Barton, had major roles in the development of the B5000/5500. Dahm is also known for his work on the B5500 Timesharing MCP and CANDE, and on the DMSII data base management system for the B6700/7700.
BALGOL vs. Algol-58
BALGOL for the Burroughs 220 is one of several early implementations of the Algol-58 language described in the 1958 Preliminary Report [2]. Other well-known implementations are System Development Corporation's JOVIAL, the U.S. Navy's NELIAC, and the University of Michigan's MAD. The Preliminary Report was a progress report on the current state of a committee's attempt to develop a universal algorithmic language, not a fully-vetted specification. As such, and given the limited capabilities of computer systems at the time, all implementations based on the Report deviated from it, often significantly.
Of the Algol-58 variants, BALGOL is arguably the one that most closely followed the syntax and semantics put forth in the Preliminary Report. The BALGOL compiler is a single-pass design, intended for fast compilation at the expense of highly-optimized code, and oriented to a compile-and-go job shop environment.
The single-pass design imposes some restrictions on the dialect of Algol-58 BALGOL can support, and on the ordering of some language elements in the program text, as is discussed in the following sections. It drops some features of Algol-58, and implements others in slightly different ways, but adds little except for pragmatic facilities such as FORTRAN-like input/output, an open subroutine, program segmentation and overlay facilities, some diagnostic facilities, and the ability to incorporate machine-language routines into the compiled object code.
BALGOL's reference manual even has a detailed set of transliteration rules [4, Appendix E], as called for in the Preliminary Report, describing the differences between the Report and BALGOL's implementation. Going the Preliminary Report one better, the manual has a complete syntax definition for BALGOL in Backus-Naur Form [4, Appendix D].
The remainder of this section discusses the differences between BALGOL and Algol-58 in detail. It follows the overview of Algol-58 features in the prior post [1].
Symbology
Where the Algol-58 reference language uses a 63-character set (not counting upper- and lower-case equivalents), Burroughs chose to stick with the 48-character set available with IBM punched-card equipment, which most sites used through the 220's Cardatron subsystem.
Rather than attempt some sort of delimiter scheme to support the word-symbols of Algol-58 (begin, end, if, for, etc.), in BALGOL these symbols are considered to be reserved words. Spaces are significant and must be used as delimiters between reserved words and identifiers.
- The 26 English letters: A-Z [upper-case only]
- The decimal digits: 0-9
- Arithmetic operators: + – . / *
- Relational operators: LSS LEQ EQL GEQ GTR NEQ
- Logical operators: NOT OR AND EQIV [equivalence] IMPL [implication]
- Sequential operators: GO, GO TO, RETURN, STOP, FOR, IF, OR, EITHER IF, OR IF, SWITCH, UNTIL, OTHERWISE, OVERLAY
- Separators: , .. $ = ** BEGIN END
- Brackets: ( )
- Declarators: INTEGER, BOOLEAN, FLOATING, REAL, ARRAY, FUNCTION, COMMENT, PROCEDURE, SUBROUTINE, SEGMENT, FINISH, INPUT, OUTPUT, FORMAT, MONITOR, TRACE, DUMP, EXTERNAL STATEMENT, EXTERNAL PROCEDURE
The IBM Type 407 tabulator used as a printer with the Cardatron had interchangeable type wheels. Two common wheel sets were "A" (commercial) and "F" (FORTRAN). Four character glyphs differed between the two, with the commercial set shown on the left below and FORTRAN on the right:
# =The internal character codes for these pairs were the same; they simply printed differently on different 407s. In addition, the 407 does not support the semicolon (;), so BALGOL uses the dollar sign ($) in its place.
& +
% (
¤ )
Aside from these substitutions, BALGOL adopted the following changes in symbology from the Algol-58 Preliminary Report:
- The period (.) replaces the small cross (×) for multiplication.
- The asterisk (*) replaces the up/down arrow brackets (↑↓) for exponentiation.
- Mnemonic equivalents replace the relational and logical operators, as shown above.
- Two adjacent periods (..) replace the colon (:).
- The equal sign (=) replaces the assignment operator (:=). EQL is used as the relational operator for equality.
- Two adjacent asterisks (**) replace the subscripted (10) to delimit the scale factor in numeric literals for scientific notation.
- Square brackets, [ ], are not supported by the 407, so parentheses are used to enclose array subscripts.
Data Types and Declarations
BALGOL supports only INTEGER, REAL (FLOATING), and BOOLEAN data types. Unlike some other dialects of Algol-58 (e.g., JOVIAL), there is no provision for records, strings, or other composite types.
Literals
Literals are specified the same way as in Algol-58, except that numbers have the general form I.F**±E, where I is a whole integer, .F is a decimal fraction, and **±E is a scaling factor as a power of ten. The fraction and scale factor are optional, as is the sign preceding the exponent of the scaling factor. Unlike Algol-58, a number cannot be expressed as simply a scaling factor without at least an integer prefix to it. In addition, a literal cannot begin or end with a period for the decimal point (.), as this could be confused with the use of the period as a multiply operator. Boolean literals continue to be represented by 0 and 1 for false and true, respectively.
Identifiers and Types
As with Algol-58, BALGOL assumes all identifiers refer to REAL variables by default. A type declaration can specify an identifier to be of type INTEGER or BOOLEAN. Array identifiers within type declaration lists need not be followed by empty subscript lists indicating dimensionality – that is determined by the ARRAY declaration. The parentheses surrounding the entire list of identifiers in a type declaration are also optional in BALGOL.
Identifiers must start with a letter, which may be followed by letters and digits, up to a total length of 50 characters. An identifier may not contain embedded spaces or special characters.
BALGOL also provides a generic type specifier termed a prefix. A type declaration can include an identifier followed by an ellipsis (three adjacent periods, ...). Any identifier not otherwise covered by a type declaration that begins with the same set of characters automatically receives the specified type. Multiple prefixes can be specified with overlapping subsets of the same leading characters. Unless an identifier is fully stated in a type declaration, the longest prefix matching an identifier determines its type. An OTHERWISE clause may specify a globally-default type.
INTEGER I..., J..., K..., L..., M..., N...$With these declarations, the identifier MATRIX would be of type REAL, not INTEGER. Because BALGOL is a one-pass compiler, type declarations must precede first use of an identifier in an expression or array declaration.
BOOLEAN KEEP, LAST$
REAL MAT..., JOULES$
REAL OTHERWISE$
Array Declarations
Array declarations differ from those in Algol-58. The most significant difference is that custom bounds are not supported. Each dimension of an array is declared simply as its length. All indexing is one-relative. Only literal lengths may be specified.
As an extension, BALGOL permits an array declaration to specify initial values for the array. If the dimension list is followed by an equal sign (=) and a list of literal values in parentheses, the compiler will fill the array with those values at the time the program is loaded. For arrays of multiple dimensions, a linear list of values is specified. These are assigned to the array elements in row-major order, starting with the last dimension. If the list of values is shorter that the total number of elements, the remaining elements are initialized to zero.
ARRAY A(9), B(4,12), C(10, 10, 10)$
ARRAY MONTHDAYS(12) = (31,28,31,30,31,30,31,31,30,31,30,31)$
Function Declarations
BALGOL supports in-line function declarations like Algol-58, but with a slightly different syntax. The function identifier must be preceded by the reserved word FUNCTION, and the parameter list delimited from the function body by an equal sign (=). Use of an in-line function is otherwise the same.
FUNCTION TORADS(DEGREES) = DEGREES . 3.1415926/180$
FUNCTION PYTHAGOREAN(A, B) = SQRT(A*2 + B*2)$
Diagnostic Declarations
BALGOL supports three declarations for its run-time diagnostic facilities:
- MONITOR specifies a list of variable identifiers and labels. At run time, whenever one of the named variables is modified, or any variable within the scope of a named label is modified, the first five characters of that variable's identifier are printed along with the variable's new value.
- DUMP specifies a list of variables or labels. When a dump is triggered, the abbreviated names and the values of the variables are printed. Labels are printed with the count of the number of times they have been passed. A dump is triggered by a TRACE declaration.
- TRACE specifies a list of label identifiers, each optionally followed by a literal integer in parentheses. If an integer (n) is specified, then every n-th time that label is passed in the program, all DUMP declarations in the program (or if execution is within a procedure, all DUMP declarations within that procedure) are triggered, and the values in their variable lists are dumped. If a label in a TRACE list is not followed by an integer, then the corresponding DUMP declarations are triggered each time the label is passed, as if (1) had been specified.
MONITOR X, Y, Z$
DUMP A, B, D$
TRACE AGAIN, INNERLOOP(100)$
Segmentation and Overlay
BALGOL supports multi-level segmentation of the object code for a program. A segment is simply a compound statement preceded by the reserved word SEGMENT and an identifier that labels the segment. Segments may be nested. The code outside any segment declaration is considered to be a global segment, and is permanently resident while the program is running.
A segment is called into memory using the OVERLAY statement, naming the segment label. Note that this statement does not transfer control to the segment, it merely loads it into memory. A separate GO TO statement must be used to enter the segment, and the segment label cannot be used for this purpose. Typically, the first statement inside the segment's compound statement is given a label, and a manual transfer is made to that label. Similarly, exit from the segment is not automatic, and an explicit transfer of control must be coded to return to the code for some containing segment.
Section 9 of the reference manual [4] gives the rules for overlaying segments and transferring into and out of them.
FINISH Declaration
Every BALGOL program must have a FINISH declaration at its end. This signals to the compiler the end of the BALGOL text and the start of the process to link in any library routines or external machine-language programs. The "$" following FINISH is required.
While the reference manual classifies this as a declaration, it is actually an executable statement, as it causes the compiler to emit a halt instruction with 9669 00 9669 ("XX") displayed in the C register. Pressing START on the Control Console at this point will execute a card or paper-tape read (depending on how the compiler was generated), with the intent of booting the next job in sequence.
FINISH$
Other Declarations
Comment declarations are the same as in Algol-58. After the reserved word COMMENT all text is ignored until the next "$".
The switch declaration of Algol-58 is not supported by BALGOL. Instead there is a more primitive SWITCH statement, as described below.
There are additional declarations to support formatted input and output of data. These are discussed in the section on input-output below.
Expressions
BALGOL follows the scheme of Algol-58 expressions closely, but more completely defines the resulting type of an expression, and adds a few twists.
Arithmetic Expressions
Arithmetic expressions are the same as in Algol-58, with the following exceptions:
- Array identifiers are indexed using a list of subscripts in parentheses rather than square brackets.
- Different characters are used to specify multiplication and exponentiation, as described above under "Symbology."
- Expressions are evaluated left to right, except that default precedence is defined to be exponentiation first, then multiplication, then division, then addition/subtraction. Parentheses, as usual, can override the default precedence.
- In certain cases, the multiply operator (.) may be omitted, as in standard mathematical notation. The specific rules are stated in Section 3 of the reference manual [4], but basically the operator may be omitted:
- between a closing parenthesis on the left and an opening parenthesis on the right
- between a simple variable or literal and a parenthesis on either side
- between a closing parenthesis on the left and a subscripted or function variable on the right
- between a literal on the left and a simple, subscripted, or function variable on the right.
- The type of an arithmetic expression is integral if all of its components are integral and real otherwise. Note that inner expressions may be evaluated as integers while the overall expression may result in being of type real. The compiler will coerce literals and some constant expressions to the appropriate destination type at compile time where possible.
- The integer precision on the 220 is ten decimal digits plus sign, so all integer operations are effectively truncated on the high end to ten digits, e.g., 5000000001 + 5000000002 = 3. The floating-point precision is eight digits.
ALPHA + BETA / GAMMA
(ALPHA + BETA) GAMMA
A*2/2*(B-1)
SIGN(A(R,L))/2SQRT(S)
Boolean Expressions
Boolean expressions operate in BALGOL as in Algol-58 with a few enhancements:
- Parentheses around an arithmetic relation are not required.
- Array subscripting is the same as for arithmetic expressions.
- The Algol-58 relational symbols are replaced by mnemonic operators, as described under "Symbology" above.
- If a relation is between operations of different types, the compiler will coerce the integer operand to floating point. If the integer operand is a literal, this coercion will occur at compile time.
- BALGOL adds a Boolean operator for implication (IMPL). A IMPL B evaluates to true unless A is true and B is false.
- Evaluation of Boolean expressions is from left to right with this operator precedence: NOT > AND > OR > IMPL > EQIV.
(ALPHA GEQ BETA) OR (GAMMA NEQ EPSILON) EQIV NOT OVERRIDE
Statements
The concept of statements in BALGOL is essentially the same as that in Algol-58. The "$" is used to separate statements, not terminate them. The end of a line or record of program text has no syntactic significance. Statements may be split across lines and multiple statements may be grouped on a line.
Since the BALGOL character set does not include the colon (:), statement labels are indicated by two adjacent periods (..). Labels may be either identifiers or literal integers. Leading zeroes in a numeric label are not significant.
Compound statements are identical to those in Algol-58, except that the terminating END of a compound statement may be labeled to provide an exit point. An END may not be immediately preceded by a "$". If the BEGIN of a compound statement is labeled, the label may optionally be repeated after the corresponding END. Curiously, left and right parentheses can be substituted for BEGIN and END, respectively.
As with Algol-58, declarations may be intermixed with statements. Since BALGOL is a single-pass compiler, however, declarations for an identifier after the identifier has been first referenced (and its type determined at that point) are ignored.
The do macro statement of Algol-58 is not supported by BALGOL.
Assignment Statements
The BALGOL assignment statement is the same as for Algol-58, except that the operator "=" is used instead of ":=". Type coercion works the same – integer expressions are converted before being stored in real variables and real expressions are truncated to integer before being stored in integer variables. A Boolean expression will result in a 0 or 1 being stored in a numeric variable. Numeric expressions cannot be stored in Boolean variables.
BALGOL adds the ability to assign an expression to multiple variables. The stores take place in right-to-left order, starting with the variable immediately to the left of the expression. The expression value is coerced as necessary to a destination variable's type before the store, and that coerced valued is then applied to the next store in sequence. Note that this is multiple assignment of a single expression; embedded assignment within an expression (as in Algol-60 and C) is not supported.
ALPHA = BETA = GAMMA = 2DELTA + 1$
GO TO and SWITCH Statements
GO TO statements are more primitive in BALGOL, since it does not support the Algol-58 switch declaration, and thus there is no concept of a designational expression. GO TO may only reference a label. The word TO may be omitted.
To compensate for the lack of the switch declaration, BALGOL supports a SWITCH statement. This is a classic computed-go-to construct, where the value of an expression selects the branch destination from a list of labels. If the value is zero, control continues in sequence after the SWITCH statement. Values starting at 1 select a label from the switch list. If the value is negative or greater than the number of labels in the switch list, the behavior is undefined (internally, the branch table is built and indexed backwards, and an out-of-range index will result in a branch to some arbitrary instruction).
SWITCH CHOICE-1, (CASE1, CASE2, CASE3, CASE4)$
IF Statements and Alternative Statements
The IF statement works identically in BALGOL as in Algol-58.
The alternative (if either... or if...) statement of Algol-58 is supported in BALGOL, but with a slightly different syntax. Instead of starting with if either, the statement starts with EITHER IF. Otherwise, the chaining of OR IF clauses and terminating END are the same.
BALGOL adds an additional feature to the EITHER IF syntax. Instead of the terminating END, you may write "$ OTHERWISE$" followed by another statement, which of course may be a compound statement. This statement will be executed if none of the others in the skip chain have been selected. Thus BALGOL does have an "else." I found to my great chagrin that the "$" before "OTHERWISE" is absolutely necessary – without it, mysterious cascading errors result.
EITHER IF A > B$ FWD = 1$ OR IF A < B$ FWD = 0 END$
EITHER IF A > B$ FWD = 1$ OTHERWISE$ FWD = 0$
FOR and UNTIL Statements
The FOR statement in BALGOL functions identically to that in Algol-58, but with a slight change in syntax. The iteration list may be composed of individual values or triplets that define a sequence of values, but the triplet form is specified as "(initial, increment, final)" rather than Algol-58's "initial (increment) final".
BALGOL adds a new iteration statement, UNTIL, similar to the modern while statement. It evaluates a Boolean expression and iterates until the expression becomes true. If the expression is initially true, the body of the loop does not execute at all.
FOR I = 1, 3, 5, 7, 11, (13, 7, 99), 101, (103, 1, 125)$
A(I) = I$
UNTIL A > 125 OR OUTTAHERE$
BEGIN
OUTTAHERE = A EQL 77$
A = A + 3
END$
STOP Statement
BALGOL supports the STOP statement of Algol-58 with two extensions. First, it is a simple halt, and does not terminate execution of the program. Pressing START on the Control Console will resume the program with the next statement in sequence.
Second, the word STOP may be followed by an optional arithmetic expression. The value of the expression will be displayed in the A register on the Control Console. All STOPs display 0137 00 7310 in the C register.
Procedure Declarations and Procedure Calls
Procedure declarations are somewhat simplified in BALGOL compared to Algol-58, but most of the capability is retained.
Procedure Declarations
The first significant difference for BALGOL is that multiple entry points are not supported, so the procedure heading does not have multiple identifiers each with their own parameter list. No label of the same name within the body of the procedure is required.
Second, the input and output parameter lists are not delimited by the "=:" operator. Instead, parameters are arranged in three groups: input, output, and reference parameters. Input parameters may be expressions or arrays. Output parameters may be variables, including arrays. Reference parameters include formal labels, formal functions and procedures, and identifiers for the INPUT, OUTPUT, and FORMAT declarations described below in the section on input-output. The parameter groups are delimited by "$". Any of the groups may be omitted, and the "$" for any trailing empty groups need not be specified. There is no difference between passing an array as an input parameter or as an output parameter.
Formal array identifiers must be specified with an empty subscript list containing n-1 commas to indicate an array of n dimensions. Formal procedure and function identifiers must be followed by an empty parenthesis pair.
Type declarations for the formal parameters must occur immediately after the BEGIN that introduces the body, not before it as in Algol-58. After the final END of the procedure, the procedure identifier may be stated, but it must be followed by an empty parenthesis pair. If the procedure returns a value like a function, the return value is assigned to the procedure's identifier as for Algol-58, but the identifier must be followed by an empty parenthesis pair in this case as well. See the Simpson's Rule example below for an illustration of the syntax.
A RETURN statement is required to exit the procedure.
Once a procedure has been declared, its identifier is available globally throughout the remainder of the program text and may be referenced, even within the body of another procedure.
PROCEDURE CROUT4 ($ N, A(,), B(), Y(), PIVOT(), DET, EX7$In the example above, A, B, Y, and PIVOT are arrays, SINGULAR is a label, and IP is a function or a value-returning procedure. There are no input parameters in this case, only output and reference parameters.
SINGULAR, IP())$
BEGIN
INTEGER N, PIVOT, EX7$
...
RETURN
END CROUT4()$
Procedure Call
The differences for a BALGOL procedure call follow those for the procedure's declaration. Actual parameters are divided into input, output, and reference groups, and the groups delimited by "$". Expressions and scalar variables in the input group are effectively passed by value. Everything else is passed by reference (address).
Array slices can be passed by leaving some subscript positions empty for the actual array parameter, n empty positions specifying an n-dimensional slice. When passing a procedure or function as an actual parameter, the procedure or function identifier must be followed by an empty pair of parentheses. Unlike Algol-58, the number of parameters for the function is not indicated, and currying of the parameters is not supported.
CROUT4 ($ K, MAT(,), RHS(), FACTORS(), PIV(), DET, POWER$
SINGULARITY, INNERPRODUCT())$
SUBROUTINE Declarations
BALGOL supports an open SUBROUTINE declaration. This is simply a compound statement that is given a name and can be called as a parameter-less procedure from other points in the program. All access to variables is global. Any declarations within the body of the SUBROUTINE are treated as if they had been placed outside its body and do not create local variables. A subroutine is called using the ENTER statement. Exit from the subroutine is by means of a RETURN statement, as for procedures.
SUBROUTINE COMMONCODE$
BEGIN
X = Y + Z$
Y = A(I,J)*2 + Z$
RETURN$
END$
ENTER COMMONCODE$
Intrinsic Functions
BALGOL defines a number of intrinsic functions. These are not part of the compiler's library – they instead generate in-line code. All are considered to be reserved identifiers:
- MOD(X, MODULUS) – remainder division of X by MODULUS
- MAX(X1, X2, ..., XN) – maximum value from a list of arithmetic expressions
- MIN(X1, X2, ..., XN) – minimum value from a list of arithmetic expressions
- SIGN(X) – sign of the argument expression, -1, 0, +1, as the type of the expression
- ABS(X) – absolute value of the argument expression, as the type of the expression
- PCS(N) – returns a Boolean value indicating whether the specified Program Control Switch on the 220 Control Console is on, N = 0-9.
External Routines
PROCEDURE declarations may be prefixed with the reserved word EXTERNAL. In this case the body of the procedure is omitted. An external routine is one whose body will be supplied in relocatable machine language after the FINISH declaration at the end of the BALGOL text. The compiler will assign storage for the machine language instructions and link them into the object code. Instructions for preparing external machine-language routines and the linkage mechanism are in Appendix F of the reference manual [4].
BALGOL supports an additional type of machine-language linkage, EXTERNAL STATEMENT. This is both a declaration and an executable statement. It declares a label identifier that is matched to the identifier of a machine-language routine that will follow the FINISH declaration for the program. The statement branches to the external machine-language code. This is not a subroutine linkage, though, and BALGOL offers no intrinsic means for returning to the point in the program after the EXTERNAL STATEMENT.
Input-Output Facilities
The Algol-58 Preliminary Report does not address the subject of input and output of data for a program. BALGOL provides input-output capabilities somewhat similar to FORTRAN. The default configuration of the compiler and its library assumes input from punched cards and output to a line printer or a card punch, but alternate versions of the compiler could be generated to use other media, such as paper tape.
Input to a BALGOL program is free-form, with values delimited by one or more spaces or the end of the 80-character card or data record. The input formatter accepts integer and floating-point values. Numbers in scientific notation use "," to delimit the power-of-10 scale factor, not "**" as in BALGOL text. Character strings delimited by "$" may be read into a sequence of integer variables or array elements, five characters per word.
Values from the input medium are mapped to variables within the BALGOL program by means of an INPUT declaration. This specifies a named list of variables that is in turn referenced by a READ statement. Iteration through arrays is supported in much the same way as in FORTRAN:
INPUT MATRIX (N, FOR I=(1,1,N)$The INPUT declaration generates a co-routine that supplies the address of the next element in the list on each call to it. The READ statement calls this co-routine and performs the actual formatting and data transfer. It keeps reading from the external medium until the input list is satisfied.
FOR J=(1,1,N)$
(M1(I,J), M2(J,I)))$
READ ($$ MATRIX)$
A second form of READ allows it to terminate before satisfying the list and signal that fact to the program:
READ ($ ENDED$ MATRIX)$where ENDED is a Boolean variable that is set to true (1) if the input is interrupted by a signal in the input data stream. The signal is indicated by the 10-character sequence " SENTINEL " starting in the second character position of an input line or record. This signal is ignored if the first form of READ is used.
Output from a BALGOL program is not free-form, but formatted in a way similar to that of FORTRAN. An OUTPUT declaration defines the mapping of program values to the output medium, analogous to an INPUT declaration. It also generates a co-routine and is in turn referenced by a WRITE statement. While an input list may only name variables, an output list may specify its elements as general expressions.
The data is formatted and arranged on the output medium according to the specifications of a FORMAT declaration. This is like a FORTRAN FORMAT, although the formatting phrases are coded somewhat differently. The WRITE statement performs the actual formatting and data transfer, naming the identifiers for an output list and a format. A WRITE statement may also omit the data list and specify only a format identifier if the format contains only literal data.
OUTPUT MATOUT (N, SUM, FOR I=(1,1,N)$The BALGOL input-output facilities are described in Section 8 of the reference manual [4].
FOR J=(1,1,N)$
(M1(I,J), M1(I,J)/M2(I,J)))$
FORMAT F1 (I8, B2, *SUM = *, X12.6, W2, (8F14.8, W0)),
F2 (B20, *THATS ALL FOLKS*, W2)$
WRITE ($$ MATOUT, F1)$
WRITE ($$ F2)$
Examples
The prior post [1] exhibited near its end an example Algol-58 procedure for computing an integral using Simpson's Rule. The following is that example procedure transformed into BALGOL, and combined with declarations, procedure calls, and input-output statements to make it a complete program:
2 PROCEDURE SIMPS(A, B, DELTA, V$$ F())$The "2"s in the leftmost column of the program text are an artifact of the 220's Cardatron punched-card subsystem, and must be present to select an input format when the cards are read. BALGOL uses "2" for source cards, "6" for machine-language code, and "5" for data cards. The output printed by this program is:
2 BEGIN
2 INTEGER K, N$
2
2 IBAR = V(B-A)$
2 N = 1$
2 H = (B-A)/2$
2 J = H(F(A) + F(B))$
2
2 J1..
2 S = 0$
2 FOR K = (1, 1, N)$
2 S = S + F(A + (2K-1)H)$
2
2 I = J + 4H.S$
2 IF DELTA LSS ABS(I-IBAR)$
2 BEGIN
2 IBAR = I$
2 J = (I+J)/4$
2 N = 2N$
2 H = H/2$
2 GO TO J1
2 END$
2
2 SIMPS() = I/3$
2 RETURN$
2 END SIMPS()$
2
2 FUNCTION TORADS(X) = 3.1415926X/180$
2
2 FUNCTION DARCTAN(X) = 1/(X*2 + 1)$
2
2 PROCEDURE LOGISTICSIGMOID(X)$
2 BEGIN
2 LOGISTICSIGMOID() = 1/(1 + EXP(-X))$
2 RETURN$
2 END LOGISTICSIGMOID()$
2
2 SUM = SIMPS(TORADS(30.0), TORADS(90.0), 0.00001, 2.0$$
2 SIN())$
2 WRITE($$ RESULT, F1)$
2 SUM = SIMPS(0.0, 1.0, 1**-5, 2.0$$ DARCTAN())$
2 WRITE($$ RESULT, F2)$
2 SUM = SIMPS(0.5, 3.0, 1**-5, 2.0$$ LOGISTICSIGMOID())$
2 WRITE($$ RESULT, F3)$
2
2 OUTPUT RESULT(SUM)$
2 FORMAT F1(*SINE INTEGRAL = *,X10.6,W0),
2 F2(*DARCTAN INTEGRAL = *,X10.6,W0),
2 F3(*LOGISTIC INTEGRAL =*,X10.6,W0)$
2 FINISH$
SINE INTEGRAL = .866025I have transcribed the example programs from Section 11 of the compiler reference manual [4] and have them running in my retro-220 emulator [7]. I have prepared a few other programs, including the Simpson's Rule example above, and have them running in the emulator as well.
DARCTAN INTEGRAL = .785398
LOGISTIC INTEGRAL = 2.074510
These examples are available from the BALGOL-Examples directory in the emulator's open-source project repository [8]. Each example includes source code, sample data, and compilation listings both with and without the code generated by the compiler. A few examples also include equivalent programs for purpose of comparison, written in the Algol-60 dialects of the Burroughs B5500 or modern Unisys ClearPath MCP ("E-mode") systems.
The emulator is written in Javascript and runs in a standard web browser. It can be run directly from its hosting site at [7], or you can download the files from the project repository and install it on some other web server. For instructions on running the emulator and the compiler, see the introductory blog post [9] and the wiki pages in the project repository.
BALGOL Compiler Performance
BALGOL was designed from the beginning to be a fast, one-pass compiler that generated reasonably good code. It was able to do small local optimizations within statements, and some rather fancy constant folding, but did not attempt to do more global optimizations, such as common expression factoring.
Even though the 220 was not a very fast machine, both the user-friendly nature of BALGOL and the speed of its compilation process gave it an advantage in environments where lots of relatively small programs were being submitted by lots of relatively unsophisticated programmers. For this reason it was quite popular at universities and research institutions. Stanford, Caltech, Georgia Tech, Cornell, Case (now Case Western Reserve), and Stanford Research Institute all had 220s and used BALGOL.
Various claims about the compiler's performance have been made. In his 2013 anecdote [10] on the use of BALGOL at Sanford in the early 1960s, Robert Braden stated "For typical source programs, the BALGOL compiler ran at line printer speed, 150 LPM [lines per minute], close to the card reader speed of 240 CPM [cards per minute]."
In a guest editorial [11] for the May 1977 issue of IEEE Computer devoted to stack machines, David Bulman talks about Bob Barton and the influence of BALGOL on the design of the Burroughs B5000/5500, and says of BALGOL, "About sixty compiler instructions were executed to generate each machine language instruction..." A little later he states, "The fact that Balgol [sic] could be compiled at the read speed of the card reader on the B220 lent some credibility within Burroughs to Barton and his ideas for a high-level language machine" (i.e., the B5000).
I have been unable to confirm Bulman's claim on instructions executed per instruction generated. I implemented an instruction counter in my retro-220 emulator and recorded counts of instructions executed and instructions generated for a few small and medium-size programs. There are some choices on just which instructions should be included (e.g., compiler load time, which takes about seven seconds and generates no instructions, or library linking, which generates lots of instructions without processing any source code), but excluding the library linking process, I mostly see ratios of 400-500:1, not 60:1.
I have also been unable to confirm either Braden's or Bulman's claims on compiler speed. For blank cards, comments, and cards with with relatively little text (say, a trimmed length up to 20 or so characters), the compiler does seem to operate at or near the line printer speed of 150 LPM. For complex statements and declarations, there is a noticeable pause between lines, often 2-3 seconds. The best average speed I have seen is close to 100 CPM, but that is without generating a compilation listing. A more typical average speed with a listing is 80-90 CPM.
The retro-220 emulator is designed to run programs at the speed of a real 220, using instruction times published in its Operational Characteristics manual [12] and the performance of peripheral devices from its Operating Procedures manual [13]. Emulator performance matches predicted performance on some very small benchmarks where it is easy to analyze the instruction timings, but timing is much more difficult to predict on larger, more complex programs that do significant amounts of I/O, such as the BALGOL compiler.
The emulator uses about 80% of an Intel 3.3GHz i3-2120 core when running under Windows 7 in the Firefox 61.0 browser and compiling a non-trivial program. It uses about 60% of a core when executing a program doing no I/O. Most of that processor use is devoted to browser overhead (garbage collection, rendering updates to the control panels and peripheral device displays, etc.), not to emulating 220 instructions, which requires only about 5% of a core. CPU utilization in Google Chrome 68.0 is about 30% less. In any case, the emulator's performance does not appear to be limited by its host environment.
It is entirely possible that the emulator is not regulating its performance correctly and running slower than a real 220, but there is a contradictory report to consider. In a 1960 letter [14] to the Communications of the ACM, Bob Barton briefly described BALGOL (calling it a "nameless compiler") and stated, "1. Compiling Rate: Averages 500 single-address instructions per minute." In the retro-220 emulator, I am seeing instructions generated at rates of 600-700 per minute, compiling the same programs where cards are being read at only 80-100 CPM. So Barton gives us a report of compiler performance that is lower than that observed in the emulator.
From this, I conclude that the claim of compiling at card-reader or line-printer speed is somewhat overstated. This should take nothing away from the compiler's performance, which for its time was remarkable, especially considering the intrinsic speed of the 220.
Significance and Impact of BALGOL
As mentioned above, BALGOL had a significant impact within Burroughs on the design of the B5000/5500 computer system that followed the 220. Not only did the compiler give the Burroughs design team in Pasadena the confidence that a system could be designed to support fast, one-pass compilers and use solely higher-level languages (the B5000/5500 never had a resident assembler, and its modern descendants still don't), but many of the data structure techniques it used internally were given hardware support. The I/O and diagnostic features of the language also migrated into the Burroughs Algol-60 compilers that followed BALGOL.
One striking result of the influence and experience of BALGOL upon the B5000/5500 is that Burroughs did considerably better in terms of performance of its one-pass Algol-60 compiler for that machine. It could easily compile at card-reader speed -- using a 1400 CPM reader.
Braden [10] describes the impact, under the guidance of George Forsythe, that BALGOL had during his time at Stanford on the growth of what he terms "computer literacy." The language's features and ease of use, along with the compiler's operational efficiency created a highly-productive, quick-turnaround environment that encouraged people to develop programming skills and begin using them in their studies and research.
Braden also mentions some students who were stimulated by the environment and went on to notable careers in computing and computer science. Two of those students, Roger Moore and Larry Breed, along with Braden, succeeded in cloning the BALGOL compiler from the 220 to an IBM 7090 after Stanford acquired that latter machine. The cloned language was referred to as SUBALGOL. It was bootstrapped in stages so that the compiler was eventually written in itself. It also offered the same degree of operational efficiency on the much-faster 7090 as had been enjoyed on the 220.
The next blog post will discuss the significant effort to recover the BALGOL compiler source code and get it into running condition, as well as what I have learned over the course of this project about the compiler's internal operation.
References
[1] "Algol-58 – the International Algebraic Language,"
http://datatron.blogspot.com/2018/07/algol-58-international-algebraic.html
[2] "Preliminary Report—International Algebraic Language," Communications of the ACM, vol. 1 #12, December 1958, pp. 8-22,
http://www.softwarepreservation.org/projects/ALGOL/report/Algol58_preliminary_report_CACM.pdf.
[3] "Burroughs B5000 Conference" 1985-09-06, Charles Babbage Institute,
http://hdl.handle.net/11299/107105.
[4] "BAC-220: Burroughs Algebraic Compiler," Revised Edition, 220-21017, Burroughs Corporation, Detroit, Michigan, March 1963,
https://bitsavers.org/pdf/burroughs/Electrodata/220/220-21017_B220_BALGOL_196303.pdf.
[5] BALGOL Listing, May 1961,
https://bitsavers.org/pdf/burroughs/Electrodata/220/BALGOL_listing_196105.pdf.
[6] BALGOL Listing, March 1962,
http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-1-pdf/k-1-u2196-balgol220compiler.pdf.
[7] Burroughs 220 Emulator Hosting Site, http://www.phkimpel.us/Burroughs-220/.
[8] retro-220 Emulator Project Repository, https://github.com/pkimpel/retro-220/.
[9] "Introducing the retro-220 Emulator,"
http://datatron.blogspot.com/2018/06/introducing-retro-220-emulator.html.
[10] "Burroughs Algol at Stanford University, 1960-1963," Robert Braden, IEEE Annals of the History of Computing, vol.35 #4, October-December 2013, p.69-73,
https://ieeexplore.ieee.org/document/6674030/.
[11] "Stack Computers," David M. Bulman, IEEE Computer, vol.10 #5, May 1977, p.14-16,
https://www.computer.org/csdl/mags/co/1977/05/01646477.pdf.
[12] Operational Characteristics of the Burroughs 220, Burroughs Corporation, Bulletin 5020A, August 1960,
https://bitsavers.org/pdf/burroughs/Electrodata/220/5020A_B220_OperCharac_196008.pdf.
[13] Handbook of Operating Procedures for the Burroughs 220, Burroughs Corporation, Bulletin 5023, November 1969,
https://bitsavers.org/pdf/burroughs/Electrodata/220/5023_B220_OperatingProcedures_19911.pdf.
[14] "Another (Nameless) Compiler for the Burroughs 220," R. S. Barton, Communcations of the ACM, vol.4 #1, January 1961, p.A11, https://dl.acm.org/citation.cfm?doid=366062.366070.
[15] "The Shell Assembler, Part 1,"
http://datatron.blogspot.com/2016/06/the-shell-assembler-part-1.html.
15 comments:
I'm sorry if I'm just missing something obvious, but what assembler was used originally to assemble BALGOL? Was one written specifically for the project? In any case, is there a reference guide/manual for this assembly syntax?
Also -- I really appreciate the work that's being done here. You are doing a great thing by making this old software run again, especially online.
Thanks. There were actually two assemblers used, one for the Generator program (which I suspect was a later development) and one for the compiler, overlay, and library. We know very little about either one, but Tom Sawyer found some correspondence in the Burroughs Archives at the Charles Babbage Institute, dated 15 March 1960, from D. L. Stevens to L. P. Robinson, with subject "Erdwinn's ALGOL Compiler." Based on this memo, it is likely that the assembler used for the compiler and library was "ESAP #1" (Engineering Symbolic Assembly Program #1), originally developed to support the design automation programs for the 3500 project. The 3500 project (not to be confused with the B3500 system) eventually produced what became the B100/200/300-series systems.
We have no documentation, and certainly no source code, for either assembler. I had to reverse-engineer the syntax and semantics for both assemblers from the assembly listings donated by Professor Knuth. I then wrote cross-assemblers in Javascript for each of them. Obviously, these are not full reproductions of the original assemblers, because they are based only on what is revealed in the listings, but they did the job, and I've actually used both of them for some small programming tasks.
A couple of months ago I finally documented what I had deduced of the syntax and semantics for both assemblers. I've added that to the project wiki. I've also had plans for over a year to do a blog post on the recovery of the BALGOL compiler, but just haven't gotten to it yet.
Links the wiki pages follow. They are certainly no great job of technical writing, and you'll need to be fairly familiar with the 220 machine instruction word formats to understand the notation. Instructions for using the assemblers are in their respective wiki pages. There are a few example programs in the /software/examples folder of the project repo.
Assembler for the Generator program:
https://github.com/pkimpel/retro-220/wiki/UsingGENAssembler
Assembler for the compiler and library:
https://github.com/pkimpel/retro-220/wiki/UsingBACAssembler
Software folder on the repo:
https://github.com/pkimpel/retro-220/tree/master/software
Paul.
Once again, this is phenomenal; thank you so much for your work here.
I just have two further questions, if you don't mind.
In "Section A" ("BIG TABLES") there are many pairs of CNST's where one is (a part of) a string, written as a number, and then the second one is an actual $-delimited string constant. My understanding as to why this is done consists of:
1. '3...' seems to mark the beginning of a string in some way.
2. String constants delimited by $ are "terminated", but not "started"; the first $ won't end anything, but the closing $ will end a string.
3. Thus e.g. 34348415941 begins a string constant in memory, then $CTER$ continues it, and terminates the full string "CHARACTER", which would otherwise be too long to fit in a single line/statement; $CHARA$ followed by $CTER$ would represent two separate strings.
My second question is simpler. I was reading Knuth's Art of Computer Programming and noticed that the assembly language he created for the theoretical computer MIX has a way to say "jump to the *next* label X" and also "jump to the *previous* label X", like the assembler in which BALGOL was programmed (but MIXAL uses "B" and "F" as opposed to "-" and "+"). Was this a common feature of assembly languages of the time (1960s)?
Anyway, thank you again for your response, and I hope I'm not bothering you.
Thanks again, and no, you're certainly not bothering me. I'm please that someone is showing some interest in this project.
To address your first question, the purpose of the CNST pseudo-operator was to encode decimal words in memory. The word value could be specified in different ways, usually as a decimal number (with or without sign digit) or as a character string delimited by "$". Note that when the operand was a string, the pseudo always generated an integral number of words, padding the last word with decimal 00 (space) characters as necessary to a multiple of five characters and assigning signs of 2 to each of the words. The signs of 2 are an artifact of the way the teletype equipment worked.
What you are seeing in the table DICT is a dictionary of words used to compose compiler error messages. This was done by the routine WEM at address 1682. For example, consider the call on WEM at address 1746:
FULL STP WEMX
BUN WEM,EX0
CNST 33941430000 COMPILER CAPACITY EXCEEDED
The digit pairs after the sign digit (39, 41, 43) are word offsets into the DICT table, with the list terminated by the first 00 code. In the DICT table, each variable-length entry starts with a word having a sign of 3, with any additional words needed for that entry having a sign of 2. The sign-3 word of the next entry serves as a stopper for the entry. Thus, it's not that a sign-3 starts a string in the assembler, it starts an entry in that particular table of the BALGOL compiler.
The reason for the weird presentation with mixed numeric and string words in the table is that CNST could not encode a $-delimited string operand with a sign other than 2. Therefore, in the first word of a DICT entry, the characters had to be coded as a numeric word using their two-digit decimal equivalents.
For your second question, I have not seen the reusable "point" labels (as they were often called) used outside of Burroughs assemblers from the 1960s, except, of course, for MIX. The Shell assembler for the 205 had them, Knuth's EASY and MEASY assemblers for the 205 had them, the two assemblers I reverse-engineered for 220 BALGOL had them, at least one assembler for the B100/200/300 had them, and the B3500 assembler had them. Curiously, point labels were not a feature of the assemblers Burroughs provided to customers for the 205 (STAR*0) and the 220 (STAR*1). It's quite possible that other assemblers had this feature, and I simply don't know about it.
There is a note on page 2 in the first volume of the 1957 manual for the 205 Shell assembler that is the only source for the origin of point labels I've found:
The idea of the program point address is apparently that
of Melvin Conway of the Case Institute of Technology [now
Case Western Reserve University]. It was incorporated into
SOAP III, an assembler for the IBM 650, by Donald E. Knuth,
also of Case Institute. See Case Institute Report 1005.
https://www.dropbox.com/s/gnsdrwors4j3mnf/Shell-Symbolic-Assembly-Program-for-the-Burroughs-205-Part-I.pdf?dl=0
Thank you for the explanation.
Here's some more stuff I found on the "program points":
First, I guess I was being silly---Knuth actually gives a brief source in the same section where he introduces them (calling them "local symbols", which is a decidedly higher-level term): "The idea of local symbols was introduced by M. E. Conway 1958, in connection with an assembly program for the UNIVAC I."
Of course, this is essentially the same as the note you cite. Looking into it further, I found this quote from a paper whose full text I have not been able to find (the code won't look right, sorry):
One of the more unpleasant experiences encountered in the work on the compiler ["Compiler II-SOAP II"] was the discovery that SOAP II was unable to assemble the entire compiler owing to the symbol table becoming filled up at an early stage of the assembly process. The solution to the problem was obvious but not very satisfactory.
As a matter of fact we did modify SOAP II to dump the symbol table and then reload it again in modified form, but we abandoned this philosophy as not being a worthwhile solution to the problem. Therefore, Mr. Knuth suggested that he write a new symbolic assembly program with some new features incorporated in it. Accordingly, SOAP III (later renamed CASE-SOAP III due to some rather peculiar complaints from a large corporation) was written. CASE-SOAP-III solved the symbol-table difficulty by introducing a fairly new idea--the program point. Program points are addresses which the programmer needs to introduce in order to cause the machine to function properly but which have no mnemonic value to the functioning of the program. For example, one may frequently encounter the use of addresses such as NEXT and NXT, etc., which are included solely to link the program together but which have no significance at all in the logical structure of the problem. The program point was introduced just to solve the problem of naming these "random" addresses. The program point uses no room at all in the symbol table and is continuously redefinable by the simple expedient of using the same point over again. As an example consider the following segment of coding:
RAU ROO02
BMI 1F
STL PO005
RSL ONE6 1F
1 ALO PCON
In both cases of the use of "1F" in the instruction address position, the
meaning is to use the address of program point ''1" forward. Now look at the
next example (which, by the way, does nothing):
2 RAU ROO01 1F
1 SUP ONES
NZE 1F 2B
1 RAL 6 1B
Note that the use of "1F" still refers to the forward program point, while "1B" refers to the correspondingly numbered backward program point. The use of "6" in the data address of the last instruction refers to the address of that particular line of coding (i.e., the line "1 RAL 6 1B"). Using CASE-SOAP III to write our next compiler revealed that there was an economy of approximately 50 per cent in the number of symbols used in CASE-SOAP III over the equivalent program written in SOAP II. Since we are looking forward to acquiring some day some of the optional attachments for our 650, Mr. Knuth incorporated additional pseudo-operations into CASE-SOAP III which allow one to effectively write programs for the augmented machine.
(From here: http://hopl.info/showlanguage.prx?exp=6505 .
The paper is "Current Developments In Computer Programming Techniques" by
Way, F. III. (1958))
Here is the document to which the Shell assembler manual refers:
https://www.computerhistory.org/collections/catalog/102784981
And the listing of that program:
https://www.computerhistory.org/collections/catalog/102724580
A comment on the BALGOL IO facility: I had it from Bill Price (who worked on the B5000/B5500 with Dave Dahm and others) that the Algol I/O facility was known internally as "Knuthput", and had been designed largely by Don Knuth.
I wish I'd asked Dave about this, but he is gone now, and in all the years I knew him I never realized that he had worked on 220 programming, instead thinking he had been brought in on the B5000.
It would be easy to miss Dahm's 220 work since his "official biography" from 1986 (when he was named a Burroughs Fellow) starts off "... since joining Burroughs in 1962 …)
However, among the Burroughs legal papers at CBI, of all places, there is an early listing of the BALGOL compiler and it contains this note: "Another paper concerning this compiler has been accepted for the ACM symposium which will be held on September 23, 1959, at the Huntington-Sheraton Hotel in Pasadena, California. It is entitled “On the Application of Self-Organizing Memory Techniques to Automatic Programming” by Joel D. Erdwinn, David M. Dahm, and George W. Logemann."
This might be an interesting sidelight in the development of virtual memory but I have not yet tried to find the paper.
Dahm also did some significant work on the Burroughs 205. He is listed as one of the authors of the Shell Assembler, developed at Shell Development Research in Houston, and discussed elsewhere in this blog. It's interesting to note that he was still a student when working on both that assembler and the BALGOL compiler.
As to "Knuthput", I have never heard that term, and assume you are referring to Algol I/O for the B5000. Another possibility would be BALGOL I/O for the 220, but we know that BALGOL was originally released in March 1960 and that Knuth wrote his compiler shortly after that, during the summer of 1960, so it's unlikely he had much influence on the original design of BALGOL. I/O for Knuth's BAC-205 Algol-58 compiler for the 205 closely resembles that of BALGOL for the 220 -- in fact the entire compiler appears to be a large subset of 220 BALGOL, so I think he was following the 220 BALGOL design.
Regardless, I would not be surprised if Knuth had at least significant input into the design of Algol I/O for the B5000/5500. He stayed at Caltech after receiving his PhD and continued to consult with Burroughs until he moved to Stanford in 1968.
I was at Pasadena (actually employed by Proctor) from 1972 until it closed, working on Medium Systems. After that I was at Lake Forest for a while until it closed, then at MV until it closed, and finally at Irvine until I retired a last month.
In mid 1973 three people showed up at Pasadena from Large Systems: Larry Hanson, Bill Price, and a third person who's name I can't remember at the moment, other than he had a Doctor with his name. Their story was that they had basically been thrown out of the LS plant (then Proctor) and were coming to roost at Pasadena. All three of them were B6500/B6700 hardware designers. Larry had done the math and a good bit of the rest of the processor. He had also implemented the Vector Mode hack, that then got sold by marketing while it was still just an experiment to show what was possible, and thus became a somewhat lame and quite hated (internally) part of the processor until it was removed from the E-Mode spec around revision Beta (or maybe even Alpha, I forget now).
I seem to recall that Bill had worked on IO hardware, as well as writing software. The third person had also been a hardware architect. I didn't know him as well as Larry and Bill, and am not sure exactly what he worked on.
Bill had worked on the B6700 Algol compiler and the MCP, as well as writing utilities. A bit of legacy remaining in the MCP is the "How to make a sow's ear out of a silk purse" comment at the front of Logical IO. The story goes that he made a disk test that could do IO considerably better than the MCP could, and he also hacked in enough stuff to make it a mini-MCP capable of running some user programs. This did not endear him to the owners of the MCP, and it was suggested to him that he would be better off elsewhere.
Bill of course was also around in B5000 days, though I didn't find out about that until years later. For some reason the people that had worked on the B5000/B5500 and were still around in the 1970s really didn't talk much about those times. While Bill was the first to tell me that the IO implementation in the B6700 ALGOL was known as Knuthput, it was a well-known term by other old-timers at Pasadena and MV that had been around in the B5000 days. I'm pretty sure I recall seeing it in various listings and internal documents back in the 1970s.
While Bill described B6500 ALGOL IO as Knuthput, the formatting can be tracked back to B5500 ALGOL pretty obviously. While I admit to knowing virtually nothing about BALGOL, perusing the pages on it here looks an awful lot like what I've been using for years on E-Mode machines, so I suspect a rather direct history (in concepts, not actual code base) from BALGOL to B5500 ALGOL to B6500 ALGOL.
BTW, another curious antique term was "Kunkerform". This was the original name of the file name storage format used on the B6500. You won't find it in any current documentation nor code that I know of, but I think I've seen it in some early documents available on Bitsavers. I never knew the origin of that name, but suspect it was named after some programmer that came up with it.
This is becoming far too interesting an exchange on the B5x00/B6700 for it to be relegated to the comments for a blog devoted to an entirely different (if somewhat historically-related) architecture.
When we talk about I/O on the B6500/B6700 and later systems, it has three major aspects: Physical I/O (driving the hardware and interacting with the task scheduling mechanism), Logical I/O (user-level I/O, working on top of Physical I/O), and Formatted I/O (FORMAT declarations and read/write lists, working on top of Logical I/O). So I'd be interested in knowing just which aspect(s) of overall I/O you think Knuthput related to.
I doubt that Knuth had much direct influence on the detailed design and implementation of the B6500 -- hardware or software -- as he left Caltech for Stanford in 1968. I wouldn't doubt that he had a great deal of influence on the architecture and early design of that system while he was still in Pasadena, however.
If you started with Burroughs in 1972 and spent the early years in Pasadena, you probably missed the first iteration of all three aspects of I/O on the B6500. It truly was a sow's ear. The design of B6500 Logical I/O was an ambitious advance over that for the B5500, with a somewhat object-oriented approach using "file attributes" to get and set the properties of logical files. I think the initial implementation started as simply an extension of B5500 Logical I/O, but with all of the additional features, the internal complexity quickly got out of hand. The API the user programs had to use was complex as well, and had very poor separation of concerns. As a result, reliability was poor and performance was worse. You could not write a simple Algol program that read a tape file and did nothing with the data, and not have the capstan pinch rollers disengage between every tape block.
Physical and Formatted I/O also had their performance and reliability problems. As more systems started to be delivered and customers started to do serious work with them, the situation became critical. I was the software tech on B6500 #109 from late 1970 through 1972, so I watched all of this happening, and suffered under it.
The solution was a complete rewrite of all three aspects of I/O, implementing simpler APIs, more straight-line code, and taking some brilliant advantage of the efficiencies inherent in direct stack addressing. This introduced the quite incredible FIBSTACK mechanism, which was later generalized into the user-level server library mechanism. I have always wondered who came up with that idea. I suspect it may have been Dave Dahm and/or Roy Guck, as they took a very similar approach in the design of addressing with SIBs and data-base stacks for DMSII.
I am pretty sure that the I/O redesign was released in Mark II.1 (ca. 1971-72), although the improvements to Formatted I/O may have been spread out over the next couple of releases. I can still remember the day we installed that release on the customer's B6500 and our amazement that Library/Maintenance could dump and load files with none of the click-clack from the pinch rollers we had become so accustomed to -- the tape just moved continuously.
--- continued ---
--- continued ---
If Knuthput is Algol array-row I/O, then I can certainly believe that Knuth had at least significant input into its design and implementation on the B5500. Array-row I/O (also known as the I/O Intrinsics) was implemented as part of the complete MCP rewrite for the B5500 to support the Head-per-Track disk.
Before that, you had to use Formatted I/O, or if you were really brave, RELEASE statements, which manipulated I/O buffers directly and offered access to the buffers only as parameters to Stream Procedures. They allowed asynchronous I/O based on the Program Release (PRL, 0111) instruction, which is what a RELEASE statement resolved to, the Continuity and Presence Bits in the data descriptor for the buffer, and the Control State I/O Release (IOR, 2111) instruction used by the MCP. Since this gave programs direct accesss to the buffers, any blocking/deblocking was of course the responsibility of the program.
RELEASE statements were not easy to use, and array-row I/O was obviously an answer to that, yielding a much more straightforward way to read and write records, as well as supporting blocking of data. I suspect that the development of Direct I/O for the B6500 was an attempt (and a successful one) to give users the benefits of RELEASE statements with a much more humane API.
Your reading of the influence of 220 BALGOL on B5000/B5500 Algol I/O, on Extended Algol generally, and on the B5000 architecture is correct. It had a tremendous influence. In his oral biography, Robert Braden said "In a sense, they built BALGOL (ALGOL-60, actually) into the hardware..." [https://conservancy.umn.edu/handle/11299/172263, p. 17]. David Bulman, in his 1977 introduction to the issue of IEEE Computer on stack machines, also says that BALGOL had a strong influence on the design of the B5000 [https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1646477].
Finally, "Kunkerform" is named after Don Kunker, who worked on software for the B5500 and then later on the MCP for the B6500/B6700, particularly on Physical I/O. Kunkerform is now known as StandardForm, and is a way of unabiguously representing multi-level file names internally in the software. I knew Don slightly during the early/mid-1970s while working in the Benchmark Facility at Great Valley Labs (near the Burroughs Research Center in Paoli, Pennsylvania). He would fly out occasionally when we were having I/O problems, often arriving on a red-eye, carrying a briefcase containing a small reel of magnetic tape, a small deck of cards, the latest issue of National Geographic, and no other luggage. He was that confident of going home that evening, and he did.
Thank you for your most interesting and informative replies!
Yes, I missed much of the early Large Systems world. I really wasn't that interested in it, I was a Medium Systems guy at heart, and of course LS had moved to Proctor and then to MV while I was working in Pasadena. I was hired as a tech writer out of Proctor, because the guy hiring me (whose name I can no longer remember, my memory started going south about ten years ago) was desperate to get some people to write manuals for LS, but I managed to convince his boss during the intiial interview that Pasadena needed me worse for that. So I worked in TIO in Pasadena under Bob Loomis while being paid out of Proctor, and spent more time than most employees traveling back and forth between the plants.
I spent a year or a little over writing the Basic manual from scratch, completely rewriting the Assembler manual, doing a revision on the Cobol manual, a few tech notes, and a couple of minor things I can no longer recall. I also spent a lot of time looking at customer UCFs and the dumps that came in with them. That wasn't my job, but TIO was the customer interface department, and all that stuff was stored in the office next to mine, and I was interested in how the system worked internally, and this gave me a chance to actually read and understand the MCP listing. In the field I'd only been able to reverse engineer what it probably looked like from dumps and an object listing. I ended up writing patches to fix quite a few of the problems that came in and sending them downstairs for blessing, which caused consternation, ruffled feathers, and got a lot of the software managers to know my name and recognize my face. (Pasadena was different than LS in how patches were done. We issued object patches to the field in a series of ?PATCH cards to fix problems in software in their hands, so they didn't have to wait till the next official software release, even though in those days they came out every 30 days or so.)
After about a year I transferred downstairs to the MCP department and redesigned and rewrote tape label handling as an initial project. Then I spent a while doing cleanup things behind other programmers. I still recall coming up with a 600 line object patch to fix someone else's new feature that wasn't implemented as well as it should have been. (The usual patch was 4-12 lines and issued on a Night Letter for the customers to type in themselves. I forget how we managed this monster; I think maybe we would overnight tapes with the patch deck to the customers.) I spend the rest of my time in Pasadena in the MCP department doing various thing I found interesting, including doing the initial software debug on every new processor Pasadena made over those years, largely I think because nobody else in Software really wanted to deal with the hardware people or the pain of debugging a new machine.
Getting back to the subject of Algol: It was my understanding, or at least belief, that Knuthput referred to Algol formatted IO. Specifically the WRITE statement that used formatted IO. Since I don't believe Knuth was still around when the B6500 was being implemented, this had to have been invented in the B5000 timeframe (or earlier) and carried over to the B6500 ALGOL language. The first LS IO subsystem I saw was FIBSTACK. I still have a bound listing around here somewhere from about 1974 that I'd been presented by some LS manager during a series of meetings where he tried to convince me how much superior the LS IO implementation was than the MS implementation (which he was completely unfamiliar with). Since like most LS types he had absolutely no understanding or conception of the MS hardware architecture, he couldn't understand why we insistent on using our own implementation of the MCP instead of using their "superior" version. Dealing with LS plant people could be frustrating at times.
I have added a link to your page at Wikipedia.
Also the link to the BALGOL Manual has changed. It is now:
http://bitsavers.org/pdf/burroughs/Electrodata/220/220-21017_B220_BALGOL_196303.pdf
Comment by Ian Joyner (The 'comment as' field is showing something different).
Wikipedia ALGOL 58 page: https://en.wikipedia.org/wiki/ALGOL_58
Ian (I still don't know why its showing sdgsfgd).
I have also recently extensively revised the NEWP entry at Wikipedia.
https://en.wikipedia.org/wiki/NEWP
Perhaps those more familiar with NEWP can check for correctness, particular my memory of the memory array M.
Ian
Post a Comment