Welcome to the Datatron 205 and 220 Blog

This blog is a companion to T J Sawyer's Web Page that outlines the history of the Burroughs Datatron 205 and Paul Kimpel's incredible 205 and 220 emulators. Please visit those sites and return here to post any comments.

Wednesday, August 8, 2018

BALGOL – The Burroughs Algebraic Compiler

The Burroughs Algebraic Compiler for the 220 computer system was an early dialect of the Algol-58 language. Within Burroughs, it was styled as BAC-220, but was more popularly known as Burroughs Algol, or BALGOL.

In the prior post [1], I briefly summarized the movement in the 1950s toward the development of higher-level programming languages, the events leading up to the development of Algol, and an overview of the preliminary version of Algol, now known as Algol-58.

This post describes the origin of the Burroughs compiler, compares the BALGOL language features to those of the Algol-58 Preliminary Report [2], and briefly discusses compiler performance and influence.

Origin of BALGOL


In 1954, Robert S. ("Bob") Barton took a job at Shell Development Company Research in Houston, Texas, working on programming applications. By 1957, he had a talented group of young programmers working with him, including Joel Erdwinn, Clark Oliphint, C. M. Pearcy, and (as a summer employee) David Dahm.

This young gang of four wrote quite an advanced assembler for the ElectroData/Burroughs 205. The assembler was tape-based, but included a macro facility (including maintenance of a macro library on tape), special support for addressing the 205's high-speed memory loops, warnings for invalid adjacent instruction parings, and most usefully, a pseudo-operator that would take COBOL PICTURE-like strings and generate the obtuse 319-digit patterns that are needed for Cardatron format bands.This became known as the Shell Assembler [15].

In 1959, Barton left Shell to work for the ElectroData division of Burroughs in Pasadena, California. Erdwinn, Oliphint, and Dahm (still as a summer employee) followed him to Pasadena. They became known as the "arthropods," because, well, they came from Shell. Not long after, Barton became the head of the so-called Automatic Programming Group in Pasadena, which was developing compilers and other programming aids.

By this time, the Burroughs 220 had been available to customers for about a year. While the 220 was a decimal, vacuum-tube machine, it had good (if somewhat slow) facilities for scientific and numerical computation. According to a memorandum Tom Sawyer has uncovered in the Burroughs archives at CBI, dated 12 November 1958, from R. A. Mallet of the Scientific Compiler Project, Burroughs had been working on a FORTRAN compiler with the goal that "the Burroughs 220 would be able to use any program written, in FORTRAN, for the IBM 650, 704, 709, and 7070..." The memo goes on to request a change in the way that arithmetic overflow was originally handled by the 220.

When Barton became head of Automatic Programming, he apparently assumed responsibility for this project, and according him in the 1985 B5000 Conference oral history [3, page 50], "It's correct to a certain extent in that the job that I had taken, under generally misleading conditions, called for doing an impossible FORTRAN which would also include conversion of assembly language from the 7090, or whatever the machine was at the time, automatically. I knew it couldn't be done, but that was my responsibility."

Somewhere in this time frame, the focus for the "scientific compiler" shifted from FORTRAN to Algol-58, the Preliminary Report [2] for which had been published in December 1958. It is not clear whether this happened at Barton's instigation or the young team took this direction on their own, but Dahm tells this amusing – and telling – anecdote from the B5000 Conference [3, page 27]:
I remember when I was a summer employee in the summer of 1959, I was working on the [BALGOL] compiler with another fellow named Joe Erdwin [sic], who was the project leader. And we were busily trying to do our version of IAL, and one day Bob Barton came along and he had a FORTRAN manual in his hand. It was a really nice, well-done FORTRAN manual done by IBM. He said, "Why don't you guys do this language because then you wouldn't have to write a manual?" We rejected that as being not a very good reason for picking a language. [Laughter] So, basically, I would say that the decision, that the compiler we would do would be ALGOL as opposed to FORTRAN was made by a summer employee and a project leader. I don't know that anyone else was really involved in making that decision.
One further clue as to the shift in focus is Barton's statement immediately following his complaint over the "impossible FORTRAN" [3, page 50]:
Erdwin [sic] would never have done a FORTRAN. I mean, he'd been going through this kind of educational experience at Shell, and he was not the sort of guy that would waste his time doing FORTRAN. He knew too much about language.
So it appears the decision to do an Algol-58 compiler was Erdwinn's, but probably with Barton's active approval and support. Barton gives Erdwinn full credit for the design of the BALGOL compiler, saying it was "a brilliant job of programming" and "Erdwin's [sic] masterpiece" [3, page 49].

Over the short term, it would probably have been better for Burroughs to have gone forward with a FORTRAN compiler, albeit one with more realistic goals. Over the longer term, though, the choice to go with Algol-58, plus the experience gained from building BALGOL and having customers use it, helped set Burroughs on the path to the B5000/5500, B6500/6700/7700, and later related systems that continue to be produced and sold today as Unisys ClearPath MCP systems.

According to the BALGOL reference manual [4], the compiler was completed and first installed for customer use in March 1960. We have two scans of listings of the compiler, one dated 1 May 1961 [5], citing in the heading [Joel] Erdwinn, [Jack] Merner, [Fran] Crowder, [Joe] Speroni, and [Donald] Knuth as authors. The second listing, dated 1 February 1962 [6], adds the names [Dave] Dahm, [Clark] Oliphint, Lodgemann, and [Toni] Schuman.

Erdwinn left Burroughs shortly after the BALGOL compiler was finished to join fast-growing Computer Sciences Corporation (CSC, now DXC Technologies). He spent the bulk of the next two decades building compilers and system software for CSC's customers, particularly compilers for none other than... FORTRAN.

Merner, Oliphint, and Dahm continued with Burroughs and, with Barton, had major roles in the development of the B5000/5500. Dahm is also known for his work on the B5500 Timesharing MCP and CANDE, and on the DMSII data base management system for the B6700/7700.

BALGOL vs. Algol-58


BALGOL for the Burroughs 220 is one of several early implementations of the Algol-58 language described in the 1958 Preliminary Report [2]. Other well-known implementations are System Development Corporation's JOVIAL, the U.S. Navy's NELIAC, and the University of Michigan's MAD. The Preliminary Report was a progress report on the current state of a committee's attempt to develop a universal algorithmic language, not a fully-vetted specification. As such, and given the limited capabilities of computer systems at the time, all implementations based on the Report deviated from it, often significantly.

Of the Algol-58 variants, BALGOL is arguably the one that most closely followed the syntax and semantics put forth in the Preliminary Report. The BALGOL compiler is a single-pass design, intended for fast compilation at the expense of highly-optimized code, and oriented to a compile-and-go job shop environment.

The single-pass design imposes some restrictions on the dialect of Algol-58 BALGOL can support, and on the ordering of some language elements in the program text, as is discussed in the following sections. It drops some features of Algol-58, and implements others in slightly different ways, but adds little except for pragmatic facilities such as FORTRAN-like input/output, an open subroutine, program segmentation and overlay facilities, some diagnostic facilities, and the ability to incorporate machine-language routines into the compiled object code.

BALGOL's reference manual even has a detailed set of transliteration rules [4, Appendix E], as called for in the Preliminary Report, describing the differences between the Report and BALGOL's implementation. Going the Preliminary Report one better, the manual has a complete syntax definition for BALGOL in Backus-Naur Form [4, Appendix D].

The remainder of this section discusses the differences between BALGOL and Algol-58 in detail. It follows the overview of Algol-58 features in the prior post [1].

Symbology


Where the Algol-58 reference language uses a 63-character set (not counting upper- and lower-case equivalents), Burroughs chose to stick with the 48-character set available with IBM punched-card equipment, which most sites used through the 220's Cardatron subsystem.

Rather than attempt some sort of delimiter scheme to support the word-symbols of Algol-58 (begin, end, if, for, etc.), in BALGOL these symbols are considered to be reserved words. Spaces are significant and must be used as delimiters between reserved words and identifiers.
  • The 26 English letters: A-Z [upper-case only]
  • The decimal digits: 0-9
  • Arithmetic operators: +     .   /   *
  • Relational operators: LSS   LEQ   EQL   GEQ   GTR   NEQ
  • Logical operators: NOT   OR   AND   EQIV [equivalence]   IMPL [implication]
  • Sequential operators: GO,   GO TO,   RETURN,   STOP,   FOR,   IF,   OR,   EITHER IF,   OR IF,   SWITCH,   UNTIL,   OTHERWISE,   OVERLAY
  • Separators: ,   ..   $   =   **   BEGIN   END
  • Brackets: (   )
  • Declarators: INTEGER,   BOOLEAN,   FLOATING,   REAL,   ARRAY,   FUNCTION,   COMMENT,   PROCEDURE,   SUBROUTINE,   SEGMENT,    FINISH,   INPUT,   OUTPUT,   FORMAT,   MONITOR,   TRACE,   DUMP,   EXTERNAL STATEMENT,   EXTERNAL PROCEDURE
Compare this to the symbology described for Algol-58 in the prior post [1].

The IBM Type 407 tabulator used as a printer with the Cardatron had interchangeable type wheels. Two common wheel sets were "A" (commercial) and "F" (FORTRAN). Four character glyphs differed between the two, with the commercial set shown on the left below and FORTRAN on the right:
#   =
&   +
%   (
¤   )
The internal character codes for these pairs were the same; they simply printed differently on different 407s. In addition, the 407 does not support the semicolon (;), so BALGOL uses the dollar sign ($) in its place.

Aside from these substitutions, BALGOL adopted the following changes in symbology from the Algol-58 Preliminary Report:
  • The period (.) replaces the small cross (×) for multiplication.
  • The asterisk (*) replaces the up/down arrow brackets () for exponentiation.
  • Mnemonic equivalents replace the relational and logical operators, as shown above.
  • Two adjacent periods (..) replace the colon (:).
  • The equal sign (=) replaces the assignment operator (:=). EQL is used as the relational operator for equality.
  • Two adjacent asterisks (**) replace the subscripted (10) to delimit the scale factor in numeric literals for scientific notation.
  • Square brackets, [ ], are not supported by the 407, so parentheses are used to enclose array subscripts.
There are a number of additional declarators and sequential operators (statements), which will be discussed separately below. The complete list of reserved words for BALGOL is in Appendix C of the reference manual [4]. The identifiers for the routines in the compiler library are also considered to be reserved. The standard set of library routines is listed in Appendix G of the reference manual, although the library (and hence the set of reserved identifiers) could be revised during the compiler generation process.

Data Types and Declarations


BALGOL supports only INTEGER, REAL (FLOATING), and BOOLEAN data types. Unlike some other dialects of Algol-58 (e.g., JOVIAL), there is no provision for records, strings, or other composite types.

Literals


Literals are specified the same way as in Algol-58, except that numbers have the general form I.F**±E, where I is a whole integer, .F is a decimal fraction, and **±E is a scaling factor as a power of ten. The fraction and scale factor are optional, as is the sign preceding the exponent of the scaling factor. Unlike Algol-58, a number cannot be expressed as simply a scaling factor without at least an integer prefix to it. In addition, a literal cannot begin or end with a period for the decimal point (.), as this could be confused with the use of the period as a multiply operator. Boolean literals continue to be represented by 0 and 1 for false and true, respectively.

Identifiers and Types


As with Algol-58, BALGOL assumes all identifiers refer to REAL variables by default. A type declaration can specify an identifier to be of type INTEGER or BOOLEAN. Array identifiers within type declaration lists need not be followed by empty subscript lists indicating dimensionality – that is determined by the ARRAY declaration. The parentheses surrounding the entire list of identifiers in a type declaration are also optional in BALGOL.

Identifiers must start with a letter, which may be followed by letters and digits, up to a total length of 50 characters. An identifier may not contain embedded spaces or special characters.

BALGOL also provides a generic type specifier termed a prefix. A type declaration can include an identifier followed by an ellipsis (three adjacent periods, ...). Any identifier not otherwise covered by a type declaration that begins with the same set of characters automatically receives the specified type. Multiple prefixes can be specified with overlapping subsets of the same leading characters. Unless an identifier is fully stated in a type declaration, the longest prefix matching an identifier determines its type. An OTHERWISE clause may specify a globally-default type.
INTEGER I..., J..., K..., L..., M..., N...$
BOOLEAN KEEP, LAST$
REAL MAT..., JOULES$
REAL OTHERWISE$
With these declarations, the identifier MATRIX would be of type REAL, not INTEGER. Because BALGOL is a one-pass compiler, type declarations must precede first use of an identifier in an expression or array declaration.

Array Declarations


Array declarations differ from those in Algol-58. The most significant difference is that custom bounds are not supported. Each dimension of an array is declared simply as its length. All indexing is one-relative. Only literal lengths may be specified.

As an extension, BALGOL permits an array declaration to specify initial values for the array. If the dimension list is followed by an equal sign (=) and a list of literal values in parentheses, the compiler will fill the array with those values at the time the program is loaded. For arrays of multiple dimensions, a linear list of values is specified. These are assigned to the array elements in row-major order, starting with the last dimension. If the list of values is shorter that the total number of elements, the remaining elements are initialized to zero.
ARRAY A(9), B(4,12), C(10, 10, 10)$
ARRAY MONTHDAYS(12) = (31,28,31,30,31,30,31,31,30,31,30,31)$

Function Declarations


BALGOL supports in-line function declarations like Algol-58, but with a slightly different syntax. The function identifier must be preceded by the reserved word FUNCTION, and the parameter list delimited from the function body by an equal sign (=). Use of an in-line function is otherwise the same.
FUNCTION TORADS(DEGREES) = DEGREES . 3.1415926/180$
FUNCTION PYTHAGOREAN(A, B) = SQRT(A*2 + B*2)$

Diagnostic Declarations 


BALGOL supports three declarations for its run-time diagnostic facilities:
  • MONITOR specifies a list of variable identifiers and labels. At run time, whenever one of the named variables is modified, or any variable within the scope of a named label is modified, the first five characters of that variable's identifier are printed along with the variable's new value.
  • DUMP specifies a list of variables or labels. When a dump is triggered, the abbreviated names and the values of the variables are printed. Labels are printed with the count of the number of times they have been passed. A dump is triggered by a TRACE declaration.
  • TRACE specifies a list of label identifiers, each optionally followed by a literal integer in parentheses. If an integer (n) is specified, then every n-th time that label is passed in the program, all DUMP declarations in the program (or if execution is within a procedure, all DUMP declarations within that procedure) are triggered, and the values in their variable lists are dumped. If a label in a TRACE list is not followed by an integer, then the corresponding DUMP declarations are triggered each time the label is passed, as if (1) had been specified.
MONITOR X, Y, Z$
DUMP A, B, D$
TRACE AGAIN, INNERLOOP(100)$

Segmentation and Overlay


BALGOL supports multi-level segmentation of the object code for a program. A segment is simply a compound statement preceded by the reserved word SEGMENT and an identifier that labels the segment. Segments may be nested. The code outside any segment declaration is considered to be a global segment, and is permanently resident while the program is running.

A segment is called into memory using the OVERLAY statement, naming the segment label. Note that this statement does not transfer control to the segment, it merely loads it into memory. A separate GO TO statement must be used to enter the segment, and the segment label cannot be used for this purpose. Typically, the first statement inside the segment's compound statement is given a label, and a manual transfer is made to that label. Similarly, exit from the segment is not automatic, and an explicit transfer of control must be coded to return to the code for some containing segment.

Section 9 of the reference manual [4] gives the rules for overlaying segments and transferring into and out of them.

FINISH Declaration


Every BALGOL program must have a FINISH declaration at its end. This signals to the compiler the end of the BALGOL text and the start of the process to link in any library routines or external machine-language programs. The "$" following FINISH is required.

While the reference manual classifies this as a declaration, it is actually an executable statement, as it causes the compiler to emit a halt instruction with 9669 00 9669 ("XX") displayed in the C register. Pressing START on the Control Console at this point will execute a card or paper-tape read (depending on how the compiler was generated), with the intent of booting the next job in sequence.
FINISH$

Other Declarations


Comment declarations are the same as in Algol-58. After the reserved word COMMENT all text is ignored until the next "$".

The switch declaration of Algol-58 is not supported by BALGOL. Instead there is a more primitive SWITCH statement, as described below.

There are additional declarations to support formatted input and output of data. These are discussed in the section on input-output below.

Expressions


BALGOL follows the scheme of Algol-58 expressions closely, but more completely defines the resulting type of an expression, and adds a few twists.

Arithmetic Expressions


Arithmetic expressions are the same as in Algol-58, with the following exceptions:
  • Array identifiers are indexed using a list of subscripts in parentheses rather than square brackets.
  • Different characters are used to specify multiplication and exponentiation, as described above under "Symbology."
  • Expressions are evaluated left to right, except that default precedence is defined to be exponentiation first, then multiplication, then division, then addition/subtraction. Parentheses, as usual, can override the default precedence.
  • In certain cases, the multiply operator (.) may be omitted, as in standard mathematical notation. The specific rules are stated in Section 3 of the reference manual [4], but basically the operator may be omitted:
    • between a closing parenthesis on the left and an opening parenthesis on the right
    • between a simple variable or literal and a parenthesis on either side
    • between a closing parenthesis on the left and a subscripted or function variable on the right
    • between a literal on the left and a simple, subscripted, or function variable on the right.
     See the examples below.
  • The type of an arithmetic expression is integral if all of its components are integral and real otherwise. Note that inner expressions may be evaluated as integers while the overall expression may result in being of type real. The compiler will coerce literals and some constant expressions to the appropriate destination type at compile time where possible.
  • The integer precision on the 220 is ten decimal digits plus sign, so all integer operations are effectively truncated on the high end to ten digits, e.g., 5000000001 + 5000000002 = 3. The floating-point precision is eight digits.
Note in the second example below the use of implied multiplication with GAMMA. Note in the fourth example not only the implied multiplication of 2 and SQRT(S), but also that the expression is interpreted as (SIGN(A(R,L))) / (2SQRT(S)) since multiplication has a higher precedence than division.
ALPHA + BETA / GAMMA
(ALPHA + BETA) GAMMA
A*2/2*(B-1)
SIGN(A(R,L))/2SQRT(S)

Boolean Expressions


Boolean expressions operate in BALGOL as in Algol-58 with a few enhancements:
  • Parentheses around an arithmetic relation are not required.
  • Array subscripting is the same as for arithmetic expressions.
  • The Algol-58 relational symbols are replaced by mnemonic operators, as described under "Symbology" above.
  • If a relation is between operations of different types, the compiler will coerce the integer operand to floating point. If the integer operand is a literal, this coercion will occur at compile time.
  •  BALGOL adds a Boolean operator for implication (IMPL). A IMPL B evaluates to true unless A is true and B is false.
  • Evaluation of Boolean expressions is from left to right with this operator precedence: NOT > AND > OR > IMPL > EQIV.
(ALPHA GEQ BETA) OR (GAMMA NEQ EPSILON) EQIV NOT OVERRIDE

Statements


The concept of statements in BALGOL is essentially the same as that in Algol-58. The "$" is used to separate statements, not terminate them. The end of a line or record of program text has no syntactic significance. Statements may be split across lines and multiple statements may be grouped on a line.

Since the BALGOL character set does not include the colon (:), statement labels are indicated by two adjacent periods (..). Labels may be either identifiers or literal integers. Leading zeroes in a numeric label are not significant.

Compound statements are identical to those in Algol-58, except that the terminating END of a compound statement may be labeled to provide an exit point. An END may not be immediately preceded by a "$". If the BEGIN of a compound statement is labeled, the label may optionally be repeated after the corresponding END. Curiously, left and right parentheses can be substituted for BEGIN and END, respectively.

As with Algol-58, declarations may be intermixed with statements. Since BALGOL is a single-pass compiler, however, declarations for an identifier after the identifier has been first referenced (and its type determined at that point) are ignored.

The do macro statement of Algol-58 is not supported by BALGOL.

Assignment Statements


The BALGOL assignment statement is the same as for Algol-58, except that the operator "=" is used instead of ":=". Type coercion works the same – integer expressions are converted before being stored in real variables and real expressions are truncated to integer before being stored in integer variables. A Boolean expression will result in a 0 or 1 being stored in a numeric variable. Numeric expressions cannot be stored in Boolean variables.

BALGOL adds the ability to assign an expression to multiple variables. The stores take place in right-to-left order, starting with the variable immediately to the left of the expression. The expression value is coerced as necessary to a destination variable's type before the store, and that coerced valued is then applied to the next store in sequence. Note that this is multiple assignment of a single expression; embedded assignment within an expression (as in Algol-60 and C) is not supported.
ALPHA = BETA = GAMMA = 2DELTA + 1$

GO TO and SWITCH Statements


GO TO statements are more primitive in BALGOL, since it does not support the Algol-58 switch declaration, and thus there is no concept of a designational expression. GO TO may only reference a label. The word TO may be omitted.

To compensate for the lack of the switch declaration, BALGOL supports a SWITCH statement. This is a classic computed-go-to construct, where the value of an expression selects the branch destination from a list of labels. If the value is zero, control continues in sequence after the SWITCH statement. Values starting at 1 select a label from the switch list. If the value is negative or greater than the number of labels in the switch list, the behavior is undefined (internally, the branch table is built and indexed backwards, and an out-of-range index will result in a branch to some arbitrary instruction).
SWITCH CHOICE-1, (CASE1, CASE2, CASE3, CASE4)$

IF Statements and Alternative Statements


The IF statement works identically in BALGOL as in Algol-58.

The alternative (if either... or if...) statement of Algol-58 is supported in BALGOL, but with a slightly different syntax. Instead of starting with if either, the statement starts with EITHER IF. Otherwise, the chaining of OR IF clauses and terminating END are the same.

BALGOL adds an additional feature to the EITHER IF syntax. Instead of the terminating END, you may write "$ OTHERWISE$" followed by another statement, which of course may be a compound statement. This statement will be executed if none of the others in the skip chain have been selected. Thus BALGOL does have an "else." I found to my great chagrin that the "$" before "OTHERWISE" is absolutely necessary – without it, mysterious cascading errors result.
EITHER IF A > B$ FWD = 1$ OR IF A < B$ FWD = 0 END$

EITHER IF A > B$ FWD = 1$ OTHERWISE$ FWD = 0$

FOR and UNTIL Statements


The FOR statement in BALGOL functions identically to that in Algol-58, but with a slight change in syntax. The iteration list may be composed of individual values or triplets that define a sequence of values, but the triplet form is specified as "(initial, increment, final)" rather than Algol-58's "initial (increment) final".

BALGOL adds a new iteration statement, UNTIL, similar to the modern while statement. It evaluates a Boolean expression and iterates until the expression becomes true. If the expression is initially true, the body of the loop does not execute at all.
FOR I = 1, 3, 5, 7, 11, (13, 7, 99), 101, (103, 1, 125)$
    A(I) = I$

UNTIL A > 125 OR OUTTAHERE$
    BEGIN
    OUTTAHERE = A EQL 77$
    A = A + 3
    END$

STOP Statement


BALGOL supports the STOP statement of Algol-58 with two extensions. First, it is a simple halt, and does not terminate execution of the program. Pressing START on the Control Console will resume the program with the next statement in sequence.

Second, the word STOP may be followed by an optional arithmetic expression. The value of the expression will be displayed in the A register on the Control Console. All STOPs display 0137 00 7310 in the C register.

Procedure Declarations and Procedure Calls


Procedure declarations are somewhat simplified in BALGOL compared to Algol-58, but most of the capability is retained.

Procedure Declarations


The first significant difference for BALGOL is that multiple entry points are not supported, so the procedure heading does not have multiple identifiers each with their own parameter list. No label of the same name within the body of the procedure is required.

Second, the input and output parameter lists are not delimited by the "=:" operator. Instead, parameters are arranged in three groups: input, output, and reference parameters. Input parameters may be expressions or arrays. Output parameters may be variables, including arrays. Reference parameters include formal labels, formal functions and procedures, and identifiers for the INPUT, OUTPUT, and FORMAT declarations described below in the section on input-output. The parameter groups are delimited by "$". Any of the groups may be omitted, and the "$" for any trailing empty groups need not be specified. There is no difference between passing an array as an input parameter or as an output parameter.

Formal array identifiers must be specified with an empty subscript list containing n-1 commas to indicate an array of n dimensions. Formal procedure and function identifiers must be followed by an empty parenthesis pair.

Type declarations for the formal parameters must occur immediately after the BEGIN that introduces the body, not before it as in Algol-58. After the final END of the procedure, the procedure identifier may be stated, but it must be followed by an empty parenthesis pair. If the procedure returns a value like a function, the return value is assigned to the procedure's identifier as for Algol-58, but the identifier must be followed by an empty parenthesis pair in this case as well. See the Simpson's Rule example below for an illustration of the syntax.

A RETURN statement is required to exit the procedure.

Once a procedure has been declared, its identifier is available globally throughout the remainder of the program text and may be referenced, even within the body of another procedure.
PROCEDURE CROUT4 ($ N, A(,), B(), Y(), PIVOT(), DET, EX7$
                  SINGULAR, IP())$
    BEGIN
    INTEGER N, PIVOT, EX7$
    ...
    RETURN
    END CROUT4()$
In the example above, A, B, Y, and PIVOT are arrays, SINGULAR is a label, and IP is a function or a value-returning procedure. There are no input parameters in this case, only output and reference parameters.

Procedure Call


The differences for a BALGOL procedure call follow those for the procedure's declaration. Actual parameters are divided into input, output, and reference groups, and the groups delimited by "$". Expressions and scalar variables in the input group are effectively passed by value. Everything else is passed by reference (address).

Array slices can be passed by leaving some subscript positions empty for the actual array parameter, n empty positions specifying an n-dimensional slice. When passing a procedure or function as an actual parameter, the procedure or function identifier must be followed by an empty pair of parentheses. Unlike Algol-58, the number of parameters for the function is not indicated, and currying of the parameters is not supported.
CROUT4 ($ K, MAT(,), RHS(), FACTORS(), PIV(), DET, POWER$
    SINGULARITY, INNERPRODUCT())$

SUBROUTINE Declarations


BALGOL supports an open SUBROUTINE declaration. This is simply a compound statement that is given a name and can be called as a parameter-less procedure from other points in the program. All access to variables is global. Any declarations within the body of the SUBROUTINE are treated as if they had been placed outside its body and do not create local variables. A subroutine is called using the ENTER statement. Exit from the subroutine is by means of a RETURN statement, as for procedures.
SUBROUTINE COMMONCODE$
  BEGIN
  X = Y + Z$
  Y = A(I,J)*2 + Z$
  RETURN$
  END$

ENTER COMMONCODE$

Intrinsic Functions


BALGOL defines a number of intrinsic functions. These are not part of the compiler's library – they instead generate in-line code. All are considered to be reserved identifiers:
  • MOD(X, MODULUS) – remainder division of X by MODULUS
  • MAX(X1, X2, ..., XN) – maximum value from a list of arithmetic expressions
  • MIN(X1, X2, ..., XN) – minimum value from a list of arithmetic expressions
  • SIGN(X) – sign of the argument expression, -1, 0, +1, as the type of the expression
  • ABS(X) – absolute value of the argument expression, as the type of the expression
  • PCS(N) – returns a Boolean value indicating whether the specified Program Control Switch on the 220 Control Console is on, N = 0-9.
 In addition, BALGOL defines a standard set of mathematical functions in its library. These are described in Appendix G of the reference manual [4].

External Routines



PROCEDURE declarations may be prefixed with the reserved word EXTERNAL. In this case the body of the procedure is omitted. An external routine is one whose body will be supplied in relocatable machine language after the FINISH declaration at the end of the BALGOL text. The compiler will assign storage for the machine language instructions and link them into the object code. Instructions for preparing external machine-language routines and the linkage mechanism are in Appendix F of the reference manual [4].

BALGOL supports an additional type of machine-language linkage, EXTERNAL STATEMENT. This is both a declaration and an executable statement. It declares a label identifier that is matched to the identifier of a machine-language routine that will follow the FINISH declaration for the program. The statement branches to the external machine-language code. This is not a subroutine linkage, though, and BALGOL offers no intrinsic means for returning to the point in the program after the EXTERNAL STATEMENT.

Input-Output Facilities


The Algol-58 Preliminary Report does not address the subject of input and output of data for a program. BALGOL provides input-output capabilities somewhat similar to FORTRAN. The default configuration of the compiler and its library assumes input from punched cards and output to a line printer or a card punch, but alternate versions of the compiler could be generated to use other media, such as paper tape.

Input to a BALGOL program is free-form, with values delimited by one or more spaces or the end of the 80-character card or data record. The input formatter accepts integer and floating-point values. Numbers in scientific notation use "," to delimit the power-of-10 scale factor, not "**" as in BALGOL text. Character strings delimited by "$" may be read into a sequence of integer variables or array elements, five characters per word.

Values from the input medium are mapped to variables within the BALGOL program by means of an INPUT declaration. This specifies a named list of variables that is in turn referenced by a READ statement. Iteration through arrays is supported in much the same way as in FORTRAN:
INPUT MATRIX (N, FOR I=(1,1,N)$
                     FOR J=(1,1,N)$
                         (M1(I,J), M2(J,I)))$

READ ($$ MATRIX)$
The INPUT declaration generates a co-routine that supplies the address of the next element in the list on each call to it. The READ statement calls this co-routine and performs the actual formatting and data transfer. It keeps reading from the external medium until the input list is satisfied.

A second form of READ allows it to terminate before satisfying the list and signal that fact to the program:
READ ($ ENDED$ MATRIX)$
where ENDED is a Boolean variable that is set to true (1) if the input is interrupted by a signal in the input data stream. The signal is indicated by the 10-character sequence " SENTINEL " starting in the second character position of an input line or record. This signal is ignored if the first form of READ is used.

Output from a BALGOL program is not free-form, but formatted in a way similar to that of FORTRAN. An OUTPUT declaration defines the mapping of program values to the output medium, analogous to an INPUT declaration. It also generates a co-routine and is in turn referenced by a WRITE statement. While an input list may only name variables, an output list may specify its elements as general expressions.

The data is formatted and arranged on the output medium according to the specifications of a FORMAT declaration. This is like a FORTRAN FORMAT, although the formatting phrases are coded somewhat differently. The WRITE statement performs the actual formatting and data transfer, naming the identifiers for an output list and a format. A WRITE statement may also omit the data list and specify only a format identifier if the format contains only literal data.
OUTPUT MATOUT (N, SUM, FOR I=(1,1,N)$
                       FOR J=(1,1,N)$
                          (M1(I,J), M1(I,J)/M2(I,J)))$
FORMAT F1 (I8, B2, *SUM = *, X12.6, W2, (8F14.8, W0)),
       F2 (B20, *THATS ALL FOLKS*, W2)$

WRITE ($$ MATOUT, F1)$
WRITE ($$ F2)$
The BALGOL input-output facilities are described in Section 8 of the reference manual [4].

Examples


The prior post [1] exhibited near its end an example Algol-58 procedure for computing an integral using Simpson's Rule. The following is that example procedure transformed into BALGOL, and combined with declarations, procedure calls, and input-output statements to make it a complete program:
2 PROCEDURE SIMPS(A, B, DELTA, V$$ F())$
2   BEGIN
2   INTEGER K, N$
2
2   IBAR = V(B-A)$
2   N = 1$
2   H = (B-A)/2$
2   J = H(F(A) + F(B))$
2
2 J1..
2   S = 0$
2   FOR K = (1, 1, N)$
2     S = S + F(A + (2K-1)H)$
2
2   I = J + 4H.S$
2   IF DELTA LSS ABS(I-IBAR)$
2     BEGIN
2     IBAR = I$
2     J = (I+J)/4$
2     N = 2N$
2     H = H/2$
2     GO TO J1
2     END$
2
2   SIMPS() = I/3$
2   RETURN$
2   END SIMPS()$
2
2 FUNCTION TORADS(X) = 3.1415926X/180$
2
2 FUNCTION DARCTAN(X) = 1/(X*2 + 1)$
2
2 PROCEDURE LOGISTICSIGMOID(X)$
2   BEGIN
2   LOGISTICSIGMOID() = 1/(1 + EXP(-X))$
2   RETURN$
2   END LOGISTICSIGMOID()$
2
2 SUM = SIMPS(TORADS(30.0), TORADS(90.0), 0.00001, 2.0$$
2             SIN())$
2 WRITE($$ RESULT, F1)$
2 SUM = SIMPS(0.0, 1.0, 1**-5, 2.0$$ DARCTAN())$
2 WRITE($$ RESULT, F2)$
2 SUM = SIMPS(0.5, 3.0, 1**-5, 2.0$$ LOGISTICSIGMOID())$
2 WRITE($$ RESULT, F3)$
2
2 OUTPUT RESULT(SUM)$
2 FORMAT F1(*SINE INTEGRAL =    *,X10.6,W0),
2        F2(*DARCTAN INTEGRAL = *,X10.6,W0),
2        F3(*LOGISTIC INTEGRAL =*,X10.6,W0)$
2 FINISH$
The "2"s in the leftmost column of the program text are an artifact of the 220's Cardatron punched-card subsystem, and must be present to select an input format when the cards are read. BALGOL uses "2" for source cards, "6" for machine-language code, and "5" for data cards. The output printed by this program is:
SINE INTEGRAL =       .866025
DARCTAN INTEGRAL =    .785398
LOGISTIC INTEGRAL =  2.074510
I have transcribed the example programs from Section 11 of the compiler reference manual [4] and have them running in my retro-220 emulator [7]. I have prepared a few other programs, including the Simpson's Rule example above, and have them running in the emulator as well.

These examples are available from the BALGOL-Examples directory in the emulator's open-source project repository [8]. Each example includes source code, sample data, and compilation listings both with and without the code generated by the compiler. A few examples also include equivalent programs for purpose of comparison, written in the Algol-60 dialects of the Burroughs B5500 or modern Unisys ClearPath MCP ("E-mode") systems.

The emulator is written in Javascript and runs in a standard web browser. It can be run directly from its hosting site at [7], or you can download the files from the project repository and install it on some other web server. For instructions on running the emulator and the compiler, see the introductory blog post [9] and the wiki pages in the project repository.

BALGOL Compiler Performance


BALGOL was designed from the beginning to be a fast, one-pass compiler that generated reasonably good code. It was able to do small local optimizations within statements, and some rather fancy constant folding, but did not attempt to do more global optimizations, such as common expression factoring.

Even though the 220 was not a very fast machine, both the user-friendly nature of BALGOL and the speed of its compilation process gave it an advantage in environments where lots of relatively small programs were being submitted by lots of relatively unsophisticated programmers. For this reason it was quite popular at universities and research institutions. Stanford, Caltech, Georgia Tech, Cornell, Case (now Case Western Reserve), and Stanford Research Institute all had 220s and used BALGOL.

Various claims about the compiler's performance have been made. In his 2013 anecdote [10] on the use of BALGOL at Sanford in the early 1960s, Robert Braden stated "For typical source programs, the BALGOL compiler ran at line printer speed, 150 LPM [lines per minute], close to the card reader speed of 240 CPM [cards per minute]."

In a guest editorial [11] for the May 1977 issue of IEEE Computer devoted to stack machines, David Bulman talks about Bob Barton and the influence of BALGOL on the design of the Burroughs B5000/5500, and says of BALGOL, "About sixty compiler instructions were executed to generate each machine language instruction..." A little later he states, "The fact that Balgol [sic] could be compiled at the read speed of the card reader on the B220 lent some credibility within Burroughs to Barton and his ideas for a high-level language machine" (i.e., the B5000).

I have been unable to confirm Bulman's claim on instructions executed per instruction generated. I implemented an instruction counter in my retro-220 emulator and recorded counts of instructions executed and instructions generated for a few small and medium-size programs. There are some choices on just which instructions should be included (e.g., compiler load time, which takes about seven seconds and generates no instructions, or library linking, which generates lots of instructions without processing any source code), but excluding the library linking process, I mostly see ratios of 400-500:1, not 60:1.

I have also been unable to confirm either Braden's or Bulman's claims on compiler speed. For blank cards, comments, and cards with with relatively little text (say, a trimmed length up to 20 or so characters), the compiler does seem to operate at or near the line printer speed of 150 LPM. For complex statements and declarations, there is a noticeable pause between lines, often 2-3 seconds. The best average speed I have seen is close to 100 CPM, but that is without generating a compilation listing. A more typical average speed with a listing is 80-90 CPM.

The retro-220 emulator is designed to run programs at the speed of a real 220, using instruction times published in its Operational Characteristics manual [12] and the performance of peripheral devices from its Operating Procedures manual [13]. Emulator performance matches predicted performance on some very small benchmarks where it is easy to analyze the instruction timings, but timing is much more difficult to predict on larger, more complex programs that do significant amounts of I/O, such as the BALGOL compiler.

The emulator uses about 80% of an Intel 3.3GHz i3-2120 core when running under Windows 7 in the Firefox 61.0 browser and compiling a non-trivial program. It uses about 60% of a core when executing a program doing no I/O. Most of that processor use is devoted to browser overhead (garbage collection, rendering updates to the control panels and peripheral device displays, etc.), not to emulating 220 instructions, which requires only about 5% of a core. CPU utilization in Google Chrome 68.0 is about 30% less. In any case, the emulator's performance does not appear to be limited by its host environment.

It is entirely possible that the emulator is not regulating its performance correctly and running slower than a real 220, but there is a contradictory report to consider. In a 1960 letter [14] to the Communications of the ACM, Bob Barton briefly described BALGOL (calling it a "nameless compiler") and stated, "1. Compiling Rate: Averages 500 single-address instructions per minute." In the retro-220 emulator, I am seeing instructions generated at rates of 600-700 per minute, compiling the same programs where cards are being read at only 80-100 CPM. So Barton gives us a report of compiler performance that is lower than that observed in emulator.

From this, I conclude that the claim of compiling at card-reader or line-printer speed is somewhat overstated. This should take nothing away from the compiler's performance, which for its time was remarkable, especially considering the intrinsic speed of the 220.

Significance and Impact of BALGOL


As mentioned above, BALGOL had a significant impact within Burroughs on the design of the B5000/5500 computer system that followed the 220. Not only did the compiler give the Burroughs design team in Pasadena the confidence that a system could be designed to support fast, one-pass compilers and use solely higher-level languages (the B5000/5500 never had a resident assembler, and its modern descendants still don't), but many of the data structure techniques it used internally were given hardware support. The I/O and diagnostic features of the language also migrated into the Burroughs Algol-60 compilers that followed BALGOL.

One striking result of the influence and experience of BALGOL upon the B5000/5500 is that Burroughs did considerably better in terms of performance of its one-pass Algol-60 compiler for that machine. It could easily compile at card-reader speed -- using a 1400 CPM reader.

Braden [10] describes the impact, under the guidance of George Forsythe, that BALGOL had during his time at Stanford on the growth of what he terms "computer literacy." The language's features and ease of use, along with the compiler's operational efficiency created a highly-productive, quick-turnaround environment that encouraged people to develop programming skills and begin using them in their studies and research.

Braden also mentions some students who were stimulated by the environment and went on to notable careers in computing and computer science. Two of those students, Roger Moore and Larry Breed, along with Braden, succeeded in cloning the BALGOL compiler from the 220 to an IBM 7090 after Stanford acquired that latter machine. The cloned language was referred to as SUBALGOL. It was bootstrapped in stages so that the compiler was eventually written in itself. It also offered the same degree of operational efficiency on the much-faster 7090 as had been enjoyed on the 220.

The next blog post will discuss the significant effort to recover the BALGOL compiler source code and get it into running condition, as well as what I have learned over the course of this project about the compiler's internal operation.

References


[1] "Algol-58 – the International Algebraic Language,"
http://datatron.blogspot.com/2018/07/algol-58-international-algebraic.html
[2] "Preliminary Report—International Algebraic Language," Communications of the ACM, vol. 1 #12, December 1958, pp. 8-22,
http://www.softwarepreservation.org/projects/ALGOL/report/Algol58_preliminary_report_CACM.pdf.
[3] "Burroughs B5000 Conference" 1985-09-06, Charles Babbage Institute,
http://hdl.handle.net/11299/107105.
[4] "BAC-220: Burroughs Algebraic Compiler," Revised Edition, 220-21017, Burroughs Corporation, Detroit, Michigan, March 1963,
http://bitsavers.org/pdf/burroughs/electrodata/220/220-21017_B220_BALGOL_Mar63.pdf.
[5] BALGOL Listing, May 1961,
http://bitsavers.org/pdf/burroughs/electrodata/220/BALGOL_listing_May61.pdf.
[6] BALGOL Listing, March 1962,
http://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-1-pdf/k-1-u2196-balgol220compiler.pdf.
[7] Burroughs 220 Emulator Hosting Site, http://www.phkimpel.us/Burroughs-220/.
[8] retro-220 Emulator Project Repository, https://github.com/pkimpel/retro-220/.
[9] "Introducing the retro-220 Emulator,"
http://datatron.blogspot.com/2018/06/introducing-retro-220-emulator.html.
[10] "Burroughs Algol at Stanford University, 1960-1963," Robert Braden, IEEE Annals of the History of Computing, vol.35 #4, October-December 2013, p.69-73,
https://ieeexplore.ieee.org/document/6674030/.
[11] "Stack Computers," David M. Bulman, IEEE Computer, vol.10 #5, May 1977, p.14-16,
https://www.computer.org/csdl/mags/co/1977/05/01646477.pdf.
[12] Operational Characteristics of the Burroughs 220, Burroughs Corporation, Bulletin 5020A, August 1960,
http://bitsavers.org/pdf/burroughs/electrodata/220/5020A_B220_OperCharac_Aug60.pdf.
[13] Handbook of Operating Procedures for the Burroughs 220, Burroughs Corporation, Bulletin 5023, November 1969,
http://bitsavers.org/pdf/burroughs/electrodata/220/5023_B220_OperatingProcedures_Nov59.pdf.
[14] "Another (Nameless) Compiler for the Burroughs 220," R. S. Barton, Communcations of the ACM, vol.4 #1, January 1961, p.A11, https://dl.acm.org/citation.cfm?doid=366062.366070.
[15] "The Shell Assembler, Part 1,"
http://datatron.blogspot.com/2016/06/the-shell-assembler-part-1.html.

Sunday, July 29, 2018

Algol-58 – The International Algebraic Language

By the early 1950s, the number and variety of electronic computers were increasing, the scope of their application to science, business, and government was rapidly expanding, and the demand for the programming those applications required was expanding in concert. This demand stimulated the need for increasingly efficient and effective ways to prepare programs, which in turn led to the development of higher-level programming languages.

One of the first attempts at designing a universal programming language, at least for numeric computation, was the International Algebraic Language (IAL), also known as Algol-58. The Burroughs Algebraic Compiler (BALGOL) for the 220 computer system was an early dialect of Algol-58. To prepare for a detailed discussion of BALGOL on the 220 in the next post [1], this post will discuss the development of Algol-58 and present an overview of the language.

Background


Initially, preparation of the program steps for machine execution was done entirely on the machine's terms. Instructions were prepared directly using the numeric or alphanumeric code required by the hardware. Hence the popular use of "code" and "coding" as synonyms for programs and programming. There is a lot of manual bookkeeping involved in programming at this level, especially in keeping track of addresses within the code for data storage and branch locations. That bookkeeping task is compounded when changes must be made to an existing program.

Mnemonic notations arose fairly quickly to ease the clerical task of coding, along with a number of higher-level abstractions, such as flowcharting, to aid in the overall conception and design of programs. Not long after, people began to realize that the machine itself could assist with the clerical aspects of generating code, especially in keeping track of address assignments. This led to the development of assembly languages and the assembler programs to process them.

From the very beginning of automated computing, though, there has been a strong desire to be able to approach programming more on our terms, in a way that is more in keeping with the way we think about the applications to which we want to apply a computer. Since most early applications involved numerical calculations based on mathematical formulae, there was an obvious desire to express programs using standard notations from mathematics. FORTRAN is one of the earliest and best known realizations of this desire, but as Donald Knuth made clear in his 2003 lecture, "A Dozen Precursors of FORTRAN" [2], attempts to express programs using higher-level abstractions can be traced all the way back to Konrad Zuse during the 1940s.

A parallel desire has been to be able to express programs in a more general way than in the coding for a specific type of machine. The half-life of a given computer design in the 1950s was only a couple of years, so as new designs came along, existing programs had to be completely redone using the instruction coding for each new machine. Being able to express programs in a way that was not specific to any particular machine would ease this burden. It would also open the possibility of more easily adapting programs written for one machine so that they could run on entirely different ones.

Many efforts towards translation of standard arithmetic expressions into machine instructions were going on, starting in the early 1950s. A preliminary report on FORTRAN was available by 1954, but other languages were starting to appear, such as Alick Glennie's Autocode, Alan Perlis' Internal Translator (IT, also known as the Purdue Compiler), and Grace Hopper's A0 and MATH-MATIC. There were also a number of interpretive expression evaluators, such as the Whirlwind Algebraic Interpreter. By the second half of the 1950s, the number of different (and incompatible) programming notations was beginning to proliferate.

In 1955, the German Association for Applied Mathematics and Mechanics (GAMM) held a meeting in Darmstadt on electronic computers. This resulted in the formation of a committee to work on formula translation. Then in May 1957, the Association for Computing Machinery (ACM), SHARE (the IBM user organization), USE (the Sperry Univac user organization), and DUO (the ElectroData/Burroughs user organization) held a conference in Los Angeles to explore the exchange of computing information. The group concluded that a single, universal programming language would be very desirable, and recommended that the ACM form a committee to pursue the idea. A few months later, the GAMM group reached a similar conclusion and contacted the ACM to suggest a joint conference that would make recommendations on a common language.

By April 1958, both the ACM and GAMM committees had independently developed proposals for a universal language and met to exchange them. They found they had much in common, so arranged to hold a joint working session between representatives of each group to combine their proposals and resolve the differences. That joint session was charged with the following objectives [3]:
  1. The new language should be as close as possible to standard mathematical notation and be readable with little further explanation.
  2. It should be possible to use it for the description of computing processes in publications.
  3. The language should be translatable into machine programs.
The joint ACM-GAMM meeting was held in May 1958 at ETH in Zurich, Switzerland. The product of that meeting was the "Preliminary Report—International Algebraic Language" [3]. The language was initially referred to as the IAL, but the name was soon determined to be unsuitable, and Algol, short for "algorithic language," was suggested instead. The successor to the preliminary IAL was known as Algol from the start, so to distinguish them, the IAL eventually became known as Algol-58 and its successor as Algol-60.

FORTRAN had been released by IBM for almost a year by the time of the 1958 ACM-GAMM joint session in Zurich. FORTRAN was being received warmly, especially by the scientific community. It was not exactly an attractive notation, though, and was certainly not an ideal way to exchange definitions of algorithmic processes, especially among people accustomed to the elegance of mathematics. Even John Backus, the driving force behind FORTRAN, could not have been completely pleased with it, as he was one of the ACM representatives to the Zurich joint session. Thus, there was motivation to come up with something much better.

Overview of Algol-58



This section attempts to describe the IAL as presented in the Preliminary Report in somewhat informal terms. It largely follows the structure of that Report.

Symbology

 
Algol-58 uses a 63-character set, which allowed for a number of traditional mathematical symbols, and may have been influenced by six-bit codes for teletype and punched paper-tape equipment. Early FORTRAN, of course, had been limited by the 48-character set used in IBM punched-card equipment.

The basic symbols defined for the language are:
  • The 26 English letters: A-Z [including both upper and lower case; the report is silent on the subject of case sensitivity]
  • The decimal digits: 0-9
  • Arithmetic operators: +     ×   /
  • Relational operators: <      =      >  
  • Logical operators: ¬ [not]   [or]   [and]   [equivalence]
  • Sequential operators: go to,   do,   return,   stop,   for,   if,   or,   if either,   or if
  • Separators: .   ,   :   ;   :=   =:      10   begin   end
  • Brackets: (   )   [   ]     
  • Declarators: procedure,   array,   switch,   type,   comment
The role of spaces is not described in the Preliminary Report. It seems likely the intent is that, as in FORTRAN, spaces can be inserted for readability, but they do not serve as delimiters, and their presence or absence does not affect the interpretation of the program text.

Note that, as with Algol-60, the boldface words in the lists above are considered not to be identifiers or sequences of characters, but atomic symbols in their own right. Thus, "go to" is not two words or five characters, but a unique symbol in the language. The subscripted "10" is used to indicate the power-of-10 scaling factor for numbers in scientific notation.

Also note that this is the symbol set for the reference language, which is intended to specify the language itself and not necessarily be used for coding programs or publishing algorithms. The report anticipated that a more restricted set of characters, a hardware representation, would need to be used by an actual compiler targeting a real machine. It also anticipated that an expanded set of characters might be used in a publication language used to communicate algorithms in the literature. Each instance of a publication language or hardware representation is to be accompanied by a set of transliteration rules for mapping to and from the reference language.

Data Types and Declarations


The Algol-58 report specifies three data types – integer, real (floating-point), and Boolean (logical) values, plus arrays of these entities. It appears to anticipate the use of additional types, but is silent on how they should be handled.

Literals


Numeric literals have the general form I.F10±E, where I is a whole integer, .F is a decimal fraction, and 10±E is a scaling factor as a power of ten. The fraction and scale factor are optional, as is the sign preceding the exponent of the scaling factor. A number could also be expressed as simply a scaling factor by omitting the integer 1 before it, e.g., "10-3" for 0.001.

Boolean literals are represented as the integer literals 0 [false] and 1 [true].

Identifiers and and Types


Identifiers are defined in the usual way, starting with a letter and optionally followed by a string of letters and decimal digits. No maximum length for an identifier is specified.

By default, all identifiers are considered to refer to variables of type real. A type declaration can be used to specify that an identifier refers to an integer or Boolean variable. Note that the type declaration does not define a variable or assign it storage. The declaration merely associates a type with an identifier. Variables are declared upon first reference to their identifiers, which need not appear in any type declaration.
integer (i, j, k); Boolean (on, off);

Array Declarations


Array declarations name a list of identifiers to be considered as arrays and to specify the bounds of each array's dimensions. The bounds notation is a little different from Algol-60 -- all of the lower bounds are specified first in a comma-delimited list, then a colon (:), then the corresponding upper bounds are specified in a second comma-delimited list. The Report specifies the bounds to be "integers," so it appears that dynamic array sizing is not supported. Multiple array identifiers can be associated with a common set of bounds, as in Algol-60. If an array is to be of a type other than real, it must appear in a type declaration, and must be followed by square brackets containing sufficient commas to indicate the dimensionality of the array:
integer i, j, m[], n[,,];
array a, b, c[1:10], d[0, 0:7,19], m[-7:7], n[1,1:9,9];

Switch Declarations


A switch defines a multi-way branch, after the idea of the computed go-to in FORTRAN, but essentially the same as switches in Algol-60. Note that it is a declaration, not a statement. It defines an ordinal list of designational expressions, which are either simple statement labels or indexed switch identifiers. A statement label may be an identifier or a literal integer.

A switch is used by the go to statement, which takes as its target not a label, but the more general designational expression construct. The index used with a switch identifier is evaluated dynamically, not only for a switch named in a go to statement, but also for any switch identifiers used in designational expressions within switch declarations. Thus the evaluation of a designational expression may result in a cascade of switches being evaluated before a label is finally encountered and the destination of a branch determined.
switch s1:= (start, 44, s2[i×j-1], s1[k]);
switch s2:= (5, 4, 3, 2, 1, blastoff);

go to s1[3];

Comment Declarations


Algol-58 considers a comment to be a declaration. It can appear between any instance of other declarations and statements, but otherwise has no effect on the meaning of the program. Like the Algol-60 comment, it starts with the symbol comment and continues until the next semicolon (;) is encountered.

Function Declarations


Algol-58 supports the declaration of an expression as a function, similar to the in-line function of FORTRAN. The function defines a list of parameters, the actual values of which will be substituted into the expression. Any variables in the expression not named in the parameter list are referenced globally in the containing program.

toRads(degrees) := degrees×3.1415926/180;
Pythagorean(a, b) := sqrt(a↑2↓ + b↑2↓);

length := Pythagorean(base, height);
The language also supports procedure declarations and calls. These are a more complex subject, and are discussed in a separate section below.

Expressions


Expressions in Algol-58 are formed in the usual way, from combinations of variable references, function calls, and operators of various types. The values of scalar variables are referenced as simple identifiers. Elements of arrays are referenced as the identifier for the array, followed by a comma-delimited list of subscripts in square brackets ([ ]). Subscripts can be general arithmetic expressions, which are rounded to integers before indexing into the array.

Functions (and procedures that return values as a function) are evaluated by naming the function, followed by a comma-delimited list of expressions in parentheses corresponding by position to the parameters of the function, as shown in the example for the Pythagorean function declaration above.

Arithmetic Expressions


An arithmetic expression, A, may take any of the following forms:
  • A numeric literal, e.g., "123.456"
  • A simple identifier, e.g., "alpha"
  • A subscripted identifier, e.g., "points[i, j]"
  • A function call, e.g., "Pythagorean(x, y)"
  • An arithmetic expression prefixed by "+" or "–", e.g., "–A"
  • Two arithmetic expressions joined by an arithmetic operator (+ - × /), e.g., "A/7.0"
  • One arithmetic expression followed by a second arithmetic expression in up/down arrow brackets to indicate the first expression raised to the power of the second expression, e.g., "A1↑A2↓"
  • An arithmetic expression in parentheses, e.g., "(A)"
Evaluation of arithmetic expressions is by default from left to right, with multiplication and division having precedence over addition and subtraction. The Report is silent on the precedence of exponentiation, but presumably it is higher than multiplication/division. Parentheses may be used to alter or make clear the precedence by which operators are to be applied to values.
alpha
alpha + beta / gamma
(alpha + beta) / gamma
a↑2↓ / 2↑b-1↓
(m[i] + n[j, k])×(m[j] - n[k, i])

Boolean Expressions


A Boolean expression, B, may take the following forms:
  • The literal value for false, e.g., 0
  • The literal value for true, e.g., 1
  • A simple or subscripted identifier, e.g., "empty" or "flags[k]" 
  • A function call, e.g., "testValid(assertions())"
  • An arithmetic relation, consisting of two arithmetic expressions (A) separated by a relational operator (< = > ) and enclosed in parentheses, e.g., (A1 < A2)
  • A Boolean expression prefixed by the not operator, e.g., ¬B
  • Two Boolean expressions separated by a Boolean operator ( ), e.g., B1 B2
Evaluation of Boolean expressions is strictly left to right. Parentheses must be used to specify any operator precedence. The not operator applies only to the expression immediately to its right.
(alpha ≥ beta) (gamma ≠ epsilon) override

Statements


The individual operations of a program are defined by statements. As is typical, statements are executed sequentially unless directed otherwise by a containing statement. Algol-58 is a free-form notation, without particular line-length or columnar conventions. Statements are separated by (not terminated by) the semicolon (;). The end of a line or record in the program text has no syntactic significance.

Algol-58 introduced the idea of a compound statement by using the symbols begin and end to bracket a sequence of individual statements. The compound statement may then be used in any place that an individual statement can be used, e.g., as the object of an if or for statement. There is no provision for an empty or null statement, so strictly speaking, a semicolon cannot precede an end.

Statements can be prefixed by a label and a colon (":"), which serves to declare the label. The label can be an identifier or an integer literal. If a compound statement has a label, then the terminating end of the compound statement may optionally repeat the label to indicate the span of the compound statement, e.g.,
start:
        begin
       
x:= x+1;
        if (x < 100);
                go to start;
        end start
Declarations may be interspersed with statements. The order and placement of declarations within a program is immaterial. Algol-58 did not support the Algol-60 concept of blocks, where declarations within a compound statement create a nested identifier scope, which has visibility of the identifier scopes of all containing blocks. Declarations embedded within compound statements are interpreted as if they were placed outside any compound statement.

Assignment Statements


The assignment statement works similar to that in most languages. The identifier of a variable (possibly subscripted) on the left is assigned the value of the expression on the right. The assignment operator is ":=", a standalone equal sign ("=") being reserved for use in arithmetic relations.

If the variable on the left of the assignment operator is of an arithmetic type, the expression on the right must yield an arithmetic value. The Report implies that the value of the expression will be coerced as necessary to the type of the assignment variable, but does not explicitly say so. Mixed integer/real expressions are permitted, but the Report does not say how the resulting type of an expression should be determined.

If the variable on the left is arithmetic and the expression on the right is Boolean, the truth value of the expression will be converted to an integer 0 if it is false or 1 if it is true. The converse is not true, however -- only Boolean expressions can be assigned to Boolean variables.

Go To Statements


The go to works as in most languages, except that as noted earlier, the operand of a go to is a designational expression. Thus you can branch to a simple label or to an indexed switch identifier. As also mentioned earlier, all designational expressions in switches are evaluated dynamically using the current values of their index expressions. Index expressions are ordinal (i.e., one-relative). The Report is silent on the behavior of a switch when its index is less than one or greater than the number of designational expressions in its declaration.

If Statements and Alternative Statements


The if statement works as in most languages, except that it serves only as a guard for the next statement. There is no else in Algol-58. The guarded statement may of course be a compound statement. The condition to be tested may be any Boolean expression.
if (i > 0);
        if (j < i);
                j := i+1;
Algol-58 implements a variant of the if statement that functions as a skip-chain. Each if either or or if is evaluated in sequence, the statement associated with the first condition that evaluates true is executed, and any remaining conditions and statements are skipped. There is no provision for a default or catch-all statement if none of the specified conditions is met:
if either (i = 1);
        j := i+1;
or if (i = 2);
        j := i+2;
or if (i = 3);
        j := j+i;
or if (i > 3);
        begin
        if (j > 0);
                j := i;
        if (j < 0);
                j := 0;
        end
end

For Statements


The only iterative statement in Algol-58 is the for statement, which extended FORTRAN's idea and anticipated the complex for statement of Algol-60. Specification of the sequence of values over which iteration will occur has two forms. The first is a simply comma-delimited list of arithmetic expressions. The control variable is assigned the value of the next expression in the list at the start of each iteration. When the end of the list is reached, the loop terminates:
for x := 2, 4, 5, 9, 16, 27, 31, -1;
        cell[x] := cell[x]×3;
The second form is more like that of FORTRAN, with initial, increment, and final values for the control variable. Each value may be an arithmetic expression. The middle expression in parentheses is the increment value:
for x := 1 (1) n-1;
        begin
    
   if (cell[x] < 0);
                cell[x] := -cell[x];
        if (cell[x] > 0);
                cell[x] := 1
       end
The two forms can be combined to iterate over a complex sequence of values:
for x := 1, 5, 8, 99 (-7) 9, 101, 127, i (j) k, 255;
        begin ... end

Do Statements


The do statement is not so much an executable statement as a generator of executable statements. It is more in keeping with the idea of a macro. This is best illustrated with a simple example:
first: A := (B) / (C);
last:  go to D;
...
do first, last (resultA, alpha×beta-1B, gammaC, topD)
The do statement names a range of other statements identified by the labels of the first and last statements in the range. A one-statement range may be specified using a single label. The entire do statement is replaced by a copy of that range of statements, less the bracketing labels and any embedded declarations. Within the copy, each of the identifiers on the right of an arrow ("") is replaced by the corresponding sequence of symbols on the left of the arrow. In the example above, the do statement would be replaced by:
result := (alpha×beta-1) / gamma;
go to  top;
The range of statements copied by a do statement can itself contain do statements, in which case any nested do statements are replaced first, so that the final text contains no do statements.

Stop Statements


The stop statement terminates execution of the program. No resumption of execution from the stop is possible.

Procedure Declarations and Procedure Calls


The concept of a procedure in Algol-58 is that of a closed subroutine having possibly multiple entry and exit points. All identifiers referenced within the body of a procedure refer either to locally-declared variables and labels or to the formal parameters of the procedure. Global variables in the containing program are not visible within the procedure unless they are passed as parameters. Variables local to the procedure body are not visible outside the procedure.

Procedure Declaration


Parameters to a procedure are classified as either input parameters or output parameters and are specified in separate parameter lists delimited by a "=:" symbol. The general form of a procedure declaration is:
procedure I1 (Pi1) =: (Po1),    I2 (Pi2) =: (Po2),    I3 (Pi3) =: (Po3), ...
where the In are the identifiers of entry points within the procedure. Each such identifier must also be a label within the body of the procedure that defines the location where that entry point will commence execution.

The Pin are lists of formal input parameters that supply values to the procedure, and the Pon are lists of formal output parameters that receive results from the procedure. The formal variables in the lists can, of course, be different for the different entry points.

Formal input parameters may consist of a scalar identifier, an array identifier followed by some number of empty subscript positions in square brackets (to indicate the dimensionality), or a function or procedure identifier followed by some number of empty parameter positions within parentheses (to indicate the number of parameters to the formal function or procedure).

Formal output parameters may consist of a scalar identifier, an array identifier with empty subscript positions as for input parameters, or a label identifier. The output parameter list and its preceding "=:" can be omitted, in which case the procedure can be called as a function and be used within an expression. The return value is defined by assigning it to the entry label identifier as if it were an ordinary scalar variable, similar to the method used in Algol-60. See the example below for an illustration of this.

Following the procedure heading defining the entry points and parameter lists, there can be a set of declarations supplying type information for the formal input and output parameters. The type for label parameters is deduced from the context of their use in the body of the procedure. Surprisingly, array declarations for formal parameters can have expressions in their bounds lists, provided any identifiers in the expressions are formal parameters Undeclared formal parameters are of type real, as usual. The body of the procedure follows this, and must be enclosed in a begin/end pair. Any declarations within the body of the procedure refer only to local variables.

Procedure Call Statement


A procedure call statement states the identifier of a procedure entry point, followed by input and output lists for the actual parameters, with the two lists again delimited by "=:". The Report specifies that in the call, the replacement of formal parameters by actual parameters "may be considered to be a replacement of every occurrence within the procedure of the symbols (or sets of symbols) listed in formal parameters, by the symbols (or sets of symbols) listed as actual parameters in the corresponding positions of the procedure statement, after enclosing in parentheses every expression not enclosed completely in parentheses already." This seems to be an early form of call by name, which was more fully developed in Algol-60. There is no call by value, even for input parameters.

The report allows array slices to be passed to array parameters. The dimensionality of the actual parameter slice must match that of the formal parameter, i.e., the number of empty subscript positions in the actual and formal parameters must be the same. More interestingly, functions and procedures passed as actual parameters may have a mixture of supplied and empty parameters. Again, the number of empty parameters must match between the actual and formal parameters. This appears to imply support for currying or closures.

Exit from a procedure is accomplished using either the return statement (which can only occur within the body of a procedure) or a go to statement referencing (perhaps indirectly through a switch) a formal label in the output parameter list. Providing multiple exits from a subroutine was a common technique in machine-language programming at the time, so label parameters attempt to provide a similar facility. Each path of execution from an entry point must eventually reach a return or a go to referencing a formal label. The behavior if execution runs off the end of the procedure body without executing a return or a go to referencing a formal label is not defined.

Example Procedure


The Report show an example procedure declaration for Simpson's Rule integration. It is reproduced below, without its comments, with semicolons inserted between statements, and reformatted into a more modern style. Note that "F()" indicates a formal function with one parameter, and "abs()" is an intrinsic function defined by the Report with the customary meaning of absolute value:
procedure Simps (F(), a, b, delta, V);
        begin
Simps:
        Ibar := V×(b-a);
        n := 1;
        h := (b-a)/2;
        J := h×(F(a) + F(b));
J1:
        S := 0;
        for k := 1 (1) n;
                S := S+F(a + (2×k-1)×h);
        I := J + 4×h×S;
        if (delta < abs(I-Ibar));
                begin
                Ibar := I;
                J := (I+J)/4;
                n := 2×n;
                h := h/2;
                go to J1
                end;
        Simps := I/3;
        return;
        integer (k, n)
        end Simps
A call on this procedure might look like this, where "poly()" is some function or value-returning procedure defined elsewhere in the program:
area := Simps(poly(), x, x+20, 210-5, 51025);

Impact of the Preliminary Report


Perhaps the most enduring impact of the Preliminary Report was upon the continuing effort to refine the language, resulting two years later in publication of the 1960 "Report on the Algorithmic Language ALGOL 60" [4], followed by the 1963 "Revised Report on the Algorithmic Language Algol-60" [5].

As should be clear from comments in the overview above, the Algol-58 Preliminary Report is not a complete specification for a programming language, nor was it ever intended to be. The first sentence in the report makes this clear:
Note: In the interest of immediate circulation of the results of the ACM-GAMM committee work on an algebraic programming language, this preliminary report is presented. ... This report has not been thoroughly examined for errors and inconsistencies.
The Report leaves many things unsaid, and not a few unclear. It should be viewed for what it was, a progress report on an effort to define a universal algorithmic language. Many ideas that became more fully developed in Algol-60 appear in this Report, and no doubt a great deal was learned from its preparation about how a language should be specified, leading to the development of Backus-Naur Form and the elegant syntax-and-semantics presentation of the Algol-60 report.

Nonetheless, the Report was good enough, and the ideas in it attractive enough, to get a lot of creative juices flowing.

The first compiler for Algol-58 was running by the end of 1958 on a Zuse Z22 computer in Mainz, Germany. It was constructed by the so-called ZMMD group, consisting of Friedrich Bauer, Hermann Bottenbruch, Heinz Rutishauser, and Klaus Samelson, the GAMM representatives to the joint working session that produced the Report. That this compiler was available only six months after the joint session may indicate the extent to which the GAMM proposals formed the core of the preliminary language and how far along the ZMMD group was towards creating a compiler at the time of the joint session.

Availability of the Preliminary Report quickly launched the development of many other hardware implementations of Algol-58, all deviating in greater or lesser degrees from the reference language described by the Report. The best known of these are:
  • ALGO (Bendix Corporation, Los Angeles, California: Bendix G-15)
  • ALGOL 30 (Dartmouth College, Hanover, New Hampshire: Librascope LGP-30)
  • BALGOL (Burroughs Corporation, Pasadena, California: Burroughs 220)
  • JOVIAL (System Development Corporation, Santa Monica, California: multiple targets)
  • MAD (University of Michigan, Ann Arbor, Michigan: IBM 7090, Univac 1108)
  • NELIAC (Naval Electronics Laboratory, San Diego, California: multiple targets)
BALGOL followed the syntax and semantics Preliminary Report quite closely and implemented most of its features, although sometimes with slightly different syntax. In the next post [1], we will examine BALGOL in some detail and see its adaptation of Algol-58.

References


[1] "BALGOL – the Burroughs Algebraic Compiler,"
http://datatron.blogspot.com/2018/08/balgol-burroughs-algebraic-compiler.html
[2] "A Dozen Precursors of FORTRAN," Donald Knuth, lecture on 2003-12-03, Computer History Museum, http://www.computerhistory.org/collections/catalog/102622137.
[3] "Preliminary Report--International Algebraic Language," Communications of the ACM, vol. 1 #12, December 1958, pp. 8-22,
http://www.softwarepreservation.org/projects/ALGOL/report/Algol58_preliminary_report_CACM.pdf.
[4] "Report on the Algorithmic Language ALGOL 60", ed. Peter Naur, Communications of the ACM, vol. 3, #5, May 1960, p.299-314,
http://www.softwarepreservation.org/projects/ALGOL/standards/report/Algol60_report_CACM_1960_June.pdf.
[5] "Revised Report on the Algorithmic Language Algol-60," ed. P. Naur, Communications of the ACM, vol. 6 #1, January 1963, p. 1-17,
http://www.softwarepreservation.org/projects/ALGOL/report/Algol60_report_CACM_1960_June.pdf.