This is Paul Kimpel, with another post on Tom Sawyer's 205 blog. In this second part, I will discuss programming with the Shell Assembler, along with its relatively advanced features for making the programmer's job easier and more productive. Unlike Knuth's EASY and MEASY assemblers, which were built to be lightweight and fast, Shell was built for power and convenience. It became the assembler of choice for shops that had to write and maintain a lot of code.
Operator and Pseudo-Operator Mnemonics
In contrast to EASY and MEASY, which use a set of operator mnemonic codes designed to be compatible between the Burroughs 205 and 220 computer systems, the Shell Assembler uses the standard set of mnemonics initially established by ElectroData and continued by Burroughs in their reference manuals:
00 PTR    20 CU     40 MTR    60 M      80 FAD
01 CIRA   21 CUR    41 --     61 DIV    81 FSU
02 STC    22 DB     42 MTS    62 --     82 FM
03 PTW    23 RO     43 --     63 EX     83 FDIV
04 CNZ    24 BF4    44 CDR    64 CAD    84 --
05 --     25 BF5    45 CDRI   65 CSU    85 --
06 UA     26 BF6    46 --     66 CADA   86 --
07 PTWF   27 BF7    47 --     67 CSUA   87 --
08 STOP   28 CC     48 CDRF   68 --     88 --
09 --     29 CCR    49 --     69 --     89 --
10 DAD    30 CUB    50 MTW    70 MRO    90 (FAA)
11 BA     31 CUBR   51 --     71 EXC    91 (FSA)
12 ST     32 IB     52 MTRW   72 SB     92 (FMA)
13 SR     33 CR     53 --     73 OSGD   93 (FDA)
14 SL     34 BT4    54 CDW    74 AD     94 --
15 NOR    35 BT5    55 CDWI   75 SU     95 --
16 ADSC   36 BT6    56 --     76 ADA    96 --
17 SUSC   37 BT7    57 --     77 SUA    97 --
18 --     38 CCB    58 CDWF   78 --     98 --
19 --     39 CCBR   59 --     79 --     99 --
It also implements a number of synonyms for some of the instructions, e.g., A for AD (Add) and S for SU (Subtract). For some reason, the version of the assembler we have does not implement mnemonics for the floating-absolute instructions (90=FAA, 91=FSA, 92=FMA, 93=FDA). It would not be difficult, however, to add these to the assembler's symbol table.
In addition to supporting the mnemonic operator codes for the 205 machine instructions, the Shell Assembler offers these pseudo operators:
- REM: a remark or comment. The remaining contents of the card are ignored by the assembler, except that they are printed on the listing.
- EQU: assign a value to a symbol. This is typically used to assign a symbol to an absolute address, or to assign the value of one symbol to another.
- ORG: specify the next assembly address. Normally, the assembler increments its address counter by one after assembling an instruction. This pseudo operator is typically used to establish a starting address for a program or to position a section of a program at a particular address.
- REG: reserve a block of memory locations. This simply adds the value of the operand field to the address counter. Any label for the line is assigned the value of the starting address of the block.
- MOD: advances the address counter as necessary so that it is a multiple of 20 (a mod-20 address). This is particularly useful for positioning the address of magnetic tape records in memory. Data transfers to and from tape can begin at any main memory address. However, magnetic tape I/Os are buffered through the 4000 and 5000 high-speed loops on the memory drum, and blocking transfers between main memory and the high-speed loops work in a particular way. As a result, the word at the first mod-20 congruent address is the first one read from or written to the tape. Thus a program must read data from a tape using the same mod-20 congruent address as was used to write data to the tape in order for the words to end up in the same sequence in memory. Starting the memory buffers on mod-20 addresses helps eliminate any confusion as to the word ordering between tape and memory.
- NUM: define a one-word numeric literal value. The value must be right-justified in the 10-character operand field, with the sign coded as either 0/1 or +/- in the tag field. Spaces in the operand and tag fields are interpreted as zeroes. See the "Input Card Layout" section below for the field locations.
- ALF: define a one-word (five-character) alphanumeric literal encoded using Cardatron card codes.
- ALFS: define an alphanumeric literal up to six words (30 characters) in length, encoded using Cardatron codes. The text of the literal is written in the remarks field of the card. The length of the literal in words (1-6) must be written right-justified in the address portion of the operand field. The resulting words are stored in reverse order in memory to support both the way the Cardatron accessed memory and the decrement-and-branch indexing native to the 205's B register.
- FLX and FLXS: similar to ALF and ALFS, but generate words using the older Flexowriter code used on the Datatron 204 and earlier system models before the Cardatron was introduced.
- TAG: specify the high-speed loop to which addresses in assembled instructions should be assigned. Efficient programming for the 205 requires that instructions be transferred to one of the high-speed loops (typically the 7000 loop) and executed from there. The TAG operator specifies the digit (4-7) that the assembler will automatically set in the high-order digit of the operand address for instructions that reference memory, saving the programmer the trouble of continually having to override that digit on individual instructions. This behavior can be overridden on individual instructions as described for the Loop Tag field below, or disabled altogether by coding a TAG operator with an "X" as the tag digit.
- MACRO and NEW: define assembler macro instructions. The difference between them is that MACRO is used to define a temporary macro that will exist only for the current assembly run, while NEW defines a macro that can be used in the current assembly run, but will also be inserted into the macro library on magnetic tape. See the "Macro Definition and Usage" section below for more information on how these two pseudo-operators are used.
- SUBR: causes a subroutine to be copied from the macro library and appended to the end of the program being assembled. No call linkage to the subroutine is provided. A given subroutine will be included only once, no matter how many references are made to it using SUBR. Thus a body of code can be written as a closed subroutine, and a macro can be written to provide parameters and linkage to that subroutine, using SUBR within the macro to call out the closed body of code from the library. Each macro invocation will generate unique linkage code at that point in memory, but the body of the closed subroutine will be included only once. Subroutines can be added to the library as if they were parameterless macros, using NEW.
- RFB, WFB, and PFB: generate Cardatron format bands. RFB generates a format band for reading cards; WFB generates a format band for printing to a line printer; PFB generates a format band for punching cards. These operators use a simple notation to define the card or print line format, dramatically reducing the effort and potential for error compared to coding a band manually. See the "Cardatron Format Band Assembly" section below for more information.
- END: signals the end of the main program or a macro. The card deck must always end with this pseudo-instruction.
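Several of these pseudo-operators are simple enough to model in a few lines. The Python sketch below is illustrative only, and it makes assumptions the text does not state:

- For MOD, it models tape word order by assuming each word passes through loop position (address mod 20) and that the tape sees loop positions 0 through 19 in order.
- For ALFS, it keeps characters as text rather than Cardatron codes, pads short text with blanks, and puts the last group of five characters at the lowest address.
- For SUBR, the (opcode, operand) line format and the order of appending are hypothetical.

```python
LOOP_SIZE = 20  # words in each high-speed loop


def mod_align(counter):
    """MOD: advance to the next multiple of 20, or stay put if already aligned."""
    return (counter + LOOP_SIZE - 1) // LOOP_SIZE * LOOP_SIZE


def tape_order(memory, start):
    """Order in which 20 words starting at `start` reach the tape.

    ASSUMPTION: a word passes through loop position (address mod 20), and the
    tape is written from loop position 0 through 19.
    """
    loop = [None] * LOOP_SIZE
    for addr in range(start, start + LOOP_SIZE):
        loop[addr % LOOP_SIZE] = memory[addr]
    return loop


def read_tape(tape_words, start):
    """Inverse of tape_order under the same assumption."""
    return {addr: tape_words[addr % LOOP_SIZE]
            for addr in range(start, start + LOOP_SIZE)}


def num_word(operand, tag):
    """NUM: 10-column operand plus tag-field sign -> 11-digit word (as text).

    Spaces in either field count as zeroes; the sign may be 0/1 or +/-.
    """
    digits = operand.replace(" ", "0")
    if len(digits) != 10 or any(c not in "0123456789" for c in digits):
        raise ValueError(f"bad NUM operand {operand!r}")
    signs = {" ": "0", "0": "0", "+": "0", "1": "1", "-": "1"}
    if tag not in signs:
        raise ValueError(f"bad NUM sign {tag!r}")
    return signs[tag] + digits


def alfs_words(text, word_count):
    """ALFS: split text into 5-character words, returned lowest address first.

    ASSUMPTIONS: short text is padded with blanks, and "reverse order" puts the
    last group of five characters at the lowest address. Characters are left
    as text instead of being translated to two-digit Cardatron codes.
    """
    if not 1 <= word_count <= 6:
        raise ValueError("ALFS length must be 1-6 words")
    padded = text.ljust(5 * word_count)[:5 * word_count]
    words = [padded[i:i + 5] for i in range(0, len(padded), 5)]
    return words[::-1]


def subroutines_to_append(lines):
    """SUBR: each called-out subroutine once, in order of first reference.

    HYPOTHETICAL: `lines` are (opcode, operand) pairs with the subroutine
    name in the operand field.
    """
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return list(dict.fromkeys(op for code, op in lines if code == "SUBR"))
```

Under these assumptions, a block written from 1013 and read back into 1013 comes back intact. Reading the same block into 1000 instead puts the word from 1020 at address 1000, which is the kind of confusion that starting tape buffers on MOD addresses avoids. Likewise, with the ALFS words reversed, a loop whose B register counts down from n-1 to 0 would visit the words in text order.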
Programming with the Shell Assembler
Source program input to the Shell Assembler was punched on cards. This section describes the layout of the cards, how the fixed columns of that layout were used, and some special features of the assembler.
As described in an earlier post, the Cardatron input unit requires a punch in a specified column to select a format band from its small drum memory. The codes in this format band describe how the Cardatron will reformat and translate columns of data on the IBM card machine to the 205 memory. The Shell Assembler normally takes its band selection from column 1 and uses two format bands:
- Band 1, selected by a "1" in column 1, for instruction cards.
- Band 2, selected by a "2" in column 1, for the header card.
The Header Card
The first card in the deck must have a "2" in column 1, indicating it is the header card. The contents of this card are read using the format band defined in FMB7 of Movement 1 and output to the line printer using the band defined in FMB6. The contents of this card are not otherwise used by the assembler.
- Columns 1-3 are read as a number.
- Columns 4-53 are read alphanumerically (these are probably intended to be a title).
- Columns 54-55 are also read alphanumerically.
- Columns 56-61 are read numerically (these appear to be intended as a date in MMDDYY format).
- The remaining 19 columns on the card are ignored.
Input Card Format
The remaining cards of the source program must have a "1" in column 1. These cards carry the instructions that comprise the program.
Cards for format band 1 have the following layout:
Columns 2-9 and 66-80 are ignored by the assembler, and can be used for sequencing or identification of the deck.
Column 1: Cardatron format band code. The character in this column is used by the Cardatron, but otherwise ignored by the assembler.
Columns 10-14: Symbolic location label: the traditional label field for an assembly line. The symbol in this field is assigned the current value of the address counter. Labels are limited to five characters and must be unique within the program. They must be left-justified and start with an alphabetic character, but any characters, including punctuation, can be used in the other four positions.
Alternatively, this field could have a "program-point" label, written left-justified in the field as a period followed by a letter, e.g., ".A". Program points are temporary labels with a relative, local scope. They can be defined multiple times within a program. The program can refer to an address at the immediately-prior or immediately-next program point. Once a program-point label is redefined, its prior location falls out of scope and can no longer be referred to by subsequent instructions. See columns 26-30, below, for the way that these labels were referenced.
Program points allow a programmer to refer to locally-relevant locations, e.g. for short branches and local constants, without cluttering up the symbol table with labels that may be used only a few times within a small region of the program.
As discussed in a footnote on page 2 of the Part I manual for this assembler, program points apparently originated with Melvin Conway of the Case Institute of Technology (now Case Western Reserve University) and were first incorporated into the SOAP III assembler for the IBM 650 by Donald Knuth. Knuth later used program points in his EASY and MEASY assemblers. They also found their way into a number of later Burroughs assemblers. Program points were not available in STAR 0, the official Burroughs assembler for the 205, however.
Columns 15-19: "Spares": the sign and first four numeric digits of an instruction word. The sign must be coded in column 15, and can be a numeric digit, a space (which was translated to 0) or "-" (which was translated to 1). Odd-valued sign digits in instruction words cause the B-register to be added to the operand address to form the effective address used when the instruction executes. The other four digits are used for the breakpoint control digit (in column 19) and for control digits in I/O instructions. Programmers sometimes used this field for data. If all five columns are left blank, the sign and control digits are assembled as zeroes. If any of the four control digits are non-zero, all four must be coded.
Columns 20-24: Operation code: the symbolic operation code for the instruction or pseudo-instruction. This must be left-justified in the field. Alternatively, a numeric operation code can be entered in the last two positions of this field. The Shell Assembler uses the standard ElectroData/Burroughs mnemonics as discussed in the prior section.
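To make the 11-digit instruction word concrete, here is a small Python sketch. It assembles the sign, control, operation-code, and address fields described above and applies B-register modification. The mnemonic table holds only a few entries from the table at the top of this post. The helper names, and the wrap-around of the effective address modulo 10,000, are assumptions and are not taken from the Shell Assembler.

```python
# A few entries from the mnemonic table; A and S are the synonyms for AD and SU.
MNEMONICS = {"STOP": 8, "STC": 2, "CUB": 30, "CAD": 64,
             "AD": 74, "A": 74, "SU": 75, "S": 75}


def is_digits(s):
    return s != "" and all(c in "0123456789" for c in s)


def sign_digit(column15):
    """Column 15: a digit, a space (translated to 0), or '-' (translated to 1)."""
    if column15 == " ":
        return 0
    if column15 == "-":
        return 1
    if len(column15) == 1 and is_digits(column15):
        return int(column15)
    raise ValueError(f"bad sign {column15!r}")


def opcode_value(field):
    """Columns 20-24: a left-justified mnemonic, or digits in the last two positions."""
    field = field.ljust(5)
    if field[:3] == "   " and is_digits(field[3:]):
        return int(field[3:])
    return MNEMONICS[field.strip()]  # KeyError for an unknown mnemonic


def instruction_word(column15, controls, op_field, address):
    """Assemble sign, four control digits, opcode, and address into 11 digits.

    controls: columns 16-19; all blank means zeroes, otherwise all four digits
    must be coded. address: the already-resolved four-digit operand address.
    """
    if controls == "    ":
        controls = "0000"
    if len(controls) != 4 or not is_digits(controls):
        raise ValueError("code all four control digits")
    if not 0 <= address <= 9999:
        raise ValueError("address out of range")
    return f"{sign_digit(column15)}{controls}{opcode_value(op_field):02d}{address:04d}"


def effective_address(word, b_register):
    """An odd sign digit adds the B register to the operand address.

    ASSUMPTION: the sum wraps modulo 10000 (four address digits).
    """
    address = int(word[7:])
    if int(word[0]) % 2 == 1:
        return (address + b_register) % 10000
    return address
```

For example, a card with "-" in column 15, STC in the operation-code field, and address 1234 assembles to 10000021234. With B = 5, that instruction would act on location 1239.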
Column 25: Loop tag: for instructions that reference memory addresses, a value of 4-7 in this column will cause the high-order digit of the assembled address to be set to this value, making it an address for one of the high-speed loops on the drum. If this field is left blank and a loop digit has been specified previously by the TAG pseudo-instruction, that loop digit will be placed in the high-order digit of the assembled instruction. If column 25 contains an "X", any effect of TAG is suppressed in the assembled instruction, and the high-order digit of the address is assembled normally.
Use of the TAG pseudo-instruction considerably eases the task of writing code to run in the 205's high-speed loops. Instructions must be assembled into main memory locations and then blocked to one of the loops. Instructions referencing locations within the loop, e.g., for short branches or local constants, must use loop addresses instead of the ones that would normally be assigned for their originally-assembled location, so the use of TAG allows this address modification to be applied automatically. Most of the code in the Shell Assembler itself is written under the influence of a TAG 7 directive, as most of the code is blocked into the 7000 loop by CUB instructions for execution.
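The loop-tag rules for TAG and column 25 reduce to a short decision. This Python sketch assumes a four-digit address and that the caller applies it only to instructions that reference memory. The function name, and the use of None for "no TAG in force", are hypothetical.

```python
def apply_loop_tag(address, column25, tag_default):
    """Return the assembled four-digit address after loop tagging.

    column25:    the card's loop-tag column: ' ', '4'-'7', or 'X'.
    tag_default: digit from the most recent TAG pseudo-op, or None when no
                 TAG is in force (never set, or disabled with TAG X).
    """
    if column25 == "X":
        return address                  # TAG suppressed on this card
    if column25 in {"4", "5", "6", "7"}:
        loop = int(column25)            # explicit loop on this card
    elif column25 == " " and tag_default is not None:
        loop = tag_default              # inherited from TAG
    else:
        return address                  # no tag applies
    # Replace the high-order digit of the address with the loop digit.
    return loop * 1000 + address % 1000
```

Under TAG 7, a reference to 1234 with column 25 blank would assemble as 7234. A "5" in column 25 would give 5234, and an "X" would leave 1234 unchanged.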
Columns 26-30: Operand address: the coding in this field specifies the address field to be assembled into the instruction (possibly modified by the increment field below). This field can hold one of the following:
- Spaces: this causes the current value of the location counter to be assembled into the instruction. This is equivalent to the symbol "*" used by many other assemblers to designate the current value of the location counter.
- Absolute address: a literal, decimal address, right-justified in the five columns of the field. Leading zeroes could be replaced by spaces. This was normally used only with instructions that require a numeric operand, such as shifts and console I/O.
- Symbolic address: one of the alphanumeric symbols defined in columns 10-14 on some other line of the program.
- Program-point address: a reference to a program-point label. This is coded in columns 26-27 as either "+L" or "-L", where "L" is a letter. "+L" refers to the next line in the source labeled ".L". "-L" refers to the prior line in the source with that label. Only the immediately-next or immediately-prior lines with that label can be referenced with this notation. Note that, depending on the type of print wheels installed in the 407, "+" (a 12-row punch on a card) may print as "&".
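Program points are local and may be redefined, so a reference resolves to whichever definition is nearest in the indicated direction. One way to sketch that in Python is with two passes over the source: a forward pass for "-L" and a backward pass for "+L". The (address, label, operand) tuple format is an assumption for illustration. The sketch also assumes a reference never resolves to a label on its own line, which the text does not address.

```python
import re

# A program-point reference in columns 26-27: '+L' (next) or '-L' (prior).
REF = re.compile(r"([+-])([A-Z])")


def resolve_program_points(lines):
    """Resolve program-point references to addresses.

    lines: (address, label, operand) tuples in source order. A label such as
    '.A' defines a program point; an operand '+A' or '-A' refers to one.
    Returns one entry per line: the resolved address, or None when the
    operand is not a program-point reference.
    """
    resolved = [None] * len(lines)

    def scan(indices, direction):
        nearest = {}  # letter -> address of the nearest definition seen so far
        for i in indices:
            address, label, operand = lines[i]
            match = REF.fullmatch(operand)
            if match and match.group(1) == direction:
                letter = match.group(2)
                if letter not in nearest:
                    raise ValueError(f"line {i}: no {direction}{letter} program point")
                resolved[i] = nearest[letter]
            # Record after resolving, so a line never refers to its own label.
            if label.startswith("."):
                nearest[label[1:]] = address

    scan(range(len(lines)), "-")               # prior definitions
    scan(range(len(lines) - 1, -1, -1), "+")   # next definitions
    return resolved
```

Once a second ".A" is defined, later "-A" references see only the new location. This matches the description of a redefined program point falling out of scope.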
Columns 31-35: Address increment: a signed value, e.g., "+0001" or "-0001", that modifies the address given in the operand field.
Columns 36-65: Remarks: These 30 columns can be used for comments, and are reproduced as-is on the output listing. This field is also used as the operand for the ALFS, FLXS, RFB, PFB, and WFB pseudo-instructions. Format-band coding for the three format-band instructions can overflow onto up to eight continuation cards by leaving the operation code field on the continuation cards blank.
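The fixed-column layout lends itself to simple slicing. This Python sketch splits an 80-column card image into the fields described above. The address increment in columns 31-35 is inferred from the macro section's statement that the operand address and increment fields occupy columns 26-35. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceCard:
    band: str       # column 1: Cardatron format band select
    label: str      # columns 10-14: symbolic or program-point label
    spares: str     # columns 15-19: sign and four control digits
    opcode: str     # columns 20-24: mnemonic or numeric operation code
    loop_tag: str   # column 25: ' ', '4'-'7', or 'X'
    operand: str    # columns 26-30: operand address (kept unstripped)
    increment: str  # columns 31-35: address increment
    remarks: str    # columns 36-65: comments, or ALFS/format-band text


def columns(card, first, last):
    """Return card columns first..last, numbered from 1 and inclusive."""
    return card[first - 1:last]


def parse_card(card):
    card = card.ljust(80)  # short card images are padded with blanks
    return SourceCard(
        band=columns(card, 1, 1),
        label=columns(card, 10, 14).rstrip(),
        spares=columns(card, 15, 19),
        opcode=columns(card, 20, 24).strip(),
        loop_tag=columns(card, 25, 25),
        operand=columns(card, 26, 30),
        increment=columns(card, 31, 35),
        remarks=columns(card, 36, 65).rstrip(),
    )
```

The operand and increment are deliberately left unstripped. Right-justified absolute addresses and left-justified symbols are distinguished by their position in the field.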
Cardatron Format Band Assembly
One of the most useful features of the Shell Assembler is its ability to construct Cardatron format bands from a fairly simple and intuitive notation.
Designing and coding format bands for the Cardatron is extremely tedious. As described in "The Mighty CARDATRON," a 205 format band consists of 315 digits, which occupy 29 words in memory before being loaded to the Cardatron drum. Bands are coded using values 0-3 in each digit position of a memory word, including the sign digits.
The Cardatron works by splitting each column on a card or line of print into two decimal digits, one for the numeric portion (the bottom nine rows on a card) and one for the zone portion (the top three rows on a card). The format band digits operate on the stream of numeric and zone digits coming from or going to the IBM device, by copying digits, deleting digits, or inserting zero digits into the stream. The result is a mapping between the 80 or 120 columns on the IBM device and up to 28 words in 205 memory. The format band digits can drop columns and insert sign, leading, and scaling digits on input. They can also insert blanks on output, control whether digits from memory words are output numerically or alphanumerically, reduce 10-digit memory word values to fewer columns on a card or print line, and position signs as either separate columns or overpunches.
This mapping is complicated by the order in which the Cardatron processes data. It starts with the last column on a card or print line and works towards the front, with the first word in memory mapped to the last columns on the card or print line. Thus the low-order digits in the first word of a format band in memory reference the end of a card or print line. The band designer must code digits to account for exactly 80 columns on a card or 120 columns on a print line, and provide "3" (delete digit) codes in all remaining positions of the format band. On input, the designer must code a single additional "0" (insert zero digit) code at the end of the format band to push the last word being assembled out of the D register and into memory.
ElectroData developed a tabloid-size (11x17-inch) form to aid in designing and coding format bands. See page 6-11 of Burroughs 220 Operational Characteristics (Bulletin 5020A) for an example of the form used with the 220 Cardatron, which is almost identical to that for the 205. This form made it easier for the designer to keep track of columns on the IBM device, the mapping between those columns and words in 205 memory, and any padding required to fill the band data out to 29 memory words. Even with this form, though, the process of designing and coding the band digits was tedious and error-prone.
The format band pseudo-operators in the Shell Assembler dramatically simplify the task of designing and coding format bands. They use a free-field notation with letter codes and repeat factors, not unlike a FORTRAN FORMAT statement or a COBOL PICTURE clause. The band definition consists of phrases, one per 205 memory word, that define the mapping between digits in the memory word and columns on the IBM device. The word-phrases are delimited by "+" (or "&") characters. For example, consider the definition of format band 1 described in the prior section that is used to read the instruction card for the Shell Assembler:
FMB1 RFB 9B & 7¤P10Z¤ & 3¤P5A¤ & P8ZA
& 8¤P5A¤ & P10Z & 15B
The codes used in format band phrases have the following meanings:
- A - translate a column alphanumerically. On input, two digits from the card (usually the numeric and zone digits from the same column) are translated to the Cardatron code and stored in the next two digits in memory, proceeding from right to left in the memory word. On output, two digits in Cardatron code are translated from the memory word to numeric and zone codes for the next column (again, proceeding from right to left) on the card or print line.
- B - ignore a column. On input, the next two digits from the card are dropped and not stored in memory. On output, two zero digits are generated, producing a blank column on the IBM device.
- N - translate a column numerically. On input this copies the numeric digit from a card column to memory and discards the zone digit. On output, it copies a digit from memory as the numeric digit for a column and supplies a zero for the zone digit in that column. As a special case on output, if the digit in memory is a zero, a zone digit is output that causes a zero to be printed or punched in the column. This was necessary because the IBM devices consider a zero punch (third row on a card) to be a zone, not a numeric punch. Without this special case, zeroes in memory would be output as blank columns instead of zeroes.
- P - ignore the sign digit in memory. This code is identical to the Z code below, except that the assembler will allow a P only for the leftmost digit in a word. The combination of a special code for the sign digit and delimiters between word-phrases allows the assembler to verify that exactly 11 digits have been generated or consumed by the band for each memory word.
- X - transfer zone digit. This is normally used to place the sign in a column. The combination XN will treat the sign as an over-punch on the numeric digit formatted by N. The combination XB will treat the sign as a separate column.
- Z - ignore digit position in memory. On input, this will not consume a digit coming from the card and will store a zero digit in memory. On output, it will skip the next digit in memory and output nothing to the IBM device.
- ¤ - the lozenge (which prints as a right-parenthesis on some tabulators) serves as a bracket character to group one or more phrases for repetition.
For the format band example above, the word-phrases can be understood to have the following meaning:
- 9B - skip the first 9 columns (1-9) on the card.
- & 7¤P10Z¤ - store seven words of zeroes into memory. The P stores a zero in the sign-digit position and 10Z stores the other ten digits of the word.
- & 3¤P5A¤ - transfer the next 15 columns on the card alphanumerically to the next three words in memory. This corresponds to the Symbolic Location, Spares, and Operation Code fields in columns 10-24 on the card. The P stores a zero in the sign digit of each five-character word.
- & P8ZA - Store the alphanumeric code for the next column (TAG in column 25) in the low-order two digits of the next word in memory. P8Z stores zeroes in the sign and high-order eight digits of that word.
- & 8¤P5A¤ - Transfer the next 40 columns (26-65) from the card alphanumerically into the next eight words in memory. This comprises the Operand Address, Address Increment, and Remarks fields on the card.
- & P10Z - Store another word of zeroes into memory.
- & 15B - Skip the remaining 15 columns (66-80) on the card.
The way that format bands are coded internally for input and output is slightly different. One very nice thing about the format-band phrase notation is that it is symmetric between input and output. Reading a card from an input device with a set of format phrases in an RFB specification and then writing that data using the same set of phrases in a WFB or PFB specification produces an identical set of columns on the output device. The digits in the format band will be somewhat different, but the assembler knows how to code the band differently based on the pseudo-operation code used.
The only difference between WFB and PFB is that bands for punching cards must format 80 columns, while bands for printing must format 120 columns. The band designer must still account for the appropriate number of columns on the IBM device, and the assembler will verify this, issuing an error message if the number of columns is incorrect. The designer does not need to worry about padding the band out to 315 digits, however -- the assembler does that automatically.
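The bookkeeping the assembler does for a format band can be approximated in a few lines. This Python sketch expands lozenge repeat groups, checks that each word-phrase accounts for exactly 11 memory digits with P first, and checks the column total against the device. It is a simplified model, not the assembler's code, with these limits:

- It handles only the A, B, N, P, and Z codes. X is omitted because its digit and column accounting is not spelled out above.
- It does not support nested groups.
- It treats a phrase that touches no memory digits, such as 9B, as a column skip rather than a word.
- It does not generate the actual band digits or pad the band to 315 digits.

```python
import re

LOZENGE = "\u00a4"  # the lozenge used to bracket repeated phrases
# Memory digits and device columns accounted for by each code (X omitted).
DIGITS = {"P": 1, "Z": 1, "A": 2, "N": 1, "B": 0}
COLUMNS = {"P": 0, "Z": 0, "A": 1, "N": 1, "B": 1}
GROUP = re.compile(r"(\d*)" + re.escape(LOZENGE) + r"(.*)" + re.escape(LOZENGE))
PHRASE = re.compile(r"(?:\d*[PZANB])+")
CODE = re.compile(r"(\d*)([PZANB])")


def split_phrases(text):
    """Split on '&' or '+' delimiters that are not inside a lozenge group."""
    phrases, current, inside = [], "", False
    for ch in text:
        if ch == LOZENGE:
            inside = not inside
        if ch in "&+" and not inside:
            phrases.append(current)
            current = ""
        else:
            current += ch
    phrases.append(current)
    return [p for p in phrases if p]


def expand(text):
    """Expand repeat groups such as 7(lozenge)P10Z(lozenge) into a flat phrase list."""
    flat = []
    for phrase in split_phrases(text):
        group = GROUP.fullmatch(phrase)
        if group:
            flat.extend(expand(group.group(2)) * int(group.group(1) or "1"))
        else:
            flat.append(phrase)
    return flat


def check_band(text, device_columns):
    """Return the number of memory words; raise ValueError like an assembly error."""
    words = total_columns = 0
    for phrase in expand(text.replace(" ", "")):
        if not PHRASE.fullmatch(phrase):
            raise ValueError(f"unsupported phrase {phrase!r}")
        codes = CODE.findall(phrase)
        digits = sum(int(n or "1") * DIGITS[c] for n, c in codes)
        total_columns += sum(int(n or "1") * COLUMNS[c] for n, c in codes)
        if digits == 0:
            continue  # a pure column-skip phrase such as 9B
        if digits != 11 or not phrase.startswith("P") or phrase.count("P") != 1:
            raise ValueError(f"phrase {phrase!r} covers {digits} digits, not 11")
        words += 1
    if total_columns != device_columns:
        raise ValueError(f"band covers {total_columns} columns, not {device_columns}")
    return words
```

Applied to the FMB1 band above, the sketch reports 20 memory words spanning 80 columns. Checking the same phrases against 120 columns, as a WFB band would require, raises an error.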
Macro Definition and Usage
Our word "macro" comes from the Greek makros, meaning "long." In the context of computers and software, macro is typically used as an abbreviation for macroinstruction. There are many forms of macroinstructions, ranging from simple text replacement (e.g., named constants) to parameterized replacement of text (e.g., the C preprocessor's #define) to the conditional and iterative generation of highly customized text.
Macro assemblers are assembler programs that implement the ability to create and invoke macroinstructions. The first assembler to employ macros appears to be Autocoder for the IBM 702 and 705 in 1956. The presence of macros in the Shell Assembler of 1958 is therefore a fairly early application of this concept. There were people at Shell at the time who had come from IBM, including Bob Barton, so it's possible that the idea of including macros in the Shell Assembler came from those former IBMers.
The main idea of macros in assembly languages is to take commonly-used sequences of instructions and encapsulate them, give that sequence a name, and call forth the sequence later, possibly specifying parameters that will customize the sequence at that point in the program. The invocation of a macro often looks like the coding of an instruction, but with the operation code naming the macroinstruction instead of the mnemonic for a machine instruction.
Macros and subroutines are related, in that both can be used to define a sequence of code once and then activate that sequence from many places later. Subroutines are generally stored in memory only once, the program transfers control to the subroutine when it is invoked, and the subroutine passes control back to the program when it is finished. Macros, on the other hand, generate code in the program at the point they are invoked. Each time the macro is invoked, additional code is generated at that point in the program, and the code may be customized based on the parameters passed as part of the macro invocation.
Another important difference is that while subroutines are executed at run time, macros are executed at assembly time -- invocation of a macro generates additional text to be assembled. The instructions resulting from assembling that additional text will be executed at run time, but not the macro itself.
Invoking Macros in the Shell Assembler
A macro is invoked by writing its name in the operation code field (columns 20-24) on a card. A label, if desired, can be written in the label field. Actual parameters to the macro, if any, are written in the operand address and increment fields (columns 26-35) just as they are for a normal instruction. The first parameter is written in the operand field to the right of the macro name. Any additional parameters are written in the operand fields of subsequent cards, one per card. The operation code of these continuation cards must be blank. There is no limit to the number of parameters a macro may have.
All forms of operands may be passed as macro parameters. The only exception to operand coding for macros as compared to normal instructions is that the loop tag is never applied to the parameter operand as part of the invocation -- a loop tag may be applied to the parameter when it is used inside the macro, of course, but operand parameters are passed to the macro as is, without loop address modification.
Thus, a macro instruction with three parameters might be coded as follows:
.G CLEAR 0020
As discussed above, invocation of a macro generates code in the program at the point of invocation. The Shell Assembler does this a little differently than most macro assemblers, however. At the point of invocation, the assembler emits a CUB (30, Change Unconditionally and Block) instruction. The code generated by the macro is placed at the effective address of that instruction, which will be after the end of the non-macro instructions in the program.
Any return to the point of invocation is the responsibility of the macro, so it was typical to pass a label or relative address as one of the parameters to the macro. An example of this is shown in the third parameter above -- the +0001 is relative to the CUB instruction emitted by the assembler at the point the macro is invoked. Thus the return would be to the instruction immediately following the invocation.
The reason for using a CUB at the point of macro invocation is that, once again, efficient programming for the 205 requires that code be executed out of one of the high-speed memory loops as much as possible. Well-written 205 programs were coded mostly as short sequences of instructions linked by CUBs to load and execute the sequences. Thus, macro invocation automatically puts the generated code in the 7000 loop. It is the macro coder's responsibility to deal with macro expansion that exceeds the 20-word capacity of the 7000 loop.
Defining Macros in the Shell Assembler
Macros are defined using the pseudo-operators MACRO and NEW. MACRO defines a temporary macro that will be available only during the current assembly run. NEW defines a macro that will be permanently cataloged on the library tape. Other than that distinction, the two pseudo-operators are identical and function the same. MACRO and NEW can also be used to define bodies of code to be called out using the SUBR pseudo-operator, although such definitions should not use parameters.
Temporary macros are stored on the library tape along with permanent macros; they are simply not recorded in the tape's catalog. For that reason, Section VII of the Part I manual recommends that MACRO definitions follow any NEW definitions to avoid leaving unusable gaps on the library tape. It also recommends that permanent updates to the library be done in a separate assembly run dedicated to that purpose.
To define a macro, code MACRO or NEW in the operation-code field and the name of the macro (up to five characters) in the operand address field of a card. In the middle two digits of the "spares" field (columns 17-18), code the number of cards that make up the macro definition, not including the MACRO/NEW card, but including the required END card that terminates the macro. The names of newly-defined macros must not match the name of any existing macro -- attempting to do so results in an error.
Within the body of the macro, instructions are written the same way they are in open code. Macros may invoke other macros. Parameters are referred to by their ordinal position in the invocation -- #0001, #0002, #0003, etc., and may be referenced as the operand address in instructions and other macro invocations. The effective address of these parameters may be modified by the operand address increment field and by means of loop tags.
Here is one possible implementation for the CLEAR macro invoked above. Note the count of cards in the first line of the macro definition (0080) and the negative sign on the second STC instruction, causing it to apply B-register modification to the operand address.
0080 MACRO CLEAR
- STC X #0002
CUB X #0003
.C 00X #0001-0001
When a macro invocation is encountered during Movement 1, the assembler emits a CUB with an unresolved forward address at that point in the program, then searches the macro library for the macro, and when found, makes an entry in a memory table for the macro and its parameters. After the END card is sensed in Movement 1, the assembler steps through this table, and for each macro invocation, re-positions the library tape to the macro in the library and appends its body on tape unit 1 after the records that were written there for the main portion of the program, partially assembling the instructions in the body in the same way that instructions in the main portion are during that first movement. In Movement 2, assembly of the instructions from the macro bodies is completed as for other instructions, and the addresses in the CUB instructions at the point of macro invocation are resolved to the location where the macro bodies are assembled.
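The Movement 1 and Movement 2 handling described above can be approximated in memory. This Python sketch emits a CUB with an unresolved address at each invocation and gathers parameters from continuation cards. It then appends each macro body after the main program with #0001, #0002, ... substituted, and fills in the CUB addresses. It is a simplified model, not the assembler's actual code:

- It ignores the library tape, loop tags, nested invocations, and the 20-word loop limit.
- It treats a signed parameter such as +0001 as relative to the emitted CUB.
- The macro table format is hypothetical.

```python
import re

PARAM = re.compile(r"#(\d{4})")      # #0001, #0002, ... inside a macro body
RELATIVE = re.compile(r"[+-]\d{4}")  # e.g. +0001, counted from the emitted CUB


def substitute(operand, params):
    """Replace each #000n in an operand with the n-th actual parameter."""
    return PARAM.sub(lambda m: params[int(m.group(1)) - 1], operand)


def expand_macros(program, macros, origin):
    """Sketch of macro expansion with a CUB at each point of invocation.

    program: (label, opcode, operand) source lines. A line whose opcode names
      a macro starts an invocation; following lines with a blank opcode carry
      the additional parameters, one per card.
    macros: HYPOTHETICAL table, macro name -> list of (label, opcode, operand)
      body lines, END card omitted.
    Returns (address, label, opcode, operand) tuples, one word per line.
    """
    main, pending = [], []
    i = 0
    while i < len(program):
        label, opcode, operand = program[i]
        i += 1
        if opcode not in macros:
            main.append([label, opcode, operand])
            continue
        params = [operand]
        while i < len(program) and program[i][1] == "":
            params.append(program[i][2])
            i += 1
        cub_address = origin + len(main)
        params = [f"{cub_address + int(p):04d}" if RELATIVE.fullmatch(p) else p
                  for p in params]
        pending.append((len(main), opcode, params))
        main.append([label, "CUB", None])  # forward address, resolved below

    # After the main program: append each body and resolve its CUB.
    next_address = origin + len(main)
    bodies = []
    for index, name, params in pending:
        main[index][2] = f"{next_address:04d}"
        for label, opcode, operand in macros[name]:
            bodies.append([label, opcode, substitute(operand, params)])
            next_address += 1
    return [(origin + n, *line) for n, line in enumerate(main + bodies)]
```

As an example, take a hypothetical two-line CLEAR body (STC #0002, CUB #0003) invoked at address 1001 with parameters 0020, BUF, and +0001, in a program that starts at 1000. The sketch yields a CUB to 1003 at the point of invocation, and the body's final CUB returns to 1002.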
The project repository has a folder with small sample card decks and assembly listings, showing both definition and invocation of macros. The listings show how the CUB instructions emitted at the point of invocation branch to the code appended at the end of the program for each invocation.
Recovery of the Shell Assembler has been a very interesting part of this project. I wish we had more examples of its use in significant applications, especially ones demonstrating the macro facility, and perhaps some will surface in due time. In any case, it's an excellent example of a sophisticated piece of programming from the mid-to-late 1950s.
References
- Shell Symbolic Assembly Program for the Burroughs 205, Part I, Burroughs Corporation, Bulletin 3038, March 1960.
- Shell Symbolic Assembly Program for the Burroughs 205, Part II, Burroughs Corporation, Bulletin 3038, March 1960.
- "Knuth's EASY Assembler."
- "The Mighty CARDATRON."
- Burroughs 220 Operational Characteristics, Bulletin 5020A, Burroughs Corporation, August 1960.
- Folder of sample macro definitions and invocations.