65816 syntax

A Proposed Assembly Language Syntax For 65c816 Assemblers
by Randall Hyde

This is a proposed standard for 65c816 assembly language. The
proposed standard comes in three levels: subset, full, and extended. The
subset standard is intended for simple (or inexpensive) products,
particularly those aimed at beginning 65c816 assembly language programmers.
The full standard is the focus of this proposal. An assembler meeting the
full level adopts all of the requirements outlined in this paper. The
extended level is a mechanism whereby a vendor can claim full compliance
with the standard and point out that there are extensions as well. An
assembler cannot claim extended level compliance unless it also complies with
the full standard. An assembler, no matter how many extensions are
incorporated, will have to claim subset level unless the full standard is
supported. This ensures that programmers who do not use any assembler
extensions can assemble their programs on any assembler meeting the full or
extended compliance levels.

In addition to the items required for compliance, this proposal
suggests several extensions in the interests of compatibility with existing
65c816 assemblers. These recommendations are not required for full
compliance with the standard; they're included in this proposal as suggestions
to help make conversion of existing programs easier. The suggestions are
presented in two levels: recommended and optional. Recommended items should
be present in any decent 65c816 package. Inclusion of the optional items
is discouraged (since there are other ways to accomplish the same operation
within the confines of the standard) but may be included in the assembler
at the vendor's discretion to help alleviate conversion problems.

65c816 Instruction Mnemonics
----------------------------

All of the following mnemonics are required at the subset, full,
and extended standard levels.

The following mnemonics handle the basic 65c816 instruction set:

ADC - add with carry
AND - logical AND
BCC - branch if carry clear
BCS - branch if carry set
BEQ - branch if equal
BIT - bit test
BMI - branch if minus
BNE - branch if not equal
BPL - branch if plus
BRA - branch always
BRK - break point instruction
BVC - branch if overflow clear
BVS - branch if overflow set
CLC - clear the carry flag
CLD - clear the decimal flag
CLI - clear the interrupt flag
CLP - clear bits in P
CLR - store a zero into memory
CMP - compare accumulator
CPX - compare x register
CPY - compare y register
CSP - call system procedure
DEC - decrement acc or memory
DEX - decrement x register
DEY - decrement y register
EOR - exclusive-or accumulator
HLT - halt (stop) the clock
INC - increment acc or memory
INX - increment x register
INY - increment y register
JMP - jump to new location
JSR - jump to subroutine
LDA - load accumulator
LDX - load x register
LDY - load y register
MVN - block move (increment)
MVP - block move (decrement)
NOP - no operation
ORA - logical or accumulator
PHA - push accumulator
PHP - push p
PHX - push x register
PHY - push y register
PLA - pop accumulator
PLP - pop p
PLX - pop x register
PLY - pop y register
PSH - push operand
PUL - pop operand
RET - return from subroutine
ROL - rotate left acc/mem
ROR - rotate right acc/mem
RTI - return from interrupt
RTL - return from long subroutine
RTS - return from short subroutine
SBC - subtract with carry
SED - set decimal flag
SEI - set interrupt flag
SEP - set bits in P
SHL - shift left acc/mem
SHR - shift right acc/mem
STA - store accumulator
STX - store x register
STY - store y register
SWA - swap accumulator halves
TAD - transfer acc to D
TAS - transfer acc to S
TAX - transfer acc to x
TAY - transfer acc to y
TCB - test and clear bit
TDA - transfer D to acc
TSA - transfer S to acc
TSB - test and set bit
TSX - transfer S to X
TXA - transfer x to acc
TXS - transfer x to S
TXY - transfer x to y
TYA - transfer y to acc
TYX - transfer y to x
WAI - wait for interrupt
XCE - exchange carry with emulation bit

Comments:

CLP replaces REP in the original 65c816 instruction set, since CLP
is a tad more consistent with the original 6502 instruction set. See
"recommended options" for the status of REP. CLR replaces the STZ
instruction. Since STA, STX, and STY are used to store 65c816 registers,
STZ seems to imply that there is a Z register. Using CLR (clear) eliminates
any confusion. CSP (call system procedure) replaces the COP mnemonic. COP
was little more than a software interrupt in both intent and implementation.
CSP helps make this usage a little clearer. HLT replaces the STP mnemonic.
STP, like the STZ mnemonic, implies that the P register is being stored
somewhere. HLT (for halt) is just as obvious as "stop the clock" yet it
doesn't have the same "look and feel" as a store instruction. JML and JSL
are not really required by the new standard; but see recommended options
concerning these two instructions. Most of the new 65c816 push and pull
instructions have been collapsed into two instructions: PSH and PUL.

PEA label      becomes   PSH #label
PEI (label)    becomes   PSH label
PER label      becomes   PSH @label
PHB            becomes   PSH DBR
PHD            becomes   PSH D
PHK            becomes   PSH PBR

PLB            becomes   PUL DBR
PLD            becomes   PUL D

These mnemonics are more in line with the original design of the 6502
instruction set whereby the mnemonic specifies the operation and the operand
specifies the addressing mode and address. The RET instruction gets converted
to RTS or RTL, depending on the type of subroutine being declared. RTS and
RTL still exist in order to force a short or long return. SHL and SHR (shift
left and shift right) are used instead of ASL and LSR. The 6500 family has
NEVER supported an arithmetic shift left instruction. The operation performed
by the ASL mnemonic is really a logical shift left. To simplify matters, SHL
and SHR are used to specify shift left and shift right. SWA (swap accumulator
halves) is used instead of XBA. Since this is the only instruction that
references the "B" accumulator, there's no valid reason for even treating
the accumulator as two distinct entities (this is just a carry-over from the
6800 MPU). Likewise, since the eight-bit accumulator cannot be distinguished
from the 16-bit accumulator on an instruction by instruction basis (it depends
on the setting of the M bit in the P register), the accumulator should always
be referred to as A, regardless of whether the CPU is in the eight or sixteen
bit mode. Therefore, instructions like TCD, TCS, TDC, and TSC should be
replaced by TAD, TAS, TDA, and TSA. For more info on these new mnemonics,
see the section on "recommended options".
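
The renaming rules in these comments can be summarized as a small lookup
table. The following Python sketch is only an illustration: the names
RENAMES, REGISTER_FORMS, and translate are hypothetical, and the function
assumes a bare "mnemonic operand" line with no label or comment field.

```python
# Simple one-for-one renames described in the comments above.
RENAMES = {
    "REP": "CLP", "STZ": "CLR", "COP": "CSP", "STP": "HLT",
    "ASL": "SHL", "LSR": "SHR", "XBA": "SWA",
    "TCD": "TAD", "TCS": "TAS", "TDC": "TDA", "TSC": "TSA",
}

# Register pushes and pulls that collapse into PSH/PUL plus a register name.
REGISTER_FORMS = {
    "PHB": "PSH DBR", "PHD": "PSH D", "PHK": "PSH PBR",
    "PLB": "PUL DBR", "PLD": "PUL D",
}


def translate(line):
    """Rewrite one 'mnemonic operand' line into the proposed mnemonics."""
    parts = line.split(None, 1)
    if not parts:
        return line
    op = parts[0].upper()
    operand = parts[1].strip() if len(parts) > 1 else ""
    if op in REGISTER_FORMS:
        return REGISTER_FORMS[op]
    if op == "PEA":                      # PEA label   -> PSH #label
        return "PSH #" + operand
    if op == "PEI":                      # PEI (label) -> PSH label
        if operand.startswith("(") and operand.endswith(")"):
            operand = operand[1:-1]
        return "PSH " + operand
    if op == "PER":                      # PER label   -> PSH @label
        return "PSH @" + operand
    op = RENAMES.get(op, op)             # anything else keeps its operand
    return (op + " " + operand).strip()
```

For example, translate("PEI (label)") yields "PSH label", and a line such
as "LDA #0" passes through unchanged.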

Built-in Macros
---------------

The following instructions actually generate one or more instructions.
They are not required at the subset level, but are required at the full and
extended levels.

ADD - emits CLC then ADC
BFL - emits BEQ (branch if false)
BGE - emits BCS
BLT - emits BCC
BTR - emits BNE (branch if true)
BSR - emits PER *+2 then BRA (short) or PER *+3 then BRL (long)
SUB - emits SEC then SBC
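
As a rough model of how these built-in macros expand, consider the Python
sketch below. The names BUILT_IN_MACROS and expand are hypothetical. BSR is
left out because its expansion depends on whether the branch is short or
long, and the sketch assumes the operand belongs to the last instruction
emitted.

```python
# Expansions taken from the list above (BSR intentionally omitted).
BUILT_IN_MACROS = {
    "ADD": ["CLC", "ADC"],
    "SUB": ["SEC", "SBC"],
    "BGE": ["BCS"],
    "BLT": ["BCC"],
    "BFL": ["BEQ"],
    "BTR": ["BNE"],
}


def expand(mnemonic, operand=""):
    """Return the list of real instructions emitted for one source line."""
    real = BUILT_IN_MACROS.get(mnemonic.upper(), [mnemonic.upper()])
    lines = list(real[:-1])
    # Assumption: only the final instruction carries the operand.
    lines.append((real[-1] + " " + operand).strip())
    return lines
```

With this model, expand("ADD", "#5") gives ["CLC", "ADC #5"].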

Recommended Options
-------------------

The following mnemonics are aliases of existing instructions. The
(proposed) standard recommends that the assembler support these mnemonics,
mainly to provide compatibility with older source code, but does not
recommend their use in new programs. Some (or all) of these items may be
removed from the recommended list in future revisions of the standard. None
of these recommended items need be present at the subset level. If these
are the only extensions over and above the full syntax, the assembler
CANNOT claim to be an extended level assembler.

ASL BRL COP JML JSL LSR PEA PEI PER
PHB PHD PHK PLB PLD REP TCD TCS TDC
TSC TRB WDM XBA

Symbols, Constants, and Other Items
-----------------------------------

Symbols may contain any reasonable number of characters at the full
level. At the subset compliance level, at least 16 characters should be
supported and 32 is recommended. A "reasonable" number of characters should
be at least 64 if the implementor needs a maximum value.

Symbols must begin with an alphabetic character and may contain
(only) the following symbols: A-Z, a-z, 0-9, "_", "$", and "!". The
assembler must be capable of treating upper and lower case alphabetic
characters identically. Note that this does not disallow an assembler from
allowing the programmer to choose that upper and lower case be distinct, it
simply requires that in the default case, upper and lower case characters
are treated identically. Note that the standard does not require case
sensitivity in the assembler (and, in fact, recommends against it).
Therefore, anyone foolish enough (for many, many reasons) to create variables
that differ only in the case of the letters they contain is risking
portability problems (as well as maintenance, readability, and other problems).

The following symbols are reserved and may not be redefined within
the program:

A, X, Y, S, DBR, PBR, D, M, P

Nor may these symbols appear as fields to a record or type definition (which
will be described later).

Constants take six different forms: character constants, string
constants, binary constants, decimal constants, hexadecimal constants and
set constants.

Character constants are created by surrounding a single character by
a pair of apostrophes or quotation marks, e.g., "s", "a", '$', and 'p'. If
the character is surrounded by apostrophes, then the ASCII code for that
character WITH THE H.O. BIT CLEAR will be used. If the quotation marks are
used, then the ASCII code for the character WITH THE H.O. BIT SET will be
used. If you need to represent the apostrophe with the H.O. bit clear or a
quotation mark with the H.O. bit set, simply double up the characters, e.g.,

'''' - emits a single apostrophe.
"""" - emits a single quotation mark.

String constants are generated by placing a sequence of two or more
characters within a pair of apostrophes or quotation marks. The choice of
apostrophe or quotation mark controls the H.O. bit, as for character
constants. Likewise, to place an apostrophe or quote within a string
delimited by the same character, just double up the apostrophe or quotation
mark:

'This isn''t bad!' - generates --This isn't bad!--
"He said ""Hello""" - generates --He said "Hello"--

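The delimiter rules for character and string constants can be modeled in a
few lines of Python. This is a sketch only: encode_literal is a hypothetical
name, the input is assumed to be plain ASCII, and the sketch does not detect
a stray (undoubled) delimiter inside the body.

```python
def encode_literal(literal):
    """Return the bytes emitted for a quoted character or string constant."""
    delim = literal[0]
    if delim not in "'\"" or len(literal) < 3 or literal[-1] != delim:
        raise ValueError("not a quoted constant: " + literal)
    # A doubled delimiter inside the constant stands for one delimiter.
    body = literal[1:-1].replace(delim * 2, delim)
    # Apostrophes clear the H.O. bit; quotation marks set it.
    high = 0x80 if delim == '"' else 0x00
    return bytes((ord(ch) & 0x7F) | high for ch in body)
```

Under these assumptions 's' encodes as $73 and "s" as $F3.
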
Binary integer constants consist of a sequence of 1 through 32 zeros
or ones preceded by a percent sign ("%"). Examples:

%10110010
%001011101
%10
%1100

Decimal integer constants consist of strings of decimal digits without
any preceding characters. E.g., 25, 235, 8325, etc. Decimal constants
may be (optionally) preceded by a minus sign.

Hexadecimal constants consist of a dollar sign ("$") followed by
a string of hexadecimal digits (0..9 and A..F). Values in the range $0
through $FFFFFFFF are allowed.
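
The three integer notations can be recognized as in the following Python
sketch. The name parse_constant is hypothetical. Two points are assumptions
rather than part of the proposal: lower case hexadecimal digits are
accepted, and a negative decimal constant is stored in 32-bit two's
complement form.

```python
MASK32 = 0xFFFFFFFF


def _check(digits, alphabet, text):
    """Reject empty digit strings and characters outside the alphabet."""
    if not digits or set(digits) - set(alphabet):
        raise ValueError("bad constant: " + text)


def parse_constant(text):
    """Convert a binary (%), hexadecimal ($), or decimal constant to an int."""
    if text.startswith("%"):
        digits = text[1:]
        _check(digits, "01", text)
        if len(digits) > 32:
            raise ValueError("more than 32 binary digits: " + text)
        return int(digits, 2)
    if text.startswith("$"):
        digits = text[1:].upper()
        _check(digits, "0123456789ABCDEF", text)
        value = int(digits, 16)
        if value > MASK32:
            raise ValueError("hexadecimal constant out of range: " + text)
        return value
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    _check(digits, "0123456789", text)
    value = int(digits)
    # Assumption: negative values wrap to 32-bit two's complement.
    return (-value if negative else value) & MASK32
```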

Set constants are only required at the full and extended compliance
levels. A set constant consists of a list of items surrounded by braces,
e.g., {0,3,5}. For more information, see the .SET directive.

Address Expressions
-------------------

Most instructions and many pseudo-opcodes and assembler directives require
operands of some sort. Often these operands contain some sort of address
expression (ultimately, some numeric or string value). This proposed
standard defines the operands, precision, accuracy, and available operations
that constitute an address expression.

Precision: all integer expressions are computed using 32 bits. All string
expressions are computed with strings up to 255 characters in length. All
floating point operations are performed using IEEE 80-bit extended floating
point values (i.e., Apple SANE routines). All set operations are performed
using 32 bits of precision.

Accuracy: all integer operations (consisting of two 32-bit operands and an
operator on those operands) must produce the correct result if the actual
result can fit within 32 bits. If an overflow occurs, the value is truncated
and only the low order 32 bits are retained. If an underflow occurs, zero
is used as the result. If an overflow or underflow occurs, a special bit will
be set (until the next value is computed) that can be tested by the ".IFOVR"
and ".IFUNDR" directives. Other than that, such errors are ignored. All
arithmetic is performed using unsigned arithmetic operations. All
floating point operations follow the IEEE (and Apple SANE) suggestions, and
are otherwise ignored by the assembler. Any string operation producing a
string longer than 255 characters produces an assembly time error. All set
operations must be exact.
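
A minimal Python sketch of the integer accuracy rules follows. The names
add32 and sub32 are hypothetical. The proposal does not define "underflow"
for unsigned values, so the sketch assumes it means a subtraction whose true
result would be below zero.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Return (result, overflow, underflow) for unsigned 32-bit addition."""
    total = a + b
    # On overflow only the low order 32 bits are retained.
    return total & MASK32, total > MASK32, False


def sub32(a, b):
    """Return (result, overflow, underflow) for unsigned 32-bit subtraction."""
    if b > a:
        # Assumed meaning of underflow: the result is replaced by zero.
        return 0, False, True
    return a - b, False, False
```

The two flags stand in for the special bit that .IFOVR and .IFUNDR test.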

Integer operations: The following integer operations must be provided at all
compliance levels:

+ (binary) adds the two operands.
- (binary) subtracts second operand from the first.
* multiplies the two operands.
/ divides the first operand by the second.
\ divides the first operand by the second and returns the remainder.
& logically ANDs the two operands.
| logically ORs the two operands.
^ logically XORs the two operands.

=, <>, <, >, <=, >=
These operators compare the two operands (unsigned comparison) and
return 1 if the comparison is true, 0 otherwise.

- (unary) negates (2's complement) the operand
~ (unary) complements (inverts - 1's complement) the operand

The following operators must be provided at the full and extended compliance
levels:

<- shifts the first operand to the left the number of bits specified by the
second operand.
-> shifts the first operand to the right the number of bits specified by the
second operand.

@ (unary) subtracts the location counter at the beginning of the current
statement from the following address expression.

% (ternary, e.g.: X%Y:Z) This operator extracts bits Y through Z from X and
returns that result right justified.
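
The full-level operators can be modeled as below. This is a Python sketch
with hypothetical function names. The proposal does not say whether Y or Z
is the low bit in X%Y:Z, so extract_bits accepts either order; that choice
is an assumption.

```python
MASK32 = 0xFFFFFFFF


def shift_left(x, n):
    """Model of '<-': shift left, keeping the low order 32 bits."""
    return (x << n) & MASK32


def shift_right(x, n):
    """Model of '->': logical (unsigned) shift right."""
    return (x & MASK32) >> n


def relative(expr, location_counter):
    """Model of unary '@': the expression minus the location counter."""
    return (expr - location_counter) & MASK32


def extract_bits(x, y, z):
    """Model of X%Y:Z: bits y through z of x, right justified."""
    low, high = min(y, z), max(y, z)
    width = high - low + 1
    return (x >> low) & ((1 << width) - 1)
```

For instance, extract_bits(0xABCD, 4, 11) isolates the middle byte and
returns 0xBC.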

Floating point operations: floating point numbers and operations are required
only at the full and extended levels. The following operations must be
available as well:

+ adds the two operands.
- subtracts the second operand from the first.
* multiplies the two operands.
/ divides the first operand by the second.
- (unary) negates the operand.

=, <>, <, >, <=, >=
These operators compare the two operands and
return 1 if the comparison is true, 0 otherwise.

String operations: strings and string operations are not required at the
subset level, but the standard recommends their presence. The following
string operations must be provided at the full and extended levels:

+ concatenates two strings
% (ternary, e.g., X%Y:Z) returns the substring composed of the characters in
X starting at position Y of length Z. Generate an error if X doesn't
contain sufficient characters.

=, <>, <, >, <=, >=
These operators compare the two operands and
return 1 if the comparison is true, 0 otherwise.

Set operations: sets and set operations are required only at the full and
extended levels. The following set operations must be provided:

+ union of two sets (logical OR of the bits).
* intersection of two sets (logical AND of the bits).
- set difference (set one ANDed with the NOT of the second set)

= returns 1 if the two sets are equal, zero otherwise.
<> returns 1 if the two sets are not equal, zero otherwise.
< returns 1 if the first set is a proper subset of the second.
<= returns 1 if the first set is a subset of the second.
> returns 1 if the first set is a proper superset of the second.
>= returns 1 if the first set is a superset of the second.

% (ternary, e.g., X % Y:Z) extracts elements Y..Z from X and returns those
items.
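
Because sets carry 32 bits of precision, they map naturally onto bit masks.
The Python sketch below assumes that element n is stored in bit n and that
elements run from 0 through 31; neither detail is stated by the proposal,
and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def make_set(*elements):
    """Build the bit mask for a set constant such as {0,3,5}."""
    mask = 0
    for element in elements:
        if not 0 <= element <= 31:
            raise ValueError("set element out of range: %d" % element)
        mask |= 1 << element
    return mask


def union(a, b):
    return a | b


def intersection(a, b):
    return a & b


def difference(a, b):
    # The first set ANDed with the NOT of the second set.
    return a & ~b & MASK32


def is_subset(a, b):
    """Model of '<=': 1 if every element of a is also in b."""
    return int((a & ~b & MASK32) == 0)


def is_proper_subset(a, b):
    """Model of '<': a subset that is not equal to b."""
    return int(is_subset(a, b) == 1 and a != b)


def contains(value, s):
    """Model of IN(v,s)."""
    return (s >> value) & 1
```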

In addition to the above operators, several pre-defined functions are also
available. Note that these functions are not required at the subset
compliance level, only at the full and extended levels:

float(i) - Converts integer "i" to a floating point value.
trunc(r) - Converts real "r" to a 32-bit unsigned integer (or generates an
error).
valid(r) - returns "1" if r is a valid floating point value, 0 otherwise
(for example, if r is NaN, infinity, etc.)
length(s)- returns the length of string s.
lookup(s)- returns "1" if s is a valid symbol in the symbol table.
value(s) - returns value of symbol specified by string "s" in the symbol
table.
type(s) - returns type of symbol "s" in symbol table. Actual values
returned are yet to be defined.
mode(a) - returns the addressing mode of item "a". Used mainly in macros.
STR(s) - returns string s with a prefixed length byte.
ZRO(s) - returns string s with a suffixed zero byte.
DCI(s) - returns string s with the H.O. bit of its last char inverted.
RVS(s) - returns string s with its characters reversed.
FLP(s) - returns string s with its H.O. bits inverted.
IN(v,s) - returns one if value v is in set s, zero otherwise.

The following integer functions must be present at all compliance levels:

LB(i),
LBYTE(i),
BYTE(i) - returns the L.O. byte of i.
HB(i),
HBYTE(i) - returns byte #1 (bits 8-15) of i.
BB(i),
BBYTE(i) - returns bank byte (bits 16-23) of i.
XB(i),
XBYTE(i) - returns H.O. byte (bits 24-31) of i.
LW(i),
LWORD(i),
WORD(i) - returns L.O. word of i.
HW(i),
HWORD(i) - returns H.O. word of i.

Pack(i,j)- returns a 16-bit value whose L.O. byte is the L.O. byte of i and
whose H.O. byte is the L.O. byte of j.

Pack(i,j,k,l)- returns a 32-bit value consisting of (i,j,k,l) where i is the
L.O. byte and l is the H.O. byte. Note: l is optional. If
it isn't present, substitute zero for l.
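
The byte and word selection functions amount to shifts and masks on a
32-bit value, as this Python sketch shows. The four-operand form of Pack is
assumed to take the L.O. byte of each operand, which the proposal implies
but does not state outright.

```python
def LB(i):
    return i & 0xFF                # L.O. byte


def HB(i):
    return (i >> 8) & 0xFF         # byte #1, bits 8-15


def BB(i):
    return (i >> 16) & 0xFF        # bank byte, bits 16-23


def XB(i):
    return (i >> 24) & 0xFF        # H.O. byte, bits 24-31


def LW(i):
    return i & 0xFFFF              # L.O. word


def HW(i):
    return (i >> 16) & 0xFFFF      # H.O. word


def pack(i, j, k=None, l=0):
    """Two operands give a 16-bit value; three or four give a 32-bit value."""
    if k is None:
        return LB(i) | (LB(j) << 8)
    return LB(i) | (LB(j) << 8) | (LB(k) << 16) | (LB(l) << 24)
```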

The order of evaluation for an expression is strictly left to right
unless parentheses are used to modify the precedence of a sub-expression.
Since parentheses are used to specify certain indirect addressing modes, the
use of parentheses to override the strict left-to-right evaluation order
introduces some ambiguity. For example, should the following be treated
as jump indirect through location $1001 or jump directly to location $1001?

JMP ($1000+1)

The ambiguity is resolved as follows: if the parenthesis is the first char-
acter in the operand field, then the indirect addressing mode is assumed.
Otherwise, the parentheses are used to override the left-to-right precedence.
The example above would be treated as a jump indirect through location $1001.
If you wanted to jump directly to location $1001 in this fashion, the state-
ment could be modified to

JMP 0+($1000+1)

so that the parenthesis is no longer the first character in the operand
field.

The use of parentheses to override the left-to-right precedence is
only required at the full and extended compliance levels. It is not
required at the subset compliance level.
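
The evaluation order and the parenthesis rule can be sketched in Python as
follows. The function names are hypothetical, and the evaluator is
deliberately tiny: it handles only +, -, *, /, decimal and hexadecimal
numbers, and parentheses, with no unary operators or symbols.

```python
import re

TOKEN = re.compile(r"\s*(\$[0-9A-Fa-f]+|\d+|[-+*/()])")
OPERATORS = ("+", "-", "*", "/")


def tokenize(text):
    """Split an expression into numbers, operators, and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected character at column %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def operand(tokens):
    """Consume one number or one parenthesized sub-expression."""
    token = tokens.pop(0)
    if token == "(":
        value = evaluate(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing closing parenthesis")
        return value
    return int(token[1:], 16) if token.startswith("$") else int(token)


def evaluate(tokens):
    """Strict left-to-right evaluation: no operator precedence at all."""
    value = operand(tokens)
    while tokens and tokens[0] in OPERATORS:
        op = tokens.pop(0)
        rhs = operand(tokens)
        if op == "+":
            value += rhs
        elif op == "-":
            value -= rhs
        elif op == "*":
            value *= rhs
        else:
            value //= rhs
        value &= 0xFFFFFFFF
    return value


def is_indirect(operand_field):
    """A leading parenthesis selects the indirect addressing mode."""
    return operand_field.lstrip().startswith("(")
```

Note that "2+3*4" evaluates to 20 under this rule, not 14, and that
is_indirect distinguishes "($1000+1)" from "0+($1000+1)".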

Expression Types
----------------

Expressions, in addition to having a value associated with
them, also have a specific type. The three basic types of expressions are
integer, floating point, and string expressions. Integer expressions can
be broken down into subtypes as well. A hierarchical diagram is the easiest
way to describe integer expressions:

This diagram points out that there are two types of integer expres-
sions: constants and addresses. Further, there are two types of constants
and four types of addresses. Before discussing operations on these different
types of integer values, their purpose should be presented.

Until now, most 65xxx assemblers did little to differentiate between
the different types of integer values. In this proposed standard, however,
strong type checking is enforced. Whereas in previous assemblers you could
use the following code:

label equ $1000
lda #Label
sta Label

such operations are illegal within the confines of the new standard. The
problem with this short code segment is that the symbol "label" is used as
both an integer constant (in the LDA instruction) and as an address
expression (in the STA instruction). To help prevent logical errors from
creeping into a program, the assembler doesn't allow the use of addresses
where constants are expected and vice versa. To that end, a new assembler
directive, CON, is used to declare constants while EQU is used to declare
an (absolute) address. Symbols declared by CON cannot be (directly) used
as an address. Likewise, symbols declared by EQU (and others) cannot be
used where a constant is expected (such as in an immediate operand).

Although this type checking can be quite useful for locating bugs
within the source file, it can also be a source of major annoyance. Some-
times (quite often, in fact) you may want to treat an address expression
as a constant or a constant expression as an address. Two functions are
used to coerce these expressions to their desired form: PTR and OFS.
PTR(expr) converts the supplied constant expression to an address expression.
OFS(expr) converts the supplied address expression to a constant expression.
The following is perfectly legal:

Cons1 CON $5A
DataLoc EQU $1000
lda #OFS(DataLoc)
sta PTR(Cons1)

For more information, see the section on assembler directives. PTR and OFS
are required at all compliance levels of this proposed standard.
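
The constant/address distinction can be modeled by tagging every value with
a kind. The Python sketch below is illustrative only: Typed, immediate, and
memory are hypothetical names, and the four address subtypes are collapsed
into a single "address" kind for brevity.

```python
from collections import namedtuple

# kind is either "constant" or "address"
Typed = namedtuple("Typed", "value kind")


def CON(value):
    return Typed(value, "constant")


def EQU(value):
    return Typed(value, "address")


def PTR(expr):
    """Coerce a constant expression to an address expression."""
    return Typed(expr.value, "address")


def OFS(expr):
    """Coerce an address expression to a constant expression."""
    return Typed(expr.value, "constant")


def immediate(expr):
    """Format an immediate operand; only constants are accepted."""
    if expr.kind != "constant":
        raise TypeError("immediate operand requires a constant")
    return "#$%X" % expr.value


def memory(expr):
    """Format a memory operand; only addresses are accepted."""
    if expr.kind != "address":
        raise TypeError("memory operand requires an address")
    return "$%X" % expr.value
```

Here immediate(EQU(0x1000)) is rejected, while immediate(OFS(EQU(0x1000)))
is accepted, mirroring the legal example above.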

While any constant value may be used anywhere a constant is allowed,
the 65c816 microprocessor must often differentiate between the various types
of address expressions. This is particularly true when emitting code since
the length of an instruction depends on the particular address expression.
If an expression contains only constants, direct page values, absolute
values, or long values, there isn't much of a problem. The assembler uses
the specified type as the addressing mode. If the expression contains mixed
types, the resulting type is as follows:

Resulting type    Allowable forms
--------------    ---------------
constant          constant
direct            constant+direct
absolute          constant+absolute
long              constant+long
                  absolute+long
                  constant+absolute+long

This says that if your expression contains only constants, then the
result is a constant. If it contains a mixture of constants and direct
page addresses, the result is a direct page address. Note that direct page
addresses cannot be mixed with other types of addresses. An error must be
reported in this situation (although you could get around it with an
expression of the form "abs+OFS(direct)"). Likewise, adding a constant to
an absolute address produces an absolute address. Adding an absolute and
a long address produces a long address, etc.
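
These mixing rules reduce to a short decision procedure. The Python sketch
below uses a hypothetical function name, result_type, and takes the kinds of
the terms in an expression as strings.

```python
KINDS = {"constant", "direct", "absolute", "long"}


def result_type(kinds):
    """Return the type of an expression from the types of its terms."""
    kinds = set(kinds)
    if not kinds or kinds - KINDS:
        raise ValueError("unknown expression type")
    addresses = kinds - {"constant"}
    if not addresses:
        return "constant"
    if "direct" in addresses:
        # Direct page addresses cannot be mixed with other address types.
        if addresses != {"direct"}:
            raise TypeError("direct page address mixed with another type")
        return "direct"
    return "long" if "long" in addresses else "absolute"
```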

Sometimes, you need to force an expression to be a certain type.
For example, the instruction "LDA $200" normally assembles to a load
absolute from location $200 in the current data bank. If you need to force
this to location $200 in bank zero, regardless of the content of the DBR,
the address expression must be coerced to a long address. Coercion of this
type is accomplished with the ":D", ":A", ":L", and ":S" expression suffixes.
To force "LDA $200" to be assembled using the long address mode, the in-
struction is modified to be "LDA $200:L". The coercion suffix must always
follow the full address expression. The ":S" (for short branches) suffix
is never required, since a short branch (for BRA and BSR) is always assumed,
but it is included for completeness. For BRA and BSR, the ":L" suffix is
used to imply a long branch (+/- 32K) rather than the long addressing mode.

Caveats: If ":D" or ":A" is used to coerce a large address expression
to direct or absolute, the high order byte(s) of the expression are truncated
and ignored. The assembler must assume that when a programmer uses these
constructs he knows exactly what he's doing. Therefore, "LDA $1001:D" will
happily assemble this instruction into a "LDA $01" instruction despite the
actual value of the address expression.
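
The truncation performed by the coercion suffixes can be sketched in Python
as below. The name coerce is hypothetical. Masking a ":L" operand to 24
bits is an assumption; the caveat above only describes ":D" and ":A".

```python
# Suffix letter -> (addressing mode, mask applied to the value)
SUFFIXES = {
    "D": ("direct", 0xFF),
    "A": ("absolute", 0xFFFF),
    "L": ("long", 0xFFFFFF),
}


def coerce(value, suffix):
    """Return (mode, value) after applying a :D, :A, or :L suffix."""
    mode, mask = SUFFIXES[suffix.upper()]
    # High order bytes are silently truncated, as the caveat describes.
    return mode, value & mask
```

So coerce(0x1001, "D") yields ("direct", 0x01), matching the "LDA $1001:D"
example.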

Addressing Mode Specification
-----------------------------

65c816 addressing modes are specified by certain symbols in the op-
erand field. A quick rundown follows:

Addressing mode           Format(s)           Example(s)
---------------           ------------------  ----------------------

Immediate                 #                   LDA #0
                          =                   CMP =LastValue

Direct Page                                   LDA DPG
                          :D                  LDA ANY:D

Absolute                                      LDA ABS
                          :A                  LDA ANY:A

Long                                          LDA LONG
                          :L                  LDA ANY:L

Accumulator               {no operand}        ASL
                                              INC

Implied                   {no operand}        CLC
                                              SED

Direct, Indirect,
Indexed by Y              (),Y                LDA (DPG),Y
                          ().Y                LDA (ANY:D).Y

Direct, Indirect,
Indexed by Y, Long        [],Y                LDA [DPG],Y
                          [].Y                LDA [DPG].Y

Direct, Indexed by X,
Indirect                  (,X)                LDA (DPG,X)
                          (.X)                LDA (ANY:D.X)

Direct, Indexed by X      ,X                  LDA DPG,X
                          .X                  LDA DPG.X

Direct, Indexed by Y      ,Y                  LDX DPG,Y
                          .Y                  LDX DPG.Y

Absolute, Indexed by X    ,X                  LDA ABS,X
                          .X                  LDA ANY:A.X

Long, Indexed by X        ,X                  LDA ANY:L,X
                          .X                  LDA LONG.X

Absolute, Indexed by Y    ,Y                  LDA ANY:A,Y
                          .Y                  LDA ABS.Y

Program Counter
Relative (branches)                           BRA ABS
                          @                   BRA @ABS

PC Relative (PSH)         @                   PSH @ABS

Absolute, Indirect        ()                  JMP (ABS)

Absolute, Indexed,
Indirect                  (,X)                JMP (ABS,X)
                          (.X)                JMP (ABS.X)

Direct, Indirect          ()                  LDA (DPG)
                                              STA (ANY:D)

Stack Relative            ,S                  LDA 2,S
                          .S                  LDA 2.S

Stack Relative,
Indirect, Indexed         (,S),Y              LDA (2,S),Y
                          (.S).Y              LDA (2.S).Y

Block Move                ,                   MVN LONG,LONG
                                              MVP LONG,LONG

DPG   - Any direct page expression or symbol.
ABS   - Any absolute expression or symbol.
LONG  - Any long expression or symbol.
expr8 - Any expression evaluating to a value less than 256.

Note: the only real difference between the existing standard and the proposed
standard is that the period (".") can be used to form an indexed address ex-
pression. This is compatible (in practice, as well as philosophy) with the
record structure mechanism supported by this proposed standard. This syntax
for the various addressing modes is required at all compliance levels.

Suggestion: ():L, ():L,Y, and (], [],Y, and [

        .EQU    <16-bit value>
        .EDP    <8-bit value>
        .EQL    <24-bit value>
        .CON    <32-bit value>
        .FCON

All except .FCON are required at all compliance levels. .FCON is required
at the full and extended levels.

.EQU lets you define an absolute symbol: an address whose value is
relative to the DBR. An error should be generated if the value in the
operand field requires more than 16 bits. The type of the operand expression
is ignored. It may be a constant expression, a direct page expression, or
even a long address expression. As long as it's an integer expression and
can fit into 16 bits, it's quite acceptable.

.EDP (equate to direct page) is used to define direct page symbols.
Again, the operand field may be of any integer type as long as the result
fits into 8 bits. A recommended synonym for .EDP is .EPZ (equate to page
zero) in deference to the 6502's zero page addressing mode.

.EQL (equate long) defines long address expressions. As usual, the
operand field may contain any integer expression that fits within 24 bits.

.CON (constant) is used to define integer numeric constants. Any
32 bit numeric value may be specified in the operand field.

.FCON (floating point constant) is used to declare symbolic floating
point constants. Such constants must be stored in the symbol table as
80-bit SANE extended values.
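
The size checks for the typed integer equates can be sketched in Python as
follows. The names EQUATES and define are hypothetical, and the symbol table
is modeled as a plain dictionary keyed by the upper case symbol name, which
reflects the default case-insensitive behavior. .FCON is omitted because it
stores a floating point value.

```python
# Symbols that may never be redefined.
RESERVED = {"A", "X", "Y", "S", "DBR", "PBR", "D", "M", "P"}

# Directive -> (resulting type, maximum width in bits)
EQUATES = {
    ".EQU": ("absolute", 16),
    ".EDP": ("direct", 8),
    ".EPZ": ("direct", 8),      # recommended synonym for .EDP
    ".EQL": ("long", 24),
    ".CON": ("constant", 32),
}


def define(symbols, name, directive, value):
    """Enter a typed equate into the symbol table, checking its size."""
    kind, bits = EQUATES[directive.upper()]
    key = name.upper()
    if key in RESERVED:
        raise ValueError(name + " is a reserved symbol")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in %d bits" % bits)
    symbols[key] = (kind, value)
    return symbols[key]
```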

In addition to the typed equates, this proposed standard also allows
an untyped equate, which takes the form:

label = operand

where "operand" is any valid operand that may appear in the operand field
of any instruction. The operand's type may be integer, string, or floating
point, and the operand may also include an addressing mode. The following are
all legal:

lbl = 5
lbl = 5.5
lbl = "Five"
lbl = Array,X
lbl = (dp,s),y

Labels defined by "=" may appear anywhere the operand field specified for
that label is allowed. In general, a simple string substitution should be
performed when a label defined by "=" is used. Note: a label declared by
"=" can be redefined without error throughout the program. The "=" directive
is required only at the full and extended compliance levels.

Data Definitions:
-----------------

While the equates are probably the most important assembler
directives, the data definition instructions are probably the most important
pseudo-opcodes around. These instructions are classed into four groups
determined by the types of operands they accept. In the following paragraphs
all optional items are enclosed within braces.

The first group of data reservation instructions accept any integer
type expression as operands. They are:

{label} .BYTE {expr1, expr2, ..., exprn}
{label} .WORD {expr1, expr2, ..., exprn}
{label} .LONG {expr1, expr2, ..., exprn}

If a label is present, it is treated as a statement label within the current
segment and assigned the value of the location counter before any bytes are
emitted. For the .BYTE opcode, one byte of data is emitted for each operand
in the operand field, that byte being the L.O. byte of each expression.
Operands are purely optional. If no operand appears, then an indeterminate
value is emitted. The .WORD opcode outputs two bytes for each expression in
the operand field (or two indeterminate bytes if no operand is present). The
.LONG instruction outputs four bytes for each operand. These three pseudo-
opcodes must be present at all compliance levels.
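
The byte counts for these three pseudo-opcodes can be modeled as in the
Python sketch below. The name emit is hypothetical. The proposal does not
state the byte order; the sketch assumes low byte first, which is how the
65c816 itself stores multi-byte values.

```python
SIZES = {".BYTE": 1, ".WORD": 2, ".LONG": 4}


def emit(directive, values):
    """Return the bytes generated for a list of integer operands."""
    size = SIZES[directive.upper()]
    mask = (1 << (8 * size)) - 1
    output = bytearray()
    for value in values:
        # Keep only the low order bytes; assumed little-endian layout.
        output += (value & mask).to_bytes(size, "little")
    return bytes(output)
```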

The next group of pseudo-opcodes are used to create tables of
addresses. As such, they only allow symbols that have been defined by
.EQU, .EQL, "=" (as applicable), statement labels, procedure labels, and
segment labels in their operand fields. They are:

{label} .OFFS expr1 {,expr2, ..., exprn}
{label} .ADRS expr1 {,expr2, ..., exprn}
{label} .PTR expr1 {,expr2, ..., exprn}

.OFFS outputs two bytes for each operand; .ADRS outputs three bytes for
each operand; and .PTR outputs four bytes for each operand. These three
pseudo-opcodes are only required at the full and extended compliance levels.

The third group of declarations are used to create constant tables.
As such, they only allow symbols declared by .CON. They are:

{label} .SHORT expr1 {,expr2, ..., exprn}
{label} .INTEGER expr1 {,expr2, ..., exprn}
{label} .LONGINT expr1 {,expr2, ..., exprn}

These pseudo-ops output one, two, and four bytes respectively. These
pseudo-opcodes are not required at the subset compliance level; they are
required only at the full and extended levels.

Note: non-symbolic constants are allowed in any of the above
pseudo-opcodes. Only symbols should have their type information checked.

The last group of data declaration pseudo-opcodes are used to
initialize floating point values. These pseudo-ops are:

{label} .FLOAT {item1, item2, ..., itemn}
{label} .DOUBLE {item1, item2, ..., itemn}
{label} .EXTENDED {item1, item2, ..., itemn}
{label} .COMP {item1, item2, ..., itemn}

Each instruction generates a value of 4, 8, 10, or 8 bytes for each operand,
respectively. If the operand field is left blank, the corresponding bytes
contain an indeterminate value, but the assembler should initialize them to
NaN (not a number). These four pseudo-opcodes are required only at the
full and extended levels.

Although not required by the standard, the following data declaration
directives are recommended and should be supported:

{label} .HBYTE expr1 {,expr2, ..., exprn}
{label} .BBYTE expr1 {,expr2, ..., exprn}
{label} .XBYTE expr1 {,expr2, ..., exprn}
{label} .HWORD expr1 {,expr2, ..., exprn}

The first three reserve one byte of memory for each operand and store the
H.O. byte (bits 8-15), bank byte (bits 16-23), or extra byte (bits 24-31),
respectively. .HWORD reserves two bytes composed of bits 16-31 for each
operand.

Arrays:
-------

Space for arrays and data tables can be reserved using the data
declaration statements mentioned above in conjunction with the "DUP" operator.
DUP is a binary operator that takes the form:

count DUP (list)

where count is some constant value and list is a (possibly empty) list of
values. The items in (list) are repeated "count" times. For example, the
following .BYTE statement reserves space for an array of 64 bytes and
initializes each byte to zero:

MyArray .BYTE 64 DUP (0)

The following statement reserves 256 bytes consisting of the values 1, 2, 3,
4, 5, 6, 7, and 8 repeated 32 times:

MyArray .BYTE 32 DUP (1,2,3,4,5,6,7,8)

The DUP operator is fully recursive. That is, one of the items in
the list may, itself, be a list defined by the DUP operator. For example,

Example .BYTE 16 DUP (0,1,2 DUP (3,4,5))

reserves 128 bytes consisting of the list "0,1,3,4,5,3,4,5" repeated 16 times.

If the DUP list is empty, e.g., "16 dup ()", then exactly one item
is reserved for each entry, but it is not initialized. The following example
reserves space for 128 uninitialized words:

OffsetTable .WORD 128 DUP ()
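
The recursive behavior of DUP can be modeled with a short Python sketch.
The names Dup, UNINITIALIZED, and expand are hypothetical; the operand list
is represented as an already parsed Python list rather than source text,
and an uninitialized item is represented by None.

```python
from collections import namedtuple

# count DUP (items); items may themselves contain Dup entries.
Dup = namedtuple("Dup", "count items")

UNINITIALIZED = None


def expand(items):
    """Flatten an operand list, repeating each DUP list 'count' times."""
    output = []
    for item in items:
        if isinstance(item, Dup):
            if not item.items:
                # Empty list: one uninitialized item per repetition.
                output.extend([UNINITIALIZED] * item.count)
            else:
                output.extend(expand(item.items) * item.count)
        else:
            output.append(item)
    return output
```

The nested example above becomes expand([Dup(16, [0, 1, Dup(2, [3, 4, 5])])]),
which produces 128 items.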

Type definitions:
-----------------

Enumerated data types can be declared with the ".TYPE" directive.
This directive takes the form:

{label} .TYPE item1 {,item2, ..., itemn}

The items in the list are assigned consecutive values starting from zero.
For example, in the following .TYPE statement, the symbols red, green, and
blue are assigned the values zero, one, and two, respectively:

colors .TYPE red,green,blue

The symbols in the operand field of a .TYPE statement must be unique and
undefined elsewhere (within the current scope, more on that later). The
.TYPE statement above is almost identical to the statements:

red .con 0
green .con 1
blue .con 2

However, there is one major difference. The .TYPE statement also defines a
symbol specified in the label field. This symbol can be used as a pseudo-
opcode to reserve space for values of the specified type. In the example
above, "colors" could be used as a pseudo-opcode to reserve space for the
values red, green, and blue. To differentiate type declarations from other
instructions, a special lead-in character is used. The slash ("/") is
recommended by this standard, but the user should have the option of choosing
this character via a setup program for the assembler. From the example
above, colors could be used as a pseudo-opcode in the following manner:

Christmas /colors red,green
Ocean /colors blue,green
Sky /colors blue
/colors red
Primaries /colors red,blue,green

Unlike other data reserving pseudo-opcodes, a "/colors" definition only
allows symbols that appear in the operand field of the associated .TYPE
statement, or an expression that contains a single such symbol plus or minus
a numeric constant, as long as the result is still within the range of
symbols declared for that type. E.g.,

Okay /colors red,green+1,blue
NotOkay1 /colors blue+2 ;Outside allowable range
NotOkay2 /colors red+blue ;can't add two such symbols
NotOkay3 /colors $25 ;Not red, green, or blue

If you need to coerce an expression to the proper form, simply use the type
name as a pseudo-function. E.g.,

ThisIsOkay /colors colors(0),blue ;Same as red, blue

If the operand is not appropriate, the assembler should generate a warning
and emit the code as though the .BYTE statement were used.
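
The operand rule for an enumerated type can be sketched in Python as
follows. This is a simplified model under stated assumptions: the function
name check_operand is hypothetical, an operand is given as a symbol plus an
optional numeric offset, and a rejected operand is reported by returning
None rather than by issuing the warning described above.

```python
def check_operand(members, symbol, offset=0):
    """Return the value of `symbol+offset`, or None if it is not allowed.

    `members` lists the symbols of the .TYPE statement in declaration
    order, so the value of each symbol is its index in the list.
    """
    if symbol not in members:
        return None                  # not a symbol of this type
    value = members.index(symbol) + offset
    if 0 <= value < len(members):
        return value
    return None                      # result is outside the type's range


colors = ["red", "green", "blue"]
```

With this model check_operand(colors, "green", 1) yields 2 (the value of
blue), while check_operand(colors, "blue", 2) and
check_operand(colors, "$25") are both rejected.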

If there isn't a label starting in column one of a .TYPE statement
then the symbols defined in the operand field are applied to the previous
.TYPE statement. This allows you to create .TYPEs where several symbols
(which couldn't possibly fit on a single line) are declared as constants.
E.g.,

colors .TYPE red, yellow, blue
.TYPE orange, green, violet
.TYPE brown, black, white

All of these symbols will be associated with "colors". A maximum of 256
symbols can be associated with a symbol via the .TYPE statement. Whenever
the data reservation form is used, exactly one byte is reserved for each
item in the operand field. If you need to reserve more than a single byte
for each item, use the record declarations described next.

The DUP operator can be used to define enumerated data type arrays,
e.g.,

LotsOfRed /colors 16 DUP (red)

Another form of the .TYPE statement allows you to declare byte
subrange values. A definition of this type takes the form:

label .TYPE start..stop

where start and stop are constant values in the range 0..255 and
start <= stop. Examples:

LessThan10 .TYPE 0..9
Nibbles .TYPE 0..$F
PrimaryColors .TYPE red..blue ;From above, is red, yellow, blue

Implementation of the .TYPE statement is required only at the full
and extended compliance levels.

Records:
--------

A record data structure can be defined with the ".RECORD" and ".ENDR"
directives using the syntax:

label .RECORD

.ENDR

This creates a template, but does not generate any code. An example might
be:

CursorPosn .RECORD
ROW .BYTE 0
COLUMN .BYTE 0
.ENDR

This definition creates the type "CursorPosn". Like the .TYPE definitions,
the symbol defined by .RECORD can be used as a pseudo-opcode to reserve
storage for a variable. For example, to declare a variable of type
"CursorPosn" the following statement is used:

MyCursor /CursorPosn

This statement reserves two bytes, initialized to zeros, at the current
location counter.

Access to the fields of the record is accomplished by using the
"." operator, just like Pascal. E.g.,

lda MyCursor.ROW ;Fetches first byte.
lda MyCursor.COLUMN ;Fetches the second byte.

In the example above, the ROW and COLUMN fields of each variable
declared with CursorPosn are always initialized to zero. Any other value
could have been used by substituting the appropriate value, or an
indeterminate value could have been specified by the definition:

CursorPosn .RECORD
ROW .BYTE
COLUMN .BYTE
.ENDR

On occasion, you may want each record variable definition to
specify the initial values. This can be accomplished by specifying
parameters in the record definition. Parameters are specified by the
symbols: ?0, ?1, ..., ?9. ?0 corresponds to the first parameter, ?1 to
the second, etc. Consider the following record and variable definitions:

CursorPosn .RECORD
ROW .BYTE ?0
COLUMN .BYTE ?1
.ENDR

HomePosn /CursorPosn 0,0
LowerRight /CursorPosn 23,79
MyCursor /CursorPosn 5,10

The only problem with this definition form is that each CursorPosn
variable must supply exactly two operands. Sometimes you may want to have
a default value in the event an operand isn't specified. This is accomplished
using a record definition of the form:

CursorPosn .RECORD ?0=0,?1=0
ROW .BYTE ?0
COLUMN .BYTE ?1
.ENDR

This definition instructs the assembler to allow zero or more parameters,
defaulting ?0 and ?1 to zero if their respective entries aren't present.
The .DEFAULT directive can also be used, particularly if you run out of
room on the .RECORD line:

OpenRec .RECORD ?0=0, ?1=1
.DEFAULT ?2=ZRO('Hello there'), ?3=2
FirstItem .WORD ?0
.LONG ?3
SecondItem .BYTE ?1, ?2
.ENDR

Record definitions are required at the full and extended compliance
levels; they are not required at the subset compliance level.

Sets:
-----

Bit string types can be declared using the .SET directive. .SET is
used in a manner quite similar to .TYPE except the items in the operand field
can be any constant whose value is less than 32. Up to 32 items may
appear in the operand field of a .SET definition. The syntax is

label .SET item1 {,item2, ..., itemn} ;n <= 32.

An alternate form is to specify the name of some type variable in the operand
field. The following definition creates a set of integers in the range
0..9:

LessThan10 .TYPE 0..9
SetOfDigits .SET LessThan10

Declaring a set variable is quite similar to declaring an enumerated
type variable or a record variable: simply use the set name as a pseudo-opcode
prefaced by a "/":

Digits /SetOfDigits

Set constants are specified by placing the items in the set within
a pair of braces. E.g.:

BitValues .TYPE 0..7
SetOfBitValues .SET BitValues
Bits /SetOfBitValues {0,1,2,3}
;
;
lda #{0,2,7}
sta Bits

The assembler, by default, should allow set constants composed of
the integer values 0..31. This allows programmers to easily deal with bits
by bit numbers rather than the integers those bit patterns represent. For
example, to strip all but the H.O. two bits in the (8-bit) accumulator, the
instruction "AND #{6,7}" makes a lot more sense than "AND #$C0". All other
entities appearing within "{" and "}" must appear somewhere in the operand
field of a .SET statement (or must be a member of a .TYPE definition if that
type appears in the operand field of a .SET).
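
The correspondence between a set constant of bit numbers and the integer it
represents can be sketched in Python. The function name set_mask is
hypothetical and the sketch covers only the default case of integer members
in the range 0..31.

```python
def set_mask(bits):
    """Return the integer whose set bits are the given bit numbers."""
    mask = 0
    for bit in bits:
        if not 0 <= bit <= 31:
            raise ValueError(f"bit number out of range: {bit}")
        mask |= 1 << bit
    return mask
```

For instance, set_mask({6, 7}) is $C0, which is why "AND #{6,7}" and
"AND #$C0" are equivalent, and set_mask({0, 2, 7}) is $85.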

Macros:
-------

Macros are created using the .MACRO and .ENDM directives. The syntax
for a macro definition is

label .MACRO {default parameter values}

.ENDM

Macros are invoked by writing an underscore followed by the macro name (the
label in the .MACRO statement) where a mnemonic would normally appear. The
user should be able to change the macro lead-in character from underscore to
some other character via an assembler set up program.

All labels declared within the macro are local to that definition unless the
".GLOBAL" directive is used to extend their scope. In general, global
macro labels (except, possibly, those defined by "=") are not useful anyway
since a duplicate label error might occur on the second invocation of the
macro.

The macro body consists of a sequence of assembler statements. Most
reasonable statements may be included in the macro body. The standard does
not require nested macro definitions. Nor need the macro definitions allow
.RECORD, .TYPE, or .SET definitions (since labels are local to the macro,
such definitions are dubious anyway).

Macro parameters are specified using ?0, ?1, ..., ?9, just as for
.RECORD definitions. "?#" can be used to determine the actual number of
parameters present. "?:expr" can be used to select a parameter using a
numeric expression. For example, "?:?#-1" returns the value of the last
parameter specified. Default values for the parameters can be specified
in the .MACRO operand field, or in a .DEFAULT statement, just like specifying
default values for .RECORD parameters. E.g.,

MyMacro .MACRO ?0=0, ?1=2
.DEFAULT ?2="Hello there"
.BYTE ?0
.WORD ?1
.BYTE ?2
.ENDM

then:

_MyMacro 10,20

generates the bytes:
10, 20, 0, Hello there

Macros, by their very nature, allow a variable number of parameters.
If more parameters are specified than there are references for, the extra
parameters are ignored. If fewer parameters are specified than there are
references for, the additional references will be treated as undefined
symbols. If you want to be able to force the user to enter an exact number
of parameters, then use the ?# in the default field to specify a fixed number
of parameters. The following macro definition requires the user to enter
exactly two parameters whenever TwoParms is invoked:

TwoParms .MACRO ?#=2
lda ?0
sta ?1
.ENDM

If the number of parameters is fixed at a certain value, default values
are not allowed in the macro definition.
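
The parameter rules above (defaults, missing parameters, and a fixed count
given with ?#) can be modelled in Python. This is a sketch under stated
assumptions, not the proposal's mechanism: the function name bind_params is
hypothetical, parameters are keyed by number (0 for ?0, 1 for ?1, and so
on), and a parameter with neither a supplied value nor a default is simply
absent from the result, standing in for an undefined symbol.

```python
def bind_params(args, defaults=None, fixed=None):
    """Map parameter numbers to the values a macro body would see."""
    if fixed is not None:
        if defaults:
            raise ValueError("defaults are not allowed with a fixed count")
        if len(args) != fixed:
            raise ValueError(f"expected exactly {fixed} parameters")
    bound = dict(defaults or {})
    for number, value in enumerate(args):
        bound[number] = value        # supplied values override defaults
    return bound


# _MyMacro 10,20 with ?0=0, ?1=2, ?2="Hello there"
params = bind_params([10, 20], {0: 0, 1: 2, 2: "Hello there"})
```

Here params maps 0 to 10, 1 to 20, and 2 to the default "Hello there",
while bind_params(["LBL"], fixed=2) is rejected because TwoParms-style
macros demand exactly two parameters.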

Since macro parameters, in a macro invocation, are separated by
commas, you cannot directly create a macro of the form:

LDAIX .MACRO ?#=1
lda ?0
.ENDM

and invoke it by:

_LDAIX LBL,X

intending the "LDA LBL,X" instruction to be generated. Instead, the macro
mechanism will think that LBL and X are two different parameters and generate
an error since only a single parameter is allowed. The "<<" and ">>" symbols
are used as an escape mechanism to parenthesize such operands. To handle the
case above, the following statement could be used:

_LDAIX <<LBL,X>>

and this would generate the instruction "LDA LBL,X".

The lookup, value, type, and mode functions are quite useful for
dealing with macro parameters. The exact values returned by these functions
will be described at a later time.

For additional information on macros and dealing with macro para-
meters, see the sections on conditional assembly and while loops.

Macros are required only at the full and extended compliance levels.

Address Expression Functions:
-----------------------------

Format:

label .FUNC {default parameter values}

.RETURN expr
.ENDF

The .FUNC statement lets programmers define their own address
expression functions that can be used in operand fields of assembly language
statements. The function body typically contains a sequence of equates
and other value computing statements; it may not contain any code generating
statements.

Like a macro definition, all symbols defined inside an address
expression function are local to that function. Likewise, default parameters
may be declared in the operand field of the .FUNC statement or via the
.DEFAULT statement. Alternately, you can specify that a fixed number of
parameters are required by using the "?#=expr" item in the operand field
of the .FUNC statement.

The expression following the .RETURN statement is the value returned
by the address expression function. Note that more than one .RETURN may appear
within the function (perhaps within the confines of a conditional assembly
sequence). If more than one .RETURN statement is encountered, all but the
last are ignored. The expression returned in the .RETURN operand field may
contain addressing modes in addition to the actual expression value. In
general, anything allowed as a macro parameter can be returned as an address
expression value.

An address expression function is invoked by placing the function
name in some other expression followed by the parameters enclosed within
parentheses. The parentheses are required even if the parameter list is
empty (just like the "C" programming language). Examples follow:

StripLONibble .FUNC ?#=1
value = ?0 AND $F0
.RETURN value
.ENDF
;
AppendTXT .FUNC ?#=1
string = ?0 + ".TXT"
.RETURN string
.ENDF
;
.
.
.
LDA #StripLONibble($FF)
.
.
.
.BYTE AppendTXT("MyString")

The LDA instruction generates

LDA #$F0

and the .BYTE statement becomes

.BYTE "MyString.TXT"

The latter example demonstrates that address expression functions can
return any valid type. This includes strings, records, sets, and any
other entity allowed in an operand field. Consider the following:

LBLX .FUNC ?#=2
L = ?0-?1,X
.RETURN L
.ENDF

LDA LBLX($100,10)

This generates the code:

LDA $100-10,X

Address expression functions are required only at the full and
extended compliance levels.

The Label Type
--------------

The ".LABEL" directive is used to declare a valueless symbol, that is, one which
is defined but is assigned no particular value. The syntax for the .LABEL directive is:

.LABEL symbol1 {, symbol2, ..., symboln}

Each symbol appearing in the operand field is inserted into the symbol table as a "label"
typed symbol.

Label-typed symbols are useful mainly in macros and in the operand fields of
conditional assembly statements. The only operations you can perform using label-typed
symbols are "=" and "<>". Most of the reserved symbols in the assembler (such as A, X,
Y, DBR, D, M, S, etc.) are actually label-typed symbols.

An example of where you might use a label-typed symbol follows:

CmpReg .MACRO ?#=2
.IF ?0=A
cmp ?1
.ELSE
.IF ?0=X
cpx ?1
.ELSE
.IF ?0=Y
cpy ?1
.ELSE
.PAUSE
.ENDIF
.ENDIF
.ENDIF
.ENDM

The "=" equate can also be used to define label-typed symbols by specifying a
label-typed symbol in the operand field, e.g.,

ACC = A
XReg = X
etc.

Note that the last equate above does not allow you to enter indexed by X addressing modes as

,XReg

it simply allows you to use a statement of the form:

.IF XReg=X

and wind up assembling the code after the ".IF".

The ".LABEL" directive is required at the full and extended compliance levels; it
is not required at the subset compliance level.

Procedures:
-----------

At the full and extended compliance levels, the .PROC and .ENDP
directives can be used to declare 65c816 procedures (subroutines). Procedure
declarations take the form:

procname .PROC {near|far}

.ENDP

If an operand appears after the .PROC statement, it must be either "near" or
"far". If no operand appears, "near" is assumed.

The procedure name that appears in the label field of the .PROC
statement is assigned the current value of the location counter at that
point in the program. It is also given the type of near procedure or
far procedure, depending upon the .PROC operand field.

All labels defined inside a procedure are local to that procedure
unless the .GLOBAL directive is used to extend their scope beyond the
procedure. Therefore, labels inside one procedure may be reused outside
that procedure. If a label inside a procedure is already defined outside
that procedure, an error is not generated; instead the new label supersedes
the old one INSIDE THE PROCEDURE (scoping rules are the same as for Pascal).
Procedures may be nested inside one another; the scoping rules used by Pascal
apply in such situations.

Inside the procedure, RET can be used in place of RTS or RTL. The
assembler will automatically choose the appropriate version depending upon
whether the procedure is a near or far procedure. If RTS is used inside a
FAR procedure or RTL is used inside a NEAR procedure, the assembler will
generate a warning.

The assembler automatically assembles JSR using the absolute or
long addressing mode depending upon the procedure definition. If the
assembler supports the JSL mnemonic and a JSL is used to call a NEAR
procedure, the assembler must generate a warning. If the address expression
following a JSR was coerced using the ":A" or ":L" suffixes, no warning will
be generated if the incorrect distance was specified. I.e., the following
does NOT generate an error:

JSR mysub:L
.
.
.
mysub .PROC NEAR
.
.
.

If you use a coercion operator, the assembler assumes that you know what
you are doing.

Note that the use of the .PROC statement is optional. You may con-
tinue to build and call subroutines without the .PROC directive. However,
using .PROC allows the assembler to perform additional type checking on
certain operations. An external data flow analysis program can also use the
procedure declarations to help locate logical bugs in your code.

.PROC and .ENDP are required at all compliance levels of the
standard.

Module Communication Directives:
--------------------------------

Three directives, .GLOBAL, .PUBLIC, and .EXTERNAL, are used to
communicate symbolic values across procedure, segment, and module boundaries
(a module is any one source file which is assembled as a whole unit). The
.GLOBAL directive is used to make symbols visible outside of procedures,
macros, functions, and records. The .PUBLIC directive is used to make
certain symbols visible outside the current module. The .EXTERNAL directive
is used to make symbols defined outside the current module visible within
the module.

The syntax for the .PUBLIC and .GLOBAL directives is identical; it
takes the form:

.PUBLIC symbol1 {,symbol2, ..., symboln}
and, .GLOBAL symbol1 {,symbol2, ..., symboln}

A label is not allowed in the label field of either mnemonic. The symbols
specified in the operand field of these two instructions are made known
outside the procedure or module where they currently reside. If a procedure
is nested inside another, the .GLOBAL statement makes its symbols known
only to the procedure encompassing the nested procedure. In the following
example, LCL is known only inside procedures X1 and X2, not to the whole
program:

X1 .PROC
.
.
X2 .PROC
.GLOBAL LCL
.
.
.ENDP
.ENDP

If you wanted to make LCL visible at the level above X1, then another
.GLOBAL statement must appear inside the X1 procedure declaring LCL to
be global to that procedure.

Another alternative is to use the .PUBLIC statement. Any symbol
declared public with .PUBLIC is instantly visible throughout the program
(within the confines of the scoping rules). However, keep in mind that
symbols declared as public are visible outside the current module as well
and may interfere with other modules.

The .EXTERNAL directive is used to obtain access to symbols declared
outside the current module. The syntax for the .EXTERNAL directive is:

.EXTERNAL symbol1:type {,symbol2:type, ..., symboln:type}

Again, no label is allowed in the label field of the .EXTERNAL directive.
The type item is any of NEAR, FAR, CONST, DIRECT, ABS, or LONG.

Note: symbols declared with "=", .MACRO, .RECORD, .SET, and .TYPE
may not appear as operands to the .GLOBAL, .PUBLIC, or .EXTERNAL directives.

These directives are not required at the subset compliance level,
only at the full and extended levels.

Segments:
---------

Segments are used to group a collection of logically and physically
related entities within a program. A segment may contain the program code,
variables, stack area, direct page area, or other such data. Typically
a segment is a load module. That is, a segment is loaded as a whole into
memory. If a program consists of two or more segments, they need not all
reside in memory at the same time. The memory manager/loader may load
segments as needed into memory.

Segment definitions are required at all compliance levels. All
programs must consist of at least one segment (this is a source of minor
incompatibility with existing assemblers). The most general form of the
segment definition is:

label .SEGMENT TYPE=expr {,ALIGN=expr} {,ORG=expr} {,NOCODE}

.ENDS

.SEGMENT lets you declare any general type of segment. The symbol in the
label field need not be unique, but if it is redefined elsewhere within the
current scope, it must appear on a .SEGMENT definition whose type is exactly
the same as the current definition.

Unlike .PROCs, .MACROs, etc., symbols defined inside a segment are
not local to the segment, but are instantly visible to the rest of the
module. If you need to declare local variables within a segment, use the
.LOCAL and .RELEASE directives.

The type of segment must be specified in the .SEGMENT operand field.
The actual segment types will be defined at a later date. For now, assume
the types used by the Apple //GS loader are specified after the TYPE= item.
The segment type describes the attributes of the segment, attributes such
as whether the segment is relocatable or absolute, fixed or movable, etc.

The optional ALIGN operand is used to determine some number of bytes
to which this segment (portion) must be aligned. If ALIGN=1, the segment
will be aligned on any byte boundary. If ALIGN=2 then the segment will be
aligned on a word boundary, etc. Any value between 1 and $10000 can be used
(ALIGN=$10000 will align the segment on a bank boundary).
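
The effect of ALIGN on a segment's starting address can be sketched in
Python. The proposal does not say how the alignment is achieved; this sketch
assumes the assembler or linker rounds the address up to the next multiple
of the ALIGN value, and the function name align_up is hypothetical.

```python
def align_up(address, align):
    """Round `address` up to the next multiple of `align` (1..$10000)."""
    if not 1 <= align <= 0x10000:
        raise ValueError("ALIGN must be between 1 and $10000")
    return ((address + align - 1) // align) * align
```

Under this assumption ALIGN=2 moves $1235 to $1236, and ALIGN=$10000 moves
$012345 to the bank boundary $020000.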

The ORG=expr option can be used to fix the starting address of the
segment. This option isn't normally used with code-generating segments.
It's mainly used to define I/O port addresses and other absolute variables.

The NOCODE option is used to declare that a segment will not generate
any code (i.e., it's just used to declare variables). If any 65c816 instruct-
ion appears in a NOCODE segment, an error will be generated. All data
declaring pseudo-opcodes (e.g., .BYTE) must specify indeterminate values else
an error will be reported.

If multiple segments with the same name appear in a module (or
across modules, for that matter), they will be combined into a single,
contiguous segment by the assembler and/or linker. Consider the following:

MyCode .SEGMENT Type=$1AF
.
.
.
.ENDS
;
MyData .SEGMENT Type=$100
.
.
.
.ENDS
;
MyCode .SEGMENT Type=$1AF
.
.
.
.ENDS

Although MyCode appears in two completely disjoint areas, the assembler/linker
will combine these items into a single segment. Segments appear in the
load module in the order they are declared in the source file. In the
example above, segment MyCode appears before segment MyData (even though
a portion of MyCode appears after MyData, MyCode was still declared before
MyData).
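
The combining and ordering rule can be modelled in a few lines of Python.
This is only a sketch: the function name combine_segments is hypothetical,
and each piece of a segment is represented as a (name, contents) pair given
in source order.

```python
def combine_segments(pieces):
    """Merge same-named pieces, ordered by each name's first declaration."""
    combined = {}                    # dicts preserve insertion order
    for name, contents in pieces:
        combined.setdefault(name, []).extend(contents)
    return list(combined.items())


source_order = [
    ("MyCode", ["code part 1"]),
    ("MyData", ["data"]),
    ("MyCode", ["code part 2"]),
]
load_order = combine_segments(source_order)
```

In load_order, MyCode comes first and holds both of its parts, followed by
MyData, as described above.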

Segments may be nested, but they don't follow any scoping rules.
Declaring one segment inside another is no different than declaring those
two segments completely separately.

If you have two separate segments (different names but the same
type), you can combine them together using the .GROUP directive. This
directive takes the form

label .GROUP seg1, seg2 {,seg3, ..., segn}

Referring to "label" refers to the segment obtained by combining the
segments in the .GROUP operand field.

To simplify segment usage, there are six predeclared segments. They
may be declared with the directives:

.CODE .DATA .DIRECT
.STACK .VAR .CONST

.CODE is used to declare static, code-generating segments which allow
65c816 instructions. .DATA is used to declare static data-generating
segments. .CONST is identical to .DATA except data items inside the
.CONST directive are read-only. Any attempt to write to items inside a
.CONST segment should generate an error by the assembler or data flow
analysis programs. .DIRECT is used to declare segments containing direct
page variables. This is a NOCODE segment, so only definitions are allowed;
initial values are illegal. .STACK segments are also NOCODE segments. They
are useful for declaring stack space down in bank zero. The .VAR segment
is used like the .DATA segment, except .VAR segments are NOCODE segments.
They are used for declaring uninitialized variables in main RAM.

The syntax for these six directives is

label .xxxx {ALIGN=expr | ORG=expr}

.ENDS

The ASSUME Directive
--------------------

With the addition of the bank registers and the mode bits in the
65c816 processor, an assembler can no longer determine the proper addressing
mode to use in all circumstances without help from the programmer. For
example, if the assembler encounters an instruction of the form "LDA Label"
and Label is a statement label inside some segment (i.e., not declared with
EDP, EQU, EQL, or other type-defining directive), it has no idea whether to
use the direct, absolute, or long addressing mode. To do so would require
that the assembler know the current values of the direct page and data bank
registers at assembly time. Frankly, it is not possible for the assembler
to always know the content of these registers, hence the programmer must
manually supply this information to the assembler. This information, as well
as some other useful information, is supplied to the assembler via the
.ASSUME directive.

The .ASSUME directive uses the syntax:

.ASSUME operand1 {,operand2, ..., operandn}

where operand(i) is one of the following:

DBR:expression24
DBR:NOTHING
DP:expression16
DP:NOTHING
M:expression1
M:NOTHING
X:expression1
X:NOTHING
CPU:cpu_type

where expression24 is an expression yielding a 24-bit value, expression16 is
an expression yielding a 16-bit value, expression1 is an expression yielding
zero or one, NOTHING is a reserved word, and cpu_type is one of {6502, 65c02,
65802, 65816} or one of the later versions of the 65c816 microprocessor.

DP (direct page) is used to let the assembler know where the direct
page register is pointing. If a segment name is given as the expression,
that segment must be one that resides in bank zero and is of type DIRECT.
If the assembler encounters a symbol declared in a segment that is assumed
to be a direct page segment via the DP:expression operand, the assembler will
reference that location using the direct page addressing mode (if possible).
If the "DP:NOTHING" form is used, the assembler will only use the direct page
addressing mode if a symbol was declared with the EDP equate. None of the
segments will be treated as direct page segments, even if they were declared
as type DIRECT. If you want to simultaneously refer to several segments as
direct page segments, group them together using the .GROUP directive and
specify the group name as the expression value after the DP:, i.e.,

DPGroup .GROUP DPSeg1, DPSeg2, DPSeg3
.ASSUME DP:DPGroup

By default, the assembler should assume DP:NOTHING.

DBR is used to tell the assembler which segment/bank the DBR (data
bank register) points at. References to variables within that segment will
be assembled as absolute references (unless that segment name is also
specified after DP:expr, in which case the direct page addressing mode will
be used, if possible). If DBR:NOTHING is specified, absolute addressing will
be used only for those symbols declared via EQU, all other references will
be assumed to be long references. Note that the H.O. eight bits of the
24-bit expression are used. Therefore, to set the DBR assumption to an
absolute bank in memory, an expression of the form:

.ASSUME DBR:$200000 ;Assume DBR=$20

must be used. By default, the assembler should assume DBR:NOTHING.
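
How the DP and DBR assumptions steer the choice of addressing mode can be
sketched in Python. This is a deliberately simplified model: it ignores
symbols declared with EDP, EQU, or EQL and the "if possible" qualification,
it identifies a symbol only by the name of the segment that contains it, and
the function names dbr_bank and addressing_mode are hypothetical.

```python
def dbr_bank(expression24):
    """The bank number taken from the H.O. eight bits of a 24-bit value."""
    return (expression24 >> 16) & 0xFF


def addressing_mode(segment, dp_segments=None, dbr_segment=None):
    """Pick the mode for a symbol declared in `segment`.

    dp_segments: segment names assumed via DP: (None for DP:NOTHING).
    dbr_segment: the segment assumed via DBR: (None for DBR:NOTHING).
    """
    if dp_segments and segment in dp_segments:
        return "direct"              # DP takes precedence over DBR
    if dbr_segment is not None and segment == dbr_segment:
        return "absolute"
    return "long"
```

With both assumptions left at NOTHING every such reference is long, and
dbr_bank($200000) gives $20 as in the example above.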

Normally, a programmer should use "#" and "=" to specify eight or
sixteen bit immediate operand sizes. To help ensure upwards compatibility
with existing source code, a mechanism has been added whereby the "#" is
used and the .ASSUME directive controls the size of immediate operands. This
task is achieved using the M:expr and X:expr operands. Normally the assembler
defaults to M:NOTHING and X:NOTHING. In this mode, "#" specifies 8-bit
immediate operands and "=" specifies 16-bit operands. If the expression
following the M or X is zero or one, then any immediate operand containing
an equal sign is flagged as an error and the "#" specifies an eight-bit
operand if the expression was 1, a sixteen-bit operand if the expression was
zero. If the expression evaluates to any other value an error is generated.
Note that M only affects accumulator and memory operations while X affects
the index register operations. It is perfectly permissible to have an
.ASSUME of the form:

.ASSUME M:NOTHING,X:1

The "=" immediate specifier would be allowed for accumulator operations but
not for X/Y index register operations.
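
The rules for immediate operand sizes can be summarized with a small Python
sketch. The function name immediate_size is hypothetical; the assume
argument is None for M:NOTHING or X:NOTHING and otherwise holds the value of
the expression given after M: or X:.

```python
def immediate_size(specifier, assume=None):
    """Return the immediate operand size in bits, or raise on an error."""
    if specifier not in ("#", "="):
        raise ValueError("immediate specifier must be '#' or '='")
    if assume is None:               # M:NOTHING or X:NOTHING
        return 8 if specifier == "#" else 16
    if assume not in (0, 1):
        raise ValueError("M/X expression must evaluate to zero or one")
    if specifier == "=":
        raise ValueError("'=' is an error when a size is assumed")
    return 8 if assume == 1 else 16
```

So after ".ASSUME M:NOTHING,X:1" an accumulator operand may use either
specifier (8 bits for "#", 16 bits for "="), while an index register operand
must use "#" and is always 8 bits.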

To help ensure compatibility with the existing de facto standard,
LONGI, LONGA, SHORTI, and SHORTA should be provided as built-in macros
generating the appropriate .ASSUME statement.

The "CPU:cpu_type" operand to the .ASSUME statement lets users
specify the exact 6500 family CPU they are using. The effect of this
operand is to "disconnect" certain instructions. If a certain CPU is
specified and a programmer uses an addressing mode or instruction which isn't
available on that CPU, the assembler will generate an error. By default,
the assembler should assume the CPU of the machine on which the assembler
is intended to run (e.g., 65c816 for Apple //GS machines). If the assembler
is running on a processor other than a 6500 family chip, it should
default to 65c816. The user should be able to choose this default value
from an assembler set-up program.

The .ASSUME directive, and all operands available to it, must be
supported at all compliance levels.

Local Symbols
-------------

In addition to local labels automatically specified inside procedures,
macros, and expression functions, you can also explicitly declare local sym-
bols within the source file. User-defined local symbols come in two
varieties: numeric and symbolic.

Up to 10 active numeric local labels can be specified at any given
time. The numeric local labels are similar to those used by D. E. Knuth
in "The Art of Computer Programming, Vol 1", although the syntax is different.
Numeric local labels are declared by placing a caret (up-arrow) in front of
a single decimal digit in the label field. Examples follow:

^0 LDX #05
^9 DEX
^4 LDA LBL

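Numeric local labels are referenced with the ">n" and "<n" operand forms,
where n is the digit of the label: ">n" refers to the next occurrence of
"^n" in the source and "<n" to the previous one. For example:

... >0 ;References ^0 below.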
TXA
^0 STA Array+1

Note that multiple occurrences of the same numeric local label may appear
within the program. They are differentiated by the "<" and ">" symbols.

Since "<" and ">" may appear both as operators and as the beginning
of an operand, a minor ambiguity results. If you see a portion of an ex-
pression like ">0", does it mean 'is some value greater than zero' or does
it refer to the next occurrence of "^0"? This is easily handled from context.
If the ">" or "<" appears where an operator is expected, then the appropriate
operation is performed. If they appear where an operand is expected and they
are followed by a single decimal digit, then they are used as lead-ins for
numeric local labels. Otherwise an error must be generated.
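
Resolving a numeric local label reference can be sketched in Python. The
text above establishes that ">n" names the next occurrence of "^n"; this
sketch additionally assumes that "<n" names the nearest previous occurrence
and that a label on the referencing line itself is never selected. The
function name resolve_local and the (line, digit) representation are
hypothetical.

```python
def resolve_local(reference, at_line, labels):
    """Return the line number a '>n' or '<n' reference resolves to.

    labels: list of (line, digit) pairs, one per "^n" in the source.
    Returns None when no matching label exists in that direction.
    """
    direction, digit = reference[0], int(reference[1])
    if direction == ">":
        later = [line for line, d in labels if d == digit and line > at_line]
        return min(later) if later else None
    if direction == "<":
        earlier = [line for line, d in labels if d == digit and line < at_line]
        return max(earlier) if earlier else None
    raise ValueError("reference must begin with '<' or '>'")


labels = [(1, 0), (5, 0), (9, 0)]    # three occurrences of ^0
```

From line 3, ">0" resolves to the label on line 5 and "<0" to the one on
line 1; a reference with no label in the requested direction would be an
error in a real assembler.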

Numeric local labels are great for those cases where you need to
perform a short branch or to set up a small loop and you don't want to use
meaningless names like "loop1", "SkipInstr12", etc. Other times, you
may want to use a meaningful name like "MainLoop" or "ElseQuit", without
having to worry about conflicts in other parts of the program. Such cases
are easily handled by the symbolic local label facility specified by this
proposal. Two assembler directives, .LOCAL and .RELEASE, are used to define
the scope of user-specified local labels. The syntax for these two directives
is identical:

.LOCAL label1 {,label2, ..., labeln}
.RELEASE label1 {,label2, ..., labeln}

A label defined with .LOCAL is confined to the scope of the .LOCAL/.RELEASE
pair. .LOCAL/.RELEASE pairs may be nested allowing you to redefine a symbol
to any reasonable depth (say, a minimum of 8 levels).

Numeric local labels are required at all compliance levels. Symbolic
local labels are required at the full and extended compliance levels.

Conditional Assembly
--------------------

Conditional assembly is handled by the .IF, .ELSE, .IF1, .IF2,
.IFDEF, .IFNDEF, and .ENDIF directives. .IF is followed by a numeric
address expression that yields a zero (false) or non-zero (true) result.
The following code (up to the .ELSE or .ENDIF) is assembled if the result is
true. Otherwise the code after the .ELSE (if it is present) is assembled in
its stead. .IF1 and .IF2 assemble their respective code during passes
one and two. .IFDEF and .IFNDEF accept a single symbol as their parameter
and test whether or not this symbol is currently defined. The .ELSE
directive can be used to assemble additional code in the event the tested
condition is false. Finally, the .ENDIF directive is used to terminate
a conditional assembly sequence.

Conditional assembly blocks can be nested to at least eight levels,
preferably more. Since all conditional assembly blocks are terminated with
.ENDIF, there is no need to worry about matching .ELSEs as you would, say,
in Pascal. Every form of the IF statement is terminated with its own .ENDIF.
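
The nesting rule can be illustrated with a short Python sketch that keeps a
stack with one entry per open .IF. It is a simplification under stated
assumptions: select_lines is a hypothetical name, only .IF, .ELSE, and .ENDIF
are handled, the directive must be the first field on its line, and the
value of each .IF expression is supplied already evaluated.

```python
# Hypothetical filter: return the source lines that survive conditional
# assembly. `values` maps the operand text of each .IF to its integer value.

def select_lines(lines, values):
    out = []
    stack = []       # one (parent_active, condition_true) pair per open .IF
    active = True    # are we currently inside a block being assembled?
    for line in lines:
        parts = line.split(None, 1)
        op = parts[0].upper() if parts else ""
        if op == ".IF":
            cond = values[parts[1].strip()] != 0
            stack.append((active, cond))
            active = active and cond
        elif op == ".ELSE":
            if not stack:
                raise ValueError(".ELSE without .IF")
            parent, cond = stack[-1]
            active = parent and not cond
        elif op == ".ENDIF":
            if not stack:
                raise ValueError(".ENDIF without .IF")
            active = stack.pop()[0]   # restore the enclosing block's state
        elif active:
            out.append(line)
    if stack:
        raise ValueError("missing .ENDIF")
    return out


source = [".IF Debug", " LDA #1",
          ".IF Verbose", " LDA #2", ".ELSE", " LDA #3", ".ENDIF",
          ".ELSE", " LDA #4", ".ENDIF", " RTS"]
print(select_lines(source, {"Debug": 1, "Verbose": 0}))
# [' LDA #1', ' LDA #3', ' RTS']
```

Because each .ENDIF pops exactly one entry, an .ELSE always belongs to the
innermost open .IF, which is why no Pascal-style matching is needed.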

The .IF1 and .IF2 directives are normally used to print messages and
perform other minor housekeeping chores. In general, there's absolutely no
reason why anyone would want to generate code inside one of these conditional
assembly blocks. Therefore, the assembler may optionally generate an error
message if the location counter is modified anywhere inside the .IF1/.IF2
conditional assembly block.

.IF, .ELSE, and .ENDIF are required at all compliance levels. .IF1,
.IF2, .IFDEF, and .IFNDEF are required only at the full and extended com-
pliance levels.

While Loops
-----------

Sometimes, especially within macros, you will need some sort of
looping structure to process parameters or otherwise generate sequences of
code; the .WHILE/.ENDW directives are used for this purpose. The syntax
for the while section is:

.WHILE expression

.ENDW

The instructions in the loop body are repeated as long as the expression
yields a non-zero value. For the loop to terminate, the variable(s)
controlling the loop must be defined using the "=" assembler directive
since this is the only directive that allows you to redefine an instance
of a variable.

The .WHILE directive is especially useful for processing a macro
(or record definition) with a variable number of parameters. Consider the
following macro:

ByteTable .MACRO
ParmCnt = ?#
.WHILE ParmCnt
.BYTE ?:(?#-ParmCnt)
ParmCnt = ParmCnt-1
.ENDW
.ENDM

_ByteTable 0,5,4,2,7

This example emits the five bytes 0, 5, 4, 2, and 7 into the object code
stream.
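
To see why the loop visits the parameters in order, the expansion can be
traced in Python. This is only a model of the macro above, assuming that ?#
is the number of actual parameters and that ?:(n) selects parameter n
counting from zero; the function name byte_table is hypothetical.

```python
# Model of the ByteTable macro expansion; each line mirrors one macro line.

def byte_table(*params):
    emitted = []
    parm_cnt = len(params)                         # ParmCnt = ?#
    while parm_cnt:                                # .WHILE ParmCnt
        index = len(params) - parm_cnt             # ?#-ParmCnt
        emitted.append(params[index])              # .BYTE ?:(?#-ParmCnt)
        parm_cnt = parm_cnt - 1                    # ParmCnt = ParmCnt-1
    return emitted                                 # .ENDW


print(byte_table(0, 5, 4, 2, 7))  # [0, 5, 4, 2, 7]
```

The index runs from 0 up to ?#-1 while ParmCnt counts down to zero, at which
point the .WHILE expression becomes false and the loop ends.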

INCLUDE Mechanism
-----------------

A source file include mechanism is provided by the .INCLUDE directive.
Its syntax is

.INCLUDE "filename"

The specified file will be inserted at the point of the .INCLUDE directive
in the current assembly, as though the code were actually inserted at that
point.

The include mechanism must be capable of nested includes up to four
levels deep. The .INCLUDE directive must be supported at all compliance
levels of the assembler, although assemblers operating at the subset
compliance level need not support nested include files.
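
A minimal Python sketch of include expansion with a nesting limit follows.
The names expand_includes and MAX_INCLUDE_DEPTH are hypothetical, the files
are held in a dictionary rather than read from disk, and the sketch assumes
that "four levels deep" means a chain of four nested .INCLUDE directives
below the main source file.

```python
# Hypothetical include expander. `files` maps a file name to its lines.

MAX_INCLUDE_DEPTH = 4   # assumption: four nested .INCLUDE levels are allowed


def expand_includes(name, files, depth=0):
    out = []
    for line in files[name]:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].upper() == ".INCLUDE":
            if depth >= MAX_INCLUDE_DEPTH:
                raise ValueError("includes nested too deeply in " + name)
            target = parts[1].strip().strip('"')
            # Splice the included file in at the point of the directive.
            out.extend(expand_includes(target, files, depth + 1))
        else:
            out.append(line)
    return out


files = {
    "main.asm": [" LDA #Zero", '.INCLUDE "defs.asm"', " RTS"],
    "defs.asm": ["Zero = 0"],
}
print(expand_includes("main.asm", files))
# [' LDA #Zero', 'Zero = 0', ' RTS']
```

An assembler at the subset level could set the limit to a single level and
still accept the directive itself.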

Programs, Modules, and Units
----------------------------

The assembler handles three types of source files: programs, modules,
and units. Unless otherwise specified, all source files are assumed to be
programs. A program is differentiated from a module or unit in that the
assembler/linker assumes that control is transferred to some point in a
'program' when it is loaded into memory. Modules and units are assumed to
be subservient sections of code that contain data and/or code used by
programs.

By default, a piece of code is assumed to be a program and control
is transferred to the first byte of that code when the program is loaded
into memory. This helps improve compatibility with existing source files.
The .PROGRAM directive can be used to explicitly declare a piece of code as
a main program, as well as provide an entry address other than the first byte
of code emitted. The syntax for the .PROGRAM directive is

.PROGRAM label

where "label" is a program statement label somewhere within the current
assembly. The address of this label is passed on to the linker/loader where
it will be used to provide a starting address for the code. All of the
statements in the source file will be assembled into the program from the
.PROGRAM directive till the .END directive. If a .PROGRAM directive appears
in the source file, it must appear before any other statement (other than a
comment or listing directive) and there may only be one .PROGRAM directive
encountered per assembly. No modules or units may appear as part of a
program assembly (see below).

The .MODULE directive is used to tell the assembler that it is
assembling an object code module which is to be linked into a separate
program before execution. The .PUBLIC statement is used as the means
to communicate linkage information to other modules, units, and programs.
Like the .PROGRAM directive, the .MODULE directive must appear before most
statements in the source file and the module is terminated with the .END
directive. However, another module may appear in the source file immediately
after the .END directive. Such modules are assembled as independent entries
in a library. The syntax for the .MODULE directive is:

.MODULE ModuleName

The module name operand is stored as part of the object file for use by the
linker, but is not otherwise referenced during the assembly process. In fact,
this symbol may be redefined later in the source file.

The .LINK directive can be used to link a module into another module,
unit, or program at assembly time. The syntax for this directive is:

.LINK "filename",ModuleName

where filename is the operating system name of the object code file or
library file containing the module, and ModuleName is the actual module
name specified with the .MODULE directive. The specified object code is
inserted into the assembly at the point of the .LINK directive. Access to
the symbols declared public within the module is accomplished using the
.EXTERNAL directive.

Units are a much more structured form of modules. With a unit,
you specify not only the symbols visible to the code using the unit, but
also how that data is used. Units also allow you to pass type checking
information so the assembler can check for possible logical errors during
assembly. Finally, as an added bonus, within units you can link in macros,
records, types, symbols defined by "=", and other entities that cannot be
handled by modules and the .PUBLIC/.EXTERNAL mechanism.

A unit takes the form:

.UNIT UnitName

.BEGIN

.END

Like .MODULEs, several units may appear in the source file by simply following
the .END directive with the next unit definition. In fact, .MODULEs and
.UNITs can be intermixed in the same source file. If more than one module or
unit appears in the source file, they will be assembled into different slots
in the object file generated (i.e., a library file will be generated).
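
The way one source file turns into several library slots can be sketched in
Python. This is an illustration only: split_library is a hypothetical name,
the sketch assumes .MODULE, .UNIT, and .END appear as the first field on
their lines, and it simply collects the lines of each entry instead of
assembling them.

```python
# Hypothetical splitter: one (kind, name, body_lines) tuple per library slot.

def split_library(lines):
    entries = []
    current = None          # the module or unit being collected, if any
    for line in lines:
        fields = line.split(";", 1)[0].split()   # drop any comment first
        op = fields[0].upper() if fields else ""
        if op in (".MODULE", ".UNIT"):
            if current is not None:
                raise ValueError("missing .END before " + line.strip())
            if len(fields) < 2:
                raise ValueError("missing name after " + op)
            current = (op[1:], fields[1], [])
        elif op == ".END":
            if current is None:
                raise ValueError(".END without .MODULE or .UNIT")
            entries.append(current)               # close this slot
            current = None
        elif current is not None:
            current[2].append(line)
    if current is not None:
        raise ValueError("missing .END")
    return entries


source = [".MODULE Alpha", " LDA #0", ".END",
          ".UNIT Beta", ".BEGIN", " RTS", ".END"]
for kind, name, body in split_library(source):
    print(kind, name, len(body))
# MODULE Alpha 1
# UNIT Beta 2
```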

The interface section of a unit contains those items that will be
public to the unit. Equates, records, macros, types, sets, and any other
non-code generating declaration can be used in the interface section (note:
an exact list of items will be specified later). Such definitions will be
made available to the code that uses this unit as well as to the code in the
implementation section. In addition to such declarations, the interface
section may also contain .PROC definitions and .ENTRY definitions. The
.PROC definitions simply contain the .PROC statement (which must also appear
in the implementation section); the .ENTRY definition is used in lieu of the
.PUBLIC directive and takes the form:

label .ENTRY {NEAR or FAR}

An example of a simple unit might be:

.UNIT SimpleUnit
MyMac .macro
lda #0
sta ?0
.endm
;
ClrSub .proc near
SetTrue .proc far
SetIt .entry far
;
.BEGIN ;Start implementation section.
;
ClrSub .proc near
_MyMac $11
ret
.endp
;
SetTrue .proc far
lda #1
SetIt sta $23
ret
.endp
.end

To use the code defined in a unit, the ".USE" directive is used in
a fashion not unlike the .LINK directive, namely,

.USE "filename",UnitName

where filename is the operating system pathname and UnitName is the name
specified in the operand field of the unit directive. Whenever the .USE
directive appears in a source file, the content of the implementation section
will be listed if the source listing option is turned on.

Whenever the .USE or .LINK directives are employed, the corresponding
object code is always inserted into the assembly. Therefore the assembler
is performing double duty: it's acting as both the assembler and linker.
With units, the assembler always performs the link operation. With modules,
you can defer the link operation to a separate linkage step, although there
are only a few instances where this would be beneficial (for example, while
creating libraries).

All of the program linkage directives are optional at the subset
compliance level, but required at the full and extended levels.

Listing Controls
----------------

Several directives are used to control the appearance of the assembled
source listing. The exact format of the listing will be specified within
this proposal (although at a later date). The exact listing format must
be adhered to so that symbolic debuggers can take advantage of an assembled
source listing saved as a text file for use when stepping through a program.

.ON operands
.OFF operands

These two directives are used to turn certain listing options on
or off. Valid operands include LIST, OBJ, MAC, and COND. LIST controls
whether or not the source file is listed and supersedes all other options.
OBJ (if on) will force the assembler to display all bytes of object code
emitted by an instruction, even if it takes more than one line to display it
all; if off, OBJ will only display the number of emitted object code bytes
that fit on the current source line. MAC controls macro expansions during
the listing. If off, only the macro name, not the expansion, will be
displayed. COND controls the printing of statements in a false conditional or
while loop section.

The .TITLE and .SUBTITLE directives let you assign titles and sub-
titles to the source file. The syntax for these directives is

.TITLE "Title of source file"
.SUBTITLE "Subtitle for this section"

The title is displayed at the top of each page and the subtitle is displayed
immediately below the title. .TITLE always forces a page eject; .SUBTITLE
never does.

The .PAGE directive forces an immediate page ejection on the listing.
It requires no operands.

The .PRINTF directive has the syntax:

.PRINTF "Control string" {,operands}

It is used in a manner analogous to the PRINTF in the "C" programming
language. If expressions follow the control string, "%" modifiers in the
control string specify their output format. E.g.,

.PRINTF "Label = $%4h",Label

would print

Label = $1234

assuming the value associated with Label was $1234.
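
The behavior of the "%4h" modifier in this example can be modeled in
Python. The proposal shows only this one modifier, so the details here are
assumptions: the digit is taken as a minimum field width, shorter values are
padded on the left with zeros, and hex digits are printed in upper case.
The function name printf_hex is hypothetical.

```python
import re

# Hypothetical model of .PRINTF handling for "%<width>h" modifiers only.

def printf_hex(control, *operands):
    values = iter(operands)

    def expand(match):
        width = int(match.group(1) or 0)
        digits = format(next(values), "X")   # upper-case hex, no prefix
        return digits.rjust(width, "0")      # assumption: zero padding

    return re.sub(r"%(\d*)h", expand, control)


print(printf_hex("Label = $%4h", 0x1234))  # Label = $1234
```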

The .PAUSE directive can be used to force an assembly time error.
It is useful mainly in macros, records, expression functions, etc. to force
an error if an illegal condition (like bad number of parameters) occurs.

The listing control directives are required only at the full and
extended compliance levels.

Data Flow Analysis Directives
-----------------------------

The following directives are quite useful to add-on debuggers and
data flow analysis programs. They are required only at the full and extended
compliance levels:

label .table

.endt

For .table, the label is assigned the current value of the location counter
and label is treated like a statement label. .TABLE and .ENDT are otherwise
ignored.

label .REF label1 {, label2, ..., labeln}

This statement is ignored by the assembler. The statement label, if present,
is also ignored.

Other Optional Goodies
----------------------

The following are not required by this proposal, but should be
provided nonetheless:

.system "DOS command"

.SYSTEM issues the specified command to the operating system. This command
is useful for deleting files during assembly, changing directories, etc.

Operation of the Assembler
--------------------------

Given the structure of the assembler, there's no way it can accomplish
its job in less than three passes without placing severe burdens on the
user (I could provide you with a mathematical proof of this, but I don't want
to bore you to death). Therefore, the standard specifies that the assembler
must use three (or more) passes to do its job. During the first pass the
assembler associates labels with segments (and groups of segments), determines
whether or not those symbols are near or far, and performs other housekeeping
chores fit for pass one. Pass two of the assembler is equivalent to the
traditional pass one of an assembler: it computes the values for all of the
symbols in the program. Pass three generates the actual object code.
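
A toy Python model may help show why the near/far decision has to come
before symbol values can be computed. Everything in it is an assumption made
for illustration: the function name assemble, the (segment, label, op,
target) tuples, the "CALL" pseudo-operation, and the rule that a target is
near exactly when it lies in the caller's segment. The byte counts are those
of the 65c816 JSR absolute (3 bytes), JSL (4 bytes), and NOP (1 byte).

```python
# Toy three-pass model. A "CALL" becomes JSR (3 bytes) for a near target and
# JSL (4 bytes) for a far one; every other operation is treated as 1 byte.

def assemble(statements):
    # Pass 1: associate every label with its segment.
    segment_of = {label: seg for seg, label, op, target in statements if label}

    # Pass 2: compute symbol values. Instruction sizes depend on near/far,
    # which is known for forward references only because of pass 1.
    address_of, counters, sized = {}, {}, []
    for seg, label, op, target in statements:
        here = counters.get(seg, 0)
        if label:
            address_of[label] = here
        size = 1
        if op == "CALL":
            far = segment_of[target] != seg
            op, size = ("JSL", 4) if far else ("JSR", 3)
        sized.append((seg, here, op, target))
        counters[seg] = here + size

    # Pass 3: generate output now that every symbol has a value.
    return [(seg, addr, op, address_of.get(target))
            for seg, addr, op, target in sized]


program = [("CODE", "Start", "CALL", "Near"),   # forward reference, near
           ("CODE", None, "CALL", "Far"),       # forward reference, far
           ("CODE", "Near", "NOP", None),
           ("LIB", "Far", "NOP", None)]
for row in assemble(program):
    print(row)
# ('CODE', 0, 'JSR', 7)
# ('CODE', 3, 'JSL', 0)
# ('CODE', 7, 'NOP', None)
# ('LIB', 0, 'NOP', None)
```

The address of Near (7) depends on the sizes of the two calls before it, and
those sizes depend on segment information gathered in pass 1.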

In Addition to the Assembler...
-------------------------------

The standard should also include specifications for a run-time
library to be provided with the assembler as well as a list of tools
(e.g., debugger, linker, librarian, etc.) which must be provided with the
product to meet the full compliance level. I would like to propose the
following items in the run-time library:

TTY_IO: A set of routines to communicate with a text-based
user console. INIT, GETC, and PUTC are the basic routines. These three
routines are easily supported on any system supporting a user console.

TERMINAL_IO: A set of routines to communicate with a cursor-based
terminal device. Routines supported should include INIT, GETC, PUTC, GOTOXY,
HOME, CLREOLN, and CLREOP.

CONSOLE_IO: A set of routines to communicate with a DMA-based video
display device. See the specifications for ANIX's CHARIO driver for the
routines to be supplied with this library entry.

AUX_IO: A driver for a set of one or more serial communication ports.
Routines should include INITA, SETUPA, GETA, PUTA, and STATUSA.

PRT_IO: A driver for a set of one or more printer ports. Routines
should include INITP, SETUPP, PUTP, and STATUSP.

NET_IO: A driver for a set of one or more network ports. Routines
should include INITN, SETUPN, GETPacket, SendPacket, etc.

CLK_IO: A driver for a real time clock or clock-calendar unit.

FP: An IEEE floating point package for the 65c816 chip.

MATH: A set of integer math routines (multiply, divide, extended
precision, etc.).

CONV: A set of conversion routines (binary -> decimal, etc.).

FILE_IO: A set of routines that interface to the host's operating
system providing a common interface to various operating systems.

DVC_IO: A hardware independent device I/O package (allowing named
devices which can be connected through a BIOS (like the AUX_IO and PRT_IO
packages) to various hardware devices).

STD_IO: A set of routines to perform various I/O operations such
as PRINT, PRINTF, SCANF, PUTI (integer), GETI, PUTH (hex), GETH, etc.

MEM_MGR: A set of memory management routines to efficiently allocate
and deallocate memory.

This is, by no means, an exhaustive list, but a quick sample of the types of
routines that should be provided.

Apple //GS users may complain that many of these routines already
exist within the confines of the Apple toolbox. The intent, however, is to
provide a set of useful routines that can be utilized on ANY 65c816 system
so 65c816 code can be easily ported to systems other than the Apple //GS.