FORTRAN to Rust: part 2

2026-03-05

See part 1 of this series for an introduction to FORTRAN and a basic overview of the f2rust compiler.

This part dives into an assortment of interesting or weird or annoying parts of FORTRAN, and discusses how f2rust approaches them.

Memory allocation

FORTRAN is designed to work on systems with no dynamic memory allocation. Notably that doesn’t just mean no heap, it means no stack either. The address of every variable and every intermediate value can be statically computed at compile time.

As it can’t rely on a stack, recursive functions are forbidden. Nested invocations of the same function would store their local variables and intermediates in the same memory locations and interfere with each other.

Of course modern compilers – and I guess many compilers back in the 70s – do use a stack, but recursion is still a problem. FORTRAN 77 encourages programmers to allocate very large arrays as local variables (because it doesn’t really provide any alternative), but e.g. Linux has a default stack limit of 8 MiB, so the very large arrays can’t be stored on the stack.

Modern gfortran defaults to storing any array larger than 32 KiB in static memory instead. Recursion will misbehave, as the nested invocations will use the same static memory. Fortran 90 introduced the RECURSIVE keyword on function declarations: that causes gfortran to store all arrays on the stack regardless of size, so it’s safe to call the function recursively, but if the stack exceeds the 8 MiB limit then the process will crash. The programmer is responsible for only adding RECURSIVE to functions with modest stack usage. (Fortunately Fortran 90 also introduced heap allocation, which is a better way of handling large arrays.)

In f2rust we can’t use static memory (there’s no recursion but we want to be thread-safe), so we implement both stack-allocated and heap-allocated arrays. The compiler determines the size of each array (evaluating constant expressions used in the array bounds) and anything above an arbitrary limit of 256 elements is stored as a heap-allocated Vec. SPICE uses a lot of 3D vectors, which remain on the stack and avoid the cost of heap operations and pointer indirections.

SAVE variables

SAVE variables in FORTRAN are similar to static local variables in C: their value is preserved between invocations of the function.

Sometimes SPICE uses this to perform lazy initialisation:

      DOUBLE PRECISION FUNCTION PI ( )

      DOUBLE PRECISION   VALUE
      SAVE               VALUE

C     Initial value
      DATA               VALUE   / 0.D0 /

      IF ( VALUE .EQ. 0.D0 ) THEN
         VALUE = ACOS ( -1.D0 )
      END IF

      PI = VALUE

      RETURN
      END

On the first call it will see the initial value (0.0) provided by the DATA statement, and compute the new VALUE. Because of the SAVE declaration, subsequent calls will see the value previously written to VALUE and can avoid the costly ACOS call.

(You might ask why SPICE implements it this way instead of simply hardcoding the value of π. Good question.)

A related feature is that functions and subroutines can declare multiple entry points. This can be used for some crazy control flow (it’s basically GO TO into the middle of a function), but fortunately SPICE uses it in a simpler way: every ENTRY immediately follows a RETURN, so control flow never crosses over an entry point. Each entry is almost like a totally separate function, except when they use SAVE variables:

      LOGICAL FUNCTION SETERR ( STATUS )

      LOGICAL            STATUS
      LOGICAL            SVSTAT
      SAVE               SVSTAT

C     Initial value
      DATA               SVSTAT   / .FALSE. /

C     Store the argument
      SVSTAT = STATUS
C     Set the function's return value
      SETERR = .TRUE.

      RETURN

C     Alternate entry point for SETERR
      ENTRY FAILED()
C     Return SVSTAT
      FAILED = SVSTAT
      END

You can call SETERR(.TRUE.) to set the flag, then IF (FAILED()) ... to test it. Because FAILED is an entry point to the same function, it has access to the same SAVE variables, and this can be used to share global state between a small set of operations.

(One limitation is that every entry must have the same return type. This is why SETERR returns a useless .TRUE. value: FAILED wants to return a LOGICAL (boolean), so SETERR must be declared as returning a LOGICAL too.)

This kind of global state is the main reason SPICE is not thread-safe. Rust does not like global state because it expects all programs to be thread-safe. To handle this in f2rust we put all the SAVE variables into a Context object that is passed to every function that needs it:

struct SaveVars {
    SVSTAT: bool,
}

impl SaveInit for SaveVars {
    fn new() -> Self {
        let SVSTAT = false;

        Self { SVSTAT }
    }
}

pub fn SETERR(STATUS: bool, ctx: &mut Context) -> bool {
    let save = ctx.get_vars::<SaveVars>(); // Rc<RefCell<SaveVars>>
    let save = &mut *save.borrow_mut();    // &mut SaveVars

    save.SVSTAT = STATUS;
    return true;
}

pub fn FAILED(ctx: &mut Context) -> bool {
    let save = ctx.get_vars::<SaveVars>();
    let save = &mut *save.borrow_mut();

    return save.SVSTAT;
}

where Context is like:

pub struct Context {
    data: HashMap<TypeId, Rc<dyn Any>>,
}

pub trait SaveInit {
    fn new() -> Self;
}

impl Context {
    pub fn get_vars<T: 'static + SaveInit>(&mut self) -> Rc<RefCell<T>> {
        let obj = self.data.entry(TypeId::of::<T>())
            .or_insert_with(|| Rc::new(RefCell::new(T::new())));

        Rc::downcast::<RefCell<T>>(Rc::clone(obj)).unwrap()
    }
}

This is slightly tricky code. Firstly, TypeId::of::<T>() will give a unique value for every distinct type. Since SETERR and FAILED are using the same SaveVars struct, they will compute the same TypeId and access the same item in the HashMap.

Secondly we have to wrap the type in RefCell, which provides dynamic borrow checking. We know that FORTRAN doesn’t allow recursion, so we won’t try to borrow SaveVars while it’s already borrowed by the same function higher up the call stack – but the Rust compiler doesn’t know that. And even though we can verify SPICE does no recursion itself, it provides APIs with user-defined callbacks, so a user might accidentally trigger recursion.

That means static borrow checking is insufficient, and we use RefCell::borrow_mut to safely panic in the unlikely case of recursion.
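To see what RefCell is protecting against, here is a minimal standalone sketch of a callback re-entering while the SAVE variables are still borrowed. It is not f2rust's generated code: `SaveVars`, `with_callback` and `reentrant_borrow_succeeds` are made-up names, and it uses `try_borrow_mut` so that the conflict shows up as an `Err` instead of a panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct SaveVars {
    count: i32,
}

/// Borrows the SAVE variables, then invokes a user callback while the
/// borrow is still live (as a translated function body would).
fn with_callback(vars: &Rc<RefCell<SaveVars>>, callback: impl FnOnce() -> bool) -> bool {
    let mut save = vars.borrow_mut();
    save.count += 1;
    callback()
}

/// Returns true if a nested mutable borrow succeeded during the callback.
fn reentrant_borrow_succeeds() -> bool {
    let vars = Rc::new(RefCell::new(SaveVars { count: 0 }));
    let inner = Rc::clone(&vars);
    with_callback(&vars, move || inner.try_borrow_mut().is_ok())
}
```

Here `reentrant_borrow_succeeds()` returns false: the nested borrow is refused at run-time. With `borrow_mut` in the callback instead, the same situation would panic.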

Thirdly we have to wrap the type in Rc. We never remove or overwrite items in the HashMap, so ideally we could take a reference to an object owned by Context – but the Rust compiler doesn’t know that. Maybe we could do something clever with unsafe and lifetimes, but Rc is a much simpler solution that proves the SaveVars won’t be freed prematurely.

Finally we store the RefCell<SaveVars> as dyn Any. This lets us combine all the distinct SaveVars structs from different source files. Rc::downcast converts Rc<dyn Any> back to Rc<RefCell<SaveVars>>.

This isn’t the most efficient possible implementation, but it’s not too bad, and it’s safe and relatively straightforward.

Unfortunately FAILED is by far the most frequently-called function in SPICE (it’s used by basically every non-leaf function as a crude form of stack unwinding) and it stood out in some profiler reports, so I added a hack: f2rust provides intrinsic functions that read/write a special bool flag in Context, and SETERR/FAILED are rewritten to call those intrinsics instead of using SAVE. But everything else still uses the HashMap properly.

GO TO

FORTRAN has many versions of GO TO. Happily SPICE doesn’t use any of them, so we don’t need to support them. But we can have a peek out of curiosity.

There’s the traditional form of GO TO, with a simple numeric statement label:

      PRINT *, 'Hello'
      GO TO 100
      PRINT *, '(unreachable)'
100   PRINT *, 'world'

There’s computed GO TO where an integer expression gives the index into a list of statement labels, like a jump table:

      INTEGER I

      GO TO (200, 201) MOD(I, 2) + 1

200   PRINT *, 'even'
      GO TO 299
201   PRINT *, 'odd'
299   CONTINUE

There’s assigned GO TO where the ASSIGN statement stores a statement label into an integer variable, and you can subsequently jump via that variable:

      INTEGER J

      IF (MOD(I, 2) .EQ. 0) THEN
        ASSIGN 300 TO J
      ELSE
        ASSIGN 301 TO J
      END IF

      GO TO J

300   PRINT *, 'even'
      GO TO 399
301   PRINT *, 'odd'
399   CONTINUE

(The compiler might implement ASSIGN by storing the memory address of the target; it’s probably not going to store the numeric value of the label. That’s why you can’t simply say J = 300 and can’t pass an integer expression to ASSIGN.)

Then it gets weird with alternate returns:

      SUBROUTINE PARITY(I, *, *)
        INTEGER I
        RETURN MOD(I, 2) + 1
      END SUBROUTINE

      CALL PARITY(I, *400, *401)
      PRINT *, '(unreachable)'
400   PRINT *, 'even'
      GO TO 499
401   PRINT *, 'odd'
499   CONTINUE

Each * in PARITY’s argument list represents a statement label. The RETURN e statement, where e is an integer expression, causes execution to jump to the statement label in the e’th asterisk argument. (Unless e is less than 1 or greater than the number of asterisks, in which case it behaves like a normal RETURN and execution jumps to the statement immediately following the CALL.)

Alternate returns could be used for error handling – instead of having the subroutine set an error code that the caller has to remember to check after the call returns, make the caller provide the labels for its error handling routines and the subroutine can jump there directly.

Fortran 90 called alternate returns “obsolescent” (meaning they’re still supported but strongly discouraged) and said you should use error codes instead. Assigned GO TO was marked obsolescent in Fortran 90, and fully deleted in Fortran 95. Fortran 95 also obsolesced computed GO TO. Only the good old basic GO TO survives untouched.

Translating any of these into Rust would be a bit awkward. Probably we’d have to split the function body into a big match statement inside a loop, with an arm for each statement label, so we can jump between arms in any order. A bit like how async causes the Rust compiler to turn a function into a state machine.

Fortunately FORTRAN bans jumping into an IF or DO block. Unfortunately you can still jump around within a block or from inside to outside, so the IF and DO control flow would also need to be flattened and reimplemented with the match state machine. That would ruin our attempt to have readable Rust code that closely mirrors the original FORTRAN.
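As a rough illustration of that state machine idea (f2rust does not do this, and the `Label` enum and `parity` function are invented for the sketch), the computed GO TO example from earlier could be flattened into a loop around a match, with one arm per statement label:

```rust
/// Statement labels from the computed GO TO example, plus an entry state.
#[derive(Clone, Copy)]
enum Label {
    Start,
    L200,
    L201,
    L299,
}

/// Returns the lines that the FORTRAN fragment would print for a given I.
fn parity(i: i32) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut next = Label::Start;
    loop {
        next = match next {
            // GO TO (200, 201) MOD(I, 2) + 1
            Label::Start => match i % 2 + 1 {
                1 => Label::L200,
                2 => Label::L201,
                // Index out of range: continue with the next statement,
                // which happens to be label 200.
                _ => Label::L200,
            },
            Label::L200 => {
                out.push("even");
                Label::L299 // GO TO 299
            }
            Label::L201 => {
                out.push("odd");
                Label::L299 // execution falls through to 299
            }
            // CONTINUE, then the end of the fragment
            Label::L299 => return out,
        };
    }
}
```

Calling `parity(4)` yields `["even"]` and `parity(3)` yields `["odd"]`. Even in this tiny case the original top-to-bottom structure is gone, which is the readability cost described above.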

Multidimensional arrays

We previously mentioned one-dimensional arrays. Multidimensional arrays are supported too, up to a maximum of 7 dimensions. (Why 7? I guess they just thought that was plenty.)

C     A 3x3 matrix
      DOUBLE PRECISION M(3, 3)
      M(1, 1) = 1.0
      M(2, 1) = 0.0
      M(3, 1) = 0.0
      ...

The array is stored in memory with M(I, J) immediately followed by M(I+1, J) (column-major order). Notably this is the opposite way to C and Rust, where m[j][i] is followed in memory by m[j][i+1] (row-major).

SPICE’s C wrapper tries to hide this difference by transposing every input/output matrix, so C users can copy FORTRAN examples without having to remember to swap all the indexes around. In rsspice we don’t transpose, and instead encourage users to use a higher-level type like nalgebra::Matrix3 which wraps column-major storage and provides column-major indexing (i.e. m[(i, j)] is followed in memory by m[(i+1, j)], mirroring FORTRAN).
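The difference between the two layouts comes down to how an index pair maps to a flat memory offset. A small sketch, with hypothetical helper names, assuming 1-based FORTRAN indexes and 0-based Rust indexes:

```rust
/// Flat offset of the 1-based FORTRAN element M(i, j) in an array with
/// `rows` rows, stored in column-major order.
fn col_major_offset(i: usize, j: usize, rows: usize) -> usize {
    (i - 1) + (j - 1) * rows
}

/// Flat offset of the 0-based C/Rust element m[r][c] in an array with
/// `cols` columns, stored in row-major order.
fn row_major_offset(r: usize, c: usize, cols: usize) -> usize {
    r * cols + c
}
```

For a 3x3 matrix, `col_major_offset(2, 1, 3)` is 1 (M(2, 1) sits directly after M(1, 1)), while `row_major_offset(0, 1, 3)` is 1 (m[0][1] sits directly after m[0][0]).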

Aliasing

FORTRAN has a simple aliasing rule: if a function is called with two dummy arguments pointing at the same memory, you must not assign a value to either of them.

This means that in part 1’s VADDG(V1, V2, NDIM, VOUT) example, the compiler can assume that writing to VOUT will not alter any elements of V1 as they must be non-overlapping arrays. It can e.g. safely use SIMD instructions to read multiple elements of V1 before writing them all to VOUT at once.

This is stricter than the aliasing rules in C/C++, where arguments may point at the same memory and be written to if they are declared as the same type (or if at least one of them is char). And this is one reason why scientific code often has better performance in FORTRAN: a C compiler has to assume that VOUT may overlap V1, so it cannot safely use SIMD instructions (or at least it has to be clever enough to insert a test for overlaps and branch to a non-SIMD fallback just in case).

Rust’s rules are even stricter. It doesn’t matter whether you actually assign a value at run-time – the mere existence of two references at the same time, where at least one of them is declared mut, is forbidden.

This is nice because it saves you from accidentally violating the aliasing rules at run-time, which in FORTRAN/C would result in undefined behaviour. But it’s a pain when translating FORTRAN to Rust.

For example FORTRAN allows you to use non-overlapping regions of the same array:

      DOUBLE PRECISION    STATE ( 9 )

      CALL VADDG ( STATE(1), STATE(4), 3, STATE(7) )

The equivalent in Rust generates a compile error because each slice borrows the entire array:

error[E0502]: cannot borrow `state` as mutable because it is also borrowed as immutable
   |
13 |   VADDG(&state[0..3], &state[3..6], 3, &mut state[6..9]);
   |   -----  ----- immutable borrow occurs here ^^^^^ mutable borrow occurs here
   |   |
   |   immutable borrow later used by call
   |
   = help: use `.split_at_mut(position)` to obtain two mutable non-overlapping sub-slices

The suggested split_at_mut splits a slice into two, but in this case we want to split into three, and in general it would be pretty tricky to generate the right code for any combination of ranges. Luckily Rust stabilised get_disjoint_mut just as I was writing this code, which allows a much cleaner solution:

let mut state = [0.0; 9];

let [v1, v2, vout] = state
    .get_disjoint_mut([0..3, 3..6, 6..9])
    .expect("ranges must not overlap");

VADDG(v1, v2, 3, vout);

Overlaps which would cause an aliasing violation in FORTRAN will instead cause a run-time panic in Rust. That’s not great, but it’s much better than undefined behaviour, and hopefully any bugs will be picked up by the test suite.

Unfortunately, some overlaps which wouldn’t cause an aliasing violation in FORTRAN will also panic. For example it’s okay if V1 and V2 overlap (as neither is written to), but get_disjoint_mut doesn’t distinguish mut and non-mut ranges, so it’s stricter than necessary.

Even more unfortunately, SPICE frequently and shamelessly violates FORTRAN’s aliasing rule. It’s common for array transformation functions to have IN and OUT arguments, with documentation saying you can call it with OUT = IN for an in-place transformation. It appears to be careful not to read an array element after writing to the same element via an aliased argument, so it will probably avoid triggering any misbehaviour in practice, but it’s still quite naughty.

To solve this, f2rust implements a safe but inefficient solution: whenever an array symbol is used twice in a function call, and at least one of them is a mut argument, all the non-mut arguments are cloned into new arrays. If more than one is mut, then all the mut arguments are created with get_disjoint_mut.

The earlier example would become something like:

let mut state = [0.0; 9];

VADDG(
    &state[0..3].to_vec(),
    &state[3..6].to_vec(),
    3,
    &mut state[6..9]
);

where slice::to_vec() copies the slice into a heap-allocated Vec. This works fine even if the ranges are not disjoint. As long as the function doesn’t attempt read-after-write of an aliased element, it shouldn’t notice that it’s reading from an old clone of the array; and even if it does, at least the behaviour is well-defined.
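A minimal sketch of the fully aliased in-place case, where OUT = IN. The `vaddg` below is a hypothetical stand-in for the translated routine, not the real generated code:

```rust
/// Hypothetical stand-in for the translated VADDG: vout = v1 + v2.
fn vaddg(v1: &[f64], v2: &[f64], ndim: usize, vout: &mut [f64]) {
    for i in 0..ndim {
        vout[i] = v1[i] + v2[i];
    }
}

/// Doubles the first three elements in place, i.e. all three array
/// arguments refer to the same region of `state`.
fn double_in_place(state: &mut [f64; 9]) {
    vaddg(
        &state[0..3].to_vec(), // clone: no borrow of `state` survives
        &state[0..3].to_vec(),
        3,
        &mut state[0..3],
    );
}
```

The two inputs are temporary copies, so the only live borrow of `state` during the call is the final `&mut`, and the borrow checker is satisfied even though all three ranges are identical.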

As yet another complication, this example only works because the mut is the final argument. Rust evaluates arguments left-to-right and it won’t allow accesses to &state after borrowing the &mut state. Fortunately SPICE’s convention is to always put the output arguments last, so we don’t need to worry about this.

This is the area of f2rust that is most tightly coupled to SPICE: we don’t have a good general solution for mapping FORTRAN’s aliasing onto Rust, so instead there’s a series of rules for the specific cases that occur in SPICE. Other FORTRAN codebases may require a different series of rules.

EQUIVALENCE

To give more aliasing headaches, FORTRAN’s EQUIVALENCE statement declares that multiple variables are stored in the same memory location, like an inside-out version of C’s union type. For example:

      INTEGER               BEGIN
      INTEGER               END
      INTEGER               PTR ( 2 )

      EQUIVALENCE         ( BEGIN, PTR(1) )
      EQUIVALENCE         ( END,   PTR(2) )

says BEGIN has the same storage as PTR(1), and END the same as PTR(2). The standard also requires that PTR(1) and PTR(2) are stored consecutively, so it’s BEGIN and END that have to be moved around to make this work. Effectively it’s giving friendly names for the given array elements. Unlike aliased dummy arguments, the program is allowed to read and write the same memory through all the equivalenced names (if they’re of the same primitive type).

But Rust wouldn’t let us have both BEGIN: &mut i32 and PTR: &mut [i32] in scope simultaneously, pointing at the same memory, because &mut is an exclusive reference. We implement this ‘friendly name’ case by recognising the pattern and expanding it during AST processing: every use of the expression BEGIN is simply replaced with PTR(1) and we never have to worry about it again.

In general the EQUIVALENCE variables may be of different types, but it’s not permitted to write as one type and read as another.

As it happens, that “not permitted” case is relied on by SPICE. Some data files pack DOUBLE PRECISION and INTEGER values together, so it reads from disk into a DOUBLE PRECISION array then declares an EQUIVALENCE with an INTEGER array, letting it copy the INTEGER values out.

For this case, we pick one of the types (the one with the strictest alignment requirement in Rust, i.e. the DOUBLE PRECISION) as the primary representation. The array is allocated with that type and can be accessed via that type as normal. When accessed via the other type (the INTEGER) we temporarily cast it to the other type with bytemuck::must_cast_slice[_mut], which is like a compile-time-safety-checked version of mem::transmute. As long as the code doesn’t try to access both types in the same expression (which would annoy Rust’s borrow checker), this temporary cast works okay.
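To illustrate what reading INTEGER values out of a DOUBLE PRECISION array means at the byte level, here is a std-only sketch. It deliberately avoids bytemuck and copies bytes instead of casting the slice, so it is not what f2rust does; `pack` and `int_view` are hypothetical helpers.

```rust
/// Packs two integers into one f64-sized slot, as a data file might.
fn pack(a: i32, b: i32) -> f64 {
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&a.to_ne_bytes());
    bytes[4..].copy_from_slice(&b.to_ne_bytes());
    f64::from_ne_bytes(bytes)
}

/// Reads the `index`th 32-bit integer (0-based) from the raw bytes of an
/// f64 array, as an EQUIVALENCE'd INTEGER array would see it.
fn int_view(doubles: &[f64], index: usize) -> i32 {
    let word = doubles[index / 2].to_ne_bytes();
    let half = (index % 2) * 4;
    i32::from_ne_bytes([word[half], word[half + 1], word[half + 2], word[half + 3]])
}
```

With `let d = [pack(7, 42), 1.5];`, `int_view(&d, 0)` is 7 and `int_view(&d, 1)` is 42. The slice cast gives the same view without the copying.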

Strings

FORTRAN 66 (the predecessor to 77) had a pretty crazy approach to strings. This isn’t relevant to f2rust, but I find it interesting so let’s discuss it anyway.

The “Hollerith type” represents strings, but it doesn’t exist in the language syntax; you cannot declare a variable as a Hollerith type. Instead you declare it as INTEGER, or as an array of integers, where each value represents an implementation-dependent number of characters.

The IBM 704, where FORTRAN was originally developed, had 36-bit words and 6-bit characters, so each INTEGER could contain 6 characters. (Incidentally this is probably why FORTRAN symbols have a maximum length of 6 characters: the original compiler could easily store each symbol in a single word.)

The syntax does support Hollerith constants, which are simply an alternative way of writing integer constants, expressed as a length n and an H followed by n characters:

      INTEGER W(2)

C     Initialise array with two Hollerith constants
      DATA W / 4Habcd, 2Hef /

C     Print with hex formatting
      PRINT '("0x", Z8)', W

C     Little-endian 32-bit ASCII output:
C       0x64636261
C       0x20206665

(The shorter constant here is padded with 0x20, which is the ASCII space character.)
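The same packing can be reproduced in Rust. This is a sketch assuming 8-bit ASCII characters and 32-bit little-endian words, matching the output above; `hollerith` is a made-up helper name.

```rust
/// Packs up to four ASCII characters into a 32-bit word, padding with
/// blanks, the way the example above stores a Hollerith constant.
fn hollerith(text: &str) -> i32 {
    let mut bytes = [b' '; 4];
    for (slot, ch) in bytes.iter_mut().zip(text.bytes()) {
        *slot = ch;
    }
    i32::from_le_bytes(bytes)
}
```

`hollerith("abcd")` gives 0x64636261 and `hollerith("ef")` gives 0x20206665, the two values printed by the FORTRAN program.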

To manipulate strings, you have to do it all yourself with integer arithmetic, written specifically for your machine’s word length and character length. And the language didn’t even have bit-shift operations, which must have made it particularly awkward.

In some implementations you could also store strings in DOUBLE PRECISION, where each element contains twice as many characters as INTEGER. You lose the ability to do arithmetic on individual characters, but it might simplify or improve performance of some code that treats strings as opaque values.

FORTRAN 77 put a lot more effort into supporting strings properly. It introduced the CHARACTER type, representing a string. (There is no separate type for an individual character but you can easily use a string of length 1.)

CHARACTER variables are declared with a fixed length, with the syntax CHARACTER*N. If you store a shorter string into the variable it will be padded with blanks (space characters). Many operations (though not all) ignore trailing blanks, effectively allowing variable-length strings in fixed-size allocations.

You can read and write substrings:

      CHARACTER*32 MSG
      DATA MSG / 'Hello world' /

      MSG(7:11) = MSG(1:5)
      MSG(13:) = '!!'

      PRINT *, MSG

C     Output:
C       Hello Hello !!

Assignment will truncate or pad with blanks to ensure the value fits into the destination string or substring, making this much easier and safer than in C.
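A sketch of that assignment rule on byte slices, replaying the MSG example. The `assign` and `example` functions are illustrative names, not f2rust's actual helpers:

```rust
/// FORTRAN-style CHARACTER assignment: copy `src` into `dst`, truncating
/// if it is too long and padding with blanks if it is too short.
fn assign(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    dst[n..].fill(b' ');
}

/// Replays the MSG example. FORTRAN's 1-based inclusive MSG(7:11)
/// corresponds to msg[6..11] here.
fn example() -> [u8; 32] {
    let mut msg = [b' '; 32];
    assign(&mut msg, b"Hello world");
    // Copy the source substring first: borrowing msg[0..5] while
    // mutating msg[6..11] would be rejected by the borrow checker.
    let head = msg[0..5].to_vec();
    assign(&mut msg[6..11], &head);
    assign(&mut msg[12..], b"!!");
    msg
}
```

The result is `Hello Hello !!` followed by 18 blanks, filling the 32-byte variable.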

You can also have CHARACTER arrays, which are simply arrays where every element is a string of equal length. (Contrast with C where you’d typically have an array of pointers to strings, allowing each string to have a different length.)

String arguments

Things get trickier when you pass strings as arguments or return them from functions.

As discussed before, FORTRAN doesn’t really do interfaces. The caller passes values of whatever types it fancies – everything’s basically a pointer – and the function reinterprets them as whatever type was declared inside the function.

The length of a string is part of its CHARACTER type. If you pass a CHARACTER*N string to a function that declares CHARACTER*M, and N >= M, that’s okay. The function gets access to an M-length substring of the argument and won’t touch any characters outside that.

If N < M, that’s not allowed. The FORTRAN compiler probably won’t detect this and you’ll get undefined behaviour.

But a function can also declare its dummy argument as CHARACTER*(*), which means it adopts the length of the caller’s string (essentially setting M = N). The compiler needs to know this length at run-time so it can truncate and pad appropriately.

To implement this, compilers like gfortran add a hidden argument to the end of the function for every CHARACTER argument, containing the caller’s string length, just in case the function declared it as CHARACTER*(*) and needs to know the length.

A similar issue occurs when functions return strings. FORTRAN is designed with an expectation that the caller allocates space for the returned string, so the function can declare its return type as CHARACTER*(*) but the caller must provide an explicit length. The compiler must pass hidden arguments containing both that string pointer and its length.

Rust strings

This kinda maps onto Rust strings easily enough, since string slices (&str) hold both pointer and length. There is just the problem of Rust strings being guaranteed-valid UTF-8 while FORTRAN has barely even heard of ASCII. To handle this, f2rust internally uses &[u8] for strings (allowing non-UTF-8 in the translated FORTRAN code), while the generated public API uses the more ergonomic &str.

Non-mutable string arguments are trivial to convert in the API wrapper with str::as_bytes(), but mutable arguments need str::as_bytes_mut() which is an unsafe function. as_bytes_mut documents that the bytes must be valid UTF-8 when the borrow ends, so we use a wrapper type with a drop that checks validity and panics before anything illegal can happen.

When APIs have variable-length string outputs, FORTRAN will have padded them with blanks, and we don’t want to expose that FORTRAN implementation detail to the Rust user. So rsspice has a second layer of user-friendly API wrappers that allocate a String with the maximum possible output length (painfully determined by manually reading the documentation and/or implementation of every string-outputting API), pass it as &mut str to the inner wrapper (which does the as_bytes_mut), then trim the String in the outer wrapper before returning it.
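The shape of that outer wrapper is roughly as follows. This is a simplified sketch: `inner_get_name`, the 32-byte maximum and the `EARTH` output are invented for illustration, and it works on a `Vec<u8>` to stay in safe code, skipping the `&mut str` / `as_bytes_mut` layer.

```rust
/// Hypothetical inner routine: writes a blank-padded name into a
/// fixed-size FORTRAN-style buffer.
fn inner_get_name(out: &mut [u8]) {
    let name = b"EARTH";
    let n = out.len().min(name.len());
    out[..n].copy_from_slice(&name[..n]);
    out[n..].fill(b' ');
}

/// Outer wrapper: allocate the maximum length, call the inner routine,
/// then trim the trailing blanks before returning.
fn get_name() -> String {
    const MAX_LEN: usize = 32; // assumed maximum output length
    let mut buf = vec![b' '; MAX_LEN];
    inner_get_name(&mut buf);
    let mut s = String::from_utf8(buf).expect("inner routine wrote invalid UTF-8");
    s.truncate(s.trim_end_matches(' ').len());
    s
}
```

The caller just sees `get_name()` returning `"EARTH"`, with no sign of the fixed-size buffer underneath.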

Character arrays

Unfortunately that mapping won’t work for CHARACTER arrays. Remember we said you can pass CHARACTER*N to a function expecting CHARACTER*M with N > M and it just acts on a substring. What if you pass an array CHARACTER*N ARR(10) to a function expecting CHARACTER*M ARR(10) with N != M?

FORTRAN assumes array elements are stored consecutively in memory, so the function expects the first string at bytes 1 to M (inclusive), the second at bytes M+1 to 2*M, etc. But that’s not how the caller stored the strings.

So the function simply ignores the caller’s string boundaries, treats the whole array as a flat sequence of bytes, and reinterprets them in a new shape:

      PROGRAM MAIN
C       Initialise with string length 5
        CHARACTER*5 WORDS(3)
        DATA WORDS / 'Hello', 'world', 'abcde' /

        CALL TEST(WORDS)
      END

      SUBROUTINE TEST(ARR)
C       Read with string length 4
        CHARACTER*4 ARR(3)
        PRINT *, ARR(1)
        PRINT *, ARR(2)
        PRINT *, ARR(3)
      END

C     Output:
C       Hell
C       owor
C       ldab

(The FORTRAN specification explicitly requires this behaviour; it’s not something they just forgot to ban.)

We can’t map this onto any idiomatic Rust type, so instead we add custom CharArray / CharArrayMut types for the public API, which wrap a &[u8] and a string length. As with DummyArray described earlier, function bodies will wrap these in DummyCharArray or DummyCharArrayMut2D etc which can reinterpret the &[u8] with a new string length (and a new array dimensionality). Unfortunately not a very elegant solution; rsspice adds some wrapper types to make it slightly nicer to use, but this is probably the worst mismatch between FORTRAN and Rust from an API user’s perspective.
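The core of the idea fits in a few lines. The following is a much-simplified, hypothetical analogue of those types (the real ones also handle mutability and multiple dimensions); `CharArrayView` and its methods are names made up for this sketch.

```rust
/// Borrowed view of a CHARACTER array: flat bytes plus the length of
/// each string element.
struct CharArrayView<'a> {
    bytes: &'a [u8],
    str_len: usize,
}

impl<'a> CharArrayView<'a> {
    /// Reinterprets the same bytes with a different string length.
    fn reshape(&self, str_len: usize) -> CharArrayView<'a> {
        CharArrayView { bytes: self.bytes, str_len }
    }

    /// Returns the 1-based element `index`, like ARR(index) in FORTRAN.
    fn get(&self, index: usize) -> &'a [u8] {
        let start = (index - 1) * self.str_len;
        &self.bytes[start..start + self.str_len]
    }
}
```

Starting from the bytes `Helloworldabcde` with a string length of 5, `get(2)` is `world`. After `reshape(4)`, the elements are `Hell`, `owor` and `ldab`, matching the FORTRAN output above.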

DATA initialisation

FORTRAN has a relatively elaborate system for static initialisation. You can initialise a single variable to a constant value:

      INTEGER A
      DATA A /100/
C     A = 100

Or you can initialise a list of variables to a list of values of varying types:

      INTEGER B
      CHARACTER*4 C
      DATA B, C /100, 'TEST'/
C     B = 100
C     C = 'TEST'

The variable list can include arrays, and the value list can include repetition (reps*value):

      INTEGER D(3), E(3)
      DATA D, E /100, 4*200, 300/
C     D = [100, 200, 200]
C     E = [200, 200, 300]

The variable list can include “implied-DO” loops, which select a subset of array elements by iterating over a loop variable and evaluating the index:

      INTEGER F(10)
      DATA (F(I*2-1), I=1,5) /5*0/
      DATA (F(I*2  ), I=1,5) /5*1/
C     F = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

      INTEGER G(3,3)
      DATA ((G(I,J), I=1,3), J=1,3) /1,2,3,4,5,6,7,8,9/
C     G = [ [1,2,3], [4,5,6], [7,8,9] ]

The compiler can fully evaluate these at compile-time, but that’s not ideal for f2rust. Repetition and implied-DO loops might expand into a very large number of elements; the generated Rust code will be harder to read and compare to the FORTRAN; and we allocate all arrays dynamically, so we’d waste memory storing a static version that’s never used except to copy into the dynamic array.

Instead we translate the whole value list into a chained iterator, and move values from the iterator into each of the target variables or elements in turn. For the /100, 4*200, 300/ example:

let mut clist =
    [Val::I(100)].into_iter()
    .chain(std::iter::repeat_n(Val::I(200), 4))
    .chain([Val::I(300)]);

D.iter_mut().for_each(|n| *n = clist.next().unwrap().into_i32());
E.iter_mut().for_each(|n| *n = clist.next().unwrap().into_i32());

and for the implied-DO example:

let mut clist = [ Val::I(1), Val::I(2), Val::I(3),
                  Val::I(4), Val::I(5), Val::I(6),
                  Val::I(7), Val::I(8), Val::I(9) ].into_iter();

for J in 1..=3 {
    for I in 1..=3 {
        G[[I, J]] = clist.next().unwrap().into_i32();
    }
}

FORTRAN doesn’t require the values to all be the same type, but Rust iterators must have one item type. FORTRAN will also do some automatic type conversion from the values to the target types. So we wrap values in an enum Val { I(i32), D(f64), ... } which implements Val::into_i32() etc to convert as needed for each target.

This isn’t incredibly efficient Rust code, but it isn’t incredibly inefficient either (after the optimiser has had a go over it), and SPICE doesn’t initialise particularly large data tables this way, so it’s okay for now.
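For completeness, here is a self-contained sketch of how such a Val enum could look, together with the D/E example wrapped in a function. Only two variants are shown, and truncating DOUBLE PRECISION toward zero when converting to INTEGER is my assumption about the conversion rule; `init_d_e` is an illustrative name.

```rust
/// A DATA constant of any supported type (only two variants shown).
#[derive(Clone, Copy)]
enum Val {
    I(i32),
    D(f64),
}

impl Val {
    /// Converts to INTEGER, truncating a DOUBLE PRECISION value toward zero.
    fn into_i32(self) -> i32 {
        match self {
            Val::I(n) => n,
            Val::D(x) => x as i32,
        }
    }

    /// Converts to DOUBLE PRECISION.
    fn into_f64(self) -> f64 {
        match self {
            Val::I(n) => f64::from(n),
            Val::D(x) => x,
        }
    }
}

/// Replays DATA D, E /100, 4*200, 300/ for two 3-element arrays.
fn init_d_e() -> ([i32; 3], [i32; 3]) {
    let mut clist = std::iter::once(Val::I(100))
        .chain(std::iter::repeat(Val::I(200)).take(4))
        .chain([Val::I(300)]);

    let (mut d, mut e) = ([0; 3], [0; 3]);
    d.iter_mut().for_each(|n| *n = clist.next().unwrap().into_i32());
    e.iter_mut().for_each(|n| *n = clist.next().unwrap().into_i32());
    (d, e)
}
```

`init_d_e()` returns `([100, 200, 200], [200, 200, 300])`, the same split of the value list across D and E as in the FORTRAN comment.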

DATA variables ought to be declared as SAVE even if they are read-only, because they are only initialised once at startup and (per the standard) all non-SAVE variables become undefined after the function returns, so you couldn’t call the function a second time. (Fortran 90 makes them implicitly SAVEd even if you don’t declare it, but FORTRAN 77 doesn’t protect you from this mistake.) In f2rust we do all this allocation and initialisation lazily in SaveVars::new(), which keeps it thread-safe and reduces the startup cost.

IO

Nowadays we think of a file as a random-access sequence of bytes, with applications layering their own structure on top of it. FORTRAN does not think that way.

In FORTRAN, a file is a sequence of records. One record might correspond to one punched card, or perhaps a region of magnetic tape or disk. The application declares the structure of each record in terms of FORTRAN types, and the FORTRAN implementation maps that onto the underlying storage device in a fairly opaque way.

Some files only support sequential access. You can read/write the current record and automatically advance to the next; or you can backspace to the previous record. This makes sense for magnetic tape, where random access is impractical. It also allows variable-size records.

Other files support direct access. You can read/write any record at any time, identified by record number. This requires fixed-size record storage.

Additionally, FORTRAN supports both formatted and unformatted files. Formatted are text files, where the standard describes how to read/write values as text. Unformatted are binary files with an implementation-dependent representation.

For formatted (text) files, the READ, WRITE and PRINT statements take a format specification string to describe the record. This is a sequence of edit descriptors: for example I8 is an 8-character-wide integer (usually padded with spaces), F11.7 is an 11-character-wide float with 7 decimal places, A4 is a 4-character string.

Edit descriptors can be grouped and repeated: 3(A4,I8) is three pairs of strings and integers. Some edit descriptors change formatting flags or position: e.g. SP means any subsequent positive integers will be output with a preceding +; TL3 moves the read/write cursor 3 characters to the left within the current record.

Just like DATA statements, READ/WRITE/PRINT have a list of variables, array elements, implied-DO lists, etc. If there are enough elements, it will read/write multiple records with the same format specification. For example you can print a 3x3 matrix with the format specification for a single row:

      DOUBLE PRECISION M(3,3)
      PRINT '(3F4.1)', M
C     Example output:
C       1.0 2.0 3.0
C       4.0 5.0 6.0
C       7.0 8.0 9.0

C     Or transposed using nested implied-DO:
      PRINT '(3F4.1)', ((M(I,J), J=1,3), I=1,3)

We implement this with a very similar mechanism to DATA statements. The format specification is parsed into an iterator that expands any repetitions and returns a sequence of edit descriptors, and the generated code consumes the iterator to see how to format each value in turn. (Format specification strings can be generated by the application at run-time, so the parsing must also be done at run-time.)
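The repetition-expanding step might look something like this. It is a sketch only: it assumes the format string has already been parsed into a tree, and it flattens eagerly into a Vec where the real implementation is described as an iterator. `Item` and `expand` are hypothetical names.

```rust
/// A parsed format item: a single edit descriptor, or a repeated group.
enum Item {
    Edit(&'static str),
    Group(usize, Vec<Item>),
}

/// Expands repeat counts into a flat sequence of edit descriptors.
fn expand(items: &[Item], out: &mut Vec<&'static str>) {
    for item in items {
        match item {
            Item::Edit(e) => out.push(*e),
            Item::Group(count, inner) => {
                for _ in 0..*count {
                    expand(inner, out);
                }
            }
        }
    }
}
```

Expanding the tree for `3(A4,I8)`, i.e. `Item::Group(3, vec![Item::Edit("A4"), Item::Edit("I8")])`, produces `A4, I8, A4, I8, A4, I8`, and the formatting code can then pair each descriptor with the next value in the IO list.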

Float formatting is tricky because it doesn’t exactly match Rust’s built-in formatting. I ended up using the ryu float-to-string crate, or rather a fork that exposes its internal decimal-mantissa-and-exponent representation, and wrote my own code to put the zeroes and decimal point in the right places. FORTRAN leaves some of the formatting implementation-defined (e.g. it equally allows +0.5 and 0.5 and .5 by default, if there was no edit descriptor like SP to force the optional characters on/off), but some of SPICE’s test suite assumes what is evidently the convention among modern Linux-based compilers (like gfortran), so we try to match that convention.
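
To give a flavour of the "zeroes and decimal point" step, here is a small sketch that takes a decimal mantissa and exponent (value = mantissa × 10^exponent) and writes it in plain decimal notation. The function name and the `leading_zero` flag are hypothetical, and the mantissa and exponent are passed in directly rather than obtained from the ryu fork, whose API isn't shown here. A real F edit descriptor also needs rounding to the requested number of decimals, sign handling and field-width padding, all of which are left out.

```rust
/// Render `mantissa * 10^exponent` in plain decimal notation.
/// `leading_zero` chooses between "0.5" and ".5" for values below 1,
/// which is one of the choices the standard leaves to the implementation.
fn place_decimal_point(mantissa: u64, exponent: i32, leading_zero: bool) -> String {
    let digits = mantissa.to_string();
    if exponent >= 0 {
        // Whole number: append zeroes, then a decimal point with no fraction.
        return format!("{}{}.", digits, "0".repeat(exponent as usize));
    }
    let frac_len = (-exponent) as usize;
    if digits.len() > frac_len {
        // The decimal point falls inside the digit string.
        let (int_part, frac_part) = digits.split_at(digits.len() - frac_len);
        format!("{}.{}", int_part, frac_part)
    } else {
        // The value is below 1: pad the fraction with zeroes on the left.
        let zeroes = "0".repeat(frac_len - digits.len());
        let int_part = if leading_zero { "0" } else { "" };
        format!("{}.{}{}", int_part, zeroes, digits)
    }
}
```

For example, a mantissa of 125 with exponent -2 comes out as 1.25, and a mantissa of 5 with exponent -1 comes out as either 0.5 or .5 depending on the flag.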

For unformatted files there is no format specification string; the encoding just depends on the types of values you’re reading/writing. I couldn’t find any documentation of the unformatted representation on modern platforms, but some experimenting with gfortran showed it was trivial: INTEGER matches Rust’s i32::to_le_bytes(), DOUBLE PRECISION is f64::to_le_bytes(), LOGICAL (boolean) is encoded as a 0/1 INTEGER, etc., with no extra padding or alignment.
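
A minimal sketch of that encoding, assuming only the three types mentioned above (the `Value` enum and `encode` function are invented for this example and are not f2rust's API):

```rust
/// A value written to an unformatted record (hypothetical type for this sketch).
enum Value {
    Integer(i32),
    Double(f64),
    Logical(bool),
}

/// Concatenate the little-endian bytes of each value, with no padding.
fn encode(values: &[Value]) -> Vec<u8> {
    let mut out = Vec::new();
    for value in values {
        match value {
            Value::Integer(i) => out.extend_from_slice(&i.to_le_bytes()),
            Value::Double(d) => out.extend_from_slice(&d.to_le_bytes()),
            // LOGICAL is stored as a 4-byte INTEGER holding 0 or 1.
            Value::Logical(b) => out.extend_from_slice(&(*b as i32).to_le_bytes()),
        }
    }
    out
}
```

An INTEGER, a DOUBLE PRECISION and a LOGICAL written together therefore take 4 + 8 + 4 = 16 bytes.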

For the record storage, we have four combinations to test in gfortran (because I couldn’t find any documentation of this either):

Sequential, formatted: Variable-length strings terminated with a newline character. (Like, a normal text file, one record per line.)
Sequential, unformatted: Binary data, preceded and followed by a 32-bit little-endian length. This makes it possible to both read and backspace over each record.
Direct, formatted: Strings padded with space characters up to the specified record length.
Direct, unformatted: Binary data padded with 0x00 bytes up to the specified record length.
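
The record layouts above can be sketched roughly as follows. This is an illustration of the observed behaviour, not f2rust's implementation: the function names are hypothetical, it works on in-memory byte slices rather than files, and it treats the length marker as unsigned.

```rust
/// Frame one sequential unformatted record: length, payload, length.
fn frame_sequential(payload: &[u8]) -> Vec<u8> {
    let len = (payload.len() as u32).to_le_bytes();
    let mut out = Vec::with_capacity(payload.len() + 8);
    out.extend_from_slice(&len);
    out.extend_from_slice(payload);
    out.extend_from_slice(&len);
    out
}

/// Backspace: given the offset just past a record, return the offset where
/// that record starts, using the trailing length marker.
fn backspace(file: &[u8], pos: usize) -> Option<usize> {
    let trailer = file.get(pos.checked_sub(4)?..pos)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(trailer);
    let len = u32::from_le_bytes(buf) as usize;
    // Skip back over the payload and both 4-byte markers.
    pos.checked_sub(len + 8)
}

/// Pad a direct-access record up to `recl` bytes. Pass b' ' as `fill` for
/// formatted records and 0x00 for unformatted ones.
fn pad_direct(payload: &[u8], recl: usize, fill: u8) -> Option<Vec<u8>> {
    if payload.len() > recl {
        return None; // the data doesn't fit in one record
    }
    let mut out = payload.to_vec();
    out.resize(recl, fill);
    Some(out)
}

/// Byte offset of a record in a direct-access file, assuming record numbers
/// start at 1 and every record is exactly `recl` bytes.
fn direct_offset(recno: u64, recl: u64) -> u64 {
    (recno - 1) * recl
}
```

The trailing length is what makes backspacing cheap: from the end of a record you can read the last four bytes and jump straight to its start. With fixed-size records, direct access needs no markers at all, because the position is just a multiplication.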

After reproducing this behaviour, our implementation is compatible with the published SPICE data files, so that’s nice.

IO errors

FORTRAN has relatively sensible IO error handling. The IO statements (OPEN, READ, etc) can be called with an IOSTAT=ios parameter and will set the ios variable to 0 on success, some positive error code on failure, and negative on end-of-file. The application must check ios and handle it appropriately.

IO statements can also have ERR=label, in which case execution will jump to the given statement label on error, so you don’t have to check ios after every call. (This is a form of GO TO and therefore hard to translate to Rust, but fortunately SPICE doesn’t use it so we don’t need to support it.)

If there is neither IOSTAT nor ERR then IO errors will automatically terminate the program, saving you from undefined behaviour.

f2rust integrates this with Rust’s standard error mechanism: the IO operations return Result<T, Error> (with a custom Error enum that wraps std::io::Error and EndOfFile etc), and by default the translated functions propagate errors up to the public API with ?. When IOSTAT=ios is specified, the IO operation is wrapped in a closure that intercepts the Error and maps it onto an error code instead.
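
A simplified sketch of how such a wrapper could look is below. The `Error` enum here has only two variants, and the specific status codes (-1 and 1) are placeholders rather than the values f2rust or gfortran actually use. In this version every error maps onto a code, so the outer Result never fails; it is kept only so the call site can still end in ?.

```rust
/// Hypothetical error type for this sketch; the real enum has more variants.
#[derive(Debug)]
enum Error {
    Io(std::io::Error),
    EndOfFile,
}

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Error::Io(e)
    }
}

/// Run an IO operation and turn its outcome into an IOSTAT value:
/// 0 on success, negative on end-of-file, positive on any other error.
/// The codes are placeholders chosen for this example.
fn capture_iostat<F>(op: F) -> Result<i32, Error>
where
    F: FnOnce() -> Result<(), Error>,
{
    match op() {
        Ok(()) => Ok(0),
        Err(Error::EndOfFile) => Ok(-1),
        Err(Error::Io(_)) => Ok(1),
    }
}
```

Inside the closure the IO calls can still use ? freely; the closure boundary is what stops the error from propagating any further up.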

Putting all this together, we end up with code like:

      DOUBLE PRECISION      RECORD ( 128 )

C     Record length in bytes
      INTEGER               RECL
      PARAMETER           ( RECL  = 1024 )

      OPEN ( UNIT   = UNIT,
     .       FILE   = FNAME,
     .       ACCESS = 'DIRECT',
     .       RECL   = RECL,
     .       STATUS = 'OLD',
     .       IOSTAT = IOSTAT      )

      READ ( UNIT   = UNIT,
     .       REC    = RECNO,
     .       IOSTAT = IOSTAT  )    RECORD

translating into:

let specs = io::OpenSpecs {
    unit: Some(UNIT),
    file: Some(FNAME),
    access: Some(b"DIRECT"),
    recl: Some(RECL),
    status: Some(b"OLD"),
    ..Default::default()
};
IOSTAT = io::capture_iostat(|| ctx.open(specs))?;

let mut reader = io::UnformattedReader::new(ctx.io_unit(UNIT)?, Some(RECNO))?;
IOSTAT = io::capture_iostat(|| {
    reader.start()?;
    for n in RECORD.iter_mut() {
        *n = reader.read_f64()?;
    }
    reader.finish()?;
    Ok(())
})?;

(UNIT is a numeric identifier for the open file; the application is responsible for allocating unique identifiers for all its files. This is more global state that we have to store in the Context object. And the application has to magically know that units 5 and 6 are reserved for stdin and stdout on most platforms.)

Conclusion

At this point it’s many months since I actually wrote the code, so there are probably a lot of bits I’ve forgotten about, but I think I got round to mentioning most of the major topics and it’s time to give up and publish this.

When I started I didn’t really know what to expect of FORTRAN, and I ended up pleasantly surprised. There’s a lot of rough edges and misfeatures that we know to avoid nowadays – but, considering it’s a half-century-old version of an even older language, it seems pretty coherent and well designed and well documented and generally tolerable to work with. I don’t recommend you go and start writing new programs in FORTRAN 77, but it certainly wouldn’t be the worst choice. In any case, it’s fun to look at the history of programming languages and get a better sense of how we’ve ended up where we are now.

But I still haven’t made any progress on that Voyager project.