 Modified 2020-03-01T18:17:07Z

I have implemented a simple Java class, `RomanConverter`, for converting between Roman numerals and corresponding integers. The conversions are made as described in Wikipedia's page Roman numerals (especially the section Reading Roman numerals).

In my implementation I am using the original symbols together with the subtractive notation symbols. Each symbol can occur only once, except for the four symbols `I`, `X`, `C` and `M`, where the values start with 1, that can be repeated up to three times. (For more details, see the explanations of the `Symbol` enum below.)

The main goals have been to keep it simple, intuitive, elegant and understandable. By looking into this solution, you will hopefully get a better understanding of the beautiful world of Roman numerals and their relationship with integers.

There are two public methods that do the conversions:

• `String convert(int number)` that converts from an integer to its Roman numeral representation, e.g. `convert(8)` returns `"VIII"`.
• `int convert(String roman)` that converts from a Roman numeral to its corresponding integer, e.g. `convert("VIII")` returns `8`.

# The `RomanConverter` class

Here is a listing of the complete `RomanConverter` class. I will go through each of the important parts of the class in the next section.

```/**
* Converts between Roman numerals and corresponding integers.
*
* Implemented according to algorithms described in "Roman numerals".
*
* @author Anders Gustafson (www.nemonisimors.com) MMXIV
* @see Roman numerals
*/
public class RomanConverter {

/**
* Since max repetitions for each Roman symbol is three and
* largest symbol (M) represents 1000, i.e. MMMM minus I.
*/
static final int MAX_FOR_ROMAN_NUMERAL = 3999;

/**
* The seven original Roman symbols (single characters) and
* the six "subtractive notation" symbols (double characters).
*
*/
private enum Symbol {
// Values in descending order is important for the conversion
// algorithms, since "Symbols are placed from left to right
// in order of value, starting with the largest".
M(1000),
CM(900),
D(500),
CD(400),
C(100),
XC(90),
L(50),
XL(40),
X(10),
IX(9),
V(5),
IV(4),
I(1);

private final int value;
private final int maxRepetitions;
private final int numberOfSymbolsToSkipIfPresent;

Symbol(final int value) {
final int firstDigit = getFirstDigit(value);
this.value = value;
this.maxRepetitions = firstDigit == 1 ? 3 : 1;
switch (firstDigit) {
case 9:
this.numberOfSymbolsToSkipIfPresent = 3;
break;
case 5:
case 4:
this.numberOfSymbolsToSkipIfPresent = 1;
break;
case 1:
this.numberOfSymbolsToSkipIfPresent = 0;
break;
default:
throw new IllegalStateException(String.format("Invalid value %d defined for symbol %s", value, this));
}
}

private static int getFirstDigit(final int n) {
final int m = Math.abs(n);
return m < 10 ? m : getFirstDigit(m / 10);
}
}

/**
* Converts an integer to its Roman numeral representation.
*
* @param number An integer, between 0 and {@link #MAX_FOR_ROMAN_NUMERAL}.
* @return The Roman numeral corresponding to the integer.
* @throws IllegalArgumentException if input is out of range.
*/
public static String convert(final int number) {
verifyValidArgument(0 <= number && number <= MAX_FOR_ROMAN_NUMERAL,
"Roman numerals out of range: %s", number);

final StringBuilder roman = new StringBuilder();
int n = number;

for (final Symbol symbol : Symbol.values()) {

for (int rest = n - symbol.value;
rest >= 0;
n = rest, rest -= symbol.value) {

roman.append(symbol);
}
}

return roman.toString();
}

/**
* Converts a Roman numeral to its corresponding integer.
*
* @param roman The Roman numeral, to be converted to an integer.
* @return The integer corresponding to the Roman numeral.
* @throws IllegalArgumentException if input is null,
*                                  or is an invalid Roman numeral.
*/
public static int convert(final String roman) {
verifyValidArgument(roman != null, "Roman numeral can not be null");

int result = 0;
String romanCopy = roman.toUpperCase();

int numberOfSymbolsToSkip = 0;
for (final Symbol symbol : Symbol.values()) {

if (numberOfSymbolsToSkip > 0) {
numberOfSymbolsToSkip--;
continue;
}

int maxRepetitions = symbol.maxRepetitions;
final String symbolAsString = symbol.toString();
while (romanCopy.startsWith(symbolAsString)) {
verifyValidArgument(--maxRepetitions >= 0,
"Invalid Roman numeral %s: repeats %s too many times", roman, symbolAsString);

result += symbol.value;
romanCopy = romanCopy.replaceFirst(symbolAsString, "");
numberOfSymbolsToSkip = symbol.numberOfSymbolsToSkipIfPresent;
}
}

verifyValidArgument(romanCopy.isEmpty(),
"Invalid Roman numeral %s: can not parse the rest %s", roman, romanCopy);

return result;
}

private static void verifyValidArgument(final boolean assertion,
final String message,
final Object... messageArgument) {
if (!assertion) {
throw new IllegalArgumentException(String.format(message, messageArgument));
}
}
}
```

# Explanations of the important parts

There are three important parts in the code above, which I will explain further, by first giving a code snippet and then going into the details.

## The `Symbol` enum

```private enum Symbol {
// Values in descending order is important for the conversion
// algorithms, since "Symbols are placed from left to right
// in order of value, starting with the largest".
M(1000),
CM(900),
D(500),
CD(400),
C(100),
XC(90),
L(50),
XL(40),
X(10),
IX(9),
V(5),
IV(4),
I(1);

private final int value;
private final int maxRepetitions;
private final int numberOfSymbolsToSkipIfPresent;

Symbol(final int value) {
final int firstDigit = getFirstDigit(value);
this.value = value;
this.maxRepetitions = firstDigit == 1 ? 3 : 1;
switch (firstDigit) {
case 9:
this.numberOfSymbolsToSkipIfPresent = 3;
break;
case 5:
case 4:
this.numberOfSymbolsToSkipIfPresent = 1;
break;
case 1:
this.numberOfSymbolsToSkipIfPresent = 0;
break;
default:
throw new IllegalStateException(String.format("Invalid value %d defined for symbol %s", value, this));
}
}

private static int getFirstDigit(final int n) {
final int m = Math.abs(n);
return m < 10 ? m : getFirstDigit(m / 10);
}
}
```

I choose to use an `enum` to represent the Roman symbols. I also choose to define both the seven original symbols (`I`, `V`, `X`, `L`, `C`, `D` and `M`) together with the six subtractive notation symbols (`IV`, `IX`, `XL`, `XC`, `CD` and `CM`). The reason for also using the subtractive notation symbols in the `enum` is because it makes the conversion methods much clearer and simpler.

The symbols are defined in descending order, which also will help in the conversion methods.

### Max number of repetitions

We need to keep track of the max number of repetitions for each symbol, in order to validate correctness for Roman numerals. For example, three `I`'s is okay (`III` which equals 3), but three `IV`'s is not okay (`IVIVIV`). Neither is two `V`'s okay (`VV`). The rule for a Roman numeral is that some symbols can be repeated up to three times and some symbols can only occur once. The only symbols that can be repeated (three times) are the four symbols that has a value that starts with 1, i.e. `I`, `X`, `C` and `M`. This is something that many implementations do not handle, for example `IIIIII` can be converted to 6, which I think should be avoided.

### Avoid ambiguity

There is another thing that many of the implementations I have seen handle incorrectly, but which I think is very important. We need to avoid ambiguity. 500 can be expressed by, for example, `D` or `CDC`. 900 can be expressed by, for example, `CM` or `DCD`. In my opinion the only valid expressions should be `D` and `CM` in those cases. There should be only one way to write a Roman numeral for every value.

In order to solve this issue with ambiguity, we keep track of which symbols can exist together. It turns out that the first digit of the symbol values will be useful here also. If the first digit is 9 we have to skip the three following symbols, if the first digit is 5 or 4 we have to skip the following symbol and if the first digit is 1 we do not have to skip any symbols.

An example will help to explain. If we have `XC`, we can not also use `L`, `XL` or `X`, and therefore we need to skip those three consecutive symbols. If we have `L`, we can not also use `XL`, and therefore we need to skip that symbol. If we have `XL`, we can not also use `X`, and therefore we need to skip that symbol. If we have `C`, we can use any of the consecutive symbols, and therefore we do not need to skip any symbols.

The variable `numberOfSymbolsToSkipIfPresent` is used to avoid this ambiguity.

## Converting integers to Roman numerals

```public static String convert(final int number) {
verifyValidArgument(0 <= number && number <= MAX_FOR_ROMAN_NUMERAL,
"Roman numerals out of range: %s", number);

final StringBuilder roman = new StringBuilder();
int n = number;

for (final Symbol symbol : Symbol.values()) {

for (int rest = n - symbol.value;
rest >= 0;
n = rest, rest -= symbol.value) {

roman.append(symbol);
}
}

return roman.toString();
}
```

Since we have both the original symbols and the subtractive notation symbols, the process of converting from integer to Roman numeral will be easy. Just loop over the `Symbol` enums (which are sorted in decreasing order) and if the current `Symbol`'s value is less than or equal to your integer, add the `Symbol` to your roman string and subtract the `Symbol`'s value from your integer, giving you the rest. Then just repeat this until your rest is zero.

## Converting Roman numerals to integers

```public static int convert(final String roman) {
verifyValidArgument(roman != null, "Roman numeral can not be null");

int result = 0;
String romanCopy = roman.toUpperCase();

int numberOfSymbolsToSkip = 0;
for (final Symbol symbol : Symbol.values()) {

if (numberOfSymbolsToSkip > 0) {
numberOfSymbolsToSkip--;
continue;
}

int maxRepetitions = symbol.maxRepetitions;
final String symbolAsString = symbol.toString();
while (romanCopy.startsWith(symbolAsString)) {
verifyValidArgument(--maxRepetitions >= 0,
"Invalid Roman numeral %s: repeats %s too many times", roman, symbolAsString);

result += symbol.value;
romanCopy = romanCopy.replaceFirst(symbolAsString, "");
numberOfSymbolsToSkip = symbol.numberOfSymbolsToSkipIfPresent;
}
}

verifyValidArgument(romanCopy.isEmpty(),
"Invalid Roman numeral %s: can not parse the rest %s", roman, romanCopy);

return result;
}
```

Converting from Roman numeral to integer is a bit trickier. Therefore we keep track of the max number of repetitions for each symbol. We also keep track of which symbols can exist together. As for the rest, it is almost the same algorithm as for converting integer to Roman. Loop over the `Symbol` enums and if your Roman string starts with the `Symbol`, add the `Symbol`'s value to your result and remove the `Symbol` from your Roman string. Then just repeat until your Roman string is empty.

This method will throw `IllegalArgumentException` for invalid input strings, such for example `IL`, `IVIV`, `IVI`, `XXL`, `ABC` and `MMMM`.

# Example of usage

As an example, you can easily exercise the conversion methods with the following code. It will convert all valid integers to Roman numerals and back again.

```IntStream.rangeClosed(0, RomanConverter.MAX_FOR_ROMAN_NUMERAL)
.forEach(n -> {
final String roman = RomanConverter.convert(n);
final int m = RomanConverter.convert(roman);
System.out.printf("%5d => %15s => %5d%n", n, roman, m);
if (n != m) {
throw new IllegalStateException(String.format("%d was converted to %d", n, m));
}
});
```

The output from the above code will look like:

```   0 =>                 =>     0
1 =>               I =>     1
2 =>              II =>     2
3 =>             III =>     3
4 =>              IV =>     4
5 =>               V =>     5
6 =>              VI =>     6
7 =>             VII =>     7
8 =>            VIII =>     8
9 =>              IX =>     9
10 =>               X =>    10
11 =>              XI =>    11
12 =>             XII =>    12
13 =>            XIII =>    13
14 =>             XIV =>    14
15 =>              XV =>    15
16 =>             XVI =>    16
17 =>            XVII =>    17
18 =>           XVIII =>    18
19 =>             XIX =>    19
20 =>              XX =>    20
:                  :        :
3997 =>      MMMCMXCVII =>  3997
3998 =>     MMMCMXCVIII =>  3998
3999 =>       MMMCMXCIX =>  3999
```

Some examples of exception messages when you try to convert invalid Roman numeral strings:

```Invalid Roman numeral XIVIV: repeats IV too many times
Invalid Roman numeral IVIV: repeats IV too many times
Invalid Roman numeral IVI: can not parse the rest I
Invalid Roman numeral IVIII: can not parse the rest III
Invalid Roman numeral IXV: can not parse the rest V
Invalid Roman numeral MXLXL: repeats XL too many times
Invalid Roman numeral CMCM: repeats CM too many times
Invalid Roman numeral ABC: can not parse the rest ABC
Invalid Roman numeral IL: can not parse the rest L
Invalid Roman numeral ID: can not parse the rest D
Invalid Roman numeral MMMM: repeats M too many times
Invalid Roman numeral MA: can not parse the rest A
Invalid Roman numeral DCD: can not parse the rest D
Invalid Roman numeral CDC: can not parse the rest C
Invalid Roman numeral IIII: repeats I too many times
Invalid Roman numeral VV: repeats V too many times
```

# Conclusions/summary

How to convert between Roman numerals and integers is described by a simple algorithm, but when you start to examine the details and corners, you will see that it is far from trivial to implement an elegant solution. The conversion from integer to Roman numeral can be implemented quite easy and elegant, but the conversion from Roman numeral to integer has some pitfalls to consider. I think that implementing the two conversion methods (`String convert(int number)` and `int convert(String roman)`) can be excellent tasks for a Code Retreat or Code Kata.

When using only the seven original symbols it is hard to make the conversions elegant. When also using the six subtractive notation symbols, the conversion implementations will be much more elegant and simple. (On the other hand, if the subtractive notation symbols should not have been invented in the first place, the conversions would be much simpler and more elegant.)

It is interesting to note that the integers on one hand are built with the position based system (base 10). Every digit (0-9) defines a value that depends on where it is. If a position does not have a value you must fill it with a zero. On the other hand, the Roman numerals are not position based but merely "value based", since if a certain symbol exists it always describes the same value. So where integers have more digits for larger numbers, Roman numerals sometimes have fewer characters for larger numbers. For example, 1 < 10 < 300 < 499 < 500 < 999 < 1000, compared with `I` < `X` < `CCC` < `CDXCIX` < `D` < `CMXCIX` < `M`.

Trivia: The longest valid Roman numeral is `MMMDCCCLXXXVIII` which equals 3888 and consists of 15 characters. 