Looking for good programming challenges?

Use the search below to find our solutions for selected questions!

SimpleUTF8 reverse challenge

Sharing is caring!

Problem statement
Consider the SimpleUTF8 format that consists of characters of the following length:
– 1 byte which have the format 0XXXXXXX
– 2 bytes which have the format 110XXXXX 10XXXXXX
– 3 bytes which have the format 1110XXXX 10XXXXXX 10XXXXXX

You will receive a byte[] str. Your task is to reverse the array.

Sample input

0XXXXXXX 0YYYYYYY 110XXXXX 10XXXXXX 1110XXXX 10XXXXXX 10XXXXXX

Sample output

1110XXXX 10XXXXXX 10XXXXXX 110XXXXX 10XXXXXX 0YYYYYYY 0XXXXXXX

Solution
A solution, similar to the reverse words in a string problem, is to implement a helper function reverse(int start, int end, byte[] array) that reverses array[start:end]. Then we can do the following:
Step 1:
reverse(0, str.length, str) to reverse the entire byte array:

0XXXXXXX 0YYYYYYY 110XXXXX 10XXXXXX 1110XXXX 10XXXXXX 10XXXXXX

becomes

10XXXXXX 10XXXXXX 1110XXXX 10XXXXXX 110XXXXX 0YYYYYYY 0XXXXXXX  

Step 2:
Iterate over the (reversed)array using index i and check the prefixes:
– If byte starts with 0 we know that the current byte builds up a character of length 1 byte. Increment i: i = i + 1.
– If byte starts with 10 we know that the current byte belongs to either a 2-byte or 3-byte character. Note its index with start. We need to check the next byte. If it starts with 110 we know that we have a 2-byte character. Call reverse(start,i+1,str) to correct the character. Increment i: i = i + 2. Else, if the next byte also starts with 10 we know that we have a 3-byte character. Call reverse(start,i+2,str) to correct the character. Increment i: i = i + 3.

public class SimpleUTF8 {
    private static void reverse(int start, int end, byte[] array) {
        byte tmp;
        int i = start;
        int j = end;
        while (i < j) {
            tmp = array[i];
            array[i] = array[j];
            array[j] = tmp;
            i++;
            j--;
        }
    }

    private static int prefix(byte b) {
        if ((b & 0xff) >> 7 == 0) {
            return 0; //0
        } else if ((b & 0xff) >> 6 == 2) {
            return 2; //10
        } else if ((b & 0xff) >> 5 == 6) {
            return 6;//110
        }
        return 14;//1110
    }

    public static void main(String[] args) {
        //0XXXXXXX 0YYYYYYY 110XXXXX 10XXXXXX 1110XXXX 10XXXXXX 10XXXXXX
        byte[] str = new byte[]{(byte) 117, (byte) 127, (byte) 202, (byte) 128, (byte) 232, (byte) 136, (byte) 160};

        System.out.println("Input: ");
        for (byte b : str) {
            System.out.print(Integer.toBinaryString(b & 255 | 256).substring(1) + " ");
        }

        System.out.println("\nReversed:");

        // Reverse entire string
        reverse(0, str.length - 1, str);

        for (byte b : str) {
            System.out.print(Integer.toBinaryString(b & 255 | 256).substring(1) + " ");
        }

        // Correct string
        int start = 0;
        int i = 0;
        while (i < str.length) {
            int prefix = prefix(str[i]);
            if (prefix == 0)
                i++;
            else {
                start = i;
                i++;
                int nextPrefix = prefix(str[i]);
                if (nextPrefix == 6) { // 110
                    reverse(start, i, str);

                } else {
                    reverse(start, i + 1, str);
                    i++;
                }
                i++;
            }
        }


        System.out.println("\nCorrected reversed:");
        for (byte b : str) {
            System.out.print(Integer.toBinaryString(b & 255 | 256).substring(1) + " ");
        }
    }
}