A4Q1 - Reference Executable Discrepancy

Ashif Harji

2011-07-08 04:24:03 UTC

In testing my program, I found a discrepancy between the reference executable
and my program for the value 0x110000, which is encoded as:

11110100 10010000 10000000 10000000

My program detects that this sequence is invalid at the second byte and prints
the last 3 bytes as extra characters. I believe this is correct: after a first
byte of 11110100, no valid UTF8 character can have a 1 in the 4th bit position
of the second byte, since that puts it above the allowed maximum of 0x10FFFF no
matter what the remaining bits are. The reference executable, however, prints
out all four bytes and only then determines that the UTF8 is invalid, printing
no extra bytes.

From the assignment description, I assumed that we were supposed to check
whether a UTF8 character is valid as each byte is read, and that appears to be
true for all Unicode characters less than 0x110000. So is this a problem with
the reference executable?
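The bit-level argument above can be checked with a small sketch. The helper names `decode4` and `first_out_of_range_byte` are hypothetical, and the sketch covers only the upper-bound check for 4-byte sequences. It ignores continuation-byte, overlong, and other rules, and it makes no claim about how the reference executable or the assignment's intended solution work. It shows that F4 90 80 80 decodes to 0x110000 and that the overflow is already certain at byte index 1. The same second byte is fine after a different lead byte: F0 90 80 80 is U+10000.

```python
from typing import Optional


def decode4(seq: bytes) -> int:
    """Assemble the code point from a 4-byte pattern
    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (no validation is done here)."""
    return (((seq[0] & 0x07) << 18) | ((seq[1] & 0x3F) << 12)
            | ((seq[2] & 0x3F) << 6) | (seq[3] & 0x3F))


def first_out_of_range_byte(seq: bytes) -> Optional[int]:
    """Index of the first byte at which a 4-byte sequence is certain to
    exceed U+10FFFF, or None if this upper-bound check never fails."""
    if seq[0] > 0xF4:
        return 0  # no lead byte above F4 can start an in-range sequence
    if seq[0] == 0xF4 and seq[1] > 0x8F:
        return 1  # F4 90 80 80 is already 0x110000; other suffixes only add to it
    return None


seq = bytes([0xF4, 0x90, 0x80, 0x80])
print(hex(decode4(seq)))             # 0x110000
print(first_out_of_range_byte(seq))  # 1
```

For comparison, F4 8F BF BF decodes to 0x10FFFF, the largest valid value, and `first_out_of_range_byte` returns `None` for it.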

The short answer is no. Your output should match the reference
executable.

I've sent a more detailed answer directly to the student.

ashif