Discussion:
A4Q1: why is this one invalid?
JIAN NAN CAI
2011-06-28 15:18:51 UTC
I just got this from the sample UTF-8 generator and don't know why it is
invalid:

0xe0, 0x93, 0x90, '\n', // invalid, value encoded in wrong range


thx
Terry Anderson
2011-06-28 15:50:43 UTC
If you convert these hex values to binary, it will be a lot easier to see
why a specific case is/is not a valid encoding.
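
To make the hex-to-binary step concrete, here is a minimal C++ sketch that renders one byte as eight binary digits. The helper name `toBinary` is made up for illustration and is not part of the assignment code.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper (not from the assignment): render one byte as
// eight binary digits, most significant bit first.
std::string toBinary(unsigned char byte) {
    return std::bitset<8>(byte).to_string();
}
```

For the bytes in question, `toBinary(0xe0)`, `toBinary(0x93)`, and `toBinary(0x90)` give 11100000, 10010011, and 10010000, which makes the leading check bits (1110 and 10) easy to pick out.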
--
Terry Anderson
CS 246 Instructor
Jiannan CAI
2011-06-28 17:56:22 UTC
Post by Terry Anderson
If you convert these hex values to binary, it will be a lot easier to
see why a specific case is/is not a valid encoding.
11100000 10010011 10010000

I get this result. It is of type 3 (a three-byte encoding) and it is followed
by enough 10xxxxxx continuation bytes. So how is it invalid?
Terry Anderson
2011-06-28 18:17:00 UTC
The third row of the chart says that for encodings using three bytes, the
character values must be between 0x800 and 0xffff. The data bits of the
encoding you mentioned (the x positions in 1110xxxx 10xxxxxx 10xxxxxx) are:

0000 0100 1101 0000 = 0x4d0

Since 0x4d0 < 0x800, this is not a valid encoding. In other words, it's
not enough to just verify that the check bits are correct; you must also
verify that the encoding represents a value in the correct range.
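
As a rough illustration of the two checks described above, the following C++ sketch extracts the data bits of a three-byte encoding and then tests the range. The function names `decode3`, `checkBitsOk3`, and `valid3` are hypothetical and are not taken from the assignment; the bounds 0x800 and 0xffff come from the chart row quoted above.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of the assignment code.

// Collect the 16 data bits from the pattern 1110xxxx 10xxxxxx 10xxxxxx.
constexpr std::uint32_t decode3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12)
         | (static_cast<std::uint32_t>(b1 & 0x3F) << 6)
         |  static_cast<std::uint32_t>(b2 & 0x3F);
}

// Check bits only: lead byte starts with 1110, the other two start with 10.
constexpr bool checkBitsOk3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    return (b0 & 0xF0) == 0xE0 && (b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80;
}

// Valid only if the check bits match AND the value lies in the three-byte
// range from the chart. The upper bound always holds for 16 data bits; it
// is kept here to mirror the chart.
constexpr bool valid3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    return checkBitsOk3(b0, b1, b2)
        && decode3(b0, b1, b2) >= 0x800
        && decode3(b0, b1, b2) <= 0xFFFF;
}
```

With the bytes from this thread, `checkBitsOk3(0xe0, 0x93, 0x90)` is true, but `decode3` yields 0x4d0, so `valid3` is false. By contrast, 0xe0, 0xa0, 0x80 decodes to exactly 0x800 and passes.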
--
Terry Anderson
CS 246 Instructor
Jiannan CAI
2011-06-28 18:16:27 UTC
Post by Terry Anderson
If you convert these hex values to binary, it will be a lot easier to
see why a specific case is/is not a valid encoding.
NVM, I get your point now! thx