[color=red]1. Bit syntax change: Unicode data types were added[/color]
6.16 Bit Syntax Expressions
The types utf8, utf16, and utf32 specify encoding/decoding of the Unicode Transformation Formats UTF-8, UTF-16, and UTF-32, respectively.
When constructing a segment of a utf type, Value must be an integer in one of the ranges 0..16#D7FF, 16#E000..16#FFFD, or 16#10000..16#10FFFF (i.e. a valid Unicode code point). Construction will fail with a badarg exception if Value is outside the allowed ranges. The size of the resulting binary segment depends on the type and/or Value. For utf8, Value will be encoded in 1 through 4 bytes. For utf16, Value will be encoded in 2 or 4 bytes. Finally, for utf32, Value will always be encoded in 4 bytes.
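The size rules and the badarg failure described above can be checked with a small sketch. The module name utf_sizes and the helpers sizes/1 and try_encode/1 are made up for illustration and are not part of the manual.

```erlang
-module(utf_sizes).
-export([sizes/1, try_encode/1]).

%% Return the encoded size in bytes of one code point as
%% {Utf8Size, Utf16Size, Utf32Size}.
sizes(CodePoint) ->
    {byte_size(<<CodePoint/utf8>>),
     byte_size(<<CodePoint/utf16>>),
     byte_size(<<CodePoint/utf32>>)}.

%% Try to build a utf8 segment from Value. Values outside the allowed
%% ranges (for example the surrogate 16#D800) raise badarg.
try_encode(Value) ->
    try <<Value/utf8>> of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.
```

For example, utf_sizes:sizes($a) gives {1,2,4}, utf_sizes:sizes(16#1F600) gives {4,4,4}, and utf_sizes:try_encode(16#D800) gives {error,badarg}.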
When constructing, a literal string may be given followed by one of the UTF types, for example <<"abc"/utf8>>, which is syntactic sugar for <<$a/utf8,$b/utf8,$c/utf8>>.
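A minimal sketch of the string-literal shorthand follows. The module utf_literal and its functions are illustrative names only. The utf16 case assumes the default big-endian byte order, since no endianness specifier is given.

```erlang
-module(utf_literal).
-export([same/0, utf16_abc/0]).

%% The string form and the per-character form build the same binary.
same() ->
    <<"abc"/utf8>> =:= <<$a/utf8, $b/utf8, $c/utf8>>.

%% With utf16, every character of the literal is encoded separately,
%% here as 2 bytes each in big-endian order.
utf16_abc() ->
    <<"abc"/utf16>>.
```

Here utf_literal:same() returns true, and utf_literal:utf16_abc() returns <<0,97,0,98,0,99>>.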
A successful match of a segment of a utf type results in an integer in one of the ranges 0..16#D7FF, 16#E000..16#FFFD, or 16#10000..16#10FFFF (i.e. a valid Unicode code point). The match will fail if the returned value would fall outside those ranges.
A segment of type utf8 will match 1 to 4 bytes in the binary, if the binary at the match position contains a valid UTF-8 sequence. (See RFC-2279 or the Unicode standard.)
A segment of type utf16 may match 2 or 4 bytes in the binary. The match will fail if the binary at the match position does not contain a legal UTF-16 encoding of a Unicode code point. (See RFC-2781 or the Unicode standard.)
A segment of type utf32 may match 4 bytes in the binary in the same way as an integer segment matching 32 bits. The match will fail if the resulting integer is outside the legal ranges mentioned above.
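The matching rules can be tried out with a sketch along these lines. The module utf_match and the functions decode/1 and first_utf16/1 are hypothetical helpers written for this example, not library functions.

```erlang
-module(utf_match).
-export([decode/1, first_utf16/1]).

%% Decode a UTF-8 binary into a list of code points. Each utf8 segment
%% consumes 1 to 4 bytes. Decoding stops at the first position that
%% does not hold a valid UTF-8 sequence.
decode(Bin) ->
    decode(Bin, []).

decode(<<>>, Acc) ->
    {ok, lists:reverse(Acc)};
decode(<<CP/utf8, Rest/binary>>, Acc) ->
    decode(Rest, [CP | Acc]);
decode(Rest, Acc) ->
    {error, lists:reverse(Acc), Rest}.

%% Match a single UTF-16 code point (2 or 4 bytes, big-endian by
%% default) at the start of the binary.
first_utf16(<<CP/utf16, Rest/binary>>) ->
    {CP, Rest};
first_utf16(_) ->
    nomatch.
```

For example, utf_match:decode(<<"h", 16#C3, 16#A9>>) returns {ok,[104,233]}, while utf_match:first_utf16(<<16#D8, 16#3D>>) returns nomatch because a lone high surrogate is not a legal UTF-16 encoding.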
[color=red]2. New BIFs were added, such as binary_to_atom and atom_to_binary.