public final class Utf8Safe extends Utf8
There are several variants of UTF-8. The one implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1, which mandates the rejection of "overlong" byte sequences as well as rejection of 3-byte surrogate codepoint byte sequences. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate codepoint byte sequences.
The byte sequences considered valid by this class are exactly those that can be roundtrip converted to Strings and back to bytes using the UTF-8 charset, without loss:
Arrays.equals(bytes, new String(bytes, Internal.UTF_8).getBytes(Internal.UTF_8))
See the Unicode Standard, Table 3-6 (UTF-8 Bit Distribution) and Table 3-7 (Well-Formed UTF-8 Byte Sequences).
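The round-trip criterion above can be tried with the JDK alone. The following is a minimal sketch, not the implementation used by this class: `RoundTripCheck` and `roundTrips` are hypothetical names, and `StandardCharsets.UTF_8` stands in for `Internal.UTF_8` on the assumption that both refer to the same charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Hypothetical helper: true when the bytes survive a decode/encode
    // round trip through the JDK's UTF-8 charset unchanged.
    public static boolean roundTrips(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains one 2-byte character (U+00E9).
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Overlong 2-byte encoding of U+0000.
        byte[] overlong = {(byte) 0xC0, (byte) 0x80};
        // 3-byte encoding of the surrogate code point U+D800.
        byte[] surrogate = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};

        System.out.println("valid:     " + roundTrips(valid));
        System.out.println("overlong:  " + roundTrips(overlong));
        System.out.println("surrogate: " + roundTrips(surrogate));
    }
}
```

The overlong and surrogate inputs do not round-trip: malformed input is replaced during decoding, and an unpaired surrogate cannot be encoded back to its original bytes, so the re-encoded array differs from the input in both cases.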
| Constructor and Description |
|---|
| Utf8Safe() |
| Modifier and Type | Method and Description |
|---|---|
| String | decodeUtf8(ByteBuffer buffer, int offset, int length): Decodes the given UTF-8 portion of the ByteBuffer into a String. |
| static String | decodeUtf8Array(byte[] bytes, int index, int size) |
| static String | decodeUtf8Buffer(ByteBuffer buffer, int offset, int length) |
| int | encodedLength(CharSequence in): Returns the number of bytes in the UTF-8-encoded form of sequence. |
| void | encodeUtf8(CharSequence in, ByteBuffer out): Encodes the given characters to the target ByteBuffer using UTF-8 encoding. |
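To illustrate what the encodedLength summary describes, here is a minimal sketch of counting UTF-8 bytes without allocating a byte array. It is not this class's implementation: `EncodedLengthSketch` and `utf8Length` are hypothetical names, and throwing on an unpaired surrogate is an assumption made for the sketch, since the summary does not say how such input is handled.

```java
import java.nio.charset.StandardCharsets;

public class EncodedLengthSketch {
    // Hypothetical helper: number of bytes the UTF-8 encoding of 'in' would occupy.
    public static int utf8Length(CharSequence in) {
        int bytes = 0;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c < 0x80) {
                bytes += 1;                      // U+0000..U+007F
            } else if (c < 0x800) {
                bytes += 2;                      // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                bytes += 4;                      // surrogate pair: U+10000..U+10FFFF
                i++;                             // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                // Assumption for this sketch: unpaired surrogates are rejected.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3;                      // U+0800..U+FFFF, excluding surrogates
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // ASCII, 2-byte, 3-byte, and surrogate-pair characters in one string.
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For well-formed strings, both printed values agree. The counting loop reads each char once and never materializes the encoded bytes, which is the kind of saving over getBytes that the method description refers to.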
Methods inherited from class Utf8: getDefault, setDefault

public static String decodeUtf8Array(byte[] bytes, int index, int size)

public static String decodeUtf8Buffer(ByteBuffer buffer, int offset, int length)

public int encodedLength(CharSequence in)
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Corresponds to encodedLength in class Utf8.

public String decodeUtf8(ByteBuffer buffer, int offset, int length) throws IllegalArgumentException
Decodes the given UTF-8 portion of the ByteBuffer into a String.
Corresponds to decodeUtf8 in class Utf8.
Throws: IllegalArgumentException - if the input is not valid UTF-8.

public void encodeUtf8(CharSequence in, ByteBuffer out)
Encodes the given characters to the target ByteBuffer using UTF-8 encoding.
Selects an optimal algorithm based on the type of ByteBuffer (i.e. heap or direct) and the capabilities of the platform.
Corresponds to encodeUtf8 in class Utf8.
Parameters:
in - the source string to be encoded
out - the target buffer to receive the encoded string.

Copyright © 2020. All rights reserved.