Encoding
Learn about character encoding and how it affects your messages.
Encoding schemes
Character encoding is the scheme used to represent characters as bits. The default encoding for SMS is GSM-7, which uses 7 bits per character. The GSM-7 character set covers the standard Latin letters, digits, common punctuation, and a limited set of accented letters and symbols. Unicode (sent as UCS-2) is the alternative encoding scheme, and it uses 16 bits per character. Its much larger character set includes emoji, smart quotes, and other special characters.
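You can check a message against the GSM-7 character set before sending it. The minimal Python sketch below assumes the GSM 03.38 default alphabet plus its extension table. It also assumes that a single character outside that set makes the whole message UCS-2. `detect_encoding` is a hypothetical helper, not part of any API described in this guide.

```python
# Assumption: GSM 03.38 default alphabet (the ESC control code 0x1B is omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Assumption: GSM 03.38 extension table (form feed, ^ { } \ [ ~ ] | €).
GSM7_EXTENSION = set("\f^{}\\[~]|€")


def detect_encoding(text: str) -> str:
    """Hypothetical helper: return "GSM-7" if every character fits, else "UCS-2"."""
    for ch in text:
        if ch not in GSM7_BASIC and ch not in GSM7_EXTENSION:
            # One unsupported character is enough to switch the whole message.
            return "UCS-2"
    return "GSM-7"


print(detect_encoding("Café: €5 [approx]"))  # GSM-7
print(detect_encoding("It\u2019s here"))     # UCS-2 (smart apostrophe)
```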
Calculating character count
A single segment can carry 140 bytes of message data. At 8 bits per byte, that is 1,120 bits, so a single segment holds up to 160 GSM-7 characters (1,120 ÷ 7) or 70 Unicode characters (1,120 ÷ 16). Longer messages are split into multiple segments. Each of those segments reserves 6 bytes for a header that lets the recipient's phone reassemble the message, which leaves room for 153 GSM-7 characters or 67 Unicode characters per segment. For more details about how this works, see our concatenation guide.
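The per-segment capacities follow from simple arithmetic, which the minimal Python sketch below reproduces. The 6-byte concatenation header for multi-segment messages is an assumption (it is the common header with an 8-bit reference number), and `chars_per_segment` is a hypothetical name.

```python
SEGMENT_BYTES = 140  # payload of one SMS segment
UDH_BYTES = 6        # assumed concatenation header in each segment of a split message


def chars_per_segment(bits_per_char: int, multipart: bool) -> int:
    """Characters that fit in one segment for a given character width."""
    payload_bytes = SEGMENT_BYTES - (UDH_BYTES if multipart else 0)
    return payload_bytes * 8 // bits_per_char


for name, bits in (("GSM-7", 7), ("UCS-2", 16)):
    # Prints the single-segment limit, then the per-segment limit of a split message.
    print(name, chars_per_segment(bits, False), chars_per_segment(bits, True))
# GSM-7 160 153
# UCS-2 70 67
```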
The table below shows the maximum number of characters a message can contain for each number of segments and encoding type.
Number of Segments | Maximum GSM Characters | Maximum Unicode Characters |
---|---|---|
1 | 160 | 70 |
2 | 306 | 134 |
3 | 459 | 201 |
4 | 612 | 268 |
5 | 765 | 335 |
6 | 918 | 402 |
7 | 1071 | 469 |
8 | 1224 | 536 |
9 | 1377 | 603 |
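The table follows a simple pattern: one segment holds 160 GSM or 70 Unicode characters, and each segment of a longer message holds 153 or 67 (for example, 306 = 2 × 153 and 134 = 2 × 67). The minimal Python sketch below computes a segment count from that pattern; the helper names are hypothetical. It also assumes two counting rules not covered above: GSM-7 extension characters such as € and [ take two septets, and characters outside the Basic Multilingual Plane (most emoji) count as two 16-bit units in UCS-2.

```python
import math

# Per-segment limits taken from the table above.
SINGLE_SEGMENT_LIMIT = {"GSM-7": 160, "UCS-2": 70}
MULTI_SEGMENT_LIMIT = {"GSM-7": 153, "UCS-2": 67}

# GSM-7 extension table characters (sent as an escape code plus a character).
GSM7_EXTENSION = set("\f^{}\\[~]|€")


def encoded_length(text: str, encoding: str) -> int:
    """Count septets (GSM-7) or 16-bit units (UCS-2), not Python characters."""
    if encoding == "GSM-7":
        # Assumption: each extension-table character costs two septets.
        return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)
    # Assumption: UCS-2 length is counted in UTF-16 code units, so emoji count as two.
    return len(text.encode("utf-16-le")) // 2


def segment_count(text: str, encoding: str) -> int:
    """Number of segments needed, using the single- and multi-segment limits."""
    length = encoded_length(text, encoding)
    if length <= SINGLE_SEGMENT_LIMIT[encoding]:
        return 1
    return math.ceil(length / MULTI_SEGMENT_LIMIT[encoding])


print(segment_count("a" * 160, "GSM-7"))           # 1
print(segment_count("a" * 161, "GSM-7"))           # 2
print(segment_count("\U0001F600" * 35, "UCS-2"))   # 1 (35 emoji = 70 units)
```

This sketch does not handle the rule that a GSM-7 escape sequence must not be split across two segments, so a message right at a segment boundary may need one more segment in practice.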
Managing message length and encoding scheme
Message encoding is determined by the characters in your message: if every character belongs to the GSM-7 character set, the message is sent as GSM-7, but a single character outside that set causes the entire message to be sent as UCS-2. To keep the number of segments low, write your message without Unicode-only characters. If you think you have avoided them but your message is still being encoded as UCS-2, check for these commonly missed characters:
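- Smart quotes, such as ‘ ’ “ ”
- Non-GSM whitespace, such as non-breaking spaces or tabs
- Certain accented characters, such as á or ê (others, such as é and ñ, are part of GSM-7)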
These characters are often introduced when you copy and paste text from a word processor or other text editor. To confirm the encoding and length of your message before you send it, you can use this tool.
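If you prefer to clean text in code, the minimal Python sketch below replaces the three categories of commonly missed characters with GSM-7 friendly equivalents. `make_gsm_friendly` is a hypothetical helper, and the substitutions are assumptions about what is acceptable for your content: removing an accent changes the text (for example, "á" becomes "a"), so review the result before sending. Other characters, such as emoji or long dashes, are left unchanged and will still force UCS-2.

```python
import unicodedata

# Typographic ("smart") quotes mapped to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u201f": '"',
}

# Accented letters that GSM-7 already supports, so they are kept as-is.
GSM7_ACCENTED = set("àèéùìòÇØøÅåÄÖÑÜäöñüÉÆæß")


def make_gsm_friendly(text: str) -> str:
    """Hypothetical helper: replace smart quotes, odd whitespace, and unsupported accents."""
    out = []
    for ch in text:
        if ch in SMART_QUOTES:
            out.append(SMART_QUOTES[ch])
        elif ch.isspace() and ch not in " \n\r":
            # Non-breaking spaces, tabs, and other Unicode whitespace become a plain space.
            out.append(" ")
        elif ch in GSM7_ACCENTED:
            out.append(ch)
        else:
            # Decompose the character and drop combining accent marks, e.g. "á" -> "a".
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out.append(base or ch)
    return "".join(out)


print(make_gsm_friendly("It\u2019s \u201cgreat\u201d"))  # It's "great"
print(make_gsm_friendly("caf\u00e9\u00a0\u00e1"))        # café a
```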