Encoding

Encoding schemes

Character encoding refers to the scheme used to represent characters. The default encoding for SMS is GSM 7, which uses 7 bits to represent a single character. The GSM character set contains most standard text and symbol characters. Unicode, sent over SMS as UCS-2, is an alternative encoding scheme that uses 16 bits to represent a single character. The Unicode character set includes emoji, smart quotes, and certain special characters that the GSM character set lacks.

Calculating character count

A single-segment message can contain 140 bytes. At 8 bits per byte, a single segment holds up to 1,120 bits. A GSM-encoded message can therefore fit 160 characters in a single segment (1,120 ÷ 7), while a Unicode message can fit only 70 (1,120 ÷ 16). Messages with more characters are split into multiple segments, and each of those segments holds slightly fewer characters (153 GSM or 67 Unicode) because part of every segment is reserved for the data used to reassemble the message. For more details about how this works, please reference our concatenation guide.

The table below shows how the number of segments changes with the number of characters and the encoding used.

Number of Segments	Maximum GSM Characters	Maximum Unicode Characters
1	160	70
2	306	134
3	459	201
4	612	268
5	765	335
6	918	402
7	1071	469
8	1224	536
9	1377	603
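
The arithmetic behind the table can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, and the 6-byte per-segment header for multi-segment messages is an assumption chosen because it matches the 153 and 67 characters per segment implied by the table. The counts are in encoded units, so any character that takes more than one 7-bit or 16-bit unit would need to be counted accordingly.

```python
import math

SEGMENT_BITS = 140 * 8          # 1,120 bits in one segment
HEADER_BITS = 6 * 8             # assumed reassembly header in multi-segment messages
BITS_PER_CHAR = {"gsm7": 7, "ucs2": 16}


def chars_per_segment(encoding, multi_segment):
    """Characters that fit in one segment for the given encoding."""
    usable_bits = SEGMENT_BITS - (HEADER_BITS if multi_segment else 0)
    return usable_bits // BITS_PER_CHAR[encoding]


def max_characters(segments, encoding):
    """Largest message, in characters, that fits in `segments` segments."""
    return segments * chars_per_segment(encoding, multi_segment=segments > 1)


def segments_needed(char_count, encoding):
    """Number of segments required for a message of `char_count` characters."""
    if char_count <= chars_per_segment(encoding, multi_segment=False):
        return 1
    return math.ceil(char_count / chars_per_segment(encoding, multi_segment=True))
```

For example, max_characters(3, "gsm7") gives 459 and segments_needed(71, "ucs2") gives 2, in line with the table above.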

Managing message length and encoding scheme

A message's encoding is determined by the characters it contains: a single character outside the GSM character set causes the whole message to be encoded as Unicode (UCS-2). When you're trying to keep the number of segments low, craft your message to avoid Unicode characters. If you think you've avoided them but your message's encoding is still being set to UCS-2, check for these commonly missed Unicode characters:

Smart (curly) quotes, such as “ ” ‘ ’
Non-GSM whitespace, such as a non-breaking space or a tab
Certain accented characters, such as á or ê

These characters are sometimes introduced when you copy and paste text from a text editor. To be confident about the encoding and length of your message before you send, you can use this handy tool.
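
If you would rather check a message in your own code, the Python sketch below flags characters that fall outside the GSM character set. It is a rough approximation, not a description of how any particular tool or carrier decides: the function names are hypothetical, and the character tables were transcribed by hand from the GSM 7-bit default alphabet and its extension table, so verify them before relying on the result.

```python
import unicodedata

# GSM 7-bit default alphabet (hand-transcribed; treat as an assumption).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: still GSM, but each takes two 7-bit units.
GSM_EXTENDED = set("^{}\\[~]|€\f")


def find_non_gsm(text):
    """Return (index, character, Unicode name) for every non-GSM character."""
    return [
        (i, ch, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ch not in GSM_BASIC and ch not in GSM_EXTENDED
    ]


def detect_encoding(text):
    """Guess the encoding a message would need: 'GSM-7' or 'UCS-2'."""
    return "UCS-2" if find_non_gsm(text) else "GSM-7"
```

For example, find_non_gsm("Don’t miss out") returns [(3, '’', 'RIGHT SINGLE QUOTATION MARK')], which points at the smart apostrophe that would push the message to UCS-2. Replacing it with a straight apostrophe makes detect_encoding return 'GSM-7'.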