I'm writing code for handling SMS PDUs based on the ETSI GSM specifications, and there is one thing I need to ask about. A PDU contains a User Data Length (UDL) field followed by the User Data. According to GSM 03.40, the UDL field is the number of septets of user data when the uncompressed GSM default alphabet is used. However, it also says that when the data is compressed, the UDL is the number of octets of user data.
See the quotes:
If the TP User Data is coded using the GSM 7 bit default alphabet, the TP User Data Length field gives an integer representation of the number of septets within the TP User Data field to follow.
[...]
If the TP User Data is coded using compressed GSM 7 bit default alphabet or compressed 8 bit data or compressed UCS2 [24] data, the TP User Data Length field gives an integer representation of the number of octets after compression within the TP User Data field to follow.
The problem is that when the 7-bit data is compressed and the number of octets of the compressed user data is a multiple of 7, you don't know whether the last 7 bits in the last octet are fill bits or a real character. That is, 7 octets may contain either 7 or 8 7-bit characters when compression is on. When the UDL field is the number of octets, how can you know whether those 7 octets contain 7 or 8 characters? If UDL contained the number of septets, everything would be clear, right? So have I misunderstood the documentation, or does it really work this way?
Could anyone please explain to me how it really works? Thanks in advance!
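The arithmetic behind this concern can be sketched as follows. This is only an illustration of plain 7-bit packing; the function name `octets_for_septets` is made up for the example and does not come from the specification.

```python
def octets_for_septets(septets: int) -> int:
    """Octets needed to hold `septets` 7-bit characters packed back to back."""
    # Total bits are 7 * septets; round up to whole octets.
    return (7 * septets + 7) // 8


# 7 and 8 packed characters occupy the same number of octets,
# so an octet count alone cannot tell the two cases apart.
for n in (7, 8):
    print(n, "septets ->", octets_for_septets(n), "octets")
```

A full 160-character message packs into 140 octets by the same formula.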
As you are already aware, creating a multipart (concatenated) SMS message requires you to add a UDH before your text message. The UDH becomes part of your payload, thus reducing the number of characters you can send per segment.
Because it has become part of your payload, it needs to conform to your payload's data requirement, which is 7-bit. The UDH, however, is 8-bit, which clearly complicates things.
Consider the following UDH as an example (it is the UDH for a concatenated message):
050003000302
- 05 is the length of the UDH (the 5 octets which follow)
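- 00 is the IEI (Information Element Identifier; 00 means concatenated short messages)
- 03 is the IEDL (Information Element Data Length: 3 more octets)
- 00 is a reference (this number must be the same in the UDH of each part of your concatenated message)
- 03 is the total number of messages
- 02 is the current message number.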
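The field breakdown above can be sketched in code. This is a minimal illustration that handles only this one layout (a single concatenation element with an 8-bit reference); the function name `parse_concat_udh` and the dictionary keys are chosen for the example.

```python
def parse_concat_udh(udh_hex: str) -> dict:
    """Parse a UDH holding one concatenation element (IEI 0x00)."""
    data = bytes.fromhex(udh_hex)
    udhl = data[0]                      # length of the UDH, excluding this octet
    if udhl != len(data) - 1:
        raise ValueError("UDHL does not match the number of octets that follow")
    iei, iedl = data[1], data[2]
    if iei != 0x00 or iedl != 3:
        raise ValueError("not a concatenation element with an 8-bit reference")
    reference, total_parts, part_number = data[3:6]
    return {
        "reference": reference,
        "total_parts": total_parts,
        "part_number": part_number,
    }


print(parse_concat_udh("050003000302"))
```

A real parser would loop over several information elements; this sketch assumes exactly one.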
This is 6 octets in total, equating to 48 bits. This is all well and good, but since the UDH is actually part of your SMS message, you have to add fill bits so that the actual message starts on a septet boundary. A septet boundary falls every 7 bits, and 48 is not a multiple of 7, so in this case we have to add 1 more bit to make the UDH 49 bits (7 septets), and then we can add our standard GSM-7 encoded characters.
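The padding rule can be written as a small calculation. This is a sketch with made-up helper names (`udh_fill_bits`, `udh_septets`); the argument is the full UDH size in octets, including the length octet itself.

```python
def udh_fill_bits(udh_total_octets: int) -> int:
    """Fill bits needed after the UDH so the text starts on a septet boundary."""
    return (7 - (udh_total_octets * 8) % 7) % 7


def udh_septets(udh_total_octets: int) -> int:
    """Septets consumed by the UDH plus its fill bits."""
    return (udh_total_octets * 8 + udh_fill_bits(udh_total_octets)) // 7


# The 6-octet concatenation UDH: 48 bits + 1 fill bit = 49 bits = 7 septets.
print(udh_fill_bits(6), udh_septets(6))
# Out of 160 septets per message, that leaves 153 for text.
print(160 - udh_septets(6))
```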
You can read up more about this from Here
So, the thing is that I misunderstood the meaning of the compression bit in the Data Coding Scheme byte. I thought it referred to the 7-bit alphabet packing method (where at least one character is stored within each byte), but it actually refers to Huffman compression.
Therefore, the question above was kind of stupid. Sorry for that :-).
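For reference, here is a rough sketch of how the Data Coding Scheme decides the unit of the UDL field. It assumes the general data coding group of GSM 03.38 (bits 7 and 6 both zero), where bit 5 is the compression flag and bits 3 and 2 select the alphabet; other coding groups are not handled, and the function name `udl_unit` is made up for this example.

```python
def udl_unit(dcs: int) -> str:
    """Return 'septets' or 'octets' for a DCS in the general data coding group."""
    if (dcs & 0xC0) != 0x00:
        raise ValueError("only the general data coding group is handled here")
    compressed = bool(dcs & 0x20)       # bit 5: text is compressed
    alphabet = (dcs >> 2) & 0x03        # 0 = default 7-bit, 1 = 8-bit, 2 = UCS2
    if not compressed and alphabet == 0:
        return "septets"                # packed 7-bit text: UDL counts characters
    return "octets"                     # compressed, 8-bit or UCS2: UDL counts octets


print(udl_unit(0x00))  # uncompressed default alphabet
print(udl_unit(0x20))  # compressed default alphabet
```

With ordinary 7-bit packing the compression flag is clear, so UDL is in septets and the 7-versus-8 character ambiguity does not arise.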