
How many bytes of memory is a tweet?

Developer https://www.devze.com 2023-03-06 07:55 Source: web

140 characters. How much memory would it take up?

I'm trying to calculate how many tweets my EC2 Large instance's MongoDB can hold.


Twitter uses UTF-8 encoded messages.

UTF-8 code points can be up to four octets long, making the maximum message size 140 × 4 = 560 8-bit bytes.
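To check the per-character sizes this estimate relies on, the snippet below encodes one sample character from each UTF-8 length class with Python's built-in `str.encode`. The helper name `utf8_len` and the sample characters are illustrative choices, not anything from the original answer.

```python
def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# One sample per UTF-8 length class: 1, 2, 3 and 4 octets.
for ch in ["a", "\u00e9", "\u2122", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")

# The highest code point UTF-8 can encode (U+10FFFF) still needs only 4 octets,
# so 4 bytes per character is the worst case for a tweet.
print("U+10FFFF:", utf8_len(chr(0x10FFFF)), "byte(s)")
print("worst-case 140-character tweet:", 140 * 4, "bytes")
```

This gives 1, 2, 3 and 4 bytes for `a`, `é`, `™` and the emoji respectively, which is where the 4-byte worst case comes from.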

This is, of course, just for the raw messages, excluding storage overhead, indexing and other storage-related padding.

Edit: Twitter successfully let me post this message:

™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™™

Yes, that's 140 trademark symbols, which take three octets each in UTF-8 (420 bytes in total).
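The trademark-symbol tweet can be checked directly. This minimal sketch builds a string of 140 copies of U+2122 and compares its length in code points with its length in UTF-8 bytes:

```python
tweet = "\u2122" * 140  # 140 copies of U+2122 TRADE MARK SIGN

print(len(tweet))                  # 140 characters (code points)
print(len(tweet.encode("utf-8")))  # 420 bytes in UTF-8
print("\u2122".encode("utf-8"))    # b'\xe2\x84\xa2': three octets per symbol
```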


Back in September, an engineer at Twitter gave a presentation that suggested it's about 200 bytes per tweet.

Of course you still have to account for overhead for your own metadata and the database itself, but 200 bytes/record is probably a good place to start.
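To turn the ~200-byte figure into a rough tweet count for the original question, you can divide a storage budget by the per-tweet size plus overhead. This is a back-of-the-envelope sketch. The function name is hypothetical, and both the 4 GiB budget and the 300-byte overhead in the example are made-up placeholders. Measure real document and index sizes on your own MongoDB deployment instead of trusting these numbers.

```python
def estimate_tweet_capacity(
    available_bytes: int,
    bytes_per_tweet: int = 200,
    overhead_per_tweet: int = 0,
) -> int:
    """Rough number of tweets that fit in `available_bytes`.

    bytes_per_tweet: average stored size of one tweet (~200 bytes, per the
        presentation mentioned above).
    overhead_per_tweet: hypothetical allowance for your own metadata, indexes
        and database bookkeeping; measure this on real data.
    """
    per_tweet = bytes_per_tweet + overhead_per_tweet
    if per_tweet <= 0:
        raise ValueError("per-tweet size must be positive")
    return available_bytes // per_tweet


GIB = 1024 ** 3
# Hypothetical budget: 4 GiB usable for data, 300 bytes of overhead per tweet.
print(estimate_tweet_capacity(4 * GIB, overhead_per_tweet=300))
```

Integer division is used because only whole tweets count; the result is an order-of-magnitude estimate, not a guarantee.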


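Typically it's two bytes per character if you're storing Unicode as UTF-16, so that would mean about 280 bytes per tweet, or more if it contains characters outside the Basic Multilingual Plane, which take four bytes each in UTF-16.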
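"Two bytes per character" is the UTF-16 figure for characters in the Basic Multilingual Plane. UTF-8 sizes vary with the character. The comparison below encodes 140-character strings of different kinds in both encodings. It uses `utf-16-le` so Python does not prepend a 2-byte byte-order mark, which plain `utf-16` would add.

```python
def encoded_sizes(text: str) -> dict:
    """Byte size of `text` in UTF-8 and UTF-16 (little-endian, no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("a" * 140))           # {'utf-8': 140, 'utf-16': 280}  ASCII
print(encoded_sizes("\u00e9" * 140))      # {'utf-8': 280, 'utf-16': 280}  é
print(encoded_sizes("\u2122" * 140))      # {'utf-8': 420, 'utf-16': 280}  ™
print(encoded_sizes("\U0001F600" * 140))  # {'utf-8': 560, 'utf-16': 560}  emoji (non-BMP)
```

For mostly-ASCII tweets UTF-8 is the smaller encoding; for BMP characters above U+07FF UTF-16 wins; outside the BMP both need four bytes per character.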


Probably 284 bytes in memory (a 4-byte length prefix + length × 2, assuming a UTF-16 string type). Inside the DB I can't say, but probably about 280 if the DB uses UTF-8; add some bytes of overhead for metadata, etc.
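The 284-byte figure assumes a string stored as a 4-byte length followed by two bytes per character. The answer does not name a runtime, so this sketch only reproduces that arithmetic with a hypothetical layout built from `struct.pack`. It is not how any specific language actually lays out its string objects, and real runtimes may add further overhead on top of this.

```python
import struct


def length_prefixed_utf16(text: str) -> bytes:
    """Hypothetical layout: 4-byte little-endian length, then UTF-16 code units."""
    payload = text.encode("utf-16-le")
    code_units = len(payload) // 2  # each UTF-16 code unit is 2 bytes
    return struct.pack("<I", code_units) + payload


blob = length_prefixed_utf16("x" * 140)
print(len(blob))                         # 284 = 4 + 140 * 2
print(struct.unpack("<I", blob[:4])[0])  # 140, the stored length prefix
```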


Potentially of interest: Anatomy of a Twitter Status Object
http://mehack.com/map-of-a-twitter-status-object

Also, more about Twitter's character encoding:
http://dev.twitter.com/pages/counting_characters
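The counting_characters page explains how Twitter decides what "140 characters" means. Assuming the rule is a count of Unicode code points after NFC normalization, a minimal sketch with the standard `unicodedata` module looks like this. The function name `tweet_length` is hypothetical, and the rule itself should be checked against the linked page.

```python
import unicodedata


def tweet_length(text: str) -> int:
    """Count code points after NFC normalization (assumed counting rule)."""
    return len(unicodedata.normalize("NFC", text))


decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))                 # 5 code points as typed
print(tweet_length(decomposed))        # 4 after NFC composes e + accent into U+00E9
print(len(decomposed.encode("utf-8"))) # 6 bytes before normalization
print(len(composed.encode("utf-8")))   # 5 bytes after normalization
```

The character count and the byte count can differ even for the same visible text, which is why byte estimates need to be based on the encoded form.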


It's technically stored as UTF-8, but in practice the slide deck from a Twitter engineer at http://www.slideshare.net/raffikrikorian/twitter-by-the-numbers gives the real figure:

140 characters, ~200 bytes

