
A fluffy experiment in analyzing emoji use in tweets

Lauren Ancona @laurenancona
Christopher Tufts @devlintufts


  • May 9 - July 11, 2015
  • 28,799 location-enabled tweets
  • 7,363 unique users

Tweet Cleaning

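# ds: tweets (one row each, text in ds$text)
# emoticons: emoji lookup table (one row per emoji, pattern in emoticons$bytes)
emoji.frequency <- matrix(NA, nrow = nrow(ds), ncol = nrow(emoticons))
for (i in seq_len(nrow(emoticons))) {
  # regexpr() gives the position of the first match, or -1 if none
  emoji.frequency[, i] <- regexpr(emoticons$bytes[i], ds$text, useBytes = TRUE)
}
# Number of distinct emoji found in each tweet
emoji.per.tweet <- rowSums(emoji.frequency > -1)
emoji.indexes <- which(emoji.per.tweet > 0)

# Long table: one row per (tweet, emoji) pair
emoji.ds <- NULL
for (i in emoji.indexes) {
  valid.cols <- which(emoji.frequency[i, ] > -1)
  for (j in valid.cols) {
    emoji.ds <- rbind(cbind(ds[i, ], emoticons[j, ]), emoji.ds)
  }
}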
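
The nested loops above grow `emoji.ds` one row at a time with `rbind()`, which slows down as the table grows. Below is a minimal vectorized sketch of the same detection step on toy data. It assumes emoji are matched as literal UTF-8 characters (a hypothetical `char` column, searched with `fixed = TRUE`) rather than through the `bytes` patterns used above; the tweets and lookup table are made up for illustration.

```r
# Toy tweets and emoji lookup table (hypothetical data for illustration)
ds <- data.frame(
  id   = 1:4,
  text = c("so funny \U0001F602\U0001F602",
           "long day \U0001F629",
           "no emoji here",
           "\U0001F60D and \U0001F602"),
  stringsAsFactors = FALSE
)
emoticons <- data.frame(
  char        = c("\U0001F602", "\U0001F629", "\U0001F62D", "\U0001F60D"),
  unicode     = c("U+1F602", "U+1F629", "U+1F62D", "U+1F60D"),
  description = c("face with tears of joy", "weary face",
                  "loudly crying face", "smiling face with heart-shaped eyes"),
  stringsAsFactors = FALSE
)

# Tweet-by-emoji logical matrix: TRUE if the emoji occurs at least once
present <- vapply(seq_len(nrow(emoticons)),
                  function(j) grepl(emoticons$char[j], ds$text, fixed = TRUE),
                  logical(nrow(ds)))
present <- matrix(present, nrow = nrow(ds))  # keep matrix shape even for 1 tweet

# Number of distinct emoji per tweet
emoji.per.tweet <- rowSums(present)

# Collect every (tweet row, emoji column) hit at once, then a single cbind()
hits <- which(present, arr.ind = TRUE)
emoji.ds <- cbind(ds[hits[, "row"], ], emoticons[hits[, "col"], ])
rownames(emoji.ds) <- NULL
```

Like the `regexpr()` loop, this records each emoji at most once per tweet, so the first toy tweet contributes a single 😂 row even though it contains two.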


  • 7,131 tweets (~25%) contained emoji
  • 590 unique emoji
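
The headline figures follow from the long table of (tweet, emoji) pairs. A minimal sketch, assuming hypothetical `id` and `unicode` columns and a made-up four-tweet table; the last line checks the slide's share, assuming the 7,131 figure counts the tweets that contained emoji (which matches the ~25%).

```r
# Summary statistics from a long table with one row per (tweet, emoji) pair.
# Column names `id` (tweet id) and `unicode` are hypothetical.
summarize_emoji <- function(emoji.ds, n.tweets) {
  n.with.emoji <- length(unique(emoji.ds$id))
  list(
    tweets.with.emoji = n.with.emoji,
    share             = n.with.emoji / n.tweets,
    unique.emoji      = length(unique(emoji.ds$unicode))
  )
}

# Toy table: 3 of 4 tweets contain emoji, 3 distinct emoji overall
toy <- data.frame(
  id      = c(1, 2, 4, 4),
  unicode = c("U+1F602", "U+1F629", "U+1F602", "U+1F60D"),
  stringsAsFactors = FALSE
)
stats <- summarize_emoji(toy, n.tweets = 4)

# Share reported on the slide: 7,131 of 28,799 tweets, as a percentage
slide.share <- round(100 * 7131 / 28799, 1)  # 24.8
```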

Top Emoji

Count   Emoji   Unicode   Description
1,085   😂      U+1F602   face with tears of joy
519     😩      U+1F629   weary face
430     😭      U+1F62D   loudly crying face
402     😍      U+1F60D   smiling face with heart-shaped eyes
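
A ranking like this can be produced by counting rows per emoji in the long table of (tweet, emoji) pairs. A minimal sketch with toy data and hypothetical `unicode`/`description` columns; it is not necessarily how the counts above were computed.

```r
# Rank emoji by how many (tweet, emoji) rows they appear in.
# Columns `unicode` and `description` are hypothetical.
top_emoji <- function(emoji.ds, n = 4) {
  counts <- sort(table(emoji.ds$unicode), decreasing = TRUE)
  out <- data.frame(
    count   = as.integer(counts),
    unicode = names(counts),
    stringsAsFactors = FALSE
  )
  # Attach the description from the first row carrying each code point
  out$description <- emoji.ds$description[match(out$unicode, emoji.ds$unicode)]
  head(out, n)
}

# Toy long table (made-up data)
toy <- data.frame(
  unicode     = c("U+1F602", "U+1F629", "U+1F602",
                  "U+1F60D", "U+1F602", "U+1F629"),
  description = c("face with tears of joy", "weary face",
                  "face with tears of joy",
                  "smiling face with heart-shaped eyes",
                  "face with tears of joy", "weary face"),
  stringsAsFactors = FALSE
)
top <- top_emoji(toy, n = 2)
```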

Possibilities

  • Geo-analysis with available open data
      - True on-the-fly rendering as map markers (JS is hard)
  • Lexical research
      - gender, age
      - psychological characteristics

Caveats

  • Only an estimated 1% of tweets have location data
  • Emoji meaning vs. use
  • Inconsistent rendering across platforms

Resources

  • Emoji Cheat Sheet
  • Tim Whitlock: Emoji Utils
  • CartoDB (Torque.js)