Starting at 4:28 PM on May 19, 2012, I posted two names on my Sina Weibo account, as well as the Chinese words for “Taiwan independence.” The first name I posted was “Chen Guangcheng” (in English and Chinese), the blind lawyer who escaped house arrest in Shandong province and made his way to the U.S. Embassy in Beijing. The second name was “Bo Xilai” (in English and Chinese), the former Party Secretary of Chongqing who recently fell from power. Less than 14 hours later, I received a message from Sina Weibo’s system administrator informing me that my two posts on “Chen Guangcheng” were “inappropriate” and had been censored. While I can still see the two “Chen Guangcheng” posts on my Sina Weibo account page, no one else can. Surprisingly, my posts on “Bo Xilai” and “Taiwan independence” were not censored.
Herein lies the conundrum with censorship in China. We know that certain topics are censored from blogs hosted in China, Chinese search engines and Weibos. But we don’t know where the line lies. Part of the reason is that the line is constantly moving. Baidu, Sina and Tencent could help identify the line by publishing a list of banned topics or keywords, but they don’t. Rather, they hire “monitoring editors” and rely on self-censorship to ensure that user-generated content does not run afoul of Chinese authorities.
Some computer scientists in academia have tried to make sense of censorship on Sina Weibo by analyzing the data. In March 2012, David Bamman, Brendan O’Connor and Noah Smith at Carnegie Mellon University published a paper entitled “Censorship and deletion practices in Chinese social media” in First Monday. They analyzed 56 million Sina Weibo messages and found that more than 16% had been deleted. King-Wa Fu and Cedric Sam at the University of Hong Kong’s Journalism and Media Studies Centre have built the Weibo Scope Search, which archives deleted posts on Sina Weibo.
For my MIT Media Lab final project, I’ve tried to build on King-Wa Fu and Cedric Sam’s work by analyzing the data collected from the Weibo Scope Search to try to make some sense of Sina Weibo censorship. From its inception on February 1, 2012 through May 20, 2012, the Weibo Scope Search collected 12,032 deleted messages from Sina Weibo. The first thing I did was simply plot all the deleted messages on a timeline from February 1, 2012 to May 20, 2012, and this is what I got:
My findings were consistent with the Carnegie Mellon team’s findings. There are spikes in Sina Weibo censorship as a result of media reports and rumors. During the Carnegie Mellon survey period, from June 27, 2011 to September 30, 2011, a rumor that former President Jiang Zemin had passed away caused a spike in Sina Weibo deletions. From February 1, 2012 to May 20, 2012, the following incidents in China caused the censors employed by Sina Weibo to work overtime:
Interestingly, deletions of Sina Weibo messages tend to hit a low on Saturdays. I’m not too sure why that is, except that maybe censors want to take time off on weekends as well. If you want to maximize the length of time your message will remain on Sina Weibo, the best time to post it is probably after 11 PM on Friday night.
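The Saturday pattern above comes from grouping deletions by day of the week. Here is a minimal sketch of that grouping in Python. The function name and the sample timestamps are made up for illustration and are not taken from the Weibo Scope Search data.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def deletions_by_weekday(deleted_times):
    """Count deletions per weekday from ISO-format timestamp strings."""
    counts = Counter(datetime.fromisoformat(t).weekday() for t in deleted_times)
    # Include every weekday, even those with zero deletions.
    return {WEEKDAYS[i]: counts.get(i, 0) for i in range(7)}

# Made-up timestamps for illustration only (not real data).
sample = [
    "2012-05-18T23:10:00",  # a Friday
    "2012-05-19T09:00:00",  # a Saturday
    "2012-05-21T14:30:00",  # a Monday
    "2012-05-21T16:00:00",  # a Monday
]
by_day = deletions_by_weekday(sample)
for name in WEEKDAYS:
    print(name, by_day[name])
```

Keep in mind that the timestamps available are the times a deletion was detected, not the times the censor acted, so a weekday count like this is only approximate.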
The second analysis I did with the Weibo Scope Search was to try to figure out how long it took the censors to delete messages on Sina Weibo. Each Sina Weibo post has a time stamp for when it was created. The Weibo Scope Search checks Sina Weibo’s timeline at most four times a day (but usually fewer due to limits that Sina Weibo imposes). Let’s say, for instance, a user posts a message on Sina Weibo at 8 AM. Weibo Scope Search checks Sina Weibo’s timeline at 9 AM, 3 PM, 9 PM, and 3 AM. If the message was deleted by the censor at 10 AM, the “deleted time” recorded by Weibo Scope Search would be 3 PM.
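The example above can be worked through in a few lines of Python. This is only a sketch of the reasoning, not how Weibo Scope Search is actually implemented. The function name `detection_window` is hypothetical, and the date is chosen arbitrarily.

```python
from datetime import datetime, timedelta

def detection_window(deleted_at, check_times):
    """Return (last check before the deletion, first check at or after it).

    The second value is the "deleted time" a polling archive would record;
    it is None if no check has happened since the deletion.
    """
    previous = None
    for check in sorted(check_times):
        if check >= deleted_at:
            return previous, check
        previous = check
    return previous, None

day = datetime(2012, 5, 20)  # arbitrary date for the example
checks = [
    day + timedelta(hours=9),           # 9 AM
    day + timedelta(hours=15),          # 3 PM
    day + timedelta(hours=21),          # 9 PM
    day + timedelta(days=1, hours=3),   # 3 AM the next day
]
posted_at = day + timedelta(hours=8)    # posted at 8 AM
deleted_at = day + timedelta(hours=10)  # deleted by the censor at 10 AM

last_seen, detected = detection_window(deleted_at, checks)
print(detected)              # 2012-05-20 15:00:00
print(detected - posted_at)  # 7:00:00
```

The post really lived for two hours, but the recorded lifetime is seven. All the archive can say is that the deletion happened somewhere between the 9 AM check and the 3 PM check.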
The fastest a post was deleted on Sina Weibo was just over 4 minutes. The longest time it took for the censor to get around to deleting a message on Sina Weibo was over four months. For the posts created on May 20, 2012 and deleted on the same day, it took on average 11 hours for Weibo Scope Search to detect the deletion. It took the censors about 14 hours to delete my post “chen guangcheng.” Determining the average time it takes for censors to delete “irresponsible” messages is a bit tricky, since we don’t have data on exactly how long it takes for each post to be deleted. Out of curiosity, I pulled up three messages that took over four months to delete to see what they said:
I’m not too sure why it took so long to delete the posts. Cedric Sam points out that the posts may have been in the Weibo Scope Search database to begin with and they just didn’t turn up until several months later. The researchers at the University of Hong Kong’s Journalism and Media Studies Centre are constantly adding new Sina Weibo to their list. Or they may have only just turned on the deletion-marking system in the Weibo Scope Search, so that it caught some censored posts that weren’t caught before.
Admittedly, there is no way to tell for sure whether some of the posts were deleted by the users themselves instead of by “monitoring editors.” Sina’s API returns two types of error messages: “Weibo does not exist” and “Permission denied.” We assume that when a post is deleted by the user, the “Weibo does not exist” error message comes up, and that when a post is censored, the “Permission denied” error message comes up. Weibo Scope Search keeps track of all the deleted posts that have the “Permission denied” error message.
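The assumption above amounts to a simple rule, sketched here in Python. The two message strings are the ones quoted above, but how Sina’s API actually formats them is not something I know, and the function name and labels are made up for illustration.

```python
def classify_deletion(error_message):
    """Guess why a post is gone, based on the error message text.

    Assumption: "Permission denied" means censored and
    "Weibo does not exist" means deleted by the user.
    """
    message = error_message.strip().lower()
    if message == "permission denied":
        return "censored"
    if message == "weibo does not exist":
        return "user_deleted"
    return "unknown"

for text in ["Permission denied", "Weibo does not exist", "Timeout"]:
    print(text, "->", classify_deletion(text))
```

Anything that matches neither message is labeled “unknown” rather than forced into one of the two groups, since the rule itself is only an assumption.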
If I had more time (and knew how to code), I would have liked to analyze more of the data that Weibo Scope Search collected. Among the things I would have liked to explore are:
King-Wa Fu and Cedric Sam at the University of Hong Kong’s Journalism and Media Studies Centre have built the Weibo Scope Search, which sends all of the deleted Weibo posts to a server in Hong Kong and stores them. The data, however, is in JSON format, which looks like this:
To make sense of the data collected, we first need to clean it up. I used Google Refine to clean up the data by:
Now that we have the data formatted, we want to make sense of it.
The first project I did was to graph the deleted weibos on a timeline. My classmate Eugene Wu suggested that the best software to visualize the data is Tableau.
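For readers who would rather script this step, here is a minimal Python sketch that turns JSON records into one count per day, which is the kind of table a timeline chart needs. The field names (`id`, `deleted_at`) and the sample records are hypothetical and do not reflect the real Weibo Scope Search format.

```python
import json
from collections import Counter
from datetime import datetime

def daily_deletion_counts(json_text, time_field="deleted_at"):
    """Return sorted (date, count) pairs from a JSON list of records.

    The time_field name is an assumption; adjust it to the real data.
    """
    records = json.loads(json_text)
    counts = Counter(
        datetime.fromisoformat(record[time_field]).date().isoformat()
        for record in records
    )
    return sorted(counts.items())

# Made-up records for illustration only (not real data).
sample_json = """
[
  {"id": 1, "deleted_at": "2012-05-19T10:00:00"},
  {"id": 2, "deleted_at": "2012-05-19T21:00:00"},
  {"id": 3, "deleted_at": "2012-05-20T03:00:00"}
]
"""

rows = daily_deletion_counts(sample_json)
print("date,deleted")
for date, count in rows:
    print(f"{date},{count}")
```

The printed rows are in CSV form, so they could be saved to a file and opened in a charting tool such as Tableau.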