By Brad Rougeau, Lead Developer of Viewcount
In 2012, a group of us from the Networks Research Group at the University of Calgary began looking at social networks such as Facebook and Twitter, and exploring how their architecture could be modified to reduce the traffic that they generate. Facebook has indicated that 350 million photos are posted on its network each day, while research by comScore suggests that social networking activity accounts for 1 in every 5 minutes spent online.
Given this information, and the rapid growth in the size of content typically shared (Facebook just announced support for sharing 3D immersive videos, for example), we wanted to find a way for these social networks to provide content using the most efficient delivery system possible. As a result, we came up with a peer-to-peer system, where users can cache photos and videos that are determined by an algorithm to be most viewed by their friends. The cache would then serve those photos and videos directly to the friends. This would reduce the load placed on the social network's servers and content delivery networks, as the users would take on the task of uploading this media. In addition, we hypothesized that the majority of friends in social networks are geographically close and, as a result, would often use the same ISP, reducing the overall network traffic generated. This could dramatically reduce the amount of electricity required to power these online social networks.
To accurately simulate or build such a system, we needed to come up with an intelligent caching algorithm that would decide which photos and videos are most likely to be viewed by others, and get the user to cache that media. Since this system would be implemented on existing social networks, it made sense to study the existing large amount of social and demographic data that these networks already possess about their users. Unfortunately, we found that these types of datasets were difficult to find, and what was available was either very limited or outdated.
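To make the caching idea concrete, here is a minimal sketch of a naive baseline: cache the items that the most distinct friends have viewed. This is not the algorithm we are developing. The function name, the `(friend_id, media_id)` log format, and the `capacity` parameter are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def select_media_to_cache(
    friend_views: Iterable[Tuple[str, str]], capacity: int
) -> List[str]:
    """Return up to `capacity` media IDs, ranked by number of distinct friends who viewed them.

    `friend_views` is a hypothetical view log of (friend_id, media_id) pairs.
    """
    # Deduplicate so that each friend counts at most once per media item.
    distinct_pairs = set(friend_views)
    counts = Counter(media_id for _, media_id in distinct_pairs)
    # Rank by descending viewer count; break ties by media ID for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [media_id for media_id, _ in ranked[:capacity]]


if __name__ == "__main__":
    log = [("alice", "photo1"), ("bob", "photo1"), ("alice", "video1")]
    print(select_media_to_cache(log, capacity=1))
```

A real algorithm would presumably also weigh the social and demographic data discussed above, which is exactly the data this simple count ignores.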
Figure: The Viewcount interface
While this lack of data was unfortunate, it was also unsurprising. Researching large-scale social networks in the real world is a massive challenge for a number of reasons, including privacy concerns, a lack of research support from the companies running large social networks (occasionally there is outright opposition), the need for informed participant consent (as the research is being performed on humans), and the difficulty of achieving an unbiased sampling of users. While developing an entirely new social network from the ground up, for the sole purpose of research, could solve most of these problems, it is difficult to implement. Social networks typically require large amounts of storage for user-generated content, which is impractical for a research project. We needed to create our own social network that could collect the necessary data, while somehow sidestepping the large storage costs. By utilizing Cybera's Rapid Access Cloud and leveraging Facebook's Platform, we were able to create such a system, called Viewcount.
Viewcount allows users to see the viewing statistics of their Facebook photos and videos. Viewer demographic information and other values are updated daily to indicate how popular each user is. When people visit Viewcount, any request they make is sent to our site, which runs on the Rapid Access Cloud (a free cloud resource available to Alberta researchers). Our servers then query the Facebook servers to gather whatever information is being requested, such as the number of views of a video posted on a specific Facebook profile page. The response from Facebook is processed on our server, which generates a new page showing the query results.
We're able to avoid storage issues by utilizing photos and videos that Facebook is already storing for each user. We don't store any media locally within our database, but instead link directly to the Facebook-stored photos and videos. That way, we only have to store user information and photo/video 'metadata' on our local database, which we can easily do on the Rapid Access Cloud.
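The flow described above (receive a request, query the platform, keep only metadata, render a page) can be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: `MediaRecord`, `handle_stats_request`, the response field names, and the injected `fetch_from_platform` callable are all hypothetical, and no real Facebook API call is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class MediaRecord:
    """Metadata only; the photo or video itself stays on the platform."""
    media_id: str
    owner_id: str
    media_url: str  # link to the platform-hosted file, not a local copy
    view_count: int


def handle_stats_request(
    media_id: str,
    fetch_from_platform: Callable[[str], dict],
    store: Dict[str, MediaRecord],
) -> str:
    """Query the platform, save metadata locally, and return a simple HTML fragment."""
    # Hypothetical response shape: {"owner_id": ..., "url": ..., "views": ...}
    raw = fetch_from_platform(media_id)
    record = MediaRecord(
        media_id=media_id,
        owner_id=raw["owner_id"],
        media_url=raw["url"],
        view_count=raw["views"],
    )
    # Only the small metadata record is stored; no media bytes are copied.
    store[media_id] = record
    return f'<p><a href="{record.media_url}">{record.media_id}</a>: {record.view_count} views</p>'
```

Passing the platform query in as a function keeps the sketch independent of any particular API and makes it easy to test with a stub.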
Figure: Order of requests/responses whenever a new page is loaded
We have spent a lot of time trying to make Viewcount fast, reliable, and easy to use, as well as interesting. The Rapid Access Cloud has been essential to this process, as it gives us full control of our instances and the allocation of resources to these instances.
Without getting into the technical details, here is a short summary of our system: we have one load balancing server which receives all requests and forwards them on, three servers to perform the processing described above, and one server to store the Viewcount database. This setup has been cloned on both the Edmonton and Calgary Rapid Access Cloud regions. We do periodic checks and backups so that if one location goes down, we can migrate to the other location.
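As a rough illustration of that layout, the sketch below forwards requests to three workers in turn and picks a healthy region when one is down. The round-robin policy, the class and function names, and the health-check dictionary are assumptions made for this example; the summary above does not say how our load balancer distributes requests or how migration is triggered.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinBalancer:
    """Hands out workers in a fixed rotation (an assumed policy, for illustration)."""

    def __init__(self, workers: List[str]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self._rotation = cycle(workers)

    def next_worker(self) -> str:
        # Each call returns the next worker in the rotation.
        return next(self._rotation)


def pick_region(health: Dict[str, bool], preferred_order: List[str]) -> str:
    """Return the first region in `preferred_order` whose health check passed."""
    for region in preferred_order:
        if health.get(region, False):
            return region
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["worker-1", "worker-2", "worker-3"])
    print([balancer.next_worker() for _ in range(4)])
    print(pick_region({"calgary": False, "edmonton": True}, ["calgary", "edmonton"]))
```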
Prior to working with Cybera, we were running a single server on the University of Calgary's Computer Science servers. While this was okay for development, we found it was not reliable enough for hosting a live website. The server would occasionally go down for long periods of time, and was slow to configure, as we needed to contact the tech support staff whenever a new package was needed or changes were required to our Apache configuration (the site was running through cPanel).
The Rapid Access Cloud gave us the freedom to fully customize our instances, and the reliability of having two separate regions for fault tolerance. In addition, Cybera staff have been excellent at sending us advance warning before any maintenance occurs, and keeping us updated when things (very rarely) go down unexpectedly. They have also been great about providing us with extra resources when they were needed, and meeting us in person to discuss what resources we would need.
As with any social networking application, we need to draw users into Viewcount, as that is the only way we can collect data. The biggest step we have taken to do this is in the core design of the application. While I could go on about the process and iterations we went through to design a responsive, workable, and easily understandable social networking interface, I'll cut it short: we received a lot of help from quite a few people at Cybera, the University of Calgary, and other places to design and test the user interface. We have also received a large amount of help from Cybera to create a promotional video for Viewcount. We plan to include a link to that video on the Viewcount posters that we will be putting up on campuses in Calgary, and possibly other Western Canadian cities, to make users aware of Viewcount.
Figure: The poster we will be using to advertise Viewcount
Currently, our biggest issues are drawing in and retaining new users. We have only about 100 people using the tool so far. Anecdotally, we have found that people are wary of using any application that tracks them, and understandably so. Facebook's analytics back this up as well: only about 40% of users who see the Facebook permissions dialogue for Viewcount (requesting permission to receive their photos and videos, and the various demographics we track) approve it. To help with this and make our potential users more informed, we have created this page.
Unsurprisingly, we have also found that users are unwilling to use the app because they don't want others to know when they are 'creeping' on them. In response to this, we have modified some of the demographics recorded to make them more vague (e.g. showing age range rather than specific age), in order to make it more difficult to identify specific viewers. We also send new and/or inactive Viewcount users a Facebook notification if their influence value increases that day. And we have created a monthly contest allowing the most popular Viewcount users to win an Amazon gift card, which we hope will motivate users to actively use Viewcount more (and encourage their friends to do the same). Details for this contest can be found here.
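The age-range change mentioned above amounts to simple bucketing. Here is a minimal sketch; the function name and the ten-year bucket width are assumptions for illustration, not the exact ranges Viewcount displays.

```python
def age_to_range(age: int, width: int = 10) -> str:
    """Map an exact age to a coarse range label such as "20-29"."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if width <= 0:
        raise ValueError("width must be positive")
    # Integer division snaps the age down to the start of its bucket.
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


if __name__ == "__main__":
    print(age_to_range(27))
```

Wider buckets make individual viewers harder to identify, at the cost of less precise demographics.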
My personal hope for our project is to refocus social network research on the "social" aspect. While there are numerous exceptions, much of the recently published work on this topic has ignored the human factor. I believe that some very interesting networking algorithms specific to social networks can be developed by factoring in human decision making.
While our specific interest is in reducing and optimizing the traffic flows generated by these social networks, I hope that by publishing our anonymized data, we can encourage future research into social networks to consider the sociological aspects inherent to these networks.
If you'd like to add Viewcount to your Facebook account, please visit our website.