Anthony Chivetta resume

Reading: Scaling Social Networks

One problem faced when scaling feed-based social networks (like Twitter) is that there is no partition of users which allows read/write requests to be serviced without touching data on more than one node.

In a series of papers, researchers from Telefonica Research in Spain propose, develop, and prototype a technique which exploits the communities present in social graphs to partition and replicate users such that read and write requests can be serviced with local data.

The authors’ primary argument seems to be that the use of their system allows application programmers to build a scalable, distributed system while treating the system as if it was still monolithic. They accomplish this through the use of middleware that automatically handles replication of data so that all data needed by a particular request is guaranteed to be local.

One shortcoming is that their system does not provide high availability for requests, therefore a server failure causes either some subset of users requests to fail or some requests to fetch data from multiple nodes. In the latter case the database access layer needs to be aware of the architecture so that it can make the distributed requests, breaking the goal of keeping the application unaware of the replication scheme. This lack of high-availability is not due to any fundamental design of the system but rather the lack of a requirement to keep more than one server capable of being a master for a user at a time. More testing should be done to demonstrate that if the system required k such masters per user it would continue to exhibit desirable scaling properties.

Another concern is that some subset of users’ data is replicated to most servers. If total write volume for this subset were to near the capacity of a single server the system would be unable to handle the incoming requests. As a result, writes are still scalability problem for their system. Twitter has experienced considerable growth since their data set was created and so it would be illustrative to test their system against modern day twitter write load.

One potential use of the placement algorithm developed would be as a replacement for hashing schemes in a NoSQL data store. This would help to lower the total number of server from which data must be gathered to satisfy a read request by clustering communities onto smaller numbers together. If a NoSQL data store were to offer a pluggable data placement API, such a social graph aware placement scheme could have the potential to realize significant performance and scalability gains.

blog comments powered by Disqus
Any opinions expressed here are mine only. This site powered by Jekyll.
This site licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.