Decentralized social network wanted
People are growing concerned about the safety of their personal data stored at social network service providers. Privacy issues such as leakage of private files, manipulation of personal data without permission, and censorship of communication have challenged the centralized architecture of social networks. The desire for an architecture where personal data is no longer concentrated at a central server motivates the development of decentralized social networks. Diaspora took a first step by breaking the centralized server down into a series of regional servers (pods). A user registers at a pod, which receives and delivers messages on the user's behalf. However, the problem is: what if the user does not trust the pod? More seriously, user data at the pod is stored in plaintext, making it easy to access after a break-in.
Diaspora does provide an option with higher privacy protection: you can set up your own pod that serves only yourself. If every user ran their own pod, a truly decentralized social network would have been achieved. Unfortunately, this has never come true: the convoluted installation procedure intimidates most users. The same holds for mainstream P2P platforms such as Freenet, Herbivore, and Kaleidoscope, which ordinary users can hardly install and maintain without technical support. That was the story five to ten years ago. But can we do it now?
With today's advanced platform-neutral, web-based technologies, can we provide the same level of usability as popular social networks such as Facebook, but with enhanced privacy protection? One notable advantage comes from the browser. We believe the features now built into browsers can accommodate a peer-to-peer network: far more than a simple client-side renderer, today's browser offers WebRTC, which enables inter-browser peer-to-peer data channels, and IndexedDB, which provides persistence as a key-value store.
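To make the two building blocks concrete, here is a minimal sketch using the standard browser APIs (the signaling needed to actually broker the WebRTC connection is omitted, and all names are illustrative):

    // 1) WebRTC: a peer connection with a data channel for direct
    //    browser-to-browser messaging (offer/answer exchange omitted).
    const pc = new RTCPeerConnection();
    const channel = pc.createDataChannel('posts');
    channel.onopen = () => channel.send('hello, peer');
    channel.onmessage = (event) => console.log('received:', event.data);

    // 2) IndexedDB: persist received posts as key-value records.
    const open = indexedDB.open('social', 1);
    open.onupgradeneeded = () => open.result.createObjectStore('posts');
    open.onsuccess = () => {
      const tx = open.result.transaction('posts', 'readwrite');
      tx.objectStore('posts').put({ text: 'hello' }, 'post-1');
    };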
Design of an in-browser social network
Having only a P2P data channel and in-browser persistence is far from an operable social network. We need client-side-only logic that can orchestrate the social network without a centralized entity. Specifically, we need at least the following three schemes:
- A secure signaling and friendship exchange service for the peer-to-peer network that offers both data security & privacy and high deployment scalability.
- A mechanism to ensure timeline consistency of posts between sender and receiver without any centralized authority.
- An efficient message delivery overlay network that delivers messages to both online and offline friends with low latency.
SAFE store
The Signaling And Friendship Exchange (SAFE) service lets peers exchange friendship requests for establishing shared secrets, along with network brokering information including metadata, IP addresses, and port numbers. Because SAFE only provides a platform for peers to exchange information, the service can be hosted in a simple key-value store. Each piece of exchanged data is formatted as plain JSON, with a key that is computable and identifiable only by the pair of friends involved. SAFE thus has no view of any user's communication data, friend list, or even identity. This design plays a key role in delivering a private social network.
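The exact key construction is an implementation detail, but one plausible sketch (assuming each pair of friends already holds a shared secret) derives the lookup key by keying a hash with that secret, so only the two friends can compute it:

    // Assumed construction: HMAC a canonical ordering of the two
    // friends' identifiers with their shared secret. The store sees
    // only an opaque key and a JSON value.
    const crypto = require('crypto');

    function safeKey(idA, idB, sharedSecret) {
      const pair = [idA, idB].sort().join('|');   // canonical order
      return crypto.createHmac('sha256', sharedSecret)
                   .update(pair).digest('hex');
    }

    // Field names below are illustrative, not the actual wire format.
    const value = JSON.stringify({ ip: '203.0.113.7', port: 52311 });
    // e.g., redis.set(safeKey('alice', 'bob', secret), value);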
Timeline consistency
Timeline consistency of messages is resolved proactively by the receiver. When messages are sent while the receiver is offline, they are kept in the DHT for a while using replication and persistence. Once a message's time-to-live expires, however, it is deleted from the DHT, so a receiver who comes online after a message has been permanently removed cannot access it. But if the sender later sends a new message whose identifier is higher than that of the last successfully received message, the receiver detects the gap and requests the missing messages; the sender then re-inserts them into the DHT.
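A sketch of the receiver-side gap detection; the sequence numbering and the request helper are assumptions based on the description above:

    // The receiver tracks the highest message id seen; a jump in the
    // sequence reveals messages that expired from the DHT.
    let lastId = 0;

    function onMessage(msg, sender) {
      if (msg.id > lastId + 1) {
        // Gap detected: ask the sender to re-insert the missing
        // range into the DHT (requestMissing is hypothetical).
        sender.requestMissing(lastId + 1, msg.id - 1);
      }
      lastId = Math.max(lastId, msg.id);
    }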
Post delivery overlay network
As a baseline post delivery design, flooding sync propagates posts to as many friends (and friends of friends) as possible, increasing the chance that a peer coming back online receives her missed posts from a friend. DHT-based sync additionally exploits the chance of synchronizing from online users other than one's friends.
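A minimal flooding-sync sketch; the per-post ids used for de-duplication and the hop-count TTL that bounds propagation are our assumptions about the details:

    const seen = new Set();   // ids of posts already handled

    function flood(post, peers) {
      if (seen.has(post.id) || post.ttl <= 0) return;  // de-dup, bound hops
      seen.add(post.id);
      for (const peer of peers) {
        peer.send({ ...post, ttl: post.ttl - 1 });     // re-flood to neighbors
      }
    }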
Other considerations
More considerations arise without the support of a centralized server. First, a flexible bootstrapping service is necessary for users to reach the scripts in the first place. Besides providing an HTTP service, the code can also be placed at public locations such as a CDN, GitHub, or Dropbox, as long as its integrity can be verified before running (e.g., Firefox supports code signing of JavaScript through signed JAR archives). Second, instead of directly managing user credentials, we can outsource them to existing public key servers like keybase.io: a user's identity is associated with her public key, which is accessible from the key server. Third, without a centralized server acting as a trusted third party, users can endorse each other by signing trusted friends' public keys, and the establishment of a new friendship can be based on verification of these endorsements.
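The endorsement itself can be as simple as a signature over a friend's public key. A minimal sketch using Ed25519 via Node's crypto module (the prototype uses OpenPGP instead):

    const crypto = require('crypto');

    // Alice endorses Bob by signing his public key; anyone who trusts
    // Alice's key can then verify the endorsement.
    const alice = crypto.generateKeyPairSync('ed25519');
    const bob = crypto.generateKeyPairSync('ed25519');

    const bobPub = bob.publicKey.export({ type: 'spki', format: 'der' });
    const endorsement = crypto.sign(null, bobPub, alice.privateKey);

    const ok = crypto.verify(null, bobPub, alice.publicKey, endorsement);
    console.log('endorsement valid:', ok);   // true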
Prototype and preliminary evaluation
We have implemented a prototype in JavaScript. On the peer side, the communication and storage managers are based on the W3C-standardized HTML5 APIs WebRTC and IndexedDB. Our prototype therefore supports Chrome 23+, Firefox 10.0+, and Opera 15+ on the desktop, and Android 4.4+, Firefox Mobile 22.0+, and Firefox OS 1.0.1+ on mobile platforms. The crypto module uses OpenPGP for encryption, hashing, and key signing, and peers retrieve each other's public keys from the public key infrastructure Keybase.io. On the SAFE store side, a simple key-value store is implemented with Redis.
We evaluate the performance of flooding sync by simulating users with WebDriver: simulated users log in to and out of the service periodically, and the post arrival latency and storage overhead of every online session are recorded (a sketch of the simulation loop is shown below). The resulting latency and storage overhead are plotted in the accompanying figures.
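For reference, such a simulation loop might look like the following with selenium-webdriver for Node (the URL and timings are placeholders, not our actual parameters):

    const { Builder } = require('selenium-webdriver');

    (async () => {
      const driver = await new Builder().forBrowser('chrome').build();
      for (let round = 0; round < 10; round++) {
        await driver.get('https://example.org/app');   // come online
        await driver.sleep(60 * 1000);                 // stay for a while
        await driver.get('about:blank');               // go offline
        await driver.sleep(30 * 1000);
      }
      await driver.quit();
    })();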
The SAFE store performance is measured with two metrics: throughput and capacity. In our experiments, a single Redis instance processes 1,401 login requests per second; with 2, 4, 8, and 16 shards, throughput grows to 2,307, 4,549, 5,818, and 6,620 requests per second, respectively. From storing 192,000 records, we find that each login request consumes 51 KB of storage; assuming 64 GB of RAM for the SAFE server, a single instance could thus store up to 1.34 million users.
One might think that replacing Facebook is a pipe dream, but we will take a bold first step by replacing a chat service for closed groups, like a web-based IRC. Stay tuned.