Open Source at Twitter

Welcome to the second post in our series covering the top open source projects from the most popular companies in the world. Last time, we covered Facebook’s contributions to the open source community; in this article, we highlight the most popular open source projects from Twitter’s engineering team.

Twitter engineers use, contribute to, and release a lot of open source software, and the company’s GitHub account is proof of that: Twitter has 131 public repos, maintained by 116 active Twitter members around the globe. Twitter itself started as a simple Ruby on Rails application, but the team soon realized that to meet Twitter’s scale, they needed to reinvent and revamp the entire platform. Along the way, they built and open sourced many great projects. Twitter also runs a Twitter handle for its open source community, @TwitterOSS.

The top Twitter open source projects are:

Scalding

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. It is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing the advantages of Scala to your MapReduce jobs.
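
To make "MapReduce job" concrete: the classic word count is a map phase that emits (word, 1) pairs, a shuffle that groups those pairs by key, and a reduce phase that sums each group. The sketch below spells out the three phases in plain Python, purely to illustrate the dataflow that a Scalding pipeline expresses as chained collection-style operations (flatMap, groupBy, and so on). It does not use Scalding or Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Emit a (word, 1) pair for every whitespace-separated token.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as Hadoop does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Collapse each key's list of values into a single count.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["hello world", "Hello Scalding"]))
```

The appeal of Scalding is that you write these phases as one readable chain of transformations instead of separate Mapper and Reducer classes.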

Finagle

Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. It implements uniform client and server APIs for several protocols and is designed for high performance and concurrency. Most of Finagle’s code is protocol-agnostic, which simplifies the implementation of new protocols. Finagle is used in production at Twitter (and many other organizations) and is actively developed and maintained.
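
Finagle models a server as a `Service`, essentially a function from a request to a future response, and cross-cutting behavior such as timeouts, retries or logging as a `Filter` that wraps a service. The Python sketch below imitates that shape using `concurrent.futures.Future`; it is not Finagle’s Scala API, and `echo_service`, `logging_filter` and `and_then` are hypothetical names. Because the filter never looks inside the request, it also shows why protocol-agnostic code can be reused across protocols.

```python
from concurrent.futures import Future
from typing import Callable, List

# Hypothetical aliases: a service maps a request to a Future of a response,
# and a filter receives the request plus the service it wraps.
Service = Callable[[str], Future]
Filter = Callable[[str, Service], Future]


def echo_service(request: str) -> Future:
    # A trivial "server" that completes immediately with an echoed response.
    future: Future = Future()
    future.set_result("echo: " + request)
    return future


def logging_filter(log: List[str]) -> Filter:
    # Protocol-agnostic: it records events without inspecting the request.
    def apply(request: str, service: Service) -> Future:
        log.append("request received")
        response = service(request)
        # Runs when the response completes (immediately if already done).
        response.add_done_callback(lambda _: log.append("response sent"))
        return response
    return apply


def and_then(filt: Filter, service: Service) -> Service:
    # Composing a filter with a service yields another service.
    return lambda request: filt(request, service)


if __name__ == "__main__":
    events: List[str] = []
    server = and_then(logging_filter(events), echo_service)
    print(server("hi").result())
    print(events)
```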

Finatra

Finatra is a Sinatra-inspired web framework for Scala that runs on top of Finagle.
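
"Sinatra-inspired" refers to declaring routes as an HTTP verb plus a path pattern bound directly to a handler. The Python sketch below imitates only that mapping, including a `:id`-style path parameter; `Router`, `route` and `dispatch` are hypothetical names, and Finatra’s own Scala API differs.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[Dict[str, str]], str]


class Router:
    """Hypothetical Sinatra-style router: (method, path pattern) -> handler."""

    def __init__(self) -> None:
        self.routes: List[Tuple[str, re.Pattern, Handler]] = []

    def route(self, method: str, pattern: str) -> Callable[[Handler], Handler]:
        # Turn "/users/:id" into a regex with a named group per ":param".
        regex = re.compile("^" + re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern) + "$")

        def register(handler: Handler) -> Handler:
            self.routes.append((method.upper(), regex, handler))
            return handler
        return register

    def dispatch(self, method: str, path: str) -> Optional[str]:
        # Return the first matching handler's response, or None (a "404").
        for route_method, regex, handler in self.routes:
            match = regex.match(path)
            if route_method == method.upper() and match:
                return handler(match.groupdict())
        return None


app = Router()


@app.route("GET", "/users/:id")
def show_user(params: Dict[str, str]) -> str:
    return "user " + params["id"]


if __name__ == "__main__":
    print(app.dispatch("GET", "/users/42"))
    print(app.dispatch("POST", "/users/42"))
```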

Ambrose

Twitter Ambrose is a platform for visualization and real-time monitoring of MapReduce data workflows. It presents a global view of all the MapReduce jobs derived from your workflow after planning and optimization. As jobs are submitted for execution on your Hadoop cluster, Ambrose updates its visualization to reflect the latest job status, polled from your process.

(Figure: the Ambrose UI)

Parquet

Parquet is a columnar storage format that supports nested data. Parquet metadata is encoded using Apache Thrift. Twitter created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem.
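
Why columnar? The toy sketch below, which does not use Parquet or its on-disk format, pivots a few row records into one list per field. Two benefits follow: a query over one field touches only that column, and values of a single field tend to repeat, so simple encodings such as run-length encoding shrink them. Parquet does use encodings of this family (including run-length and dictionary encoding), but the records and helper names here are hypothetical, and nested data is left out.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical records; a real Parquet file also stores a schema and nested fields.
rows: List[Dict[str, Any]] = [
    {"user": "alice", "country": "US", "followers": 10},
    {"user": "bob", "country": "US", "followers": 3},
    {"user": "carol", "country": "US", "followers": 7},
    {"user": "dave", "country": "BR", "followers": 5},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    # Pivot row-oriented records into one contiguous list per field.
    return {field: [r[field] for r in records] for field in records[0]}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    # Collapse runs of identical adjacent values into (value, run_length) pairs.
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows)

if __name__ == "__main__":
    # Aggregating one field reads only that column, not every record.
    print(sum(columns["followers"]))
    # Repetitive values stored together compress well.
    print(run_length_encode(columns["country"]))
```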

Summingbird

Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.
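
Summingbird’s pitch is that one piece of logic can run both in batch (Scalding) and in streaming (Storm). The toy below, written in plain Python rather than Summingbird’s Scala API, defines a hashtag count once and runs it both ways; the two runs agree because the merge step is associative, which is the property Summingbird requires of its aggregations. The function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Callable, Iterable, List

Job = Callable[[Iterable[str]], Counter]


def count_hashtags(tweets: Iterable[str]) -> Counter:
    # The job is written once: a chunk of events in, a partial result out.
    return Counter(word for tweet in tweets for word in tweet.split() if word.startswith("#"))


def merge(a: Counter, b: Counter) -> Counter:
    # Associative merge: partial counts can be combined in any grouping.
    return a + b


def run_batch(job: Job, events: List[str]) -> Counter:
    # Batch mode: process the whole dataset at once.
    return job(events)


def run_streaming(job: Job, events: List[str]) -> Counter:
    # Streaming mode: process one event at a time and fold the partial results.
    return reduce(merge, (job([event]) for event in events), Counter())


if __name__ == "__main__":
    stream = ["#scala is fun", "learning #scala and #storm", "no tags here"]
    print(run_batch(count_hashtags, stream) == run_streaming(count_hashtags, stream))
```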

Bootstrap

Bootstrap is a well-known web UI framework used by thousands of companies today. It is a sleek, intuitive and powerful mobile-first front-end framework for faster and easier web development.

Bower

Bower is a package manager for the web, originally created at Twitter and adopted by a large community. It offers a generic, unopinionated solution to the problem of front-end package management and exposes the package dependency model via an API that a more opinionated build stack can consume. Bower runs over Git and is package-agnostic: a packaged component can be made up of any type of asset and use any type of transport.
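
A front-end package manager has to turn declared dependencies into a flat list of packages to install, each exactly once, with dependencies placed before the packages that need them (Bower installs a flat dependency tree). The sketch below does that with a depth-first topological sort over a hypothetical in-memory registry; it ignores versions and Git transport, and none of the names are Bower’s API.

```python
from typing import Dict, List, Set

# Hypothetical registry: package name -> names of the packages it depends on.
REGISTRY: Dict[str, List[str]] = {
    "my-app": ["bootstrap", "typeahead"],
    "bootstrap": ["jquery"],
    "typeahead": ["jquery"],
    "jquery": [],
}


def install_order(root: str, registry: Dict[str, List[str]]) -> List[str]:
    """Return a flat list with every dependency before its dependents."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return  # already in the flat list, so install it only once
        if name in in_progress:
            raise ValueError("dependency cycle at " + name)
        in_progress.add(name)
        for dep in registry[name]:
            visit(dep)
        in_progress.remove(name)
        done.add(name)
        order.append(name)

    visit(root)
    return order


if __name__ == "__main__":
    print(install_order("my-app", REGISTRY))
```

Here `jquery` is shared by two packages but appears once, ahead of both.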

Flight

Flight is a lightweight, component-based, event-driven JavaScript framework that maps behavior to DOM nodes. It was created at Twitter, and is used by the Twitter.com and TweetDeck web applications.
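
The core of Flight’s model is that components attach to nodes and talk to each other only through events, never through direct references. The Python sketch below imitates that pattern with a stand-in `Node` class and a shared `document` node acting as the event hub. The `on` and `trigger` method names echo Flight’s vocabulary, but `Node`, `TweetBox` and `Timeline` are hypothetical, and real Flight runs in the browser against actual DOM nodes.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[dict], None]


class Node:
    """Hypothetical stand-in for a DOM node that holds event listeners."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.listeners: Dict[str, List[Listener]] = defaultdict(list)

    def on(self, event: str, listener: Listener) -> None:
        self.listeners[event].append(listener)

    def trigger(self, event: str, data: dict) -> None:
        for listener in self.listeners[event]:
            listener(data)


class TweetBox:
    """Attached to a node: on submit, it announces a new tweet via an event."""

    def __init__(self, node: Node, document: Node) -> None:
        self.document = document
        node.on("submit", self.handle_submit)

    def handle_submit(self, data: dict) -> None:
        # The component does not know who, if anyone, consumes this event.
        self.document.trigger("tweetPosted", {"text": data["text"]})


class Timeline:
    """Renders tweets it hears about; it never references TweetBox."""

    def __init__(self, node: Node, document: Node) -> None:
        self.node = node
        self.items: List[str] = []
        document.on("tweetPosted", lambda data: self.items.append(data["text"]))


if __name__ == "__main__":
    document = Node("document")
    box_node = Node("#tweet-box")
    TweetBox(box_node, document)
    timeline = Timeline(Node("#timeline"), document)
    box_node.trigger("submit", {"text": "hello"})
    print(timeline.items)
```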

Typeahead

Inspired by Twitter’s autocomplete search functionality, typeahead.js is a flexible JavaScript library that provides a strong foundation for building robust typeaheads.
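
At the heart of any typeahead is fast prefix lookup. typeahead.js ships its own suggestion engine (Bloodhound) with tokenizers and remote data fetching; the sketch below is not that engine, only the core idea: keep terms sorted so that every term sharing a prefix sits in one contiguous run, then locate the start of that run with binary search. `PrefixIndex` and `suggest` are hypothetical names.

```python
from bisect import bisect_left
from typing import List


class PrefixIndex:
    """Hypothetical suggestion index over a sorted list of terms."""

    def __init__(self, terms: List[str]) -> None:
        # Normalize and deduplicate once so lookups are case-insensitive.
        self.terms = sorted({t.lower() for t in terms})

    def suggest(self, query: str, limit: int = 5) -> List[str]:
        prefix = query.lower()
        # Binary search finds the first term >= prefix in O(log n).
        i = bisect_left(self.terms, prefix)
        results: List[str] = []
        # Matching terms form a contiguous run starting at i.
        while i < len(self.terms) and len(results) < limit and self.terms[i].startswith(prefix):
            results.append(self.terms[i])
            i += 1
        return results


if __name__ == "__main__":
    index = PrefixIndex(["Twitter", "Twemcache", "Typeahead", "Tweetdeck", "Finagle"])
    print(index.suggest("twe"))
```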

Twemcache

Twemcache is the Twitter Memcached: a fork of Memcached v1.4.4 that has been heavily modified to make it suitable for Twitter’s large-scale production environment.
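
At its core, a Memcached-style server is an in-memory key-value store with get and set operations that evicts old entries when it fills up. The toy below keeps a single least-recently-used (LRU) order over a fixed item count; real Memcached and Twemcache manage memory in slabs and have their own eviction policies, so treat this only as an illustration of the caching contract. `LRUCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Toy Memcached-style cache: get/set by key, evicting least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None  # cache miss: the caller falls back to the database
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.set("a", b"1")
    cache.set("b", b"2")
    cache.get("a")
    cache.set("c", b"3")
    print(cache.get("b"))
```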

FlockDB

FlockDB is a distributed graph database for storing adjacency lists. It is much simpler than other graph databases such as Neo4j because it tries to solve fewer problems. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites. Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, Twitter’s FlockDB cluster stored more than 13 billion edges and sustained peak traffic of 20k writes/second and 100k reads/second.
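
An adjacency list stores, for each node, the set of nodes it points to. The sketch below keeps a forward list (whom a user follows) and a backward list (who follows a user) so that both questions are a single lookup, and answers an intersection query with set arithmetic. Indexing every edge in both directions is an assumption of this sketch, and `FollowGraph` and its methods are hypothetical; FlockDB itself is distributed and persistent, which this in-memory toy is not.

```python
from collections import defaultdict
from typing import Dict, Set


class FollowGraph:
    """Toy adjacency-list store: each edge is indexed in both directions."""

    def __init__(self) -> None:
        self.following: Dict[int, Set[int]] = defaultdict(set)  # forward: whom X follows
        self.followers: Dict[int, Set[int]] = defaultdict(set)  # backward: who follows X

    def add(self, source: int, dest: int) -> None:
        self.following[source].add(dest)
        self.followers[dest].add(source)

    def remove(self, source: int, dest: int) -> None:
        self.following[source].discard(dest)
        self.followers[dest].discard(source)

    def followed_by_both(self, a: int, b: int) -> Set[int]:
        # Intersect two adjacency lists: accounts that both a and b follow.
        return self.following[a] & self.following[b]


if __name__ == "__main__":
    graph = FollowGraph()
    graph.add(1, 3)
    graph.add(2, 3)
    print(graph.followed_by_both(1, 2))
```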
