Marius is a Principal Engineer in Twitter's systems infrastructure group. His interests include large-scale distributed systems, storage systems, indexing systems, data management, distributed coordination, runtimes, profiling tools, networks, network protocols, and functional programming.
Matei Zaharia is an assistant professor at MIT and CTO of Databricks, the company commercializing Apache Spark. He started Spark as a research project at UC Berkeley and has been involved in the big data community since 2007, through projects including Hadoop, Mesos and Shark.
Come to Scala By The Bay well-rested and ready to meet your fellow Scala developers. We'll have two full days of talks (keynotes, full-length, and lightning), a hackathon and an unconference. Before, during, and after the conference enjoy meals with the best-of-SF cuisine (have you tried Ritual Coffee yet?) and participate in SF's famous Friday Night Off-The-Grid Food Truck event.
Stay informed with the Scala By The Bay conference news and event updates.
Functional programming embraces a kind of conceptual purity which often finds itself at odds with the demands of real world systems. Functional programming works best in a kind of ideal computing environment: heaps are endless, file descriptors don’t exist, partial failure cannot happen, and demands can never exceed a system’s capacity. Inevitably, the real world won’t accommodate: as a system scales, so does this apparent rift.
How do we mend this rift? Where do we have to break abstractions (slightly) so that they can deal with real world demands? Better: where can we use FP’s strengths to make the whole situation more manageable? How do we build systems that are at the same time functional — built from simple parts we can combine together, simple to reason about, and reusable — and functional — they stand a chance of being usable in our real, messy environments?
I’m going to talk about our experiences with functional programming (using Scala) at Twitter. Where and how we’ve made productive use of functional techniques; and also where we’ve failed. We’ll examine the development of a few core abstractions, paying particular attention to how they’ve held up to real world demands. Finally, we’ll talk a little about how the language has scaled with our organization — how our people scale informs our use of Scala.
Designing Highly Concurrent, Multi-Protocol, Multi-Tenant Services in Scala
The Firebase servers speak a variety of protocols and handle many separate applications on each server, all at the same time. In this talk, we'll walk through taking a single Scala process from serving a single client, a single protocol, and a single application up to that same process serving hundreds of thousands of clients, speaking multiple protocols, across hundreds of applications. Along the way, we'll cover popular libraries such as Netty and Akka.
Scala has a remarkably expressive type system, and we should leverage it to its full extent. In doing so, we gain the ability to statically and provably reason about our programs. In this talk we will explore various methods of using types to our advantage, as well as some hoops we must jump through to avoid losing this advantage in the Scala language. We will also see examples of these methods in practice by delving into some Typelevel (http://typelevel.org/) projects.
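As a small, hypothetical illustration of the kind of static reasoning described here (not drawn from the talk itself), phantom type parameters can make illegal state transitions unrepresentable, so the compiler rejects them outright:

```scala
// Hypothetical sketch: phantom types track a door's state at compile time.
sealed trait State
sealed trait Open extends State
sealed trait Closed extends State

final class Door[S <: State] private () {
  // Each transition demands compile-time evidence of the current state.
  def open(implicit ev: S =:= Closed): Door[Open] = new Door[Open]
  def close(implicit ev: S =:= Open): Door[Closed] = new Door[Closed]
}

object Door {
  def closed: Door[Closed] = new Door[Closed]
}

val ok = Door.closed.open.close   // compiles: Closed -> Open -> Closed
// Door.closed.close              // rejected: cannot close a closed door
```

The invariant costs nothing at runtime; a wrong sequence of calls is a type error rather than a bug.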
Large scale, real-time stream processing using Spark Streaming
Spark Streaming is an extension to the Spark cluster computing framework that enables high-speed, fault-tolerant stream processing through a high-level Scala API. It builds on a new execution model called "discretized streams" to provide exactly-once processing without the heavy cost of transactions required by previous systems (e.g. Storm), allowing it to process significantly higher rates of data per node while still recovering from faults in seconds. It also greatly simplifies stream programming by providing a set of functional, high-level operators (e.g. maps, filters, and windows) in Scala. Perhaps the most exciting feature of Spark Streaming, however, is that it combines seamlessly with Spark's interactive and batch processing features, allowing ad-hoc queries on stream state and programs that combine streaming and historical data to do online machine learning and graph processing. Spark Streaming scales linearly to 100 nodes and has been used to build applications including session-level metrics reporting and online machine learning.
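The "discretized streams" idea can be sketched in a few lines of plain Scala, purely as a conceptual toy with none of Spark's distribution, fault tolerance, or actual API (all names below are made up): a stream is chopped into small immutable batches, and each batch is transformed with ordinary functional operators.

```scala
// Toy model: a discretized stream is a sequence of per-interval batches.
type DStream[A] = Seq[Seq[A]]

// Functional operators apply per batch, as in ordinary collections code.
def mapStream[A, B](s: DStream[A])(f: A => B): DStream[B] = s.map(_.map(f))
def filterStream[A](s: DStream[A])(p: A => Boolean): DStream[A] = s.map(_.filter(p))

// A sliding window flattens the last n batches into one.
def window[A](s: DStream[A], n: Int): DStream[A] =
  s.indices.map(i => s.slice((i - n + 1).max(0), i + 1).flatten)

val events: DStream[Int] = Seq(Seq(1, 2), Seq(3), Seq(4, 5))
val evens = filterStream(events)(_ % 2 == 0)   // Seq(Seq(2), Seq(), Seq(4))
val recent = window(events, 2)                 // Seq(Seq(1, 2), Seq(1, 2, 3), Seq(3, 4, 5))
```

Because each interval is just an immutable collection, a lost batch can be recomputed from its inputs, which is the intuition behind the fault-recovery claim above.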
Scala devops: collaborative development and continuous deployment
It's easy to establish reliable development processes with free Web-based tools. In this talk, we look at how a Scala development team can work collaboratively to build, test, and deploy their software using GitHub, sbt, Travis CI, Coveralls, and Heroku.
Play Framework Constructs: Func-tastic ways of Slicing and Dicing Play
During this talk, Hiren will share powerful abstractions that allow you to build upon the fantastic asynchronous power of the Play Framework. The talk will include service as a function, composable templates, combining monad types, and much more. Come see how it's done and simplify your web application architecture with some of these nifty tools and techniques.
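One of those abstractions, "service as a function", can be sketched in plain Scala (a hedged toy with made-up types, not Play's actual API): a service is just a function from request to asynchronous response, and a filter wraps one service to produce another.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Toy types: a service is Req => Future[Rep]; a filter transforms services.
type Service[Req, Rep] = Req => Future[Rep]
type Filter[Req, Rep]  = Service[Req, Rep] => Service[Req, Rep]

val echo: Service[String, String] = req => Future.successful(s"echo: $req")

// A filter adding a trivial allow-list check around any service.
def allowOnly(allowed: Set[String]): Filter[String, String] = service =>
  req =>
    if (allowed(req)) service(req)
    else Future.failed(new IllegalArgumentException(s"rejected: $req"))

val secured: Service[String, String] = allowOnly(Set("ping"))(echo)
val reply = Await.result(secured("ping"), 1.second)   // "echo: ping"
```

Since both services and filters are plain functions, they compose with ordinary function composition, which is what makes this style attractive for asynchronous web code.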
Functional programming is a great computational paradigm that suits many different use cases, but few of them are as good a match as data analysis. In this talk, Vitaly Gordon, a senior data scientist at LinkedIn and a Scalding contributor, will walk you through the different use cases of functional programming for data processing.
Building microservices with Scala, functional domain models and Spring Boot
In this talk you will learn about a modern way of designing applications that’s very different from the traditional approach of building monolithic applications that persist mutable domain objects in a relational database. We will talk about the microservice architecture, its benefits and drawbacks, and how Spring Boot can help. You will learn about implementing business logic using functional, immutable domain models written in Scala. We will describe event sourcing and how it’s an extremely useful persistence mechanism for persisting functional domain objects in a microservices architecture.
ScalaCheck, the property-based testing library for Scala, is a powerful tool for automating test coverage. Out of the box, you can easily generate gobs of test data and automatically shrink failure cases down to specific causes. Who was ever satisfied with out of the box, though?!? At Reverb, we've been exploring the outer edges of ScalaCheck's capabilities to generate extensive and deep coverage of our code base. We'll walk through some of the techniques we've been playing with and their possibilities, including building complex custom data generators, shrinking smarter, basing data generation on samples from production data, using ScalaCheck to power performance benchmarks, and automatically generating Arbitrary and Shrink instances for case classes using Shapeless.
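To make the generator-and-shrink vocabulary concrete for readers new to it, here is a deliberately tiny plain-Scala toy (nothing like ScalaCheck's real API): a generator produces random values, and a failing value is shrunk toward a smaller witness.

```scala
import scala.util.Random

// Toy versions of the two core ideas; ScalaCheck's real Gen and Shrink
// are far richer than this sketch.
final case class Gen[A](sample: Random => A)

def intsUpTo(max: Int): Gen[Int] = Gen(rnd => rnd.nextInt(max + 1))

// Shrink candidates: repeatedly halve toward zero.
def shrinkInt(n: Int): List[Int] =
  if (n == 0) Nil else (n / 2) :: shrinkInt(n / 2)

// Run 100 samples; on failure, report the smallest halving that still fails.
def forAll(gen: Gen[Int])(prop: Int => Boolean): Option[Int] = {
  val rnd = new Random(42)   // fixed seed for repeatability
  val failures = (1 to 100).map(_ => gen.sample(rnd)).filterNot(prop)
  failures.headOption.map { n =>
    shrinkInt(n).filterNot(prop).lastOption.getOrElse(n)
  }
}

val witness = forAll(intsUpTo(100))(_ < 50)   // a failing value >= 50
val allPass = forAll(intsUpTo(100))(_ >= 0)   // None: the property holds
```

Custom generators in the real library work the same way in spirit: you compose small generators into larger ones that produce exactly the shape of data your code consumes.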
Programming in Scala, by Martin Odersky et al., is one of the most comprehensive books on the language and the de facto reference for Scala. However, the book, originally published in 2008 for Scala 2.7.2, has not been updated since its second edition in 2010, which covers up to Scala 2.8.1. In the meantime, Scala has had three major releases with numerous fresh and advanced features. While we wait for a third edition of Programming in Scala to describe all the latest and greatest Scala features introduced in the last 4 years, this talk presents the main features introduced in Scala 2.9, 2.10, and 2.11:

* Parallel collections
* Value classes, implicit classes, and extension methods
* String interpolation
* Futures and promises
* Akka actors
* Macros and quasiquotes
* Reflection
* Modularization
* Dynamic types
* Error handling with Try
* The App trait
* New methods in collections
* Case classes with more than 22 parameters
* Predef.???
* sbt incremental compilation
* 2.12 and beyond: what's in Scala's future?
Hackathon and Gaming
Saturday August 9th, 2014
Arrival and Breakfast
Come early to get breakfast and get settled for another day of amazing talks!
Next-Generation Languages meet Next-Generation Big Data: Leveraging Scala in Spark
Apache Spark was one of the earliest systems to use Scala for large-scale data processing. While Spark supports APIs in multiple languages, we’ll show how the Scala API in particular benefits from this high-level language to provide an easy-to-use yet efficient programming interface. Spark uses Scala to provide better wrappers over Hadoop data types, provide different operations based on a collection’s type, and offer a simple functional programming interface. Increasingly, it is also using Scala features to optimize operations based on their data types, enable fast serialization (Scala Pickling), and provide database-like query optimization (Spark SQL), all transparent to the user. We’ll illustrate these features through some examples.
This talk will explore the developer experience of using ScalaJS, from the boring-but-important cross-JVM/JS libraries, to pure-Scala client-server web applications, to whiz-bang ScalaJS games and animations. As the person who has written more ScalaJS code than anyone on the planet (!), I will go through the ups and downs of ScalaJS development, and demonstrate why you may want to try it out for your next round of web development.
Demystifying Shapeless: An Exploration of Dependent Types in Scala
I will examine the core ideas used in the implementation of Shapeless and develop an understanding of how to use the library and the key ideas behind it. The goal is to demonstrate to developers how they can leverage Scala's more advanced features as a way to encode stronger static guarantees about the programs they write. Each concept will be accompanied by pragmatic examples that demonstrate how one can apply these techniques in day-to-day software engineering.
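As a taste of the territory, here is a miniature heterogeneous list in the spirit of Shapeless's HList (a simplified sketch, not Shapeless's actual definitions): the static type of the list records the type of every element.

```scala
// Simplified HList sketch; Shapeless's real encoding is more elaborate.
sealed trait HList
final case class ::[+H, +T <: HList](head: H, tail: T) extends HList
sealed trait HNil extends HList
case object HNil extends HNil

implicit class HListOps[T <: HList](t: T) {
  def ::[H](h: H): ::[H, T] = new ::(h, t)   // right-associative prepend
}

// The static type tracks each element: Int :: String :: HNil.
val xs: Int :: String :: HNil = 1 :: "one" :: HNil
val n: Int = xs.head          // no casts: the head is statically an Int
val s: String = xs.tail.head  // and the second element statically a String
```

This is the sense in which the types encode "stronger static guarantees": mixing up element positions or types is a compile-time error, not a ClassCastException.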
Distributed Authentication using Reactive Programming in Scala
Modern enterprise identity systems often rely on other systems to perform basic functions such as authenticating a user. Relying on a local database, LDAP system or even a partner system across the Internet means that when those systems become unresponsive, it can have rippling and far-reaching performance impacts across the fabric of your application. We will explore how the use of asynchronous non-blocking or Reactive techniques can overcome such challenges and minimize the impact of downstream service degradation. Specifically, we will compare and contrast a Scala based implementation using Reactive support in the Play framework with a ThreadPool based Java approach to distributed authentication. Comparing the approaches across various dimensions such as throughput, latency and code complexity, will show how the Play framework makes it easy to create an elegant high performance solution. Finally, we look at how Reactive architectures can be applied to a multi-tenanted system allowing us to smooth out resource consumption in a shared environment.
Rapture is a collection of unopinionated Scala libraries for performing everyday tasks like I/O and JSON and XML processing, designed primarily for beautiful and expressive coding. Tasks like reading a file into a string, or copying it encrypted to disk, are simple, intuitive one-liners, whether the file is sourced from disk, from the web, from your classpath, or from a cloud service, and without ever compromising on type safety. At a pace of more than one per minute, Jon will cover fifty of the smartest one-liners made possible by Rapture, including the coolest JSON API out there! All the examples will be easy for beginners to understand, but may also show seasoned Scala developers some new tricks...
How LinkedIn Uses Scalding for Data Driven Product Development
Hadoop MapReduce is the most important technology behind Big Data, and Scala and MapReduce are a match made in heaven. Learn how Scalding, a Scala domain-specific language (DSL) for Hadoop development, is used to power data pipelines for some of the biggest messaging operations at LinkedIn to make them smarter and more personalized.
The Cake Pattern is a strongly typed solution for Dependency Injection (DI) and Aspect-Oriented Programming (AOP) using only the constructs available in the Scala programming language. The pattern was first explained by Martin Odersky and further illuminated by Jonas Bonér. In this talk, we will demonstrate practical ways to use the Cake Pattern. We will cover ways to handle scope, configuration, initialization, actors, modularization and how to manage the proliferation of boilerplate and obtuse compiler errors.
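A compact sketch of the pattern (with hypothetical component names, much simplified relative to what the talk will cover): each component declares its dependencies through a self-type, and a final object mixes in concrete implementations.

```scala
// Repository component: declares an abstract dependency slot.
trait UserRepositoryComponent {
  def userRepository: UserRepository
  trait UserRepository { def find(id: Int): String }
}

// Service component: its self-type demands a repository be mixed in.
trait UserServiceComponent { this: UserRepositoryComponent =>
  def userService: UserService
  class UserService {
    def greet(id: Int): String = s"hello, ${userRepository.find(id)}"
  }
}

// The "cake": mix the components together and supply concrete instances.
object App extends UserServiceComponent with UserRepositoryComponent {
  val userRepository = new UserRepository { def find(id: Int) = s"user-$id" }
  val userService = new UserService
}

val greeting = App.userService.greet(7)   // "hello, user-7"
```

Forgetting to mix in a required component is a compile-time error, which is the pattern's key advantage over runtime dependency-injection containers.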
Dropwizard is a framework for creating RESTful APIs quickly and easily. It pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done! In this talk, we will introduce many of the core concepts of Dropwizard by walking through a sample application written in Scala.
Stitch, an Applicative Functor for Composing RPC Services
In a service-oriented architecture (such as Twitter's), application code must orchestrate RPC calls to dozens of services. Good RPC interfaces tend not to be good application interfaces—they are driven by performance needs such as batching, and are limited by the least common denominator of language-agnostic RPC IDLs (e.g. Thrift). This mismatch can lead to serious architectural pain. Stitch is a Scala library used at Twitter which implements an applicative functor for composing RPC service calls (drawing inspiration from Facebook's Haxl), along with per-service adaptors which layer nice application interfaces over raw RPC interfaces. Stitch allows application code to be written in a clear, modular way, and also executed efficiently at the RPC level.
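The batching benefit of an applicative can be sketched in a few lines of plain Scala (a hypothetical toy, not Stitch's actual API): because the two sides of an applicative product are declared independently, their ids can be unioned into a single batched call.

```scala
// Toy fetch: the ids to request, plus how to read results from a batch.
final case class Fetch[A](ids: Set[Int], run: Map[Int, String] => A)

def user(id: Int): Fetch[String] = Fetch(Set(id), results => results(id))

// Applicative product: union the id sets so one batched RPC suffices.
// Monadic bind could not batch like this, because the second fetch may
// depend on the first fetch's result.
def join[A, B](fa: Fetch[A], fb: Fetch[B]): Fetch[(A, B)] =
  Fetch(fa.ids ++ fb.ids, results => (fa.run(results), fb.run(results)))

// Fake backend: answer every collected id in a single batched lookup.
def execute[A](f: Fetch[A]): A =
  f.run(f.ids.map(id => id -> s"user-$id").toMap)

val both = join(user(1), user(2))   // both.ids == Set(1, 2): one batch
val pair = execute(both)            // ("user-1", "user-2")
```

Application code composes small, readable fetches; the runtime sees only the unioned id set and can issue the minimal number of RPCs.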
Boosting Enterprise Software Innovation with Scala
So you love Scala, you're convinced of its benefits, and you want to use it at work to build key business-critical technologies. In fact, you think your entire team should adopt it! Yet you've already got ongoing project commitments and a roadmap you're working against. So how exactly do you convince your boss, executive team, and fellow engineers that this switch will be a great one for the business?

In this presentation, I'd like to share:

* how and why we selected Play and Scala to build a document sharing and collaboration platform that processes, renders, and indexes the contents of millions of document pages per day in near real time
* how over the last 2.5 years we transformed Nitro from a .NET shop to a Play 2 Scala shop
* how we transitioned existing web apps and cloud services for millions of monthly paid users relying on Nitro for key business workflows without ever skipping a beat
* how we used our new Play 2 Scala Nitro platform to launch several brand new services to millions of existing users, and how we prepared for that scale
* how we used new technologies to attract great talent, and how we used their help to retrain existing staff while working on rolling product releases
* an in-depth look at key architecture decisions that helped us rebuild without losing development speed
* lessons learned on how the Play 2 Scala framework and related Scala technologies can improve engineering team agility and final product stability, and why it is an investment that will pay back tenfold

This talk will help arm you in your pitch to convince business owners to invest in re-platforming on Play 2 Scala, excite and retrain existing staff, and hire new talent, all while serving growing business needs.
Kiji and Scalding: The Easy Way to Get Started with Data Science at Scale
While there are now many routes to start learning data science on a laptop, making the jump to cluster-sized data remains a hurdle. Using the Kiji project to get started, we will explore how to easily make the leap to big data. On this journey we'll explore Scalding, a Scala DSL that allows easy data manipulation and data science.
In this talk, we'll explore how Scala + Scalatra + Swagger can provide a fast, clean and maintainable interface between consumers and your back end. We’ll walk through the design and implementation with design-first tools along with the clean REST API of Scalatra.
Run Applications Like a Boss: Fault-Tolerant and at Scale with Marathon
Gone are the days when you could build an app that only needs to run on a few servers. Now every app needs to be like Google: highly distributed, fast, scalable, and fault-tolerant. DevOps teams everywhere have been wasting time building their own custom scale-out architectures for web apps. Marathon is a framework written in Scala and built on Apache Mesos that simplifies and automates operations, and provides a simple self-serve interface for developers to launch their apps on a shared cluster in a scalable and fault-tolerant way.
From code to dashboards: Monitoring a Reactive Application with Kamon.
During this talk we will take a very simple Spray/Akka application and start getting metrics out of it using Kamon! While doing so, we will explore the challenges associated with gathering metrics from reactive applications that might process millions of events per second and how Kamon solves these problems.
The Java platform defines a notion of universal equality in the equals method of java.lang.Object: because the type of the parameter passed to equals is java.lang.Object, any object can be compared for equality with any other object. This allows different types that represent the same (or compatible) concepts to be compared for equality (such as Int and BigInt, or Vector[Int] and List[Int]), but also allows comparisons to compile that will always fail at runtime (such as Int and String). This talk will show the approach taken by the Scalactic library (scalactic.org) to provide a === operator that allows cooperative equality between different types where appropriate, but rejects at compile time equality comparisons that would always fail. You'll see how Scalactic uses implicits, type classes, and higher kinded types to achieve its safe equality comparisons and how the ScalaTest library uses Scalactic to provide type-safe equality comparisons in tests.
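The flavor of the technique can be sketched with a small type class in plain Scala (a simplified, hypothetical version with made-up names; Scalactic's actual implementation is more involved): an implicit witness must exist for a comparison to compile.

```scala
// Hypothetical sketch: EqCheck[A, B] witnesses that comparing A to B is
// meaningful; without an instance in scope, === does not compile.
trait EqCheck[A, B] { def areEqual(a: A, b: B): Boolean }

// Any type can be compared with itself.
implicit def sameType[A]: EqCheck[A, A] =
  new EqCheck[A, A] { def areEqual(a: A, b: A) = a == b }

// Opt-in cooperative equality between compatible types.
implicit val intBigInt: EqCheck[Int, BigInt] =
  new EqCheck[Int, BigInt] { def areEqual(a: Int, b: BigInt) = BigInt(a) == b }

implicit class EqOps[A](a: A) {
  def ===[B](b: B)(implicit ev: EqCheck[A, B]): Boolean = ev.areEqual(a, b)
}

val sameInts = 1 === 1            // true
val crossTypes = 1 === BigInt(1)  // true, via the opt-in instance
// 1 === "one"                    // does not compile: no EqCheck[Int, String]
```

The always-failing Int-to-String comparison that universal equality would silently accept becomes a compile-time error, while the useful Int-to-BigInt comparison remains available by opting in.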
Fighting spam is critical for a clean user experience. At Tagged, the security team leverages open source software like Kafka, Storm, Scalding, and Play to deploy our spam models. Our real-time system is powered by Kafka and Storm and uses minimal resources to process over one billion daily events within one second of each event’s occurrence. Scala is the glue binding the many technologies in this and other systems, enabling Tagged to deploy complex models iteratively and quickly.
Contact Us about student discounts for the conference.
Be a supporting member of San Francisco's premier Scala conference. We want to hear from you! Contact us for a prospectus and sponsorship agreement, or to talk about how we can help you be a contributing sponsor for the Scala By The Bay conference!