Rich Data Summit

The leading conference focused on turning big data into rich, meaningful data

October 14, 2015 / The Village, San Francisco

Buy Tickets

Update: Our schedule has been finalized


Join world-class speakers, data scientists, investors, and data geeks of all stripes at the Rich Data Summit, the world's leading conference focused on turning big data into rich, meaningful data.

The Rich Data Summit will feature engaging presentations, panel discussions, smaller breakout sessions, and workshops, as well as a lively data solutions expo full of exhibitors and sponsors. Located in the heart of San Francisco, the Rich Data Summit is a one-day event, with keynotes by Nate Silver, Beth Simone Noveck, and Monica Rogati.

Presented by CrowdFlower

The leader in people-powered data enrichment.



  1. Day One: October 13, 2015

  2. 6.00 pm - 8.00 pm
    Welcome Reception & Registration

    Avoid the crowds on October 14th and register early, while also meeting other like-minded data-driven professionals at our welcome reception.

  1. Day Two: October 14, 2015

  2. 8.15 am - 9.15 am
    Registration, Breakfast & Data Solutions Exhibition Opens
  3. 9.15 am - 9.45 am
    Welcome & Opening Keynote: Lukas Biewald, Founder & CEO, CrowdFlower
  4. 9.45 am - 10.30 am
    Summit Keynote: Nate Silver – The Signal and the Noise

    Nate will chat about his experiences forecasting politics and baseball, and how the use of powerful, clean data both paves the way to real answers and predictions and saves us from the chattering pundits and talking heads.

  5. 10.30 am - 10.45 am
    Fireside Chat: Nate Silver & Lukas Biewald

    Join Nate and Lukas as they talk about why rich data is taking the place of big data.

  6. 10.45 am - 11.15 am
    Andreas Weigend – The Social Data Revolution

    Humans have always been a social species, but as we become increasingly technological, our everyday interactions are being both implicitly and explicitly recorded as data points. Andreas's talk will look at how this data of the people, created by the people, can best be used for the people. In other words, what exactly do we do with all this new data? What are the implications of recording our every interaction? And where do we draw the lines?

  7. 11.15 am - 11.30 am
    Morning Break
  1. Discovery Stage

  2. 11.30 am - 11.45 am
    Alice Zheng – The Unbearable Richness of (Multi-Modal) Data

    Big data is not just big, it's also rich in diversity. Alice Zheng, Director of Data at Dato, will illustrate the use of multi-modal data in machine learning (image, text, and user-item interactions) and how these successive layers of data will take us from finding similar items to building a personalized recommender.

    11.30 am - 11.45 am
    James Rubinstein – Crowdsourcing for Fun and Profit

    James has run massive data initiatives at Apple, eBay, and now Pinterest. In this practical talk, James will be drawing on his background in experimental design and statistics to show some best practices for large-scale categorization efforts.

  3. 11.45 am - 12.00 pm
    Silvanus Lee – Data Science at Uber

    Silvanus will talk about how data science informs all aspects of Uber's product - from optimizing the app experience for riders and drivers, to using statistical models to understand marketplace dynamics.

    11.45 am - 12.00 pm
    John Stokvis – Reframing the Problem

    Learn how John (Lead Analyst, Global Merchant Data) and the team at Groupon made huge strides, simply by thinking differently about the problem they were trying to solve.

  4. 12.00 pm - 12.15 pm
    Josh Wills – Defensive Data Analysis w/Apache Spark

    Dirty data is a fact of life for the vast majority of data scientists. Josh will explain how experienced data scientists analyze data defensively, anticipating potential causes of invalid data and adding safeguards to their code to catch errors early in the process.

    12.00 pm - 12.15 pm
    Chris Lightner – Real-World Sentiment Analysis

    Chris will highlight ways to leverage the crowd to create efficiencies and optimize the process of extracting insights from online conversation, social buzz and media coverage. This session will showcase specific case studies where the crowd was used to optimize product launch announcement schedules and in-market campaigns in real time, understand purchase intent of products, and much more.

  5. 12.15 pm - 12.30 pm
    Wendy Kan – Can Data Science Really Do That?

    Kaggle has worked on problems ranging from predicting when somebody will have a seizure to using algorithms to grade high school essays. This talk will draw on Wendy's experience as a Kaggle Data Scientist to survey what's possible at the limits of machine learning.

    12.15 pm - 12.30 pm
    Radha Basu – Human-Empowered Data

    Radha, founder and CEO of iMerit, will tell the story of how she built one of the leading digital services companies, one that delivers 95%+ customer satisfaction ratings, by sourcing and upskilling employees from the unlikeliest talent pool: marginalized youth and women across India.

  6. 12.30 pm - 1.30 pm
    Lunch & Data Solutions Exhibition
  7. 1.30 pm - 1.45 pm
    Benn Stancil – How to Make Rich Data Meaningful with Storytelling

    Data analysis is about so much more than the numbers. To tell a compelling story with data, you must understand your audience, the context from which the data originates, and countless other nuances. What can data analysts and data scientists embedded in organizations learn from journalism about telling better stories with data?

    1.30 pm - 1.45 pm
    Bruce Smith – How to Get Buy-In for Crowd-Based Data Science

    Every company loves data science these days, but data scientists often meet skepticism when they employ novel tools. In this talk, Bruce shares lessons about getting buy-in to use crowd-based data for search relevance, from everyone in your organization, not just the C-suite.

  8. 1.45 pm - 2.00 pm
    Ian Paterson – Monetizing Data: What's Really Possible

    Companies are creating data exhaust as a by-product of doing business, but few are taking that data and extracting value on the back end. Ian's talk will focus on what possibilities exist to monetize data and how to get started.

    1.45 pm - 2.00 pm
    Tim Converse – Human-in-the-Loop Search Relevance

    Learn how Tim builds world-class internal search engines by combining the best of human and machine intelligence. Tim will cover how to create training sets with real people, how to optimize based on the right metrics, and why search is so important to ecommerce.

  9. 2.00 pm - 2.15 pm
    Catherine Bracy – Civic Hacking

    Catherine will draw upon her wealth of experience from Obama for America and her current role at Code for America to show how democratizing engineering opens doors that would previously be shut, as well as how tech can help fix what's broken in government.

    2.00 pm - 2.15 pm
    Daniela Braga – Can Crowdsourcing and Data Privacy Coexist?

    What we say and feel often gets captured by our connected devices, and we need this data to improve people’s lives. In the speech technology world, this is an especially sensitive issue, since users are often not aware that their data may be analyzed. Daniela will show how that data can be both valuable and private.

  10. 2.15 pm - 2.30 pm
    George Mathew – Analytic Enrichment in the Wild

    The 2012 elections were a turning point for Analytic Enrichment. Never before has so much behavioral data been enriched with demographic and vote preference data to create a richer customer analytic profile for targeting. The Left stole the election through better profiling and targeting. In 2016, the Right has learned its lesson and is coming back in spades. Join the talk to hear more about the Analytic Enrichment process for the 2016 elections and several other use cases.

    2.15 pm - 2.30 pm
    John Stafford – Managing Your Reputation in the Era of Big Data

    Stanford University is one of the world's leading teaching and research institutions and, as a result, it is also one of the most discussed universities on the social web. In this session, John will show how Stanford uses data enrichment to track its reputation, perform sentiment analysis, and test messaging on social networks.

  11. 2.30 pm - 2.45 pm
    Afternoon Break
  12. 2.45 pm - 3.00 pm
    The State of Data Science

    Data science is a fast-moving industry subject to constant change and innovation. Come hear perspectives on the state of data science from a panel of CrowdFlower customers and data science practitioners. The participants will be Bruce Smith (Data Scientist at Intuit), Daniela Braga (Director of Data Science & Crowdsourcing at Voicebox), and Tim Converse (Head of Search Science Engineering at eBay). The session will be moderated by Jack Shay (VP Product, CrowdFlower).

    2.45 pm - 3.00 pm
    Eric Schles – Building a System that Understands Slavery

    In this talk, Eric will go over his time at the Manhattan DA's office and explain how he found patterns of slavery. He'll demo how to integrate closed and open data sources in one complete system using web scraping, natural language processing, address identification, and data transformation, specifically pulling information out of PDFs.

  13. 3.00 pm - 3.15 pm
    Customer Panel – Continued
    3.00 pm - 3.15 pm
    David Gerster – Anomaly Detection using the Isolation Forest Algorithm

    Anomaly detection can provide clues about an outlying minority class in your data: hackers in a set of network events, fraudsters in a set of credit card transactions, or exotic particles in a set of high-energy collisions. In this talk, David will analyze a real dataset of breast tissue biopsies, with malignant results forming the minority class.

  14. 3.15 pm - 3.30 pm
    Robert Munro – Adaptive Learning: The Next Generation of Machine Intelligence

    Adaptive learning optimizes both human and machine learning to maximize the accuracy of scalable data processing as efficiently as possible. For the auto industry, adaptive learning is 95% accurate in predicting when social media communications are about purchasing cars, which correlates with actual monthly car sales for the majority of car models we have investigated.

    3.15 pm - 3.30 pm
    AJ Welch – Relational and Non-Relational Databases

    Relational databases have long been an integral tool in producing rich data; however, over the past decade we've seen an incredible rise in non-relational technologies. AJ will talk about how open source databases like Postgres make it possible to get the best of both worlds.

  15. 3.30 pm - 3.45 pm
    Richard Socher – Deep Learning for the Enterprise

    Deep learning has revolutionized several industries with its state-of-the-art results in speech recognition, image classification, and natural language understanding. Richard will cover solutions for visual object classification in images, sentiment classification, automated question answering, and marketing analysis.

    3.30 pm - 3.45 pm
    Raul Garreta – Natural Language Processing and Human Training Sets

    In this talk, Raul (CEO & Co-Founder of MonkeyLearn) will discuss how data enrichment and machine learning platforms best complement each other. Specifically, Raul will look at how human-tagged sentiment data can be used to train and test machine learning algorithms.

  16. 3.45 pm - 4.00 pm
    Amanda Kahlow – The Dawn of the Data-Driven Enterprise

    Enterprise companies have untapped potential when it comes to utilizing data to drive better decision-making, and that’s where predictive intelligence comes into play. It enables enterprise marketing and sales teams to become data-driven ones. Amanda will touch on how every part of an organization can use predictive intelligence to succeed.

    3.45 pm - 4.00 pm
    Gregor Stewart – How Humans and Machines Can Work Together

    Gregor (VP Product Management at Basis Technology) will talk about Basis Technology's approach to using data from your own domain, combined with human feedback, to adapt an off-the-shelf model to deliver better results in less time. Specifically, Gregor will look at how both unannotated and annotated data can be used to improve entity extraction models.

  17. 4.00 pm - 4.15 pm
    Afternoon Break
  18. 4.15 pm - 4.45 pm
    Fireside Chat – Data Preparation and Data Science

    Join Ben Lorica (Chief Data Scientist at O'Reilly), Lukas Biewald (CEO of CrowdFlower), and Joe Hellerstein (Co-Founder of Trifacta) as they discuss the state of data preparation today and where they see it moving tomorrow.

  19. 4.45 pm - 5.15 pm
    Closing Keynote – Beth Simone Noveck – The Power of Open Data and Open Government

    Beth will draw on her wealth of knowledge from years as the deputy CTO for open government to show how open data transformed government policy.

  20. 5.15 pm - 5.45 pm
    Closing Keynote – Monica Rogati – Context, Data, and Connected Things

    We increasingly expect the world around us to be “smart” and seamlessly adapt to our taste and habits. These expectations are no longer about better ads and search results, but about everyday objects that are intelligent, connected and integrated into our lives. Monica will focus on the role that data plays in enabling us to understand context—with examples drawn from wearables, health and recommender systems.

  21. 5.45 pm
    Closing Remarks
  22. 6.00 pm
    Closing Reception


Both Galvanize and Metis will be holding workshops on the third floor near the Applications stage. Here's what will be covered:

12.45 pm - 1.30 pm

Galvanize Workshop

Better Search at Scale: Leveraging Spark for Contextual NLP
Hosted by Johnathan Dinu

Search allows us to turn the wealth of information on the internet into knowledge, or something we can actually use. In this workshop, you will see how to return more relevant results by contextualizing text with neural word embeddings (word2vec and doc2vec).

1.45 pm - 2.30 pm

Metis Workshop

Going Where the Unicorns Roam:
Entering the Field of Data Science
Hosted by Jason Moss

Metis runs immersive courses, designed by world-class industry practitioners, with the aim of accelerating data science careers. This course will focus on unlocking the skills younger data scientists need to break into the industry.

Languages, Topics, and Tools

The Rich Data Summit is about practical solutions for data scientists. As a profession, we spend too much time cleaning data and too little time actually analyzing it. Here are some of the languages, topics, and tools we'll be covering to give data scientists more time to do the work they really love doing.


  • R
  • SQL
  • Python
  • SAS
  • RapidMiner
  • Julia


  • Data Enrichment
  • ETL/Blending
  • Data Integration
  • Predictive Modeling
  • Machine Learning


  • Spark
  • Hadoop
  • MongoDB
  • Redis
  • Kettle



Rich Data Summit Full Pass (October 14th, 2015)

Full access to summit | Workshop Sessions | Cocktail Receptions | Data Solutions Exhibition


Keynote and Data Solutions Exhibition Only (October 14th, 2015)

Morning Keynote | Afternoon Keynote | Data Solutions Exhibition | Cocktail Receptions


Student Ticket (October 14th, 2015)

Full access to summit | Workshop Sessions | Data Solutions Exhibition 


Buy Tickets


  • Where does the Rich Data Summit take place?

    The Rich Data Summit takes place at The Village located at 969 Market Street, San Francisco, CA.

  • When is the Rich Data Summit?

    Rich Data Summit is taking place on October 14th, 2015.

  • What is the format of the Rich Data Summit?

    There will be two tracks at the Rich Data Summit: a conference track with engaging presentations and panel discussions, and smaller breakout sessions where you can see and hear more practical examples from data scientists in the trenches.

  • How do I get tickets?

    Simply select your ticket and register on the Eventbrite website.

  • When can I register and pick up my pass?

    You will have two opportunities to register and pick up your summit pass. The first will be on October 13th, right before the cocktail reception, and the second will be on October 14th, starting at 7.30 am.

  • Are there social events in the evening before the event?

    There will be a welcome cocktail reception on October 13th and then a closing reception on October 14th.

  • I'm interested in sponsoring the event, who do I contact?

    For sponsorship queries, please contact

  • I'm interested in a media pass for the event, who do I contact?

    For information on media at the Rich Data Summit, please contact


Questions regarding sponsorship?

Sponsorship offers you the opportunity to build your brand and connect with 500+ chief data officers, data scientists, data-driven executives, investors, entrepreneurs, and other data heads.

Interested in speaking?

We'd love to hear from data-driven professionals who love a crowd. To submit your proposal, we'll need a working title, a brief description, the name of the presenter, and any examples you can share of previous speaking engagements.

Submit your proposal

Direct all general inquiries here.

Questions about parking, location, arriving in San Francisco, what to expect, speakers, setting up meetings with CrowdFlower, what will be provided on site, or for any other sticking points? Just reach out to our friendly team and we'll get back to you. 


Rich Data Summit

October 14, 2015 / The Village

969 Market Street
San Francisco, CA 94103