Python is the perfect language for building, with little effort, a framework to control and hack the growth of a company. Having been import.io's data scientist for the last 2 years, I've come across many differ...
Today data is generated in greater volumes than ever before. In addition to vast amounts of legacy data, new data sources such as application logs or social media complicate data-processing challen...
Pillreports.net is an on-line database of reviews of Ecstasy pills. In consumer theory, illicit drugs are experience goods, in that the contents are not known until the time of consumption. Websites...
In this talk I will present experiences of using a combination of Hadoop and Python to build pipelines that process large amounts of textual hotel reviews in more than a dozen languages. In parti...
Classification, in the context of machine learning, deals with the problem of predicting the class of a set of examples given their features. Traditionally, classification methods aim at minimizing...
In this tutorial we will give an introduction to two advanced data storage formats. HDF5 and NetCDF were designed to efficiently store the results of supercomputing applications like climate model ...
Apache Spark is a computational engine for large-scale data processing. It is responsible for scheduling, distributing, and monitoring applications that consist of many computational tasks across ma...
Probabilistic Programming and Bayesian Methods are called a new paradigm by some. There are numerous interesting applications, for example in Quantitative Finance. I'll discuss what probabilistic program...
T. Davenport and DJ Patil pointed out as early as 2012 that Data Scientists are working in the “sexiest job of the 21st century”. Although there are plenty of ideas about what a Data Scientist...
A very simple tutorial, ideally aimed at beginners in both Docker and scientific Python who wish to learn the basics to be able to create and manage their own development environments, using Doc...
A lot of recent studies have investigated how happy and engaged people are at work. They found that a big influence is the team atmosphere and the relationship you have with your boss. Being engag...
For all e-commerce sites, marketing is a big part of the business and marketing efficiency and effectiveness are critical to their success. Companies must make many data-driven decisions in order t...
When it comes to ownership, the internet is broken. Artists, designers, and other creatives can share their work easily on the internet, but keeping it as "theirs" and getting fairly compensated has pr...
Many devices can measure acceleration and rotation rates. With the right features, Machine Learning can predict whether you are sitting, running, walking, or riding a bike. This talk will show y...
Python is a great language. But it can be slow compared to other languages for certain types of tasks. If applied appropriately, optimization may reduce program runtime or memory consumption consid...
The city of Dresden has an excellent traffic monitoring and guiding system (VAMOS), which also measures the occupancy rate of city parking spaces. The data is pushed to the city's website, from whi...
In recent years, the adoption of electric cars has resulted in a desperate need from carmakers for accurate range prediction. In addition, fuel efficiency is of increasing concern due to today’s ev...
In many of our Data Science customer engagements at Pivotal, the question arises of how to put the developed Data Science models into production. Usually, the code produced by the Data Scientist is...
Instrumentation has seen explosive adoption on the cloud in recent years. With the rise of micro-services we are now in an era where we measure the most trivial events in our systems. At Trademob, ...
Blosc is a fast metacodec with two main features: the shuffle filter and threading. The shuffle filter, which is implemented using SSE2 instructions, allows reordering bytes to reduce the complexit...
Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. It provides elegant, concise construction of novel graphics in the style of D3.js without havi...
Although not exactly a classical big data application, the numerical treatment of partial differential equations (PDEs) has very similar characteristics: By spatial discretization, the continuous p...
PyData Dallas 2015 Scikit-Learn is one of the most popular machine learning libraries written in Python; it has quite an active community and extensive coverage for a number of machine learning algorith...
PyData Dallas 2015 "Briefly, an open-source project designed to tackle the challenge of simultaneously handling the flow of Hadoop and non-Hadoop tasks. In short, Briefly is a Python-based, meta-pro...
PyData Dallas 2015 "We use the Blaze and Bokeh libraries to interactively query and visualize large datasets through Python. Blaze provides a consistent query experience on data ranging from a smal...
PyData Dallas 2015 In this hands-on tutorial, we walk you through the steps to build and deploy a sentiment classifier in Python. The task is to learn to classify reviews from the Yelp! reviews dat...
PyData Dallas 2015 Presentation on the ways The Dallas Morning News is using Python in newsgathering and presentation, including our major effort to train reporters to code in Python (20% of the st...
PyData Dallas 2015 Politicians make claims of "facts" all the time. Oftentimes there are false and misleading claims on important topics, due to careless mistakes and even deliberate manipulation o...
PyData Dallas 2015 Blaze is a library for harnessing the power of big data technologies. We show motivating use cases illustrating why you might want to use Blaze, including a comparison of out-of-...
PyData Dallas 2015 H2O – Now with a Python interface! It’s open-source Machine Learning, in-memory big-data clustered computing – Math At Scale. H2O has the World's Fastest Logistic Regression (by a ...
PyData Dallas 2015 Whether you are modelling an earthquake, hurricane, or medical device, Python is there. The language has become so ubiquitous in scientific research that it is the go-to tool. In thi...
PyData Dallas 2015 The `NumPy` model of computation in Python has proven to be one of the most successful ways to integrate high-performance computational code into an application. This talk offers...
PyData Dallas 2015 "We set out to build a fully scalable distributed SQL platform but quickly realized that use cases at that scale were much more complex than simply joining data and easy to diges...
PyData Dallas 2015 IPython has given us novel ways of interacting with our code, data, documentation, and reporting. It has enabled collaboration over common open source formats and APIs. Jupyter i...
PyData Dallas 2015 We all learned to program in a particular way: you may have started out using Basic, Pascal, C, or Fortran (anyone?). If you're younger, maybe Java was your first language or maybe you ca...
PyData Dallas 2015 Much of the $15.2T commercial real estate (CRE) world is closed and clandestine. This has held the industry back from adopting technological progress, creating inefficiencies acr...
PyData Dallas 2015 Machine Learning should be everywhere. Applications today have the opportunity to leverage all the data being collected about users' interactions and behavior. Unfortunately mach...
PyData Dallas 2015 "A/B testing and control-group testing are very well-known techniques to learn about the market and consumer preferences. In reality, however, lots of companies make incorrect co...
PyData Dallas 2015 Meltem will share her story with Terastructure from the inception of the idea to the realization of a commercially viable product in partnership with the University of North Caro...
PyData Dallas 2015 Machine learning is hard. Machine learning at scale is even harder. Scaling up machine learning requires not just advances in algorithm implementation, but also more scalable dat...
PyData Dallas 2015 Interval computing can play an important role in data analysis. In this talk, the speaker will introduce interval computation and its applications in data analysis. A Python modu...
PyData Dallas 2015 We have been seeing increasing use of hashtags in social media. How can businesses leverage the power of hashtags? By opting to use specific hashtags, customers may be expressing...
PyData Dallas 2015 "What is it you are curious about? If you are more curious about your data than your tools, then Python is for you. If you are more curious about your tools than your data, then ...