Data Science Courses - Page 55

Showing results 541-550 of 1407
Big Data - Capstone Project
Welcome to the Capstone Project for Big Data! In this culminating project, you will build a big data ecosystem using tools and methods from the earlier courses in this specialization. You will analyze a data set simulating big data generated from a large number of users who are playing our imaginary game "Catch the Pink Flamingo". During the five-week Capstone Project, you will walk through the typical big data science steps for acquiring, exploring, preparing, analyzing, and reporting. In the first two weeks, we will introduce you to the data set and guide you through some exploratory analysis using tools such as Splunk and Open Office. Then we will move into more challenging big data problems requiring the more advanced tools you have learned, including KNIME, Spark's MLlib and Gephi. Finally, during the fifth and final week, we will show you how to bring it all together to create engaging and compelling reports and slide presentations. As a result of our collaboration with Splunk, a software company focused on analyzing machine-generated big data, learners with the top projects will be eligible to present to Splunk and meet Splunk recruiters and engineering leadership.
Enjoyable Econometrics
The goal of this MOOC is to show that econometric methods are often needed to answer questions. A question comes first, then data are to be collected, and then finally the model or method comes in. Depending on the data, however, it can happen that methods need to be adapted. For example, where we first look at two variables, later we may need to look at three or more. Or, when data are missing, what then do we do? And, if the data are counts, like the number of newspaper articles citing someone, then matters may change too. But these modifications always come last, and are considered only when relevant. An important motivation for me to make this MOOC is to emphasize that econometric models and methods can also be applied to more unconventional settings, which are typically settings where the practitioner has to collect his or her own data first. Such collection can be done by carefully combining existing databases, but also by holding surveys or running experiments. A byproduct of having to collect your own data is that this helps to choose amongst the potential methods and techniques that are around. If you are searching for a MOOC on econometrics that treats (mathematical and statistical) methods of econometrics and their applications, you may be interested in the Coursera course “Econometrics: Methods and Applications” that is also from Erasmus University Rotterdam.
Overview of Data Visualization in Microsoft Excel
After finishing this project, you will have learned some basic rules about data visualization and can apply them whenever you create charts. Today, one can find data visualization in a wide range of fields. Businesses show graphs to report on revenue, police departments create maps of crimes in their jurisdiction, and on the website for the city hall, you can likely find visual comparisons of people who moved to the city and those who left the city. For this reason, it is important for a lot of people to know the basics of data visualization.
A Geometrical Approach to Genome Analysis: Skew & Z-Curve
In this 1-hour long project-based course, you will learn how to analyze a complete viral genome using geometrical methods (skews and Z-curve) and 2D- and 3D-plotting in Python, and how to use some important Python libraries (like Tkinter, Matplotlib, and NumPy) that help you accomplish this. You will also learn about the genomes of some viruses, including corona, SARS, HIV, Zika, nidovirus, and rubella viruses.
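As a rough illustration of the two geometrical methods named above, the sketch below computes a cumulative GC skew and the Z-curve coordinates for a short sequence. It is not taken from the course materials: the function names and the toy sequence are hypothetical, it assumes a DNA alphabet (A, C, G, T), and it treats any other character as a zero step.

```python
# Minimal sketch (not the course's code): cumulative GC skew and Z-curve
# for a DNA string. Function names and the toy sequence are hypothetical.

def cumulative_gc_skew(seq):
    """Running count of G minus C along the sequence."""
    skew, total = [], 0
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        skew.append(total)
    return skew


# Z-curve step per base:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak hydrogen bond (A, T) vs strong hydrogen bond (C, G)
Z_STEPS = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Cumulative (x, y, z) coordinates, one point per base."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = Z_STEPS.get(base, (0, 0, 0))  # unknown bases: no step
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    toy = "GGCAAGCT"  # toy sequence, not a real genome
    print(cumulative_gc_skew(toy))
    print(z_curve(toy))
```

The resulting lists can be passed to a plotting library such as Matplotlib: the skew as a 2D line and the Z-curve points as a 3D line.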
Demand Forecasting Using Time Series
This course is the second in a specialization for Machine Learning for Supply Chain Fundamentals. In this course, we explore all aspects of time series, especially for demand prediction. We'll start by gaining a foothold in the basic concepts surrounding time series, including stationarity, trend (drift), cyclicality, and seasonality. Then, we'll spend some time analyzing correlation methods in relation to time series (autocorrelation). In the second half of the course, we'll focus on methods for demand prediction using time series, such as autoregressive models. Finally, we'll conclude with a project, predicting demand using ARIMA models in Python.
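To make the autocorrelation idea concrete, here is a minimal sketch of the sample autocorrelation at a given lag, written in plain Python. It is an illustration only, not the course's code: the function name and the toy demand series are hypothetical, and a real project would more likely use a statistics library.

```python
# Minimal sketch (not the course's code): sample autocorrelation at lag k.
# The function name and the demand numbers below are made up.

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator


if __name__ == "__main__":
    # Toy demand series that repeats every three periods.
    demand = [1, 2, 3, 1, 2, 3, 1, 2, 3]
    for k in range(4):
        print(k, round(autocorrelation(demand, k), 3))
```

For a series with a repeating pattern, the autocorrelation is comparatively high at the lag that matches the period, which is one way seasonality shows up before fitting an autoregressive model.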
Statistical Inference
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data.
Using TensorFlow with Amazon Sagemaker
Please note: You will need an AWS account to complete this course. Your AWS account will be charged as per your usage. Please make sure that you are able to access Sagemaker within your AWS account. If your AWS account is new, you may need to ask AWS support for access to certain resources. You should be familiar with Python programming and have some experience with Amazon Web Services (AWS) before starting this hands-on project. We use a Sagemaker P type instance in this project, and if you don't have access to this instance type, please contact AWS support and request access. In this 2-hour long project-based course, you will learn how to train and deploy an image classifier created and trained with the TensorFlow framework within the Amazon Sagemaker ecosystem. Sagemaker provides a number of ready-to-use machine learning algorithms for a variety of tasks. However, it is possible to use Sagemaker for custom training scripts as well. We will use TensorFlow and Sagemaker's TensorFlow Estimator to create, train and deploy a model that will be able to classify images of dogs and cats from the popular Oxford IIIT Pet Dataset. Since this is a practical, project-based course, we will not dive into the theory behind deep learning based image classification, but will focus purely on training and deploying a model with Sagemaker and TensorFlow. Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.
Recommender Systems Capstone
This capstone project course for the Recommender Systems Specialization brings together everything you've learned about recommender systems algorithms and evaluation into a comprehensive recommender analysis and design project. You will be given a case study to complete where you have to select and justify the design of a recommender system through analysis of recommender goals and algorithm performance. Learners in the honors track will focus on experimental evaluation of the algorithms against medium-sized datasets. The standard track will include a mix of provided results and spreadsheet exploration. Both groups will produce a capstone report documenting the analysis, the selected solution, and the justification for that solution.
Statistics For Data Science
This is a hands-on project to give you an overview of how to use statistics in data science.
Publishing Visualizations in R with Shiny and flexdashboard
Data visualization is a critical skill for anyone who routinely uses quantitative data in his or her work - which is to say that data visualization is a tool that almost every worker needs today. One of the critical tools for data visualization today is the R statistical programming language. Especially in conjunction with the tidyverse software packages, R has become an extremely powerful and flexible platform for making figures, tables, and reproducible reports. However, R can be intimidating for first-time users, and there are so many resources online that it can be difficult to sort through without guidance. This course is the fourth in the Specialization "Data Visualization and Dashboarding in R." Learners will come to this course with a strong background in making visualizations in R using ggplot2. To build on those skills, this course covers creating interactive visualizations using Shiny, as well as combining different kinds of figures made in R into interactive dashboards.