
Data Science Courses - Page 92

Mastering Data Analysis in Excel
Important: The focus of this course is on math - specifically, data-analysis concepts and methods - not on Excel for its own sake. We use Excel to do our calculations, and all math formulas are given as Excel spreadsheets, but we do not attempt to cover Excel macros, Visual Basic, pivot tables, or other intermediate-to-advanced Excel functionality.

This course will prepare you to design and implement realistic predictive models based on data. In the final project (module 6) you will assume the role of a business data analyst for a bank and develop two different predictive models to determine which applicants for credit cards should be accepted and which rejected. Your first model will focus on minimizing default risk, and your second on maximizing bank profits. The two models should demonstrate to you, in a practical, hands-on way, that your choice of business metric drives your choice of an optimal model.

The second big idea this course seeks to demonstrate is that your data-analysis results cannot and should not aim to eliminate all uncertainty. Your role as a data analyst is to reduce uncertainty for decision-makers by a financially valuable increment while quantifying how much uncertainty remains. You will learn to calculate, and apply to real-world examples, the most important uncertainty measures used in business, including classification error rates, entropy of information, and confidence intervals for linear regression.

All the data you need is provided within the course, all assignments are designed to be done in MS Excel, and you will learn enough Excel to complete them. The course will give you enough practice with Excel to become fluent in its most commonly used business functions, and you'll be ready to learn any other Excel functionality you might need in the future (module 1). The course does not cover Visual Basic or pivot tables, and you will not need them to complete the assignments. All advanced concepts are demonstrated in individual Excel spreadsheet templates that you can use to answer the relevant questions. You will emerge with a substantial vocabulary and practical knowledge of how to apply business data-analysis methods based on binary classification (module 2), information theory and entropy measures (module 3), and linear regression (modules 4 and 5), all using no software tools more complex than Excel.
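The course works its uncertainty measures entirely in Excel templates; as a rough outside-the-course illustration only, the sketch below computes two of the measures named above (classification error rate and binary entropy) in Python, using made-up confusion counts for a hypothetical credit-approval classifier.

```python
# Minimal sketch (not course material): two of the uncertainty measures named
# above, computed in Python instead of the course's Excel templates.
# The counts below are invented example values.
import math

# Hypothetical confusion counts for a binary credit-approval classifier
true_pos, false_pos, true_neg, false_neg = 120, 30, 800, 50
total = true_pos + false_pos + true_neg + false_neg

# Classification error rate: share of applicants classified incorrectly
error_rate = (false_pos + false_neg) / total

# Binary entropy of the default outcome: uncertainty (in bits) that remains
# about an applicant before the model sees any features
p_default = (true_pos + false_neg) / total
entropy = -(p_default * math.log2(p_default)
            + (1 - p_default) * math.log2(1 - p_default))

print(f"error rate: {error_rate:.3f}")
print(f"entropy of default outcome: {entropy:.3f} bits")
```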
Analyze Datasets and Train ML Models using AutoML
In the first course of the Practical Data Science Specialization, you will learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you will analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier. You will then perform automated machine learning (AutoML) to automatically train, tune, and deploy the best text-classification algorithm for the given dataset using Amazon SageMaker Autopilot. Next, you will work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.

Practical data science is geared towards handling massive datasets that do not fit in your local hardware and could originate from multiple sources. One of the biggest benefits of developing and running data science projects in the cloud is the agility and elasticity the cloud offers to scale up and out at minimum cost. The Practical Data Science Specialization helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker. This Specialization is designed for data-focused developers, scientists, and analysts who are familiar with the Python and SQL programming languages and who want to learn how to build, train, and deploy scalable, end-to-end ML pipelines - both automated and human-in-the-loop - in the AWS cloud.
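The Specialization itself uses SageMaker Clarify, Data Wrangler, Autopilot, and BlazingText; the sketch below is only a local, scaled-down stand-in that makes the underlying task concrete: multi-class text classification, here with scikit-learn and an invented toy dataset rather than the SageMaker tools described above.

```python
# Local stand-in (not the SageMaker workflow above): a multi-class text
# classifier built with scikit-learn, just to make the task concrete.
# The tiny dataset below is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "terrible product, broke after one day",
    "not great, not awful",
    "absolutely love it, works perfectly",
    "waste of money",
    "it is okay for the price",
    "best purchase I have made this year",
]
sentiment = [0, 1, 2, 0, 1, 2]  # 0 = negative, 1 = neutral, 2 = positive

# TF-IDF features plus a logistic-regression classifier in one pipeline
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews, sentiment)

print(model.predict(["this was a complete waste of money"]))
```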
Artificial Intelligence Ethics in Action
AI Ethics research is an emerging field, and to prove your skills you need to demonstrate critical thinking and analytical ability. Since it's not reasonable to jump straight into a full research paper with newfound skills, you will instead work on three projects that demonstrate your ability to analyze ethical AI across a variety of topics and situations. These projects draw on all the skills you've learned in this AI Ethics Specialization.
Getting Started with BigQuery Machine Learning
This is a self-paced lab that takes place in the Google Cloud console. In this lab, you'll learn how to use BigQuery to create a machine learning model that predicts whether a visitor will make a transaction.
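For a rough idea of what the lab's console workflow looks like in code, here is a hedged sketch that issues a BigQuery ML CREATE MODEL statement from the Python client; the project ID is a placeholder, and the query follows the pattern of Google's public Analytics sample tables, so treat the table and column names as approximate rather than the lab's exact resources.

```python
# Rough sketch of the same idea outside the console: creating a BigQuery ML
# model from Python. The project ID is a placeholder, and the table/column
# names follow the public Google Analytics sample data pattern.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder project

create_model_sql = """
CREATE OR REPLACE MODEL `your_dataset.visitor_transactions_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  device.operatingSystem AS os,
  device.isMobile AS is_mobile,
  geoNetwork.country AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
"""

client.query(create_model_sql).result()  # waits for the model to be created
```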
Social and Economic Networks: Models and Analysis
Learn how to model social and economic networks and their impact on human behavior. How do networks form, why do they exhibit certain patterns, and how does their structure impact diffusion, learning, and other behaviors? We will bring together models and techniques from economics, sociology, math, physics, statistics and computer science to answer these questions. The course begins with some empirical background on social and economic networks, and an overview of concepts used to describe and measure networks. Next, we will cover a set of models of how networks form, including random network models as well as strategic formation models, and some hybrids. We will then discuss a series of models of how networks impact behavior, including contagion, diffusion, learning, and peer influences. You can find a more detailed syllabus here: http://web.stanford.edu/~jacksonm/Networks-Online-Syllabus.pdf You can find a short introductory video here: http://web.stanford.edu/~jacksonm/Intro_Networks.mp4
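As a small illustration of the random network models covered early in the course, the sketch below (parameters chosen arbitrarily, not taken from the course) generates an Erdős-Rényi random graph with networkx and reports a few of the descriptive measures the course introduces.

```python
# Minimal illustration (parameters chosen arbitrarily): an Erdos-Renyi random
# network and a few descriptive measures of the kind discussed in the course.
import networkx as nx

n, p = 200, 0.03                      # 200 nodes, each pair linked with probability 0.03
G = nx.erdos_renyi_graph(n, p, seed=42)

avg_degree = sum(dict(G.degree()).values()) / n
clustering = nx.average_clustering(G)
components = nx.number_connected_components(G)

print(f"average degree: {avg_degree:.2f}")      # about p * (n - 1) in expectation
print(f"average clustering: {clustering:.3f}")  # about p for a random graph
print(f"connected components: {components}")
```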
Reproducible Research
This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools, which allow one to publish a data analysis in a single document that others can easily execute to obtain the same results.
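As a minimal sketch of the reproducibility idea (not a course assignment), the script below keeps data generation, analysis, and output in one place with a fixed random seed, so rerunning it yields identical results; the file names are placeholders.

```python
# Minimal sketch of the reproducible-analysis idea: everything needed to
# regenerate the result lives in one script with a fixed seed, so anyone can
# rerun it and obtain the same numbers. File names are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2024)   # fixed seed -> identical results on rerun

# In a real analysis this would be pd.read_csv("data/measurements.csv");
# here we simulate a small dataset so the script is self-contained.
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=100),
    "value": rng.normal(loc=10, scale=2, size=100),
})

summary = df.groupby("group")["value"].agg(["mean", "std", "count"])
summary.to_csv("summary.csv")            # the published, regenerable result
print(summary)
```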
Getting Started with Spatial Analysis in GeoDa
By the end of this project, learners will know how to get started with GeoDa and use it for spatial analyses. This includes how to access and download the software, how to import multiple layers, and a basic overview of GeoDa. Spatial analysis, as a type of data analysis, has become increasingly important. Its beginnings are often dated back to John Snow's cholera outbreak maps from the mid-1800s. In 2003, Dr. Luc Anselin at the University of Chicago, together with his team, developed GeoDa to provide free software that digitizes old-school pin maps. Today, it is used in various fields to plan cities and infrastructure, create crime maps, manage emergencies, and visualize finds at archaeological sites. Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.
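GeoDa itself is a point-and-click tool, so the sketch below is only a rough code parallel, not GeoDa: it uses geopandas to load two spatial layers from placeholder file paths and overlay them, mirroring the modern "pin map" idea described above.

```python
# Rough code parallel to the point-and-click workflow above (geopandas, not
# GeoDa): load two spatial layers and draw one on top of the other.
# The file paths are placeholders for whatever layers you have.
import geopandas as gpd
import matplotlib.pyplot as plt

districts = gpd.read_file("districts.shp")      # polygon layer (placeholder path)
incidents = gpd.read_file("incidents.geojson")  # point layer (placeholder path)

ax = districts.plot(color="lightgrey", edgecolor="black", figsize=(8, 8))
incidents.to_crs(districts.crs).plot(ax=ax, color="red", markersize=5)

plt.title("Incident locations over district boundaries")
plt.savefig("pin_map.png", dpi=150)
```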
Bitcoin Price Prediction using Facebook Prophet
In this 1.5-hour-long project-based course, you will learn how to create a Facebook Prophet machine learning model and use it to forecast the price of Bitcoin for the next 30 days. We will begin by importing all the necessary libraries, including Facebook Prophet. Then we will import our dataset and analyze it, and we will create visualizations in Plotly Express in order to understand the historical performance of Bitcoin. We will then prepare our data for Facebook Prophet, create a Facebook Prophet machine learning model, fit our prepared data to the model, and command it to make a forecast for the next 30 days. We will then visualize the forecast using Prophet's internal visualization tools and download the forecast data.

In the final section, we will go to Google Sheets and learn to extract financial data for Bitcoin using Google Finance. We will then import the forecast data into Google Sheets, compare it against the actual data, and evaluate the performance of the model.

Please note that although this project deals with Bitcoin and teaches you to make price predictions, it is for educational purposes only and should not be taken as financial advice, since cryptocurrencies like Bitcoin are extremely volatile and speculative. Basic knowledge of the Python programming language is recommended, but even those with no prior programming experience will be able to complete this project. You will need a Google account to complete this project.

Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.
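As a condensed sketch of the Prophet steps described above (not the course's exact notebook), the snippet below assumes a placeholder CSV of Bitcoin prices with Date and Close columns, renames them to the ds and y columns Prophet expects, fits a model, and forecasts 30 days ahead.

```python
# Condensed sketch of the steps described above. Prophet expects a dataframe
# with a 'ds' (date) column and a 'y' (value) column; the CSV file name and
# its column names are placeholders for the course's Bitcoin dataset.
import pandas as pd
from prophet import Prophet   # older installs use: from fbprophet import Prophet

raw = pd.read_csv("bitcoin_prices.csv")                 # placeholder file
df = raw.rename(columns={"Date": "ds", "Close": "y"})[["ds", "y"]]

model = Prophet()
model.fit(df)

future = model.make_future_dataframe(periods=30)        # extend 30 days ahead
forecast = model.predict(future)

# Prophet's built-in visualizations of the forecast and its components
model.plot(forecast)
model.plot_components(forecast)

# Save the forecast so it can be imported into Google Sheets for comparison
forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].to_csv("forecast.csv", index=False)
```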
Predicting the Weather with Artificial Neural Networks
In this one-hour-long project-based course, you will tackle a real-world prediction problem using machine learning. The dataset we are going to use comes from the Australian government, which recorded daily weather observations from a number of Australian weather stations. We will use this data to train an artificial neural network to predict whether it will rain tomorrow. By the end of this project, you will have created a machine learning model using industry-standard tools, including Python and sklearn. Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.
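For a sense of what the finished project looks like, here is a minimal sketch of this kind of model built with scikit-learn; the CSV file name and the feature and label column names are assumptions standing in for the Australian weather dataset used in the course.

```python
# Minimal sketch of the kind of model this project builds: a small neural
# network in scikit-learn predicting rain tomorrow. The CSV file and the
# column names below are placeholders for the Australian weather dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

data = pd.read_csv("weather.csv").dropna()                             # placeholder file
features = data[["MinTemp", "MaxTemp", "Humidity3pm", "Pressure3pm"]]  # assumed columns
target = (data["RainTomorrow"] == "Yes").astype(int)                   # assumed label column

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0)

# Scale inputs, then train a small multi-layer perceptron
scaler = StandardScaler().fit(X_train)
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
model.fit(scaler.transform(X_train), y_train)

print("test accuracy:", model.score(scaler.transform(X_test), y_test))
```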
DataOps Methodology
DataOps is defined by Gartner as "a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization. Much like DevOps, DataOps is not a rigid dogma, but a principles-based practice influencing how data can be provided and updated to meet the need of the organization's data consumers."

The DataOps Methodology is designed to enable an organization to utilize a repeatable process to build and deploy analytics and data pipelines. By following data governance and model management practices, organizations can deliver high-quality enterprise data to enable AI. Successful implementation of this methodology allows an organization to know, trust, and use data to drive value.

In the DataOps Methodology course you will learn about best practices for defining a repeatable, business-oriented framework for delivering trusted data. This course is part of the Data Engineering Specialization, which provides learners with the foundational skills required to be a Data Engineer.