Data Analysis Courses - Page 25

Diabetes Prediction With Pyspark MLLIB
In this 1-hour project-based course, you will learn to build a logistic regression model using PySpark MLlib to classify patients as either diabetic or non-diabetic. We will use the popular Pima Indian Diabetes data set. Our goal is to use a simple logistic regression classifier from the PySpark machine learning library for diabetes classification. We will carry out the entire project in the Google Colab environment, with PySpark installed there. You will need a free Gmail account to complete this project. Please be aware that the dataset and the model in this project cannot be used in real life; we are using this data for educational purposes only. By the end of this project, you will be able to build a logistic regression classifier using PySpark MLlib to classify patients as diabetic or non-diabetic. You will also be able to set up and work with PySpark in the Google Colab environment, and to clean and prepare data for analysis. You should be familiar with the Python programming language and have a theoretical understanding of the logistic regression algorithm. Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.
Communicating Business Analytics Results
The analytical process does not end with models that can predict with accuracy or prescribe the best solution to business problems. Developing these models and gaining insights from data do not necessarily lead to successful implementations. Success depends on the ability to communicate results to those who make decisions. Presenting findings to decision makers who are not familiar with the language of analytics presents a challenge. In this course you will learn how to communicate analytics results to stakeholders who do not understand the details of analytics but want evidence of analysis and data. You will be able to choose the right vehicles to present quantitative information, including those based on principles of data visualization. You will also learn how to develop and deliver data-analytics stories that provide context, insight, and interpretation.
Introduction to Business Analytics and Information Economics Capstone
Welcome to the Introduction to Business Analytics and Information Economics Capstone! I’m thrilled to have you enrolled in the course. This Capstone will enable you to put into practice some of the concepts you have studied previously about applying economic concepts to information, conceiving analytics hypotheses, valuing information assets, and developing ideas for monetizing information in various ways. I look forward to your contributions and ideas. 
Managing, Describing, and Analyzing Data
In this course, you will learn the basics of understanding the data you have and why correctly classifying data is the first step to making correct decisions. You will describe data both graphically and numerically using descriptive statistics and R software. You will learn four probability distributions commonly used in the analysis of data. You will analyze data sets using the appropriate probability distribution. Finally, you will learn the basics of sampling error, sampling distributions, and errors in decision-making. This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.
Design Thinking and Predictive Analytics for Data Products
This is the second course in the four-course specialization Python Data Products for Predictive Analytics, building on the data processing covered in Course 1 and introducing the basics of designing predictive models in Python. In this course, you will understand the fundamental concepts of statistical learning and learn various methods of building predictive models. At each step in the specialization, you will gain hands-on experience in data manipulation and building your skills, eventually culminating in a capstone project encompassing all the concepts taught in the specialization.
Hierarchical Clustering using Euclidean Distance
By the end of this project, you will create a Python program using a Jupyter interface that analyzes a group of viruses and plots a dendrogram based on similarities among them. The dendrogram that you will create will depend on the cumulative skew profile, which in turn depends on the nucleotide composition. You will use complete genome sequences for many viruses, including Corona, SARS, HIV, Zika, Dengue, enterovirus, and West Nile viruses.
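As a rough illustration of the idea described above, the sketch below clusters three made-up toy sequences by the Euclidean distance between their skew profiles. Everything here is an assumption for illustration: the sequences and names are invented, the profile is a simplified cumulative GC skew (a running count of G minus C), and average linkage from SciPy is one possible choice; the course may define the profile and the linkage differently.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def cumulative_gc_skew(seq):
    """Simplified profile: running sum of +1 per G and -1 per C."""
    steps = [1 if base == "G" else -1 if base == "C" else 0 for base in seq.upper()]
    return np.cumsum(steps)


# Hypothetical toy sequences of equal length (not real genomes).
sequences = {
    "virus_a": "GGGCATGGAC",
    "virus_b": "GGGCATGGAT",
    "virus_c": "CCCGATCCAG",
}
labels = list(sequences)
profiles = np.array([cumulative_gc_skew(sequences[name]) for name in labels])

# Agglomerative clustering on Euclidean distances between profiles.
tree = linkage(profiles, method="average", metric="euclidean")

# no_plot=True returns the dendrogram layout without needing matplotlib.
layout = dendrogram(tree, labels=labels, no_plot=True)
print(layout["ivl"])  # leaf labels in dendrogram order
```

Real genomes differ in length, so in practice the profiles would need to be brought to a common length (for example by windowing or resampling) before distances are computed.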
Healthcare Data Models
Career prospects are bright for those qualified to work in healthcare data analytics. Perhaps you work in data analytics, but are considering a move into healthcare where your work can improve people’s quality of life. If so, this course gives you a glimpse into why this work matters, what you’d be doing in this role, and what takes place on the Path to Value, where data is gathered from patients at the point of care, moves into data warehouses to be prepared for analysis, then moves along the data pipeline to be transformed into valuable insights that can save lives, reduce costs, improve healthcare, and make it more accessible and affordable. Perhaps you work in healthcare but are considering a transition into a new role. If so, this course will help you see if this career path is one you want to pursue. You’ll get an overview of common data models and their uses. You’ll learn how various systems integrate data, how to ensure clear communication, and how to measure and improve data quality. Data analytics in healthcare serves doctors, clinicians, patients, care providers, and those who carry out the business of improving health outcomes. This course of study will give you a clear picture of data analysis in today’s fast-changing healthcare field and the opportunities it holds for you.
Merge, Sort and Filter Data in Python Pandas
Finding patterns in data often involves rearranging and eliminating records. For example, in a list of yearly rainfall amounts, sorting the data by rainfall in descending order quickly shows the years with the most rainfall. A filter can limit the amount of data observed, for example, to show only rainfall amounts greater than an inch. A merge can join two datasets together, for example rainfall and temperature data from two different sources. The ability to sort, merge, and filter data has always existed in SQL for database data; now it can be done in application memory space using Python. In this course, you will create an application that reads data from two CSV files. You will learn how to merge, sort, and filter the data to ultimately produce a regression plot to determine a possible correlation between two data sets.
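The three operations described above can be sketched in pandas roughly as follows. The column names and values are hypothetical stand-ins for the two CSV files, which the description does not specify; in the course setting the frames would presumably come from `pd.read_csv`.

```python
import pandas as pd

# Hypothetical data standing in for the two CSV files.
rainfall = pd.DataFrame(
    {"year": [2018, 2019, 2020, 2021], "rainfall_in": [0.8, 2.5, 0.9, 3.1]}
)
temperature = pd.DataFrame(
    {"year": [2019, 2020, 2021, 2022], "avg_temp_f": [61.0, 63.5, 59.8, 62.2]}
)

# Merge: an inner join keeps only the years present in both sources.
merged = rainfall.merge(temperature, on="year", how="inner")

# Sort: years with the most rainfall come first.
by_rain = merged.sort_values("rainfall_in", ascending=False)

# Filter: keep only rainfall amounts greater than an inch.
wet_years = by_rain[by_rain["rainfall_in"] > 1.0]

# Pearson correlation between the two merged series.
correlation = merged["rainfall_in"].corr(merged["avg_temp_f"])

print(wet_years)
```

An inner join is only one option; `how="left"` or `how="outer"` would keep years that appear in just one of the sources, at the cost of missing values in the other column.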
Command Line Tools for Genomic Data Science
Introduces the commands that you need to manage and analyze directories, files, and large sets of genomic data. This is the fourth course in the Genomic Big Data Science Specialization from Johns Hopkins University.
Capstone Project: Predicting Safety Stock
In this course, we'll make predictions on product usage and calculate optimal safety stock storage. We'll start with a time series of shoe sales across multiple stores on three different continents. To begin, we'll look for unique insights and other interesting things we can find in the data by performing groupings and comparing products within each store. Then, we'll use a seasonal autoregressive integrated moving average (SARIMA) model to make predictions on future sales. In addition to making predictions, we'll analyze the provided statistics (such as p-score) to judge the viability of using the SARIMA model to make predictions. Then, we'll tune the hyper-parameters of the model to garner better results and higher statistical significance. Finally, we'll make predictions on safety stock by looking to the data for monthly usage predictions and calculating safety stock from the formula involving lead times.
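The description does not say which safety stock formula the course uses. As a sketch under that caveat, the example below assumes one common textbook variant, maximum usage times maximum lead time minus average usage times average lead time, and applies it to made-up monthly figures. The function name and all numbers are hypothetical.

```python
from statistics import mean


def safety_stock(monthly_usage, lead_times_months):
    """Assumed 'max minus average' formula:
    (max usage * max lead time) - (mean usage * mean lead time).
    """
    worst_case = max(monthly_usage) * max(lead_times_months)
    typical = mean(monthly_usage) * mean(lead_times_months)
    return worst_case - typical


# Hypothetical monthly usage predictions (units) and lead times (months).
usage = [120, 150, 130, 160]
lead_times = [1.0, 1.5, 1.0, 2.0]

stock = safety_stock(usage, lead_times)
print(stock)  # 160 * 2.0 - 140 * 1.375 = 127.5
```

Other formulas instead scale the standard deviation of demand by a service-level factor and the square root of the lead time; which one is appropriate depends on how demand and lead-time variability are modeled.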