
Data Science Courses - Page 90

Showing results 891-900 of 1407
Serverless Data Processing with Dataflow: Develop Pipelines
In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using the State and Timer APIs. We move on to reviewing best practices that help maximize your pipeline performance. Towards the end of the course, we introduce SQL and DataFrames as ways to represent your business logic in Beam, and show how to iteratively develop pipelines using Beam notebooks.
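As a rough illustration of the windowed streaming processing the course covers, here is a minimal sketch using the Apache Beam Python SDK. The Pub/Sub topic name and the fixed 60-second window are illustrative assumptions, not course materials:

```python
# Minimal Apache Beam sketch: fixed event-time windowing over a
# streaming source. Topic name and window size are illustrative.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        # Hypothetical Pub/Sub topic; replace with your own source.
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
        # Assign each element to a fixed 60-second event-time window.
        | "Window" >> beam.WindowInto(window.FixedWindows(60))
        # Count elements per window; the watermark determines when each
        # window's default trigger fires.
        | "Count" >> beam.CombineGlobally(
            beam.combiners.CountCombineFn()).without_defaults()
        | "Print" >> beam.Map(print)
    )
```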
Explore insights from text analysis using Amazon Comprehend
In this one-hour project, you will understand how Amazon Comprehend works and how you can use the power of Natural Language Processing (NLP) and Machine Learning to extract information and explore insights from text. You will learn how to use Amazon Comprehend to extract entities, people, sentiment, and other elements from text such as tweets, understand how the results are organized, manipulate the data, and generate a report to explore the insights. Amazon Comprehend is a fully managed service and one of the most powerful Natural Language Processing engines on the market, so you can get up and running quickly, without having to train models from scratch. Once you're done with this project, you will be able to use Amazon Comprehend to extract, analyze, and explore insights in your documents in just a few steps.
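A minimal sketch of the kind of call the project walks through, using the boto3 Comprehend client; the sample text and region are assumptions, and credentials are assumed to be configured separately:

```python
# Sketch: extract named entities and overall sentiment from a short
# text with Amazon Comprehend via boto3.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "AWS re:Invent takes place in Las Vegas every year."

# Named entities: people, places, organizations, dates, and so on.
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
for entity in entities["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))

# Overall sentiment of the text (POSITIVE, NEGATIVE, NEUTRAL, MIXED).
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"])
```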
Data for Machine Learning
This course is all about data and how it is critical to the success of your applied machine learning model. Completing this course will give learners the skills to:
- Understand the critical elements of data in the learning, training, and operation phases
- Understand biases and sources of data
- Implement techniques to improve the generality of your model
- Explain the consequences of overfitting and identify mitigation measures
- Implement appropriate test and validation measures
- Demonstrate how the accuracy of your model can be improved with thoughtful feature engineering
- Explore the impact of the algorithm parameters on model strength
To be successful in this course, you should have at least a beginner-level background in Python programming (e.g., be able to read and trace existing code, and be comfortable with conditionals, loops, variables, lists, dictionaries, and arrays). You should have a basic understanding of linear algebra (vector notation) and statistics (probability distributions and mean/median/mode). This is the third course of the Applied Machine Learning Specialization brought to you by Coursera and the Alberta Machine Intelligence Institute.
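To make the test-and-validation idea concrete, here is a small sketch in Python with scikit-learn; the dataset and model are illustrative stand-ins, not course materials:

```python
# Sketch: hold out a validation set, then compare training and
# validation accuracy. A large gap is a classic sign of overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data so the model is scored on examples
# it never saw during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(max_depth=None).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("validation accuracy:", model.score(X_val, y_val))
# Constraining max_depth is one mitigation the train/validation
# gap would respond to.
```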
Scalable Machine Learning on Big Data using Apache Spark
This course will empower you with the skills to scale data science and machine learning (ML) tasks on Big Data sets using Apache Spark. Most real-world machine learning work involves very large data sets that go beyond the CPU, memory, and storage limitations of a single computer. Apache Spark is an open-source framework that leverages cluster computing and distributed storage to process extremely large data sets in an efficient and cost-effective manner. Therefore, applied knowledge of working with Apache Spark is a great asset and potential differentiator for a Machine Learning engineer. After completing this course, you will be able to:
- Gain a practical understanding of Apache Spark, and apply it to solve machine learning problems involving both small and big data
- Understand how to write parallel code capable of running on thousands of CPUs
- Make use of large-scale compute clusters to apply machine learning algorithms on petabytes of data using Apache SparkML Pipelines
- Eliminate out-of-memory errors generated by traditional machine learning frameworks when data doesn't fit in a computer's main memory
- Test thousands of different ML models in parallel to find the best performing one, a technique used by many successful Kagglers
- (Optional) Run SQL statements on very large data sets using Apache SparkSQL and the Apache Spark DataFrame API
Enrol now to learn the machine learning techniques for working with Big Data that have been successfully applied by companies like Alibaba, Apple, Amazon, Baidu, eBay, IBM, NASA, Samsung, SAP, TripAdvisor, Yahoo!, Zalando and many others.
NOTE: You will practice running machine learning tasks hands-on on an Apache Spark cluster provided by IBM at no charge during the course, which you can continue to use afterwards.
Prerequisites:
- Basic Python programming
- Basic machine learning (optional introduction videos are provided in this course as well)
- Basic SQL skills for optional content
The following courses are recommended before taking this class (unless you already have the skills):
- https://www.coursera.org/learn/python-for-applied-data-science or similar
- https://www.coursera.org/learn/machine-learning-with-python or similar
- https://www.coursera.org/learn/sql-data-science for optional lectures
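A minimal sketch of the SparkML Pipeline pattern the course is built around, in PySpark; the tiny in-memory dataset and column names are illustrative assumptions:

```python
# Sketch: assemble feature columns into a vector, then fit a
# logistic regression inside an Apache SparkML Pipeline.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("sparkml-sketch").getOrCreate()

df = spark.createDataFrame(
    [(0.0, 1.2, 0.7), (1.0, 3.4, 2.1), (0.0, 0.9, 0.3), (1.0, 2.8, 1.9)],
    ["label", "f1", "f2"],
)

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

# Pipelines chain preprocessing and model stages so the same steps
# run identically on training data and on new data.
model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show()
```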
Building and analyzing linear regression model in R
By the end of this project, you will learn how to build and analyze a linear regression model in R, a free, open-source program that you can download. You will learn how to load and clean a real-world dataset. Next, you will learn how to build a linear regression model and use various plots to analyze the model's performance. Lastly, you will learn how to predict future values using the model. By the end of this project, you will be confident in building a linear regression model on a real-world dataset and in assessing the model's performance using the R programming language. Linear regression models are useful for identifying critical relationships between predictors (or factors) and an output variable. These relationships can impact a business in the future and can help business owners make decisions. Note: This course works best for learners who are based in the North America region. We're currently working on providing the same experience in other regions.
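The project itself works in R; purely as an illustration of the same fit-inspect-predict workflow, here is a sketch in Python with statsmodels, using synthetic data invented for this example:

```python
# Sketch of the linear-regression workflow (the course uses R):
# fit a model, inspect diagnostics, predict new values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.5 * x + 4.0 + rng.normal(0, 1.5, 50)   # known true relationship

X = sm.add_constant(x)             # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.summary())             # coefficients, R-squared, p-values

# Predict for unseen predictor values.
x_new = sm.add_constant(np.array([11.0, 12.0]), has_constant="add")
print(model.predict(x_new))
```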
Calculating Descriptive Statistics in R
Welcome to this 2-hour project-based course, Calculating Descriptive Statistics in R. In this project, you will learn how to perform extensive descriptive statistics on both quantitative and qualitative variables in R. You will also learn how to calculate the frequency and percentage of categorical variables and check the distribution of quantitative variables. By extension, you will learn how to perform univariate and bivariate analyses in R. Note: You do not need to be a Data Scientist to be successful in this guided project; familiarity with basic statistics and with using R is sufficient. If you are not familiar with R and want to learn the basics, start with my previous guided project titled “Getting Started with R”.
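The guided project uses R; as a language-neutral illustration of the same moves, here is a short pandas sketch with made-up data, covering frequencies and percentages for a categorical variable and a distribution summary for a quantitative one:

```python
# Sketch of basic descriptive statistics (the course uses R).
import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "bird", "dog", "cat"],
    "weight_kg": [4.2, 11.5, 9.8, 0.4, 14.1, 3.9],
})

# Frequency and percentage of a categorical variable.
print(df["species"].value_counts())
print(df["species"].value_counts(normalize=True) * 100)

# Distribution of a quantitative variable: count, mean, quartiles, etc.
print(df["weight_kg"].describe())

# A simple bivariate view: the quantitative variable summarized
# within each level of the categorical one.
print(df.groupby("species")["weight_kg"].mean())
```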
Communicate Effectively about Ethical Challenges in Data-Driven Technologies
Leading a data-driven organization necessitates effective communication to create a culture of ethical practice. Communication to stakeholders will guide an organization's strategy and potentially impact the future of work for that organization or entity. It is not enough to talk about ethical practices; you need to relate their value to stakeholders. Building out strategies that are inclusive and relatable can build public trust and loyalty, and knowing how to plan for a crisis will reduce the harm to such trust and loyalty. In this fourth course of the CertNexus Certified Ethical Emerging Technologist (CEET) professional certificate, learners will develop inclusive strategies to communicate business impacts to stakeholders, design communication strategies that mirror ethical principles and policies, and, in case of an ethical crisis, be prepared to manage the crisis and the media to reduce business impact. This course is the fourth of five courses within the Certified Ethical Emerging Technologist (CEET) professional certificate. The preceding courses are titled Promote the Ethical Use of Data-Driven Technologies, Turn Ethical Frameworks into Actionable Steps, and Detect and Mitigate Ethical Risks.
Facial Expression Classification Using Residual Neural Nets
In this hands-on project, we will train a deep learning model based on Convolutional Neural Networks (CNNs) and residual blocks to detect facial expressions. This project could be used in practice for detecting customer emotions and facial expressions. By the end of this project, you will be able to:
- Understand the theory and intuition behind deep learning, Convolutional Neural Networks (CNNs), and Residual Neural Networks
- Import key libraries and the dataset, and visualize images
- Perform data augmentation to increase the size of the dataset and improve model generalization capability
- Build a deep learning model based on Convolutional Neural Networks and residual blocks using Keras with TensorFlow 2.0 as a backend
- Compile and fit the deep learning model to the training data
- Assess the performance of the trained CNN and ensure its generalization using various KPIs
- Improve network performance using regularization techniques such as dropout
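A compact sketch of the residual-block idea the project builds on, using Keras with TensorFlow as the backend; input shape, filter counts, and the number of expression classes are illustrative assumptions, not the project's actual architecture:

```python
# Sketch: a small CNN with one residual (skip-connection) block.
from tensorflow.keras import layers, Model, Input

def residual_block(x, filters):
    """Two conv layers plus a skip connection that adds the input back."""
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Add()([x, shortcut])      # the residual connection
    return layers.Activation("relu")(x)

inputs = Input(shape=(48, 48, 1))        # grayscale face crops, assumed size
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = residual_block(x, 32)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5, activation="softmax")(x)  # 5 classes, assumed

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```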
SQL for Data Science with R
Much of the world's data resides in databases. SQL (Structured Query Language) is a powerful language used for communicating with and extracting data from databases. A working knowledge of databases and SQL is a must if you want to become a data scientist. The purpose of this course is to introduce relational database concepts and help you learn and apply foundational knowledge of the SQL and R languages. It is also intended to get you started with performing SQL access in a data science environment. The emphasis in this course is on hands-on, practical learning. As such, you will work with real databases, real data science tools, and real-world datasets. You will create a database instance in the cloud. Through a series of hands-on labs, you will practice building and running SQL queries. You will also learn how to access databases from Jupyter notebooks using SQL and R. No prior knowledge of databases, SQL, R, or programming is required. Anyone can audit this course at no charge. If you choose to take this course and earn the Coursera course certificate, you can also earn an IBM digital badge upon successful completion of the course.
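The course accesses databases from notebooks with R; here is the same query-from-a-notebook pattern sketched in Python with the standard-library sqlite3 module, using a throwaway table and column names invented for this example:

```python
# Sketch: create a small database, insert rows, and run a SQL query.
import sqlite3

conn = sqlite3.connect(":memory:")       # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE courses (title TEXT, hours REAL)")
cur.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [("SQL for Data Science with R", 20.0), ("Getting Started with R", 2.0)],
)

# A basic SQL query: filter rows and read the results back.
for title, hours in cur.execute(
    "SELECT title, hours FROM courses WHERE hours > ?", (5.0,)
):
    print(title, hours)

conn.close()
```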
Interprofessional Healthcare Informatics
Interprofessional Healthcare Informatics is a graduate-level, hands-on interactive exploration of real informatics tools and techniques offered by the University of Minnesota and the University of Minnesota's National Center for Interprofessional Practice and Education. We will be incorporating technology-enabled educational innovations to bring the subject matter to life. Over the 10 modules, we will create a vital online learning community and a working healthcare informatics network. We will explore perspectives of clinicians like dentists, physical therapists, nurses, and physicians in all sorts of practice settings worldwide. Emerging technologies, telehealth, gaming, simulations, and eScience are just some of the topics that we will consider. Throughout the course, we’ll focus on creativity, controversy, and collaboration - as we collectively imagine and create the future within the rapidly evolving healthcare informatics milieu. All healthcare professionals and IT geeks are welcome!