Data Science Courses
Memorystore: Qwik Start
This is a self-paced lab that takes place in the Google Cloud console. In this lab you will create a Memorystore instance and leverage its basic capabilities.
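The lab itself is click-driven in the Cloud console, but a minimal sketch of exercising those basic capabilities from client code can set expectations. This assumes a Memorystore for Redis instance and the third-party redis-py client; the host address is a hypothetical placeholder, and Memorystore instances are typically reachable only from clients inside the same VPC network.

```python
# A minimal sketch of using a Memorystore for Redis instance from Python,
# assuming the redis-py client and a client machine on the same VPC network.
import redis

# Hypothetical private IP of the Memorystore instance (shown in the console).
r = redis.Redis(host="10.0.0.3", port=6379)

r.set("greeting", "hello from Memorystore")
print(r.get("greeting"))  # b'hello from Memorystore'
```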
Machine Learning: Clustering & Retrieval
Case Studies: Finding Similar Documents. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover? In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce. Learning Outcomes: By the end of this course, you will be able to:
- Create a document retrieval system using k-nearest neighbors.
- Identify various similarity metrics for text data.
- Reduce computations in k-nearest neighbor search by using KD-trees.
- Produce approximate nearest neighbors using locality sensitive hashing.
- Compare and contrast supervised and unsupervised learning tasks.
- Cluster documents by topic using k-means.
- Describe how to parallelize k-means using MapReduce.
- Examine probabilistic clustering approaches using mixture models.
- Fit a mixture of Gaussians model using expectation maximization (EM).
- Perform mixed membership modeling using latent Dirichlet allocation (LDA).
- Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
- Compare and contrast initialization techniques for non-convex optimization objectives.
- Implement these techniques in Python.
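For a concrete flavor of the retrieval case study, here is a minimal sketch of k-nearest-neighbor document retrieval over TF-IDF vectors, using scikit-learn as a stand-in for the course's own implementation; the toy documents are hypothetical.

```python
# A minimal sketch of k-nearest-neighbor document retrieval: represent each
# document as a TF-IDF vector, then find the closest vectors under cosine distance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = [
    "stock markets fell on inflation fears",
    "central bank raises interest rates",
    "local team wins championship game",
]
X = TfidfVectorizer().fit_transform(docs)

# Cosine distance is a common similarity notion for TF-IDF vectors.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
dist, idx = nn.kneighbors(X[0])
print(idx)  # indices of the articles most similar to docs[0]
```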
Working with Datasets
By the end of this project, you will use Python to wrangle two datasets and visualize the relationships among the data. Datasets are collections of data that may exist in a database, a CSV file, or an ordinary file. Python is a popular language for working with datasets: it has tools to read data in various formats and libraries to visualize the data.
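As a rough illustration of that workflow, here is a minimal sketch that joins two hypothetical CSV files sharing a "city" column and visualizes the relationship with pandas and Matplotlib; the file and column names are assumptions, not the project's actual data.

```python
# A minimal sketch: read two datasets, join them on a shared key, and plot
# the relationship between two of their columns.
import pandas as pd
import matplotlib.pyplot as plt

population = pd.read_csv("population.csv")  # hypothetical: city, population
income = pd.read_csv("income.csv")          # hypothetical: city, median_income

# Merge the two datasets on the shared key, then visualize the relationship.
merged = population.merge(income, on="city")
merged.plot.scatter(x="population", y="median_income")
plt.show()
```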
Image Noise Reduction with Auto-encoders using TensorFlow
In this 2-hour long project-based course, you will learn the basics of image noise reduction with auto-encoders. An auto-encoder is a neural-network algorithm that reduces the dimensionality of data. It can be used for lossy data compression, where the compression is dependent on the given data. Because the compression is learned from the data itself, the same approach can also be used to reduce noise in data. This course runs on Coursera's hands-on project platform called Rhyme. On Rhyme, you do projects in a hands-on manner in your browser. You will get instant access to pre-configured cloud desktops containing all of the software and data you need for the project. Everything is already set up directly in your internet browser so you can just focus on learning. For this project, you'll get instant access to a cloud desktop with Python, Jupyter, and TensorFlow pre-installed. Note: This course works best for learners who are based in the North America region. We're currently working on providing the same experience in other regions.
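To make the idea concrete, here is a minimal sketch of a denoising auto-encoder in Keras, trained to map noisy MNIST digits back to their clean versions; the tiny architecture is an illustrative assumption, not necessarily the one used in the project.

```python
# A minimal sketch of a denoising auto-encoder: compress each image to a small
# code, reconstruct it, and train on (noisy input -> clean target) pairs.
import numpy as np
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_noisy = np.clip(x_train + 0.3 * np.random.randn(*x_train.shape), 0.0, 1.0)

# Encoder compresses to a 64-dimensional code; decoder reconstructs the image.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Train to map noisy inputs back to their clean versions.
model.fit(x_noisy, x_train, epochs=5, batch_size=256)
```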
Text Retrieval and Search Engines
Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than by a computer system or sensors, and are thus especially valuable for discovering knowledge about people's opinions and preferences, in addition to many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining application involving text data for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that is relevant, and a search engine is an essential tool for quickly discovering that small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern. You will learn the basic concepts, principles, and major techniques of text retrieval, which is the underlying science of search engines.
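As one concrete example of the techniques involved, here is a minimal sketch of an inverted index, the core data structure behind text retrieval: it maps each term to the set of documents containing it, so a query can be answered without scanning the whole collection. The toy documents and the conjunctive-query semantics are illustrative assumptions, not the course's specific implementation.

```python
# A minimal sketch of an inverted index: term -> set of document IDs.
from collections import defaultdict

docs = {
    1: "text data mining and retrieval",
    2: "search engines rank relevant text",
    3: "sensors generate raw data streams",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# A conjunctive query returns documents containing every query term.
def search(query):
    postings = [index[t] for t in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("text data"))  # {1}
```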
Creating a Wordcloud using NLP and TF-IDF in Python
By the end of this project, you will learn how to create a professional-looking wordcloud from a text dataset in Python. You will use an open source dataset containing Christmas recipes and will create a wordcloud of the most important ingredients used in these recipes. I will teach you how to load a JSON dataset, clean the dataset by removing encodings and unwanted characters, and lemmatize your dataset. I will also teach you how to calculate TF-IDF weights of words in your dataset and use these weights to create a wordcloud. You will create a ready-to-use Jupyter notebook for creating a wordcloud on any text dataset. Lemmatization is the process of removing inflectional endings to return the base or dictionary form of a word, known as the lemma. TF-IDF stands for term frequency-inverse document frequency. TF-IDF assigns each word a weight that indicates how important that term is. Using both lemmatization and TF-IDF, one can find the important words in a text dataset and use these important words to create the wordcloud. For example, the dataset could be customer complaints, and the business can focus on the important issues that the customers are facing. A wordcloud is a powerful resource that can be used in reports and presentations. Note: This course works best for learners who are based in the North America region. We're currently working on providing the same experience in other regions.
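A minimal sketch of that pipeline might look like the following, using NLTK for lemmatization, scikit-learn for TF-IDF, and the wordcloud package for rendering. The toy recipe strings are placeholders for the Christmas-recipes dataset, and the exact steps in the project may differ.

```python
# A minimal sketch: lemmatize the text, weight terms by TF-IDF, and draw a
# wordcloud from those weights.
import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud
import matplotlib.pyplot as plt

nltk.download("wordnet")  # WordNet data needed by the lemmatizer
lemmatizer = WordNetLemmatizer()

recipes = ["two cups of raisins and chopped almonds",
           "a cup of raisin paste with almond flakes"]
recipes = [" ".join(lemmatizer.lemmatize(w) for w in doc.split())
           for doc in recipes]

# Average each term's TF-IDF weight across documents, then render the cloud.
vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(recipes)
weights = dict(zip(vec.get_feature_names_out(), tfidf.mean(axis=0).A1))

cloud = WordCloud(background_color="white").generate_from_frequencies(weights)
plt.imshow(cloud)
plt.axis("off")
plt.show()
```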
Implementing Connected Planning
In this course, you'll learn how to make Connected Planning a reality in an organization. Organizations need a vision supported by a focused effort to move from traditional planning and static business modeling to a Connected Planning approach, where data, people, and plans are linked throughout the organization. Creating a digital ecosystem using a Connected Planning technology platform is a big part of the process, but it's not the only thing needed. In this course, we'll explore the role of corporate culture and sponsorship, process redesign, master data management, and change management in successfully implementing Connected Planning. You'll be challenged to examine your own organization's readiness to undertake the Connected Planning journey. If you're not ready yet, you'll identify areas where you need to focus your efforts. If you are ready, you'll have the framework you need to get started on the path to making Connected Planning a reality in your organization. By the end of this course, you'll be able to:
• Identify challenges to a Connected Planning implementation stemming from issues with people, data, and planning processes
• Articulate the benefits of Connected Planning over traditional planning and static business modeling
• Explain why corporate culture, data management, and change management are critical to successful Connected Planning execution
• Drive or constructively participate in the implementation of Connected Planning in your organization
This course is presented by Anaplan, provider of a leading technology platform purpose-built for Connected Planning.
Computational Social Science Capstone Project
CONGRATULATIONS! Not only have you finished our intellectual tour de force, but by now you also have all the skills required to execute a comprehensive multi-method workflow of computational social science. We will put these skills to work in this final integrative lab, where we bring it all together. We scrape data from a social media site (drawing on the skills obtained in the 1st course of this specialization). We then analyze the collected data by visualizing the resulting networks (building on the skills obtained in the 3rd course). We analyze some key aspects of it in depth, using machine-learning-powered natural language processing (putting to work the insights obtained during the 2nd course). Finally, we use a computer simulation model to explore possible generative mechanisms and to scrutinize aspects that we did not find in our empirical reality but that could help us improve this aspect of society (drawing on the skills obtained during the 4th course of this specialization). The result is a first glimpse of a new way of doing social science in a digital age: computational social science. Congratulations! Having done all of this yourself, you can consider yourself a fledgling computational social scientist!
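As a taste of one step in that workflow, here is a minimal sketch of network visualization with networkx; the edge list is hypothetical stand-in data, since the capstone scrapes its own from a social media site.

```python
# A minimal sketch of the network-visualization step: build a graph from an
# edge list and draw it, sizing nodes by how connected they are.
import networkx as nx
import matplotlib.pyplot as plt

edges = [("ana", "ben"), ("ben", "cho"), ("cho", "ana"), ("ben", "dia")]
G = nx.Graph(edges)

# Node size proportional to degree highlights the most connected actors.
sizes = [300 * G.degree(n) for n in G]
nx.draw(G, with_labels=True, node_size=sizes)
plt.show()
```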
Aggregate Data with LibreOffice Base Queries
By the end of this project, you will have written LibreOffice Base queries to retrieve and aggregate data from a Sales database. Using both the Query Design tool and the SQL View, you will group and summarize data using functions such as Sum, Average, Count, Min, and Max. Aggregating (grouping and summarizing) data can significantly increase its value to the users who analyze it. Note: This course works best for learners who are based in the North America region. We're currently working on providing the same experience in other regions.
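LibreOffice Base's SQL View accepts queries like the one below; as a runnable stand-in, this sketch executes the same aggregate SQL against an in-memory SQLite database via Python's sqlite3 module. The sales table and its columns are hypothetical.

```python
# A minimal sketch of grouping and summarizing with aggregate functions,
# using SQLite as a stand-in for LibreOffice Base's own SQL engine.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("East", 120.0), ("East", 80.0), ("West", 200.0)])

# Group rows by region and summarize each group.
for row in con.execute(
    "SELECT region, SUM(amount), AVG(amount), COUNT(*), MIN(amount), MAX(amount) "
    "FROM sales GROUP BY region"
):
    print(row)
```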
Practical Python for AI Coding 2
Introduction video: https://youtu.be/TRhwIHvehR0 This course is designed for complete novices in Python coding, so no prior knowledge or experience in software coding is required. The course selects, introduces, and explains the Python syntax, functions, and libraries that are frequently used in AI coding, and it explains the complementary relationship among NumPy, Pandas, and TensorFlow, so it is helpful even for seasoned Python users. The course starts by building a reliable AI coding environment on the learner's own desktop or notebook computer, so that upon completing the course, learners can begin AI modeling and coding with Scikit-learn, TensorFlow, and Keras. Because learners end up with an AI coding environment on their own computers, they can start AI coding without needing to join or use cloud-based services.
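As an illustration of that complementary relationship, here is a minimal sketch in which Pandas handles the tabular data, NumPy provides the shared array representation, and TensorFlow/Keras consumes those arrays for training; the tiny dataset and model are assumptions for demonstration only.

```python
# A minimal sketch of how Pandas, NumPy, and TensorFlow fit together:
# DataFrame -> NumPy arrays -> Keras model training.
import numpy as np
import pandas as pd
import tensorflow as tf

# Toy tabular data following y = 2*x1 + 1.
df = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})

X = df[["x1"]].to_numpy()  # Pandas -> NumPy
y = df["y"].to_numpy()

# A one-unit linear model; NumPy arrays feed TensorFlow directly.
model = tf.keras.Sequential([tf.keras.Input(shape=(1,)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
model.fit(X, y, epochs=500, verbose=0)

print(model.predict(np.array([[4.0]])))  # should be close to 9.0
```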