GenAI Intern
VIDUR
Job Description
Key Responsibilities:
• Develop scripts for scraping websites using Python and related frameworks (Scrapy, BeautifulSoup, Selenium, etc.).
• Extract, clean, and structure data from long raw PDFs into knowledge graphs.
• Work with NLP and text processing libraries (e.g., spaCy, NLTK, PDFMiner, PyMuPDF) to parse legal documents.
• Collaborate with data scientists and engineers to integrate extracted data into larger AI models.
• Optimize scraping techniques to improve speed, accuracy, and reliability.
• Assist in pre-processing and structuring data for machine learning applications.

Requirements:
• Strong proficiency in Python.
• Experience with web scraping (BeautifulSoup, Scrapy, Selenium, or Playwright).
• Familiarity with PDF parsing techniques (PDFMiner, PyMuPDF, or similar).
• Basic understanding of knowledge graphs and relational data representation.
• Knowledge of NLP and text processing is a plus.
• Ability to work on-site in Gurugram for the duration of the internship.
• Strong problem-solving skills and eagerness to learn.
Job Overview
Offered Salary: Not disclosed