Big Data, Introduction | DBDA.X401


In the Big Data paradigm, distributed systems process information across server clusters, and we increasingly rely on these technologies to manage the massive amounts of information generated by social media, online transactions, web logs, and sensors. They handle unstructured, semi-structured, and structured data, and support processing, real-time analytics, and visualization. They are especially useful for reporting in circumstances where a relational database approach is ineffective or too costly.

In this comprehensive introductory course for managers, analysts, architects, and developers, you will gain insights into cloud-based Big Data architectures. We will cover Hadoop, Spark, and SQL-based Big Data platforms such as Hive.

The first half of the course provides an overview of Big Data technologies and frameworks such as HDFS, MapReduce, Spark, Kafka, and Hive. The second half covers writing programs in Spark and Hive and designing Big Data applications.

The course consists of interactive lectures, in-class labs, and take-home practice exercises. You’ll complete this course with a deep understanding of the tools to build Big Data applications using MapReduce, Spark, and Hive.


Learning Outcomes
At the conclusion of the course, you should be able to:

  • Describe big data concepts and characteristics, data management, and data warehousing
  • Explain the significance of big data and reference industry use cases
  • Compare and contrast NoSQL with Hadoop, leverage the Hadoop ecosystem for analyzing big data, and use Hive/NoSQL for data analysis
  • Write programs and applications in Spark and Hive

Topics Include

  • Evolution of Big Data
  • Big Data use cases
  • Big Data applications architecture
  • Understanding the Hadoop Distributed File System (HDFS)
  • How the MapReduce framework works
  • Introduction to HBase (Hadoop NoSQL database)
  • Introduction to Apache Kafka
  • Developing MapReduce applications
  • Introduction to Spark and SparkSQL
  • Developing Spark/SparkSQL applications
  • Managing tables and query development in Hive
  • Introduction to data pipelines

Skills Needed:

Moderate level of programming knowledge in Python and SQL

Have a question about this course?
Speak to a student services representative.
Call (408) 861-3860

Sections Open for Enrollment:

Open Sections and Schedule
Start / End Date: 06-18-2024 to 08-20-2024
Quarter Units: 3.0
Cost: $910
Instructor: Satyen Kansara

Final Date To Enroll: 06-18-2024

Schedule

Date: Start Time: End Time: Meeting Type: Location:
Tue, 06-18-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 06-25-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 07-02-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 07-09-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 07-16-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 07-23-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 07-30-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 08-06-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 08-13-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE
Tue, 08-20-2024 6:30 p.m. 9:30 p.m. Flexible SANTA CLARA / REMOTE