Data Engineering


Eligibility: Graduates / Final-year students / B.Tech, BSc, MCA, BCA / Working professionals looking to switch to Data Engineering
Duration: 2 Months
Job Roles:
  • Data Engineer
  • Big Data Engineer
  • ETL Developer
  • Cloud Data Engineer
  • Data Pipeline Developer
  • Data Analyst
Course Content:
  • What is Data Engineering?
  • Role of a Data Engineer vs Data Scientist vs Data Analyst
  • Modern data ecosystem
  • Batch vs Streaming data
  • Data Lakes, Warehouses, Lakehouses
  • Real industry architectures
  • Python for Data Engineering
    • Data types, loops, functions
    • File handling (CSV, JSON, XML, Parquet)
    • Working with Pandas for data manipulation
    • Python modules: pathlib, logging, argparse
    • Writing reusable scripts
  • SQL for Data Engineering
    • Joins, aggregations, window functions
    • Stored procedures & functions
    • Query optimization basics
  • Relational Databases
    • MySQL / PostgreSQL
    • Schema design & normalization
    • Transaction management
  • NoSQL Databases
    • MongoDB
    • Cassandra basics
  • Use cases and real-world examples
  • ETL vs ELT
  • Batch ingestion tools
  • Streaming ingestion tools
  • Ingestion from APIs, logs, cloud sources
  • Hands-on with: Apache Sqoop, Kafka basics, Python API ingestion
  • Cloud ingestion (Azure Data Factory / AWS Glue / GCP Dataflow)
  • Hadoop Ecosystem
    • HDFS architecture
    • MapReduce concepts
    • YARN
  • Spark
    • RDDs, DataFrames, Spark SQL
    • Transformations & actions
    • Spark with Python (PySpark)
    • Partitioning, bucketing, optimization
    • Spark Streaming & Structured Streaming
  • OLTP vs OLAP
  • Star & Snowflake schema
  • Fact & dimension tables
  • Dimensional modeling
  • Modern warehouse platforms: Snowflake, BigQuery, Redshift
  • Data Lakehouse concept
  • AWS Data Engineering
    • S3, Glue, EMR, Athena
    • Redshift
    • Lambda for data pipelines
  • Azure Data Engineering
    • Azure Data Lake Storage
    • Data Factory
    • Synapse Analytics
    • Databricks
  • GCP Data Engineering
    • Cloud Storage
    • BigQuery
    • Dataflow
    • Dataproc
  • What is orchestration?
  • Apache Airflow
    • DAGs
    • Operators
    • Scheduling
  • Data validation frameworks (Great Expectations)
  • Governance & lineage
  • Role-based access control
  • PII, encryption, compliance
  • Git & version control
  • CI/CD basics
  • Infrastructure-as-code (Terraform basics)
  • Containerization: Docker
  • Deploying data pipelines
  • Kafka deep dive
  • Stream processing concepts
  • Spark Structured Streaming
  • Real-time dashboards
