
Showing posts with the label Redshift Architecture

Master Jobs, Stages, and Tasks for Data Engineering Interviews

Mastering Spark execution internals is a "must-have" skill for Data Engineers. Whether you are prepping for an interview or debugging a slow production pipeline, understanding how Spark breaks down your code is the key to performance tuning. Spark applications follow a strict hierarchy: Jobs > Stages > Tasks. Let’s break down exactly how this works.

1. High-Level Architecture

Before we dive into the code, let’s look at the components that manage the execution:

Driver: The brain. It converts your code into a Directed Acyclic Graph (DAG) and schedules tasks.
DAG Scheduler: Splits the graph into Stages based on "shuffles."
Task Scheduler: Sends the individual Tasks to the executors.
Executors: The workers that actually run the tasks in parallel.

2. Real-World Code Walkthrough: The "Wide" Transformation

Let’s analyze a common scenario: reading data, filtering, grouping, and saving.

# 1. Read Data (Narrow)
df = sp...
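The Jobs > Stages > Tasks hierarchy described in this excerpt can be illustrated with a small toy model in plain Python. This is only a sketch and does not call Spark: the function `plan_stages`, the `Stage` class, and the partition counts are hypothetical names and values chosen for illustration. It assumes the usual rule of thumb that each wide (shuffle) transformation starts a new stage and that each stage runs one task per partition. Real Spark also runs the map side of a shuffle in the earlier stage, which this model ignores.

```python
from dataclasses import dataclass, field

NARROW = "narrow"  # no shuffle needed (e.g. filter)
WIDE = "wide"      # requires a shuffle (e.g. groupBy)


@dataclass
class Stage:
    """Hypothetical stand-in for a Spark stage: a run of operations plus a task count."""
    num_tasks: int
    ops: list = field(default_factory=list)


def plan_stages(ops, input_partitions, shuffle_partitions):
    """Split a linear pipeline into stages at every wide transformation.

    Simplification: the wide operation is placed in the stage that follows
    the shuffle boundary, and each stage gets one task per partition.
    """
    stages = [Stage(num_tasks=input_partitions)]
    for name, kind in ops:
        if kind == WIDE:
            # A shuffle boundary closes the current stage and opens a new one.
            stages.append(Stage(num_tasks=shuffle_partitions))
        stages[-1].ops.append(name)
    return stages


# Illustrative pipeline: read, filter, group, save (one action, so one job).
pipeline = [
    ("read", NARROW),
    ("filter", NARROW),
    ("groupBy", WIDE),
    ("write", NARROW),
]

stages = plan_stages(pipeline, input_partitions=8, shuffle_partitions=4)
for index, stage in enumerate(stages):
    print(f"Stage {index}: ops={stage.ops}, tasks={stage.num_tasks}")
```

With these assumed inputs the model yields two stages: the read and filter run together in the first, and the grouping and write run in the second after the shuffle.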

Common Key Terms & Terminologies

Scroll down or use Ctrl+F if you don't find a term near the top.

Data warehouse

A Data Warehouse (DWH) is a centralized repository designed for storing, managing, and analyzing large volumes of structured data from multiple sources. It enables businesses to run complex queries, generate reports, and gain insights for decision-making.

Key Characteristics:
Subject-Oriented: Organized around key business areas (e.g., sales, finance).
Integrated: Combines data from different sources into a unified format.
Time-Variant: Stores historical data for trend analysis.
Non-Volatile: Once loaded, data is read for analysis and is not changed in place.

Common Technologies:
On-Premise: SQL Server, Oracle, Teradata
Cloud-Based: Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse Analytics

A data warehouse supports Business Intelligence (BI) and analytics by providing structured, cleaned, and optimized data for reporting and decision-making.
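The four characteristics above can be seen in miniature with Python's built-in `sqlite3` module. This is a minimal sketch, not a real warehouse: the table `sales_fact`, its columns, and the sample rows are all made up for illustration, and SQLite stands in for an engine such as Redshift. Rows from two assumed source systems are loaded into one table (integrated), organized around sales (subject-oriented), kept with their dates (time-variant), and only ever appended (non-volatile).

```python
import sqlite3

# In-memory database used as a stand-in for a warehouse engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    "sale_date TEXT, source_system TEXT, region TEXT, amount REAL)"
)

# Hypothetical extracts from two different source systems.
web_orders = [("2024-01-15", "EU", 120.0), ("2024-02-03", "EU", 80.0)]
store_orders = [("2024-01-20", "US", 200.0), ("2024-02-11", "US", 50.0)]

# Integrate both sources into one unified format, tagging where each row came from.
rows = [(d, "web", r, a) for d, r, a in web_orders]
rows += [(d, "store", r, a) for d, r, a in store_orders]

# Append-only load: history is inserted, never updated in place.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)

# Trend query over the stored history: total sales per month.
monthly = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sales_fact GROUP BY month ORDER BY month"
).fetchall()

print(monthly)
```

The monthly totals come from both sources combined, which is the kind of reporting query a BI tool would issue against the warehouse.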

Amazon Redshift
