Jun 25, 2024

Data Warehouse

 

What is DATA MODELLING?
Data modelling (also called data architecture or dimensional modelling) is the process of creating a blueprint for how data will be organized and stored in the data warehouse. It involves identifying the entities, attributes, and relationships in the data, and then designing tables and views to represent them. We can model the data using two main techniques:
1. Star Schema
2. Snowflake Schema
Before jumping into these, we need to understand what a dimension table and a fact table are.

DIMENSION TABLE-
A dimension table stores descriptive attributes that are non-measurable and categorical in nature.

FACT TABLE-
A fact table is a central table that stores measurable, aggregatable, quantitative, or factual data about a particular subject area.
Example-
Consider an e-commerce application with entities such as Products, Sales, Tax, Customer, and Discount.
In this scenario, Products could behave as a dimension table,
and Sales could behave as a fact table.

STAR SCHEMA-
In a star schema, the fact table sits at the center and each dimension table joins directly to it. The resulting structure resembles a star, hence the name.
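The e-commerce example above can be sketched as a tiny star schema using Python's built-in sqlite3 module. The table and column names (dim_Products, fact_Sales, etc.) are illustrative choices, not a fixed standard:

```python
import sqlite3

# A minimal star schema: one central fact table (fact_Sales)
# joined directly to a dimension table (dim_Products).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE dim_Products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    )""")
cur.execute("""
    CREATE TABLE fact_Sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_Products(product_id),
        quantity   INTEGER,
        amount     REAL
    )""")

cur.executemany("INSERT INTO dim_Products VALUES (?, ?, ?)",
                [(1, "Laptop", "Electronics"), (2, "Mug", "Kitchen")])
cur.executemany("INSERT INTO fact_Sales VALUES (?, ?, ?, ?)",
                [(10, 1, 2, 1800.0), (11, 2, 5, 40.0)])

# A typical star-schema query: join the fact table to its dimension
# and aggregate a measure by a descriptive attribute.
rows = cur.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_Sales f
    JOIN dim_Products p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Electronics', 1800.0), ('Kitchen', 40.0)]
```

Notice that every descriptive attribute is one join away from the fact table; that single-join shape is what makes star schemas fast and easy to query.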


SNOWFLAKE SCHEMA-
A snowflake schema is a variation of the star schema in which dimension tables are normalized into multiple layers (dimensions of dimensions). This can be useful for complex data relationships, at the cost of extra joins.
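Continuing the e-commerce example, the product dimension can be "snowflaked" by normalizing its category attribute into a second dimension layer. The names (dim_Category, etc.) are again illustrative:

```python
import sqlite3

# Snowflaked version of the product dimension: fact_Sales ->
# dim_Products -> dim_Category forms a second dimension layer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_Category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_Products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_Category(category_id)
    );
    CREATE TABLE fact_Sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_Products(product_id),
        amount     REAL
    );
    INSERT INTO dim_Category VALUES (1, 'Electronics');
    INSERT INTO dim_Products VALUES (1, 'Laptop', 1);
    INSERT INTO fact_Sales VALUES (10, 1, 1800.0);
""")

# Reaching the category attribute now costs one extra join
# compared with the star-schema layout.
row = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_Sales f
    JOIN dim_Products p ON p.product_id = f.product_id
    JOIN dim_Category c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchone()
print(row)  # ('Electronics', 1800.0)
```

The trade-off is visible directly: less redundancy in the dimension data, but every category-level query needs the extra join.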

Standard naming convention-
● A common prefix for fact tables is "FACT_" or "FT_". This prefix helps distinguish fact tables from dimension tables.
E.g., fact_Sales or fact_Tax
● A common prefix for dimension tables is "DIM_" or "D_". This prefix helps distinguish dimension tables from fact tables.
E.g., dim_Products, dim_Customers, or dim_Discounts

Types of Fact Table-
1. Transaction fact tables: They store detailed information about individual business transactions or events. They record every occurrence at the most granular level, providing a comprehensive view of operational data.
2. Periodic snapshot tables: They provide a summarized view of metrics over regular time intervals, storing aggregated data as of a specific point in time, such as the end of a day, week, or month.
3. Accumulating snapshot tables: They track the stages of a business process or workflow, storing data at specific checkpoints within a process to give a detailed view of how the process unfolds over time.
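The difference between the first two types can be shown by rolling transaction-grain rows up into a daily periodic snapshot. The records and field names below are made up for illustration:

```python
from collections import defaultdict
from datetime import date

# Transaction-grain fact rows: one row per individual sale event.
transactions = [
    {"sold_at": date(2024, 6, 1), "amount": 120.0},
    {"sold_at": date(2024, 6, 1), "amount": 80.0},
    {"sold_at": date(2024, 6, 2), "amount": 200.0},
]

# Periodic snapshot grain: one summarized row per day.
snapshot = defaultdict(lambda: {"total_amount": 0.0, "txn_count": 0})
for t in transactions:
    day = snapshot[t["sold_at"]]
    day["total_amount"] += t["amount"]
    day["txn_count"] += 1

print(dict(snapshot))  # two snapshot rows: one per day
```

The transaction table keeps every event; the snapshot keeps only one row per interval, which is what makes it cheap to query for trend reporting.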

Types of Dimension Table-

● Slowly Changing Dimension (SCD) Tables: These store attributes that change slowly over time, such as customer addresses or product descriptions, and define how those changes are recorded. There are four main types of SCD tables:
a. SCD Type 0: Static; the attribute never changes.
b. SCD Type 1: Overwrite the previous value; no history is kept.
c. SCD Type 2: Add a new row for each change, typically with effective dates or a current-row flag, so full history is kept.
d. SCD Type 3: Add a new column to keep limited history (e.g., the previous value).
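A minimal sketch of the Type 2 approach, assuming a hypothetical customer dimension keyed by customer_id with effective-date columns:

```python
from datetime import date

# SCD Type 2: instead of overwriting a changed attribute, close the
# current row and append a new one, so history is preserved.
dim_customers = [
    {"customer_id": 1, "city": "Pune",
     "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True},
]

def scd2_update(rows, customer_id, new_city, change_date):
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # attribute unchanged, nothing to do
            row["valid_to"] = change_date   # close the old version
            row["is_current"] = False
    rows.append({"customer_id": customer_id, "city": new_city,
                 "valid_from": change_date, "valid_to": None,
                 "is_current": True})

# Customer 1 moves: the old row is closed, a new current row is added.
scd2_update(dim_customers, 1, "Mumbai", date(2024, 6, 1))
print(len(dim_customers))  # 2
```

A fact row dated before June 2024 would now join to the Pune version, and a later one to the Mumbai version, which is exactly the point of Type 2.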


● Conformed Dimension Tables: Standardized dimension tables that are shared across multiple fact tables or subject areas.

● Degenerate Dimension Tables: Dimension attributes (such as an invoice or order number) that are stored directly in the fact table, with no separate dimension table of their own.

● Junk Dimension Tables: Junk dimensions group together disparate low-cardinality attributes, such as flags and indicators, that do not fit neatly into any other dimension table.

● Role Playing Dimension: A dimension table that can assume different meanings or roles depending on the context of the analysis; for example, a single date dimension reused as order date, ship date, and delivery date.

● Static Dimension Table: Static dimension tables are a type of table that
stores descriptive attribute data that does not change over time.

● Shrunken Dimension Tables: Shrunken dimension tables are dimension tables that contain a subset of the attributes from a larger dimension table.

 


Sep 22, 2023

Bitcoin Halving

Bitcoin halving is a significant event that occurs on the Bitcoin network roughly every four years (every 210,000 blocks). During this event, the block reward that miners receive for verifying transactions and adding new blocks to the blockchain is reduced by 50%. This means that the rate of new Bitcoin creation slows down, and the total supply of Bitcoin approaches its maximum limit of 21 million.


Bitcoin halving is a programmed event built into the Bitcoin protocol to keep the inflation rate of Bitcoin controlled and predictable. The reduced rate of new Bitcoin creation and the expectation of scarcity can increase the value of Bitcoin, which has historically led to price increases in the months leading up to a halving event.
Despite this, the market can be unpredictable, and the impact of a halving on Bitcoin's price is not guaranteed. However, the reduced issuance resulting from each halving helps maintain Bitcoin's scarcity and ensures that it remains a finite asset.

The previous Bitcoin halving occurred on May 11, 2020, at a block height of 630,000. At that time, the block reward for miners was reduced from 12.5 BTC to 6.25 BTC per block. This was the third halving event in Bitcoin's history, following the first halving in November 2012 and the second halving in July 2016. The next Bitcoin halving is expected to occur in April 2024, at which point the block reward will be reduced from 6.25 BTC to 3.125 BTC per block.
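The halving schedule can be checked with a few lines of arithmetic: the reward starts at 50 BTC and halves every 210,000 blocks, so total issuance converges to about 21 million. (This float sketch ignores that the real protocol counts integer satoshis with floor division, which is why the limit is slightly under 21 million.)

```python
# Block reward halves every 210,000 blocks, starting from 50 BTC.
blocks_per_halving = 210_000
reward = 50.0
total_supply = 0.0
halvings = []

while reward >= 1e-8:  # stop once the reward drops below one satoshi
    halvings.append(reward)
    total_supply += reward * blocks_per_halving
    reward /= 2

print(halvings[:5])  # [50.0, 25.0, 12.5, 6.25, 3.125]
print(round(total_supply / 1e6, 2))  # 21.0 (million BTC)
```

The fifth entry, 3.125 BTC, is the post-2024 reward mentioned above, and the geometric series 50 + 25 + 12.5 + … times 210,000 blocks is where the 21 million cap comes from.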

After the first Bitcoin halving in November 2012, the price of Bitcoin increased by over 8,000% over the following year. After the second halving in July 2016, the price of Bitcoin increased by around 2,500% over the following 18 months. After the most recent halving event in May 2020, the price of Bitcoin initially experienced a slight drop but quickly recovered and went on to gain over 300% in value over the following year, reaching an all-time high of over $64,000 in April 2021.

Sep 7, 2023

Top 5 AWS Interview Tips

 

  • Can you describe the current AWS infrastructure and technologies used within the company?
    • This question shows your eagerness to understand the existing AWS setup and gives you insights into the company's tech stack.
  • What AWS services or tools are particularly important for this role, and how are they utilized here?
    • This question allows you to gauge the specific responsibilities of the role and how AWS is integrated into the company's operations.
  • How does the company handle AWS security and compliance, especially in relation to [mention relevant industry standards or regulations]?
    • Demonstrates your concern for security and regulatory compliance, which is crucial in many AWS roles.
  • What are the biggest challenges or projects related to AWS that the team is currently working on or will be working on in the near future?
    • Shows your interest in contributing to the team's objectives and your willingness to tackle challenges.

Sep 6, 2023

Database vs. Data Warehouse vs. Data Lake

In the world of data management, these three concepts play distinct roles in handling and organizing data. Let's explore their key differences:


📊 Database:
A database is a structured collection of data that is organized, stored, and managed using a predefined schema. It's designed for efficient data retrieval and modification. Databases are used to support transactional operations (OLTP), such as recording customer orders, tracking inventory, and managing user profiles.

🏢 Data Warehouse:
A data warehouse is a centralized repository that aggregates data from various sources across an organization. It's optimized for complex queries and data analysis (OLAP). Data warehouses often use a process called Extract, Transform, Load (ETL) to integrate data from different systems, transform it into a consistent format, and load it into the warehouse. They are used for business intelligence, reporting, and decision-making.

🌊 Data Lake:
A data lake is a vast storage repository that holds both structured and unstructured data at any scale. Unlike traditional databases, data lakes don't require a predefined schema. This flexibility allows organizations to store raw, unprocessed data from various sources, making it suitable for advanced analytics, machine learning, and exploration of new data sources, often following an ELT (Extract, Load, Transform) pattern.
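The schema-on-read idea can be illustrated in a few lines: raw, heterogeneous records land as-is, and structure is imposed only when the data is read for a particular analysis. The records and field names here are made up:

```python
import json

# Raw events stored in the lake with no enforced schema;
# note the third record has no "action" field at all.
raw_events = [
    '{"user": "a", "action": "click", "ts": 1}',
    '{"user": "b", "action": "purchase", "amount": 9.99, "ts": 2}',
    '{"user": "a", "page": "/home", "ts": 3}',
]

# Schema applied at read time: keep only events that have an action.
actions = [
    (e["user"], e["action"])
    for e in map(json.loads, raw_events)
    if "action" in e
]
print(actions)  # [('a', 'click'), ('b', 'purchase')]
```

A database would have rejected the mismatched record at write time; the lake accepts everything and leaves the filtering to each reader.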

💡 Key Takeaways:
Use Cases: Databases are used for day-to-day operations, data warehouses are for business intelligence, and data lakes are for storing and analyzing large volumes of diverse data.

Schema: Databases have a fixed schema, data warehouses have a structured schema for reporting, and data lakes allow schema-on-read for flexibility.

Data Types: Databases store structured data, data warehouses store structured and semi-structured data, and data lakes store structured, semi-structured, and unstructured data.