Oct 12, 2025

LLM 1

 LLMs are characterized by:

  • Scale: They contain millions, billions, or even hundreds of billions of parameters
  • General capabilities: They can perform multiple tasks without task-specific training
  • In-context learning: They can learn from examples provided in the prompt
  • Emergent abilities: As these models grow in size, they demonstrate capabilities that weren’t explicitly programmed or anticipated
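
In-context learning, listed above, can be illustrated by how a few-shot prompt is put together. The sketch below only builds the prompt string and does not call a model. The function name `build_few_shot_prompt`, the sample reviews, and the `Review:` / `Sentiment:` layout are illustrative assumptions, not a format required by any particular LLM.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: labeled examples followed by the unlabeled query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical labeled examples that the model would "learn" from in the prompt.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Setup was quick and painless.")
print(prompt)
```

The model's weights are not updated here; the examples only exist inside the prompt text.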

The advent of LLMs has shifted the paradigm from building specialized models for specific NLP tasks to using a single, large model that can be prompted or fine-tuned to address a wide range of language tasks. This has made sophisticated language processing more accessible while also introducing new challenges in areas like efficiency, ethics, and deployment.

However, LLMs also have important limitations:

  • Hallucinations: They can generate incorrect information confidently
  • Lack of true understanding: They lack true understanding of the world and operate purely on statistical patterns
  • Bias: They may reproduce biases present in their training data or inputs
  • Context windows: They have limited context windows (though this is improving)
  • Computational resources: They require significant computational resources

Why is language processing challenging?

Computers don’t process information in the same way as humans. For example, when we read the sentence “I am hungry,” we can easily understand its meaning. Similarly, given two sentences such as “I am hungry” and “I am sad,” we’re able to easily determine how similar they are. For machine learning (ML) models, such tasks are more difficult. The text needs to be processed in a way that enables the model to learn from it. 
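
One very simple way to make the two example sentences comparable for a program is to turn each into word counts and measure the cosine similarity of the counts. This is a minimal sketch of that idea, not how LLMs represent text; the helper names `bag_of_words` and `cosine_similarity` are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercase, split on whitespace, and count each token."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


hungry = bag_of_words("I am hungry")
sad = bag_of_words("I am sad")
similarity = cosine_similarity(hungry, sad)
print(round(similarity, 3))  # 0.667: two of the three words are shared
```

Word counts rate the sentences as fairly similar because they share "I" and "am", even though the words that carry the meaning differ. That gap is one reason richer text representations are needed.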

Even with the advances in LLMs, many fundamental challenges remain. These include understanding ambiguity, cultural context, sarcasm, and humor. LLMs address these challenges through massive training on diverse datasets, but still often fall short of human-level understanding in many complex scenarios.

LLM Course - Large Language Models

Understanding NLP and LLMs

What’s the difference?

  • NLP (Natural Language Processing) is the broader field focused on enabling computers to understand, interpret, and generate human language. NLP encompasses many techniques and tasks such as sentiment analysis, named entity recognition, and machine translation.
  • LLMs (Large Language Models) are a powerful subset of NLP models characterized by their massive size, extensive training data, and ability to perform a wide range of language tasks with minimal task-specific training. Models like the Llama, GPT, or Claude series are examples of LLMs that have revolutionized what’s possible in NLP.

NLP is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of NLP tasks is not only to understand single words individually, but to be able to understand the context of those words.

The following is a list of common NLP tasks, with some examples of each:

  • Classifying whole sentences: Getting the sentiment of a review, detecting if an email is spam, determining if a sentence is grammatically correct or whether two sentences are logically related or not
  • Classifying each word in a sentence: Identifying the grammatical components of a sentence (noun, verb, adjective), or the named entities (person, location, organization)
  • Generating text content: Completing a prompt with auto-generated text, filling in the blanks in a text with masked words
  • Extracting an answer from a text: Given a question and a context, extracting the answer to the question based on the information provided in the context
  • Generating a new sentence from an input text: Translating a text into another language, summarizing a text

NLP isn’t limited to written text though. It also tackles complex challenges in speech recognition and computer vision, such as generating a transcript of an audio sample or a description of an image.

The Rise of Large Language Models (LLMs)

In recent years, the field of NLP has been revolutionized by Large Language Models (LLMs). These models include architectures like GPT (Generative Pre-trained Transformer).

A large language model (LLM) is an AI model trained on massive amounts of text data that can understand and generate human-like text, recognize patterns in language, and perform a wide variety of language tasks without task-specific training. LLMs represent a significant advancement in the field of natural language processing (NLP).


Jun 25, 2024

Data Warehouse

 

What is DATA MODELLING?
Data modelling (also referred to as data architecture or dimensional modelling) means creating a blueprint for how data will be organized and stored in the data warehouse. The process involves identifying the entities, their attributes, and the relationships between them, and then designing tables and views to represent them. Two common modelling techniques are:
1. Star Schema
2. Snowflake Schema
Before looking at these, we need to understand what a dimension table and a fact table are.

DIMENSION TABLE-
A dimension table stores descriptive attributes that are non-measurable and categorical in nature.
FACT TABLE-
A fact table is a central table that stores measurable, aggregatable, quantitative (factual) data about a particular subject area.
Example-
Consider an e-commerce application with entities such as Products, Sales, Tax, Customer, and Discount.
In this scenario, Products could serve as a dimension table and Sales could serve as a fact table.

STAR SCHEMA-
In a star schema, the fact table sits at the center and each dimension table is related directly to it. The resulting layout looks like a star, hence the name.
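
A star schema for the e-commerce example can be sketched with Python's built-in `sqlite3` module. The table names follow the Products and Sales example in these notes, but every column name and every row of data below is an assumption made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_Products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE dim_Customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    city TEXT
);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_Sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_Products(product_id),
    customer_id INTEGER REFERENCES dim_Customers(customer_id),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_Products VALUES (1, 'Keyboard', 'Accessories'), (2, 'Monitor', 'Displays');
INSERT INTO dim_Customers VALUES (1, 'Asha', 'Pune'), (2, 'Ben', 'Leeds');
INSERT INTO fact_Sales VALUES
    (1, 1, 1, 2, 50.0),
    (2, 2, 1, 1, 200.0),
    (3, 1, 2, 1, 25.0);
""")

# Typical star-schema query: aggregate a measure, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount) AS total_amount
    FROM fact_Sales AS f
    JOIN dim_Products AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Accessories', 75.0), ('Displays', 200.0)]
```

Every dimension is one join away from the fact table, which keeps analytical queries short.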


SNOWFLAKE SCHEMA-
Snowflake schema is a variation of the star schema in which dimension tables are normalized into multiple layers of related tables. This can be useful for complex, hierarchical data relationships.
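
As a rough sketch of the extra layer, the product dimension below points to a separate category table instead of storing the category name itself. It uses Python's built-in `sqlite3` module; the `dim_Categories` table, all column names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outer layer: category split out of the product dimension.
CREATE TABLE dim_Categories (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
-- Inner layer: products reference a category instead of repeating its name.
CREATE TABLE dim_Products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_Categories(category_id)
);
CREATE TABLE fact_Sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_Products(product_id),
    amount REAL
);
INSERT INTO dim_Categories VALUES (1, 'Accessories'), (2, 'Displays');
INSERT INTO dim_Products VALUES (1, 'Keyboard', 1), (2, 'Mouse', 1), (3, 'Monitor', 2);
INSERT INTO fact_Sales VALUES (1, 1, 50.0), (2, 2, 20.0), (3, 3, 200.0);
""")

# Reaching the category name now takes two joins: fact -> product -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_Sales AS f
    JOIN dim_Products AS p ON p.product_id = f.product_id
    JOIN dim_Categories AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)  # [('Accessories', 70.0), ('Displays', 200.0)]
```

The category name is stored once rather than on every product row, at the cost of an additional join in queries.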

Standard naming convention-
● A common prefix for fact tables is "FACT_" or "FT_". This prefix helps distinguish fact tables from dimension tables.
Eg- fact_Sales or fact_Tax
● A common prefix for dimension tables is "DIM_" or "D_". This prefix helps distinguish dimension tables from fact tables.
Eg- dim_Products, dim_Customers or dim_Discounts
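
A convention like this is easy to check automatically. The helper below is a hypothetical sketch (the name `table_kind` is an assumption) that classifies a table by the prefixes listed above, ignoring letter case.

```python
def table_kind(table_name):
    """Classify a table as 'fact', 'dimension', or 'unknown' from its prefix."""
    name = table_name.lower()
    if name.startswith(("fact_", "ft_")):
        return "fact"
    if name.startswith(("dim_", "d_")):
        return "dimension"
    return "unknown"


for name in ["fact_Sales", "FT_Tax", "dim_Products", "D_Customers", "staging_orders"]:
    print(name, "->", table_kind(name))
```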

Types of Fact Table-
1. Transaction fact tables: They store detailed information about individual business transactions or events. They record every occurrence at the most granular level, providing a comprehensive view of operational data.
2. Periodic snapshot tables: Periodic snapshot tables provide a summarized view of metrics over regular time intervals. They store aggregated data at a specific point in time, such as the end of a day, week, or month.
3. Accumulating snapshot tables: Accumulating snapshot tables track the stages of a business process or workflow. They store data at specific checkpoints within a process, providing a detailed view of how the process unfolds over time.
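
The three fact table types can be contrasted with a small in-memory sketch. Plain Python dictionaries stand in for table rows here; the field names, dates, and amounts are all assumptions made up for illustration.

```python
from collections import defaultdict

# Transaction fact: one row per individual sale (most granular level).
transactions = [
    {"sale_id": 1, "date": "2024-06-01", "amount": 50.0},
    {"sale_id": 2, "date": "2024-06-01", "amount": 200.0},
    {"sale_id": 3, "date": "2024-06-02", "amount": 25.0},
]

# Periodic snapshot: one row per day, aggregated from the transactions.
daily_totals = defaultdict(float)
for row in transactions:
    daily_totals[row["date"]] += row["amount"]
daily_snapshot = [
    {"date": date, "total_amount": total}
    for date, total in sorted(daily_totals.items())
]

# Accumulating snapshot: one row per order, updated in place as the
# order passes each checkpoint of the process.
order_progress = {"order_id": 10, "ordered": "2024-06-01", "shipped": None, "delivered": None}
order_progress["shipped"] = "2024-06-02"
order_progress["delivered"] = "2024-06-04"

print(daily_snapshot)
print(order_progress)
```

The transaction rows only ever grow in number, the snapshot gains one row per period, and the accumulating row is revisited as the process moves forward.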

Types of Dimension Table-


● Slowly Changing Dimension (SCD) Tables: These store information that changes rarely over time. They typically contain master data or lookup information, such as product codes, customer IDs, or geographic codes. There are four main types of SCD tables:
a. SCD Type 0: Static; the value never changes.
b. SCD Type 1: Overwrite the previous value; no history is kept.
c. SCD Type 2: Add a new row for each change, keeping full history.
d. SCD Type 3: Add a new column to keep the previous value.


● Conformed Dimension Tables: Conformed dimension tables are standardized dimension tables that are shared across multiple fact tables or subject areas.

● Degenerate Dimension Tables: A degenerate dimension is a dimension attribute stored directly in the fact table, with no separate dimension table of its own (for example, an order or invoice number).

● Junk Dimension Tables: Junk dimension tables group together miscellaneous low-cardinality attributes, such as flags and indicators, that do not fit neatly into other dimension tables.

● Role Playing Dimension: A role-playing dimension is a dimension table that can assume different meanings or roles depending on the context of the analysis. It is often used for entities that play multiple roles in a business process. For example, a single date dimension can serve as both the order date and the ship date.

● Static Dimension Table: Static dimension tables are a type of table that stores descriptive attribute data that does not change over time.

● Shrunken Dimension Tables: Shrunken dimension tables are dimension tables that contain a subset of the attributes from a larger dimension table.
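
Of the dimension types above, the slowly changing dimension variants are the easiest to show in code. The sketch below applies a customer's change of city under SCD Type 1, Type 2, and Type 3. Type 2 is shown in its common form, where each change adds a new versioned row. The function names and the `valid_from`, `valid_to`, `is_current`, and `previous_city` columns are assumptions chosen for illustration.

```python
def apply_type1(row, new_city):
    """Type 1: overwrite the attribute; the old value is lost."""
    updated = dict(row)
    updated["city"] = new_city
    return updated


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new versioned row."""
    result = []
    new_rows = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
            new_rows.append({
                "customer_id": customer_id,
                "city": new_city,
                "valid_from": change_date,
                "valid_to": None,
                "is_current": True,
            })
        result.append(row)
    return result + new_rows


def apply_type3(row, new_city):
    """Type 3: keep only the previous value, in an extra column."""
    updated = dict(row)
    updated["previous_city"] = row["city"]
    updated["city"] = new_city
    return updated


current = {"customer_id": 1, "city": "Pune"}
print(apply_type1(current, "Mumbai"))
print(apply_type3(current, "Mumbai"))

history = [{"customer_id": 1, "city": "Pune", "valid_from": "2023-01-01",
            "valid_to": None, "is_current": True}]
versions = apply_type2(history, 1, "Mumbai", "2024-06-25")
for row in versions:
    print(row)
```

Type 1 is the simplest but loses history, Type 3 remembers one earlier value, and Type 2 keeps every version at the cost of more rows.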