Technical Blog: Transformer

Oct 12, 2025

Transformer

Transformer models are used to solve all kinds of tasks across different modalities, including natural language processing (NLP), computer vision, audio processing, and more.

The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598047137260437}]

By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model will be used instead and there is no need to download the model again.

There are three main steps involved when you pass some text to a pipeline:
1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.
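
To make the three steps concrete, here is a toy sketch in plain Python. It is not how the library implements a pipeline: the vocabulary, the weights, and the function names (preprocess, model, postprocess, toy_pipeline) are all made up for illustration, and a real pipeline uses a trained tokenizer and a neural network instead. It only shows how the stages hand data to each other.

```python
import math

# Hypothetical toy vocabulary and weights (assumptions for illustration only).
VOCAB = {"love": 0, "great": 1, "hate": 2, "awful": 3}
# One (NEGATIVE, POSITIVE) weight pair per vocabulary entry.
WEIGHTS = [(-2.0, 2.0), (-1.5, 1.5), (2.0, -2.0), (1.5, -1.5)]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the toy model understands."""
    words = text.lower().replace(".", " ").replace("!", " ").split()
    return [VOCAB[word] for word in words if word in VOCAB]


def model(token_ids):
    """Step 2: produce one raw score (logit) per label."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        logits[0] += WEIGHTS[token_id][0]
        logits[1] += WEIGHTS[token_id][1]
    return logits


def postprocess(logits):
    """Step 3: softmax the logits and report the most likely label."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": LABELS[best], "score": probs[best]}]


def toy_pipeline(text):
    """Chain the three steps, mirroring the shape of the classifier output."""
    return postprocess(model(preprocess(text)))


print(toy_pipeline("I love this course!"))
```

The returned list of {'label': ..., 'score': ...} dictionaries deliberately mirrors the output format shown earlier, so the post-processing step is easy to compare with the real classifier.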
The pipeline() function supports multiple modalities, allowing you to work with text, images, audio, and even multimodal tasks.

Text pipelines
text-generation: Generate text from a prompt
text-classification: Classify text into predefined categories
summarization: Create a shorter version of a text while preserving key information
translation: Translate text from one language to another
zero-shot-classification: Classify text without prior training on specific labels
feature-extraction: Extract vector representations of text
Image pipelines
image-to-text: Generate text descriptions of images
image-classification: Identify objects in an image
object-detection: Locate and identify objects in images
Audio pipelines
automatic-speech-recognition: Convert speech to text
audio-classification: Classify audio into categories
text-to-speech: Convert text to spoken audio
Multimodal pipelines
image-text-to-text: Respond to an image based on a text prompt
Zero-shot classification
A common scenario in real-world projects is having to classify texts that have not been labeled, because annotating text is usually time-consuming and requires domain expertise. For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model. You’ve already seen how the model can classify a sentence as positive or negative using those two labels, but it can also classify the text using any other set of labels you like.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}
This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
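
Once you have a result like the one above, you will usually want to pick the winning label. The sketch below works on the dictionary exactly as printed above; the helper name best_label and its min_score threshold are our own additions for illustration, not part of the library.

```python
def best_label(result, min_score=0.5):
    """Return the highest-scoring label, or None if it scores below min_score.

    min_score is a hypothetical confidence threshold, not a pipeline option.
    """
    # Pair each score with its label and sort, rather than relying on order.
    pairs = sorted(zip(result["scores"], result["labels"]), reverse=True)
    top_score, top_label = pairs[0]
    return top_label if top_score >= min_score else None


# The result dictionary copied from the example output above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445963859558105, 0.111976258456707, 0.043427448719739914],
}

print(best_label(result))  # education
```

Note that the three scores in this output add up to roughly 1, which is why they can be read as probabilities over the candidate labels you supplied.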
Text generation
Now let’s see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
[{'generated_text': 'In this course, we will teach you how to understand and use '
                    'data flow and data interchange when handling user data. We '
                    'will be working with one or more of the most commonly used '
                    'data flows — data flows of various types, as seen by the '
                    'HTTP'}]
You can control how many different sequences are generated with the argument num_return_sequences and the total length of the output text with the argument max_length.
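
As a minimal sketch of those two arguments, the helper below wraps any text-generation pipeline and returns the generated strings. The function name generate_variants is hypothetical, and passing do_sample=True is an assumption on our part so that the returned sequences can differ from each other.

```python
def generate_variants(generator, prompt, n=2, max_length=30):
    """Ask a text-generation pipeline for n continuations of prompt."""
    outputs = generator(
        prompt,
        num_return_sequences=n,  # how many different sequences to return
        max_length=max_length,   # total length of the output text, in tokens
        do_sample=True,          # assumption: sample so the sequences can vary
    )
    # Each item is a dict with a 'generated_text' key, as in the output above.
    return [item["generated_text"] for item in outputs]


# Example usage (downloads a model, so it is left commented out here):
# from transformers import pipeline
# generator = pipeline("text-generation")
# for text in generate_variants(generator, "In this course, we will teach you how to"):
#     print(text)
```

Because generation involves randomness, each call should give different texts, so no expected output is shown.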
