LangChain Cheatsheet — All Secrets on a Single Page


The examples below mostly involve OpenAI models. The following code demonstrates initializing and using a language model (OpenAI in this particular case) within LangChain.

from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003", temperature=0.01)
print(llm("Suggest 3 bday gifts for a data scientist"))

>>>

1. A subscription to a data science magazine or journal

2. A set of data science books

3. A data science-themed mug or t-shirt

As you can see, we initialize an LLM and call it with a query. All the tokenization and embedding happens behind the scenes. We can also manage the conversation history and incorporate system instructions into the chat for more flexible responses.

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, AIMessage, SystemMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.01)

conversation_history = [
    HumanMessage(content="Suggest 3 bday gifts for a data scientist"),
    AIMessage(content="What is your price range?"),
    HumanMessage(content="Under 100$"),
]

print(chat(conversation_history).content)

>>>

1. A data science book: Consider gifting a popular and highly recommended book on data science, such as "Python for Data Analysis" by Wes McKinney or "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. These books can provide valuable insights and knowledge for a data scientist's professional development.

2. Data visualization tool: A data scientist often deals with large datasets and needs to present their findings effectively. Consider gifting a data visualization tool like Tableau Public or Plotly, which can help them create interactive and visually appealing charts and graphs to communicate their data analysis results.

3. Subscription to a data science platform: Give them access to a data science platform like Kaggle or DataCamp, which offer a wide range of courses, tutorials, and datasets for data scientists to enhance their skills and stay updated with the latest trends in the field. This gift can provide them with valuable learning resources and opportunities for professional growth.

system_instruction = SystemMessage(
    content="""You work as an assistant in an electronics store.
Your income depends on the items you sold"""
)

user_message = HumanMessage(content="3 bday gifts for a data scientist")

print(chat([system_instruction, user_message]).content)

>>>

1. Laptop: A high-performance laptop is essential for any data scientist. Look for a model with a powerful processor, ample RAM, and a large storage capacity. This will allow them to run complex data analysis tasks and store large datasets.

2. External Hard Drive: Data scientists deal with massive amounts of data, and having extra storage space is crucial. An external hard drive with a large capacity will provide them with a convenient and secure way to store and backup their data.

3. Data Visualization Tool: Data visualization is an important aspect of data science. Consider gifting them a subscription to a data visualization tool like Tableau or Power BI. These tools will help them create visually appealing and interactive charts, graphs, and dashboards to present their findings effectively.

As you can see, we can shift the conversation in a specific direction using different types of Messages: HumanMessage, AIMessage, and SystemMessage.

Open-source

Now, let’s talk about open-source models. Below is a typical example of initializing and using a pre-trained language model for text generation. The code covers tokenizer usage, model configuration, efficient inference with quantization (more on quantization in the snippets below), and CUDA support.

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
from torch import cuda

# Name of the pre-trained model
model_name = "TheBloke/llama-2-13B-Guanaco-QLoRA-GPTQ"

# Initialize the tokenizer for the model
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Initialize the AutoGPTQForCausalLM model with specific configurations
# This is a quantized version of the model, suitable for efficient inference
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_safetensors=True,    # Enables SafeTensors for secure serialization
    trust_remote_code=True,  # Trusts the remote code (not recommended for untrusted sources)
    device_map="auto",       # Automatically maps the model to the available device
    quantize_config=None     # Custom quantization configuration (None for default)
)

# The input query to be tokenized and passed to the model
query = "<Your input text here>"

# Tokenize the input query and move the tensors to the GPU
input_ids = tokenizer(query, return_tensors="pt").input_ids.cuda()

# Generate text using the model with the specified temperature setting
output = model.generate(input_ids=input_ids, temperature=0.1)

# Decode the generated token IDs back into text
print(tokenizer.decode(output[0], skip_special_tokens=True))

Text Generation

During text generation, you can strongly influence the process with several parameters:

How Does an LLM Generate Text?

  • temperature affects the randomness of the token generation
  • Top-k sampling limits token generation to the top k most likely tokens at each step
  • Top-p (nucleus) sampling limits token generation to the cumulative probability of p
  • max_tokens caps the number of generated tokens

llm = OpenAI(temperature=0.5, top_p=0.75, max_tokens=50)  # top_k is not an OpenAI API parameter; see the open-source example below
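
The same knobs exist on the open-source side. A minimal sketch, assuming the tokenizer, model, and input_ids from the AutoGPTQ example above (the parameter values are illustrative):

output = model.generate(
    input_ids=input_ids,
    do_sample=True,     # enable sampling so temperature/top_k/top_p take effect
    temperature=0.5,    # randomness of token selection
    top_k=10,           # keep only the 10 most likely tokens at each step
    top_p=0.75,         # nucleus sampling with 75% cumulative probability
    max_new_tokens=50   # cap the number of newly generated tokens
)
print(tokenizer.decode(output[0], skip_special_tokens=True))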

Quantization

Quantization is crucial for performance: a 13B-parameter model stored in fp16 needs roughly 26 GB for the weights alone, while the same model quantized to 4 bits fits in about 6.5 GB.

How to Fit Large Language Models in Small Memory: Quantization

Below, we’ll optimize a pre-trained language model for efficient performance using 4-bit quantization. The use of BitsAndBytesConfig is vital for applying these optimizations, which are particularly beneficial for deployment scenarios where model size and speed are critical factors.

from transformers import BitsAndBytesConfig, AutoModelForCausalLM
import torch

# Specify the model name or path
model_name_or_path = "your-model-name-or-path"

# Configure BitsAndBytesConfig for 4-bit quantization
# This configuration optimizes the model size and inference speed
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Enables loading the model in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # Sets the computation data type to bfloat16
    bnb_4bit_quant_type="nf4",              # Sets the quantization type to nf4
    bnb_4bit_use_double_quant=True          # Enables double quantization for additional memory savings
)

# Load the pre-trained causal language model with 4-bit quantization
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=bnb_config,  # Applies the 4-bit quantization configuration
    device_map="auto",               # Automatically maps the model to the available device
    trust_remote_code=True           # Trusts the remote code (use cautiously)
)

Fine-tuning

In some cases, you need to fine-tune a pre-trained language model. This is usually done with Low-Rank Adaptation (LoRA) for efficient task-specific adaptation. The example below also uses gradient checkpointing and preparation for k-bit training, techniques that reduce the memory and compute cost of training.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)

# Load a pre-trained causal language model and its tokenizer
pretrained_model = AutoModelForCausalLM.from_pretrained("your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-model-name")

# Enable gradient checkpointing for memory efficiency
pretrained_model.gradient_checkpointing_enable()

# Prepare the model for k-bit training, optimizing for low-bit-width training
model = prepare_model_for_kbit_training(pretrained_model)

# Define the LoRA (Low-Rank Adaptation) configuration
# This configures the model for task-specific fine-tuning with low-rank matrices
config = LoraConfig(
    r=16,                                # Rank of the low-rank matrices
    lora_alpha=32,                       # Scale for the LoRA layers
    lora_dropout=0.05,                   # Dropout rate for the LoRA layers
    bias="none",                         # Type of bias to use
    target_modules=["query_key_value"],  # Target model components for LoRA adaptation
    task_type="CAUSAL_LM"                # Task type, here Causal Language Modeling
)

# Adapt the model with the specified LoRA configuration
model = get_peft_model(model, config)

# Initialize the Trainer for model training
# train_dataset is assumed to be a tokenized dataset prepared beforehand
trainer = Trainer(
    model=model,
    train_dataset=train_dataset,  # Training dataset
    args=TrainingArguments(
        output_dir="outputs",  # Directory for checkpoints (placeholder)
        num_train_epochs=10,
        per_device_train_batch_size=8,
        # Other training arguments...
    ),
    # mlm=False because we train a causal (next-token) language model
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

# Disable caching to save memory during training
model.config.use_cache = False

# Start the training process
trainer.train()

Prompts

LangChain allows the creation of dynamic prompts that guide the text-generation behavior of language models. Prompt templates provide a way to elicit specific responses from the model. Let’s look at a practical example where we need to create SEO descriptions for particular products.

from langchain.prompts import PromptTemplate, FewShotPromptTemplate

# Define and use a simple prompt template
template = "Act as an SEO expert. Provide a SEO description for {product}"
prompt = PromptTemplate(input_variables=["product"], template=template)

# Format prompt with a specific product
formatted_prompt = prompt.format(product="Perpetuum Mobile")
print(llm(formatted_prompt))

>>>

Perpetuum Mobile is a leading provider of innovative, sustainable energy
solutions. Our products and services are designed to help businesses and
individuals reduce their carbon footprint and save money on energy costs.
We specialize in solar, wind, and geothermal energy systems, as well as
energy storage solutions. Our team of experienced engineers and technicians
are dedicated to providing the highest quality products and services to our
customers. We strive to be the most reliable and cost-effective provider of
renewable energy solutions in the industry. With our commitment to
sustainability and customer satisfaction, Perpetuum Mobile is the perfect
choice for your energy needs.

Sometimes you have a small few-shot dataset with a handful of examples showing how you want the task performed. Let’s look at a text classification example:

# Define a few-shot learning prompt with examples
examples = [
    {"email_text": "Win a free iPhone!", "category": "Spam"},
    {"email_text": "Next Sprint Planning Meeting.", "category": "Meetings"},
    {"email_text": "Version 2.1 of Y is now live", "category": "Project Updates"}
]

prompt_template = PromptTemplate(
    input_variables=["email_text", "category"],
    template="Classify the email: {email_text} /n {category}"
)

few_shot_prompt = FewShotPromptTemplate(
    example_prompt=prompt_template,
    examples=examples,
    suffix="Classify the email: {email_text}",
    input_variables=["email_text"]
)

# Using the few-shot learning prompt
formatted_prompt = few_shot_prompt.format(
    email_text="Hi. I'm rescheduling daily standup tomorrow to 10am."
)
print(llm(formatted_prompt))

>>>

/n Meetings

Indexes

Indexes in LangChain are used to handle and retrieve large volumes of data efficiently. Instead of passing an entire document to an LLM, we first index and search the source for relevant information, and only the top-k matches are passed to the model to formulate a response. Pretty neat!

LangChain 101: Part 3b. Talking to Documents: Embeddings and Vectorstores

In LangChain, using indexes involves loading documents from various sources, splitting texts, creating vectorstores, and retrieving relevant documents:

from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Load documents from a web source
loader = WebBaseLoader("https://en.wikipedia.org/wiki/History_of_mathematics")
loaded_documents = loader.load()

# Split loaded documents into smaller texts
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
texts = text_splitter.split_documents(loaded_documents)

# Create embeddings (OpenAI embeddings used here), build a vectorstore,
# and perform a similarity search
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
print(db.similarity_search("What is Isaac Newton's contribution in math?"))

>>>

[Document(page_content="Building on earlier work by many predecessors, Isaac Newton discovered the laws of physics that explain Kepler's Laws, and brought together the concepts now known as calculus. Independently, Gottfried Wilhelm Leibniz, developed calculus and much of the calculus notation still in use today. He also refined the binary number system, which is the foundation of nearly all digital (electronic,", metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),

Document(page_content='mathematical developments, interacting with new scientific discoveries, were made at an increasing pace that continues through the present day. This includes the groundbreaking work of both Isaac Newton and Gottfried Wilhelm Leibniz in the development of infinitesimal calculus during the course of the 17th century.', metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),

Document(page_content="In the 13th century, Nasir al-Din Tusi (Nasireddin) made advances in spherical trigonometry. He also wrote influential work on Euclid's parallel postulate. In the 15th century, Ghiyath al-Kashi computed the value of π to the 16th decimal place. Kashi also had an algorithm for calculating nth roots, which was a special case of the methods given many centuries later by Ruffini and Horner.", metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),

Document(page_content='Whitehead, initiated a long running debate on the foundations of mathematics.', metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'})]

Besides using similarity_search, we can use vector databases as retrievers:

# Initialize and use a retriever for relevant documents
retriever = db.as_retriever()
print(retriever.get_relevant_documents("What is Isaac Newton's contribution in math?"))

>>>

[Document(page_content="Building on earlier work by many predecessors, Isaac Newton discovered the laws of physics that explain Kepler's Laws, and brought together the concepts now known as calculus. Independently, Gottfried Wilhelm Leibniz, developed calculus and much of the calculus notation still in use today. He also refined the binary number system, which is the foundation of nearly all digital (electronic,", metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),

Document(page_content='mathematical developments, interacting with new scientific discoveries, were made at an increasing pace that continues through the present day. This includes the groundbreaking work of both Isaac Newton and Gottfried Wilhelm Leibniz in the development of infinitesimal calculus during the course of the 17th century.', metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),

Document(page_content="In the 13th century, Nasir al-Din Tusi (Nasireddin) made advances in spherical trigonometry. He also wrote influential work on Euclid's parallel postulate. In the 15th century, Ghiyath al-Kashi computed the value of π to the 16th decimal place. Kashi also had an algorithm for calculating nth roots, which was a special case of the methods given many centuries later by Ruffini and Horner.", metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),

Document(page_content='Whitehead, initiated a long running debate on the foundations of mathematics.', metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'})]
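
To close the loop described at the start of this section (retrieve the top-k chunks, then let the LLM formulate the answer), the retriever can be plugged into a question-answering chain. A minimal sketch, assuming the llm and db objects defined earlier:

from langchain.chains import RetrievalQA

# Build a QA chain that stuffs the retrieved chunks into the prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever()
)
print(qa_chain.run("What is Isaac Newton's contribution in math?"))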

Memory

Memory in LangChain refers to a model’s ability to remember previous parts of a conversation or context. This is essential for maintaining continuity across interactions. Let’s use ConversationBufferMemory to store and retrieve conversation history.

from langchain.memory import ConversationBufferMemory

# Initialize conversation buffer memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Add messages to the conversation memory
memory.chat_memory.add_user_message("Hi!")
memory.chat_memory.add_ai_message("Welcome! How can I help you?")

# Load memory variables if any
memory.load_memory_variables({})

>>>

{'chat_history': 'Human: Hi!\nAI: Welcome! How can I help you?'}

We’ll see some more examples of memory usage in the coming sections.

Chains

LangChain chains are sequences of operations that process input and generate output. Let’s look at an example of building a custom chain that drafts an email response based on customer feedback:

from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain, summarize, question_answering
from langchain.schema import StrOutputParser

# Define a chain for summarizing customer feedback
feedback_summary_prompt = PromptTemplate.from_template(
    """You are a customer service manager. Given the customer feedback,
it is your job to summarize the main points.

Customer Feedback: {feedback}

Summary:"""
)

# Template for drafting a business email response
email_response_prompt = PromptTemplate.from_template(
    """You are a customer service representative. Given the summary of customer feedback,
it is your job to write a professional email response.

Feedback Summary:
{summary}

Email Response:"""
)

feedback_chain = feedback_summary_prompt | llm | StrOutputParser()

email_chain = (
    {"summary": feedback_chain}
    | email_response_prompt
    | llm
    | StrOutputParser()
)

# Use the email chain (which runs the feedback summary first) with actual customer feedback
email_chain.invoke(
    {"feedback": "Disappointed with the late delivery and poor packaging."}
)

>>>

\n\nDear [Customer],\n\nThank you for taking the time to provide us with
your feedback. We apologize for the late delivery and the quality of the
packaging. We take customer satisfaction very seriously and we are sorry
that we did not meet your expectations.\n\nWe are currently looking into
the issue and will take the necessary steps to ensure that this does not
happen again in the future. We value your business and hope that you will
give us another chance to provide you with a better experience.\n\nIf you
have any further questions or concerns, please do not hesitate to contact
us.\n\nSincerely,\n[Your Name]

As you can see, we have two chains: one generates the summary of the feedback (feedback_chain) and one generates an email response based on the summary of the feedback (email_chain). The chain above was created using LangChain Expression Language — the preferred way of creating chains, according to LangChain.
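
If you want to inspect the intermediate summary on its own, the first chain can also be invoked directly (a minimal sketch reusing the feedback_chain defined above):

print(feedback_chain.invoke(
    {"feedback": "Disappointed with the late delivery and poor packaging."}
))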

LangChain 101: Part 3a. Talking to Documents: Load, Split and simple RAG with LCEL

We can also use predefined chains, for example, for summarization tasks or simple Q&A:

# Predefined chains for summarization and Q&A
chain = summarize.load_summarize_chain(llm, chain_type="stuff")
chain.run(texts[:30])

>>>

The history of mathematics deals with the origin of discoveries in mathematics
and the mathematical methods and notation of the past. It began in the 6th
century BC with the Pythagoreans, who coined the term "mathematics". Greek
mathematics greatly refined the methods and expanded the subject matter of
mathematics. Chinese mathematics made early contributions, including a place
value system and the first use of negative numbers. The Hindu–Arabic numeral
system and the rules for the use of its operations evolved over the course of
the first millennium AD in India and were transmitted to the Western world via
Islamic mathematics. From ancient times through the Middle Ages, periods of
mathematical discovery were often followed by centuries of stagnation.
Beginning in Renaissance Italy in the 15th century, new mathematical
developments, interacting with new scientific discoveries, were made at an
increasing pace that continues through the present day.

chain = question_answering.load_qa_chain(llm, chain_type="stuff")
chain.run(
    input_documents=texts[:30],
    question="Name the greatest Arab mathematicians of the past"
)

>>>

Muḥammad ibn Mūsā al-Khwārizmī

Besides the predefined chains for summarization, question answering, etc., we can build our own custom ConversationChain and integrate memory into it.

# Using memory in a conversation chain
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory)

conversation.run("Name the tallest mountain in the world")

>>>

The tallest mountain in the world is Mount Everest

conversation.run("How high is it?")

>>>

Mount Everest stands at 8,848 meters (29,029 ft) above sea level.
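
Because both calls go through the same chain, the exchange is stored in the buffer, which is how the follow-up "How high is it?" gets resolved. You can inspect the stored history directly (a small sketch using the memory object defined above):

# The buffer now holds the Human/AI turns from the conversation above
print(memory.buffer)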

Agents and Tools

LangChain allows the creation of custom tools and agents for specialized tasks. A custom tool can be anything from a call to your own API to a plain Python function, and tools can be integrated into LangChain agents for complex operations. Let’s create an agent that lowercases any sentence.

from langchain.tools import StructuredTool, BaseTool
from langchain.agents import initialize_agent, AgentType

# Define and use a custom text processing tool
def text_processing(string: str) -> str:
    """Process the text"""
    return string.lower()

text_processing_tool = StructuredTool.from_function(text_processing)

# Initialize and use an agent with the custom tool
agent = initialize_agent(
    [text_processing_tool], llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.run({"input": "Process the text: London is the capital of Great Britain"})

>>>

> Entering new AgentExecutor chain...

I need to use a text processing tool

Action: text_processing

Action Input: London is the capital of Great Britain

Observation: london is the capital of great britain

Thought: I now know the final answer

Final Answer: london is the capital of great britain

> Finished chain.

'london is the capital of great britain'

As you can see, our agent used the tool we defined and lowercased the sentence. Now, let’s create a fully functional agent. For that, we’ll build a custom tool that converts units (miles to kilometers, for example) inside the text and integrate it into a conversational agent. The UnitConversionTool class is a practical example of extending a base tool with specific conversion logic.

import re

from langchain.tools import BaseTool
from langchain.agents import initialize_agent

class UnitConversionTool(BaseTool):
    """
    A tool for converting American units to International units.
    Specifically, it converts miles to kilometers.
    """
    name = "Unit Conversion Tool"
    description = "Converts American units to International units"

    def _run(self, text: str):
        """
        Synchronously converts miles in the text to kilometers.

        Args:
            text (str): The input text containing miles to convert.

        Returns:
            str: The text with miles converted to kilometers.
        """
        def miles_to_km(match):
            miles = float(match.group(1))
            return f"{miles * 1.60934:.2f} km"
        return re.sub(r'\b(\d+(\.\d+)?)\s*(miles|mile)\b', miles_to_km, text)

    def _arun(self, text: str):
        """
        Asynchronous version of the conversion function. Not implemented yet.
        """
        raise NotImplementedError("No async yet")

# The conversational agent expects the chat history as messages,
# so the memory must use memory_key="chat_history" and return_messages=True
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Initialize an agent with the Unit Conversion Tool
agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=[UnitConversionTool()],
    llm=llm,
    memory=memory
)

# Example usage of the agent to convert units
agent.run("five miles")

>>> Five miles is approximately 8 kilometers.

agent.run("Sorry, I meant 15")

>>> 15 kilometers is approximately 9.3 miles

This wraps up the code shown in my LangChain one-pager. I hope you’ve found it helpful!

Read this blog on how to run your own model locally.