Most of the examples below involve OpenAI models. The following code demonstrates initializing and calling a language model (OpenAI in this case) within LangChain.
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003", temperature=0.01)
print(llm("Suggest 3 bday gifts for a data scientist"))
>>>
1. A subscription to a data science magazine or journal
2. A set of data science books
3. A data science-themed mug or t-shirt
As you can see, we initialize an LLM and call it with a query. All the tokenization and embedding happens behind the scenes. We can manage the conversation history and incorporate system instructions into the chat for more flexible responses.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, AIMessage, SystemMessage
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.01)
conversation_history = [
    HumanMessage(content="Suggest 3 bday gifts for a data scientist"),
    AIMessage(content="What is your price range?"),
    HumanMessage(content="Under 100$"),
]
print(chat(conversation_history).content)
>>>
1. A data science book: Consider gifting a popular and highly recommended book on data science, such as "Python for Data Analysis" by Wes McKinney or "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. These books can provide valuable insights and knowledge for a data scientist's professional development.
2. Data visualization tool: A data scientist often deals with large datasets and needs to present their findings effectively. Consider gifting a data visualization tool like Tableau Public or Plotly, which can help them create interactive and visually appealing charts and graphs to communicate their data analysis results.
3. Subscription to a data science platform: Give them access to a data science platform like Kaggle or DataCamp, which offer a wide range of courses, tutorials, and datasets for data scientists to enhance their skills and stay updated with the latest trends in the field. This gift can provide them with valuable learning resources and opportunities for professional growth.
system_instruction = SystemMessage(
    content="""You work as an assistant in an electronics store.
    Your income depends on the items you sold"""
)
user_message = HumanMessage(content="3 bday gifts for a data scientist")
print(chat([system_instruction, user_message]).content)
>>>
1. Laptop: A high-performance laptop is essential for any data scientist. Look for a model with a powerful processor, ample RAM, and a large storage capacity. This will allow them to run complex data analysis tasks and store large datasets.
2. External Hard Drive: Data scientists deal with massive amounts of data, and having extra storage space is crucial. An external hard drive with a large capacity will provide them with a convenient and secure way to store and backup their data.
3. Data Visualization Tool: Data visualization is an important aspect of data science. Consider gifting them a subscription to a data visualization tool like Tableau or Power BI. These tools will help them create visually appealing and interactive charts, graphs, and dashboards to present their findings effectively.
As you can see, we can shift the conversation in a specific direction using different types of messages: HumanMessage, AIMessage, and SystemMessage.
Open-source
Now, let’s talk about open-source models. Below is a typical example of initializing and using a pre-trained language model for text generation. The code covers tokenizer usage, model configuration, efficient inference with quantization, and CUDA support (several code snippets below).
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
from torch import cuda
# Name of the pre-trained model
model_name = "TheBloke/llama-2-13B-Guanaco-QLoRA-GPTQ"
# Initialize the tokenizer for the model
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# Initialize the AutoGPTQForCausalLM model with specific configurations
# This model is a quantized version of the GPT model suitable for efficient inference
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_safetensors=True,    # Enables SafeTensors for secure serialization
    trust_remote_code=True,  # Trusts the remote code (not recommended for untrusted sources)
    device_map="auto",       # Automatically maps the model to the available device
    quantize_config=None     # Custom quantization configuration (None for default)
)
# The input query to be tokenized and passed to the model
query = "<Your input text here>"
# Tokenize the input query and convert it to a tensor format compatible with CUDA
input_ids = tokenizer(query, return_tensors="pt").input_ids.cuda()
# Generate text using the model with the specified temperature setting
output = model.generate(input_ids=input_ids, temperature=0.1)
Text Generation
During text generation, you can strongly influence the process with several parameters:
How Does an LLM Generate Text?
- temperature affects the randomness of token generation
- Top-k sampling limits token generation to the k most likely tokens at each step
- Top-p (nucleus) sampling limits token generation to tokens within a cumulative probability of p
- max_tokens specifies the maximum number of tokens to generate
llm = OpenAI(temperature=0.5, top_k=10, top_p=0.75, max_tokens=50)
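To build intuition for how these parameters reshape the next-token distribution, here is a minimal pure-Python sketch (not LangChain code) of temperature scaling and top-k filtering applied to raw logits:

```python
import math

def adjust_logits(logits, temperature=1.0, top_k=None):
    """Scale logits by temperature, optionally keep only the top-k,
    and return a normalized probability distribution."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Zero out everything below the k-th largest logit
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= cutoff else float("-inf") for l in scaled]
    exps = [math.exp(l) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
print(adjust_logits(logits, temperature=1.0))  # plain softmax
print(adjust_logits(logits, temperature=0.1))  # near-greedy: almost all mass on the top token
print(adjust_logits(logits, top_k=2))          # only the two most likely tokens keep probability
```

Lowering the temperature sharpens the distribution toward the most likely token, while top-k simply truncates the candidate set before sampling.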
Quantization
Quantization is crucial for performance.
How to Fit Large Language Models in Small Memory: Quantization
Below, we’ll optimize a pre-trained language model for efficient performance using 4-bit quantization. The use of BitsAndBytesConfig is vital for applying these optimizations, which are particularly beneficial for deployment scenarios where model size and speed are critical factors.
from transformers import BitsAndBytesConfig, AutoModelForCausalLM
import torch
# Specify the model name or path
model_name_or_path = "your-model-name-or-path"
# Configure BitsAndBytesConfig for 4-bit quantization
# This configuration is used for optimizing the model size and inference speed
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Enables loading the model in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # Sets the computation data type to bfloat16
    bnb_4bit_quant_type="nf4",              # Sets the quantization type to nf4
    bnb_4bit_use_double_quant=True          # Enables double quantization for improved accuracy
)
# Load the pre-trained causal language model with 4-bit quantization
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=bnb_config,  # Applies the 4-bit quantization configuration
    device_map="auto",               # Automatically maps the model to the available device
    trust_remote_code=True           # Trusts the remote code (use cautiously)
)
Fine-tuning
In some cases, one needs to fine-tune a pre-trained language model. This is usually achieved using Low-Rank Adaptation (LoRA) for efficient task-specific adaptation. The snippet below also demonstrates gradient checkpointing and preparation for k-bit training, techniques that optimize the training process for memory and computational efficiency.
How to implement reinforcement learning with human feedback for pre-trained LLMs
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling
# Load a pre-trained causal language model and its tokenizer
pretrained_model = AutoModelForCausalLM.from_pretrained("your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-model-name")
# Enable gradient checkpointing for memory efficiency
pretrained_model.gradient_checkpointing_enable()
# Prepare the model for k-bit training, optimizing for low-bit-width training
model = prepare_model_for_kbit_training(pretrained_model)
# Define the LoRA (Low-Rank Adaptation) configuration
# This configures the model for task-specific fine-tuning with low-rank matrices
config = LoraConfig(
    r=16,                                # Rank of the low-rank matrices
    lora_alpha=32,                       # Scale for the LoRA layers
    lora_dropout=0.05,                   # Dropout rate for the LoRA layers
    bias="none",                         # Type of bias to use
    target_modules=["query_key_value"],  # Target model components for LoRA adaptation
    task_type="CAUSAL_LM"                # Task type, here causal language modeling
)
# Adapt the model with the specified LoRA configuration
model = get_peft_model(model, config)
# Initialize the Trainer for model training
trainer = Trainer(
    model=model,
    train_dataset=train_dataset,  # Training dataset
    args=TrainingArguments(
        num_train_epochs=10,
        per_device_train_batch_size=8,
        # Other training arguments...
    ),
    # Collates data batches; mlm=False because this is causal language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
# Disable caching to save memory during training
model.config.use_cache = False
# Start the training process
trainer.train()
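To see why LoRA is so much cheaper than full fine-tuning: a rank-r adapter on a d_out x d_in weight matrix trains only r * (d_in + d_out) parameters. A quick estimate, with an illustrative hidden size:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by a rank-r LoRA adapter:
    an (r x d_in) matrix plus a (d_out x r) matrix."""
    return r * d_in + d_out * r

hidden = 5120  # illustrative hidden size for a 13B-class model
full = hidden * hidden  # parameters in one original weight matrix
lora = lora_trainable_params(hidden, hidden, r=16)
print(full, lora, f"{lora / full:.3%}")  # LoRA trains well under 1% of the weights
```

This is why LoRA fine-tuning fits in memory where a full backward pass over all weights would not.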
Prompts
LangChain allows the creation of dynamic prompts that can guide the text-generation behavior of language models. Prompt templates in LangChain provide a way to generate specific responses from the model. Let’s look at a practical example where we must create SEO descriptions for particular products.
from langchain.prompts import PromptTemplate, FewShotPromptTemplate
# Define and use a simple prompt template
template = "Act as an SEO expert. Provide an SEO description for {product}"
prompt = PromptTemplate(input_variables=["product"], template=template)
# Format prompt with a specific product
formatted_prompt = prompt.format(product="Perpetuum Mobile")
print(llm(formatted_prompt))
>>>
Perpetuum Mobile is a leading provider of innovative, sustainable energy
solutions. Our products and services are designed to help businesses and
individuals reduce their carbon footprint and save money on energy costs.
We specialize in solar, wind, and geothermal energy systems, as well as
energy storage solutions. Our team of experienced engineers and technicians
are dedicated to providing the highest quality products and services to our
customers. We strive to be the most reliable and cost-effective provider of
renewable energy solutions in the industry. With our commitment to
sustainability and customer satisfaction, Perpetuum Mobile is the perfect
choice for your energy needs.
There might be cases when you have a small, few-shot dataset of several examples showcasing how you would like the task to be performed. Let’s take a look at an example of a text classification task:
# Define a few-shot learning prompt with examples
examples = [
    {"email_text": "Win a free iPhone!", "category": "Spam"},
    {"email_text": "Next Sprint Planning Meeting.", "category": "Meetings"},
    {"email_text": "Version 2.1 of Y is now live", "category": "Project Updates"}
]
prompt_template = PromptTemplate(
    input_variables=["email_text", "category"],
    template="Classify the email: {email_text} \n {category}"
)
few_shot_prompt = FewShotPromptTemplate(
    example_prompt=prompt_template,
    examples=examples,
    suffix="Classify the email: {email_text}",
    input_variables=["email_text"]
)
# Using few-shot learning prompt
formatted_prompt = few_shot_prompt.format(
    email_text="Hi. I'm rescheduling daily standup tomorrow to 10am."
)
print(llm(formatted_prompt))
>>>
Meetings
Indexes
Indexes in LangChain are used to handle and retrieve large volumes of data efficiently. Instead of passing a whole file as text to an LLM, we first index and search the source for relevant information, and only after finding the top-k answers do we pass them on to formulate a response. Pretty neat!
LangChain 101: Part 3b. Talking to Documents: Embeddings and Vectorstores
In LangChain, using indexes includes loading documents from various sources, splitting texts, creating vectorstores, and retrieving relevant documents:
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
# Load documents from a web source
loader = WebBaseLoader("https://en.wikipedia.org/wiki/History_of_mathematics")
loaded_documents = loader.load()
# Split loaded documents into smaller texts
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
texts = text_splitter.split_documents(loaded_documents)
# Create embeddings and a vectorstore, then perform a similarity search
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
print(db.similarity_search("What is Isaac Newton's contribution in math?"))
>>>
[Document(page_content="Building on earlier work by many predecessors, Isaac Newton discovered the laws of physics that explain Kepler's Laws, and brought together the concepts now known as calculus. Independently, Gottfried Wilhelm Leibniz, developed calculus and much of the calculus notation still in use today. He also refined the binary number system, which is the foundation of nearly all digital (electronic,", metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),
Document(page_content='mathematical developments, interacting with new scientific discoveries, were made at an increasing pace that continues through the present day. This includes the groundbreaking work of both Isaac Newton and Gottfried Wilhelm Leibniz in the development of infinitesimal calculus during the course of the 17th century.', metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),
Document(page_content="In the 13th century, Nasir al-Din Tusi (Nasireddin) made advances in spherical trigonometry. He also wrote influential work on Euclid's parallel postulate. In the 15th century, Ghiyath al-Kashi computed the value of π to the 16th decimal place. Kashi also had an algorithm for calculating nth roots, which was a special case of the methods given many centuries later by Ruffini and Horner.", metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'}),
Document(page_content='Whitehead, initiated a long running debate on the foundations of mathematics.', metadata={'source': 'https://en.wikipedia.org/wiki/History_of_mathematics', 'title': 'History of mathematics - Wikipedia', 'language': 'en'})]
Besides using similarity_search, we can use vector databases as retrievers:
# Initialize and use a retriever for relevant documents
retriever = db.as_retriever()
print(retriever.get_relevant_documents("What is Isaac Newton's contribution in math?"))
>>>
[The same four documents as in the similarity_search output above]
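Conceptually, the retrieved chunks are then “stuffed” into a prompt that the LLM answers from. A minimal sketch of that step (build_rag_prompt is a hypothetical helper, not a LangChain API):

```python
def build_rag_prompt(question: str, retrieved_chunks: list) -> str:
    """Concatenate the top-k retrieved chunks into a single grounded prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Isaac Newton brought together the concepts now known as calculus.",
    "Independently, Gottfried Wilhelm Leibniz developed calculus as well.",
]
print(build_rag_prompt("What is Isaac Newton's contribution in math?", chunks))
```

The resulting string would be passed to the llm call; LangChain’s predefined "stuff" chains (shown later in the Chains section) do essentially this for you.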
Memory
Memory in LangChain refers to the ability of a model to remember previous parts of a conversation or context, which is essential for maintaining continuity in interactions. Let’s use ConversationBufferMemory to store and retrieve conversation histories.
from langchain.memory import ConversationBufferMemory
# Initialize conversation buffer memory
memory = ConversationBufferMemory(memory_key="chat_history")
# Add messages to the conversation memory
memory.chat_memory.add_user_message("Hi!")
memory.chat_memory.add_ai_message("Welcome! How can I help you?")
# Load memory variables if any
memory.load_memory_variables({})
>>>
{'chat_history': 'Human: Hi!\nAI: Welcome! How can I help you?'}
We’ll see some more examples of memory usage in the coming sections.
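Under the hood, a buffer memory is little more than a list of (role, message) pairs rendered into the prompt on each call. A toy pure-Python equivalent (not the actual LangChain class):

```python
class BufferMemory:
    """Minimal stand-in for a conversation buffer memory."""
    def __init__(self):
        self.messages = []

    def add(self, role: str, content: str):
        # Append one turn of the conversation
        self.messages.append((role, content))

    def render(self) -> str:
        # Render the history exactly as it would be injected into a prompt
        return "\n".join(f"{role}: {content}" for role, content in self.messages)

memory = BufferMemory()
memory.add("Human", "Hi!")
memory.add("AI", "Welcome! How can I help you?")
print(memory.render())
# Human: Hi!
# AI: Welcome! How can I help you?
```

This mirrors the 'chat_history' string returned by load_memory_variables above.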
Chains
LangChain chains are sequences of operations that process input and generate output. Let’s look at an example of building a custom chain that drafts an email response based on provided customer feedback:
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain, summarize, question_answering
from langchain.schema import StrOutputParser
# Define and use a chain for summarizing customer feedback
feedback_summary_prompt = PromptTemplate.from_template(
"""You are a customer service manager. Given the customer feedback,
it is your job to summarize the main points.
Customer Feedback: {feedback}
Summary:"""
)
# Template for drafting a business email response
email_response_prompt = PromptTemplate.from_template(
"""You are a customer service representative. Given the summary of customer feedback,
it is your job to write a professional email response.
Feedback Summary:
{summary}
Email Response:"""
)
feedback_chain = feedback_summary_prompt | llm | StrOutputParser()
email_chain = (
    {"summary": feedback_chain}
    | email_response_prompt
    | llm
    | StrOutputParser()
)
# Using the feedback chain with actual customer feedback
email_chain.invoke(
    {"feedback": "Disappointed with the late delivery and poor packaging."}
)
>>>
\n\nDear [Customer],\n\nThank you for taking the time to provide us with
your feedback. We apologize for the late delivery and the quality of the
packaging. We take customer satisfaction very seriously and we are sorry
that we did not meet your expectations.\n\nWe are currently looking into
the issue and will take the necessary steps to ensure that this does not
happen again in the future. We value your business and hope that you will
give us another chance to provide you with a better experience.\n\nIf you
have any further questions or concerns, please do not hesitate to contact
us.\n\nSincerely,\n[Your Name]
As you can see, we have two chains: one generates the summary of the feedback (feedback_chain) and one generates an email response based on that summary (email_chain). The chain above was created using the LangChain Expression Language, the preferred way of creating chains according to LangChain.
LangChain 101: Part 3a. Talking to Documents: Load, Split and simple RAG with LCEL
We can also use predefined chains, for example, for summarization tasks or simple Q&A:
# Predefined chains for summarization and Q&A
chain = summarize.load_summarize_chain(llm, chain_type="stuff")
chain.run(texts[:30])
>>>
The history of mathematics deals with the origin of discoveries in mathematics
and the mathematical methods and notation of the past. It began in the 6th
century BC with the Pythagoreans, who coined the term "mathematics". Greek
mathematics greatly refined the methods and expanded the subject matter of
mathematics. Chinese mathematics made early contributions, including a place
value system and the first use of negative numbers. The Hindu–Arabic numeral
system and the rules for the use of its operations evolved over the course of
the first millennium AD in India and were transmitted to the Western world via
Islamic mathematics. From ancient times through the Middle Ages, periods of
mathematical discovery were often followed by centuries of stagnation.
Beginning in Renaissance Italy in the 15th century, new mathematical
developments, interacting with new scientific discoveries, were made at an
increasing pace that continues through the present day.
chain = question_answering.load_qa_chain(llm, chain_type="stuff")
chain.run(
    input_documents=texts[:30],
    question="Name the greatest Arab mathematicians of the past"
)
>>>
Muḥammad ibn Mūsā al-Khwārizmī
Besides predefined chains for summarizing feedback, answering questions, etc., we can build our own custom ConversationChain and integrate memory into it.
# Using memory in a conversation chain
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory)
conversation.run("Name the tallest mountain in the world")
>>>
The tallest mountain in the world is Mount Everest
conversation.run("How high is it?")
>>>
Mount Everest stands at 8,848 meters (29,029 ft) above sea level.
Agents and Tools
LangChain allows the creation of custom tools and agents for specialized tasks. Custom tools can be anything from calls to one’s own API to custom Python functions, and they can be integrated into LangChain agents for complex operations. Let’s create an agent that will lowercase any sentence.
from langchain.tools import StructuredTool, BaseTool
from langchain.agents import initialize_agent, AgentType
import re
# Define and use a custom text processing tool
def text_processing(string: str) -> str:
    """Process the text"""
    return string.lower()
text_processing_tool = StructuredTool.from_function(text_processing)
# Initialize and use an agent with the custom tool
agent = initialize_agent(
    [text_processing_tool], llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run({"input": "Process the text: London is the capital of Great Britain"})
>>>
> Entering new AgentExecutor chain...
I need to use a text processing tool
Action: text_processing
Action Input: London is the capital of Great Britain
Observation: london is the capital of great britain
Thought: I now know the final answer
Final Answer: london is the capital of great britain
> Finished chain.
'london is the capital of great britain'
As you can see, our agent used the tool we defined and lowercased the sentence. Now, let’s create a fully functional agent. For that, we’ll create a custom tool for converting units (miles to kilometers, for example) within text and integrate it into a conversational agent using the LangChain framework. The UnitConversionTool class provides a practical example of extending base functionalities with specific conversion logic.
import re
from langchain.tools import BaseTool
from langchain.agents import initialize_agent
class UnitConversionTool(BaseTool):
    """
    A tool for converting American units to International units.
    Specifically, it converts miles to kilometers.
    """
    name = "Unit Conversion Tool"
    description = "Converts American units to International units"

    def _run(self, text: str):
        """
        Synchronously converts miles in the text to kilometers.

        Args:
            text (str): The input text containing miles to convert.

        Returns:
            str: The text with miles converted to kilometers.
        """
        def miles_to_km(match):
            miles = float(match.group(1))
            return f"{miles * 1.60934:.2f} km"

        return re.sub(r'\b(\d+(\.\d+)?)\s*(miles|mile)\b', miles_to_km, text)

    def _arun(self, text: str):
        """
        Asynchronous version of the conversion function. Not implemented yet.
        """
        raise NotImplementedError("No async yet")
# Initialize an agent with the Unit Conversion Tool
agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=[UnitConversionTool()],
    llm=llm,
    memory=memory
)
# Example usage of the agent to convert units
agent.run("five miles")
>>> Five miles is approximately 8 kilometers.
agent.run("Sorry, I meant 15")
>>> 15 kilometers is approximately 9.3 miles
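The regex at the heart of the tool can be sanity-checked outside the agent. Note that it only matches numeric quantities, so spelled-out numbers like “five miles” in the run above were handled by the LLM itself rather than by the tool:

```python
import re

def convert_miles(text: str) -> str:
    """Replace numeric 'X miles' mentions with kilometers (same regex as the tool)."""
    def miles_to_km(match):
        miles = float(match.group(1))
        return f"{miles * 1.60934:.2f} km"
    return re.sub(r'\b(\d+(\.\d+)?)\s*(miles|mile)\b', miles_to_km, text)

print(convert_miles("The marathon is 26.2 miles long"))  # The marathon is 42.16 km long
print(convert_miles("five miles"))                       # unchanged: five miles
```

Testing tool logic in isolation like this is much cheaper than debugging it through agent runs.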
This wraps up the code shown in my LangChain one-pager. I hope you’ve found it helpful!
Read this blog on how to run your own model locally.