Text Summarization with LangChain

So far I have used HuggingFace models to summarize long text. But after discovering LangChain’s extraction capabilities, I tried building a pipeline with it to summarize long text.

In this example my pipeline will summarize this paper: https://journals.sagepub.com/doi/epub/10.1177/23409444231185790 “Sensing, seizing, and reconfiguring dynamic capabilities in innovative firms: Why does strategic leadership make a difference?” is an academic paper, and I don’t want to read the whole thing. I’ll decide whether it is worth my time after reading the AI-generated summary.

🤖 Unlocking the Magic of Document Summarization with Python! 📚🔍

Hey there, code wizards! 👋 Are you ready to dive into a world where your Python spells can turn lengthy documents into bite-sized nuggets of wisdom? 🧙‍♂️ Today, we’re going to unravel the mysteries of document summarization using a pinch of code and a sprinkle of AI fairy dust! ✨

🔮 The Quest for Knowledge

Imagine you’ve stumbled upon a mystical text known as “sample-paper.pdf”. 📜 But, oh dear adventurer, it’s as thick as an ancient tome! Fear not, for we have a spell to split these pages into manageable chunks using the mighty PyPDFLoader! 📚✂️

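# %% MR Read PDF
from langchain.document_loaders import PyPDFLoader  # classic langchain import path; needs the pypdf package
loader = PyPDFLoader("sample-paper.pdf")
# Returns a list of Document chunks, each carrying its page number in metadata
pages = loader.load_and_split()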
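
If you are curious what "splitting into manageable chunks" means in practice, here is a minimal pure-Python sketch. It is not PyPDFLoader's actual algorithm (LangChain's splitter is more careful about where it cuts); `split_into_chunks`, `chunk_size`, and `overlap` are hypothetical names I made up for illustration.

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Cut text into fixed-size character windows that overlap a little.

    Hypothetical helper for illustration only, not LangChain's splitter.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters of toy text
chunks = split_into_chunks(sample, chunk_size=12, overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk.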

🌟 Forging the Index of Power

Now that our pages are in hand, let’s forge an index that’ll make finding knowledge a breeze! Meet FAISS, a trusty tool that creates an index using the power of OpenAIEmbeddings! 🔍🔗

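# %% MR Make Index
from langchain.vectorstores import FAISS  # needs the faiss-cpu package
from langchain.embeddings import OpenAIEmbeddings  # reads OPENAI_API_KEY from the environment
faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())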

🔍 Seeking the Golden Nuggets

Ever wished for a magical magnifying glass that finds the juiciest bits of information in the blink of an eye? Our FAISS index can do just that! 🕵️‍♀️ Let’s summon it to search for a specific incantation, “What about CEOs?”, and retrieve the top 5 results!

# %% MR Search
docs = faiss_index.similarity_search("What about CEOs?", k=5)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:300])
    print("----")
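
To demystify the search spell a little, here is a toy sketch of top-k retrieval over embedding vectors. The three-dimensional vectors are invented, `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is my assumption for illustration: FAISS does this far more efficiently and may use a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Made-up "embeddings" standing in for real page vectors.
toy_docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.0, 0.0]
best = top_k(toy_query, toy_docs, k=2)
print(best)
```

In the real pipeline the query string and every page chunk are turned into vectors by the embedding model, and the index returns the chunks whose vectors sit closest to the query.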

📝 Summoning the Summary Sorcery

But wait, there’s more! 🎩✨ Our enchanted code can also summarize these findings into beautiful, concise gems. With the power of OpenAI’s language model, we can create both regular summaries and bullet-point wonders! Let’s weave the magic:

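# %% MR Summarize
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
llm = OpenAI(temperature=0)  # temperature=0 keeps the output as repeatable as possible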
map_prompt = """
Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """
Write a concise summary of the following text delimited by triple backquotes.
Return your response in bullet points which covers the key points of the text.
```{text}```
BULLET POINT SUMMARY:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

summary_chain = load_summarize_chain(llm=llm,
                                     chain_type='map_reduce',
                                     map_prompt=map_prompt_template,
                                     combine_prompt=combine_prompt_template,
                                    )

mapreduce_output = summary_chain.run(docs)
print(mapreduce_output)
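
For the curious, the map_reduce idea itself fits in a few lines. This is only a sketch of the pattern, not LangChain's implementation (the real chain also deals with token limits and more). `map_reduce_summarize` and `echo_llm` are hypothetical names, and `echo_llm` is a stand-in that never calls a real model.

```python
MAP_PROMPT = 'Write a concise summary of the following:\n"{text}"\nCONCISE SUMMARY:'
COMBINE_PROMPT = (
    "Write a concise summary of the following text.\n"
    "Return your response in bullet points.\n"
    "{text}\n"
    "BULLET POINT SUMMARY:"
)


def map_reduce_summarize(chunks, llm, map_prompt, combine_prompt):
    """Summarize each chunk on its own, then summarize the summaries."""
    # Map step: one model call per chunk.
    partial_summaries = [llm(map_prompt.format(text=chunk)) for chunk in chunks]
    # Reduce step: a single final call over the joined partial summaries.
    return llm(combine_prompt.format(text="\n\n".join(partial_summaries)))


calls = []  # records every prompt the fake model receives


def echo_llm(prompt):
    """Fake model (assumption for this sketch): returns a numbered placeholder."""
    calls.append(prompt)
    return f"summary#{len(calls)}"


result = map_reduce_summarize(
    ["chunk one", "chunk two", "chunk three"], echo_llm, MAP_PROMPT, COMBINE_PROMPT
)
print(result)
print(len(calls), "model calls")
```

Three chunks cost four model calls here: one per chunk plus the final combine. That is worth keeping in mind when the model is a paid API.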

🎉 Unveiling the Treasures

And there you have it, fellow explorers! With a wave of your coding wand, you’ve tamed the wild words and unleashed their essence. From splitting ancient texts to summoning concise summaries, your journey has been nothing short of magical! 🌟✨

So go forth, brave coders! May your code be bug-free and your summaries be succinct. Remember, the world of AI and coding is as enchanting as the stories it unveils. Until next time, happy coding! 🚀🔥


🎉 Bonus

The AI gave me this result:

This study examined the exploratory and exploitative learning, as well as the transactional and transformational leadership styles of CEOs in two departments (marketing and production).
Cronbach’s alpha coefficients were found to be .86, .71, and .84 for sensing, seizing, and reconfiguration capabilities respectively.
Exploratory and exploitative learning are related to sensing and seizing capabilities in both marketing and production departments.
Transformational leadership style was found to reinforce the relationship between exploratory learning and sensing capability in both departments.
Correlation and descriptive analyses showed positive relationships between OL, leadership style, and the DCs of the firm.
Multiple and hierarchical regression analysis was used to test hypotheses, and results showed a statistically significant relationship between exploratory learning and sensing capability for production departments, and a positive and statistically significant relationship between exploitative learning and sensing capability for marketing departments.
