So far I have used the HuggingFace models to summarize long text. But after discovering LangChain’s extraction capabilities I tried to do an pipeline to summarize long text.

In this example my pipeline will summarize this paper: https://journals.sagepub.com/doi/epub/10.1177/23409444231185790 Sensing, seizing, and reconfiguring dynamic capabilities in innovative firms: Why does strategic leadership make a difference? is a academic paper and I don’t want read it at all. I’ll decide after the reading AI generated summary.

🤖 Unlocking the Magic of Document Summarization with Python! 📚🔍

Hey there, code wizards! 👋 Are you ready to dive into a world where your Python spells can turn lengthy documents into bite-sized nuggets of wisdom? 🧙‍♂️ Today, we’re going to unravel the mysteries of document summarization using a pinch of code and a sprinkle of AI fairy dust! ✨

🔮 The Quest for Knowledge

Imagine you’ve stumbled upon a mystical text known as “sample-paper.pdf”. 📜 But, oh dear adventurer, it’s as thick as an ancient tome! Fear not, for we have a spell to split these pages into manageable chunks using the mighty PyPDFLoader! 📚✂️

# %% MR Read PDF
loader = PyPDFLoader("sample-paper.pdf")
pages = loader.load_and_split()

🌟 Forging the Index of Power

Now that our pages are in hand, let’s forge an index that’ll make finding knowledge a breeze! Meet FAISS, a trusty tool that creates an index using the power of OpenAIEmbeddings! 🔍🔗

# %% MR Make Index
faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())

🔍 Seeking the Golden Nuggets

Ever wished for a magical magnifying glass that finds the juiciest bits of information in the blink of an eye? Our FAISS index can do just that! 🕵️‍♀️ Let’s summon it to search for a specific incantation, “What about CEOs?”, and retrieve the top 5 results!

# %% MR Search
docs = faiss_index.similarity_search("What about CEOs?", k=5)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:300])
    print("----")

📝 Summoning the Summary Sorcery

But wait, there’s more! 🎩✨ Our enchanted code can also summarize these findings into beautiful, concise gems. With the power of OpenAI’s language model, we can create both regular summaries and bullet-point wonders! Let’s weave the magic:

# %% MR
llm = OpenAI(temperature=0)
map_prompt = """
Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """
Write a concise summary of the following text delimited by triple backquotes.
Return your response in bullet points which covers the key points of the text.
```{text}```
BULLET POINT SUMMARY:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

summary_chain = load_summarize_chain(llm=llm,
                                     chain_type='map_reduce',
                                     map_prompt=map_prompt_template,
                                     combine_prompt=combine_prompt_template,
                                    )

mapreduce_output = summary_chain.run(docs)

🎉 Unveiling the Treasures

And there you have it, fellow explorers! With a wave of your coding wand, you’ve tamed the wild words and unleashed their essence. From splitting ancient texts to summoning concise summaries, your journey has been nothing short of magical! 🌟✨

So go forth, brave coders! May your code be bug-free and your summaries be succinct. Remember, the world of AI and coding is as enchanting as the stories it unveils. Until next time, happy coding! 🚀🔥


There you go! A playful and informative blog post that introduces the provided code in a whimsical manner. Feel free to customize and adjust the text as needed to match your style and preferences.

🎉 Bonus

The AI gives me this result:

  • This study examined the exploratory and exploitative learning, as well as the transactional and transformational leadership styles of CEOs in two departments (marketing and production).
  • Cronbach’s alpha coefficients were found to be .86, .71, and .84 for sensing, seizing, and reconfiguration capabilities respectively.
  • Exploratory and exploitative learning are related to sensing and seizing capabilities in both marketing and production departments.
  • Transformational leadership style was found to reinforce the relationship between exploratory learning and sensing capability in both departments.
  • Correlation and descriptive analyses showed positive relationships between OL, leadership style, and the DCs of the firm.
  • Multiple and hierarchical regression analysis was used to test hypotheses, and results showed a statistically significant relationship between exploratory learning and sensing capability for production departments, and a positive and statistically significant relationship between exploitative learning and sensing capability for marketing departments.

<
Previous Post
Mastering Productivity and Self-Reflection: A Guide to Effective Daily Journaling
>
Next Post
Using VertexAI Instead of OpenAI in LangChain