Welcome! This tutorial shows you how to extract content from PowerPoint presentations and prepare it for AI analysis.
What youโll learn:
- ๐ Load a PowerPoint file into Python
- โ๏ธ Split it into individual slides
- ๐ Extract text and images from each slide
- ๐ค Prepare content for AI tools like ChatGPT or Claude
No prior experience needed! Weโll go step by step.
Step 1: Import the Tools ๐ ๏ธยถ
from attachments import attach, load, split, present, refine, adapt
from attachments.data import get_sample_path
Step 2: Load a PowerPoint File ๐ยถ
This is all you need to get started:
# Get sample file and load it
pptx_file = get_sample_path("sample_multipage.pptx")
loaded_pptx = attach(pptx_file) | load.pptx_to_python_pptx + present.markdown + present.images
Step 3: Traditional Approach - All Content Together ๐ยถ
Extract all slides as one big text block:
print(f"โ
Loaded {len(loaded_pptx._obj.slides)} slides")
print(f"๐ Total text: {len(loaded_pptx.text)} characters")
print(f"๐ผ๏ธ Images: {len(loaded_pptx.images)}")
โ
Loaded 6 slides
๐ Total text: 1414 characters
๐ผ๏ธ Images: 6
Step 4: The Split Approach โจยถ
This is where it gets powerful! Split into individual slides for granular analysis:
# Split into individual slides
slide_collection = (attach(pptx_file)
| load.pptx_to_python_pptx
| split.slides
| present.markdown + present.images
)
print(f"๐ Split into {len(slide_collection)} individual slides")
๐ Split into 6 individual slides
Step 5: Extract Content from Each Slide ๐ยถ
count = 0
for slide in slide_collection:
count += 1
print(f"\n๐ Slide {count}: {len(slide.text)} chars")
print(slide.text[:200] + "...")
๐ Slide 1: 392 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-1
## Slide 1
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu ,...
๐ Slide 2: 1043 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-2
## Slide 2
2cdda5c8-e50e-4db4-b5f0-9722a649f455
AutoGen is an open-source framework that allows ...
๐ Slide 3: 163 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-3
## Slide 3
A table to test parsing:
*Slides processed: 1*
...
๐ Slide 4: 163 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-4
## Slide 4
A chartย to test parsing:
*Slides processed: 1*
...
๐ Slide 5: 161 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-5
## Slide 5
A Nested Shape parsing
*Slides processed: 1*
...
๐ Slide 6: 175 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-6
## Slide 6
These Test Strings are in the Image!
*Slides processed: 1*
...
Step 6: Turn each slide into a prompt for AI ๐ยถ
# Split into individual slides
slide_collection = (attach(pptx_file)
| load.pptx_to_python_pptx
| split.slides
| present.markdown + present.images
)
for i in slide_collection:
print(i | adapt.claude("Analyze this slide"))
break
[{'role': 'user', 'content': [{'type': 'text', 'text': 'Analyze this slide\n\n# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-1\n\n## Slide 1\n\nAutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation\n\nQingyun Wu , Gagan Bansal , Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Awadallah, Ryen W. White, Doug Burger, Chi Wang\n\n\n*Slides processed: 1*\n\n'}]}]