Skip to article frontmatterSkip to article content

From PowerPoint to AI: A Beginnerโ€™s Guide ๐Ÿš€

Welcome! This tutorial shows you how to extract content from PowerPoint presentations and prepare it for AI analysis.

What youโ€™ll learn:

  • ๐Ÿ“‚ Load a PowerPoint file into Python
  • โœ‚๏ธ Split it into individual slides
  • ๐Ÿ“ Extract text and images from each slide
  • ๐Ÿค– Prepare content for AI tools like ChatGPT or Claude

No prior experience needed! Weโ€™ll go step by step.

Step 1: Import the Tools ๐Ÿ› ๏ธยถ

from attachments import attach, load, split, present, refine, adapt
from attachments.data import get_sample_path

Step 2: Load a PowerPoint File ๐Ÿ“„ยถ

This is all you need to get started:

# Get sample file and load it
pptx_file = get_sample_path("sample_multipage.pptx")
loaded_pptx = attach(pptx_file) | load.pptx_to_python_pptx + present.markdown + present.images

Step 3: Traditional Approach - All Content Together ๐Ÿ“ยถ

Extract all slides as one big text block:

print(f"โœ… Loaded {len(loaded_pptx._obj.slides)} slides")
print(f"๐Ÿ“„ Total text: {len(loaded_pptx.text)} characters")
print(f"๐Ÿ–ผ๏ธ Images: {len(loaded_pptx.images)}")
โœ… Loaded 6 slides
๐Ÿ“„ Total text: 1414 characters
๐Ÿ–ผ๏ธ Images: 6

Step 4: The Split Approach โœจยถ

This is where it gets powerful! Split into individual slides for granular analysis:

# Split into individual slides
slide_collection = (attach(pptx_file)
                    | load.pptx_to_python_pptx
                    | split.slides
                    | present.markdown + present.images
                    )

print(f"๐Ÿ“‘ Split into {len(slide_collection)} individual slides")
๐Ÿ“‘ Split into 6 individual slides

Step 5: Extract Content from Each Slide ๐Ÿ“‹ยถ

count = 0
for slide in slide_collection:
    count += 1
    print(f"\n๐Ÿ“„ Slide {count}: {len(slide.text)} chars")
    print(slide.text[:200] + "...")

๐Ÿ“„ Slide 1: 392 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-1

## Slide 1

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Qingyun Wu ,...

๐Ÿ“„ Slide 2: 1043 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-2

## Slide 2

2cdda5c8-e50e-4db4-b5f0-9722a649f455

AutoGen is an open-source framework that allows ...

๐Ÿ“„ Slide 3: 163 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-3

## Slide 3

A table to test parsing:

*Slides processed: 1*

...

๐Ÿ“„ Slide 4: 163 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-4

## Slide 4

A chartย to test parsing:

*Slides processed: 1*

...

๐Ÿ“„ Slide 5: 161 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-5

## Slide 5

A Nested Shape parsing

*Slides processed: 1*

...

๐Ÿ“„ Slide 6: 175 chars
# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-6

## Slide 6

These Test Strings are in the Image!

*Slides processed: 1*

...

Step 6: Turn each slide into a prompt for AI ๐Ÿ“‹ยถ

# Split into individual slides
slide_collection = (attach(pptx_file)
                    | load.pptx_to_python_pptx
                    | split.slides
                    | present.markdown + present.images
                    )

for i in slide_collection:
    print(i | adapt.claude("Analyze this slide"))
    break
[{'role': 'user', 'content': [{'type': 'text', 'text': 'Analyze this slide\n\n# Presentation: /home/maxime/Projects/attachments/src/attachments/data/sample_multipage.pptx#slide-1\n\n## Slide 1\n\nAutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation\n\nQingyun Wu , Gagan Bansal , Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Awadallah, Ryen W. White, Doug Burger, Chi Wang\n\n\n*Slides processed: 1*\n\n'}]}]