Jul 2, 2025

How to go from S3 to Qdrant with no code using Unstructured

Unstructured

Modern AI applications such as Retrieval-Augmented Generation (RAG), agentic systems, semantic search, and document intelligence rely on transforming raw content into structured, vectorized knowledge. But building pipelines that connect your cloud storage to a vector database typically demands custom code, infrastructure setup, and ongoing orchestration.

This tutorial shows you how to skip all of that. With Unstructured's no-code Workflow builder, you can ingest PDFs, Word docs, HTML, and many other document types from Amazon S3, enrich them with parsing and metadata extraction, add embeddings, and push the results into Qdrant, all from your browser. Whether you're prototyping a chatbot or launching production-grade search, this is a fast way to go from unstructured data to AI-ready vectors.

You'll build a document transformation pipeline from Amazon S3 to Qdrant using Unstructured's web-based Workflow builder. No orchestration code is required; everything happens in the UI.

We’ll show you how to:

Pull documents from an S3 bucket
Partition text from PDFs, DOCX, HTML, and other document types
Generate embeddings
Push vectors into Qdrant for RAG or search applications

Step 1: Connect your S3 bucket in Unstructured

🔑 Retrieve AWS Security Credentials

Navigate to the top bar in AWS and click your account ID in the top right
Scroll down to Security Credentials
Scroll to the Access keys section and click Create access key
You'll receive an Access Key ID and Secret Access Key
Click Download .csv file to keep a local copy of the keys for reference

🪣 Create a New S3 Bucket

This bucket will contain your input documents.

In the AWS Console, go to Amazon S3 → Buckets, then click Create bucket
Use a name like nicks-demo-s3-bucket
Keep Block all public access checked
Leave all other settings as default
Click Create bucket

📄 Upload Your Files

Locate your new bucket in the list and click its name
Click the Upload button
Click Add files and select the documents you want to upload (PDF, DOCX, HTML, JPEG, etc.)
Copy the full Destination URI (e.g., s3://nicks-demo-s3-bucket)
Scroll down and click Upload
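
The Destination URI you copy here is what the S3 connector asks for later. If you want to sanity-check a URI before pasting it, a minimal sketch along these lines splits it into bucket and folder prefix. The helper name split_s3_uri and the reports/2025/ folder are made up for illustration; only the bucket name comes from this tutorial.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path keeps its leading slash; strip it to get the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


# Bucket root, as used in this tutorial.
print(split_s3_uri("s3://nicks-demo-s3-bucket"))
# A nested folder (hypothetical path).
print(split_s3_uri("s3://nicks-demo-s3-bucket/reports/2025/"))
```

An empty prefix means the connector would start at the bucket root.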

🔒 Set S3 Bucket Permissions

Navigate to your bucket
Select the Permissions tab at the top

Note: If you're using access keys tied to an account with full S3 read/write permissions, you can leave the bucket policy blank.
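
If your keys belong to a more restricted identity, a bucket policy that grants read access is one option. The sketch below only builds the policy document so you can paste it into the Permissions tab. The principal ARN is a placeholder, and the choice of s3:ListBucket plus s3:GetObject is an assumption about what a read-only source connector needs, so check it against Unstructured's S3 connector documentation.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy that lets one principal list and read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowGetObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = read_only_bucket_policy(
    "nicks-demo-s3-bucket",
    "arn:aws:iam::123456789012:user/example-user",  # placeholder ARN
)
print(json.dumps(policy, indent=2))
```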

Step 2: Create a new S3 connector in Unstructured

Go to platform.unstructured.io or your organization's tenant address
In the left sidebar, click Connectors
Click + New, ensure Source is selected, and choose Amazon S3
Set a name like nicks-test-s3-connector
Fill in the Bucket URI, AWS Key, and AWS Secret Key
Check Recursive if you want to ingest nested folders
Leave Custom URL blank
Click Save and Test
Upon success, you'll see a confirmation message.

Step 3: Set up Qdrant

🔧 Create a Qdrant Cluster

Sign up at Qdrant.com
On the landing page, click Create Cluster
Name the cluster, choose Amazon Web Services as the cloud provider, and leave the region as default
When prompted, copy the API Key and Qdrant URL — this is the only time they'll be shown
Wait for the cluster status to show Healthy
Click Access Cluster
Paste your saved API key and click Apply
From the left sidebar, click Collections

🧑‍💻 Initialize the Collection via Script

Open a terminal in your IDE, then:

Install the dependencies
Create a file called env-vars.sh with your Qdrant credentials
Create a main.py file with the collection initialization code
Run it using: python3 main.py

Expected output: Collection 'my-test-collection' successfully initialized with status: green
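
For orientation, here is a rough sketch of what an initialization script like main.py could look like. It is not the script from the original post: it calls Qdrant's REST API with the Python standard library, so it needs no extra packages. The environment variable names (QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION), the vector size of 1536, and the Cosine distance are all assumptions; use the dimension of the embedding model you pick in Step 5.

```python
import json
import os
import urllib.request


def create_collection_request(base_url, api_key, collection, vector_size, distance="Cosine"):
    """Build the PUT /collections/{name} request that creates a collection."""
    body = json.dumps({"vectors": {"size": vector_size, "distance": distance}})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )


def collection_status(base_url, api_key, collection):
    """Fetch the collection info and return its status, e.g. 'green'."""
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection}",
        headers={"api-key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["result"]["status"]


def main():
    # Assumed variable names; match them to whatever env-vars.sh exports.
    base_url = os.environ["QDRANT_URL"]
    api_key = os.environ["QDRANT_API_KEY"]
    collection = os.environ.get("QDRANT_COLLECTION", "my-test-collection")

    # 1536 is an assumed size; it must equal your embedding model's dimension.
    request = create_collection_request(base_url, api_key, collection, vector_size=1536)
    with urllib.request.urlopen(request, timeout=30):
        pass  # An HTTP error (for example, collection already exists) raises here.

    status = collection_status(base_url, api_key, collection)
    print(f"Collection '{collection}' successfully initialized with status: {status}")


if __name__ == "__main__":
    # Only contact the cluster when credentials are present.
    if "QDRANT_URL" in os.environ and "QDRANT_API_KEY" in os.environ:
        main()
    else:
        print("Set QDRANT_URL and QDRANT_API_KEY first (for example: source env-vars.sh).")
```

Creating a collection that already exists fails with an HTTP error, so delete the old collection or pick a new name before re-running.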

Step 4: Create a Qdrant destination connector

Log in to Unstructured
Click Connectors
Click + New, choose Destination → Qdrant
Give the connector a name
Use values from env-vars.sh to populate fields
Click Test Connection — you should get a success message

Step 5: Create a workflow in Unstructured

From the main dashboard, click Workflows → New Workflow
Select Build it for me
Name your workflow, choose the previously created source and destination connectors, then click Continue
Use the automatic partitioning strategy and the default embedding model and size
Leave other settings default, then click Complete

Optional: Adjust the Embedder

Go to the Embedder segment of your workflow
Click the gear icon in the top right
Choose your embedding model
Confirm that the model's [dim <embedding-size>] value matches the vector size of your Qdrant collection
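
The dimension match matters because a Qdrant collection is created with a fixed vector size. As an illustration only, a check like the following captures the rule; both function names are hypothetical and the numbers are example values, not defaults taken from Unstructured or Qdrant.

```python
def check_embedding_dimension(embedder_dim: int, collection_vector_size: int) -> None:
    """Raise if the embedder output size differs from the collection's vector size."""
    if embedder_dim != collection_vector_size:
        raise ValueError(
            f"embedder produces {embedder_dim}-dimensional vectors, "
            f"but the collection expects {collection_vector_size}"
        )


def mismatched_vectors(vectors: list[list[float]], expected_size: int) -> list[int]:
    """Return the positions of vectors whose length is not expected_size."""
    return [i for i, vector in enumerate(vectors) if len(vector) != expected_size]


check_embedding_dimension(1536, 1536)  # passes silently
print(mismatched_vectors([[0.0] * 4, [0.0] * 3, [0.0] * 4], expected_size=4))  # prints [1]
```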

Step 6: Run & test the workflow

▶️ Full Run

Go to the Workflows page
Click Run next to your workflow
Use the Schedule tab to automate runs
Visit your Qdrant collection to verify chunked documents have arrived

To delete records, use API calls or the Console tab in Qdrant.
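
If you go the API route, Qdrant's REST endpoints for counting and deleting points are enough to verify a run and clean up afterwards. The sketch below builds those requests with the standard library. The payload key and value in the example ("filename", "report.pdf") are hypothetical; inspect a point in your collection to see which payload fields your workflow actually writes.

```python
import json
import urllib.request


def _post(base_url, api_key, path, body):
    """Build an authenticated JSON POST request against the Qdrant REST API."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )


def count_points_request(base_url, api_key, collection):
    """Request an exact count of the points stored in a collection."""
    return _post(base_url, api_key, f"/collections/{collection}/points/count", {"exact": True})


def delete_by_payload_request(base_url, api_key, collection, key, value):
    """Request deletion of every point whose payload field `key` equals `value`."""
    body = {"filter": {"must": [{"key": key, "match": {"value": value}}]}}
    # wait=true makes the call return only after the deletion is applied.
    return _post(base_url, api_key, f"/collections/{collection}/points/delete?wait=true", body)


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Offline demonstration with a placeholder URL and key: nothing is sent.
demo = delete_by_payload_request(
    "https://example.invalid:6333", "placeholder-key", "my-test-collection",
    key="filename", value="report.pdf",  # hypothetical payload field
)
print(demo.get_method(), demo.full_url)
print(demo.data.decode("utf-8"))
```

With real credentials, send(count_points_request(...))["result"]["count"] gives the number of points before and after a run.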

📄 Upload a Sample Document

In your workflow, go to the Source segment
Upload a single document
Click the Results </> icon above the segment to inspect JSON output at every stage
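
If you save that JSON output locally, a few lines of Python can summarize it faster than scrolling. This sketch assumes the output is a list of element objects with a type field and, after the embedding stage, an embeddings field; the sample records are invented and far shorter than real output.

```python
from collections import Counter

# Invented sample in the general shape of element JSON.
sample_output = [
    {"type": "Title", "element_id": "a1", "text": "Quarterly Report",
     "metadata": {"filename": "report.pdf", "page_number": 1},
     "embeddings": [0.1, 0.2, 0.3]},
    {"type": "NarrativeText", "element_id": "a2", "text": "Revenue grew in Q1.",
     "metadata": {"filename": "report.pdf", "page_number": 1},
     "embeddings": [0.4, 0.5, 0.6]},
    {"type": "Table", "element_id": "a3", "text": "Region Sales ...",
     "metadata": {"filename": "report.pdf", "page_number": 2},
     "embeddings": [0.7, 0.8, 0.9]},
]


def summarize_elements(elements):
    """Count elements by type and collect the embedding lengths that occur."""
    types = Counter(element.get("type", "unknown") for element in elements)
    dims = {len(element["embeddings"]) for element in elements if "embeddings" in element}
    return {"count": len(elements), "types": dict(types), "embedding_dims": sorted(dims)}


print(summarize_elements(sample_output))
```

More than one value in embedding_dims, or a value that differs from your collection's vector size, is worth investigating before a full run.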

Step 7: Get more from your workflow

🪄 Partitioning Strategy

The default auto strategy detects structure (titles, tables, images) and selectively applies vision language model (VLM) parsing. The Unstructured documentation describes the strategies in more detail.

🖼️ Image Description Enrichment

Generates human-readable captions for diagrams, photos, and visual elements.

Useful for:

Instruction manuals with schematics
Research reports with charts
Scanned docs with key visual content

📊 Table Summary Enrichment

Converts tables to natural language summaries (e.g., 'North America leads in Q1 sales').

Ideal for:

Financial reports
Policy documents
Scanned PDFs with tables

🛠️ Additional Options

Table-to-HTML Enrichment
Named Entity Recognition (NER)
Chunking: by title, character, page, or similarity
Contextual Chunking: prepend summaries to chunks
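
To make the chunking options concrete, here is a toy illustration of two of them: character-based chunking with overlap, and contextual chunking that prepends a summary to each chunk. This is not Unstructured's implementation; the function names, the overlap parameter, and the summary text are assumptions made for the example.

```python
def chunk_by_character(text: str, max_characters: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_characters, sharing `overlap` characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be in [0, max_characters)")
    if not text:
        return []
    step = max_characters - overlap
    # Stop once the remaining text is already covered by the previous chunk.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[start:start + max_characters] for start in starts]


def with_context(summary: str, chunks: list[str]) -> list[str]:
    """Prepend a document-level summary to every chunk."""
    return [f"{summary}\n\n{chunk}" for chunk in chunks]


chunks = chunk_by_character("abcdefghij", max_characters=4, overlap=1)
print(chunks)  # prints ['abcd', 'defg', 'ghij']
print(with_context("Summary: first ten letters of the alphabet.", chunks)[0])
```

Overlap keeps a sentence that straddles a boundary partly visible in both chunks, at the cost of storing some text twice.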

And that's it! 🎉

You now have a fully automated pipeline from S3 documents to enriched, vectorized content in Qdrant, built entirely in Unstructured's UI. Whether launching a RAG system or indexing internal files, this is a fast, reliable starting point.