Jul 2, 2025
How to go from S3 to Qdrant with no code using Unstructured
Unstructured
Modern AI applications like Retrieval-Augmented Generation (RAG), agentic systems, semantic search, and document intelligence rely on transforming raw content into structured, vectorized knowledge. But building pipelines that connect your cloud storage to vector databases typically demands custom code, infrastructure setup, and ongoing orchestration. This tutorial shows you how to skip all of that. With Unstructured’s no-code Workflow builder, you can ingest PDFs, Word docs, HTML, and many other document types from Amazon S3, enrich them with powerful parsing and metadata extraction, add embeddings, and push this data into Qdrant — all from your browser. Whether you're prototyping a chatbot or launching production-grade search, this is the fastest way to go from unstructured data to AI-ready vectors.
This tutorial walks you through building a document transformation pipeline from Amazon S3 to Qdrant using Unstructured’s web-based Workflow builder. No orchestration code required — everything happens in the UI.
We’ll show you how to:
Pull documents from an S3 bucket
Partition text from PDFs, DOCX, HTML, and other document types
Generate embeddings
Push vectors into Qdrant for RAG or search applications
Step 1: Set up your S3 bucket in AWS
🔑 Retrieve AWS Security Credentials
Navigate to the top bar in AWS and click your account ID in the top right
Scroll down to Security Credentials
Scroll to the Access keys section and click Create access key
You'll receive an Access Key ID and Secret Access Key
Click Download .csv file to keep a local copy of the keys for reference
🪣 Create a New S3 Bucket
This bucket will contain your input documents.
In the AWS Console, go to Amazon S3 → Buckets, then click Create bucket
Use a name like nicks-demo-s3-bucket
Keep Block all public access checked
Leave all other settings as default
Click Create bucket
📄 Upload Your Files
Locate your new bucket in the list and click its name
Click the Upload button
Click Add files and select the documents you want to upload (PDF, DOCX, HTML, JPEG, etc.)
Copy the full Destination URI (e.g., s3://nicks-demo-s3-bucket)
Scroll down and click Upload
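Before pasting the Destination URI into a connector, it can help to sanity-check its shape. The snippet below is a minimal sketch using only the Python standard library; the helper name parse_s3_uri is made up for illustration and is not part of Unstructured or AWS tooling.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key_prefix).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri.strip())
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI: {uri!r}")
    # The path keeps any folder prefix; drop only the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = parse_s3_uri("s3://nicks-demo-s3-bucket")
print(f"bucket={bucket!r} prefix={prefix!r}")
```

An empty prefix means the URI points at the bucket root, which is what the example bucket above uses.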
🔒 Set S3 Bucket Permissions
Navigate to your bucket
Select the Permissions tab at the top
Note: If you're using access keys tied to an account with full S3 read/write permissions, you can leave the bucket policy blank.
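If your access keys do not already carry S3 permissions, a bucket policy is one way to grant them. The sketch below builds a read-only policy as a Python dictionary, on the assumption that listing the bucket and reading objects is enough for a source connector. The account ID and user name in the principal ARN are placeholders, and the function name is hypothetical; check your own security requirements before applying anything like this.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build a minimal read-only S3 bucket policy (illustrative sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = read_only_bucket_policy(
    "nicks-demo-s3-bucket",
    "arn:aws:iam::123456789012:user/example-user",  # placeholder principal
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Bucket policy editor on the Permissions tab.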
Step 2: Create a new S3 connector in Unstructured
Go to platform.unstructured.io or your organization's tenant address
In the left sidebar, click Connectors
Click + New, ensure Source is selected, and choose Amazon S3
Set a name like nicks-test-s3-connector
Fill in the Bucket URI, AWS Key, and AWS Secret Key
Check Recursive if you want to ingest nested folders
Leave Custom URL blank
Click Save and Test
Upon success, you'll see a confirmation message.
Step 3: Set up Qdrant
🔧 Create a Qdrant Cluster
Sign up at Qdrant.com
On the landing page, click Create Cluster
Name the cluster, choose Amazon Web Services as the cloud provider, and leave the region as default
When prompted, copy the API Key and Qdrant URL — this is the only time they'll be shown
Wait for the cluster status to show Healthy
Click Access Cluster
Paste your saved API key and click Apply
From the left sidebar, click Collections
🧑‍💻 Initialize the Collection via Script
Open a terminal in your IDE and complete the following steps:
Install dependencies
Create a file called env-vars.sh with credentials
Create a main.py file with initialization code
Run it using:
python3 main.py
Expected output: Collection 'my-test-collection' successfully initialized with status: green
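The original initialization script is not reproduced here, so the following is only a sketch of what a main.py could look like. It calls Qdrant's REST API with the Python standard library instead of a client package, and it assumes env-vars.sh exports QDRANT_URL and QDRANT_API_KEY (hypothetical variable names). The vector size and distance metric are illustrative values that you would replace with those of your embedding model.

```python
import json
import os
import urllib.request

COLLECTION = "my-test-collection"  # name taken from the expected output above
VECTOR_SIZE = 1536  # ASSUMPTION: must equal your embedding model's dimension
DISTANCE = "Cosine"  # ASSUMPTION: use the metric your embedding model expects


def collection_url(base_url: str, name: str) -> str:
    return f"{base_url.rstrip('/')}/collections/{name}"


def build_create_request(base_url, api_key, name=COLLECTION, size=VECTOR_SIZE):
    """PUT /collections/{name} creates a collection with one vector config."""
    body = {"vectors": {"size": size, "distance": DISTANCE}}
    return urllib.request.Request(
        collection_url(base_url, name),
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="PUT",
    )


def build_info_request(base_url, api_key, name=COLLECTION):
    """GET /collections/{name} returns collection info, including its status."""
    return urllib.request.Request(
        collection_url(base_url, name), headers={"api-key": api_key}, method="GET"
    )


def main():
    base_url = os.environ["QDRANT_URL"]  # hypothetical variable name
    api_key = os.environ["QDRANT_API_KEY"]  # hypothetical variable name
    # urlopen raises HTTPError on a 4xx/5xx response, for example when
    # the collection already exists.
    with urllib.request.urlopen(build_create_request(base_url, api_key)) as resp:
        resp.read()
    with urllib.request.urlopen(build_info_request(base_url, api_key)) as resp:
        status = json.load(resp)["result"]["status"]
    print(f"Collection '{COLLECTION}' successfully initialized with status: {status}")


if __name__ == "__main__":
    if "QDRANT_URL" in os.environ and "QDRANT_API_KEY" in os.environ:
        main()
    else:
        print("Set QDRANT_URL and QDRANT_API_KEY first, e.g. via env-vars.sh")
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything touches your cluster.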
Step 4: Create a Qdrant destination connector
Log in to Unstructured
Click Connectors
Click + New, choose Destination → Qdrant
Give the connector a name
Use values from env-vars.sh to populate fields
Click Test Connection; you should get a success message
Step 5: Create a workflow in Unstructured
From the main dashboard, click Workflows → New Workflow
Select Build it for me
Name your workflow, choose the previously created source and destination connectors, then click Continue
Use the automatic partitioning strategy and the default embedding model and size
Leave other settings default, then click Complete
Optional: Adjust the Embedder
Go to the Embedder segment of your workflow
Click the gear icon in the top right
Choose your embedding model
Confirm that the model's [dim <embedding-size>] value matches the vector size in your Qdrant config
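A dimension mismatch is an easy mistake to make, so a quick check can save a failed run. This is a small sketch that pulls the number out of a label in the [dim <embedding-size>] form and compares it with the collection's vector size. The model label used here is invented for illustration, and both function names are hypothetical.

```python
import re


def embedding_dim(model_label: str) -> int:
    """Extract the size from a label such as 'some-model [dim 1536]'."""
    match = re.search(r"\[dim\s+(\d+)\]", model_label)
    if match is None:
        raise ValueError(f"no [dim N] marker in {model_label!r}")
    return int(match.group(1))


def dims_match(model_label: str, collection_vector_size: int) -> bool:
    """True when the embedder output size equals the Qdrant vector size."""
    return embedding_dim(model_label) == collection_vector_size


label = "example-embedding-model [dim 1536]"  # made-up label
print(dims_match(label, 1536))
```

If the check returns False, either pick a different embedding model or recreate the collection with the matching vector size.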
Step 6: Run & test the workflow
▶️ Full Run
Go to the Workflows page
Click Run next to your workflow
Use the Schedule tab to automate runs
Visit your Qdrant collection to verify chunked documents have arrived
To delete records, use API calls or the Console tab in Qdrant.
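For the API route, here is a rough sketch of counting and deleting points through Qdrant's REST endpoints using only the standard library. The helper names are hypothetical, and the base URL and API key are whatever you saved when creating the cluster. Requests are built separately from being sent so you can review them first.

```python
import json
import urllib.request


def qdrant_post(base_url: str, api_key: str, path: str, body: dict):
    """Build (but do not send) a JSON POST request to a Qdrant endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def count_request(base_url, api_key, collection):
    # An exact count is slower than an estimate but fine for verification.
    path = f"/collections/{collection}/points/count"
    return qdrant_post(base_url, api_key, path, {"exact": True})


def delete_request(base_url, api_key, collection, point_ids):
    # wait=true makes the call return only after the deletion is applied.
    path = f"/collections/{collection}/points/delete?wait=true"
    return qdrant_post(base_url, api_key, path, {"points": list(point_ids)})


def send(request):
    """Send a prepared request and return the 'result' field of the reply."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["result"]
```

With real credentials, send(count_request(url, key, "my-test-collection"))["count"] would give the number of stored points after a run.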
📄 Upload a Sample Document
In your workflow, go to the Source segment
Upload a single document
Click the Results </> icon above the segment to inspect JSON output at every stage
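If you copy that JSON output out of the UI, a few lines of Python can summarize it. The sketch below assumes the output is a list of elements with type, text, and (after the embedding stage) embeddings fields; the sample data is invented and far smaller than real output, and the function name is hypothetical.

```python
import json
from collections import Counter


def summarize_elements(elements: list[dict]) -> dict:
    """Count elements by type and collect the embedding sizes seen."""
    by_type = Counter(el.get("type", "unknown") for el in elements)
    dims = {len(el["embeddings"]) for el in elements if el.get("embeddings")}
    return {
        "total": len(elements),
        "by_type": dict(by_type),
        "embedding_dims": sorted(dims),
    }


# Made-up sample with tiny 3-value vectors, for illustration only.
sample = json.loads("""
[
  {"type": "Title", "text": "Quarterly Report", "embeddings": [0.1, 0.2, 0.3]},
  {"type": "NarrativeText", "text": "Sales grew.", "embeddings": [0.4, 0.5, 0.6]}
]
""")
print(summarize_elements(sample))
```

More than one value in embedding_dims would suggest that elements were embedded with different models.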
Step 7: Get more from your workflow
🪄 Partitioning Strategy
The default auto strategy detects structure (titles, tables, images) and selectively applies VLM parsing. See the Unstructured documentation for more detail.
🖼️ Image Description Enrichment
Generates human-readable captions for diagrams, photos, and visual elements.
When useful:
Instruction manuals with schematics
Research reports with charts
Scanned docs with key visual content
📊 Table Summary Enrichment
Converts tables to natural language summaries (e.g., 'North America leads in Q1 sales').
Ideal for:
Financial reports
Policy documents
Scanned PDFs with tables
🛠️ Additional Options
Table-to-HTML Enrichment
Named Entity Recognition (NER)
Chunking: by title, character, page, or similarity
Contextual Chunking: prepend summaries to chunks
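To make the chunking options more concrete, here is a toy illustration of the idea behind by-title chunking: start a new chunk at each title, and also when the current chunk would grow past a size limit. It is a simplified sketch, not Unstructured's implementation, and the function name, parameter name, and sample elements are all made up.

```python
def chunk_by_title(elements: list[dict], max_characters: int = 500) -> list[str]:
    """Group element texts into chunks that break at titles or a size limit.

    Toy version: a single element longer than max_characters is kept
    whole rather than split.
    """
    chunks: list[str] = []
    current: list[str] = []
    for el in elements:
        text = el["text"]
        starts_section = el["type"] == "Title"
        too_big = bool(current) and len("\n".join(current + [text])) > max_characters
        if current and (starts_section or too_big):
            chunks.append("\n".join(current))
            current = []
        current.append(text)
    if current:
        chunks.append("\n".join(current))
    return chunks


# Invented sample elements.
doc = [
    {"type": "Title", "text": "Intro"},
    {"type": "NarrativeText", "text": "Why this matters."},
    {"type": "Title", "text": "Methods"},
    {"type": "NarrativeText", "text": "How we did it."},
]
for chunk in chunk_by_title(doc):
    print(repr(chunk))
```

Keeping a title together with the text that follows it tends to give each chunk enough context to be retrieved on its own.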
And that's it! 🎉
You now have a fully automated pipeline from S3 documents to enriched, vectorized content in Qdrant, built entirely in Unstructured's UI. Whether launching a RAG system or indexing internal files, this is a fast, reliable starting point.