Jul 2, 2025

How to go from S3 to Qdrant with no code using Unstructured

Unstructured

LLM

Modern AI applications like Retrieval-Augmented Generation (RAG), agentic systems, semantic search, and document intelligence rely on transforming raw content into structured, vectorized knowledge. But building pipelines that connect your cloud storage to vector databases typically demands custom code, infrastructure setup, and ongoing orchestration. This tutorial shows you how to skip all of that. With Unstructured’s no-code Workflow builder, you can ingest PDFs, Word docs, HTML, and many other document types from Amazon S3, enrich them with powerful parsing and metadata extraction, add embeddings, and push this data into Qdrant — all from your browser. Whether you're prototyping a chatbot or launching production-grade search, this is the fastest way to go from unstructured data to AI-ready vectors.

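This tutorial walks you through building a document transformation pipeline from Amazon S3 to Qdrant using Unstructured’s web-based Workflow builder. The pipeline itself needs no orchestration code: connectors, partitioning, and embedding are all configured in the UI, and the only script involved is a short one that initializes the Qdrant collection.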


We’ll show you how to:

  • Pull documents from an S3 bucket

  • Partition text from PDFs, DOCX, HTML, and other document types

  • Generate embeddings

  • Push vectors into Qdrant for RAG or search applications

Step 1: Connect your S3 bucket in Unstructured

🔑 Retrieve AWS Security Credentials

  1. Navigate to the top bar in AWS and click your account ID in the top right

  2. Scroll down to Security Credentials

  3. Scroll to the Access keys section and click Create access key

  4. You'll receive an Access Key ID and Secret Access Key

  5. Click Download .csv file to keep a local copy of the keys for reference

🪣 Create a New S3 Bucket

This bucket will contain your input documents (PDFs and any other supported file types).

  1. In the AWS Console, go to Amazon S3 → Buckets, then click Create bucket

  2. Use a name like nicks-demo-s3-bucket

  3. Keep Block all public access checked

  4. Leave all other settings as default

  5. Click Create bucket

📄 Upload Your Files

  1. Locate your new bucket in the list and click its name

  2. Click the Upload button

  3. Click Add files and select the documents you want to upload (PDF, DOCX, HTML, JPEG, etc.)

  4. Copy the full Destination URI (e.g., s3://nicks-demo-s3-bucket)

  5. Scroll down and click Upload
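If you want to sanity-check the Destination URI before pasting it into a connector, a small helper like the one below can split it into bucket and folder prefix. This is only an illustrative sketch: the function name split_s3_uri is hypothetical and is not part of AWS or Unstructured tooling.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an S3 URI into (bucket, key_prefix); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path part carries the optional folder prefix inside the bucket.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://nicks-demo-s3-bucket")
print(f"bucket={bucket!r} prefix={prefix!r}")
```

An empty prefix means the URI points at the bucket root, which is the case for the example bucket used in this tutorial.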

🔒 Set S3 Bucket Permissions

  1. Navigate to your bucket

  2. Select the Permissions tab at the top

Note: If you're using access keys tied to an account with full S3 read/write permissions, you can leave the bucket policy blank.
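If your keys do not already carry S3 permissions, you will need a bucket policy. The tutorial does not spell one out, so the sketch below is an assumption: it builds a read-only policy on the premise that a source connector needs to list the bucket and read its objects. The principal ARN is a placeholder, and the helper name is hypothetical.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy that lets one principal list and read a bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = read_only_bucket_policy(
    "nicks-demo-s3-bucket",
    "arn:aws:iam::123456789012:user/example-user",  # placeholder principal
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket policy editor on the Permissions tab after you replace the placeholder principal with your own.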

Step 2: Create a new S3 connector in Unstructured

  1. Go to platform.unstructured.io or your organization's tenant address

  2. In the left sidebar, click Connectors

  3. Click + New, ensure Source is selected, and choose Amazon S3

  4. Set a name like nicks-test-s3-connector

  5. Fill in the Bucket URI, AWS Key, and AWS Secret Key

  6. Check Recursive if you want to ingest nested folders

  7. Leave Custom URL blank

  8. Click Save and Test

  9. Upon success, you'll see a confirmation message.

Step 3: Set up Qdrant

🔧 Create a Qdrant Cluster

  1. Sign up at Qdrant.com

  2. On the landing page, click Create Cluster

  3. Name the cluster, choose Amazon Web Services as the cloud provider, and leave the region as default

  4. When prompted, copy the API Key and Qdrant URL — this is the only time they'll be shown

  5. Wait for the cluster status to show Healthy

  6. Click Access Cluster

  7. Paste your saved API key and click Apply

  8. From the left sidebar, click Collections

🧑‍💻 Initialize the Collection via Script

  1. Open a terminal in your IDE

  2. Install dependencies

  3. Create a file called env-vars.sh that holds your Qdrant credentials (the URL and API key you copied above)

  4. Create a main.py file with initialization code

  5. Run it using: python3 main.py

Expected output: Collection 'my-test-collection' successfully initialized with status: green
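The original initialization script is not reproduced here. As a rough stand-in, the sketch below creates a collection through Qdrant's REST API using only the Python standard library. The environment variable names (QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION, QDRANT_VECTOR_SIZE) and the function names are assumptions for illustration, and this version does not print the status message shown above.

```python
import json
import os
import urllib.request


def build_create_collection_request(base_url, api_key, collection, vector_size, distance="Cosine"):
    """Build (but do not send) a PUT /collections/{name} request for Qdrant."""
    body = json.dumps({"vectors": {"size": vector_size, "distance": distance}})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )


def create_collection_from_env():
    """Read settings from the environment (assumed names) and create the collection."""
    request = build_create_collection_request(
        base_url=os.environ["QDRANT_URL"],
        api_key=os.environ["QDRANT_API_KEY"],
        collection=os.environ["QDRANT_COLLECTION"],
        vector_size=int(os.environ["QDRANT_VECTOR_SIZE"]),
    )
    # This is the only call that touches the network.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

After loading your variables with source env-vars.sh, calling create_collection_from_env() sends the request. The vector size you pass here must equal the output dimension of the embedding model you pick in the workflow.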

Step 4: Create a Qdrant destination connector

  1. Log in to Unstructured

  2. Click Connectors

  3. Click + New, choose Destination → Qdrant

  4. Give the connector a name

  5. Use values from env-vars.sh to populate fields

  6. Click Test Connection — you should get a success message

Step 5: Create a workflow in Unstructured

  1. From the main dashboard, click Workflows → New Workflow

  2. Select Build it for me

  3. Name your workflow, choose the previously created source and destination connectors, then click Continue

  4. Use the automatic partitioning strategy and the default embedding model and size

  5. Leave other settings default, then click Complete

Optional: Adjust the Embedder

  1. Go to the Embedder segment of your workflow

  2. Click the gear icon in the top right

  3. Choose your embedding model

  4. Confirm that the dimension shown for the model ([dim <embedding-size>]) matches the vector size configured for your Qdrant collection
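A mismatch between the embedding dimension and the collection's vector size is easy to catch programmatically. The sketch below assumes you have already fetched the collection info from Qdrant's GET /collections/{name} endpoint and parsed it into a dict, and that the collection uses a single unnamed vector. The sample response is trimmed, and 1536 is only an example size.

```python
def collection_vector_size(collection_info: dict) -> int:
    """Extract the vector size from a parsed GET /collections/{name} response."""
    vectors = collection_info["result"]["config"]["params"]["vectors"]
    # A collection with a single unnamed vector stores its size directly here.
    return int(vectors["size"])


def check_embedding_dim(collection_info: dict, embedding_dim: int) -> None:
    """Raise ValueError when the embedder and the collection disagree on size."""
    size = collection_vector_size(collection_info)
    if size != embedding_dim:
        raise ValueError(
            f"embedding dimension {embedding_dim} does not match collection vector size {size}"
        )


# Trimmed example of the response shape (values are illustrative).
sample_response = {
    "result": {
        "status": "green",
        "config": {"params": {"vectors": {"size": 1536, "distance": "Cosine"}}},
    },
    "status": "ok",
}

check_embedding_dim(sample_response, 1536)  # passes silently when the sizes agree
```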

Step 6: Run & test the workflow

▶️ Full Run

  1. Go to the Workflows page

  2. Click Run next to your workflow

  3. Use the Schedule tab to automate runs

  4. Visit your Qdrant collection to verify chunked documents have arrived

To delete records, use API calls or the Console tab in Qdrant.
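For the API route, the sketch below builds two Qdrant REST requests with the standard library: one that counts the points in a collection and one that deletes points by ID. The helper names are hypothetical, and the URL, key, and point IDs in any call are yours to supply. Nothing is sent until you pass a request to send().

```python
import json
import urllib.request


def qdrant_post(base_url, api_key, path, payload):
    """Build (but do not send) an authenticated POST request to a Qdrant REST path."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )


def count_points_request(base_url, api_key, collection):
    """Request an exact count of the points stored in a collection."""
    return qdrant_post(base_url, api_key, f"/collections/{collection}/points/count", {"exact": True})


def delete_points_request(base_url, api_key, collection, point_ids):
    """Request deletion of specific points, waiting until the change is applied."""
    path = f"/collections/{collection}/points/delete?wait=true"
    return qdrant_post(base_url, api_key, path, {"points": list(point_ids)})


def send(request):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

After a workflow run, send(count_points_request(url, key, "my-test-collection"))["result"]["count"] should be greater than zero if chunks have arrived.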

📄 Upload a Sample Document

  1. In your workflow, go to the Source segment

  2. Upload a single document

  3. Click the Results </> icon above the segment to inspect JSON output at every stage
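If you copy the JSON output out of the Results view, a few lines of Python can summarize it. The sketch below assumes the output is a list of element objects with a "type" field and, after the embedding stage, an "embeddings" list. The sample data is made up for illustration and is far shorter than real output.

```python
import json
from collections import Counter


def summarize_elements(elements):
    """Count elements per type and collect the distinct embedding lengths."""
    type_counts = Counter(e.get("type", "Unknown") for e in elements)
    # Elements without an "embeddings" list (earlier stages) are skipped here.
    embedding_dims = {len(e["embeddings"]) for e in elements if e.get("embeddings")}
    return {"types": dict(type_counts), "embedding_dims": sorted(embedding_dims)}


# Made-up sample; real embeddings have hundreds or thousands of values.
sample_json = """
[
  {"type": "Title", "element_id": "a1", "text": "Quarterly Report",
   "metadata": {"filename": "report.pdf"}},
  {"type": "NarrativeText", "element_id": "a2", "text": "Revenue grew.",
   "metadata": {"filename": "report.pdf"}, "embeddings": [0.1, 0.2, 0.3]},
  {"type": "NarrativeText", "element_id": "a3", "text": "Costs fell.",
   "metadata": {"filename": "report.pdf"}, "embeddings": [0.4, 0.5, 0.6]}
]
"""

summary = summarize_elements(json.loads(sample_json))
print(summary)
```

More than one value in embedding_dims would suggest the output mixes embedding models, which is worth resolving before writing to a single Qdrant collection.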

Step 7: Get more from your workflow

🪄 Partitioning Strategy

The default auto strategy detects document structure (titles, tables, images) and applies vision language model (VLM) parsing only where it is needed.

🖼️ Image Description Enrichment

Generates human-readable captions for diagrams, photos, and visual elements.

Useful for:

  • Instruction manuals with schematics

  • Research reports with charts

  • Scanned docs with key visual content

📊 Table Summary Enrichment

Converts tables to natural language summaries (e.g., 'North America leads in Q1 sales').

Ideal for:

  • Financial reports

  • Policy documents

  • Scanned PDFs with tables

🛠️ Additional Options

  • Table-to-HTML Enrichment

  • Named Entity Recognition (NER)

  • Chunking: by title, character, page, or similarity

  • Contextual Chunking: prepend summaries to chunks
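To build intuition for what chunking by title means, here is a deliberately simplified sketch. It is not Unstructured's implementation: it only starts a new chunk at every Title element or when a size limit would be exceeded, and the function name and max_characters parameter are illustrative.

```python
def chunk_by_title(elements, max_characters=500):
    """Group element texts into chunks that start at each Title element."""
    chunks = []
    current = []
    current_len = 0
    for element in elements:
        text = element.get("text", "")
        starts_section = element.get("type") == "Title"
        too_long = current_len + len(text) + 1 > max_characters
        # Close the running chunk at a section boundary or when it would overflow.
        if current and (starts_section or too_long):
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(text)
        current_len += len(text) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks


elements = [
    {"type": "Title", "text": "Intro"},
    {"type": "NarrativeText", "text": "First paragraph."},
    {"type": "Title", "text": "Methods"},
    {"type": "NarrativeText", "text": "Second paragraph."},
]
for chunk in chunk_by_title(elements):
    print(repr(chunk))
```

Each chunk keeps a heading together with the text beneath it, which tends to give the retriever more self-contained passages than fixed-size splitting.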

And that's it! 🎉

You now have a fully automated pipeline from S3 documents to enriched, vectorized content in Qdrant, built entirely in Unstructured's UI. Whether launching a RAG system or indexing internal files, this is a fast, reliable starting point.
