Industrial Computer Vision for Defect Detection on AWS: A Practical Reference Architecture


Cloud Computing

Feb 6, 2026

20 min



The outcome

A credible defect detection system in manufacturing should deliver four non-negotiables:

  1. Consistent capture → consistent predictions (poor capture conditions destroy model quality).

  2. Clear model lifecycle (train, validate, deploy, monitor, retrain).

  3. Real-time + batch flexibility (some lines need immediate decisions; others run audits overnight).

  4. Actionable outputs (alerts, dashboards, and write-backs into MES/QMS workflows).

The AWS architecture is designed around exactly those constraints.


Architecture at a glance (what’s happening end-to-end)


1) Image capture on the shop floor (device tier)

The process starts where quality begins: X-ray machines, cameras, and other inspection devices operating under consistent capture conditions. Consistency matters more than people expect—lighting, angle, exposure, calibration, and part positioning can make or break model performance.

Intelliblitz note: Before we touch ML, we standardize capture. We treat it like an engineering spec, not an “operator preference.”
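One way to make "capture as an engineering spec" concrete is to check every frame's capture metadata against written tolerances before the image ever reaches the model. The sketch below is illustrative only: the `CaptureSpec` fields, the tolerance values, and the idea that exposure and mean brightness are available per frame are all assumptions, not part of the reference architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureSpec:
    """Hypothetical capture spec; field names and tolerances are illustrative."""
    exposure_ms: float
    exposure_tol_ms: float
    mean_brightness: float      # expected mean pixel value on a 0-255 scale
    brightness_tol: float
    calibration_version: str


def capture_violations(spec: CaptureSpec, exposure_ms: float,
                       mean_brightness: float, calibration_version: str) -> list[str]:
    """Return human-readable spec violations; an empty list means the frame is in spec."""
    problems = []
    if abs(exposure_ms - spec.exposure_ms) > spec.exposure_tol_ms:
        problems.append(f"exposure {exposure_ms} ms is outside tolerance")
    if abs(mean_brightness - spec.mean_brightness) > spec.brightness_tol:
        problems.append(f"mean brightness {mean_brightness} is outside tolerance")
    if calibration_version != spec.calibration_version:
        problems.append(f"calibration {calibration_version} does not match spec")
    return problems
```

Frames that return a non-empty list could be held back from both training and inference until the station is re-calibrated, which keeps capture problems from being misread as model problems.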

2) Transfer from edge to AWS (edge tier ingestion)

The architecture supports multiple ingestion paths depending on what your plant already runs:

  • AWS Transfer Family (managed file transfer patterns)

  • AWS DataSync (reliable, large-scale data movement)

  • AWS IoT Greengrass (especially useful when images are produced at the edge and you want local control + secure sync)

This is practical: different plants have different operational realities. The system shouldn’t force a single ingestion pattern.

3) Central storage and dataset management in Amazon S3

All images land in Amazon S3, where they’re organized into datasets, typically split into training and test/validation sets (and in mature systems: versioned datasets by line, plant, supplier, or part revision).
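A train/test split is easier to audit when it is deterministic. As a minimal sketch, assuming each image carries a stable part ID, the split can be derived from a hash of that ID so the same part always lands in the same set across re-runs. The function name and the 20% default are our own choices for illustration.

```python
import hashlib


def assign_split(part_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a part to 'train' or 'test' from a hash of its ID."""
    digest = hashlib.sha256(part_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "test" if bucket < test_fraction else "train"
```

Splitting by part ID rather than by individual image also keeps multiple views of the same part out of both sets at once, which would otherwise inflate test metrics.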

S3 becomes your system of record for:

  • Raw images

  • Labeled datasets

  • Inference outputs (predictions, scores, metadata)

  • Audit trails for compliance

Intelliblitz note: We implement a dataset structure that supports traceability (part ID, batch/lot, timestamp, line, tool calibration version). This is what makes audits and root-cause analysis possible later.
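To illustrate what a traceable dataset structure can look like, here is a sketch of an S3 key builder that encodes the traceability fields listed above into the object key. The prefix layout (`raw/plant=.../line=...`) and the parameter names are assumptions for illustration; the point is only that every field needed for an audit is recoverable from the key itself.

```python
from datetime import datetime, timezone


def build_image_key(plant: str, line: str, part_id: str, lot: str,
                    captured_at: datetime, calibration_version: str,
                    ext: str = "png") -> str:
    """Build an S3 object key that carries the traceability fields (hypothetical layout)."""
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    ts = captured_at.astimezone(timezone.utc)
    return (
        f"raw/plant={plant}/line={line}/date={ts:%Y-%m-%d}/"
        f"lot={lot}/cal={calibration_version}/"
        f"{part_id}_{ts:%Y%m%dT%H%M%SZ}.{ext}"
    )
```

A `key=value` prefix style like this also tends to work well with query tools that understand partitioned paths, though that is a convenience rather than a requirement.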

4) Model training with Amazon Lookout for Vision

Amazon Lookout for Vision is used to:

  • Assist with labeling workflows

  • Train and tune the defect detection model

  • Deploy the trained model for inference

This reduces the barrier to entry for industrial anomaly detection—especially for teams that don’t want to build a full custom vision training stack from scratch.

5) Expose and manage the model via AWS Lambda + Amazon API Gateway

Once trained, the architecture exposes model management and interaction through:

  • AWS Lambda (for management logic / lightweight control plane actions)

  • Amazon API Gateway (to provide a clean API layer for admins and data teams)


This is important because production deployments need controlled access, logging, and governance—not “someone running notebooks.”
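As a rough sketch of that control plane, the handler below accepts an API Gateway Lambda proxy event, validates the requested action, and returns a proxy-style response. The allowed actions, the request body shape, and `manage_model` are hypothetical; in a real deployment `manage_model` would call whatever service hosts the model, and authorization would be enforced at the API layer.

```python
import json

ALLOWED_ACTIONS = {"start", "stop", "status"}


def manage_model(action: str, model_version: str) -> dict:
    """Hypothetical stand-in for the real control-plane call."""
    return {"action": action, "model_version": model_version, "accepted": True}


def _response(status: int, payload: dict) -> dict:
    # Shape expected by API Gateway for a Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Validate a model-management request and dispatch it."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    if not isinstance(body, dict):
        return _response(400, {"error": "body must be a JSON object"})
    action = body.get("action")
    if action not in ALLOWED_ACTIONS:
        return _response(400, {"error": f"action must be one of {sorted(ALLOWED_ACTIONS)}"})
    version = str(body.get("model_version", "latest"))
    return _response(200, manage_model(action, version))
```

Keeping the set of actions explicit and small is what makes the API auditable: every change to a deployed model passes through one logged entry point.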


Inference paths: real-time, orchestrated, and batch


6) Serverless orchestration for inference (AWS Step Functions + Lambda)

For runtime defect detection, the architecture uses AWS Step Functions to orchestrate inference workflows with Lambda.

Why this matters:

  • Step Functions gives you repeatable orchestration and visibility

  • Lambda provides modular “glue” logic: preprocess, call inference, postprocess, route outputs

This creates a scalable pattern for event-driven inspection pipelines.
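The preprocess, call inference, postprocess, route sequence can be sketched as an Amazon States Language definition built in Python. The state names, retry settings, and the four Lambda ARNs passed in are placeholders we chose for illustration; a real workflow would likely add error handling branches and timeouts.

```python
import json


def build_state_machine(preprocess_arn: str, infer_arn: str,
                        postprocess_arn: str, route_arn: str) -> dict:
    """Return a linear Amazon States Language definition with one Lambda task per step."""
    steps = [
        ("Preprocess", preprocess_arn),
        ("CallInference", infer_arn),
        ("Postprocess", postprocess_arn),
        ("RouteOutputs", route_arn),
    ]
    states = {}
    for i, (name, arn) in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": arn,
            # Retry transient task failures with exponential backoff (illustrative values).
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Defect inspection inference workflow (sketch)",
        "StartAt": steps[0][0],
        "States": states,
    }


if __name__ == "__main__":
    definition = build_state_machine("arn:preprocess", "arn:infer",
                                     "arn:postprocess", "arn:route")
    print(json.dumps(definition, indent=2))
```

Generating the definition from code rather than hand-editing JSON makes it easier to keep per-line or per-plant variants consistent.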

7) Batch anomaly detection (AWS Batch + AWS Fargate)

Not every inspection needs real-time decisions. Many manufacturers run:

  • Batch analysis per shift

  • Supplier audits

  • Re-inspection of borderline items

  • Retrospective analysis after a quality incident

Here the architecture supports AWS Batch, with compute provisioned using AWS Fargate, so you can scale batch workloads without managing servers.
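One common way to spread a batch re-inspection across containers is an AWS Batch array job, where each child job reads its index from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable and processes only its share of the images. The sketch below shows that slicing logic only. `ARRAY_SIZE` is a hypothetical variable we assume is set in the job definition, and how the key list is obtained is left out.

```python
import os


def slice_for_worker(keys: list[str], array_index: int, array_size: int) -> list[str]:
    """Return the image keys this worker should process (round-robin by index)."""
    if array_size < 1 or not 0 <= array_index < array_size:
        raise ValueError("array_index must be in [0, array_size)")
    return keys[array_index::array_size]


def keys_for_this_job(keys: list[str]) -> list[str]:
    """Pick this container's slice using the index AWS Batch provides to array jobs."""
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    size = int(os.environ.get("ARRAY_SIZE", "1"))  # hypothetical, set in the job definition
    return slice_for_worker(keys, index, size)
```

Round-robin slicing keeps the workers balanced even when the key list is sorted by time or lot.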

8) Alerts and confidence-based notifications (Amazon SNS)

The system notifies users using Amazon SNS—and crucially, it can route alerts based on confidence thresholds.

A mature quality workflow doesn’t treat predictions as binary. It treats them as:

  • High confidence defect → immediate action

  • Medium confidence → human review queue

  • Low confidence → label for future retraining / capture conditions check


This is how you reduce false positives without becoming blind to real defects.
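The three tiers above can be sketched as a small routing function. The threshold values (0.90 and 0.60), the tier names, and the treatment of high-confidence normal parts as a plain pass are assumptions for illustration; real thresholds should come from validation data for each line. The second helper builds a message attribute in the shape Amazon SNS uses, so subscribers could filter on the tier with a subscription filter policy.

```python
def route_prediction(is_anomalous: bool, confidence: float,
                     high: float = 0.90, low: float = 0.60) -> str:
    """Map a prediction to a workflow tier based on confidence (illustrative thresholds)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= high:
        # High confidence: act on defects immediately, let normal parts pass.
        return "immediate_action" if is_anomalous else "pass"
    if confidence >= low:
        return "human_review"
    # Low confidence: keep the image for labeling and check capture conditions.
    return "retraining_candidate"


def sns_message_attributes(tier: str) -> dict:
    """Build SNS message attributes so subscriptions can filter by tier."""
    return {"tier": {"DataType": "String", "StringValue": tier}}
```

For example, `route_prediction(True, 0.75)` lands in the human review queue, while the same defect call at 0.97 triggers immediate action.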


9) Results storage, visualization, and shop-floor write-back (S3 + QuickSight + MES integration)

Finally, results can be:

  • Stored back in Amazon S3

  • Visualized for business and quality users in Amazon QuickSight

  • Written back into the MES through an integration step (commonly via AWS Lambda)


This closes the loop—from inspection → decision → reporting → operational action.

Intelliblitz note: Dashboards are not the finish line. The finish line is “the line behavior changes automatically” (hold/rework routing, supplier scoring, CAPA triggers, operator prompts).

What separates a demo from a production system

Most “AI defect detection” projects fail not because the model is bad—but because the surrounding system is weak. Here’s what we add when we implement this architecture for real clients:

Data governance and traceability

  • Dataset versioning and lineage (which model was trained on which data)

  • Immutable audit logs for regulated environments

  • Role-based access for images, labels, and predictions
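Lineage gets much simpler when each dataset version has a single fingerprint that can be stored next to the model it trained. A minimal sketch, assuming you can list each image key with its label: hash the sorted entries so the fingerprint is stable regardless of listing order and changes whenever an image or label changes. The function names and record fields are hypothetical.

```python
import hashlib
import json


def dataset_fingerprint(entries: dict[str, str]) -> str:
    """Hash a mapping of image key -> label into an order-independent fingerprint."""
    canonical = json.dumps(sorted(entries.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lineage_record(model_version: str, entries: dict[str, str]) -> dict:
    """Build a record tying a model version to the exact dataset it was trained on."""
    return {
        "model_version": model_version,
        "dataset_fingerprint": dataset_fingerprint(entries),
        "image_count": len(entries),
    }
```

Writing this record to a location with object versioning or write-once retention is one way to back the audit-log requirement above.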

MLOps and model lifecycle discipline

  • Automated evaluation gates before deployment

  • Drift monitoring (capture drift, supplier drift, process drift)

  • Retraining cadence tied to business events (new tooling, new supplier, new part revision)
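An automated evaluation gate can be as simple as computing precision and recall on a held-out test set and refusing to deploy below agreed minimums. In the sketch below, the default thresholds are placeholders, and defects are treated as the positive class; which metrics matter and where the bar sits is a decision for the quality team.

```python
def passes_gate(tp: int, fp: int, fn: int,
                min_precision: float = 0.95,
                min_recall: float = 0.98) -> tuple[bool, dict]:
    """Check defect-class precision and recall against minimums (illustrative defaults).

    tp: defects correctly flagged, fp: good parts flagged as defects,
    fn: defects the model missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    ok = precision >= min_precision and recall >= min_recall
    return ok, {"precision": precision, "recall": recall}
```

In manufacturing the recall bar is often set higher than the precision bar, because a missed defect that ships usually costs more than an unnecessary manual check.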



Human-in-the-loop workflows

  • A review queue for borderline predictions

  • Labeling feedback loops that improve the model weekly (not yearly)

  • Operator-friendly UI patterns that don’t slow the line down
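A review queue for borderline predictions can be sketched with a priority queue that surfaces the least confident predictions first, so reviewer time goes where the model is most unsure. This is an in-memory illustration with names of our own choosing; a production queue would need persistent storage and assignment tracking.

```python
import heapq
import itertools


class ReviewQueue:
    """In-memory queue that returns the lowest-confidence prediction first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order stable

    def add(self, image_key: str, confidence: float) -> None:
        heapq.heappush(self._heap, (confidence, next(self._counter), image_key))

    def next_for_review(self):
        """Pop the most uncertain item, or return None when the queue is empty."""
        if not self._heap:
            return None
        confidence, _, image_key = heapq.heappop(self._heap)
        return image_key, confidence

    def __len__(self) -> int:
        return len(self._heap)
```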

Real ROI measurement

We instrument the system to quantify:

  • Scrap reduction

  • Yield improvement

  • Reduced rework time

  • Faster containment during quality events

  • Reduced inspection labor hours (or hours reallocated to higher-value tasks)
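Most of these measures reduce to comparing a rate before and after deployment. As a minimal sketch with made-up function names, scrap reduction can be expressed as the relative change in scrap rate between a baseline period and a comparison period; the same helper works for rework time or inspection hours per unit.

```python
def scrap_rate(scrapped_units: int, produced_units: int) -> float:
    """Fraction of produced units that were scrapped."""
    if produced_units <= 0:
        raise ValueError("produced_units must be positive")
    return scrapped_units / produced_units


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement of a rate, e.g. 0.25 means a 25% reduction."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before
```

For instance, a line that goes from 400 scrapped units per 10,000 to 300 per 10,000 moves its scrap rate from 4% to 3%, which is a 25% relative reduction. Comparing like-for-like periods (same part mix, same suppliers) matters more than the arithmetic.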




Manufacturing quality teams already know the hard truth: defects don’t announce themselves. They hide in tiny anomalies—hairline cracks, porosity, irregular castings, subtle damage—often visible only in X-ray or high-resolution imaging. Human inspection is expensive, inconsistent at scale, and hard to standardize across shifts, plants, and suppliers. This is where industrial computer vision becomes a competitive weapon: consistent inspections, measurable confidence scores, faster root-cause analysis, and a closed loop back to the shop floor. In this post, we’ll break down an AWS reference architecture that detects defects using Amazon Lookout for Vision, Amazon S3, AWS Lambda, and an orchestration layer for batch and real-time inference—then we’ll show how Intelliblitz typically turns this into a production-grade, plant-ready system.
