PDCA Quality Control: Applying Plan-Do-Check-Act in Modern Industries

When organizations face high-consequence decisions under uncertainty—whether in healthcare, aviation, or nuclear power—the initial decision is just the beginning. How do these organizations ensure their decisions lead to desired outcomes while managing risks? How do they learn and adapt when lives or critical infrastructure are at stake? The PDCA Cycle (Plan-Do-Check-Act), originated by Walter Shewhart at Bell Labs, popularized by W. Edwards Deming, and refined through decades of use in organizations like Toyota, provides a systematic framework for making and implementing critical decisions while continuously learning from outcomes.

The Four Stages of Systematic Learning

PDCA isn’t just a decision-making tool; it’s a rigorous approach to implementing decisions where the stakes are high and uncertainty is constant. Each stage builds on the previous one, fostering a cycle of evidence-based learning and adaptation that is particularly valuable in high-stakes environments:

  1. Plan: This stage involves thoroughly understanding the current situation, identifying critical risks and uncertainties, setting clear objectives, and developing a detailed plan that includes contingencies. What precisely are we trying to achieve? What are the potential failure modes? How will we measure success and monitor risks?
  2. Do: Implement the plan in a controlled manner, with careful attention to safety and risk management. Where possible, test changes in limited scenarios before full implementation. Collect comprehensive data on both process and outcomes. Execute the plan while maintaining vigilant monitoring.
  3. Check (or Study): Conduct rigorous analysis of the data collected during the “Do” phase. Compare outcomes against objectives and risk assessments from the “Plan” phase. Investigate any deviations or unexpected results thoroughly. What actually happened? What does the evidence tell us about our assumptions and approach?
  4. Act: Based on the analysis, take appropriate action. If successful, document the approach and implement safeguards for sustained success. If issues arose, conduct detailed analysis and begin a new cycle with refined understanding. How do we either standardize success or adjust our approach based on what we’ve learned?
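
In software terms, the cycle is essentially a feedback control loop. The toy Python sketch below runs Plan-Do-Check-Act against a stand-in process; the linear process model, target, and tolerance are illustrative assumptions, not part of the method itself.

```python
# A toy, runnable PDCA loop: tune a process setting toward a target output.
# The linear "process" and the numbers are stand-ins for any real system;
# the point is the Plan -> Do -> Check -> Act structure, not the domain.

TARGET = 100.0     # objective fixed during the Plan phase
TOLERANCE = 0.5    # acceptance criterion, also set while planning

def run_process(setting):
    """Do: execute the plan against the (stand-in) real-world process."""
    return setting * 1.8 + 7.0

setting = 40.0  # initial plan
for cycle in range(1, 11):
    output = run_process(setting)               # Do: controlled execution
    deviation = TARGET - output                 # Check: outcome vs. objective
    print(f"cycle {cycle}: setting={setting:.2f} -> output={output:.2f}")
    if abs(deviation) <= TOLERANCE:             # Act: standardize what worked
        print(f"standardized setting: {setting:.2f}")
        break
    setting += deviation / 1.8                  # Act: revise the plan, start a new cycle
```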

The strength of PDCA lies in its systematic approach. Structured testing and monitoring during the “Do” and “Check” phases help identify potential issues before they become critical (Risk Management). Rigorous analysis ensures decisions are refined based on data, not just assumptions (Evidence-Based Learning). Each cycle builds organizational knowledge about complex systems and uncertainties (Continuous Improvement), and successful approaches are documented and systematized to ensure reliability (Standardization).

PDCA in Practice: Nuclear Power Operations

Consider how nuclear power plants use PDCA when implementing new safety procedures:

  1. Plan: Analyze current safety protocols, identify potential risks, and design new procedures. Define specific safety metrics and monitoring approaches. Develop detailed contingency plans.
  2. Do: Implement new procedures in simulator training first, then in limited real-world scenarios under heightened supervision. Collect comprehensive data on operator performance and system responses.
  3. Check: Analyze safety metrics, operator feedback, and system data. Evaluate both intended and unintended consequences of the changes. Review any near-misses or unexpected events.
  4. Act: If the new procedures prove effective, standardize them across all relevant operations. If issues are identified, analyze root causes and begin a new PDCA cycle with refined procedures.

This practical application highlights the core requirements for effective PDCA implementation in high-reliability organizations: comprehensive planning with detailed risk assessment, controlled implementation with careful monitoring, rigorous and objective data analysis, and systematic documentation to capture learnings and prevent future issues.

PDCA in AI Systems: Adapting the Cycle for MLOps

The Plan-Do-Check-Act (PDCA) cycle, a cornerstone of quality management and process improvement, maps directly onto the modern practices for developing, deploying, and maintaining Artificial Intelligence systems, often grouped under the term MLOps (Machine Learning Operations).

The AI/ML Lifecycle as a PDCA Loop

The iterative nature of building and operating reliable AI systems closely follows the PDCA structure:

graph TD
    subgraph PDCA_AI [PDCA Cycle in MLOps]
        direction LR
        P[Plan: Define Goals, Data Strategy, Model Selection, Deployment Plan] --> D(Do: Data Engineering, Model Training, Validation, Deployment);
        D --> C(Check: Monitor Performance, Detect Drift, Evaluate Metrics, Analyze Failures);
        C --> A(Act: Retrain, Fine-tune, Rollback, Update Pipeline, Improve Process);
        A --> P;
    end
  1. Plan: This phase involves defining the business objectives, understanding data requirements and availability, selecting appropriate model architectures and training strategies, establishing performance metrics (technical and business KPIs), planning for deployment, and setting up monitoring strategies. What problem are we solving? What data do we need? How will we build and test the model? How will we know if it’s working in production?

  2. Do: This is the execution phase, encompassing data collection and preparation (ETL, feature engineering), model training and hyperparameter tuning, rigorous validation against test sets, packaging the model, and deploying it into the target environment (e.g., as an API or an embedded system).

  3. Check: Once deployed, continuous monitoring is crucial. This involves tracking the model’s predictive performance, monitoring for data drift (changes in input data distribution) and concept drift (changes in the underlying relationships), evaluating business impact, analyzing specific prediction failures, and checking system health (latency, throughput, errors). Is the model performing as expected? Has the data changed? Is it delivering business value? Are there unexpected failures?

  4. Act: Based on the insights gathered during the “Check” phase, appropriate actions are taken. This might include triggering automated retraining pipelines, fine-tuning the model with new data, rolling back to a previous version if performance degrades significantly, updating data processing steps, improving monitoring alerts, or initiating a larger redesign (returning to the “Plan” phase).
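
As a concrete illustration of the “Check” and “Act” steps, the sketch below flags input drift with a two-sample Kolmogorov-Smirnov test and fires a retraining hook when drift is detected. The p-value threshold, single-feature framing, and retrain callback are illustrative assumptions; production stacks typically lean on dedicated monitoring tooling, but the loop structure is the same.

```python
# Hedged sketch of a Check -> Act drift monitor. Assumes numpy and scipy;
# reference_data and live_data each hold one numeric feature, and the
# retrain callback is a hypothetical hook into a training pipeline.

import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # illustrative threshold; tune per feature and risk appetite

def check_drift(reference_data, live_data):
    """Check: compare the live input distribution against the training baseline."""
    statistic, p_value = ks_2samp(reference_data, live_data)
    return p_value < DRIFT_P_VALUE, statistic, p_value

def act_on_drift(reference_data, live_data, retrain):
    """Act: intervene when the Check step detects significant drift."""
    drifted, stat, p = check_drift(reference_data, live_data)
    if drifted:
        print(f"drift detected (KS={stat:.3f}, p={p:.2e}); triggering retraining")
        retrain()  # could equally roll back, alert, or reopen the Plan phase
    else:
        print(f"no significant drift (KS={stat:.3f}, p={p:.2e})")

# Synthetic demonstration: the live feature has shifted relative to training.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5_000)   # distribution seen at training time
live = rng.normal(0.4, 1.0, 1_000)       # mean shift simulating real-world drift
act_on_drift(baseline, live, retrain=lambda: print("retraining pipeline started"))
```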

Why PDCA is Critical for AI Reliability

Applying a PDCA mindset through MLOps is essential for trustworthy AI:

  • Managing Complexity: AI systems, especially ML models, are complex and their behavior can change over time. PDCA provides structure for managing this.
  • Handling Drift: Data and concepts inevitably drift in the real world; the Check-Act loop is necessary to maintain performance.
  • Risk Mitigation: Monitoring (Check) allows for early detection of performance degradation or emerging biases before they cause significant harm.
  • Continuous Improvement: It embeds a cycle of learning and refinement into the AI lifecycle, preventing model stagnation.
  • Accountability & Governance: Structured monitoring and action provide necessary oversight and traceability for AI systems.

By treating the AI lifecycle not as a one-off development project but as a continuous PDCA cycle managed through robust MLOps practices, organizations can build and maintain AI systems that are more reliable, adaptive, and aligned with business objectives over time.

PDCA within an AI Coding Agent

Beyond the lifecycle management of AI systems (MLOps), the PDCA cycle can also model the iterative refinement process within an AI agent designed for tasks like code generation or modification. Consider an AI coding assistant:

graph TD
    subgraph PDCA_CodingAgent [PDCA Cycle in AI Coding Agent]
        direction LR
        P[Plan: Understand Requirement, Analyze Code, Devise Implementation Strategy] --> D(Do: Generate or Modify Code);
        D --> C(Check: Compile, Run Linter, Execute Test Suite);
        C --> A(Act: Analyze Results, Refine Strategy and Code, Commit or Reiterate);
        A -- Success --> P_Next[Next Task/Commit];
        A -- Issues Found --> P; 
    end
  1. Plan: The agent receives a coding task (e.g., “Refactor function X to improve efficiency,” “Implement feature Y,” “Fix bug Z described in ticket 123”). It analyzes the requirements, examines the relevant existing codebase, and formulates a plan for the necessary code changes. This might involve outlining new functions, identifying code sections to modify, and predicting potential impacts. What code needs to be written or changed, and how?

  2. Do: The agent executes the plan by generating new code or modifying existing code according to the strategy developed in the “Plan” phase. Write the code.

  3. Check: This is a critical verification stage. The agent integrates the generated/modified code and performs automated checks:

    • Compilation/Interpretation: Does the code syntax parse correctly?
    • Linting: Does the code adhere to predefined style guides and static analysis rules?
    • Testing: Does the code pass the relevant unit tests, integration tests, or other automated test suites designed to verify its functionality and correctness against requirements?
    Together, these checks answer two questions: did the changes work as intended, and does the code meet quality standards?
  4. Act: The agent analyzes the results from the “Check” phase:

    • Success: If compilation, linting, and tests all pass, the agent might mark the task as complete, commit the code, or proceed to the next part of a larger task.
    • Failure: If any check fails (e.g., linting errors, test failures), the agent analyzes the feedback (error messages, test results). It uses this information to revise its understanding of the problem or its implementation strategy (returning to “Plan”) or directly attempts to fix the specific issues in the code (returning to “Do” with a refined micro-plan). Based on the test and lint results, what needs to be corrected or improved?
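
A minimal Python sketch of this internal loop might look like the following; generate_patch() and apply_patch() are hypothetical stand-ins for the agent’s model-driven code generation, and the check commands (ruff, pytest) are common examples of a linter and test runner rather than a prescribed toolchain.

```python
# Hedged sketch of a coding agent's internal PDCA loop. generate_patch()
# and apply_patch() are hypothetical stand-ins for model-driven code
# generation; ruff and pytest are example check commands, not a fixed stack.

import subprocess

def run_check(command):
    """Check: run one verification step and capture its feedback verbatim."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def pdca_coding_loop(task, generate_patch, apply_patch, max_attempts=5):
    feedback = ""  # linter/test output carried into the next Plan phase
    for attempt in range(1, max_attempts + 1):
        patch = generate_patch(task, feedback)   # Plan + Do: devise and write the change
        apply_patch(patch)
        checks = (["ruff", "check", "."],        # Check: lint / static analysis
                  ["pytest", "-q"])              # Check: automated test suite
        for command in checks:
            passed, output = run_check(command)
            if not passed:
                feedback = output                # Act: feed concrete failures back in
                break
        else:
            return True                          # Act: all checks passed; commit / move on
    return False                                 # Act: escalate after repeated failures
```

The essential design point is that the raw checker output is preserved and fed into the next iteration, so each new “Plan” works from concrete evidence (error messages, failing test names) rather than guesswork.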

This internal PDCA loop allows the AI coding agent to iteratively refine its output, using concrete feedback (compiler output, linter messages, test results) to converge on a correct and high-quality solution, mirroring the test-driven development (TDD) or continuous integration/continuous deployment (CI/CD) practices used by human developers.

A Framework for High-Reliability Operations

The PDCA cycle is fundamental to organizations where failure is not an option. It provides a structured approach to making and implementing decisions while maintaining safety and reliability in complex, uncertain environments. Success hinges on comprehensive planning, controlled implementation, rigorous analysis, and systematic documentation.

The Key Takeaway: In situations where decisions carry significant consequences and uncertainty is high, the PDCA Cycle provides a systematic framework for implementation and learning. Through careful Planning, controlled Doing, thorough Checking, and appropriate Acting, organizations can navigate complex decisions while maintaining safety and reliability. This systematic approach is essential when the stakes are high and the margin for error is small.
