The 8th International Conference on Emerging Data and Industry (EDI40)
Automating UML-Based Visualization of Software Ecosystems: Tracking Versions, Dependencies, and Security Updates
V. Kan (a), M. P. Lnu (a), S. Berhe (a), C. El Kari (a), M. Maynard (b), F. Khomh (c)

(a) University of the Pacific, 3601 Pacific Ave, Stockton, CA 95211, USA
(b) Data Independence LLC, 23 Settlers Way, Ellington, CT 06029, USA
(c) Polytechnique Montréal, Montréal, QC, H3T 1J4, Canada
"Yesterday the Android app worked, but today it does not work"

— Frequent user feedback, 2020

Which software components were updated (automatically) recently?
  • VMWare Website
  • Safari Website
  • Chrome Website
  • Firefox Website
  • iOS Website
  • GitHub API
  • Android Website
  • Java Website
  • MySQL Website
  • DistroLinux API
  • Eclipse Website
  • Android Studio Website
Data Sources (release notes) ➡️ Data Lake (consolidate in data lake) ➡️ Data Model (release notes feed)
  • MongoDB Data Lake
  • Version JSON Object
  • 20+ Bots with daily sync
  • Open, Commercial, and Closed Software
  • Last 2 years:
  • 13,759 Components
  • 56,856 Versions (95,862)
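The "Version JSON Object" stored in the data lake could be sketched as follows. The slides do not show the actual schema, so every field name here (`component`, `version`, `released`, `source`, `notes`) is an assumption for illustration, and the example values are made up.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VersionRecord:
    # Hypothetical fields; the real schema is not given in the slides.
    component: str   # component name, e.g. "example-component"
    version: str     # version string as published in the release notes
    released: str    # release date in ISO 8601 format (YYYY-MM-DD)
    source: str      # data source the bot read, e.g. a website or API
    notes: str = ""  # raw release note text, if any


def to_json(record: VersionRecord) -> str:
    """Serialize a record to the JSON document a sync bot might store."""
    return json.dumps(asdict(record), sort_keys=True)


# Made-up example values, not taken from the data lake.
example = VersionRecord("example-component", "1.2.3", "2020-01-15", "Example Website")
print(to_json(example))
```

A flat document like this keeps each version independent, which suits a daily sync where bots only append new versions.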
Data Lake Diagram

Results (2020)

Metric | Why
Recent updates | Helps track recent update bursts or regressions
Update type | Distinguishes patch (e.g., 1.2.3), minor (e.g., 1.2.0), and major (e.g., 1.0.0) updates
Count version | Measures churn and helps reduce risk of coinciding updates
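The "Update type" metric can be illustrated with a small heuristic that follows the examples in the table: x.0.0 counts as major, x.y.0 as minor, and anything else as patch. This is a sketch under that assumption, not necessarily the rule the prototype uses, and it only handles purely numeric version strings.

```python
def update_type(version: str) -> str:
    """Label a semantic version string as 'major', 'minor', or 'patch'.

    Assumed heuristic: x.0.0 is major, x.y.0 is minor, otherwise patch.
    Non-numeric parts (e.g. "1.0.0-beta") are not handled and raise ValueError.
    """
    parts = version.strip().lstrip("v").split(".")
    # Pad missing parts with zeros so "2.1" is read as "2.1.0".
    _major, minor, patch = (int(p) for p in (parts + ["0", "0"])[:3])
    if patch != 0:
        return "patch"
    if minor != 0:
        return "minor"
    return "major"


for v in ("1.2.3", "1.2.0", "1.0.0"):
    print(v, update_type(v))
```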
Version Feed
  • App not working after overnight update
  • No user interaction before the failure
  • Update not announced in the changelog
  • Prototype: Review update from last night
“I put the iOS 12 beta on my phone and it stopped receiving calls. [...] I wouldn’t recommend updating anything you rely on!”

— Frequent user feedback, 2020

Weekly Update Graph
Monthly Update Graph
Yearly Update Graph
  • Updates follow business cycles
  • Fewer updates each quarter
  • Recommendation: e.g., update on weekends
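A recommendation such as "update on weekends" could be derived by counting releases per weekday. The sketch below assumes release dates are available as ISO strings. The function names are hypothetical, and the sample dates are made up rather than taken from the data lake.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def updates_per_weekday(release_dates: list[str]) -> dict[str, int]:
    """Count releases per weekday from ISO dates (YYYY-MM-DD)."""
    counts = Counter(date.fromisoformat(d).weekday() for d in release_dates)
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}


def quietest_days(release_dates: list[str]) -> list[str]:
    """Return the weekday(s) with the fewest releases."""
    per_day = updates_per_weekday(release_dates)
    low = min(per_day.values())
    return [day for day, n in per_day.items() if n == low]


# Made-up sample: releases on weekdays only.
sample = ["2020-06-01", "2020-06-01", "2020-06-02",
          "2020-06-03", "2020-06-04", "2020-06-05"]
print(updates_per_weekday(sample))
print(quietest_days(sample))
```

The same counting works for months or quarters by swapping `weekday()` for the month number.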

Limitations

  • Semantic versioning only (not CI/CD)
  • Version dependency unknown
  • How to prioritize dependent updates?
"What dependent ecosystem components are affected by an update?"

— User feedback, 2023

Data Sources (release notes + CVE) ➡️ Data Lake (consolidate in data lake) ➡️ Data Model (ecosystem feed + graph)
  • Graph Node by Component
  • Component with Search Tags
  • Graph Edge by Matching Search Tags
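The rule "graph edge by matching search tags" can be sketched as connecting every pair of components that shares at least one tag. The tag sets below are made up, and `build_edges` is a hypothetical helper, not the prototype's code.

```python
from itertools import combinations


def build_edges(tags_by_component: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Connect every pair of components that shares at least one search tag."""
    edges = []
    for a, b in combinations(sorted(tags_by_component), 2):
        shared = tags_by_component[a] & tags_by_component[b]
        if shared:
            edges.append((a, b, shared))
    return edges


# Made-up tags for illustration.
components = {
    "android": {"android", "mobile", "java"},
    "android-studio": {"android", "ide", "java"},
    "mysql": {"database", "sql"},
}
print(build_edges(components))
```

Because an edge only means "shares a tag", it models an association rather than a true dependency.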
Data Lake Diagram

Results (2023)

Metric | Why
Recent updates | Helps track recent update bursts or regressions
Update type | Distinguishes patch (e.g., 1.2.3), minor (e.g., 1.2.0), and major (e.g., 1.0.0) updates
Count version | Measures churn and helps reduce risk of coinciding updates
Degree of node | Measures the number of associations for each component
  • List component nodes
  • List component edges by search tags
  • Color code types: Major, Minor, Patch
  • Count component version
  • Prototype: Review connected components
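The "Degree of node" metric is the number of edges touching a component. A minimal sketch, assuming an undirected edge list of component-name pairs (the edges shown are made up):

```python
from collections import Counter


def node_degrees(edges: list[tuple[str, str]]) -> Counter:
    """Count how many edges touch each component (undirected graph)."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Made-up edges for illustration.
edges = [("linux", "apache"), ("linux", "mysql"), ("apache", "php")]
print(node_degrees(edges).most_common())
```

Components with a high degree are the ones whose updates touch the most neighbors, so they are natural candidates to review first.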
Version Feed

Limitations

  • Release notes not technical enough
  • Model association instead of dependency
  • No formal graph model
  • No weighting of current metrics
  • How to better model dependent component types?
"What is the impact of a Linux OS update?"

— From component to component type, 2024

Data Sources (release notes + CVE) ➡️ Data Lake (consolidate and classify in data lake) ➡️ Data Model (ecosystem feed + weighted graph)
  • Release notes git commit message
  • CVE Description
  • How to know it is an operating system?
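One naive way to approach "how to know it is an operating system?" is to look for OS-related keywords in the release note or CVE description. The slides do not describe the classifier actually used. The keyword list and the `looks_like_os` helper below are assumptions for illustration only.

```python
# Hypothetical keyword list; a real classifier would need a curated vocabulary.
OS_KEYWORDS = {"kernel", "operating system", "distribution", "bootloader"}


def looks_like_os(text: str, threshold: int = 1) -> bool:
    """Return True if the text mentions at least `threshold` OS-related keywords."""
    lowered = text.lower()
    hits = sum(1 for kw in OS_KEYWORDS if kw in lowered)
    return hits >= threshold


print(looks_like_os("Fixes a kernel panic on boot"))
print(looks_like_os("Adds dark mode to the editor"))
```

Plain substring matching is easy to fool (for example, an application note that mentions the kernel), which is consistent with the later limitation that component type classification is not yet good enough.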
Data Lake Diagram
Version Feed

Results (2024)

Weighted Metric | Why
Recent updates | Helps track recent update bursts or regressions
Update type | Distinguishes patch (e.g., 1.2.3), minor (e.g., 1.2.0), and major (e.g., 1.0.0) updates
Count version | Measures churn and helps reduce risk of coinciding updates
Count edge | Measures dependency between components
Classify component | Measures churn and helps reduce risk of high-impact component updates
Classify version | Measures churn and helps reduce risk of security / breaking updates
Version Feed
  • List component nodes
  • List component edges by search tags
  • Color code types: Major, Minor, Patch
  • Count component version
  • Weighted metrics incl. component type
  • Prototype: Review weighted graph
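The weighted metrics could be combined into a single priority score along these lines. The slides do not give the weights or the formula, so all numbers and names below (`TYPE_WEIGHT`, `COMPONENT_WEIGHT`, `impact_score`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights; the slides do not state the actual values.
TYPE_WEIGHT = {"patch": 1.0, "minor": 2.0, "major": 3.0}
COMPONENT_WEIGHT = {"os": 3.0, "database": 2.0, "application": 1.0}


@dataclass
class ComponentMetrics:
    name: str
    update_type: str       # "patch", "minor", or "major"
    component_type: str    # e.g. "os"; unknown types get weight 1.0
    version_count: int     # versions seen in the observed period
    edge_count: int        # graph edges touching the component
    security_update: bool  # True if the version is classified as security-related


def impact_score(m: ComponentMetrics) -> float:
    """Combine the metrics into one number; higher means review first."""
    score = TYPE_WEIGHT[m.update_type] * COMPONENT_WEIGHT.get(m.component_type, 1.0)
    score += 0.1 * m.version_count + 0.5 * m.edge_count
    if m.security_update:
        score *= 2  # assumed: security updates double the priority
    return score


print(impact_score(ComponentMetrics("linux", "major", "os", 10, 4, True)))
```

Sorting components by such a score gives a review order for the weighted graph, with the weights acting as tunable knobs.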
Version Feed
"What is the maintenance cost of a software stack ecosystem?"
Data Lake Diagram
Version Feed

Limitations

  • Breaking updates not classified
  • Component type classification not good enough
  • Release notes and CVE descriptions are not good enough to identify breaking updates
  • How to classify breaking updates?
Data Sources (release notes + CVE + reddit posts) ➡️ Data Lake (consolidate and classify in data lake) ➡️ Data Model (ecosystem feed + weighted graph)

Results (Summer 2024)

Weighted Metric | Why
Recent updates | Helps track recent update bursts or regressions
Update type | Distinguishes patch (e.g., 1.2.3), minor (e.g., 1.2.0), and major (e.g., 1.0.0) updates
Count version | Measures churn and helps reduce risk of coinciding updates
Count edge | Measures dependency between components
Classify component | Measures churn and helps reduce risk of high-impact component updates
Classify version | Measures churn and helps reduce risk of security updates
Classify version | Measures churn and helps reduce risk of breaking updates
RQ6: Reddit Data Impact on Critical Issue Detection

Increased Detection

  • 5% breaking issues identified

Impact on Applications

  • Early identification improves mitigation
  • Absent from release notes/CVEs
  • Potential for better user experience
  • Reddit organizes content in topics
  • Reddit has an open API without costs
  • Reddit platform is very active
Data Lake Diagram

Limitations

  • Reddit posts are not verified
  • Reddit posts sometimes too generic (no version)
  • How to standardize the data model diagram?
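The limitation that Reddit posts are sometimes too generic (no version) can be made concrete with a small triage step that extracts version strings and flags posts that mention none. This is a sketch. The regular expression, the hint phrases, and the `triage_post` helper are assumptions, not the authors' classifier.

```python
import re

# Matches dotted versions such as "1.2" or "17.4.1"; bare numbers like "12" are ignored.
VERSION_RE = re.compile(r"\b\d+(?:\.\d+){1,2}\b")
# Hypothetical phrases that hint at a breaking update.
BREAKING_HINTS = ("stopped working", "broke", "crash", "not working")


def triage_post(text: str) -> dict:
    """Extract version strings and flag posts that hint at a breaking update."""
    lowered = text.lower()
    versions = VERSION_RE.findall(text)
    return {
        "versions": versions,
        "breaking_hint": any(h in lowered for h in BREAKING_HINTS),
        # Without a version, the post cannot be matched to a specific release.
        "too_generic": not versions,
    }


print(triage_post("After updating to 17.4.1 the app crashes on launch"))
print(triage_post("My phone stopped working after the update"))
```

Posts flagged as `too_generic` could be kept for trend analysis but excluded from per-version classification.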
Data Sources (release notes + CVE + reddit posts) ➡️ Data Lake (consolidate and classify in data lake) ➡️ Data Model (feed + weighted graph + UML Diagram)
"Can we generate a standard UML diagram from latest updates?"

— Architect question, 2025

Author(s) | Title | Summary | Data Focus Area | Year
Luján-Mora et al. | Data Mapping Diagrams for Data Warehouse Design with UML | Introduces attribute-level mapping diagrams in UML to document ETL processes | Data warehouse design, UML integration | 2005
Bali et al. | SoftArchViz | Automated software architecture visualization from source code and patterns | Dynamic software structure visualization | 2007
Narawita et al. | Automated UML Generation using NLP | Generates use case and class diagrams from requirement texts using NLP | Requirements analysis, design automation | 2021
Lyashenko et al. | Real-time Monitoring for Cyber-Physical Systems | Hardware/software integration for monitoring legacy production systems | Cyber-physical system visualization | 2023
Gamage et al. | Automated Architecture Diagram Generator | Uses NLP with BERT and knowledge graphs to automate diagram creation | Software architecture, NLP automation | 2023
Hamza et al. | Feature-driven Architecture Generation | Uses feature modeling to create configurable architectures for pervasive systems | Pervasive systems, feature modeling | 2012
Renovate | GitHub Dependency Update Tool | Tracks and automates updates in GitHub repositories | Repository-specific update tracking | 2019
Berhe et al. | Release Note Modeling and Impact-Driven Update Prioritization | Integrates release notes, CVEs, and Reddit data for multi-ecosystem update tracking and UML-based visualization | Standardization, software maintenance, security-aware automation | 2025
Version Feed

Results 2025

  • Tree structure with the OS as the root node
  • Software stack packages
  • Default peer components
  • Displays the latest version by default
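A tree with the OS as root node can be expressed as PlantUML text before it is handed to a renderer. The sketch below only generates the diagram source and does not call plantuml.js. The `to_plantuml` helper and the placeholder versions are assumptions for illustration.

```python
def to_plantuml(os_name: str, versions: dict[str, str]) -> str:
    """Render the OS as root node with one child per stack component."""

    def alias(name: str) -> str:
        # Keep aliases to simple identifiers so PlantUML can reference them.
        return "".join(ch if ch.isalnum() else "_" for ch in name)

    lines = ["@startuml"]
    for name, version in versions.items():
        lines.append(f'component "{name} {version}" as {alias(name)}')
    for name in versions:
        if name != os_name:
            lines.append(f"{alias(os_name)} --> {alias(name)}")
    lines.append("@enduml")
    return "\n".join(lines)


# Placeholder versions, not real release data.
stack = {"linux": "6.x", "apache": "2.x", "mysql": "8.x", "php": "8.x"}
print(to_plantuml("linux", stack))
```

Generating plain text first keeps the data model independent of the rendering library and makes the strict PlantUML formatting easier to test.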
Data Lake Diagram
Prototype
Hospital OS Ecosystem Diagram
https://releasetrain.io/?q=linux,apache,mysql,php
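The prototype URL carries the requested components as a comma-separated `q` parameter. How the site interprets that parameter is not documented in the slides. The sketch below simply assumes each comma-separated value names one component.

```python
from urllib.parse import urlparse, parse_qs


def components_from_url(url: str) -> list[str]:
    """Read the comma-separated `q` parameter as a list of component names."""
    query = parse_qs(urlparse(url).query)
    raw = query.get("q", [""])[0]
    return [c.strip() for c in raw.split(",") if c.strip()]


print(components_from_url("https://releasetrain.io/?q=linux,apache,mysql,php"))
```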
Results Overview
  • UML API: https://github.com/plantuml/plantuml.js
  • UML Customization: Enabled consistent, adaptable diagrams
  • Cross-Browser Support: Stable on Safari, Chrome, and Firefox
  • Impact Sorting: Grouping by update type improved clarity
  • Render Time: ~2 seconds per OS diagram with four components
  • Visual Cues: Styling (e.g., bold CVEs) highlighted priorities
Limitations
  • Limited to static, read-only OS component images; lacks interactivity
  • Does not support all UML properties (e.g., line breaks in component descriptions)
  • Poor scalability between single and multiple components
  • Requires browser cache clearing after updates to avoid inconsistencies
  • External API dependency may affect availability and responsiveness
  • PlantUML rendering requires strict formatting, increasing error potential
  • Limited error messages complicate debugging and slow down testing
  • User evaluation needed
Objective
  • Visualize current vs. latest version status
  • Support multiple OS-level components
  • Display metrics (recent, type, degree, etc.)
  • Display one row per OS component
  • Use color to indicate update types
  • Extend functionality from: github.com/plantuml/plantuml.js
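The objectives "one row per OS component", "current vs. latest version", and "color to indicate update types" could be combined in a single PlantUML line per component. The palette and the `component_row` helper are hypothetical, and the label avoids line breaks because the prototype reports that they are not supported in component descriptions.

```python
# Hypothetical palette; PlantUML accepts HTML color names after '#'.
COLORS = {"major": "#Salmon", "minor": "#Khaki", "patch": "#LightGreen"}


def component_row(name: str, current: str, latest: str, update_type: str) -> str:
    """Build one PlantUML component line showing current vs. latest version.

    Assumes `name` is already a simple identifier usable as a PlantUML alias.
    """
    color = COLORS.get(update_type, "")
    label = f"{name} (current {current}, latest {latest})"
    return f'component "{label}" as {name} {color}'.rstrip()


# Placeholder versions for illustration.
print(component_row("php", "8.a", "8.b", "minor"))
```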

Thank you