About arc42
arc42, the template for documentation of software and system architecture.
Template Version 8.2 EN. (based upon AsciiDoc version), January 2023
Created, maintained and © by Dr. Peter Hruschka, Dr. Gernot Starke and contributors. See https://arc42.org.
1. Introduction and Goals
YOVI is a web-based platform for playing Game Y — an abstract strategy board game where two players compete to connect all three sides of a triangular board with their pieces. The project is developed as part of the Software Architecture (ASW) course at the University of Oviedo, for the company Micrati.
The platform allows users to play against AI opponents with six difficulty levels, compete against other human players online via real-time multiplayer, interact with social features (friends, notifications), and track their performance through match history and statistics. An admin role provides privileged users with a management panel to oversee the platform. External bots can integrate with the system via a dedicated public interoperability API.
The system follows a microservices architecture composed of eight independent services: a React frontend (webapp),
an API gateway (gateway), a JWT authentication service (authentication), a user management service (users),
a Rust game engine and bot service (gamey), a real-time multiplayer service (multiplayer), a bot interoperability
API (botapi), and an Nginx reverse proxy. Observability is provided by Prometheus and Grafana.
Throughout development, we kept a clear record of our progress in the wiki of the GitHub project. Each section of this arc42 documentation links to the relevant wiki pages.
Full wiki: GitHub Wiki YOVI
Team meeting notes: Meetings
🎮 Live deployment: https://yovi.13.63.89.84.sslip.io
1.1. Requirements Overview
The YOVI system is a web-based gaming platform based on Game Y, developed for the company Micrati. The main goal is to allow users to play matches against the machine or against other players online, with support for registration, social features, administration, and statistics.
Key functional requirements include:
-
Public deployment and web accessibility via HTTPS.
-
Web application in React + TypeScript supporting the classic version of Game Y.
-
Rust game engine to validate game state and compute bot moves.
-
User registration and JWT-based authentication.
-
Role-based access control: regular users and administrators.
-
Match history and game statistics with pagination.
-
Six AI difficulty levels selectable by the user (see the Bot Implementations wiki page).
-
Real-time online multiplayer (Player vs Player) via WebSockets (Socket.IO).
-
Social features: friend requests, friend list management, and user search.
-
In-app notification system (friend requests, welcome messages).
-
User profile with editable fields (name, bio, location, preferred language).
-
Admin panel: privileged users can view all registered accounts, grant or revoke admin permissions, delete a user’s match history, and permanently delete accounts.
-
Public interoperability API for external bots using YEN notation.
-
Interoperability client mode: our bots can play against other teams' APIs.
-
Internationalization (i18n) — English and Spanish supported.
-
Monitoring with Prometheus metrics and Grafana dashboards.
-
Load testing suite with k6.
1.2. Quality Goals
The main quality attributes for the system architecture are:
| Priority | Quality Goal | Motivation |
|---|---|---|
| 1 | Functionality | The system must correctly implement Game Y rules and provide all six working AI strategies. The multiplayer mode must accurately synchronize game state between two players in real time. The admin panel must correctly enforce role-based access so that only admin users can perform privileged operations. |
| 2 | Usability | The React-based interface must be intuitive for new players. The component-based architecture supports internationalization (EN/ES), theme switching (dark/light), and responsive layout. |
| 3 | Reliability & Availability | Docker and Docker Compose keep deployments consistent. The gateway and Nginx are designed to be stateless and restartable; the multiplayer service is stateless with respect to game rules but holds room state in memory, so open rooms are lost when it restarts. Docker restart policies limit downtime in case of failure. |
| 4 | Modularity & Maintainability | The eight-service architecture ensures components can be developed, tested, and extended independently. The Strategy Pattern in |
| 5 | Security | JWT-based authentication protects user-specific operations. bcrypt hashing is used for passwords. Role claims in the JWT payload allow the gateway and frontend to enforce admin-only routes without additional service calls. The reverse proxy limits direct exposure of internal services. The gateway sanitizes and validates all path parameters. |
| 6 | Testability | The separation of concerns allows comprehensive testing: unit tests in Rust ( |
1.3. Stakeholders
| Role / Name | Contact | Expectations |
|---|---|---|
| Development Team | Ana Pérez Bango (UO294100), Adriana García Suárez (UO300042) | Implement a scalable, maintainable, and well-documented solution that fulfills all course requirements. |
| End Users (Human Players) | N/A | Usable, stable interface with complete gameplay, multiplayer, social features, and statistics. |
| Platform Administrators | Privileged users via the admin panel | Ability to manage user accounts, grant or revoke admin permissions, delete match histories, and remove accounts from the platform. |
| Micrati (Client) | N/A | Fulfillment of game, API, public deployment, and interoperability requirements. |
| Project Evaluators | Arquisoft course staff | Compliance with documentation (Arc42 + ADRs), testing, deployment, code quality (SonarCloud), and feature completeness criteria. |
| External Bot Developers | API users via | Well-documented, stable, and versioned API for automated bot integration using YEN notation. |
| Rival Teams (Interop) | Other ASW course teams | Stable bot interoperability API that allows bot-vs-bot matches between teams following the shared YEN contract. |
2. Architecture Constraints
This section lists the main constraints that shape the architecture of the YOVI system. Unlike requirements, constraints define how the system must be built or which technologies must be used. These are non-negotiable boundaries within which we must operate.
2.1. Technical Constraints
These constraints are imposed by the client (Micrati). They dictate the technology stack and communication protocols.
| Constraint | Explanation | Rationale | Negotiable? |
|---|---|---|---|
| Frontend Technology | The web application MUST be implemented in TypeScript. React is the chosen framework. | The client requires TypeScript for type safety and maintainability. | No (TypeScript mandatory) |
| Game Engine Language | The game logic module MUST be implemented in Rust. | Performance and memory safety requirements for game state validation and AI strategies. | No |
| Communication Protocol | All communication between the webapp and game engine MUST use JSON messages following YEN notation for game states. | Standardized format specified by the client for interoperability. | No |
| Public Interoperability API | The system MUST expose a documented public interoperability API ( | Core requirement from Micrati for third-party bot integration and cross-team competitions. | No |
| Real-time Multiplayer | The player-vs-player mode MUST use WebSockets (via Socket.IO) for bidirectional real-time communication. | HTTP polling is insufficient for the latency requirements of a turn-based game with live state sync. | No (WebSockets mandatory; Socket.IO library negotiable) |
| Deployment | The complete system MUST be publicly accessible via the Web over HTTPS. | The client needs to demonstrate the working product to evaluators and end users. | No |
| Containerization | All services MUST be containerized using Docker with a root-level | Ensures consistent deployment across environments. | Partial (format flexible) |
| Database | User data MUST be persisted. MongoDB is the current implementation. | Data persistence is mandatory; the specific database technology is negotiable. | Yes (technology choice) |
| Authentication | User operations MUST be secured using JWT-based authentication with bcrypt password hashing. | Ensures secure access to user-specific data and actions. Passwords must never be stored in plain text. | No |
| Role-based Access Control | The system MUST support at least two roles: regular user and administrator. Role information MUST be encoded in the JWT payload so the gateway and frontend can enforce access without additional service calls. | Required for the admin panel feature. Encoding roles in the token avoids an extra round-trip on every protected request. | Partial (additional roles could be added) |
| Load Testing | The system MUST include a load testing suite using k6 covering at minimum registration, login, and game start scenarios. | Required to validate that the system meets the concurrent user quality goals under realistic conditions. | Partial (additional scenarios encouraged) |
2.2. Domain Constraints
These constraints come from the game domain itself and the specific requirements of Game Y.
| Constraint | Explanation | Rationale | Negotiable? |
|---|---|---|---|
| Game Rules | The system MUST correctly implement the rules of Game Y (classic version minimum). | Core business requirement; incorrect game rules make the product worthless. | No |
| Game Modes | Both player-versus-machine (PvM) and player-versus-player (PvP) modes MUST be available. The PvP mode operates in real time via WebSockets. | Primary use cases for the platform. | No |
| Board Size | The game MUST support variable board sizes configurable by the user (minimum size 3, no enforced maximum). | Required for different difficulty levels and game variants. | No |
| AI Strategies | The computer MUST implement more than one strategy, selectable by the user. Six strategies are currently implemented. | Demonstrates AI sophistication and provides varied gameplay experience. | No (minimum one strategy mandatory; more encouraged) |
| User Data | Users MUST be able to register and access their match history and statistics. | Basic user management requirement. | No |
| Admin Capabilities | Administrator users MUST be able to: view all registered accounts, grant or revoke admin permissions, delete a user’s match history, and permanently delete user accounts. | Required for platform moderation and management. | No |
2.3. Organizational Constraints
These constraints govern our development process and team practices.
| Constraint | Explanation | Rationale | Negotiable? |
|---|---|---|---|
| Documentation Standard | Architecture MUST be documented following the Arc42 template (sections 1-15, including new sections for Load Testing, API Reference, and Monitoring). | Course requirement for the ASW lab. | No |
| Decision Recording | Architectural decisions MUST be recorded as ADRs (Architecture Decision Records) in the GitHub wiki. | Ensures traceability and rationale documentation. | No |
| Version Control | Development MUST use Git with the repository hosted on GitHub ( | Course requirement for collaboration and evaluation. | No |
| Branching Strategy | A branch-based workflow MUST be followed (feature branches, pull requests, code reviews). | Ensures code quality and team coordination. | Partial |
| Testing Requirements | The system MUST include unit tests, integration tests, end-to-end tests, and load tests. | Quality assurance requirement from course evaluation. | No |
| Test Coverage | Code coverage MUST meet thresholds defined in | Quality gate for acceptance. | No |
| CI/CD | Automated build, test, and deployment MUST be implemented via GitHub Actions (see the CI/CD Pipeline wiki page). | Ensures reproducible builds and deployment automation. | No |
| Issue Tracking | All tasks MUST be tracked using GitHub Issues. | Project management and traceability requirement. | No |
| Code Quality | Code quality MUST be monitored via SonarCloud. Critical issues block merges. | Required by course evaluation criteria. | No |
2.4. External Constraints
These constraints come from external stakeholders and the operating environment.
| Constraint | Explanation | Rationale | Negotiable? |
|---|---|---|---|
| Bot Integration | External bots can ONLY interact through the public interoperability API ( | Security and encapsulation requirement. | No |
| Interoperability Contract | The | Enables bot-vs-bot matches between teams from different universities. | No |
| API Documentation | The public interoperability API MUST be documented via OpenAPI (YAML) at | Enables external developers to build compatible bots without reading source code. | Partial (format negotiable) |
| YEN Notation Compliance | Any system claiming to support Game Y MUST use YEN notation for game state representation. | Industry standard for this game family. | No |
| Public Accessibility | The deployed system MUST be accessible without special network configuration over HTTPS. | End users and evaluators need to access the platform easily. | No |
2.5. Quality Constraints
These constraints define the minimum acceptable quality levels the architecture must guarantee.
| Constraint | Target | Measurement | Negotiable? |
|---|---|---|---|
| Response Time (bot move) | Game engine must respond within 2 seconds for standard board sizes (≤ 11x11). | k6 load test | Partial |
| Response Time (login) | Login endpoint must respond within 1.5 seconds at 50 concurrent users. | k6 load test | Partial |
| Response Time (registration) | Registration endpoint must respond within 2 seconds at 50 concurrent users. | k6 load test | Partial |
| Concurrent Users | Support at least 50 concurrent users on auth endpoints without degradation. | k6 ramping VU scenarios (register and login scripts). | Yes (scale up with resources) |
| Error Rate | HTTP error rate MUST remain below 5% under load test conditions. | k6 threshold | Partial |
| Availability | System must achieve 99% uptime during the evaluation period. | Monitoring via Prometheus and Grafana dashboards. | Partial |
| Test Coverage | Code coverage MUST remain above 80% as enforced by SonarCloud. | Automated coverage reports in CI pipeline. | No |
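The login and error-rate targets above could be expressed as k6 thresholds roughly as follows. This is a minimal sketch, not the project's actual script: the `p(95)` percentile, the stage durations, and the constant names are assumptions, since the table only states the time limits, the 50-user load, and the 5% error ceiling. `http_req_duration` and `http_req_failed` are k6's built-in metrics.

```typescript
// Targets taken from the quality constraints table.
const LOGIN_LIMIT_MS = 1500;   // login must answer within 1.5 s
const MAX_ERROR_RATE = 0.05;   // HTTP error rate must stay below 5%
const CONCURRENT_USERS = 50;   // virtual users on the auth endpoints

// In a real k6 script this object would be exported as `options`.
const options = {
  // Ramp-up, hold, ramp-down. The durations are assumptions.
  stages: [
    { duration: "30s", target: CONCURRENT_USERS },
    { duration: "1m", target: CONCURRENT_USERS },
    { duration: "15s", target: 0 },
  ],
  thresholds: {
    // Assumption: the limit is checked at the 95th percentile.
    http_req_duration: [`p(95)<${LOGIN_LIMIT_MS}`],
    // The run fails when 5% or more of the requests fail.
    http_req_failed: [`rate<${MAX_ERROR_RATE}`],
  },
};
```

When a threshold is crossed, k6 exits with a non-zero status, which is what lets a CI job fail on a performance regression.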
2.6. Implications of These Constraints
These constraints collectively shape our architecture in specific ways:
-
Eight-service structure: The technology constraints (TypeScript + Rust + WebSockets) force a separation between frontend (webapp), game engine (gamey), real-time multiplayer (multiplayer), and the remaining Node.js services. The admin role requirement adds access control logic at both the gateway and users service levels.
-
JWT with role claims: The role-based access control constraint means the JWT payload must include a role field (e.g., "admin" or "user") so that the gateway can enforce admin-only endpoints without querying the users service on every request.
-
Reverse proxy and routing: Public access is centralized through an Nginx reverse proxy, which handles HTTPS termination and routes requests to webapp, gateway, botapi, and multiplayer via Socket.IO.
-
Socket.IO for multiplayer: The real-time constraint forces a stateful WebSocket server (multiplayer service) alongside the stateless REST services. This service maintains room state in memory and must be considered separately in scaling and failure scenarios.
-
Docker standardization: The containerization constraint ensures consistent development and deployment environments across all eight services plus the monitoring stack (Prometheus + Grafana).
-
Load testing as a first-class concern: The k6 constraint means performance is validated automatically, not just observed. Thresholds are defined in the test scripts and enforced in CI.
-
Documentation overhead: Arc42, ADRs, OpenAPI, and load test documentation require dedicated time alongside development, but provide traceability and evaluability for the course.
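The role-claim implication listed above can be illustrated with a small authorization check. This is a sketch under assumptions: it presumes the token signature has already been verified elsewhere, and the `TokenPayload` shape (`sub`, `exp`), the `authorizeAdmin` name, and the status codes are illustrative rather than taken from the YOVI code. Only the `role` claim with the values "admin" and "user" comes from this document.

```typescript
type Role = "user" | "admin";

// Hypothetical payload of an already-verified JWT.
interface TokenPayload {
  sub: string;  // user id
  role: Role;   // role claim described in this section
  exp: number;  // expiry, in seconds since the epoch
}

type AuthResult =
  | { ok: true }
  | { ok: false; status: 401 | 403; reason: string };

// Decides whether a request may reach an admin-only endpoint
// without calling the users service.
function authorizeAdmin(payload: TokenPayload | null, nowSeconds: number): AuthResult {
  if (payload === null) {
    return { ok: false, status: 401, reason: "missing or invalid token" };
  }
  if (payload.exp <= nowSeconds) {
    return { ok: false, status: 401, reason: "token expired" };
  }
  if (payload.role !== "admin") {
    return { ok: false, status: 403, reason: "admin role required" };
  }
  return { ok: true };
}
```

One consequence of this design is that a role change only takes effect once the user obtains a new token, because the decision relies on the claim alone.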
3. Context and Scope
This section delimits the YOVI system from its environment. Following arc42 guidelines, we separate business context (what the system does, from a domain perspective) from technical context (how it communicates, from an infrastructure perspective).
3.1. Business Context
The YOVI system is a web-based gaming platform for Game Y. From a business perspective, it provides three distinct value propositions:
-
For human players: An intuitive web interface to play Game Y against AI opponents or other human players in real time, register an account, manage a profile, interact socially with other users, and track performance.
-
For platform administrators: A privileged management panel to oversee registered accounts, grant or revoke admin permissions, delete match histories, and remove user accounts from the platform.
-
For bot developers: A public interoperability API (botapi) that allows automated bots to interact with the game engine, request moves, create and manage game sessions, and participate in cross-team bot-vs-bot competitions.
The system depends on no external business systems — it is self-contained in terms of domain functionality. All game logic, user management, authentication, multiplayer coordination, and AI strategies are implemented within the YOVI system boundaries.
3.1.1. Business Context Diagram
3.1.2. Business Interfaces
| Interface | Description | Input | Output | Communication Partner |
|---|---|---|---|---|
| Game Play UI (PvM) | Web UI for human players to play against AI | Mouse clicks on board cells | Visual board, updated YEN state | Human Player |
| Game Play UI (PvP) | Real-time web UI for human vs human matches | Mouse clicks; room code to join | Live board updates via WebSocket events | Human Player |
| User Management | Registration and login interface | Username, password, optional email | Session JWT token, user profile | Human Player |
| Profile & Statistics | Access to user profile and match history | User identity (JWT) | Profile fields, win/loss records, game history | Human Player |
| Social Features | Friend requests, friend list, user search | Username queries; friend request actions | Friend list, notifications, search results | Human Player |
| Notification System | In-app notifications for events | System-generated triggers (friend request, welcome) | Unread badge, notification list | Human Player |
| Admin Panel | Management UI for privileged users | Admin JWT; user selection; action selection | Updated user list; confirmation of action taken | Administrator |
| Bot API - Game Session | Programmatic interface for bots to create and play games | Board state (YEN), bot ID, move coordinates | Updated YEN state, game status | External Bot |
| Bot API - Stateless Move | Request a single bot move without session state | YEN position, optional bot ID | Move coordinates | External Bot |
| Bot API - Remote Interop | Client-mode sessions against other teams' APIs | Remote API base URL, game ID, local bot ID | Session state, move result, action taken | External Bot / Rival Team API |
3.2. Technical Context
From a technical perspective, the YOVI system is implemented as multiple independent services communicating via HTTP/REST and WebSockets. Public access is handled through an Nginx reverse proxy, which routes external requests to the appropriate services. This section describes the technical channels and protocols that enable the business interfaces defined above.
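The path-based routing performed by the reverse proxy can be pictured as a prefix lookup. The sketch below is illustrative only: the `/api/` and `/botapi/` prefixes and the `resolveUpstream` name are assumptions, while `/socket.io/` is Socket.IO's default path. The real rules live in the Nginx configuration, not in TypeScript.

```typescript
type Upstream = "webapp" | "gateway" | "multiplayer" | "botapi";

// Ordered prefix table. Everything except "/socket.io/" is a guess at the layout.
const ROUTES: ReadonlyArray<readonly [string, Upstream]> = [
  ["/socket.io/", "multiplayer"], // WebSocket upgrade for PvP rooms
  ["/api/", "gateway"],           // REST calls from the web application
  ["/botapi/", "botapi"],         // public interoperability API
];

// Returns the internal service that should receive a public request path.
function resolveUpstream(path: string): Upstream {
  for (const [prefix, upstream] of ROUTES) {
    if (path.startsWith(prefix)) {
      return upstream;
    }
  }
  return "webapp"; // any other path serves the React application
}
```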
3.2.1. Technical Context Diagram
3.2.2. Technical Interfaces and Channels
| Business Interface | Technical Channel | Protocol | Data Format |
|---|---|---|---|
| Game Play UI (PvM) | Browser → Nginx → Webapp; API calls via Gateway → Gamey | HTTPS / HTTP | HTML/CSS/JS + JSON (YEN) |
| Game Play UI (PvP) | Browser → Nginx → Multiplayer (WebSocket upgrade); Multiplayer → Gamey | HTTPS → WSS (Socket.IO) | JSON events (room state, YEN, move result) |
| User Registration | Browser → Gateway → Auth → Users | HTTPS → HTTP → HTTP | JSON |
| User Login | Browser → Gateway → Auth → Users | HTTPS → HTTP → HTTP | JSON + JWT |
| Token Verification | Browser → Gateway → Auth | HTTPS → HTTP | JSON (JWT) |
| Profile & Statistics | Browser → Gateway → Users | HTTPS → HTTP | JSON |
| Social Features | Browser → Gateway → Users | HTTPS → HTTP | JSON |
| Notification System | Browser → Gateway → Users | HTTPS → HTTP | JSON |
| Admin Panel | Browser → Gateway → Users (admin-only endpoints) | HTTPS → HTTP | JSON (role enforced via JWT claim) |
| Bot API - Game Session | Bot → Nginx → BotAPI → Gamey | HTTPS → HTTP → HTTP | JSON (YEN) |
| Bot API - Stateless Move | Bot → Nginx → BotAPI → Gamey | HTTPS → HTTP → HTTP | JSON (YEN) |
| Bot API - Remote Interop | BotAPI → Rival Team API (outbound HTTP client) | HTTP | JSON (YEN, shared cross-team contract) |
| Game Logic Execution | Gateway → Gamey; Multiplayer → Gamey | HTTP (internal Docker network) | JSON (YEN) |
| Database Access | Users Service → MongoDB | MongoDB Wire Protocol | BSON |
| Metrics Collection | Prometheus → Gateway, Users, Gamey | HTTP (scrape | Prometheus text format |
| Metrics Visualization | Grafana → Prometheus | HTTP (PromQL queries) | JSON (time series) |
3.2.3. Technology Stack per Component
| Component | Technology | Justification |
|---|---|---|
| Web Frontend (webapp) | React + TypeScript + Vite | Client requirement (TypeScript); component reusability; custom i18n; theme switching (dark/light) |
| API Gateway (gateway) | Node.js + Express 5 + express-prom-bundle | Single entry point for webapp traffic; centralized CORS, input sanitization, and Prometheus metrics |
| Authentication Service (authentication) | Node.js + Express 5 + jsonwebtoken + bcryptjs | Isolated JWT logic with single responsibility; bcrypt for secure password hashing |
| User Service (users) | Node.js + Express 5 + Mongoose + express-prom-bundle | User data persistence, friend graph, notifications, game results, admin operations; Prometheus metrics |
| Game Engine (gamey) | Rust + Axum + Tokio + axum-prometheus | Mandated by client; memory safety and performance for game logic and six AI strategies |
| Multiplayer Service (multiplayer) | Node.js + Express + Socket.IO + axios | Real-time bidirectional WebSocket communication for PvP rooms; delegates game rules to Gamey |
| Bot API (botapi) | Node.js + Express + TypeScript | Public interoperability API for external bots; client-mode sessions against rival APIs |
| Reverse Proxy (nginx) | Nginx stable-alpine | HTTPS termination, HTTP→HTTPS redirect, path-based routing to all internal services |
| Database | MongoDB (via Mongoose ODM) | Flexible schema for user data; aggregation pipelines for ranking and statistics |
| Monitoring | Prometheus + Grafana | Metrics scraping from three services; pre-built dashboard ("Yovi Services Overview") |
| Containerization | Docker + Docker Compose | Consistent dev/test/prod environments across all eight services |
| CI/CD | GitHub Actions + GitHub Container Registry | Automated build, test, publish, and deploy pipeline on every release tag |
3.3. Scope
3.3.1. In Scope (Implemented)
-
Classic Game Y implementation with correct rules (move validation, win condition: connect all 3 sides)
-
Player-versus-machine (PvM) game mode with variable board size and six AI difficulty levels
-
Player-versus-player (PvP) real-time multiplayer via Socket.IO rooms with unique room codes
-
User registration and JWT-based authentication with bcrypt password hashing
-
Role-based access control: regular users and administrators
-
Admin panel: view all users, grant/revoke admin role, delete match history, delete accounts
-
User profiles with editable fields (real name, bio, city, country, preferred language)
-
Match history and game statistics stored in MongoDB
-
Top-10 ranking by wins via MongoDB aggregation
-
Social features: friend requests, friend list management, bidirectional unfriend
-
In-app notification system (friend requests, welcome messages) with mark-as-read
-
User search by username or real name
-
Hint system: AI suggests a move using alfa_beta_bot via POST /hint
-
Public interoperability API (botapi) for external bots using YEN notation
-
Remote interop client mode: our bots play against other teams' APIs
-
Internationalization (EN/ES) via custom i18n module with localStorage persistence
-
Dark/light theme switching with localStorage persistence
-
Prometheus metrics on gateway, users, and gamey services
-
Grafana dashboard ("Yovi Services Overview") with request rate, p95 latency, and error rate panels
-
Load testing suite with k6 (register, login, start_game scenarios)
-
Swagger UI for users service OpenAPI spec at /api-docs
-
Docker Compose for local development and production deployment
-
GitHub Actions CI/CD pipeline with automated test, build, publish, and deploy stages
-
SonarCloud code quality integration
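The top-10 ranking listed above could be computed with a pipeline of three standard MongoDB stages (`$group`, `$sort`, `$limit`). The sketch below is an assumption about how such a ranking might look, not the project's query: the `GameResult` shape, the `winner` field, and the tie-break on `_id` are hypothetical. The in-memory function mirrors the pipeline so the expected output can be checked without a database.

```typescript
// Hypothetical shape of one stored match result.
interface GameResult {
  winner: string;
  loser: string;
}

// Aggregation stages as plain objects, as they would be passed to aggregate().
const top10Pipeline = [
  { $group: { _id: "$winner", wins: { $sum: 1 } } }, // count wins per player
  { $sort: { wins: -1, _id: 1 } },                   // most wins first, name breaks ties
  { $limit: 10 },                                    // keep the top ten
];

// In-memory equivalent of the pipeline above.
function top10ByWins(results: GameResult[]): Array<{ _id: string; wins: number }> {
  const wins = new Map<string, number>();
  results.forEach((r) => wins.set(r.winner, (wins.get(r.winner) ?? 0) + 1));
  return Array.from(wins.entries())
    .map(([player, count]) => ({ _id: player, wins: count }))
    .sort((a, b) => b.wins - a.wins || (a._id < b._id ? -1 : a._id > b._id ? 1 : 0))
    .slice(0, 10);
}
```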
3.3.2. Out of Scope
-
Game variants (Poly-Y, Hex, Tabu Y) — not implemented
-
Mobile native applications (iOS / Android)
-
OAuth / social login (Google, GitHub)
-
Persistent multiplayer room state (rooms are in-memory; lost on service restart)
-
Advanced analytics or machine learning for AI
-
Real-time admin notifications (admin panel requires manual refresh)
4. Solution Strategy
This section summarizes the fundamental decisions that shape the YOVI system architecture. Each decision is motivated by specific constraints (from section 2) and quality goals (from section 1), and forms the foundation for detailed design decisions in later sections.
4.1. Technology Decisions
| Decision | Rationale | Constraints / Goals Addressed | Alternatives Considered |
|---|---|---|---|
| Frontend: React + TypeScript + Vite | React’s component model enables UI reuse, i18n support, and theme switching. TypeScript is mandatory per client. Vite provides fast dev server and optimized production builds. | Technical Constraint: TypeScript; Quality: Usability, Maintainability | Vue, Angular (rejected: team familiarity) |
| Authentication Service: Node.js/Express + JWT + bcryptjs | Isolated JWT logic with single responsibility improves testability. bcrypt ensures passwords are never stored in plain text. JWT role claims allow stateless admin authorization. | Quality: Security, Testability; Constraint: RBAC, bcrypt | Auth inside users service (rejected: violates SRP) |
| User Service: Node.js/Express + Mongoose | Lightweight, integrates well with MongoDB. Handles user data, friends graph, notifications, game results, ranking, search, and admin operations in a single cohesive service. | Quality: Development speed, Testability | Python/Flask, Java/Spring (rejected: heavier) |
| Game Engine: Rust + Axum + Tokio | Mandated by client. Memory safety and performance for game logic and six AI strategies. Axum provides ergonomic async HTTP with type-safe extractors and shared state via | Technical Constraint: Rust; Quality: Performance, Reliability | Actix-web (rejected: more complex); C++, Go (not allowed) |
| Multiplayer Service: Node.js + Socket.IO | Socket.IO provides battle-tested WebSocket abstraction with room management, reconnection, and fallback transport. Delegates game rules to Gamey via HTTP, keeping itself stateless regarding game logic. | Technical Constraint: WebSockets; Quality: Reliability, Functionality | Raw WebSocket (rejected: no room management); SSE (rejected: unidirectional) |
| Bot API: Node.js + Express + TypeScript | Dedicated public API in TypeScript for type safety. Supports both server mode (external bots play our bots) and client mode (our bots play rival APIs). In-memory session store is sufficient given stateless game rules. | Technical Constraint: Public Interop API; Quality: Interoperability, Testability | Rust-based (rejected: slower iteration); Python (rejected: team expertise) |
| Reverse Proxy: Nginx | Centralizes public access, HTTPS termination, HTTP→HTTPS redirect, and path-based routing to all four public-facing services (webapp, gateway, multiplayer, botapi). | Quality: Security, Deployability, Availability | Traefik (rejected: more complex config); no proxy (rejected: insecure) |
| Database: MongoDB via Mongoose | Schema flexibility for evolving user data. Aggregation pipelines support ranking and statistics. Mongoose ODM provides schema validation, virtuals, and middleware hooks. | Quality: Development speed, Deployability | PostgreSQL (rejected: schema changes slower) |
| Monitoring: Prometheus + Grafana | Prometheus scrapes metrics from gateway, users, and gamey. Grafana provides a pre-built dashboard with request rate, p95 latency, and error rate panels for all three services. | Quality: Observability, Availability | Datadog (rejected: paid); custom logging only (rejected: no time-series) |
| Load Testing: k6 | k6 provides JavaScript-based load test scripts with custom metrics (Trend, Rate), threshold enforcement, and JSON result export. Three scenarios cover registration, login, and game start. | Technical Constraint: Load Testing; Quality: Performance | Locust (rejected: Python only); JMeter (rejected: XML config, heavy) |
| Containerization: Docker + Docker Compose | Ensures consistent environments across dev/test/prod. Required for deployment. GitHub Container Registry stores published images per service. | Technical Constraint: Containerization; Quality: Deployability | Kubernetes (overkill); manual deployment (rejected: inconsistent) |
| CI/CD: GitHub Actions | Integrated with repository. Automates test, build, publish to GHCR, and deploy on every release tag. | Organizational Constraint: CI/CD; Quality: Testability | Jenkins (rejected: separate infrastructure) |
4.2. Top-Level Decomposition
Decision: Eight-service microservices architecture with dedicated components for each responsibility, exposed through an Nginx reverse proxy.
Reasons:
-
The technology constraints (TypeScript + Rust + WebSockets) force separation — they cannot run in the same process
-
Real-time multiplayer requires a stateful WebSocket server that is architecturally distinct from the stateless REST services
-
Separating authentication from user data management improves testability and single responsibility
-
The public bot interoperability API must be independently deployable and versioned
-
Nginx provides the single public entry point, handling HTTPS termination and routing
How it maps to quality goals:
-
Maintainability: Services can be modified independently
-
Testability: Each service is tested in isolation with its own test suite
-
Deployability: Components are published as individual Docker images to GHCR
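The stateful nature of the multiplayer service comes from the rooms it keeps in memory. A minimal sketch of such a room registry follows. It is framework-free and hypothetical throughout: the `RoomStore` class, the six-character code format, and the two-player limit as enforced here are assumptions, and the real service would tie these operations to Socket.IO events.

```typescript
interface Room {
  code: string;
  players: string[]; // player ids, at most two per room
}

// Six characters from an alphabet without look-alike symbols (format is an assumption).
function randomCode(): string {
  const alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
  let code = "";
  for (let i = 0; i < 6; i++) {
    code += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return code;
}

// In-memory registry: its contents disappear when the process restarts.
class RoomStore {
  private readonly rooms = new Map<string, Room>();
  private readonly generateCode: () => string;

  constructor(generateCode: () => string = randomCode) {
    this.generateCode = generateCode;
  }

  // Creates a room with a code that is not in use yet.
  create(hostId: string): Room {
    let code = this.generateCode();
    while (this.rooms.has(code)) {
      code = this.generateCode(); // retry on collision
    }
    const room: Room = { code, players: [hostId] };
    this.rooms.set(code, room);
    return room;
  }

  // Adds a second player, rejecting unknown codes and full rooms.
  join(code: string, playerId: string): Room {
    const room = this.rooms.get(code);
    if (!room) throw new Error("room not found");
    if (room.players.length >= 2) throw new Error("room is full");
    room.players.push(playerId);
    return room;
  }
}
```

Because the map lives in one process, running several instances of the service would require sticky routing or a shared store, which is why this service is treated separately in scaling and failure scenarios.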
4.3. Design Patterns
| Pattern | Application | Rationale | Location |
|---|---|---|---|
| Strategy Pattern | Bot AI implementation in | Six strategies implement | |
| Registry Pattern | | Allows dynamic bot selection by name at runtime. Used by both the game server handlers and the CLI. | |
| Gateway / Router Pattern | Nginx + gateway + botapi routing | Public traffic routes through Nginx; internal request flows are separated for web and bot clients. | |
| Middleware Pattern | JWT verification in auth service; Prometheus in gateway, users, gamey | | |
| Observer / Event Pattern | Socket.IO events in multiplayer service | Room state changes emit | |
| Repository Pattern | Mongoose models in users service | | |
| React Context Pattern | | Global state (theme, language) shared across all components without prop drilling. | |
| Union-Find (Disjoint Set) | Win condition detection in game core | Tracks connected components of each player’s pieces. Checks if any component touches all three sides. Used in | |
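The Union-Find row above can be made concrete with a short sketch. The engine itself is written in Rust; this TypeScript version only illustrates the technique and rests on assumptions: a triangular board where row `r` holds `r + 1` cells, sides defined as `c === 0`, `c === r`, and the last row, and a hypothetical `PlayerGroups` class that tracks one player's stones. Each group carries a bitmask of the sides it touches, and a move wins when the merged group's mask covers all three.

```typescript
const LEFT = 1;
const RIGHT = 2;
const BOTTOM = 4;
const ALL_SIDES = LEFT | RIGHT | BOTTOM;

// Tracks the connected groups of ONE player's stones.
class PlayerGroups {
  private readonly size: number;
  private readonly parent = new Map<number, number>(); // cell id -> parent id
  private readonly sides = new Map<number, number>();  // root id -> side bitmask

  constructor(size: number) {
    this.size = size;
  }

  // Row r starts after 1 + 2 + ... + r cells.
  private id(r: number, c: number): number {
    return (r * (r + 1)) / 2 + c;
  }

  private find(cell: number): number {
    let root = cell;
    while (this.parent.get(root) !== root) {
      root = this.parent.get(root) as number;
    }
    this.parent.set(cell, root); // shorten the path for the next lookup
    return root;
  }

  private union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra === rb) return;
    this.parent.set(rb, ra);
    this.sides.set(ra, (this.sides.get(ra) ?? 0) | (this.sides.get(rb) ?? 0));
  }

  // Places a stone and reports whether its group now touches all three sides.
  place(r: number, c: number): boolean {
    const me = this.id(r, c);
    if (this.parent.has(me)) throw new Error("cell already occupied");
    let mask = 0;
    if (c === 0) mask |= LEFT;
    if (c === r) mask |= RIGHT;
    if (r === this.size - 1) mask |= BOTTOM;
    this.parent.set(me, me);
    this.sides.set(me, mask);

    // The six neighbours of a cell on a triangular hex grid.
    const neighbours = [
      [r, c - 1], [r, c + 1],
      [r - 1, c - 1], [r - 1, c],
      [r + 1, c], [r + 1, c + 1],
    ];
    for (const [nr, nc] of neighbours) {
      if (nr < 0 || nr >= this.size || nc < 0 || nc > nr) continue; // off the board
      const other = this.id(nr, nc);
      if (this.parent.has(other)) this.union(me, other); // own stone: merge groups
    }
    return this.sides.get(this.find(me)) === ALL_SIDES;
  }
}
```

On a size-3 board, stones at (2,0), (1,1), and then (2,1) form one group touching the left, right, and bottom sides, so the third `place` call returns `true`.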
4.4. Quality Goal Realization
| Quality Goal | How We Achieve It | Key Decisions | Verification |
|---|---|---|---|
| Functionality | Rust engine with Union-Find win detection; six strategies in | Rust for game logic; Strategy Pattern; Socket.IO for multiplayer | Unit tests in |
| Usability | React SPA with responsive UI; custom i18n (EN/ES); dark/light theme; hint system | React frontend; | E2E tests (Playwright); language toggle tested; theme persistence tested |
| Modularity & Maintainability | Eight services with single responsibilities; Strategy Pattern; OpenAPI spec for botapi | Eight-service architecture; Strategy Pattern; TypeScript in botapi | Independent deployment possible; new AI strategy = new struct + registration |
| Reliability & Availability | Docker restart policies; stateless gateway and auth; Prometheus alerts | Containerization; stateless design; monitoring stack | |
| Security | bcrypt hashing; JWT with role claims; input sanitization in gateway; Nginx as public perimeter | bcrypt in authentication service; RBAC in JWT; gateway path validators | Auth tests; admin route protection tests; SonarCloud security hotspot review |
| Testability | Separation of concerns; Vitest/Jest in Node; | Service separation; Middleware Pattern; in-memory test DBs ( | Unit: |
| Interoperability | REST API with YEN notation; OpenAPI doc; client mode for rival APIs; versioned endpoints ( | Public | API tests verify YEN round-trips; cross-team integration tested during course sessions |
4.5. 5. Organizational and Process Decisions
| Decision | Rationale | Impact | Constraint Addressed |
|---|---|---|---|
| Iterative Development (sprints) | Early validation of gameplay; adapt to feedback | Regular demos; continuous integration | Course timeline; quality goals |
| Kanban Board (GitHub Projects) | Visualize work; identify bottlenecks | Issues tracked; clear priorities | Organizational: issue tracking |
| Code Reviews via Pull Requests | Catch issues early; share knowledge | All PRs require review; standards enforced | Quality: maintainability, reliability |
| Definition of Done | Feature = code + tests + docs + reviewed | Ensures completeness before merge | Quality: testability, maintainability |
| Architecture Decision Records (ADRs) | Document why decisions were made | GitHub wiki with numbered ADRs | Organizational: documentation standard |
| SonarCloud Quality Gate | Automated code quality enforcement | PRs blocked if coverage drops below 80% or critical issues found | Organizational: code quality |
4.6. 6. Key Architectural Decisions Summary
| ADR | Summary | Status |
|---|---|---|
| ADR-001 | Extensible game mode architecture starting with PvM; PvP added as real-time multiplayer | Implemented ✅ |
| ADR-002 | MongoDB for user data: schema flexibility, aggregation for ranking | Implemented ✅ |
| ADR-003 | Eight-service microservices architecture (updated from original three-service) | Implemented ✅ (updated) |
| ADR-004 | Nginx as public reverse proxy with WebSocket upgrade support for Socket.IO | Implemented ✅ (updated) |
| ADR-005 | Strategy Pattern for AI: six bot strategies as pluggable | Implemented ✅ (updated) |
| ADR-006 | Dedicated authentication microservice with bcrypt and JWT role claims | Implemented ✅ |
| ADR-007 | Axum as HTTP framework for Rust game engine | Implemented ✅ |
| ADR-008 | Custom i18n in frontend: zero dependencies, EN/ES, | Implemented ✅ |
| ADR-009 | Socket.IO for real-time multiplayer: dedicated service, in-memory rooms, delegates rules to Gamey | Implemented ✅ |
| ADR-010 | 400ms debounce on social search (vs. throttle and search-on-submit) | Implemented ✅ |
| ADR-011 | React Context for theme and i18n (vs. Redux/Zustand) | Implemented ✅ |
| ADR-012 | Unambiguous alphabet for room codes: excludes O/0/I/1 to minimise transcription errors | Implemented ✅ |
| ADR-013 | | Implemented ✅ |
| ADR-014 | Test DB isolation via | Implemented ✅ |
| ADR-015 | Express 5 in REST services, Express 4 in multiplayer (Socket.IO compatibility constraint) | Implemented ✅ |
| ADR-016 | In-memory state for BotAPI and multiplayer (conscious decision vs. Redis/MongoDB) | Implemented ✅ |
4.7. 7. Traceability to Constraints
| Constraint (Section 2) | How Solution Strategy Addresses It |
|---|---|
| TypeScript frontend | React + TypeScript in |
| Rust game engine | |
| JSON + YEN communication | REST APIs with YEN validation in |
| Real-time multiplayer (WebSockets) | |
| Public API for bots | |
| Docker containerization | Each service has a |
| User data persistence | MongoDB in |
| JWT authentication + bcrypt | Dedicated |
| Role-based access control | |
| Admin capabilities | Admin endpoints in users service: list all users, update role, delete history, delete account |
| Multiple AI strategies | Strategy Pattern in |
| Load testing | k6 scripts in |
| Monitoring | Prometheus scrapes gateway, users, gamey; Grafana dashboard provisioned automatically |
| Testing requirements | Unit, integration, E2E, property-based, and load tests across all services |
| CI/CD | GitHub Actions: test → build → publish to GHCR → deploy on release |
| Documentation (Arc42 + ADRs) | This document (sections 1-15) + ADRs 001-016 in GitHub wiki |
| Internationalization | Custom i18n module in |
5. Building Block View
This section describes the static decomposition of the YOVI system into building blocks. Following arc42 guidelines, we present a hierarchical view:
- Level 1: White box description of the overall system with black box descriptions of each top-level building block.
- Level 2: White box descriptions of selected building blocks showing their internal structure.
5.1. Level 1: Overall System White Box
The YOVI system is decomposed into eight top-level building blocks plus a database and a monitoring stack, all orchestrated via Docker Compose and exposed through Nginx.
5.1.1. Black Box Descriptions — Level 1
Nginx Reverse Proxy
- Purpose: Single public entry point. Handles HTTPS termination, HTTP→HTTPS redirect, and path-based routing to internal services. TLS certificates are mounted from nginx/certs/.
- Provided interfaces: HTTP on port 80 (redirect to HTTPS only) and HTTPS on port 443
- Routing rules:
  - / → webapp:80
  - /api/* → gateway:8080
  - /socket.io/* → multiplayer:7000 (with WebSocket upgrade)
  - /interop/* → botapi:4001
- Location: nginx/conf.d/default.conf
Webapp
- Purpose: Single-page React application served as static files. Provides the complete user interface for human players and administrators. Communicates with the backend exclusively through /api/ (REST) and /socket.io/ (WebSocket) via Nginx.
- Provided interfaces: HTML/CSS/JS bundle served to browsers
- Required interfaces: Gateway REST API; Multiplayer Socket.IO
- Key routes: / (login), /register, /home, /game, /select-difficulty, /game/finished, /statistics, /profile/:username, /social, /multiplayer, /multiplayer/game, /admin (admin only)
- Location: webapp/
Gateway
- Purpose: Backend entry point for all webapp requests. Routes to the appropriate internal service, handles CORS, sanitizes path parameters, validates input, and exposes Prometheus metrics. Stateless: no session state.
- Provided interfaces: REST API at port 8080 (internal), exposed via Nginx at /api/*
- Required interfaces: Authentication Service, Users Service, Gamey Service, Multiplayer Service
- Prometheus metrics: request count, method, path, status code via express-prom-bundle
- Location: gateway/gateway-service.js
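The gateway's path-parameter sanitization can be sketched as a set of anchored patterns. The rules follow the ones this document lists under Security Concepts (username: alphanumeric, underscore, hyphen, max 60 characters; notification ID: MongoDB ObjectId; room code: uppercase alphanumeric). The six-character room code length comes from the multiplayer description, and the names `ParamKind`, `PARAM_PATTERNS`, `isValidParam`, and `checkParams` are hypothetical, not taken from gateway-service.js.

```typescript
type ParamKind = "username" | "notificationId" | "roomCode";

// Anchored patterns: the whole value must match, so "../x" or "a b" are rejected.
const PARAM_PATTERNS: Record<ParamKind, RegExp> = {
  username: /^[A-Za-z0-9_-]{1,60}$/,
  notificationId: /^[0-9a-fA-F]{24}$/, // a MongoDB ObjectId is 24 hex characters
  roomCode: /^[A-Z0-9]{6}$/,
};

function isValidParam(kind: ParamKind, value: string): boolean {
  return PARAM_PATTERNS[kind].test(value);
}

// Returns the HTTP status the gateway would answer with before calling any internal service.
function checkParams(params: Array<[ParamKind, string]>): { status: number; error?: string } {
  for (const [kind, value] of params) {
    if (!isValidParam(kind, value)) {
      return { status: 400, error: `Invalid ${kind}` };
    }
  }
  return { status: 200 };
}
```

Rejecting malformed values at the gateway keeps the internal services from ever seeing path segments they would have to defend against themselves.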
Authentication Service
- Purpose: Handles all identity operations: payload validation, bcrypt password hashing and comparison, JWT token generation and verification with role claims. Delegates user data persistence to the Users Service.
- Provided interfaces:
  - POST /register - validates payload, hashes password, calls Users Service, returns JWT
  - POST /login - fetches user from Users Service, bcrypt compares, returns JWT
  - GET /verify - validates JWT, returns decoded claims
  - GET /health
- Required interfaces: Users Service (internal HTTP)
- Location: authentication/auth-service.js
Users Service
- Purpose: Manages all user data: profiles, credentials, game results, match history, statistics, friend graph, notifications, ranking, user search, and admin operations. Pure data service with no JWT logic.
- Provided interfaces (selected):
  - POST /createuser, GET /users/:username, GET /users/:username/profile, PUT /users/:username/profile
  - DELETE /users/:username
  - POST /gameresults, GET /gameresults/:username, GET /ranking
  - GET /users/:username/friends, POST /users/:username/friends/request, POST /users/:username/friends/accept, POST /users/:username/friends/reject, DELETE /users/:username/friends/:friend
  - GET /users/:username/notifications, PATCH /users/:username/notifications/:id/read
  - GET /search?q=…
  - GET /health, GET /metrics (Prometheus)
  - Admin endpoints: GET /admin/users, PATCH /admin/users/:username/role, DELETE /admin/users/:username, DELETE /admin/users/:username/history
- Required interfaces: MongoDB
- Swagger UI: available at /api-docs
- Location: users/users-service.js
Gamey (Game Engine)
- Purpose: Encapsulates all Game Y logic: move validation, win condition detection (connecting all three sides of the triangular board using Union-Find), and AI move calculation with six strategies. Stateless: no persistent data.
- Provided interfaces:
  - GET /status - health check
  - GET /metrics - Prometheus metrics (axum-prometheus)
  - POST /game/new - creates initial game state, returns YEN
  - POST /game/check - checks win condition from YEN
  - POST /v1/game/pvb/:bot_id - applies player move, computes bot response, returns updated YEN
  - POST /v1/game/pvp/move - applies a player move in PvP context, returns updated YEN + win status
  - POST /v1/ybot/choose/:bot_id - returns bot’s chosen coordinates without applying move
- Required interfaces: None (fully stateless)
- Bot strategies: random_bot, heuristic_bot, minimax_bot, alfa_beta_bot, monte_carlo_hard, monte_carlo_extreme
- Location: gamey/
Multiplayer Service
- Purpose: Manages real-time player-vs-player game rooms via Socket.IO. Creates and joins rooms with unique 6-character codes. Maintains room state (players, YEN, status) in memory. Delegates game rule enforcement to Gamey via HTTP. Broadcasts game events to all room participants.
- Provided interfaces:
  - REST: POST /rooms/create, POST /rooms/join, POST /rooms/state, POST /rooms/move, POST /rooms/leave, GET /rooms/:code, GET /health
  - Socket.IO events (server → client): connected, room_updated, game_started, game_updated, game_over, opponent_left
  - Socket.IO events (client → server): create_room, join_room, get_room_state, make_move, leave_room
- Required interfaces: Gamey (HTTP)
- State: In-memory RoomManager (lost on restart)
- Location: multiplayer/src/multiplayer-service.js
BotAPI (Interoperability Service)
- Purpose: Public interoperability API for external bots. Operates in two modes:
  - Server mode: external bots play against our bots via POST /games, GET /games/:id, POST /games/:id/play, GET /play
  - Client mode: our bots play against rival teams' APIs via POST /remote-games/create, POST /remote-games/connect, POST /remote-games/:id/play-turn
- Provided interfaces: REST API at port 4001, exposed via Nginx at /interop/*
- Required interfaces: Gamey (HTTP); Rival Team APIs (HTTP outbound, allowlisted hosts)
- State: In-memory ActiveGamesStore and RemoteGameSessionsStore (lost on restart)
- OpenAPI spec: botapi/src/openapi/openapi.yaml
- Location: botapi/
MongoDB
- Purpose: Persists all user data via the Users Service. Never accessed directly by other services.
- Collections: users, gameresults, notifications
- Location: Docker container managed externally; URI injected via the MONGODB_URI env var
Prometheus + Grafana
- Purpose: Metrics collection and visualization. Prometheus scrapes /metrics from gateway, users, and gamey every 15 seconds. Grafana provides the pre-built "Yovi Services Overview" dashboard with request rate, p95 latency, and error rate panels.
- Location: users/monitoring/prometheus/, users/monitoring/grafana/
5.2. Level 2: Internal Structure
5.2.1. White Box: Webapp (webapp/)
Key webapp building blocks:
| Block | Responsibility | Location |
|---|---|---|
| App.tsx + React Router | Route definitions; redirect to | |
| ThemeProvider | Dark/light theme context; | |
| I18nProvider | EN/ES translations via React context; | |
| Navbar | Navigation links; notification bell with unread badge; profile link; theme/language toggles; logout | |
| Game (PvM) | Game board rendering; PvM move loop via | |
| MultiplayerLobby / MultiplayerGame | Room creation/join via REST; Socket.IO connection; real-time board sync; PvP move via Socket.IO | |
| Social | User search with 400ms debounce; friend request buttons; profile navigation | |
| UserProfile | Editable profile fields; match history display | |
| Statistics | Win/loss stats, game history with pagination | |
| AdminPanel | Admin-only page; user list; role toggle; delete history; delete account | |
5.2.2. White Box: Gamey (gamey/)
Key gamey building blocks:
| Block | Responsibility | Location |
|---|---|---|
| HTTP Server (Axum) | Async HTTP server with Tokio runtime; Prometheus middleware; shared | |
| GameY (core) | Board state; move application; available cell tracking; player position queries | |
| Coordinates | Barycentric (x,y,z) representation; | |
| PlayerSet (Union-Find) | Tracks connected components per player; | |
| YBot trait | | |
| YBotRegistry | | |
| YEN notation | | |
5.2.3. White Box: Users Service (users/)
Key users service building blocks:
| Block | Responsibility | Location |
|---|---|---|
| User model | Schema with | |
| GameResult model | Stores match outcomes: | |
| Notification model | | |
| Friend routes | Request (creates notification), accept (bidirectional add), reject (remove from | |
| Admin routes | | |
| Ranking | MongoDB aggregation: | |
5.2.4. White Box: Multiplayer Service (multiplayer/)
Key multiplayer building blocks:
| Block | Responsibility | Location |
|---|---|---|
| RoomManager | Creates rooms with unique codes (6-char alphanumeric); tracks player sockets; manages turn order via | |
| GameyClient | HTTP client wrapping | |
| RoomCodeGenerator | Generates unique 6-character codes from a custom alphabet (no O/0, I/1 ambiguity); retries up to 1000 times to guarantee uniqueness against existing codes | |
| Socket.IO events (server-emitted) | | |
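The RoomCodeGenerator behaviour described above (six characters, no O/0 or I/1, up to 1000 attempts) can be sketched as follows. The exact alphabet, the injectable `random` parameter, and the function name `generateRoomCode` are assumptions for illustration; the real module may differ.

```typescript
// Uppercase letters and digits without the easily confused O, 0, I and 1 (assumed alphabet).
const ROOM_CODE_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
const ROOM_CODE_LENGTH = 6;
const MAX_ATTEMPTS = 1000;

function generateRoomCode(
  existingCodes: Set<string>,
  random: () => number = Math.random, // injectable so the behaviour can be tested deterministically
): string {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    let code = "";
    for (let i = 0; i < ROOM_CODE_LENGTH; i++) {
      const position = Math.floor(random() * ROOM_CODE_ALPHABET.length);
      code += ROOM_CODE_ALPHABET.charAt(position);
    }
    // Only return codes that do not collide with a room that is already open.
    if (!existingCodes.has(code)) return code;
  }
  throw new Error(`no free room code found after ${MAX_ATTEMPTS} attempts`);
}
```

With 32 symbols and six positions there are 32^6 (about 1.07 billion) possible codes, so the retry limit is a safety net rather than something expected to trigger in practice.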
5.3. Interface Overview
| Interface | Provider | Consumer | Protocol | Purpose |
|---|---|---|---|---|
| Public HTTPS | Nginx | Browser / Bot | HTTPS | Single external entry point with TLS termination |
| WebSocket Upgrade | Nginx → Multiplayer | Browser | WSS (Socket.IO) | Real-time PvP game events |
| Auth API | Authentication | Gateway | HTTP (internal) | Register, login, verify JWT |
| User Data API | Users | Auth, Gateway | HTTP (internal) | User CRUD, friends, notifications, admin ops |
| Game API | Gamey | Gateway, Multiplayer, BotAPI | HTTP (internal) | Game logic, bot moves, win detection |
| Interop API | BotAPI | External Bots | HTTPS → HTTP | Public bot integration with YEN |
| Remote Interop | BotAPI (client) | Rival Team APIs | HTTP (outbound) | Cross-team bot-vs-bot via YEN contract |
| Database | MongoDB | Users | MongoDB Wire | Data persistence |
| Metrics | Gateway, Users, Gamey | Prometheus | HTTP (scrape) | Observability data collection |
6. Runtime View
This section describes the dynamic behavior of the YOVI system through a representative selection of architecturally significant scenarios. These scenarios demonstrate how the building blocks defined in Section 5 collaborate at runtime to deliver the required functionality.
6.1. Runtime Scenario 1: User Registration
A new user creates an account. The authentication service validates credentials, hashes the password with bcrypt, delegates user creation to the users service, and returns a JWT token with role claims.
6.2. Runtime Scenario 2: User Login and Session Verification
A returning user logs in. The auth service fetches the stored bcrypt hash from the users service, compares it, and issues a JWT. The home page verifies the token on every load.
6.3. Runtime Scenario 3: Player vs Machine Game (PvM)
A complete game session between a human player and the AI bot. Demonstrates the game loop, YEN notation exchange, and result persistence.
6.4. Runtime Scenario 4: Real-Time Multiplayer (PvP via Socket.IO)
Two human players compete in real time. Demonstrates room creation, Socket.IO event flow, and move synchronization between clients.
6.5. Runtime Scenario 5: Friend Request and Notification Flow
A user searches for another user, sends a friend request, and the recipient receives and accepts it.
6.6. Runtime Scenario 6: Admin Panel — Delete User Account
An administrator uses the admin panel to delete another user’s account. Demonstrates role enforcement and cascading data deletion.
6.7. Runtime Scenario 7: External Bot Using the Interoperability API
An external bot creates a game session against one of our bots, plays moves, and receives the updated state.
6.8. Runtime Scenario 8: Error and Failure Handling
6.8.1. Scenario 8.1: Game Engine Unavailable
6.8.2. Scenario 8.2: Opponent Disconnects During Multiplayer
6.8.3. Scenario 8.3: Invalid or Expired JWT
6.8.4. Scenario 8.4: Non-Admin Accessing Admin Panel
6.9. Summary of Runtime Scenarios
| Scenario | Description | Building Blocks | Quality Goals Demonstrated |
|---|---|---|---|
| 1: Registration | New user creates account with bcrypt hashing and JWT generation | Webapp, Gateway, Auth, Users, MongoDB | Security, Functionality |
| 2: Login + Verify | Returning user authenticates; session verified on every page load | Webapp, Gateway, Auth, Users | Security, Usability |
| 3: PvM Game | Complete game loop against AI bot with result persistence | Webapp, Gateway, Gamey, Users, MongoDB | Functionality, Performance |
| 4: PvP Multiplayer | Real-time room creation, join, and move synchronization via Socket.IO | Webapp, Gateway, Multiplayer, Gamey | Functionality, Reliability |
| 5: Friend Request | User search, friend request, notification, and acceptance | Webapp, Gateway, Users, MongoDB | Functionality, Usability |
| 6: Admin Delete | Admin removes account with cascading data deletion and role enforcement | Webapp, Gateway, Users, MongoDB | Security, Functionality |
| 7: External Bot (BotAPI) | Bot creates session, plays moves, queries state via interop API | BotAPI, Gamey | Interoperability, Reliability |
| 8: Error Handling | Service down, opponent disconnect, token expiry, unauthorized admin access | All services | Reliability, Security, Availability |
7. Deployment View
This section describes the technical infrastructure that executes the YOVI system and the mapping of building blocks to infrastructure elements. The application is built on Docker containerization running on a cloud virtual machine, combining the portability of containers with the security and isolation of the VM.
Two Docker Compose configurations are maintained:
- docker-compose.yml: development/CI build. Builds images from local source code and tags them for publication to GitHub Container Registry (GHCR).
- docker-compose.deploy.yml: production deployment. Pulls pre-built images from GHCR; no build step.
7.1. Level 1: Production Deployment Overview
All services run as Docker containers on a single cloud VM, orchestrated with docker-compose.deploy.yml.
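All containers share the monitor-net bridge network. Nginx is the only public entry point for application traffic (ports 80 and 443); Prometheus (9090) and Grafana (9091) additionally publish host ports for monitoring.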
7.2. Level 2: Container Detail
This table lists every container in the production deployment, its image, internal port, dependencies, and key environment variables.
| Container | Image | Exposes | Depends On | Key Env Vars |
|---|---|---|---|---|
| nginx | | 80, 443 (public) | webapp, gateway, botapi, multiplayer | - |
| webapp | | 80 (internal) | gateway | |
| gateway | | 8080 (internal) | authentication, gamey, users, multiplayer | |
| authentication | | 5000 (internal) | users | |
| users | | 3000 (internal) | MongoDB | |
| gamey | | 4000 (internal) | - | - |
| multiplayer | | 7000 (internal) | gamey | |
| botapi | | 4001 (internal) | gamey | |
| prometheus | | 9090:9090 | - | Config: |
| grafana | | 9091:3000 | prometheus | Provisioning: |
7.3. Level 3: Nginx Routing Configuration
Nginx is the only container that exposes application traffic publicly. It routes all inbound requests to the appropriate internal container based on the request path.
Key Nginx configuration decisions:
- HTTPS only: port 80 redirects permanently (301) to port 443.
- TLS certificates: mounted read-only from nginx/certs/ (Let’s Encrypt format).
- WebSocket upgrade: the /socket.io/ location sets Upgrade and Connection headers to allow Socket.IO to negotiate the WebSocket protocol through the proxy.
- SPA fallback: the / location serves index.html for any unmatched path, enabling client-side routing in the React application.
- Security header: x-powered-by is disabled on all Node.js services.
7.4. Level 3: Prometheus Scrape Configuration
Prometheus scrapes metrics from three services on the Docker internal network every 15 seconds.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'users-service'
    static_configs:
      - targets: ['users:3000']
  - job_name: 'gateway-service'
    static_configs:
      - targets: ['gateway:8080']
  - job_name: 'gamey-service'
    static_configs:
      - targets: ['gamey:4000']
The Grafana dashboard ("Yovi Services Overview", UID yovi-overview) is provisioned automatically at startup
via the users/monitoring/grafana/provisioning/ directory. It includes three panels:
- Request Rate (req/s): rate(http_requests_total[1m]) per service, method, path, and status code
- P95 Request Duration (s): histogram_quantile(0.95, …) per service
- Error Rate (4xx+5xx): rate(http_requests_total{status_code=~"4..|5.."}[1m]) per service
7.5. Development vs Production Differences
| Aspect | Development (docker-compose.yml) | Production (docker-compose.deploy.yml) |
|---|---|---|
| Image source | Built from local source ( | Pulled from GHCR ( |
| Build args | | Not needed (image already built) |
| Volumes | Source code not mounted (clean build) | Config files mounted read-only ( |
| Secrets | | Environment variables injected by CI/CD on the VM |
| Monitoring ports | Prometheus 9090, Grafana 9091 accessible locally | Same ports; restrict via firewall in production if needed |
| MongoDB | Local container or Atlas URI in | Atlas URI in production environment variable |
7.6. CI/CD and Deployment Pipeline
The GitHub Actions workflow (release-deploy.yml) automates the full pipeline on every release tag:
7.7. Mapping of Building Blocks to Infrastructure
| Building Block | Container | Network | Notes |
|---|---|---|---|
| Nginx Reverse Proxy | | monitor-net + public | Only container with external ports (80, 443) |
| Webapp (React SPA) | | monitor-net (internal) | Served as static files; built with Vite |
| API Gateway | | monitor-net (internal) | Prometheus metrics at |
| Authentication Service | | monitor-net (internal) | Stateless; no persistent storage |
| Users Service | | monitor-net (internal) | Prometheus metrics; Swagger UI at |
| Game Engine | | monitor-net (internal) | Stateless Rust binary; Prometheus via axum-prometheus |
| Multiplayer Service | | monitor-net (internal) | In-memory room state; lost on container restart |
| Bot API | | monitor-net (internal) | In-memory session state; lost on container restart |
| MongoDB | External (Atlas URI) | - (external) | URI injected via |
| Prometheus | | monitor-net | Scrapes gateway, users, gamey every 15s |
| Grafana | | monitor-net | Dashboard auto-provisioned; port 9091 external |
7.8. Quality and Operational Considerations
| Concern | Approach |
|---|---|
| Isolation | All containers share a single |
| Portability | All services are containerized with multi-stage Dockerfiles (where applicable). Images are published to GHCR and pulled on the target VM; no build tools are required in production. |
| Stateless services | Gateway, authentication, and gamey are fully stateless. Multiplayer and botapi hold in-memory state (rooms and sessions respectively) which is lost on container restart; this is acceptable for the current scope. |
| Data durability | MongoDB Atlas provides managed persistence with automatic backups. The |
| Secret management | |
| Restart policy | Docker Compose |
| Monitoring | Prometheus + Grafana are always-on in production. The "Yovi Services Overview" dashboard shows request rate, p95 latency, and error rate for gateway, users, and gamey in real time with a 30-second refresh. |
| SSL certificates | TLS certificates are mounted from |
8. Cross-cutting Concepts
This section describes concepts, patterns, and approaches that span multiple building blocks and apply system-wide. These are the architectural decisions that cut across service boundaries and define how YOVI behaves as a whole rather than within any single component.
8.1. 1. Game State Representation — YEN Notation
All game state across the system is represented using YEN (Y Exchange Notation), a JSON format inspired by chess FEN notation. This is mandated by the client and forms the backbone of all service-to-service and external communication involving game state.
{
  "size": 5,
  "turn": 0,
  "players": ["B", "R"],
  "layout": "B/BR/.R./..../....."
}
Fields:
- size: board edge length N; the board has N(N+1)/2 cells (e.g., 7 means 28 total cells)
- turn: index into the players array indicating whose turn it is (0 or 1)
- players: array of token characters; "B" (Blue) moves first, "R" (Red) second
- layout: rows separated by /; each row is one character per cell (. = empty, B or R = occupied); row 0 has 1 cell, row 1 has 2, …, row N-1 has N cells
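The field rules above translate directly into a structural check. The following is a minimal sketch, assuming only the four fields shown; the `Yen` interface and the names `cellCount` and `parseYenLayout` are hypothetical and do not correspond to the validators used in gamey or botapi.

```typescript
interface Yen {
  size: number;
  turn: number;
  players: string[];
  layout: string;
}

// A triangular board with edge length N has 1 + 2 + ... + N cells.
function cellCount(size: number): number {
  return (size * (size + 1)) / 2;
}

// Splits the layout into rows of single-character cells, rejecting malformed input.
function parseYenLayout(yen: Yen): string[][] {
  if (yen.turn < 0 || yen.turn >= yen.players.length) {
    throw new Error(`turn ${yen.turn} is not a valid index into players`);
  }
  const rows = yen.layout.split("/");
  if (rows.length !== yen.size) {
    throw new Error(`expected ${yen.size} rows, got ${rows.length}`);
  }
  return rows.map((row, r) => {
    if (row.length !== r + 1) {
      throw new Error(`row ${r} must have ${r + 1} cells, got ${row.length}`);
    }
    const cells = row.split("");
    for (const cell of cells) {
      if (cell !== "." && yen.players.indexOf(cell) === -1) {
        throw new Error(`unknown token "${cell}" in row ${r}`);
      }
    }
    return cells;
  });
}
```

Applied to the example document above, the function yields five rows of lengths 1 to 5, which adds up to `cellCount(5)` cells.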
Usage across the system:
| Component | How YEN is used |
|---|---|
| Gamey | Parses YEN via |
| Gateway | Passes YEN through transparently without parsing; forwards raw JSON body between webapp and Gamey |
| Multiplayer | Stores current YEN per room in |
| BotAPI | Validates YEN via |
| Webapp | Receives YEN from gateway; renders board from |
8.2. 2. Authentication and Authorization
8.2.1. JWT-based Authentication
All user-facing endpoints that require identity are protected by JWT tokens issued by the authentication service. The token lifecycle is:
- User logs in → Auth Service verifies bcrypt hash → issues signed JWT
- Webapp stores token in localStorage
- Every protected request includes Authorization: Bearer <token>
- Gateway forwards token to Auth Service for verification (GET /verify)
- Auth Service returns decoded claims; gateway proxies the response
JWT payload structure:
{
  "id": "64f1a2b3c4d5e6f7a8b9c0d1",
  "username": "alice",
  "email": "alice@example.com",
  "role": "user",
  "iat": 1712345678,
  "exp": 1712432078
}
The role field is set to "user" by default and "admin" for privileged accounts. It is embedded
directly in the token to avoid an additional database round-trip on every admin route check.
8.2.2. Role-Based Access Control (RBAC)
Two roles are defined:
| Role | Capabilities |
|---|---|
| user (default) | Play games (PvM and PvP), manage own profile, send/accept/reject friend requests, view own statistics and match history, receive and mark notifications, search for other users |
| admin | All user capabilities, plus: view all registered accounts (paginated), grant or revoke admin role on any account, delete any user’s match history, permanently delete any user account |
RBAC is enforced at two layers:
- Client-side guard (webapp): Admin pages check the JWT role claim before rendering; non-admin users are redirected to /home immediately.
- Server-side enforcement (gateway): Admin API routes verify the JWT role claim on every request. A valid token with role !== "admin" receives 403 Forbidden. This prevents bypassing the client guard via direct API calls.
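The server-side decision can be sketched as a pure function over already-decoded claims. Signature verification is assumed to have happened in the authentication service (GET /verify) and is not shown; the claim shape mirrors the payload example above, while `extractBearerToken` and `adminAccessStatus` are hypothetical names rather than the gateway's actual middleware.

```typescript
interface JwtClaims {
  id: string;
  username: string;
  email: string;
  role: "user" | "admin";
  iat: number; // issued-at, seconds since the Unix epoch
  exp: number; // expiry, seconds since the Unix epoch
}

// Pulls the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(header: string | undefined): string | null {
  const prefix = "Bearer ";
  if (header === undefined || header.indexOf(prefix) !== 0) return null;
  const token = header.slice(prefix.length);
  return token.length > 0 ? token : null;
}

// Maps decoded claims to the HTTP status an admin route would answer with.
function adminAccessStatus(claims: JwtClaims | null, nowSeconds: number): 200 | 401 | 403 {
  if (claims === null) return 401; // missing or invalid token
  if (claims.exp <= nowSeconds) return 401; // expired token
  if (claims.role !== "admin") return 403; // authenticated, but not allowed
  return 200;
}
```

Keeping 401 (who are you?) separate from 403 (you may not do this) matches the status code table in the error handling section.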
8.2.3. Password Security
Passwords are hashed with bcrypt (cost factor 10) in the users service at account creation
(POST /createuser). The authentication service fetches the stored hash and uses bcrypt.compare() during
login. Plain-text passwords are never stored or logged.
8.3. 3. Real-Time Communication — Socket.IO
The multiplayer service uses Socket.IO for bidirectional real-time communication between clients and the server. Socket.IO is layered over WebSockets with automatic fallback to HTTP long-polling.
8.3.1. Transport Negotiation
Nginx proxies /socket.io/* with WebSocket upgrade headers:
location /socket.io/ {
    proxy_pass http://multiplayer:7000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
8.3.2. Room Lifecycle
8.3.3. Event Catalogue
| Direction | Event | Payload | When |
|---|---|---|---|
| Client → Server | | | Host creates a new room |
| Client → Server | | | Guest joins by room code |
| Client → Server | | | Player places a piece |
| Client → Server | | | Player explicitly leaves |
| Client → Server | | | Reconnecting client requests current state |
| Server → Client | | | Immediately after connection established |
| Server → Client | | | Any room state change (player joined/left) |
| Server → Client | | | Both players are connected |
| Server → Client | | | After every valid move |
| Server → Client | | | When a player connects all three sides |
| Server → Client | | | When a player disconnects or leaves |
8.3.4. Room State (In-Memory)
Room state is held exclusively in the RoomManager instance within the multiplayer service. It is not
persisted to any database. If the multiplayer container restarts, all active rooms are lost. This is an
accepted trade-off for the current scope — active multiplayer sessions are short-lived and can be restarted
by the players.
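To make the trade-off concrete, an in-memory room store can be as small as a map keyed by room code. This is an illustrative sketch only: the `Room` fields follow the "players, YEN, status" description of the multiplayer service, but the status values, the two-player limit check, and the class name `InMemoryRoomStore` are assumptions and not the actual RoomManager.

```typescript
type RoomStatus = "waiting" | "playing";

interface Room {
  code: string;
  players: string[]; // usernames, host first
  yen: string | null; // current YEN layout, null until the game starts
  status: RoomStatus;
}

class InMemoryRoomStore {
  // All state lives in this map; a process restart starts with an empty one.
  private readonly rooms = new Map<string, Room>();

  create(code: string, host: string): Room {
    if (this.rooms.has(code)) throw new Error(`room ${code} already exists`);
    const room: Room = { code, players: [host], yen: null, status: "waiting" };
    this.rooms.set(code, room);
    return room;
  }

  join(code: string, guest: string): Room {
    const room = this.rooms.get(code);
    if (room === undefined) throw new Error(`room ${code} not found`);
    if (room.players.length >= 2) throw new Error(`room ${code} is full`);
    room.players.push(guest);
    room.status = "playing";
    return room;
  }

  leave(code: string, player: string): void {
    const room = this.rooms.get(code);
    if (room === undefined) return;
    room.players = room.players.filter((name) => name !== player);
    if (room.players.length === 0) {
      this.rooms.delete(code); // last player gone: drop the room entirely
    } else {
      room.status = "waiting";
    }
  }

  get(code: string): Room | undefined {
    return this.rooms.get(code);
  }

  get size(): number {
    return this.rooms.size;
  }
}
```

Because nothing is written outside the process, a fresh instance (the equivalent of a container restart) knows about no rooms at all, which is exactly the loss described above.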
8.4. 4. Internationalization (i18n)
The webapp supports English (EN) and Spanish (ES) via a custom React Context implementation with zero external dependencies.
Implementation:
- All UI strings are externalized to a translations.ts dictionary keyed by dot-notation paths (e.g., "home.welcome", "login.error.password")
- I18nProvider wraps the app and exposes t(key, vars?) via the useI18n() hook
- Variable interpolation: t("home.welcome", { username }) → "Hello alice" (replaces {username})
- Language preference is persisted in localStorage under the key "lang"
- Fallback: if a key is missing in the selected language, the other language is tried; if still missing, the key itself is returned
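The lookup, interpolation, and fallback rules can be sketched without React. The dictionary entries below are sample strings invented for this example (only the keys and the "Hello alice" result come from this document), and `translate` is a hypothetical stand-in for the `t()` function exposed by the hook.

```typescript
type Lang = "en" | "es";
type Messages = { [key: string]: string | undefined };
type Vars = { [name: string]: string | number | undefined };

// Sample dictionary; the Spanish side intentionally lacks one key to show the fallback.
const translations: Record<Lang, Messages> = {
  en: {
    "home.welcome": "Hello {username}",
    "login.error.password": "Incorrect password",
  },
  es: {
    "home.welcome": "Hola {username}",
  },
};

function translate(lang: Lang, key: string, vars: Vars = {}): string {
  const other: Lang = lang === "en" ? "es" : "en";
  const selected = translations[lang][key];
  const fallback = translations[other][key];
  // Selected language first, then the other language, then the key itself.
  const template = selected !== undefined ? selected : fallback !== undefined ? fallback : key;
  // Replace each {name} placeholder that has a matching variable; leave unknown ones untouched.
  return template.replace(/\{(\w+)\}/g, (placeholder: string, name: string): string => {
    const value = vars[name];
    return value === undefined ? placeholder : String(value);
  });
}
```

Returning the key when nothing matches keeps a missing translation visible in the UI instead of rendering an empty string.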
Language toggle is available on every page (login, register, and in the navbar for authenticated users).
8.5. 5. Theming (Dark / Light Mode)
The webapp supports dark (default) and light themes via a custom ThemeProvider React context.
Implementation:
- ThemeProvider sets the data-theme attribute on document.documentElement
- CSS variables are defined per theme in App.css under [data-theme="dark"] and [data-theme="light"]
- Theme preference is persisted in localStorage under the key "theme"
- The useTheme() hook exposes theme and toggleTheme() to any component
8.6. 6. Observability — Metrics and Monitoring
Three services expose Prometheus-compatible metrics endpoints:
| Service | Metrics Library | Endpoint |
|---|---|---|
| Gateway | | |
| Users | | |
| Gamey | | |
Standard metrics collected per service:
- http_requests_total: labelled by method, path (normalized), and status code
- http_request_duration_seconds: histogram for p50/p95/p99 latency
- For Gamey: axum_http_requests_total and axum_http_requests_duration_seconds
The Grafana dashboard "Yovi Services Overview" (UID yovi-overview) auto-provisions with three panels
covering all three services simultaneously, refreshing every 30 seconds.
8.7. 7. Error Handling Strategy
A consistent error response format is used across all Node.js services:
{
  "ok": false,
  "error": "Human-readable error message",
  "details": {}
}
For the BotAPI (TypeScript), the format follows the OpenAPI spec:
{
  "code": "NOT_FOUND",
  "message": "game abc123 not found"
}
HTTP status codes are used consistently:
| Status | When | Example |
|---|---|---|
| 400 Bad Request | Invalid input format or business rule violation | Passwords do not match; malformed YEN; occupied cell |
| 401 Unauthorized | Missing or invalid JWT token | No |
| 403 Forbidden | Valid token but insufficient role | Regular user accessing admin endpoint |
| 404 Not Found | Resource does not exist | Username not found; room not found; game not found |
| 409 Conflict | Duplicate resource | Username already exists; friend request already sent |
| 500 Internal Server Error | Unexpected server-side error | Unhandled exception in service logic |
| 502 Bad Gateway | Internal service unreachable | Gamey container down; auth service unreachable |
| 503 Service Unavailable | Service temporarily down | Container restarting |
The gateway uses a centralized forwardAxiosError() function that propagates the status code and error
message from internal services to the client, or returns 502 if the service is completely unreachable.
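The propagation rule can be sketched as a mapping from an upstream failure to a client reply. This is not the gateway's forwardAxiosError() itself: `UpstreamFailure` only models the parts of an HTTP client error that matter here (an optional `response` with `status` and `data`), and `toErrorReply` and the default messages are hypothetical.

```typescript
interface UpstreamFailure {
  // Present when the internal service answered with an error status.
  response?: { status: number; data?: { error?: string; details?: unknown } };
  message?: string;
}

interface ErrorReply {
  status: number;
  body: { ok: false; error: string; details: unknown };
}

function toErrorReply(failure: UpstreamFailure): ErrorReply {
  const response = failure.response;
  if (response === undefined) {
    // No response at all: the internal service could not be reached.
    return { status: 502, body: { ok: false, error: "Upstream service unreachable", details: {} } };
  }
  const data = response.data;
  return {
    status: response.status, // propagate the internal status code unchanged
    body: {
      ok: false,
      error: data !== undefined && data.error !== undefined ? data.error : "Upstream service error",
      details: data !== undefined && data.details !== undefined ? data.details : {},
    },
  };
}
```

Centralizing this mapping means every proxied route reports failures in the same `{ ok, error, details }` shape, regardless of which internal service produced them.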
8.8. 8. Testing Strategy
| Test Type | Tool | Location | Scope |
|---|---|---|---|
| Unit tests | Vitest (gateway, auth, botapi), Jest (users, multiplayer), | | Individual functions, modules, and Mongoose models |
| Integration tests | Supertest (Node.js), | | HTTP endpoints with in-memory or test databases |
| Property-based tests | | | Coordinate roundtrip invariants, barycentric sum invariant |
| End-to-end tests | Playwright | | Complete user journeys in a real browser |
| Load tests | k6 | | Registration (50 VUs), login (50 VUs), game start (20 VUs) |
All tests run in CI via GitHub Actions on every pull request. SonarCloud enforces a minimum coverage of 80% across all services as a quality gate.
Special testing considerations:
- Users service: Uses mongodb-memory-server for isolated in-memory MongoDB during tests; vitest.globalSetup.js drops the _test database after all suites complete.
- Gamey: Integration tests use tower::ServiceExt::oneshot to call handlers without binding to a network port, keeping tests fast and isolated.
- BotAPI: HTTP clients (gameyClient, remoteInteropClient) are mocked with vi.mock() so no real network calls are made during unit tests.
- Multiplayer: Socket.IO behavior is tested via socket.io-client connected to a real in-process server instance.
8.9. Security Concepts
| Measure | Implementation |
|---|---|
| Transport encryption | HTTPS with TLS via Nginx for all external traffic. Internal container-to-container communication uses plain HTTP on the isolated Docker bridge network. |
| Password hashing | bcrypt with cost factor 10 in the users service. Hashed passwords are returned only to the auth service for comparison and are never included in any public API response. |
| JWT signing | HS256 algorithm with a strong secret ( |
| Role enforcement | Admin role is encoded in the JWT payload. Both client-side (webapp route guard) and server-side (gateway middleware) checks are applied, so a compromised client cannot bypass authorization. |
| Input sanitization | The gateway validates and sanitizes all path parameters (username: alphanumeric + underscore + hyphen, max 60 chars; notification ID: MongoDB ObjectId regex; room code: uppercase alphanumeric). Invalid inputs return 400 before reaching internal services. |
| Authorization header sanitization | The gateway extracts Bearer tokens using a strict regex ( |
| Bot API host allowlist | The |
| x-powered-by disabled | All Node.js services call |
| Secrets management | |
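The path-parameter rules from the "Input sanitization" row can be sketched as a small validator. The regular expressions below are assumptions derived from the rules stated in the table (the gateway's real expressions are not shown in this document), and the names PARAM_PATTERNS, isValidParam, and statusForParam are hypothetical.

```typescript
type ParamKind = "username" | "notificationId" | "roomCode";

// Assumed patterns, one per rule in the table.
const PARAM_PATTERNS: Record<ParamKind, RegExp> = {
  // Alphanumeric plus underscore and hyphen, at most 60 characters.
  username: /^[A-Za-z0-9_-]{1,60}$/,
  // A MongoDB ObjectId is 24 hexadecimal characters.
  notificationId: /^[0-9a-fA-F]{24}$/,
  // Uppercase alphanumeric room code.
  roomCode: /^[A-Z0-9]+$/,
};

function isValidParam(kind: ParamKind, value: string): boolean {
  return PARAM_PATTERNS[kind].test(value);
}

// Returns 400 for invalid input, or null when the request may be forwarded.
function statusForParam(kind: ParamKind, value: string): 400 | null {
  return isValidParam(kind, value) ? null : 400;
}
```

Anchoring every pattern with ^ and $ matters here: without the anchors, a value that merely contains a valid substring would pass.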
8.10. Code Organization Conventions
All services follow a consistent structure within their respective technology stack:
Node.js services (CommonJS — users, multiplayer):
service-name/
src/ # Source files (or flat structure for simpler services)
models/ # Mongoose models (users service)
monitoring/ # Prometheus + Grafana config (users service)
__tests__/ # Test files (Jest / Vitest)
Dockerfile
package.json
.env.example
Node.js services (ESM — gateway, authentication):
service-name/
service-name.js # Main entry point
vitest.config.js
Dockerfile
package.json
TypeScript service (botapi):
botapi/
src/
app.ts # Express app factory
server.ts # Entry point
routes/ # Route definitions
controllers/ # Request handlers
services/ # Business logic
clients/ # HTTP clients (gamey, remote interop)
store/ # In-memory state stores
models/ # TypeScript interfaces
dtos/ # Request/response types
utils/ # YEN helpers, ID generation
openapi/ # openapi.yaml spec
__tests__/ # Vitest tests
tsconfig.json
Dockerfile
Rust service (gamey):
gamey/src/
main.rs # Binary entry point (CLI + server mode)
lib.rs # Library exports
core/ # GameY, Coordinates, Movement, Player, PlayerSet
bot/ # YBot trait, YBotRegistry, bot_implementations/
notation/ # YEN parse/serialize
game_server/ # Axum HTTP server (mod.rs, bot/, game/, error.rs, state.rs, version.rs)
gamey_error.rs # Error types
cli.rs # Interactive terminal mode
Cargo.toml
9. Architecture Decisions
This section documents the most significant architectural decisions made during the design and development of the YOVI platform. Each decision follows the ADR (Architecture Decision Record) format providing context, rationale, and consequences.
For the full ADR history see the GitHub Wiki — ADR.
9.1. ADR-001: Game Modes Available to the User
- Status: Implemented ✅
- Date: 2026-01-15
- Deciders: Development Team
Context: The platform needs to offer users different ways to play Game Y. The initial requirements specify player-versus-machine mode, but the architecture should be extensible to support additional modes. Real-time player-versus-player was subsequently added as a core requirement.
Decision:
Design the game mode system with extensibility as a core principle. Implement PvM as the primary
mode; add real-time PvP via Socket.IO as a dedicated multiplayer service that delegates all rule
enforcement to gamey. Additionally expose a local PvP mode (same device, two players) as a variant
of the PvM game component.
Consequences:
- ✅ New game modes can be added without modifying core game logic
- ✅ PvM and PvP share the same game engine (gamey) via different HTTP endpoints
- ✅ Multiplayer state (rooms) is isolated: a failure there does not affect PvM
- ⚠ Two separate game flows (REST for PvM, Socket.IO for PvP) increase client complexity
- Mitigation: Clear separation in webapp components (Game.tsx for PvM, MultiplayerGame.tsx for PvP)
9.2. ADR-002: MongoDB for User Data Persistence
- Status: Implemented ✅
- Date: 2026-01-20
- Deciders: Development Team
Context: The users service needs to persist user profiles, credentials, friend graphs, notifications, and game results. The schema evolves as new features are added (friends, notifications, admin role, profile fields).
Decision:
Use MongoDB via Mongoose ODM. Three collections: users, gameresults, notifications.
Consequences:
- ✅ Schema flexibility: adding friends[], role, bio, location required no migration scripts
- ✅ Aggregation pipelines support ranking ($group, $sort, $limit 10)
- ✅ Compound index on (recipient, createdAt DESC) for efficient notification queries
- ⚠ No built-in referential integrity: orphan cleanup is handled in application code on user delete
- Mitigation: Explicit cascade deletes in the DELETE /admin/users/:username handler
9.3. ADR-003: Eight-Service Microservices Architecture
- Status: Implemented ✅ (updated 2026-03-20; originally three-service)
- Date: 2026-01-15 | Last Update: 2026-03-20
- Deciders: Development Team
Context: Technology constraints (TypeScript + Rust) force at least a frontend/backend split. As the project grew, authentication, API gateway, real-time multiplayer, and bot interoperability were identified as distinct responsibilities benefiting from dedicated services.
Evolution:
| Version | Services |
|---|---|
| v1 (2026-01-15) | |
| v2 (2026-02-24) | + |
| v3 (2026-03-20) | + |
Decision: Eight independent services, each with a single responsibility, orchestrated via Docker Compose and exposed through Nginx.
Consequences:
- ✅ Each service can be developed, tested, and deployed independently
- ✅ Technology-specific optimizations per service
- ✅ Failure in one service does not affect others
- ⚠ Eight containers, more inter-service HTTP calls
- Mitigation: Docker Compose simplifies local development; Prometheus + Grafana provide observability
9.4. ADR-004: Nginx as Public Reverse Proxy
- Status: Implemented ✅ (updated 2026-03-20)
- Date: 2026-01-20 | Last Update: 2026-03-20
Context:
With eight services, external clients need a single entry point. Socket.IO requires WebSocket upgrade
support. BotAPI needs its own path prefix (/interop/).
Decision: Nginx handles HTTPS termination, HTTP→HTTPS redirect, and routes to four services:
- / → webapp:80
- /api/* → gateway:8080
- /socket.io/* → multiplayer:7000 (WebSocket upgrade)
- /interop/* → botapi:4001
Consequences:
- ✅ Single public URL; internal topology hidden from clients
- ✅ TLS managed centrally; internal services use plain HTTP
- ✅ WebSocket upgrade transparent to the client
- ⚠ Single point of failure, mitigated by Docker restart policy
- ⚠ Certificate renewal requires Nginx reload
9.5. ADR-005: Strategy Pattern for AI Bot Behaviours
- Status: Implemented ✅ (updated 2026-03-15)
- Date: 2026-01-25 | Last Update: 2026-03-15
Context:
The system must support multiple AI strategies selectable at runtime. New strategies may be added without
modifying existing code. The YBot trait must be Send + Sync for Axum’s async handlers.
Decision:
Implement the Strategy Pattern via the YBot trait in Rust. All six strategies are registered in
YBotRegistry and selected by name at runtime.
| Bot ID | Algorithm | Difficulty |
|---|---|---|
| | Random valid move | - |
| | Side connection heuristic | Easy |
| | Minimax (depth 3) | Medium |
| | Minimax + alpha-beta pruning | Hard |
| | Monte Carlo Tree Search | Expert |
| | MCTS (more iterations) | Extreme |
Consequences:
- ✅ New strategy = new struct + one with_bot() call; no existing code changed
- ✅ Each strategy tested independently
- ⚠ MCTS Extreme can exceed 2s on boards > 11×11
- Mitigation: performance budget enforced in k6; depth configurable per constructor
9.6. ADR-006: Dedicated Authentication Microservice
- Status: Implemented ✅
- Date: 2026-02-24
Context: Authentication (JWT generation, bcrypt comparison) and user data management are distinct responsibilities. Mixing them in the users service violates SRP and reduces testability.
Decision:
Extract a dedicated authentication/ service responsible for: payload validation, bcrypt comparison,
JWT generation with role claims, and exposing /register, /login, /verify, /health.
Consequences:
- ✅ JWT logic is independently testable
- ✅ Future auth strategies (OAuth) can be added without touching the users service
- ⚠ Login requires two HTTP calls: auth → users (sub-millisecond on Docker network)
9.7. ADR-007: Axum as HTTP Framework for the Rust Game Engine
- Status: Implemented ✅
- Date: 2026-02-01
Context:
The gamey Rust service needs to expose a REST API. The choice affects ergonomics, async support, and
testability.
Decision:
Use Axum with Tokio. Key reasons: extractor-based API, State<AppState> for shared
Arc<YBotRegistry>, tower::ServiceExt::oneshot for port-free integration tests, and axum-prometheus
for native metrics.
Alternatives Considered:
- Actix-web: Rejected (more complex, historically used unsafe code)
- Rocket: Rejected (async support was immature at time of decision)
- Raw hyper: Rejected (excessive boilerplate)
Consequences:
- ✅ Type-safe shared state; compile-time handler signature guarantees
- ✅ Integration tests run without network binding
- ⚠ Rust async adds compile-time complexity
9.8. ADR-008: Custom i18n Implementation in the Frontend
- Status: Implemented ✅
- Date: 2026-03-01
Context: The webapp needs EN/ES support. Scope is < 200 keys, two languages only.
Decision:
Custom I18nProvider using React Context API with a static translations.ts dictionary.
Features: {variable} interpolation, localStorage persistence, Spanish fallback for missing keys.
Alternatives Considered:
- react-i18next: Rejected (dependency overhead disproportionate to the number of strings)
- FormatJS / react-intl: Rejected (same concern; better for large-scale applications)
Consequences:
- ✅ Zero additional dependencies
- ✅ TypeScript Dict type catches missing keys at compile time
- ⚠ No pluralization support (not currently needed)
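The lookup behaviour described in this ADR ({variable} interpolation with a Spanish fallback) can be sketched in a few lines. This is an illustration under stated assumptions, not the project's I18nProvider: the dictionary entries, the Messages type, and the translate function are hypothetical, and the sketch deliberately leaves out the React Context wiring, the localStorage persistence, and the compile-time key check.

```typescript
type Lang = "en" | "es";
type Messages = Record<string, string>;

// Tiny illustrative dictionaries; the real translations.ts is not shown here.
const translations: Record<Lang, Messages> = {
  es: { greeting: "Hola, {name}", logout: "Cerrar sesión" },
  en: { greeting: "Hello, {name}" }, // "logout" is intentionally missing
};

function translate(lang: Lang, key: string, vars: Record<string, string> = {}): string {
  // Fall back to Spanish, then to the key itself, when a translation is missing.
  const template = translations[lang][key] ?? translations.es[key] ?? key;
  // Replace each {variable} placeholder; unknown placeholders are left untouched.
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => vars[name] ?? match);
}
```

Returning the key itself as the last resort keeps a missing string visible in the UI instead of rendering an empty label.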
9.9. ADR-009: Socket.IO for Real-Time Multiplayer
- Status: Implemented ✅
- Date: 2026-03-10
Context: PvP game mode requires real-time bidirectional communication between two clients coordinated by a server.
Decision:
Use Socket.IO in a dedicated multiplayer/ service. The service manages room state in memory,
enforces turn order, and delegates game rules to gamey via HTTP. Socket.IO provides room management,
reconnection, and WebSocket/long-polling fallback out of the box.
Alternatives Considered:
- Raw WebSocket: Rejected (no room management; reconnection would need custom implementation)
- SSE: Rejected (unidirectional; clients cannot send moves without a separate REST endpoint)
- HTTP polling: Rejected (too much latency and server load for turn-based sync)
- Multiplayer inside gateway: Rejected (mixes stateful logic with stateless REST routing)
Consequences:
- ✅ Room management and event broadcasting handled by Socket.IO
- ✅ Game rule enforcement remains in gamey, with no duplication
- ⚠ Room state is in-memory and lost on container restart
- ⚠ Cannot scale horizontally without Redis Socket.IO adapter
- Mitigation: Acceptable for current scope; the Redis adapter is the documented upgrade path
9.10. ADR-010: YEN as Universal Game State Format
- Status: Implemented ✅
- Date: 2026-01-20
Context: Multiple services (gateway, multiplayer, botapi, webapp) need to exchange game state. A shared format must be human-readable, JSON-serializable, and parseable by both Rust and JavaScript without lossy conversion.
Decision:
Adopt YEN (Y Exchange Notation) as the sole game state format across all services. YEN is a
JSON object with four fields: size (integer), turn (player index), players (token array),
and layout (row-separated string of cell characters).
{ "size": 5, "turn": 0, "players": ["B","R"], "layout": "B/BR/.R./..../....." }
Alternatives Considered:
- Flat cell array: Rejected (loses the triangular structure; requires extra size metadata)
- Coordinate list (only occupied cells): Rejected, since it is harder to detect empty cells, requires the full board size for validation, and is not human-readable
- Binary encoding: Rejected (not human-readable; harder to debug across languages)
Consequences:
- ✅ Single format understood by Rust (gamey), Node.js (gateway, multiplayer, botapi), and TypeScript (webapp)
- ✅ Human-readable layout string simplifies debugging
- ✅ Enables cross-team interoperability without format negotiation
- ✅ assertValidYen() and GameY::try_from(yen) provide validation at every entry point
- ⚠ Layout string grows quadratically with board size (size 11 = 66 cell characters plus 10 row separators)
- ⚠ Strict row-length validation (row i must have i+1 chars) can reject valid states if serialized incorrectly
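The row-length rule can be made concrete with a small structural check. This is a sketch, not the project's assertValidYen(): the Yen interface mirrors the four fields listed in the decision, the checkYen name is hypothetical, and treating "." as the empty-cell character is an assumption taken from the example layout.

```typescript
interface Yen {
  size: number;
  turn: number;
  players: string[];
  layout: string;
}

// Returns a list of problems; an empty list means the structure is consistent.
function checkYen(yen: Yen): string[] {
  const problems: string[] = [];
  const rows = yen.layout.split("/");
  if (rows.length !== yen.size) {
    problems.push("expected " + yen.size + " rows, got " + rows.length);
  }
  rows.forEach((row, i) => {
    // Row i (0-based) of a triangular board holds i + 1 cells.
    if (row.length !== i + 1) {
      problems.push("row " + i + " must have " + (i + 1) + " chars");
    }
    row.split("").forEach((cell) => {
      // Assumption: "." marks an empty cell; anything else must be a player token.
      if (cell !== "." && yen.players.indexOf(cell) === -1) {
        problems.push("row " + i + ": unexpected cell " + cell);
      }
    });
  });
  if (yen.turn < 0 || yen.turn >= yen.players.length) {
    problems.push("turn out of range");
  }
  return problems;
}
```

Collecting every problem instead of failing on the first one makes a rejected state easier to debug when it crosses a service boundary.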
9.11. ADR-011: Role-Based Access Control via JWT Claims
- Status: Implemented ✅
- Date: 2026-03-15
Context: The admin panel requires differentiating regular users from administrators on every request. Two options exist: (1) embed the role in the JWT payload, or (2) look up the role from the database on every request.
Decision:
Embed the role field ("user" or "admin") directly in the JWT payload at token generation time.
The gateway reads the role from the decoded token without querying the users service.
{ "id": "...", "username": "alice", "email": "...", "role": "admin", "iat": "...", "exp": "..." }
Enforcement is applied at two layers:
- Client-side: Webapp checks role claim before rendering admin routes; redirects non-admins to /home
- Server-side: Gateway middleware decodes JWT and rejects requests to /admin/* if role !== "admin"
Alternatives Considered:
- Database lookup on every admin request: Rejected (adds latency and database load; the users service becomes a synchronous dependency on every protected request)
- Separate admin token: Rejected (doubles the authentication flow and complicates token storage)
- Scope-based OAuth claims: Considered (more standard but overkill for two roles; no OAuth provider in use)
Consequences:
- ✅ Zero additional latency for role checks, since the role is in the token
- ✅ Admin routes are protected even if the users service is temporarily unavailable
- ⚠ Role changes (grant/revoke admin) only take effect on the next login; existing tokens retain the old role
- ⚠ Token cannot be invalidated without a blocklist (not implemented)
- Mitigation: Acceptable for current scale; role revocation edge case documented as known limitation
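The server-side decision can be sketched as a pure function over an already verified token payload. This is a minimal sketch under stated assumptions: the TokenPayload shape and the authorize name are hypothetical, exp is taken to be a numeric Unix timestamp in seconds, and signature verification is deliberately out of scope.

```typescript
type Role = "user" | "admin";

interface TokenPayload {
  id: string;
  username: string;
  role: Role;
  exp: number; // assumed: expiry as Unix time in seconds
}

// Decide the HTTP status for a request. `payload` is null when the token is
// missing or failed verification.
function authorize(path: string, payload: TokenPayload | null, nowSeconds: number): 200 | 401 | 403 {
  if (payload === null || payload.exp <= nowSeconds) {
    return 401; // missing, invalid, or expired token
  }
  if (path.indexOf("/admin/") === 0 && payload.role !== "admin") {
    return 403; // valid token but insufficient role
  }
  return 200;
}
```

Because the role is read from the token alone, this check never touches the users service, which is also why a revoked admin keeps access until the token expires.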
9.12. ADR-012: Union-Find for Win Condition Detection
- Status: Implemented ✅
- Date: 2026-02-10
Context: Game Y is won by connecting all three sides of the triangular board. After every move, the system must determine whether any player has formed a connected path touching sides A, B, and C simultaneously. The check must be fast (< 2s total including bot move).
Decision:
Use a Union-Find (Disjoint Set Union) data structure in the game core. Each placed piece is
represented as a PlayerSet node with three boolean flags: touches_side_a, touches_side_b,
touches_side_c. When a piece is placed, it is merged with all adjacent same-color pieces. The win
condition is detected when any root node has all three flags set to true.
Alternatives Considered:
- BFS/DFS on every move: Considered. Correct but O(N) per move, where N is the number of board cells; Union-Find achieves near-O(1) amortized per union/find operation
- Pre-compute all winning paths: Rejected (exponential number of paths on a triangular board)
- Side-connectivity matrix: Rejected (more complex to maintain incrementally)
Consequences:
- ✅ Win condition check is near-O(1) amortized after each move
- ✅ Naturally integrates with move application, with no separate pass over the board
- ✅ Side-touch flags are propagated correctly through path compression
- ⚠ Union-Find requires careful implementation of path compression with flag merging
- ⚠ Barycentric coordinate system (x + y + z = size - 1) requires non-trivial index conversion
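The technique can be sketched as a Union-Find whose roots carry the side flags. This is an illustration, not the gamey implementation (which is written in Rust): the SideUnionFind class is hypothetical, and it packs the three flags into a bitmask where the document describes three booleans.

```typescript
// Bit flags for the three sides of the board.
const SIDE_A = 1;
const SIDE_B = 2;
const SIDE_C = 4;
const ALL_SIDES = SIDE_A | SIDE_B | SIDE_C;

class SideUnionFind {
  private parent: number[] = [];
  private rank: number[] = [];
  private sides: number[] = []; // only meaningful at root nodes

  // Register a newly placed piece with the sides its cell touches.
  add(sideMask: number): number {
    const id = this.parent.length;
    this.parent.push(id);
    this.rank.push(0);
    this.sides.push(sideMask);
    return id;
  }

  find(x: number): number {
    let root = x;
    while (this.parent[root] !== root) root = this.parent[root];
    // Path compression: point every visited node directly at the root.
    while (this.parent[x] !== root) {
      const next = this.parent[x];
      this.parent[x] = root;
      x = next;
    }
    return root;
  }

  // Merge two groups, OR-ing their side flags into the surviving root.
  // Returns true when the merged group touches all three sides.
  union(a: number, b: number): boolean {
    let ra = this.find(a);
    let rb = this.find(b);
    if (ra !== rb) {
      if (this.rank[ra] < this.rank[rb]) {
        const tmp = ra;
        ra = rb;
        rb = tmp;
      }
      this.parent[rb] = ra;
      this.sides[ra] |= this.sides[rb];
      if (this.rank[ra] === this.rank[rb]) this.rank[ra] += 1;
    }
    return this.sides[ra] === ALL_SIDES;
  }

  isWinning(x: number): boolean {
    return this.sides[this.find(x)] === ALL_SIDES;
  }
}
```

In this sketch the flags are merged at union time and live only on the root, so path compression never has to move them.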
9.13. ADR-013: Barycentric Coordinate System for the Game Board
- Status: Implemented ✅
- Date: 2026-02-10
Context: A triangular Game Y board needs a coordinate system for identifying cells, computing adjacency, and detecting side contact. Standard (row, col) coordinates work but make side-touch detection verbose.
Decision:
Use barycentric coordinates (x, y, z) where x + y + z = size - 1. Each coordinate represents
the distance from one of the three sides:
- x = 0 → cell touches side A
- y = 0 → cell touches side B
- z = 0 → cell touches side C
Conversion functions from_index(idx, size) and to_index(size) map between the linear array
representation (for board storage) and barycentric coordinates (for rule logic).
Alternatives Considered:
- (row, col) only: Rejected, since side detection requires separate row == 0, col == 0, and col == row checks; adjacency computation is less elegant
- Axial coordinates (q, r): Considered (common for hex grids but not natural for triangular boards)
Consequences:
- ✅ Side-touch detection is x == 0, y == 0, or z == 0 (one comparison each)
- ✅ Property-based tests verify that the x + y + z == size - 1 invariant holds for all valid indices
- ✅ Human-readable in API responses (YEN coordinates use {x, y, z})
- ⚠ Three-coordinate system is unfamiliar to developers new to the codebase
- ⚠ from_index uses floating-point arithmetic (sqrt), validated by round-trip property tests
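The index conversion can be sketched as below. The mapping is an assumption chosen for illustration (cells stored row by row, row r holding r + 1 cells, with x = size - 1 - row, y = col, z = row - col); the actual orientation used by gamey is not stated in this document, and the fromIndex and toIndex names are TypeScript stand-ins for from_index and to_index.

```typescript
interface Bary {
  x: number;
  y: number;
  z: number;
}

// Assumed layout: linear index = row * (row + 1) / 2 + col.
function fromIndex(idx: number, size: number): Bary {
  // Invert the triangular number row * (row + 1) / 2 <= idx to find the row.
  let row = Math.floor((Math.sqrt(8 * idx + 1) - 1) / 2);
  // Guard against floating-point rounding at row boundaries.
  while (((row + 1) * (row + 2)) / 2 <= idx) row += 1;
  while ((row * (row + 1)) / 2 > idx) row -= 1;
  const col = idx - (row * (row + 1)) / 2;
  return { x: size - 1 - row, y: col, z: row - col };
}

function toIndex(c: Bary, size: number): number {
  const row = size - 1 - c.x;
  return (row * (row + 1)) / 2 + c.y;
}
```

The two correction loops are the reason the sqrt-based estimate is safe: even if rounding puts the first guess one row off, the integer comparisons pull it back.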
9.14. ADR-014: Express 5 for Node.js Services
- Status: Implemented ✅
- Date: 2026-02-15
Context: Three Node.js services (gateway, authentication, users) need an HTTP framework. Express is the industry-standard choice for Node.js REST APIs. Express 4 vs Express 5 was evaluated.
Decision:
Use Express 5 (^5.0.0) for gateway and authentication services. Express 5 provides native
async/await error propagation — unhandled promise rejections in route handlers are automatically
passed to the error middleware without requiring explicit try/catch wrappers in every handler.
Note: The users service uses Express 5 as well (^5.2.1). The multiplayer service uses Express 4
(^4.21.1) since it was developed slightly earlier.
Alternatives Considered:
- Fastify: Rejected (team familiarity with Express; Fastify’s schema validation adds setup overhead)
- Hono: Considered (fast and TypeScript-native but less ecosystem maturity for course timeframe)
- Express 4: Still used in multiplayer, which is acceptable since multiplayer handlers use explicit try/catch throughout
Consequences:
- ✅ Native async error propagation reduces boilerplate in route handlers
- ✅ Familiarity across the team, so no learning curve
- ⚠ Express 5 was in release candidate status during development, with minor API differences from Express 4
- ⚠ Some middleware (e.g., older versions of express-prom-bundle) required version pinning
9.15. ADR-015: In-Memory State for BotAPI and Multiplayer Sessions
- Status: Implemented ✅
- Date: 2026-03-10
Context: Both the multiplayer service (rooms) and the BotAPI (game sessions, remote interop sessions) need to maintain short-lived state across multiple HTTP requests or Socket.IO events. Options: in-memory, Redis, MongoDB.
Decision:
Use in-memory Map-based stores (RoomManager, ActiveGamesStore, RemoteGameSessionsStore)
in both services. State is lost on container restart.
Rationale:
- Bot games and multiplayer rooms are short-lived (typically < 30 minutes)
- Adding Redis introduces a new infrastructure dependency and operational complexity
- MongoDB writes on every move would add latency and storage overhead for ephemeral data
- Course evaluation scale (< 50 concurrent sessions) does not require persistence
Consequences:
- ✅ Zero additional infrastructure: no Redis or extra MongoDB collections needed
- ✅ Sub-millisecond state access with no network round-trip
- ✅ Simple implementation: Map<string, Room> with helper methods
- ⚠ State lost on container restart; players must create new rooms after deployments
- ⚠ Cannot scale horizontally without a shared store
- Mitigation: Documented as TD1 (multiplayer) and acceptable limitation for BotAPI; Redis adapter is the documented upgrade path
9.16. ADR-016: Dedicated db.js with Test Database Isolation in Users Service
- Status: Implemented ✅
- Date: 2026-02-20
Context:
The users service needs to connect to MongoDB. Tests must not read from or write to the production
database. The MongoDB URI format (host1:port,host2:port/db) is not a valid URL, ruling out the
standard URL constructor for parsing.
Decision:
Centralise MongoDB connection in db.js with a buildConnectionUri() function that appends _test
to the database name when NODE_ENV === "test" using regex-based URI manipulation instead of
URL parsing.
// Append _test suffix without URL parsing (handles multi-host URIs)
if (/\/[^/?]+\?/.test(uri)) return uri.replace(/\/([^/?]+)\?/, '/$1_test?')
if (/\/[^/?]+$/.test(uri)) return uri.replace(/\/([^/?]+)$/, '/$1_test')
The vitest.globalSetup.js drops the _test database after all test suites complete.
Alternatives Considered:
- mongodb-memory-server always: Considered (fully isolated but slower startup; does not test against real MongoDB behavior)
- Separate .env.test file: Considered (requires manual configuration; the regex approach is automatic)
- URL constructor: Rejected, since multi-host MongoDB Atlas URIs (host1,host2/db) are not valid URLs
Consequences:
- ✅ Tests run against a real MongoDB instance (_test DB) without affecting production data
- ✅ Works with any MongoDB URI format including Atlas cluster URIs
- ✅ vitest.globalSetup.js ensures clean state between CI runs
- ⚠ Requires a running MongoDB instance for tests (CI provides one)
- ⚠ Regex-based URI manipulation could fail on unusual URI formats
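The suffix logic from the decision can be wrapped in a complete function for experimentation. This is a sketch rather than the contents of db.js: the two regular expressions are the ones quoted in the decision, while the nodeEnv parameter (used here instead of reading process.env directly) and the database name in the usage note are assumptions.

```typescript
// Illustrative version of buildConnectionUri(); nodeEnv is passed in so the
// function stays pure and easy to test.
function buildConnectionUri(uri: string, nodeEnv: string | undefined): string {
  if (nodeEnv !== "test") return uri;
  // Database name followed by a query string: /db?opts becomes /db_test?opts
  if (/\/[^/?]+\?/.test(uri)) return uri.replace(/\/([^/?]+)\?/, "/$1_test?");
  // Database name at the end of the URI: /db becomes /db_test
  if (/\/[^/?]+$/.test(uri)) return uri.replace(/\/([^/?]+)$/, "/$1_test");
  return uri;
}
```

For example, with a hypothetical database named yovi, mongodb://host1:27017,host2:27017/yovi?retryWrites=true is rewritten to end in /yovi_test?retryWrites=true. A URI with no database segment at all, such as mongodb://localhost:27017, would have its host segment suffixed instead, which is one of the unusual formats the last consequence warns about.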
9.17. ADR-017: Vitest as Test Runner for Node.js Services
- Status: Implemented ✅
- Date: 2026-02-15
Context: Node.js services need a test runner. Options considered: Jest (established, CommonJS-friendly), Vitest (fast, native ESM support), Mocha (flexible but requires more setup).
Decision:
Use Vitest (^4.0.x) for gateway, authentication, and botapi services (all ESM). Use Jest
(^29.x) for users and multiplayer services (CommonJS). Coverage is collected with
@vitest/coverage-v8 and jest --coverage respectively and reported to SonarCloud.
Rationale:
Gateway and authentication use "type": "module" (ESM). Jest has historically poor ESM support
requiring Babel transforms. Vitest natively supports ESM with no configuration. Users and multiplayer
use CommonJS (require()), where Jest is the natural choice.
Alternatives Considered:
- Vitest for all: Considered, but it would require a "type": "module" migration of users and multiplayer, which would have been a large refactor mid-project
- Jest for all: Rejected, since ESM support in Jest requires the --experimental-vm-modules flag and Babel configuration overhead
- Mocha: Rejected (no built-in coverage; requires additional plugins for assertions and mocking)
Consequences:
- ✅ Each service uses the test runner best suited to its module system
- ✅ Both produce LCOV coverage reports compatible with SonarCloud
- ⚠ Two different test runners in the same repository increase onboarding friction
- Mitigation: Consistent npm test and npm run test:coverage scripts across all services hide the underlying runner differences
9.18. ADR-018: ESM vs CommonJS Module System Split
- Status: Implemented ✅
- Date: 2026-01-20
Context:
Node.js supports two module systems: CommonJS (require()) and ES Modules (import/export). The
project started with CommonJS (users service) and progressively adopted ESM (gateway, authentication)
as newer services were added. This created a split.
Decision: Accept the mixed module system rather than migrating all services to ESM at once:
- CommonJS ("type": "commonjs" or no type field): users/, multiplayer/
- ESM ("type": "module"): gateway/, authentication/
- TypeScript compiled to CommonJS: botapi/ (TypeScript source, compiled to dist/ as CJS via tsconfig.json with "module": "Node16")
The users-service.js uses a hybrid approach: the main file uses ESM (import), but loads CommonJS
models via createRequire(import.meta.url) to bridge the module system gap.
Alternatives Considered:
- Migrate all to ESM: Considered, but Mongoose and some Express middleware had ESM compatibility issues at the time; migration risk was high mid-project
- Migrate all to CommonJS: Rejected, since gateway and auth were already written in ESM; rewriting would lose async error propagation benefits of Express 5 + ESM
Consequences:
- ✅ Each service works correctly with its chosen module system
- ✅ No breaking changes required to existing services
- ⚠ Inconsistency increases onboarding friction and makes copy-paste between services error-prone
- ⚠ createRequire bridge in users-service.js is unusual and requires a comment explaining why
- Mitigation: Each service has its own package.json clearly indicating module system; "type" field is explicit in all cases
9.19. ADR-019: express-prom-bundle for Prometheus Metrics in Node.js Services
- Status: Implemented ✅
- Date: 2026-03-01
Context:
Gateway and users services need to expose Prometheus metrics. Options: manual metric registration
with prom-client, or a middleware wrapper that auto-instruments all HTTP requests.
Decision:
Use express-prom-bundle as Express middleware in gateway and users services. Configuration:
promBundle({
includeMethod: true,
includePath: true,
includeStatusCode: true,
normalizePath: true,
})
normalizePath: true collapses parametric paths (e.g., /users/alice and /users/bob → /users/#val)
to prevent high-cardinality metric labels.
Alternatives Considered:
- Manual prom-client counters: Rejected (requires instrumenting every route handler individually; high boilerplate; easy to miss routes)
- OpenTelemetry: Considered (more powerful but significantly more complex to configure for a course project; overkill for three services)
Consequences:
- ✅ All HTTP requests automatically tracked with method, path, and status code labels
- ✅ normalizePath prevents cardinality explosion from user-specific paths
- ✅ Consistent metric names across gateway and users (http_requests_total, http_request_duration_seconds)
- ⚠ normalizePath may collapse paths that should be tracked separately
- ⚠ Metrics endpoint (/metrics) is publicly accessible, which is acceptable since it contains no sensitive data
9.20. ADR-020: React Context API for Global State (Theme and i18n)
- Status: Implemented ✅
- Date: 2026-03-01
Context: The webapp needs two pieces of global state shared across all components: the current language (EN/ES) and the current theme (dark/light). Options: prop drilling, React Context, or a state management library (Redux, Zustand, Jotai).
Decision:
Use the React Context API with custom providers (ThemeProvider, I18nProvider) wrapping the
entire application at the main.tsx level. Each provider exposes a custom hook (useTheme(),
useI18n()) that components call directly.
Alternatives Considered:
- Prop drilling: Rejected (theme and language are needed in virtually every component; prop drilling would pollute every component signature)
- Redux Toolkit: Rejected (significant boilerplate for two simple global values; no async state needed)
- Zustand: Considered (lightweight, but adds a dependency; React Context is sufficient for synchronous, rarely-changing global state)
Consequences:
- ✅ Zero additional dependencies
- ✅ Clean API: const { t } = useI18n() and const { theme, toggleTheme } = useTheme()
- ✅ Providers are independently testable
- ⚠ Context re-renders all consumers on every change, which is acceptable since theme/language changes are rare
- ⚠ useTheme() and useI18n() return fallback values if called outside their provider, preventing runtime errors but potentially silencing configuration mistakes
9.21. ADR-021: Room Code Generation Without Ambiguous Characters
- Status: Implemented ✅
- Date: 2026-03-10
Context: Multiplayer rooms need unique, human-typeable codes that players share out of band (chat, voice). Codes must be short, unambiguous, and easy to read aloud.
Decision:
Generate 6-character codes from a custom alphabet "ABCDEFGHJKLMNPQRSTUVWXYZ23456789" that
excludes visually ambiguous characters: O (confused with 0), I (confused with 1),
0 (zero), and 1 (one). The generator retries up to 1000 times to guarantee uniqueness against
existing room codes.
Alternatives Considered:
- UUID: Rejected (36 characters; impossible to type from memory or dictate verbally)
- Sequential numeric ID: Rejected (predictable; allows enumeration of all active rooms)
- Full alphabet including ambiguous chars: Rejected, since O/0 and I/1 confusion causes UX friction when sharing codes verbally
Consequences:
- ✅ Codes are short (6 chars), unique, and unambiguous when read aloud
- ✅ 32^6 ≈ 1.07 billion possible codes, so there is no practical collision risk at current scale
- ✅ generateUniqueRoomCode() retries up to 1000 times, which is effectively guaranteed to succeed for any reasonable number of active rooms
- ⚠ Codes are uppercase only; user input is normalized with toUpperCase()
9.22. ADR-022: Debounced Search in Social Features
- Status: Implemented ✅
- Date: 2026-03-18
Context: The Social page provides a live user search. Without rate limiting, every keystroke would trigger an API call to the users service, causing unnecessary load and poor UX for slow typists.
Decision:
Implement a 400ms debounce on the search input using a useRef<NodeJS.Timeout> timer in the
Social.tsx component. The search request is only sent after the user stops typing for 400ms.
Minimum query length is 1 character.
Alternatives Considered:
- No debounce (fire on every keystroke): Rejected, since the users service and MongoDB would receive a request per character typed
- Search-on-submit (button click): Rejected, since it gives poor UX; most modern search UIs are live
- Throttle instead of debounce: Rejected, since debounce fits search better: the goal is to wait until the user finishes typing, not to limit requests to one per N ms
Consequences:
- ✅ Significant reduction in API calls during normal typing (e.g., typing "alice" = 1 request vs 5)
- ✅ Zero additional dependencies: implemented with native setTimeout/clearTimeout
- ⚠ 400ms delay is noticeable on fast connections, an acceptable trade-off for reduced server load
- ⚠ Debounce timer must be cleared on component unmount to avoid memory leaks; handled via the useRef pattern
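The debounce behaviour can be sketched outside React as a small helper. This is an illustration, not the code in Social.tsx: createDebouncer, Scheduler, and timeoutScheduler are hypothetical names, and the scheduler is injectable only so the timing can be exercised without waiting.

```typescript
type Cancel = () => void;
// A scheduler runs `callback` after `delayMs` and returns a function that cancels it.
type Scheduler = (callback: () => void, delayMs: number) => Cancel;

// Default scheduler built on the native timers mentioned in the ADR.
const timeoutScheduler: Scheduler = (callback, delayMs) => {
  const handle = setTimeout(callback, delayMs);
  return () => clearTimeout(handle);
};

function createDebouncer<T>(
  action: (value: T) => void,
  delayMs: number,
  schedule: Scheduler = timeoutScheduler
) {
  let cancelPending: Cancel | null = null;
  return {
    // Restart the timer on every call; only the last value reaches `action`.
    call(value: T): void {
      if (cancelPending) cancelPending();
      cancelPending = schedule(() => {
        cancelPending = null;
        action(value);
      }, delayMs);
    },
    // Call on unmount so no timer fires after the component is gone.
    cancel(): void {
      if (cancelPending) cancelPending();
      cancelPending = null;
    },
  };
}
```

In a component, call(query) would run from the input's change handler with a 400 ms delay, and cancel() from the effect cleanup.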
9.23. Summary of All Architectural Decisions
| ID | Decision | Date | Status |
|---|---|---|---|
| ADR-001 | Extensible game modes: PvM, PvP, local PvP | 2026-01-15 | ✅ |
| ADR-002 | MongoDB + Mongoose for user data | 2026-01-20 | ✅ |
| ADR-003 | Eight-service microservices (evolved from three) | 2026-01-15 | ✅ updated |
| ADR-004 | Nginx as public reverse proxy with WS upgrade | 2026-01-20 | ✅ updated |
| ADR-005 | Strategy Pattern for AI: six | 2026-01-25 | ✅ updated |
| ADR-006 | Dedicated authentication microservice with bcrypt | 2026-02-24 | ✅ |
| ADR-007 | Axum as HTTP framework for Rust game engine | 2026-02-01 | ✅ |
| ADR-008 | Custom i18n: zero dependencies, EN/ES | 2026-03-01 | ✅ |
| ADR-009 | Socket.IO for real-time multiplayer in dedicated service | 2026-03-10 | ✅ |
| ADR-010 | YEN as universal game state format | 2026-01-20 | ✅ |
| ADR-011 | Role-based access control via JWT claims | 2026-03-15 | ✅ |
| ADR-012 | Union-Find for win condition detection | 2026-02-10 | ✅ |
| ADR-013 | Barycentric coordinate system for the game board | 2026-02-10 | ✅ |
| ADR-014 | Express 5 for Node.js services | 2026-02-15 | ✅ |
| ADR-015 | In-memory state for BotAPI and multiplayer sessions | 2026-03-10 | ✅ |
| ADR-016 | Test database isolation via URI suffix | 2026-02-20 | ✅ |
| ADR-017 | Vitest (ESM) + Jest (CJS) as test runners | 2026-02-15 | ✅ |
| ADR-018 | ESM vs CommonJS module system split | 2026-01-20 | ✅ |
| ADR-019 | | 2026-03-01 | ✅ |
| ADR-020 | React Context API for theme and i18n global state | 2026-03-01 | ✅ |
| ADR-021 | Room code generation without ambiguous characters | 2026-03-10 | ✅ |
| ADR-022 | 400ms debounced search in social features | 2026-03-18 | ✅ |
10. Quality Requirements
This section makes the quality goals from Section 1 concrete, measurable, and testable through a quality tree and specific scenarios following the ATAM approach.
10.1. Quality Tree
10.2. Quality Scenarios
10.2.1. QS-01: Bot Move Calculation Performance
| Aspect | Description |
|---|---|
| ID | QS-01 |
| Quality Goal | Performance / Response Time |
| Stimulus | User makes a move in a PvM game against |
| Environment | Normal operation, board size ≤ 11×11 |
| Artifact | Gamey service ( |
| Response | System validates player move, computes bot response, and returns updated YEN |
| Response Measure | 95% of responses within 2 seconds ( |
| Verification | k6 load test |
10.2.2. QS-02: Login Response Time Under Load
| Aspect | Description |
|---|---|
| ID | QS-02 |
| Quality Goal | Performance / Response Time |
| Stimulus | 50 concurrent users log in simultaneously |
| Environment | Production deployment under load test conditions |
| Artifact | Gateway → Auth Service → Users Service |
| Response | All users receive a valid JWT token |
| Response Measure | |
| Verification | k6 load test |
10.2.3. QS-03: Concurrent User Throughput
| Aspect | Description |
|---|---|
| ID | QS-03 |
| Quality Goal | Performance / Throughput |
| Stimulus | 50 concurrent users registering simultaneously |
| Environment | Production deployment |
| Artifact | Gateway → Auth → Users → MongoDB |
| Response | All users are registered or receive a meaningful error (duplicate username) |
| Response Measure | |
| Verification | k6 load test |
10.2.4. QS-04: New User Learnability
| Aspect | Description |
|---|---|
| ID | QS-04 |
| Quality Goal | Usability / Learnability |
| Stimulus | A first-time user wants to start a game against the AI |
| Environment | User has no prior knowledge of Game Y or YOVI |
| Artifact | Webapp: Home, SelectDifficulty, Game pages |
| Response | User navigates to a running game without external instructions |
| Response Measure | 80% of new users complete first move within 2 minutes in usability testing |
| Verification | Usability test with 5 new participants; task completion time recorded |
10.2.5. QS-05: Internationalization Completeness
| Aspect | Description |
|---|---|
| ID | QS-05 |
| Quality Goal | Usability / Internationalization |
| Stimulus | User switches language to Spanish (or English) |
| Environment | Any page of the webapp |
| Artifact | |
| Response | All user-facing strings are displayed in the selected language |
| Response Measure | 100% of translation keys present in both EN and ES; zero hardcoded UI strings |
| Verification | Automated check for missing keys; TypeScript |
10.2.6. QS-06: Admin Panel Usability
| Aspect | Description |
|---|---|
| ID | QS-06 |
| Quality Goal | Usability / Admin UX |
| Stimulus | An administrator needs to remove a user account from the platform |
| Environment | Admin user logged in with valid admin JWT |
| Artifact | Webapp |
| Response | Admin locates the user, confirms deletion, and the account is permanently removed with all associated data |
| Response Measure | Task completed in < 3 clicks from the admin panel; confirmation dialog prevents accidental deletion |
| Verification | Manual usability walkthrough; E2E test covering admin delete flow |
10.2.7. QS-07: AI Strategy Extensibility
| Aspect | Description |
|---|---|
| ID | QS-07 |
| Quality Goal | Maintainability / Extensibility |
| Stimulus | Developer needs to add a new AI bot strategy (e.g., RAVE MCTS) |
| Environment | Development environment, existing codebase |
| Artifact | |
| Response | New strategy implemented and accessible via HTTP API |
| Response Measure | Implementation requires only: (1) new struct implementing |
| Verification | Code review confirms no existing file was modified; new bot appears in |
10.2.8. QS-08: Test Coverage
| Aspect | Description |
|---|---|
| ID | QS-08 |
| Quality Goal | Maintainability / Testability |
| Stimulus | Developer merges a pull request |
| Environment | CI pipeline (GitHub Actions) |
| Artifact | All services |
| Response | SonarCloud analyses coverage; PR is blocked if below threshold |
| Response Measure | Code coverage > 80% across all services as enforced by SonarCloud quality gate |
| Verification | Automated coverage reports in CI; SonarCloud dashboard badge in README |
10.2.9. QS-09: System Availability
| Aspect | Description |
|---|---|
| ID | QS-09 |
| Quality Goal | Reliability / Availability |
| Stimulus | Evaluators and users access the platform during the evaluation period |
| Environment | Production deployment on cloud VM |
| Artifact | All services via Nginx |
| Response | System responds to requests normally |
| Response Measure | 99% uptime during evaluation period; monitored via Grafana "Yovi Services Overview" dashboard |
| Verification | Uptime tracking via Prometheus; Grafana alert on sustained error rate spike |
10.2.10. QS-10: Crash Recovery
| Aspect | Description |
|---|---|
| ID | QS-10 |
| Quality Goal | Reliability / Fault Tolerance |
| Stimulus | A service container crashes (e.g., gamey OOM, multiplayer panic) |
| Environment | Production, during active usage |
| Artifact | Any application container |
| Response | Docker detects the failure and restarts the container automatically |
| Response Measure | Service restored within 30 seconds; other services unaffected during restart |
| Verification | Chaos test: |
10.2.11. QS-11: Data Durability
| Aspect | Description |
|---|---|
| ID | QS-11 |
| Quality Goal | Reliability / Data Durability |
| Stimulus | The users service container or VM restarts unexpectedly |
| Environment | Post-restart |
| Artifact | MongoDB (Atlas), Users Service |
| Response | All user profiles, match history, friends, and notifications are intact |
| Response Measure | Zero data loss: 100% of committed records recoverable |
| Verification | Restart container, verify data via |
10.2.12. QS-12: Token Expiry Enforcement
| Aspect | Description |
|---|---|
| ID | QS-12 |
| Quality Goal | Security / Authentication |
| Stimulus | A user with an expired JWT attempts to access a protected endpoint |
| Environment | Token issued more than |
| Artifact | Auth Service ( |
| Response | Request is rejected and user is redirected to the login page |
| Response Measure | 100% of expired token requests return 401; webapp clears |
| Verification | Integration test with manually expired token; E2E test simulating session expiry |
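The expiry rule in QS-12 can be illustrated with a small sketch. It assumes only the standard JWT exp claim (seconds since the Unix epoch); signature verification is deliberately left out, and the names TokenClaims, isExpired and protectedRouteStatus are hypothetical rather than taken from the Auth Service.

```typescript
// Assumed claim shape: only the standard `exp` claim is modelled here.
interface TokenClaims {
  exp: number; // expiry as seconds since the Unix epoch
}

// A token must be rejected at or after its expiry instant.
function isExpired(claims: TokenClaims, nowMs: number): boolean {
  return nowMs >= claims.exp * 1000;
}

// Hypothetical gate: 401 for a missing or expired token, 200 otherwise.
function protectedRouteStatus(
  claims: TokenClaims | null,
  nowMs: number,
): 200 | 401 {
  if (claims === null || isExpired(claims, nowMs)) {
    return 401;
  }
  return 200;
}
```

An integration test along the lines of the Verification row could build a token whose exp lies in the past and assert that the protected endpoint answers 401.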
10.2.13. QS-13: Admin Role Enforcement
| Aspect | Description |
|---|---|
| ID | QS-13 |
| Quality Goal | Security / Authorization (RBAC) |
| Stimulus | A regular user attempts to access an admin endpoint directly (bypassing the UI) |
| Environment | Production; user has a valid non-admin JWT |
| Artifact | Gateway admin route middleware |
| Response | Request is rejected with 403 Forbidden |
| Response Measure | 100% of non-admin requests to admin endpoints return 403; no data is returned |
| Verification | Integration test sending valid user token to |
10.2.14. QS-14: Bot API Interoperability
| Aspect | Description |
|---|---|
| ID | QS-14 |
| Quality Goal | Interoperability / Bot API |
| Stimulus | An external bot creates a game, plays three moves, and queries the state |
| Environment | Production endpoint |
| Artifact | BotAPI service |
| Response | All three operations succeed; YEN state is consistent across requests |
| Response Measure | 100% of valid YEN requests handled correctly; |
| Verification | API tests in |
10.2.15. QS-15: Cross-Team Bot-vs-Bot Match
| Aspect | Description |
|---|---|
| ID | QS-15 |
| Quality Goal | Interoperability / Cross-team |
| Stimulus | Our bot plays a complete game against a rival team’s API using the remote interop client |
| Environment | Both YOVI and the rival API deployed and accessible |
| Artifact | BotAPI remote interop service; |
| Response | Session created, turns played alternately until game finishes; final status returned |
| Response Measure | Game completes without error; |
| Verification | Manual test during course interop session; |
10.3. Traceability to Quality Goals
| Quality Goal (Section 1) | Related Scenarios | Key Metrics |
|---|---|---|
| Functionality | QS-01, QS-06, QS-07, QS-14, QS-15 | Correct game rules; admin ops; extensible AI; bot API compliance |
| Usability | QS-04, QS-05, QS-06 | New user < 2min; 100% i18n coverage; admin task < 3 clicks |
| Reliability & Availability | QS-09, QS-10, QS-11 | 99% uptime; crash recovery < 30s; zero data loss |
| Modularity & Maintainability | QS-07, QS-08 | New AI strategy = 2 changes; coverage > 80% |
| Security | QS-12, QS-13 | 100% expired tokens rejected; 100% non-admin admin routes blocked |
| Testability | QS-08 | Coverage > 80% enforced by SonarCloud |
| Interoperability | QS-14, QS-15 | Valid YEN always accepted; cross-team game completes |
| Performance | QS-01, QS-02, QS-03 | p95 move < 2s; p95 login < 1.5s; 50 concurrent users |
11. Risks and Technical Debts
Each item is assessed using a risk matrix:
- Probability: Low (1), Medium (2), High (3)
- Impact: Low (1), Medium (2), High (3)
- Risk Score = Probability × Impact (1–9)
Items with score ≥ 6 require active mitigation. Items with score ≥ 8 are critical.
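The scoring rule above can be expressed as a short sketch. The function names and the "monitor" label for scores below 6 are assumptions made for illustration; the thresholds (≥ 6 active mitigation, ≥ 8 critical) come from the text.

```typescript
type Level = 1 | 2 | 3; // Low, Medium, High

// "monitor" is a hypothetical label for items below the mitigation threshold.
type RiskClass = "critical" | "active-mitigation" | "monitor";

function riskScore(probability: Level, impact: Level): number {
  return probability * impact;
}

function classifyRisk(score: number): RiskClass {
  if (score >= 8) return "critical";
  if (score >= 6) return "active-mitigation";
  return "monitor";
}
```

Because both scales run from 1 to 3, the only reachable scores are 1, 2, 3, 4, 6 and 9, so in practice only a High × High item (score 9) is classified as critical.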
11.1. Technical Risks
| ID | Risk Description | Prob | Impact | Score | Mitigation Strategy |
|---|---|---|---|---|---|
| R1 | TypeScript–Rust integration: Serialization bugs in the YEN format between Node.js and Rust services (e.g., field naming, null vs. undefined, integer overflow) | 2 (M) | 3 (H) | 6 | Versioned JSON interface (YEN); contract tests in BotAPI ( |
| R2 | Game Y rules misunderstanding: Incorrect Union-Find win detection (e.g., off-by-one in barycentric coordinates, wrong side-touch detection) | 2 (M) | 3 (H) | 6 | Property-based tests with |
| R3 | Multiplayer room state loss: In-memory | 3 (H) | 2 (M) | 6 | Accepted trade-off for current scope; rooms are short-lived; players can create a new room; documented in Section 8; Redis adapter is the documented upgrade path for horizontal scaling |
| R4 | Socket.IO scalability: Multiplayer service cannot be scaled horizontally without a shared session store; a single instance becomes a bottleneck under high PvP load | 2 (M) | 2 (M) | 4 | Current deployment is single-instance; Redis Socket.IO adapter documented as future upgrade; acceptable for course evaluation scale (< 50 concurrent users) |
| R5 | Bot API remote interop host allowlist: The hardcoded allowlist in | 2 (M) | 2 (M) | 4 | Allowlist is configurable in source; update process is documented; fallback: disable check for trusted environments via environment variable |
| R6 | AI performance on large boards: | 2 (M) | 2 (M) | 4 | Performance budget enforced in k6 threshold; board size is user-configurable with no enforced maximum; |
| R7 | MongoDB Atlas dependency: External managed database; outage would take down the entire platform since users, auth, and game results all depend on it | 1 (L) | 3 (H) | 3 | Atlas provides 99.95% SLA with automatic failover; |
| R8 | JWT secret exposure: If | 1 (L) | 3 (H) | 3 | Secret stored in GitHub Secrets only; never committed; short token lifetime ( |
| R9 | Nginx single point of failure: All external traffic routes through a single Nginx container; a crash takes down the entire public-facing system | 1 (L) | 3 (H) | 3 | Docker |
| R10 | Scope creep (optional features delay evaluation deliverables): Social features, admin panel, and cross-team interop were added incrementally; each adds surface area for bugs and testing debt | 2 (M) | 2 (M) | 4 | MoSCoW prioritisation; Definition of Done enforced per feature; SonarCloud coverage gate prevents shipping undertested features |
11.2. Technical Debts
| ID | Debt Description | Impact if Not Fixed | Status / Planned Resolution |
|---|---|---|---|
| TD1 | Multiplayer room state is in-memory only: Active PvP rooms are lost if the multiplayer container restarts. Players must create a new room after any service interruption. | Medium: poor UX during deployments or crashes; unacceptable for production at scale | Accepted for current scope. Future: Redis Socket.IO adapter for persistent room state across restarts and horizontal scaling. |
| TD2 | BotAPI remote host allowlist is hardcoded: | Low: only affects cross-team interop sessions; can be unblocked quickly | Planned: move allowlist to an environment variable ( |
| TD3 | Users service | Low: internal service not consumed by external developers; Swagger UI is misleading | Planned: update |
| TD4 | No pagination on most list endpoints: | Low-Medium: acceptable for current data volumes; degrades as user base grows | Planned: add |
| TD5 | Plain-text password comparison: ~~The authentication service compared passwords in plain text~~ | ~~High: critical security vulnerability~~ | ✅ Resolved (2026-03-05): bcrypt hashing implemented in |
| TD6 | Multiplayer game results not persisted: PvP game outcomes are not saved to MongoDB. The users service | Medium: users expect their multiplayer wins to count; ranking is PvM-only | Planned: multiplayer service calls |
| TD7 | Hint system uses alfa_beta_bot regardless of selected difficulty: | Low: hints still work and are useful; inconsistency may confuse advanced users | Planned: pass the selected bot ID as a parameter to |
12. Load Testing
This section documents the load testing strategy, tooling, test scenarios, thresholds, and how to execute
the test suite. Load tests are a first-class concern in YOVI (see Section 2, Technical Constraints) and
are maintained alongside the production code in tests/load/.
12.1. Tooling
YOVI uses k6 (https://k6.io) for load testing. k6 provides:
- JavaScript-based test scripts with a familiar API
- Built-in virtual user (VU) management with ramping stages
- Custom metrics (Trend, Rate) for fine-grained measurement
- Threshold definitions that fail the test if SLAs are breached
- JSON result export for archiving and trend analysis
- setup()/teardown() lifecycle hooks for prerequisite data
12.2. Test Scenarios
Three scenarios are defined, each targeting a different critical user flow:
12.2.1. Scenario 1: User Registration (register.js)
Simulates 50 concurrent users registering new accounts against POST /register.
| Parameter | Value |
|---|---|
| Target endpoint | |
| VU ramp | 0 → 50 VUs over 10s; hold 50 VUs for 30s; ramp down over 10s |
| Custom metrics | |
| Thresholds | |
| Request body | Unique username per VU+timestamp: |
| Expected response | HTTP 200 or 201; no |
export const options = {
  scenarios: {
    registration_load: {
      executor: "ramping-vus",
      startVUs: 0,
      stages: [
        { duration: "10s", target: 50 },
        { duration: "30s", target: 50 },
        { duration: "10s", target: 0 },
      ],
    },
  },
  thresholds: {
    register_duration: ["p(95)<2000"],
    register_success: ["rate>0.95"],
    http_req_failed: ["rate<0.05"],
  },
};
12.2.2. Scenario 2: User Login (login.js)
Simulates 50 concurrent users logging in with pre-seeded credentials against POST /login.
| Parameter | Value |
|---|---|
| Target endpoint | |
| VU ramp | 0 → 50 VUs over 10s; hold 50 VUs for 30s; ramp down over 10s |
| Custom metrics | |
| Thresholds | |
| setup() hook | Creates 10 seed users ( |
| VU behaviour | Each VU picks a seed user in round-robin: |
| Expected response | HTTP 200; |
export const options = {
  thresholds: {
    login_duration: ["p(95)<1500"],
    login_success: ["rate>0.95"],
    http_req_failed: ["rate<0.05"],
  },
};
12.2.3. Scenario 3: Start Bot Game (start_game.js)
Simulates 20 concurrent users starting a new bot game against POST /game/new.
| Parameter | Value |
|---|---|
| Target endpoint | |
| VU ramp | 0 → 20 VUs over 10s; hold 20 VUs for 30s; ramp down over 10s |
| Custom metrics | |
| Thresholds | |
| setup() hook | Registers |
| Request body | |
| Expected response | HTTP 200 or 201; |
Note: The game start threshold (p(95) < 3000ms) is more lenient than auth endpoints because the Rust
game engine initialises board state synchronously on each request.
12.3. Running the Tests
12.3.1. Prerequisites
- k6 installed: https://k6.io/docs/getting-started/installation/
- YOVI services running (locally or pointing to production)
12.3.2. Run a single scenario
# Against local deployment (default)
k6 run tests/load/register.js
# Against production
k6 run -e BASE_URL=https://yovi.13.63.89.84.sslip.io/api tests/load/login.js
12.3.3. Run all scenarios sequentially
Two runner scripts are provided:
Linux / macOS:
chmod +x tests/load/run_load_tests.sh
./tests/load/run_load_tests.sh
# Against production:
./tests/load/run_load_tests.sh https://yovi.13.63.89.84.sslip.io/api
Windows (PowerShell):
.\tests\load\run_load_tests.ps1
# Against production:
.\tests\load\run_load_tests.ps1 -BaseUrl https://yovi.13.63.89.84.sslip.io/api
The runner scripts execute all three scenarios sequentially and save results as JSON and plain text logs
to tests/load/results/ with a timestamp suffix (e.g., register_20260415_143022.json).
12.4. Results Interpretation
A k6 run produces a summary table on stdout. Key metrics to check:
| Metric | Threshold | What to look for |
|---|---|---|
| | < 2000ms | High p95 indicates database pressure or auth service saturation |
| | < 1500ms | Login is faster than register (no bcrypt hash write); high p95 indicates auth or users service bottleneck |
| | < 3000ms | High p95 may indicate gamey is CPU-bound on bot initialisation for size 7 |
| | < 5% (< 10% for game) | Any failures above threshold indicate service errors under load; check service logs |
| | > 95% | Failures may indicate duplicate username collisions (expected ~5%) or service errors |
| | > 95% | Failures indicate seed users were not created or credential mismatch |
A ✓ next to each threshold in the k6 summary means the SLA was met. A ✗ means the threshold was breached and the test suite should be considered failed.
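To make the p95 thresholds concrete, the following sketch computes a percentile over raw latency samples and applies a limit in the style of "p(95)<1500". It uses the nearest-rank method, which is an assumption for illustration; k6 may interpolate differently, so results can differ slightly from the k6 summary. The function names are hypothetical.

```typescript
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("percentile must not exceed 100");
  }
  return value;
}

// Mirrors a threshold expression such as "p(95)<1500".
function p95Below(samplesMs: number[], limitMs: number): boolean {
  return percentile(samplesMs, 95) < limitMs;
}
```

With 20 samples, a single slow outlier does not breach a p95 threshold under this method, while two slow requests out of 20 do, which is why a p95 limit tolerates rare spikes but not a sustained slow tail.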
12.5. Architecture of the Load Test Suite
12.6. Custom Metrics Detail
Each scenario defines two custom k6 metrics:
| Scenario | Metric | Type | What it measures |
|---|---|---|---|
| Register | | Trend | End-to-end latency of |
| Register | | Rate | Proportion of requests that returned HTTP 2xx without an |
| Login | | Trend | End-to-end latency of |
| Login | | Rate | Proportion of requests that returned HTTP 200 with a |
| Start Game | | Trend | End-to-end latency of |
| Start Game | | Rate | Proportion of requests that returned HTTP 2xx with a |
12.7. Known Limitations
- No multiplayer load test: Socket.IO load testing requires a dedicated k6 xk6-websockets extension or a separate tool (e.g., Artillery). Not included in the current suite.
- No admin endpoint load test: Admin operations are low-frequency by design and not included.
- Seed user collisions: The login test creates 10 seed users before the VUs start. If the test is run repeatedly without cleanup, 409 Conflict responses are expected and ignored in setup().
- Production rate limits: Running the full suite against the production URL (https://yovi.13.63.89.84.sslip.io/api) may impact real users during the test window. Prefer running against a local or staging deployment for development purposes.
13. API Reference
This section provides a centralised reference of all HTTP endpoints exposed by every YOVI service.
Endpoints are grouped by service. All internal services communicate over the Docker bridge network
(monitor-net); only the routes listed under Gateway and BotAPI are reachable by external
clients through Nginx.
For the full interactive OpenAPI specification of the Bot API, see botapi/src/openapi/openapi.yaml.
13.1. Gateway (/api/*)
The gateway is the single entry point for all webapp traffic. All routes listed here are accessible
at https://yovi.13.63.89.84.sslip.io/api/<route> in production. JWT is required on all protected
routes unless stated otherwise.
13.1.1. Authentication Routes
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | None | | Validates payload, hashes password with bcrypt, creates user, returns JWT. Forwards to Auth Service. |
| | | None | | Verifies bcrypt hash, returns JWT with role claim. Forwards to Auth Service. |
| | | Bearer JWT | - | Validates JWT signature and expiry. Returns decoded claims |
13.1.2. User Routes
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Bearer JWT | - | Returns public profile (no password): username, email, realName, bio, location, preferredLanguage, friends count. |
| | | Bearer JWT | | Updates editable profile fields. Only allowed fields are applied ( |
| | | Bearer JWT | - | Deletes own account and all associated game results and notifications. |
13.1.3. Game Routes
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Bearer JWT | | Creates a new empty game board. Returns initial YEN state. Forwards to Gamey. |
| | | Bearer JWT | | Applies player move and returns bot response. |
| | | Bearer JWT | | Returns a suggested move for the current position using |
13.1.4. Game Result Routes
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Bearer JWT | | Saves a game result. |
| | | Bearer JWT | - | Returns all game results for a user, newest first. |
| | | Bearer JWT | - | Returns top-10 users by wins. MongoDB aggregation pipeline. |
13.1.5. Social Routes
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Bearer JWT | | Searches users by username or realName. Returns matching user profiles (no passwords). |
| | | Bearer JWT | - | Returns the accepted friends list of a user. |
| | | Bearer JWT | - | Sends a friend request to |
| | | Bearer JWT | - | Accepts a friend request from |
| | | Bearer JWT | - | Rejects a friend request from |
| | | Bearer JWT | - | Removes |
13.1.6. Notification Routes
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Bearer JWT | - | Returns all notifications for the authenticated user, newest first. Username extracted from JWT. |
| | | Bearer JWT | - | Marks notification |
13.1.7. Admin Routes
Admin routes require a valid JWT with role: "admin". Regular user tokens receive 403 Forbidden.
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Admin JWT | | Returns paginated list of all registered users with profile and role information. |
| | | Admin JWT | | Grants or revokes admin privileges on the target user account. |
| | | Admin JWT | - | Deletes all game results for |
| | | Admin JWT | - | Permanently deletes the user account and all associated data (game results, notifications). Also removes the user from all other users' friend lists via |
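The admin-route rule stated at the top of this subsection (role "admin" required, regular tokens receive 403) can be sketched as a pure function. The claim shape and the names JwtClaims and adminGateStatus are assumptions for illustration; the gateway's actual middleware is not shown in this document, and the 401 branch for a missing token follows the behaviour described in QS-12.

```typescript
// Assumed claim shape; the documentation confirms only that the JWT carries
// a role claim and that the username can be extracted from it.
interface JwtClaims {
  username: string;
  role: string; // "admin" for administrators; other values are hypothetical
}

type AdminGateStatus = 200 | 401 | 403;

// Hypothetical gate, applied after signature and expiry checks have passed.
function adminGateStatus(claims: JwtClaims | null): AdminGateStatus {
  if (claims === null) {
    return 401; // missing or invalid token
  }
  if (claims.role !== "admin") {
    return 403; // valid token, insufficient role
  }
  return 200;
}
```

Keeping the decision in a pure function of the decoded claims makes the QS-13 integration test straightforward: a valid non-admin token must always map to 403.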
13.1.8. Multiplayer REST Routes (via Gateway)
| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| | | Bearer JWT | | Creates a new room. Calls Gamey for initial YEN. Returns room code and player color ( |
| | | Bearer JWT | | Joins an existing room by code. Returns room state and player color ( |
| | | Bearer JWT | | Returns current room state and the requesting user’s color. |
| | | Bearer JWT | | Submits a move. Validates turn order, forwards to Gamey, emits |
| | | Bearer JWT | | Removes player from room. Emits |
13.2. Gamey — Game Engine (gamey:4000)
Internal service — not directly accessible from outside the Docker network. Called by Gateway, Multiplayer, and BotAPI.
| Method | Path | Body | Description |
|---|---|---|---|
| | | - | Health check. Returns |
| | | - | Prometheus metrics endpoint. Scraped by Prometheus every 15 seconds. |
| | | | Creates a new empty game. Returns initial YEN with all |
| | | YEN body | Checks current win condition without applying a move. Returns |
| | | | Applies player move at |
| | | | Applies a PvP move at |
| | | YEN body | Returns the bot’s chosen move coordinates without applying it to the board. Returns |
Available bot_id values:
| bot_id | Algorithm | Difficulty |
|---|---|---|
| | Random valid move | - |
| | Side connection heuristic | Easy |
| | Minimax (depth 3) | Medium |
| | Minimax + alpha-beta pruning | Hard |
| | Monte Carlo Tree Search | Expert |
| | MCTS (more iterations) | Extreme |
13.3. Authentication Service (authentication:5000)
Internal service — called exclusively by the Gateway.
| Method | Path | Body | Description |
|---|---|---|---|
| | | | Validates payload, hashes password, calls Users Service to create user, returns JWT. |
| | | | Fetches user from Users Service, bcrypt compares password, returns JWT with role claim. |
| | | Authorization header | Verifies JWT signature and expiry. Returns decoded claims. |
| | | - | Returns |
13.4. Users Service (users:3000)
Internal service — called by Gateway and Authentication Service. Swagger UI available at
http://users:3000/api-docs (note: currently documents only a subset of endpoints — tracked as TD3).
| Method | Path | Body / Params | Description |
|---|---|---|---|
| | | | Creates a new user with bcrypt-hashed password and welcome notification. |
| | | - | Returns user including hashed password (used by Auth Service for bcrypt comparison). |
| | | - | Returns user profile without password field. |
| | | | Updates allowed profile fields using |
| | | - | Deletes user and all associated game results and notifications. |
| | | - | Returns the |
| | | | Adds |
| | | | Removes |
| | | | Removes |
| | | - | Bidirectional |
| | | - | Returns all notifications for the user, sorted newest first. |
| | | - | Sets |
| | | | Saves a game result document. |
| | | - | Returns all game results for the user, newest first. |
| | | - | Top-10 users by wins via MongoDB aggregation. |
| | | | Text search on username and realName fields. |
| | | | (Admin) Paginated list of all users. |
| | | | (Admin) Grants or revokes admin role. |
| | | - | (Admin) Deletes all game results for user. |
| | | - | (Admin) Deletes user account and all associated data. |
| | | - | Returns |
| | | - | Prometheus metrics endpoint. |
13.5. Multiplayer Service (multiplayer:7000)
Internal service — REST endpoints called by Gateway. Socket.IO server accessible via Nginx at
/socket.io/*. See Section 8 (Cross-cutting Concepts) for the full Socket.IO
event catalogue.
| Method | Path | Body | Description |
|---|---|---|---|
| | | - | Returns |
| | | - | Returns serialized room state (no socket IDs). |
| | | | Creates room, calls Gamey for initial YEN, returns |
| | | | Joins room, sets status to |
| | | | Returns current room state and player color. |
| | | | Validates turn, calls Gamey PvP move, emits |
| | | | Removes player, emits |
13.6. BotAPI — Interoperability Service (/interop/*)
Public service — accessible at https://yovi.13.63.89.84.sslip.io/interop via Nginx.
Full OpenAPI 3.1 spec: openapi.yaml.
13.6.1. Local Game Endpoints (Server Mode)
External bots use these endpoints to play against YOVI’s bots.
| Method | Path | Body / Params | Description |
|---|---|---|---|
| | | - | Returns |
| | | | Creates a new active game session. Returns |
| | | - | Returns current state of an active game. |
| | | | Submits opponent move (detects single added move vs. stored YEN), gets bot response. Returns updated state. |
| | | | Stateless move: given a YEN position, returns bot move coordinates without storing state. |
13.6.2. Remote Session Endpoints (Client Mode)
YOVI’s bots use these endpoints to play against rival teams' APIs.
| Method | Path | Body | Description |
|---|---|---|---|
| | | | Connects to an existing game on a rival API. Creates a local session tracking the remote game. |
| | | | Creates a new game on a rival API and stores a local session. |
| | | - | Returns stored remote session information including last known state. |
| | | - | Syncs remote state. If it is our turn: chooses move via Gamey, applies to remote API. Returns |
13.7. Error Response Format
All Node.js services return errors in the following format:
{
  "ok": false,
  "error": "Human-readable error message"
}
The BotAPI (TypeScript) uses a slightly different format matching the OpenAPI spec:
{
  "code": "NOT_FOUND",
  "message": "game abc123 not found"
}
Standard HTTP status codes apply across all services. See Section 8 (Cross-cutting Concepts) for the complete status code reference.
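A client that talks to both the gateway and the BotAPI has to handle both error shapes. The sketch below shows one way to distinguish them with type guards; the interface and function names are hypothetical, and only the field names (ok, error, code, message) come from the two formats shown above.

```typescript
// Shape returned by the Node.js services.
interface ServiceError {
  ok: false;
  error: string;
}

// Shape returned by the BotAPI.
interface BotApiError {
  code: string;
  message: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isServiceError(value: unknown): value is ServiceError {
  return (
    isRecord(value) &&
    value["ok"] === false &&
    typeof value["error"] === "string"
  );
}

function isBotApiError(value: unknown): value is BotApiError {
  return (
    isRecord(value) &&
    typeof value["code"] === "string" &&
    typeof value["message"] === "string"
  );
}

// Hypothetical helper: extracts a display message from either error shape.
function errorMessage(body: unknown): string {
  if (isServiceError(body)) return body.error;
  if (isBotApiError(body)) return `${body.code}: ${body.message}`;
  return "Unknown error";
}
```

Checking the parsed body as unknown first avoids trusting the response shape, which matters when the same client code path handles responses from services that follow different conventions.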
14. Monitoring and Observability
This section documents the observability stack, the metrics collected, the Grafana dashboard, and how to access monitoring tools in local and production environments.
14.1. Overview
YOVI uses a Prometheus + Grafana stack for metrics-based observability. Three services expose
Prometheus-compatible /metrics endpoints: the gateway, the users service, and the gamey service.
Prometheus scrapes all three every 15 seconds. Grafana visualises the collected time series in a
pre-built dashboard that auto-provisions at container startup.
14.2. Instrumented Services
| Service | Library | Metrics endpoint | Metric prefix |
|---|---|---|---|
| Gateway | | | |
| Users Service | | | |
| Gamey | | | |
All express-prom-bundle instances are configured with:
promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true,
  normalizePath: true, // prevents cardinality explosion from dynamic path params
})
See ADR-013 for the rationale behind normalizePath: true.
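The idea behind path normalisation can be illustrated with a small sketch. This is not the algorithm used by express-prom-bundle; it is a hypothetical normaliser that replaces numeric segments and 24-character hexadecimal segments (the textual form of a MongoDB ObjectId) with a fixed placeholder, so that every concrete URL of one route maps to a single label value. The example paths in the usage note are also hypothetical.

```typescript
// Hypothetical normaliser, for illustration only.
const NUMERIC_SEGMENT = /^\d+$/;
const OBJECT_ID_SEGMENT = /^[0-9a-f]{24}$/i;
const PLACEHOLDER = ":id"; // assumed placeholder text

function normalizePathLabel(path: string): string {
  return path
    .split("/")
    .map((segment) =>
      NUMERIC_SEGMENT.test(segment) || OBJECT_ID_SEGMENT.test(segment)
        ? PLACEHOLDER
        : segment,
    )
    .join("/");
}
```

For example, /notifications/507f1f77bcf86cd799439011/read and any other notification id would share the label /notifications/:id/read, so the number of time series grows with the number of routes rather than with the number of stored documents.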
14.3. Prometheus Configuration
Prometheus is configured via users/monitoring/prometheus/prometheus.yml:
global:
  scrape_interval: 15s # standard recommended value for production
scrape_configs:
  - job_name: 'users-service'
    static_configs:
      - targets: ['users:3000']
  - job_name: 'gateway-service'
    static_configs:
      - targets: ['gateway:8080']
  - job_name: 'gamey-service'
    static_configs:
      - targets: ['gamey:4000']
Key decisions:
- 15-second scrape interval: the standard recommended value; balances resolution with storage overhead
- Static targets: service discovery is not needed since all containers are on a fixed internal Docker network with predictable hostnames
- No authentication on /metrics: endpoints are only reachable on the internal Docker network; Nginx does not expose /metrics externally
14.4. Grafana Dashboard
The "Yovi Services Overview" dashboard (UID yovi-overview) auto-provisions at Grafana startup via
the provisioning directory at users/monitoring/grafana/provisioning/.
Provisioning files:
- datasources/datasource.yml: registers Prometheus at http://prometheus:9090 as the default data source
- dashboards/dashboard.yml: configures the dashboard provider (file-based, 30s update interval)
- dashboards/dashboard.json: the dashboard definition with all panels
14.4.1. Dashboard Panels
The dashboard contains three panels, each covering all three instrumented services simultaneously:
Panel 1 — Request Rate (req/s)
Displays the per-second HTTP request rate for each service, broken down by method, path, and status code.
# Users and Gateway (express-prom-bundle)
rate(http_requests_total{job="users-service"}[1m])
rate(http_requests_total{job="gateway-service"}[1m])
# Gamey (axum-prometheus)
rate(axum_http_requests_total{job="gamey-service"}[1m])
Legend format: {service} — {method} {path} {status_code}
Panel 2 — P95 Request Duration (seconds)
Displays the 95th percentile request duration for each service over a 5-minute window.
# Users and Gateway
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket{job="users-service"}[5m])
)
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket{job="gateway-service"}[5m])
)
# Gamey
histogram_quantile(0.95,
  rate(axum_http_requests_duration_seconds_bucket{job="gamey-service"}[5m])
)
Panel 3 — Error Rate (4xx + 5xx)
Displays the per-second rate of HTTP error responses (client and server errors combined) per service.
# Users and Gateway
rate(http_requests_total{
job="users-service",
status_code=~"4..|5.."
}[1m])
rate(http_requests_total{
job="gateway-service",
status_code=~"4..|5.."
}[1m])
# Gamey
rate(axum_http_requests_total{
job="gamey-service",
status_code=~"4..|5.."
}[1m])
A sustained spike in this panel indicates a service degradation or upstream dependency failure (e.g., MongoDB unavailable, Gamey container restarting).
14.5. Accessing Monitoring Tools
14.5.1. Local Development
After running docker-compose up:
| Tool | URL | Notes |
|---|---|---|
| Prometheus | | Query interface; check targets at |
| Grafana | | Default credentials: |
| Users metrics | | Raw Prometheus text format; useful for debugging metric labels |
| Gateway metrics | | Raw Prometheus text format |
| Gamey metrics | | Raw Prometheus text format (axum-prometheus format) |
14.5.2. Production
In production, Prometheus (9090) and Grafana (9091) are exposed on the VM’s public IP but are not routed through Nginx. They should be protected by a firewall rule limiting access to trusted IPs.
To verify the monitoring stack is healthy in production:
# Check all scrape targets are UP
curl http://<VM_IP>:9090/api/v1/targets | jq '.data.activeTargets[].health'
# Check Grafana is running
curl -u admin:admin http://<VM_IP>:9091/api/health
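The target check above can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes the usual shape of the Prometheus `/api/v1/targets` response (`data.activeTargets[]` entries with `health` and `labels.job`). The function names and the sample payload are hypothetical and exist only for illustration.

```python
import json
import urllib.request

def unhealthy_targets(payload):
    """Return (job, health) pairs for every active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", "unknown"), t.get("health", "unknown"))
        for t in targets
        if t.get("health") != "up"
    ]

def fetch_targets(base_url, timeout=5.0):
    """Download the targets document from a Prometheus server (network call)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)

# Offline example with a hand-written payload, so no server is needed.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "users-service"}, "health": "up"},
            {"labels": {"job": "gateway-service"}, "health": "up"},
            {"labels": {"job": "gamey-service"}, "health": "down"},
        ]
    },
}
print(unhealthy_targets(sample))  # [('gamey-service', 'down')]
```

Against a live server, `unhealthy_targets(fetch_targets("http://<VM_IP>:9090"))` would perform the same check as the curl command; an empty list means every scrape target is up.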
14.6. Observability Gaps and Planned Improvements
| Gap | Impact | Planned improvement |
|---|---|---|
| Multiplayer service has no Prometheus metrics | No visibility into WebSocket connection count, room creation rate, or move throughput | Add |
| BotAPI has no Prometheus metrics | No visibility into external bot session count or interop request rate | Add |
| Authentication service has no Prometheus metrics | No visibility into login/register rates or bcrypt latency | Add |
| No alerting rules defined | Grafana shows degradation visually but no automatic alerts are triggered | Define Grafana alert rules for: error rate > 10% sustained for 2 minutes; p95 latency > 3s on gateway |
| No structured logging | Service logs are unstructured | Add structured JSON logging (e.g., |
| No uptime / synthetic monitoring | No external check verifies the system is reachable from outside the VM | Add an external uptime check (e.g., UptimeRobot, Grafana Cloud Synthetic Monitoring) hitting |
15. Glossary
This glossary defines the most important domain and technical terms used throughout the YOVI project. Its purpose is to give all stakeholders (developers, client, evaluators, and bot developers) a shared understanding of these terms.
15.1. Domain Terms
| Term | Definition |
|---|---|
| YOVI | The name of the system developed in this project. A web-based platform for playing Game Y, supporting human players, administrators, and external bots. |
| Game Y | An abstract strategy board game played on a triangular board where two players compete to connect all three sides of the board with a connected group of their pieces. |
| Classic Game Y | The standard version of Game Y with the original rules and a triangular board. Mandatory implementation for this project. |
| YEN (Y Exchange Notation) | A JSON-based format for representing Game Y state, inspired by chess FEN notation. Consists of |
| Board Size | The edge length of the triangular board. A board of size N has N×(N+1)/2 total cells. The default size in YOVI is 7 (28 cells). Users can configure this when starting a game. |
| Barycentric Coordinates | The coordinate system used in YOVI to address cells on the triangular board. A cell is identified by three non-negative integers (x, y, z) where x + y + z = N − 1, and N is the board size. Each coordinate represents the distance from one of the three sides. |
| Side A / Side B / Side C | The three edges of the triangular board. A player wins by connecting all three sides with a single connected group of pieces. In barycentric coordinates: Side A = cells where x = 0; Side B = cells where y = 0; Side C = cells where z = 0. |
| Player (Blue / Red) | The two participants in a Game Y match. Blue ( |
| Move / Placement | A single action where a player places one piece on an empty cell of the board. Represented in YEN by changing a |
| Win Condition | A player wins when they have a connected group of pieces that simultaneously touches all three sides of the board. Detected via Union-Find (Disjoint Set Union) in the game engine. |
| PvM (Player vs Machine) | A game mode where a human player competes against an AI bot. The bot is selected by the player from six available difficulty levels. |
| PvP (Player vs Player) | A game mode where two human players compete in real time via WebSockets. Each player joins the same room using a shared room code. |
| Room | A private multiplayer session identified by a unique 6-character code. Contains two player slots (Blue and Red), the current YEN state, and the room status ( |
| Room Code | A 6-character identifier used to join a multiplayer room. Generated from an unambiguous alphabet (no O/0/I/1) to minimise transcription errors. Example: |
| User | A registered human account with a username, bcrypt-hashed password, optional email, profile fields, friend list, notification list, and match history. |
| Administrator (Admin) | A user with the |
| Match History | A record of all games played by a user, stored in the |
| Ranking | A top-10 leaderboard of users ordered by number of wins. Computed via MongoDB aggregation pipeline on the |
| Friend Request | An invitation sent by one user to another. Stored in the recipient’s |
| Notification | An in-app message for a user. Two types: |
| Hint | A suggested move computed by |
| Bot | An automated program that interacts with the YOVI system through the public interoperability API ( |
| Interoperability (Interop) | The capability of the YOVI system to interact with external bots and rival teams' APIs using a shared contract based on YEN notation. Supported in both server mode (external bots play our bots) and client mode (our bots play rival APIs). |
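
The Board Size, Barycentric Coordinates, and Side A / Side B / Side C entries can be checked with a short Python sketch. It is an illustration only, not code from the gamey engine, and the function names `cells` and `sides_touched` are hypothetical.

```python
def cells(n):
    """All (x, y, z) cells of a size-n board: non-negative integers with x + y + z == n - 1."""
    return [(x, y, n - 1 - x - y) for x in range(n) for y in range(n - x)]

def sides_touched(cell):
    """Sides a cell lies on (Side A: x == 0, Side B: y == 0, Side C: z == 0)."""
    x, y, z = cell
    return {name for name, coord in (("A", x), ("B", y), ("C", z)) if coord == 0}

board = cells(7)
print(len(board))                        # 28, i.e. 7 * (7 + 1) / 2
print(sorted(sides_touched((0, 0, 6))))  # ['A', 'B']: a corner lies on two sides
print(sorted(sides_touched((2, 2, 2))))  # []: an interior cell touches no side
```

Each of the three corners lies on two sides at once, which is why a group that includes a corner already covers two of the three sides required to win.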
15.2. Technical Terms
| Term | Definition |
|---|---|
| Microservice | An independently deployable service with a single well-defined responsibility. YOVI comprises eight microservices: nginx, webapp, gateway, authentication, users, gamey, multiplayer, and botapi. |
| Docker / Docker Compose | Containerization platform (Docker) and multi-container orchestration tool (Docker Compose) used to package, run, and manage all YOVI services consistently across environments. |
| GitHub Container Registry (GHCR) | The container image registry at |
| Nginx | The reverse proxy used as the single public entry point for YOVI. Handles HTTPS termination, HTTP→HTTPS redirect, and path-based routing to webapp, gateway, multiplayer, and botapi. |
| JWT (JSON Web Token) | A compact, URL-safe token used for stateless authentication. Contains user identity claims ( |
| bcrypt | A password hashing algorithm used to store user passwords securely. YOVI uses a cost factor of 10 via the |
| RBAC (Role-Based Access Control) | An authorization model where access to resources is determined by the user’s role. YOVI has two roles: |
| Socket.IO | A JavaScript library providing real-time bidirectional event-based communication over WebSockets, with automatic HTTP long-polling fallback. Used by the multiplayer service for PvP game events. |
| WebSocket | A communication protocol providing full-duplex channels over a single TCP connection. Used by Socket.IO for real-time PvP communication. Nginx proxies WebSocket connections to the multiplayer service via the |
| REST API | Representational State Transfer. The architectural style used for all HTTP-based service-to-service and external communication in YOVI. Resources are accessed via standard HTTP methods (GET, POST, PUT, PATCH, DELETE). |
| Express | A web framework for Node.js used in gateway (v5), authentication (v5), users (v5), and multiplayer (v4) services. Provides routing, middleware, and HTTP request/response handling. |
| Axum | An async HTTP framework for Rust built on Tokio. Used by the gamey service. Provides type-safe request extractors, shared state via |
| Tokio | An asynchronous runtime for Rust. Used by the gamey service to handle concurrent HTTP requests efficiently without blocking threads. |
| Mongoose | An Object Document Mapper (ODM) for MongoDB in Node.js. Used by the users service to define schemas ( |
| MongoDB | A NoSQL document database used for user data persistence. Collections: |
| MongoDB Atlas | MongoDB’s managed cloud database service. Used in production; connection URI injected via |
| Union-Find (Disjoint Set Union) | A data structure used in the gamey game engine to efficiently track connected components of each player’s pieces. The |
| YBot | The Rust trait that all AI bot strategies implement. Defines |
| YBotRegistry | A |
| Vite | A modern frontend build tool used by the webapp. Provides a fast development server with hot module replacement and an optimised production build pipeline. |
| React | A JavaScript library for building user interfaces via components. Used by the webapp for all UI pages, the notification panel, the admin panel, and the game board. |
| React Context | A React mechanism for sharing global state without prop drilling. Used in YOVI for |
| Vitest | A Vite-native unit testing framework used by gateway, authentication, and botapi services. Compatible with the Jest API; supports coverage via |
| Jest | A JavaScript testing framework used by the users and multiplayer services. |
| Supertest | A Node.js HTTP assertion library used alongside Vitest/Jest to test Express HTTP endpoints without starting a real server. |
| proptest | A property-based testing library for Rust. Used in |
| k6 | An open-source load testing tool with a JavaScript API. Used for three load test scenarios: registration (50 VUs), login (50 VUs), and game start (20 VUs). Threshold definitions enforce SLAs on |
| SonarCloud | A cloud-based code quality and security analysis platform. Integrated into the CI pipeline to enforce a minimum 80% code coverage and detect code smells, bugs, and security hotspots. |
| Prometheus | An open-source time-series metrics collection system. Scrapes |
| Grafana | A metrics visualization platform. Provides the "Yovi Services Overview" dashboard with request rate, p95 latency, and error rate panels. Auto-provisioned at container startup. |
| express-prom-bundle | A Node.js middleware that automatically instruments Express apps with Prometheus metrics ( |
| axum-prometheus | A Rust crate that adds Prometheus metrics to Axum services via Tower middleware. Exposes |
| GitHub Actions | A CI/CD platform integrated into the GitHub repository. Runs parallel test jobs, builds Docker images, publishes to GHCR, and deploys to the cloud VM on every release tag. |
| OpenAPI | A specification format for describing REST APIs. Used by the botapi service ( |
| Debounce | A technique that delays execution of a function until after a specified idle period. Used in |
| Normalization (Prometheus path) | The process of replacing dynamic path segments (e.g., usernames, ObjectIds) with a generic placeholder (e.g., |
| Cardinality (Prometheus) | The number of unique time series in a Prometheus database. High cardinality (e.g., one series per username in metric labels) degrades query performance and can cause OOM errors. Controlled via |
| SSRF (Server-Side Request Forgery) | A security vulnerability where an attacker tricks a server into making HTTP requests to unintended targets. Mitigated in the botapi |
| ADR (Architecture Decision Record) | A document that captures an important architectural decision, including its context, the decision made, alternatives considered, and its consequences. YOVI maintains 16 ADRs in this document and the GitHub wiki. |
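
The Union-Find entry above can be illustrated with a small Python sketch of win detection. This is not the gamey implementation (which is written in Rust); it is one possible way to apply the technique. It assumes one structure per player, a bitmask of touched sides stored at each component root, and the usual six-neighbour adjacency in barycentric coordinates. All names are hypothetical.

```python
SIDE_A, SIDE_B, SIDE_C = 1, 2, 4
ALL_SIDES = SIDE_A | SIDE_B | SIDE_C

class SideTrackingUnionFind:
    """Union-Find over one player's pieces; each root stores the sides its group touches."""

    def __init__(self):
        self.parent = {}
        self.sides = {}  # root cell -> bitmask of touched sides

    def add(self, cell):
        x, y, z = cell
        mask = 0
        if x == 0:
            mask |= SIDE_A
        if y == 0:
            mask |= SIDE_B
        if z == 0:
            mask |= SIDE_C
        self.parent[cell] = cell
        self.sides[cell] = mask

    def find(self, cell):
        root = cell
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[cell] != root:  # path compression
            next_cell = self.parent[cell]
            self.parent[cell] = root
            cell = next_cell
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.sides[ra] |= self.sides.pop(rb)

def neighbours(cell):
    """Adjacent cells: move one unit from one coordinate to another, staying on the board."""
    x, y, z = cell
    steps = [(1, -1, 0), (-1, 1, 0), (1, 0, -1), (-1, 0, 1), (0, 1, -1), (0, -1, 1)]
    for dx, dy, dz in steps:
        candidate = (x + dx, y + dy, z + dz)
        if min(candidate) >= 0:
            yield candidate

def place(uf, cell):
    """Add a piece for one player; return True if its group now touches all three sides."""
    uf.add(cell)
    for other in neighbours(cell):
        if other in uf.parent:
            uf.union(cell, other)
    return uf.sides[uf.find(cell)] == ALL_SIDES

# Size-3 board (x + y + z == 2): filling the whole x == 0 edge wins,
# because its two corners also lie on Side C and Side B.
blue = SideTrackingUnionFind()
results = [place(blue, c) for c in [(0, 2, 0), (0, 1, 1), (0, 0, 2)]]
print(results)  # [False, False, True]
```

Storing the side mask at the root means the win check after each move only needs the component lookups for the new piece and its neighbours, rather than a search over the whole board.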
15.3. Acronyms
| Acronym | Definition |
|---|---|
| ADR | Architecture Decision Record |
| API | Application Programming Interface |
| CI/CD | Continuous Integration / Continuous Deployment |
| GHCR | GitHub Container Registry |
| HTTP | Hypertext Transfer Protocol |
| HTTPS | Hypertext Transfer Protocol Secure |
| i18n | Internationalization |
| JSON | JavaScript Object Notation |
| JWT | JSON Web Token |
| MCTS | Monte Carlo Tree Search |
| ODM | Object Document Mapper |
| OOM | Out Of Memory |
| PvM | Player versus Machine |
| PvP | Player versus Player |
| RBAC | Role-Based Access Control |
| REST | Representational State Transfer |
| SLA | Service Level Agreement |
| SPA | Single-Page Application |
| SSRF | Server-Side Request Forgery |
| TLS | Transport Layer Security |
| UI | User Interface |
| VU | Virtual User (k6 load testing) |
| WS / WSS | WebSocket / WebSocket Secure |
| YEN | Y Exchange Notation |
