Debrief

DSS Planning

Oct 30, 2025, 4:00 PM
47 min
0 attendees
Pending Review

Executive Summary

The meeting focused on scaling DSS's token-processing pipeline. The key constraints are a 1.5 million tokens-per-minute rate limit and a 1 billion in-queue token capacity, with the per-minute limit identified as the primary bottleneck. The team discussed throttling experiments based on gradually larger batch sizes, priority-driven queue draining, fail-fast handling of rate-limit breaches with manual reprocessing rather than automatic retries, consolidating the unused "O3 batch 2" Azure deployment into the primary deployment, migrating CSV/ZIP output generation from the legacy system into DSS backed by Cosmos DB, and a three-phase roadmap toward processing newer (post-June 2025) meters end to end in DSS. Detailed notes by topic follow.

System Capacity and Constraints

The meeting focused on understanding and addressing current system limitations related to token-processing quotas. The key constraints are a rate limit of 1.5 million tokens per minute and an in-queue capacity of 1 billion tokens, with the per-minute limit identified as the primary bottleneck. Current usage averages only about 8% of capacity, but sporadic spikes (e.g., a single 500K-token invoice) risk exceeding the per-minute threshold.

Throttling experiments: Proposed gradually increasing batch sizes (e.g., from 10 to 15 invoices per batch) rather than increasing submission frequency, which keeps the system less "chatty" and reduces the risk of rate-limit breaches.
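As a rough illustration of the throttling experiment, the sketch below takes a gradually larger batch while keeping the estimated token spend for one minute under the rate limit. The estimate_tokens helper, the 80% safety margin, and the queue shape are assumptions for illustration, not the actual DSS scheduler.

```python
# Illustrative sketch only: take a gradually larger batch while keeping the
# estimated token spend within the per-minute rate limit from the meeting.

TOKENS_PER_MINUTE_LIMIT = 1_500_000   # rate limit discussed in the meeting
SAFETY_MARGIN = 0.8                   # assumed headroom so a 500K-token invoice spike doesn't breach the limit


def estimate_tokens(invoice: dict) -> int:
    """Hypothetical per-invoice token estimate (e.g., derived from document size)."""
    return invoice.get("estimated_tokens", 50_000)


def next_batch(queue: list[dict], target_batch_size: int) -> list[dict]:
    """Take up to target_batch_size invoices, stopping early if the batch
    would exceed the per-minute token budget."""
    budget = int(TOKENS_PER_MINUTE_LIMIT * SAFETY_MARGIN)
    batch, spent = [], 0
    for invoice in queue[:target_batch_size]:
        cost = estimate_tokens(invoice)
        if spent + cost > budget:
            break                      # defer the rest to the next minute window
        batch.append(invoice)
        spent += cost
    return batch
```

The experiment would then step target_batch_size from 10 toward 15 across runs, rather than submitting batches more often.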

Queue prioritization: Existing queues (live bills vs. priority) will be optimized with a "progressive logic" approach: high-priority queues exhaust their allocated capacity before lower-priority queues activate, preventing resource contention.
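A minimal sketch of that progressive logic, assuming two in-memory queues and a per-item token cost function; the names and structure are placeholders rather than the real DSS queue implementation.

```python
from collections import deque

# Hypothetical queues: priority work is exhausted before live bills are touched,
# so the two never compete for the same per-minute token capacity.
priority_queue = deque()    # e.g., escalated or reprocessed invoices
live_bills_queue = deque()  # normal live-bill flow


def drain(minute_budget_tokens: int, cost_of) -> list:
    """Select work for one minute window: the priority queue first, then live
    bills with whatever budget remains. cost_of(item) is an assumed per-item
    token estimate."""
    selected, remaining = [], minute_budget_tokens
    for queue in (priority_queue, live_bills_queue):   # strict priority order
        while queue and cost_of(queue[0]) <= remaining:
            item = queue.popleft()
            remaining -= cost_of(item)
            selected.append(item)
        if queue:   # higher-priority work remains that didn't fit this window,
            break   # so lower-priority queues don't activate yet
    return selected
```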

Failure Handling Mechanisms

Discussed the system’s existing fail-safes for rate-limit breaches. When the token limit is exceeded, the request fails immediately; the affected invoice remains queued in DSS (Digital Submission System), and an email alert is raised for manual reprocessing.

Automatic retries: Avoided because of the risk of "double-dipping" (retries consuming additional capacity during peak loads), which could worsen congestion.

Fallback strategy: Manual intervention remains preferred unless failure rates spike; gradual capacity scaling minimizes failures.
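A simplified sketch of the fail-safe behaviour described in this section, assuming a placeholder exception type and injected stand-ins for the endpoint call and the alert channel: the request fails fast, the invoice stays in the DSS queue, and no automatic retry is attempted.

```python
class RateLimitExceeded(Exception):
    """Stand-in for the error raised when the token rate limit is hit."""


def process(invoice: dict, submit, dss_queue: list, send_alert) -> bool:
    """Fail fast on a rate-limit breach: the invoice remains queued in DSS and
    an alert goes out for manual reprocessing. submit/send_alert are injected
    stand-ins for the real endpoint call and notification channel."""
    try:
        submit(invoice)
        return True
    except RateLimitExceeded as exc:
        dss_queue.append(invoice)   # invoice stays queued in DSS
        send_alert(f"Invoice {invoice.get('id')} hit the rate limit: {exc}")
        # Deliberately no retry here: an immediate retry would "double-dip" on
        # capacity during the same peak that caused the failure.
        return False
```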

Deployment and Quota Optimization

Reviewed redundant Azure deployments ("O3 batch" vs. "O3 batch 2"), with the latter unused and holding half the quota (500M tokens) of the primary deployment.

Quota consolidation: Deleting "O3 batch 2" and reallocating its capacity to the main deployment simplifies management and maximizes usable tokens under one subscription.

Subscription-level limits: Emphasized that splitting quota across deployments under the same subscription does not increase overall capacity; only separate subscriptions (e.g., prod vs. non-prod) enable true scaling.

Output Service Migration

Plans to migrate output generation from the legacy system to DSS to reduce bottlenecks. The legacy service stalls under load and lacks resilience, causing operational delays.

Direct-to-E2BM pipeline: DSS will generate CSV/ZIP outputs directly for processed invoices, bypassing legacy dependencies. Legacy fallbacks will handle exceptions.
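To make the direct CSV/ZIP generation concrete, here is a minimal, self-contained sketch using only the Python standard library; the column names and archive layout are illustrative assumptions, not the agreed E2BM format.

```python
import csv
import io
import zipfile


def build_output_archive(invoices: list[dict], zip_path: str) -> None:
    """Write processed invoices to a CSV and package it as a ZIP in one step,
    with no call back into the legacy output service."""
    # Illustrative columns; the real E2BM layout would come from the integration spec.
    fieldnames = ["invoice_id", "meter_id", "billing_period", "amount"]

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for invoice in invoices:
        writer.writerow(invoice)

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("invoices.csv", buffer.getvalue())


# Example with a made-up record:
build_output_archive(
    [{"invoice_id": "INV-1", "meter_id": "M-42", "billing_period": "2025-09", "amount": 120.50}],
    "e2bm_output.zip",
)
```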

Data decoupling: Critical fields for output (e.g., billing metadata) must reside in DSS’s Cosmos DB, eliminating template pulls from the legacy database to ensure self-contained processing.
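One way to picture the decoupling is an invoice document in DSS’s Cosmos DB that already carries every field the output step needs, so nothing is pulled from legacy templates at output time; the field list below is a hypothetical example, not the actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class InvoiceOutputRecord:
    """Hypothetical self-contained document stored in DSS's Cosmos DB: everything
    the output service needs lives on the record itself."""
    invoice_id: str
    meter_id: str
    billing_period: str
    customer_reference: str   # billing metadata formerly resolved via legacy templates
    amount: float
    currency: str


record = InvoiceOutputRecord(
    invoice_id="INV-1",
    meter_id="M-42",
    billing_period="2025-09",
    customer_reference="CUST-0007",
    amount=120.50,
    currency="GBP",   # placeholder value
)
print(asdict(record))   # ready to upsert into Cosmos DB and feed straight to output
```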

Long-Term System Scalability

Explored strategies to minimize legacy-system reliance, particularly for newer meters (post-June 2025).

Greenfield processing: Invoices from newer meters should flow end-to-end in DSS without legacy interactions, using Cosmos DB as the single source for output data.
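A hedged sketch of the routing rule this implies: meters commissioned after the June 2025 cutover are handled end to end in DSS, and everything else keeps the legacy path until the later phases. The cutover date constant and the meter attribute are assumptions.

```python
from datetime import date

# Assumed cutover: meters onboarded after this date are "greenfield" and should
# never touch the legacy system; the real date and attribute would come from the meter data model.
GREENFIELD_CUTOVER = date(2025, 6, 1)


def route_invoice(meter_commissioned_on: date) -> str:
    """Return which pipeline should own the invoice end to end."""
    if meter_commissioned_on >= GREENFIELD_CUTOVER:
        return "dss"      # process, store, and output entirely from Cosmos DB
    return "legacy"       # keep the existing path until full migration


assert route_invoice(date(2025, 7, 15)) == "dss"
assert route_invoice(date(2024, 12, 1)) == "legacy"
```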

Three-phase roadmap:

1. Replicate the output service in DSS.
2. Reduce legacy load by pushing data to the legacy system only during failures.
3. Fully migrate all invoice processing to DSS.

Architectural shift: Outputs must derive solely from build data (not templates), enabling stateless processing and eliminating legacy database bottlenecks.

Key Topics

Decisions

No decisions recorded

Action Items (0/0 done)

No action items recorded