Debrief

DSS Matching Logic

Dec 31, 2025, 12:10 PM
107 min
0 attendees
Pending Review

DSS Matching Logic — Recording

Executive Summary

## **Analysis and Improvement of DSS Invoice Matching Logic**

The meeting centered on a deep technical dive into the Data Services System (DSS) invoice matching logic, with the primary objective of diagnosing persistent issues causing incorrect line item and meter creation. The discussion deconstructed the current multi-phase workflow to identify root causes in vendor matching, data normalization, and the overall integration with the AI (LLM) extraction process, concluding that a fundamental paradigm shift is required to improve accuracy.

### **Overall Problem Statement and Initial Focus**

The core issue is that the system frequently fails to match incoming invoice data with existing database templates, leading to the erroneous creation of new meters and line items instead of mapping to known ones. This results in data duplication and workflow inefficiencies. The immediate diagnostic focus was placed on three areas: improving the core matching logic with potential "fuzzy" matching, cleaning up legacy data, and defining a clear process for handling invoices without an exact template match.

### **Deconstructing the Existing Matching Workflow (Phase 1)**

The team walked through the documented matching flowchart to understand each failure point.

- **Step 1: Vendor ID and Account Number Matching:** This initial gate relies on a vendor ID and account number. The analysis revealed critical vulnerabilities: these identifiers can be captured at upload or inferred post-IPS extraction, leading to potential mismatches. A major concern is the translation of a vendor *name* (from IPS) to a vendor *ID* (a number used for database lookup), which can fail when the name is not an exact match. Furthermore, account numbers may be captured differently (e.g., with dashes or leading zeros), requiring normalization to ensure reliable string matching. If this step fails, the process cannot proceed to use an existing template.
- **Steps 2 & 3: Service Account and Meter Fetching:** Upon successful vendor/account matching, the system fetches all related service accounts (e.g., "Full Service," "Distribution Only") and their associated meters. The logic assumes that for existing customers, these entries should always be present. The discussion confirmed that the system pulls *all* service accounts and meters for the matched account, without initial filtering, which is a potential point for future optimization.
- **Step 4: Line Item Fetching and Failure Paths:** Line items are fetched specific to each meter. The meeting clarified the hierarchical dependency: if vendor/account matching fails, meter matching inherently fails, and new structures are created from IPS data. The resulting bill has no linked customer account (`client_id` set to `null`) and is routed to a manual review queue.

### **Identifying Specific Matching Failures**

The conversation pinpointed exact scenarios where the current logic breaks down, using a real invoice example (see the sketch after this list).

- **Meter-Level Matching Criteria:** Meters are matched on a combination of **commodity description**, **rate code description**, and **service address**. In the example, the commodity matched, but the rate code failed because the IPS extraction captured "Commercial Service" while the database record contained the full string "Rate 4: Commercial Service." This exact-string mismatch caused a new meter to be created.
- **Issues with Extracted Data:** The **rate code** and **service address** are particularly prone to extraction variance by the LLM. The address, especially, requires "fuzzy" logic for matching, as minor formatting differences can cause failures. The **commodity description**, while more stable, can also be misidentified.
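To make these failure modes concrete, here is a minimal sketch of the kind of normalization and tolerant comparison discussed for account numbers and rate codes. The function names and the containment fallback are illustrative assumptions, not the DSS codebase's actual API.

```python
import re

def normalize_account_number(raw: str) -> str:
    """Strip separators and leading zeros so formatting variants compare equal."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw)
    return cleaned.lstrip("0")

def rate_codes_match(extracted: str, stored: str) -> bool:
    """Exact match first, then a containment fallback so a partial capture
    like 'Commercial Service' still matches 'Rate 4: Commercial Service'."""
    e, s = extracted.strip().lower(), stored.strip().lower()
    return e == s or e in s or s in e

# The mismatch scenarios from the meeting:
assert normalize_account_number("00123-456") == normalize_account_number("123 456")
assert rate_codes_match("Commercial Service", "Rate 4: Commercial Service")
```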
### **Core Flaw: Disconnection Between Template and AI**

A critical insight emerged regarding the fundamental design of the comparison phase (Phase 2).

- **Phase 2 (Comparison) Limitations:** The system compares the IPS-extracted "enhanced JSON" with the database template. However, the process only *overwrites* the IPS data with database values **when an exact match is already found**. The database template does not actively *guide* or *constrain* the initial LLM extraction.
- **Circular Logic Problem:** In effect, the system uses the database to clean up data only after the AI has already made its best guess. If the AI's extraction is poor (e.g., on the rate code), no match occurs, the incorrect AI data is kept, and new entries are created. The existing template is not used as a prior input to make the AI's extraction more accurate in the first place.

### **Proposed Solutions and Strategic Rethinking**

The conclusion was that incremental tweaks are insufficient; a new approach is needed to tightly couple the known template data with the AI processing stage (a sketch follows at the end of this summary).

- **Feed Templates to the LLM:** Dynamically feed the known invoice template for a matched vendor/account **into the LLM system prompt** *before* extraction. This would instruct the AI to use the existing line item descriptions, commodity types, and rate codes as a reference, forcing its output to align with the known structure.
- **Implement Confidence Scoring:** The enhanced system should include confidence scores from the LLM. For example, extractions with less than 90% confidence could be flagged for review, and those below a lower threshold could require mandatory manual intervention, preventing incorrect automated creation.
- **Ownership and Research:** The team resolved to take full ownership of this logic. Immediate next steps involve a thorough review of the existing LLM system and user prompts, the template matching code, and the exact method by which vendor names are translated to IDs.

### **Action Plan and Administrative Notes**

The session concluded with a clear action plan and ancillary discussions about tooling access.

- **Immediate Technical Investigations:** The team will review three areas: the LLM system/user prompts governing extraction, the logic for translating vendor names to IDs, and the overall design of the template matching and comparison workflow. A meeting with the original developer was also scheduled to gain historical context.
- **Development and Testing Path:** Changes to the LLM prompts can be made by cloning the repository, creating a feature branch, and deploying to a development environment for testing. This allows controlled experimentation with new prompting strategies.
- **Access and Infrastructure Issues:** A significant blocker is the inability to access internal tools (like FTG Connect) due to VPN/client conflicts on non-company hardware. This hinders the ability to test changes and investigate issues directly; resolving it was noted as a high priority for enabling effective development.
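A minimal sketch of the template-guided prompting and confidence gating proposed above. The `InvoiceTemplate` fields, the prompt wording, and the lower manual-intervention threshold (0.70) are illustrative assumptions; only the 90% review threshold comes from the discussion.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.90  # below this, flag for review (from the discussion)
MANUAL_THRESHOLD = 0.70  # below this, force manual intervention (illustrative)

@dataclass
class InvoiceTemplate:
    line_items: list[str]
    commodities: list[str]
    rate_codes: list[str]

def build_system_prompt(t: InvoiceTemplate) -> str:
    """Embed the matched account's known template in the system prompt so the
    LLM aligns its extraction with existing values instead of free-form guesses."""
    return (
        "You are extracting structured data from a utility invoice.\n"
        "When the invoice text is ambiguous, prefer these known values:\n"
        f"- Line item descriptions: {', '.join(t.line_items)}\n"
        f"- Commodity types: {', '.join(t.commodities)}\n"
        f"- Rate codes: {', '.join(t.rate_codes)}\n"
        "Report a confidence score between 0 and 1 for each extracted field."
    )

def route(confidence: float) -> str:
    """Gate automated meter/line-item creation on extraction confidence."""
    if confidence < MANUAL_THRESHOLD:
        return "manual_intervention"
    if confidence < REVIEW_THRESHOLD:
        return "flag_for_review"
    return "auto_process"
```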

Summary

The meeting primarily focused on reviewing a password policy document and discussing the significant challenges and risks associated with the ongoing migration from GCP to Azure. A major point of concern was the definition and feasibility of an MVP (Minimum Viable Product) for the Azure environment, given the compressed timeline and numerous unresolved technical and procedural hurdles.

Password Policy Review

A previously shared password policy document was reviewed, confirming no recent updates to its specifications. The policy mandates a minimum password length of 12 characters and a maximum of 64 characters, enforces multi-character requirements, sets a lockout after five consecutive failed attempts, and maintains a password history of 36 months. A separate, unchanged policy exists for service accounts.
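For reference, a minimal sketch of how the stated parameters might be checked in code. The policy's exact character-class rule behind "multi-character requirements" was not restated in the meeting, so requiring at least three of four classes below is an assumed interpretation.

```python
import re

MIN_LENGTH, MAX_LENGTH = 12, 64
LOCKOUT_THRESHOLD = 5   # consecutive failed attempts before lockout
HISTORY_MONTHS = 36     # how long prior passwords are retained

def meets_policy(password: str) -> bool:
    """Length check plus an assumed multi-character-class check
    (three of four classes: lower, upper, digit, symbol)."""
    if not (MIN_LENGTH <= len(password) <= MAX_LENGTH):
        return False
    classes = (r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]")
    return sum(bool(re.search(c, password)) for c in classes) >= 3
```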

Azure Migration Status and MVP Definition

The corporate IT team has requested a prioritized timeline for an Azure MVP, but the project is mired in uncertainty and faces extreme risks. The migration has been discussed for nearly two years with minimal progress, and the current expectation to deliver an MVP by December is considered nearly impossible due to the upcoming holiday season and a lack of foundational work. The team expressed that they cannot commit to any timeline until critical issues are resolved.

Lack of Progress and Clarity: The project is still in early architectural discussions, with basic questions about the application's functionality still being asked by the corporate team, indicating a fundamental lack of understanding.

High-Risk Timeline: Attempting a migration in the remaining weeks of the year is deemed unfeasible, and there is a strong reluctance to define a limited MVP that could be misinterpreted as project completion, leaving critical functions untested.

Security and Access Roadblocks: A primary roadblock is obtaining security approval to expose the migrated database to the public, which is essential for any end-to-end functionality. Furthermore, development and testing have so far occurred on a test subscription, not the actual Constellation environment, introducing significant unknown risks.

Technical and Procedural Challenges

Several specific technical and operational challenges were identified that complicate the migration and post-migration support. These issues highlight a disconnect between the corporate IT team's objectives and the practical realities of the application's needs.

Complex Integrations: Key functionalities like Single Sign-On (SSO), Power BI integration and embedding, database exposure for other tools, and data integration with Karbon accounting remain as major, untested challenges.

Loss of Control and SLAs: In the new Azure/Constellation environment, the team anticipates a loss of administrative control, preventing them from performing same-day fixes or adhering to current Service Level Agreements (SLAs). This necessitates a redefinition of SLAs and potentially a dedicated internal IT liaison.

Disaster Recovery Concerns: The current disaster recovery capability in GCP, which allows for a full rebuild in under 24 hours, is at severe risk. The approval processes and lack of control in Azure are expected to drastically increase recovery times, with no commitment from Microsoft or corporate IT on who will own and meet these recovery objectives.

Integrity Check Errors and Virtual Account Mapping

A significant operational issue involves a high volume of "3018" integrity check errors related to virtual account mapping. While a bulk upload feature is in development to ease the manual workload, the root cause of the errors is more complex than initially thought.

Upcoming Bulk Upload Solution: A new feature is being developed to allow bulk downloading of unmapped virtual accounts and bulk uploading of corrected mappings across customers, aiming for completion by the second week of December. This will automate the current manual process handled by a specific individual.

Ongoing Root Cause: The errors are not solely due to location address discrepancies. The core issue is the continual creation of new virtual accounts, which occurs when there are variations in any of the six key attributes that define a virtual account (e.g., billing ID, service account ID, meter ID). This suggests underlying problems with data consistency from the source system that need to be addressed to prevent the errors from recurring.
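A minimal sketch of why attribute variance spawns new virtual accounts, assuming account identity is a composite key over the six attributes. The meeting named only billing ID, service account ID, and meter ID, so the remaining three fields below are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualAccountKey:
    billing_id: str
    service_account_id: str
    meter_id: str
    # The other three defining attributes were not named in the
    # meeting; these field names are hypothetical placeholders.
    attr_four: str
    attr_five: str
    attr_six: str

existing: dict[VirtualAccountKey, dict] = {}  # key -> virtual account record

def get_or_create(key: VirtualAccountKey) -> dict:
    """Any variation in any one attribute hashes to a different key,
    so the lookup misses and a new virtual account is created instead
    of reusing the existing one -- the root cause described above."""
    if key not in existing:
        existing[key] = {"key": key}
    return existing[key]
```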

Data Consistency and Fuzzy Logic

The discussion explored interim and long-term solutions for handling the virtual account mapping errors. The immediate goal is to automate the existing manual mapping process to clear the backlog, while a future-state solution would involve implementing fuzzy matching logic to automatically map similar strings (e.g., "Avenue" and "Av") and prevent the errors from appearing in the first place. The team plans to test a proposed fuzzy logic algorithm against recent manual mappings to validate its effectiveness before implementation.
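A minimal sketch of the kind of fuzzy matching and back-testing discussed, using Python's standard difflib. The abbreviation table and the 0.9 similarity threshold are illustrative assumptions.

```python
import difflib

# Street-suffix abbreviation table; illustrative, not exhaustive.
ABBREVIATIONS = {"av": "avenue", "ave": "avenue", "st": "street", "rd": "road"}

def normalize_address(addr: str) -> str:
    """Lowercase, drop commas, and expand known abbreviations ('Av' -> 'avenue')."""
    tokens = addr.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(t.rstrip("."), t) for t in tokens)

def fuzzy_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Match after normalization; SequenceMatcher absorbs residual variance."""
    na, nb = normalize_address(a), normalize_address(b)
    return na == nb or difflib.SequenceMatcher(None, na, nb).ratio() >= threshold

def validate_against_manual_mappings(pairs: list[tuple[str, str]]) -> float:
    """Replay recent manual mappings and report the share the algorithm
    would have resolved automatically -- the validation step planned above."""
    if not pairs:
        return 0.0
    return sum(fuzzy_match(src, dst) for src, dst in pairs) / len(pairs)
```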

Key Topics

Decisions

No decisions recorded

Action Items (0/0 done)

No action items recorded