
Process

The Architecture Definition Workspace is where we as a community come together to work through the architecture for openIDL going forward.  We take our experiences, combine them with inputs from the community, and apply them against the scenarios of usage we have for openIDL.  Below are the phases and the expected outcomes of each.

Phase: Requirements
Description: Define the requirements for one or more possible scenarios for openIDL.  In this case, we are focused on the stat reporting use case.
Outcome: A set of requirements.  openIDL - System Requirements Table (DaleH @ Travelers)

Phase: Define Scenarios
Description: Define the scenarios sufficiently to gather ideas about the different steps.  The scenarios will change over time as we dig into the details.
Outcome: A few scenarios broken down into steps.

Phase: Brainstorming
Description: Gather ideas from all participants for all the different steps in the scenarios.
Outcome: Detailed notes for each of the steps in the scenario(s).

Phase: Architecture Elaboration and Illustration
Description: Consolidate notes and start defining architecture details.
  • Network Architecture - different kinds of nodes and how they participate
  • Application Architecture - structure of the functional components and their responsibilities
  • Data Architecture - data flows and formats
  • Technical Architecture - use of technologies to support the application
Outcome:
  • Diagrams for the different architectures (block diagrams, interaction diagrams)
  • Tenets (strongly held beliefs / constraints on the implementation)

Phase: Identify Spikes
Description: From the elaboration phase will come questions that require answers.  Sometimes answers come through research; often they must come from spikes.  Spikes are short, focused, deep-dive implementation activities that help identify the right solution for aspects of the system.  The TSC must approve the spikes.
Outcome:
  • spikes defined
  • spikes approved

Phase: Execute Spikes
Description: Execute approved work to answer the question that required the spike.
Outcome: Spike results documented.

Phase: Plan Implementation
Description: With spikes completed, the team can finalize the design of the architecture and plan the implementation.
Outcome: Implementation Plan

Phase: Implement
Description: Implement the architecture per the plan.
Outcome: Running network in the approved architecture

Deliverables:

Scenarios

Stat Report 


Subscribe to Report (automate initiation & consent)

Identify Report

  • What is it (metadata) - see the sketch after this list
    • Naming it 
    • identifier
    • Requestor
    • type of input
    • generation source
    • line of business
    • what output should look like
    • explicit math for aggregation
    • Purpose of data (what being used for)
    • similar to what is captured on a data call
    • DR - stab at making a version of this, idea of what it should be (ref reqs), see how it looks, what's missing, etc. - find gaps as opposed to trying to be complete here - for today's purpose some metadata along the lines of reqs, would we do a first req/draft of what it would look like, anything missing? (feels like reqs lite)
    • KS - info req section in reqs table, first iteration/solution will highlight gaps
    • SK - any existing samples of data calls/reqs? metadata assoc w/ request, match up, covered in the list?
    • PA and KS to discuss what will be shared, integrating w/ other depts, large list of data calls from other systems, working with ops teams to bring it together, high level looking to make big improvements on metadata and reqs
    • SK - dates, thinking a couple (date of req, deadline date, expiration date)
    • KS - for a report these are the fields we fill in (a la data dictionary definitions), what a data call was intended to capture but including all of the details Dale pointed out, there is bridging vs pointing back to reqs, layout for report - THIS IS WHAT WE ARE TRYING TO DO/WHAT THIS REPORT IS
  • Identify Stat Reporter
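A rough sketch of how the report metadata above could be captured, using a Python dataclass purely for illustration. All field names and the example values are placeholders drawn from the bullet list, not an agreed openIDL schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class ReportMetadata:
    """Illustrative metadata for one stat report (all field names are placeholders)."""
    report_id: str                 # unique identifier
    name: str                      # human-readable name
    requestor: str                 # e.g. a DOI
    line_of_business: str          # e.g. "Personal Auto"
    input_type: str                # type of input expected from carriers
    generation_source: str         # where the report is generated
    purpose: str                   # what the data is being used for
    output_description: str        # what the output should look like
    aggregation_rules: str         # explicit math for aggregation
    request_date: Optional[date] = None
    deadline_date: Optional[date] = None
    expiration_date: Optional[date] = None
    stat_reporters: List[str] = field(default_factory=list)  # approved/certified stat reporters

# Example instance, purely to show the shape of a "reqs lite" first draft.
example = ReportMetadata(
    report_id="stat-report-2021-pa",
    name="2021 Personal Auto Stat Report",
    requestor="DOI",
    line_of_business="Personal Auto",
    input_type="stat plan records",
    generation_source="openIDL extraction pattern",
    purpose="statutory statistical reporting",
    output_description="aggregated premium and loss by state and coverage",
    aggregation_rules="sum premiums and losses grouped by state/coverage",
    request_date=date(2022, 8, 1),
    deadline_date=date(2023, 3, 31),
)
```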

Identify who is subscribing

  • Defining participants and role
    • Data Providers (Carriers)
    • Report Requestors (DOI)
    • Implementors (AAIS, etc.)
    • Stat Reporter (not necessarily the same as the implementor; a generally approved or certified stat reporter)
  • producer of the data and the receiver of the data (source and sink/target)
  • Carriers providing data, DOI creates request
  • DH - Who are the participants? Carrier, Requestor, Intermediary (AAIS? other stat agents? those building extraction patterns and formatting report), implementor of report

Connecting Subscriber and Report

  • Carriers and DOIs, want to capture that Carrier is data provider for a specific report and DOI is specific receiver for a report
  • not data itself, more metadata about report, who getting specifically
  • who get from / give to
  • Notion of give-take between implementors and carriers and DOI about the intent
  • Section about ability to communicate and improve to come to consensus it is the report we want 
  • Communicate about = user interface, carrier gets a chance to say "this one" and the ability to comment on the report before implemented, and then implementation and then feedback to agree to
  • Stat Reporting or data calls too? applies to both but focused on Stat Reporting; can bridge at a later date
  • Reqs for stat reporting in handbook

Parameters of Subscription

  • Specific to each report (loss dates, premium dates - other variables?)
  • Some general to all reports 
  • Line of Business, Dates, Jurisdictions, 
  • Differences in report by state? Something Stat Reporting folks can answer
  • Territory, Coverage? Diff reports same time period, grouping not a filter

Editing Subscription

  • create/read/update/delete subscriptions
  • self-service or governed thing?
  • right now, sign up thru stat reporter for reports a Carrier wants run
  • AAIS does it for them or on their own - something to be done
  • Part of governance of openIDL (members, credentialing, )
  • Audit log - auditability of subscriptions - managing subscriptions as part of openIDL - AAIS thing, funct of openIDL 
  • openIDL not a stat reporter - is there a specific designation? AAIS is a stat reporter working thru openIDL; if others join, they could be doing stat reporting on openIDL; there will be a "Stat Reporter" as intermediary
  • defines a seat in openIDL network (how to say "AAIS is doing X")
  • DH - Trv joins openIDL, selects which stat agent they would do stat reporting through - could be report by report but guess all-or-nothing
  • PA - not all or nothing as AAIS doesn't do all lines (work with AAIS, then Verisk, ISO - can't be complete)
  • DH - don't do MassCar and Texas w/ AAIS
  • KS - identifying report,  id stat reporter, per report detail (each report stat reported via AAIS), stat reporter per report or by line of business - per report connection covers all cases

Ending Subscription

  • Delete
  • Give subscription an end date (effective expiration on the subscription itself) - see the sketch after this list
  • lead time where AAIS or Carriers want to know if they are continuing or moving to new stat agent in openIDL
  • Autorenewal
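A rough sketch pulling the subscription discussion above (participants, parameters, editing, end dates, auto-renewal) into one illustrative record. The field names, the end() helper, and the audit log are assumptions for illustration only, not a defined openIDL structure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class ReportSubscription:
    """Illustrative subscription connecting a carrier, a stat reporter, and a report."""
    subscription_id: str
    report_id: str                   # the report being subscribed to
    data_provider: str               # carrier providing the data
    report_requestor: str            # e.g. a DOI receiving the report
    stat_reporter: str               # intermediary (e.g. AAIS) - chosen per report, not all-or-nothing
    lines_of_business: List[str] = field(default_factory=list)
    jurisdictions: List[str] = field(default_factory=list)
    premium_date_range: Optional[tuple] = None   # (start, end) dates for premiums
    loss_date_range: Optional[tuple] = None      # (start, end) dates for losses
    effective_date: Optional[date] = None
    expiration_date: Optional[date] = None       # "end date" on the subscription itself
    auto_renew: bool = False
    audit_log: List[str] = field(default_factory=list)  # auditability of create/update/delete

    def end(self, on: date, note: str = "") -> None:
        """End the subscription by setting its expiration date and logging the change."""
        self.expiration_date = on
        self.audit_log.append(f"{on.isoformat()}: subscription ended. {note}".strip())

# Example usage, with made-up parties and dates.
sub = ReportSubscription("sub-001", "stat-report-2021-pa", "Carrier A", "DOI", "AAIS",
                         lines_of_business=["Personal Auto"], jurisdictions=["CT"],
                         effective_date=date(2022, 1, 1), auto_renew=True)
sub.end(date(2023, 12, 31), "carrier moving to a different stat agent")
```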

Load Data / Assert Ready for Report

08/01/22

  • ?? Facilitate semi-auto inquiries, metadata management scheme
  • ?? Day 1 - PDF uploaded somewhere

08/02/22

  • KS - Homework, turn the above into arch statements or drawings/tenets, not in the requirements; feels a little like requirements still; how do we add progress outside meetings?
  • PA - like about reqs - key of what genre a req comes from and a unique ID - can we get a unique ID for these elements and a table, what refs what reqs, do homework
  • KS - components or arch elements as opposed to reqs - talking solutioning, trying to take reqs and apply to scenarios, break out into a set of arch statements for each component (LD1 assert up to a date on the data, LD2), then consolidate - AAIS team to org this doc into that format (due next Mon 8/8)
  • SK - is the reqs based on discussions, done, next step to jump into solution design and arch? 
  • PA - jumping in makes sense, int in 2 things: interactions of network and HDS, hard to think of how data load happens w/o knowing target
  • SK - deliberated reqs, organized, next step not to re-deliberate reqs but to solidify the arch or at least start on it, NOT reclassifying this into another set of reqs
  • KS - avoid that, these are functional areas sys needs to support, not get to details of tech for a while, all the ideas that need to hold true, made progress in open ended way
  • JB - top down/bottom up - some sense going back to phases of the sys we started with, keep in mind arch we are dealing with network, not centralized data center, keep in mind org funct around aspects of that network, reflect some of the initial thinking arch needs to be supported, what are the elements for producers, processors, receivers of data
  • KS - need to be tolerant of chaos, in between meetings remove chaos and refine, brainstormer, raw material
  • PA - outlined our big boxes? 
  • KS - Data formats? Stat plan

Define Format

  • What is the data? Glossary or definition? What is being loaded (stat report well-defined)
  • Assumption - stat plan transactional data, metadata is handled by spec docs as yet to be written
  • Data existing in HDS, what schema says, there to fulfill stat report, this is just data that's there, period and quant/qual of data designed to do stat report, for this purpose just a database
  • Minimal data catalog - what's the latest, define what's there (not stat report per se), what's in there is determined thru the funct described (time period, #, etc.) - diff between schema for a db and querying it, format for what could be in there
  • Minimal form of data catalog - info about what's in the data (see the sketch after this list)
  • Schema is set but might evolve - "type of data loaded" - could say "not making assertions this data is good for a specific data call but to the best of our ability it is good to X date"
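A minimal sketch of the "minimal data catalog" idea: information about what is in the HDS and through what date it is believed good. The entry layout and field names are illustrative assumptions, not a defined schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class CatalogEntry:
    """One illustrative catalog row: what kind of data is in the HDS and how current it is."""
    line_of_business: str
    record_type: str          # e.g. "premium" or "loss"
    earliest_period: date     # oldest accounting period present
    latest_period: date       # newest accounting period present
    record_count: int
    good_through: date        # "to the best of our ability it is good to X date"
    schema_version: str       # schema is set but might evolve

# A minimal catalog is just a list of such entries that can be queried for freshness.
catalog: List[CatalogEntry] = [
    CatalogEntry("Personal Auto", "premium", date(2020, 1, 1), date(2022, 6, 30),
                 1_250_000, date(2022, 6, 30), "stat-plan-v1"),
]
```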

Load Function

  • Deeper in process of data you have getting into openIDL, details of managing
  • Process, raw data in carrier DB, turned into some "load candidate", proposed to be loaded into system, needs to go thru edit package
  • DH - before HDS?
  • KS - from your raw data to accepted HDS data (load function) and will inc other pieces like edit package
  • DH - internal loading to the carrier
  • KS - carrier resp for turning data into intake format (stat plan)
  • DR - req for "here's what data should look like to be ingested"
  • data model - stat plan day 1, day 2... data model
  • KS - process of taking it in, do work to make more workable in the middle, dont commit to saying "what you put in front end is exactly what ends up in HDS" - right now not putting it exactly, turning it into at least a diff syntax and never will be 1:1, semantically close, 
  • DH - more sense for decoding
  • KS - load funct part of openIDL, carrier entry point, what carrier putting into load func is stat plan, THEN run thru edit package, review/edit (a la SDMA), "go" and then pushed thru HDS - carrier not doing transform, carrier loading thru UI (SDMA), may even be SDMA (repurposed) to load HDS at end of day
  • DH - HDS w/in carrier node?
  • KS - adapter package - need to support (1) keeping data in carrier world and don't want everyone to write their own edit package and load process; agree on something that runs in your world that is a lightweight edit package (see the load sketch after this list)
  • DR - simplify, essentially a data model, how does it lie in HDS, may or may not be a different input data model that is whats loaded, once in HDS and "loaded" should conform and have any edit packages already run on it, all running on carrier side, dont want it going out and back - caveat, edit packages are shallow tests, not looking at rollup or reconciliations, "is it in the format intended?"
  • KS - row by row edits, not across rows, had to have x w/o errors, etc. - syntactical and internal, "if you pick this loss record cant have a premium"
  • DR - sanity checks and housekeeping 
  • after edit, push to HDS (tbd format, close to stat plan day 1)
  • PA - extensibility, adding more to end of stat plan in the future
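A rough sketch of the load path described above: carrier intake in stat plan format, a lightweight row-by-row edit package, then a push into the HDS. The field positions, the single rule shown, and the function names (parse_stat_plan, run_edit_package, load_to_hds) are hypothetical; a real edit package would carry the full rule set.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]

def parse_stat_plan(lines: Iterable[str]) -> List[Record]:
    """Turn fixed-length stat plan lines into 'load candidate' records (hypothetical layout)."""
    records = []
    for line in lines:
        records.append({
            "transaction_code": line[0:2],
            "state": line[2:4],
            "zip": line[4:9],
            "amount": int(line[9:19] or 0),
        })
    return records

def run_edit_package(records: List[Record]) -> Tuple[List[Record], List[str]]:
    """Row-by-row edits only (syntactic / intra-record); returns records plus any error messages."""
    accepted, errors = [], []
    for i, rec in enumerate(records):
        if len(str(rec["zip"])) != 5:
            errors.append(f"record {i}: bad zip {rec['zip']}")
        accepted.append(rec)          # pass-through; flagging vs dropping is a separate decision
    return accepted, errors

def load_to_hds(records: List[Record], hds: List[Record]) -> None:
    """Final step: push edited load candidates into the HDS (here just an in-memory list)."""
    hds.extend(records)

# End-to-end: intake file -> load candidates -> edit package -> HDS.
hds: List[Record] = []
candidates = parse_stat_plan(["01CT060100000001500"])
edited, issues = run_edit_package(candidates)
load_to_hds(edited, hds)
```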

Transform

  • whatever we need, might do some small decoding, definitely turn it from flat text to TBD (database model in HDS) - see the sketch after this list
  • normalization? some light transformation in the beginning
  • assumes not collapsing records, like stat plan same level of granularity every record input is record in HDS (time being)? 1:1
  • decoding has reference data to lookup
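A minimal sketch of the light transform step: one input record becomes one HDS record (same granularity, different syntax), with decoding done against reference data. The coverage code table and field names are made-up examples.

```python
from typing import Dict

# Hypothetical reference data used for decoding coded fields during the transform.
COVERAGE_CODES: Dict[str, str] = {"01": "Bodily Injury", "02": "Property Damage"}

def transform(flat_record: Dict[str, str]) -> Dict[str, object]:
    """Light 1:1 transform: same granularity as the stat plan record, different syntax.

    No collapsing of records; decoding only where reference data exists.
    """
    return {
        "state": flat_record["state"],
        "coverage_code": flat_record["coverage"],
        "coverage": COVERAGE_CODES.get(flat_record["coverage"], "Unknown"),
        "premium_amount": int(flat_record["amount"]) / 100,   # implied decimal, illustrative
    }

print(transform({"state": "CT", "coverage": "01", "amount": "0000001500"}))
```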

Edit Package

  • Big (all of SDMA)
  • when we discuss loading data, is it already edited and run thru the SDMA rulebase and good to go, or raw untested data?
  • ASSUMING thru the edit
  • Can tell how good the data is and through when
  • pointer to SDMA functionality:
  • PA - SDMA - business level rules, large manual process for reconciliation BEFORE turning in reports (today), business and schema testing (does data match rules and schema? cross field edits)
  • KS - cross field edits - loss records, diff coverages, do have a publishable set of 1000s of rules; if used, SDMA will just work, just plug SDMA in - can and has been pulled out, proved it could be done, rules could be run as an ETL process - haven't done, back and forth and fixing of records not part of it, run the rules as ETL process (see the rule sketch after this list)
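A small sketch of the kind of row-level, cross-field edit discussed above ("if you pick this loss record you can't have a premium"). The two rules shown are illustrative stand-ins, not actual SDMA rules.

```python
from typing import Dict, List, Callable

Record = Dict[str, object]
Rule = Callable[[Record], str]          # returns an error message, or "" if the record passes

def loss_has_no_premium(rec: Record) -> str:
    """Cross-field edit within a single row: a loss record should not carry a premium amount."""
    if rec.get("record_type") == "loss" and rec.get("premium_amount", 0):
        return "loss record carries a premium amount"
    return ""

def zip_matches_state(rec: Record) -> str:
    """Syntactic sanity check (illustrative): CT zips start with '06'."""
    if rec.get("state") == "CT" and not str(rec.get("zip", "")).startswith("06"):
        return "zip code not valid for state"
    return ""

RULES: List[Rule] = [loss_has_no_premium, zip_matches_state]

def edit(rec: Record) -> List[str]:
    """Run all row-by-row rules; no cross-record checks here (those belong to reconciliation)."""
    return [msg for rule in RULES if (msg := rule(rec))]

print(edit({"record_type": "loss", "premium_amount": 100, "state": "CT", "zip": "28027"}))
```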

Data Attestation (boil down to tighter discussion Ken Sayers )

  • Have it or don't by time period
  • Assumption - run report, everyone is always up to date with data, loading thru stat plan, data has been fixed in edit process, ask for 2021 data its there
  • Automated query cant tell if data is there, may have transax that haven't processed, dont know complete until someone says complete
  • Never in position to say "complete" due to late transax
  • If someone queries data on Dec 31, midday. not complete - transax occur that day but get loaded Jan 3 - never a time where it is "COMPLETE"
  • Time complete = when requested - 2 ways - 1 whenever Trav writes data, "data is good as of X date" metadata attached, Trav writes business rules for that date, OR business logic on extract "as long as date is one day earlier" = data valid as of transax written
  • Manual insertion - might not put more data in there, assume complete as of this date
  • Making req on Dec 31, may not have Dec data in there (might be Nov as of Dec 31)
  • Request itself - I have to have data up to this date - every query will have diff param, data it wants, cant say "I have data for all purposes as of this date"
  • 2 dates: 12/31 load date and the effective date of information (thru Nov 30)
  • Point - could use metadata about insertion OR the actual data, could use one, both or either
  • Data bi-temporal, need both dates, could do both or either, could say if Trv wrote data on Jan 3, assumption all thru 12/31 is good
  • May not be valid, mistake in a load, errors back and fixing it - need to assert MANUALLY the data is complete as of a certain time
  • 3-4 days to load a month's data, at the end of the job, some assertion as to when data is complete
  • most likely as this gets implemented it will be a job that does the loading, not someone attesting to data as of this date - where manual attestation becomes less valuable over time
  • as loads written (biz rule, etc.) If we load on X date it is valid - X weeks, business rule, not manual attestation - maybe using last transax date is just as good - if Dec 31 is last tranx date, not valid yet - if Dec 31 is last transax date then Jan 1
  • Data for last year - build into system you cant have that for a month 
  • Start with MANUAL attestation and move towards automated
  • Data thru edit and used for SR, data trailing by 2 years
  • doesnt need to be trailing 
  • submission deadline to get data in within 2 years then reconciliation, these reports are trailing - uncomfortable with this constraint
  • our ? is the data good, are we running up to this end date, not so much about initial transax than claims process
  • May have report that wants 2021 data in 2023 but 2021 data updated in 2022
  • Attestation is rolling, constantly changing, edit package and SDMA is not reconciliation it is business logic - doesn't have to be trailing
  • As loading data, whats the last date loaded, attestation date
  • sticky - go back x years a report might want, not sure you can attest to 
  • decoupling attestation from a given report (data current as of x date), 
  • everything up to the date my attestation is up to date in the system
  • "Data is good through x date" not attesting to period
  • Monkey Wrench: Policy data, our data is good as of Mar 2022 all 2021 data is up to date BUT Loss (incurred and paid) could go 10 years into future
  • some should be Biz Logic built into extract pattern - saying in HDS, good to what we know as of this date, not saying complete but "good to what we know" - if we want to do something with EP, "I will only use data greater than X months old as policy evolves"
  • Loss exposure - all losses resolved, 10 years ahead of date of assertion, as of this date go back 10 years
  • decouple this from any specific data call or stat report - on the report writer 
  • 2 assertion dates - one for policy vs one for claim
  • not saying good complete data, saying accurate to best of knowledge at date x
  • only thing changing is loss side
  • saying data is accurate to this point in time, as of this date we dont have any claim transax on this policy as of this date
  • adding "comfort level" to extraction?  - when you req data you will not req for policies in last 5 years - but if i am eric, wants to und market, cares about attestation I can give in March

Exception Handling

  • Account for exception processing
    • What is an exception? 
    • PA - loss & premium records, putting stat plan in JSON, older data didn't ask for VIN, some data fields optional
    • KS - exceptions can be expected, capturing & managing situations to be dealt with, not "happy path", need to have error codes and remediation steps, documentation for what they all mean and what to do about them (SDMA has internal to edit package) - things like "can't get it in edit package b/c file not correct", etc. - standard way of notifying exceptions throughout system, consistent, exception received and what to do about it (see the message sketch after this list)
    • PA - ETL stuff, exceptions based on S&S topics, what's the generalized way to handle? or specific exception cases?
    • KS - arch needs way to report and document and address/remediate exceptions (consistent, notifying, dealing)
    • PA - options: 
      • messaging format, 
      • db keeping log of all messages
      • hybrid approach of both
    • KS - immediate feedback and non-sequential (messaging or notification feedback)
    • JB - data loading transfer of data or into HDS? 
    • KS - data loading starts with intake file in current statplan format, ends when data in HDS
    • JB - lot of exceptions local to this process loading data, reported to anyone or resolved or level of implementation of who is reporting data,
    • KS - some user interface, allows you to load a file and provide feedback, but a lot is asynchronous, no feedback from UI
    • JB - gen approach to be shared across 
    • KS - consistent way to handle across system (sync/asynch, UI vs notification)
    • PA - 2 lambda functions loaded in, 2 S&S topics (1 topic per lambda), seems like nice granular feedback; as we get more lambdas throughout the node it would be unwieldy, master topic to subscribe to resources
    • KS - too deep for now
    • PA - one general exception thread or thing to subscribe to, get large amount of exceptions as opposed to making the QA team individually subscribe to each resource (some kind of groupings?) - lot of components throwing exceptions and don't want to sub to each component
    • KS - do we want to audit exceptions? Likes/Unlikes, Consents, etc. - are there exceptions we want that to be captured on ledger or somewhere to be audited later?
    • PA - consent to data call and dont have data required that should be recorded/captured/to chain, etc. (consented to participate and no data)
    • KS - funct in exception handling, getting close to NFR (disaster recovery, continuity, reacting to scalability, etc.) need to get there at some point - digging deeper, specific exceptions will have different decisions 
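A sketch of a consistent, system-wide exception format with an error code, remediation pointer, severity, and an audit flag, published to a shared topic. The class, the example code, and the publish stand-in are hypothetical placeholders, not an existing openIDL interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class OpenIDLException:
    """Illustrative, system-wide exception format (codes and fields are placeholders)."""
    code: str                  # e.g. "LOAD-001"
    component: str             # which node component raised it (load function, edit package, ...)
    message: str               # what happened
    remediation: str           # documented step for what to do about it
    severity: str              # e.g. "error" or "warning"
    raised_at: datetime
    audit: bool = False        # True if it should also be captured on a ledger/audit store
    record_ref: Optional[str] = None   # pointer to the offending record or file, if any

def publish(exc: OpenIDLException) -> None:
    """Stand-in for posting to a shared exception topic (one general topic, per the discussion)."""
    print(f"[{exc.severity.upper()}] {exc.code} from {exc.component}: {exc.message} "
          f"(remediation: {exc.remediation})")

publish(OpenIDLException(
    code="LOAD-001",
    component="load-function",
    message="intake file not in expected stat plan format",
    remediation="correct the file layout and resubmit",
    severity="error",
    raised_at=datetime(2022, 8, 2, 10, 0),
))
```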

Metrics/Reporting (process)

  • KS - Within Data Loading, Feedback (status, presence) on what is loaded, # of records, total premiums
  • DH - by line of business, record counts, premiums, paid losses, incurred losses, significant metrics and claim count
  • KS - SDMA good example, same as SDMA provides (DH says yes)
  • DH - maintaining 5% tolerance ? then we need to know if within tolerance AND where they are relative to tolerance
  • KS - tolerance here is a load thing - amount put into system (5%), but you can load pieces, certain amounts, going back to report based tolerance, can't determine here - data loaded for last month or quarter - some time period of loaded data calculating tolerance for AND could be multiple loads
  • DH - requirement could be "must be in tolerance" and combination of loads must be within tolerance
  • KS - want to go back and check tolerance across a time period
  • DH - or a counter that looks at aggregate tolerance over periods of time (calendar year? quarter? month?) - could be continuous
  • KS - and know when to do that counter - editing before gets to HDS, clarify? should not get error records in HDS
  • DH - you might, if within tolerance? theoretically you wouldn't but practically you will - always have bad zip codes or edits, but minimal in scope and a resource constraint to fix every single record
  • KS - if i had a bad zip, why in HDS being queried? might be valid in other ways but is that how we want to work?
  • DH - do you delete whole record?
  • KS - not based on zip CAN use but based on zip CANT? pretty complicated - what does SDMA do? not passing thru error records to data lake
  • PA - to follow up with Andy, gut feeling still passing, not eliminating rows
  • DH - deleting rows = out of balance 
  • DR - how do we know wrong but not know right zip code?
  • PA - situation where zipcode featured in wrong state (NC in CT) trigger an error event
  • KS - are there some errors we let through b/c a report WONT break? Iffy
  • DR - some automated loading tooling that would prevent from happening, pre-edit package, Carriers may need to inc downstream, if we know it is wrong, it might be ok - HDS is not the one use case (know stat reporting ok), might not be used or below threshold, but if there is a different use case cant make that assumption errors are ok - goal accurate to best of knowledge, if obvs wrong attempt to fix it
  • DH - logistical nightmare, errors cant fix b/c not getting data, valid limit but only on a few records wont pass edit - maybe on HDS a flag "this record did/did not pass edit"
  • KS - IBM thought about this, initially built braindead vers of transform, loading errors into record, put into HDS, and could have records that were self-descriptive of what errors and then up to Extract Pattern to decide
  • DR - if something fails edit package, missing or wrong, could make all missing, all null, if null pattern doesn't have to know - if a record is accurate to one limit field and set to null, won't pass edit package and parser will decide "can't use" w/o metadata assoc with it - cleaner than playing game of "data is wrong but let's say what's wrong so others can use it" - too complicated
  • KS - "null" is valuable in some cases, error vs value situation
  • DR - trying to put logic on data call extract vs db, isn't always an error and data call should decide if it needs it, shouldnt matter to data call, why matter if not there - put logic onto extract pattern than HDS - wrong or missing
  • JB - conflicts b/w "state and zipcode" - needs some resolution to fix something (inconsistency)
  • KS - hybrid approach - load errors into records and ignore if they dont matter, could bloat things
  • DR - build funct on bloat and bloat becomes feature
  • PA pass or not-pass?
  • KS - loss of fidelity, all use cases for reading data not known, one is fine other is bad, need extract pattern, if null not use record
  • PA - minimum amt of edit package everything should pass?
  • KS - no, doesn't sound like it -
  • DR - didn't pass edit package could work in certain cases, but zip code is perfect ex: which is right? know use cases in stat report, up to 5% iffy, would work but doesn't scale with other use cases, flags help at record level only, a use case could say "I'll accept b/c it is 5%" - need to know what the error is
  • KS - we can have arch bloat and as we learn if this bloat has value we can remove or not - shrink arch to assume null, disciplined to revist this (record-level errors), 
  • DR - record level error, limit to that, boolean, simple, just do record leveling and no further is reasonable approach (stat reporting on day 1, allows others to build more funct on openIDL and allows carriers to build knowing it is there)
  • DH - some errors more important than others, zipcode not important for some reports, nature of error, don't know higher level of error coding, if you need to get too geo specific, this record won't work but looking at general, then perhaps it might - rather than not using the record at all it makes it more complicated
  • DR - use cases, for other data calls maybe it's ok for what that data call is doing, 90% of records saying "no issues" that's a call for data call to make, put complexity on writers, maybe moot but on day 1, let people know some aren't quite perfect - there will be more fine-grained for the carriers, will see more clearly results of the edit package - keep to simple flag
  • KS - simple approach now doesn't obviate the use of the more fine-grain later, if we find it is imp to know the error is the zipcode and x doesn't want to use it we can go back to data and add errors later
  • DR - carrier decision, hold it as data req for HDS, later req could be added (optional or mandatory) "you could add metadata about exceptions" and that might obviate need for remediation - DH aware - this is rep of downstream system of record
  • DH - systemic issues we fix, these are one-off
  • DR - hard, one-off issues = one-off corrections (data entry error) and not directly impacting admin of policy or claims, hard sell to always clean up, don't want HDS out of sync with system of record and don't want SoR spammed with error cleanup and affecting ops
  • KS - threshold? everything goes through even with error can enter HDS?
  • DR - ideal world depends on data users (whats acceptable) but we could say at minimum whats most imp 
  • DH - function where run through edit package and where we can fix some errors (what they can) to get within tolerance, run thru edit package to pass edits
  • KS - not sys of record, result of fixing stuff in source systems and getting them out, have to get into and then back out of source systems
  • DH - we make edit changes in SDMA today, not going back to source system
  • KS - out of sync between SDMA and source system, hds doesn't match
  • DH - edit correction, then yes not match source systems
  • DR - get in trouble, modifying HDS for one deliverable, might be ok, but harder to use HDS as source of truth, not accurate compared to source systems, if you need to do it that way treat HDS as source of truth and push corrections downstream - HDS accurate rep of downstream systems, but if most corrections/edits not in source of truth makes sense to push edit funct into HDS and there can be edits on data call or extract pattern - could execute extract pattern ourselves to test - if EP exists, references edit package, doesn't mean running EP in parallel as needed to correct things, correcting NOT at the source but in this method, simplifies architecture, if HDS is dumb source of truth, complexity goes to EP, say EP is stat report and says "6% problems, out of tolerance", if we know and can run that see out of tolerance and fix some things in ephemeral data store to move INTO tolerance, point to alt locations in EP
  • KS - dont those locations needed by other EPs? loss report that needs those fixes?
  • JB - some errors fix at the source, biz decision for the carrier 
  • DR - fixed today might be source of truth next week, don't want to keep track of error fixes you did, some might have been fixed in changes to the policy, don't need to fix downstream anymore, make tooling correct to see tolerance and adjustments thru UI, whatever gives flexibility and business decision to see why OR really 1-offs and no systemic fix or fix won't happen in a year - vis to see problems and make decision to change - simplifies getting right, HDS reps source of truth and working on marquee use case (stat reporting), slightly more complex with interim edited data store (div-ing vs full replica)
  • KS - HDS in sync with systems and overlay of edits visible to EPs, but not durable not permanent - exists as a temp thing
  • JB - diff patterns, diff things, replicating corrections?
  • DR - complexity of EP to solve
  • JB - record of what errors were in HDS could choose to deal or not, having a copy of HDS knows records with errors, maybe/maybe not addressed
  • DR - sometimes you need to make changes to get to threshold, just recording doesn't help, changes in place now HDS doesn't rep core systems (state management prob of ANOTHER source of truth)
  • JB - if errors that significant must fix?
  • DH - might be system issue that needs to get fixed
  • KS - do a record level (fix-record), discuss later, better to have record have a fixed version, here's what we got and the fixes to it, simple to see, diff way to do what dr was describing
  • DR - could do "foreign key" to fixed record, db arch instead of flag
  • DH - fix record ONCE
  • DR - could get lots and lots of those, based on EP, 13 columns of foreign keys, could spiral, some state management, source of truth changed and now core record changed so do we need to link edit package...
  • JB - if a small % has errors,
  • DR - depends if ephemeral or not, tooling to fix errors automatically, read data, modify, fix, kill
  • KS - ephemerality - how ephemeral or length of time ephemeral
  • DR - time prob - 2 months doesnt work, 2 min 2 days maybe
  • KS - i have fixes to rec, will fix downstream systems, do we need a re-run?
  • DR - separation of concerns is important, logic data, look for specific field, linked table - now messing with core system to deal with corner cases
  • KS - not corner case, know records will be messed up and fixed at HDS, things exist
  • DR - how many times will we do edits outside Sys of Rec for data calls
  • DH - dont, no edit package on data calls
  • DR - never on data calls?
  • KS - on the load
  • DH - look for reasonability, dont have edit package to make sure interrelationship of edits is sound
  • KS - this system, edit package on all data you load
  • DR - go two ways, proliferation of edit records, pointing to diff versions of records, could hold like "ONE only ONE edit record", option of querying that or not depending on use case, dep on database design could be problematic
  • KS - could design away with "views", if have a view - see the flag/tolerance sketch after this list
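A sketch of the simple record-level pass/fail flag and a tolerance check aggregated over a time period rather than a single load. The 5% figure comes from the notes above; the record shape and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class HDSRecord:
    """HDS row with the simple boolean discussed above: did it pass the edit package?"""
    load_date: date
    premium_amount: float
    passed_edits: bool         # record-level flag only; no per-field error detail on day 1

def error_ratio(records: List[HDSRecord], period_start: date, period_end: date) -> float:
    """Aggregate tolerance across all loads in a time period (not per individual load)."""
    in_period = [r for r in records if period_start <= r.load_date <= period_end]
    if not in_period:
        return 0.0
    failed = sum(1 for r in in_period if not r.passed_edits)
    return failed / len(in_period)

def within_tolerance(records: List[HDSRecord], period_start: date, period_end: date,
                     tolerance: float = 0.05) -> bool:
    """The notes mention a 5% tolerance; the threshold itself would be a governance decision."""
    return error_ratio(records, period_start, period_end) <= tolerance

loads = [HDSRecord(date(2022, 7, 15), 120.0, True),
         HDSRecord(date(2022, 7, 31), 95.0, False),
         HDSRecord(date(2022, 8, 1), 80.0, True)]
print(within_tolerance(loads, date(2022, 7, 1), date(2022, 7, 31)))   # False: 1 of 2 July records failed
```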

Data Catalog (metadata about what's in the db - some notion of what's available currently)

History Requirements 

  • rollout period - not keeping 20 years of data

Schema Migration/Evolution


Create Report Request (Configuration)

Define jurisdictional context/req (single or multi versions of same report)

How often it runs

Data Accessed

Outputs

Roles and Permissions

UI/Interface

Extraction Pattern

Aggregation Rules

Messaging

Participation Criteria

Two Phase Consent

Data Path (from TRV to X to Y - where is the data going and for what purpose)

Development Process (extraction/code)

Testing

Auditability of data

Generate Report

Rule Base for each report

Extract Data  (will involve aggregation)

Transmit Data (from HDS to analytics node)

Combine Data (various sources)

Consolidate Data (at the report level)

Traceability

Format the output

Validate against participation criteria (vs report config)

Exception Processing

Messaging

Generate Report

Auditability/Traceability

Reconciliation (Manual day1?)

Extraction error detection & handling

  • Everything needs to be edit-able
  • Fixes don't happen in current month (monthly correcting and then moving on)
  • Latency of error correction could be a year
  • need to make sure we have facility to capture corrections made while NOT bastardizing HDS
  • internal or architectural? DR is aware
  • SC - Errors:
    • missing information (on record provided)
      • current environment vs future
      • today - flat file from upstream, flat file submitted with missing limit, info passed to AAIS, flagged by AAIS, returned to carrier (can see instantly by state), these 2 states need fix made, go into SDMA to make fix then submit, AAIS approves, loaded by AAIS
        • it had already gone thru the edit
    • DH - load into SDMA, not approved yet, Susan makes corrections, goes thru edit again once Susan made corrections (see right away if fix worked), if in tolerance it is "approved" by AAIS
    • PA. - doing upload to SDMA, staging area, AAIS not running load until it is approved (edit package engaged)
    • SC - loading it to AAIS system, told to fix errors, fixes, then "officially submitting" and AAIS "approves"
    • PA - can't go to HDS until "approved"
    • DH - where within the process is the edit package? where is the facility to correct the errors, if HDS is supposed to be matching to source systems, then we shouldn't be making changes to HDS for other purposes beyond StatReporting - decision in ArchWG - how handle error corrections and fidelity of HDS
    • PA - direction, making update, go about making corrections of data already inside HDS, first example - data before HDS, different Error type
    • JB - case Dale mentioned, HDS is out of sync with source system, SS has error, needs time to fix, copies of DB with errors to be corrected - would suggest errors corrected get corrected in HDS but a log to inform source system of corrections as made - instead of lots of copies of collected data
    • JM - crossing boundary - doesn't care what carriers do - where do we stop caring - only thing, HDS has to be right, up to the carrier how they get it right
    • JB - yes but instead of making fix and a copy of DB it seems it should be fixed in HDS
    • SC - internal issue, AAIS needs to edit data, thats their job , if they say "2 errors" and they get fixed she says "done" and pushes to HDS - conflict with source system is something SHE deals with
    • JB transferred to AAIS for edit checks, 
    • PA - held before data lake until after corrected
    • JM - cant occur until content is in it
    • PA - edit pre ETL
    • JB - do it 2x, if you correct HDS need to run edit in that environ
    • PA - how do we have chicken-egg issue
    • JM - policy vs implementation? - HDS is great cutoff point, everything inside, up to the carrier to get it right in HDS - BUT Edits tell you what's right - carrier accountable up to HDS, if accountable on the carrier side and can verify before HDS, do the edits, send to HDS - what if I say "right, but edit stuff can't run until other side - already loaded to HDS - now what do I do?" - accountability? where run edits is key question
    • PA - edit package run today, run on etl on load, no knowl on load - 2nd part AAIS does reconciliation after, sometimes errors arise
      • error type 1 - pre HDS , edit package fails on load - but what if loaded in HDS what is the recon process and what the process for that
    • JB - financial types of reconciliation
    • PA - yellowbook #s, compare #s submitted vs financial #s and due to granularity things come out wrong, financial reconciliation before stat reporting
    • JB - 1x year vs monthly
    • JM - reconciling financials? where?
    • SC - public info
    • PA - reach out to team with gap analysis, grey areas in coding vs what they have, validate where/why numbers are off
    • SC - those arent errors, do reconcile, out of process doesnt become errors, differences and reasons why page 14 doesn't match - but NOT errors
    • PA - validity AAIS gets turning in reports on carriers - not only passed edit package but biz data matches fin data and a reason if it doesn't - why states listen to AAIS, how are we ensuring we are doing stuff correctly
    • JB - diff record exception 
    • JM - annual value add - edits? HDS needs two stage?
      • think it's right but flag then run edits and get "ok/not ok" - question - who runs the edits? in principle edits run on anyone's centralized db
    • JB - copy of edits made avail to all
    • DH - one body resp for edits, not every single carrier 
    • JM - you put data in HDS, centralized code runs on all dbs, put into HDS in some manner "this is not fully approved/edited" and decision: edit in place or is it a 2-stage thing?
    • SC - even if every carrier ran the edit package themselves, ultimately AAIS HAS TO RUN EDIT PACKAGE - resp lies with statistical reporting partner
    • PA - extract patterns to return T/F that a package was run - do test on clean or dirty data
    • JM - edits form of extractPattern, is it sufficient if it checks all the data
    • PA - regulator! 
    • JM - need feedback - run edits, if answer wrong, accountability to get it right
      • phys load or set flags
    • PA - should be running edits before load,
    • JM - WHERE? edits have to be consistent lang, thing needs to be well-defined structure
    • PA - rules engine, java, repackage rules engine as step in process going thru load (pass/no pass) 
    • JM - engine has to run against well define struct - b/c our data runs against well defines struct, now you are in HDS? put it into well def struct to run the rule that is the post-edit vers of that structure
    • PA - messaging format of HDS - stat plan, objects, run edit package against that 
    • JM - stat loading and knowl, if run edits against that, once passes - put it somewhere else or flag it - 2 concepts pre and post - saying to all carriers it needs to be PRE data but it has to have a shape - HDS? JM perceives when you demand "struct in diff way" and sees it as HDS
    • PA - diff pipeline but sees why it is outside of HDS
    • JB - data standard for saying how data will be considered, keep in mind dist arch, AAIS can't run anything on db at carrier - raw, won't be sent to AAIS
    • PA - collections of stat records, running rules against them, if HDS is stat plan JSONified, run EPs, passed validation and legit extract
    • JM - HDS is JSONified stat stuff, edits, things all can see are ALL HDS in his mind - if prescribing shape b/c edits won't work, first place carriers have to do that
    • PA - pipeline A before HDS, where prescribed the data hits first
    • JM - widget shape here, then EP - prescribing shape, set of edits then HDS - pipeline A is a prescribed shape, do whatever it takes to get it right, once edit passed drop into HDS
    • DH - wants to have DavidR weigh in
    • PA - Pipeline A (infra before HDS), need to pull rules engine, before how much do we want to control creation? JM talking about HDS being a larger thing, where does the balloon around openIDL begin? Pipeline A is infra, carrier does all before? will still design load up to plugin
    • JB - think of pipeline A as data format
    • PA - wont process and give feedback
    • JB - need data format to be standard to run rules against, gives flexibility to reconstruct design with same format (transit from flat file to whatever).
    • PA - docker image with initial process? where is the official inbound point of openIDL community vs carrier
    • JM - one step at a time - HDS in the dark (far right), run extract patterns on - before HDS has to pass edits - edits need to be centrally maintained, if DRules expecting something - pipeline A - already in that shape - saying to carriers, prescribe format of HDS, to be right prescribe the edits, has to hit a prescribed shape here - carrier can do whatever to get into that form, that form is prescribed, java thing, json, all prescriptive, no flexibility
    • PA - HDS, can write queries against, layering other things not HDS
    • JM - centralized group do edits, carriers get it into that shape, must be part of standard of stuff to be prescribed
    • PA - meat of DRules, lot of it is testing stat plan, start ingesting as json, checking positionality
    • JM - thou shalt not load HDS until edits passed, edits managed, approved format, carrier must get data into shape - reload until passed and THEN move to HDS
    • PA - can we have a bucket, fire lambdas against it, won't move to secondary bucket until passes
    • DH - suppose use HDS for other things, communicating with reinsureres, something outside of stat reporting, now that HDS not necessarily reflects source systems
    • JB - source consistent, take time to get corrected, logically - more correct vers HDS
    • PA - HDS more right than source system
    • JB - fixed at HDS but not at source
    • JM - policy, carrier accountability, edit finds something wrong, iterates on changes, if it takes 6 months to get back to source, for next 6 months other reports don't reconcile - accountability in governance statement "if you find an error you are accountable to reconcile"
    • JB - consolidated data in HDS for other purposes, if corrections were in HDS the right place to do it
    • JM - better that doesn't line up is wrong
    • JB - log for where / when changes done
    • JM - carrier accountability - more right data - where is accountability to carrier? whatever it takes upstream - tell us changes you made requirement - log that says "to get this loaded here are 7 edits" - accountability to make it transparent
    • PA - meta on each row with last update date and what changed
    • BH - if systems dont reconcile - BAD - what else are we doing with it? problem to be solved, may be a log, sounds painful
    • SC - reality - keep a log today (she does of every change made) - most cases data SC didn't get on her file (stat file) - is it really diff from source system? she didn't get it on her file due to mapping upstream - know zip code is wrong or vin is wrong, don't change things in her file or tell source system there's too many (agents inputting) - ok if under 5%
    • JM - practical question - do edits - syntactically and semantically: find alpha, don't know if someone mistyped VIN, but no idea T/F in real world - HOW RIGOROUS DO EDITS NEED TO BE? - even if edits flag error? can we accept it?
    • SC - happens all the time, might get edit "limit on policy is $1MM and you got something else - not an error"
    • JM - 2 levels of edits? showstopper (dead) and one we accept
    • SC - won't ignore fact error was received, will go and look "did I have the right limit" - edits help understand if there is a problem - is it internal edits?
    • JM - what is the purpose of an edit? don't edit more than you have to - what is the purpose in this context - all sorts of mech for internal correction - don't edit more than you need to without purpose - some things you have to fix, principle: only put in edits b/c hardcore reason to do it (not just "clean data")
    • JB - work to be done - application and analysis and insight, not policy-level corrections
    • JM - do edits have levels? severity of error (which means will it be addressed)
    • JB - sanity check errors vs record format errors - can and will catch but WHERE in process
    • DH - gut check for AAIS as stat agent on how rigorous they need to be
    • JM - levels - showstopping and scary and "oughta check"
    • JB - accuracy in general (THRESHOLD)
    • JM - confidence scores from address cleansers - 
      • showstoppers (break system)
      • competency score (".7 good enough? yaaay")
    • JB - data quality scores, pick battles
    • SC - basic: does every field get a val - current and future, if not ABCD - if that field is filled? if so what's in there, nebulous - stat agents bear resp of "data is reasonable", know it is not garbage, how much has to be "good" - what does "good" mean (every field filled w/ reasonable value)
    • JM - mTable that does this - argument - for every field "type, table, range, = score"
    • SC - come across something, didn't meet the threshold, kick it back?
    • JB - levels determine response
    • JM - governance? - value, string, etc. - don't measure if you aren't gonna govern it - if you are gonna put a rule in there, must have governance policy - arch has to provide for edit layer and series of thresholds to get a score and governance policies by score (see the scoring sketch after this list)
    • JM  - pass/fail and scoring
    • PA - extra metadata for user queries
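A sketch of the two-level edits and scoring idea: showstoppers block the load into the HDS, while lower-severity findings only reduce a quality score that governance thresholds act on. The severity names, the scoring formula, and the 0.7 threshold are illustrative assumptions, not agreed policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, object]

@dataclass
class EditResult:
    rule: str
    severity: str      # "showstopper" blocks the load; "warning" only affects the quality score
    message: str

def run_edits(rec: Record) -> List[EditResult]:
    """Illustrative edits with two severity levels, per the 'showstopper vs oughta check' discussion."""
    results: List[EditResult] = []
    if not rec.get("state"):
        results.append(EditResult("state-present", "showstopper", "state is missing"))
    if rec.get("state") == "CT" and not str(rec.get("zip", "")).startswith("06"):
        results.append(EditResult("zip-vs-state", "warning", "zip does not look like a CT zip"))
    return results

def gate(rec: Record, min_score: float = 0.7) -> Tuple[bool, float]:
    """Pre-HDS gate: no showstoppers allowed, and the quality score must meet a governed threshold."""
    results = run_edits(rec)
    if any(r.severity == "showstopper" for r in results):
        return False, 0.0
    warnings = sum(1 for r in results if r.severity == "warning")
    score = max(0.0, 1.0 - 0.1 * warnings)     # toy scoring; a real policy would be governed
    return score >= min_score, score

print(gate({"state": "CT", "zip": "28027"}))   # (True, 0.9): warning only, still above threshold
```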

Reconciliation (make sure report is correct based on request - reasonability check on the report - NOT financial reconciliation)

Financial Reconciliation (Oracle? Source of truth to tie against those #s?)

Statistical Reconciliation

Auditability/Traceability

Deliver Report

Make report available (S3 bucket public/private? start private)

Permissioned Access

Deliver to participants (carriers)

Deliver to subscribers (requestors)

Receipt/Notifications

Auditability/Traceability

Exception handling

Data Call

Communications for resolving conflicts, etc.

Load Data

Create Data Call

Like Data Call

Issue Data Call

Subscription to Data Call

Consent to Data Call

Mature Data Call

Abandon Data Call

Clone Data Call

Deliver Report


Application Components

Data Sources, Sinks and Flows

Decisions

Tenets

Data

ID | Tenet
1 | Data will be loaded in a timely manner as it becomes available.
2 | HDS will track the most recent date that is available to query for pre and post edit package data.
3 | Data owners will correct any mistakes as soon as they are made aware of the issue.
4 | Data owners will follow current practices for logging policy and claim records as they do today. A new record will be created for each event. All records will be loaded in a timely manner after the creation event.
5 | There will be a distinction between edited and unedited records. (Successfully gone thru edit package)

Non Functional Requirements (to be moved to requirements doc)

Notes:


Time | Item | Who | Notes