
Discussion notes:

AWG Master diagram:

https://lucid.app/lucidchart/ac20d4e1-50ad-4367-b5cf-247ed9bad667/edit?viewport_loc=-301%2C-37%2C3200%2C1833%2CkgLSxXlpoGmM&invitationId=inv_de1a5e61-8edc-488a-90ee-8312f8c69cd4#

  • KS -
    • Work from this and other diagrams
    • Draw boxes as opposed to blank slate
  • PA - start broad, draw stuff until we have off
  • KS - can check off major functions, components in swim lanes; need to ingest data, load it, handle data calls, extract, report - high level - can start from the left and spitball the flow of data as it comes through the system, get high-level boxes in there, or itemize what's in the boxes as we go - if doing it alone Ken would do all of the above - start from something that may or may not be close
  • KS - first step: getting data into the HDS through some means; many different DBs or systems are the source of this info, so it needs to be normalized - ETL is going to normalize data from different sources into an openIDL format (HDS - first thing to happen) - another thing that is going to happen: we are going to edit it (syntax, data errors in ETL), then convert to HDS format and load into the HDS (all happens in the Member Enterprise on the LEFT of the diagram)
  • JM - first point of concern: to do the edits, the data needs to be in a standardized format (for predictability)
  • KS - if edits are standardized, the data input must be standardized at this point - putting data into a standardized format; the edit package will check it for validity against a number of rules, so we know we have a rules engine in the ETL (and the rules repository lives in the ETL)
  • JB - one step puts data into a separate format, and a second step checks it
  • KS - ETL standardizes data, then the ETL Edit engine (Rules Engine, Rules Repo) - 2 major steps going on inside here: ETL and Edit, then a converter to map it to the format for HDS
  • KS - standardized, edited (rules engine and repo), then mapped to HDS format - coming out of here: only the valid records
  • JB - warnings and exceptions based on issues
  • JM - also an intermediary data set; want to land data into some structure where it's visible to run edit rules against, some type of repo - biased towards persistence
  • JB - batch or message format
  • JM - stuff done on the fly needs troubleshooting; persisting is a different issue (maintenance)
  • JB - just some kind of file format with a schema, a well-defined structure
  • KS - after the edit, thought we were keeping all the records in the HDS - still true? Flagging those with errors? Or just saying ...
  • JB - 2 types: data errors - sanity check?
  • KS - UI for controlling edits and release of this stuff
  • JB - going to be issues with data, but if not above a certain threshold... path, data quality checks - discussed that everything submitted goes to the HDS because it comes from the source
  • KS - SDMA function: "this has x errors but in a few months will release all data"
  • JB - submissions - daily or weekly
  • JM - then don't need a UI - up to the carrier to figure out how to pass all tests
  • KS - think we heard they would rather have a standardized process and need a UI to get it to work - no consensus, thinks Dale
  • JB - could be a process that reads files and provides outputs
  • KS - 10k+ rules are needed, people want to leverage them
  • PA - as someone who used SDMA a lot, there is a lot to be said for how it has allowed users to self-service
  • JB - supply something for all, let carriers add their own rules, but also a standard set of rules all can use
  • JM - depends on what you want in the interface; a UI that allows you to see what's going on is fine vs. a UI for editing data
  • PA - doing it today, have both options; can update rows, but a lot of the time errors occur when the carrier ETL fails to make it correctly
  • KS - not happening quickly enough for the cycle to finish - have to get it out in the week, can't get it fixed at the source but need to be able to hit the endpoint - then the DB got complicated: error log records, etc.
  • JB - separate files from corrected records; if there are tests for sanity check / level of quality, is it up to the carrier to try to resolve?
  • KS - heard that doesn't work; it's what we want to do but can't always do - sometimes the timeframe needs the fix in SDMA, either the data or the process won't change in time
  • JB - a spot in the middle: can't change box 1 in the picture, but you can change downstream - doesn't HAVE to mean HDS, basically a box 2 instead of a UI for fixing it
  • KS - do it after standardization, not on the carrier; up to the openIDL footprint to make changes - IS IT PART of the openIDL footprint to provide fixing for this data? how do we decide?
  • PA - flip to the PATL page, a kind of boiled-down version of SDMA today and what we can see being a robust way of doing it - carrier ingestion portal, do large edits before or standardized edits after, through a package run against any data set; the ingestion portal triggers a job to the HDS - as soon as we put it in the working table, assign a UUID
  • KS - not clear where error editing is happening
  • JM - how do you make changes? edits?
  • KS - have a UI to make changes, shows tables, almost like Excel editing
  • JB - who does that at carrier?
  • PA - Susan or Reggie, for example, at TRV - the business people in charge of loading data; something like this is a feature a lot of companies would want to run - ones like TRV bypass the whole system, smaller companies would want it
  • KS - TRV currently fixing data with SDMA - every now and then we can't get the back end to feed the right data
  • PA - thinks the way we are right now, making it all work with this workflow, TRV isn't making changes to the data lake; they submit an Excel file with why the #s (an adjustment artifact) and why the adjustment should happen - as of today AAIS is not updating records in the data lake, only allowing edits at load, working in the ingestion portal
  • KS - need to decide if that's a requirement for the system or not; can say "if we have it, where would it be?"
  • PA - a whole secondary thing in the HDS, an error table and a correction table in the HDS - not there today
  • JB - baseline data quality check: simply check data for acceptability (errors under a threshold) and pass it along if not exceeded, OR if exceeded the carrier would need to update it, having a means to edit records and process them using tools from openIDL - a baseline data quality check where data meets a certain quality; if not accepted, would the carrier allow it to be edited, or is it back on the carrier?
  • JM - day 1, day 2, day 3 - simplest design: carrier loads the HDS and then done - maybe set a flag "ready to go", and if it fails a test, back up the batch and put it back in; no staging area or UI needed to change anything
  • JB - loading into the DB and backing it out is squirrely; better to have a format to check on the way in (instead of loading garbage)
  • JM - baby step: v1 is flat load and back out (otherwise a staging area to run rules against); v2 is fail and fix; v3 is do whatever it takes (typically have a spot in the flow where you can modify data, OR a formal facility in there - a day 1, 2, 3 question) - mixing delivery with architecture
  • JB - likes v2, simplest way to start, keeps the door open if we think we need a modification interface; quality checks vs. modifications - can't let crappy data load; iterate across - how do fixes re-apply if you reload the stage? in a robust design, put fixes in a fixit body of tables and re-apply the fixits - the fixit set gets big fast
  • PA - we have a non-robust version TODAY of what JM described; once you see an issue, we make the call to reload or...
  • JM - load once, modify with a fixit interface
  • JB - copy, fix it, resubmit - still have separate files
  • JM - problem with fixit: the copy made must have structure, and could be overwritten by a reload
  • PA - haven't ironed out - 4 CSVs in one day for auto (1 month's worth of loading) - are those 4 CSVs one job? multiple docs per job?
  • JB - organized as separate submissions or files
  • KS - some scope of identifier for this package of data; work on the package identifier, don't release until edited/removed
  • PA - pivot slightly: passing 47 of 48 states - is there a facility to load the 47 that passed, or is it all-or-nothing?
  • JM - batch ID mechanism on all tables (a sketch follows these notes); put it into stage, it's a batch - if you chose to say "loaded 48 states as a batch and one is wrong", back them all out, or individually? <SEAN REVIEW TIME 45 min>
  • JB - file in a folder means needs to be processed
  • KS - a huge company like Hartford wouldn't want to equate a file to a batch (multiple batches per file, multiple files per batch)
  • PA - why would a batch have a state/line indication - whether they pass or fail, 4 separate batches
  • JB - rules by state?
  • PA - depends; rules are more national for the most part, but passing means "is this data for this state/line/timeframe valid?"
  • KS - can a batch be orthogonal to reporting? Sent a bunch of data, want to fix it all if it doesn't work, but only found one state screwed up - can't pass the state, the batch is ...
  • PA - if you have your batch transcend states and lines, you're tagging errors to non-errors
  • JB - data quality checks are record by record; if they exceed a %/proportion it's a bad collection - may have stats on where errors came from; not sure about enforcing pre-sorting of data
  • PA - if I have a set of data (NC and SC) and I submit it, should the machine render 2 separate batches?
  • KS - can we do it without calling it a new batch? return errors ("AZ, so these records don't pass") and have the choice to fix, split and fix, etc. - one part of it being messed up doesn't invalidate the batch
  • JB - idea: check each record for format; at some point too many errors is a problem; if collections are from 2 different states then we need better analytics on data quality, an enhancement of the data quality checks - make it as easy as possible for carriers to submit data
  • JM - agree on a place to stage things and rules to be run - day 1 the fix process is on the carrier; bolting on a fix UI won't break the prior architecture - add a data quality state column to the design, the edit package will respond "bad, good, warning" (a sketch follows these notes) - only answering data quality, not making changes
  • KS - does it require a control mechanism to release the data to allow ...
  • JM - happy path: data gets into the staging area, the normal process kicks off the rules, gives answers and returns them to the scheduler, which checks whether the value is past the safety threshold - if it passes, done; if it fails, fire off an email (sketched after these notes)
  • KS - control DB? Job flow?
  • JM - if everything works it just works; if the edit is fast, run the load to HDS, otherwise it's an intervention event - fire an email to a team/group notified there is a problem
  • JM - if edits return "pass" you let the job flow - have to go on the assumption that 99% of the time batches run and work; worried about the fixit approach - the ideal is understanding why things are happening; these batches should just flow
  • KS - pipeline approach
  • JM - fan of a persistent stage; 3 major boxes of ETL integrating with staging; a stat model we can all live with on day 1; believes we will check 10-12 EPs, day 1 should be the stat model
  • JM - keep the adapter as small as it is, explain it
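
Below is a minimal sketch of the edit-package idea from these notes: staged records are run against a repository of rules and tagged with a data-quality state of "good", "warning", or "bad". The record fields, rule functions, and lookup values are illustrative assumptions, not the actual openIDL stat-plan edit rules.

```python
# Minimal sketch of the "edit package": run a repository of rules against
# staged records and tag each one good / warning / bad.
# Field names and rules are illustrative assumptions, not actual stat-plan edits.
from dataclasses import dataclass, field
from typing import Callable, Optional

VALID_STATES = {"NC", "SC", "AZ", "CT"}   # assumed lookup table
VALID_LINES = {"AUTO", "HO"}              # assumed lines of business

@dataclass
class StagedRecord:
    batch_id: str
    state: str
    line: str
    premium: Optional[float]
    quality_state: str = "good"           # good | warning | bad
    messages: list = field(default_factory=list)

# A "rules repository": each rule returns (severity, message) or None.
Rule = Callable[[StagedRecord], Optional[tuple]]

def rule_state_code(rec: StagedRecord):
    if rec.state not in VALID_STATES:
        return ("bad", f"unknown state code {rec.state!r}")

def rule_line_code(rec: StagedRecord):
    if rec.line not in VALID_LINES:
        return ("bad", f"unknown line of business {rec.line!r}")

def rule_premium_present(rec: StagedRecord):
    if rec.premium is None:
        return ("bad", "missing premium amount")
    if rec.premium < 0:
        return ("warning", "negative premium (possible offset transaction)")

RULES_REPOSITORY: list[Rule] = [rule_state_code, rule_line_code, rule_premium_present]

SEVERITY_RANK = {"good": 0, "warning": 1, "bad": 2}

def run_edit_package(records: list) -> list:
    """Apply every rule to every record and keep the worst severity seen."""
    for rec in records:
        for rule in RULES_REPOSITORY:
            result = rule(rec)
            if result:
                severity, message = result
                rec.messages.append(message)
                if SEVERITY_RANK[severity] > SEVERITY_RANK[rec.quality_state]:
                    rec.quality_state = severity
    return records

if __name__ == "__main__":
    batch = [
        StagedRecord("2022-09-19-001", "NC", "AUTO", 125.00),
        StagedRecord("2022-09-19-001", "ZZ", "AUTO", None),
    ]
    for rec in run_edit_package(batch):
        print(rec.batch_id, rec.state, rec.quality_state, rec.messages)
```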
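
And a sketch of the scheduler "happy path" JM described: once a staged batch has been tagged by the edit package, compare the share of bad records against a safety threshold, then either load the batch (carrying the quality-state column) to the HDS or fire off a notification. The threshold value, record shape, and function names are assumptions.

```python
# Sketch of the scheduler happy path: pass/fail a staged batch against a
# safety threshold, then load to HDS or raise an intervention event.
from dataclasses import dataclass

ERROR_THRESHOLD = 0.05   # assumed: at most 5% bad records allowed through

@dataclass
class EditedRecord:
    batch_id: str
    quality_state: str   # "good" | "warning" | "bad" (set by the edit package)

def load_to_hds(records):
    # Placeholder for the real load step (e.g. inserts into the HDS tables).
    print(f"loading {len(records)} records to HDS")

def notify_team(batch_id, bad_ratio):
    # Placeholder for the intervention event (e.g. email/alert to the data team).
    print(f"batch {batch_id}: {bad_ratio:.1%} bad records, intervention required")

def process_batch(batch_id, records):
    """Pass/fail decision for one staged batch."""
    bad = sum(1 for r in records if r.quality_state == "bad")
    bad_ratio = bad / len(records) if records else 0.0
    if bad_ratio <= ERROR_THRESHOLD:
        # Happy path: load the whole batch; records keep their quality flag
        # (the data-quality state column idea from the notes).
        load_to_hds(records)
    else:
        # Threshold exceeded: hold the batch and fire a notification.
        notify_team(batch_id, bad_ratio)

if __name__ == "__main__":
    staged = [EditedRecord("B1", "good"), EditedRecord("B1", "warning"),
              EditedRecord("B1", "bad")]
    process_batch("B1", staged)
```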
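
Finally, a sketch of the batch-ID mechanism: every staged row carries a batch ID, so a failed load can be backed out as a whole batch or only for the state(s) that failed. sqlite3 stands in for the staging store; the table and column names are illustrative assumptions.

```python
# Sketch of backing out a staged batch by batch_id, optionally per state.
import sqlite3

def back_out(conn, batch_id, state=None):
    """Remove a staged batch, optionally restricted to a single failing state."""
    if state is None:
        conn.execute("DELETE FROM staging WHERE batch_id = ?", (batch_id,))
    else:
        conn.execute("DELETE FROM staging WHERE batch_id = ? AND state = ?",
                     (batch_id, state))
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (batch_id TEXT, state TEXT, premium REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                     [("B1", "NC", 100.0), ("B1", "SC", 50.0), ("B1", "AZ", 75.0)])
    back_out(conn, "B1", state="AZ")       # back out only the failing state
    print(conn.execute("SELECT state FROM staging").fetchall())  # NC and SC remain
```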

Tues Sept 20, 2022

  • KS - talked yesterday about 3 options for the member enterprise (see diagram); tried to simplify it down so we don't have "fixes" in the first cut; need to talk about the different pieces and figure out the phasing - the basic contention was no ability to fix at all and it should be on the back end, but have heard there are folks who will NEED some way to fix data just before reporting - while we don't support it in the first pass, find a place to do it; 3 paths show where this can happen: fix after, in the HDS, or fix while in the staging area/pipeline (2), or fix beforehand in a frontloaded process - good idea of what's going on in this box EXCEPT the Adapter
  • KS - this is what we plan to install in the member's enterprise; one of the tenets is we provide data privacy - could be hosted, with privacy maintained through the hosted node having proper controls, security well defined and agreed to by carriers, and carriers not moving data out of their world - but when hosted it is obvious the data is outside their walls - if we could do hosted it would be simple: stand up the whole stack inside a node, and partners (Senofi, Chainyard, Capgemini) could host nodes and maintain them - feedback from ND: some carriers really want to host the data; what we are striving for with this blue box is a tech stack amenable to as many people as possible - TRV, Hartford, State Farm, etc. - want them to be able to stand up this stack in their world - real experience with a carrier who ran up against roadblocks in the tech stack, policies, etc., BIG disagreements - a more complicated tech stack (4 major sets of tech); the lighter the stack to install inside the carrier's world, the more likely it is to be accepted, with as little variation as possible - the blue box is not intended to be perfect; it has to take into account factors like company policies, etc. - try to minimize and maintain the integrity of transactions
  • KS - moving forward, call back to what's been done before; always open to best ways to document architectures
  • KS - phase: get into each box, figure out rough block diagrams of what's going on, starting at the adapter
    • Blue - member enterprise; green box - AAIS setup; red - hosted node - still believe a hosted aspect to this makes sense: complicated tech, young tech, challenging to set up and configure; with a targeted set of people with the skills controlling it we will be better off - some day it matures (1-button install); Fabric and Kubernetes and AWS have a certain complexity we want to encapsulate
    • No issues running node at AAIS at this point, will need to interact with analytics node (currently at AAIS, doesn't have to be), any carrier nodes need to interact with carrier enterprise stack where data lives
    • Adapter will run extraction, defined in a way it can execute in member's world 
    • another tenet not challenged: the RAW data is in the HDS, the input to the extraction engine; where the member is agreeable, the processed data can leave their world, go into analytics, and be turned into reports
  • JB - improvements on the horizon for Fabric/Kubernetes (Fabric Operator), move in that direction; carriers are using AWS, a lot of it using TCP/IP to make connections to resources; one set of permissions internally, or access for the adapter or apps to talk to Fabric - there will be interactions that require auth for use, ways to use it - the thing that runs the adapter and interprets the request to get data would be a binding approach: a request comes in, not necessarily executable code but standards for implementation in the HDS; the concept of the adapter is a good concept, there needs to be something that happens when a request is received; hope there will be things that simplify the config
  • KS - moving target; as much as possible, encapsulate that movement so other stuff doesn't break - a buffer of implementation, and all of that is a trade-off; what's in the adapter?
  • JB - a lot depends on what the DB of the HDS is and what it will take; that will determine HOW you get data from it
  • KS - walk through a couple of different DB types, see how they might make things harder or easier, weigh options, pick short/long-term options
  • KS - we have HDS
    • noSQL (like Mongo, which 1 carrier said no to)
    • Relational DB
    • format of the data, allow scripting or programming
  • JB - common denominator: what is supportable in the carrier's domain; some form of standard SQL - ought to support something common
  • KS - current assumptions
    • Adapter
      • JB - interact with requests
        • carriers need to see and consent to requests, number of API calls
        • if hosted node wraps interface thru network, api call?
        • ad hoc data call - do they know they have that data? (stat plans they know, repeatable), but ad hoc, "what is the nature of the request?" - some knowledge of what's in the HDS? a subsequent phase?
      • KS - function is to execute an extraction against HDS and return results (to analytics node? etc.) 
        • if adapter makes as few assumptions as possible - needs to know format of db, cannot be just an api call
        • management among carriers will be handled via Fabric and hosted node, all the adapter does is execute the extraction upon request
        • carriers interacting with network thru hosted node, minimize dependencies, all the adapter needs to know "asked to extract data from here to there"
        • able to test the extraction pattern to see what it would result in 
        • extraction should be human readable/understandable
        • some test harness "I have a new EP I want to run, test it"
        • needs to execute when data call happens but also eval before consent (can we do this?), flow says "I will consent but what does it do?"
    • db could be nosql or relational
      • sql? nosql? allow scripting?
    • extraction must be stored on-ledger
      • meta data, architecture of the request
        • sql? elastic search form?
    • extraction options
      • pure sql
      • scripting
      • hll
      • dsl
      • graphql
    • the data model (semantically) starts from the notion that it is the stat plans (transactionally based)
      • PA - premium transaction and loss transaction
    • JB - currently stat plans, may evolve over time, starting point
    • PA - what are we adapting to?
  • KS - Extraction Processor
    • receive requests
    • interpret / translate extraction 
    • execute transaction
    • gather results
    • return results
  • KS - big concern about security in this model
    • not sure if carriers have pushed back; passing around code is generally the thing security folks say is a no-no
    • don't pass code that will run on someone else's machine
    • JB - push or pull model; automatic = exposure, but if it is a request (pull), the API pulls and evaluates it - not executing code, it is interpreting it; a more secure approach is polling for things to do and launching via human control
    • KS - where the ledger provides some comfort: the immutable ledger is good - having seen the code coming across, good with executing that; everyone has the option to consent; some sort of technical way to test the extract process
    • JM - agree with the concern; introduce a construct to reject queries
      • if we had a table that said "if these keywords appear we will reject the query" - tricky to engineer, but something that can screen filters, a config table; the system will reject it even if a person approves it - will submit a request to improve security (a sketch of such a scanner appears at the end of these notes)
    • KS - might not need it but good to have; would run the initial request through the same scanner
    • JB - something the validation of the request would do
    • JM - humans make mistakes, double up on safety; hard to sell the idea of arbitrary code execution
    • JB - late binding; execute as a way to test against what is in the data store
    • KS - scanner/validator, called as part of the extraction processor; will happen via the creator of the EP: upload it, the system will run it, a standard set of checks
    • JB - good add
  • KS - elephant in the room - still the DB decision; the data format really tells us a lot of what the EP is capable of; depending on requirements some may not be up to the task; take a phased approach, not starting with the simple problem
  • PA - processing with loops and stuff, stored procedures
  • JB - stages or phases with Mongo; don't have to do it all in one SQL statement
  • KS - Mongo stages of the aggregation pipeline? Mongo is proprietary; steer away from proprietary implementations (like map-reduce)
  • JB - a reason not to communicate the request in executable terms: implement it in the data store; one thing not in the current transaction data of stat reporting - we don't have the actual policy record itself; a relational representation in the future would support it
  • KS - we do have a policy identifier in the data store; it could be a business key to a logical policy record
  • JB - don't have attributes of the policy in terms or coverage
  • KS - imperfect and difficult to put together; not all reports are policy-based anyway - an aggregation of coverage across all policies
  • JB - talked about extracting what fields are available in the stat record
  • KS - assume a relational DB and our EP is a pipeline of SQLs, one after the next - no extra definition, just statements (1, 2, 3), and the EP executes the pipeline of SQLs (sketched at the end of these notes)
  • JB - it's all in the specification of how complex a request is
  • PA - likes
  • KS - (1) not making up a language, not depending on someone looking at a language pre-processor and understanding what it is doing; it is easy to execute, just clear text
  • JM -
    • (1) can we create views as part of execution?
    • fundamentally, is our problem a document-store problem OR an entity-relationship problem?
    • Mongo is for doc stores; this feels like a relational problem - insurance entities are well defined and consistent
    • a doc store is not the nature of what we are doing
    • dealing with well-defined entities that relate to each other constantly; at 100k feet, a relational problem
  • KS - ready to commit to relational; it fully supports what we need to do, aligns well with what we are after, transactional records
  • JM - supportability - feedback from the Friday call: the whole area is comfortable with SQL; stepping away means a scarcer skillset
  • KS - a document DB supports a simple model, we would outstrip it quickly; the simple way is 2 tools - a relational DB and SQL
  • JM - can we define arbitrary views
  • JB - views used in staging: define and access, don't have to run the first stage of the pipeline
  • KS - can we create views or use multi stage?
  • JM - WITH clause in SQL, how complex do you want to get?
  • KS - where the work is: the battle between devs and DB people - if I have to call you every time I want a new field added or taken off, or if we have to deploy a new DB model every time we want a new view?
  • JM - schema evolution question, need to keep schema updated, 
  • KS - issue with old queries working 
  • JM - unless you validate the JSON structure AND the schema, there is no magic to schema management
  • JB - might not change that often, 1-2 times a year at most
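
Below is a minimal sketch of the extraction-processor idea discussed above, assuming a relational HDS and an EP that is simply an ordered pipeline of clear-text SQL statements executed one after the next. sqlite3 stands in for the real HDS, and the table/column names and statements are illustrative assumptions.

```python
# Sketch of an extraction package as an ordered pipeline of plain SQL text,
# executed one statement after the next against a relational HDS stand-in.
import sqlite3

# An extraction package as it might be stored on-ledger: just numbered SQL text.
EXTRACTION_PIPELINE = [
    # 1. Stage: restrict to the requested state/line (a temp view, per the
    #    "views as part of execution" question in the notes).
    """CREATE TEMP VIEW scoped AS
       SELECT state, line, premium_amount
       FROM premium_transactions
       WHERE state = 'NC' AND line = 'AUTO'""",
    # 2. Report: aggregate the scoped view.
    """SELECT state, line, SUM(premium_amount) AS written_premium
       FROM scoped
       GROUP BY state, line""",
]

def execute_extraction(conn, pipeline):
    """Run each statement in order; return the rows of the final statement."""
    cur = conn.cursor()
    rows = []
    for statement in pipeline:
        cur.execute(statement)
        rows = cur.fetchall()   # only the last statement's result is returned
    return rows

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE premium_transactions (state TEXT, line TEXT, premium_amount REAL)")
    conn.executemany("INSERT INTO premium_transactions VALUES (?, ?, ?)",
                     [("NC", "AUTO", 100.0), ("NC", "AUTO", 50.0), ("SC", "AUTO", 75.0)])
    print(execute_extraction(conn, EXTRACTION_PIPELINE))
```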
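
And a sketch of the scanner/validator construct JM suggested: screen each statement of an extraction package against a configurable table of rejected keywords before it is accepted (and again before execution). The keyword list and function names are illustrative assumptions.

```python
# Sketch of a keyword-based scanner/validator for extraction requests.
import re

# Config table of keywords that cause an extraction request to be rejected.
REJECTED_KEYWORDS = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "ATTACH", "PRAGMA"}

def scan_statement(sql: str) -> list:
    """Return the rejected keywords found in one SQL statement."""
    words = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    return sorted(words & REJECTED_KEYWORDS)

def validate_extraction(pipeline) -> list:
    """Collect violations for a whole extraction pipeline; an empty list means pass."""
    violations = []
    for index, statement in enumerate(pipeline, start=1):
        for keyword in scan_statement(statement):
            violations.append(f"statement {index}: contains rejected keyword {keyword}")
    return violations

if __name__ == "__main__":
    ok = ["SELECT state, SUM(premium_amount) FROM premium_transactions GROUP BY state"]
    bad = ["DELETE FROM premium_transactions"]
    print(validate_extraction(ok))    # [] -> pass
    print(validate_extraction(bad))   # rejection messages
```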
