
Discussion notes:

AWG Master diagram:

https://lucid.app/lucidchart/ac20d4e1-50ad-4367-b5cf-247ed9bad667/edit?viewport_loc=-301%2C-37%2C3200%2C1833%2CkgLSxXlpoGmM&invitationId=inv_de1a5e61-8edc-488a-90ee-8312f8c69cd4#

  • KS -
    • Work from this and other diagrams
    • Draw boxes as opposed to blank slate
  • PA - start broad, draw stuff until we have off
  • KS - can check off major functions, components in swim lanes; need to ingest data, load it, handle data calls, extract, report - high level - can start from the left and spitball the flow of data as it comes through the system, get high-level boxes in there, or itemize what's in the boxes as we go - if doing it alone Ken would do all of the above - start from something that may or may not be close
  • KS - first step: getting data into the HDS through some means; many different DBs or systems are the source of this info, so it needs to be normalized - ETL is going to normalize data from different sources into an openIDL format (HDS - first thing to happen) - another thing that is going to happen: we are going to edit it (syntax, data errors in ETL), then convert to HDS format and load into the HDS (all happens in the Member Enterprise on the LEFT of the diagram)
  • JM - first point of concern: to do the edits, the data needs to be in a standardized format (for predictability)
  • KS - if edits are standardized, the data input must be standardized at this point - putting data into a standardized format; the edit package will check it for validity against a number of rules, so we know we have a rules engine in the ETL (and the rules repository lives in the ETL)
  • JB - one step puts data into a separate format, and a second step checks it
  • KS - ETL standardizes data, then the ETL Edit engine (Rules Engine, Rules Repo) - 2 major steps going on inside here: ETL and Edit, then a converter to map it to the format for HDS
  • KS - standardized, edited (rules engine and repo), then mapped to HDS format - coming out of here: only the valid records
  • JB - warnings and exceptions based on issues
  • JM - also an intermediary data set; want to land data into some structure where it's visible to run edit rules against, some type of repo - biased towards persistence
  • JB - batch or message format
  • JM - stuff done on the fly needs troubleshooting; persisting is a different issue (maintenance)
  • JB - just some kind of file format with a schema, a well-defined structure
  • KS - after the edit, thought we were keeping all the records in the HDS - still true? Flagging those with errors? Or just saying ...
  • JB - 2 types: data errors - sanity check?
  • KS - UI for controlling edits and release of this stuff
  • JB - going to be issues with data, but if not above a certain threshold... path, data quality checks - discussed that everything submitted goes to the HDS because it comes from the source
  • KS - SDMA function: "this has x errors but in a few months will release all data"
  • JB - submissions - daily or weekly
  • JM - then don't need a UI - up to the carrier to figure out how to pass all tests
  • KS - think we heard they would rather have a standardized process and need a UI to get it to work - no consensus, thinks Dale
  • JB - could be a process that reads files and provides outputs
  • KS - 10k+ rules are needed, people want to leverage them
  • PA - as someone who used SDMA a lot, there is a lot to be said for how it has allowed users to self-service
  • JB - supply something for all, let carriers add their own rules, but also a standard set of rules all can use
  • JM - depends on what you want in the interface; a UI that allows you to see what's going on is fine vs. a UI for editing data
  • PA - doing it today, have both options; can update rows, but a lot of the time errors occur when the carrier ETL fails to make it correctly
  • KS - not happening quickly enough for the cycle to finish - have to get it out in the week, can't get it fixed at the source but need to be able to hit the endpoint - then the DB got complicated: error log records, etc.
  • JB - separate files from corrected records; if there are tests for sanity check / level of quality, is it up to the carrier to try to resolve?
  • KS - heard that doesn't work; it's what we want to do but can't always do - sometimes the timeframe needs the fix in SDMA, either the data or the process won't change in time
  • JB - a spot in the middle: can't change box 1 in the picture, but you can change downstream - doesn't HAVE to mean HDS, basically a box 2 instead of a UI for fixing it
  • KS - do it after standardization, not on the carrier; up to the openIDL footprint to make changes - IS IT PART of the openIDL footprint to provide fixing for this data? how do we decide?
  • PA - flip to the PATL page, a kind of boiled-down version of SDMA today and what we can see being a robust way of doing it - carrier ingestion portal, do large edits before or standardized edits after, through a package run against any data set; the ingestion portal triggers a job to the HDS - as soon as we put it in the working table, assign a UUID
  • KS - not clear where error editing is happening
  • JM - how do you make changes? edits?
  • KS - have a UI to make changes, shows tables, almost like Excel editing
  • JB - who does that at carrier?
  • PA - Susan or Reggie, for example, at TRV - the business people in charge of loading data; something like this is a feature a lot of companies would want to run - ones like TRV bypass the whole system, smaller companies would want it
  • KS - TRV currently fixing data with SDMA - every now and then we can't get the back end to feed the right data
  • PA - thinks the way we are right now, making it all work with this workflow, TRV isn't making changes to the data lake; they submit an Excel file with why the #s (an adjustment artifact) and why the adjustment should happen - as of today AAIS is not updating records in the data lake, only allowing edits at load, working in the ingestion portal
  • KS - need to decide if that's a requirement for the system or not; can say "if we have it, where would it be?"
  • PA - a whole secondary thing in the HDS, an error table and a correction table in the HDS - not there today
  • JB - baseline data quality check: simply check data for acceptability (errors under a threshold) and pass it along if not exceeded, OR if exceeded the carrier would need to update it, having a means to edit records and process them using tools from openIDL - a baseline data quality check where data meets a certain quality; if not accepted, would the carrier allow it to be edited, or is it back on the carrier?
  • JM - day 1, day 2, day 3 - simplest design: carrier loads the HDS and then done - maybe set a flag "ready to go", and if it fails a test, back up the batch and put it back in; no staging area or UI needed to change anything
  • JB - loading into the DB and backing it out is squirrely; better to have a format to check on the way in (instead of loading garbage)
  • JM - baby step: v1 is flat load and back out (otherwise a staging area to run rules against); v2 is fail and fix; v3 is do whatever it takes (typically have a spot in the flow where you can modify data, OR a formal facility in there - a day 1, 2, 3 question) - mixing delivery with architecture
  • JB - likes v2, simplest way to start, keeps the door open if we think we need a modification interface; quality checks vs. modifications - can't let crappy data load; iterate across - how do fixes re-apply if you reload the stage? in a robust design, put fixes in a fixit body of tables and re-apply the fixits - the fixit set gets big fast
  • PA - we have a non-robust version TODAY of what JM described; once you see an issue, we make the call to reload or...
  • JM - load once, modify with a fixit interface
  • JB - copy, fix it, resubmit - still have separate files
  • JM - problem with fixit: the copy made must have structure, and could be overwritten by a reload
  • PA - haven't ironed out - 4 CSVs in one day for auto (1 month's worth of loading) - are those 4 CSVs one job? multiple docs per job?
  • JB - organized as separate submissions or files
  • KS - some scope of identifier for this package of data; work on the package identifier, don't release until edited/removed
  • PA - pivot slightly: passing 47 of 48 states - is there a facility to load the 47 that passed, or is it all-or-nothing?
  • JM - batch ID mechanism on all tables (a sketch follows these notes); put it into stage, it's a batch - if you chose to say "loaded 48 states as a batch and one is wrong", back them all out, or individually? <SEAN REVIEW TIME 45 min>
  • JB - file in a folder means needs to be processed
  • KS - a huge company like Hartford wouldn't want to equate a file to a batch (multiple batches per file, multiple files per batch)
  • PA - why would a batch have a state/line indication - whether they pass or fail, 4 separate batches
  • JB - rules by state?
  • PA - depends; rules are more national for the most part, but passing means "is this data for this state/line/timeframe valid?"
  • KS - can a batch be orthogonal to reporting? Sent a bunch of data, want to fix it all if it doesn't work, but only found one state screwed up - can't pass the state, the batch is ...
  • PA - if you have your batch transcend states and lines, you're tagging errors to non-errors
  • JB - data quality checks are record by record; if they exceed a %/proportion it's a bad collection - may have stats on where errors came from; not sure about enforcing pre-sorting of data
  • PA - if I have a set of data (NC and SC) and I submit it, should the machine render 2 separate batches?
  • KS - can we do it without calling it a new batch? return errors ("AZ, so these records don't pass") and have the choice to fix, split and fix, etc. - one part of it being messed up doesn't invalidate the batch
  • JB - idea: check each record for format; at some point too many errors is a problem; if collections are from 2 different states then we need better analytics on data quality, an enhancement of the data quality checks - make it as easy as possible for carriers to submit data
  • JM - agree on a place to stage things and rules to be run - day 1 the fix process is on the carrier; bolting on a fix UI won't break the prior architecture - add a data quality state column to the design, the edit package will respond "bad, good, warning" (a sketch follows these notes) - only answering data quality, not making changes
  • KS - does it require a control mechanism to release the data to allow ...
  • JM - happy path: data gets into the staging area, the normal process kicks off the rules, gives answers and returns them to the scheduler, which checks whether the value is past the safety threshold - if it passes, done; if it fails, fire off an email (sketched after these notes)
  • KS - control DB? Job flow?
  • JM - if everything works it just works; if the edit is fast, run the load to HDS, otherwise it's an intervention event - fire an email to a team/group notified there is a problem
  • JM - if edits return "pass" you let the job flow - have to go on the assumption that 99% of the time batches run and work; worried about the fixit approach - the ideal is understanding why things are happening; these batches should just flow
  • KS - pipeline approach
  • JM - fan of a persistent stage; 3 major boxes of ETL integrating with staging; a stat model we can all live with on day 1; believes we will check 10-12 EPs, day 1 should be the stat model
  • JM - keep the adapter as small as it is, explain it
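
Below is a minimal sketch of the edit-package idea from these notes: staged records are run against a repository of rules and tagged with a data-quality state of "good", "warning", or "bad". The record fields, rule functions, and lookup values are illustrative assumptions, not the actual openIDL stat-plan edit rules.

```python
# Minimal sketch of the "edit package": run a repository of rules against
# staged records and tag each one good / warning / bad.
# Field names and rules are illustrative assumptions, not actual stat-plan edits.
from dataclasses import dataclass, field
from typing import Callable, Optional

VALID_STATES = {"NC", "SC", "AZ", "CT"}   # assumed lookup table
VALID_LINES = {"AUTO", "HO"}              # assumed lines of business

@dataclass
class StagedRecord:
    batch_id: str
    state: str
    line: str
    premium: Optional[float]
    quality_state: str = "good"           # good | warning | bad
    messages: list = field(default_factory=list)

# A "rules repository": each rule returns (severity, message) or None.
Rule = Callable[[StagedRecord], Optional[tuple]]

def rule_state_code(rec: StagedRecord):
    if rec.state not in VALID_STATES:
        return ("bad", f"unknown state code {rec.state!r}")

def rule_line_code(rec: StagedRecord):
    if rec.line not in VALID_LINES:
        return ("bad", f"unknown line of business {rec.line!r}")

def rule_premium_present(rec: StagedRecord):
    if rec.premium is None:
        return ("bad", "missing premium amount")
    if rec.premium < 0:
        return ("warning", "negative premium (possible offset transaction)")

RULES_REPOSITORY: list[Rule] = [rule_state_code, rule_line_code, rule_premium_present]

SEVERITY_RANK = {"good": 0, "warning": 1, "bad": 2}

def run_edit_package(records: list) -> list:
    """Apply every rule to every record and keep the worst severity seen."""
    for rec in records:
        for rule in RULES_REPOSITORY:
            result = rule(rec)
            if result:
                severity, message = result
                rec.messages.append(message)
                if SEVERITY_RANK[severity] > SEVERITY_RANK[rec.quality_state]:
                    rec.quality_state = severity
    return records

if __name__ == "__main__":
    batch = [
        StagedRecord("2022-09-19-001", "NC", "AUTO", 125.00),
        StagedRecord("2022-09-19-001", "ZZ", "AUTO", None),
    ]
    for rec in run_edit_package(batch):
        print(rec.batch_id, rec.state, rec.quality_state, rec.messages)
```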
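
And a sketch of the scheduler "happy path" JM described: once a staged batch has been tagged by the edit package, compare the share of bad records against a safety threshold, then either load the batch (carrying the quality-state column) to the HDS or fire off a notification. The threshold value, record shape, and function names are assumptions.

```python
# Sketch of the scheduler happy path: pass/fail a staged batch against a
# safety threshold, then load to HDS or raise an intervention event.
from dataclasses import dataclass

ERROR_THRESHOLD = 0.05   # assumed: at most 5% bad records allowed through

@dataclass
class EditedRecord:
    batch_id: str
    quality_state: str   # "good" | "warning" | "bad" (set by the edit package)

def load_to_hds(records):
    # Placeholder for the real load step (e.g. inserts into the HDS tables).
    print(f"loading {len(records)} records to HDS")

def notify_team(batch_id, bad_ratio):
    # Placeholder for the intervention event (e.g. email/alert to the data team).
    print(f"batch {batch_id}: {bad_ratio:.1%} bad records, intervention required")

def process_batch(batch_id, records):
    """Pass/fail decision for one staged batch."""
    bad = sum(1 for r in records if r.quality_state == "bad")
    bad_ratio = bad / len(records) if records else 0.0
    if bad_ratio <= ERROR_THRESHOLD:
        # Happy path: load the whole batch; records keep their quality flag
        # (the data-quality state column idea from the notes).
        load_to_hds(records)
    else:
        # Threshold exceeded: hold the batch and fire a notification.
        notify_team(batch_id, bad_ratio)

if __name__ == "__main__":
    staged = [EditedRecord("B1", "good"), EditedRecord("B1", "warning"),
              EditedRecord("B1", "bad")]
    process_batch("B1", staged)
```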
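
Finally, a sketch of the batch-ID mechanism: every staged row carries a batch ID, so a failed load can be backed out as a whole batch or only for the state(s) that failed. sqlite3 stands in for the staging store; the table and column names are illustrative assumptions.

```python
# Sketch of backing out a staged batch by batch_id, optionally per state.
import sqlite3

def back_out(conn, batch_id, state=None):
    """Remove a staged batch, optionally restricted to a single failing state."""
    if state is None:
        conn.execute("DELETE FROM staging WHERE batch_id = ?", (batch_id,))
    else:
        conn.execute("DELETE FROM staging WHERE batch_id = ? AND state = ?",
                     (batch_id, state))
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (batch_id TEXT, state TEXT, premium REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                     [("B1", "NC", 100.0), ("B1", "SC", 50.0), ("B1", "AZ", 75.0)])
    back_out(conn, "B1", state="AZ")       # back out only the failing state
    print(conn.execute("SELECT state FROM staging").fetchall())  # NC and SC remain
```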

Tues Sept 20, 2022

  • KS - talked yesterday about 3 options for the member enterprise (see diagram); tried to simplify it down so we don't have "fixes" in the first cut; need to talk about the different pieces and figure out the phasing - the basic contention was no ability to fix at all and it should be on the back end, but have heard there are folks who will NEED some way to fix data just before reporting - while we don't support it in the first pass, find a place to do it; 3 paths show where this can happen: fix after, in the HDS, or fix while in the staging area/pipeline (2), or fix beforehand in a frontloaded process - good idea of what's going on in this box EXCEPT the Adapter
  • KS - this is what we plan to install in the member's enterprise; one of the tenets is we provide data privacy - could be hosted, with privacy maintained through the hosted node having proper controls, security well defined and agreed to by carriers, and carriers not moving data out of their world - but when hosted it is obvious the data is outside their walls - if we could do hosted it would be simple: stand up the whole stack inside a node, and partners (Senofi, Chainyard, Capgemini) could host nodes and maintain them - feedback from ND: some carriers really want to host the data; what we are striving for with this blue box is a tech stack amenable to as many people as possible - TRV, Hartford, State Farm, etc. - want them to be able to stand up this stack in their world - real experience with a carrier who ran up against roadblocks in the tech stack, policies, etc., BIG disagreements - a more complicated tech stack (4 major sets of tech); the lighter the stack to install inside the carrier's world, the more likely it is to be accepted, with as little variation as possible - the blue box is not intended to be perfect; it has to take into account factors like company policies, etc. - try to minimize and maintain the integrity of transactions
  • KS - moving forward, call back to what's been done before; always open to best ways to document architectures
  • KS - phase: get into each box, figure out rough block diagrams of what's going on, starting at the adapter
    • Blue - member enterprise; green box - AAIS setup; red - hosted node - still believe a hosted aspect to this makes sense: complicated tech, young tech, challenging to set up and configure; with a targeted set of people with the skills controlling it we will be better off - some day it matures (1-button install); Fabric and Kubernetes and AWS have a certain complexity we want to encapsulate
    • No issues running node at AAIS at this point, will need to interact with analytics node (currently at AAIS, doesn't have to be), any carrier nodes need to interact with carrier enterprise stack where data lives
    • Adapter will run extraction, defined in a way it can execute in member's world 
    • another tenet not challenged: the RAW data is in the HDS, the input to the extraction engine; where the member is agreeable, the processed data can leave their world, go into analytics, and be turned into reports
  • JB - improvements on the horizon for Fabric/Kubernetes (Fabric Operator), move in that direction; carriers are using AWS, a lot of it using TCP/IP to make connections to resources; one set of permissions internally, or access for the adapter or apps to talk to Fabric - there will be interactions that require auth for use, ways to use it - the thing that runs the adapter and interprets the request to get data would be a binding approach: a request comes in, not necessarily executable code but standards for implementation in the HDS; the concept of the adapter is a good concept, there needs to be something that happens when a request is received; hope there will be things that simplify the config
  • KS - moving target; as much as possible, encapsulate that movement so other stuff doesn't break - a buffer of implementation, and all of that is a trade-off; what's in the adapter?
  • JB - a lot depends on what the DB of the HDS is and what it will take; that will determine HOW you get data from it
  • KS - walk through a couple of different DB types, see how they might make things harder or easier, weigh options, pick short/long-term options
  • KS - we have HDS
    • noSQL (like Mongo, which 1 carrier said no to)
    • Relational DB
    • format of the data, allow scripting or programming
  • JB - common denominator: what is supportable in the carrier's domain; some form of standard SQL - ought to support something common
  • KS - current assumptions
    • Adapter
      • JB - interact with requests
        • carriers need to see and consent to requests, number of API calls
        • if hosted node wraps interface thru network, api call?
        • ad hoc data call - do they know they have that data? (stat plans they know, repeatable), but ad hoc, "what is the nature of the request?" - some knowledge of what's in the HDS? a subsequent phase?
      • KS - function is to execute an extraction against HDS and return results (to analytics node? etc.) 
        • if adapter makes as few assumptions as possible - needs to know format of db, cannot be just an api call
        • management among carriers will be handled via Fabric and hosted node, all the adapter does is execute the extraction upon request
        • carriers interacting with network thru hosted node, minimize dependencies, all the adapter needs to know "asked to extract data from here to there"
        • able to test the extraction pattern to see what it would result in 
        • extraction should be human readable/understandable
        • some test harness "I have a new EP I want to run, test it"
        • needs to execute when data call happens but also eval before consent (can we do this?), flow says "I will consent but what does it do?"
    • db could be nosql or relational
      • sql? nosql? allow scripting?
    • extraction must be stored on-ledger
      • meta data, architecture of the request
        • sql? elastic search form?
    • extraction options
      • pure sql
      • scripting
      • hll
      • dsl
      • graphql
    • the data model (semantically) starts from the notion that it is the stat plans (transactionally based)
      • PA - premium transaction and loss transaction
    • JB - currently stat plans, may evolve over time, starting point
    • PA - what are we adapting to?
  • KS - Extraction Processor
    • receive requests
    • interpret / translate extraction 
    • execute transaction
    • gather results
    • return results
  • KS - big concern about security in this model
    • not sure if carriers have pushed back; passing around code is generally the thing security folks say is a no-no
    • don't pass code that will run on someone else's machine
    • JB - push or pull model; automatic = exposure, but if it is a request (pull), the API pulls and evaluates it - not executing code, it is interpreting it; a more secure approach is polling for things to do and launching via human control
    • KS - where the ledger provides some comfort: the immutable ledger is good - having seen the code coming across, good with executing that; everyone has the option to consent; some sort of technical way to test the extract process
    • JM - agree with the concern; introduce a construct to reject queries
      • if we had a table that said "if these keywords appear we will reject the query" - tricky to engineer, but something that can screen filters, a config table; the system will reject it even if a person approves it - will submit a request to improve security (a sketch of such a scanner appears at the end of these notes)
    • KS - might not need it but good to have; would run the initial request through the same scanner
    • JB - something the validation of the request would do
    • JM - humans make mistakes, double up on safety; hard to sell the idea of arbitrary code execution
    • JB - late binding; execute as a way to test against what is in the data store
    • KS - scanner/validator, called as part of the extraction processor; will happen via the creator of the EP: upload it, the system will run it, a standard set of checks
    • JB - good add
  • KS - elephant in the room - still the DB decision; the data format really tells us a lot of what the EP is capable of; depending on requirements some may not be up to the task; take a phased approach, not starting with the simple problem
  • PA - processing with loops and stuff, stored procedures
  • JB - stages or phases with Mongo; don't have to do it all in one SQL statement
  • KS - Mongo stages of the aggregation pipeline? Mongo is proprietary; steer away from proprietary implementations (like map-reduce)
  • JB - a reason not to communicate the request in executable terms: implement it in the data store; one thing not in the current transaction data of stat reporting - we don't have the actual policy record itself; a relational representation in the future would support it
  • KS - we do have a policy identifier in the data store; it could be a business key to a logical policy record
  • JB - don't have attributes of the policy in terms or coverage
  • KS - imperfect and difficult to put together; not all reports are policy-based anyway - an aggregation of coverage across all policies
  • JB - talked about extracting what fields are available in the stat record
  • KS - assume a relational DB and our EP is a pipeline of SQLs, one after the next - no extra definition, just statements (1, 2, 3), and the EP executes the pipeline of SQLs (sketched at the end of these notes)
  • JB - it's all in the specification of how complex a request is
  • PA - likes
  • KS - (1) not making up a language, not depending on someone looking at a language pre-processor and understanding what it is doing; it is easy to execute, just clear text
  • JM -
    • (1) can we create views as part of execution?
    • fundamentally, is our problem a document-store problem OR an entity-relationship problem?
    • Mongo is for doc stores; this feels like a relational problem - insurance entities are well defined and consistent
    • a doc store is not the nature of what we are doing
    • dealing with well-defined entities that relate to each other constantly; at 100k feet, a relational problem
  • KS - ready to commit to relational; it fully supports what we need to do, aligns well with what we are after, transactional records
  • JM - supportability - feedback from the Friday call: the whole area is comfortable with SQL; stepping away means a scarcer skillset
  • KS - a document DB supports a simple model, we would outstrip it quickly; the simple way is 2 tools - a relational DB and SQL
  • JM - can we define arbitrary views
  • JB - views used in staging: define and access, don't have to run the first stage of the pipeline
  • KS - can we create views or use multi stage?
  • JM - WITH clause in SQL, how complex do you want to get?
  • KS - where the work is: the battle between devs and DB people - if I have to call you every time I want a new field added or taken off, or if we have to deploy a new DB model every time we want a new view?
  • JM - schema evolution question, need to keep schema updated, 
  • KS - issue with old queries working 
  • JM - unless you validate the JSON structure AND the schema, there is no magic to schema management
  • JB - might not change that often, 1-2 times a year at most
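
Below is a minimal sketch of the extraction-processor idea discussed above, assuming a relational HDS and an EP that is simply an ordered pipeline of clear-text SQL statements executed one after the next. sqlite3 stands in for the real HDS, and the table/column names and statements are illustrative assumptions.

```python
# Sketch of an extraction package as an ordered pipeline of plain SQL text,
# executed one statement after the next against a relational HDS stand-in.
import sqlite3

# An extraction package as it might be stored on-ledger: just numbered SQL text.
EXTRACTION_PIPELINE = [
    # 1. Stage: restrict to the requested state/line (a temp view, per the
    #    "views as part of execution" question in the notes).
    """CREATE TEMP VIEW scoped AS
       SELECT state, line, premium_amount
       FROM premium_transactions
       WHERE state = 'NC' AND line = 'AUTO'""",
    # 2. Report: aggregate the scoped view.
    """SELECT state, line, SUM(premium_amount) AS written_premium
       FROM scoped
       GROUP BY state, line""",
]

def execute_extraction(conn, pipeline):
    """Run each statement in order; return the rows of the final statement."""
    cur = conn.cursor()
    rows = []
    for statement in pipeline:
        cur.execute(statement)
        rows = cur.fetchall()   # only the last statement's result is returned
    return rows

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE premium_transactions (state TEXT, line TEXT, premium_amount REAL)")
    conn.executemany("INSERT INTO premium_transactions VALUES (?, ?, ?)",
                     [("NC", "AUTO", 100.0), ("NC", "AUTO", 50.0), ("SC", "AUTO", 75.0)])
    print(execute_extraction(conn, EXTRACTION_PIPELINE))
```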
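
And a sketch of the scanner/validator construct JM suggested: screen each statement of an extraction package against a configurable table of rejected keywords before it is accepted (and again before execution). The keyword list and function names are illustrative assumptions.

```python
# Sketch of a keyword-based scanner/validator for extraction requests.
import re

# Config table of keywords that cause an extraction request to be rejected.
REJECTED_KEYWORDS = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "ATTACH", "PRAGMA"}

def scan_statement(sql: str) -> list:
    """Return the rejected keywords found in one SQL statement."""
    words = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    return sorted(words & REJECTED_KEYWORDS)

def validate_extraction(pipeline) -> list:
    """Collect violations for a whole extraction pipeline; an empty list means pass."""
    violations = []
    for index, statement in enumerate(pipeline, start=1):
        for keyword in scan_statement(statement):
            violations.append(f"statement {index}: contains rejected keyword {keyword}")
    return violations

if __name__ == "__main__":
    ok = ["SELECT state, SUM(premium_amount) FROM premium_transactions GROUP BY state"]
    bad = ["DELETE FROM premium_transactions"]
    print(validate_extraction(ok))    # [] -> pass
    print(validate_extraction(bad))   # rejection messages
```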
