
Discussion notes:

AWG Master diagram:

https://lucid.app/lucidchart/ac20d4e1-50ad-4367-b5cf-247ed9bad667/edit?viewport_loc=-301%2C-37%2C3200%2C1833%2CkgLSxXlpoGmM&invitationId=inv_de1a5e61-8edc-488a-90ee-8312f8c69cd4#

  • KS -
    • Work from this and other diagrams
    • Draw boxes as opposed to blank slate
  • PA - start broad, draw stuff until we have it
  • KS - can check off major functions, components in swim lanes; need to ingest data, load it, handle data calls, extract, report - high level - can start from the left and spitball the flow of data as it comes through the system, get high-level boxes in there, or itemize what's in the boxes as we go - if doing it alone Ken would do all the above - start from something that may or may not be close
  • KS - first step: getting data into the HDS through some means; many multiple DBs or systems are the source of this info, which needs to be normalized - ETL is going to normalize data from different sources into an openIDL format (HDS - first thing to happen) - another thing that is going to happen: we are going to edit it (syntax, data errors in the ETL), then convert to HDS format and load it into the HDS (all happens in the Member Enterprise on the LEFT of the diagram)
  • JM - first point of concern, to do the edits, need to be in standardized format (for predictability)
  • KS - if standardizing edits, the data input must be standardized at this point; putting data into a standardized format - the edit package will check it for validity against a number of rules; we know we have a rules engine in the ETL (and the rules repository lives in the ETL)
  • JB - one step puts data into a separate format and a second step checks it
  • KS - ETL standardizes data, then the ETL Edit engine (Rules Engine, Rules Repo) - 2 major steps going on inside here: ETL and Edit, then a converter to map it to the format for HDS
  • KS - standardized, edited (rules engine and repo), then mapped to HDS format - coming out of here: only the valid records
  • JB - warnings and exceptions based on issues
  • JM also intermediary data set, want to land data into some structure visible to run edit rules, some type of repo, biased towards persistence
  • JB - batch or message format
  • JM - stuff on fly needs troubleshooting, persisting diff issue (maintenance)
  • JB - just some kind of file format with w/ schema, well-defined structure
  • KS - after the edit, thought we were keeping all the records in the HDS, still true? Flagging those with errors? Or just...
  • JB - 2 types: data errors - sanity check?
  • KS - ui for controlling edits and release of this stuff
  • JB - going to be issues with data, but if errors are not above a certain threshold... path; data quality checks - discussed that everything submitted goes to the HDS b/c it comes from the source
  • KS - SDMA function: "this has x errors but in a few months will release all data"
  • JB - submissions - daily or weekly
  • JM - then don't need a UI - up to the carrier to figure out how to pass all tests
  • KS - think we heard they'd rather have a standardized process, need a UI to get it to work - no consensus, thinks Dale
  • JB could be process that reads files and provides outputs
  • KS - 10k+ rules that are needed, people want to leverage them
  • PA - as someone who used SDMA a lot, a lot to be said for how that has allowed users to self-service
  • JB - supply something for all, let carriers add their own rules, but also a standard set of rules all can use
  • JM - depends on what you want in the interface; a UI that allows you to see what's going on is fine vs a UI for editing data
  • PA - doing it today, have both options, can update rows, but a lot of the time errors are when the carrier ETL fails to make it correctly
  • KS - not happening quickly enough for the cycle to finish - have to get it out in the week, can't get it fixed at the source but need to be able to hit an endpoint - then the DB got complicated: error log records, etc.
  • JB - separate files from corrected records; if there are tests for sanity check / level of quality, is it up to the carrier to try to resolve?
  • KS - heard that doesn't work; it's what we want to do but can't always do - sometimes the timeframe requires a fix in SDMA; either the data or the process won't change in time
  • JB - spot in the middle: can't change box 1 in the pic, but you can change downstream; doesn't HAVE to mean HDS - basically a box 2 instead of a UI for fixing it
  • KS - do it after standardization, not on the carrier; up to the openIDL footprint to make changes - IS IT PART of the openIDL footprint to provide fixing for this data? how do we decide?
  • PA - flip to the PATL page, kind of a boiled-down version of SDMA today, what we can see being a robust way of doing it - Carrier ingestion portal, do large edits before or standardized edits after, through a package run against any data set; the ingestion portal triggers a job to HDS - as soon as we put it in a working table, assign a UUID
  • KS - not clear where error editing is happening
  • JM - how do you make changes? edits?
  • KS - have UI to make changes, shows tables, almost like excel editing
  • JB - who does that at carrier?
  • PA - Susan or Reggie for example at TRV - the business people in charge of loading data; something like this is a feature a lot of companies would want to run; the likes of TRV might bypass the whole system, smaller companies would want it
  • KS - TRV currently fixing data with SDMA - every now and then we can't get the back end to feed the right data
  • PA - thinks the way we are right now, making it all work with this workflow, TRV isn't making changes to the data lake; submitting an excel with why the #s should be adjusted (adjustment artifact) - as of today AAIS is not updating records in the data lake, allowing edits at load, working in the ingestion portal
  • KS - need to decide if thats a req for the system or not, can say "if we have it, where would it be?"
  • PA - whole secondary thing in HDS: an error table and a correction table in HDS, not there today
  • JB - baseline data quality check: simply check data for acceptability (errors under threshold) and pass it along if not exceeded, OR if exceeded the carrier would need to update; have a means to edit records and process using tools from openIDL - baseline data quality check where data meets a certain quality; if not accepted, would the carrier be allowed to edit it, or is it back on the carrier?
  • JM - day 1, day 2, day 3 - simplest design: carrier loads the HDS and then done - maybe set a flag "ready to go" and if it fails tests, back out the batch and put it back in; no staging area or UI needed to change anything
  • JB - loading into the DB and backing it out is squirrely; better to have a format to check on the way in (instead of loading garbage)
  • JM - baby step: v1 is flat load and back out (otherwise a staging area to run rules against); v2 is fail and fix; v3 is do whatever it takes (typically have a spot in the flow where you can modify data, OR a formal facility in there - a day 1, 2, 3 question) - mixing delivery with Arch
  • JB - likes v2, simplest way to start, keeps the door open; if we add a modification interface: quality checks vs modifications, can't let crappy data load, iterate across - how do fixes re-apply if you reload the stage? fixes in a robust design: put them in a fix-it body of tables, re-apply fix-its - fix-it gets big fast
  • PA - a non-robust version TODAY of what JM described; once you see the issue, we make the right call to reload or...
  • JM - load 1x, modify with a fix-it interface
  • JB - copy, fix it, resubmit - still have separate files
  • JM - problem with fix-it: the copy made must have structure, could be overwritten with a reload
  • PA - haven't ironed out - 4 CSVs in one day for auto (1 month worth of loading) - are those 4 CSVs one job? multiple docs per job?
  • JB - organized as separate submissions or files
  • KS - some scope of identifier for this package of data; work on a package identifier, don't release until edited/removed
  • PA - pivot slightly: passing 47 of 48 states - is there a facility to load the 47 that passed, or is it all-or-nothing?
  • JM - batch ID mechanism on all tables; put it into stage, it's a batch - if you chose to say "loaded 48 states as a batch" and one is wrong, do you back them all out, or individually? <SEAN REVIEW TIME 45min>
  • JB - file in a folder means needs to be processed
  • KS - a huge company like Hartford wouldn't want to equate a file to a batch (multiple batches per file, multiple files per batch)
  • PA - why would a batch have a state/line indication - whether pass or fail, 4 separate batches
  • JB - rules by state?
  • PA - depends; rules are national for the most part, but passing "is this data for this state/line/timeframe valid?"
  • KS - can a batch be orthogonal to reporting? Sent a bunch of data, want to fix it all if it doesn't work, only found one state screwed up - can't pass the state, batch is...
  • PA - if you have your batch transcend states and lines, tagging errors to non-errors
  • JB - data quality checks record by record; if they exceed a %/proportion, bad collection; may have stats on where errors came from; not sure we should enforce pre-sorting of data
  • PA - if I have a set of data (NC and SC) and I submit it, should the machine render 2 separate batches?
  • KS - can we do it without calling it a new batch? return errors ("AZ, so these records don't pass") and have the choice to fix, split and fix, etc. - doesn't invalidate the batch to have one part of it messed up
  • JB - idea: check records for format; at some point too many errors is a problem; if collections are from 2 different states then we need better analytics on data quality, an enhancement of the data quality checks - make it as easy as possible for carriers to submit data
  • JM - agree on a place to stage things and rules to be run - day 1 the fix process is on the carrier; bolting on a fix UI won't break the prior architecture - add a data quality state column to the design; the edit package will respond "bad, good, warning" - only answering data quality, not changes
  • KS - does it require a control mechanism to release data to allow...
  • JM - happy path - get data into the staging area, the normal process kicks off the rules, gives answers and returns them to the scheduler; checks the value against the safety threshold: if it passes, done; if it fails, fire off an email
  • KS - control DB? Job flow?
  • JM - if everything works it just works; if the edit passes, run the load to HDS, otherwise an intervention event - fire an email so the team/group is notified there is a problem
  • JM - if edits return "pass" you let the job flow - have to go on the assumption that 99% of the time batches run and work; worried about the fix-it approach - ideal is understanding why things are happening; these batches should just flow
  • KS - pipeline approach
  • JM - fan of a persistent stage; 3 major boxes of ETL integrating with staging; a stat model we can all live with on day 1; believes we will check 10-12 EPs; day 1 should be the stat model
  • JM - keep the adapter as small as it is, explain it
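The day-1 "happy path" above (land a batch in a staging area, run the edit-package rules, promote to the HDS only if quality is acceptable, otherwise back the whole batch out and notify) can be sketched roughly as follows. This is a hypothetical illustration only: the names (`run_edit_package`, `process_batch`), the 5% threshold, and the rule interface are assumptions, not agreed openIDL design.

```python
# Sketch of the discussed staging flow: stage a batch, run edit rules per
# record, promote only if the error rate stays under an agreed threshold.
# All names and the threshold value are illustrative assumptions.

ERROR_THRESHOLD = 0.05  # assumed quality bar: reject batch if >5% of records fail


def run_edit_package(records, rules):
    """Tag each record 'good', 'warning', or 'bad' per the edit rules."""
    results = []
    for record in records:
        statuses = [rule(record) for rule in rules]
        if "bad" in statuses:
            results.append(("bad", record))
        elif "warning" in statuses:
            results.append(("warning", record))
        else:
            results.append(("good", record))
    return results


def process_batch(batch_id, records, rules, notify):
    """Return (True, clean records) if the batch passes; otherwise back the
    whole batch out and fire a notification (no fix-it UI in v1)."""
    results = run_edit_package(records, rules)
    bad = [r for status, r in results if status == "bad"]
    error_rate = len(bad) / len(records) if records else 0.0
    if error_rate > ERROR_THRESHOLD:
        notify(f"batch {batch_id}: {len(bad)} records failed edits; backing out")
        return False, []
    return True, [r for status, r in results if status != "bad"]
```

The batch ID threads through every table touched, matching JM's point that a batch ID mechanism is what makes backing a whole submission out feasible.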

Tues Sept 20, 2022

  • KS - talked yesterday about 3 options for member enterprise (see diagram); tried to simplify it down so we don't have "fixes" in the first cut; need to talk about the different pieces and figure out the phasing; the basic contention was no ability to fix at all and it should be on the back end, but have heard there are folks who will NEED some way to fix data just before reporting - while we don't support it in the first pass, find a place to do it; 3 paths show where this can happen: (1) fix after, in the HDS, (2) fix while in the staging area/pipeline, or (3) fix beforehand in a frontloaded process - good idea of what's going on in this box EXCEPT the Adapter
  • KS - this is what we plan to install in the member's enterprise; one of the tenets is we provide data privacy - could be hosted, with privacy maintained through the hosted node having proper controls; security is well defined and agreed to by carriers, and carriers are not moving data out of their world - but when hosted it is obvious data is outside their walls - if we could do hosted it would be simple: stand up the whole stack inside the node, and partners (Senofi, Chainyard, CapGemini) could host nodes and maintain them - feedback from ND: some carriers really want to host the data; what we are striving for with this blue box is a tech stack amenable to as many people as possible - TRV, Hartford, State Farm, etc. - want them to be able to stand up this stack in their world - real experience with a carrier who ran up against road blocks in tech stack, policies, etc., BIG disagreements - a more complicated tech stack (4 major sets of tech): the lighter the stack to install inside the carrier's world, the more likely it is to be accepted, with as little variation as possible - the blue box is not intended to be perfect; it has to take into account factors like company policies, etc. - try to minimize and maintain integrity of transactions
  • KS - moving forward, call back to whats been done before, always open to best ways to document architectures
  • KS - phase - get into each box, figure out rough block diagrams of whats going on, at the adapter
    • Blue box - member enterprise; green box - AAIS setup; red - hosted node; still believe a hosted aspect to this makes sense - complicated tech, young tech, challenging to set up and configure; with a targeted set of people with the skills controlling it we will be better off - some day it matures (1-button install); Fabric, Kubernetes, and AWS have a certain complexity we want to encapsulate
    • No issues running a node at AAIS at this point; will need to interact with the analytics node (currently at AAIS, doesn't have to be); any carrier nodes need to interact with the carrier enterprise stack where the data lives
    • Adapter will run extraction, defined in a way that it can execute in the member's world
    • another tenet not challenged: the RAW data is in the HDS, the input to the extraction engine; where the member is agreeable, the processed data can leave their world, go into analytics, and be turned into reports
  • JB - improvements on the horizon for Fabric/Kubernetes (Fabric Operator), move in that direction; carriers are using AWS, a lot of it using TCP/IP to make connections to resources; one set of permissions internally, or access for the adapter or apps to talk to Fabric; there will be interactions that require auth for use, ways to use it - the thing that runs the adapter and interprets the request to get data would be the binding approach: a request comes in, not necessarily executable code but standards for implementation in the HDS; the concept of the adapter is a good concept - something needs to happen when a request is received; hope there will be things that simplify the config
  • KS - moving target, as much as possible, encapsulate that movement, other stuff doesn't break, a buffer of implementation and all that is trade-off, whats in the adapter
  • JB - a lot depends on what the DB of the HDS is and what it will take; that will determine HOW you get data from it
  • KS - walk thru couple of different db types, see how they might make things harder or easier, weigh options, pick short/long term opps
  • KS - we have HDS
    • noSQL (like Mongo, which 1 carrier said no to)
    • Relational DB
    • format of the data, allow scripting or programming
  • JB - common denominator: what is supportable in the carrier's domain; some form of standard SQL - ought to support something common
  • KS - current assumptions
    • Adapter
      • JB - interact with requests
        • carriers need to see and consent to requests, number of API calls
        • if hosted node wraps interface thru network, api call?
        • ad hoc data call - do they know they have that data? (stat plans they know, repeatable), but ad hoc: "what is the nature of the request?" - some knowledge of what's in the HDS? a subsequent phase?
      • KS - function is to execute an extraction against the HDS and return the results (to the analytics node, etc.)
        • if adapter makes as few assumptions as possible - needs to know format of db, cannot be just an api call
        • management among carriers will be handled via Fabric and hosted node, all the adapter does is execute the extraction upon request
        • carriers interacting with network thru hosted node, minimize dependencies, all the adapter needs to know "asked to extract data from here to there"
        • able to test the extraction pattern to see what it would result in 
        • extraction should be human readable/understandable
        • some test harness "I have a new EP I want to run, test it"
        • needs to execute when a data call happens but also to be evaluated before consent (can we do this?); flow says "I will consent, but what does it do?"
    • db could be nosql or relational
      • sql? nosql? allow scripting?
    • extraction must be stored on-ledger
      • meta data, architecture of the request
        • sql? elastic search form?
    • extraction options
      • pure sql
      • scripting
      • hll
      • dsl
      • graphql
    • data model (semantically) starts from the notion that it is the stat plans (transactionally based)
      • PA - premium transaction and loss transaction
    • JB - currently stat plans, may evolve over time, starting point
    • PA - what are we adapting to?
  • KS - Extraction Processor
    • receive requests
    • interpret / translate extraction 
    • execute transaction
    • gather results
    • return results
  • KS - big concern about security in this model
    • not sure if carriers have pushed back; passing around code is generally the thing security folks say is a no-no
    • don't pass code that will run on someone else's machine
    • JB - push or pull model; automatic = exposure, but if it is a request (pull), the API pulls and evaluates it - not executing code, it is interpreting it; a more secure approach is polling for things to do and launching via human control
    • KS - where the ledger provides some comfort: the immutable ledger is good; having seen the code coming across, good with executing that; everyone has the option to consent; some sort of technical way to test the Extract process
    • JM - agree with the concern; introduce a construct to reject queries
      • if we had a table that said "if these keywords appear we will reject the query" - tricky to engineer but something that can screen filters, a config table; the system will reject it even if a person approves; will submit a request to improve security
    • KS - might not need it but good to have; would run the initial request through the same scanner
    • JB - something the validation of the request would do
    • JM - humans make mistakes, double up on safety; hard to sell the idea of arbitrary code execution
    • JB - late binding; execute as a way to test against what is in the data store
    • KS - scanner/validator, called as part of the extraction processor; will happen via the creator of the EP: upload it, the system will run it, a standard set of checks
    • JB - good add
  • KS - elephant in the room - still the DB decision; the data format really tells us a lot about what the EP is capable of; depending on the reqs some may not be up to the task; take a phased approach, not starting with the simple problem
  • PA - processing with loops and stuff, stored procedures
  • JB - stages or phases with Mongo; don't have to do it all in one SQL statement
  • KS - Mongo stages of the agg pipeline? Mongo is proprietary; steer away from proprietary implementations (like map-reduce)
  • JB - reason to not communicate the request in executable terms: implement it in the data store; one thing not in the current transaction data of stat reporting - we don't have the actual policy record itself; a relational rep in the future would support it
  • KS - we do have a policy identifier in the datastore; could be a business key to a logical policy record
  • JB - don't have attributes of the policy in terms of coverage
  • KS - imperfect and difficult to put together; not all reports are policy-based anyway - agg of coverage across all policies
  • JB - talked about extracting what fields are available in the stat record
  • KS - assume a relational DB and our EP is a pipeline of SQLs, one after the next - a numbered definition of statements (1, 2, 3) and the EP executes the pipeline of SQLs
  • JB - all specification of how complex a request
  • PA - likes
  • KS - (1) not making up a language, not depending on someone looking at the language, pre-processing it, and understanding what it is doing; it is easy to execute, just clear text
  • JM -
    • (1) can we create views as part of execution?
    • fundamentally, is our problem a document store problem OR an entity relationship problem?
    • Mongo is for doc stores; this feels like a relational problem - insurance entities are well defined and consistent
    • a doc store is not the nature of what we are doing
    • dealing with well-defined entities that relate to each other constantly; at 100k feet a relational problem
  • KS - ready to commit to relational, fully supports what we need to do, aligns well with what we are after, transactional records
  • JM - supportability - feedback from the Friday call: a whole area comfortable with SQL; a step away is a more scarce skillset
  • KS - a document DB supports the simple model; we'd outstrip it quickly, but it's a simple way (2 tools - relational DB and SQL)
  • JM - can we define arbitrary views?
  • JB - views used in staging; define and access, don't have to run the first stage of the pipeline
  • KS - can we create views or use multi-stage?
  • JM - the WITH clause in SQL; how complex do you want to get?
  • KS - where the work is; the battle b/w devs and DB people: if I have to call you every time I want a new field, take off; what if we have to deploy a new DB model every time we want a new view?
  • JM - schema evolution question; need to keep the schema updated
  • KS - issue with old queries working
  • JM - unless you validate the JSON structure AND the schema, no magic to schema management
  • JB - might not change that often, 1-2 times a year at most
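One way to picture the "pipeline of SQLs" EP plus the scanner/validator discussed above, assuming a relational HDS. Everything here is a sketch: the keyword blocklist, the function names, and the use of `sqlite3` as a stand-in database are illustrative assumptions, not an agreed design.

```python
import sqlite3

# Sketch of the Extraction Processor idea: an extraction pattern arrives as
# a plain-text pipeline of SQL statements, a scanner/validator rejects it if
# any blocked keyword appears, and the statements run in order against the
# (relational) HDS. The final statement's result set is returned.

BLOCKED_KEYWORDS = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "ATTACH"}


def scan_extraction(statements):
    """Reject the whole pipeline if any statement contains a blocked keyword."""
    for stmt in statements:
        tokens = {tok.strip("();,").upper() for tok in stmt.split()}
        if tokens & BLOCKED_KEYWORDS:
            raise ValueError(f"extraction rejected, blocked keyword in: {stmt!r}")


def execute_extraction(conn, statements):
    """Scan first, then run the pipeline of SQL statements in order."""
    scan_extraction(statements)  # nothing executes if the scan fails
    cursor = conn.cursor()
    rows = []
    for stmt in statements:
        rows = cursor.execute(stmt).fetchall()
    return rows
```

Because the scan runs before anything executes, a consenting carrier (or the uploader's test harness) can evaluate an EP without side effects, matching the "evaluate before consent" point above.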
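On the views question above, JM's WITH-clause point can be shown with a tiny example: a multi-step extraction expressed as common table expressions in a single statement, so no new DB view has to be deployed for each report. Table and column names are made up for illustration.

```python
import sqlite3

# Illustrative only: a two-stage extraction written as one SQL statement
# using a WITH clause (CTE), avoiding a schema change per new report.

QUERY = """
WITH nc_premium AS (
    SELECT policy_id, amount FROM premium WHERE state = 'NC'
)
SELECT COUNT(*), SUM(amount) FROM nc_premium
"""


def run_report(conn):
    """Run the CTE-based report and return (record_count, total_premium)."""
    return conn.execute(QUERY).fetchone()
```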

Monday 9/26/2022

  • Member's Enterprise Options 1-3 (review of drawing)
  • JB - push or pull? 
  • KS - pull is in the openIDL as a Service (orange box)
  • JB - polling for requests?
  • KS - Adapter Tab in the drawing
    • API external facing (listening for requests), gets requests w/ extract logic (a doc passed in); extraction by the Extract Processor, into the DB we choose; the EP is testable - before someone consents, evaluate whether requests are valid and what is returned; some kind of Scanner/Validator
    • Area - execution of code (arbitrary?); well-known pattern that you're not supposed to send code across and run it - make sure it will be ok; something needs to happen to that code before it can run
    • right now - map-reduce (moving away from it), because it's proprietary to Mongo and limited; if relational, a pipeline of queries, graphQL - get to a point of assumption
  • KS Member Node
    • complete processing of particular node
    • KS - Responsible for:
      • managing the network (Fabric) for a carrier
      • all of the interaction with the network
      • running chaincode
      • managing the ledger
    • DR - the thing that takes extracted data and joins it with other data - a function, according to the functional reqs
    • JB - other data: some might be available on the carrier side but not shared; could have universal data, but carrier info is granular and rolled up before sharing
    • DR - defined by EP
    • KS - extraction can request external data
    • DR - whatever comes back comes back
    • JB - ancillary data, in the node or the carrier; sensitivity of the connection - if you have data related to customer accounts you didn't want to share (addresses, etc.), it gets rolled up into info that can be shared (is it in a flood zone?)
    • KS - part that makes it possible
    • DR - pink node: takes some data from the carrier, getting it to the point where it can be aggregated and anonymized - most likely already aggregated at this point; takes the data, does the intended JOINs, to the point where it is combined with other carriers' data
    • JB - thought pink was one of the nodes sending data privately to the agg node
    • DR - exactly - the bridge, takes data from carrier and sends to other place
    • KS - analytics node is the "other place"
    • DR - go between 
    • JB - the place where you manage requests, see them in a UI
    • DR - yes, but that's housekeeping to the biz function - this gets it there in a permissioned way
    • JB - where is config (setting up channels, and so forth) contained? in the hosted node, AAS
    • DR - implementation specific, not business function
    • KS - configs the path to the network
    • DR - could be stateless too in theory - not a network
    • JB - not just data extract; whether it comes over an API or you log into the hosted node remains to be seen
    • KS - if we target the level of "gets data, moves it on"; need to decide at some point whether the first pass mentions Fabric or stays above it
    • JB - fabric for now, point is theres a control flow and data flow
    • JB - control interactions and data interactions
    • KS - the node knows when to request results; decides when / initiates the request for the carrier's Extract results
    • JB - after the request, consent - "Can you do this?" precedes (what I call control flow)
    • KS - something else - this part of the arch is completely data agnostic; it is control: it understands the workflow, does not care about what data is moved around, facilitates the data - a serialized set of data, pass it along
    • JB - data format agnostic because the data is serialized; won't care b/c it will be an opaque object
    • KS - would we want data encrypted as it moves through?
    • JB - could be, doesn't have to be; the need is another policy decision - if you feel the connection is secure, won't be looked at by some router, then no need, but it could be
    • KS - could be per data call decision
    • JB - if encrypted need key management of data
    • KS - similar discussion with carrier from ND
    • JB - fine to do it; information as well as comms security; establish end-to-end private and public keys
    • KS - what else? high-level functional responsibilities
    • JB - a UI in that box would look at world state, see what requests are out there; do it, not necessarily automated - manage requests that come in (human); some type of login to the hosted node to access the UI - should the UI exist w/in the carrier? no, keep it in the hosted node to use the UI
    • KS - UI hosted here in AAS
    • JB - need to log in, need access controls
    • KS - only allowing 1 org here?
    • JB - make the most of the privacy aspects of this; hosted node associated with the carrier
    • KS - as far as Fabric goes, one organization
    • JB - still associated w/ the carrier; modular function; security and privacy of the channel
    • KS - private only to the carrier; permissions could be granted to other orgs if the carrier decided; access to the node is controlled by the carrier - hosted node not plugged into cloud
    • JB - IP addresses and ports - might be worth considering; if the carrier had people signing on and logging in, "who" could be known to openIDL so we can monitor hosted nodes (access, not content) - some monitoring of logging in - simplify access capabilities, use them as proxies for permissions to do work in the UI; log in and have creds in the UI as well
    • KS - 2 layers of permission: manage node and UI
    • JB - different individuals will use the UI; access to the hosted node uses identities for access to the UI inside the node - no need for multiple IDs and creds - to be investigated
    • KS - the current tech stack we could start from: all the tech running would be in this node right now; lay those out as a starting point
    • JB - great, preaching to the choir for use in the testnet
    • KS: Tech Stack (current)
      • Fabric
      • Kubernetes
      • Node JS / angular
      • AWS
        • multiple services
    • KS - it being a hosted solution, how much opinion will a carrier want to have about that stack? how would you make your opinions known / how should we govern that stack? data goes through it - do you want to know all? need to know all? any idea HOW it should be governed?
    • PA - Carrier X wants to know all
    • JB - should be common across all nodes
    • KS - what led us to this separation of concerns: does this tech stack have to follow the policies of each carrier? knew it would run into problems - this is a tech stack decided on by the openIDL org, not any one carrier (contributed)
    • JB - doesn't have to be part of the CTO standards as it is a hosted node, an external interface; never going to get a stack that agrees with every org's internal reqs - get agreement
    • KS - goes through the usual gates
    • DR - the security part they care deeply about; how it functions less so; security is big - had to come open on the stack; it's a long way to build security trust, needs a lot of love for them to ever be comfortable; starting from a position of "not confident" instead of agnostic
    • PA - Carrier X - very into being involved; want to understand everything they are running and connected with; auditability - they are interested in knowing; believe a carrier asked "how do you delete, how do you prove it? where does data live at rest? transmitted? positive confirmation all data is deleted at the end of each cycle" - some are concerned with all aspects
    • DH - biz perspective: anything in the box abides by the reqs established
    • KS - 3 things possible - main concerns: the tech stack is a big concern INSIDE; if hosted it is slightly less; very concerned about the Security and the Privacy of the data
    • DR - COST and long-term sustainability, important
    • KS - while they might not have the desire to control everything: secure, privacy maintained, reasonable cost, governance process
    • JB - DH's point about the lifecycle of the data, the state of the data; keep in mind when data transmits from this node to analytics it uses a private channel; those things are written to the chain itself - lifecycle management of the data
    • DR - the chain is only a piece of it; nothing stopping it from being written from the PDC and elsewhere
    • JB - the PDC is only visible to the partner at the other end
    • DR - the second it leaves their node it could be written other places; not a perfect tech solution - good tech solutions, but a lot of this will be process based: legal agreements, auditability, enhance the tech
    • KS - the carrier in ND was very concerned with this exactly: being sure where the data will go, that what you said you would do actually happened - visible, auditable, enforced
    • JB - benefits of deploying open source: the code can be audited as well as the logs; ways to achieve agreement with those covenants
    • DR - OSS is the classic legal doc dump; pretty hard to systemically always check everything; still needs other pieces, recourse, and audit mechanisms
    • JB - which code is deployed; procedures can help see what's being used
    • DR - wouldn't over-index on that; 10-year-old code has flaws no one knew about; definitely not going to be the primary mechanism
    • KS - a combo of all those factors, agreements between entities
    • PA - stuff we needed, tenets?
    • KS 
      1. must pass sec standards of all carriers
      2. must meet data privacy expectations of all carriers
      3. cost of running the code is sustainable
    • DR - costs should be similar to API costs; not $1, but commensurate with other services, reasonable
    • PA - an API is a little different; this is data exchange plus certification of data w/ the network, central auth validation
    • DR - if you consider the cost normally paid to a stat agent, the tech itself should not be near that cost - the totality, the holistic cost, needs to be better than the current state
    • KS - stay out of the value statement here; the overall value may cost more, but data privacy might be more important
    • JB - bigger picture: efficiencies cost less, fewer people doing bespoke reports
    • DR - still in line with what we expect the system to be; maybe problematic at scale; be cognizant we are building lean infra that is appealing to people
    • JB - definitely keep that analysis in mind; overhead to monitor the network, hosted nodes
    • DR - watching to keep costs down; a managed network is always more costly
    • JB - small management staff, low overhead; keep it that way; point in the same direction
    • KS - other types of nodes and their responsibilities: analytics and multi-tenant
    • DR - agreement the UI here is extracted data, where carrier responsibility ends; sounds like we're confident in the tech stack - build this piece, mock up a dataset that fits what we think the EP would be, put it through the pipes here; next step
    • PA - working with Dale; can produce 3 different sets of test data by modifying what he has now
    • DR - stick it through the plumbing, check the security; hands on keyboard and progress while we look at the Extract Engine
    • KS - analytics node - still part of this side of the API call, the stack for this network; if you have confidence in how that would work, attack that, part of this - go offline and illustrate or do it tomorrow morning, same level, all comfortable; defining the projects to build out the carrier and the adapter
    • JB - along the lines of making stuff work, area needs thinking, what is the nature of that EP request, in the case of stat data "gimme"
    • DR - start with what Peter has, see how it works, idea of what it looks like when done, 
    • PA - can do 3 json files as 3 mock carriers
    • DR - if that confident with NaaS, can point to progress, the EP is the hard part and work on this in parallel
    • JB - decision there, when transmitting data across interface, what are the handshakes? serialized, encrypted,
    • DR - no persistent listeners, async responses to requests
    • DR bigger issues are OPS and SLA maintenance, easier to be responsive than actively polling
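The "no persistent listeners, async responses to requests" pattern DR describes could be sketched roughly as below. This is an illustrative Python sketch, not openIDL code; the queue, correlation IDs, and all names are invented for the example.

```python
import queue
import threading
import uuid

# Hypothetical sketch: each extraction request carries a correlation ID,
# and the carrier node pushes its response back asynchronously instead of
# the requester holding a persistent connection or actively polling.

requests = queue.Queue()
responses = {}
done = threading.Event()

def carrier_node():
    """Consume one request, produce a response keyed by correlation ID."""
    corr_id, payload = requests.get()
    responses[corr_id] = f"result-for-{payload}"
    done.set()

def submit(payload):
    corr_id = str(uuid.uuid4())
    requests.put((corr_id, payload))
    return corr_id

worker = threading.Thread(target=carrier_node)
worker.start()
corr_id = submit("stat-report-EP")
done.wait(timeout=5)
worker.join()
print(responses[corr_id])  # result-for-stat-report-EP
```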

 

  • KS: Recap of 9/26 discussion
  • PA - stat plan data comes in, 97 characters, using stat plan handbook, converting coded message, (peter shows data model)
  • KS - data models, intake model (stat plan - dont want to alter what carriers HAVE to send, flat text based stat plan when it comes in), stuff to the left
  • DR - do it the right way, all pieces could change over time
  • PA - not running EP against coded message, keep codes so if you need to do a join
  • DR - trust PA to set it up, if he has an idea, assume right answer and move onto pieces we dont have a clear idea on
  • PA - ingestion format is stat plan, HDS format is Decoded
    • for relational
      • 1 record per loss and premium
  • JM - relational db?
  • PA - totally put it in a Relational DB, suggest 1 table per stat plan, dont need to do weird stuff with mult values per column
  • KS - claims and prem as well?
  • PA - 2 tables per line, ea stat plan has prem format and claim format, main thought utilizing SQL lets biz users contribute more (Mongo is tricky)
  • KS - make decision it is relational
  • PA - internal project, if switching to relational has significant implications, when do we want to put that to a formal vote, unpack implications, right now plan is debug first part of auto report, pivot to EPs working with PostgreSQL, his team can pivot
  • DR - no strong opinion on which format but doesn't want to throw out work
  • PA - most of the work has been understanding the math behind the report and connecting w/ business - has spent time learning JS and Mongo, finish up test with Mongo, can design tables and write EP after
  • KS - internal planning discussion, JZ wants progress, prove stuff, healthier method
  • DR - if saying "lets alter" - wants to make sure reasoning is strong, fine with Mongo
  • JM - showstopper if we don't - legion of people using SQL, need to read EPs, team that manages and processes has deep SQL knowledge and coverage but no JSON, biz unit comfort seeing relational form of data, ops team who will support forever very comfortable w/ SQL, can't do Mongo
  • DR - use Mongo enough
  • PA - into relational the whole time, wanted to see it thru with mongo
  • DR - TRV can work with SQL
  • JB - decide internally, move to relational
  • DR - want peter to get resources
  • PA - 2 fridays from now, wants to present THEN rewrite stat plans in Postgres
  • DR - like postgres (can do JSON blobs directly) - what version?
  • PA - still considering
  • DR - which version has the JSON blob functionality? find out
  • JM - postgres favored flavor too
  • DR - if we do SQL postgres is worth it
  • KS - vetting internally (AAIS), impacts workstreams
  • DR - ACTION - get Hanover opinion on this stage
  • JM - denormalized? keep it flat?
  • DR - like that too
  • PA - def for the raw tables
  • JM - challenge of flat, redundancy of data
  • DR - performance in terms of speed or data minimization not a big issue here
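The flat, denormalized layout agreed above (one premium table and one claim table per line of business, raw stat-plan codes kept alongside decoded values) might look roughly like the sketch below. All table and column names here are invented for illustration, not the actual stat-plan layout.

```python
import sqlite3

# Illustrative sketch only: one flat (denormalized) premium table and one
# flat claim table for a hypothetical auto line, with the original coded
# value kept next to the decoded, human-readable one so joins still work.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auto_premium (
    record_id      INTEGER PRIMARY KEY,
    state_code     TEXT,     -- original stat-plan code kept for joins
    state_name     TEXT,     -- decoded value added at load time
    coverage_code  TEXT,
    premium_amount NUMERIC
);
CREATE TABLE auto_claim (
    record_id      INTEGER PRIMARY KEY,
    state_code     TEXT,
    state_name     TEXT,
    coverage_code  TEXT,
    loss_amount    NUMERIC
);
""")
conn.execute("INSERT INTO auto_premium VALUES (1, '09', 'Connecticut', 'BI', 500.0)")
row = conn.execute("SELECT state_name, premium_amount FROM auto_premium").fetchone()
print(row)  # ('Connecticut', 500.0)
```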
  • DR - node as a service, Peter free range
  • JM - EP did nothing but SQL in postgres, highly normalized model, minimal connection, plumbing working
  • PA - easiest over all, need stored procedures
  • KS - why?
  • PA - StoredProcs, do EPs in one lang
  • KS - pass in on API call, cant do with stored Procs
  • DR - could  if you sent in API call and EP and stored as SP and...
  • PA - Stored Proc - create and destroy temp views, looping, cursors, 
  • KS - stored procs are utility functs not part of EP itself?
  • JM - zoom out - specifics - what auth do we assume we have over DB? Data management people strict
  • DR asking for Vers control over DB, what permissions needed, manual requests? 3rd party agent?
  • KS - if we assume only SQL, someone screams we address it
  • JM - artifact now - table - can I create/replace (not exactly CRUD), limited # - tables, views, stored procs, etc. - then assume we have the right to create these? EPs can create on the fly? only a table of 7-8 objects, make the grid now - tables and we make em in advance - stored procs on the fly? bigger ask, but have it in this grid
      1. Objects as rows: Tables, Views, Stored Procedures, Functions, Triggers.
        What you can do with them: Create in advance, Alter on the fly.
  • DR - look at as optimizations, utility helper functions? get it working with raw queries, I like stored procs - sanitize query, known responses, good things about them, don't need them RIGHT NOW
  • PA - better at SQL than Mongo, depends on how we use the analytics node to combine the data, fair bit of process
  • DR - do I think we will allow Stored Procs be written? No - dont assume initially
  • PA - mid level processing layer, or whole thing in SQL
  • DR - will need mid-level processing long term
  • JM - EP needs arb code, third party datasets
  • DR - another thing it does help, simpler to maintain vers control of mid level middleware, some funct/procs than DB itself - this way DB is just datastore, need to keep it fed, moves vers control out a little bit, logical separation
  • JM as little code in DB as possible, part of challenge - writing code in DB, put it in ETL layer - ETL at Query Time - if you find yourself writing code (stored procs) to get data out..
  • KS - reality, people put in EP, that's what they can do in the next year, any updates to StoredProcs will take forever to get approved by everyone
  • DR - extensibility of Stored Procs is weaker
  • JM - once DB knows you are selecting, can't drop table in the middle whereas you can with Stored Procs - you can't screw up a db in a select clause but a stored proc could blow up the db
  • PA - use middle layer, rest in JS, middle layer in JS and Postgres?
  • DR - no pref
  • KS - can we not do this with SQL, can't do this with layers of SQL
  • JM - views are a logic layer, opportunity to put a view-type layer, don't want to overnormalize, can get rid of redundancy w/ aggregate functions, do you restructure your model, query layer or model in need of adjustment
  • KS - governance model and ability to run code that has to be reviewed by people, not just SQL knowledge, optimize for something to be run by people ok'ing something coming across the wire, run into a lot of ugly SQL
  • DR - sec architects, they will look at "where is my trust plane", arch whole environ to be secure and only expose DB, only run select statements against DB, more likely to approve something (if something funky happens outside trust plane, can still protect data easily and will help), if trust plane is behind DB and you expose data to some procedural code they will raise the bar
  • PA - instead of everything in one lang and sql, 
  • DR - unfortunate process-gated thing, not the ideal engineering solution but where we are stuck
  • PA - great discussion, how they look at sec, finish in mongo and do it in postgres, no stored procs, JS to make multiple select queries
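The "SELECT-only, no stored procs, no side effects" trust boundary PA summarizes can be illustrated with a small sketch. It uses sqlite3's authorizer hook purely as a stand-in; a real deployment would more likely enforce this with a restricted Postgres role, and the table name is invented.

```python
import sqlite3

# Sketch of the read-only rule: permit reads, deny any action that
# could mutate the database behind the trust plane.

def read_only(action, *args):
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ):
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hds (x INTEGER)")
conn.execute("INSERT INTO hds VALUES (42)")
conn.set_authorizer(read_only)

val = conn.execute("SELECT x FROM hds").fetchone()[0]   # reads still work
blocked = False
try:
    conn.execute("DROP TABLE hds")                      # mutation is rejected
except sqlite3.DatabaseError:
    blocked = True
print(val, blocked)  # 42 True
```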
  • JM - dont mind scary looking sql, run mult subqueries and stitch together at bottom, 
  • PA - depends on who the person is reading it, simple objects WITH clause gets harder
  • JM - do we put VIEW layer on here? is my model correct, team is adamant about keeping flat but can read long sql
  • DR - plan there, may or may not need to be middle layer, assume we don't need and only add if PA says he needs it? or assume we need it
  • PA - middle layer to enhance EP?
  • DR - third party data enrichment
  • PA - entire report puts together, 
  • KS - assume we have reference data nearby or not true? all transactional data?
  • PA - state names replicate, state codes replicate
  • JB - consolidation side
  • JM - reporting logic, shouldnt occur here, not in the adapter - 
  • DR - only getting data over the wire to the service
  • KS - no PII raw data over the wire, some agg happening here
  • JB - cert have codes and dereference with labels later
  • JM - where does reference data get resolved?
  • PA - sourcing ref data from stat handbook, on load, human readable when going in ETL: Dale submits data, stat records go thru ETL, loaded into relational data store, take code and add column for RI
  • JB - in order to be human readable by carrier?
  • KS - less Human readable and more out of AAIS codes and into 
  • JB - standard iso codes better than labels
  • KS - bring it into something more standard
  • PA - need to look, believe state codes are NAIC codes
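The decode-on-load step PA describes (keep the original code so joins still work, add a human-readable column during ETL) reduces to something like the sketch below. The code table here is a placeholder, not the real NAIC list.

```python
# Minimal, hypothetical sketch of decode-on-load: the ETL preserves the
# raw stat-plan code and appends a decoded column alongside it.

STATE_CODES = {"09": "Connecticut", "32": "New York"}  # invented sample

def decode_record(raw):
    decoded = dict(raw)                       # keep all original codes
    decoded["state_name"] = STATE_CODES.get(raw["state_code"], "UNKNOWN")
    return decoded

rec = decode_record({"state_code": "09", "premium_amount": 500.0})
print(rec["state_code"], rec["state_name"])  # 09 Connecticut
```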
  • DR - we now have a plan for DB, plan for what happens after adapter done, unless we need it, role of adapter = accept EP, 
  • JM - long term design issue - if we do enrichment stuff at the grain of native data?
  • KS - a lot of times it will be, have to be able to support it
  • JM worried that if you run relational query, rule/val prop - run on fine grain data but arb logic from other data sets needs access to fine grained data, then therefore you extract priv data from RelDB, then filter it down
  • DR - 2 trust planes
    • SQL first
    • Node - as long as it happens on the node, technically palatable, depends on how implementation looks, do we need quarantine zone, middle layer 
  • JM - opp to review data set NOT JUST EP
  • DR - biggest design challenge: 
  • JM - human interface, there needs a gate somewhere that says "give a human ability to review data before it goes out the door
  • DR - still one environ, if passing non agg data over the first trust plane, need secondary stop somewhere
  • JM - "no humans" is not good, needs review
  • KS - intention - testable dry run in final product
  • JM - MVP 1 - dont put it but final product
  • DR - same page with James, how to make it secure, one thing queries, will require executing foreign code in a runtime, not written by us, more powerful than SQL query
  • JM - two trust planes, will be post-db
  • JB - run some carrier side? access to the data
  • DR - all on the carrier side, wont be in the node, but thing is the hard line in the sand for Sec - no code execution behind the trust plane (sql queries fine)
  • JM - core deliverables - deck for the DB teams of large companies - selling to security people, will be work
  • DR - most val artifact that comes out of this, JM/DR tell us what to say
  • KS - cant accept solution without this box, they can run it or see results, arbitrary code only SQL (read only)
  • JB - worry - some pattern that requires raw level, in the output
  • JM - do high risk stuff all the time, but work at it (sharding, encrypting, jump thru hoops)
  • DR - homomorphic encrypt for all? (laughs)
  • KS - concern, are we back in "TRV does this, HRT does that?"
  • JM - need to prove no way to do it simpler, hard fight but make the business case
  • DR - always an exception process for sec finding, make as simple as possible b/c variances between teams, more and less strict, avoid needing to ask
  • KS - not all will sign on to the risk or have the will to review it
  • DR - inconsistency, one carrier agrees to something another thinks is risky, will run into, w/in a carrier depends on who reviews case and which CIO (who makes risk call), w/in carrier don't have one standard, decision makers still use judgement (there are still no-gos)

10/3/2022

  • JM - Wed Hartford has 2 new people joining effort, leadership allocated, JM still involved, new person on Arch WG, asked for build team for Q1, Q4 is ramp up time
  • KS - talked to AAIS about redirecting PA from Mongo to SQL, AAIS is on board, whatever PA is working on should move towards stated direction
  • KS - relational DB, run queries that are coming from community, talked about scanner
  • KS - across API layer, all running in carrier, no side effect code like SQL, relational DB, allowed to execute against it to return response, deferring scanner and enrichment - did we say defer test facility
  • JM - out the door with smallest MVP we can, solid line is core, dotted p2
  • KS - ETL, submit stat plan, ETL turns into relational structure, EP Processor executes SQL, across API and API returns results of that
  • JM - is this good enough to turn into a diagram to show that, MVP: hand nothing but sql string over interface, Extract Processor, run it get it back, hand it back to interface, have end to end plumbing - this is the core
  • KS - nuance to execution of SQL, might be more than one sql? pipeline? worth discussing now? or hope it can all be done in one sql
  • JB - script? series of sql statements
  • JM - agree it needs to support multiple queries, how do they communicate? pipelines - how do you get them to talk to each other?
  • JB - temporary tables, not modifying (create/destroy) is safer, in addition need to wrap initial investigation of request, step before
  • JM - is validator on this side or the other side?
  • KS - put it on both, create SQL, validate on the way in, if SQL dont know how it could
  • JB - do need to investigate SQL - stat plan can say "gimme report", but other things will require you look at the SQL
  • KS - going back to MVP, scanner/validator is out of scope, 
  • JB - if all we're doing now is agreeing on getting data from stat data, all SQL will address, then scanner/validator do something simple, no need for consent dialog until we do the basic stuff
  • JM - MVP in loosest sense of word or product out the door - can't do product until scanner/validator is done
  • KS - MVP or POC?
  • JM - sequence the lines, 1-4 proves it works, 5, 6, 7 go to the industry - if solid line is #1, what should second be - more about proving own assumptions/proving value to industry?
  • DH - business perspective, security from going from TRV node thru adapter to analytics node, showing data privacy
  • KS - lot of reqs, go and pick the ones that drive second level of POC, things required and implemented before we go to production
  • JM - enrichment 4th on my list, of business-y things, scanner-validator says "ok?", test validation says "run 100s of rules to prove it is what you think" or Enrichment?
  • DH - prove my data is protected
  • KS - the scanner - #s 1 and 2 show security, DH wants to show rest of the flow and across nodes we make sure the stuff is going the right way over analytics node
  • DH - basic plumbing
  • JM - end to end plumbing? least I need to do to prove it
  • KS - bringing an arch perspective here, 1st thing: does the plumbing work, 2nd can we install in a carrier - important it will be acceptable in a consistent way across carriers - what we work on here can abstract or work concretely so everyone can run it
  • JM - Enrichment is scariest
  • KS - needs robust plugin capability or external data model - trust and maintenance are hard, diff timeline than EPs
  • JM - stop at dotted line and focus on solid lines, all we are gonna do
  • KS - step 1 solid lines, step 2 end to end plumbing, step 3 up for discussion but for KS "does this work in your carrier node"
  • JM - focus on solid line
  • JM - get mult sql problem, need mult sqls, couple ways to solve this, argue this executes out of schema with no data in it, schema that carries data, gives a level of grant rights to DBA team - ask what is the set of rights asking for in sub schema
  • KS - can we establish a sandbox DB where the EP works, updates, creates tables as necessary
  • JM - can say that is a design principle and allow implementor to do either - golden - 
  • KS - sep schema
  • JM - pros and cons of each, takes fight out of DBAs
  • HDS Schema, EP Schema
  • JM - separating easier to ask for things - default: want to create views on the fly, idea of mult sqls, parsing behaviors, smart enough, put it anyway
  • KS - could be a collection of strings
  • JM - how interacting with each other
  • KS - first can return, second consume, you test it - sql string updates intermediate table with data, all you have is a choreographer (run first, point second at results)
  • JM - intermediate table, SQLs comm with each other, creation of complementary tables - DBAs protect data - ask for "Create Views" auth OR temporary tables (sometimes don't have the robustness you want), only talking about views, temp tables or phys tables
  • JB - creation of indices
  • JM - assumed, agree not to use Stored Procs, comfortable saying "want schema, in schema have these rights, agree to flush data (flushing policy)
  • KS - drop the whole DB
  • JB - temp tables when connection closed by default, but drop statements good to do
  • JM - might be worth asking for materialized views
  • JB - temp tables for intermediate results
  • KS - keep saying not time bound, if we have to do more X not a prob for performance
  • JM - if this is a long list we have to ask DBAs for
  • KS - assume "yeah", go ahead and create schema, inside POC, extract processor remove/create tables
  • JM - grant grant grant - pretty minimal, 
  • KS - table creates and stuff
  • JM - phys table question on there, more pushback, flushed data is question mark
  • JM might find optimization opportunities
  • JB - consistency issues, better to treat like workspace and flush it
  • JM - no phys tables for now, materialize views?
  • KS - seems like EP has more complicated interaction diagram, interaction diag between 4 components?
  • <Live diagramming>
  • JM - view is a mech to make one sql dependent on another sql, materialized view "love the view but need performance", phys tables fix perf issue but need in advance - want to go as far as we can to
  • KS - running first sql, get results, where kept?
  • JM - temp table, self-destruct at end of session
  • KS - DDL?
  • JM - easier ask
  • KS - going to have to describe the structure of results so temp table can hold them, has to be done before running first query
  • JM - draw out one query problem - EP should have arrow to postgres SQL and say "SQL", assume it does, will then (logical not phys) largely read from HDS, retrieve data, postgres to EP schema
  • KS - something in SQL says "select x from this table" returns result, and persisting temp table?
  • JM - retrieve from HDS should be enough, in any non trivial case results go in temp table, if we assume we wrote temp table, the Extract Processor runs retrieve
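The choreography just discussed (Extract Processor runs an ordered list of SQL strings inside one session, intermediate results land in a TEMP table that self-destructs when the session ends, the final SELECT retrieves the result set) can be sketched as follows. Names and data are illustrative only, and sqlite3 stands in for Postgres.

```python
import sqlite3

# Stand-in HDS with a few rows of invented premium data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hds_premium (state TEXT, amount NUMERIC);
INSERT INTO hds_premium VALUES ('CT', 100), ('CT', 200), ('NY', 50);
""")

# A hypothetical EP expressed as an ordered list of SQL strings: the first
# materializes intermediate results into a session-scoped TEMP table, the
# last retrieves the final result set.
ep_sql = [
    "CREATE TEMP TABLE ep_agg AS "
    "SELECT state, SUM(amount) AS total FROM hds_premium GROUP BY state",
    "SELECT state, total FROM ep_agg ORDER BY state",
]

def run_extract(conn, statements):
    """Choreographer: run each statement in order, return rows of the last."""
    rows = None
    for stmt in statements:
        rows = conn.execute(stmt).fetchall()
    return rows

result = run_extract(conn, ep_sql)
print(result)  # [('CT', 300), ('NY', 50)]
```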


