Hey All
It’s been long time since i blogged. So today I am here with the building and demonstration of an Universal STG for different data sources using Schema less DB - MONGO DB which is a life saver in the Data Integration area.
Problem:
In conventional data integration process, relational DB schema is being used as the staging layer for GW PC XML data feed. A STG layer is always subjected to changes with respect to the data structure or business requirements in an agile execution. It can also have ‘n’ number of sources when future compatibility is considered. Considering the above scenarios , the ease of flexibility to changes , wide variety of data are very complex if the STG layer is a relational data base. Lead time for a real time analytics is also high.
What We are trying to implement here:
The XML feed is loaded into a schema less DB after being converted into JSON. In this case Mongo DB being a document DB, the degree of flexibility to changes is very high and schema migration process is made very simple.
For this POC the below two scenarios are implemented viz;
1)Loaded STG PLCY PRD and STG_PRTY_AGMT_ROLE and STG PRSN tables in the STG Layer( this schema is a relational stg Layer with no constraints)
2)Loaded AGMT , PRSN and PRTY_AGMT tables in the ODS layer directly from Mongo DB( a sample Star schema in the core ODS layer)
Technology Used:
Mongo DB, RoboMongo, XML to Json Converter, Alteryx, SQL server
Detailed Solution Steps with Screenshots:
It’s been long time since i blogged. So today I am here with the building and demonstration of an Universal STG for different data sources using Schema less DB - MONGO DB which is a life saver in the Data Integration area.
Problem:
In conventional data integration process, relational DB schema is being used as the staging layer for GW PC XML data feed. A STG layer is always subjected to changes with respect to the data structure or business requirements in an agile execution. It can also have ‘n’ number of sources when future compatibility is considered. Considering the above scenarios , the ease of flexibility to changes , wide variety of data are very complex if the STG layer is a relational data base. Lead time for a real time analytics is also high.
What We are trying to implement here:
The XML feed is loaded into a schema less DB after being converted into JSON. In this case Mongo DB being a document DB, the degree of flexibility to changes is very high and schema migration process is made very simple.
For this POC the below two scenarios are implemented viz;
1)Loaded STG PLCY PRD and STG_PRTY_AGMT_ROLE and STG PRSN tables in the STG Layer( this schema is a relational stg Layer with no constraints)
2)Loaded AGMT , PRSN and PRTY_AGMT tables in the ODS layer directly from Mongo DB( a sample Star schema in the core ODS layer)
Technology Used:
Mongo DB, RoboMongo, XML to Json Converter, Alteryx, SQL server
Detailed Solution Steps with Screenshots:
STEP 1:
- Download the mongodb and robomongo in your local machine.
- Create a folder named “data” in C drive.
STEP 2:
- In command prompt, mention the ”mongod.exe” file location and click enter.
STEP 3:
- start mongod.exe
STEP 4:
- Open Robomongo.exe file
Step 5:
- Set new connections and save.
- Then click connect
Step 6: Import json files to mongo dB
- Paste the mongod.exe file and json file location along with
Mongo import --db startup_log --collection test –file.
Step 7:
- In the robomongo page
- Go to Startup_log dB Collections
- So we can view the imported data.
Step 8:
- In the robomongo page we can view data
Step 9: Load Mongodb data to STG DB/ODS using Alteryx
Setup source DB connection from Mongo DB
- Launch alteryx, select ‘connectors’ ‘from tool pallet’
- Drag ‘mongodb input’ to workflow
- Set the server name as ‘localhost’,
Database: ‘startup_log’
Collection: ’test’
Setup target DB connection to STG/ODS relational DB
- Select ‘In/Out’ from tool palate and drag ‘Output Data’ to it.
- Double click the output data object and configure the target db connection.
- Set the file location and also ‘Update, insert if new option ’option as Output options
- Do the custom data mapping from source to target
- Save and run the workflow.
- For this POC the two scenarios were tried out :
loaded in the STG Layer( This schema is a relational stg Layer with no constraints)
2) Loaded AGMT , PRSN and PRTY_AGMT were loaded in the ODS layer directly from Mongo DB( a sample Star schema in the ODS)
Step11: Viewing results in dB (SQL)
AGMT:
Updated AGMT: updated PRD_END and PRD_STRT
Inserted new ID: - pc: 79
PRSN:
Updated FRST_NAME and LST_NAME for pc: 1078 and pc: 1079
Inserted new ID: - pc: 1080
PRTY_AGMT:
Updated SBTYP for pc: 166
Inserted new ID: - pc: 168
It was really a nice post and i was really impressed by reading this
ReplyDeleteBig Data Hadoop Online Training
thanks for the valuble information about
ReplyDelete,Online courses
I read your post. It is very informative and helpful to me. I admire the message valuable information you provided in your article.
ReplyDeletepower bi training