Wednesday, March 14, 2018

Schema less DB for Universal Stage using Mongo DB & Alteryx

Hey All

It’s been long time since i blogged. So today I am here with the building and demonstration of an Universal STG for different data sources using Schema less DB - MONGO DB which is a life saver in the Data Integration area.

Problem:

In conventional data integration process, relational DB schema is being used as the staging layer for GW PC XML data feed. A STG layer is always subjected to changes with respect to the data structure or business requirements in an agile execution. It can also have ‘n’ number of sources when future compatibility is considered. Considering the above scenarios , the ease of flexibility to changes , wide variety of data  are very complex if the STG layer is a relational data base. Lead time for a real time analytics is also high.


What We are trying to implement here:

The XML feed is loaded into a schema less DB after being converted into JSON. In this case Mongo DB being a document DB, the degree of flexibility to changes is very high and schema migration process is made very simple.

For this POC the below two scenarios are implemented viz;

1)Loaded  STG PLCY PRD and STG_PRTY_AGMT_ROLE  and STG PRSN tables in the STG Layer( this schema is a relational stg Layer with no constraints)

2)Loaded  AGMT , PRSN and PRTY_AGMT tables in the ODS layer directly from Mongo DB( a sample Star schema in the core ODS layer)

Technology Used:

Mongo DB, RoboMongo, XML to Json Converter, Alteryx, SQL server

Detailed Solution Steps with Screenshots:


STEP 1:


  • Download the mongodb and robomongo in your local machine.
  • Create a folder named “data” in C drive.
STEP 2:
  • In command prompt, mention the   ”mongod.exe” file location and click enter.


  STEP 3:
  • start mongod.exe



STEP 4:


  • Open Robomongo.exe file

Step 5:


  • Set new connections and save.



     
  • Then click connect






Step 6: Import json files to mongo dB


  • Paste the mongod.exe file  and json file location along with
Mongo import --db startup_log --collection test –file.






  Step 7:


  • In the robomongo page
  • Go to           Startup_log dB      Collections
  • So we can view the imported data.



Step 8:


  • In the robomongo page we can view data


Step 9: Load Mongodb data to STG DB/ODS using Alteryx


Setup source DB connection from Mongo DB
  • Launch alteryx, select  ‘connectors’ ‘from tool pallet’
  • Drag ‘mongodb input’ to workflow
  • Set the server name as ‘localhost’,
       Database: ‘startup_log’
       Collection: ’test’




Setup target DB connection to STG/ODS relational DB


  • Select ‘In/Out’ from tool palate and drag ‘Output Data’ to it.
  • Double click the output data object and configure the target db connection.


  • Set the file location and also ‘Update, insert if new option ’option as Output options
  • Do the custom data mapping from source to target


  • Save and run the workflow.


  • For this POC the two scenarios were tried out :
1) Loaded STG PLCY PRD and STG_PRTY_AGMT_ROLE , STG PRSN were
loaded in the STG Layer( This schema is a relational stg Layer with no constraints)

2) Loaded AGMT , PRSN and PRTY_AGMT were loaded in the ODS layer directly from Mongo DB( a sample Star schema in the ODS)

Step11: Viewing results in dB (SQL)


AGMT:
Updated AGMT:  updated PRD_END and PRD_STRT
Inserted new ID: -   pc: 79


PRSN:


Updated FRST_NAME and LST_NAME for pc: 1078 and pc: 1079
Inserted new ID: -   pc: 1080
PRTY_AGMT:


Updated SBTYP for pc: 166
Inserted new ID: -   pc: 168




3 comments:

  1. It was really a nice post and i was really impressed by reading this
    Big Data Hadoop Online Training

    ReplyDelete
  2. I read your post. It is very informative and helpful to me. I admire the message valuable information you provided in your article.
    power bi training

    ReplyDelete