What are the steps involved in big data solutions?
by Aarushi Sharma, Human Resource Executive
i) Data Ingestion: The first step in deploying a big data solution is to extract data from its sources. These can be an Enterprise Resource Planning system such as SAP, a CRM such as Salesforce or Siebel, an RDBMS such as MySQL or Oracle, or log files, flat files, documents, images, and social media feeds. The extracted data is then loaded into HDFS. Data can be ingested either through batch jobs that run on a schedule (every 15 minutes, once a night, and so on) or through real-time streaming, with delays ranging from 100 ms to 120 seconds.
ii) Data Storage: After ingestion, the data is stored either in HDFS or in a NoSQL database such as HBase. HBase works well for random read/write access, whereas HDFS is optimized for sequential access.
iii) Data Processing: The final step is to process the data using a processing framework such as MapReduce, Spark, Pig, or Hive.
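The three steps can be sketched end to end in plain Python. This is only an illustrative toy, not real Hadoop code: it calls no HDFS, HBase, or MapReduce API, and every name in it (ingest_in_batches, SequentialStore, KeyValueStore, run_pipeline, SAMPLE_RECORDS) is hypothetical. It assumes an append-only list can stand in for HDFS-style sequential storage, a dictionary for HBase-style random access by row key, and a word count for a MapReduce-style job.

```python
from collections import defaultdict
from itertools import islice


def ingest_in_batches(records, batch_size):
    """Step i: yield records in fixed-size batches (stand-in for a scheduled batch job)."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


class SequentialStore:
    """Step ii (assumed HDFS-like behaviour): append-only writes, reads by full scan."""

    def __init__(self):
        self._blocks = []

    def append(self, batch):
        self._blocks.append(list(batch))

    def scan(self):
        for block in self._blocks:
            yield from block


class KeyValueStore:
    """Step ii (assumed HBase-like behaviour): random read/write by row key."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


def map_phase(record):
    """Step iii, map: emit a (word, 1) pair for every word in the record text."""
    for word in record["text"].lower().split():
        yield word, 1


def reduce_phase(pairs):
    """Step iii, reduce: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run_pipeline(records, batch_size=2):
    """Ingest in batches, store both ways, then process with a sequential scan."""
    sequential = SequentialStore()
    key_value = KeyValueStore()
    for batch in ingest_in_batches(records, batch_size):
        sequential.append(batch)
        for record in batch:
            key_value.put(record["id"], record)
    pairs = (pair for record in sequential.scan() for pair in map_phase(record))
    return reduce_phase(pairs), key_value


# Made-up sample data standing in for CRM entries and log lines.
SAMPLE_RECORDS = [
    {"id": "crm-1", "text": "order placed"},
    {"id": "log-1", "text": "order shipped"},
    {"id": "log-2", "text": "payment failed"},
]

if __name__ == "__main__":
    word_counts, store = run_pipeline(SAMPLE_RECORDS)
    print(word_counts)
    print(store.get("log-1"))
```

The sketch keeps both storage styles side by side to show the trade-off described in step ii: the word count reads everything through scan(), which suits sequential storage, while a single record lookup goes through get(), which suits a key-value store.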
Created on Nov 15th 2019 04:32.