Publications

Title: An Effective and Scalable Data Modeling for Enterprise Big Data Platform


 

Abstract:

The enormous growth of the internet, enterprise applications, social media, and IoT devices has caused a huge spike in enterprise data growth. Big data platforms provide scalable storage to manage this growth and give decision-makers, stakeholders, and business users easier access to data. Classifying, organizing, and storing all this data, and processing it to provide business insights, is a well-known challenge. The nature, variety, velocity, volume, and value of the data make it difficult to process big data effectively. Enterprises face challenges in applying complex business rules, generating insights, and supporting data-driven decisions in a timely fashion. Because a big data lake integrates streams of data from many business units, stakeholders usually analyze enterprise-wide data from various data models. Data models are a vital component of a big data platform. Depending on the available data models, users may do complex processing, run queries, and perform big table joins to generate the required metrics. Finding value in the data is usually a time-consuming and resource-intensive process. A big data platform in the enterprise clearly needs high-quality data modeling methods to reach an optimal mix of cost, performance, and quality. This paper addresses these challenges by proposing an effective and scalable way to organize and store data in a big data lake. It presents some of the basic principles and methodology for building scalable data models in a distributed environment. It also describes how the approach overcomes common challenges and presents findings.

 

Keywords:

Big Data, Big Data Lake, Scalable Data Modeling, Hadoop, Spark, Business Intelligence, Big Data Analytics


Conference: 2019 IEEE International Conference on Big Data (Big Data)


Title: Mid-Tier Models for Big Data


 

Abstract:

With the rise of big data, enterprises started accumulating significantly more data than they consume. The big data lake made data consumption easier for all stakeholders, analysts, and developers. The variety, volume, and velocity of data, together with the complexity of businesses, made it harder to process, organize, and store data to serve analytical solutions on a timely basis. It is often a big challenge for enterprises to cleanse, organize, classify, and store big data so that insights are accessible on time. Data consistency also becomes a concern when multiple data models define similar metrics. As numerous data sources are integrated into a single platform, stakeholders often analyze data from various subject areas. This leads to complex queries that result in big joins and require more processing power. Even with the cheap storage and greater processing power of Hadoop and big data technologies, modeling big data is a time-consuming and error-prone process. This paper addresses that challenge by introducing mid-tier models for big data. It discusses a novel data modeling approach, Mid-Tier models, for organizing and storing big data in distributed storage. It outlines how the approach overcomes some of the challenges and showcases an example.

 

Keywords:

Enterprise Data Models, Mid-Tier Data Models, Big Data Lake, Dimensional Models, Big Joins, Hadoop, Spark


Conference: 2019 IEEE 5th International Conference on Big Data Intelligence and Computing (DATACOM)


Title: Bridging Data Silos Using Big Data Integration


 

Abstract:

With cloud computing, cheap storage, and technology advancements, an enterprise uses multiple applications to operate business functions. Applications are not limited to transactions, customer service, sales, and finance; they also include security, application logs, marketing, engineering, operations, HR, and many more. Each business vertical uses multiple applications, which generate a huge amount of data. On top of that, social media, IoT sensors, SaaS solutions, and mobile applications record exponential growth in data volume. In almost all enterprises, data silos exist across these applications. These applications can produce structured, semi-structured, or unstructured data at different velocities and in different volumes. Having all data sources integrated and generating timely insights helps overall decision-making. With recent developments in big data integration, data silos can be managed better, which can generate tremendous value for enterprises. Big data integration offers flexibility, speed, and scalability for integrating large data sources. It also offers tools to generate analytical insights that can help stakeholders make effective decisions. This paper presents an overview of data silos, the challenges they pose, and how big data integration can help to overcome them.

 

Keywords:

Data Silo, Big Data, Data Pipelines, Integration, Data Lake, Hadoop


Journal: International Journal of Database Management Systems


Title: Overcoming Data Silos Through Big Data Integration


 

Abstract:

With cloud computing, cheap storage, and technology advancements, an enterprise uses multiple applications to operate business functions. Applications are not limited to transactions, customer service, sales, and finance; they also include security, application logs, marketing, engineering, operations, HR, and many more. Each business vertical uses multiple applications, which generate a huge amount of data. On top of that, social media, IoT sensors, SaaS solutions, and mobile applications record exponential growth in data volume. In almost all enterprises, data silos exist across these applications. These applications can produce structured, semi-structured, or unstructured data at different velocities and in different volumes. Having all data sources integrated and generating timely insights helps overall decision-making. With recent developments in big data integration, data silos can be managed better, which can generate tremendous value for enterprises. Big data integration offers flexibility, speed, and scalability for integrating large data sources. It also offers tools to generate analytical insights that can help stakeholders make effective decisions. This paper presents an overview of data silos, the challenges they pose, and how big data integration can help to overcome them.

 

Keywords:

Data Silo, Big Data, Data Pipelines, Integration, Data Lake, Hadoop


Journals:


Title: Stimulate ML Development using Feature Store


Abstract:

The state of ML development has not changed much in the last few years. Data scientists and developers spend 80% of their time on data wrangling. This not only impedes ML application development but also negatively impacts overall consumer satisfaction and enterprise decision-making. This article sheds light on a way to overcome these challenges.

Access:


About Hyperight:

Hyperight is an international event service provider that focuses on creating network-oriented, crowdsourced business events. It is committed to providing premium content and a distinctive customer experience through a platform where data practitioners from around the world come to learn.