
In the era of big data, the five Vs (volume, variety, value, veracity, and velocity) are both important and challenging for industrial data analytics. The growing amount of data accumulated over time from various sources is often the key concern in data science studies. While building a data analytics dashboard for underperforming wells, we encountered several challenges in consuming big data, caused by the request rate limit, inconsistent data frequency, and authentication limits imposed by the provider of the database application programming interface (API). These challenges lead to incomplete retrieved data, slow data retrieval, and failures in automating data retrieval on a regular basis. In this study, we therefore propose several performance optimization techniques that enable faster processing and analysis of large-scale datasets. After implementing a combination of data partitioning and multiprocessing techniques, we achieved a significant performance improvement in building data pipelines, from data consumption through to publication. These approaches were validated using the real-time production data of 29 oil fields and demonstrated the potential to reduce data retrieval and processing time. These findings have significant implications for handling massive amounts of data and for broadening data analytics capabilities.
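As an illustration only, the combination of data partitioning and multiprocessing mentioned above could be sketched as follows. This is not the pipeline used in the study: the function names (`partition_date_range`, `fetch_partition`, `build_tasks`, `retrieve_all`), the field identifiers, the chunk size, and the worker count are all assumptions, and `fetch_partition` is a hypothetical stand-in that fabricates placeholder records instead of calling a real API.

```python
from datetime import date, timedelta
from multiprocessing import Pool


def partition_date_range(start, end, days_per_chunk):
    """Split [start, end] (inclusive) into consecutive, non-overlapping chunks."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def fetch_partition(task):
    """Hypothetical stand-in for one API request covering one partition.

    A real version would query the data provider here. This one returns
    one placeholder record per day so that the sketch runs offline.
    """
    field_id, chunk_start, chunk_end = task
    n_days = (chunk_end - chunk_start).days + 1
    return [(field_id, chunk_start + timedelta(days=i)) for i in range(n_days)]


def build_tasks(field_ids, start, end, days_per_chunk):
    """Create one task per (field, date chunk) pair."""
    chunks = partition_date_range(start, end, days_per_chunk)
    return [(field_id, s, e) for field_id in field_ids for (s, e) in chunks]


def retrieve_all(field_ids, start, end, days_per_chunk=30, workers=4):
    """Fetch all partitions in parallel worker processes and flatten the result."""
    tasks = build_tasks(field_ids, start, end, days_per_chunk)
    with Pool(processes=workers) as pool:
        parts = pool.map(fetch_partition, tasks)
    return [row for part in parts for row in part]


if __name__ == "__main__":
    # Assumed example inputs: two made-up field identifiers, three months of data.
    rows = retrieve_all(["field_01", "field_02"], date(2023, 1, 1), date(2023, 3, 31))
    print(len(rows), "records retrieved")
```

Splitting the request by field and date range keeps each call small, and the worker count gives a single place to cap concurrency. In a setting with a request rate limit, that count would presumably need to stay below what the API provider allows.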