Case study

Financial Institution Associating Banks

The client, a financial institution associating various types of banks such as corporate, universal, and investment, approached us with a visionary goal. They aspired to revolutionize their data infrastructure to enhance AI capabilities for forecasting, optimizing processes, and making informed business decisions.

The financial institution grappled with data scattered across various processes. They therefore recognized the need for a unified, cloud-centralized, and reliable data source that could house all the diverse data sets critical to their operations and make this data available for AI workloads such as forecasting, optimization, and decision-making.

To solve this problem, the company needed a data specialist who could not only understand their complex requirements but also design a robust system capable of managing and harnessing the potential of their extensive data.

Our team took on the challenge by proposing a comprehensive solution: the development of a data lake. This data lake, designed as a centralized repository, allows the client to store structured and unstructured data at any scale. The key design choice was to keep the data in its raw format, ensuring flexibility and adaptability to evolving data needs. Organized according to the client's requirements, the data lake provides a solid data source for further analysis in support of forecasting, optimization, and business decision-making.

Technologies used:

Apache Hadoop




Two years duration (from June 2019 to the beginning of July 2021)


Business need

The financial institution aimed to elevate its business prowess through strategic enhancements in the following domains:

  • Strategic Forecasting Engine: a forecast machine predicting the number of accounts opened, loans sold, and other financial operations.
  • Fortified Security Frameworks: reinforcing the foundations of security by implementing advanced security structures, ensuring the safeguarding of sensitive data and financial assets.
  • Innovative Customer Identification and Authentication Models: identification and authentication of customers and users, enhancing the user experience while maintaining robust security protocols.
  • Micro-loan opportunities: providing a dynamic and accessible financial solution for a broader spectrum of clients.
  • Investment Banks' Predictive Tool: empowering the institutions with the ability to predict how much money they will collect versus how much more they can invest in other ventures.
  • Future functionalities: anticipating the new needs of the financial landscape, a series of innovative functionalities are on the horizon.

The core issue of scattered and unorganized data was addressed through the implementation of the data lake. The bank wanted to collect the following data from its various data sources and combine it into a central repository for further analysis, as well as to feed a dedicated AI forecasting engine:

  • User payment history
  • Biometric data
  • History of opening bank accounts
  • Credit/Loan history
  • Bank security structure

By structuring and organizing data in a manner aligned with the client's demands, we created a powerful background data source. This data lake, tailored to their specific requirements, became the linchpin for subsequent analyses, fuelling forecasting, optimizing processes, and facilitating informed business decision-making.

Services we provided

Our initial approach involved a meticulous audit of the financial institution's existing data landscape and its sources, along with an investigation of possible ways to connect with these systems. Following the data audit, our team embarked on a detailed and careful design phase.

Next came the development phase. It was executed with a commitment to using the most suitable and up-to-date technologies available in the data sector. From database management systems to cloud infrastructure, we employed best practices to ensure the efficiency, scalability, and security of the data lake.

The journey from conception to execution involved our team not only in designing the system but also in managing the entire development process. Our specialists ensured that the data lake not only met but exceeded the expectations of the financial institution, providing them with a cutting-edge tool for leveraging AI in their operational processes.

Specialists involved in the project

Additional Roles

Data Steward
Security Specialist
Infrastructure Specialist
DevOps Engineer

Key Roles

Data Architect
Data Scientist
Data Engineer
Business Analyst

Implementation Process

Study of the data use cases

First, the team set aside time to consider which use cases for the data were possible beyond the set already required by the company.

Installing hardware infrastructure

Launch of 4 servers for the whole solution (three main units and one for backup).

Installing HDFS on the infrastructure

Installing the Hadoop infrastructure on the three main servers, with the fourth kept as a backup.

Implementing data architecture

Implementing data architecture for 

Transfer of data from existing sources to the HDFS

Some data had to be directly and comprehensively transferred to the Data Lake database.

Developing interfaces for new data input

Preparing input interfaces for HDFS from different working DBs and DWHs.
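
As a rough illustration of what such an input interface might look like, the sketch below lands records from a source system unchanged in a raw zone partitioned by source and ingestion date. It is not the project's implementation: the function names `raw_zone_path` and `land_records`, the directory layout, and the JSON Lines encoding are all assumptions made for this example, and a local directory stands in for HDFS.

```python
import json
from datetime import date
from pathlib import Path


def raw_zone_path(root: Path, source: str, dataset: str, ingest_date: date) -> Path:
    """Build a partitioned raw-zone directory path (hypothetical layout)."""
    return root / "raw" / source / dataset / f"ingest_date={ingest_date.isoformat()}"


def land_records(root: Path, source: str, dataset: str,
                 ingest_date: date, records: list[dict]) -> Path:
    """Write records unchanged, one JSON object per line, and return the file path."""
    target_dir = raw_zone_path(root, source, dataset, ingest_date)
    target_dir.mkdir(parents=True, exist_ok=True)
    # A single fixed file name keeps the sketch short; a real loader would
    # generate a unique name per batch.
    target_file = target_dir / "part-0000.jsonl"
    with target_file.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True) + "\n")
    return target_file
```

Keeping the source name and ingestion date in the path is one common way to let later jobs find and reprocess a single day's load without touching the rest of the lake.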

Data integration within the data lake

Integration of the data that was already inside the lake: removing duplicates and developing a self-control system.
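
A minimal sketch of how duplicate removal and a simple self-check could work is shown below. It assumes records are plain dictionaries and treats two records as duplicates when their content is identical regardless of key order. The names `record_fingerprint`, `deduplicate`, and `has_no_duplicates` are hypothetical, and the check is only one possible reading of the self-control system mentioned above.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Hash the canonical JSON form so identical content yields an identical key."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> tuple[list[dict], int]:
    """Keep the first occurrence of each record; return kept records and drop count."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = record_fingerprint(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique, len(records) - len(unique)


def has_no_duplicates(records: list[dict]) -> bool:
    """Self-check: every record in the collection must have a distinct fingerprint."""
    fingerprints = {record_fingerprint(record) for record in records}
    return len(fingerprints) == len(records)
```

Running `has_no_duplicates` on the output of `deduplicate` gives a cheap verification step that can be repeated after every load.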

Developing the interfaces for data for the users and other client apps

Developing the interfaces for human users as well as for the client apps that were to use the data within the Data Lake.

Developing data organization protocol when data demand occurs

Developing the protocol for organizing data and transferring it from the Data Lake to the target application or human user, according to their needs.
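
One way to picture organizing data only when a demand occurs is a schema-on-read step: the raw records stay untouched in the lake and are shaped to the consumer's request at read time. The sketch below assumes the raw data is stored as JSON Lines. The function name `project_records` and the field names in the usage example are hypothetical.

```python
import json


def project_records(raw_lines: list[str], fields: list[str]) -> list[dict]:
    """Parse raw JSON lines and keep only the requested fields.

    Fields missing from a record are returned as None so that every
    consumer receives rows with the same shape.
    """
    shaped = []
    for line in raw_lines:
        if not line.strip():
            continue  # skip blank lines in the raw file
        record = json.loads(line)
        shaped.append({name: record.get(name) for name in fields})
    return shaped


# Usage: a consumer asks only for the fields it needs.
rows = project_records(
    ['{"id": 1, "amount": 10, "note": "x"}'],
    ["id", "amount", "currency"],
)
```

Because nothing is discarded at write time, a new consumer with different needs can request a different set of fields from the same raw data.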

Security application

Designing and implementing security for all the human users across the company as well as all the apps exploiting the data in the Data Lake.
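
The sketch below illustrates one simple way to express access rules that cover both human users and client apps: a deny-by-default, role-based read check. It is an illustration only and does not describe the security mechanism actually deployed. The `Principal` class, the `READ_POLICY` table, and the role names are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """A human user or a client application (hypothetical model)."""
    name: str
    roles: frozenset = field(default_factory=frozenset)


# Hypothetical mapping from dataset name to the roles allowed to read it.
READ_POLICY = {
    "payment_history": frozenset({"analyst", "forecasting_app"}),
    "biometric_data": frozenset({"security_officer"}),
}


def can_read(principal: Principal, dataset: str) -> bool:
    """Deny by default: unknown datasets or principals with no matching role get no access."""
    allowed_roles = READ_POLICY.get(dataset, frozenset())
    return bool(principal.roles & allowed_roles)
```

Treating apps and people as the same kind of principal keeps a single policy table for every consumer of the lake.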

Solution documentation

Preparing documentation of the processes and components involved in the solution.

Further maintenance of the Data Lake

Adding new data sources as the company grows

Implementing new features concerning the data collected, integrating the system with new apps within the company.


Strict security policy:

In general, banks and banking institutions follow very strict security rules. The data stored in their databases concerns private and corporate funds, hence the security requirements are high and clearly stated. This made the security work on the project complex.

Multiple unintegrated and unstandardized data sources:

The data across the banks was unstandardized and spread out between multiple sources. Learning the existing structure and designing a new one to integrate it fully was troublesome at the beginning.

Integrating Hadoop with banking hardware architecture:

Hadoop, generally speaking, is a complicated but reliable infrastructure for implementing data storage databases, data warehouses, and data lakes. The institution had to allocate common hardware to store the data. Our job was to connect their data sources to the newly established infrastructure, which was more complex than it first seemed.


As a team of data engineers, we took part in the data lake development from start to finish, following some very particular assumptions.

Here are some insights and tips that could improve the development process:
Adam Mata
Data Engineer

In my opinion, it is well worth considering all the possible data sources available within the company when creating the Data Lake. We included data that was not involved in any crucial process, just to have it available for further use when needed. At the end of the project, this gave us a way to develop new analytics features, which turned out to be very resource-saving for the company.

In my experience, it is worth setting aside dedicated time for security once the construction of the Data Lake is finished, so that all areas are covered.

It is always worth thinking a couple of steps ahead while developing, given the complexity of the whole process and project. Some of the structures were not designed generally enough, and as the project proceeded it turned out that some of the data could be stored together within one relation/instance.

Do not underestimate communication within the team: it turned out to be crucial in planning some steps during development. The data storage problem was a result of a lack of communication between data scientists and developers.

Project in numbers


data scientists / developers


sources integrated


cutting-edge Data Lake


technologies & programming languages
