Big Data and Real Time data processing

Areas of application
- Banking and Insurance
- Public administration
- Industry and Energy
- Smart Cities
- Internet of Things
- Massive data processing
- Fraud and Risk
Capture and process huge amounts of information, both in batches and in real time.
Ensure the quality and adequacy of the information, avoiding losses or duplicates.
Respect security requirements in integration with other systems, storage and confidentiality.
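One common way to meet the "no losses or duplicates" requirement above is idempotent ingestion keyed on a unique message identifier. The sketch below is a framework-agnostic illustration, not the project's actual code; the `event_id` field and the in-memory store are assumptions made for the example.

```python
# Idempotent ingestion sketch: each record carries a unique identifier,
# and records already seen are skipped, so that a redelivery (e.g. after
# a producer retry) does not create duplicates downstream.
def ingest(records, store, seen):
    accepted = 0
    for record in records:
        event_id = record["event_id"]  # assumed unique per event
        if event_id in seen:
            continue  # duplicate delivery: ignore it
        store.append(record)
        seen.add(event_id)
        accepted += 1
    return accepted

store, seen = [], set()
batch = [{"event_id": 1, "value": "a"}, {"event_id": 2, "value": "b"}]
ingest(batch, store, seen)  # first delivery: both records stored
ingest(batch, store, seen)  # redelivery of the same batch: both skipped
```

In a real deployment the `seen` set would live in a durable store (or be replaced by Kafka's own idempotent-producer guarantees), but the dedup-by-key logic is the same.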
We use the Spark suite as appropriate and rely on Kafka to ensure no information is lost. In addition, we aim to make solutions scalable and real-time whenever possible.
The environment can grow by including new sources and capabilities in the analysis layer, generating valuable information through the implementation of models and algorithms.
A big data architecture with horizontal scaling capabilities aimed at managing both real-time and batch information, based on Spark as a core element of the project.
The solution is based on the Spark suite, applied as appropriate to each case: Spark Core for batch processing, Spark Streaming for real-time processing, and Spark SQL to connect with other applications and explore the data. We rely on Kafka to ensure that no information is lost, receiving the packages and distributing them according to the relevant criteria. We try to make the solutions scalable, where possible in real time and with hot swapping; an example of this is the assembly of the Kafka cluster with ZooKeeper, distributed across a set of nodes that can be expanded at any time with new nodes to scale the service.
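The "no loss of information" guarantee described above rests on an at-least-once delivery pattern: the producer resends each message until the broker acknowledges it, and the broker routes messages to partitions by key. The following is a plain-Python sketch of that pattern, not Kafka's actual API; the `Broker` class and its method names are illustrative assumptions.

```python
# At-least-once delivery sketch: resend until acknowledged, and route
# each message to a partition by hashing its key (the same idea as
# Kafka's default key-based partitioning).
class Broker:
    def __init__(self, partitions):
        self.partitions = [[] for _ in range(partitions)]

    def send(self, key, value):
        # Route by key so all messages for one key land in one partition.
        idx = hash(key) % len(self.partitions)
        self.partitions[idx].append(value)
        return True  # acknowledgement returned to the producer

def produce(broker, key, value, retries=3):
    for _ in range(retries):
        if broker.send(key, value):  # resend until acked
            return True
    return False  # delivery failed after exhausting retries

broker = Broker(partitions=4)
produce(broker, key="sensor-1", value={"t": 21.5})
```

Combined with the deduplication on the consumer side, retried sends cannot lose data and cannot create duplicates in the final store.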
The result of the project is the implementation of a big data architecture with horizontal scaling capabilities, aimed at managing both real-time and batch-based mass information, with Spark as a core element. The architecture is also scalable in terms of the number of machines and incorporates capabilities that add security and integrity to the data processed, which matters when working in environments where loss of information is not acceptable. From this point on, the environment can grow, incorporating new sources as well as capabilities in the data analysis layer, facilitating the work of the data science team and generating valuable information through the implementation of models and algorithms.
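The horizontal scaling described here amounts to spreading partitions over the current set of nodes and re-running the assignment when a node is added. The round-robin strategy below is an illustrative simplification of what a cluster controller does, not the project's actual rebalancing logic.

```python
# Horizontal scaling sketch: partitions are assigned over the current
# node list, and adding a node simply re-runs the assignment, so the
# cluster grows without reconfiguring producers or consumers.
def assign(partitions, nodes):
    # Round-robin partition-to-node assignment (simplified).
    return {p: nodes[p % len(nodes)] for p in range(partitions)}

nodes = ["node-1", "node-2"]
layout = assign(8, nodes)   # 8 partitions spread over 2 nodes

nodes.append("node-3")      # scale out with a new node
layout = assign(8, nodes)   # rebalanced over 3 nodes
```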