WDC-Platform General Description
WDC-Platform Components
Architecture
Visualization
The WDC-Platform implements the concept of alternative out-of-the-box visualization, with built-in support for Apache Zeppelin, Splunk Search Head, and its own environment, EVA.
EVA is an interactive visualization environment developed by ISGNEURO. It is unique in that it combines imperative and declarative approaches to describing the user interface, allowing users to implement the logic of complex interactive dashboards without any programming.
EVA is designed for preliminary interactive data research and is also the simplest tool for creating high-level final dashboards.
EVA consists of:
- A data analysis tool with the option to export analysis results as ready-to-use dashboards.
- A dashboard management environment.
Apache Zeppelin is a popular open-source Apache solution that provides a single interface for visualizing the results of data analysis performed by various tools.
Modeling
The modeling functionality in the WDC-Platform is provided by two specialized data processing languages: OTL and SMaLL.
OTL is a high-level data processing language. It was originally designed on the basis of Splunk's SPL language and is backward compatible with SPL for most of its features.
The OTL language is a highly efficient solution for a number of processing tasks, especially the analysis of machine-generated logs.
The main functions of the language are:
- full-text search and filtering on all message content;
- selection and search by specific fields;
- aggregation;
- union;
- data enrichment from external or internal sources;
- calculation of statistical indicators;
- mathematical data transformations;
- data preparation for specific types of visualization;
- training and applying machine learning models from a wide range of open-source libraries.
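As an illustration, several of these functions can be combined in a single pipeline. The query below is a hypothetical example written in SPL-compatible syntax; the index name, field names, and values are invented for illustration and do not refer to a real deployment:

```
search index=web_logs status>=500
| stats count by host
| sort -count
```

Here the search step filters error events by a selected field, the aggregation step counts them per host, and the result is ordered for visualization.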
SMaLL (Simple Machine Learning Language) is a language that extends the capabilities of OTL and allows users to use all the features of the WDC-Platform without knowledge of machine learning techniques or data engineering.
The main idea of SMaLL is to reduce the solution of most analytical problems to three standard, fixed command pipeline sequences, or patterns:
- Model training pattern (GetData | Fit | Explain | Score | Show).
- Model application pattern (GetData | Apply | Score | Show).
- Directories pattern (GetData | Eval | Put).
This approach hides routine data processing details from end users and lets them concentrate on achieving the final result.
A unique feature of SMaLL is the obligatory Explain command, which interprets the machine learning model and presents it in a human-understandable form. This makes it possible not only to understand and adjust the resulting model, but also to bring in expert knowledge that is unavailable in a particular training dataset.
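As a sketch, the model training pattern could look like the following query; the dataset name, model type, and parameters are hypothetical and serve only to illustrate the fixed GetData | Fit | Explain | Score | Show sequence:

```
GetData sales_history
| Fit regression target=revenue
| Explain
| Score
| Show
```

The obligatory Explain step renders the fitted model in human-readable form before the quality scoring and display steps.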
Data Processing
WDC-Platform computing capabilities are based on the Apache Spark engine, which provides an efficient parallel computing implementation and fast access to information distributed across different storage points. Beyond direct computing, Apache Spark also allows users to extend system capabilities through the constantly evolving set of machine learning libraries created by the product community.
The SMaLL and OTL languages are add-ons to Apache Spark functionality and significantly lower the entry barrier to using the capabilities of this tool.
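The lowering of a pipe-separated query onto a sequence of dataframe-style transformations can be sketched in plain Python. This is a conceptual illustration only, not WDC-Platform source code: the command names, the toy row format (a list of dicts), and the sample data are assumptions, standing in for the Spark transformations the platform actually generates.

```python
from functools import reduce

def search(predicate):
    """Filter rows, like an OTL search/filter step."""
    return lambda rows: [r for r in rows if predicate(r)]

def stats_count_by(field):
    """Aggregate: count rows per value of `field`."""
    def run(rows):
        counts = {}
        for r in rows:
            counts[r[field]] = counts.get(r[field], 0) + 1
        return [{field: k, "count": v} for k, v in counts.items()]
    return run

def pipeline(*commands):
    """Compose commands left to right, mirroring `cmd1 | cmd2 | ...`."""
    return lambda rows: reduce(lambda acc, cmd: cmd(acc), commands, rows)

# Toy machine-generated log records (invented for the example).
logs = [
    {"host": "web1", "status": 200},
    {"host": "web1", "status": 500},
    {"host": "web2", "status": 500},
]

# Equivalent of: search status>=500 | stats count by host
query = pipeline(search(lambda r: r["status"] >= 500),
                 stats_count_by("host"))
print(query(logs))  # [{'host': 'web1', 'count': 1}, {'host': 'web2', 'count': 1}]
```

Each pipe stage is a pure function over the row set, which is what makes such a query language straightforward to map onto Spark's lazily composed transformations.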
Data Storage
The No DataBase concept is the main idea behind data storage organization in the WDC-Platform. Instead of building the system on the basis of one or several databases specialized for particular tasks, it is implemented as an efficient distributed store that provides the required level of performance, reliability, and scalability. On top of it, a hybrid database is implemented, based on precomputing all the index and aggregation structures required for effective work with the specific data. This allows users to inherit the advantages of all efficient storage structures while minimizing the impact of their disadvantages.
The current storage subsystem implementation uses the GlusterFS distributed file system, and the Apache Parquet format is used as the optimal format for storing data in a hybrid database.
This approach guarantees high horizontal scalability as the volume of stored data grows, as well as high reliability of the final system.
If it becomes necessary to process especially large volumes of historical data (> 0.5 PB), GlusterFS can be transparently replaced with the HDFS file system (Apache Hadoop).
Data Loading
Loading data from different sources is handled by Apache NiFi. With its help, incoming data is pre-processed and all the indices and aggregates required for effective operation are calculated; the results are then distributed to the storage devices.
Installation and Maintenance
The WDC-Platform distribution kit is delivered as a virtual machine in OVF format. The installation and launch process is described in detail in the corresponding section of the full documentation.
By installing WDC-Platform, the user accepts the License Agreement.
Maintenance of the WDC-Platform software is carried out by its developer, the ISGNEURO company.
Updating individual components of the WDC-Platform outside a general update of the entire distribution kit is not supported, as it may disrupt the operation of the whole set due to possible customization of these components during development.
Entire distribution updates and individual critical fixes are distributed only by the developer. If the user self-updates individual components, the modified installation continues to be supported by the developer, but under a special SLA that does not imply intervention in the parts of the System changed by the user.
The developer does not have hidden remote access to the installations of the Platform.
Development and deployment of applications based on the WDC-Platform
The WDC-Platform supports complete application development, with the description of data collection, processing, and visualization kept in one place. Applications include:
- data source descriptions unique to the application, and rules for extracting fields from them;
- the set of search queries used in the application;
- visual components: dashboards, reports, charts, alerts, enrichment rules;
- meta-information, such as the application name and version, visual style, icons, and a list of dependencies (other applications, external libraries).
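For illustration, the meta-information could look like the following manifest sketch. The file layout, field names, and values here are assumptions made for the example, not the platform's actual packaging format:

```json
{
  "name": "network-monitoring",
  "version": "1.2.0",
  "style": "dark",
  "icon": "icons/network.svg",
  "dependencies": {
    "applications": ["base-dashboards"],
    "libraries": ["scikit-learn"]
  }
}
```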