Data Preparation

The large datasets that point clouds constitute cannot fit directly in main memory, which demands a structure that can be exploited through a Database Management System (DBMS). As data heterogeneity rises, existing DBMSs still rely on a limited number of data models to manage efficiently the variability and redundancy of the observations. Handling these massive unstructured datasets (heterogeneous and drawn from different sources) efficiently demands high scalability, speed (when data must be processed or mined in near real time) and computational adaptation (cloud computing) to answer specific needs. This relates to Big Data problematics: how to efficiently process large semi-structured and unstructured datasets. While exposing the limitations of existing data mining methods, Big Data mining techniques introduce new challenges of their own, owing to the high volume and heterogeneity of the datasets. Finding hidden patterns and information for knowledge discovery requires complex multimodal systems.

The limited dimensionality support in geographic information systems (GIS), restricted to k-dimensional data with k ≤ 3, struggles to index higher dimensionalities. Data interaction therefore needs flexibility and scalability across different tasks: processing, data management and visualisation. To address these challenges, spatial indexing and storage are essential. The solution should scale up to multiple servers and be optimised for sequential and parallel disk access, or for CPU/GPU-intensive tasks. Relational Database Management Systems (RDBMS) and NoSQL DBMSs exist for such applications, but several identified problems remain.
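To make the out-of-core constraint concrete, the sketch below streams a point cloud from disk in fixed-size chunks and builds a per-chunk spatial index, rather than loading the whole cloud into memory. It is a minimal illustration only: the file layout, chunk size and query point are assumptions for this example, and a production pipeline would delegate the indexing and storage to a DBMS so that servers, chunks and queries can scale independently.

# Minimal sketch of out-of-core point cloud handling, assuming an
# uncompressed binary file of float64 (x, y, z) records. The file name,
# chunk size and query coordinates below are illustrative placeholders.
import numpy as np
from scipy.spatial import cKDTree

POINT_DTYPE = np.dtype([("x", "f8"), ("y", "f8"), ("z", "f8")])
CHUNK_SIZE = 1_000_000  # points per chunk, tuned to available RAM

def nearest_neighbour(path, query, chunk_size=CHUNK_SIZE):
    """Scan the file chunk by chunk, index each chunk with a k-d tree,
    and keep the globally nearest point to `query`."""
    best_dist, best_point = np.inf, None
    points = np.memmap(path, dtype=POINT_DTYPE, mode="r")  # no full load
    for start in range(0, len(points), chunk_size):
        stop = start + chunk_size
        chunk = np.stack(
            [points["x"][start:stop],
             points["y"][start:stop],
             points["z"][start:stop]], axis=1)
        tree = cKDTree(chunk)          # spatial index for this chunk only
        dist, idx = tree.query(query)  # nearest neighbour within the chunk
        if dist < best_dist:
            best_dist, best_point = dist, chunk[idx]
    return best_dist, best_point

# Hypothetical usage:
# d, p = nearest_neighbour("scan.bin", np.array([12.0, 4.5, 1.8]))

The design choice is the usual trade-off of chunked processing: memory use stays bounded by the chunk size, at the cost of rebuilding the index for every pass, which is exactly the overhead a persistent, server-side spatial index is meant to remove.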
