IBM® Big SQL is a data warehouse system for Hadoop that you use to summarize, query, and
analyze data. Big SQL provides SQL access to data that is stored in InfoSphere®
BigInsights™ by using JDBC or ODBC connections.
Big SQL provides broad SQL support that is typical of commercial databases. You can issue
JDBC queries or ODBC queries to access data that is stored in InfoSphere BigInsights, in the
same way that you access databases from your enterprise applications.
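As a minimal sketch of what such a client connection involves, the snippet below assembles a keyword=value connection string of the kind used by DB2-family ODBC/CLI drivers, which standard Big SQL clients accept. The hostname, port, database name, and credentials are placeholders, not values from this document.

```python
# Sketch: building a DB2-style CLI/ODBC connection string for a Big SQL head.
# All values (host, port, database, user, password) are hypothetical placeholders.

def bigsql_dsn(host, port, database, user, password):
    """Assemble a keyword=value connection string of the kind accepted by
    DB2-family ODBC/CLI drivers and client packages."""
    return (
        f"DATABASE={database};"
        f"HOSTNAME={host};"
        f"PORT={port};"
        f"PROTOCOL=TCPIP;"
        f"UID={user};"
        f"PWD={password};"
    )

dsn = bigsql_dsn("head-node.example.com", 51000, "bigsql", "bi_user", "secret")
print(dsn)

# A real client would then open the connection with a driver package,
# for example the ibm_db package:
#   import ibm_db
#   conn = ibm_db.connect(dsn, "", "")
```

From that point on, queries are issued exactly as they would be against any other relational database reachable over JDBC or ODBC.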
You can use Big SQL to issue queries of any size, including concurrent queries. Big SQL
supports both large ad hoc queries, which use MapReduce parallelism, and point queries,
which are low-latency queries that return information quickly, reducing response time and
improving access to data.
The Big SQL server is multithreaded, so scalability is limited only by the performance and
the number of cores of the computer that runs the server. To handle larger queries, you can
increase the hardware capacity of the server computer that Big SQL runs on, or chain
multiple Big SQL servers together to increase throughput.
Big SQL head
The set of Big SQL processes that accept SQL query requests from applications and
coordinate with Big SQL workers to process data and compute the results.
Big SQL head node
The physical or virtual machine (node) on which the Big SQL head runs.
Big SQL worker
The set of Big SQL processes that communicate with the Big SQL head to access data
and compute query results. Big SQL workers are normally collocated with the HDFS
DataNodes to facilitate local disk access. Big SQL can access and process HDFS data
in most common Hadoop formats, such as Avro, Parquet, ORC, and Sequence. The Big
SQL head coordinates the processing of SQL queries with the workers, which handle
most of the HDFS data access and processing.
Big SQL worker node
The physical or virtual machine (node) on which the Big SQL worker runs.
Big SQL scheduler
A process that runs on the Big SQL head node. The scheduler's function is to bridge the
RDBMS domain and the Hadoop domain. The scheduler communicates with the Hive
metastore to determine Hive table properties and schemas, and with the HDFS
NameNode to determine the location of file blocks. The scheduler responds to Big SQL head and
worker requests for information about Hadoop data, including HDFS, HBase, object
storage, and Hive metadata.
Big SQL metadata
HDFS data properties such as name, location, format, and the desired relational schemas.
This metadata, gathered through the scheduler, is used to enable consistent and optimal
SQL processing of queries against HDFS data.
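This metadata originates when a table is defined over HDFS data: the definition records the table name, column schema, storage format, and HDFS location in the Hive metastore, where the scheduler can later read them. The snippet below sketches a DDL statement in the style Big SQL uses for Hadoop tables; the table name, columns, and HDFS path are hypothetical placeholders.

```python
# Sketch: the kind of table definition whose properties (name, location,
# format, relational schema) end up in the Hive metastore and are later
# read by the Big SQL scheduler. All names and paths are placeholders.

table_name = "sales"
columns = {"order_id": "INT", "amount": "DECIMAL(10,2)", "order_date": "DATE"}
location = "/user/bigsql/warehouse/sales"  # HDFS directory (placeholder)

# Render the column list as "name TYPE" pairs, preserving declaration order.
col_list = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())

ddl = (
    f"CREATE HADOOP TABLE {table_name} ({col_list}) "
    f"STORED AS PARQUET "
    f"LOCATION '{location}'"
)
print(ddl)
```

Once such a table exists, queries submitted to the Big SQL head can be planned against the stored schema and dispatched to workers that read the Parquet files directly from HDFS.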