
Splunk DB Connect 1.0.9
Deploy and Use Splunk DB Connect
Generated: 5/16/2013 5:58 pm

Copyright 2013 Splunk, Inc. All Rights Reserved

Table of Contents

Introduction
    About Splunk DB Connect
    How Splunk DB Connect fits into the Splunk picture
    How to get support and find out more information about Splunk
Before you deploy
    Deployment requirements
    Release notes
    Security and access controls
    Architecture and performance considerations
Install Splunk DB Connect
    Install and configure Splunk DB Connect
    Install required database JDBC drivers for MySQL and Oracle if needed
    Define and add a new/custom database to DB Connect
Configure and use Splunk DB Connect
    Add or manage a database connection
    Configure database inputs
    Set up a database lookup table
    Use the Splunk DB Connect search commands
    Troubleshoot Splunk DB Connect
Configuration file reference
    database.conf.spec
    database_types.conf.spec
    dblookup.conf.spec
    dboutput.conf.spec
    java.conf.spec
    inputs.conf.spec

Introduction
About Splunk DB Connect
This app enables you to enrich and combine machine data with database data. Splunk DB Connect enables you to easily configure database queries and lookups in minutes via the Splunk user interface. Quickly deploy Splunk for real-time collection, indexing, analysis and visualizations of machine data and then import and index data already stored in your database for additional analytic insight. Furthermore, database lookups enable you to reference fields in an external database that match fields in your event data. Using this match, you can enrich your event data by adding more meaningful information and searchable fields to them.

How Splunk DB Connect fits into the Splunk picture


Splunk DB Connect is one of a variety of apps and add-ons available within the Splunk ecosystem. All Splunk apps and add-ons run on top of a core Splunk installation, so you'll be installing Splunk first, and then installing Splunk DB Connect. For specifics about what you'll install where, see "Install Splunk DB Connect" in this manual. For details about apps and add-ons, refer to "What are apps and add-ons?" in the Splunk Admin Manual. To download Splunk, visit the download page on splunk.com. To get more apps and add-ons, visit Splunkbase.

How to get support and find out more information about Splunk
Splunk DB Connect version 1.0.8 and later is officially Splunk Supported.

To file a support case about Splunk DB Connect, send email to support@splunk.com or use the Support Portal.

Find more information about Splunk


You have a variety of options for finding more information about Splunk:

- The core Splunk platform documentation
- Splunk Answers
- The #splunk IRC channel on EFNET

Before you deploy


Deployment requirements
What databases are supported?
Splunk tests and supports connecting to the following databases:

- Oracle Database
- Microsoft SQL Server
- MySQL

Additionally, though unsupported, the following databases can be connected to out of the box:

- Sybase
- PostgreSQL
- SQLite
- H2
- HyperSQL
- Generic ODBC support

You can also add your own database types by providing JDBC drivers.

What versions of Splunk are supported?


Splunk 4.3 or later.

What operating systems are supported?


Splunk DB Connect runs on the Splunk-supported versions of the following:

- Linux
- Mac OS X
- Windows Server 2003/2008 R2
- Windows XP/7 (for development/testing purposes)

Additional software requirements


The following is required before deploying Splunk DB Connect:

- A Java Runtime Environment (JRE), version 1.6 or above: http://www.oracle.com/technetwork/java/javase/downloads/index.html

Note: Do not use a HotSpot (client mode only) JVM.

Deployment Checklist
For a quick install, you'll want to have:

- The Splunk DB Connect bits
- Connectivity to the database (machine to query, path through network, etc.)
- Credentials to access the data in the database
- Required queries or schema
- JDBC drivers for MySQL and Oracle, and anything else listed on the Splunk DB Connect Splunkbase details page

Release notes
This topic contains listings of known issues and resolved issues for this release.

Known issues
The following issues have been reported in this release of Splunk DB Connect:

- Manual use of local=1 is required for lookups in a distributed environment. Refer to "Architecture and performance considerations" in this manual for details. (DBX-92)
- Lookups fail when a table column name contains spaces. The workaround is to use an advanced database lookup with custom SQL such as "SELECT [the field] AS the_field". (DBX-100)
- DB Connect does not work with Splunk search head pooling; the JavaBridge doesn't start. (DBX-14)
- In the dbquery view, the syntax-highlighted editor doesn't work on IE 7 and 8. The workaround is to disable syntax highlighting in the UI. (DBX-23)
- CSS for autocomplete in Splunk Manager displays incorrectly in IE 7 and 8. (DBX-23)

Changelog
The following issues have been resolved in this release of Splunk DB Connect:

- Permissions for the dbquery and dbinfo commands could not be modified from the UI. Workaround: move the [commands] stanza from default.meta into local.meta. (DBX-105)
- A race condition in the Fill All Columns button in the lookups manager UI could result in a "%s/%s" popup message, which goes away after a few moments. (DBX-90)
- Lookups only grabbed the first match in the database, even when the lookup should return multiple rows. This is now configurable in dblookup.conf. (DBX-102)
- Conflicting dblookup.conf stanzas caused a "Script for lookup table <lookup name> returned error code 1. Results may be incorrect" error. (DBX-119)
- Postgres issue with out-of-memory (OOM) errors in the Java heap space. (DBX-127)
- Passwords were not automatically encrypted if they were not entered in the UI. (DBX-120)
- The Fetch Database Names button logged the password in web access logs. (DBX-125)
- Could not connect to an Access MDB via ODBC; validation failed. (DBX-117)
- dbquery did not return the columns in the correct order. (DBX-130)

Security and access controls


This topic describes how Splunk DB Connect handles credentials for database access.

Access to Database Connections


In Splunk DB Connect versions 1.0.8 and earlier, database connection objects cannot be restricted to a particular role. When you create a database connection, the credentials you supply are implicitly used by every user that has access to dbquery, dblookup, or any other command that uses the connection. For instance, dbquery myConnection "SELECT * FROM Audit_Table" will not check whether the executing user has rights to the myConnection object. You can, however, limit which roles have access to the dbquery command. By default, only admins have access to the dbquery, dblookup, and dboutput commands. Make sure you use a database account with appropriately limited permissions.

The recommended way to work with databases securely (both read-only and read-write) is to limit the permissions of the database user specified in the database connection to the minimum necessary to fulfil its tasks; that is, the user should only have read access (SELECT) to the required tables/views. In the case of dboutput, the user should be granted limited write access as well (INSERT, UPDATE). This configuration is done on the DBMS side, so describing the necessary steps for each DBMS type is out of scope for these docs. An additional mitigation is to configure the database connection as read-only. Finally, you could also maintain separate search heads for different levels of user access to database connections.

The config for default permissions is found in $SPLUNK_HOME/etc/apps/dbx/metadata/default.meta:

[]
access = read : [ admin ], write : [ admin ]

### Manager ###
[manager]
access = read : [ * ], write : [ admin ]
export = system

[manager/databases]
access = read : [ admin ], write : [ admin ]
export = system

[manager/dbmon]
access = read : [ admin ], write : [ admin ]
export = system

[manager/dblookups]
access = read : [ admin ], write : [ admin ]
export = system

### Commands ###
[commands]
access = read : [ admin ], write : [ admin ]
export = system

[commands/dbquery]
access = read : [ admin ], write : [ admin ]
export = system

[commands/dbinput]
access = read : [ admin ], write : [ admin ]
export = system

[commands/dbinfo]
access = read : [ admin ], write : [ admin ]
export = system

[commands/dbmonpreview]
access = read : [ admin ], write : [ admin ]
export = none

### Other settings ###
[inputs/dbmon-*]
access = read : [ admin ], write : [ admin ]

[savedsearches]
access = read : [ admin ], write : [ admin ]
export = none

[props]
access = read : [ * ], write : [ admin, power ]
export = system

[transforms]
access = read : [ * ], write : [ admin, power ]
export = system

[eventtypes]
access = read : [ * ], write : [ admin, power ]
export = system

[lookups]
export = system

[searchscripts]
access = read : [ * ], write : [ admin ]
export = system

Read-Only Connections
When a database connection is configured to be read-only (this can be done through the Manager UI), DB Connect sets the JDBC connection flag "read-only" to true. This is done with the following Java method: http://docs.oracle.com/javase/6/docs/api/java/sql/Connection.html#setReadOnly(boolean)

It is up to each JDBC driver to implement this setting when establishing the connection. In addition, dbquery uses the JDBC method executeQuery(String) instead of execute(String), which is designed to only execute queries (as opposed to updates). With this method, most JDBC drivers do the appropriate thing to validate statements before they are actually sent to the database server. Finally, the dboutput command performs an additional check to see whether the database connection is configured as read-only before submitting the statement to the JDBC driver for execution.

Granting non-admin user access to dbquery


As an admin, you may build dashboards that use dbquery and want to share those dashboards with non-admin users. Because of a known issue, DB Connect version 1.0.8 and earlier prevents you from enabling access to dbquery from the UI. For a workaround, see: http://splunk-base.splunk.com/answers/76006/dbquery-command-permissions

Keep in mind that enabling this access allows users to run ad-hoc queries; moreover, if they know the name of a database connection, they can issue ad-hoc queries against that database. See the suggested mitigations in the "Access to Database Connections" section above.

Architecture and performance considerations


If you have a trial or personal Splunk deployment running on a single host (indexer and Splunk Web both running on the same system), you can install Splunk DB Connect on this system. To use Splunk DB Connect for reporting or database lookups in a distributed search environment, you must install it on a search head. Note: In a distributed search environment, you must force the lookup to be performed on the search head where Splunk DB Connect is installed. To force the lookup to be performed locally, add local=1 after the lookup command. Example:
index=test | lookup local=1 mysql_table ip_address as clientip OUTPUT host | table clientip, host

This is not currently possible when using automatic lookups. For more information about automatic lookups, refer to the core Splunk platform documentation.

For database inputs, depending on the anticipated volume of your deployment, there are three options:

- Small scale: Install Splunk DB Connect on a search head for monitoring and configure it to forward events to the indexer(s).
- Medium scale: Use a dedicated Splunk heavy forwarder to perform monitoring and forward events to the indexer(s).
- Large scale: Use multiple dedicated Splunk forwarders and partition the monitors among them.

Search head pooling and Splunk DB Connect


Splunk DB Connect does not currently support Splunk's search head pooling functionality. To run DB Connect, you must install it on a standalone Splunk instance or on a separate search head that is not a member of a pool.

Performance considerations
Because Splunk DB Connect queries your database, there is a possibility that your queries may impact that database's performance. In particular, if the initial run of your query retrieves a lot of data, this may affect the performance of your database. Subsequent runs of the query should generally be less impactful, as they only retrieve data that is new since the previous run. To mitigate this, you can set the tail.follow.only directive, which is only exposed in inputs.conf.

Lookups generate multiple SELECTs that should be within the expected workload for a database and should not affect performance. Splunk DB Connect executes a separate SELECT statement for each unique combination of input fields. This may happen more than once per search, because the search preview function in Splunk may invoke the lookup multiple times during execution of a search for parts of the results. Splunk does not cache the results between invocations of the lookup.
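As a sketch of where that directive lives (the stanza name and the boolean value syntax here are illustrative; check inputs.conf.spec later in this manual for the exact form), a tail input could be told to skip the potentially large initial fetch like this:

```
[dbmon-tail://myConnection/my_input]
# Illustrative: only follow rows that arrive after the input starts,
# instead of fetching the existing table contents on the first run.
tail.follow.only = true
```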

Install Splunk DB Connect


Install and configure Splunk DB Connect
This procedure describes how to install and configure Splunk DB Connect. It assumes that you have an existing Splunk instance to use as the underlying platform. For information on installing Splunk, refer to "Before you install" in the core Splunk platform documentation.

Install Splunk DB Connect


The easiest way to install Splunk DB Connect is to use Splunk Manager. To do this:

1. Download Splunk DB Connect from Splunkbase and save it to a location accessible from your Splunk instance.
2. Log into your Splunk instance, navigate to Manager > Apps and click Install app from file.
3. Select the app package dbx-....tar.gz and upload it.
4. When the upload is complete, follow the instructions to restart Splunk.

Upgrade from a previous version

Upgrading from an earlier version of Splunk DB Connect is similar to installing it from scratch:

1. Download the latest Splunk DB Connect from Splunkbase.
2. Log into your Splunk instance, navigate to Manager > Apps and click Install app from file.
3. Select the app package and check the box to upgrade it.
4. When the upgrade is complete, follow the instructions to restart Splunk.


Configure Splunk DB Connect


UI Setup

To complete the app setup from the UI, navigate to the app homepage. You will be presented with a setup page with some pre-populated values. These default values should be appropriate for most use cases. Note: You must click the Save button at least once, because this enables the JBridge server (you can verify this by checking that the scripted input jbridge_server.py is enabled).

Command Line Setup

As an alternative, you can set up DB Connect by hand without the setup page. Here are the steps to do so:

1. Create $SPLUNK_HOME/etc/apps/dbx/local/app.conf

[install] is_configured = 1

2. Create $SPLUNK_HOME/etc/apps/dbx/local/java.conf

[java] home = <JAVA_HOME path here>

Note: JAVA_HOME is usually the path up to, but not including, the bin directory where the java binary lives.

3. Enable the Java Bridge server (scripted input) in
$SPLUNK_HOME/etc/apps/dbx/local/inputs.conf

[script://$SPLUNK_HOME/etc/apps/dbx/bin/jbridge_server.py] disabled = 0

4. Create the sink for database inputs in $SPLUNK_HOME/etc/apps/dbx/local/inputs.conf

[batch://$SPLUNK_HOME/var/spool/dbmon/*.dbmonevt]
crcSalt = <SOURCE>
disabled = 0
move_policy = sinkhole
sourcetype = dbmon:spool

5. Restart Splunk.

Advanced Setup Options

The following information can be used in conjunction with the config files to make advanced configuration changes.
Java

- Java installation directory (JAVA_HOME): This must be the directory of your Java JRE (Runtime Environment). The default value is retrieved from your JAVA_HOME environment variable (if set properly).
- Java command line options: These command line parameters are used when starting your Java instance. You can specify maximum memory usage, localization, and default file encoding. Important: An incorrectly formatted value in this field can mean that the extension will not start correctly.
Java Bridge Server

- Address: The IP address of your Java Bridge Server (this will typically be 127.0.0.1, that is, localhost).
- Port: The port of your Java Bridge Server. The default is 17865. Important: There must not be any firewall rules active for this port.
- Threads: The number of listening threads that work on Java Bridge commands. Note: Too many or too few threads could lead to slow performance of the Java Bridge service.
- Turn on debugging: When enabled, the Java Bridge logs debug information to jbridge_client.log. Important: Enabling debugging has a negative impact on Splunk DB Connect's performance; do not use this in a production environment.


Logging configuration

- Log level: Logging severity for Splunk DB Connect.
- Logfile: The name of the Splunk DB Connect logfile, located in $SPLUNK_HOME/var/log/splunk, which contains all logs of the Java Bridge server.
Configuration adapter

- Adapter type
- Enable configuration caching


Database Connection Handling

- Factory type
- Enable connection pooling
- Cache database and table metadata
- Preload database configuration
Database Inputs

- Scheduler threads
- Output type
- Default timestamp output format


Database Lookups

- Enable caching of database lookup definitions
- Cache invalidation timeout


Persistence

Global Store type

Install required database JDBC drivers for MySQL and Oracle if needed
Most of the databases in the list in "About Splunk DB Connect" are preconfigured in DB Connect and only require that you add a database connection and define inputs for that database.


However, if you're planning to connect a MySQL or Oracle database to Splunk using Splunk DB Connect, you must download and install the relevant JDBC drivers:

MySQL

Download the connector from http://dev.mysql.com/downloads/connector/j/. The archive contains the JDBC driver:
mysql-connector-java-*-bin.jar

Note: Use version 5.1.7 or newer.

Oracle

Download ojdbc6.jar from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-111060-084321.

Install the drivers


Once you've downloaded the drivers, copy them into $SPLUNK_HOME/etc/apps/dbx/bin/lib and then restart Splunk.

Adding other databases that aren't in the list?


If you want to add a custom database that is not in the list in "About Splunk DB Connect", refer to "Define and add a new/custom database" for instructions.

Define and add a new/custom database to DB Connect


In addition to the databases predefined and supported in the shipping Splunk DB Connect package (see the list in "About Splunk DB Connect"), you can add connection support for any custom-defined database that has JDBC drivers. Note: At a minimum, Splunk DB Connect supports querying custom-defined database connections. For some custom database connections, features related to the generation of queries may not work. Additionally, depending on the JDBC driver's implementation of database metadata, the dbinfo search command may not work.


Download and install the relevant JDBC driver


Before you can complete the process of adding a new database connection that isn't already included in the shipping Splunk DB Connect package, you must first download that database's Java Database Connectivity (JDBC) driver and copy the .jar file to $SPLUNK_HOME/etc/apps/dbx/bin/lib.

Add the custom database to database_types.conf


If you're adding a new database connection (one not in the list in "About Splunk DB Connect"), you must define it by creating a stanza for it in a copy of database_types.conf. Important: Do not edit this file in $SPLUNK_HOME/etc/apps/dbx/default; instead, create and then edit a copy of it in $SPLUNK_HOME/etc/apps/dbx/local. For more information about precedence and Splunk configuration files, refer to "About configuration files" in the core Splunk platform documentation.

Example new database stanza in database_types.conf:

[postgresql]
displayName = PostgreSQL
jdbcDriverClass = org.postgresql.Driver
defaultPort = 5432
connectionUrlFormat = jdbc:postgresql://{0}:{1}/{2}
testQuery = SELECT 1 AS test
defaultCatalogName = postgres
defaultSchema = public
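The connectionUrlFormat setting uses MessageFormat-style placeholders that DB Connect fills in from the connection settings; judging from the predefined database types, {0} is the host, {1} the port, and {2} the database name. A minimal sketch of that substitution (the helper function name is hypothetical, for illustration only):

```python
def build_jdbc_url(fmt, host, port, database):
    """Hypothetical illustration of how a connectionUrlFormat template
    expands: {0}=host, {1}=port, {2}=database."""
    return (fmt.replace("{0}", host)
               .replace("{1}", str(port))
               .replace("{2}", database))

# For the PostgreSQL stanza above:
# build_jdbc_url("jdbc:postgresql://{0}:{1}/{2}", "dbhost", 5432, "postgres")
# → "jdbc:postgresql://dbhost:5432/postgres"
```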

Connection Validation
Every time a connection is reused in the pool, DB Connect tries to validate that the database connection is actually working. If validation fails, you will probably see an error message like "ValidateObject failed". There are two ways DB Connect tries to validate a connection:

- If a testQuery is specified in database_types.conf, DB Connect executes that query, and a response validates that the database connection is working.
- If testQuery is not specified, DB Connect tries to use the Java method connection.isValid() and relies on the JDBC driver to answer. Some JDBC drivers do not implement this API call (it appears Derby is built against Java 1.5 source, where JDBC doesn't have the isValid method). The workaround is to specify a manual testQuery. The simplest one is SELECT 1.

Note: As of 1.0.9, you can disable connection validation by setting validationDisabled=true in database_types.conf.
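For example, a stanza in your local copy of database_types.conf could carry a manual test query (the stanza name below is illustrative):

```
[my_custom_db]
# Force manual connection validation for drivers that do not
# implement connection.isValid():
testQuery = SELECT 1
# Or, as of DB Connect 1.0.9, skip validation entirely:
# validationDisabled = true
```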

Add the database connection in Manager


Once you've defined the new database, proceed to "Add or manage a database connection" to continue configuration.


Configure and use Splunk DB Connect


Add or manage a database connection
Before working with a database, you have to establish a connection to it. Once a connection is configured, you will be able to use it in queries, lookups, inputs, and outputs. Note: If you are adding a database connection for a database that is not supported out-of-the-box (not in the list in "About Splunk DB Connect"), follow the steps in "Define and add a new/custom database to DB Connect" before following the steps in this procedure.

Create a new database connection


1. Log into Splunk Manager as a user with the Admin role and navigate to Manager > External Databases (under the Data section of Manager).
2. Click Add New. The "Add new database" panel is displayed.
3. Supply the following information:

- Unique name: Enter a unique name that identifies this new database connection. You will reference the database by this name from Splunk DB Connect commands, lookups, and monitors.
- Database type: The type of database to which you want to connect.
- Hostname or IP address: The hostname or IP address of the database server. For local database types (such as SQLite or ODBC) you can use any value (for example, "localhost") here.
- Port: The TCP port to connect to. You can leave this field empty if you're using the default port of the selected database type or if it is a local database.
- Database name or Oracle SID: You can leave this field empty to connect to the default database, if the selected database type supports this. Press the button to get a list of available database names for the entered connection information. Note: When adding a local database such as SQLite, specify the fully qualified path to the database file. Alternatively, you can place the SQLite file into $SPLUNK_HOME/var/dbx (you might need to create this directory) and name it database_name.sqlitedb; then you can use "database_name" instead of the fully qualified path.
- Username and Password: If the database connection requires a username and password for authentication, provide them here. For Windows users, you can use the following notation in the username field: <DOMAIN>\<USERNAME>. arg.useNTLMv2 = true is implied if you use this notation. You can override this in the config file.
- Read-Only: You can set the database connection to read-only. If this is enabled, Splunk DB Connect will not run any modifying SQL statements against the database, and the dboutput command will not work.
- Validate the database connection information: When this checkbox is enabled, Splunk DB Connect tries to connect to the database before saving the connection information. If the connection doesn't succeed, you will see an error message.

Update or delete database connections


You can update and delete database connections via the same Manager panel. When you make a change, Splunk will automatically reload the database list in the Java Bridge Server (JBS).

Manage database connections via configuration files


You can manage your database connections by editing a copy of the database.conf file (and, if you're adding a custom database that is not supported out-of-the-box, the database_types.conf file). Important: Do not edit these files in $SPLUNK_HOME/etc/apps/dbx/default; instead, create and then edit a copy in $SPLUNK_HOME/etc/apps/dbx/local. For more information about precedence and Splunk configuration files, refer to "About configuration files" in the core Splunk platform documentation. After editing database.conf, you must either restart Splunk (which restarts the JBS as well) or reload the configuration via the following command:
splunk cmd python $SPLUNK_HOME/etc/apps/dbx/bin/reload.py databases

The JBS will pick up the modifications and will automatically encrypt plain passwords in the configuration files.


Configure database inputs


Database inputs allow you to fetch data from databases and index that data with Splunk. Unlike standard inputs, data from database inputs is retrieved on a regular basis (based on a schedule) by the DBmon scheduler. Note: Because Splunk DB Connect queries your database, there is a possibility that your queries may impact that database's performance. In particular, if the initial run of your tail query retrieves a lot of data, this may affect the performance of your database. Subsequent runs of the query should generally be less impactful, as they only retrieve data that is new since the previous run.

To add a database input:

1. Log into Splunk Web, navigate to Manager > Data inputs, click Database Inputs, then New.
2. Specify a unique name for your input, and use the information in the following sections to configure your input.

Monitor Types
When configuring a database input, you have the following choices for input types:

Dump

A dump input is the simplest case. It essentially executes the same query every time it runs and outputs all results. If you do not specify an interval in the schedule, it runs only once.

Tail

A tail monitor works similarly to Splunk's file tail input monitor. It determines which records in the given table are new and outputs only those. In order for Splunk to know when there is new data in your database, you must provide a column within the monitored table (or within the query result) whose value will always be greater than the value in any older record. Reasonable choices for this tail.rising.column are:

- Auto-incremented values (such as auto IDs or sequence-filled columns)
- Creation or update timestamps

Scheduling

This setting specifies how often the database inputs are executed. There are three different ways to define such schedules:
Automatic

This is the default mode for tail and dump inputs. The database input will automatically select the delay between the executions of the data retrieval query depending on the quantity of results it produces and how long it takes to execute the query. It will execute more frequently on tables with many new records in every run. To use this mode, select auto. If this schedule type is used, the database input query runs the first time immediately after startup and will then determine the delay for subsequent executions.
Fixed delay

This is a static value: the amount of time the database input query should wait between executions. This can be expressed as a relative time expression or as a number of seconds. Examples:

- 1h (a fixed delay of 1 hour)
- 3 (3 seconds)

If this schedule type is used, the input query runs immediately after startup for the first time and then uses the specified delay between subsequent executions.
Cron expression

This schedule type allows you to specify a cron expression for when the monitor is executed. Examples:

- 0/5 * * * * (every 5 minutes)
- 30 18 * * MON-FRI (every weekday at 6:30 pm)

Note: Database inputs do not keep track of executions that are missed while they are not running (for example, when Splunk is stopped).

Query generation

You can specify just the table name or view you want to monitor and let the database input generate a SQL query for you, or you can specify the SQL query yourself, which provides flexibility and allows you to use conditions, join other tables, and so on. To specify a custom query, place the WHERE clause in curly braces {{...}}. The literal $rising_column$ will be replaced with the name specified in the rising column setting. The literal ? will be substituted with the checkpoint value. Example:

SELECT * FROM my_table {{WHERE $rising_column$ > ?}}

For the initial run (that is, when there is no checkpoint state for the input yet), DB Connect executes the query without the part within the curly braces {{...}}. For all subsequent runs, DB Connect executes the query with the part in the curly braces included. When your rising column is a date, make sure you wrap the checkpoint parameter in a to_date, for example: {{AND $rising_column$ > to_date(?,'YYYY-MM-DD"T"HH:MI:SS')}}. The format you use must be the same as the format that you selected. For Oracle, make sure you put the name of the rising column in upper case.

Output formatting (including timestamps)

These settings determine how the results are converted into a text-based format Splunk can index.
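Tying back to the query generation rules described above, the checkpoint substitution can be sketched in a few lines. This is only an illustration of the documented behavior, not DB Connect's actual implementation, and the function name is hypothetical:

```python
import re

def expand_tail_query(template, rising_column, has_checkpoint):
    # Illustrative sketch of how the {{...}} clause in a custom tail
    # query is handled, per the rules documented above.
    match = re.search(r"\{\{(.*?)\}\}", template)
    if not match:
        return template.strip()
    if not has_checkpoint:
        # Initial run: execute the query without the conditional clause.
        return template.replace(match.group(0), "").strip()
    # Subsequent runs: keep the clause, substituting the rising column.
    # The '?' placeholder is later bound to the checkpoint value by JDBC.
    clause = match.group(1).replace("$rising_column$", rising_column)
    return template.replace(match.group(0), clause).strip()

# expand_tail_query("SELECT * FROM my_table {{WHERE $rising_column$ > ?}}",
#                   "id", False)  → "SELECT * FROM my_table"
# expand_tail_query("SELECT * FROM my_table {{WHERE $rising_column$ > ?}}",
#                   "id", True)   → "SELECT * FROM my_table WHERE id > ?"
```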
Formats

- Key-Value based
- Multiline Key-Value based
- CSV
- Template based


Output Timestamps

You want to make sure to get this right; otherwise your line merging may not work as desired and multiple events may get indexed as a single clump. You can either enable or disable the output of a timestamp value. If enabled, the event is prefixed with the timestamp value. If you specify a timestamp column, the timestamp value is fetched from the given column of each database result row; otherwise the current time is used. Splunk DB Connect expects the timestamp column in your database to be of type datetime/timestamp. If it is not (for example, it is a char/varchar/etc. column), you must check the Output timestamp box and specify output.timestamp.parse.format so that DB Connect can obey the timestamp output format setting. For example, if the database column EVENT_TIME contains strings (for example, CHAR, VARCHAR, VARCHAR2, etc.) with values like 01/26/2013 03:03:25.255, then you must specify the parse format in the appropriate copy of inputs.conf:

output.timestamp = true
output.timestamp.column = EVENT_TIME
output.timestamp.parse.format = MM/dd/yyyy HH:mm:ss.SSS
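DB Connect itself parses this pattern with Java's SimpleDateFormat. As a quick illustration of what the pattern matches (not part of DB Connect), here is a rough Python equivalent:

```python
from datetime import datetime

# Rough Python equivalent of the Java pattern MM/dd/yyyy HH:mm:ss.SSS.
# Note that Python's %f right-pads the fractional part, so ".255" parses
# as 255 milliseconds (255000 microseconds).
ts = datetime.strptime("01/26/2013 03:03:25.255", "%m/%d/%Y %H:%M:%S.%f")
print(ts)  # 2013-01-26 03:03:25.255000
```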

Seeing incorrectly merged or split events? If you are seeing merged or incorrectly split events, check out "Issues with bad linebreaking" in the Troubleshooting topic later in this manual for more information.

Working with Splunk DB Connect and custom source types


If the data from your database is not in a common format, you may want to create a custom source type to tell Splunk how to handle it. For more information about source types, refer to "Why source types matter (a lot)" in the core Splunk product documentation.

To create a custom source type, you must create the line-breaking/timestamping settings manually in the appropriate copy of props.conf. In most cases, you can just copy the settings from the corresponding default source type. Here is what you need for the various output formats available in DB Connect:

Key-Value:


SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

Multiline Key-Value:

KV_MODE = none
REPORT-mkv = dbx-mkv
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]---91827349873-dbx-end-of-event---[\r\n])
LINE_BREAKER_LOOKBEHIND = 10000

Template:

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]---91827349873-dbx-end-of-event---[\r\n])
LINE_BREAKER_LOOKBEHIND = 10000

CSV:

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

Set up a database lookup table


Splunk DB Connect allows you to define a lookup table that uses an external database as its source. Refer to "About lookups and field actions" in the core Splunk platform documentation for more information.

To set up a database lookup table:

1. Log in to Splunk Web, navigate to Manager > Lookups > Database Lookups, and click Add new.

2. Specify a unique name for the lookup and enter the database and table to use for the lookup.

3. At this point you have two main options:

- Specify fields directly: bring in all columns from the table using the Fill all columns button, or specify the fields to be used in the lookup. You can fill all the columns and then use the Delete link next to each field to trim the list.
- Use a specific SQL query to pull the data in: select the Configure advanced Database lookup settings checkbox, then define a SQL query and use $input_field$ as a placeholder for each input field value.

4. Click Save.

A corresponding scripted lookup definition is created and you can use it within Splunk as though it were a regular lookup by using the | lookup command. You can also configure an automatic lookup. For information about automatic lookups, refer to this topic in the core Splunk product documentation.
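For example, assuming a database lookup named asset_lookup that maps ip_address to owner (both names here are hypothetical), you could enrich search results like this:

```
... | lookup asset_lookup ip_address OUTPUT owner
```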

Create a lookup manually via dblookup.conf


You can create a lookup manually through the dblookup.conf file. This is useful if you have a table with many columns that would be cumbersome to select using Manager. However, you must also create the lookup definition manually in transforms.conf with:

external_cmd = dblookup.py <name from dblookup.conf>

By default, only one result is fetched from the database for each lookup input row. If you want to return more than one row, change max_matches in dblookup.conf.
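A sketch of a manually defined lookup; all stanza, database, and field names here are hypothetical, and the transforms.conf stanza follows the pattern described above:

```
# dblookup.conf
[asset_lookup]
database = ASSET_DB
table = hosts
fields = ip_address, owner
max_matches = 1

# transforms.conf
[asset_lookup]
external_cmd = dblookup.py asset_lookup
fields_list = ip_address, owner
```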

Lookups and Splunk DB Connect in a distributed environment


Some constraints exist when running DB Connect in a distributed Splunk environment and using lookups:

- You must use the local=1 option in your lookup command, like this: | lookup local=1
- Automatic lookups are not supported.

Use the Splunk DB Connect search commands


Splunk DB Connect provides custom search commands to query data in your databases.


dbquery
The dbquery command allows you to execute any query against your databases and returns the rows as Splunk results. You can work with those results as you can with any other Splunk results (for example, results from searches). It is similar to the core Splunk platform inputlookup command.

Usage

| dbquery database "sql" [limit=limit]

where:

- database is the external database as specified in database.conf
- sql is the SQL query statement to execute
- limit is optional and allows you to specify the maximum number of results that should be returned

Example

| dbquery ASSET_DB "SELECT id,name, ip_address,owner,last_update FROM hosts WHERE active = 1"

dbinfo
The dbinfo command fetches schema information from the database.

Usage

dbinfo database=<database-spec> [table=<table-spec>] (tables|columns)
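For example, to list the tables of a database (the database name is a placeholder):

```
| dbinfo database=ASSET_DB tables
```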

dboutput (beta feature)


This functionality is in beta and is currently not officially supported.

The dboutput command allows you to insert or update data in your database. Use it with caution, as it will actually write data to, and change the contents of, your database. Usage of this command is currently limited to outputting 50,000 search results.

Usage

dboutput type=<insert|update> database=<database> table=<table> [key=<key_field>] [<field-list>]

Note: When using type=update, you must specify a key field/column.
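A hypothetical example that matches rows on the id column and updates two fields (all names are placeholders; remember that this writes to your database):

```
... | dboutput type=update database=ASSET_DB table=hosts key=id name, ip_address
```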

Troubleshoot Splunk DB Connect


This topic contains information about troubleshooting common issues with Splunk DB Connect.

Answers
Have questions? In addition to the common troubleshooting tips listed in this topic, you can visit Splunk Answers and see what questions and answers the Splunk community has about using Splunk DB Connect.

Java Bridge Server not running


A status error indicating that the Java Bridge Server is not running, together with errors relating to REST keep-alive failures in dbx.log, typically occurs when Splunk DB Connect is running in a VM that has been suspended and then restarted. If you must suspend and restart the VM that Splunk DB Connect is running in, restart Splunk after the VM resumes.

Input not updating


For a dbmon-tail, check the latest checkpoint value, which is stored in $SPLUNK_DB/persistentstorage/dbx ($SPLUNK_DB is, unless specified otherwise, $SPLUNK_HOME/var/lib/splunk). Each input has its own directory, whose name is a hash of the input name (a 32-character hex string). This directory typically contains two files:

- manifest.properties: contains meta-information, such as the name of the input
- state.xml: contains the actual state in XML format

You must first identify the state directory; then you can inspect the XML file.


This state file looks like this:

<list>
  <value key="latest.record_update">
    <value class="sql-timestamp">2012-12-07 04:22:25.703</value>
  </value>
</list>
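Because the directory names are hashes, finding the right one by hand can be tedious. Here is a small sketch that scans the manifest files for the input name (the helper name is ours; it assumes only that the input name appears verbatim in manifest.properties, as described above):

```python
import glob
import os

def find_state_dir(dbx_dir, input_name):
    """Return the checkpoint directory whose manifest.properties mentions
    input_name, or None if no manifest matches."""
    pattern = os.path.join(dbx_dir, "*", "manifest.properties")
    for manifest in glob.glob(pattern):
        with open(manifest) as f:
            if input_name in f.read():
                return os.path.dirname(manifest)
    return None
```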

Error creating PersistentValueStore


If you see the following error in the jbridge.log:

ERROR Java process returned error code 1!
Error: Initializing Splunk context...
Environment: SplunkEnvironment{SPLUNK_HOME=/opt/splunk,SPLUNK_DB=/opt/splunk/var/lib/splunk}
Configuring Log4j...
[Fatal Error] :1:1: Premature end of file.
Exception in thread "main" com.splunk.config.SplunkConfigurationException: Error creating PersistentValueStore type xstream: com.thoughtworks.xstream.io.StreamException: : Premature end of file.

you may have inadvertently corrupted your persistent store file. To resolve the problem, remove the $SPLUNK_DB/persistentstorage/dbx/global directory recursively.

Issues with bad line breaking/line merging


Splunk uses certain heuristics for line breaking. Log file data normally has a timestamp for each event, and Splunk handles that well. If you have timestamps in your database rows, you shouldn't have line breaking issues: just be sure to enable Output timestamp and specify the column that contains the timestamp as the timestamp column.

If you don't have timestamps in your database rows

If you don't have timestamps in your database rows, you have two options:

- Click Output timestamp and leave the timestamp column blank. Splunk will output the current time when indexing.
- Use the default sourcetype in the input config. Leave it blank and Splunk DB Connect will use dbmon:kv as the sourcetype (in the normal case where you're using the key-value output format). However, if you put something custom in the sourcetype field, you should then tell Splunk how to line break for that sourcetype. Just copy over the props.conf settings for the default stanzas; specifically, add "SHOULD_LINEMERGE = false".

If your timestamp is not of type datetime/timestamp

Splunk DB Connect expects the timestamp column in your database to be of type datetime/timestamp. If it is not (for example, it is of type char/varchar/etc.), you must check the Output timestamp box and specify output.timestamp.parse.format so that DB Connect can obey the timestamp output format setting. For example, if the database column EVENT_TIME contains strings (for example, CHAR, VARCHAR, VARCHAR2, etc.) with values like 01/26/2013 03:03:25.255, then you must specify the parse format in the appropriate copy of inputs.conf:

output.timestamp = true
output.timestamp.column = EVENT_TIME
output.timestamp.parse.format = MM/dd/yyyy HH:mm:ss.SSS


Configuration file reference


Splunk DB Connect includes several custom configuration files. The following are the spec files associated with them:

- database.conf.spec
- database_types.conf.spec
- dblookup.conf.spec
- dboutput.conf.spec
- java.conf.spec

Splunk DB Connect also includes a custom set of input fields, so there is an additional inputs.conf.spec file to describe them. The most current versions of these spec files are located in $SPLUNK_HOME/etc/apps/dbx/README.

database.conf.spec
# Copyright (C) 2005-2012 Splunk Inc. All Rights Reserved.
# This file contains the configured database connections

[<name>]

host = <string>
* The IP address or the hostname of the database.

port = <integer>
* The port number of the database. If omitted, the default port number for the given database type is used.

username = <string>
* The username which is used for authenticating against the database.

password = <string>
* The password which is used for authenticating against the database. It will be automatically encrypted if it is set in clear-text.

database = <string>
* The database name or SID.

type = <database_type>
* The database type. References a stanza in database_types.conf.

readonly = true|false
* Whether the database connection is read-only. If it is read-only, any modifying SQL statement will be blocked.

database.sid = true|false
* Only applies to Oracle database connections (ie. type=oracle). Set to *true* if the Oracle database is only reachable using an SID. By default the service name format is used.

default.schema = <string>
* Sets the default schema for the database connection if the database type supports it (currently only Oracle supports it).

testQuery = <string>
* Supply a specific test query for validating connections to this database.
* If defined, it overrides the testQuery of the database type (see database_types.conf).

validationDisabled = [true|false]
* Turn off connection validation for this database connection.
* If defined, it overrides the validationDisabled of the database type (see database_types.conf).
* Caution: disabling validation can lead to unpredictable results when using it with connection pooling.

database_types.conf.spec
# @copyright@
# This file contains the database type definitions

[<name>]

displayName = <string>
* A descriptive display name for the database type.

typeClass = <string>
* The FQCN (fully qualified class name) of a class implementing the com.splunk.dbx.sql.type.DatabaseType interface.

jdbcDriverClass = <string>
* The FQCN of the JDBC driver class. Only used when no typeClass is specified.

defaultPort = <integer>
* The default TCP port for the database type. Only used when no typeClass is specified.

connectionUrlFormat = <string>
* The JDBC URL as a MessageFormat string. The following values will be replaced:
* {0} the database host
* {1} the database port (the port specified in database.conf or the default port)
* {2} the database name/catalog or SID
* Only used when no typeClass is specified.

testQuery = <string>
* A simple SQL statement that is used to validate the database connection. Only used when no typeClass is specified.

supportsParameterMetaData = [true|false]
* Whether the given JDBC driver supports metadata for java.sql.PreparedStatement.
* Only used when no typeClass is specified.

quoteChars = <string>
* Override the quote characters for the database type. If not specified, the default ANSI-SQL quote characters will be used.
* Only used when no typeClass is specified.

defaultCatalogName = <string>
* Configure the default catalog name for a generic database type. Used for querying the catalog names (ie. databases).

local = true|false
* This flag marks a database type as local (ie. it is accessed via the filesystem instead of TCP).

defaultSchema = <string>
* Set the default schema prefix for the database type (defaults to null).

streamingFetchSize = <n>
* Number of results to be fetched at a time when streaming is enabled for a JDBC statement.

streamingAutoCommit = [true|false]
* Turn auto-commit on or off for java.sql.Connection instances in streaming mode.

validationDisabled = [true|false]
* Turn off connection validation for database connections of this type.
* Defaults to false.
* Caution: this can lead to unpredictable results when using this with connection pooling.

dblookup.conf.spec
# Copyright (C) 2005-2012 Splunk Inc. All Rights Reserved.
# This file contains the configured database lookup definitions

[<name>]

database = <database>
* The database. References a stanza in database.conf.

table = <string>
* The database table name. Only used in simple mode (advanced = 0).

fields = <csv-list>
* A list of fields/columns for the lookup.
* It is possible to specify just the field, or the field and the column, in the form: <field> as <sql-column>

advanced = [1|0]
* Whether to perform a simple lookup against the table or use a custom SQL query.

query = <string>
* A SQL query template. Expressions in the form of $fieldname$ are replaced with the input provided by Splunk.

input_fields = <csv-list>
* A list of fields/columns used as input for the SQL query template.

max_matches = <n>
* Maximum number of results fetched from the database for each lookup input row.
* Defaults to 1.

dboutput.conf.spec
# Copyright (C) 2005-2012 Splunk Inc. All Rights Reserved.

[<name>]

database = <string>
* The database to use (references the database.conf stanza).

table = <string>
* The table to update.

mode = insert|update
* For the simple mode.

fields = <string>
* The fields used for the update, in the form of <field_name> [AS <column_name>]. You can use * as a wildcard here.

key = <string>[ AS <string>]
* Only applies to mode=update. The key columns to use for the SQL UPDATE statement.

not.found = insert|ignore|fail
* Only applies to mode=update. If no record was updated (ie. no matching key value was found for the result), this option defines how dboutput should behave:
* - insert: dboutput inserts the result instead
* - ignore: do nothing
* - fail: roll back the changes (if possible) and fail the execution

sql = <string>

advanced = true|false
* true to use the predefined SQL statement, false to use the table and automatically generate the SQL.
* Defaults to false.

sql.update = <string>

sql.insert = <string>

java.conf.spec
# @copyright@
# The master configuration file. Global settings for Java and Splunk DB Connect are
# configured in here.

#################
# Java settings #
#################

[java]

home = <path>
* Path to the Java JRE or JDK installation directory.

options = <string>
* Arbitrary Java command line options.
* For example memory settings or system properties.

[bridge]

addr = <bind address>
* The address/interface the Java Bridge server should listen for connections on.
* In most cases only 127.0.0.1 makes sense.

port = <bind port>
* The port the Java Bridge server should listen for connections on.

threads = <n>
* The size of the thread pool for Java Bridge command execution.
* This defines the number of commands that can run concurrently.

debug = true|false
* Turn on debugging for the Java Bridge client.

[logging]

level = INFO|DEBUG|WARN|ERROR|FATAL
* The global logging severity.

file = <filename>
* The filename for the Splunk DB Connect logfile, which is placed at $SPLUNK_HOME/var/log/splunk.

console = true|false
* Enable or disable STDOUT output of log events (only for debugging).

logger.<logger_name> = INFO|DEBUG|WARN|ERROR|FATAL
* Override the global log level for a specific logger.

[persistence]

global = <store type>
* The type used for the global persistent store.
* - xstream: Data is stored in XML flatfiles. These files are readable and easy to inspect and change.
* - jdbm: Data is stored in a btree key-value database. The performance is better for large amounts of data, but the files are binary.

type.<type_name> = <fqcn>
* A type definition for an implementation of the PersistentValueStore interface.

[config]

adapter = <config_adapter>
* A class implementing the com.splunk.config.ConfigurationAdapter interface.

cache = true|false
* Enable or disable caching of configuration values.

[output]

default.channel = <string>
* A channel is a way to get data into Splunk. Currently this is done using the "spool" channel. There will be other options in the future.

default.timestamp.format = <string>
* The default format of timestamp values for events generated by DBmon. This can be overridden on a per-input-definition basis. The format is expressed as a Java SimpleDateFormat pattern.

type.<type> = <string>
* Allows the registration of an output channel (a class implementing com.splunk.output.SplunkOutputChannel).

format.<format> = <string>
* Allows the registration of an output format (a class implementing com.splunk.dbx.monitor.output.OutputFormat).

[cache]

default.type = <softref|lru>

cleaner.interval = <relative_time_expression>

[rest]

keep-alive.timeout = <relative_time_expression>

#####################
# Database settings #
#####################

[dbx]

database.factory = persistent|default
* The database connection factory to use.

database.factory.pooled = true|false
* Enable database pooling.

pool.maxActive = <n>
* The maximum number of active database connections.

pool.maxIdle = <n>
* The maximum number of idle database connections.

cache.tables = true|false
* Turn on caching of table metadata information.

cache.tables.size = <n>
* The size of the table metadata cache.

cache.tables.invalidation.timeout = <relative_time_expression>
* The amount of time before the cached metadata information of a table is considered invalid and fetched again.

preload.config = [true|false]
* When enabled, the database factory will fetch and check all configured databases on startup. Otherwise they are fetched when they are used for the first time.

query.stream.limit = <n>
* Force streaming results for queries with a max. result limit greater than this (default is 10000).
* This setting affects only certain database types.

jdbc.streaming.fetch.size = <n>
* Number of results to be fetched at a time when streaming is enabled for a JDBC statement. It can be overridden on a per-database-type basis using the "streamingFetchSize" parameter in database_types.conf.
* This setting affects only certain database types.
* Default is 500.

[dbmon]

threads = <n>
* The size of the thread pool for database inputs.

output.channel = <string>
* The output channel to use.
* - spool: Temporary files that are moved into a file monitor sinkhole.
* - rest: Events are uploaded via REST to Splunkd.

output.buffer.limit = <file-size-expression>
* Only applies to the spool output channel. The max. size of the temp files before they are moved to the sinkhole.
* Defaults to 5MB.

output.time.limit = <n>
* Only applies to the spool output channel. The time limit for moving files to the sinkhole, in milliseconds.
* Defaults to 5000.

[dblookup]

cache = true|false
* When set to true, database lookup definitions are cached in memory.

cache.size = <n>
* The cache size for database lookup definitions (number of entries).

cache.invalidation.timeout = <relative_time_expression>
* The amount of time before a database lookup definition is considered invalid and removed from the cache.

[startup]

init.<n> = <FQCN>

[dboutput]

batch.size = <n>

inputs.conf.spec
# Copyright (C) 2005-2012 Splunk Inc. All Rights Reserved.
# This file contains the database monitor definitions

[dbmon-<type>://<database>/<unique_name>]

interval = <relative time expression>|<cron expression>|auto
* Use to configure the schedule for the given database monitor.
* There are 3 different schedule types:
* - auto: the scheduler will automatically choose an interval based on the number of generated results
* - fixed delay between runs: a number of milliseconds or a relative time expression
*   Examples:
*     interval = 5000 (run every 5 seconds)
*     interval = 1h (run every hour)
* - a cron expression
*   Examples:
*     interval = 0/15 * * * * (run every 15 minutes)
*     interval = 0 18 * * MON-FRI (run every weekday at 6pm)

query = <string>
* The query option allows you to define the exact SQL query that is executed against the database.

table = <string>
* If no query is specified, DBmon will automatically create a SQL query from the given table name (ie. SELECT * FROM <table>).

output.format = [kv|mkv|csv|template]
* The output format to use. The following are available:
* - kv: Simple key-value pairs
* - mkv: Multiline key-value pairs (ie. each key-value pair is printed on its own line)
* - csv: CSV formatted events
* - template: allows you to specify the generated events using the <output.template> or <output.template.file> options

output.template = <string>

output.template.file = <string>

output.timestamp = [true|false]
* Controls whether or not the generated event is prefixed with a timestamp value.

output.timestamp.column = <string>
* The column of the result set the timestamp is fetched from. If this is omitted, the execution time of the monitor is used as the timestamp value.

output.timestamp.format = <string>
* The format of the output timestamp value, expressed as a Java SimpleDateFormat pattern.

output.timestamp.parse.format = <string>
* Used when the timestamp in the column defined by <output.timestamp.column> is a string value (ie. varchar, nvarchar, etc.). It allows you to define the (SimpleDateFormat) pattern to parse the timestamp with.

output.fields = <list>
* The fields that should be printed in the generated event.

# A tail database monitor will remember the value of a column in the result and will only fetch entries with a higher value
# in future executions.
[dbmon-tail://<database>/<unique_name>]

tail.rising.column = <string>
* A column with a value that is always rising. The best option is an auto-incremented value or a sequence. A creation or last-update timestamp is also a good choice.

tail.follow.only = [true|false]
* This only affects the first execution of the monitor. If this option is set to true (default is false), nothing will be indexed at the first run.

[dbmon-dump://<database>/<unique_name>]

[dbmon-change://<database>/<unique_name>]

change.hash.algorithm = MD5|SHA256

[dbmon-batch://<database>/<unique_name>]
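Putting these settings together, a hypothetical dbmon-tail stanza might look like this (the database, table, and column names are placeholders):

```
[dbmon-tail://ASSET_DB/host_audit]
interval = 60000
output.format = kv
output.timestamp = true
output.timestamp.column = LAST_UPDATE
tail.rising.column = LAST_UPDATE
query = SELECT * FROM host_audit {{WHERE $rising_column$ > ?}}
```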
