
Welcome Note

I have been thinking for quite some time about creating a blog for my experiences with Splunk. The aim of this blog is to share my experience with Splunk and other big data tools. This blog is NOT intended for beginners; I would urge them to use "answers.splunk.com" or "docs.splunk.com" to read and understand the basics.


I would love to invite others to share their experience and post to this blog. Please feel free to leave a comment so I can grant you edit access to this blog.

Key Points to Note:
  • This site has NOTHING to do with Splunk Inc, and hence the opinions expressed here are purely personal
  • Code snippets should be hosted on GitHub, preferably as a "gist"
  • Images can be self-hosted or uploaded to this blog
  • Ensure that you don't copy any of your company's or clients' details into this blog.


Popular posts from this blog

Splunk Integration with Active Directory/LDAP

Most companies want to integrate their Splunk installation with a centralised authentication system. The main article in the Splunk docs describes it concisely, but this article walks through the integration in a practical manner, including the code.

LDAP/Active Directory : Purpose of Integration
  • To authenticate users via Active Directory (AD)
  • To associate users with roles
  • To centralise management of users/roles
  • To collect the identity list from Active Directory subsystems

Modular App(s)
I always tend to create a specific app for every functionality. For this integration, the app I would create is something like A_prod_ldap_auth (the naming convention implies integration into PROD, with ldap for authorisation purposes).

Contents of the app
Authentication and authorisation are mainly configured using two conf files (a minimal sketch follows after this preview):
  • authentication.conf - configures authentication with LDAP
  • authorize.conf - configures roles and granular access controls
Both these files...
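As an illustration of what those two files might contain, here is a minimal, hedged sketch of an LDAP strategy in authentication.conf and a custom role in authorize.conf. The host, bind DN, group names and role names are placeholder assumptions, not values from the original post.

  # authentication.conf (placeholder values; adjust to your AD/LDAP environment)
  [authentication]
  authType = LDAP
  authSettings = corp_ad

  [corp_ad]
  host = ad.example.com
  port = 636
  SSLEnabled = 1
  bindDN = CN=splunk_bind,OU=Service Accounts,DC=example,DC=com
  bindDNpassword = changeme
  userBaseDN = OU=Users,DC=example,DC=com
  userNameAttribute = sAMAccountName
  realNameAttribute = displayName
  groupBaseDN = OU=Groups,DC=example,DC=com
  groupNameAttribute = cn
  groupMemberAttribute = member

  # Map Splunk roles to AD groups
  [roleMap_corp_ad]
  admin = splunk-admins
  user = splunk-users

  # authorize.conf (placeholder role granting restricted index access)
  [role_soc_analyst]
  importRoles = user
  srchIndexesAllowed = security;firewall
  srchIndexesDefault = security

Splunk encrypts the clear-text bindDNpassword on restart, so the value above only needs to exist until the first restart after deployment.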

syslog and Splunk : Logrotate

Syslog (rsyslog or syslog-ng) is used by almost 99% of Linux-based Splunk installations for collecting data, especially from network devices where the data is transient. Key things to remember while collecting these logs:
  • Store the log using syslog and forward it to other systems if required.
  • Store the logs in a well-formatted directory structure.
  • Direct streaming to Splunk is not preferred, as a restart of Splunk causes problems.

This leads to storing the data and hence managing it (store for 1 day and delete, etc.). We use "logrotate" extensively for this purpose (a minimal example follows after this preview). Link to git code: logrotate.d
  • delaycompress is required so that a log file is not compressed while Splunk is still reading it.
  • While collecting into Splunk, discard any .gz extensions.
  • Ensure size is specified with a feasible value.
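A minimal, hedged sketch of such a logrotate.d entry is below. The log path, rotation count, size and postrotate command are illustrative assumptions, not the contents of the linked gist.

  # /etc/logrotate.d/syslog-network (illustrative path and values)
  /data/syslog/network/*/*.log {
      daily
      size 100M
      rotate 3
      missingok
      notifempty
      compress
      # Compression is delayed by one cycle so Splunk can finish reading the rotated file
      delaycompress
      sharedscripts
      postrotate
          # Tell rsyslog to reopen its output files after rotation
          /usr/bin/systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
      endscript
  }

With compress plus delaycompress, only files from two cycles ago carry a .gz extension, which matches the advice above to have the Splunk monitor input ignore *.gz.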

Splunk : Transform data further from already transformed data

In many circumstances, you may need to extract or transform data that has just been produced by another transform. Consider the case below, where _raw needs to be given a new sourcetype (at index time), then fields need to be extracted from that modified sourcetype, and one of the extracted fields is itself XML. We can achieve all of this using props.conf and transforms.conf within an app (or the local/ directory of your existing app).

In props.conf:

  # =================================================================
  # These are executed in the same order that they appear in the list, so ORDER CAREFULLY!
  [incoming_sourcetype]
  TRANSFORMS-sourcetype = rename_mySourcetype

  [mySourcetype]
  # Search time extractions by REPORT
  REPORT-mySourcetype = my_deep_extraction_1, my_deeper_extraction_2, my_deeper_extraction_3
  # =================================================================

In transforms.conf:

  # ====================...
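The transforms.conf side of the preview is cut off, so here is a hedged sketch of what stanzas of that shape might look like. The regexes, field names and stanza bodies are illustrative assumptions; only rename_mySourcetype and the extraction stanza names come from the preview above.

  # transforms.conf (illustrative sketch)
  # Index-time transform referenced by TRANSFORMS-sourcetype in props.conf:
  # rewrites the sourcetype when _raw matches the pattern.
  [rename_mySourcetype]
  REGEX = ^<myEvent\b
  DEST_KEY = MetaData:Sourcetype
  FORMAT = sourcetype::mySourcetype

  # Search-time extraction referenced by REPORT-mySourcetype:
  # pulls the embedded XML payload out of _raw into its own field.
  [my_deep_extraction_1]
  REGEX = payload=(?<payload_xml><\?xml.+?</myEvent>)

  # Further extraction on the already-extracted field: SOURCE_KEY points at
  # payload_xml instead of _raw, which is the "transform the transformed data" step.
  [my_deeper_extraction_2]
  SOURCE_KEY = payload_xml
  REGEX = <status>(?<payload_status>[^<]+)</status>

Because the REPORT list in props.conf is evaluated in order, my_deep_extraction_1 must appear before the stanzas that read from payload_xml.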