Logstash and sFlow

Logstash has a native input codec for Netflow, and there are plenty of examples around the web of how to set that up, as well as how to build a Kibana dashboard to make use of it.  But what if your network equipment doesn’t do Netflow?  Arista, as an example, makes some great hardware – but they rely on sFlow sampling, rather than Netflow.

I looked high and low for examples of people using Elasticsearch, Logstash, and Kibana (ELK) to deal with sFlow, and there just wasn’t a lot out there.  I ended up having to figure it out myself.

Logstash can take in all kinds of stuff, and it has plenty of tools to process many forms of data, but there’s no native sFlow input, nor a specific codec to be used with the UDP input.  I was going to have to rely on grok, which is like a Swiss Army knife – but you need to be pretty handy with regular expressions – and I’m not.  So I turned to this invaluable web tool, the Grok Debugger.

But before I could even get started with that, I needed to figure out the message format for the sFlow records that would be coming my way.  Fortunately, InMon, the main people behind sFlow, have furnished the sFlow Toolkit.  Using that, I was able to take a look at the sFlow records in human readable format.

Running ‘sflowtool -l -g‘ gives you output like this:
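
I won’t paste a whole capture here, but in line mode each flow sample comes out as one comma-separated FLOW record: sample type, reporter IP, input and output port, source and destination MAC, EtherType, in and out VLAN, source and destination IP, IP protocol, TOS, TTL, source and destination port, TCP flags, packet size, IP size, and sample rate.  An invented example (the values are made up for illustration):

FLOW,10.10.0.254,17,8,000c29a1b2c3,001b21d4e5f6,0x0800,110,110,10.10.1.5,10.10.2.9,6,0x00,64,58231,443,0x18,1518,1500,4096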

Now I had the field names from the sFlow packet.  Then, using the Grok Debugger, I was able to plug in the raw sFlow messages, and use the field names I’d captured above to produce this:

match => { "message" => "%{WORD:SampleType},%{IP:sflow.ReporterIP},%{WORD:sflow.inputPort},%{WORD:sflow.outputPort},%{WORD:sflow.srcMAC},%{WORD:sflow.dstMAC},%{WORD:sflow.EtherType},%{NUMBER:sflow.in_vlan},%{NUMBER:sflow.out_vlan},%{IP:sflow.srcIP},%{IP:sflow.dstIP},%{NUMBER:sflow.IPProtocol},%{WORD:sflow.IPTOS},%{WORD:sflow.IPTTL},%{NUMBER:sflow.srcPort},%{NUMBER:sflow.dstPort},%{DATA:sflow.tcpFlags},%{NUMBER:sflow.PacketSize},%{NUMBER:sflow.IPSize},%{NUMBER:sflow.SampleRate}" }

I know, right?

Once you get over that hurdle, everything falls into place pretty well.  The rest of the Logstash filter is pretty standard.

I do some DNS reverse lookups so that I can use human readable values in my dashboards, and search for things by name.
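
A minimal sketch of that piece (it assumes the dotted field names from the grok pattern above, and that the Logstash host can actually resolve the addresses; the hostname field names are my own convention):

# copy the raw IPs into hostname fields, then reverse-resolve them in place
mutate {
  add_field => {
    "sflow.SrcHostname" => "%{sflow.srcIP}"
    "sflow.DstHostname" => "%{sflow.dstIP}"
  }
}
dns {
  reverse => [ "sflow.SrcHostname", "sflow.DstHostname" ]
  action => "replace"
}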

I also do some dictionary translations, adding a field for service names based on destination port numbers.  This requires producing some YAML dictionary files, which I’ll furnish below; I created three of them.
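
As an example of the format the translate filter expects, the service-name dictionary is just port-to-name pairs in YAML.  This is a trimmed, made-up sample; the real file is far longer:

"22": ssh
"53": domain
"80": http
"443": https
"3306": mysql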

A conversation with a reddit user made me remember these two additional components.  These form the Logstash input portion needed to make sFlow work:
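
The two pieces are a small wrapper script that runs sflowtool in line mode, and a Logstash pipe input that reads from it.  In rough outline they look like this (the paths, port, and the awk flush are assumptions rather than my exact files):

#!/bin/bash
# sflowtool_wrapper.sh: listen for sFlow on UDP 6343, emit line-mode (CSV)
# records, and flush after every line so Logstash sees samples immediately
/usr/local/bin/sflowtool -p 6343 -l | awk '{ print $0; fflush(); }'

input {
  pipe {
    type => "sflow"
    command => "/etc/logstash/scripts/sflowtool_wrapper.sh"
  }
}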

And finally, here’s the actual Logstash Filter file:
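
In outline, it’s the grok match from above, a drop for anything that fails to parse (counter records included, since those don’t match the flow pattern and I only want flows), and then the dns and translate pieces described earlier.  The condensed sketch below shows the shape; treat the wrapping conditional and layout as illustrative:

filter {
  if [type] == "sflow" {
    grok {
      match => { "message" => "%{WORD:SampleType},%{IP:sflow.ReporterIP},%{WORD:sflow.inputPort},%{WORD:sflow.outputPort},%{WORD:sflow.srcMAC},%{WORD:sflow.dstMAC},%{WORD:sflow.EtherType},%{NUMBER:sflow.in_vlan},%{NUMBER:sflow.out_vlan},%{IP:sflow.srcIP},%{IP:sflow.dstIP},%{NUMBER:sflow.IPProtocol},%{WORD:sflow.IPTOS},%{WORD:sflow.IPTTL},%{NUMBER:sflow.srcPort},%{NUMBER:sflow.dstPort},%{DATA:sflow.tcpFlags},%{NUMBER:sflow.PacketSize},%{NUMBER:sflow.IPSize},%{NUMBER:sflow.SampleRate}" }
    }
    # anything that did not match (counter records, sflowtool chatter) gets dropped
    if "_grokparsefailure" in [tags] {
      drop { }
    }
    # DNS reverse lookups and translate dictionary lookups (see above) go here
  }
}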

After you get that going, your Elasticsearch cluster will be populated with nice clean sFlow JSON records:


[Image: Logstash sFlow JSON]
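
Something along these lines (an invented record, trimmed to the interesting fields):

{
  "@timestamp": "2015-08-04T17:22:10.481Z",
  "type": "sflow",
  "SampleType": "FLOW",
  "sflow.ReporterIP": "10.10.0.254",
  "sflow.srcIP": "10.10.1.5",
  "sflow.dstIP": "10.10.2.9",
  "sflow.SrcHostname": "web01.example.com",
  "sflow.DstHostname": "db01.example.com",
  "sflow.srcPort": "58231",
  "sflow.dstPort": "3306",
  "sflow.DstSvcName": "mysql",
  "sflow.PacketSize": "1518",
  "sflow.SampleRate": "4096"
}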

Once you get here, it just becomes a matter of building your Kibana dashboards to do cool and interesting stuff with this data.  I’ll do another post on that at some time.

Written by: war

31 Comments

  1. August 4, 2015

    This kicks so much ass…

  2. Clay Fiske
    August 4, 2015

    Thanks for the post. I’m glad someone else dug into this ahead of me. 🙂

    I noticed sflowtool has a Netflow conversion option (-c) built into it. Did you do any experimentation with using that to feed into Logstash’s existing Netflow parser?

    • August 8, 2015

      I did give that a try, but it didn’t work immediately, and I didn’t have time to fool around with it. Initially, my plan was to make use of that, as I’d already been doing a lot with Netflow.

  3. Dude
    October 21, 2015

    Do you not require an output.conf of some sort for this?

    • October 23, 2015

      Sure, but my assumption is that you’ve already got that. That’s a prerequisite of any logstash setup (input(s), filter(s), output(s)). There’s nothing unique or special about the output. I dump all of my stuff into elasticsearch.

      Mine looks like this:

      output {
        elasticsearch {
          host => "xxx.xxx.xxx.xxx"
          protocol => "http"
          index => "xx-logstash-%{+YYYY.MM.dd}"
        }
      }

  4. Rich
    November 6, 2015

    This is great and I plan to start working on a setup such as yours asap. Are you pushing counters from your switch as well via sFlow? Are you placing that in ELK? It looks to me like you are working with packet samples alone but perhaps I missed it.

    • November 7, 2015

      You’re correct. I’m discarding counters for the time being. I’m only interested in the flow data.

  5. Lee
    November 30, 2015

    Thanks for the post. A few things…
    1) I’m not able to compile the latest version… After a good deal of searching, I found the following download link for a version where ./configure, make, make install works:

    wget http://www.inmon.com/bin/sflowtool-3.22.tar.gz

    2) In Elasticsearch you can no longer have variables that use “.”, so none of this data actually gets saved to Elasticsearch for later use in Kibana.

    I had to replace all instances of things like “sflow.DstHostname” with “sflow_DstHostname”, and had to do this everywhere the “variable” had a “.” in the name…

    3) Is there a reason you are dropping on “_grokparsefailure” tags? Were you seeing many of them?

    4) Where did you find info on what the individual fields meant? I was going to put together a grok for the CNTR records but am not sure where to look to see which fields mean what…

    thanks,

    Lee

    • November 30, 2015

      Hey Lee. In order:

      1) I was able to successfully compile sflowtool v3.35, and that’s the version I’m using. I don’t know why you’d be running into trouble with newer versions. (I’m running on Ubuntu 14.04 LTS – I’m pretty sure I built it directly from their git repo: https://github.com/sflow/sflowtool.)

      2) On this particular ES cluster, I’m still running 1.x code, so I don’t have an issue with ‘.’ yet. Thanks for the heads up – I’ll have to give that some consideration when I look to upgrade.

      3) I did see some grok parse failures, but not an alarming number. In my environment, I don’t need a high degree of precision. I’m just trying to get good estimated levels of network consumption based on our internal applications.

      4) In the link I provided in my post above, you can see the abbreviated result of ‘sflowtool -l -g’ which provides a human-readable decoding of the sflow packets. The sample I’m showing is a flow packet. The Counter packets are decoded as well, and you shouldn’t have trouble deciphering them. (Also: The sflowtool git link has a lot more information about the packet decode than it used to, so that’s worth checking out in addition to what I provided.)

      • February 29, 2016

        Lee,

        When you say you can’t have “.” in elasticsearch, are you referring to field names? What version of elasticsearch are you running and what platform (linux or windows)?

        • March 12, 2016

          Turns out he’s right. They did break that when they went to ES 2.x. It doesn’t have anything to do with the way things display in Kibana, but in ES itself, you have to de_dot before you hit the output to ES.

          It’s not terrible, but it is a pain in the neck to re-write your filters if you’ve been doing a lot with nested fields… Like for instance, NetFlow or sFlow – which is pretty much all I do. Dammit. 🙂
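
          If you end up going the de_dot route, a minimal sketch (assuming the plugin’s defaults, which rewrite the dots to underscores) is just one more filter stage right before the output:

          filter {
            # rename every field containing a "." (e.g. sflow.srcIP -> sflow_srcIP)
            de_dot { }
          }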

  6. xtruthx
    February 10, 2016

    Can you please share your visualizations in JSON format with us? I’m actually running out of creativity.

    Many thanks

  7. xasaph
    March 1, 2016

    With ElasticSearch 2.0+, you’ll need to de_dot (plugin) the field names.

  8. ElevenB2003
    March 4, 2016

    Hey, great guide, but I’m having some trouble running “sflowtool -l -g”: “-g” isn’t an option (using the Windows sflowtool). I’m trying to ship sFlow from a Fortigate into ELK.

    Also, the YAML files – where did you place them?

    • March 4, 2016

      I can’t offer any insight about the options available in the Windows version of sflowtool, as I’ve only used it on Linux. I’d be surprised if the function wasn’t there, but it’s possible that the syntax is different.

      As for the location of the YAML dictionary files, you should be able to put them anywhere you want. When you call them from the filter, you can (should) specify the complete path to them:

      translate {
        field => "[sflow.srcPort]"
        destination => "[sflow.SrcSvcName]"
        dictionary_path => "/etc/logstash/dictionaries/iana_services.yaml"
      }

    • March 14, 2016

      Thanks for sharing that. I’ll check it out.

    • Kay
      April 26, 2016

      Hello Daniel,
      How did you set up the config.yaml?
      I don’t get it to work – maybe because my switch has no hostname…
      Regards,
      Kay

  9. Simon
    April 24, 2016

    I just ran through this, having first tried a few of the sFlow codecs, and it’s working well.

  10. Karl Trasschaert
    May 13, 2016

    Hi,

    When I start ./sflowtool_wrapper.sh, I get the error “/bin/awk: No such file or directory”.

    I’m using Ubuntu 14.04; how can I fix that?

    • May 19, 2016

      Sorry for the delayed reply… I’ve been busy.

      apt-get install gawk

  11. Ben
    June 24, 2016

    Hello,
    Thank you for sharing this great post.
    Can you just tell me how you create your bandwidth graph?
    For the Y axis: do you use sflow.PacketSize or sflow.IPSize?
    For the X axis: @timestamp?
    I’ve been searching for days; your help would be very appreciated 🙂
    Thanks

    • June 24, 2016

      Hi Ben, You just need to total the sflow.PacketSize field, and logstash handles the timestamps for you automatically.

      • Ben
        July 5, 2016

        And if I want bandwidth in bytes/s, I think I need to calculate:
        if the interval is 30 sec and the SampleRate is 1000:
        bandwidth = (number of packets * PacketSize) / 30 * 1000
        Am I mistaken?
        Thanks

  12. Kawika
    July 7, 2016

    Thanks for the post, but I can’t see any output from the filter. I can see the data flowing using sflowtool_wrapper.sh from the command line, and logstash.log shows it’s loading the sflow filter, but there’s still no output. I’m using the same input you posted and a minimal filter. Thanks for any tips.

    • Kawika
      July 7, 2016

      nvm, I went another route. Cheers

  13. tron_jones
    November 30, 2016

    Tried using this method, then found a Logstash input codec already built for this on RubyGems. I am using it with a Cisco Nexus 9000 using sFlow.

    Install the “logstash-codec-sflow” from rubygems using the logstash command:

    sudo ls_home_dir/bin/logstash-plugin install logstash-codec-sflow

    Logstash conf file:

    input {
      udp {
        port => 6343
        codec => sflow {}
      }
    }

    output {
      elasticsearch {
        index => "sflow-%{+YYYY.MM.dd}"
        hosts => ["localhost"]
      }
      stdout {
        codec => rubydebug
      }
    }

  14. Saos Tomat
    March 15, 2017

    Why is the tcpFlags field pattern DATA and not WORD? Is there any particular reason?

    • March 18, 2017

      No particular reason. Would there be an advantage to treating it as a WORD, rather than DATA?

  15. CP
    July 28, 2017

    The link in the article cannot be opened.
