Monday, December 30, 2019

PRTG: solving your problems

So you wrote a nice custom script to monitor your Next Cool Thing, and it doesn't work.  Not only does it not work, you can't figure out what is wrong with it.

So how do you solve it?

First, turn on sensor debugging. 

Go to the sensor in question, then Settings.  Scroll down until you find this bit. 

Turn that on.




Got that?   Good.

Now the interesting part.  Where are my readings? 

C:\ProgramData\Paessler\PRTG Network Monitor\Logs (Sensors)\

So there you have it.

Kind of.

You get two files for every sensor.   One file indicates all the stuff that was sent to the sensor via PRTG.  That's the file with the name "Result of Sensor XXXX.Data.txt"

The other file is the output after the script has been run.  That's "Result of Sensor XXXX.log"

From there, it's time to go read that second log file and try to figure out what the problem might be.  The big problem I have with troubleshooting Python scripts is that the default installation of PRTG doesn't include the IDLE interpreter.

That's survivable.  Notepad is there.  It's just more annoying. 

The second thing to keep in mind is that the interesting bits about what failed are generally at the very end of the script output.
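Since the useful part sits at the bottom of the file, a few lines of Python can pull just the tail of a log.  This is a minimal sketch, not anything PRTG ships: the tail() helper is my own name, and the path and sensor ID in the comment are placeholders for whatever is on your probe.

```python
from collections import deque


def tail(path, count=20):
    """Return the last `count` lines of a text file as a list of strings."""
    # errors="replace" keeps odd bytes in the log from raising a decode error.
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen only ever holds the newest `count` lines.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Example call (the sensor ID 1234 is a placeholder):
# for line in tail(r"C:\ProgramData\Paessler\PRTG Network Monitor"
#                  r"\Logs (Sensors)\Result of Sensor 1234.log"):
#     print(line)
```

A Python traceback puts the actual exception on its last line, so the last 20 lines are usually enough to see what blew up.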

Is this the perfect debugging scenario?  No.  But it does provide the information you need to figure out why your script isn't working.

You did write the script on a machine with better debugging utilities to try and make a good proof of concept, right? 

So best practice in my eyes: write the entire script except the output bits on a separate machine before transferring it to the remote probe and/or primary server.

So...   Have fun, and go squash some bugs.

Thursday, December 5, 2019

PRTG: Making Custom Sensors to monitor strange things with Python


#version 1.0
#last modified 12/2/19
#
#v 1.0   initial revision / prtg integration
#
import sys
import json
import urllib.request

from paepy.ChannelDefinition import CustomSensorResult


if __name__ == "__main__":

    location = json.loads(sys.argv[1])
    
    parsed = "http://" + str(location['host']) + "whatever else in the url"
    
    page = urllib.request.urlopen(parsed).read()
    

    #data comes out as binary type.
    #convert from binary to normal string
    np = page.decode('utf-8')

    #the split delimiter was lost when this post was published.
    #'\n' is a placeholder: split on whatever separates records on your page.
    scrip = np.split('\n')

    #default both channels to 0 (not working) until parsing says otherwise
    status_value1 = 0
    status_value2 = 0

    for lines in scrip:
        #parse each chunk here and set status_value1 / status_value2
        pass

    result = CustomSensorResult("OK")

    result.add_channel(channel_name="channel1", unit="Custom", value=status_value1, is_float=True, primary_channel=True, warning=0, is_limit_mode=True, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel1 failed")
    result.add_channel(channel_name="channel2", unit="Custom", value=status_value2, is_float=True, is_limit_mode=True, warning=0, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel2 failed")

    print(result.get_json_result())


Base code for a Python script sensor for PRTG.

Using this method, you write a script to scrape a web page, and then present the information to PRTG as JSON printed by the script.

The limits are used to define up/down/warning status.

In this case, I have binary outputs: 1 for working, 0 for not working.

So how does that work?   

Notice this bit.

result.add_channel(channel_name="channel1", unit="Custom", value=status_value1, is_float=True, primary_channel=True, warning=0, is_limit_mode=True, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="primary connection 

And breaking that apart, the section here
is_limit_mode=True, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel1 offline"

Let's break these down.

Note: these names aren't the same as those expected or presented for EXE/Advanced sensors on the PRTG custom sensor page.

is_limit_mode = Tells PRTG that the channel has limits that define an acceptable range of values.
limit_min_error  = This sets the lower limit that defines an error.  Depending on your output, there may not be one.  I'm using outputs of binary 0/1 in this case, so I set it to 0.5.  Therefore, a 0 output is defined as an error state.
limit_max_error  = This sets the upper limit that defines an error.  Depending on your output, there may not be one.  In my case, there is never an upper error, so I set it to 1.5, which a 0/1 output can never exceed.
limit_error_msg = The message you want PRTG to show for any channel that is in an error state.
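To make the 0.5 / 1.5 choice concrete, here is a small sketch of the limit check as described above.  It is only a model of the behavior, not PRTG's own code, and limit_status() is a name I made up for the illustration.

```python
def limit_status(value, limit_min_error=None, limit_max_error=None):
    """Return "error" if value falls outside the error limits, else "ok"."""
    # Below the lower error limit: error.
    if limit_min_error is not None and value < limit_min_error:
        return "error"
    # Above the upper error limit: error.
    if limit_max_error is not None and value > limit_max_error:
        return "error"
    return "ok"


# With the limits used in the script, 0 is an error and 1 is fine.
print(limit_status(0, limit_min_error=0.5, limit_max_error=1.5))
print(limit_status(1, limit_min_error=0.5, limit_max_error=1.5))
```

Putting the limits halfway between the two possible values means you never have to worry about an exact comparison against 0 or 1.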


So, with these settings set correctly, PRTG will report an individual sensor as down based on the values you assign to status_value1 and status_value2.  So now, you can alert based on those settings using normal PRTG alerting.


What this base script doesn't currently do:
  1. Parse anything.  Parsing the web page is based entirely on what you are looking for.  The page I was looking at was all table based, so splitting the data into tables made sense.  You will have to handle that portion.
  2. Deal gracefully with urllib.request.urlopen() errors.   You will get a JSON error in PRTG when you try to pull a web page you can't get.  That's a simple try/except statement.  Use this message in your except portion to report failure gracefully.

            result.add_error("Your Error Message Here")
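One way that try/except might look, as a rough sketch.  The fetch_page() helper, the 10 second timeout, and the wording of the message are all my own assumptions. It returns the error text instead of raising, so the caller decides what to tell PRTG.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Return (text, None) on success or (None, error_message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8"), None
    except (OSError, ValueError) as err:
        # URLError, HTTPError and socket timeouts are OSError subclasses.
        # ValueError covers malformed URLs and pages that are not valid UTF-8.
        return None, "Could not read {}: {}".format(url, err)
```

In the sensor script, a failed fetch would then skip the add_channel calls and pass the returned message to result.add_error() before printing result.get_json_result(), so PRTG shows your message rather than a JSON error.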



Second important thing...   probably the most important.

See this block? 

    location = json.loads(sys.argv[1])

    parsed = "http://" + str(location['host']) + "whatever else in the url"

This block accepts JSON data as input to the script.
The second part, location['host'], pulls the IP address set up on the sensor and feeds it into the script.   So this script can be written once and run against multiple devices.  That's what makes this script extendable.
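Because the input is just a JSON string in sys.argv[1], you can fake it on your own machine.  The sketch below assumes only the 'host' key shown above. The build_url() helper, the "/status" path, and the 192.0.2.10 address are made up for the example.

```python
import json


def build_url(raw_params, path="/"):
    """Build the target URL from the JSON string PRTG passes as sys.argv[1]."""
    params = json.loads(raw_params)
    # Only the 'host' key is used here; `path` is whatever your page needs.
    return "http://" + str(params["host"]) + path


# Simulate the argument PRTG would hand the script.
fake_argument = json.dumps({"host": "192.0.2.10"})
print(build_url(fake_argument, "/status"))
```

Running the real script by hand works the same way: pass a JSON string with a host key as the first command line argument.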

Now...   

So, you've got the initial script working.

How in the world do I troubleshoot this thing when I suddenly get a bunch of JSON errors when I deploy it?

That's the subject of another discussion.