
Thursday, December 5, 2019

PRTG: Making Custom Sensors to monitor strange things with Python


#version 1.0
#last modified 12/2/19
#
#v 1.0   initial revision / prtg integration
#
import sys
import json
import urllib.request

from paepy.ChannelDefinition import CustomSensorResult


if __name__ == "__main__":

    #PRTG passes the sensor parameters to the script as a JSON string
    location = json.loads(sys.argv[1])

    parsed = "http://" + str(location['host']) + "whatever else in the url"

    page = urllib.request.urlopen(parsed).read()

    #data comes out as binary type.
    #convert from binary to normal string
    np = page.decode('utf-8')

    #split the page into lines; how you parse from here depends on the page
    scrip = np.split('\n')

    #placeholder values -- set these while parsing the page
    status_value1 = 0
    status_value2 = 0

    for line in scrip:
        #parse each line here and set status_value1 / status_value2
        pass

    result = CustomSensorResult("OK")

    result.add_channel(channel_name="channel1", unit="Custom", value=status_value1, is_float=True, primary_channel=True, warning=0, is_limit_mode=True, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel1 failed")
    result.add_channel(channel_name="channel2", unit="Custom", value=status_value2, is_float=True, is_limit_mode=True, warning=0, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel2 failed")

    print(result.get_json_result())


Base code for a Python script sensor for PRTG.

Using this method, you write a script to scrape a web page and then present the information to PRTG as JSON output.

The limits are used to define up/down/warning status.

In this case, the outputs are a binary 1/0 for working/not working.

So how does that work?   

Notice this bit.

result.add_channel(channel_name="channel1", unit="Custom", value=status_value1, is_float=True, primary_channel=True, warning=0, is_limit_mode=True, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel1 failed")

And breaking that apart, look at this section:
is_limit_mode=True, limit_min_error=0.5, limit_max_error=1.5, limit_error_msg="channel1 failed"

Let's break these down.

Note: these names aren't the same as those expected or presented for EXE/Advanced sensors on the PRTG custom sensor page.

is_limit_mode = Tells PRTG that the output has an acceptable range of values.
limit_min_error = Sets the lower limit that defines an error.  Depending on your output, there may not be one.  I'm using binary 0/1 outputs in this case, so I set it to 0.5.  Therefore, a 0 output is defined as an error state.
limit_max_error = Sets the upper limit that defines an error.  Depending on your output, there may not be one.  In my case, there is never an upper maximum error, so I set it to 1.5.
limit_error_msg = The message you want PRTG to display for any device that isn't working.


So, with these settings configured correctly, PRTG will report an individual sensor as down based on the values you assign to status_value1 and status_value2.  From there, you can alert on those channels using normal PRTG alerting.


What this base script doesn't currently do:
  1. Parse anything.  Parsing the web page is based entirely on what you are looking for.  The page I was looking at was all table based, so splitting the data into tables made sense.  You will have to handle that portion.
  2. Deal gracefully with urllib.request.urlopen() errors.   You will get a JSON error in PRTG when the script tries to pull a web page it can't reach.  That's a simple try/except statement (a short sketch follows below).  Use this call in your except block to report the failure gracefully:

            result.add_error("Your Error Message Here")
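
Here's a minimal sketch of what that try/except might look like, assuming the same imports and the same parsed URL variable as the base script (the URL below is a placeholder for illustration only):

import sys
import urllib.request
import urllib.error

from paepy.ChannelDefinition import CustomSensorResult

result = CustomSensorResult("OK")

parsed = "http://192.0.2.10/status"     #placeholder URL

try:
    page = urllib.request.urlopen(parsed).read()
except urllib.error.URLError as err:
    #hand PRTG a proper error result instead of invalid JSON
    result.add_error("Could not retrieve " + parsed + ": " + str(err))
    print(result.get_json_result())
    sys.exit(0)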



Second important thing...   probably the most important one.

See this block? 

    location = json.loads(sys.argv[1])

    parsed = "http://" + str(location['host']) + "whatever else in the url"

This block accepts JSON data as input to the script.
The second part, location['host'], pulls the IP address configured on the sensor's device and feeds it into the script.  That means the script can be written once and run against multiple devices.  That's what makes this script extendable.
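
If you want to test that outside of PRTG, you can hand the script a hand-built JSON blob as its first argument.  The host address below is made up, and the real blob PRTG passes contains more keys than just host:

import json

raw = '{"host": "192.168.1.50"}'     #hypothetical test input

location = json.loads(raw)
parsed = "http://" + str(location['host']) + "whatever else in the url"
print(parsed)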

Now...   

So, you've got the initial script working.

How in the world do I troubleshoot this thing when I suddenly get a bunch of JSON errors after deploying it?

That's the subject of another discussion.

Friday, May 25, 2018

1,000 lines of Python


Did I ever think I'd intentionally write 1,000 lines of Python code?   Not really.   But I'm getting up there.

Python is pretty good for parsing through XML files and gathering the data.  From there, it can be used to compare that data to expected results.  Auditing.  
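
The general shape of that kind of audit looks something like this; the file name, XML paths, and expected values are made up for illustration:

import xml.etree.ElementTree as ET

#expected settings, keyed by the XML path to each one (all hypothetical)
EXPECTED = {"fuel/carwash/enabled": "true", "network/dns/primary": "8.8.8.8"}

tree = ET.parse("site_backup.xml")
root = tree.getroot()

for path, expected in EXPECTED.items():
    node = root.find(path)
    actual = node.text if node is not None else None
    status = "OK" if actual == expected else "MISMATCH"
    print(path + ": expected=" + expected + " actual=" + str(actual) + " [" + status + "]")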

When I first thought of the idea of auditing Verifone Commander configurations, I never contemplated what it would take in time, code, and labor.   It's been a lot of all three.  But now I'm almost up to 1,000 lines of code to audit a Verifone Commander system.  

I wish Verifone would make their equipment scale better.  Enterprise level management would be awesome.  Then I wouldn't have to cobble tools together using Python, Powershell, and AutoIt.  

So how does all this work?   AutoIt is used to automatically back up every single site.   Once the backups are complete, the audit script runs over the files to compare the settings against what they should be.   
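
A rough sketch of that second step, assuming the backups all land in one folder (the directory and naming convention are assumptions):

import glob

for backup in glob.glob(r"C:\CommanderBackups\*.xml"):
    print("auditing", backup)
    #run the per-file audit here (e.g., the ElementTree comparison sketched above)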


So at the point of originally writing this, the code was just barely reaching 1,000 lines.   It has since broken into numerous modules and is closer to 4,000 lines.  And I've still got about a dozen files to go.

Maybe I need to spend more time researching better Python coding, or a way to organize libraries better.  But going through one file with more than 1,000 lines of code is a pain, so it's easier to break things into separate modules. 


I guess the other part of this....  is it worth spending probably 40 hours writing an estimated 6,000 lines of code to audit a system? 

Yes, yes it is.


Monday, January 16, 2017

Automation

I've been trying to automate more stuff.  Most recently, I used AutoIt to delete 4,000 pages of fake assets, so that was a great win.  I set the system to show 200 items per page, and yet there were still 4,000 pages.  When an import would fail, the system would create 65,536 new assets.  Do that 12 times and you end up with roughly 800,000 assets.  That have to be deleted.  Manually.

Of course, if we were running the on-premises version of the software, you could use a simple SQL command.  But we migrated to the cloud, so that was out of the question.

If you can't do it with SQL, then it's time to do it with some other method. 

Now, with 4,000 pages of stuff, at about 1.5 minutes per page deletion (4,000 × 1.5 minutes ≈ 100 hours), it would have taken me about 12.5 eight-hour work days to get rid of the fake assets.  Not happening. 

In the end, it took me about 4 hours: 30-45 minutes to write the original script, then another 3.5 hours dealing with crashes of the script and making tweaks.  Granted, it was just a bunch of web page clicks, but giving the page sufficient time prevents those failures.  The main fix was adding delays between the clicks. 

Click one button.
Wait 2 seconds.
Move to another location.
Click.
Wait 2 seconds.
Click. 
Wait 4 seconds.
Wait a minute.
Start script at line 1.
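
The original was an AutoIt script, but the same click-and-wait pattern looks roughly like this in Python with pyautogui; the coordinates, delays, and page count here are all made up:

import time
import pyautogui

#coordinates were tuned to one specific machine -- these are placeholders
for page in range(4000):
    pyautogui.click(450, 300)    #click the delete button
    time.sleep(2)
    pyautogui.click(600, 420)    #move to and click the confirmation button
    time.sleep(2)
    pyautogui.click(450, 300)
    time.sleep(4)
    time.sleep(60)               #give the page a minute to settle before the next pass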

I wish I could have written a better script, but I haven't become good enough at that yet.  So the scripts I end up writing are generally very specific to the computer being used: exact screen coordinates and whatnot. 

Anyways.  Enough of automation. 

Just realize: repetitive IT work can often be automated.  It just takes time and effort.  And documenting the big-picture steps.   


Sunday, September 18, 2016

The Boring Details

I've been spending a lot of time contemplating automation recently.  Automating things is rather great.  But I think there is an unwritten side part to automation.  I'm going to write that down.

In order to automate anything, you must first document the entire process.

After reading that sentence, you are probably thinking a lot of sarcastic comments.  I'd like to agree with you, but the stupid-simple stuff is what most people miss in the first place.  How often have business classes shown case study after case study of ridiculous levels of bureaucracy that could be removed and processes that could be streamlined just by knowing the process? 

But then that involves a lot of boring drudgery.  That's the part that no one does.  It's a simple thing, but doing that simple thing is all that really needs to be done.   By the end of the documentation process, you've got an in-depth understanding of the events that take place.  Often along the way you start thinking about why certain things are done, and you realize just how much time you can save by automating.

I looked at the same idea when I was fighting the Windows Automated Installation Kit.  Sounded like a great idea, but I could never get the network drivers to work on my builds.  So I basically burned through a lot of crap and none of it worked. 

So after that, I went back to partial automation and partial manual.  If part of the process is copying files and creating directories, why not automate that?  A batch file is perfectly acceptable for that, and it becomes automatic and the same everywhere. 
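
The same copy-and-create idea sketched in Python instead of a batch file; the source and destination paths are placeholders:

import os
import shutil

SOURCE = r"\\server\deploy\standard_tools"   #placeholder source share
DEST = r"C:\Tools"                           #placeholder destination

os.makedirs(DEST, exist_ok=True)
for name in os.listdir(SOURCE):
    src = os.path.join(SOURCE, name)
    if os.path.isfile(src):
        shutil.copy2(src, os.path.join(DEST, name))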

I want to do the same thing with network discovery, but Python is giving me hell.  Something I'm not certain of is causing me problems.  I can't get the data file to create.  

Anyways, I guess this is the call to do boring but important things.  Documentation is boring, but it solves a world of problems.  It also gives you the ability to solve all sorts of problems in the future.  And it gives you the best ability: delegation.  If you have something well documented, you can delegate the task and hand it to someone else.