Giant pain in the butt.
My build was on Ubuntu 16.04, and that works fine.
Lots of stuff is not mentioned in the documentation, or even thought about.
A few things I had to solve...
1) Elasticsearch needs to be on the same version
2) MongoDB needs to be version 3 or higher.
3) The web URI information is badly documented.
4) A cluster won't form until all servers are using the same database.
I set up a router in front of my setup, resulting in a NAT'd public IP address and the cluster behind. Gives plenty of room for growth without ever readdressing anything externally.
Based on recommendations, I changed the listening port to 12900. I'm not sure of the implications or the reasoning behind that, but it seems to work for me.
So, the private IP is the IP of the server itself.
The public IP is the NAT'd IP of the server.
rest_listen_uri = private ip/api/
web_listen_uri = private ip
web_endpoint_uri = public ip/api/
rest_transport_uri = private ip/api
Forgetting the /api/ on the web_endpoint_uri causes a specific issue: the "cannot POST" error.
Don't forget the /api/ on that setting.
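Here is a rough Python sketch of how the four settings above fit together, plus a check for the missing /api/ mistake. The function names, the http scheme, the example addresses, and reusing port 12900 on both the private and public side are my assumptions for illustration. Adjust them to whatever your NAT and config actually use.

```python
# Hypothetical helpers, not part of Graylog itself.
# Assumptions: plain http, and the same port on the private and public side.

def build_uris(private_ip, public_ip, port=12900, scheme="http"):
    """Build the four URI settings from the private and public IPs."""
    private = f"{scheme}://{private_ip}:{port}"
    public = f"{scheme}://{public_ip}:{port}"
    return {
        "rest_listen_uri": f"{private}/api/",
        "web_listen_uri": f"{private}/",
        "web_endpoint_uri": f"{public}/api/",
        "rest_transport_uri": f"{private}/api/",
    }


def missing_api_suffix(settings):
    """Return the names of API settings whose value does not end in /api or /api/."""
    api_keys = ("rest_listen_uri", "web_endpoint_uri", "rest_transport_uri")
    return [
        key for key in api_keys
        if not settings[key].rstrip("/").endswith("/api")
    ]


if __name__ == "__main__":
    # Example addresses are placeholders from documentation ranges.
    uris = build_uris("10.0.0.5", "203.0.113.10")
    for name, value in uris.items():
        print(f"{name} = {value}")
    print("missing /api/:", missing_api_suffix(uris))
```

Running the check before restarting the server is a cheap way to catch the "cannot POST" mistake early.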
Once that was done, I also load balanced the web interface using HAProxy.
Final thing about Graylog, and one I wondered about...
Once I've got the cluster up and working, do I need to load balance my UDP syslog traffic?
Short answer: probably not.
Though it doesn't look like the other servers in the cluster are doing anything, they are. An easy way to tell is to look at the number of messages waiting and processing. With 2 servers in my cluster taking 4,000 messages per minute, my waiting and processing counts sit in the dozens. There's never a queue.
So, with two low-power computers and a whopping total of 14 GB of RAM, I process 4,000 messages a minute with zero wait. That includes bursts of up to 6,000 messages per minute.
I'm not saying you won't need to load balance the UDP traffic. I'm saying you need to base it on your scenario and figure out how many messages per minute you are getting.
Generally, UDP is low-overhead network traffic anyway. The general problem is not that you can't receive enough data. It's that the data takes too long to process, resulting in a queue. That's the purpose of building a cluster.
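To put numbers on the queue idea, here is a small Python sketch of the arithmetic. The 4,000 and 6,000 messages per minute are my figures from above. The processing capacity of 5,000 per minute is a made-up number purely for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope queue arithmetic; not a Graylog API.

def per_second(messages_per_minute):
    """Convert a per-minute message rate to a per-second rate."""
    return messages_per_minute / 60


def backlog_after(minutes, incoming_per_minute, capacity_per_minute):
    """Messages left waiting after `minutes` at a steady incoming rate.

    A queue only builds when messages arrive faster than they are processed.
    """
    surplus = incoming_per_minute - capacity_per_minute
    return max(0, surplus * minutes)


if __name__ == "__main__":
    print(round(per_second(4000), 1), "messages/second at the normal rate")
    print(per_second(6000), "messages/second during a burst")
    # Hypothetical capacity of 5,000 messages per minute:
    print(backlog_after(10, 4000, 5000), "queued after 10 minutes at the normal rate")
    print(backlog_after(10, 6000, 5000), "queued after 10 minutes of sustained burst")
```

If the backlog number stays at zero for your real rates, load balancing the UDP input is probably not buying you anything.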
Total cost at this point? 3 cast-off computers that aren't good for much else. One of my computers has 6 GB of RAM and runs at 100% CPU usage, yet it still handles enough to reduce the load on the primary computer.