sameroom
2017-02-05 00:15
has joined #community201702


zehicle
2017-02-05 00:15
testing bot

2017-02-05 00:16
bot reverse

2017-02-05 07:53
@zehicle - thanks for the update. I'll look on Monday morning. Is there a threaded/batched way to engage with these sorts of conversations? Email list, web forum etc? IM is clunky when I'm online maybe 30hrs/week and in the wrong timezone.

2017-02-05 20:43
@gregoryo2008_twitter yes, there's an email list https://groups.google.com/forum/#!forum/digitalrebar

2017-02-05 20:44
we could also add you to the community slack channel (this channel is now cross linked to that one)

2017-02-06 00:43
@zehicle Okay I've joined the forum (crickets!). If I stick with Rebar after this week, I'll talk to you about the Slack channel.

greg
2017-02-06 00:44
@gregoryo2008_twitter: What is going to make your decision?

2017-02-06 00:55
Whoa, actual live chat! Good question - we've been working with Fuel and I'm casting about to see how other systems work. I'm entirely new to OpenStack so I'm not sure if I know enough yet to know what is going to make the decision!

greg
2017-02-06 00:56
our openstack is pretty raw, but making progress.

greg
2017-02-06 00:56
Where are you?

2017-02-06 00:57
Conceptually though, I'm interested to know how we'll end up managing and upgrading our nodes once we have a system in place. Getting it going initially seems like the basic functionality of a deployment system, but those other tasks are going to become quite important, especially since OpenStack has a quick and almost-enforced upgrade cycle.

2017-02-06 00:57
I'm in Western Australia.

2017-02-06 00:58
Puppet is our CMDB of choice, so we're hoping that whatever tool we use can cope with that either being integrated somehow, or bolted on afterward.

greg
2017-02-06 00:58
Yeah - those are good questions, that I'm not sure anyone has answered. I think we have a workable story, but we need people to start.

greg
2017-02-06 00:59
We have worked with that in the past. We can drive it or call out to puppet post provisioning. Either works.

2017-02-06 01:00
Hey have you got any recommendations for a good way to get up and running? I'm working through a Linux Academy training course, and it's built on Icehouse ): Clearly working with it is going to be ultimately the main method, but some cohesive guidance would be good too.

2017-02-06 01:00
Okay, good to hear that Puppet isn't going to be a square peg. Devil in the details, of course.

greg
2017-02-06 01:02
hmm - need to think about it. We may want to let you see our stuff. It will get the basics up, but still needs a little work around the actual compute and neutron side.

greg
2017-02-06 01:03
It gets you k8s along the way. :slightly_smiling_face: or :disappointed: depending upon your perspective.

2017-02-06 01:03
k8s?

greg
2017-02-06 01:03
kubernetes - most OpenStack deployment mechanisms are moving to it to manage the deployment of OpenStack.

greg
2017-02-06 01:04
You run the openstack components as containers managed by k8s.

greg
2017-02-06 01:04
Some have the goal of seamless container and vm interactions.

greg
2017-02-06 01:04
We'll see if that comes about.

2017-02-06 01:04
Oh goody, another thing to learn. We've been looking at k8s from afar wondering if we'd need to work out what it actually is.

2017-02-06 01:05
Is there a performance/manageability tradeoff at every turn with these decisions?

greg
2017-02-06 01:05
manageability.

greg
2017-02-06 01:06
performance shouldn't be hurt. Containers appear to be a good trade-off in that area. These are mostly long running containers, so perf isn't really a problem.

greg
2017-02-06 01:06
The manageability comes from the fact that k8s can help with availability and upgrades.

greg
2017-02-06 01:07
k8s works on manifests and you can change a manifest and the system will "automagically" move the containers to new versions. We'll see how well it goes in practice, but ...

greg
2017-02-06 01:07
So, you get some upgrade in there.

2017-02-06 01:08
Very promising in theory. Without knowing how the guts work it really is just magic to me for now.

greg
2017-02-06 01:09
To add fear, the openstack we are playing with is the open-source stuff with AT&T. They use helm, which is a manifest manager for k8s. It will hopefully get to where the helm charts will have upgrade actions like rolling out the next version: db upgrades, container updates, and .....

greg
2017-02-06 01:09
Our system will do it all - deploy k8s, helm, ceph, and then openstack.

2017-02-06 01:11
Haven't heard of helm - just found your vid on YouTube about it, will have a watch.

greg
2017-02-06 01:12
:slightly_smiling_face: be kind

2017-02-06 01:13
Portal 2 icon peeking through from the desktop is a nice touch (:

greg
2017-02-06 01:15
All coding makes Greg a sad boy.

2017-02-06 01:18
So I have tried to grok the relationship between Crowbar, OpenCrowbar and Digital Rebar... Crowbar is v1 and looked after by SUSE, OpenCrowbar (v2) became DigitalRebar (v3) - right?

greg
2017-02-06 01:18
Yep

greg
2017-02-06 01:19
V2 was started at Dell but died internally. 2.5 years ago Dell said they were done. Rob left Dell and I joined him again and started DigitalRebar from that base. Victor joined us soon after.

2017-02-06 01:19
Time to feed the :bear:!

greg
2017-02-06 01:20
I left Dell as 2.0 was being designed. I didn't want to deal with the transition to a private company.

2017-02-06 01:20
Time to feed the :bear:!

greg
2017-02-06 01:21
I created Crowbar with Rob. Victor was one of the first to join the team. We've been doing this awhile.

2017-02-06 01:22
Okay, all makes sense. Okay straight up: Watching a couple of videos I'm hearing terms like 'proof of concept' and wondering about maturity and readiness. We're looking to build a production cluster in the next few months (at least some of the team knows heaps more about OpenStack than me). Should we be considering Rebar now?

greg
2017-02-06 01:22
We've learned lots and learn more all the time. The Crowbar and some of the 2.0 design have issues. We've been amazed that SUSE continues to drive on, but they don't really change Crowbar and live with its warts, and are going to have issues trying to add additional workloads and support for newer systems.

2017-02-06 01:23
Yeah it's those warts that I don't have a clue about yet.

greg
2017-02-06 01:24
You can't debug an issue easily in CB1.0 and 2.0. DR has more atomic and segregated items.

greg
2017-02-06 01:25
Well, we are brutally honest at times about our state. We have people starting to use us for production with some pretty intense integrations. Should be able to talk about that more soon.

2017-02-06 01:26
It all sounds pretty good, so I'll keep hacking and see how it goes. Gotta go fight with the hardware networking config on my test box now, so I can hopefully browse to my newly installed DR.

greg
2017-02-06 01:26
I think your problem isn't going to be DR, but the stability of the systems you want to deploy. We can help with that.

greg
2017-02-06 01:26
networking always fun

2017-02-06 01:27
Do you have a sense of Fuel's community, software (design, operation etc) and direction?

2017-02-06 01:27
Someone told me that Mirantis are pulling out of Fuel.

greg
2017-02-06 01:28
In some regard, I think they are in flux. They aren't sure what they want to do there.

greg
2017-02-06 01:29
I've never completely bought into Fuel. Too merged together and not separable. We deployed Fuel to deploy OpenStack once. Very silly POC.

2017-02-06 01:41
Well that's exactly what we've been doing - last week a colleague created a few dozen VMs as a test - all on OpenStack deployed with Fuel. Seems to be going okay.

2017-02-06 01:42
But ongoing manageability is a question - looks like they use mcollective to kick locally installed Puppet manifests.

2017-02-06 01:43
Anyway, it's been excellent to talk to you, thanks for your time. Time for me to go and learn more.

greg
2017-02-06 01:44
:slightly_smiling_face: have fun.

2017-02-06 02:14
Hey @galthaus can I get an invite to community Slack? (If that's a better comms channel than this...)

2017-02-06 02:25
@gregoryo2008_twitter to my knowledge, old FUEL is not being maintained (it's really just cobbler + puppet). New FUEL is completely different (no upgrade path) and based on salt (and MaaS?). Also, it's OpenStack focused. We've been careful to be general purpose. That adds some complexity but makes us more robust too. That's why we've got ansible, puppet and chef all running together.

wdennis
2017-02-06 02:27
has joined #community201702

2017-02-06 02:28
@gregoryo2008_twitter we're betting that kubernetes is the better underlay and have been working to support the OpenStack-Helm efforts. It's not all there yet (Greg said the same thing earlier) but could get there pretty fast.

2017-02-06 02:28
https://www.youtube.com/watch?v=wZ0vMrdx4a4&index=2&list=PLXPBeIrpXjfjabMbwYyDULOX3kZmlxEXK

2017-02-06 02:59
What's this talk of 'old' Fuel and 'new' Fuel? There's https://www.mirantis.com/software/openstack/fuel/ and https://fuel-infra.org - is that what you mean?

2017-02-06 03:01
I don't think so. fuel-infra looks like the "big tent" version of old fuel. I think the new one is Fuel-Ccp https://github.com/openstack/fuel-ccp

2017-02-06 03:23
Ah, something else again. Not sure what you mean by big tent - open source community version is how I've heard it described, versus Mirantis' version. Anyway, thank you, this adds more grist for the mill.

2017-02-06 04:11
like most of OpenStack, it's a long story. Are you looking for DIY or a distro?

2017-02-06 04:36
We're looking to be able to deploy OpenStack using some sort of deployment tool that is both easy to get started, and customisable. We want to be able to upgrade it without massive effort every six months. We want to be able to automate as much as possible in ways that we can control - we have Puppet expertise and in-house usage already.

2017-02-06 04:38
I suspect we will want to be able to customise both the deployment platform's settings _and_ the OpenStack that it is deploying and managing. If we can do both with something like Puppet, all the better.

2017-02-06 04:45
I'd be happy to set up a 1x1 to see if there's a fit.

2017-02-06 04:46
IMHO, what you are describing is not just about OpenStack but broader ops. That's our approach. The point of the K8s underlay for Openstack is to help w/ upgrades of OpenStack but it can be used more broadly too.

2017-02-06 04:54
What do you mean a 1x1?

2017-02-06 13:37
a call or meeting instead of via the community chat

2017-02-06 21:54
@wdennis I checked out the redeploy process and recorded a demo of how it should work for the UX and CLI.

2017-02-06 21:54
ubuntu-16.04

2017-02-06 21:56
I had some issues at first because I was using the CLI with "rebar nodes commit X" after I changed the value and that was causing issues if the noderoles were still in process. Skipping the commit is OK for redeploy.
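(For anyone retracing this, the redeploy flow described above boils down to the bootenv flip shown later in this log - a sketch, with node id 12 and the bootenv name taken from that later example.)
```
rebar nodes update 12 '{ "bootenv": "sledgehammer" }'
rebar nodes update 12 '{ "bootenv": "ubuntu-16.04-ks-install" }'
# per the note above, the explicit "rebar nodes commit 12" can be skipped for a redeploy
```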

wdennis
2017-02-07 01:58
@zehicle Thanks for the redeploy vid

wdennis
2017-02-07 01:58
Am bringing up a new DR system to try things out on now

2017-02-07 02:09
@zehicle Ah okay, thanks will keep it in mind. Following through the deployment guide and videos for now to get a feel for it.

wdennis
2017-02-07 03:05
Hmmm, installing DR via the 'quickstart.sh' script as a non-priv'd user does not seem to work? UI comes up, but no objects within

wdennis
2017-02-07 03:06
Looks like I have all the needed containers running...

wdennis
2017-02-07 03:07

zehicle
2017-02-07 03:10
did you start it before as root? your permissions may be in a bad state

greg
2017-02-07 03:14
@wdennis - cd digitalrebar/deploy/compose

greg
2017-02-07 03:14
docker-compose logs -f rebar_api

greg
2017-02-07 03:14
or

greg
2017-02-07 03:14
docker-compose logs rebar_api > /tmp/rebar.log

greg
2017-02-07 03:15
That can give some hints to where we are at. Hung, or errored or skipped.

2017-02-07 03:16
I'm getting confused about choosing access mode - host or forwarder. I'm trying to create a metal admin node, to manage all metal nodes.

2017-02-07 03:17
"For a Metal or KVM booting dev-test add: --con-provisioner --access=FORWARDER"

2017-02-07 03:17
generally, you want to use --access=HOST

2017-02-07 03:17
unless you are trying to run VMs locally

2017-02-07 03:17
"Host mode ? is useful for systems that are managing ? joined nodes (VMs or physical nodes), or dedicated hosts"

2017-02-07 03:18
which page are you reading? I'll see about updating it. because you also want --con-dhcp

2017-02-07 03:18
http://digital-rebar.readthedocs.io/en/latest/deployment/questions.html#what-access-mode-should-i-use

2017-02-07 03:18
what are you trying to do?

2017-02-07 03:18
http://digital-rebar.readthedocs.io/en/latest/deployment/install/linux.html

2017-02-07 03:18
Create an admin node on metal, to manage systems all on metal.

2017-02-07 03:19
then you'll want host mode.

2017-02-07 03:19
Thanks

2017-02-07 03:19
I did see a --con-dhcp reference somewhere in the docs
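(For reference, a sketch of how the flags discussed here combine for a bare-metal admin node managing metal nodes - quickstart.sh, HOST access, plus the provisioner and DHCP containers; the exact flag set is assumed from this exchange, not verified against any particular release.)
```
./quickstart.sh --con-provisioner --con-dhcp --access=HOST
```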

wdennis
2017-02-07 03:19
@zehicle negative, downloaded quickstart.sh & then exec'd it as a non-priv'd user (one I made for DR, 'dradmin')

zehicle
2017-02-07 03:20
I think you may need that user to have sudoer rights

wdennis
2017-02-07 03:20
Oh, I see @greg responded - stand by, let me look...

wdennis
2017-02-07 03:21
@zehicle yes, the user does have sudo rights (in the 'wheel' group in CentOS 7)

2017-02-07 03:21
Ah yes, last question on deployment/questions.html in context of Provisioner - mentions DHCP and --con-dhcp

2017-02-07 03:21
@gregoryo2008_twitter you'll need to make sure you understand your DHCP / Network environment. Do you have another DHCP server on your network?

2017-02-07 03:21
Currently yes, but we're about to turn it off (:

2017-02-07 03:22
Here's a video about setting up DHCP - it MUST match your network environment

2017-02-07 03:22
https://www.youtube.com/watch?v=5YWMlYYuu-s&index=9&list=PLXPBeIrpXjfgurJuwVjZkcfmatCoXYM_v

2017-02-07 03:23
Okay I'll leap ahead and watch that first. I was planning to hit go and then watch vids - I'm up to 002

2017-02-07 03:23
that one is pretty important for bare metal.

2017-02-07 03:23
Yep, makes sense

wdennis
2017-02-07 03:23
@greg Seeing stuff in rebar_api logs like this:

2017-02-07 03:23
forwarder is used when you don't want to leak DHCP to your network

wdennis
2017-02-07 03:24

greg
2017-02-07 03:24
That seems like consul didn't start right.

greg
2017-02-07 03:25
hmmm -

greg
2017-02-07 03:29
@wdennis - have you confessed to me your setup?

wdennis
2017-02-07 03:31
Forgive me father for I have sinned... :wink:

wdennis
2017-02-07 03:32
CentOS 7.3, made a regular user 'dradmin', home = /home/dradmin

wdennis
2017-02-07 03:32
Added to 'wheel' group which grants sudoer rights

wdennis
2017-02-07 03:33

wdennis
2017-02-07 03:33
(into /home/dradmin)

greg
2017-02-07 03:33
curl | bash

greg
2017-02-07 03:33
curl | bash

greg
2017-02-07 03:33
curl | bash

greg
2017-02-07 03:33
okay or not

wdennis
2017-02-07 03:34
chmod +x, then ./drqs.sh --con-provisioner --con-dhcp --access=FORWARDER

greg
2017-02-07 03:34
hmm

greg
2017-02-07 03:35
Talk to me of your networking.

greg
2017-02-07 03:35
and what are you going to boot off this powerful tool that is DigitalRebar

wdennis
2017-02-07 03:36
Ansible all ran fine, except at end got:

wdennis
2017-02-07 03:36

greg
2017-02-07 03:36
checking something.

wdennis
2017-02-07 03:36
Have an awesome Dell PE2950 (newer one, yeah still old now but wth) with 16GB RAM

2017-02-07 03:37
Time to feed the :bear:!

wdennis
2017-02-07 03:38
server is just on a single flat network, but am planning on testing w/ KVM nodes on the server itself

greg
2017-02-07 03:38
You might be up.

greg
2017-02-07 03:38

greg
2017-02-07 03:39
nvm - you showed me stuff that shouldn't be up.

greg
2017-02-07 03:39
You can try it, but I don't think it is likely

wdennis
2017-02-07 03:39
I got nuthin'

wdennis
2017-02-07 03:40
(nothing in Deployments, Workloads, Networks, etc.)

greg
2017-02-07 03:40
You login though?

wdennis
2017-02-07 03:41
Yup, that worked

wdennis
2017-02-07 03:41
Except I authd at the web UI login, then got an additional login as so:

greg
2017-02-07 03:42
Yeah - we have challenges with that. We really need to get out of digest auth/ssl mode and move to basic/ssl.

wdennis
2017-02-07 03:42
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F41RWH3PU/screen_shot_2017-02-06_at_9.57.05_pm.png and commented: double login screen

greg
2017-02-07 03:42
It will clean that up.

greg
2017-02-07 03:42
Yeah. We have an issue open in the UX.

greg
2017-02-07 03:43
We use digest auth, but it really mucks up the browsers and since we switched to single page material design app, it gets confused.

greg
2017-02-07 03:43
Anyway ...

wdennis
2017-02-07 03:43
So lonely with nothing around...


greg
2017-02-07 03:44
can you check top on the system and see if it is pounding.

greg
2017-02-07 03:44
Do you have a system deployment?

wdennis
2017-02-07 03:46
Top looks OK to me...

greg
2017-02-07 03:46
ps auxwww | grep puma

wdennis
2017-02-07 03:46

2017-02-07 03:47
@zehicle Okay the DHCP stuff all makes sense. Regarding the option to collapse 'dhcp' and 'host' networks into one - what is the benefit of having them separate?

wdennis
2017-02-07 03:47

greg
2017-02-07 03:47
@gregoryo2008_twitter - well - it's style and history

greg
2017-02-07 03:48
@wdennis - something seems to be running very slow

wdennis
2017-02-07 03:48
@greg Ah, but what?

greg
2017-02-07 03:49
@gregoryo2008_twitter - the two ranges act as a discovery limiter for large environments.

greg
2017-02-07 03:49
The dhcp range has a set of short leases and nodes are only in that range until they are assigned a more permanent static-ish IP.

2017-02-07 03:50
Sure, but it appears that anon leases can be short, while the known ones be longer, even when in the same pool.

greg
2017-02-07 03:50
It is also the case that the ranges act as buffers for people who may not know all the DHCP things on their network.

2017-02-07 03:51
Something for us to keep in mind I guess. Sticking mostly to defaults for now!

greg
2017-02-07 03:51
Yeah - I'm not sure we've tried setting both anon and bound in the same range. It seems like it should work the way you described and what makes sense in my head, but ...

greg
2017-02-07 03:52
@wdennis - hmmm - can you restart the rebar_api container?

greg
2017-02-07 03:52
cd digitalrebar/deploy/compose

greg
2017-02-07 03:52
docker-compose restart rebar_api

greg
2017-02-07 03:53
docker-compose logs -f rebar_api

greg
2017-02-07 03:53
This is slow (but not 49 minutes slow). Rails apps start slowly.

wdennis
2017-02-07 03:58
OK, restarted, but still seeing:

wdennis
2017-02-07 03:58

greg
2017-02-07 03:58
That is "normal"

wdennis
2017-02-07 03:58
Oh good :stuck_out_tongue:

greg
2017-02-07 03:58
That part about the rails app coming up taking forever. That is it.

wdennis
2017-02-07 03:58
Ah

greg
2017-02-07 04:06
how about now?

greg
2017-02-07 04:06
same few lines?

wdennis
2017-02-07 04:07
This is where I'm at now:

wdennis
2017-02-07 04:08

greg
2017-02-07 04:09
Do the disks work? :slightly_smiling_face:

greg
2017-02-07 04:10
umm - sooo - it is trying to load the basic content.

greg
2017-02-07 04:10
You should see something like this:

greg
2017-02-07 04:11
```
rebar_api_1 | trueCalling cmd: /usr/local/entrypoint.d/25-load-initial-workloads.sh
rebar_api_1 | 2017/02/03 21:35:42 [INFO] serf: EventMemberJoin: 4afe50dc6d03 172.17.0.3
rebar_api_1 | Loading the core barclamp metadata
rebar_api_1 | 2017/02/03 21:35:52 [INFO] serf: EventMemberUpdate: 45f9fd79fd96
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/6fusion/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/burnin/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/efk-logging/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/deis/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/openstack/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/helm/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/heapster-monitoring/rebar.yml
```

greg
2017-02-07 04:11
kinda like that - that is my private play things, but ...

greg
2017-02-07 04:12
loading core is the first one though. Always.

greg
2017-02-07 04:12
okay - new issue to check.

greg
2017-02-07 04:12
docker-compose ps | grep rule

greg
2017-02-07 04:13
The rule-engine may be having problems - that start up script waits for it to start.

greg
2017-02-07 04:13
docker-compose restart rule-engine

greg
2017-02-07 04:14
might help as well.

wdennis
2017-02-07 04:16

greg
2017-02-07 04:18
docker-compose logs -f rule-engine

wdennis
2017-02-07 04:18
OK, that seemed to kick things, but from rebar_api log:

wdennis
2017-02-07 04:18

greg
2017-02-07 04:19
ok

greg
2017-02-07 04:19
That is okay. Apparently the dcos workload is now out of date and broke.

wdennis
2017-02-07 04:19
OK

wdennis
2017-02-07 04:20
Here's end of rule-engine log:

wdennis
2017-02-07 04:20

greg
2017-02-07 04:20
That looks better

greg
2017-02-07 04:21
hmm - wonder what caused this timing/startup mess up

wdennis
2017-02-07 04:21
And now we got stuffs in the UI :slightly_smiling_face:

greg
2017-02-07 04:21
The rule-engine hang-up is what was causing your system's problem.

greg
2017-02-07 04:21
again not sure why.

wdennis
2017-02-07 04:22
OK, so to roll back: it is supported to run the DR system as no non-priv'd user (albeit one with sudo rights)?

wdennis
2017-02-07 04:23
*as a non-priv'd user

greg
2017-02-07 04:23
yes

wdennis
2017-02-07 04:23
cool

wdennis
2017-02-07 04:23
right answer :slightly_smiling_face:

greg
2017-02-07 04:24
yeah - I got one finally right

wdennis
2017-02-07 04:24
blind squirrel and all that :wink:

wdennis
2017-02-07 04:27
Hmm... still noting showing under Networks? I should see something there, yes?

wdennis
2017-02-07 04:27
*nothing

2017-02-07 04:28
it can take time to get the networks there - it's part of the init sequence

2017-02-07 04:28
you should be able to watch them from the rebar api log when it uses the API to add networks

wdennis
2017-02-07 04:28
OK

greg
2017-02-07 04:33
They are after the core components.

2017-02-07 05:27
Hmm, that DHCP video said I could change configs before deployment, but they don't seem to have taken effect. I changed IPs in `compose/config-dir/api/config/networks/the_admin.json.mac` and then ran `./run-in-system.sh --deploy-admin=local --access=host --con-provisioner --con-dhcp --admin-ip=$IPA`, now https://$IPA/ux/#/networks/1 shows the default 192.168.124 network.

2017-02-07 05:29
From the web UX I tried to change them, and it gave an error and now "Ranges ... Error Loading Ranges"

greg
2017-02-07 05:43
--access=HOST

greg
2017-02-07 05:43
caps matter for that one.

2017-02-07 05:43
Ah, thanks. So, can I rerun and fix this, or should I blow it away (uh, how?) and start again. I could of course reformat the host.

greg
2017-02-07 05:44
with regard to the UX changes, I think we fixed them, but need to rebuild the revproxy container to pull the set of changes.

greg
2017-02-07 05:44
You can rerun and it will blow it away and start over.

2017-02-07 05:44
Excellent
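(A sketch of the corrected re-run - the same flags as the earlier attempt, with the access mode capitalized; $IPA is the admin-IP variable used above.)
```
./run-in-system.sh --deploy-admin=local --access=HOST --con-provisioner --con-dhcp --admin-ip=$IPA
```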

2017-02-07 07:37
Reading http://digital-rebar.readthedocs.io/en/latest/deployment/install/raid-bios.html I can't seem to download MegaCLI version 8.07.14 - only 8.07.07 is available.

2017-02-07 07:40
Heh, some guy at http://techedemic.com is hosting it.

greg
2017-02-07 13:48
hmm - someone else last night had problems, but eventually found them on the sites listed there.

greg
2017-02-07 20:12
come back @wdennis come back

wdennis
2017-02-07 20:12
I'm still joined & sharing...

greg
2017-02-07 20:12
ok

greg
2017-02-07 20:12
np

wdennis
2017-02-07 23:22
Ok, I fail... Trying to make a node in system inventory (i.e. has run sledgehammer) do an install into an existing deployment (which has my desired bootenv set). Went into Nodes, then clicked on "Move Nodes", selected desired deployment, then clicked on "Redeploy Nodes". The node rebooted, went into sledgehammer, and all node roles went green, but there she sits... How to get it to take the desired bootenv change??

wdennis
2017-02-07 23:56
OK, here's what I figured out... - in Deployments, go to Matrix tab, click the Bind Node Roles icon, then click on "provisioner-os-install" and click the "+" key to do the binding. This will add the node role in blue (proposed state) beside the node. Do the same thing for the "rebar-installed-node" role. - Then click on the node role, and click the "floppy" icon on each role to commit the proposed role. Once this is done, DR will take the appropriate actions to conform the node to the role.

wdennis
2017-02-08 00:00
So what I don't get is what a 'Deployment' is for? I would think it applies a collection of roles to a node, perhaps with per-deployment-specific node role attribute values? I could see the new (changed) ones sitting in 'proposed' state until someone commits them, but I don't understand why the new node roles don't get applied automatically to the nodes put into the deployment...

greg
2017-02-08 01:01
Oh - well - we need to talk about this more. This moves to training and operations with regard to what things mean. Can't right now, though - I'll try later.

2017-02-08 04:01
@wdennis this is not in an obvious place... there are some docs about this: http://digital-rebar.readthedocs.io/en/latest/api/common.html under 5.5.1.7.2.

wdennis
2017-02-08 04:29
@zehicle Is the 'rebar-installed-node' role a 'meta-role' (i.e. an aggregation of roles?)

wdennis
2017-02-08 04:31
Like in Ansible you can have a meta-role that includes 'n' other roles into a package; when you use the meta-role it applies the included roles in order defined...

wdennis
2017-02-08 04:33
I see from those API docs that the example was binding the one 'rebar-installed-role' role to the node, which was defined as 'a useful set of node roles'

wdennis
2017-02-08 04:42
And I see that on http://digital-rebar.readthedocs.io/en/latest/api/common.html in section 5.5.1.7.2.2 that one has to (should?) create a 'deployment to deploy the nodes into', but - why? What exactly is a deployment supposed to be? A collection of nodes that were installed with the same roles/attributes? (I was thinking it was a defined collection of roles that would be applied to a node when a node was put into the deployment)

wdennis
2017-02-08 04:48
There should be a way to define a collection of roles that conforms the target machines to an intended environment, which includes firmware, OS, and systems software, that serves as a base over the group of machines

wdennis
2017-02-08 04:48
Then other roles can be layered on top on selected machines as desired (like the k8s stuff for instance)

greg
2017-02-08 05:10
@wdennis - we have some things to discuss. With regard to DigitalRebar's models and how they are employed to get you where you want to go.

greg
2017-02-08 05:11
First that objects: Deployments, Nodes, Roles, NodeRoles, DeploymentRoles and Attributes.

greg
2017-02-08 05:12
Deployments = bags of nodes. They are just holding cells that collect nodes. We usually ascribe purpose to a deployment. The system deployment is the holding cell for discovered nodes. RSEnv1 - the nodes in RSEnv1.

greg
2017-02-08 05:13
Nodes = Things DR is operating on. VMs, machines, cloud instances, a linux instance for example.

greg
2017-02-08 05:14
Roles = Atomic functions that we want to apply to a node. These have dependencies and attribute requirements that are met to build a graph of actions to sequence against nodes.

greg
2017-02-08 05:14
NodeRoles = an instance of a Role applied to a Node within a deployment.

greg
2017-02-08 05:15
The matrix tab in the deployment is a visualization of this. Nodes are rows, Roles are columns. Cells are NodeRoles.

greg
2017-02-08 05:15
DeploymentRoles are special node roles for the Roles within a deployment. More on this in a second.

greg
2017-02-08 05:16
Attributes - typed pieces of information that are stored on the objects.

greg
2017-02-08 05:17
Attributes can live on noderoles, nodes, deploymentroles.

greg
2017-02-08 05:19
When a machine PXE boots to sledgehammer, sledgehammer creates a node in DR and places that node in the system deployment. It also adds the noop role rebar-managed to the node. This role has dependencies that cause the initial 15-ish roles to be added to the node. Sledgehammer marks the node active after putting an SSH key in place.

greg
2017-02-08 05:20
When the system added roles to the node, it created noderoles to represent that role's execution on the node. It also added deployment roles to the system deployment for those roles as well.

greg
2017-02-08 05:20
The annealer (the part that executes noderoles) sequences all the node roles and starts executing them in order.

greg
2017-02-08 05:21
To execute a node role, its attribute requirements must be met. All this means is that a role can require an attribute (like what OS should I install), and the annealer will try to find a value for that by checking things in order.

greg
2017-02-08 05:22
The order is node, noderole, deploymentrole, role. This way you can create defaults by node, by deployment, or globally.
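(A toy shell sketch of that lookup order - illustration only, not DigitalRebar code; the attribute name and values here are made up.)
```
#!/usr/bin/env bash
# resolve an attribute by checking node, then noderole, then deploymentrole, then role
declare -A node noderole deploymentrole role
role[target_os]="ubuntu-16.04"           # hypothetical global default on the Role
deploymentrole[target_os]="centos-7.3"   # hypothetical per-deployment override

lookup() {
  local attr="$1" scope ref val
  for scope in node noderole deploymentrole role; do
    ref="${scope}[$attr]"
    val="${!ref}"                        # indirect expansion into the named array
    if [[ -n "$val" ]]; then
      echo "$scope provides $attr=$val"
      return 0
    fi
  done
  echo "$attr is unset"
}

lookup target_os   # -> deploymentrole provides target_os=centos-7.3
```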

greg
2017-02-08 05:28
When you use the wizard to install the OS, you are really hiding about 5 steps. First, you are creating a deployment for tracking your nodes for this function set. Second, you are moving the nodes into the deployment for safe keeping. Third, you are adding noderoles and deploymentroles to the node and deployment. For the install OS wizard, this is done by adding the rebar-installed-node role (a noop) to the node. This also brings in the provisioner-os-install role through dependencies. Fourth, you are setting the target os attribute on the deployment role to make it apply to all nodes in the deployment. I think. It could be putting it on the node instead. I'd have to check. Either way, the target os is getting set. Fifth, the deployment and nodes are being committed, causing the actions to take effect.

greg
2017-02-08 05:29
FYI, you can see the deployment roles in the matrix view. They are the column headers. Those links take you to the deployment roles attribute setting pages.

greg
2017-02-08 05:30
nodes in a deployment do NOT have to all have the same roles assigned to them.

greg
2017-02-08 05:31
For example, a kubernetes deployment would have a set of etcd nodes, master nodes, and worker nodes. All of these have different node role assignments, but are all held within one deployment to represent the k8s instance.

greg
2017-02-08 05:32
Your Ansible analogy is not correct from an underlying tech perspective, but it probably does reflect what it looks like from the outside and isn't a bad way to think about roles requiring other roles.

greg
2017-02-08 05:33
You have conflated action and configuration in your last part

greg
2017-02-08 05:33
roles are actions, attributes are the configuration.

greg
2017-02-08 05:35
So there are roles to do bios setting, firmware flashing, raid configuration, ipmi configuration, OS installation, deploy kubernetes, ....

greg
2017-02-08 05:36
There are attributes that, for the instances defined in a deployment, control the configuration of that action for the set of nodes that have been declared to have that action done to them.

greg
2017-02-08 05:36
It turns out there is one more feature that makes more of what you want functional.

greg
2017-02-08 05:36
The Profile.

greg
2017-02-08 05:37
The Profile is an attribute/value list that overrides everything.

greg
2017-02-08 05:37
You can give a node a profile and it will use those attribute values as configuration.

greg
2017-02-08 05:39
This way you could build a profile that declares: I want a raid5 volume and a raid10 volume with the raid10 volume bootable, I want all the components at the latest levels, I want Ubuntu-16.04-ks, I want k8s version 1.5.3, ....

greg
2017-02-08 05:40
Then you apply all the roles to them. (Yes, we have a road map item that the profile should include a set of roles to apply to the node).

greg
2017-02-08 05:40
Add in some rule engine event handlers and all of this can be automated.

greg
2017-02-08 05:41
It sounds really complex, but it isn't too bad. Just takes a little to get you started with the right frame of reference.

greg
2017-02-08 05:41
That was a dump.

greg
2017-02-08 05:42
Got it? :slightly_smiling_face:

wdennis
2017-02-08 14:56
Epic - a lot to digest here...

wdennis
2017-02-08 14:57
"Digital Rebar - The Missing Manual" :)

greg
2017-02-08 15:34
I like to think it as the "DigitalRebar - An Epic Journey of a Thousand Poorly Worded Pages That You Can Not Find"

wdennis
2017-02-08 16:28
lulz

wdennis
2017-02-08 16:28
All things in time...

vlowther
2017-02-08 16:55
Yes, and the most important words are scattered about as comments in the source :slightly_smiling_face:

wdennis
2017-02-08 17:16
I created a glossary for myself; do these definitions make sense (or if not, how should I rewrite them?)
* "Annealer" - the system that executes NodeRoles
* "Barclamp" - Collections (associations, "bags") of Roles, each having a "Jig" that implements them, along with Attributes and possibly a wizard
* "Jig" - modular plug-in arch that abstracts configuration managers (i.e. Chef, shell, Ansible, "noop", etc) - "noop" jigs are milestones
* "Sledgehammer" - PXE boot img used to do initial bare-metal Node discovery & inventory
* "Node" - Object that DR operates on; Metal or VM system managed by DR
* "NodeRoles" - an instance of a Role applied to a Node within a Deployment
* "Deployments" - Collections (associations, "bags") of Nodes. We usually ascribe purpose to a Deployment.
* "DeploymentRoles" - special NodeRoles for the Roles included in a Deployment.
* "Workloads" - Barclamps with wizards
* "Profiles" - lets you create a block of Attributes that can be applied to Nodes to override NodeRole settings
* "Roles" - atomic functions (actions) that we want to apply to a Node; have dependencies and Attribute requirements that are used to build a graph of actions to sequence against Nodes; members of a Barclamp and are run by a Jig; can have Parent(s) and Children Roles
* "Attributes" - typed pieces of information that are stored on the objects. Attributes can exist on NodeRoles, Nodes, and DeploymentRoles. They are used as configuration parameters by the Role.

vlowther
2017-02-08 17:25
Looks pretty good.

greg
2017-02-08 17:26
@wdennis - great job on decoding the ramblings

wdennis
2017-02-08 19:32
Hey @greg - every time I hack on the Ubuntu kickstart template and want to give it another try (the installer is currently failing with an error), is it enough to simply reboot the node, or do I have to do the
```
rebar nodes update 12 '{ "bootenv": "sledgehammer" }'
rebar nodes update 12 '{ "bootenv": "ubuntu-16.04-ks-install" }'
```
dance?

greg
2017-02-08 19:35
You have to do the dance to rebuild the static files.

wdennis
2017-02-08 19:35

greg
2017-02-08 19:36
The files are not built upon request.

wdennis
2017-02-08 19:36
OK, thanks

2017-02-10 23:35
Hi! I've been watching crowbar for a while, and wondered if the current version would be suitable for running a small (16 node) bare metal lab? I'm basically looking to replace foreman. I'd need the ability to pxe boot, install various OSs - including 'odd' ones like CoreOS or fBSD, and have a nice webui for determining which nodes got (re-)installed with what OS

2017-02-11 00:41
@mech422 yes, that's exactly what we're talking about. CoreOS is an interesting one - totally possible except that we need to discuss if SSH to it is OK or not.

2017-02-11 00:42
here's a video (there are several before this one) that shows the reprovisioning process https://www.youtube.com/watch?v=fFsaOUbmb9g&index=10&list=PLXPBeIrpXjfgurJuwVjZkcfmatCoXYM_v

2017-02-11 00:46
@zehicle ahh - thanks... fBSD is a big one for us as well. a lot of our production nodes are running it

2017-02-11 00:47
I'm just trying to slide rebar in the lab to get a lil PR for it :-)

2017-02-11 00:48
the current list includes debian-7, debian-8 and scientificlinux-6.8. I know that @VictorLowther on our team would know the delta for fBSD.

2017-02-11 00:48
generally, linux flavors are manageable

rstarmer
2017-02-11 00:55
has joined #community201702

wdennis
2017-02-14 04:23
So, a couple of questions to ruminate on...

wdennis
2017-02-14 04:24
1) How to add new nodes (or reinstall existing rebar-managed nodes) to an existing deployment?

wdennis
2017-02-14 04:26
2) If you want to install a node with a given OS (let's say Ubuntu 16.04) but the only thing you want to change is the disk partitioning, what's the best way to handle that? Separate bootenvs with different templates, or the same bootenv with different templates?

wdennis
2017-02-14 04:37
3) Is there a way to compose templates where for instance you want to use the same basic overall master template, but insert sections inside that template which themselves could be templates? Again like for maintaining different standard disk partitioning layouts, multi-disk setups, default root passwords, etc.?

zehicle
2017-02-14 15:39
1) I assume you mean ones that are not in the database? You can use the rebar-join script from the node, or outside it if you have SSH access. You can also use the API/CLI to create the nodes in advance so they are pre-identified when they show up.
2) @vlowther may have a better idea. I think that can be handled by profiles where the attribs are different in each profile.
3) that's another question for @vlowther, but I believe yes.

vlowther
2017-02-14 15:50
2) There are a couple of different ways to handle that. Quickest is to have a new bootenv and a new seed template. Most flexible is to abstract the partitioning stuff out into a per-node attrib, and expand that in the template.

2017-02-14 15:52
3) Not as of yet, no. It is a good ask, tho -- the Go text/template language allows for such things, but I would need a reasonable way to expose that functionality.

2017-02-14 16:21
@mech422 I have not messed with freebsd in... a long time. The last time I tried playing around with any of the *bsds was approx. NetBSD 1.5.

2017-02-14 16:22
That said, if there is a way to do an unattended install that uses PXE and does not use NFS, we can probably handle it.

2017-02-16 05:24
I deployed rebar on an instance in AWS and then deployed k8s with rebar. How can I access the k8s dashboard ui? How can I log into the k8s linux instance? It takes the username ubuntu but asks for a password.

greg
2017-02-16 14:30
https://<IP of master node in k8s>/ui should get you the dashboard.

greg
2017-02-16 14:31
You will need to have https open for that security group.

greg
2017-02-16 14:31
From the AWS digitalrebar node, you should be able to login into the other nodes as root.

greg
2017-02-16 14:32
@tnkumar - hopefully, that helps.

greg
2017-02-16 15:45
Test

2017-02-16 15:47
@mech422 - strange. How many nodes are you booting at once? What is this running on? I've seen something like this before. Our simple go-based tftp server wigs out in some environments that I don't understand.

2017-02-16 15:48
- connection between gitter and slack fixed - sorry about the delays

2017-02-16 15:48
gitter users are welcome to guest accounts on slack

2017-02-16 15:48
what I mean is what is the base OS of your admin node.

2017-02-16 15:49
thanks @zehicle

2017-02-16 15:51
to create Slacks, just use https://sameroom.io/SLrYards

2017-02-16 15:51
@zehicle I just tried iptables -F and rebooted the 'test server' - still got dhcp and timeout with tftp

2017-02-16 15:52
@zehicle I'm only booting 1 'test server' at a time atm - I have a total of 16 test nodes available

greg
2017-02-16 15:53
@mech422 - what is your base OS? I think I can give you a workaround.

2017-02-16 15:54
Ubuntu 16.04

greg
2017-02-16 15:56
convenient

greg
2017-02-16 15:56
Let's try this.

greg
2017-02-16 15:56
We are going to do two things.

greg
2017-02-16 15:56
First we are going to move the tftp port that the provisioner is using to something else.

greg
2017-02-16 15:56
To do this, we need to do the following:

greg
2017-02-16 15:57
cd digitalrebar/deploy/compose

greg
2017-02-16 15:57
vi compose.env

greg
2017-02-16 15:57
add to the end of the file.

greg
2017-02-16 15:57
TFTPPORT=6699

greg
2017-02-16 15:57
save the file

2017-02-16 15:57
end of file? That file doesn't exist for me?

greg
2017-02-16 15:57
oops - common.env

2017-02-16 15:58
ok - added

greg
2017-02-16 15:58
docker-compose restart provisioner

greg
2017-02-16 15:59
once that is done.

greg
2017-02-16 15:59
you should be able to do:

greg
2017-02-16 15:59
ps auxww | grep provisioner-mgmt

2017-02-16 15:59
```
rebar@bernie:~/digitalrebar/deploy/compose$ sudo netstat -tulpn | grep 69
tcp6   0   0   :::443   :::*   LISTEN   18699/docker-proxy
udp6   0   0   :::69    :::*            25070/provisioner-m
```

greg
2017-02-16 15:59
it should look like this:

greg
2017-02-16 15:59
root 137266 0.0 0.0 1506036 29616 ? Sl Feb12 4:50 provisioner-mgmt --api-port 8092 --static-ip 136.179.33.28 --static-port 8091 --tftp-port 6699 --file-root /tftpboot

2017-02-16 16:00
still says 69....

2017-02-16 16:00
let me double check I saved the file

greg
2017-02-16 16:00
yeah - thinking .

greg
2017-02-16 16:00
okay - try this.

greg
2017-02-16 16:00
docker-compose stop provisioner ; docker-compose rm -f provisioner ; docker-compose start provisioner

2017-02-16 16:03
hmm - it keeps saying 'ERROR: no containers to start'

greg
2017-02-16 16:04
what directory are you in?

2017-02-16 16:04
dr/deploy/compose

greg
2017-02-16 16:04
flu induced haze.

greg
2017-02-16 16:04
docker-compose up -d provisioner

2017-02-16 16:05
ahh - rebuilding the image

greg
2017-02-16 16:05
you need to run another tftp server

2017-02-16 16:06
'another' as in tftpd-hpa ? or another dr container ?

2017-02-16 16:07
ohhh... that docker-compose up bombed

2017-02-16 16:07
'Image digitalrebar/dr_provisioner not found'

greg
2017-02-16 16:07
export DR_TAG=latest

greg
2017-02-16 16:07
yes tftpd-hpa

2017-02-16 16:08
this server is my foreman server too - so I have tftpd-hpa, isc-dhcpd, and bind9 already installed an working...I can restart them if needed

greg
2017-02-16 16:08
export DR_TAG=master

greg
2017-02-16 16:09
Let's get the provisioner running again.

greg
2017-02-16 16:09
It sounds like you know how to run tftpd-hpa already.

greg
2017-02-16 16:10
I need you to serve the .cache/digitalrebar/tftpboot directory.

greg
2017-02-16 16:10
I use this:
```
foundry@master-admin:~/digitalrebar/deploy/compose$ cat /etc/default/tftpd-hpa
# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/home/foundry/.cache/digitalrebar/tftpboot"
TFTP_ADDRESS="[::]:69"
TFTP_OPTIONS="--secure"
```
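(Assuming the stock tftpd-hpa package with systemd - a step the log implies but does not show - picking up the changed TFTP_DIRECTORY is just a service restart.)
```
sudo systemctl restart tftpd-hpa
sudo netstat -tulpn | grep ':69 '   # confirm it is serving on port 69 again
```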

2017-02-16 16:10
ok - provisioner is back up

2017-02-16 16:10
```
rebar@bernie:~/digitalrebar/deploy/compose$ sudo netstat -tulpn | grep 69
tcp6   0   0   :::443    :::*   LISTEN   18699/docker-proxy
udp6   0   0   :::6699   :::*            10089/provisioner-m
```
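(Pulling the working steps from this exchange into one sequence - a sketch assuming the same compose directory, common.env, port 6699, and master image tag that were settled on above.)
```
cd digitalrebar/deploy/compose
echo "TFTPPORT=6699" >> common.env   # move the provisioner's tftp port off 69
export DR_TAG=master                 # image tag used when recreating the container
docker-compose stop provisioner
docker-compose rm -f provisioner
docker-compose up -d provisioner     # recreate so the new env var takes effect
ps auxww | grep provisioner-mgmt     # should now show --tftp-port 6699
```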

greg
2017-02-16 16:10
Cool

2017-02-16 16:10
lemme restart tftpd-hpa real quick

greg
2017-02-16 16:11
make sure to change its base directory to the digitalrebar place.

2017-02-16 16:13
ok - dir changed, perms checked and service restarted

2017-02-16 16:13
shall I go restart the test node ?

greg
2017-02-16 16:13
yes

2017-02-16 16:17
odd - that gave a timeout too... I think I need to change the ip on tftpd-hpa - I had it pinned to the pxe interface, but I think DR is handing it the mgmt ip

greg
2017-02-16 16:17
probably

2017-02-16 16:18
ok - its listening on 0.0.0.0:69 now - lemme go reboot

greg
2017-02-16 16:18
we use the admin-ip flag

2017-02-16 16:21
yeah - I used that to specify the 'web ip' as I wasn't sure what was what...

2017-02-16 16:21
my 'mgmt' network and 'pxe' network are diff

2017-02-16 16:21
ok - so that got me pxelinux

greg
2017-02-16 16:21
okay - should boot into sledgehammer

greg
2017-02-16 16:21
a centos 7 ram image.

2017-02-16 16:21
but it seems to be stuck

greg
2017-02-16 16:21
The node should appear in the UX.

greg
2017-02-16 16:21
okay - sooooo

greg
2017-02-16 16:22
with regard to networking

greg
2017-02-16 16:22
is the pxe network routable to the mgmt network and vice versa.

2017-02-16 16:22
hmm - tftpd: read: Connection refused ?

2017-02-16 16:23
its sorta like a firewall setup...I have diff interfaces on the box - 1 for the 'pxe network' , 1 for the 'ipmi network' and 1 for the 'mgmt' network

2017-02-16 16:23
the rest of my lab gear can only hit the 'mgmt' network directly, so I need the web interface on that

2017-02-16 16:24
I have ipv4.ip_forward on, so I shouldn't need any routes, right ?

greg
2017-02-16 16:24
did you configure networks?

greg
2017-02-16 16:24
in digitalrebar?

2017-02-16 16:24
Umm - I tried too... I never got a 'the_bmc' network...

2017-02-16 16:25
but I added a 'pxe network' - and setup dhcp on it - it appears to be working, since we got dhcp boot

2017-02-16 16:25
last time I re-ran the install, I lost the 'admin-internal' network, though I don't know if thats used for anything

greg
2017-02-16 16:25
example more than anything else.

greg
2017-02-16 16:26
did you add a router in the admin network?

2017-02-16 16:26
tftpd-hpa is listening on all IPs now, so even if ips are baked into the DR boot stuff, it should respond....

2017-02-16 16:27
no routers added anywhere

greg
2017-02-16 16:27
okay - so my guess is that you gave pxe network address to the node, it tried to tftp to a mgmt-network ip.

greg
2017-02-16 16:27
it doesn't have a route for that.

greg
2017-02-16 16:28
Two ways to fix this.

2017-02-16 16:29
it shouldn't need a route? as both interfaces are on the same box ?

greg
2017-02-16 16:29
one is to add the admin node's PXE IP as the router for the network in the network page.

2017-02-16 16:29
ok - I can do that

greg
2017-02-16 16:29
it doesn't know where to send it though.

2017-02-16 16:30
I figured that 'linux responds to arps for all ips' thing would handle that - let me add the router and restart it :-)

greg
2017-02-16 16:30
the other is to rerun the startup script with the PXE ip as the admin ip. This will clean things up and you can then add your network back. We listen on 0.0.0.0 for ui and api functions.
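(A sketch of that second option - re-running the startup script with the PXE-facing address as the admin IP; the flags and the 192.168.51.x address are the ones that appear later in this log.)
```
./run-in-system.sh --deploy-admin=local --access=HOST --con-provisioner --con-dhcp \
  --admin-ip=192.168.51.70/24
```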

2017-02-16 16:31
ok - if this works, I'll re-install like that

2017-02-16 16:31
brb

greg
2017-02-16 16:31
i"m going offline for a while. Need to nap. Later

2017-02-16 16:32
thanks :-)

2017-02-16 16:32
sledgehammer is loading :-)

2017-02-16 18:04
@zehicle Weird - I stopped everything, purged everything, pulled fresh from git again, and ran: ./run-in-system.sh --help --con-provisioner --con-dhcp --account=rebar --access=HOST --admin-ip=192.168.51.70/24 --deploy-admin=local

2017-02-16 18:05
admin-internal still came up with the wrong ip ranges - but once I changed it to do dhcp for 192.168.51.0/24 - dhcp AND TFTPD work ?!?

greg
2017-02-16 18:15
yeah - we think there are some nat docker tftp issues when not all the ips align.

2017-02-16 18:18
oh - wb :-) How was your nap ?

2017-02-16 18:20
I think I only have 2 more things to fix before I can start provisioning: the 'discovery' stuff and the BMC/IPMI stuff

2017-02-16 18:20
I can see it booting discovery/vmlinuz - and it comes up to a centos login... but I'm not seeing any 'discovered' nodes in the webui

zehicle
2017-02-16 19:15
@mech422 that could mean that the booted system cannot connect back to the admin server to register

zehicle
2017-02-16 19:15
it would do it via port 443

2017-02-16 19:24
```
rebar@bernie:~/digitalrebar/deploy/compose$ sudo netstat -tulpn | grep 443
tcp6   0   0   :::443   :::*   LISTEN   31737/docker-proxy
```

2017-02-16 19:25
its only listening on ipv6 - but tftp was the same, and recognized ipv4 sooo...

2017-02-16 19:25
I added the router on the network

2017-02-16 19:26
oh - should the router have a /24 or /32 netmask ? the field auto-fills with a /32 on the end - but the admin-internal network creates a router with a /24 ?

greg
2017-02-16 19:36
24

2017-02-16 19:39
does the discovery image have a login ?

greg
2017-02-16 19:40
root/rebar1

greg
2017-02-16 19:40
journalctl -u sledgehammer

greg
2017-02-16 19:40
often has useful info

2017-02-16 19:43
dam, I was close - I tried rebar/rebar1 :-P

2017-02-16 19:44
I think its a dns failure - its setting domain name to local.neode then dying with an unknown hostname a lil bit further down

2017-02-16 19:45
doesn't appear it got as far as trying to talk to the admin node

2017-02-16 19:57
'Unable to connect to Rebar: unable to verify existance of machine-install user: Get https://192.168.40.70/api/v2/machine-install'

2017-02-16 19:58
looks like its banging it by IP

2017-02-16 20:05
Hmm.. that's weird - none of the interfaces have an ipv4 address - eth0 just has an ipv6

2017-02-16 20:11
dhclient says it got a (correct) dhcp address and bound it to eth0

2017-02-16 20:12
but ip address show eth0 only has an ipv6

2017-02-16 20:22
re-ran dhclient and got eth0 an ipv4 address, but its still not happy - its dying around line 105 of /tmp/startup.sh

2017-02-16 20:23
(I can't manually create nodes thru the webui either, if that tells you anything ?)

zehicle
2017-02-16 20:51
the webui only can create with providers

zehicle
2017-02-16 20:51
you can use the CLI to inject nodes

zehicle
2017-02-16 20:52
docker logs -f compose_rebar_api_1 may show you errors if it's related to the node create API

2017-02-17 15:07
Hi, I've been following the tutorial locally, but would like to test it out with packet.net however is the rackn100 code still valid?

greg
2017-02-17 15:45
@grealish - I thought so, but @zehicle would know better.

greg
2017-02-17 15:45
@mech422 - Thinking about your issue. Can you send me a snippet with the journalctl -u sledgehammer

greg
2017-02-17 15:55
@mech422 - I sometimes see issues like this with dueling dhcp servers. We get an initial load image, but then get a different dhcp response in sledgehammer and it is missing some parameters. Maybe?

2017-02-17 16:34
@galthaus Umm - I'd love to - can you give me a couple of hours? Slammed with the Friday morning meetings atm

greg
2017-02-17 17:24
I'm on and off. The whole flu thing is maybe done, but pretty fatigued so I come and go.

2017-02-17 20:22
@grealish yes! but I think it's good for $35


2017-02-17 20:27
connecting this to #digitalrebar on freenode




zehicle
2017-02-17 21:10
testing IRC connection


zehicle
2017-02-17 21:14
hello IRC users?!

2017-02-17 21:14
yes, IRC connection is working.

2017-02-17 22:05
sweet

2017-02-18 15:48
Morning

greg
2017-02-18 15:52
hi

2017-02-18 15:52
@zehicle Have you seen https://tumblr.github.io/collins/index.html ?

greg
2017-02-18 15:53
Well this is @greg

2017-02-18 15:53
oh sorry! Morning :-)

greg
2017-02-18 15:54
np. I suspect that all the cross connecting of apps will get things confused, but we'll see. :slightly_smiling_face:

2017-02-18 15:55
heh - yeah, we're sort of spoiled for choices lately - irc, slack,gitter, etc etc

2017-02-18 15:57
I've gotta figure out a cloud-init script to boot our 'standard' company golden images on openstack with ldap and everything working...

2017-02-18 15:57
hopefully, I should be done with that this weekend, and can get back into DR beginning of the week

greg
2017-02-18 15:58
okay - cool - good luck with that.

2017-02-18 15:59
thanks - regarding the sledgehammer logs - I'm 90+% certain theres only 1 dhcp server on the vlan...I ran dhclient manually from sledgehammer, and wasn't surprised by the ip it said responded

greg
2017-02-18 16:00
ok - the next step will be things like cat /proc/cmdline and some others to make sure all the options got sent with the right values.

2017-02-18 16:01
yeah - just re-ran dhclient and its the right dhcp server ip thats responding - ok, I can check the options real quick if ya like - i just ran it again

greg
2017-02-18 16:02
From the sledgehammer root login, cat /proc/cmdline

greg
2017-02-18 16:02
that should have sent some stuff.

greg
2017-02-18 16:03
cat /var/lib/dhclient/dhclient.leases

greg
2017-02-18 16:03
That should show the DHCP options.

greg
2017-02-18 16:04
Those are the inputs for the script.

2017-02-18 16:05
dhclient has multiple entries since I re-ran it multiple times - its the last entry in the file for current run, right?

greg
2017-02-18 16:05
it will probably try and match them all.

2017-02-18 16:06
blah - sledgehammer doesn't allow root ssh - lemme go create a user



2017-02-18 16:11
hmm - I could blow away the leases file and re-run dhclient?

greg
2017-02-18 16:14
ah = yes

2017-02-18 16:14
the last lease entry got cut off somehow - but it was basically the same as the one above it - option dhcp-rebinding-time, domain name, rebind, renew and expire times

greg
2017-02-18 16:14
on the adin nod

greg
2017-02-18 16:14
on the admin node, cd ~/.cache/digitalrebar/tftpboot

greg
2017-02-18 16:14
your addresses are 192......

greg
2017-02-18 16:15
ls C0*


greg
2017-02-18 16:16
remove all the 192.* C0\* files

2017-02-18 16:17
<root@bernie>:/home/rebar/.cache/digitalrebar/tftpboot# rm 192.168.51.200.ipxe C0A833C8.conf

greg
2017-02-18 16:17
yes and then there is a C0 file in pxelinux.cfg directory

2017-02-18 16:17
ok - removed

greg
2017-02-18 16:19
reboot the node

greg
2017-02-18 16:19
not the admin node
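(Consolidated from the steps above - a sketch assuming the stale per-node boot files live under ~/.cache/digitalrebar/tftpboot and the node's addresses are in 192.168.51.x; C0A833xx is the hex form of those addresses.)
```
cd ~/.cache/digitalrebar/tftpboot
rm -f 192.168.51.*.ipxe C0*.conf   # stale per-node ipxe/config files
rm -f pxelinux.cfg/C0*             # matching pxelinux.cfg entry
# then reboot the client/test node (not the admin node)
```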

2017-02-18 16:19
client/test node rebooting

2017-02-18 16:22
Hey! we have a node :-)

2017-02-18 16:23
and its in error state now :-)

2017-02-18 16:24
looking at the node - all steps up to 'rebar-managed-node' are green

2017-02-18 16:24
'rebar-managed-node' is yellow but 'amt-discover' is green

2017-02-18 16:25
bios-discover is yellow (HP Proliant G5 blades)

2017-02-18 16:25
ipmi-discover is yellow

2017-02-18 16:25
raid-tools-install is red

2017-02-18 16:25
raid-discover is yellow

2017-02-18 16:27
oh wow - gitter inlines gists ?

greg
2017-02-18 16:33
You need to read some more. :slightly_smiling_face:

greg
2017-02-18 16:33
The hardware tools need some downloads that we can't provide because of licensing.

greg
2017-02-18 16:34
yellow means pending - red means error - green means done.

greg
2017-02-18 16:34
raid-tools-install is the predecessor of most of the discovers.

2017-02-18 16:35
cool - I'll see if I can fuxor it around till its happy

greg
2017-02-18 16:35
```
Gregs-MacBook-Pro-2:provisioner galthaus$ ls ~/.cache/digitalrebar/tftpboot/files/raid/
8.07.14_MegaCLI.zip  SAS2IRCU_P19.zip  SAS2IRCU_P20.zip
```

greg
2017-02-18 16:36
MegaCLI and P20. There are links in the docs. Also if you click on the red icon, it may give you the link to the file to download.

2017-02-18 16:36
Can I just reboot the node after I get the firmware blobs ?

greg
2017-02-18 16:36
You can, or you can click the retry button on the red icon. If you get to the annealer, upper right button/icon in ux.

greg
2017-02-18 16:37
the errors are separate and there is a retry all button there.

2017-02-18 16:37
(bios update / configuration is a big one for me - we have 10k-15k metal nodes in 80 pops)

2017-02-18 16:37
sweet! thanks for the help!

greg
2017-02-18 16:37
okay - that is cool and good for us to hear. We may want to have a conversation at some point about support and hardware types.

2017-02-18 16:38
sure - I'd be happy to provide any info I can

2017-02-18 16:38
we're mostly a supermicro shop (looks like we'll be transitioning to dell in the future)

greg
2017-02-18 16:38
At that scale and size, you may want some consulting and contractual support.

greg
2017-02-18 16:39
We have the beginning of supermicro support and great dell support.

2017-02-18 16:39
Time to feed the :bear:!

2017-02-18 16:39
yeah - thats gonna be a hard sell, but we DO employ 2-3 full time fBSD core committers on staff

2017-02-18 16:39
so might be able to work something out that way - we do give back to stuff we use

greg
2017-02-18 16:40
ok

2017-02-18 16:41
anyway, I'll be happy to provide any info I can

greg
2017-02-18 16:42
cool

2017-02-18 16:42
once we get going, maybe I can get my boss talking to you guys

greg
2017-02-18 16:42
yeah - that is fine.

2017-02-18 16:44
@mech422

2017-02-19 20:25
@galthaus Just a heads up - after following: http://digital-rebar.readthedocs.io/en/latest/deployment/install/raid-bios.html the 'raid tools install' step is still red, and 'bios-discover' is still yellow

greg
2017-02-19 20:25
It has a log in the node role. It will probably tell you what is wrong.

2017-02-19 20:26
@galthaus Also, am I understanding correctly that rebar needs to control/configure the BMCs ? my IPMI/BMC network is a seperate DHCP network, currently not under rebar control

greg
2017-02-19 20:27
It won't since you don't have one configured.

greg
2017-02-19 20:27
It will look at your current settings though.

greg
2017-02-19 20:27
I think.

2017-02-19 20:27
@galthaus err... you mean a log on the node? or a log named after the node on the admin node ?

greg
2017-02-19 20:27
In the UI, you can click into the red icon and see the log of what went wrong.

2017-02-19 20:29
hmm - looks like the rpm version got bumped: caution: filename not matched: Linux/MegaCli-8.07.14-1.noarch.rpm

2017-02-19 20:29
lemme unpack the zip and see what's in there

greg
2017-02-19 20:29
hmm - thought I had potentially fixed this.

2017-02-19 20:31
oh - the download name was different too - it downloaded as 'Linux_MegaCLI_8.07.07.zip' so I symlinked it to '8.07.14_MegaCLI.zip'

2017-02-19 20:31
yeah - it extracts to 'MegaCli-8.07.07-1.noarch.rpm'

2017-02-19 20:34
btw - do I also need to download the dell/supermicro/etc bios tools ?

greg
2017-02-19 20:34
you will need the SUM (Supermicro Update Manager) tool, I believe.

2017-02-19 20:45
grrr... this boils down to Broadcom's site sucking, I think :-P the search doesn't seem to find 8.07.14 and some results claim MegaCLI 5.5 P2 is 'latest'

greg
2017-02-19 20:45
yes - I had the exact link earlier.

greg
2017-02-19 20:47
I use this

greg
2017-02-19 20:47

greg
2017-02-19 20:48
Then rename it to: 8.07.14_MegaCLI.zip in files/raid under the tftpboot directory.
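roughly this - the downloaded filename and path are placeholders for whatever the 8.07.14 zip arrives as:
```
cd ~/.cache/digitalrebar/tftpboot/files/raid
# give the download the name the raid-tools-install role expects
mv ~/Downloads/<megacli-8.07.14-download>.zip 8.07.14_MegaCLI.zip
# the SAS2IRCU zips keep their upstream names
ls
# 8.07.14_MegaCLI.zip  SAS2IRCU_P19.zip  SAS2IRCU_P20.zip
```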


greg
2017-02-19 20:49
is the other one.

greg
2017-02-19 20:50
In fact, updating barclamp info shortly.

2017-02-19 20:51
thanks - rebooting client node

2017-02-19 20:56
Woot! Node shows all green up to 'firmware flash'

greg
2017-02-19 20:56
cool

2017-02-19 20:57
'firmware-flash' 'ipmi-configure' and 'rebar-hardware-configured' are all blue

2017-02-19 20:57
guess I need the bios flash tools now ?

greg
2017-02-19 20:59
no

greg
2017-02-19 20:59
They are waiting to run.

greg
2017-02-19 20:59
It indicates that they are awaiting config.

greg
2017-02-19 21:00
You can commit those and they will run with their current config or you can make changes.

greg
2017-02-19 21:00
At this point, you usually add a workload (os install, or k8s or whatever) and when you commit, it pushes the node through the rest of the process.

2017-02-19 21:01
ahh - lets see if I can do an OS install...

greg
2017-02-19 21:01
firmware-flash may not do anything if nothing matches. ipmi-configure will set the root password to cr0wBar! and configure an IP if there is a BMC network.
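so once ipmi-configure has run, you should be able to talk to the BMC with those credentials, something like this (BMC address is a placeholder):
```
# query chassis power state using the password ipmi-configure just set
ipmitool -I lanplus -H <bmc-ip> -U root -P 'cr0wBar!' chassis status
```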

2017-02-19 21:03
Hmm - I like IPMI - saves me having to walk around rebooting stuff - I'll probably have to redo my ipmi network to play nice with rebar though

2017-02-19 21:18
well, that was painless :-)

2017-02-19 21:18
both foreman and rebar end up with unbootable systems on the HP nodes though

2017-02-19 21:19
something funky with the bios/grub I think - oddly enough, it works fine if you install manually via the installer

2017-02-19 21:21
Firing up one of the dell nodes now

greg
2017-02-19 21:27
What OS? Probably a tweak for the kickstart. We would like to know more about that.

greg
2017-02-19 21:27
Batman movie time

2017-02-19 21:27
ohhh - sounds good - I'm waiting for Dr. Strange to hit Vudu

2017-02-19 21:28
it's Ubuntu 16.04 (for some reason, that's the only OS option I get - could have sworn the CentOS ISO was there too)

2017-02-19 21:28
I'll try and track down the preseed issue - it's gotta be some setting in there somewhere

2017-02-19 21:37
Hmm - Dell C6100 cloud server nodes not booting after install either

2017-02-19 21:37
starting to think ubuntu 16.04 does a crap job with grub :-P

2017-02-19 22:29
I think I FOUND IT!

2017-02-19 22:29
the hp machine was installed with no partition flagged as bootable

2017-02-19 22:30
I marked /boot as bootable and it seems to be fine
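in case anyone else hits it - the fix was literally just the boot flag, something like this (assuming /boot is partition 1 on /dev/sda):
```
# flag partition 1 as bootable in the MBR
parted /dev/sda set 1 boot on
# then reinstall grub to be safe
grub-install /dev/sda
```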

2017-02-19 22:32
gonna check the Dell node now

2017-02-19 23:06
err - what's the default login for machines provisioned as Ubuntu 16.04? I've tried 'rebar/rebar', 'rebar/rebar!', 'rebar/cr0wBar!', 'root/rebar', 'root/rebar!', 'root/Cr0wBar!'....

greg
2017-02-19 23:52
rebar/rebar1

greg
2017-02-19 23:53
If we configure raid, then we mark the drive bootable.

2017-02-20 00:09
yeah - I tried rebar/rebar1 - but I'm rebooting and I'll give it another try

2017-02-20 00:09
I couldn't figure out which preseed is actually being used - there are 4 or 5 of them in the tftpd ubuntu/preseed dir

greg
2017-02-20 00:10
Regardless, from the admin node, passwordless root ssh should work.

greg
2017-02-20 00:10
if that doesn't work, then add your ssh key to the rebar-access data attribute and rerun the role.

greg
2017-02-20 00:10
That will allow you root access.
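i.e. roughly this (node address is a placeholder):
```
# grab your public key to paste into the rebar-access attribute
cat ~/.ssh/id_rsa.pub
# once the role has re-run, root ssh from the admin node should just work
ssh root@<node-ip>
```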

2017-02-20 00:11
I was trying to figure out how you set attributes....

greg
2017-02-20 00:11
propose the deployment the node is in.

greg
2017-02-20 00:11
Edit the attribute.

2017-02-20 00:11
ahh - ok

greg
2017-02-20 00:11
Commit the deployment.

2017-02-20 00:13
yeah - no bootable partition on the dell node either

2017-02-20 00:13
not sure why it even booted ?

2017-02-20 00:13
and it keeps powering off - hope it's not a hardware problem

2017-02-20 00:15
gitter to IRC bridge test....

2017-02-20 00:15
damn - it's not a three way thing

2017-02-20 00:16
IRC to gitter

greg
2017-02-20 00:16
Oh - we power things off.

2017-02-20 00:16
gitter to irc

2017-02-20 00:16
nm-> it's working. user error

greg
2017-02-20 00:16
@mech422 - DR powers things off by default.

greg
2017-02-20 00:16
if IPMI is configured and usable, DR will power off nodes that are "bored".

greg
2017-02-20 00:17
You can change that with the stay_on attribute on the node. It defaults to false. Set it to true and it will keep the node on.

2017-02-20 00:25
ok - so the boot issues appear to just be the /boot partition not being marked bootable in the MBR

2017-02-20 00:25
I'm trying to re-deploy now after having changed the attributes

2017-02-20 00:26
it wasn't happy with 'redeploy all nodes' after committing the deployment, so I'm powering the node back up before trying it again

2017-02-20 00:27
hmm - still an error

2017-02-20 00:33
and it turned the node off again

2017-02-20 00:33
it was about halfway through the 'node roles' thing - I don't think I can 'redeploy all' until the node is 'green' again, right?

2017-02-20 00:37
dinner time!

greg
2017-02-20 01:16
you can redeploy before that. It should not have turned the node off until all green. You may need to refresh the UI to get the latest data. Also, powering off the node marks it as not alive, and therefore not all green.

2017-02-20 15:37
@mech422 it's on my radar and I had a discussion with a past user. It looks like an internal project that was opened. Good validation of the overall type of workflow that we're building as a generic, cross-functional one.

rstarmer
2017-02-21 19:12
are there instructions for deploying DR on a Mac directly? I thought I'd seen them before, but I can't seem to find quite the right incantation.

greg
2017-02-21 19:53
there are and there aren't.

greg
2017-02-21 19:53
Docker made it much harder.

greg
2017-02-21 19:54
Their latest attempt to run Docker on Macs does NOT work well with what we need to do in the environment for networking. So, you need the boot2docker-based methodology, which is harder to find now.

rstarmer
2017-02-21 21:01
Ok, I can just deploy via VM then!

zehicle
2017-02-24 16:15
we made changes to the ansible install last night - there may be issues, we are investigating

zehicle
2017-02-24 16:15
we = I

2017-02-28 00:00
is there an ansible playbook for deploying DR? I see mention of ansible deployment here: http://digital-rebar.readthedocs.io/en/latest/deployment/install/ansible.html but the repo it refers to is nonexistent/private, and it mentions an Ubuntu 14 target system instead of 16, so I'm assuming the playbook was made private for not being up to date

zehicle
2017-02-28 03:41
Yes, in /deploy/digitalrebar.yml


zehicle
2017-02-28 03:43
run-in-system basically stages the Ansible run
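so, roughly this - a sketch only, the inventory name here is a placeholder, check the deploy README for the real invocation:
```
cd digitalrebar/deploy
ansible-playbook -i <your-inventory> digitalrebar.yml
```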

zehicle
2017-02-28 03:44
I'll review that page and update it. We've been focused on the run-in-system install

zehicle
2017-02-28 03:53
@lae - you are right, that page was out of date! Thanks for alerting me. I'm fixing it

zehicle
2017-02-28 04:01
wow - that page had a lot of old stuff. I've pruned it.

2017-02-28 04:54
@lae I've updated the page - try http://digital-rebar.readthedocs.io/en/latest/deployment/install/README.html as a source. We generally recommend using the quickstart script for first setup

2017-02-28 04:54
http://digital-rebar.readthedocs.io/en/latest/deployment/install/quick.html#quick-start
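the gist of the quickstart is fetching the repo and running the script from deploy/ - a rough sketch, see the doc above for the exact, supported command:
```
git clone https://github.com/digitalrebar/digitalrebar
cd digitalrebar/deploy
./quickstart.sh
```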

2017-02-28 17:05
Hm, I was hoping to be able to just add a role and set some variables to our infrastructure playbook

zehicle
2017-02-28 17:34
Depending on complexity - there aren't real docs on that yet. We do it by example and code checks.

zehicle
2017-02-28 17:34
On the list of things to document...

2017-02-28 20:21
Hello, I'm trying to install DR on metal with DNS and DHCP. I'd like to use a self-signed cert. I'm running the quickstart with the --validate_certs=False flag, but it's still attempting to validate. Any ideas? Thanks in advance!

greg
2017-02-28 20:26
Hi @richie9352 - what is still attempting to validate certs?

greg
2017-02-28 20:28
validate_certs is an ansible flag associated with the get_url task. We don't have that anywhere?

greg
2017-02-28 20:28
What command are you trying to run?

2017-02-28 20:28
Hi @zehicle - The quickstart script. Failed to validate the SSL cert for github-cloud.s3.amazonaws.com:443. I'm assuming this is because I'm running on metal, not AWS.

greg
2017-02-28 20:28
hmm - okay - checking

2017-02-28 20:30
Yes, I couldn't find any reference to this flag in the docs.

zehicle
2017-02-28 20:31
@richie9352 is the quickstart.sh not downloading or are you getting the script to start?

greg
2017-02-28 20:31
oh - S3 at Amazon is down starting at about 11:00a CST.

greg
2017-02-28 20:32
That would cause this error.

zehicle
2017-02-28 20:32
!!

2017-02-28 20:32
Yes the script is starting.

2017-02-28 20:32
Ahh! That would cause it

greg
2017-02-28 20:32
Looks like it is still down.

zehicle
2017-02-28 20:33
ouch - yes, worldwide impact

greg
2017-02-28 20:33
soooo - wait for that to get fixed. :slightly_smiling_face:

2017-02-28 20:33
Wow, that's not good


wdennis
2017-02-28 20:33
The cloud will make everything better & easier, they said... ;)

2017-02-28 20:33
Well, at least I know it wasn't my install to blame :)

greg
2017-02-28 20:33
Or my code :slightly_smiling_face:

2017-02-28 20:33
Haha! True

zehicle
2017-02-28 20:34
maybe I should plug the cloud back in...

wdennis
2017-02-28 20:34
Hybrid cloud ftw ;)

2017-02-28 20:35
obligatory relevant xkcd: https://xkcd.com/908/

2017-02-28 20:35
Thanks for the help guys!

greg
2017-02-28 20:36
:slightly_smiling_face:

2017-02-28 21:04
thanks amazon