2017-05-01 16:13
I've done an isolated install, created a subnet (on my eth2, overlapping the subnet already on the NIC) and when the machine rebooted PXE, drp showed logs of it discovering it, but then the dr-provisioner process died and now the UI hates the default password.

2017-05-01 16:13
I restarted the process with the same command - ./dr-provisioner --etc...

2017-05-01 16:13
[sic dr-provision]

2017-05-01 16:14
for the webserver, go back to the root (w/o UI)

2017-05-01 16:15
BTW, got around the explode issue by turning off selinux. :(

2017-05-01 16:16
tried to just call the IP and port, but creds still failed.

2017-05-01 16:17
It forwarded to /ui

2017-05-01 16:17
dr-provision2017/05/01 16:16:46.489545 Bad auth header: Basic

greg
2017-05-01 16:20
okay - dr-provision died. I've been seeing some timeouts on DHCP writes. I'm going to address that. The UI hating the default password seems strange. For the UI, it is easiest to use https://<ip>:8092/ui/?token=rocketskates:r0cketsk8ts

greg
2017-05-01 16:20
at least for me.

2017-05-01 16:21
Thanks, Greg. token= works.
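
The "Bad auth header: Basic" log line earlier reads like a request whose Authorization header carried the scheme name but no credentials. That is an interpretation, not something confirmed in the chat. For comparison, here is a minimal sketch of what a well-formed HTTP Basic header value looks like (RFC 7617). The function name is made up for illustration, and whether dr-provision accepts exactly this form is an assumption.

```shell
# Build the value of an HTTP Basic Authorization header:
# the word "Basic", a space, then base64("user:password").
# basic_auth_header is a hypothetical helper, not part of drpcli.
basic_auth_header() {
    printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

basic_auth_header user pass   # prints: Basic dXNlcjpwYXNz
```

The `?token=user:password` URL form that worked above carries the same pair in the query string, which sidesteps the browser's header handling.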

greg
2017-05-01 16:23
For your selinux issue, was that your isolated install or the "production" install?

2017-05-01 16:23
only ever tried ISOLATED

greg
2017-05-01 16:24
okay - wow - so selinux prevents updates to a local directory. thanks. I'll add that info to the issue.

2017-05-01 16:26
My dhcp client host is gone. :(

2017-05-01 16:27
IPMI port not plugged in for that host. :(

greg
2017-05-01 16:27
yikes - did we make it worse?

2017-05-01 16:28
failing the DHCP -> TFTP handoff ... maybe the host will timeout somehow.

2017-05-01 16:28
Can't tell if "nexthost" ever made it to the client.

greg
2017-05-01 16:29
How many machines? I just found something on that myself. Trying to isolate and fix. I have something that makes it "better". Not sure it completely fixes it.

2017-05-01 16:29
Just one.

2017-05-01 16:29
I'm off to that datacenter at the end of the week anyway. More serial cables - belt and suspenders.

greg
2017-05-01 16:30
hmm - didn't see it with just one. More with 4 - though if drp is seeing extra dhcp requests, we'll still read them to ignore them. I have a timeout on writes that seems to make things worse. New code in a bit that removes that write timeout.

2017-05-01 16:31
There are two NICs on another host that are making DHCP client noise.

greg
2017-05-01 16:31
nope - still get it more. Research on my side.

2017-05-01 17:40
@galthaus Greg - do you have time to Skype. I only got 1 success out of 16. Default CentOS copy of kickstart is partitioning badly and disk is full even though it is a 1 TB drive.

wdennis
2017-05-01 17:51
@greg - On http://provision.readthedocs.io/en/latest/doc/arch/data.html#template I don't see entries for: .Env.OS.Family .Env.OS.Version - since they are used in the templates (at least for U16.04 preseed and post-install), perhaps they should be documented?

greg
2017-05-01 18:06
@intendo - not at the current moment. You are using DR. From Sledgehammer, you can see the drive order and what the drives are. Is there an extra USB drive or something messing with enumeration.

greg
2017-05-01 18:07
@wdennis - okay - Yes - they should be. The ubuntu/debian templates need them. Kinda. They are part of the bootenv values and on settable as parameters on the machine or profiles.

wdennis
2017-05-01 18:08
OK - maybe I'll talk to you guys about joining the docs team when I see you all in a few days - since I'm "beginner mind" and trying to learn this, I think I may be able to contribute

wdennis
2017-05-01 18:09
So, another q - what's the best (easiest) way of updating DRP to latest (stable)?

greg
2017-05-01 18:09
that would be cool. Just need to add it to the pile of things to get around to.

greg
2017-05-01 18:09
Have a PR on that in just a bit.

wdennis
2017-05-01 18:10
Want to try adding a profile for a group of machines to set default params, then apply to machines

wdennis
2017-05-01 18:11
Not sure I'm on a version that can do that...
[dradmin@dr-admin ~]$ ./drpcli version
Version: v2.9.1003-tip-73-8918047a73649afac5db926a254a8af88a8cefe5

greg
2017-05-01 18:11
drpcli profiles

wdennis
2017-05-01 18:12
```
[dradmin@dr-admin ~]$ ./drpcli profiles
Error: unknown command "profiles" for "drpcli"
Run 'drpcli --help' for usage.
```

greg
2017-05-01 18:13
don't have it. I was pretty sure, but now we know.

wdennis
2017-05-01 18:22
@greg - so the question is, how to upgrade an existing DRP install?

greg
2017-05-01 18:27
Yes - I know. I have a doc for that. I haven't committed it.

greg
2017-05-01 18:29
trying to render to show you.

wdennis
2017-05-01 18:30
Ah, OK

wdennis
2017-05-01 18:30
Thx

greg
2017-05-01 18:31
```
Upgrade

While not glamorous, you can install over the existing code and restart. That is about it. Here are a few more details.

Steps

For isolated Install, update this way:

1. Stop dr-provision:
     killall dr-provision
2. Return to your install directory
3. Run the install again:
     rm sha256sums
     # Remember to use --drp-version if you want something other than stable
     # Curl/Bash from quickstart if you truly believe, or this:
     tools/install.sh --isolated install
4. Restart dr-provision, as stated by the tools/install.sh output.

For non-isolated Install, update this way:

1. Stop dr-provision, using your system method of choice:
     systemctl stop dr-provision
   or
     service dr-provision stop
2. Install new code - However you installed before, do it again.
3. Start up dr-provision:
     systemctl start dr-provision
   or
     service dr-provision start

Version to Version Notes

In this section, notes about migrating from one release to another will be added.

v3.0.0 to v3.0.1

If parameters were added to machines or global, these will need to be manually re-added to the machine or global profile, respectively. The machine's parameter setting cli is unchanged. The global parameters will need to be changed to a profiles call.

     drpcli parameters set fred greg
   to
     drpcli profiles set global fred greg

v3.0.1 to v3.0.2

Nothing known to be required to be done.
```

greg
2017-05-01 18:31
something

wdennis
2017-05-01 18:48
OK, looks good?
```
[dradmin@dr-admin ~]$ ./drpcli version
Version: v3.0.1-0-730b0a596e1b6fa2103f52e3d19fb9c3f9b2a9af
[dradmin@dr-admin ~]$ ./dr-provision --version
dr-provision2017/05/01 14:33:39.707080 Version: v3.0.1-0-730b0a596e1b6fa2103f52e3d19fb9c3f9b2a9af
```

wdennis
2017-05-01 18:51
:cry: Still no "profiles" subcommand to drpcli?

wdennis
2017-05-01 18:53
```
Available Commands:
  autocomplete Rocket-Skates CLI Command Bash AutoCompletion File
  bootenvs     Access CLI commands relating to bootenvs
  files        Commands to manage files on the provisioner
  help         Help about any command
  interfaces   Access CLI commands relating to interfaces
  isos         Commands to manage isos on the provisioner
  leases       Access CLI commands relating to leases
  machines     Access CLI commands relating to machines
  prefs        List and set DigitalRebar Provision operational preferences
  reservations Access CLI commands relating to reservations
  subnets      Access CLI commands relating to subnets
  templates    Access CLI commands relating to templates
  users        Access CLI commands relating to users
  version      Rocket-Skates CLI Command Version
```

greg
2017-05-01 19:02
hmm - okay trying to look

greg
2017-05-01 19:03
of course not - it is still in tip. Need to cut a release to include it. Sigh.

greg
2017-05-01 19:03
Was hoping to have some additional updates with that.

wdennis
2017-05-01 19:44
No worries- will work with what I have atm

wdennis
2017-05-01 19:45
I do have a problem tho :confused:

wdennis
2017-05-01 19:46
Booted a node to SH, came up in the UX, set the bootenv to U16.04 install; rebooted on PXE and booted back into SH again...

wdennis
2017-05-01 19:47
What's the l/p for SH again? Not "rebar / rebar1" or "rocketskates / RocketSkates"

greg
2017-05-01 19:48
root / rebar1

wdennis
2017-05-01 19:49
I still have the "ubuntu-16.04-install" bootenv showing in UX, think it got messed up by update??

greg
2017-05-01 19:50
Update won't update the templates or bootenvs.

wdennis
2017-05-01 19:51
Ok, logged in to the node which is up on SH, it does have the expected IP addr...

wdennis
2017-05-01 19:52
The UX doesn't show the nodes MAC addr, that's what it actually keys off of right?

greg
2017-05-01 19:52
no - ip

wdennis
2017-05-01 19:52
Really

greg
2017-05-01 19:53
dhcp uses mac to map to ip. templates are rendered by ip usually.

wdennis
2017-05-01 19:53
Thought it would be MAC...

wdennis
2017-05-01 19:53
Yes, OK

greg
2017-05-01 19:53
because of the pxelinux chain.

wdennis
2017-05-01 19:54
So you are depending on DHCP to hand out the same IP between reboots (which should be the case...)

greg
2017-05-01 19:54
yes .


wdennis
2017-05-01 19:57
So you can see the BootEnv is set to the U16.04 install...

wdennis
2017-05-01 19:58
Rebooting the node again, will see what happens...

greg
2017-05-01 19:59
ok

greg
2017-05-01 19:59
hmm - not sure.

wdennis
2017-05-01 20:02
Nope, got SH again...

greg
2017-05-01 20:04
In the UI, check the machines listing and see if two nodes have that IP.


greg
2017-05-01 20:05
it got a different IP.

wdennis
2017-05-01 20:07
Only during that phase I guess -- it has the "correct" one now on my DHCP server...

greg
2017-05-01 20:08
Okay - sooooo - I know what is going on, I think, and we do some special "magic" in our DHCP server to deal with it.


wdennis
2017-05-01 20:09
No leases recorded for 192.168.1.111

greg
2017-05-01 20:09
Are those bindings (reserved) or are they current state?

wdennis
2017-05-01 20:10
Current state

greg
2017-05-01 20:10
DHCP servers are supposed to pay attention to the client identifier field of the request.

greg
2017-05-01 20:11
Our DHCP server only pays attention to the chaddr field - MAC Address.

wdennis
2017-05-01 20:12
My DHCP comes from a pfSense OS box

greg
2017-05-01 20:12
That way pxelinux, linux, ipxe - get the same ips.

greg
2017-05-01 20:12
looking.


wdennis
2017-05-01 20:13
This has always worked on Cobbler, and DRP 2.9

wdennis
2017-05-01 20:13
(192.168.1.148 is IP of DRP server)

greg
2017-05-01 20:14
okay - not sure then.

greg
2017-05-01 20:15
But the assignment of the IP isn't from DRP

greg
2017-05-01 20:15
Not sure why that would change anything.


wdennis
2017-05-01 20:18
Here's how I started DRP

wdennis
2017-05-01 20:24
Doing a packet cap to see where that IP is coming from...

greg
2017-05-01 20:28
yeah - with that, we shouldn't be running anything on port 67.


wdennis
2017-05-01 20:58
Not quite sure how to interpret this...

wdennis
2017-05-01 20:59
I see the pfSense router pinging IPs .1.111, .1.123, and finally .1.104 during the PXE boot process

wdennis
2017-05-01 21:00
Those IPs correspond to IPs I see on the node while it is PXE-booting

greg
2017-05-01 21:02
Your DHCP server in this trace:

greg
2017-05-01 21:02
Got a discover and started the process by pinging 1.111.

greg
2017-05-01 21:02
It then got a request, I don't know with what content.

greg
2017-05-01 21:03
It then got another discover, to which it started a ping on 1.123.

greg
2017-05-01 21:03
It timed out and responded with an Offer.

greg
2017-05-01 21:03
The offer was requested and acked.

greg
2017-05-01 21:03
60 seconds or so later. a replease on 1.123 was done and soon there after the process repeats but with 1.104.


greg
2017-05-01 21:05
Can you open the first request and check the server ip/id?

greg
2017-05-01 21:05
You may have two dhcp servers running.

wdennis
2017-05-01 21:05
Looks like the 3rd packet was a request for IP .1.111

wdennis
2017-05-01 21:07
I don't see the "Offer" packet from the pfSense box...

wdennis
2017-05-01 21:07
(Or anywhere...)

greg
2017-05-01 21:07
yeah - It isn't in the list.

greg
2017-05-01 21:08
in that first request, there should be a server id or server identifier option? That should be the IP of the DHCP server that gave the offer that request is using. Since the packet is broadcast, it is supposed to be from a broadcast offer or a very late stage renew.

wdennis
2017-05-01 21:08
Should only be one DHCP server running on this net - the pfSense router

wdennis
2017-05-01 21:08
(Which is where the packet cap came from)


wdennis
2017-05-01 21:11
DHCP server ID is 192.168.1.254, which is the router

greg
2017-05-01 21:11
that looks good.

greg
2017-05-01 21:12
On the request or discover that gets 1.104, can you look at the client identifier option (97), I think,

greg
2017-05-01 21:12
sorry 61

wdennis
2017-05-01 21:18
Now I'm pissed off enough to throw a hardware tap on the Ethernet segment to the server :rage:

wdennis
2017-05-01 21:18
We'll get all them packets yet!

greg
2017-05-01 21:18
:slightly_smiling_face:

greg
2017-05-01 21:19
Did the client identifier look different for the second set of IP assignments?

2017-05-01 21:19
Minimum "inactive" lease time is 7200 sec? I can't reduce it to debug?

2017-05-01 21:19
Or is that just a gui thing, and I can do what I want on CLI?

2017-05-01 21:19
inactive = reserved

2017-05-01 21:20
that is, once it's successfully leased.

greg
2017-05-01 21:20
oh - different person. My brain kicked in. Sorry. checking.

2017-05-01 21:20
I erased all my leases on DRP, and I'm waiting kinda patiently.

2017-05-01 21:20
:-)

greg
2017-05-01 21:20
no - reserved - is for things handed out by explicit reservation.

greg
2017-05-01 21:21
active is for unknown things regardless of whether we keep handing the same thing back.

greg
2017-05-01 21:21
But, let me check the mins.

2017-05-01 21:21
Then perhaps I'm confused by terminology. I thought "active" meant the DHCP server is actively trying to establish that IP on the client.

2017-05-01 21:21
and "inactive" means it's set - reserved or dynamic.

greg
2017-05-01 21:21
no

greg
2017-05-01 21:21
bad terms possibly on our part.

greg
2017-05-01 21:22
The "Active" lease time is for the addresses in the "Active" range of the subnet. The "Active" range of the subnet is used for unknown / unreserved nodes that are trying to DHCP

greg
2017-05-01 21:23
Reserved Lease time is for entries that have explicit Reservation objects in the database that map a MAC to an IP.

2017-05-01 21:23
gotcha. not standard protocol definitions.

greg
2017-05-01 21:24
probably not. matches DR previous definitions.

greg
2017-05-01 21:24
7200 is hard-coded minimum on back end.

greg
2017-05-01 21:24
60 is hard-coded minimum on back end for active.
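
To make the two floors concrete, here is a minimal sketch of the clamping described above. The function name and its arguments are hypothetical and not part of dr-provision. Only the 60 second floor for active leases and the 7200 second floor (read here as applying to reserved leases, which is how the question was framed) come from this conversation.

```shell
# Hypothetical illustration: dr-provision enforces these minimums on the
# back end; this helper only mirrors the arithmetic.
effective_lease_time() {
    kind=$1
    requested=$2
    case "$kind" in
        active)   minimum=60 ;;    # unknown/unreserved nodes in the subnet's active range
        reserved) minimum=7200 ;;  # MACs with an explicit Reservation object
        *) echo "unknown lease kind: $kind" >&2; return 1 ;;
    esac
    if [ "$requested" -lt "$minimum" ]; then
        echo "$minimum"
    else
        echo "$requested"
    fi
}

effective_lease_time reserved 300   # prints 7200
effective_lease_time active 300     # prints 300
```

So asking for a 300 second reserved lease while debugging still yields 7200 seconds, which matches what the UI reported.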

2017-05-01 21:24
Makes sense of "reserved."

2017-05-01 21:25
and 60 is fine, as long as the DHCPd doesn't timeout waiting to set the value.

greg
2017-05-01 21:25
May want to expose those as preferences at some point.

2017-05-01 21:25
because then there's possibly a race condition - between "active - unset" and "active - set"

greg
2017-05-01 21:25
Yeah - 60 is tight in my opinion, but matches what we've run

greg
2017-05-01 21:26
for DR.

2017-05-01 21:26
If my dhcp client wakes up again ($DEITY willing) and asks for an IP, what's the chance it'll get sledge or something on it? Any special magic to install centos-7?

greg
2017-05-01 21:27
undiscovered, high for sledgehammer.

2017-05-01 21:27
other than drpcli bootenvs install centos and setting it as the default?

greg
2017-05-01 21:27
Already discovered, gets what bootenv is set to.

greg
2017-05-01 21:27
not default.

2017-05-01 21:27
I see no "machiens"

greg
2017-05-01 21:28
yes default.

greg
2017-05-01 21:28
sorry

greg
2017-05-01 21:28
hmm no machiens then sledgehammer didn't finish or get loaded. Soooo, sledgehammer is the dream of a booting unknown machine.

2017-05-01 21:33
So, sledgehammer is in a dream state? nmap and ping show nothing

greg
2017-05-01 21:34
well - if the node pxes, then you should get sledgehammer. SSH should be open and root/rebar1 should allow you in, if it is the real sledgehammer.

2017-05-01 21:34
sad face

2017-05-01 21:35
http://provision.readthedocs.io/en/latest/doc/workflows.html <- nicely done.


2017-05-01 21:37
Hey guys, so I'm still trying to get the "provisioner" up. I've started over from scratch running the following: `sudo ./run-in-system.sh --deploy-admin=local --con-provisioner --con-dhcp --access=HOST --admin-ip=10.54.4.118/23` and everything seems to have worked. When I run `docker-compose ps` I can see "compose_provisioner_1" and it says it is up. But in the UX, I still do not see the "Provisioner" tab. any thoughts?

wdennis
2017-05-01 21:37
Don't understand it - I see the node do a discover, don't see an offer back, but then do a request for .1.111

greg
2017-05-01 21:39
@wdennis - yeah - I don't understand that

greg
2017-05-01 21:40
@spencerwjensen - provisioner tab requires revproxy to find the provisioner.

greg
2017-05-01 21:41
https://ip of admin node/health

greg
2017-05-01 21:41
That should show some nice things.

greg
2017-05-01 21:41
Also in the ux deployments, is the system deployment green or yellow?

wdennis
2017-05-01 21:42
@greg - maybe this is the problem??


wdennis
2017-05-01 21:44
See a bunch of these in a row, then see the node booting sledgehammer


greg
2017-05-01 21:45
so - we don't build those files.

greg
2017-05-01 21:46
We rely on IPs. We have never done that for DRP and DR hasn't done it for years.

wdennis
2017-05-01 21:47
So that's a normal part of the PXE process?

greg
2017-05-01 21:48
yeah - pxelinux walks a set of files to get its info.

wdennis
2017-05-01 21:48
Ah

greg
2017-05-01 21:48
eventually getting to pxelinux.cfg/default

wdennis
2017-05-01 21:48
Yes, I see that

greg
2017-05-01 21:49
The preceding ones are IP-based, then that mac file, and then default.
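
For reference, the file walk can be sketched as below. The order follows the Syslinux documentation, which tries the MAC-based name before the IP-based ones and finishes with default (the client UUID lookup that comes first is left out). The function name is invented for this sketch, and the MAC and IP are just sample values.

```shell
# List the config file names pxelinux requests over TFTP for one client.
# pxelinux_candidates is a hypothetical helper written for this sketch.
pxelinux_candidates() {
    mac=$1
    ip=$2
    # ARP hardware type (01 = Ethernet) plus the MAC, lower case, dash separated
    echo "pxelinux.cfg/01-$(echo "$mac" | tr 'A-F:' 'a-f-')"
    # Split the dotted quad into $1..$4
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    # The IPv4 address in upper-case hex, dropping one digit per attempt
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    while [ -n "$hex" ]; do
        echo "pxelinux.cfg/$hex"
        hex=${hex%?}
    done
    # The catch-all file
    echo "pxelinux.cfg/default"
}

pxelinux_candidates 00:25:90:ed:a2:04 192.168.1.111
```

Since DRP only renders the IP-based files, a node that comes back from DHCP with a different address finds no match for the new address and falls through to default.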

2017-05-01 21:49
from /health: ``` {"Map":{"dhcp-mgmt-service":["10.54.4.118:6755"],"dns-mgmt-service":["172.17.0.7:6754"],"rebar-api-service":["172.17.0.11:3000"],"rule-engine-service":["172.17.0.2:19202"]},"Matcher":{"dhcp-mgmt-service":"^dhcp/(.*)","dns-mgmt-service":"^dns/(.*)","rebar-api-service":"^rebar-api/(.*)","rule-engine-service":"^rule-engine/(api/.*)"},"Default":"rebar-api-service"} ```

greg
2017-05-01 21:50
@spencerwjensen, that would do it.

wdennis
2017-05-01 21:50
Well, still have NO idea where the node is getting a -.1.111 IP from...

2017-05-01 21:50
and when I check the system deployment it is actually red right now... in the past it was always yellow, never green... but now it is red because of dhcp-mgmt_service.

greg
2017-05-01 21:51
@spencerwjensen - you have a problem somewhere. Those have timed out and are red.

greg
2017-05-01 21:51
cd digitalrebar/deploy/compose

wdennis
2017-05-01 21:51
Let's boot a different node & see what happens...

greg
2017-05-01 21:51
docker-compose logs -f provisioner

greg
2017-05-01 21:51
and see what spits out if anything. It could be looping and failing.

greg
2017-05-01 21:52
Consul is up and some things seem to have registered. So, that is good.

2017-05-01 21:53
> to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh >provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current >provisioner_1 | Dload Upload Total Spent Left Speed >100 18 100 18 0 0 3420 0 --:--:-- --:--:-- --:--:-- 3600 >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh >provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current >provisioner_1 | Dload Upload Total Spent Left Speed > 0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out

2017-05-01 21:54
it appears to be timing out trying to connect to s3?

greg
2017-05-01 21:54
yes - yes it does

greg
2017-05-01 21:54
It attempts to get sledgehammer

2017-05-01 21:55
I am behind a proxy but I have set the proxy environment variables for the system, for docker, and for yum.

2017-05-01 21:55
are there any specific proxy settings I need to set elsewhere?

greg
2017-05-01 21:55
hmmm - okay - just a minute.

greg
2017-05-01 22:00
This is a bug like thing.

greg
2017-05-01 22:01
in digitalrebar/deploy/compose

greg
2017-05-01 22:01
cat access.env

greg
2017-05-01 22:01
does it have any proxy vars?

2017-05-01 22:02
>USE_OUR_PROXY=YES >EXTERNAL_IP=10.54.4.118/23 >FORWARDER_IP= >CONSUL_JOIN=10.54.4.118 >DR_START_TIME=1493670448 >RUN_NTP=YES

greg
2017-05-01 22:02
So add to that:

greg
2017-05-01 22:02
UPSTREAM_HTTP_PROXY=$http_proxy UPSTREAM_HTTPS_PROXY=$https_proxy UPSTREAM_NO_PROXY=$no_proxy

2017-05-01 22:03
oh nice!! okay, once added do I need to restart any services?

greg
2017-05-01 22:03
Replace the $http_proxy, $https_proxy and $no_proxy with your items

greg
2017-05-01 22:03
yes - this:

greg
2017-05-01 22:03
docker-compose restart provisioner
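
The edit described above can be sketched as a small helper. The three UPSTREAM_* names are the ones given in the chat. The helper name, and the choice to copy the values from the current shell's proxy variables, are assumptions made for illustration.

```shell
# Append upstream proxy settings to the compose access.env file.
# add_upstream_proxy_vars is a hypothetical helper, not part of digitalrebar.
add_upstream_proxy_vars() {
    env_file=$1
    # Drop earlier UPSTREAM_* lines so the function can be re-run safely
    if [ -f "$env_file" ]; then
        grep -Ev '^UPSTREAM_(HTTP|HTTPS|NO)_PROXY=' "$env_file" >"$env_file.tmp" || true
        mv "$env_file.tmp" "$env_file"
    fi
    {
        echo "UPSTREAM_HTTP_PROXY=${http_proxy:-}"
        echo "UPSTREAM_HTTPS_PROXY=${https_proxy:-}"
        echo "UPSTREAM_NO_PROXY=${no_proxy:-}"
    } >>"$env_file"
}

# Usage, from digitalrebar/deploy/compose:
#   add_upstream_proxy_vars access.env && docker-compose restart provisioner
```

Writing literal values rather than `$http_proxy` references matters here, because an env file is read as plain KEY=value pairs and is not expanded by a shell.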

greg
2017-05-01 22:04
Soooo what is going on and an issue to open -

greg
2017-05-01 22:05
The system canhaz a web proxy. When it does, it uses that for everything. You can specify the upstream for our proxy to use, but it tries to get it from the local system. squid/webproxy container. The problem is that the provisioner bypasses that proxy on start up to load sledgehammer and uses those upstream vars. Those aren't set when using our internal proxy.

greg
2017-05-01 22:06
So we have a start-up race condition when proxies are involved. I guess we haven't been testing it that much since we left Dell.

2017-05-01 22:06
Time to feed the :bear:!

2017-05-01 22:07
Ha ha!! That's funny! I'm at Intel, so I totally know what you mean! :-P

greg
2017-05-01 22:09
@spencerwjensen - I may want to talk with you at some point about hardware. and find out more what you are doing.

wdennis
2017-05-01 22:09
Whelp, same thing on a different node... boots 1st time to SH as expected, but then after setting the bootenv to U16.04-install, it continues to boot SH every time thereafter...

2017-05-01 22:12
Happy to chat offline! I work in the Data Center Solutions Group. Currently using Cobbler, and Ansible, amongst other tools to manage racks of servers in our labs.

greg
2017-05-01 22:14
yeah - @wdennis - I'm not sure how this ever worked for you. I think the DHCP server is working correctly, but we make this work with ours. The client identifiers are different and the server gives out different addresses unless you MAC lock them in the DHCP server.

greg
2017-05-01 22:14
You could try that if it is an option. Bind the mac to the same address all the time.
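
On a plain ISC dhcpd, binding a MAC to one address is done with a host declaration. The sketch below just prints one. The helper name, the host label and the MAC/IP pairing are made up for illustration, and a router front end such as pfSense would normally manage this through its own static mapping screen rather than a hand-edited file.

```shell
# Print an ISC dhcpd "host" declaration that pins one MAC to one IP.
# dhcpd_host_stanza is a hypothetical helper; values are examples only.
dhcpd_host_stanza() {
    printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n}\n' "$1" "$2" "$3"
}

dhcpd_host_stanza node1 00:25:90:ed:a2:04 192.168.1.104
```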

greg
2017-05-01 22:15
cobbler may be building the mac files to work with pxelinux / dhcp badness. I'll need to think about that.

2017-05-01 22:15
So I added the proxy info as you said and restarted the service but still seeing a timeout in the logs. Here's what I added:
>USE_OUR_PROXY=YES
>EXTERNAL_IP=10.54.4.118/23
>FORWARDER_IP=
>CONSUL_JOIN=10.54.4.118
>DR_START_TIME=1493670448
>RUN_NTP=YES
>UPSTREAM_HTTP_PROXY=http://<url>:<port>/
>UPSTREAM_HTTPS_PROXY=http://<url>:<port>/
>UPSTREAM_NO_PROXY=<no_proxy list>

greg
2017-05-01 22:16
checking again.

greg
2017-05-01 22:19
hmm - that should have worked.

greg
2017-05-01 22:21
well - that is sad. So, we can do this instead.

greg
2017-05-01 22:21
when you run this do you get stuff?

greg
2017-05-01 22:21
ls ~/.cache/digitalrebar/tftpboot

2017-05-01 22:22
yup!

2017-05-01 22:22
>[root@master compose]# ls ~/.cache/digitalrebar/tftpboot/ >files ipxe.efi ipxe.pxe isos machines nodes pxelinux.cfg sledgehammer

2017-05-01 22:22
in the "root" directory right?

greg
2017-05-01 22:22
okay - you will need to do a couple of things but give me a minute.

greg
2017-05-01 22:23
yeah - that is fine. That is mounted into the containers to avoid downloading all the time.

2017-05-01 22:23
oh gotcha!

2017-05-01 22:24
sidebar.. is this Rob? or Greg? every time you type a message I see 2 names pop up! LOL

wdennis
2017-05-01 22:24
Hah @spencerwjensen - another Cobbler/Ansible guy, nice to meet you :grinning:

2017-05-01 22:24
:-P Likewise @wdennis

greg
2017-05-01 22:25
Greg

wdennis
2017-05-01 22:25
@greg Idk, everything was working swell with DRP 2.9 on the same router... maybe something in the upgrade mangled something??

greg
2017-05-01 22:25
if you do slack, I can invite you to that instead. It is "better".

2017-05-01 22:25
LOL! yes yes! I love slack! :-)

greg
2017-05-01 22:26
maybe - but I'm not sure. @wdennis

wdennis
2017-05-01 22:26
As in, more people use Slack :wink:

2017-05-01 22:26
LOL

greg
2017-05-01 22:26
Let me check.

greg
2017-05-01 22:26
spencerwjensen I need an email.

greg
2017-05-01 22:27
@spencerwjensen: - this as root

greg
2017-05-01 22:27
```
# Get sledgehammer
TFTPROOT=~/.cache/digitalrebar/tftpboot
PROV_SLEDGEHAMMER_SIG=a42c8c66a60b77ca1c769b8dc7e712f6644579ed
PROV_SLEDGEHAMMER_URL=http://opencrowbar.s3-website-us-east-1.amazonaws.com/sledgehammer
SS_URL=$PROV_SLEDGEHAMMER_URL/$PROV_SLEDGEHAMMER_SIG
SS_DIR=${TFTPROOT}/sledgehammer/$PROV_SLEDGEHAMMER_SIG
mkdir -p "$SS_DIR"
# Only download if the checksum list is not already cached
if [[ ! -e $SS_DIR/sha1sums ]]; then
    (curl -fgL -o "$SS_DIR/sha1sums" "$SS_URL/sha1sums")
    # Fetch every file named in the second column of sha1sums
    while read -r f; do
        (curl -fgL -o "$SS_DIR/$f" "$SS_URL/$f")
    done < <(awk '{print $2}' <"$SS_DIR/sha1sums")
    # Verify the downloads; drop the list so the next run retries
    if ! (cd "$SS_DIR" && sha1sum -c sha1sums); then
        echo "Download of sledgehammer failed or is corrupt!"
        rm -f "$SS_DIR/sha1sums"
        exit 1
    fi
fi
```

wdennis
2017-05-01 22:27
I guess I'll burn it down and start over again tomorrow with a fresh copy... it's dinner time now

2017-05-01 22:27
oh right! spencerwjensen@gmail.com is my personal or spencer.w.jensen@intel.com. Either is fine to use for slack.

greg
2017-05-01 22:28
Yeah - I need to head out as well. I'll make a note to diff through some things, but not sure.

greg
2017-05-01 22:28
sent to gmail

greg
2017-05-01 22:31
I need to head home and do dad things. Back later tonight.

2017-05-01 22:31
:-) thanks for the help! That last script seems to be pulling stuff! I'll join the slack channel and talk to you later!

2017-05-01 22:31
thanks again!

greg
2017-05-01 22:32
np - @wdennis - I'll try and put out a release tonight with "fixes" for a lot of your requests.

spencerj
2017-05-01 22:32
has joined #community201705

spencerj
2017-05-01 22:52
in my system deployment, the "dhcp-mgmt_service" role is currently red and is currently set to "null". are there any examples of what this should look like?

greg
2017-05-01 23:17
once the script finished, did you restart the provisioner?

greg
2017-05-01 23:17
@spencerj - you will need to restart the provisioner. Once that finishes, you will need to retry the dhcp server role. You can do that from the annealer page in the UI (the spirally button in the top right).

wdennis
2017-05-01 23:21
Super frustrating- after totally wiping & reinstalling DRP, the nodes refuse to boot anything but SH still, even when bootenv is reset... :rage:

greg
2017-05-02 02:49
oh - sorry. I'm really certain that it is and I wonder if the lease cache in your dhcp server could be messing with it. It may not be simple to add a mac write out function. It might only work for lpxelinux.0, but that is what you are using. I think that is why we dropped it. The others didn't use it. Anyway, I'll think about it.

greg
2017-05-02 02:50
@intendo - let's try and hook up. Tuesday.

wdennis
2017-05-02 04:02
@greg - I intend to fire up my old Cobbler system tomorrow (just have to get some OS installs done!) and I'll do a packet capture on one of the nodes I install, we'll see just what Cobbler does...

greg
2017-05-02 04:08
sorry - be back stepped. I don't know what is going on.

greg
2017-05-02 04:19
I don't see any changes in the log that would account for this, but ...

wdennis
2017-05-02 04:21
OK, digging in to my pfSense router that is serving the DHCP; I see the following 3 stanzas in the dhcpd.leases file (running ISC dhcpd 4.2.6 on FreeBSD):
```
lease 192.168.1.111 {
  starts 1 2017/05/01 21:23:51;
  ends 1 2017/05/01 21:25:26;
  tstp 1 2017/05/01 21:25:26;
  cltt 1 2017/05/01 21:23:51;
  binding state free;
  hardware ethernet 00:25:90:ed:a2:04;
  uid "\000\000\000\000\000\000\000\000\000\000\000\000%\220\355\242\004";
}
lease 192.168.1.123 {
  starts 1 2017/05/01 21:24:44;
  ends 1 2017/05/01 21:25:15;
  tstp 1 2017/05/01 21:25:15;
  cltt 1 2017/05/01 21:24:44;
  binding state free;
  hardware ethernet 00:25:90:ed:a2:04;
  uid "\001\000%\220\355\242\004";
}
lease 192.168.1.104 {
  starts 1 2017/05/01 21:25:26;
  ends 1 2017/05/01 23:25:26;
  tstp 1 2017/05/01 23:25:26;
  cltt 1 2017/05/01 21:25:26;
  binding state free;
  hardware ethernet 00:25:90:ed:a2:04;
}
```

wdennis
2017-05-02 04:22
(I put them in the order that the IPs showed up in the PXE boot console output, which is correlated by the "starts" time)

wdennis
2017-05-02 04:24
Not sure, but looks like only the "uid" value (or the lack of one) is the differentiating factor? Wonder if ISC dhcpd hands out a different IP addr lease for every tuple of MAC ("hardware ethernet") and uid value (or lack of uid value)

wdennis
2017-05-02 04:26
Do you know where the "uid" value is getting set from - the discovery packet?

greg
2017-05-02 04:43
yes - this is what I've been saying but not well. We had this problem with ISC DHCP. We wrote our own for this. ISC has an option that basically says ignore uid and stick with mac only.

greg
2017-05-02 04:45
uid is option 61 in the discovery packet. The pxe prom on the nic, the kernel, isc dhcp server, and lpxelinux can and sometimes do use different ones. You have three in this case.
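
The uid strings in the leases file are easier to compare once decoded. Here is a minimal sketch that turns the octal-escaped form into hex bytes. It assumes the only escapes present are three-digit octal ones, as in the stanzas above, and the function name is made up.

```shell
# Decode a dhcpd.leases uid string (octal escapes plus literal characters)
# into colon-separated hex bytes. decode_uid is a hypothetical helper.
decode_uid() {
    s=$1
    out=
    while [ -n "$s" ]; do
        case $s in
            \\[0-7][0-7][0-7]*)
                # \NNN is one byte written in octal
                oct=${s#?}
                oct=${oct%"${oct#???}"}
                byte=$(printf '%02x' "$((0$oct))")
                s=${s#????}
                ;;
            *)
                # Any other character stands for its own byte value
                ch=${s%"${s#?}"}
                byte=$(printf '%02x' "'$ch")
                s=${s#?}
                ;;
        esac
        out=${out:+$out:}$byte
    done
    echo "$out"
}

decode_uid '\001\000%\220\355\242\004'   # prints 01:00:25:90:ed:a2:04
```

That second uid is hardware type 01 followed by the NIC's MAC, and the longer first uid also ends in the MAC bytes. Together with the lease that has no uid at all, that gives three different identities for one card.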

greg
2017-05-02 04:55
@wdennis - do you know what version of ISC DHCP they are using? I found this option and we had to use dhcp version > 4.3. ignore-client-uids in the server config file will make the server only pay attention to mac instead of client identifier.

greg
2017-05-02 05:01
Yeah - that is what we used to do when we had ISC DHCP in the mix. We built 4.3.X because it wasn't in the distros at the time, added that option, and all was good. We got tired of fighting ISC and wrote our own - rebar-dhcp. That is the basis for dr-provision.
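
In a hand-maintained dhcpd.conf the statement is written as `ignore-client-uids true;` (the dhcpd.conf man page accepts `true` or `on`). A quick way to check whether a given config already has it is sketched below. The helper name and the example path are assumptions, and a managed router may keep its generated config somewhere else.

```shell
# Report whether an ISC dhcpd.conf enables ignore-client-uids.
# has_ignore_client_uids is a hypothetical helper written for this sketch.
has_ignore_client_uids() {
    grep -Eq '^[[:space:]]*ignore-client-uids[[:space:]]+(true|on)[[:space:]]*;' "$1"
}

# Example:
#   has_ignore_client_uids /etc/dhcp/dhcpd.conf && echo "leases keyed on MAC only"
```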

wdennis
2017-05-02 11:06
Currently it's 4.2.6 -- but I'm running a downlevel version of pfSense at this point; perhaps a newer version has a higher ver of ISC dhcpd - I should upgrade in any case

wdennis
2017-05-02 11:08
Unfortunately on the network I'm running DRP on, I have to provide DHCP from the router...

wdennis
2017-05-02 15:15
OK, updated my pfSense router to latest, now I see this option in the DHCP server controls:


wdennis
2017-05-02 15:16
Checked it, will see what we get now...


wdennis
2017-05-02 15:31
That was it

greg
2017-05-02 15:35
Yeah!!

greg
2017-05-02 15:37
That is a bad part of the spec. I was in the IETF meetings when DHCP was being defined. We were more focused on clients getting IPs and not Servers installing.

wdennis
2017-05-02 15:46
I just read that the UID was to be used to differentiate between systems that dual-booted OS's...

vlowther
2017-05-02 15:47
That sounds like a post-hoc justification. :slightly_smiling_face:

greg
2017-05-02 15:56
probably, but we never used DHCP for things other than clients originally. The craze at the time was clients getting on networks, and diskless "dumb" terminals. Heck we were still installing AIX from floppies at the time.

greg
2017-05-02 15:57
nobody trusted DHCP for servers that had to have a known good address.

vlowther
2017-05-02 16:00
nods.

vlowther
2017-05-02 16:01
These days, in the time of massive server farms and teardown/rebuild instead of fix-in-place, though...

greg
2017-05-02 16:06
yeah - heck the follow-on spec that was worked but not done was MobileIP. It was to be a layer on top of DHCP that allowed the networking stack to have a constant IP while you roamed DHCP subnets. I started to get that working on AIX (we actually had AIX client laptops at the time), but we stopped when the spec started getting hard to implement and people started making reconnection a priority in clients. It just wasn't a problem.

spencerj
2017-05-02 16:44
Hey @greg ! restarting the services seemed to do the trick. now I have the default Network and DHCP subnet. I'm not ready to open the range to my entire lab yet. I wanted to play around a bit more with the provisioner first. is it possible to add individual MACs somewhere?

spencerj
2017-05-02 16:45
in Cobbler I could do this through the dhcp files.

vlowther
2017-05-02 16:49
hm... you looking at just storing what mac address is associated with a machine, or more looking at making sure the same mac address always gets the same address?

spencerj
2017-05-02 16:51
I'm looking at getting rid of the dhcp range temporarily and statically assigning IPs to MACs.. If it works the same as cobbler, the provisioner should ONLY do something if DHCP request comes from a machine with a known MAC.

spencerj
2017-05-02 16:51
does that make sense?

vlowther
2017-05-02 16:51
Yes, :slightly_smiling_face:

vlowther
2017-05-02 16:53
The UI does not have support for it yet, but you are looking for DHCP reservations.

greg
2017-05-02 16:54
@spencerj - Remind me, are you using DR or DRP?

greg
2017-05-02 16:54
different commands for each.

spencerj
2017-05-02 16:54
LOL... I think just DR.. I'm not sure the difference though.

greg
2017-05-02 16:56
I think DR as well from what you told me earlier.

spencerj
2017-05-02 16:58
okay :slightly_smiling_face: so you have to set the reservations from the CLI?

greg
2017-05-02 17:05
DR is less like cobbler than that, but ...

spencerj
2017-05-02 17:14
oh okay, but are reservations possible?

greg
2017-05-02 17:17
Yeah - thinking about how to do it.

greg
2017-05-02 17:18
umm - @spencerj - did you ever tell me what your goal is with DR and/or DRP?

spencerj
2017-05-02 17:19
I don't think so! :slightly_smiling_face:

greg
2017-05-02 17:19
What I mean is - do you just want a cobbler replacement? Set a MAC/IP map, install the OS, and walk away? Or do you want workload orchestration, IPMI management, eventing, and post-install configuration in an orchestrated way?

spencerj
2017-05-02 17:21
currently we just have the "Cobbler" piece where we have a MAC/IP mapping. known or "registered" systems can boot and get the PXE payload to build whatever profile (OS + Kickstarts) we assigned..

spencerj
2017-05-02 17:21
then AFTER the cobbler process we are manually kicking off the "configuration" stuff (Ansible).

spencerj
2017-05-02 17:24
we also have IPMI/BMC on all of our machines but again.. each of these pieces is sort of separate from one another. we have scripts for IPMI automation... scripts for Cobbler automation, and more scripts for Ansible automation.. but nothing really tying them together very well.

spencerj
2017-05-02 17:25
In a past life I was a TeamCity admin so I was starting to look at some options there with a TeamCity/Jenkins type setup to manage the scheduling/event side of things.. but then I struck just the right search query into Google and Digital Rebar popped up.

spencerj
2017-05-02 17:26
I kid you not, in ALL my past Googling, searching high and low for bare metal provisioning solutions, I never found DR.

greg
2017-05-02 17:26
we are finally trying to do that better.

spencerj
2017-05-02 17:26
oddly enough, it was actually a search string of "cobbler docker" that did the trick and I found some blog post that Rob had written! LOL

greg
2017-05-02 17:28
okay - good to know. We can do all of that over time. The challenge is that DR is opinionated on how it should be used. We are trying to undo that with DRP and slowly in DR.

greg
2017-05-02 17:28
We aren't there yet.

greg
2017-05-02 17:28
By default DR wants to tightly manage IPs through DHCP.

spencerj
2017-05-02 17:29
So the "Provisioner" in DR is not the same as DRP?

greg
2017-05-02 17:30
no - that is a coming change. DR's provisioner is the basis for DRP, but they are not the same. They have similar pieces, but DRP's is much more fleshed out, documented, and directly controllable.

spencerj
2017-05-02 17:32
ohhhhh... okay.. I misread the docs then.... I thought DRP was standalone but was also the "Provisioning" piece in DR..

spencerj
2017-05-02 17:32
that makes sense now! :slightly_smiling_face:

spencerj
2017-05-02 17:33
So.. based on what I've said so far then.. should I start with DRP?

greg
2017-05-02 17:35
well - it matches your current model better.

spencerj
2017-05-02 17:36
our environment is very "fungible".. constantly rebuilding and re-purposing HW... but all of the systems are "known", as in we want to control the IP assignment and avoid "rogue" systems building.

greg
2017-05-02 17:36
You have an IPMI system, a Provision system, and other systems.

greg
2017-05-02 17:36
DRP could be the Provision System

greg
2017-05-02 17:36
I'm working to hook it into DR. To drive events and such.

greg
2017-05-02 17:36
DRP may be easier to play with.

greg
2017-05-02 17:37
It has a Reservation system. You define MAC->IP reservations.

greg
2017-05-02 17:37
You then create Machines that map IPs->BootEnvs.

greg
2017-05-02 17:37
BootEnvs are the OSes you want to install.

greg
2017-05-02 17:37
The BootEnvs are like cobbler kickstart.

greg
2017-05-02 17:38
The machines have parameters and/or profiles that allow you to inject information globally or per node into the kickstarts.

greg
2017-05-02 17:38
All with CLI or API or UI.

greg
2017-05-02 17:39
The kickstarts are templated so that you can update machine values and chain bootenvs.


spencerj
2017-05-02 17:39
does DRP still use sledgehammer for discovery?

greg
2017-05-02 17:39
I've been documenting like a fiend.

greg
2017-05-02 17:39
yes

spencerj
2017-05-02 17:39
ha ha ha!


spencerj
2017-05-02 17:40
okay! and do the BootEnvs allow you to hook in "post" OS stuff? like Ansible Roles?

greg
2017-05-02 17:40
wjat

greg
2017-05-02 17:40
What you've described is a subnet for base options set to reserved only.

greg
2017-05-02 17:41
Then a bunch of reservations for your IP->MAC maps. This could include IPMI if you want to run DHCP for BMCs, but ....

greg
2017-05-02 17:41
Then create a machine (or let sledgehammer discover and create).

greg
2017-05-02 17:41
Set the bootenv you want.
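
A minimal Python sketch of the reserved-only idea described above: a request is answered only when its MAC has a reservation, and unknown MACs get nothing. This is an illustration with a plain in-memory table, not DRP's actual data model or API; `RESERVATIONS`, `normalize_mac`, `offer_for`, and the sample MAC/IP values are all hypothetical.

```python
from typing import Optional

# Hypothetical MAC -> IP reservations; the values are made up for illustration.
RESERVATIONS = {
    "52:54:00:aa:bb:01": "192.168.1.51",
    "52:54:00:aa:bb:02": "192.168.1.52",
}


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so lookups are consistent."""
    return mac.strip().lower().replace("-", ":")


def offer_for(mac: str, reservations: dict = RESERVATIONS) -> Optional[str]:
    """Return the reserved IP for a known MAC, or None to stay silent."""
    return reservations.get(normalize_mac(mac))


if __name__ == "__main__":
    print(offer_for("52-54-00-AA-BB-01"))  # known machine: gets its reserved IP
    print(offer_for("52:54:00:ff:ff:ff"))  # unknown machine: no offer
```

With a table like this, opening the subnet to "rogue" systems is a matter of policy rather than code: nothing outside the table ever receives an address.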

spencerj
2017-05-02 17:44
okay! and does DRP use PXE or iPXE? or both? is one defaulted?

greg
2017-05-02 17:44
by default, it serves files of lpxelinux.0, ipxe, and bootefi.*.

spencerj
2017-05-02 17:44
oh swet!

greg
2017-05-02 17:44
The image you use, is dependent upon what you want to boot from or some magic.

spencerj
2017-05-02 17:44
*sweet!


greg
2017-05-02 17:45
The options in the DHCP server are templated.

greg
2017-05-02 17:46
You can do this insanity: ```{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}} ```

greg
2017-05-02 17:46
which says check option 77 for iPXE, use default.ipxe as the bootloader, if option 93 is 0 then use lpxelinux.0 otherwise use bootx64.efi

greg
2017-05-02 17:46
That way we handle ipxe, legacy bios, and uefi
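
The branching in that template can be restated in Python to make the order of the checks explicit. This is a sketch of the decision logic only, not the Go template engine DRP uses. It assumes the options arrive as a mapping from DHCP option number to a string value, and that a missing option compares as an empty string; `pick_bootfile` is a hypothetical name. For context, option 77 is the DHCP User Class and option 93 is the Client System Architecture, where 0 denotes a legacy BIOS x86 PC.

```python
def pick_bootfile(options: dict) -> str:
    """Mirror the template's branches; `options` maps option number -> string."""
    # iPXE identifies itself through the user class (option 77).
    if options.get(77, "") == "iPXE":
        return "default.ipxe"
    # Architecture 0 (option 93) means legacy BIOS, so serve lpxelinux.0.
    if options.get(93, "") == "0":
        return "lpxelinux.0"
    # Everything else falls through to the UEFI loader.
    return "bootx64.efi"


if __name__ == "__main__":
    for opts in ({77: "iPXE"}, {93: "0"}, {93: "7"}, {}):
        print(opts, "->", pick_bootfile(opts))
```

Note that a client sending neither option falls through to `bootx64.efi`, which is one reason a plain `lpxelinux.0` default is the simpler choice on a BIOS-only network.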

greg
2017-05-02 17:47
Though the default is just: lpxelinux.0

greg
2017-05-02 17:47
because simple

spencerj
2017-05-02 17:47
LOL!

greg
2017-05-02 17:48
in theory, you could use the same thing to jump to arm or 32bit or whatever.

greg
2017-05-02 17:49
though I suppose ARM is a bad word for you. :slightly_smiling_face:

spencerj
2017-05-02 17:53
ha ha ha!

spencerj
2017-05-02 17:53
It's okay. I forgive you for that profanity. :stuck_out_tongue_winking_eye:

spencerj
2017-05-02 18:30
once I have the DRP code, how should I run the installer? I'm looking at the install docs for "install.sh". Are there only two ways to run? with or without --isolated?

wdennis
2017-05-02 18:36
@spencerj I used the curl cmd to download the installer bundle, then ran the install.sh myself when I had vetted it. As I'm running a demo of it, I did create a "drp" directory, put the install stuff there, and then ran the installer with "--isolated" which keeps all the things within the dir you ran the installer in.

spencerj
2017-05-02 18:45
okay.. so without the --isolated flag it will do a "normal" install?

spencerj
2017-05-02 19:16
So I just ran the install.sh script and got this error: ``` cp: cannot stat 'assets/startup/dr-provision.service': No such file or directory ```

spencerj
2017-05-02 19:16
when I `ls` that directory this is what I see:

spencerj
2017-05-02 19:16
```
[infra@master drp-install]$ ls assets/startup/
rocketskates.service rocketskates.sysv rocketskates.unit
```

greg
2017-05-02 19:17
Oops.

greg
2017-05-02 19:17
Fixing

spencerj
2017-05-02 19:18
ha ha ha!

2017-05-02 20:00
drp in-place updates preserve configs and data? just delete the sha file?

2017-05-02 20:02
for --isolated

greg
2017-05-02 20:04
drp in place creates a drp-data directory and shows the options to run that way.

greg
2017-05-02 20:05
To have install.sh pull an updated zip file from github, remove the sha file and re-run install.

2017-05-02 20:05
groovy.

2017-05-02 20:05
that's exactly what I meant. :-)

lae
2017-05-02 20:15
oh funny, spencer's story is literally my story (was using cobbler/ansible/in house scripts for ipmi and stuff, suddenly found DR and then DRP was made public afterwards)

greg
2017-05-02 20:18
:slightly_smiling_face: @vlowther and I are working on a plan to get IPMI/RAID/BIOS back into the fray and join DRP machines into DR as nodes.

2017-05-02 22:18
I'm confused about which should be defaultBootEnv and unknownBootEnv. unknownBootEnv=discovery and defaultBootEnv=sledge?

2017-05-02 22:22
nevermind - found it in the Fine Documentation

greg
2017-05-02 22:23
Fine - as in wine - as in drunken ramblings.

2017-05-03 00:00
tftpd not responding...

2017-05-03 00:00
it's listening on udp:69

2017-05-03 00:03
if the mac is in the server's arp table as associated with an ip already, accessible through a server interface that's not listening tftpd, will it fail to respond to tftpd udp?

2017-05-03 00:06
the same mac is showing as known by the other interface that's actually running DRP

greg
2017-05-03 00:07
Trees falling. Noisily or not. Hmm. I think you are using DR. Are you in HOST or FORWARDER mode? If you didn't specify anything then you are in FORWARDER mode. The implication is that only things on docker0 will communicate with DR services. You could bridge into that network. If host mode then all interfaces are in play and will respond.

2017-05-03 00:08
DRP - no docker

2017-05-03 00:08
no bridges

2017-05-03 00:12
wat? why would ncat say 'no route to host' when ping sees the IP just fine?

2017-05-03 00:16
ah, iptables - no accepto port udp:69

2017-05-03 00:17
et voir la!

2017-05-03 00:27
dr-provision2017/05/03 00:24:22.650999 DHCP handler died: write udp4 0.0.0.0:67->192.168.1.51:68: i/o timeout

2017-05-03 00:27
drp process died.

2017-05-03 00:30
any idea why a dell R410 would ignore and not reboot PXE when this returns OK? ipmitool chassis bootdev pxe; ipmitool chassis power cycle

vlowther
2017-05-03 01:17
latest DRP code cannot fail in that way -- we no longer rely on timeouts to cleanly release the DHCP socket to work around the darwin kernel's lack of ability to clean up UDP sockets belonging to nonexistent processes.

vlowther
2017-05-03 01:17
yes, really.

vlowther
2017-05-03 01:19
re: r410: because all IPMI firmware sucks.

vlowther
2017-05-03 01:20
messing with bootdev order via ipmi is something I basically assume will fail silently until I am happily proven wrong.

2017-05-03 01:21
lolz

vlowther
2017-05-03 01:23
tl:dr: never kill -9 processes with open listening UDP ports on a mac, unless you like rebooting.

2017-05-03 01:23
jeez.

2017-05-03 01:24
Is there any way to remotely set the bootdev? I see an entry for dell related stuffs in the ipmitool manual.

vlowther
2017-05-03 01:25
Yes, the IPMI standard has those methods, and some IPMI firmware even implements it properly.

vlowther
2017-05-03 01:26
I don't know if the idrac on a 410 is one of those --- it has been many years and more ethanol since I tried.

vlowther
2017-05-03 01:27
If you have a copy of racadm compatible with that box, it is probably a better bet than ipmitool.

2017-05-03 01:48
Once DRP finds a box and drops sledgehammer on it, shouldn't it be a "machine"? Or are those only for predefined machines?

2017-05-03 02:02
Because it didn't give me a "machine" object to play around with.

greg
2017-05-03 02:10
It should create a machine.

greg
2017-05-03 02:11
My guess is that your subnet is missing some parameters that sledgehammer needs. I tried to put this in the subnet config page, I think.

greg
2017-05-03 02:11
You can get the error log by:

greg
2017-05-03 02:11
logging into sledgehammer - root/rebar1

greg
2017-05-03 02:11
journalctl -u sledgehammer

greg
2017-05-03 02:12
That should have a clue as what is missing or busted.

greg
2017-05-03 02:12
The darwin hack makes it better, but sadly it still occasionally hits it. It is better though. :neutral_face:

2017-05-03 02:13
already rebooted. my idrac access had annoying keyboard issues.. because mac->x2go->xfce->chrome->javaws->console.

2017-05-03 02:14
Does sledgehammer open the serial console? My ssh attempts seemed to require a key.

2017-05-03 02:24
d

greg
2017-05-03 02:30
d?

greg
2017-05-03 02:31
@newgoliath - it should, I think. ssh requires a key. Let me check something.

greg
2017-05-03 02:32
That is a good feature request to put back.

greg
2017-05-03 02:32
actually, some template in template love should fix this nicely.

greg
2017-05-03 02:33
I cleaned too much.

greg
2017-05-03 02:33
It should have a console, but I can put in the root-remote-access.tmpl and it should work in sledgehammer.

greg
2017-05-03 02:33
and discovery.

greg
2017-05-03 02:34
sooo - you can put {{ template "root-remote-access.tmpl" . }} in the sledgehammer start-up.sh template and it will enable ssh with keys in the access_keys parameter in the machine profile, profiles assigned to the machine, or the global profile.

greg
2017-05-03 02:34
I'll make that change shortly.

greg
2017-05-03 02:35
I need to cut a release, we've amassed some good changes.

2017-05-03 02:40
I'm fading... sleepy...

spencerj
2017-05-03 02:43
In my experience `chassis power cycle` reboots everything including the BMC so any temp flags are lost. After the `bootdev pxe` command I usually do a `chassis power reset` which seems to be more synonymous with a hard reboot, but BMC stays alive so the pxe flag stays.
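
The sequence spencerj describes (set the PXE boot flag, then `power reset` rather than `power cycle`) can be scripted. The sketch below only builds the two command lines, so it is safe to run anywhere; actually executing them is left to a separate helper. The remote-access flags (`-I lanplus`, `-H`, `-U`, and `-E` to read the password from `IPMI_PASSWORD`) are standard ipmitool options, but the host and user values and the function names are placeholders, and whether a given BMC honors the flag is, per the discussion above, not guaranteed.

```python
import subprocess


def pxe_reset_commands(host: str, user: str) -> list:
    """Build the two ipmitool invocations: set PXE boot, then hard reset."""
    # -E makes ipmitool read the password from the IPMI_PASSWORD env variable.
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
    return [
        base + ["chassis", "bootdev", "pxe"],
        # "power reset" rather than "power cycle", so the boot flag survives
        # (as reported in the chat; behavior varies by firmware).
        base + ["chassis", "power", "reset"],
    ]


def run_all(commands: list) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder BMC address and user; printing only, nothing is executed.
    for cmd in pxe_reset_commands("192.0.2.10", "root"):
        print(" ".join(cmd))
```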

greg
2017-05-03 02:49
sleep my friend and dream of large women.

2017-05-03 02:50
struggling to remember the film you reference.

greg
2017-05-03 02:51
Princess Bride

greg
2017-05-03 02:51
I do not envy you your headache when you wake up.

greg
2017-05-03 02:51
My brain is a little off right now.

2017-05-03 02:51
hee hee.

greg
2017-05-03 06:17
Stables updated.


2017-05-03 13:42
grabbing and installing

2017-05-03 13:49
dr-provision2017/05/03 13:49:00.327087 Received option: OptionClientIdentifier: ??K?

greg
2017-05-03 13:51
yeah - some things send CID that aren't printable.
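
One common way to keep such identifiers readable in logs is to print the raw bytes as hex instead of as text. The Python sketch below shows the idea; it is not how dr-provision formats its log lines, and `printable_cid` and the sample bytes are hypothetical. Many clients send a type byte of 0x01 followed by the MAC address, which is why the raw bytes rarely look like text.

```python
def printable_cid(cid: bytes) -> str:
    """Render a DHCP client identifier as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in cid)


if __name__ == "__main__":
    # Hypothetical identifier: type 0x01 (Ethernet) followed by a made-up MAC.
    sample = bytes([0x01, 0x52, 0x54, 0x00, 0xAA, 0xBB, 0x01])
    print(printable_cid(sample))
```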

2017-05-03 14:00
interesting drp is responding dhcp on both my interfaces.

2017-05-03 14:00
sledge keeps requesting, over and over.

2017-05-03 14:01
every 30ish seconds

greg
2017-05-03 14:02
Yes - that is right. The lease time is 60 seconds. so renew time 30 seconds.

greg
2017-05-03 14:03
You can change that in the subnet definition.
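
The 30-second figure follows from the DHCP defaults: unless the server says otherwise, a client starts renewing at half the lease time (T1) and starts rebinding at 87.5% of it (T2), per RFC 2131. A small Python sketch of that arithmetic, with `dhcp_timers` as a hypothetical helper name:

```python
def dhcp_timers(lease_seconds: float) -> tuple:
    """Return (T1 renew, T2 rebind) using the RFC 2131 default fractions."""
    return lease_seconds * 0.5, lease_seconds * 0.875


if __name__ == "__main__":
    for lease in (60, 3600):
        t1, t2 = dhcp_timers(lease)
        print(f"lease={lease}s renew at {t1}s rebind at {t2}s")
```

For the 60-second lease mentioned above this gives a renew attempt at 30 seconds, matching the "every 30ish seconds" requests seen from sledgehammer; a longer lease in the subnet definition quiets the log proportionally.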

2017-05-03 14:03
nmap says up, but all ports closed.

2017-05-03 14:03
I see your {{ template "root-remote-access.tmpl" . }} in the sledge.yaml

2017-05-03 14:03
Did bootenvs need to be reloaded, to re-run templates?

2017-05-03 14:05
no serial console output (but it's at least connecting)

greg
2017-05-03 14:07
Yes - templates need to be reloaded.

greg
2017-05-03 14:08
So do bootenvs.

2017-05-03 14:08
okdee doke.. thanks Greg!

greg
2017-05-03 14:08
Templates are easy. I have a bug I'm still working on to update bootenvs.

greg
2017-05-03 14:08
It may be easiest to:

greg
2017-05-03 14:08
stop dr-provision

greg
2017-05-03 14:08
rm -f drp-data/digitalrebar/bootenvs

greg
2017-05-03 14:08
rm -f drp-data/digitalrebar/templates

greg
2017-05-03 14:09
Then rerun the tools/discovery-load

greg
2017-05-03 14:09
and all the other bootenvs install commands from before.

greg
2017-05-03 14:09
The bootenvs update is a real issue.

2017-05-03 14:10
roger.

2017-05-03 14:12
should I bootenvs destroy before bootenvs install?

2017-05-03 14:21
chicken-egg problem with dr-provision and discovery-load ----- after the rm -f commands.

2017-05-03 14:21
:(

2017-05-03 14:22
I guess gotta clear out the configs for the default boot envs.

greg
2017-05-03 14:59
ugh - sorry. yeah prefs. will get in the way.

greg
2017-05-03 14:59
I should know better. Go fix the bugs instead of kludging around them.

2017-05-03 15:02
I can't find out how to force a start without the discovery

greg
2017-05-03 15:04
rm -rf drp-data/digitalrebar/preferences/*

greg
2017-05-03 15:05
then reset them in the UI once all is loaded. Sorry.

2017-05-03 15:05
OH! DUH! there they are.

2017-05-03 15:46
Did you miss updating the version string in the executable?

2017-05-03 15:46
dr-provision2017/05/03 15:45:03.939046 Version: v3.0.1-tip-38-10d0a97fe90d2104bf9c2e7720529496fac4c033

2017-05-03 15:46
or did I not delete all the right stuff?

greg
2017-05-03 15:50
It should have automatically been updated

2017-05-03 15:50
nevermind.. I didn't pull the versioned one.

greg
2017-05-03 15:51
The next tip build will be relative to 3.0.2

2017-05-03 16:30
the client gets sledge and the first script is run, but it seems control.sh never happens.

2017-05-03 16:33
does it go out to the Internet? Because I've got no NAT running on my private IP address range that I'm using for DHCP.

greg
2017-05-03 16:36
it should not need internet, I think.

greg
2017-05-03 16:36
can you get on sledgehammer?

greg
2017-05-03 16:36
what does journalctl -u sledgehammer show

2017-05-03 16:37
no route to host.

2017-05-03 16:37
it gets an IP address and shuts up.

2017-05-03 16:37
no more dhcp requests.

greg
2017-05-03 16:39
hmm - what IP did you use for the static-ip on the dr-provision line. It should probably be the internal IP of your admin system.

2017-05-03 16:39
yup, i have em1 (public IP) em2 (192.168.1.1)

2017-05-03 16:40
./dr-provision --static-ip=192.168.1.1 --file-root=/root/drp/dr-provision-install/drp-data/tftpboot --data-root=drp-data/digitalrebar --debug-bootenv=2 --debug-dhcp=2 --debug-renderer=2 --dhcp-ifs=em2

2017-05-03 16:42
it pxes to tftp and gets the sledge bits. then the logs dump out a script. then nothing.

2017-05-03 16:42
I can reset with IPMI.

2017-05-03 16:44
dr-provision2017/05/03 16:29:51.742235 Rendering start-up.sh for All booting discovery
dr-provision2017/05/03 16:29:51.742465 Content:
#!/bin/bash
export PS4='${BASH_SOURCE}@${LINENO}(${FUNCNAME[0]}): '
set -x
set -e

2017-05-03 16:44
then I never see anything again.

greg
2017-05-03 16:44
Do you get a machine object?

greg
2017-05-03 16:45
wow.

2017-05-03 16:45
nope.

2017-05-03 16:45
it dumps out the whole script.. up to the echo "Did not get control.sh..."

2017-05-03 16:46
the script was not dumped to console, it's in logs.

greg
2017-05-03 16:46
okay - whew - I was worried.

2017-05-03 16:46
Should I ping the subnet to see if anything sprung to life?

greg
2017-05-03 16:47
yeah - see if the node is reachable, but doesn't seem likely.

2017-05-03 16:47
because the IP that it got DHCP is truly down. nmap says so.

2017-05-03 16:47
? (192.168.1.52) at <incomplete> on em2

2017-05-03 16:48
^ arp -a

greg
2017-05-03 16:48
yeah

greg
2017-05-03 16:50
it is like your boot dev isn't set.

2017-05-03 16:51
hmm.. interesting.

greg
2017-05-03 16:52
the pxelinux.cfg/default should have IPAPPEND 2 on it.

2017-05-03 16:55
ls ./drp-data/tftpboot/pxelinux.cfg/ <- empty

2017-05-03 17:01
curl http://192.168.1.1:8091/pxelinux.cfg/default
DEFAULT discovery
PROMPT 0
TIMEOUT 10
LABEL discovery
  KERNEL sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/vmlinuz0
  INITRD sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/stage1.img
  APPEND rootflags=loop root=live:/sledgehammer.iso rootfstype=auto ro liveimg rd_NO_LUKS rd_NO_MD rd_NO_DM provisioner.web=http://192.168.1.1:8091 rs.api=https://192.168.1.1:8092
  IPAPPEND 2

2017-05-03 17:01
OK... trying from remote host.

2017-05-03 17:02
same

greg
2017-05-03 17:02
looks good

2017-05-03 17:03
and the swagger API is accessible

2017-05-03 17:03
curl -fsSLk https://192.168.1.1:8092/

2017-05-03 17:03
OK

greg
2017-05-03 17:04
drpcli profiles show global

greg
2017-05-03 17:04
anything in those?

2017-05-03 17:04
just the name.

greg
2017-05-03 17:05
ok

greg
2017-05-03 17:05
hmmm - maybe me.

2017-05-03 17:05
I see your greg key in access_keys

2017-05-03 17:05
but whatever, the machine isn't even online.

greg
2017-05-03 17:05
well , if the script breaks it could do that.

greg
2017-05-03 17:06
profiles has something in it? or the examples profiles have something in it.

2017-05-03 17:06
drpcli profiles show global { "Name": "global" }

2017-05-03 17:06
drpcli profiles list [ { "Name": "global" } ]

greg
2017-05-03 17:07
ok

greg
2017-05-03 17:08
Let me test something - i forgot

2017-05-03 17:11
I'm fine to kill and rm -rf the whole thing.

greg
2017-05-03 17:12
Try that and restart with new. Make sure all the templates are aligned.

greg
2017-05-03 17:12
I'm concerned I didn't test sledgehammer/discovery without access_keys.

greg
2017-05-03 17:27
testing now - may have found something.

greg
2017-05-03 17:35
nope - my bad. on the virtualbox setup.

2017-05-03 17:44
script ran, asking for DHCP again - good progress.

greg
2017-05-03 17:46
Just published v3.0.3 - it has a fix for bootenv updating. It is a cli change only.

2017-05-03 17:48
Goodness gracious, there's a machine!

greg
2017-05-03 17:48
:slightly_smiling_face:

greg
2017-05-03 17:49
If you add the access_keys parameter to the global profile like in the example, you should be able to ssh into sledgehammer (after a reboot).

2017-05-03 17:54
assets/profiles/root-access.yaml being the example?

wdennis
2017-05-03 18:19
@greg <offtopic>Here in olde Austin-towne? Anyplace I should try to go and eat at tonight?

greg
2017-05-03 18:23
Location and transportation info needed. BBQ?

wdennis
2017-05-03 18:24
Loc: downtown (Hampton Inn Univ.) transportation: RideAustin or other. Food: Tex-Mex, Mex or BBQ - don't care as long as it's tasty!

greg
2017-05-03 18:26
Thinking

greg
2017-05-03 18:39
soo - Victor and I were discussing -

greg
2017-05-03 18:43
@wdennis BBQ - Salt Lick (North in Round Rock), Iron Works (1st and RedRiver just off IH-35 (downtown)), Stubb's 8th/RedRiver (sometimes has live music)

greg
2017-05-03 18:43
MEX - is harder - Manuel's - Not my normal.

wdennis
2017-05-03 18:44
@greg Thx, looking to get my noms on while here :yum:

greg
2017-05-03 18:45
Tex-Mex - Chuy's and/or Trudy's - Chuy's is good food (chain, but started here). One on North Lamar is not too far from you. The one on Riverside is neat because you can walk to Zilker Park or other venues.

greg
2017-05-03 18:45
Stubb's is neat because it is near sixth street for music and bars. If that is your thing.

greg
2017-05-03 18:46
Iron works is similar location wise just.

wdennis
2017-05-03 18:46
Music yes, bars not anymore :wink:

wdennis
2017-05-03 18:49
(Except if they have music)

greg
2017-05-03 18:49
most bars have some music especially downtown.

wdennis
2017-05-03 18:52
Loving being back in my home state (born in Houston, lived in El Paso for my early years)

2017-05-03 18:53
Hi rebars

2017-05-03 18:54
Just need a quick help here

2017-05-03 18:54
I am trying to deploy the openstack workload on DR

2017-05-03 18:54
But it fails at the last step, openstack-deploy

2017-05-03 18:54
with error
Downloading common from repo http://localhost:8879/charts
helm lint nova
==> Linting nova
[INFO] Chart.yaml: icon is recommended
[ERROR] templates/: render error in "nova/templates/daemonset-libvirt.yaml": template: nova/templates/daemonset-libvirt.yaml:14:62: executing "nova/templates/daemonset-libvirt.yaml" at <include "hash">: error calling include: template: nova/charts/common/templates/_funcs.tpl:22:4: executing "hash" at <include $wtf $contex...>: error calling include: template: nova/templates/configmap-etc.yaml:9:56: executing "nova/templates/configmap-etc.yaml" at <include "template">: error calling include: template: nova/charts/common/templates/_funcs.tpl:12:3: executing "template" at <include $wtf $contex...>: error calling include: template: no template "nova/templates/etc/_ceph.client.cinder.keyring.yaml.tpl" associated with template "gotpl"
Error: 1 chart(s) linted, 1 chart(s) failed
Makefile:51: recipe for target 'build-nova' failed

2017-05-03 18:55
any help will be highly appreciated

2017-05-03 18:55
its for a school project :)

2017-05-03 18:55
and while we are on it, try Tres-Leches from Chuy's

wdennis
2017-05-03 18:56
@ayush37 :+1:

2017-05-03 18:57
@zehicle is there any assistance regarding the error I am facing, if fixed I owe you a Tres-Leches

greg
2017-05-03 18:57
@ayush37 - not sure. I haven't tried it 3 mos or so. The upstream may have moved.

2017-05-03 18:58
@zehicle Thanks, shall try fixing it by some means

greg
2017-05-03 18:58
I'm refreshing my memory.

greg
2017-05-03 18:58
This is Greg, it is confusing.

2017-05-03 18:59
yeah it is

greg
2017-05-03 18:59
My current guess is that the upstream has moved on. We do a git clone:

greg
2017-05-03 18:59

greg
2017-05-03 18:59
but that is my repo. So - maybe not. Still could have atrophied.

greg
2017-05-03 19:00
What school project?

2017-05-03 19:00
I am a masters student at UT Dallas, its a research project

greg
2017-05-03 19:00
Hmm - also could be 1.6.1 update.

2017-05-03 19:00
where my professor wants me to try out Digital Rebar

greg
2017-05-03 19:00
nice - okay

greg
2017-05-03 19:01
Interesting - would like to know what for? You may want to stick to more supported things. :slightly_smiling_face:

greg
2017-05-03 19:01
What version of k8s did you install?

greg
2017-05-03 19:01
1.6.1 or 1.5.3 - it will depend upon when you started using DR.

greg
2017-05-03 19:01
It is possible that you are using a too new k8s for the openstack stuff.

2017-05-03 19:01
actually, we are woryeah maybe

2017-05-03 19:01
yeah maybe

2017-05-03 19:02
let me check the k8 installation

2017-05-03 19:02
regarding the project, we are working with Ericsson for some POC

2017-05-03 19:02
which involves the use of digital Rebar

greg
2017-05-03 19:02
One option is to "redeploy" after changing the version to 1.5.3.

greg
2017-05-03 19:02
For OpenStack or general machine management?

2017-05-03 19:02
for OpenStack only

2017-05-03 19:03
I tried to deploy the openstack workload from GUI

greg
2017-05-03 19:03
Okay- so you are looking for a quick openstack and DR might help with that.

2017-05-03 19:03
of the rebar

2017-05-03 19:03
exactly

2017-05-03 19:04
let me try with the older version of K8

2017-05-03 19:04
thanks a lot

greg
2017-05-03 19:05
Yeah - so, you can change the options - reset things - then don't have the workload autocommit. Go into the deployment and find the k8s-config role/service (right corner of deployment). Edit the k8s version to 1.5.3. Then commit the deployment.

greg
2017-05-03 19:05
Also, you have DR working? Without talking to us? How?

greg
2017-05-03 19:05
:slightly_smiling_face:

2017-05-03 19:05
hahaha, the website was pretty self-explanatory

2017-05-03 19:05
it's really kind of you guys to make it so lucid

greg
2017-05-03 19:06
You must think like I do, which is rare. I'm sorry for you.

greg
2017-05-03 19:06
or Victor. Still sorry for you.

2017-05-03 19:06
if this works, I may get a "A" grade in my subject

2017-05-03 19:06
CLoud COmputing

2017-05-03 19:06
:D

greg
2017-05-03 19:07
ok - keep me posted. I'm interested.

2017-05-03 19:07
with a Tres-Leches cake from myside to the DR team

2017-05-03 19:07
sure I will

2017-05-03 19:10
I started the deployment with 1.5.3

greg
2017-05-03 19:12
okay - so it was already at 1.5.3

2017-05-03 19:13
it was at 1.6.1

greg
2017-05-03 19:13
so - restarted. - let's see what happens.

greg
2017-05-03 19:14
With regard to Tres-Leches, I never make it that far. Between the queso, chips, and meal, I'm usually ready to explode. :slightly_smiling_face:

2017-05-03 19:15
yeah!! exactly, so I always ask the server to make the cake to go

2017-05-03 19:15
once I am home, there's no one stopping me

2017-05-03 19:15
from pounding on it

greg
2017-05-03 19:15
:slightly_smiling_face: planning and gluttony - it is a beautiful thing

2017-05-03 19:16
it's like one of the seven sins, executed perfectly

2017-05-03 19:19
getting back to the DR fiasco: it's running well and currently performing the etcs-install

2017-05-03 19:19
etcd*

2017-05-03 19:50
Unfortunately failed at k8s-dns

2017-05-03 19:50
Err: [WARNING]: Consider using yum, dnf or zypper module rather than running rpm
Backtrace:
/opt/digitalrebar/core/rails/app/models/jig.rb:52:in `die'
/opt/digitalrebar/core/rails/app/models/barclamp_rebar/ansible_playbook_jig.rb:290:in `run'
/opt/digitalrebar/core/rails/app/models/run.rb:79:in `run'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/job.rb:15:in `_run'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/job.rb:100:in `block in work'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/adapters/active_record.rb:5:in `block in checkout'
/var/cache/rebar/gems/ruby/2.1.0/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/connection_pool.rb:292:in `with_connection'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/adapters/active_record.rb:48:in `checkout_activerecord_adapter'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/adapters/active_record.rb:5:in `checkout'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/job.rb:83:in `work'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:78:in `block in work_loop'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:73:in `loop'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:73:in `work_loop'
/var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:17:in `block in initialize'

2017-05-03 19:51
@zehicle Now I am trying Ubuntu 16.04 as the OS; the previous deployment was on CentOS 7.2

2017-05-03 20:22
unfortunately failed at same step with k8s version 1.5.3

greg
2017-05-03 20:23
hmm - okay - I'll add it to my queue to retry it myself.

greg
2017-05-03 20:24
Can you describe your environment so I can get close to it?

2017-05-03 20:24
yes it is on AWS Ubuntu 16.04 LTS as Rebar server

greg
2017-05-03 20:24
What nodes?

greg
2017-05-03 20:25
AWS instances?

greg
2017-05-03 20:25
from the aws provider?

2017-05-03 20:25
t2.xlarge

2017-05-03 20:26
So if I try with k8s version 1.5.3 it fails at the k8s-dns step, and if the version is 1.6.1 it fails at the last step, Openstack-deployment

2017-05-03 20:26
thanks a lot @zehicle for your consideration, highly appreciated

greg
2017-05-03 20:27
yeah - I need to see what is failing . "Something" has changed. I may not get to this until tomorrow.

2017-05-03 20:27
yes I understand, thanks again for your effort. I shall also try debugging it; if I find a fix I shall post it to you

2017-05-03 20:28
till then, may the force be with you

greg
2017-05-03 20:31
:slightly_smiling_face:

zehicle
2017-05-03 21:07
I'd suggest http://Packet.net over AWS for Openstack tests.

lae
2017-05-03 21:13
there's a rackn code iirc

zehicle
2017-05-03 22:56
Yes, RACKN100 - I think it gets $25 credit or so

lae
2017-05-03 23:10
I think it may actually have been $100?

zehicle
2017-05-03 23:13
it was originally...

zehicle
2017-05-03 23:14
about a year ago. early adopter benefits :slightly_smiling_face:

lae
2017-05-03 23:15
Ah.

lae
2017-05-03 23:16
Guess I got in early enough, haha.

2017-05-04 16:44
@zehicle haha, was working late night, guess i will try with Packet.net in that case

2017-05-04 16:45
thanks for the update though, have a great day!!

zehicle
2017-05-04 16:48
in a few minutes, RackN is going to be posting (twitter, linkedin, facebook, web) a lot about DRP. If you've been having good time with it, please help us get the word out about it! Thanks.

2017-05-04 16:49
sure thing, will spread it in UT Dallas and Ericsson

lae
2017-05-04 16:49
:+1:

2017-05-05 01:41
hey @zehicle I tried all the permutations and combinations of the k8s version on AWS and Packet; the issue is still the same. It gets stuck at the very last step of Openstack-Deployment, with an error indicating an issue in "helm lint nova". Please let me know if there is a fix, as Ericsson would be really interested in implementing this for expedited OpenStack deployment :smile:

2017-05-05 01:42
Clarification: for k8s version 1.6.1 it goes all the way to the last step; for all other versions (1.5.1, 1.5.3, 1.5.7) it fails at K8s DNS

greg
2017-05-05 03:09
okay -

zehicle
2017-05-05 14:45
@Ayush37 if there is commercial interest, then let's talk about that 1x1. I'm at the OpenStack summit next week. This work is exciting but early and will need some sustaining effort.

2017-05-05 14:59
Hey @zehicle , Regarding the commercial interest, I shall pass on the information to my manager as I am an intern and not in a position to make a decision

zehicle
2017-05-05 15:10
We are happy to get on a call and discuss with interested parties. OpenStack Helm is an active project and will require ongoing support to maintain.

wdennis
2017-05-05 23:54
Good to meet the DR crew at DevOpsDays Austin - best DOD I've yet attended!

greg
2017-05-06 03:36
It was good! Glad to meet you too!

greg
2017-05-08 13:58
@chermack - this is the community

chermack
2017-05-08 13:59
has joined #community201705


wdennis
2017-05-08 16:55
@greg Any idea why I'd be getting this error?

greg
2017-05-08 16:58
Yes - change the 127.0.0.1 in the bar to match your ip address in the overall nav bar.

greg
2017-05-08 16:58
@wdennis

wdennis
2017-05-08 17:01
Why does the Swagger UI assume 127.0.0.1?

greg
2017-05-08 17:02
well - umm - you see - umm - lazy me. It has to do with the fact that it isn't very integrated into our system.

wdennis
2017-05-08 17:02
Ah

wdennis
2017-05-08 17:03
Picking your battles eh? :wink:

greg
2017-05-08 17:03
We use the swagger-ui without any changes (except to reference that URL). I'd have to change it a little more to make it more dynamic.

greg
2017-05-08 17:03
We also want to switch which one we use.

wdennis
2017-05-08 17:04
:+1::skin-tone-2:

wdennis
2017-05-08 17:23
@greg Are the .yaml files in assets/profiles automatically loaded, or do they have to be loaded via drpcli ??

greg
2017-05-08 17:23
you have to load them. They are examples, because you usually have to change values.

zehicle
2017-05-08 18:53
RE swagger-ui -> there's a new version based on react that we'd switch to also

zehicle
2017-05-08 18:53
the rest of the UX is based on react

zehicle
2017-05-08 18:53
DRP UX

wdennis
2017-05-08 23:30
@greg No helper cmd yet in drpcli to associate profiles with machines?

greg
2017-05-08 23:41
Not yet - soon - maybe tonight.

greg
2017-05-08 23:41
drpcli machines update <uuid> '{ "Profiles": [ "prof1", "prof2" ] }'

greg
2017-05-08 23:42
will do it. WARNING- the UX will eat profiles if you set bootenv. It is not implemented quite right.

wdennis
2017-05-08 23:44
You mean if you change the machine's bootenv thru the UX?

greg
2017-05-08 23:45
yes

greg
2017-05-08 23:45
I noticed that today.

wdennis
2017-05-08 23:46
Thx for that tip, that was my next move :slightly_smiling_face:

greg
2017-05-08 23:46
drpcli machines bootenv <uuid> <bootenv>

greg
2017-05-08 23:47
works though

wdennis
2017-05-08 23:47
So, do I use "drpcli machines update ..." to change the bootenv, or is there a specific drpcli cmd to do that?

wdennis
2017-05-08 23:47
n/m :wink:

greg
2017-05-08 23:48
you can do them in the same update.

greg
2017-05-08 23:48
```drpcli machines update <uuid> '{ "Profiles": [ "prof1", "prof2" ], "BootEnv": "newbootenv" }' ```

wdennis
2017-05-09 19:53
@greg If I want to update an existing profile, can I do the following? ``` drpcli profiles update - < assets/profiles/root-access.yaml ```

wdennis
2017-05-09 19:55
Tried it, throwing an error "requires two arguments"

greg
2017-05-09 20:08
update needs object id

greg
2017-05-09 20:08
then -

wdennis
2017-05-09 20:09
Trying to change the "access_ssh_root_mode" param in the "root-access" profile, doing this: ``` drpcli profiles update "root-access" '{ "access_ssh_root_mode": "yes" }' ``` But no matter what I do, the value remains "true" (instead of "yes", "no", etc.)

wdennis
2017-05-09 20:10
Ah - I see, can do: ``` drpcli profiles update "root-access" - < assets/profiles/root-access.yaml ``` and it works

wdennis
2017-05-10 02:05
@greg Trying to enable root access over SSH (yes, I know, bad...)

wdennis
2017-05-10 02:06
So in my "root-access" profile, I have: ``` access_ssh_root_mode: "yes" ```

wdennis
2017-05-10 02:07
But when I check my resulting /etc/ssh/sshd_config, I see the following:
```
root@testnode01:~# grep -n ^PermitRoot /etc/ssh/sshd_config
28:PermitRootLogin prohibit-password
89:PermitRootLogin without-password
```

wdennis
2017-05-10 02:08
line 89 is the one put there by post-install; why isn't it coming up with ```PermitRootLogin yes``` ??

greg
2017-05-10 02:25
Bug probably. I'll check

wdennis
2017-05-10 02:54
How can I check the rendered preseed template? A URL get?

greg
2017-05-10 03:19
yes - the template is part of a bootenv and assigned to a machine.

greg
2017-05-10 03:19
The template has a path like: '{{.Machine.Path}}/compute.ks'

greg
2017-05-10 03:20
This means: http://<ip>:8091/machines/<uuid>/compute.ks
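
The mapping from a template path to a fetchable URL can be sketched in shell. This is only an illustration of the pattern described above: `rendered_url` is a hypothetical helper name, and port 8091 and the `machines/<uuid>/` prefix are taken from the URL in this thread rather than from any documented contract.

```shell
#!/bin/sh
# Hypothetical helper: build the URL under which a per-machine template
# is served, assuming {{.Machine.Path}} expands to machines/<uuid>
# and the static file server listens on port 8091 (as in this thread).
rendered_url() {
  # $1 = server IP, $2 = machine UUID, $3 = file name from the template Path
  printf 'http://%s:8091/machines/%s/%s\n' "$1" "$2" "$3"
}

# Example (needs curl and a reachable server, so it is left commented out):
# curl -sS -w '\nHTTP %{http_code}\n' "$(rendered_url 192.168.1.148 "$UUID" compute.ks)"
```

Printing the HTTP status alongside the body makes it easier to tell an empty render from a 404.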

wdennis
2017-05-10 03:43
n/m, I see that I have to set the BootEnv for the machine from "local" to (in my case) "ubuntu-16.04-install"

wdennis
2017-05-10 03:50
Next question: If I edit a template, do I have to do something to get DRP to pick up on the changes?

wdennis
2017-05-10 03:50
I am rendering the template, and I don't see my changes...

greg
2017-05-10 04:00
yes - edit the template file.

greg
2017-05-10 04:00
then run: drpcli templates upload filename as filename

wdennis
2017-05-10 04:02
OK, cool, thanks

wdennis
2017-05-10 04:02
Looks good now

wdennis
2017-05-10 04:11
Hmm, looks like maybe a problem now?

wdennis
2017-05-10 04:13
So I did: ``` [dradmin@dr-admin drp]$ drpcli templates upload assets/templates/net-post-install.sh.tmpl as net-post-install.sh.tmpl ``` And now when I try to get the URL http://192.168.1.148:8091/machines/5fcbf69d-287e-4c2c-b085-5858665cd442/post-install.sh I'm seeing nothing returned...

wdennis
2017-05-10 04:14
The net_seed.tmpl upload worked OK though, when I get .../seed I do see the correct preseed

greg
2017-05-10 04:16
usually it is net-post-install.sh

greg
2017-05-10 04:16
Template name and path are not always the same.

greg
2017-05-10 04:17
nvm - you are right.

wdennis
2017-05-10 04:17
Yeah, I have:

wdennis
2017-05-10 04:17
``` { "ID": "net-post-install.sh.tmpl", "Name": "net-post-install.sh", "Path": "{{.Machine.Path}}/post-install.sh" } ``` In the ubuntu-16.04-install bootenv

wdennis
2017-05-10 04:18
I was previously seeing something when I did a get on the .../post-install.sh URL

greg
2017-05-10 04:18
make sure the machine's bootenv is still set to ubuntu-16.04

wdennis
2017-05-10 04:19
It is

wdennis
2017-05-10 04:19
The .../post-install.sh URL doesn't 404 or anything, just returns nothing...

greg
2017-05-10 04:20
okay - so that is usually a render problem.

greg
2017-05-10 04:21
You can check the bootenv errors to see if it shows something. The machine errors could as well

greg
2017-05-10 04:21
drpcli machines show <uuid> | jq .Errors

greg
2017-05-10 04:21
The log from dr-provision should show something too.

greg
2017-05-10 04:22
If you need to, you can turn on the debugRenderer preference to 2 and see if it shows anything.

greg
2017-05-10 04:22
It could have a golang template error.

wdennis
2017-05-10 04:24
No errors in relevant machine or bootenv

wdennis
2017-05-10 04:25
Does dr-provision log to filesystem somewhere, or just to console?

wdennis
2017-05-10 04:30
Now actually I see that I *am* getting a 404 on the .../post-install.sh URL

greg
2017-05-10 04:37
console

wdennis
2017-05-10 04:37
Had to restart dr-provision to get the console logs back?

wdennis
2017-05-10 04:38
And I'm seeing:
```
dr-provision2017/05/10 00:22:08.919333 Rendering net-post-install.sh.tmpl for testnode01 booting ubuntu-16.04-install
dr-provision2017/05/10 00:22:08.919393 Static FS: Failed to render template for /machines/5fcbf69d-287e-4c2c-b085-5858665cd442/post-install.sh: template: :54:12: executing "net-post-install.sh.tmpl" at <{{template "set-host...>: template "set-hostname.tmpl" not defined
```

greg
2017-05-10 04:38
there you go. the set-hostname.tmpl isn't loaded, it appears.

greg
2017-05-10 04:38
cd assets/templates

greg
2017-05-10 04:39
drpcli templates upload set-hostname.tmpl as set-hostname.tmpl

wdennis
2017-05-10 04:39
Hmmm, didn't change that one, & it exists... ``` -rw-r--r-- 1 dradmin dradmin 559 May 3 13:40 set-hostname.tmpl ```

greg
2017-05-10 04:40
yeah - if you upgrade without doing a bootenv install after upgrade, you are probably missing the subtemplates.

greg
2017-05-10 04:41
you can do:
```
# upload every file in the current directory under its own name
for i in *
do
  drpcli templates upload "$i" as "$i"
done
```
in the assets/templates directory

wdennis
2017-05-10 04:43
Yup, here's what I have:
```
[dradmin@dr-admin drp]$ drpcli templates list | grep "ID"
"ID": "default-elilo.tmpl"
"ID": "default-ipxe.tmpl"
"ID": "default-pxelinux.tmpl"
"ID": "local-elilo.tmpl"
"ID": "local-ipxe.tmpl"
"ID": "local-pxelinux.tmpl"
"ID": "net-post-install.sh.tmpl"
"ID": "net_seed.tmpl"
"ID": "set-hostname.tmpl"
```

greg
2017-05-10 04:44
you are missing the root-access template

greg
2017-05-10 04:44
and others.

greg
2017-05-10 04:44
bootenv install just loads all templates in templates by default now.

greg
2017-05-10 04:44
So that little script snippet above will add them.

greg
2017-05-10 04:44
Or install a new bootenv.
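
Spotting which template files have not been uploaded yet can be automated instead of eyeballing the list. The sketch below is a hedged illustration: `missing_templates` is a made-up helper, and the `sed` expression in the usage comment assumes `drpcli templates list` prints `"ID": "<name>"` lines as shown in the paste above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every regular file in a templates
# directory that does not appear in a list of already-loaded template IDs.
missing_templates() {
  # $1 = templates directory, $2 = file with one loaded template ID per line
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    # -x: whole-line match, -F: fixed string (no regex surprises)
    grep -qxF -- "$name" "$2" || printf '%s\n' "$name"
  done
}

# Possible usage (assumes the list output format seen in this thread):
# drpcli templates list | sed -n 's/.*"ID": "\([^"]*\)".*/\1/p' > /tmp/loaded.txt
# missing_templates assets/templates /tmp/loaded.txt
```

Anything it prints is a candidate for `drpcli templates upload <file> as <file>`.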

wdennis
2017-05-10 04:45
OK, looks like they're all loaded now

wdennis
2017-05-10 04:46
Aaaaaaaaand we're good!

greg
2017-05-10 04:46
That would explain the no root access.

wdennis
2017-05-10 04:46
Yes

greg
2017-05-10 04:46
hmm - two things to do:

wdennis
2017-05-10 04:46
Time to redeploy

greg
2017-05-10 04:47
1. update the upgrade docs to remind people to reload all templates for that release. 2. See about making that renderer error show up on the machine errors list.

wdennis
2017-05-10 04:48
Thanks man

greg
2017-05-10 04:48
np

wdennis
2017-05-10 13:11
Good morning @greg :slightly_smiling_face:

wdennis
2017-05-10 13:11
Got just about everything where I want it now, except the apt sources.list...

wdennis
2017-05-10 13:11
This is what I'm getting now:
```
root@testnode01:~# grep -v ^# /etc/apt/sources.list | grep -v ^$
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial main restricted
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-updates main restricted
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial universe
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-updates universe
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial multiverse
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-updates multiverse
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-backports main restricted universe multiverse
```

greg
2017-05-10 13:12
We shouldn't be touching it anymore.

wdennis
2017-05-10 13:12
Looks like it is being touched?

greg
2017-05-10 13:14
Make sure you aren't setting local_repo as a parameter on anything.

wdennis
2017-05-10 13:14
n/m, just found it...
```
[dradmin@dr-admin drp]$ cat assets/profiles/local-repo.yaml
Name: local-repo
Params:
  local_repo: true
```

wdennis
2017-05-10 13:15
I did not set it to "true"...

greg
2017-05-10 13:15
Make sure that isn't assigned to a machine and it isn't set in global.

greg
2017-05-10 13:15
Loading it as a profile is okay. putting it on a machine is not.

greg
2017-05-10 13:16
Okay - so - I found the other place.

greg
2017-05-10 13:16
You may need to play with it.

wdennis
2017-05-10 13:16
It is incorporated in a profile I think?

greg
2017-05-10 13:17
Adding a profile into the system doesn't directly do anything.

wdennis
2017-05-10 13:17
Setting the value to "false" should disable this, yes?

greg
2017-05-10 13:17
You have to add that profile to the machine's list or make that change to the global profile.

greg
2017-05-10 13:17
You can, but it won't change the problem.

greg
2017-05-10 13:18
The problem is these lines in the preseed file:

greg
2017-05-10 13:18
```
d-i mirror/protocol string {{.ParseUrl "scheme" .Env.InstallUrl}}
d-i mirror/http/hostname string {{.ParseUrl "host" .Env.InstallUrl}}
d-i mirror/http/directory string {{.ParseUrl "path" .Env.InstallUrl}}
```

wdennis
2017-05-10 13:19
So, DRP apt sources are default still?

greg
2017-05-10 13:20
well - kinda

wdennis
2017-05-10 13:23
So, I should comment out those preseed file lines?

greg
2017-05-10 13:24
replace them with:

greg
2017-05-10 13:24
```
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
```

greg
2017-05-10 13:27
```
{{if (eq "debian" .Env.OS.Family)}}
d-i mirror/protocol string http
d-i mirror/http/hostname string http://http.us.debian.org
d-i mirror/http/directory string /debian
{{else}}
{{ if .ParamExists "local_repo" }}
{{ if eq (.Param "local_repo") true }}
d-i mirror/protocol string {{.ParseUrl "scheme" .Env.InstallUrl}}
d-i mirror/http/hostname string {{.ParseUrl "host" .Env.InstallUrl}}
d-i mirror/http/directory string {{.ParseUrl "path" .Env.InstallUrl}}
{{else}}
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
{{end}}
{{else}}
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
{{end}}
```

greg
2017-05-10 13:28
That would honor the local_repo var. I'm going to test that and commit it.

wdennis
2017-05-10 13:33
And if the "local-repo" profile is not included in the local machine's Profile or Profiles list, or in the global profile, then I should get the Ubuntu apt sources then?

greg
2017-05-10 13:33
yes - if you add the above piece in place.

greg
2017-05-10 13:34
I think.

greg
2017-05-10 13:34
I haven't tried it.

wdennis
2017-05-10 13:37
Well here goes nothin' :slightly_smiling_face:

wdennis
2017-05-10 13:40
Looks like a winner - from rendered seed:
```
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
```

wdennis
2017-05-10 13:41
I'll next load the local-repo profile on another test box and see if I get the DRP repo

wdennis
2017-05-10 13:46
Hmm, doesn't seem to work, get the same lines as above in the seed...

wdennis
2017-05-10 13:48
n/m, didn't add it to the profiles list via drpcli...

greg
2017-05-10 13:49
tip has new commands, addprofile and removeprofile, on the machines object.

wdennis
2017-05-10 13:51
nice

greg
2017-05-10 13:51
At oscon today. Will be off and on

wdennis
2017-05-10 13:52
ack

wdennis
2017-05-10 13:52
OK, with the "local-repo" profile loaded, it does work:
```
d-i mirror/protocol string http
d-i mirror/http/hostname string 192.168.1.148:8091
d-i mirror/http/directory string /ubuntu-16.04/install
```
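
The branching that the revised preseed template performs can be mimicked in plain shell, which may help when reasoning about which mirror lines a machine should end up with. This is a rough sketch, not the template engine: `mirror_lines` is a hypothetical function, and the URL splitting only approximates what `.ParseUrl` is assumed to return for "scheme", "host" and "path".

```shell
#!/bin/sh
# Hypothetical re-implementation of the template's decision logic.
mirror_lines() {
  # $1 = OS family, $2 = local_repo value ("true" enables it), $3 = install URL
  if [ "$1" = "debian" ]; then
    echo "d-i mirror/protocol string http"
    echo "d-i mirror/http/hostname string http://http.us.debian.org"
    echo "d-i mirror/http/directory string /debian"
  elif [ "$2" = "true" ]; then
    scheme=${3%%://*}          # text before "://"
    rest=${3#*://}             # text after "://"
    host=${rest%%/*}           # host[:port]
    case "$rest" in
      */*) path=/${rest#*/} ;; # everything after the first "/"
      *)   path=/ ;;           # URL had no path component
    esac
    echo "d-i mirror/protocol string $scheme"
    echo "d-i mirror/http/hostname string $host"
    echo "d-i mirror/http/directory string $path"
  else
    # local_repo unset or not true: fall back to the public Ubuntu mirror
    echo "d-i mirror/http/hostname string http://archive.ubuntu.com"
    echo "d-i mirror/http/directory string /ubuntu"
  fi
}

mirror_lines ubuntu true http://192.168.1.148:8091/ubuntu-16.04/install
```

With the install URL from this thread, the three lines printed match the rendered seed pasted above.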

wdennis
2017-05-10 16:15
FYI, as far as enabling remote root access...

wdennis
2017-05-10 16:19
I see the post-install.sh just adds a second "PermitRootLogin" line to sshd_config, to wit:
```
root@testnode02:~# grep -n ^PermitRoot /etc/ssh/sshd_config
28:PermitRootLogin prohibit-password
89:PermitRootLogin yes
```

wdennis
2017-05-10 16:20
As it turns out, later directives don't overrule former ones in sshd_config

wdennis
2017-05-10 16:20
Always takes the first directive
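
A quick way to see which value wins under that first-match rule is to pull out the first uncommented occurrence of a keyword. The helper below is an illustrative sketch (`effective_directive` is a hypothetical name); it ignores `Match` blocks and `Include` files, so on a real host `sshd -T` is the more trustworthy check.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first uncommented occurrence
# of a keyword in an sshd_config-style file (keywords are case-insensitive).
effective_directive() {
  # $1 = config file, $2 = keyword, e.g. PermitRootLogin
  awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

# Possible usage on the node from this thread:
# effective_directive /etc/ssh/sshd_config PermitRootLogin
```

Commented-out lines are skipped naturally, because their first field starts with `#` and never equals the keyword.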

wdennis
2017-05-10 16:22
So, I modified the "root-remote-access.tmpl" thusly:
```
{{if .ParamExists "access_keys"}}
mkdir -p /root/.ssh
cat >/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN GENERATED CONTENT
{{ range $key := .Param "access_keys" }}
{{$key}}
{{ end }}
### END GENERATED CONTENT
EOFSSHACCESS
{{end}}
sed --in-place -re '/^PermitRootLogin/ s/prohibit-password/{{if .ParamExists "access_ssh_root_mode"}}{{.Param "access_ssh_root_mode"}}{{else}}without-password{{end}}/' /etc/ssh/sshd_config
```

wdennis
2017-05-10 16:24
Which generates when rendered:
```
mkdir -p /root/.ssh
cat >/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN GENERATED CONTENT
ssh-rsa [redacted] will@Wills-MacBook-Air
### END GENERATED CONTENT
EOFSSHACCESS
sed --in-place -re '/^PermitRootLogin/ s/prohibit-password/yes/' /etc/ssh/sshd_config
```

greg
2017-05-10 16:32
cool

greg
2017-05-10 16:33
I'll look to pull something like that in.

wdennis
2017-05-10 16:34
Did not fix "AcceptEnv" yet...
```
root@testnode02:~# grep -n ^AcceptEnv /etc/ssh/sshd_config
75:AcceptEnv LANG LC_*
90:AcceptEnv http_proxy https_proxy no_proxy
```

greg
2017-05-10 16:34
It should probably be removed.

wdennis
2017-05-10 16:34
You need line 90 for proxy environments?

greg
2017-05-10 16:35
yes

greg
2017-05-10 16:35
well - for our DR runners

wdennis
2017-05-10 16:36
Then I guess " http_proxy https_proxy no_proxy" should be added at the end of "AcceptEnv LANG LC_*"

greg
2017-05-10 16:36
Yeah - not sure. Probably.

2017-05-12 20:15
I <3 drp

greg
2017-05-12 20:16
:slightly_smiling_face:

2017-05-12 20:17
Box was on CentOS 6.9 - stuck in the void.

2017-05-12 20:17
Now pxebooting and will be running centos 7.3 ASAP.

2017-05-12 20:33
any way to tell sledgehammer to reboot pxe via drpcli?

2017-05-12 20:38
I used IPMI - but is there a more drpy way?

greg
2017-05-12 20:43
currently no. we are looking at a follow-on to drp -> drpv (digitalrebar provider) that would replace drp but add IPMI/BIOS/RAID. Something like that.

2017-05-12 20:51
Interesting to keep them separate. Microservices, even.

greg
2017-05-12 20:51
yeah - with go - they can live in the same binary, but be implemented that way.

greg
2017-05-12 20:51
So, we can separate them or not.

2017-05-12 21:16
If I just replace the greg key with my pubkey, "centos should work" eh?

2017-05-12 21:17
or do I need to set the params on the global profile?

greg
2017-05-12 21:18
yes

greg
2017-05-12 21:18
make sure the profile is added to the node or add it to the global profile.

2017-05-12 21:20
add profile to global profile?

greg
2017-05-12 21:21
add the param to the global profile.

greg
2017-05-12 21:25
I need to add some docs for that.

greg
2017-05-12 21:25
It is on my list.

2017-05-12 21:31
drpcli profiles show global
{
  "Name": "global",
  "Params": {
    "access_keys": {
      "access_ssh_root_mode": "without-password",
      "root": "ssh-rsa <stuff> root@os1"
    }
  }
}

2017-05-12 21:31
Like so, Obi-wan?

greg
2017-05-12 21:31
access_ssh_root_mode is a peer with access_keys.

greg
2017-05-12 21:31
like this:

2017-05-12 21:32
ah, oops. I see.

greg
2017-05-12 21:32
```
{
  "Name": "global",
  "Params": {
    "access_ssh_root_mode": "without-password",
    "access_keys": {
      "root": "ssh-rsa <stuff> root@os1"
    }
  }
}
```

greg
2017-05-12 21:32
You can in fact use that as an update blob on global

greg
2017-05-12 21:32
drpcli profiles update global - < file.json

greg
2017-05-12 21:33
where that snippet is in file.json

2017-05-12 21:36
got it. or call params set twice.

greg
2017-05-12 21:37
yes - one with each. :slightly_smiling_face:
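
Putting the corrected structure into file.json can be scripted so the two params always end up as peers. This is a minimal sketch under stated assumptions: `write_global_update` is a hypothetical helper, the values are interpolated as-is (so they must not contain double quotes or backslashes), and the `drpcli profiles update global - < file.json` call is the one quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: write an update blob for the global profile with
# access_ssh_root_mode and access_keys as sibling params.
write_global_update() {
  # $1 = output file, $2 = root login mode, $3 = public key line
  cat > "$1" <<EOF
{
  "Name": "global",
  "Params": {
    "access_ssh_root_mode": "$2",
    "access_keys": {
      "root": "$3"
    }
  }
}
EOF
}

# Possible usage (needs drpcli and a running dr-provision, so commented out):
# write_global_update file.json without-password "$(cat ~/.ssh/id_rsa.pub)"
# drpcli profiles update global - < file.json
```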

2017-05-15 07:37
Hi.

2017-05-15 07:41
I have tried to get Digital Rebar up and running, but something makes it eat up all the memory I give it. I am trying to use it to provision bare metal.

2017-05-15 07:42
Any known issues?

2017-05-15 07:45
By the way it does seem like a great piece of software by looking at your demos.

rstarmer
2017-05-15 09:11
Giving provision a try, and was going to launch kolla-ansible based openstack on top, but it looks like the ubuntu images point to the control node for apt repo, and that doesn't quite work (complains about security, etc.)

rstarmer
2017-05-15 09:12
any pointers on where the apt repo ends up on disk would be useful. I'm looking to deploy this with a customer tomorrow (well, Monday), and I think I can get the basics going, but I'm not sure how to deal with this interaction?

rstarmer
2017-05-15 10:13
I think where I'm struggling is how do the bootenvs incorporate profiles? I'm not seeing where they get matched.

greg
2017-05-15 12:39
@rstarmer - profiles are attached to machines. They aren't matched per se. You have to add them explicitly.

greg
2017-05-15 12:39
@rstarmer - I have a fix in my tree for ubuntu.

greg
2017-05-15 12:40
I missed a couple of local_repo wrappers.

greg
2017-05-15 12:42
tip will have the fix in about 10 minutes.

rstarmer
2017-05-15 15:19
is there an upgrade process? Or do I stop the service, re-install, and re-start?

greg
2017-05-15 15:19
@svallebro - For digitalrebar, you need to make sure your admin node has at least 2 cores (4 is better) and 6GB of memory.



rstarmer
2017-05-15 15:20
@greg thanks

rstarmer
2017-05-15 16:40
@greg do you have an example of mapping profiles to machines?

greg
2017-05-15 16:50
Did you get tip? The drpcli has a helper command to add profiles to machines

wdennis
2017-05-15 17:59
@greg After working with profiles/templates for a bit now, I think it would be helpful to have a "drpcli template render [name]" command

wdennis
2017-05-15 18:01
That way you don't have to actually apply the templates to a machine via profiles, and then call a URL to see how they would render...

wdennis
2017-05-15 18:01
Of course, there would have to be a way to spec a machine I guess in that command?

greg
2017-05-15 18:02
yes - I have two issues in the backlog for this function. :slightly_smiling_face: @wdennis

wdennis
2017-05-15 18:02
sweet

rstarmer
2017-05-16 04:23
@greg I got tip, and am just now relaunching my ubuntu instance. I'll have a look at the helper function now.

rstarmer
2017-05-16 04:37
hmm, I'm clearly doing something wrong, none of the profiles in assets are ingestible. Also, no docs on profiles?

greg
2017-05-16 05:20
should be there now.

greg
2017-05-16 05:21
tip was updated 2 hours ago with profile helpers, and docs in latest

greg
2017-05-16 05:22
oh - profiles need to be edited for your environment which is why they are not autoimported. @rstarmer

rstarmer
2017-05-16 05:27
I figured they did, and I did, but importing fails, I'll dig a bit more shortly.

rstarmer
2017-05-16 06:03
I updated root access, and this is what I get:
drpcli profiles create assets/profiles/ubuntu-access.yaml --format yaml
Error: Invalid profile object: error unmarshaling JSON: json: cannot unmarshal string into Go value of type models.Profile

rstarmer
2017-05-16 06:53
@greg got it. Saw your earlier example. But why is it not possible to just pass a file as an argument rather than having to - < file.json? Certainly causes end-user confusion, I expect, after reading the docs to be able to "drpcli profile create -F yaml assets/profiles/my_profile.yaml" And all your default documents are YAML, so why is the default file format JSON? Just a few user suggestions :slightly_smiling_face:

greg
2017-05-16 12:51
```drpcli profiles create assets/profiles/ubuntu-access.yaml --format yaml```

greg
2017-05-16 12:51
should be:

greg
2017-05-16 12:51
```drpcli profiles create - < assets/profiles/ubuntu-access.yaml```

greg
2017-05-16 12:51
it will figure out json vs yaml on the redirect in.

greg
2017-05-16 12:53
oh - sorry, brain is waking up. that was a feature request. The -F or --format is for output only. Input should be either.

jj
2017-05-16 21:12
has joined #community201705

zehicle
2017-05-16 21:48
@jj can you point @greg to code that installs the agent?

jj
2017-05-16 21:49
sure


jj
2017-05-16 21:50
it's just the typical chef-bootstrap

greg
2017-05-16 22:05
okay cool -

greg
2017-05-16 22:27
Should be doable, I'll see what I can do.

jj
2017-05-16 22:28
awesome.

greg
2017-05-16 22:30
sorry quick questions is proxy user and proxy pass supposed to be http://<username>

greg
2017-05-16 22:30
same with password

greg
2017-05-16 22:30
Also @jj - okay if I make it more parameterized?


jj
2017-05-16 22:39
@zehicle & @greg

jj
2017-05-16 22:39
@greg absolutely

jj
2017-05-17 00:49

greg
2017-05-17 02:27
@jj - thanks for the PRs. Give it about twenty minutes and they should be in tip images.

jj
2017-05-17 02:28
awesome!

jj
2017-05-17 02:28
i'll be able to verify the esxi 65 images _ideally_ thursday

greg
2017-05-17 02:29
sounds fine. We can always adjust. Many of them are examples anyway. We may want to generate a validated matrix in the docs at some point. Not sure. Something to contemplate.

jj
2017-05-17 02:30
:slightly_smiling_face:

greg
2017-05-17 02:37
while you are around, @jj - I'm thinking about this as a set of parameters for chef client install.
```
Params:
  chef_server_url: https://mumble
  chef_validation_name: vname
  chef_validation_pem_drp_location: files/chef/validation.pem
  chef_validation_pem_ext_location: http??///
  chef_validation_pem_string: filecontent
  chef_client_package_name: fred.rpm
  chef_client_package_drp_location: files/chef/
  chef_client_package_ext_location: http://mumble/...
  chef_client_first_boot_run_list:
    - role1
    - role2
  chef_client_environment: env1
```

greg
2017-05-17 02:37
For proxy, I'll already have a set of params for that. I'll add to it.

jj
2017-05-17 02:38
seem reasonable

greg
2017-05-17 02:38
I'll create defaults for ones that make sense.

greg
2017-05-17 02:39
I'm leery of the last part - about interface naming and forcing.

greg
2017-05-17 02:39
If anything, I'll add it as another helper in the library of functions that can be included, but off by default.

greg
2017-05-17 02:39
Not sure how important it is.

2017-05-17 10:58
Hi, my installation stops at "TASK [gem install kvm slaves]". How can I skip this task? I don't need any KVM management, or are these tools also needed for some other reason? From the documentation's perspective, these tools are needed in a development environment, but I will use Digital Rebar in a production setting.

2017-05-17 12:08
We installed rebar on ubuntu 16.x. The task "TASK [wait for admin convergence [1 upto 20 minutes]]" failed. I am not able to login to the web portal using rebar/rebar1. Is there anything we can check in the logs?

greg
2017-05-17 12:25
@theta-my: Are you behind a proxy? it tries to get things from ruby gems and that can be hard sometimes behind a proxy or if you are in some Asian countries.

greg
2017-05-17 12:27
@nratnakaram_twitter -
```
cd digitalrebar/deploy/compose
docker-compose ps
docker-compose logs rebar_api > /tmp/rebar-api.log
```

greg
2017-05-17 12:27
The ps should show all containers running.

greg
2017-05-17 12:27
The log should show some information about where it is.

greg
2017-05-17 12:28
If a container isn't running, we should be able to see the log for that container by replacing rebar_api with the container name in the logs command.

2017-05-17 12:29
Yes, I'm behind a stack of firewalls.

2017-05-17 12:29
No way out ;)

2017-05-17 12:30
For all installation stuff, I can only use the corporate satellite server. But for gems...

greg
2017-05-17 12:30
okay - then that is going to be a problem. We assume that the admin node has mostly unrestricted outbound access. We have started to do some work to build an install image, but it is still a work in progress.

greg
2017-05-17 12:31
It will use a different deployment method. If we get it working, I may switch the install system over to it.

2017-05-17 12:31
If I know which gems are needed and where they are looked for, I can provide them manually

greg
2017-05-17 12:33
What OS?

2017-05-17 12:33
rhel7

greg
2017-05-17 12:35
in this file: deploy/tasks/base-centos.yml - lines 40-44.

greg
2017-05-17 12:35
You could try deleting them.

greg
2017-05-17 12:35
I'm not sure they are needed for your case.

2017-05-17 12:36
I will give them a try :) Thanks!

greg
2017-05-17 12:36
I suspect this is going to be a slog, but eventually, we are going to expect to get to AWS s3 to get sledgehammer.

2017-05-17 12:36
@zehicle We are not seeing any logs. Is there a specific container that should be running for authentication?

2017-05-17 12:38
We are re-running - ./wait_for_rebar.sh

2017-05-17 12:43
Sorry, the next outgoing connection is requested. "https://get.docker.com/" :(

2017-05-17 12:52
@zehicle ./wait_for_rebar.sh failed again. Authentication is not happening. These are the containers running as of now. There are no logs.

2017-05-17 12:52
[5/17/2017 6:21 PM] Ashokan, Pradeep Kumar:
digitalrebar/dr_rebar_api:master
digitalrebar/dr_goiardi:master
digitalrebar/dr_webproxy:master
digitalrebar/dr_dns:master
digitalrebar/cloudwrap:master
digitalrebar/logging:master
digitalrebar/dr_provisioner:maste
digitalrebar/dr_rev_proxy:master
digitalrebar/dr_trust_me:master
digitalrebar/dr_postgres:master
digitalrebar/rule-engine:master
gliderlabs/consul
digitalrebar/dr_forwarder:master

greg
2017-05-17 12:55
@nratnakaram_twitter - This is @galthaus - the forwarder prepends zehicle. There should be logs. Those are images which look okay.

greg
2017-05-17 12:56
@nratnakaram_twitter - you don't get anything from ``` docker-compose logs -f rebar_api ``` when run in digitalrebar/deploy/compose

greg
2017-05-17 12:56
??

greg
2017-05-17 12:57
@theta-my - umm - let me check with @vlowther on where we are with the single image.

greg
2017-05-17 12:57
In either case, you are going to need docker installed on the machine.

greg
2017-05-17 12:57
our install image has all containers in it without having to go to docker hub.

vlowther
2017-05-17 12:59
Basic install stuff works. Don't have anything set up to autogen the artifacts, tho.

2017-05-17 12:59
Ok, this was missing in the pre-check from deploy/run-in... --help ;)

vlowther
2017-05-17 13:00
It has not been added to any of our ansible install methods yet.

greg
2017-05-17 13:01
Kinda like I said, we assume the admin node has internet access for the time being. We are trying to remove that requirement.

2017-05-17 13:02
I'm happy to help you figure out how to go further with this ;)

greg
2017-05-17 13:03
Can you describe your networking? What is your host environment and what is outbound/inbound inet access like? What are you trying to do with DR?

2017-05-17 13:03
So, docker install was not the problem :( Script stops at "TASK [Get Docker]"

greg
2017-05-17 13:04
@theta-my: That does
```
- name: Get Docker
  get_url: url=https://get.docker.com/ dest=/tmp/docker.sh validate_certs=False
  become: yes
```

greg
2017-05-17 13:04
Which is supposed to pay attention to proxies, but may not.

greg
2017-05-17 13:04
Though that script will attempt to get to the internet as well.

2017-05-17 13:05
I have no direct internet access. I'm working in a production environment. I can only use the repos provided by another team, in line with corporate security.

2017-05-17 13:05
@galthaus Now we got the logs. postgres seems to be in the "restarting" status

greg
2017-05-17 13:05
okay - so do you have docker in those repos?

2017-05-17 13:06
docker as installation package - YES

2017-05-17 13:06
I have run the install some seconds ago, no problem.

greg
2017-05-17 13:06
okay - so -we will need to work through this. I need to go to an awards ceremony for my daughter. I'll be back in a couple of hours.

greg
2017-05-17 13:09
@theta-my - This will not work currently. We need to build you an image that has docker containers, because the next step is to tell docker to pull from the internet a set of containers. It sounds like that is not allowed in your environment. So, we need a different plan. Can you get to the internet and put things on those boxes?

2017-05-17 13:11
Only if I use my workstation as a "proxy" (no, not as network proxy, as file proxy).

greg
2017-05-17 13:12
yeah - so, you can get files and put them in place. That could be workable.

2017-05-17 13:12
yup

2017-05-17 13:35
@galthaus, I am part of @nratnakaram_twitter's team. As we pointed out in the previous chat, postgres keeps restarting all the time. We got the logs for the postgres container:

2017-05-17 13:35
==> Log data will now stream in as it occurs:
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: a6357a579da8 172.17.0.10
2017/05/17 13:32:15 [INFO] serf: Attempting re-join to previously known node: 9151aab4dcf9: 172.17.0.9:8301
2017/05/17 13:32:15 [INFO] agent: Joining cluster...
2017/05/17 13:32:15 [WARN] manager: No servers available
2017/05/17 13:32:15 [ERR] agent: failed to sync remote state: No known Consul servers
2017/05/17 13:32:15 [INFO] agent: (LAN) joining: [10.138.161.161]
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 7f8e39b33473 172.17.0.4
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: ebcb3d2be239 172.17.0.6
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 9151aab4dcf9 172.17.0.9
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 50fae633b067 172.17.0.8
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 34e423275179 172.17.0.7
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 861a1822a01b 172.17.0.5
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: fd4ac25c9be4 172.17.0.11
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: inll50904063h 10.138.161.161
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 4d6c2e26f01d 172.17.0.2
2017/05/17 13:32:15 [INFO] serf: Re-joined to previously known node: 9151aab4dcf9: 172.17.0.9:8301
2017/05/17 13:32:15 [INFO] consul: adding server inll50904063h (Addr: tcp/10.138.161.161:8300) (DC: digitalrebar)
2017/05/17 13:32:15 [WARN] memberlist: Refuting a suspect message (from: a6357a579da8)
2017/05/17 13:32:15 [INFO] agent: (LAN) joined: 1 Err: <nil>
2017/05/17 13:32:15 [INFO] agent: Join completed. Synced with 1 initial agents
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 21 100 21 0 0 2753 0 --:--:-- --:--:-- --:--:-- 3000
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
initdb: could not access directory "/var/lib/postgresql/data": Permission denied

greg
2017-05-17 14:11
What command are you using to start dr and as what user?

vlowther
2017-05-17 14:55
I am making a Jenkins builder for the offline install bits.

greg
2017-05-17 15:13
Cool

vlowther
2017-05-17 16:24
https://s3-us-west-2.amazonaws.com/rebar-offline/master.html <-- open the file, click the link to get the latest offline install bits

zehicle
2017-05-17 17:30
Nice

rstarmer
2017-05-17 17:36
help. How do I create a user? I.e., what's the password? I'm trying with a YAML of the form:

rstarmer
2017-05-17 17:37
Name: test
Params:
  password: test

rstarmer
2017-05-17 17:37
I also tried the password as a crypt'd value

rstarmer
2017-05-17 17:37
@greg @zehicle ^^

greg
2017-05-17 17:41
looking at it now.

greg
2017-05-17 17:45
@rstarmer - well - I don't have that plumbed. I'll be working on that for a little while.

rstarmer
2017-05-17 17:46
:slightly_smiling_face:

rstarmer
2017-05-17 17:51
is there a .isos directory somewhere that things get cached? I.e., can I pre-load the cache?

greg
2017-05-17 17:51
the tftpboot directory has an isos directory like the DigitalRebar cache directory. Putting things there works.

greg
2017-05-17 17:52
The bootenvs install command will create/check an isos directory that is a peer with bootenvs and templates.

greg
2017-05-17 17:52
You can cache there as well.
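
For reference, pre-loading the cache as described above could look like the sketch below. The `drp-data/tftpboot/isos` default is an assumption pieced together from paths mentioned in this channel (the isos directory that sits next to bootenvs and templates would be the other candidate), and the helper name `preload_iso` and the example file name are hypothetical.

```shell
# Sketch: copy an already-downloaded ISO into the provisioner's isos cache.
# ISOS_DIR is an assumed location; point it at your own tftpboot/isos directory.
ISOS_DIR="${ISOS_DIR:-drp-data/tftpboot/isos}"

preload_iso() {
  # $1 = path to a local ISO (or tarball) that a bootenv expects
  src="$1"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  mkdir -p "$ISOS_DIR" || return 1
  # Copy to a temporary name first so a partial copy is never mistaken
  # for a complete file.
  tmp="$ISOS_DIR/.$(basename "$src").part"
  cp "$src" "$tmp" && mv "$tmp" "$ISOS_DIR/$(basename "$src")"
}

# Example (hypothetical file name):
#   preload_iso ~/Downloads/some-os.iso
```

The file name has to match what the bootenv expects, otherwise the provisioner will presumably try to download it again.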

rstarmer
2017-05-17 17:55
cool, will try now.

rstarmer
2017-05-17 18:09
so how do I delete a bootenv? Seems that destroy fails, and I stopped the install in the middle of trying (very slowly) to download the sledgehammer.tar

rstarmer
2017-05-17 18:09
i.e., what's the easiest way to blow away the environment and rebuild?

rstarmer
2017-05-17 18:14
what is the hashing mechanism for the users? I found the database...

greg
2017-05-17 18:18
golang simple-crypt

greg
2017-05-17 18:18
delete the drp-data directory and restart the dr-provision

greg
2017-05-17 18:19
sledgehammer is already in "use".

greg
2017-05-17 18:19
If you install "local" and set the prefs to not reference sledgehammer, you should be able to remove sledgehammer.

rstarmer
2017-05-17 18:27
so I re-built, and using the pre-downloaded sledgehammer, I get an error for both, it says the sha doesn't match when it installs.

greg
2017-05-17 18:27
Partial download of sledgehammer?

greg
2017-05-17 18:27
should match what is in the files.
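
A partial download is easy to confirm by hashing the file yourself and comparing against the expected value. The sketch below assumes the expected value is a SHA-256 digest and that GNU coreutils' `sha256sum` is installed; if the value recorded for the file is a SHA-1, swap in `sha1sum`. The helper name `verify_sha256` is hypothetical.

```shell
# Sketch: compare a downloaded file against the checksum you expect for it.
# Assumes GNU coreutils' sha256sum; on macOS, `shasum -a 256` prints the same format.
verify_sha256() {
  # $1 = file, $2 = expected hex digest
  actual="$(sha256sum "$1" | awk '{print $1}')"
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}

# Usage (placeholder arguments):
#   verify_sha256 path/to/sledgehammer.tar <expected-digest>
```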

rstarmer
2017-05-17 18:28
possible, but will re-download (the local network is just slow, so using my phone...)

2017-05-17 21:57
ok so... back to this... can we use it to spin up bare metal and XENServer VMs

2017-05-17 22:12
yes for metal. we don't have a XEN provider so you'd need a cloud wrapper like OpenStack OR create a provider that would start VMs on hosts or some other way.

2017-05-17 22:13
We had someone ask about using bash/ansible on the hosts to start VMs that would then register into rebar. there are several options that could be built

2017-05-17 22:14
of course, you could PXE the VMs but that's not what I'd recommend for normal workloads.

2017-05-17 22:14
and if i require gui access from a public ip on an interface? but provisioning via private

2017-05-17 22:14
i could terraform the vms to spin up and bootp

2017-05-17 22:15
yes, you can control the provision interface

2017-05-17 22:16
ok, i'll run through it again on a CentOS 7 VM

2017-05-17 22:17
I don't see a terraform XEN provider, so I'm not sure what you are thinking.

2017-05-17 22:18
there is one :) ive used it before ....

greg
2017-05-18 06:03
Hi All, DRP has a new release v3.0.4 - https://github.com/digitalrebar/provision/releases

greg
2017-05-18 06:08
Stable and tip are updated to this.

greg
2017-05-18 06:08
There are not any changes to migrate. Stop, install, start dr-provision

2017-05-18 07:43
hi rob, can i ask you about your solution with my problem? I have no chance for direct internet access to install/ start digital rebar. I can only use my workstation as a file proxy. At the moment i hang on "TASK [Get Docker]". Docker is installed.

greg
2017-05-18 13:45
@theta-my - I'm looking at it. Hope to have something later this afternoon.

greg
2017-05-18 13:45
Or at least some comments.

greg
2017-05-18 13:46
@theta-my - what is your use case?

2017-05-18 13:53
we will deploy some bare metal servers, orchestrated by IBM ICO, in a customer environment.

2017-05-18 13:54
and we will use digital rebar as IaaS, full automated ;)

2017-05-18 13:57
We need the full stack : metal discovery, inventory, os install based on inventory tags, hands over to chef

2017-05-18 14:10
Is that the answer you expected?

greg
2017-05-18 14:19
Thinking about it. I may have more questions later.

jj
2017-05-18 18:43
@greg ping

jj
2017-05-18 18:44
I just pulled down `tip` to install it on a provisioning machine, and it seems i'm getting:

jj
2017-05-18 18:44
```
admini@echo:/var/log$ /usr/local/bin/dr-provision
-bash: /usr/local/bin/dr-provision: cannot execute binary file: Exec format error
admini@echo:/var/log$
```

greg
2017-05-18 18:49
@jj - how did you run the install?

greg
2017-05-18 18:49
What type of system?

jj
2017-05-18 18:49
it's an xps x86

jj
2017-05-18 18:49
and i was walking through the install instructions

jj
2017-05-18 18:49
i can zoom it if you want

jj
2017-05-18 18:49
(in like 5 mins)

greg
2017-05-18 18:49
os type?

jj
2017-05-18 18:49
ubuntu

greg
2017-05-18 18:50
okay - let me know when you are ready?

jj
2017-05-18 18:51
heh, seems now'll work. http://bit.ly/zoom-jjasghar
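
The cause of the "Exec format error" is not recorded in this channel. In general that message means the binary was built for a different CPU architecture or operating system than the host. A quick check, assuming dr-provision is a Go build (so architectures follow Go's naming), is to compare the host architecture with what `file` reports for the binary. The helper name `go_arch` is hypothetical.

```shell
# Sketch: translate `uname -m` output into Go-style architecture names,
# which is how Go binaries are commonly labelled.
go_arch() {
  case "$1" in
    x86_64|amd64)        echo amd64 ;;
    i386|i486|i586|i686) echo 386 ;;
    aarch64|arm64)       echo arm64 ;;
    armv6l|armv7l)       echo arm ;;
    *)                   echo "unknown($1)" ;;
  esac
}

# Compare what the host is with what the binary was built for:
#   go_arch "$(uname -m)"
#   file /usr/local/bin/dr-provision
```

A 32-bit OS on 64-bit hardware reports `i686`, which would not run an amd64 binary.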

2017-05-19 11:43
Hello team, I hang on : "TASK [Get Docker Compose] ****************************************************** fatal: [10.241.236.92]: FAILED! => {"changed": false, "dest": "/usr/local/bin/docker-compose", "failed": true, "msg": "Request failed", "response": "An unknown error occurred: coercing to Unicode: need string or buffer, NoneType found", "state": "absent", "status_code": -1, "url": "https://github.com/docker/compose/releases/download/1.7.1/docker-compose-Linux-x86_64"} "

2017-05-19 11:57
fixed: after installing docker-compose, you must create a link: "ln -s /usr/bin/docker-compose /usr/local/bin/docker-compose"

greg
2017-05-19 12:34
did you get docker-compose from the internet and then link it?

2017-05-19 12:35
no, installed via pip

greg
2017-05-19 12:35
ok

2017-05-19 13:15
I am getting the below error, when I try installing digital-rebar, Could you please help me on the same.. TASK [Pull compose images [SLOW]] ************************************************************************************************************************************************************************* fatal: [10.138.161.217]: FAILED! => {"changed": true, "cmd": "DR_TAG=master /usr/local/bin/docker-compose pull", "delta": "0:00:05.711742", "end": "2017-05-19 18:32:50.227777", "failed": true, "rc": 1, "start": "2017-05-19 18:32:44.516035", "stderr": "Pulling postgres (digitalrebar/dr_postgres:master)...\nGet https://registry-1.docker.io/v2/: dial tcp 34.205.194.204:443: getsockopt: no route to host", "stderr_lines": ["Pulling postgres (digitalrebar/dr_postgres:master)...", "Get https://registry-1.docker.io/v2/: dial tcp 34.205.194.204:443: getsockopt: no route to host"], "stdout": "", "stdout_lines": []} to retry, use: --limit @/home/I324148/digitalrebar/deploy/digitalrebar.retry PLAY RECAP ************************************************************************************************************************************************************************************************ 10.138.161.217 : ok=46 changed=19 unreachable=0 failed=1

greg
2017-05-19 13:29
@deepuashokan85 - do you have internet access to docker hub?

2017-05-19 13:32
@zehicle that's where I'm stuck: the system is unable to contact Docker Hub, however git and yum repos are able to talk to the internet

greg
2017-05-19 13:33
Are you through a proxy?

2017-05-19 13:34
yes.. I am through proxy

greg
2017-05-19 13:34
hmm - okay - it should have set up docker to use the proxy, but maybe not. You should check that.

2017-05-19 13:35
when I do wget, I get the message below:
[I324148@inll50904062a digitalrebar]$ wget https://registry-1.docker.io/v2/
--2017-05-19 19:05:20-- https://registry-1.docker.io/v2/
Resolving proxy (proxy)... 172.28.64.41
Connecting to proxy (proxy)|172.28.64.41|:8080... connected.
Proxy request sent, awaiting response... 401 Unauthorized
Authorization failed.
[I324148@inll50904062a digitalrebar]$

2017-05-19 13:36
Is there an alternate way to have Docker Hub local on my system?

2017-05-19 13:39
@zehicle ^^ ??

greg
2017-05-19 13:40
I'm working on it.

greg
2017-05-19 13:40
@deepuashokan85 and @theta_my have similar problems. We are working on an offline install.

2017-05-19 13:42
Great @zehicle, let me know once you have it ready..

2017-05-19 14:03
next stop :worried: "TASK [Pull compose images [SLOW]] ********************************************** fatal: [10.241.236.92]: FAILED! => {"changed": true, "cmd": "DR_TAG=master /usr/local/bin/docker-compose pull", "delta": "0:01:31.028292", "end": "2017-05-19 16:02:13.261874", "failed": true, "rc": 1, "start": "2017-05-19 16:00:42.233582", "stderr": "Pulling postgres (digitalrebar/dr_postgres:master)...\nNetwork timed out while trying to connect to https://index.docker.io/v1/repositories/digitalrebar/dr_postgres/images. You may want to check your internet connection or if you are behind a proxy.", "stdout": "Trying to pull repository registry.access.redhat.com/digitalrebar/dr_postgres ... \nTrying to pull repository docker.io/digitalrebar/dr_postgres ... \nPulling repository docker.io/digitalrebar/dr_postgres", "stdout_lines": ["Trying to pull repository registry.access.redhat.com/digitalrebar/dr_postgres ... ", "Trying to pull repository docker.io/digitalrebar/dr_postgres ... ", "Pulling repository docker.io/digitalrebar/dr_postgres"], "warnings": []}"

2017-05-19 14:04
a wget to https://index.docker.io/v1/repositories/digitalrebar/dr_postgres/images runs with no error

greg
2017-05-19 14:07
This will fail completely.

greg
2017-05-19 14:07
You have to have an internet connection for that step.

greg
2017-05-19 14:08
I keep saying your use case is not really supported directly.

2017-05-19 14:08
yes, i have a proxy configuration which runs fine (otherwise the wget would also have failed...)

2017-05-19 14:09
(after some discussion, an internet connection via proxy is allowed now) :)

2017-05-19 14:13
@zehicle @theta-my, the problem is that from the office network I am unable to talk to Docker Hub, however from my home network it works for me..

2017-05-19 14:14
I cannot check this directly, https://index.docker.io/v1/repositories/digitalrebar/dr_postgres/ requests a user name and password...

2017-05-19 14:28
Can someone verify that the requested files are available?

greg
2017-05-19 14:37
it is there and not password protected

greg
2017-05-19 14:37
do you have password based proxy?

2017-05-19 14:38
no, the password request comes if i try to use the link in a browser

2017-05-19 14:38
if i use wget, no problem

2017-05-19 14:38
can you send me the complete file path?

2017-05-19 14:38
i will try wget to this

2017-05-19 14:39
only for check

greg
2017-05-19 14:39
it isn't a file.

greg
2017-05-19 14:39
well - it is , but it is a separate protocol that docker uses to get content.

greg
2017-05-19 14:39
My guess is that docker is misconfigured

2017-05-19 14:40
ahhh, separate protocol...

2017-05-19 14:40
not 80 or 443 ...

2017-05-19 14:40
(http/ https)

greg
2017-05-19 14:41
it is those ports and those protos, but it turns into more requests

2017-05-19 14:43
??? then I'm lost now. I have set a system proxy for http and https, and tried to configure the docker proxy in /etc/systemd/system/docker.service.d/http-proxy.conf

2017-05-19 14:43
but nothing helps

2017-05-19 14:43
something missing?

greg
2017-05-19 14:45
can you run this:

greg
2017-05-19 14:45
docker run hello-world

greg
2017-05-19 14:46
It should output Hello from Docker!

greg
2017-05-19 14:46
That means that you have docker configured correctly for your firewall environment.

2017-05-19 14:46
trying

2017-05-19 14:48
failed

2017-05-19 14:48
network time out

2017-05-19 15:23
ok, docker proxy rechecked, reconfigured, service restarted -> runs fine with "docker run hello-world"
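
For anyone else behind a proxy, the Docker daemon reads its proxy settings from a systemd drop-in rather than from the shell's environment. A minimal sketch follows, using the drop-in path mentioned above; the proxy URL is a placeholder and the helper name `write_docker_proxy_conf` is hypothetical. The exact file contents used in this case were not posted.

```shell
# Sketch: write a systemd drop-in that passes proxy settings to the Docker daemon.
write_docker_proxy_conf() {
  # $1 = proxy URL, $2 = drop-in directory (defaults to the systemd location)
  proxy="$1"
  dir="${2:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy" "HTTPS_PROXY=$proxy" "NO_PROXY=localhost,127.0.0.1"
EOF
}

# After writing the file as root, reload systemd, restart Docker, and re-test:
#   write_docker_proxy_conf "http://proxy.example.com:8080"
#   systemctl daemon-reload && systemctl restart docker
#   docker run hello-world
```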

greg
2017-05-19 15:31
retry the docker pull

2017-05-19 15:34
working on it, runs into a storage failure, sounds like not enough storage available at the chosen path...

2017-05-19 16:32
:( was not the failure

2017-05-19 16:32
TASK [Pull compose images [SLOW]] ********************************************** fatal: [10.241.236.92]: FAILED! => {"changed": true, "cmd": "DR_TAG=master /usr/local/bin/docker-compose pull", "delta": "0:01:19.428383", "end": "2017-05-19 18:30:10.865906", "failed": true, "rc": 1, "start": "2017-05-19 18:28:51.437523", "stderr": "Pulling postgres (digitalrebar/dr_postgres:master)...\nPulling rule-engine (digitalrebar/rule-engine:master)...\nPulling consul (gliderlabs/consul:latest)...\nPulling forwarder (digitalrebar/dr_forwarder:master)...\nPulling goiardi (digitalrebar/dr_goiardi:master)...\nPulling trust_me (digitalrebar/dr_trust_me:master)...\nPulling logging (digitalrebar/logging:master)...\nPulling dns (digitalrebar/dr_dns:master)...\nPulling provisioner (digitalrebar/dr_provisioner:master)...\nPulling revproxy (digitalrebar/dr_rev_proxy:master)...\nPulling cloudwrap (digitalrebar/cloudwrap:master)...\nPulling rebar_api (digitalrebar/dr_rebar_api:master)...\nfailed to register layer: devmapper: Thin Pool has 827 free data blocks which is less than minimum required 851 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior", "stdout": "Trying to pull repository registry.access.redhat.com/digitalrebar/dr_postgres ... \nTrying to pull repository docker.io/digitalrebar/dr_postgres ... \nmaster: Pulling from docker.io/digitalrebar/dr_postgres\nDigest: sha256:d94959f8c3294b3da4c8bb0ecb0e786e8cb386998b59a25e2afb50aa51a8bf2a\nTrying to pull repository registry.access.redhat.com/digitalrebar/rule-engine ... \nTrying to pull repository docker.io/digitalrebar/rule-engine ... \nmaster: Pulling from docker.io/digitalrebar/rule-engine\nDigest: sha256:2d42fdf62c74ffecdbc9d4afc2591243179bd13f3a7225cfece0758689ee2f4b\nTrying to pull repository registry.access.redhat.com/gliderlabs/consul ... \nTrying to pull repository docker.io/gliderlabs/consul ... 
\nlatest: Pulling from docker.io/gliderlabs/consul\nDigest: sha256:927a560389df16092364a4c26c976cd9b845800c8b96b4e687451c398f4187c1\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_forwarder ... \nTrying to pull repository docker.io/digitalrebar/dr_forwarder ... \nmaster: Pulling from docker.io/digitalrebar/dr_forwarder\nDigest: sha256:9febe88b6f8ff9b028fbeb10560cdec752d35985aaf5165629f6dd7824e0f129\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_goiardi ... \nTrying to pull repository docker.io/digitalrebar/dr_goiardi ... \nmaster: Pulling from docker.io/digitalrebar/dr_goiardi\nDigest: sha256:db620bbb4994d1074d706362996243353ad1d1f0347e7ec4e8706acb0e7195fe\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_trust_me ... \nTrying to pull repository docker.io/digitalrebar/dr_trust_me ... \nmaster: Pulling from docker.io/digitalrebar/dr_trust_me\nDigest: sha256:194f71e8edf29644ddcf9477d6764c71cff44dabe3c5de42aed779fd488183be\nTrying to pull repository registry.access.redhat.com/digitalrebar/logging ... \nTrying to pull repository docker.io/digitalrebar/logging ... \nmaster: Pulling from docker.io/digitalrebar/logging\nDigest: sha256:4eeee9ee94df703bc69ce1ef0af12d36302ac315873361c55089be701d140e9a\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_dns ... \nTrying to pull repository docker.io/digitalrebar/dr_dns ... \nmaster: Pulling from docker.io/digitalrebar/dr_dns\nDigest: sha256:46c93057b1bb01d1700d9015ef3f03a483b3204aa03ef808589086eddec8259d\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_provisioner ... \nTrying to pull repository docker.io/digitalrebar/dr_provisioner ... \nmaster: Pulling from docker.io/digitalrebar/dr_provisioner\nDigest: sha256:f790122f2d83f8f2a05b1e00d8fa90fbfbf7c093b8d4913faf3055da46b15cc6\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_rev_proxy ... \nTrying to pull repository docker.io/digitalrebar/dr_rev_proxy ... 
\nmaster: Pulling from docker.io/digitalrebar/dr_rev_proxy\nDigest: sha256:

2017-05-19 16:32
ffb8675f7069613ba4c6697bae49f367e7f43946b170dfa0236d2a923d42c24c\nTrying to pull repository registr

2017-05-19 16:34
I think, this step crashed : failed to register layer: devmapper: Thin Pool has 827 free data blocks which is less than minimum required 851 free data blocks. Create more free space in thin pool or...

2017-05-19 16:35
but : /dev/mapper/datavg-rebar_lv 40G 122M 38G 1% /var/lib/docker should be enough
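
The `df` figure and the devmapper message may be measuring different things: the thin pool is a block-level pool that Docker's devicemapper storage driver allocates from, and its free space is reported by `docker info`, not by `df` on `/var/lib/docker`. A small sketch for pulling out the relevant lines follows; the helper name `thin_pool_report` is hypothetical, and how the pool was set up on this host is not known from the chat.

```shell
# Sketch: pull the devicemapper pool figures out of `docker info` output.
# Field names are those printed by Docker's devicemapper storage driver.
thin_pool_report() {
  grep -E '^ *(Storage Driver|Pool Name|Data Space (Used|Total|Available)|Metadata Space (Used|Total|Available)):'
}

# Usage:
#   docker info 2>/dev/null | thin_pool_report
```

If "Data Space Available" is small there, the pool itself needs to grow, regardless of how empty the filesystem looks.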

jj
2017-05-19 16:49
any chance you have a `cURL` command to change the BootEnv handy?

jj
2017-05-19 16:49
anyone ^^

2017-05-19 16:51
curl is installed

2017-05-19 16:53
but what will you do?

jj
2017-05-19 17:06
oh, sorry i mean to run a curl command against the API

jj
2017-05-19 17:07
i usually figure one or two out and copypasta them with the differences i need

jj
2017-05-19 17:07
the "how to get the token" and all

2017-05-19 17:07
If you give me the request ;)

jj
2017-05-19 17:07
`data='{"BootEnv": "local"}'`

jj
2017-05-19 17:08
so leverage the rocketskates token then put against URL/api/v3/machines/UUID with the data as the payload?
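
Spelled out, that idea could look like the sketch below. The `/api/v3/machines/UUID` path and the payload are from the messages above, and port 8092 with the default rocketskates credentials is what has been used for the UI in this channel. Everything else is an assumption: in particular, whether the server wants PATCH with this body, a JSON Patch document, or a PUT of the full machine object is not confirmed here, so check the swagger page before relying on it. The function names are hypothetical.

```shell
# Sketch only: endpoint path and payload come from the chat; the HTTP verb is an assumption.
DRP_URL="${DRP_URL:-https://127.0.0.1:8092}"
DRP_CREDS="${DRP_CREDS:-rocketskates:r0cketsk8ts}"
DRP_VERB="${DRP_VERB:-PATCH}"

machine_url() {
  # $1 = machine UUID
  printf '%s/api/v3/machines/%s\n' "$DRP_URL" "$1"
}

bootenv_payload() {
  # $1 = bootenv name, e.g. local
  printf '{"BootEnv": "%s"}\n' "$1"
}

set_bootenv() {
  # $1 = machine UUID, $2 = bootenv name
  # -k skips TLS verification (a self-signed certificate is assumed).
  curl -k -s -u "$DRP_CREDS" -X "$DRP_VERB" \
    -H 'Content-Type: application/json' \
    -d "$(bootenv_payload "$2")" \
    "$(machine_url "$1")"
}

# Usage:
#   set_bootenv <uuid> local
```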

2017-05-19 17:10
I'm not so familiar with curl, so can you write down the complete command :shy:

jj
2017-05-19 17:10
heh, yeah i'll play around with it. i think i know what i need to do after :rubberducking: you :slightly_smiling_face:

greg
2017-05-19 18:12
well - realize that DRP and DR have slightly different provisioner components at the moment.

greg
2017-05-19 18:12
@jj - drpcli machines bootenv <uuid> <bootenv>

jj
2017-05-19 18:20
Ah! Yeah the python at the end of the new boot.cfg has an error, I was going to see if there was another way @greg

greg
2017-05-19 18:24
My python is functional, but not very good.

greg
2017-05-19 18:39
@jj did you find the bug and fix?

greg
2017-05-19 18:58
the python thing may not be a bug.

greg
2017-05-19 18:58
It is a mismatch in supported cert validation.

jj
2017-05-19 19:01
ah, yeah it just says "there is an error"

greg
2017-05-19 20:20
@jj - I have a fix for the python thing.

jj
2017-05-19 20:20
:open_mouth:

jj
2017-05-19 20:21
I'm having a helluva time trying to get ESXi to boot correctly and install to a USB stick, i might just remove the `ks.cfg` completely at this rate

greg
2017-05-19 20:22
ok

2017-05-21 03:44
ello People ! I am working on one of my machine learning projects and need your help and support . Please fill the survey form . Thanks in advance . https://goo.gl/eHqkHk

2017-05-22 09:52
Hi, can DigitalRebar use Dell iDrac to powercycle and netboot the server ?

2017-05-22 12:19
what do I need to start simple bare-metal on prem with tftp and dhcp on admin host ?

2017-05-22 12:24
anywhere I need to put username and password for idrac admin account ?

2017-05-22 13:53
@maymann check out Provision for DHCP/PXE/TFTP > https://github.com/digitalrebar/provision

2017-05-22 13:54
you'll need to install the full Digital Rebar for out of band management

2017-05-22 13:56
to answer your original question - YES. that's what Digital Rebar does

2017-05-22 13:58
@punitaojha this is not the right forum for this type of survey

2017-05-22 21:09
@zehicle Hey Rob!! Remember me, from the discussion regarding DR failure on AWS

2017-05-22 21:09
I put out the word to my manager about the DR and its use cases

2017-05-22 21:10
Can you reach out to him regarding any potential use cases at Ericsson , His name is "Kumar" and he can be reached at "thalanayar.muthukumar@ericsson.com"

2017-05-22 21:19
Hi, I am trying to create a barclamp from an ansible playbook. Any pointers to docs or similar? I'm new to both ansible and rebar.

2017-05-22 21:44
@svallebro the best thing to do is look at the kubernetes install roles and the ansible jig. https://www.youtube.com/watch?v=uLTA2LA4KG8

2017-05-23 03:02
Thanks

2017-05-24 23:00
Following the mac PXE youtube video. I'm unable to get the machine to fully boot. Getting the error: Failed to download stage2.img for ....

greg
2017-05-24 23:32
Is that file in the tftpboot dir for provision?

2017-05-25 12:43
zehicle: yes I see the file in drp-data/tftpboot/sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/stage2.img

2017-05-25 13:04
?

greg
2017-05-25 13:37
hmm - okay - so, DHCP worked, tftp worked, http didn't ....

greg
2017-05-25 13:44
connecting to the ui (or prefs through the cli), you can change the debug on render to see what files are being sent back in the output of dr-provision.

greg
2017-05-25 13:46
What networking did you configure? Is it a local L2? Which local IP did you use for dr-provision? Is the node multi-homed?

greg
2017-05-25 13:47
The questions are directed to try and make sure that the IP used to contact DRP through the full path is the same.

2017-05-25 13:47
I changed the debug level. Will look at the logs closely soon

2017-05-25 13:48
I'm on mac using virtual box. dr-provision is using the same subnet as my vb interface

2017-05-25 13:48
--static-ip=192.168.61.233/24

greg
2017-05-25 13:48
The IP used for the --static-ip flag needs to be routable by the clients

greg
2017-05-25 13:48
:slightly_smiling_face:

greg
2017-05-25 13:49
host-only network?

2017-05-25 13:49
this is the subnet I created 192.168.61.1/24

2017-05-25 13:49
yes

greg
2017-05-25 13:50
what 61.233 instead of 61.1?

greg
2017-05-25 13:50
what=why?

2017-05-25 13:50
I don't know.. I just type some random number there

2017-05-25 13:50
no reason

greg
2017-05-25 13:51
--static-ip should be the address assigned to the node. On my system that is the .1 address.

2017-05-25 13:51
hmm

2017-05-25 13:51
the node that is booting?

greg
2017-05-25 13:52
vboxnet1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500 ether 0a:00:27:00:00:01 inet 192.168.57.1 netmask 0xffffff00 broadcast 192.168.57.255

2017-05-25 13:52
I guess the static-ip is a little confusing for me

2017-05-25 13:52
I assume it was the dr-provision itself

greg
2017-05-25 13:52
so static-ip is the IP that dr-provision will use for communications if no other path is obvious.

2017-05-25 13:52
inet 192.168.61.1 netmask 0xffffff00 broadcast 192.168.61.255

greg
2017-05-25 13:52
Yes- use 192.168.61.1 as the static-ip.

2017-05-25 13:53
same result

2017-05-25 13:53
when I boot

2017-05-25 13:54
that's it

2017-05-25 13:54
I'm getting a login prompt now

greg
2017-05-25 13:54
Cool!

2017-05-25 13:54
odd.. I thought I tried .1 before

2017-05-25 13:54
anyhow.. it works :)
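
To avoid guessing the address, the value for --static-ip can be read off the host-only interface on the machine running dr-provision. The sketch below assumes macOS/BSD-style `ifconfig` output (as in the vboxnet1 example above); older Linux `ifconfig` prints `inet addr:...` and would need a different parse. The helper name `iface_ipv4` and the interface name are hypothetical.

```shell
# Sketch: read the IPv4 address off an interface's ifconfig output.
iface_ipv4() {
  # stdin = `ifconfig <iface>` output; prints the first "inet" address found
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { print $(i + 1); exit } }'
}

# Usage (interface name is an example; pick your host-only adapter):
#   ip="$(ifconfig vboxnet0 | iface_ipv4)"
#   ./dr-provision --static-ip="$ip"
```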

2017-05-25 13:54
what's the login?

greg
2017-05-25 13:54
root/rebar1

greg
2017-05-25 13:54
drpcli machines list

greg
2017-05-25 13:54
should show the node as well

2017-05-25 13:55
nice

2017-05-25 13:55
I will play around some more.. those videos are helpful

greg
2017-05-25 13:56
I'll take a note to review the docs and maybe add a note to the video about that.

greg
2017-05-25 13:56
Which video did you use?

greg
2017-05-25 13:57
This is actually @galthaus or @greg.

2017-05-25 13:57
I watched the video with the old macbook pro

2017-05-25 13:58
there was one thing that didn't work for me during install, but I just ignored it

2017-05-25 13:58
sudo route -n add -net 255.255.255.255 --static-ip=10.0.0.20

2017-05-25 13:58
it said to do that if I'm osx > 10.9

2017-05-25 13:58
but it didn't like the syntax "--static"

greg
2017-05-25 13:58
the --static-ip should be 192.168.61.1 in your case.

greg
2017-05-25 13:59
Yeah - it was an add from the community that was needed for his mac. It may not always be needed.

greg
2017-05-25 13:59
You sometimes need it to make sure the broadcast packets are routed correctly on a mac.

2017-05-25 14:00
it was complaining about a syntax error on the word "static"

2017-05-25 14:00
need to step out..bbl

greg
2017-05-25 14:01
actually, I think greater than/less thans are reversed.

greg
2017-05-25 14:01
I'll fix that.

greg
2017-05-25 14:01
thanks.

2017-05-25 15:00
I believe I also had an issue with the first one too

2017-05-25 15:00
I'm running 10.11.6

2017-05-25 15:10
zehicle: I noticed there's a more comprehensive UI than the swagger one in other videos. How do I access that?

greg
2017-05-25 15:43
Path is /ui

2017-05-25 15:46
that's the one I have been using: <localhost:8092/ui>

2017-05-25 15:48
I'm referring to the other one. Like in k8s video

2017-05-25 15:57
or is that part of the RackN product?

greg
2017-05-25 16:10
That is the bigger digitalrebar ux. Most likely

2017-05-25 16:12
zehicle: how do I access the "bigger DR ux"

greg
2017-05-25 16:27
It is a separate product. You would have to install the full digitalrebar. The question is what are you trying to do and which features do you need?

2017-05-25 16:38
Trying to do an eval of the product and better understanding what it can do.

greg
2017-05-25 16:48
okay - well - check here for digitalrebar info: https://github.com/digitalrebar/digitalrebar

2017-05-25 16:50
Got it thx

2017-05-25 17:28
6GB Ram min now :)

greg
2017-05-25 17:36
for digitalrebar, it is doing a little more than drp. :slightly_smiling_face:

2017-05-25 17:38
just a little

2017-05-25 17:38
:)

2017-05-25 17:48
zehicle: What's a good email for you? It might make sense for us to have a brief chat before I go down the full blown evaluation. I can email you a short description of what we trying to accomplish.

greg
2017-05-25 17:53
and

2017-05-25 17:56
Thx, you will receive an email from Aaron soon

greg
2017-05-25 17:59
:slightly_smiling_face: thanks - we are at Gluecon right now. So reply may not be today.

2017-05-25 18:21
np

2017-05-26 12:03
I installed Digital Rebar and the installation completed successfully. However, the provisioner tab is missing in the UI, even though the provisioner container is running.

2017-05-26 12:03
[root@mo-dc2df9e2a compose]# docker-compose ps
WARNING: The DR_TAG variable is not set. Defaulting to a blank string.
Name                    Command                          State   Ports
-------------------------------------------------------------------------------------------------------------
compose_cloudwrap_1     /sbin/docker-entrypoint.sh       Up
compose_consul_1        /bin/consul agent -config- ...   Up
compose_dns_1           /sbin/docker-entrypoint.sh       Up
compose_forwarder_1     /sbin/docker-entrypoint.sh       Up      0.0.0.0:3000->3000/tcp, 0.0.0.0:443->443/tcp
compose_goiardi_1       /sbin/docker-entrypoint.sh       Up
compose_logging_1       /sbin/docker-entrypoint.sh       Up
compose_postgres_1      /docker-entrypoint.sh postgres   Up
compose_provisioner_1   /sbin/docker-entrypoint.sh       Up
compose_rebar_api_1     /sbin/docker-entrypoint.sh       Up
compose_revproxy_1      /sbin/docker-entrypoint.sh       Up
compose_rule-engine_1   /sbin/docker-entrypoint.sh       Up
compose_trust_me_1      /sbin/docker-entrypoint.sh       Up
compose_webproxy_1      /sbin/docker-entrypoint.sh       Up
[root@mo-dc2df9e2a compose]#

2017-05-26 12:04
Please check and let me know where I am going wrong...

greg
2017-05-26 13:26
First, my guess is that you don't want forwarder mode. You should rerun the command with: --access=HOST

greg
2017-05-26 13:33
second, you can go to https://adminip/health and see if the service has registered.

2017-05-26 18:50
@zehicle, even after rerunning the command with --access=host, the provisioner service is still not showing in the UI

2017-05-26 18:50
{"Map":{"dns-mgmt-service":["172.17.0.7:6754"],"rebar-api-service":["172.17.0.9:3000"],"rule-engine-service":["172.17.0.2:19202"]},"Matcher":{"dns-mgmt-service":"^dns/(.*)","rebar-api-service":"^rebar-api/(.*)","rule-engine-service":"^rule-engine/(api/.*)"},"Default":"rebar-api-service"}
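
In that output only the dns, rebar-api and rule-engine services appear under "Map", so the provisioner has not registered. A rough way to script the same check is sketched below; it looks for a key in "Map" containing a given substring. The exact key the provisioner registers under is not shown in this channel, so matching on "provisioner" is an assumption, and the helper name `service_registered` is hypothetical.

```shell
# Sketch: test whether a service name appears as a key in the /health "Map".
# Plain grep is used to avoid extra tools; a JSON parser would be more robust.
service_registered() {
  # $1 = substring of the service key to look for; stdin = /health JSON
  # Map entries look like "name":[...], Matcher entries like "name":"...",
  # so requiring ":[" after the key restricts the match to Map.
  grep -Eq "\"[^\"]*$1[^\"]*\":\\["
}

# Usage (adminip as in the message above; -k assumes a self-signed certificate):
#   curl -ks https://adminip/health | service_registered provisioner && echo registered
```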

greg
2017-05-26 18:58
capital HOST

greg
2017-05-26 18:59
do you have access to internet?

greg
2017-05-26 18:59
you can do: ```docker-compose logs -f provisioner```

greg
2017-05-26 18:59
It would be nice to know what is in that log.

2017-05-30 09:38
@zehicle, I am getting the messages below from the command docker-compose logs -f provisioner:
[root@mo-dc2df9e2a compose]# docker-compose logs -f provisioner
WARNING: The DR_TAG variable is not set. Defaulting to a blank string.
Attaching to compose_provisioner_1
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Waiting for 192.168.124.11/24
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
100 19 100 19 0 0 3472 0 --:--:-- --:--:-- --:--:-- 3800
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0
curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
100 19 100 19 0 0 3314 0 --:--:-- --:--:-- --:--:-- 3800
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0
curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
100 19 100 19 0 0 1100 0 --:--:-- --:--:-- --:--:-- 1117
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0
curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh

greg
2017-05-30 13:33
Does your admin node have access to amazon s3?
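
A quick way to answer that from the admin node is to try the same host the provisioner's curl is timing out on, with a short connect timeout so the check does not hang for two minutes. This is a sketch assuming curl is installed on the admin node; the helper name `can_reach` is hypothetical.

```shell
# Sketch: return success only if an HTTP(S) endpoint answers within a few seconds.
can_reach() {
  # $1 = URL; --connect-timeout keeps a black-holed connection from hanging.
  # Any HTTP response (even an error page) counts as reachable.
  curl -s --connect-timeout 5 --max-time 15 -o /dev/null "$1" 2>/dev/null
}

# Usage, with the host from the log above:
#   can_reach http://opencrowbar.s3-website-us-east-1.amazonaws.com/ || echo "no path to S3"
```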

josh
2017-05-30 18:28
has joined #community201705

2017-05-30 18:50
@zehicle, no, the admin node does not have access to amazon s3.
[root@mo-dc2df9e2a compose]# telnet opencrowbar.s3-website-us-east-1.amazonaws.com 80
Trying 52.216.82.18...

greg
2017-05-30 18:55
well - that is the problem.

greg
2017-05-30 18:55
currently install requires that the admin node has a path out to the internet.

greg
2017-05-30 18:55
for initial setup

2017-05-31 12:50
@zehicle ,

2017-05-31 12:53
@zehicle, since we are having internet connection issues at the company, I am thinking of this: I will deploy digital-rebar on a VM host from my home internet, then take that OS image and deploy it on our company server. Will that work?

greg
2017-05-31 13:18
You may have to restart it.

greg
2017-05-31 13:18
it should work.

2017-05-31 15:31
How about the admin IP? It will change when I move the home-based OS image to the company network. How will that work?

2017-05-31 15:35
@zehicle, when I logged into the provisioner container and ran ps -ef to check the running processes, I see the below:
root@59a1537e7152:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 12:48 ? 00:00:00 /bin/bash /sbin/docker-entrypoint.sh
root 22 1 0 12:48 ? 00:00:00 smbd
root 25 22 0 12:48 ? 00:00:00 smbd
root 28 22 0 12:48 ? 00:00:00 smbd
root 43 1 0 12:48 ? 00:00:00 /bin/bash /sbin/docker-entrypoint.sh
root 44 43 0 12:48 ? 00:00:00 curl -fgL -o /tftpboot/sledgehammer/a42c8c66a60b77ca1c769b8dc7e712f6644579ed/sha1sums http://opencrowbar.s3-website-us-east-1.amazonaws.com/sledgehammer/a42
root 46 0 0 12:48 ? 00:00:00 /bin/bash
root 60 46 0 12:49 ? 00:00:00 ps -ef
PID 44 is running curl, which connects to AWS S3 and requires internet. How will that work if I import the VM into the company network, which will not allow the container to connect to the internet?

greg
2017-05-31 15:56
The images are cached in the image.

greg
2017-05-31 15:57
You can stop the containers and restart them with a specific command line that will change the external ip and not run the ansible playbooks.

greg
2017-05-31 15:57
Or you can contact RackN and get support for the offline install mode.