2018-07-02 18:15
has joined #community201807

2018-07-02 19:15
@ uploaded a file: https://rackn.slack.com/files/UBBSHF675/FBH1P892L/capture.png and commented: @shane I used os-other content and used the templates for ESXi, but it fails to boot with this error

spector
2018-07-02 19:15
@ $Welcome

2018-07-02 19:15
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

spector
2018-07-02 19:15
I tweeted you via DigitalRebar on email issue but we found you so just ignore that

spector
2018-07-02 19:17
Your submission via our web only had 1 "l" so we fixed the issue.

shane
2018-07-03 01:00
@ - did you run the server through a "discovery" workflow first? Did Sledgehammer run and build a Machine object for the server? Also - are you running DHCP on DRP or externally?

2018-07-03 15:10
yes, the machine was discovered by Sledgehammer and was in Terraform-ready state.

2018-07-03 15:11
I am still running tests on Virtualbox, so DHCP is vbox virtual network

2018-07-03 16:28
Also, where can I find ipmi-discover? I have the IPMI commands plugin, but can't find ipmi-discover.

greg
2018-07-03 16:31
ipmi-discover is provided by the `ipmi` plugin, not the `virtualbox-ipmi` plugin.

2018-07-03 16:34
is ipmi-discover a stage ?

zehicle
2018-07-03 16:39
yes, it's content tied to the plugin

2018-07-03 16:40
ok, and the plugin name that I need to get is "IPMI commands"?

zehicle
2018-07-03 16:40
yes

2018-07-03 16:41
I did that, but I only have ipmi-configure stage

greg
2018-07-03 16:43
Yes - ipmi-discover is a task, I believe.

2018-07-03 16:45
it's not in the tasks either. it's not a template or a param. I can't find that anywhere.

2018-07-03 16:57
to be clear, this is not part of the Virtualbox experiments, I am trying to discover physical servers using IPMI.

greg
2018-07-03 16:57
okay

greg
2018-07-03 16:58
There isn't `ipmi-discover`

greg
2018-07-03 16:58
There is only `ipmi-configure`

greg
2018-07-03 16:58
At that stage, set the configure parameters to do what you want, and away it goes.


zehicle
2018-07-03 18:40
just a reminder - no community meeting today (would have already started)

zehicle
2018-07-03 18:40
next planned meeting would be 7/17

zehicle
2018-07-03 18:47
if you're waiting/watching for the Kubernetes (KRIB) work I announced in channel last week then here's a short update:
1) we've merged the KRIB (v1) and KRIB-HA (v2) work together going forward as KRIB (so v3?)
1+) the code is all APLv2
2) it's still a work in progress, so expect some bumps
3) it relies on the cert plugin for now. we've made that available without needing a licensed org
4) we are going to move the cert, packet-ipmi, and virtualbox-ipmi plugins over to digitalrebar and set the license to APL. They are all available today in the RackN online library, but it will take some time to split out the code and fix the builds if you wanted to build them yourself.
For now, we're keeping the KRIB work relatively quiet so the DRP community has time to test and validate. Eventually, we'll promote more broadly.

2018-07-03 19:24
I used ipmi-configure to discover machines through IPMI, and used params to set the username and password. However, the workflow always gets stuck at the ipmi-configure stage and never moves forward. I am wondering if the workflow stages have to be different. My workflow has: 1-ipmi-configure (sledgehammer as boot env) 2-sledgehammer-wait 3-terraform-ready

2018-07-03 19:25
also, I was trying to add the machine manually through the portal

2018-07-03 19:27
I double checked by connecting over ssh to the IPMI address; it's reachable and ssh connects just fine.

2018-07-03 23:55
has joined #community201807

2018-07-04 00:40
Hi, I'm trying to install Ubuntu 16.04 and following the getting started guide. However, my pxe network doesn't have a gateway or access to the internet. This should not be required, but the install pauses until I hit enter a few times. Then it continues the install over the local network. Does anyone know which boot parameters need to be changed? The default is `debian-installer/locale=en_US.utf8 console-setup/layoutcode=us keyboard-configuration/layoutcode=us netcfg/dhcp_timeout=120 netcfg/choose_interface=auto url={{.Machine.Url}}/seed netcfg/get_hostname={{.Machine.Name}} root=/dev/ram rw quiet {{if .ParamExists "kernel-console"}}{{.Param "kernel-console"}}{{end}} -- {{if .ParamExists "kernel-console"}}{{.Param "kernel-console"}}{{end}}`

zehicle
2018-07-04 00:46
@ yes, there's a parameter for that "local-repo" which should be set to true.


2018-07-04 01:00
@zehicle thanks. I'm trying it now.

zehicle
2018-07-04 01:01
no problem, I'm updating the $faq to include that


2018-07-04 01:23
@zehicle That did it. Thanks!

zehicle
2018-07-04 01:25
Glad to help!

2018-07-04 01:48
@zehicle I spoke too soon. Eager for dinner :slightly_smiling_face: That didn't make any difference. I created a profile and assigned it to the server.
```
./drpcli profiles show local-repos
{
  "Available": true,
  "Description": "",
  "Documentation": "",
  "Errors": [],
  "Meta": {
    "color": "black",
    "icon": "hashtag",
    "title": "User added profile"
  },
  "Name": "local-repos",
  "Params": {
    "local-repo": true
  },
  "ReadOnly": false,
  "Validated": true
}
./drpcli machines show 405f3f72-d6dc-477f-b7e5-240008609279 | grep local-repos
"local-repos"
```

zehicle
2018-07-04 01:57
btw: there's a drpcli call for params: `./drpcli machines params [machineid]`

greg
2018-07-04 02:47
@zehicle I believe local_repo went away with the repo fixes @vlowther added. Ubuntu is annoying because it kinda assumes internet access.

zehicle
2018-07-04 02:57
I'll fix the doc and note that in the param

zehicle
2018-07-04 02:59
Is there a workaround?

2018-07-04 03:44
Adding "default_route=true" to the boot parameters and including a dns server on the local subnet in DHCP seems to have corrected the problem.

zehicle
2018-07-04 03:51
I'll update the docs. Sorry for throwing the curveballs.

greg
2018-07-04 13:11
@ I wonder. I bet I put option 3 and 6 in my subnets. That may work around the problem as well

2018-07-04 16:09
@greg That worked as well. Thanks.
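
For reference, DHCP option 3 is the router (default gateway) and option 6 is the DNS server list, per RFC 2132. The following is a minimal sketch of adding both to a subnet definition. It assumes the subnet is served by DRP and stores its options as a list of `Code`/`Value` pairs under `Options`; the subnet name and all addresses are placeholders that do not come from this conversation.

```python
import json

# DHCP option codes from RFC 2132: 3 = router (default gateway), 6 = DNS servers.
DHCP_OPTION_ROUTER = 3
DHCP_OPTION_DNS = 6


def with_gateway_and_dns(subnet, router, dns_servers):
    """Return a copy of `subnet` with options 3 and 6 set, replacing old values."""
    # Keep every existing option except the two we are about to set.
    kept = [
        opt for opt in subnet.get("Options", [])
        if opt["Code"] not in (DHCP_OPTION_ROUTER, DHCP_OPTION_DNS)
    ]
    kept.append({"Code": DHCP_OPTION_ROUTER, "Value": router})
    kept.append({"Code": DHCP_OPTION_DNS, "Value": ",".join(dns_servers)})
    return {**subnet, "Options": kept}


# Hypothetical subnet: the name, CIDR, and addresses are placeholders.
subnet = {"Name": "pxe-net", "Subnet": "192.168.124.0/24", "Options": []}
updated = with_gateway_and_dns(subnet, "192.168.124.1", ["192.168.124.1"])
print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary instead of mutating its input, so the original definition stays available for comparison.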

2018-07-05 10:55
has joined #community201807

2018-07-05 15:38
I used ipmi-configure to discover machines through IPMI, and used params to set the username and password. However, the workflow always gets stuck at the ipmi-configure stage and never moves forward. I am wondering if the workflow stages have to be different. My workflow has: 1-ipmi-configure (sledgehammer as boot env) 2-sledgehammer-wait 3-terraform-ready. Also, I was trying to add the machine manually through the portal. I double checked by connecting over ssh to the IPMI address; it's reachable and ssh connects just fine.

spector
2018-07-05 15:46
@ $Welcome

2018-07-05 15:46
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

2018-07-05 15:47
Hello, thanks

shane
2018-07-05 15:53
@ - have you checked the Job Log to see what it says? It's probably an error being kicked back that stops the job from successfully running.

2018-07-05 16:54
No error is logged. I cleared the Job-log directory, re-launched DRP, and did the same steps. It always gets stuck at the ipmi-configure stage, and no error is logged. It seems like it does not even attempt to contact the other machine at all.

2018-07-05 17:03
I added the username and password params to the ipmi-configure stage, and I also tried to add ipmi/address (which is the same IPMI address), but it does not change anything.

2018-07-05 17:11
@shane does adding a machine manually start the workflow automatically, or do I need to do an extra step to start the workflow?

zehicle
2018-07-05 17:40
@ really it's the runner that does the workflow

zehicle
2018-07-05 17:40
so, it won't start until you have a runner working on the machine

zehicle
2018-07-05 17:41
you can use a script on the machine to have the runner (drpcli) register the machine and then start the workflow

zehicle
2018-07-05 17:41
if you are having workflow stall, make sure that you have a runner going - there's a stage for installing the runner in a new O/S or O/S image

2018-07-05 17:48
you mean this command? `drpcli machines update <UUID> '{ "Workflow": "ubuntu18" }'`

2018-07-05 17:55
I also tried drpcli machine workflow UUID myworkflow, also nothing happens

2018-07-05 17:56
@zehicle I am not sure if that's the runner you are talking about

zehicle
2018-07-05 18:01
drpcli is the runner


dave.parker
2018-07-05 19:28
How come sometimes I'm not allowed to change a machine's stage through the API? I try, and get code 422, "Changing machine stage not allowed"

dave.parker
2018-07-05 19:32
Oh, I guess "Changing machine bootenv not allowed" is part of it too.

dave.parker
2018-07-05 19:32
Hrm.

vlowther
2018-07-05 19:35
That happens whenever a machine is assigned to a Workflow that is processing.

dave.parker
2018-07-05 19:36
Ah ok.

dave.parker
2018-07-05 19:36
How do I clear the workflow through the API? Can I just delete it?

zehicle
2018-07-05 19:40
@dave.parker you can set it to empty string, yes.

zehicle
2018-07-05 19:41
you may also want to clear the Stage when you do that

dave.parker
2018-07-05 19:41
Ok.

dave.parker
2018-07-05 19:41
Thanks
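
A minimal sketch of the patch body this advice implies, assuming the API accepts JSON Patch (RFC 6902) operations against a machine. Setting `/Workflow` to an empty string follows the answer above. Using an empty string for `/Stage` as well is an assumption, and the function name is hypothetical.

```python
import json


def clear_workflow_patch(clear_stage=True):
    """Build RFC 6902 JSON Patch operations that blank a machine's Workflow."""
    ops = [{"op": "replace", "path": "/Workflow", "value": ""}]
    if clear_stage:
        # Assumption: an empty string also clears Stage.
        ops.append({"op": "replace", "path": "/Stage", "value": ""})
    return ops


# Print the body that would be sent with the PATCH request for the machine.
print(json.dumps(clear_workflow_patch(), indent=2))
```

The Workflow operation comes first so that the machine is no longer tied to a processing Workflow when the Stage change is evaluated.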

2018-07-05 19:45
@zehicle so I tried drpcli machines processjobs , and it does start processing, so now I get an error "Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory attempt 0 failed - trying again..." so I did "modprobe ipmi_devintf" to insert the kernel module for ipmi but still getting the same error, do I need to install some other ipmi software on my machine? (other than ipmitool)?

2018-07-05 19:49
one of the solutions I came across is inserting the kernel module "ipmi_si", but this module would only work on physical machines and does not work on virtual machines. Do you think this is the problem?

zehicle
2018-07-05 19:51
you are using IPMI tool on virtual?

shane
2018-07-05 19:51
yes - don't have a BMC (Baseboard Management Controller) in a Virtual Machine - you can't modify the BMC ... IPMI is just a protocol to talk to a BMC

zehicle
2018-07-05 19:51
jinx w Shane

zehicle
2018-07-05 19:52
@ we have the virtualbox-ipmi for purposes like that

2018-07-05 19:52
I am running DRP on virtual machine, but the machine I am trying to access is physical

zehicle
2018-07-05 19:52
which uses the platform APIs to simulate IPMI for building workflow

zehicle
2018-07-05 19:52
ah, ok

2018-07-05 19:53
so what @shane said, means I need to run DRP on a physical machine ?

zehicle
2018-07-05 19:53
no, the endpoint can run on VMs just fine

2018-07-05 19:54
so why am I getting this error?

2018-07-05 19:54
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory attempt 0 failed

2018-07-05 19:55
on the endpoint ?

zehicle
2018-07-05 19:56
I think it would help if we had more information about your system - both the DRP VM and what you are trying to control via IPMI

vlowther
2018-07-05 19:59
What is the physical gear you are trying to manage?

2018-07-05 19:59
DRP VM: Virtualbox vm, running Ubuntu 18.04LTS IPMI Machine: Dell PowerEdge C6320

2018-07-05 19:59
Time to feed the :bear:!

vlowther
2018-07-05 19:59
ok

vlowther
2018-07-05 20:00
can you log into the idrac on that box?

2018-07-05 20:00
yes

vlowther
2018-07-05 20:00
ok.

vlowther
2018-07-05 20:01
Where are you running `drpcli machines processjobs`

vlowther
2018-07-05 20:01
?

2018-07-05 20:01
on the endpoint

vlowther
2018-07-05 20:01
Is that the machine running dr-provision?

2018-07-05 20:01
yes

vlowther
2018-07-05 20:02
ok

vlowther
2018-07-05 20:02
that command must be run on the target machine.

vlowther
2018-07-05 20:03
and if you boot the 6320 into Sledgehammer the agent (which is what you are running with that command) will run automatically.

vlowther
2018-07-05 20:04
Really, the only time you would run `drpcli machines processjobs` manually is if you were troubleshooting why the agent was failing.

vlowther
2018-07-05 20:04
otherwise, we arrange for it to run as a daemon everywhere it is needed.

2018-07-05 20:06
Does that mean the machine should be in a sledgehammer-wait state, before I can run configure-ipmi ?

vlowther
2018-07-05 20:06
Well, the machine needs to be booted into Sledgehammer.

vlowther
2018-07-05 20:08
I would just build a workflow that goes from discover to ipmi-configure, set the various ipmi parameters to configure it the way you want, and go from there.

2018-07-05 20:10
I see, I will try to do this. the confusion was because I thought it works like this: -configure-ipmi would ssh to target machine, and change boot order to PXE. -then install Sledgehammer, and go from there.

vlowther
2018-07-05 20:11
Nope. we do not rely on SSH at all.

vlowther
2018-07-05 20:11
You PXE boot the box into Sledgehammer, our agent starts, and runs through whatever workflow is assigned to the machine.

zehicle
2018-07-05 20:12
v2 did use SSH and we chose to move away from it for a myriad of reasons

2018-07-05 20:13
Ok, thank you for the clarification.
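
The suggested workflow can be sketched as a small JSON document. This assumes a Workflow object is a `Name` plus an ordered `Stages` list. The stage names are the ones mentioned in this conversation, while the workflow name `discover-ipmi` is made up for illustration.

```python
import json


def build_workflow(name, stages):
    """Return a Workflow-shaped dict; stages run in the order given."""
    if not stages:
        raise ValueError("a workflow needs at least one stage")
    return {"Name": name, "Stages": list(stages)}


# The machine PXE boots into Sledgehammer during discover, so the agent is
# already running on the target by the time ipmi-configure starts.
workflow = build_workflow(
    "discover-ipmi",  # hypothetical name
    ["discover", "ipmi-configure", "sledgehammer-wait"],
)
print(json.dumps(workflow, indent=2))
```

Putting `discover` first is the point of the sketch: ipmi-configure runs on the target machine itself, not from the endpoint.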

dave.parker
2018-07-05 21:33
"Can not change stages with pending tasks unless forced"

dave.parker
2018-07-05 21:33
How do I force something through the API?

dave.parker
2018-07-05 21:38
Oh there's a force parameter.

shane
2018-07-05 21:38
JSON `"force": true`

dave.parker
2018-07-05 21:53
`{ "op": "replace", "path": "/Stage", "value": "STAGE", "force": true }` doesn't seem to work

dave.parker
2018-07-05 22:06
Ah, nevermind, figured it out. :smile:

shane
2018-07-05 22:18
as a general pattern - you can try operation via the Web Portal, and turn on the developer JS tools - you'll be able to see all of the exact API calls we issue - for example the "workflow clear item", "force" enabled, etc...

dave.parker
2018-07-05 22:21
Oh cool, thanks.

zehicle
2018-07-05 22:52
also, Properties on DRP models are generally capitalized.

2018-07-05 23:16
How do I add a custom initrd.gz to a cloned BootEnvironment? It looks like I may be able to do it by modifying Env.JoinInitrds and Env.Kernel, but I'm just not able to connect the dots.

shane
2018-07-06 00:15
@ right now you have to clone the BootEnv and then edit it to modify the `Initrd` and/or `Kernel` that is specified from the default - the paths are relative to the exploded out ISO in the DRP Endpoint

2018-07-06 00:24
@shane Thanks. So I just change the files where it is stored on the disk?

shane
2018-07-06 00:24
Are you using the web Portal ?

shane
2018-07-06 00:24
or CLI ?

2018-07-06 00:24
Only to learn

2018-07-06 00:24
First web portal then cli and eventually api

shane
2018-07-06 00:24
what version of DRP are you running ?

2018-07-06 00:25
I installed it Tuesday

shane
2018-07-06 00:26
you can see your version in the `Info & Preferences` menu item - then the lower right panel

shane
2018-07-06 00:26
probably v3.9.0 then

2018-07-06 00:27
v3.9.0-0-aa4d113c7a433d6a81fb985275be01901ae28e6d

shane
2018-07-06 00:27
when you Clone an object (including a BootEnv) - you should be presented with an Editable panel of the cloned item - you change the fields/things in it

shane
2018-07-06 00:27
(yep stable released v3.9.0)

shane
2018-07-06 00:28
the cloned version fields are editable -including Kernel and Initrd

shane
2018-07-06 00:28
make sure to Save after making edits to the clone

shane
2018-07-06 00:29
this is roughly equivalent of doing the CLI commands:
```
drpcli bootenvs show centos-7-install --format=yaml > my-centos.yaml
# edit the copy
drpcli bootenvs create ./my-centos.yaml
```

2018-07-06 00:29
I see that "kernel" is editable: install/netboot/ubuntu-installer/amd64/linux, but the initrds is not: install/netboot/ubuntu-installer/amd64/initrd.gz

shane
2018-07-06 00:30
hmmm ... that's weird ...

shane
2018-07-06 00:30
try the CLI syntax then

shane
2018-07-06 00:30
:slightly_smiling_face:

shane
2018-07-06 00:31
looks like a goofy bug with the Portal UX behavior - with Initrd being after the Templates

shane
2018-07-06 00:31
@zehicle ^^^ ???

shane
2018-07-06 00:32
if you do via CLI - you'll want to make sure to change the `Name` field to something like `my-ubuntu` ... or whatever
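
The "edit the copy" step can also be scripted. The sketch below works on a BootEnv exported as JSON and loaded into a dictionary. The `Name` and `Kernel` fields are named in this thread; treating the initrd list as an `Initrds` array is an assumption, and `custom-initrd.gz` plus the source BootEnv name are placeholders.

```python
import copy
import json


def clone_bootenv(bootenv, new_name, kernel=None, initrds=None):
    """Return an edited deep copy of a BootEnv dict, leaving the source intact."""
    clone = copy.deepcopy(bootenv)
    clone["Name"] = new_name  # the clone must not reuse the original Name
    if kernel is not None:
        clone["Kernel"] = kernel
    if initrds is not None:
        clone["Initrds"] = list(initrds)  # assumed field name
    return clone


# Trimmed, hypothetical BootEnv; a real export carries many more fields.
original = {
    "Name": "ubuntu-16.04-install",
    "Kernel": "install/netboot/ubuntu-installer/amd64/linux",
    "Initrds": ["install/netboot/ubuntu-installer/amd64/initrd.gz"],
}
custom = clone_bootenv(
    original,
    "my-ubuntu",
    initrds=original["Initrds"] + ["custom-initrd.gz"],  # placeholder file
)
print(json.dumps(custom, indent=2))
```

Where the extra initrd has to live on the endpoint depends on the path rules discussed in this thread, so the placeholder path should not be read as a recommendation.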

2018-07-06 00:33
That is great. I prefer the cli, just the UI can be easier when starting.

shane
2018-07-06 00:34
totally understandable ... even our in-house CLI curmudgeon (@vlowther) can be caught using the web Portal from time to time ... :slightly_smiling_face:

vlowther
2018-07-06 00:35
grumps.

2018-07-06 00:36
Do I use drpcli files upload to load the new initrd.gz?

shane
2018-07-06 00:37
you can - you can also stage them in the tftpboot directory appropriately - either `drp-data` directory or `/var/lib/dr-provision` depending how you did the original install (isolated vs not)

shane
2018-07-06 00:38
one note - I'm not sure about the Path rules - I assume the "absolute" will be rooted in the `tftpboot` directory - while the current BootEnv use of the relative paths are ... ahem relative ... to the exploded ISO directory

shane
2018-07-06 00:38
the `drpcli files upload ... ` command will upload to the `tftpboot/files` directory

vlowther
2018-07-06 00:39
@shane before going too much farther, there are some design issues that we should go over with Greg.

shane
2018-07-06 00:40
yeah - I bumped in to those already :slightly_smiling_face:

2018-07-06 00:41
Thanks for the help shane. That should provide the functionality that I am looking for.

vlowther
2018-07-06 00:43
`drpcli files upload` will not work unless the bootenv is designed to use it.

vlowther
2018-07-06 00:44
There are some old design issues that have to be considered.

vlowther
2018-07-06 00:45
I can go into more detail tomorrow.

2018-07-06 14:24
Is there a quick writeup on how to utilize the ha-krib plugin?

shane
2018-07-06 14:25
@ the ha-krib is now folded in to the open "krib" content (it contains ha-krib) ... @zehicle is working on updates very very actively this week - including documentation

2018-07-06 14:26
Yeah, I saw the plugin come in with the krib content. I'm about to start messing with it and figured I would ask if you had anything written up yet.

shane
2018-07-06 14:26
the "official" documentation location is: https://provision.readthedocs.io/en/tip/doc/integrations/krib.html but it doesn't contain the HA pieces yet - and it still references the older `change-stage/map` workflow system (deprecated in favor of `Workflows`)

shane
2018-07-06 14:27
there is some new "content" package documentation on it: https://provision.readthedocs.io/en/tip/doc/content-packages/krib.html

2018-07-06 14:28
perfect, thanks @shane

shane
2018-07-06 14:29
we should hopefully have all of the documentation updated in the next few days - so hopefully mid-week next week things will look a lot better - again Rob is working quite actively on the content pack right now to clean things up, and enhance it

zehicle
2018-07-06 14:55
I'm planning to get a few videos posted shortly

zehicle
2018-07-06 14:56
showing the basics - I was able to do a full run last night

zehicle
2018-07-06 14:56
there are some tricks to know about cleanup for multiple runs

2018-07-06 17:57
I found a bug in the rackn portal, whats the best way to report it?

shane
2018-07-06 17:58

shane
2018-07-06 17:58
If you could label it `provision-ux-bug`

2018-07-06 18:08
It doesn't look like i can add labels, but the bug is here: https://github.com/digitalrebar/provision/issues/931

shane
2018-07-06 18:08
Ok - we'll (@zehicle) take a look - thank you @

zehicle
2018-07-06 18:10
1) You have to manually remove the blanks - there's a delete button for that. We cannot make assumptions about blanks.

2018-07-06 18:12
Is that done some other way than clicking the X next to the task in the task list?

zehicle
2018-07-06 18:12
not in the UX

zehicle
2018-07-06 18:12
it's just an ordered list in the data model

zehicle
2018-07-06 18:13
we did not code reorder in the UX

2018-07-06 18:14
Clone -> Delete Last Task -> Add: Works

2018-07-06 18:14
Clone -> Delete Middle Task -> Add: Error

2018-07-06 18:14
and I don't see any way to fix the error in the UI

zehicle
2018-07-06 18:15
I'll take a look. FWIW: We STRONGLY recommend using the bundle development pattern when building new workflow / stages / etc. The UX was not designed for component development.

zehicle
2018-07-06 18:15
@ I see the problem

2018-07-06 18:16
Yeah, this is just for a one off thing / exploration . Its a simple stage, so I can just re-create it by hand.

2018-07-06 18:16
Since it works fine as long as you go in order.

zehicle
2018-07-06 18:16
We have a video training on the bundle upload pattern.

zehicle
2018-07-06 18:16
which is very productive even on small edits

zehicle
2018-07-06 18:17
and encourages good content management practices

zehicle
2018-07-06 18:45
@ I was able to find and fix the issue (using delete instead of splice). It will go through the process and should be available in the tip UX tomorrow.

2018-07-06 19:00
cool, thanks for the quick fix @zehicle

zehicle
2018-07-06 19:03
was a very simple fix! we (RackN) don't use the UX for much editing, so it's helpful to get bugs like that reported. It was a common component, so likely other list selectors were also broken.

2018-07-06 20:01
```
Log for Job: 63738a9f-cd7d-4810-ba0e-537f6c319572
Starting task etcd-config on 59c493f5-54ea-4e19-8ecc-0787f1e25238
Starting command ./etcd-config-etcd-config.sh.tmpl
Command running
Configure the etcd cluster
Add initial variables to track members.
[]
[]
Creating 1 servers
Electing etcd members to cluster profile: cts-ha-krib
STAGE REQUIRES CERT PLUGIN!! Contact RackN for access.
Command exited with status 1
Action etcd-config.sh.tmpl finished
Task etcd-config failed
Marked machine 59c493f5-54ea-4e19-8ecc-0787f1e25238 as not runnable
Updated job 63738a9f-cd7d-4810-ba0e-537f6c319572 to failed
Task signalled that it failed
```

2018-07-06 20:01
Is that expected?

2018-07-06 20:01
CERT PLUGIN not available?

shane
2018-07-06 20:02
Ping @zehicle on DM

shane
2018-07-06 20:03
It is expected, and one of the road bumps we're ironing out

2018-07-06 20:06
Will do @shane :thumbsup:

zehicle
2018-07-06 21:02
the cert plugin is moved to open access - you should be able to select it in the catalog

zehicle
2018-07-07 16:09
@ you do need to use the TIP version (which is default now) to get the unrestricted version