zehicle
2018-11-01 12:56
In tip, sledgehammer needs to be updated to the very latest. The changes to sledgehammer and to kubernetes needed to be reconciled. @greg did that Tuesday.

zehicle
2018-11-01 12:56
@seaton ^^

smedefind
2018-11-01 17:13
I'm trying to use the custom-ipxe bootenv, but I keep getting an error on boot

smedefind
2018-11-01 17:14
`Booting kernel failed: Invalid argument`

smedefind
2018-11-01 17:14
Anyone else seen this?

zdunn
2018-11-01 17:21
@smedefind I have! Well, on your screen

zehicle
2018-11-01 17:21
check the DRP logs and see if there's a rendering error on the template

zdunn
2018-11-01 17:22
those are systemd logs?

zehicle
2018-11-01 17:22
gives Zach +1 on Dad Jokes tally

zdunn
2018-11-01 17:22
I have two kids

zdunn
2018-11-01 17:22
it's just my nature at this point

zehicle
2018-11-01 17:23
yes AND also available via the /logs API. You can browse in the UX

zdunn
2018-11-01 17:24
hmm k - doesn't seem to be much there

smedefind
2018-11-01 17:26
I just booted up PXE with the logs being followed in my console, nothing came up.

smedefind
2018-11-01 17:27
Turning on debug

smedefind
2018-11-01 17:29
```
Nov 01 17:29:09 elemental-pxe-01.us-east.optoro.io dr-provision[3229]: [159239:328]Content:
Nov 01 17:29:09 elemental-pxe-01.us-east.optoro.io dr-provision[3229]: DEFAULT discovery
Nov 01 17:29:09 elemental-pxe-01.us-east.optoro.io dr-provision[3229]: PROMPT 0
Nov 01 17:29:09 elemental-pxe-01.us-east.optoro.io dr-provision[3229]: TIMEOUT 10
Nov 01 17:29:09 elemental-pxe-01.us-east.optoro.io dr-provision[3229]: LABEL discovery
Nov 01 17:29:09 elemental-pxe-01.us-east.optoro.io dr-provision[3229]: KERNEL ipxe.pxe
```

zdunn
2018-11-01 17:31
:point_up: that's from the debug

bagricola
2018-11-01 17:32
Hmm... my CI process that uses `drpcli contents bundle` has started failing with `Error: Failed to load: No idea how to decode FETCH_HEAD into .git`

bagricola
2018-11-01 17:33
(first rebuild of the content pack in ~2 weeks, so it's pulled down a new version of drp + cli; it's configured to use tip)

zehicle
2018-11-01 17:46
bundle checks for ._[meta].meta files.

zehicle
2018-11-01 17:51
it's possible that you've got a .git file hanging out in a subdirectory (likely the templates one)


bagricola
2018-11-01 18:35
hmm... well it is a git repo so it has a `.git` dir in the root

bagricola
2018-11-01 18:36
which would explain the error, as FETCH_HEAD is a file inside that

bagricola
2018-11-01 18:42
yeah, removing the `.git` dir before calling bundle works
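
A minimal sketch of that workaround for a CI job, assuming you would rather stage a clean copy of the content pack than delete `.git` from the checkout itself. The helper name `stage_content_pack` and the directory layout are hypothetical; `drpcli contents bundle` would then be run from the staged directory.

```python
import shutil
import tempfile
from pathlib import Path


def stage_content_pack(src: str, dest: str) -> Path:
    """Copy a content pack tree to dest, leaving out any .git entries."""
    # ignore_patterns skips every file or directory named ".git" at any depth.
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns(".git"))
    return Path(dest)


if __name__ == "__main__":
    # Build a throwaway tree that looks like a git checkout of a content pack.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        (repo / ".git").mkdir(parents=True)
        (repo / ".git" / "FETCH_HEAD").write_text("")
        (repo / "templates").mkdir()
        (repo / "templates" / "example.tmpl").write_text("hello")

        staged = stage_content_pack(str(repo), str(Path(tmp) / "staged"))
        print(sorted(p.name for p in staged.iterdir()))  # prints ['templates']
```

Staging a copy keeps the working tree intact, which matters if later CI steps still need the git metadata.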

zehicle
2018-11-01 19:17
we put multiple content packs in a repo, so that file is not a factor

christopher_wood
2018-11-01 19:18
(Apparently it's that time of day, when I turn up after reading things.) The gateway around here is always the bottom of an ip range and I've calculated that using ruby. Is there a way of adding some dynamic calculation to a content pack or the inside of a template? (https://golang.org/pkg/text/template/ but I'm not sure how much I can cram in a template.)

christopher_wood
2018-11-01 19:18
That is, I know how I would do the calculation, I'm not sure where I would add this in a content pack.

zehicle
2018-11-01 19:22
@christopher_wood yes! we include the sprig library for the template render, so you can use nearly any of those functions

christopher_wood
2018-11-01 19:22
Thank you! Reading about sprig.



zehicle
2018-11-01 19:24
you may also add a task to calculate and push the values into params for later stages.
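
For the task-based route, a minimal sketch of the gateway calculation itself, assuming "bottom of an ip range" means the first usable address of the machine's subnet. The function name `gateway_for` is hypothetical, and how the result gets pushed into a param is not shown here.

```python
import ipaddress


def gateway_for(cidr: str) -> str:
    """Return the lowest usable host address of the network containing cidr."""
    # strict=False lets callers pass a host address such as 10.1.0.57/24.
    network = ipaddress.ip_network(cidr, strict=False)
    # network_address is the all-zeros address; the gateway is assumed to be the next one.
    return str(network.network_address + 1)


if __name__ == "__main__":
    print(gateway_for("10.1.0.57/24"))      # prints 10.1.0.1
    print(gateway_for("192.168.4.130/26"))  # prints 192.168.4.129
```

This assumption does not hold for /31 or /32 networks, where there is no separate "first usable" address.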

christopher_wood
2018-11-01 19:32
Okay, I think I get it. Back to the books. Very much appreciated.

sean.t.beeg
2018-11-02 14:46
has joined #community201811

zehicle
2018-11-02 14:54
@sean.t.beeg $welcome

2018-11-02 14:54
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

chedbob.pm
2018-11-04 15:30
howdy. I think I wish to run something on a chef server which drp could use to execute remote instructions as part of a provisioning workflow for other machines. is there a doc somewhere on accomplishing things of that nature?

chedbob.pm
2018-11-04 20:41
can the drp "client" do that? thanks.

shane
2018-11-04 21:07
In short, yes. Either as a custom task in a workflow that calls out to the chef server (executed on the machine side), or a PlugIn that executes the call to chef on the machine's behalf (from the DRP Endpoint).

zehicle
2018-11-04 21:51
@chedbob.pm check out the ansible tower discussion posted from the last meetup. If that matches, then a plugin may be best

chedbob.pm
2018-11-04 21:51
thanks Shane and Rob

shane
2018-11-05 15:00
- Tomorrow (Tuesday November 6th) at 11:00 am PST will be our regularly scheduled Digital Rebar meetup (v029). Check out the agenda and details: https://www.meetup.com/digitalrebar/events/lchdhpyxpbjb/

zehicle
2018-11-05 15:02
@chedbob.pm "remote actions" relates to the question you are asking about

sean.t.beeg
2018-11-05 17:51
Hey there guys. hope all is well. Rob invited me here. :slightly_smiling_face: anyway, I just setup a test VM running digital rebar. the VM has a public interface that faces our lab (but not the internet) and a private interface used for PXE booting. When I visit the DNS name of the digital rebar server on port 8092/ux, I am getting a "No route" response (404) from the server

sean.t.beeg
2018-11-05 17:51
I should point out that using the CLI I was able to get sledgehammer to work on an R630 via PXE boot

sean.t.beeg
2018-11-05 17:55
also am getting the same when I visit using firefox on the server itself using the loopback address.

shane
2018-11-05 18:15
@sean.t.beeg - welcome - if we didn't say that already :slightly_smiling_face: For the Portal - the laptop/management machine that is used to manage the DRP Endpoint must have Internet connectivity to `https://portal.rackn.io` (HTTPS, port 443) and access to the DRP Endpoint on port 8092 (the default, unless changed)

sean.t.beeg
2018-11-05 18:16
my mac can reach that address, and that's where I'm accessing the URL on my server from. my mac uses a web proxy, could that be part of our issue here?

shane
2018-11-05 18:17
the Web App is operating in a CORS (cross origin resource sharing) model - so it's basically "proxying" the connection between the DRP endpoint and the hosted SaaS site

shane
2018-11-05 18:18
also - make sure you use the IP address of your "public facing" interface on DRP endpoint - if the DNS resolves to the internal address (PXE) - that would likely break the Portal connection through the web browser

sean.t.beeg
2018-11-05 18:20
@shane I'll take a look at the ip on the DRP endpoint. that's the only thing left

sean.t.beeg
2018-11-05 18:20
your assistance is most appreciated

sean.t.beeg
2018-11-05 18:21
:slightly_smiling_face:

shane
2018-11-05 18:21
your web browser URL should look something like: https://portal.rackn.io/#/e/1.2.3.4:8092

shane
2018-11-05 18:22
where "1.2.3.4" is the internal public IP of your DRP endpoint

shane
2018-11-05 18:22
(or a DNS record that resolves to that address)
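
A small sketch of that URL pattern, for anyone scripting bookmarks for several endpoints. The helper name `portal_url` is hypothetical; the URL shape and the default port 8092 are taken from the messages above.

```python
def portal_url(endpoint_host: str, port: int = 8092) -> str:
    """Build the hosted Portal URL that points at a given DRP endpoint."""
    # endpoint_host must be reachable from the browser: the "public" interface
    # IP or a DNS name that resolves to it, not the PXE-side address.
    return f"https://portal.rackn.io/#/e/{endpoint_host}:{port}"


if __name__ == "__main__":
    print(portal_url("1.2.3.4"))  # prints https://portal.rackn.io/#/e/1.2.3.4:8092
```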

sean.t.beeg
2018-11-05 18:23
HEY doing that manually worked

shane
2018-11-05 18:24
also - unless you've installed a CA signed TLS cert - you'll need to visit the DRP endpoint address directly - and accept the Self Signed Cert in your browser - otherwise, it'll fail

shane
2018-11-05 18:24
woot woot !

sean.t.beeg
2018-11-05 18:24
check that noise out. :slightly_smiling_face:

sean.t.beeg
2018-11-05 18:24
ok I'm on my way.

sean.t.beeg
2018-11-05 18:24
I've got about 15 R630 compute nodes I want to fool with for a bit

sean.t.beeg
2018-11-05 18:25
I'll let you guys know how it works out

shane
2018-11-05 18:45
excellent

dave.parker
2018-11-05 23:32
Why can't I set the unknown bootenv to something other than "discovery"?

zehicle
2018-11-05 23:33
from the UX?

dave.parker
2018-11-05 23:34
I tried from the cli as well.

zehicle
2018-11-05 23:34
you can only choose Bootenvs that are available

dave.parker
2018-11-05 23:34
It says my bootenv cannot be used for the unknownBootEnv

zehicle
2018-11-05 23:34
so check to make sure that you are picking an available bootenv

dave.parker
2018-11-05 23:34
Ok

zehicle
2018-11-05 23:34
one of those battle scars, protect operators from bad decisions things

dave.parker
2018-11-05 23:35
It looks good. It has a check mark next to it anyway. How else can I tell?

zehicle
2018-11-05 23:35
it does not help you if your upgrade breaks the bootenv

zehicle
2018-11-05 23:35
check the logs and see what it's telling you is the reason

zehicle
2018-11-05 23:36
OH... there's also a "can be unknown bootenv" flag that needs to be set

dave.parker
2018-11-05 23:36
Oh!

dave.parker
2018-11-05 23:36
Ok, where's that?

zehicle
2018-11-05 23:36
another protection since the unknown bootenvs are insecure

zehicle
2018-11-05 23:36
on the bootenv object

dave.parker
2018-11-05 23:36
Ok let me look.

dave.parker
2018-11-05 23:37
"OnlyUnknown"?

zehicle
2018-11-05 23:37
yes

dave.parker
2018-11-05 23:37
Ok, let's see.

dave.parker
2018-11-05 23:40
Yup, that did it. Thanks!

zehicle
2018-11-06 00:43
That flag limits the tokens that can be generated for the machine

zehicle
2018-11-06 00:47
It's some of the "holy cow! Drp does that magic?!" I discover all the time as I dig around.

asingla
2018-11-06 02:10
has joined #community201811

zehicle
2018-11-06 02:33
@asingla $welcome

2018-11-06 02:33
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

asingla
2018-11-06 02:37
Great to be part of the community...

zdunn
2018-11-06 14:26
Has anyone tried krib in memory with a ZFS root pool?

zehicle
2018-11-06 14:39
@zdunn not that I've heard of.

shane
2018-11-06 14:43
@zdunn - I believe @nkabir was doing some things with ZFS - but he was using Ansible IIRC to provision the ZFS volumes

shane
2018-11-06 14:44
don't think anything w/ the root pool, though

zdunn
2018-11-06 15:07
k

zdunn
2018-11-06 15:08
I was thinking in memory OS with ZFS libs in place, pulling in the zpool on boot (obviously we would need to have a configure step if it didn't exist)

shane
2018-11-06 15:35
Sledgehammer (the DRP discovery image) is in-memory, but it probably doesn't have the ZFS libs by default - though you could build a workflow task that adds the pkgs and then does what you want ... if you work towards this - we'd love to add it in to community

zdunn
2018-11-06 15:36
we are looking for sure

zdunn
2018-11-06 15:36
we are also looking at rancherOS

zdunn
2018-11-06 15:37
it's sort of already doing this*

zdunn
2018-11-06 15:37
aside from ZFS

zdunn
2018-11-06 15:37
but they support ZFS in their ondisk install

zdunn
2018-11-06 15:40
If we wanted to add in the zfs tools and libs is there some good docs on building custom sledgehammers?

zehicle
2018-11-06 15:40
using the data in inventory/gohai, it would be pretty easy to build a stage that does the right install steps. Sledgehammer is Centos7 so a yum install would work. The biggest unknown is logic to pick the right volumes automatically. For starters, you could provide via a param instead

zdunn
2018-11-06 15:40
that would be good for an ondisk install

zdunn
2018-11-06 15:40
but for PXE'ing an in memory image

zdunn
2018-11-06 15:40
we don't want 20 mins of compiling

zehicle
2018-11-06 15:41
the stages include "attach disk"

zehicle
2018-11-06 15:41
if you want all in RAM, then just extend the system that's there

zdunn
2018-11-06 15:41
sure - that would be good after we have the image!

zdunn
2018-11-06 15:42
the steps with rebar make this all easier in some ways

zehicle
2018-11-06 15:42

greg
2018-11-06 15:42
We don't use that anymore

zehicle
2018-11-06 15:42
BUT... typically, it's easier to add items into the image using stages postinstall than building a new sledgehammer

zehicle
2018-11-06 15:42
that makes it more durable

zehicle
2018-11-06 15:43
oh! right. sorry

zdunn
2018-11-06 15:43
@zehicle wouldn't that make it harder to do an in memory install?

zehicle
2018-11-06 15:43
I think this is a specific topic for the meetup today

zdunn
2018-11-06 15:43
stages that is

zdunn
2018-11-06 15:43
vs launching a prebuilt artifact

zdunn
2018-11-06 15:43
I would rather lean into the baremetal as cattle analogy for kubernetes

zehicle
2018-11-06 15:44
no, you'd just do the install

zdunn
2018-11-06 15:44
we've been doing something similar with Joyent's Triton and it's worked well

zdunn
2018-11-06 15:44
I don't grok > `no, you'd just do the install`

zdunn
2018-11-06 15:45
if I am loading the sledgehammer image - then including a step for ZFS

zdunn
2018-11-06 15:45
I would need to do that every time I reboot

zehicle
2018-11-06 15:45
there's a balance between doing a stage to install something and having to maintain your own image.

zdunn
2018-11-06 15:45
sure

zdunn
2018-11-06 15:45
but I don't want 30 min reboot times either

zdunn
2018-11-06 15:45
MTTR for a rack failure would be awful

zehicle
2018-11-06 15:45
check out the kexec flag. from sledge to sledge, you don't need to reboot

zdunn
2018-11-06 15:46
interesting

zdunn
2018-11-06 15:46
but a power failure / hardware replacement / panic would mean a reboot and so a recompile?

zehicle
2018-11-06 15:46
recompile?

zdunn
2018-11-06 15:47
of ZFS

zdunn
2018-11-06 15:47
if it's a stage

zdunn
2018-11-06 15:47
in the pxe boot

zdunn
2018-11-06 15:47
so we would pxe into sledgehammer

zdunn
2018-11-06 15:47
then install zfs (yum whatever we find works best)

zdunn
2018-11-06 15:48
that would compile ZFS

zdunn
2018-11-06 15:48
and the ZFS tools

zdunn
2018-11-06 15:48
then attach disks

zdunn
2018-11-06 15:48
(i am sure I am skipping a lot of other bits)

zehicle
2018-11-06 15:50
I don't have enough experience to know why you need to recompile ZFS each time.

smedefind
2018-11-06 15:52
ZFS has to be specifically compiled for each kernel version. We could probably lock sledgehammer to a version and create a package off of that.

smedefind
2018-11-06 15:52
Basically ZFS can be a pain in the ass

zdunn
2018-11-06 15:53
well with rancherOS it's a PITA because they do build a custom kernel etc https://github.com/rancher/os-kernel/commit/93b71d24fb6368d38f562184d6fb5a97926265df

zdunn
2018-11-06 15:53
we could do something with the yum packages for ZFS

zdunn
2018-11-06 15:53
but that I believe really just does a bunch of the compiling for you in the background

zdunn
2018-11-06 15:53
since they don't want to run afoul of the licensing (?)

zehicle
2018-11-06 15:54
you could do the compile once and then push it to the endpoint/files.

zdunn
2018-11-06 15:54
we will try doing the stages and see how long it takes

zehicle
2018-11-06 15:54
if the bits are there, then use before compile

zdunn
2018-11-06 15:54
we do have 48 cores to throw at this

zdunn
2018-11-06 15:54
and kexec may save us enough in between

zehicle
2018-11-06 15:55
@zdunn this is the type of work that RackN can help with building and designing as a services engagement

zdunn
2018-11-06 15:55
Way to close the deal @zehicle

zdunn
2018-11-06 15:55
:smile:

knibble
2018-11-07 11:49
has joined #community201811

tom.gillman
2018-11-07 14:59
should `/usr/local/bin/incrementer` and `/usr/local/bin/drbundler` be removed when running `tools/install.sh remove` ??

greg
2018-11-07 15:00
oh - probably. :neutral_face:

jzimmer
2018-11-07 15:02
has joined #community201811

zehicle
2018-11-07 15:10
@jzimmer $welcome

2018-11-07 15:10
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-11-07 15:10
@knibble ^^ for you too!

jzimmer
2018-11-07 15:12
Thanks Rob.

nistor
2018-11-07 15:32
has joined #community201811

jzimmer
2018-11-07 15:53
i am installing DRP for the first time, and trying to set up a small POC for a larger upcoming project. If i have questions is this the right space to just post, or should i DM someone?

greg
2018-11-07 15:58
Start here, @jzimmer

jzimmer
2018-11-07 16:01
sounds good

greg
2018-11-07 16:53
$welcome - @nistor

2018-11-07 16:53
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

nistor
2018-11-07 17:01
howdy! glad to be here :laughing:

zdunn
2018-11-07 17:35
Is the best place to ask questions about krib here?

greg
2018-11-07 17:35
well - sure. :slightly_smiling_face:

zdunn
2018-11-07 17:36
lol

zdunn
2018-11-07 17:36
Just wondering how best to debug a "failure"


shane
2018-11-07 17:36
there is a Param you can set which enables better debugging

zdunn
2018-11-07 17:37
--tell-me-whats-going-on-brah ?

shane
2018-11-07 17:37
it essentially enables `set -x` for all of the scripts - which will output a lot more about what is actually run

zdunn
2018-11-07 17:37
k

shane
2018-11-07 17:37
hold on - finding it

shane
2018-11-07 17:39
`rs-debug-enable` - set that to True - either in the Global profile, or via a Profile / Param added to the Machine object of the Machine(s) you want to debug

shane
2018-11-07 17:40
for example - you could just set it in the `krib` profile you are using to hold your cluster information - since that should already be applied to all of the machines in question

shane
2018-11-07 17:40
if you include `{{ template "setup.tmpl" }}` in any custom content - you can have this mechanism at your disposal for BASH scripts you develop for content

zdunn
2018-11-07 17:42
cool trying that now

zdunn
2018-11-07 17:49
alright! that got me: `Error: Key, krib/cluster-bootstrap-token, already present on profile test-krib-ha`

zdunn
2018-11-07 17:49
do I need to remove keys between runs?

shane
2018-11-07 17:50
yes - there's a whole "reset procedure" you need to run to clean things up ...

shane
2018-11-07 17:50
if you are doing multiple runs of the tool

zdunn
2018-11-07 17:50
ahh ok

zdunn
2018-11-07 17:51
giving that a shot

zehicle
2018-11-07 17:51
it's the cluster-reset workflow

zehicle
2018-11-07 17:51
pick a machine in the cluster and run that workflow

shane
2018-11-07 17:52
(only needs to be run on ONE machine)

zehicle
2018-11-07 17:52
IT WILL REMOVE YOUR CLUSTER CREDENTIALS

nistor
2018-11-07 17:53
I totally forgot -- was there a prebuilt ISO to deploy for a DRP if you wanted to dedicate a machine to it? I think maybe I had heard it was a centos image?

zdunn
2018-11-07 17:54
@zehicle I think that's the issue I am running into now :slightly_smiling_face:

zdunn
2018-11-07 17:54
it's just a test to play around

greg
2018-11-07 17:54
@nistor - you need an existing machine running linux (centos7)

nistor
2018-11-07 17:54
gotcha ok thnx

greg
2018-11-07 17:54
Then you run the install.sh from the curl bash command.

jzimmer
2018-11-07 18:10
are there any documents around for BIOS Configuration?

zdunn
2018-11-07 18:12
ok - so next stupid krib config question: ```./krib-get-masters-krib-get-masters.sh.tmpl@275(): echo 'Missing krib/cluster-master-vip on the machine!' ```

zdunn
2018-11-07 18:13
I think I just need to configure this into a range that it is in?

zdunn
2018-11-07 18:13
e.g. the current nodes are all in 10.1.0.0/24 and the default master vip is 10.10.10.10 or something similar

shane
2018-11-07 18:15
@zdunn - for "playing around" - I'd suggest starting with a non-HA cluster build - which will use one controller - the VIP is only needed for an HA config

shane
2018-11-07 18:15
it's a more "advanced" usage pattern

zdunn
2018-11-07 18:15
well we are never going to run a non-HA env

zdunn
2018-11-07 18:16
so IDK if that makes much sense to do that

shane
2018-11-07 18:16
however - that Param defines an IP address that will be used by nginx as the VIP for the control plane - so it needs to be routable to all of your masters

shane
2018-11-07 18:17
typically an IP on the "public" side of the cluster, in the same Layer2 environment

shane
2018-11-07 18:17
that all of your masters sit in
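
A quick sanity check for the Layer2 case, as a sketch: it tests whether a candidate VIP falls inside the subnet the masters sit in. The subnet and the 10.10.10.10 value come from the example in this thread; `10.1.0.200` and the function name `vip_on_network` are hypothetical.

```python
import ipaddress


def vip_on_network(vip: str, cidr: str) -> bool:
    """True when the VIP address lies inside the given subnet."""
    return ipaddress.ip_address(vip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    masters_subnet = "10.1.0.0/24"
    print(vip_on_network("10.10.10.10", masters_subnet))  # prints False
    print(vip_on_network("10.1.0.200", masters_subnet))   # prints True
```

Being inside the subnet is necessary for the simple Layer2 setup but not sufficient: the address also has to be unused and outside any DHCP pool.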

zdunn
2018-11-07 18:25
gotcha

shane
2018-11-07 18:25
if you are using a Layer3 topology - it's left up to the operator to ensure that IP (VIP) is able to float between machines via your infrastructure

zehicle
2018-11-07 18:26
If you start at ha, you may slow learning the pattern.

zdunn
2018-11-07 18:28
I am slow learner

zdunn
2018-11-07 18:28
it's okay

zdunn
2018-11-07 18:28
it's exposing me to more features as well

zdunn
2018-11-07 18:29
just waiting on the world's worst DNS now to resolve out my server names (didn't know that when i started)

shane
2018-11-07 18:31
NS1 !!

zdunn
2018-11-07 19:01
lol

zdunn
2018-11-07 19:01
we are on aws

zdunn
2018-11-07 19:01
resolution of new dns entries takes forever

smedefind
2018-11-07 19:01
I wish we had NS1 money

zdunn
2018-11-07 19:08
now onto ``` [ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.12.2" Control plane version: "1.11.1" ```

greg
2018-11-07 19:09
Yes - this is a bug(kinda). set the k8s version parameter globally to `1.12.2`

greg
2018-11-07 19:10
I think the kubelet installer isn't paying attention to that var and is getting the latest, whereas the control plane is using the var.
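
To make the skew concrete, a small sketch that compares the two version strings from the error above. This is only an illustration of the condition being reported; kubeadm's actual preflight logic may differ in detail, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn "v1.12.2" or "1.12.2" into (1, 12, 2)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def kubelet_too_new(kubelet: str, control_plane: str) -> bool:
    """True when the kubelet is newer than the control plane."""
    return parse_version(kubelet) > parse_version(control_plane)


if __name__ == "__main__":
    print(kubelet_too_new("1.12.2", "1.11.1"))   # prints True, the failing case
    print(kubelet_too_new("1.12.2", "v1.12.2"))  # prints False once the param is pinned
```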

zdunn
2018-11-07 19:14
```krib/cluster-kubernetes-version (string):```

zdunn
2018-11-07 19:14
?

shane
2018-11-07 19:27
@zdunn - here's the original response that worked for @seaton

zdunn
2018-11-07 19:27
nice

shane
2018-11-07 19:27
(note the "v" in there)

zdunn
2018-11-07 19:48
huh different errors

zdunn
2018-11-07 19:48
```Tiller NOT running - something went wrong with helm init ```

zdunn
2018-11-07 20:32
so things are somehow already running there...

zdunn
2018-11-07 21:28
yeah it does seem to install but then not register where it has installed things?

greg
2018-11-07 22:02
@zdunn - are you rerunning the stages without resetting them?

greg
2018-11-07 22:02
These aren't necessarily idempotent.

zdunn
2018-11-07 22:02
Nah

zdunn
2018-11-07 22:02
I've reset a couple times

zdunn
2018-11-07 22:03
It blew away the custom attributes

greg
2018-11-07 22:03
Are the machines persistent?

zdunn
2018-11-07 22:03
No

zdunn
2018-11-07 22:03
Live

greg
2018-11-07 22:11
Hmm - actually, I guess your last error is the one to look at.

greg
2018-11-07 22:12
From that machine, you should be able to do `drpcli profiles get <krib profile> param "krib/cluster-admin-conf"`

greg
2018-11-07 22:12
Redirect that to a file: `label.conf`

greg
2018-11-07 22:12
Then do `kubectl --kubeconfig=label.conf get nodes`

greg
2018-11-07 22:13
You should then be able to look into the label.conf and see what is in it if the command fails.

dave.parker
2018-11-07 22:29
I've got a bootenv I made to handle unknown systems and it's telling me it can't add a system node because the bootenv is flagged as "OnlyUnknown". But it doesn't work as an Unknown bootenv if it's not flagged that, plus the default discovery bootenv is flagged OnlyUnknown but can still create machine nodes. What gives?

greg
2018-11-07 22:31
Ummm @dave.parker, I'm not sure what you are doing when you get that message. The `OnlyUnknown` bootenvs can only be used in the Unknown Bootenv Preference. They cannot be assigned to a machine.
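
The two rules described in this channel can be modelled in a few lines. This is a sketch of the rules as stated in the conversation (the flag was named `OnlyUnknown` earlier in the chat), not DRP's actual validation code; the dicts are simplified stand-ins for bootenv objects and the function names are hypothetical.

```python
def usable_as_unknown_bootenv(bootenv: dict) -> bool:
    """Rule from the chat: must be available and flagged OnlyUnknown."""
    return bool(bootenv.get("Available")) and bool(bootenv.get("OnlyUnknown"))


def assignable_to_machine(bootenv: dict) -> bool:
    """Rule from the chat: OnlyUnknown bootenvs cannot be set on a machine."""
    return bool(bootenv.get("Available")) and not bootenv.get("OnlyUnknown")


if __name__ == "__main__":
    discovery = {"Name": "discovery", "Available": True, "OnlyUnknown": True}
    sledgehammer = {"Name": "sledgehammer", "Available": True, "OnlyUnknown": False}
    for env in (discovery, sledgehammer):
        print(env["Name"], usable_as_unknown_bootenv(env), assignable_to_machine(env))
```

Under this model a machine created during discovery has to land in a second, non-flagged bootenv, which is one reading of why the default setup uses a pair of bootenvs.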

tom.gillman
2018-11-07 22:31
Could one put preferences as part of a content bundle?

greg
2018-11-07 22:32
@tom.gillman - sadly no. That is an architectural oversight in the content system.

greg
2018-11-07 22:32
You can kinda do it, but they don't take effect like you think they should.

tom.gillman
2018-11-07 22:33
I ask, because the implication is that contents are RO, and preferences seem to want to be modified.

greg
2018-11-07 22:34
That is correct. From a history perspective, prefs were created before the object system solidified and then didn't get truly converted over to objects. Prefs are also the only "objects" that can be specified on the command line; those values are applied only if the pref is not already set.

dave.parker
2018-11-07 22:34
I'm booting a new machine. I have the Unknown Bootenv preference set to my bootenv, which boots an image that then tries to do discovery much like sledgehammer does. But I get the error that says it can't do that because the bootenv is set to OnlyUnknown.

greg
2018-11-07 22:35
okay - so - are you trying to create a machine and getting this error?

dave.parker
2018-11-07 22:35
Yup

greg
2018-11-07 22:36
okay, are you explicitly setting the machine's bootenv in the create object?

dave.parker
2018-11-07 22:36
Hrm. Good question.

greg
2018-11-07 22:37
The thing to realize is that `discovery` and `sledgehammer` are a bootenv pair that are subtly tied together to do the creation dance.

greg
2018-11-07 22:37
Why are you creating your own unknown bootenv?

dave.parker
2018-11-07 22:39
Nope, it's just setting hostname and IP and MAC.

dave.parker
2018-11-07 22:39
I'm trying to decouple DHCP from the discovery process so I can do discovery in environments where I don't have DHCP and can't install a local rebar server.

greg
2018-11-07 22:40
okay - so what are you booting?

greg
2018-11-07 22:41
or how really?

greg
2018-11-07 22:41
Because there are a couple of things now in tip that could be used instead.

dave.parker
2018-11-07 22:41
Oh really?

dave.parker
2018-11-07 22:41
It's a modified sledgehammer image.

greg
2018-11-07 22:41
potentially.

greg
2018-11-07 22:42
There is a new command: `drpjoin`

dave.parker
2018-11-07 22:43
Basically I unpacked the stage1.img and stage2.img images, made all the necessary changes for it to take a static IP argument from the bootenv and use that instead of dhcp, and it... works? Except the very end where it goes to create the node and fails.

greg
2018-11-07 22:43
This takes the file URL of the DRP endpoint as its argument: `https://<drpip>:8091`

greg
2018-11-07 22:44
okay - one thing at a time. Your sledgehammer image - how does the machine boot it?

dave.parker
2018-11-07 22:45
I cloned the discovery bootenv and set it to load my image instead of regular sledgehammer, then set that bootenv to the Unknown Bootenv.

greg
2018-11-07 22:46
Right, but where does the IP parameter come from? Or is static in the image?

dave.parker
2018-11-07 22:47
Ah, that's set in the boot parameters. I modified the init (stage 1) and sledgehammer-start.sh (stage 2) scripts to grab those (the same way it grabs stuff like provisioner.web) and use the IP and gateway I set in the bootenv.

greg
2018-11-07 22:48
so you are only going to ever manage one machine?

dave.parker
2018-11-07 22:50
Per site, yes. I'm doing this so I can build an initial machine in a new site from a remote drp server without DHCP. Basically that first machine will be the site's drp server. After that it's DHCP and sledgehammer as usual.

greg
2018-11-07 22:50
ok

greg
2018-11-07 22:52
thinking

dave.parker
2018-11-07 22:57
Ok. Well take your time. I'm taking off for the night. If you think of something let me know. :smile:

greg
2018-11-07 22:57
Can you send me your bootenv? DM is fine

dave.parker
2018-11-07 22:57
I'm just confused why the regular discovery/sledgehammer process works but my slightly modified version doesn't.

dave.parker
2018-11-07 22:57
You want the JSON?

greg
2018-11-07 22:58
sure

greg
2018-11-07 22:58
or yaml would be better; `--format yaml` is your friend (well, mine)

dave.parker
2018-11-07 22:58
k

tom.gillman
2018-11-08 00:46
so, `drpcli bootenvs uploadiso <ISO>` documentation suggests that I can specify a URL such as `https://my.image.site/isos/my_image.iso` -- Is this a true statement?

shane
2018-11-08 00:47
@tom.gillman - no, the bootenvs uploadiso command reads the BootEnv `OS.IsoUrl` location

zehicle
2018-11-08 00:47
I was about to say: I do know that it can figure it out from the bootenv URL.

zehicle
2018-11-08 00:48
it's better to do it from the bootenv so that you can manage/version the source location

shane
2018-11-08 00:49
there is an alternative that "works" - but it is not guaranteed to be safe ... you can download / copy the ISO/TAR file referred to in the BootEnv via standard shell commands on the DRP Endpoint, place it in the `drp-data/tftpboot/isos/` directory, then `pkill -HUP dr-provision`

shane
2018-11-08 00:50
note if someone else starts an API call that does the same thing - bad things may happen

tom.gillman
2018-11-08 00:50
good enough.

shane
2018-11-08 00:50
(`drp-data` is the path in isolated mode; it is `/var/lib/dr-provision` in "production" install mode)

shane
2018-11-08 00:50
the SHA256 hash and the filename must match what the BootEnv states, otherwise it'll fail validation and not be "exploded" out correctly

shane
2018-11-08 00:51
you can also pre-stage ISOs this way, prior to starting the DRP Endpoint up on first run - and it'll do the right things with them (assuming SHA/name are right)
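
Before dropping a file into the isos directory by hand, the name and SHA256 can be checked against what the BootEnv states. A minimal sketch, assuming you have already read the expected file name and hash out of the BootEnv yourself; the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large ISOs do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iso_matches(path: Path, expected_name: str, expected_sha256: str) -> bool:
    """True when both the file name and the SHA256 match the expected values."""
    return path.name == expected_name and sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    # Demo with a tiny stand-in file rather than a real ISO.
    with tempfile.TemporaryDirectory() as tmp:
        iso = Path(tmp) / "demo.iso"
        iso.write_bytes(b"not a real iso")
        expected = hashlib.sha256(b"not a real iso").hexdigest()
        print(iso_matches(iso, "demo.iso", expected))   # prints True
        print(iso_matches(iso, "other.iso", expected))  # prints False
```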

tom.gillman
2018-11-08 01:06
ok, last question for the evening. If I want to get content packs, must I use the UI, or is there a way to get them via the cli?

shane
2018-11-08 01:06
Which ones?

tom.gillman
2018-11-08 01:07
like, classify, os-other, drp-community-contrib

shane
2018-11-08 01:08
I'm away from computer - but I think you can search this channel, I posted before here

tom.gillman
2018-11-08 01:17
+1 for remembering a conversation from a year ago

tom.gillman
2018-11-08 01:21
no offense, but that's ridiculously complicated and doesn't lend itself well to automation.

shane
2018-11-08 01:22
which answer did you find? it might have gotten a bit better, but probably not

tom.gillman
2018-11-08 01:22
I searched on "content pack cli" and found something from Nov 1, 2017

shane
2018-11-08 01:22
we know the artifact download process could be a bit better via API call ... but ... other things ...

shane
2018-11-08 01:24
curious what you would suggest? it's a single CURL call with an auth string ... not really that much to it

shane
2018-11-08 01:24
that's the current method

tom.gillman
2018-11-08 01:25
what would be nice would be something like `drpcli contents create <URL>` where URL points to the content pack you want. At that point, getting the list is the problem.

shane
2018-11-08 01:26
ah - yes, we've added in helpers to point to external resources in some places - but the `contents create` hasn't been updated with that

shane
2018-11-08 01:26
Pull Requests/Patches are gladly welcome !! :slightly_smiling_face:

tom.gillman
2018-11-08 01:27
Then all I have to do is define
```
my_extra_contents:
  - pack1
  - pack2
  - pack3
```
and feed to an ansible role and go.

shane
2018-11-08 01:28
yep - totally "get" the utility / use cases for it :slightly_smiling_face:

shane
2018-11-08 01:28
you'd also need an AUTH string - so there's a little more to it than that, but could be a bit easier

shane
2018-11-08 01:33
would you please file a feature enhancement - that helps us to prioritize added enhancements, etc

tom.gillman
2018-11-08 01:33
Sure


greg
2018-11-08 03:35
WAIT A MINUTE! @tom.gillman and @shane

greg
2018-11-08 03:35
this works: `drpcli contents create http://hostname/path/to/my/favorite-cp.yaml`

greg
2018-11-08 03:36
now auth strings are tougher, but I think https works and it might work with username/password auth.

greg
2018-11-08 03:37
Second (or first from previous conv), you can do `drpcli isos upload <file or url>` as well.

greg
2018-11-08 03:37
that puts a file into the isos directory by api.

greg
2018-11-08 03:37
and triggers a bootenv reload if it matches a name and checksum.

tom.gillman
2018-11-08 03:38
now if only there were some way to easily determine content pack names. :wink:

greg
2018-11-08 03:38
What do you mean?

greg
2018-11-08 03:39
file name will be whatever and the name of the cp will come from within the yaml.

greg
2018-11-08 03:39
You can also do `upload` instead of `create`

greg
2018-11-08 03:39
That will create or update if already created.
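
For the automation case, a sketch that turns a list of content pack URLs into the `drpcli contents upload` invocations described above. It only builds and prints the command lines; running them, and any authentication the URLs may need, is left out. The function name `upload_commands` is hypothetical.

```python
import shlex


def upload_commands(pack_urls):
    """Return one drpcli argv list per content pack URL."""
    # "upload" is used rather than "create" so reruns update existing packs.
    return [["drpcli", "contents", "upload", url] for url in pack_urls]


if __name__ == "__main__":
    packs = ["http://hostname/path/to/my/favorite-cp.yaml"]
    for argv in upload_commands(packs):
        print(shlex.join(argv))
```

Keeping the commands as argv lists makes it straightforward to hand them to `subprocess.run` later without shell quoting problems.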

tom.gillman
2018-11-08 03:42
Thinking towards the automation aspect. The idea is
```
1. install DRP
2. Load content packs
3. upload ISOs
```
and then off to the races

zehicle
2018-11-08 03:42
Imho, the issue is having a registry built into the cli.

zehicle
2018-11-08 03:43
That's all the ux really does.

tom.gillman
2018-11-08 03:43
I've got 1 and 3 pretty well hands free. I'm looking at making #2 as stupid simple as possible

zehicle
2018-11-08 03:44
There's a url for the rackn registry

shane
2018-11-08 03:44
Already has it :slightly_smiling_face:

zehicle
2018-11-08 03:45
Then it's just two calls

zehicle
2018-11-08 03:49
No contract on that api but it's been stable

ilari.oras
2018-11-08 15:23
has joined #community201811

ilari.oras
2018-11-08 15:43
Hello there!

ilari.oras
2018-11-08 15:44
I was wondering if there's a way to do CentOS installs with custom kickstart file, what would be templated to each server? (I'd like to set up static IP addresses for servers (on OS side), and currently we have an interactive kickstart file asking for those to be put in interface configuration)

zehicle
2018-11-08 15:54
@ilari.oras $welcome. yes, that's exactly what DRP does

2018-11-08 15:54
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

bagricola
2018-11-08 15:55
hmm... possible to remove EPEL from the `centos-drp-only-repos` job?

zehicle
2018-11-08 15:55
check out the CentOS bootenv files and then look at the CentOS workflow stages

zehicle
2018-11-08 15:56
sorry @bagricola that was not to you.

bagricola
2018-11-08 15:56
yeah i guessed :smile:

zehicle
2018-11-08 16:01

zehicle
2018-11-08 16:01
there are some options on different repos

zehicle
2018-11-08 16:01
but that template only wires .MachineRepos

zehicle
2018-11-08 16:02
it looks like you'll need to edit the package-repos param

bagricola
2018-11-08 16:03
hmmk... so it uses the repo defined on the iso / bootenv by default, and override means edit package-repos to include *all* required repos

bagricola
2018-11-08 16:03
(my guess anyway)

zehicle
2018-11-08 16:03
@ilari.oras there are a few ways to do static IPs. we recommend creating DHCP reservations but you can do it a lot of different ways during your configuration.

zehicle
2018-11-08 16:04
yes. remember that you can set param overrides in order from machine, profile (ordered), global profile and param default

zehicle
2018-11-08 16:05
so you have a lot of control over the scope of the change
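
A rough sketch of choosing the scope for a param override, assuming the `drpcli machines set ... param ... to ...` and `drpcli profiles set ... param ... to ...` forms. The helper name `set_scoped_param` and the `DRPCLI` override are hypothetical; the JSON value for `package-repos` is deliberately left to the caller because its structure is not shown in this thread.

```shell
# Sketch: set a param at machine scope or profile scope.
# The global profile is addressed as the profile named "global".
set_scoped_param() {
  local drpcli="${DRPCLI:-drpcli}" scope="$1" id="$2" key="$3" json="$4"
  case "$scope" in
    machine) "$drpcli" machines set "$id" param "$key" to "$json" ;;
    profile) "$drpcli" profiles set "$id" param "$key" to "$json" ;;
    *) echo "unknown scope: $scope" >&2; return 2 ;;
  esac
}
```

For example, `set_scoped_param profile global package-repos "$REPOS_JSON"` would change the value for every machine that has no machine-level or profile-level override, while the `machine` scope touches a single machine.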

zehicle
2018-11-08 16:54
@ilari.oras check out this link for Centos and then look at the relevant ../templates https://github.com/digitalrebar/provision-content/blob/master/content/bootenvs/centos-7.yml

shane
2018-11-08 17:37
@ilari.oras - expanding on @zehicle's suggestion - if you set a Reservation on the DRP Endpoint side - set your machines to DHCP - they will be assigned a "known IP" (the reservation). If you require truly static IP assignment on the host to avoid DHCP dependencies - let the machine boot, get the DHCP reservation, then write a Stage/Task/Template to insert into your Workflow that converts the DHCP-assigned reservation to a static configuration on the Machine side ...
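
A hypothetical sketch of the last step above, assuming a CentOS 7 host that uses `network-scripts` style `ifcfg-*` files. The function `render_ifcfg` is not part of any content pack; it only shows the kind of text a Task template could write once it knows the address the reservation handed out.

```shell
# Sketch: print a static ifcfg file body for CentOS 7 network-scripts.
# Arguments: device, IPv4 address, prefix length, gateway, DNS server.
render_ifcfg() {
  local dev="$1" ip="$2" prefix="$3" gateway="$4" dns="$5"
  cat <<EOF
DEVICE=$dev
BOOTPROTO=none
ONBOOT=yes
IPADDR=$ip
PREFIX=$prefix
GATEWAY=$gateway
DNS1=$dns
EOF
}
```

A task could redirect the output to `/etc/sysconfig/network-scripts/ifcfg-<device>`; where the values come from (machine params, the current lease, or an external source) is left open here.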

zdunn
2018-11-08 19:03
I think this is probably the actual error?

greg
2018-11-08 19:07
Yeah - that is probably a bug

greg
2018-11-08 19:08
krib/cluster-kubeadm-cfg is the only remaining param in the profile that needs to be cleared

zdunn
2018-11-08 19:12
yeah I added it

zdunn
2018-11-08 19:12
so that it would complete

dave.parker
2018-11-08 19:16
How do I download a community content pack from the cli? I thought I had the command saved somewhere but can't find it. I want to be able to install a fresh machine and immediately grab drp-community-contrib and install it.

tom.gillman
2018-11-08 19:31
Yeah, that would be awesome, :wink:


zdunn
2018-11-08 19:45
Where are these labels?

zdunn
2018-11-08 19:46
it seemed to "succeed" aside from that!

greg
2018-11-08 19:47
not completely sure. I'd have to check.

zdunn
2018-11-08 19:47
Yeah, I just can't seem to find it

zdunn
2018-11-08 20:01
This seems wrong? Three masters?

greg
2018-11-08 20:02
You said you wanted HA, right?

zdunn
2018-11-08 20:03
sure

zdunn
2018-11-08 20:03
I wasn't sure if all master was OK

zdunn
2018-11-08 20:03
```
./krib-config-krib-config.sh.tmpl@365(): kubectl --kubeconfig=label.conf label nodes http://halfling-dollar.us-east.optoro.io env=dev
error: 'env' already has a value (dev), and --overwrite is false
Command exited with status 1
Action krib-config.sh.tmpl finished
Task krib-config failed
```

zdunn
2018-11-08 20:03
seems to be what's failing

dave.parker
2018-11-08 20:34
Thanks Greg

bagricola
2018-11-08 21:05
hmm... does `package-repos` affect sledgehammer in any way?

bagricola
2018-11-08 21:06
my discovery stage is failing because one of the centos 7 mirrors is broken, and discovery runs the LLDP stage which tries to install lldpd

bagricola
2018-11-08 21:06
which fails cos of the bad centos mirror :confused:

tim.putney
2018-11-09 00:01
has joined #community201811

zehicle
2018-11-09 01:27
@tim.putney $welcome

2018-11-09 01:27
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-11-09 01:29
@zdunn the labels are from the inventory stage. If you run that it creates the params for that code to run

zdunn
2018-11-09 01:30
gotcha

ilari.oras
2018-11-09 07:11
Thanks for the messages, I think I now have a plan for how to move forward :+1:

ilari.oras
2018-11-09 07:13
Ideally I want to populate things within templates from terraform (and/or from netbox). Is there something in the documentation for that?

zehicle
2018-11-09 13:23
@ilari.oras here's the Terraform docs (which links to videos too): https://provision.readthedocs.io/en/tip/doc/content-packages/terraform.html?highlight=terraform

zehicle
2018-11-09 13:24
a word of caution: it's pretty easy to get into "who owns state" with Terraform such that the TF state file is out of sync with reality. DRP is a running service, so it tracks state all the time.

zehicle
2018-11-09 13:26
I know there are people in community integrating w/ netbox. It's pretty easy to push data into netbox from a plugin (RackN builds those all the time). That's a good place to start the integration.

bagricola
2018-11-09 14:03
< Using netbox as a single source of truth for IP assignments

job
2018-11-09 14:21
has joined #community201811

zehicle
2018-11-09 14:24
@job $welcome !

2018-11-09 14:24
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

tom.gillman
2018-11-09 15:54
Where does one view the event log without using the gui?

zehicle
2018-11-09 16:13
@tom.gillman it's a websocket, you need to attach to it. There are some browser plugins that can do it

shane
2018-11-09 16:14

zehicle
2018-11-09 16:15
We've built a CoreOS content pack that supports live and install options. Includes the DRP Runner (so you get full control) and works with kexec to skip reboots. Video Demo: https://youtu.be/abuBUgBdOb8

zehicle
2018-11-09 16:17
note for @christian.tardif @tgelter @daniel.bernier from previous posts ^^

zehicle
2018-11-09 16:18
we have not yet tried to integrate with KRIB or other workflows.

zdunn
2018-11-09 16:23
oh interesting. getting a PCC error PXE'ing now after trying to move from sledgehammer => ubuntu

nistor
2018-11-09 16:29
if I pull down the sledgehammer file manually, how can I jam it into the system?
```
root@drp /home/nistor/drp: ./drpcli bootenvs uploadiso sledgehammer
Error: Unable to connect to http://rackn-sledgehammer.s3-website-us-west-2.amazonaws.com/sledgehammer/9c1ad5cb7483928e6aba1d93ba363de929169f37/sledgehammer-9c1ad5cb7483928e6aba1d93ba363de929169f37.tar: Get http://rackn-sledgehammer.s3-website-us-west-2.amazonaws.com/sledgehammer/9c1ad5cb7483928e6aba1d93ba363de929169f37/sledgehammer-9c1ad5cb7483928e6aba1d93ba363de929169f37.tar: dial tcp 52.218.200.43:80: i/o timeout
root@drp /home/nistor/drp: ls -l *.tar
-rw-r-----. 1 nistor nistor 408166912 Nov 9 10:03 sledgehammer-9c1ad5cb7483928e6aba1d93ba363de929169f37.tar
root@drp /home/nistor/drp:
```
This server is in a closed environment so it can't go out and reach for the file.

zdunn
2018-11-09 16:40
Hmmm I thought that PCC error was because centos => ubuntu

zdunn
2018-11-09 16:40
but discovery works

zdunn
2018-11-09 16:40
but then doing a krib-reset got me the pcc error

tom.gillman
2018-11-09 16:42
try `drpcli isos upload sledgehammer-9c1ad5cb7483928e6aba1d93ba363de929169f37.tar` @nistor

nistor
2018-11-09 16:45
ah
```
root@drp /home/nistor/drp: ./drpcli isos upload sledgehammer-9c1ad5cb7483928e6aba1d93ba363de929169f37.tar
{
  "Path": "sledgehammer-9c1ad5cb7483928e6aba1d93ba363de929169f37.tar",
  "Size": 408166912
}
root@drp /home/nistor/drp:
```

shane
2018-11-09 16:47
@nistor - the `bootenvs uploadiso` is used to manage the BootEnv related assets (ISO, in this case) - which uses the Fields in the BootEnv to find/get the ISO. The `isos upload` is a generic "manage my ISOs" feature - and as long as the ISO name and SHA256SUM match - it'll be correctly associated with the BootEnv, which will then allow it to mark the BootEnv `Available: true` for use

nistor
2018-11-09 16:47
ahhh ok, thanks!

shane
2018-11-09 16:49
you can manage "other ISOs" (and files for that matter) with the `isos upload` command - but it by default manipulates the `tftpboot/isos/` directory files - the analog to that is `files upload` which is better for "other things" to be managed - which correlates to the `tftpboot/files/` directory location

shane
2018-11-09 16:49
so if you want to push artifacts for provisioning use to be hosted on the TFTP/Web server for provisioning activities - that's a good place for it
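
For an isolated environment like the one described above, a small sketch that pushes every locally staged `.tar` or `.iso` through `drpcli isos upload`. The function name `upload_local_isos` and the `DRPCLI` override are hypothetical. It changes into the directory first so the upload uses the bare file name, as in the session above.

```shell
# Sketch: upload all staged ISO/tar artifacts from one directory.
# The body runs in a subshell so the caller's working directory is untouched.
upload_local_isos() (
  cd "$1" || exit 1
  for f in *.tar *.iso; do
    [ -e "$f" ] || continue          # skip patterns that matched nothing
    "${DRPCLI:-drpcli}" isos upload "$f" || exit 1
  done
)
```

Whether a BootEnv then reports `Available: true` still depends on the ISO name and checksum matching what the BootEnv expects.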

nistor
2018-11-09 16:49
right now im just looking to go through the install as per the doc but without remote access so I have to pull the files down and move them over to this isolated environment but it's good info for me

shane
2018-11-09 16:50
yep - are you using internal Repo / Mirrors ?

nistor
2018-11-09 16:50
actually, I should clarify, the only option I have is ipv6 in this environment, but frankly it might as well be isolated, sadly most places still do not support v6

nistor
2018-11-09 16:50
no internal repo or mirror, the centos dependencies managed to pull over v6 fortunately

shane
2018-11-09 16:50
are you using `stable` or `tip` version of the product ?

nistor
2018-11-09 16:51
stable

shane
2018-11-09 16:51
we (@greg) have made a lot of changes in Sledgehammer to support IPv6 only environments

shane
2018-11-09 16:51
you may want to pull the most recent version of `tip` and associated `tip` DRP Community Content/Sledgehammer

nistor
2018-11-09 16:52
i will take a look at that if it makes the whole process easier, thanks!

nistor
2018-11-09 17:11
any plans to get "get.rebar.digital" onto v6 so I can use curl right from the box

zehicle
2018-11-09 17:15
That url is going to be rebuilt. So maybe.

nistor
2018-11-09 17:15
s/maybe/yes :laughing:

christian.tardif
2018-11-09 17:28
@zehicle Question regarding CoreOS thing (saw a big smile on @daniel.bernier's face when you sent the announcement :slightly_smiling_face: ), as you're relying on CoreOS iso to do the install, is that to say that UEFI machines are still out of scope (as the iso does not support uefi, unless newer versions do) ?

zehicle
2018-11-09 17:30
@nistor what about pulling from github?

zehicle
2018-11-09 17:30
That's a @greg question

nistor
2018-11-09 17:30
github doesn't support v6

nistor
2018-11-09 17:30
:confused:

zehicle
2018-11-09 17:30
Bleh.

nistor
2018-11-09 17:31
last I checked I was in 2018 :neutral_face: damn github, damn themmmmmmm

nistor
2018-11-09 17:31
in any case im grabbing what i need and just jamming it onto the local box

zehicle
2018-11-09 17:31
rattles cup for v6 repo sponsor

zdunn
2018-11-09 17:32
@zehicle would that get us a discount? :smile:

zehicle
2018-11-09 17:32
Naming rights?

zdunn
2018-11-09 17:32
haha

shane
2018-11-09 17:34
`get-sponsored-by-optoro.rebar.digital` ....

zdunn
2018-11-09 17:35
haha

zdunn
2018-11-09 17:35
well if you know your req's (bandwidth wise) we could certainly talk

zdunn
2018-11-09 17:35
we like to support OSS as much as possible

zdunn
2018-11-09 17:35
we already support rubygems etc

shane
2018-11-09 17:38
@nistor - actually - IPv6 may work for you for the `get` .... try: 2604:1380:2:c000::3

greg
2018-11-09 18:05
@christian.tardif you can boot sledgehammer then Kexec for live Coreos and the install. What I don't know is if the disk layout supports uefi

shane
2018-11-09 18:19
@nistor - I just tested a `wget` call to get the stable installer script - works fine on IPv6: `wget -O install.sh 'http://[2604:1380:2:c000::3]/stable'`
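
The bracket syntax is the part that is easy to get wrong: an IPv6 literal in a URL has to be wrapped in `[]`. A tiny sketch, with a hypothetical helper name `installer_url`, that builds the installer URL for either a hostname or an IPv6 literal:

```shell
# Sketch: build the installer URL for a host and channel (stable or tip).
# Any host containing ":" is treated as an IPv6 literal and bracketed.
installer_url() {
  local host="$1" channel="${2:-stable}"
  case "$host" in
    *:*) printf 'http://[%s]/%s\n' "$host" "$channel" ;;
    *)   printf 'http://%s/%s\n' "$host" "$channel" ;;
  esac
}
```

It could be combined with the command above as `wget -O install.sh "$(installer_url 2604:1380:2:c000::3)"`.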

christian.tardif
2018-11-09 18:40
@greg The disk layout itself should not have problems. What we performed so far is pushing the CoreOS image (the same image CoreOS/RedHat/IBM (!!!) is using through the coreos-install script (in fact, the uefi incompatibility comes from this install script)) directly to the UEFI-enabled server's boot disk, and then adjusting grub accordingly with the new "install". Will try to find myself an available unused setup to validate that and will post my findings

greg
2018-11-09 18:51
Okay. That is kinda what I meant by disk layout. Didn't know what bootloader and other positioning things they put in the way. One of the many reasons I don't actually find Coreos viable

nistor
2018-11-09 19:00
@shane, cool thanks, works from here, although it's the same file with the same issues of no v6 on the repos but that's ok. Are there plans to change the install to validate that a file exists and matches a checksum file accordingly before attempting to download a file? I'm finding I have to edit the install script to check for the file first before trying a DL to skip the DL process. eg: dr-provision.zip (which I know I can pass with the --zip-file) but drp-community-content.yaml isn't checked for before a DL, or the SHA file. --skip-run-check doesn't tie into those files.

shane
2018-11-09 19:02
I didn't have any such plans- but if you have any patches you'd like to submit - we'd be delighted to incorporate them ... :slightly_smiling_face:

nistor
2018-11-09 19:03
coolio ok, ill send over my findings when i'm done :laughing:

nistor
2018-11-09 19:08
@shane, in reading the docs at https://portal.rackn.io/#/ it refers to the tip file there however comparing the /stable and /tip scripts the only difference is this:
```
+ if [[ -e $binpath/drpjoin ]] ; then
+     ln -s $binpath/drpjoin drpjoin
+ fi
```
Where does the better v6 handling come in ? I guess it has to do with only the version tree not the actual installer/install process by the looks of it, ya?

shane
2018-11-09 19:15
@nistor the IPv6 betterness is in the drp-community-content references to Sledgehammer BootEnv - and Sledgehammer itself which has been enhanced to operate in IPv6 only environments. The installer difference you see is not related to IPv6 in that respect. The `drpjoin` is a new tool to enroll VMs and Containers as an asset that can be managed by DRP workflow.

nistor
2018-11-09 19:15
Ahhhh ok

nistor
2018-11-09 19:15
perfect

shane
2018-11-09 19:15
when you install drp-community-content you need to install the `tip` version with the `tip` DRP endpoint/software

shane
2018-11-09 19:16
that pulls in the `tip` referenced Sledgehammer image with the updates

zehicle
2018-11-09 20:16
@nistor @shane get6.rebar.digital now AAAA resolves in the DNS. You can use it instead of the actual address

josh.knarr
2018-11-10 02:33
has joined #community201811

shane
2018-11-10 15:28
@josh.knarr... $welcome ...

2018-11-10 15:28
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

josh.knarr
2018-11-10 15:29
morning

stobias123
2018-11-10 22:02
Hey guys, something up with the rackn site?

stobias123
2018-11-10 22:03
http://l8istsh9y.com looks like it's still owned by rob but times out

zehicle
2018-11-10 22:18
Yes. Working on that.

zehicle
2018-11-10 22:18
Changing hosts and dns

zehicle
2018-11-10 23:09
@stobias123 thanks for alerting us

zehicle
2018-11-11 03:17
notes that we quietly passed the 250 member mark last week! Welcome to all the new people!

christian.tardif
2018-11-12 13:30
@greg Regarding CoreOS thing.... I'll try to have a chat on it with RedHat folks. As they told me, couple of weeks ago, that CoreOS will be behind their OpenShift / OKD solutions, that would mean an increased support for newer hardware, including UEFI.

zehicle
2018-11-12 14:14
Up to you. We still think sledgehammer is a simple alternative with more support via centos

zehicle
2018-11-12 14:15
Coreos was a community request

zdunn
2018-11-12 14:51
are people generally using rebar to get servers to a good spot before running something like chef/ansible (perhaps as a stage)?

zdunn
2018-11-12 14:51
I guess I am trying to decide where I want configuration logic to live

zdunn
2018-11-12 14:52
bonds, drive layout, ntp, etc

zdunn
2018-11-12 14:52
some I could see being in an image (ntp, security updates, etc)

zehicle
2018-11-12 14:58
If you have working script, use them. We're finding that the stage/task model and content packs are easier to maintain than playbooks and cookbooks

zehicle
2018-11-12 14:59
Esp for physical data where it's baked into drp

zdunn
2018-11-12 15:10
We've been handling all of those configurations via Triton and its tools

zdunn
2018-11-12 15:11
So this would probably be new work

shane
2018-11-12 15:29
@zdunn - we see adopters of DRP take both paths. Generally speaking, you gain the most from the product with deeper integration by use of Workflows to allow you to orchestrate and manage large fleets of systems. In particular - managing the hardware config - BIOS, Firmware, RAID, setting things like network config / bonds, etc ... are all good use cases. How much you choose to hand off to an external config mgmt tooling varies depending on how much you've invested in that already, versus the desire to use a single tool.

zdunn
2018-11-12 15:44
Generally, I am trying to figure a path to best treat our hardware more like cattle

zdunn
2018-11-12 15:46
right now what hurts us the most is the lack of a good scheduler* and a lot of gotchas with running very linuxy flavored applications in LX Zones
* = right now our scheduler won't restart containers onto new nodes. So we are exploring k8s a bit there

zdunn
2018-11-12 15:47
All of the hardware management has been handled via Triton up to this point so we are used to one tool to handle just about everything at that level

zehicle
2018-11-12 16:57
interesting note that proto-genesis of this work was Greg and Rob trying to get early Triton (aka Joyent Data Center) to install

rafael.skodlar
2018-11-12 17:13
has joined #community201811

shane
2018-11-12 17:25
@rafael.skodlar... $welcome ...

2018-11-12 17:25
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-11-13 16:41
I heard from @greg this morning that KRIB can now use CoreOS live boot in tip. It shaves about 2 minutes (the docker install) from install. So, that puts it in the 5 minute range

zehicle
2018-11-13 16:42
also, pinned all the versions so that things stop breaking from external default versions changing

nistor
2018-11-13 16:53
hi team, I'm noticing that the install script calls 7z as the zip program, however on CentOS it doesn't check to see if the binary is there. Well, I mean it sort of does: it checks for '7z', however '7za' seems to be the binary itself. It is not called anywhere else in the script, though; the script seems to use bsdtar instead to handle ZIP files.

nistor
2018-11-13 16:54
second issue, I constantly see travis, who the heck is travis :laughing: ``` dr-provision2018/11/13 15:32:02.553994 [52:14]user:rocketskates:frontend [error]: /home/travis/gopath/src/github.com/digitalrebar/provision/backend/bootenv.go:460 [52:14]BootEnv discovery : Explode ISO: Iso sledgehammer-9b5276ac5826520829aa73c149fe672fe2363656.arm64.tar does not exist. Will not be able to PXE boot arch arm64 ```

greg
2018-11-13 17:11
@nistor - 7z is used by DRP itself and needs to be installed for proper functioning. It is not needed by the install script.

nistor
2018-11-13 17:11
Ah ok

nistor
2018-11-13 17:12
what about the 7z vs 7za issue?

nistor
2018-11-13 17:12
```
root@drp /home/nistor/drp-tip: 7z --help
bash: 7z: command not found
root@drp /home/nistor/drp-tip:
root@drp /home/nistor/drp-tip: 7za --help
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_CA.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz (206D2),ASM,AES-NI)
```

greg
2018-11-13 17:13
it just means that it runs the yum command twice and does nothing.

nistor
2018-11-13 17:13
but if the DRP itself needs it then what is not functioning in my install ?

greg
2018-11-13 17:13
that question doesn't make sense.

nistor
2018-11-13 17:14
"7z is used by DRP itself and needs to be installed for proper functioning." <-- what does 7z get used for by the DRP itself?

nistor
2018-11-13 17:14
because the binary 7z doesn't exist, 7za does, so it would seem that some function may not be working properly internally to DRP

greg
2018-11-13 17:14
exploding windows encoded bootenv isos.

greg
2018-11-13 17:15
It will likely never be an issue for you. And I'm checking the script/drp code that calls it.

greg
2018-11-13 17:15
It may only need the library

nistor
2018-11-13 17:15
ah gotcha, ok cool.

greg
2018-11-13 17:16
Yeah - so - good catch - that will fail, but since that bootenv iso won't work at all, it isn't really an issue.

greg
2018-11-13 17:16
Does point out that we should drop the 7z requirement.

nistor
2018-11-13 17:17
coolio

bryan.gallant
2018-11-13 19:43
Hi all! Quick question, is there a simple way to add an alternate version of pxelinux to the system? For instance, ESXi w/ legacy BIOS boots requires syslinux 3.86. It works if I replace the lpxelinux.0 w/ the older version, but I'd like a less disruptive method if possible.

zehicle
2018-11-13 20:02
I think you can upload the new and clone/create a new bootenv that points to it



zehicle
2018-11-13 20:53
I think that we've done some of this work before in community. you should check the history.

john
2018-11-13 23:56
has joined #community201811

zehicle
2018-11-14 13:58
@john $welcome

2018-11-14 13:58
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

john
2018-11-14 13:59
Thanks Rob. I've been working my way through the DRP paces and figured I'd join the community

tom.gillman
2018-11-14 14:42
welcome @john

john
2018-11-14 14:48
Is there a way to run a stage or task from the DRP server? Like kicking off a post os-install ansible run?

zehicle
2018-11-14 14:50
Glad to have you here! have you seen the colordemo yet?

john
2018-11-14 14:52
the one where they changed the color metadata?

shane
2018-11-14 14:53
Yes - the colordemo is an example content pack that shows how to build Workflows to do "things"

john
2018-11-14 14:54
I did, I think but I'll go back and check it again.

shane
2018-11-14 14:54
One of those things can be to run Cfg Mgmt tooling - Ansible local execution - or make a call out to Tower to do it, etc

shane
2018-11-14 14:55
one point to remember - Workflow executes on the machine being provisioned, in the Sledgehammer (discovery) environment

shane
2018-11-14 14:55
so any calls/cmds occur from the Machine in question

john
2018-11-14 14:56
ahh yes.. so that's my question :smile:

shane
2018-11-14 14:56
Plugins, on the other hand - execute on the DRP Endpoint side, on behalf of the machine - so depending on your security model around your provisioned machines - that may matter how/what you build for custom content

john
2018-11-14 14:57
ok, so then for post-install "not on the machine" I will dig into plugins

john
2018-11-14 14:58
I've got some scripts that will do things like identify a node by its serial, then "enrich" by attaching a profile, renaming the machine, creating a dhcp reservation, etc.

shane
2018-11-14 14:59
do you have those in a "content pack" - or are you looking to build them in to one ?

john
2018-11-14 14:59
eventually in a content pack. right now it's adhoc in my virtualbox testing environment

shane
2018-11-14 15:00
cool - colordemo should be a good template for you on how to build that in to a content pack

shane
2018-11-14 15:01
also - there is an Ansible content pack in DRP - which allows for running playbooks on Machines

greg
2018-11-14 15:02
There are also classification and inventory content packs that handle a lot of what your script is doing. There is also an IPMI plugin that configures and drives IPMI actions.

john
2018-11-14 15:07
can you point me to some of those packs (classification and inventory) ?

john
2018-11-14 15:08
my identify is pretty simple and runs as a stage in discovery workflow

john
2018-11-14 15:08
```
SERIAL=$(drpcli machines get {{ .Machine.UUID }} param gohai-inventory | jq -r '.DMI.System.SerialNumber')
PROFILE="devices_${SERIAL}"
drpcli machines addprofile ${RS_UUID} ${PROFILE}
```

john
2018-11-14 15:08
the enrich is a bit more complex because of parsing the mac out to make a reservation

john
2018-11-14 15:09
(and I don't know how to make a reservation token on the machine just yet - so I do this from another location)

greg
2018-11-14 15:10
That is pretty clean.

greg
2018-11-14 15:10
They are RackN content. you can get access to them on a trial basis through the portal.

shane
2018-11-14 15:14
@john - here's a video Rob did on the Inventory content piece

shane
2018-11-14 15:14

shane
2018-11-14 15:15

zehicle
2018-11-14 15:20
@john @tom.gillman we/RackN have been doing a lot of development integration here. So our content is battle tested and also integrated to our other content beyond the basic DRP.

john
2018-11-14 15:22
Sounds good. Now that I'm a week into it, I'll go a bit more into the demos and setup a trial soon. I wanted to get a handle on how DRP works first.

zehicle
2018-11-14 15:25
Inventory stage does what you asked. Uses same approach and adds some verification checks

john
2018-11-14 19:58
one other question.. is there a clear cut way to tell when a workflow is done? Best I've found is to grab a machine, then see if CurrentTask is > len(Tasks)

john
2018-11-14 19:59
```
root@provisioner:~/drp# drpcli machines show $RS_UUID | jq 'del(.Params)' > active.json
root@provisioner:~/drp# diff active.json done.json
5,6c5,6
<   "CurrentJob": "33f5db70-da8e-4d17-bd66-acfffd749b9f",
<   "CurrentTask": 4,
---
>   "CurrentJob": "b0dd3844-49bd-4e08-b28d-0daa28eaff77",
>   "CurrentTask": 5,
root@provisioner:~/drp# cat active.json | jq '.Tasks'
[
  "stage:helloworld",
  "bootenv:sledgehammer",
  "helloworld",
  "stage:sleepstate",
  "sleep100"
]
```

greg
2018-11-14 20:03
@john - The official way to know if no work is left to do is the `CurrentTask >= len(Tasks)` check. With regard to workflow completion, I generally prefer having a completion stage that does nothing but be the end of the workflow.
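
The `CurrentTask >= len(Tasks)` check can be sketched with `jq` on a machine object. The function name `workflow_done` is hypothetical; it reads the JSON on stdin and relies only on the `CurrentTask` and `Tasks` fields shown in the output above.

```shell
# Sketch: exit 0 when the machine JSON on stdin has no tasks left to run,
# i.e. CurrentTask >= len(Tasks); exit non-zero otherwise.
workflow_done() {
  jq -e '.CurrentTask >= (.Tasks | length)' >/dev/null
}
```

Usage would be along the lines of `drpcli machines show "$RS_UUID" | workflow_done && echo "no work left"`. Because the task list can change while a workflow runs, this is a point-in-time answer only.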

greg
2018-11-14 20:03
Then I can wait on machine events for the machine's stage to get to complete.

john
2018-11-14 20:03
:+1:

greg
2018-11-14 20:05
something like this: `drpcli machines wait 4f0a44b1-a789-4ba7-a6fc-61541b996b8d Stage sledgehammer-wait`
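
Wrapped as a function, the same wait could look like the sketch below. The name `wait_for_stage` and the `DRPCLI` override are hypothetical; the only call is the `drpcli machines wait <uuid> Stage <stage>` form from the message above, and the end stage name is whatever completion stage the workflow defines.

```shell
# Sketch: block until the machine reaches the given (end-cap) stage.
wait_for_stage() {
  local drpcli="${DRPCLI:-drpcli}" uuid="$1" stage="$2"
  "$drpcli" machines wait "$uuid" Stage "$stage"
}
```

For example, `wait_for_stage "$RS_UUID" sledgehammer-wait` would return once the machine's Stage field matches.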

john
2018-11-14 20:05
I had been putting noop "workflow-done" stages at the end of some workflows. :slightly_smiling_face:

greg
2018-11-14 20:06
Most of the RackN workflows for kubernetes and the like have "bookend" stages to bracket the workflows.

greg
2018-11-14 20:07
The start stage is useful as well because it allows you to ensure proper bootenv without forcing middle stages to care about bootenv.

john
2018-11-14 20:08
Cool. I was starting to put those stages in at the end, then I wondered if I was missing something. But now I know `CurrentTask >= len(Tasks)`, and that adding a noop stage is a recommended practice.

john
2018-11-14 20:08
Thank you.

greg
2018-11-14 20:09
The challenge with the CurrentTask variable and len(Tasks) is that Tasks can change.

greg
2018-11-14 20:11
If you use the RackN hardware components, they dynamically alter the task list on the fly to inject tasks for bios, raid, and ipmi configuration based upon the presence of adapters and other information. This means that it can grow over time. While jq and getting the machine object repeatedly works, the "end-cap" stage allows you to set up an event waiter that has a known and easy to validate target.

john
2018-11-14 20:12
understood. I am definitely seeing a waitforstage(uuid, workflow, stagename, timeout) type function

greg
2018-11-14 20:13
The golang api library already has one with a large ability to do complex ops with it. the drpcli has a simpler version of it exposed.

greg
2018-11-14 20:14
The "runner" uses it to figure out if the machine has work to do. This means that instead of poll looping on the machine object, the runner actually waits for machine changes and gets notified by events to wake up. Allowing for huge scale.

f.hufenreuter
2018-11-15 17:26
has joined #community201811

anthony.lincoln
2018-11-15 21:49
has joined #community201811

shane
2018-11-15 21:53
@f.hufenreuter and @anthony.lincoln - $welcome

2018-11-15 21:53
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

f.hufenreuter
2018-11-16 08:56
hey everybody

zehicle
2018-11-16 12:20
Hello!

andrew
2018-11-16 18:47
has joined #community201811

andrew
2018-11-16 19:26
Hello all, I'm attempting to setup a bootenv for rpi in DRP based off this http://web-docs.gsi.de/~bloeher/howto/rpi3_netboot.html . Where can I tail the DHCP server logs? Doesn't seem to be outputting into /var/log anywhere.

zehicle
2018-11-16 19:42
Journalctl and there is a /logs api too

zehicle
2018-11-16 19:42
You can also run interactive, which logs to console

zehicle
2018-11-16 19:43
@andrew ^^ and $welcome

2018-11-16 19:43
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-11-16 23:39
just did a pull to add http://KubeVirt.io stage to KRIB workflow! video to come soon but it will be short and simple

zehicle
2018-11-16 23:40
since it's just a "drop and works" stage

zehicle
2018-11-16 23:41
@andrew I'm super curious about your use case! Apples & Pis?

zehicle
2018-11-18 22:46
KubeVirt + Kubernetes = VMs! Here's a demo showing the addition of http://Kubevirt.io to KRIB https://youtu.be/Qi4unF9Q8h0

marc.padovani
2018-11-19 15:29
has joined #community201811

zehicle
2018-11-19 15:32
@marc.padovani $welcome

2018-11-19 15:32
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

marc.padovani
2018-11-19 15:50
Howdy

zehicle
2018-11-19 16:04
REMINDER - we have a meetup tomorrow! Lots and lots of exciting topics as per https://www.meetup.com/digitalrebar/events/lchdhpyxpbjb/

shane
2018-11-19 16:12
Except it's Tues. Nov 20th ... not the 6th ... :slightly_smiling_face:

ivailo.shankov
2018-11-19 16:23
has joined #community201811

ivailo.shankov
2018-11-19 16:25
Good morning everybody!

shane
2018-11-19 16:51
Good morning, @ivailo.shankov, $welcome

2018-11-19 16:51
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

nistor
2018-11-19 18:48
quick question! I have 2 machines booting on the same LAN with sledge hammer, however only 1 machine seems to register under 'machines' in the DRP -- <shrug>

nistor
2018-11-19 18:49
(clarity: either machine, not both)

zehicle
2018-11-19 18:53
@nistor are the MACs unique?

nistor
2018-11-19 18:54
they should be, let me double check

nistor
2018-11-19 18:54
```
? 100 984b.e16c.9f02 dynamic 0 F F Eth1/11
? 100 984b.e16d.2050 dynamic 0 F F Eth1/13
```

nistor
2018-11-19 18:54
yup

zehicle
2018-11-19 18:55
Are you using drp for DHCP? Check the reservations?

nistor
2018-11-19 18:55
I am using drp yes, both systems are receiving IP addresses

nistor
2018-11-19 18:55
I didn't set up any reservations at all, left it blank since I didn't want to work with any static options

zehicle
2018-11-19 18:56
Sorry, I meant leases

nistor
2018-11-19 18:56
I show 4 leases in the system, I see the 2 machines but only the 1 shows up in 'machines'

shane
2018-11-19 18:56
how big is your Subnet definition for that network ?

nistor
2018-11-19 18:56
. /24

nistor
2018-11-19 18:57
at the prompt of sledgehammer one says '204' and the other says 'localhost' as if it never got anything. the 204 shows up in the machines list.

nistor
2018-11-19 18:57
im just rebooting both again right now to see if I can spot anything different on the startup of the sledge hammer image running

greg
2018-11-19 18:58
Can you log into the console of sledgehammer? Run "journalctl -u sledgehammer" to see what happened on the one that didn't register.

nistor
2018-11-19 18:58
sure, will do once they're back up

shane
2018-11-19 19:03
(user/pass for sledgehammer is: `root`/`rebar1`)

nistor
2018-11-19 19:04
what if I told you on the systems reporting 'localhost' I can't login to it

shane
2018-11-19 19:05
the sledgehammer image usually won't report "localhost" for the name - so ... are you sure sledgehammer is booting there ? does the `/etc/issue` output show "sledgehammer" in the signature ?

nistor
2018-11-19 19:06
the login banner shows 'Digital rebar sledgehammer <string here> kernel 3.10.0-062...'

shane
2018-11-19 19:06
ok - if you hit enter on console, localhost should change to something like `d-xx-xx-xx-xx-xx-xx` (MAC address w/ `d` in front) - unless you've named the Machines in which case it should pick up the new machine name

nistor
2018-11-19 19:07
thats the thing, it's not changing, the other one did to '204', not even the mac. I guess that's just the setup of the DHCP options

shane
2018-11-19 19:08
can you copy your subnet config? `drpcli subnets show <NAME> --format=yaml` (replace <NAME> appropriately)

shane
2018-11-19 19:09
feel free to DM it to me if you need to for IP anonymity/whatever

nistor
2018-11-19 19:10
nothing private here

nistor
2018-11-19 19:10
```
root@drp /opt/drp: ./drpcli subnets show ens256 --format=yaml
ActiveEnd: 10.255.200.199
ActiveLeaseTime: 60
ActiveStart: 10.255.200.101
Available: true
Description: ""
Documentation: ""
Enabled: true
Errors: []
Meta: {}
Name: ens256
NextServer: ""
OnlyReservations: false
Options:
- Code: 3
  Value: 10.255.200.1
- Code: 6
  Value: 204.11.48.2
- Code: 15
  Value: http://snickers.org
- Code: 1
  Value: 255.255.255.0
- Code: 28
  Value: 10.255.200.255
Pickers:
- hint
- nextFree
- mostExpired
Proxy: false
ReadOnly: false
ReservedLeaseTime: 7200
Strategy: MAC
Subnet: 10.255.200.254/24
Unmanaged: false
Validated: true
root@drp /opt/drp:
```

nistor
2018-11-19 19:11
shouldn't the subnet part be .0/24 ? (I left it default on what the server picked)

bagricola
2018-11-19 19:16
well it's a /24 so the whole last octet would be masked off anyway, it's a bit of a weird way to represent it though and I've certainly never seen DRP do that myself :smile:

shane
2018-11-19 19:16
is `.254` your DRP address ?

nistor
2018-11-19 19:16
yes

shane
2018-11-19 19:18
yeah - weird - my subnets show proper network number/subnet (`10.10.10.0/24`) notation

nistor
2018-11-19 19:19
let me change that and see if it makes a difference in the system

shane
2018-11-19 19:19
hmm - actually one generated from a real interface shows same as yours

shane
2018-11-19 19:19
`"Subnet": "10.100.16.23/24"`

nistor
2018-11-19 19:19
ya, weird right

shane
2018-11-19 19:19
it must be how our Web UX is building up the subnet spec - it does a lot of "helper" stuff to make it easier to auto-generate a Subnet spec

shane
2018-11-19 19:20
probably a red-herring, as I suspect it's not causing the problem
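For reference, a minimal sketch of why `10.255.200.254/24` and `10.255.200.0/24` describe the same network: masking off the host bits of either one gives the same network address. The function names are illustrative and not part of drpcli, and the code assumes a shell with 64-bit arithmetic.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the network address for a CIDR string such as 10.255.200.254/24.
network_of() (
  addr=${1%/*}
  prefix=${1#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  int_to_ip $(( $(ip_to_int "$addr") & mask ))
)

network_of 10.255.200.254/24   # prints 10.255.200.0
network_of 10.255.200.0/24     # prints 10.255.200.0
```

Both calls print the same network address, which fits the suspicion that the notation is only cosmetic.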

nistor
2018-11-19 19:20
why wouldn't I be able to login to the sledgehammer image if it can't determine its state and returns localhost?

shane
2018-11-19 19:21
you should be able to log in to sledgehammer regardless ...

tom.gillman
2018-11-19 19:21
How did you get a .204 with the Active Start and Active End not including that value?

nistor
2018-11-19 19:21
the systems' IP is 102

nistor
2018-11-19 19:21
the 'name' of it is 204

nistor
2018-11-19 19:21
the description reads: .11.50.158

shane
2018-11-19 19:21
jeesh ...

nistor
2018-11-19 19:22
so its getting it because it's going out via NAT is my guess (?)

nistor
2018-11-19 19:22
on the external side, these servers are all internal, but use NAT for public outbound

nistor
2018-11-19 19:26
ok I managed to get logged in, the HP iLO appears to change text to uppercase after the login prompt for some reason :confused:

nistor
2018-11-19 19:27
journalctl -x shows the last 50 lines reading the attached.

nistor
2018-11-19 19:28
want the entire file ?

shane
2018-11-19 19:28
ok - I have to step out for a bit - back around 1pm PST - if you can copy the output in to a Snippet here, that would be helpful

shane
2018-11-19 19:29
hopefully one of the other guys will be able to pick up with you in a short bit

nistor
2018-11-19 19:29
coolio

nistor
2018-11-19 19:32
journalctl -x output

nistor
2018-11-19 19:32
there's the entire output for anyone :laughing:

greg
2018-11-19 19:44
@nistor - you have PTR records that map to illegal names.

greg
2018-11-19 19:44
204.11.50.158 is not a valid hostname

greg
2018-11-19 19:45
The `getent hosts 10.255.220.101` is returning `204.11.50.158` as a hostname.

nistor
2018-11-19 19:47
.... does that break the entire thing?

nistor
2018-11-19 19:47
10.255.220.101 nats out to 204.11.50.158

greg
2018-11-19 19:48
Well - you are trying to set the machine's name to `204.11.50.158` which isn't allowed in DRP.

greg
2018-11-19 19:48
It has nothing to do with NAT, it appears. It is how your DNS is set up, I think.

nistor
2018-11-19 19:48
hrm how am I trying to set that, via the DHCP /subnet options?

tom.gillman
2018-11-19 19:49
DHCP Option 6

greg
2018-11-19 19:49
The DHCP client in sledgehammer gets an address, but no HOSTNAME option. So, sledgehammer attempts to do a reverse lookup through the getent command.

greg
2018-11-19 19:49
For one machine, it doesn't return anything, so we set it to the default made up name.

greg
2018-11-19 19:49
The other machine does, so we attempt to reuse it.
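A minimal sketch of the name selection described above, with one hypothetical addition: a guard that rejects a reverse-lookup result that is itself an IP literal. This is not Sledgehammer's actual code (as described, it reuses whatever the lookup returns), and the function names and the `d` + MAC fallback format are assumptions for illustration.

```shell
# Return success if the argument looks like a dotted-quad IPv4 address.
is_ipv4_literal() {
  case $1 in
    *[!0-9.]*|*..*|.*|*.|"") return 1 ;;
  esac
  ( IFS=.; set -- $1; [ $# -eq 4 ] )
}

# Pick a machine name: keep the reverse-lookup result unless it is empty
# or an IP literal; otherwise fall back to "d" + MAC with dashes
# (hypothetical fallback format).
pick_name() {
  looked_up=$1
  mac=$2
  if [ -n "$looked_up" ] && ! is_ipv4_literal "$looked_up"; then
    echo "$looked_up"
  else
    echo "d$(echo "$mac" | tr ':' '-')"
  fi
}

pick_name "204.11.50.158" "98:4b:e1:6c:9f:02"      # prints d98-4b-e1-6c-9f-02
pick_name "" "98:4b:e1:6c:9f:02"                   # prints d98-4b-e1-6c-9f-02
pick_name "node1.example.com" "98:4b:e1:6c:9f:02"  # prints node1.example.com
```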

nistor
2018-11-19 19:50
DHCP option 6 is DNS server

nistor
2018-11-19 19:50
so if it queries the DNS server and gets an IP back the whole provisioning breaks

nistor
2018-11-19 19:50
is what Im sort of reading here

greg
2018-11-19 19:50
Yes - so sledgehammer sets that in place and then does the reverse lookup.

vlowther
2018-11-19 19:50
well, we are asking the DNS server "what is the hostname for 10.255.220.101"

nistor
2018-11-19 19:50
is it possible to never use DNS as the option for machine?

vlowther
2018-11-19 19:50
and the DNS server is answering 204.11.50.158

vlowther
2018-11-19 19:51
which is weird.

nistor
2018-11-19 19:52
the NAT setup today was just a hack I had to put in to get outside connectivity so I suppose this isn't a normal setup

tom.gillman
2018-11-19 19:52
dots in a dns name have a fairly specific meaning. If you're trying to send an IP as a hostname, you're in for all kinds of heartache

nistor
2018-11-19 19:52
but nonetheless if it breaks on that then I have to disable the DNS DHCP option 6

tom.gillman
2018-11-19 19:52
Not just in DRP, but pretty much everywhere.

greg
2018-11-19 19:53
right, but there is a way around this.

nistor
2018-11-19 19:53
I get that part, but I didnt expect the process to break because of DNS

nistor
2018-11-19 19:53
I thought the whole thing would have used MACs for identifiers, not DNS resolutions

greg
2018-11-19 19:53
It does use the MAC.

nistor
2018-11-19 19:54
but in the machines list it shows under name, not mac

greg
2018-11-19 19:54
You haven't told it a hostname, so it is trying to use DNS to guess a Hostname on your behalf and your system sends a bad one.

nistor
2018-11-19 19:54
and it appears from above that because its the same name its an issue

greg
2018-11-19 19:54
Yes, because there are two objects representing the machine.

greg
2018-11-19 19:54
One is the DHCP lease. That is by MAC.

greg
2018-11-19 19:54
The Machine is identified by UUID (completely separate from MAC).

nistor
2018-11-19 19:55
right

greg
2018-11-19 19:55
The machine has soft linking between mac addresses, names, and addresses.

greg
2018-11-19 19:55
You can create a reservation for this MAC to IP pair and set option 12 (hostname) in that reservation to get around your DNS issues.
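A rough sketch of what such a reservation could look like. The field names (`Addr`, `Token`, `Strategy`, `Options`) are assumed from the DRP Reservation model and should be checked against your version; the IP, MAC, and hostname below are placeholders, not values confirmed for either machine in this thread.

```shell
# Print a Reservation object as JSON.
# Arguments: IP address, MAC address, hostname (all illustrative).
# Field names are an assumption about the DRP Reservation model.
reservation_json() {
  cat <<EOF
{
  "Addr": "$1",
  "Token": "$2",
  "Strategy": "MAC",
  "Options": [
    { "Code": 12, "Value": "$3" }
  ]
}
EOF
}

# Placeholder values only.
reservation_json 10.255.200.102 98:4b:e1:6c:9f:02 node-102

# To apply it (assumes drpcli is configured for your endpoint):
#   drpcli reservations create "$(reservation_json 10.255.200.102 98:4b:e1:6c:9f:02 node-102)"
```

With a hostname delivered through option 12, the client should not need the reverse lookup to pick a name, while option 6 can stay in the subnet for installs that need DNS.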

nistor
2018-11-19 19:56
can I just disable DHCP option 6 in the subnet listing as a workaround as well?

greg
2018-11-19 19:56
Yes

nistor
2018-11-19 19:56
ok cool let me do that then and reboot this one box

nistor
2018-11-19 19:57
my ultimate goal is to never have to use DNS for boot strapping a machine until the OS itself is online

tom.gillman
2018-11-19 19:58
then you shouldn't really hand it out in the dhcp request. :slightly_smiling_face:

nistor
2018-11-19 19:59
true, and i've removed it

nistor
2018-11-19 20:00
I guess as a user I'm surprised that the process fails with the use of DNS in it, I would have expected less dependency on DNS and in this situation 10.255.x.x should never actually resolve to anything, there's no zone file for it, so 204.11.50.158 is the IP it's getting via some other means, not DNS.

nistor
2018-11-19 20:00
as that 50.158 is the NAT outside IP of the fw instance

nistor
2018-11-19 20:00
removing DHCP Option 6 solved the issue, systems report 'localhost' however in the machines list I see them all now with the mac address, perfect!

greg
2018-11-19 20:02
The machine list shows localhost?

nistor
2018-11-19 20:02
one says localhost the other shows dxx-xx-xx

nistor
2018-11-19 20:02
it took some time to show so giving it another 10-20 s

nistor
2018-11-19 20:02
the machines list has the right info

nistor
2018-11-19 20:02
and there it goes, second one now has the right prompt, excellent

nistor
2018-11-19 20:03
the description field it picks, should that also be the "domain name" ?

greg
2018-11-19 20:04
Since you haven't set one, it does that.

nistor
2018-11-19 20:04
ok cool

nistor
2018-11-19 20:05
now onto more testing :slightly_smiling_face: thnx!

nistor
2018-11-19 21:01
hrm, so if I remove the DNS Option 6, I can't actually do an install of an OS, at least not CentOS that has to go out to the repo since it can't determine any DNS servers. Maybe reservation is my only option.

greg
2018-11-19 21:02
For that one machine, it is likely.

nistor
2018-11-19 21:15
Are workflows stored on the local DRP endpoint system or somehow centralized? Just thinking if I had 125 sites how would I best push the workflow to all of them, or have them constantly in sync without using something like chef or puppet.

john
2018-11-19 21:19
I see that for sledgehammer, control.sh runs `drpcli machines processjobs {{.MachineUUID}}`. This, I assume, creates an EventStream websocket. All good. What I've been seeing is that if I create a reservation and the machine changes its ip, that breaks the websocket and it can't process any more jobs. At this point I have to reboot the machine OR ssh in and run my own instance of `drpcli machines processjobs $RS_UUID`.

greg
2018-11-19 21:21
@nistor - There are RackN road map items for that. Content packages are a simple start.

john
2018-11-19 21:21
I guess my question is if there is a way to force processjobs to re-establish connection on an ip change, or if that would be something of interest for a feature request?

greg
2018-11-19 21:22
@john - Feature / Bug Request.

greg
2018-11-19 21:22
If the connection drops, it should reestablish. The main thing is that the DRP Endpoint is still routable to the Client after the IP change.
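Until that is handled by the runner itself, a generic retry wrapper is one possible stopgap. This is only a sketch: `retry` is a hypothetical helper, and it assumes the wrapped command exits non-zero when its connection breaks, which has not been confirmed for `processjobs`.

```shell
# Re-run a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    # Give up once the attempt limit has been reached.
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Harmless demonstration with a command that succeeds immediately.
retry 3 0 true && echo "command succeeded"

# Possible use on a machine (assumes RS_UUID, RS_ENDPOINT and RS_TOKEN are set):
#   retry 10 5 drpcli machines processjobs "$RS_UUID"
```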

nistor
2018-11-19 21:23
@greg, ok cool! thnx

b.quan
2018-11-20 00:06
Is there a way to change the machine state to go back to first in the task list instead of reading from the previous installation? We ran into an issue that even after we change the machine stage and retry deploying the machine, DRP still reads the information from a previous installation failure.

rvakkalagadda
2018-11-20 00:07
has joined #community201811

greg
2018-11-20 00:23
@b.quan - Yes, but if you need it, it usually means that you are not doing something quite right.

greg
2018-11-20 00:23
So, for example, if you set a machine stage to the current stage, nothing changes.

greg
2018-11-20 00:24
Or you didn't restart correctly, or change bootenvs, or do something else to make the system think that something has changed to warrant starting over.

greg
2018-11-20 00:24
With that said,

greg
2018-11-20 00:25
You can do this: `drpcli machines update <uuid> '{ "CurrentTask": -1 }' --force`

greg
2018-11-20 00:25
This will force the current task backwards and start over.

greg
2018-11-20 00:25
It should even wake up a runner.

zehicle
2018-11-20 00:28
@rvakkalagadda $welcome

2018-11-20 00:28
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

b.quan
2018-11-20 03:42
Thank you Greg! We'll give it a try.

martin.olsson
2018-11-20 13:05
has joined #community201811

martin.olsson
2018-11-20 13:10
hello everyone :slightly_smiling_face:

zehicle
2018-11-20 14:20
@martin.olsson $welcome

2018-11-20 14:20
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

martin.olsson
2018-11-20 14:39
I'm currently trying out digital rebar, which I find to be a very interesting tool :slightly_smiling_face:. I've been following the quickstart and reading other digital rebar docs as well. I haven't found any documentation regarding my problem, so I'm asking here :slightly_smiling_face: I'm trying to install ubuntu 18.04 with a workflow, following the quickstart steps. The installation starts nicely (and also finishes), but afterwards the machine doesn't boot into the OS as the quickstart says it should: it goes into PXE boot, falls through to HDD boot, and says something like "no operating system found". Could this be a legacy / uefi bios issue? I'm thankful for any tips anyone may have.

tom.gillman
2018-11-20 15:06
That is most certainly a UEFI issue.

tom.gillman
2018-11-20 15:08
If the install isn't going to a hdd in the uefi boot list then you should probably use legacy boot. I ran into the same issue a few weeks ago.

shane
2018-11-20 15:31
@martin.olsson - by default the installer assumes `/dev/sda` as the target install disk. If your bootloader is booting "something else" as the default Boot Device, then post-install reboot will fail. You can boot the machine in to Sledgehammer and log in to the console (or add your SSH keys, and SSH to it) - and see what Sledgehammer identifies as the disk ordering and names (eg `lsblk`). You can specify an alternate Install Disk by setting the Param `operating-system-disk` to an alternate device (eg `sdb` - no preceding `/dev`)
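A small sketch of setting that Param from the command line. `disk_param_value` is a hypothetical helper that strips the `/dev/` prefix; the `drpcli machines set ... param ... to ...` form follows the usage shown elsewhere in this channel, and `MACHINE_UUID` is a placeholder.

```shell
# Strip a leading /dev/ so the value has the form the Param expects (e.g. sdb).
disk_param_value() {
  echo "${1#/dev/}"
}

disk=$(disk_param_value /dev/sdb)
echo "operating-system-disk -> $disk"   # prints: operating-system-disk -> sdb

# To set it on a machine (assumes drpcli is configured and MACHINE_UUID holds
# the target machine's UUID):
#   drpcli machines set "$MACHINE_UUID" param operating-system-disk to "$disk"
```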

shane
2018-11-20 15:32
if that is not the correct disk you want the OS installed/booted from, then you need to modify your boot order appropriately to ensure the BIOS/bootloader loads the correct drive for you

shane
2018-11-20 15:33
(there are RackN commercial plugins for manipulating/managing BIOS/Firmware as part of Workflow)

zehicle
2018-11-20 15:38
@martin.olsson if that works, please let us know and we'll add it to the FAQs

zehicle
2018-11-20 15:38
notes the FAQs are getting long enough to need re-organization

shane
2018-11-20 15:40
(it's on my list ... :slightly_smiling_face: )

zehicle
2018-11-20 15:54
reminder > community meeting in 3 hours

martin.olsson
2018-11-20 17:35
wow! thanks for all the tips guys :slightly_smiling_face: , i'll try them once I'm at the office again, and get back to ya!

zehicle
2018-11-20 23:12
Community Meeting Recording posted: https://youtu.be/2cXmM8OLX1o

zehicle
2018-11-20 23:12
We covered A LOT of new cool stuff including extensions to the plugin system and running the runner in a container.

martin.olsson
2018-11-21 14:46
@shane I'm having some trouble logging into the sledgehammer discovery environment. It says invalid password or user when using details from the docs below https://provision.readthedocs.io/en/tip/doc/configuring.html#rs-configuring-default - section 6.3 specifies a username and password which I interpret should be valid for sledgehammer env as well or did I understand it wrong? :slightly_smiling_face:

f.hufenreuter
2018-11-21 14:48
(user/pass for sledgehammer is: `root`/`rebar1`)

martin.olsson
2018-11-21 14:49
I'm in, thanks @f.hufenreuter :slightly_smiling_face:

shane
2018-11-21 14:50
@martin.olsson you can also add your SSH public key half as a Param on the machine (either directly, in the `global` profile, or a profile you attach to the machine) - then you can log in with SSH directly - see the $faq for details


shane
2018-11-21 14:52
see section 24.4 - you can also add keys via the Web portal too - of course

martin.olsson
2018-11-21 14:57
got ya!

toby.owen
2018-11-21 17:34
has joined #community201811

shane
2018-11-21 18:49
@toby.owen $welcome

2018-11-21 18:49
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-11-21 22:21
Happy Thanksgiving everyone! We'll be lurking on the channel but responses will be slower than usual due to the holiday.

zehicle
2018-11-21 22:21
:turkey:

robin.cordier
2018-11-22 18:39
has joined #community201811

zehicle
2018-11-22 19:35
@robin.cordier welcome

diego.oberlin
2018-11-23 09:35
has joined #community201811

martin.olsson
2018-11-23 12:25
@shane greetings again! Just a follow-up on the issue I asked about before. I have run "lsblk" in the sledgehammer env (also got ssh key and remote login to work as well, yay!) and the hard drive is identified as "sda" with underlying sda1, sda2 partitions etc (can supply structure if you wish). I've played around in the bios, setting/unsetting uefi/legacy, but haven't managed to get it to boot properly from the hard drive. Any other suggestions? :slightly_smiling_face:

martin.olsson
2018-11-23 13:30
I also had another problem with actually performing pxe booting against the server I'm testing digital rebar with. It worked fine a few times, then had issues, and then totally stopped working. I changed a lot of configurations, trying to pinpoint the problem. It turns out I had to enable a setting, and fill in some fields, in the unifi controller software to ensure compatibility with digital rebar. Thought you guys could add the following to your FAQ perhaps? :slightly_smiling_face:
Unifi network equipment (using unifi security gateway), digital rebar support in the Unifi software controller:
settings --> networks --> <select your network> --> advanced dhcp options
- tick "enable dhcp network boot"
- specify ip address of the DRP endpoint
- specify bootfile to use (my case: "lpxelinux.0")
I also had a digital rebar subnet with dhcp proxy settings (and "lpxelinux.0") with this config.

zehicle
2018-11-23 17:36
@martin.olsson if you have dhcp set correctly then you should not need to set the endpoint because of nextboot.

greg
2018-11-23 19:04
@martin.olsson I'm not sure lpxelinux.0 will work with uefi

shane
2018-11-25 01:11
@diego.oberlin $welcome

2018-11-25 01:11
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

diego.oberlin
2018-11-26 14:36
Hi @shane, greetings from a newbie to digital rebar. I'm currently testing it. I've done an --isolated setup, and completed all minimal configs. I've created a vm on our XenCenter and when booting that vm, I got the following message from drp:
```
Nov 26 15:13:53 n15-drp-001 dr-provision[3779]: dr-provision2018/11/26 14:13:53.313176 [3:1]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/pxe.go:74
Nov 26 15:13:53 n15-drp-001 dr-provision[3779]: [3:1]Incoming iPXE does not support comboot
```
Any idea what the problem might be? Thank you!

greg
2018-11-26 15:15
These are okay. They indicate that the ipxe agent isn't as full featured as drp expects. It should have sent a different ipxe and booted that.

greg
2018-11-26 15:16
@diego.oberlin it should have booted a new ipxe and continued. Make sure you set a default unknown bootenv

diego.oberlin
2018-11-26 15:18
oh... I see. I'll try that. Thanks @greg!

shane
2018-11-26 15:43
@diego.oberlin to expand a touch more on what @greg says: DRP PXE boot process goes through some "magic" where it tries to properly identify what ipxe boot file to send to the client - part of that will trigger a few WARN messages of the type you saw, until it hits on a successful boot file for your PXE environment

shane
2018-11-26 15:44
I thought we had a $faq entry related to that - but it looks like we don't - so I'll add one


diego.oberlin
2018-11-26 15:52
I'll take a look at the FAQs as well. Thanks @shane!

s.pisarski
2018-11-26 16:33
- Did the DRP download URL change or is it down?

greg
2018-11-26 16:33
which?

s.pisarski
2018-11-26 16:34
get.rebar.digital/stable

zehicle
2018-11-26 16:34
that's backed by a CNAME to s3 now, should be v4 and v6

shane
2018-11-26 16:35
@s.pisarski - yes it has changed - looks like something broke - checking

s.pisarski
2018-11-26 16:35
@zehicle, thanks!

s.pisarski
2018-11-26 16:35
@shane - thanks :slightly_smiling_face:

zehicle
2018-11-26 16:36

tom.gillman
2018-11-26 16:39
^interim

tom.gillman
2018-11-26 16:39
not that I'm a spelling pedant or anything.

s.pisarski
2018-11-26 16:39
:smile:

zehicle
2018-11-26 16:42
@greg found the problem... fixed now. No trailing / w CNAMES

s.pisarski
2018-11-26 16:43
@greg, @shane, & @zehicle - Thank you all

zehicle
2018-11-26 16:44
@s.pisarski the v6 is not working

s.pisarski
2018-11-26 16:45
@zehicle - thanks for the warning :slightly_smiling_face:

zehicle
2018-11-26 16:54
@s.pisarski I think it was a short lived warning... that's fixed too

zehicle
2018-11-26 16:54
just use get.rebar.digital. It is now dual stack.

shane
2018-11-26 17:54
- our "curl to bash" installer is dual-stack for IPv4 and IPv6. We are going to remove the `get6.rebar.digital` DNS record - unless there are any major objections - I don't think there is much usage of that record in the first place

shane
2018-11-26 17:55
@diego.oberlin - were you able to get past the PXE boot issue ?

diego.oberlin
2018-11-27 09:25
good morning! @shane - no, wasn't able to get it in place yesterday. I'm currently getting back into it.

diego.oberlin
2018-11-27 12:36
@shane ok, I was able to get past the pxe boot issue. Now I've got a properly installed os on the vm. Thanks.

diego.oberlin
2018-11-27 13:05
yeah... now more questions, for instance, ipv6... can dr-provision/rackn dhcp server handle ipv6?

zehicle
2018-11-27 14:28
No. Drp dhcp is V4. V6 is a roadmap item.

zehicle
2018-11-27 14:29
@diego.oberlin happy to talk about customers/sponsors for it to accelerate development

shane
2018-11-27 14:30
DRP will work in conjunction with an external v6 DHCP service just fine, though

diego.oberlin
2018-11-27 15:19
thanks @zehicle @shane, in that case I need to --disable-dhcp at start, right? Does that involve other config changes in drp (subnets, reservations, etc.)?

diego.oberlin
2018-11-27 15:24
Glad to talk more about this in the near future. Right now, we need to be able to have an ipv6 dhcp service in place some other way. :man-shrugging:

diego.oberlin
2018-11-27 15:28
more questions: what about IPMI plugins for vcenter and xencenter? I wasn't able to find docs related to that. I need to be able to create virtual machines directly through drp api (or eventually through drp ui)

greg
2018-11-27 15:31
We don't have IPMI plugins for vcenter or xencenter. Would be interesting to have them, but not now.

florent.wagener
2018-11-27 15:37
I have a suggestion for another future release: please don't allow us to display *all* jobs. I just misclicked that on an environment that has been there for a while and I am still crying from the pain it caused my browser :smile:

greg
2018-11-27 15:40
@florent.wagener - makes sense.

diego.oberlin
2018-11-27 15:46
ok...

florent.wagener
2018-11-27 15:52
I also like the new view of the jobs :slightly_smiling_face:

greg
2018-11-27 15:53
in tip?

john
2018-11-27 16:09
has something changed with the rackn portal in the last day or so? UX v1.5.0 I have two instances of DRP, one on my dev server and one running in docker in localhost. I wasn't able to look at a bootenv on the localhost, and today I super refreshed the one on my dev server and started encountering the same problem.

john
2018-11-27 16:10
I thought it was ublock blocking http://hs-analytics.net but I've disabled ublock and tried different browsers with the same issue

john
2018-11-27 16:13
sorry.. I just tried again and it's only my localhost install. not the dev server. So I guess UX portal is fine

greg
2018-11-27 16:14
Do you have `tip` DRP?

greg
2018-11-27 16:14
`tip` has arch support

john
2018-11-27 16:14
yes. it looks like this is a CORS issue for localhost

john
2018-11-27 16:14
Versions: DR v3.11.0-tip-72 & UX v1.5.0

greg
2018-11-27 16:15
which changed Bootenv. So, bootenv won't render correctly with http://portal.rackn.io. You need http://tip.rackn.io

john
2018-11-27 16:16
ahh

john
2018-11-27 16:16
that does it

john
2018-11-27 16:17
thanks for the info

greg
2018-11-27 16:17
Yeah - make sure you have tip content as well.

john
2018-11-27 16:23
that i have. I've been pulling tip code and tip content. But I suppose i should start looking at testing in stable

greg
2018-11-27 16:24
your PRs are in tip.

greg
2018-11-27 16:24
I need to cut a release soon. We haven't had one for a while and we've amassed some features.

john
2018-11-27 16:25
:+1:

greg
2018-11-27 16:26
a while by our standards. A month or two :wink:

john
2018-11-27 16:26
Also I wanted to just say that the content bundle feature is amazing. I converted a bunch of objects into a content bundle yesterday and it's pretty impressive.

john
2018-11-27 16:28
In our last tool, we had to lay out a bunch of json files in a repo and ship a loader tool that would recurse through making api calls. The bundle takes care of that, deletes things you no longer need, and marks the rest as read-only. You guys were definitely thinking that through.

greg
2018-11-27 16:49
@john - glad you like it

jzimmer
2018-11-27 20:28
I am pxe booting an ubuntu machine, and the load keeps halting at the end of the install at the same point. Would it be waiting for something from my workflow that is missing?

dave.parker
2018-11-27 21:02
Quick question, when the sledgehammer image does its thing and sets params at discovery, what user does it use to talk to the rebar server?

dave.parker
2018-11-27 21:04
That looks very familiar @jzimmer When I've seen that in the past on my systems it's because a post-install action that I had created was failing. Try watching the text console (alt-f4 I think) and you should see what it's doing when it stalls out there.

dave.parker
2018-11-27 21:05
Also check in your jobs to see if any of them failed. Sometimes you get useful stuff there when something goes sideways.

jzimmer
2018-11-27 21:06
ok i will take a look thanks

greg
2018-11-27 21:33
@dave.parker it is a dynamic token for the specific machine with restricted scope to that machine

dave.parker
2018-11-27 21:41
Ok. Should I be able to run drpcli commands from the command line after I log into the running sledgehammer image?

greg
2018-11-27 21:42
well - maybe.

greg
2018-11-27 21:42
:slightly_smiling_face:

greg
2018-11-27 21:42
If you didn't change the default password for the `rocketskates` user, yes.

greg
2018-11-27 21:43
@dave.parker - drpcli uses defaults - helps the peoples learn.

dave.parker
2018-11-27 21:44
Ah ok, so if the rocketskates user got deleted I won't be able to run commands interactively? How about stages I add to the discovery process via a workflow, will those work?

greg
2018-11-27 21:45
hmm - well - maybe. :slightly_smiling_face:

greg
2018-11-27 21:46
The tasks run by stages can use template expansion to get a token for that machine.

dave.parker
2018-11-27 21:46
Basically I'm running a bash script as a stage after the discovery stage that just grabs the system UUID and adds it as a param. And I can't get it to work and am trying to figure out why. I get some stuff in the job log about a template error and a stage step not existing, but I don't know what that's all about because the stage step does exist.

greg
2018-11-27 21:46
okay - cool.

greg
2018-11-27 21:47
You have this:
```
#!/bin/bash
SUID="GOTSOMECOOLWAY"
drpcli machines set {{.Machine.UUID}} param my-cool-parm to "$SUID"
```

greg
2018-11-27 21:48
You need to make sure you have credentials and endpoint set.

greg
2018-11-27 21:49
The fastest way to do this is to insert:
```
{{template "setup.tmpl" .}}
```

greg
2018-11-27 21:49
This will under the covers expand and do a few things.

dave.parker
2018-11-27 21:49
Actually I have this, should this work?

dave.parker
2018-11-27 21:49
```
#! /usr/bin/env bash
cat /sys/class/dmi/id/product_uuid | /usr/local/bin/drpcli machines set ${RS_UUID} param system-uuid to -
```

dave.parker
2018-11-27 21:49
I verified that RS_UUID is set as an environment variable.

dave.parker
2018-11-27 21:50
Hrm ok.

greg
2018-11-27 21:50
That line will work. You still need RS_ENDPOINT and RS_TOKEN set.

greg
2018-11-27 21:50
The setup.tmpl is a template that expands to set those.

dave.parker
2018-11-27 21:50
Ah ok.

dave.parker
2018-11-27 21:50
Let me try that.

greg
2018-11-27 21:50
It is in the community content.

greg
2018-11-27 21:52
hmmm - it seems like that might work without.

greg
2018-11-27 21:55
I always do the setup.tmpl thing. It appears that if the runner is started with those as env vars, it will then pass them on to its children. Now, this is the trick: depending upon the bootenv, they may or may not be set. I find it is always safer to not make that assumption.
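A minimal sketch of checking that assumption before calling drpcli from a script. `missing_vars` is a hypothetical helper, not something provided by setup.tmpl; the variable names are the ones discussed in this thread.

```shell
# Print the names of any listed variables that are unset or empty.
missing_vars() {
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

missing=$(missing_vars RS_ENDPOINT RS_TOKEN RS_UUID)
if [ -n "$missing" ]; then
  echo "not set: $missing" >&2
fi
```

If anything is reported, including the `setup.tmpl` template in the task is the approach recommended above.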

dave.parker
2018-11-27 21:59
Ok so logged in interactively, I see that RS_UUID and RS_ENDPOINT are set, but RS_TOKEN isn't.

dave.parker
2018-11-27 21:59
So let me see what setup.tmpl does to fix that and see if I can get my interactive session working at least.

greg
2018-11-27 22:00
`export RS_TOKEN="{{.GenerateInfiniteToken}}"`

dave.parker
2018-11-27 22:01
Ah.

dave.parker
2018-11-27 22:01
Magic!

greg
2018-11-27 22:01
edited for completeness.

greg
2018-11-27 22:02
setup.tmpl also does a few things to make sure the right arch executables are in place. Handles CoreOS BS and so on. Fixes up path to find drpcli and jq.

dave.parker
2018-11-27 22:04
Oh I do have setup.tmpl already on my server. So let me just add that to my script and see what happens.

dave.parker
2018-11-27 22:05
lol hey that worked!

dave.parker
2018-11-27 22:05
Awesome. Thanks for your help

greg
2018-11-27 22:06
`setup.tmpl` is part of the default community content. We use it all over the place for this reason.

dave.parker
2018-11-27 22:06
Very handy.

greg
2018-11-27 22:06
for tasks, it also pulls in the exit helpers.

greg
2018-11-27 22:07
exit helpers are bash functions that exit with the correct return code to tell the runner what to do.

greg
2018-11-27 22:08
`exit_stop` will exit the task and stop the runner.

greg
2018-11-27 22:08
`exit_reboot` will exit the task with a success and reboot the machine.

greg
2018-11-27 22:08
`exit_shutdown` will exit the task with success and shutdown the machine.

greg
2018-11-27 22:09
a couple of other useful things that get pulled in are `set_param` and `get_param`

greg
2018-11-27 22:10
`set_param "param-name" "value"` or `echo "fred" | set_param "param-name"`

greg
2018-11-27 22:10
`get_param "param-name"`

dave.parker
2018-11-27 22:11
Cool

greg
2018-11-27 22:11
That is useful because it has the `--aggregate` flag set so that it will do the hierarchical lookup for that parameter through machine, profiles, global profile, and defaults.
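Conceptually, that lookup behaves like "first layer that has a value wins". A minimal sketch of the idea, assuming each layer's value has already been fetched; `lookup_param` is illustrative and is not how `get_param` is implemented. Note that this sketch treats an empty string as "not set".

```shell
# Print the first non-empty value from a list of candidates given in
# precedence order: machine, profiles, global profile, default.
lookup_param() {
  for layer in "$@"; do
    if [ -n "$layer" ]; then
      echo "$layer"
      return 0
    fi
  done
  return 1
}

# Nothing set on the machine, so the profile value wins.
lookup_param "" "from-profile" "from-global" "from-default"   # prints from-profile
```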

zehicle
2018-11-27 23:21

jzimmer
2018-11-27 23:50
Will do!

jzimmer
2018-11-28 00:16
@zehicle, that was the issue, when i took over dhcp for the server i forgot to update my dns server

greg
2018-11-28 13:59
- A couple of things - I'm thinking about cutting a release shortly (within a week or so). Other than the usual, move all parts at once, tip seems to be stable and workable. Let me know if there are any concerns.

greg
2018-11-28 14:03
- Second thing - I'm looking at networking features and automation around that. I'd like to poll the crowd on a couple of things. First, what network manager do you use in your OSes, by OS? Like systemd-networkd for c7/r7/u18+, ... Or ifupdown for r6/c6, ... Second, what do you use for IP management? Spreadsheets, manual assignment, IPAM XXXX, ...? And third (I lied about a couple), what flow do you use for control today? High-level answers are fine. Let's try and use Slack threading for this one (or DM me). Thanks for your participation!

greg
2018-11-28 14:03
Thread here for more comments!!

bagricola
2018-11-28 14:06
RHEL interface files (ifupdown?) for SL6 and C7 (`network` service), we disable NetworkManager and don't use networkd yet. Not wed to anything in particular, but the interface files seemed less shit compared to `nmcli`. I'd hope we switch to networkd at some point... For IP assignment we use Netbox, some stuff is DHCP'd but most of the active addresses (e.g. in VM or container host clusters) are manually assigned and then read from Netbox during install process.

zdunn
2018-11-28 14:08
ubuntu has some awful network manager now that we are starting to use on 18.04 machines. IPAM is Netbox

dave.parker
2018-11-28 15:14
We pretty much write lovingly hand-crafted /etc/network/interfaces files on Ubuntu systems. IPAM is Netdot

greg
2018-11-28 16:53
@zdunn - I think you should be able to do systemd-networkd on it instead, but I haven't tried yet.

greg
2018-11-28 16:53
This leads to a different question. Do you care which one you use if it was automated?

zdunn
2018-11-28 16:56
@greg nope. As long as it's automated, we can build a bond, and layer interfaces I don't care

bagricola
2018-11-28 16:59
it might be nice to have multiple options for edge cases, but for the most part I'd prefer not to have to think about interfaces at all and have something else deal with it entirely :smile:

greg
2018-11-28 17:01
so as some of you may have guessed, I'm trying to expand and finish an MVP for our netwrangler feature. I'm currently thinking of three parts to it. One is get network config, this would query something (netbox for example) and build a network plan that is stored in a parameter. This could be skipped and just set the parameter on the machine. The second step would be to write the config files for the OS/style choice. The third step would be to enable the services and restart them.

christian.tardif
2018-11-28 18:32
We're playing mostly with systemd-networkd these days, as it's common to Ubuntu, RedHat and CoreOS. But you're right @greg, if it's automated, we don't care much. Ubuntu is doing great with Netplan, but that's completely useless, thinking at automated stuff.

greg
2018-11-28 18:38
well - we are looking at using the netplan format and a go program to convert that into config files. For us, that leads to easy stages and parameters to drive this stuff in an automated way.

greg
2018-11-28 18:39
hence the three steps.

greg
2018-11-28 18:39
Does that sound reasonable to people?

bagricola
2018-11-28 18:50
I wasn't aware of the netplan format actually, quick read seems promising

greg
2018-11-28 18:54
It seems to be reasonably complete.

bagricola
2018-11-28 19:13
So the idea might be a set of exporters for the various ipam systems to netplan, and then a binary that converts that to interface configs? Seems sane to me

bagricola
2018-11-28 19:14
Rather similar to the tool I wrote internally actually, except i didn't separate those steps into different execution steps

greg
2018-11-28 19:15
That would be the idea.

bagricola
2018-11-28 19:22
Only thing I can see which doesn't quite match up is the renderer setting in the netplan format itself, that'd mean that whatever writes the plan to a param would need to know the renderer that the end system wanted to use

bagricola
2018-11-28 19:22
But if you're writing your own netplan output system then I suppose you can just ignore that field, have it pulled from cli arg or whatever

greg
2018-11-28 19:31
We ignored that right now


greg
2018-11-28 22:55
well first pass anyway.

martin.olsson
2018-11-29 08:19
a bit late answer, been a busy week :open_mouth: , thanks for the comments and tips above @greg and @zehicle for the dhcp and uefi :slightly_smiling_face:. I've managed to provision and boot into OS with two servers with digital rebar so far :tada:

martin.olsson
2018-11-29 08:21
Haven't found any mention of it (do apologize if I missed it), but is Digital Rebar able to provision a machine / device of ARM architecture (for example, a Raspberry Pi)?

zehicle
2018-11-29 12:52
Yes ARM. Maybe Pi due to its special pxe approaches. I think Pi was done in community - we don't have any for test.

zehicle
2018-11-29 12:52
There was a meetup recently showing ARM work. It mostly just works

bagricola
2018-11-29 14:29
Hmm... so I'm wondering if I can get drpcli up and running on cumulus linux using a ZTP script. Aiui the sledgehammer image starts a "sledgehammer" service that extracts some details from the DHCP lease, and then uses that information to download `start-up.sh` from DRP. That's a script in a `bootenv` which generates a token and outputs the service url, downloads drpcli, and uses it to register the machine and then act as a runner. Given that, it seems like I'd need to create a custom boot env with a custom startup script that does something similar - but since these switches come with an OS preinstalled (or at the very least ONIE, which doesn't do PXE either), there's not really a "boot image" that can be used... does that matter? can a non-machine-specific but still-templated file like start-up.sh be placed somewhere more appropriate than a bootenv?

greg
2018-11-29 14:30
I think I have it down to 4 output formats (not that I'm going to implement all of these). RedHat ifcfg-* files with router friends, Debian /etc/network/interfaces file, NetworkManager, and systemd/networkd. Input would be http://Netplan.io format.
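
As a rough illustration of the "netplan in, config files out" idea, here is a small Python sketch that renders one of the four targets (systemd-networkd) from a netplan-style dictionary. It is only a sketch under assumptions: the function name and the output file naming are hypothetical, only `dhcp4`, `addresses` and `gateway4` are handled, and the `renderer` key is ignored, as discussed above. It is not the netwrangler implementation.

```python
from typing import Any, Dict


def render_networkd(plan: Dict[str, Any]) -> Dict[str, str]:
    """Map each ethernet in a netplan-style dict to systemd-networkd unit text."""
    units: Dict[str, str] = {}
    ethernets = plan.get("network", {}).get("ethernets", {})
    for name, cfg in ethernets.items():
        lines = ["[Match]", f"Name={name}", "", "[Network]"]
        if cfg.get("dhcp4"):
            lines.append("DHCP=ipv4")
        for address in cfg.get("addresses", []):
            lines.append(f"Address={address}")
        if "gateway4" in cfg:
            lines.append(f"Gateway={cfg['gateway4']}")
        # The "10-<name>.network" file name is an illustrative choice only.
        units[f"10-{name}.network"] = "\n".join(lines) + "\n"
    return units


if __name__ == "__main__":
    example_plan = {
        "network": {
            "version": 2,
            "ethernets": {
                "eth0": {"addresses": ["10.0.0.5/24"], "gateway4": "10.0.0.1"}
            },
        }
    }
    for filename, text in render_networkd(example_plan).items():
        print(f"# {filename}\n{text}")
```

The other output formats would be additional renderers over the same input dictionary, which is what makes a single stored plan parameter attractive.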

greg
2018-11-29 14:30
actually, you may not. if you use tip.

greg
2018-11-29 14:31
In tip content, you can do. drpjoin from tools.

bagricola
2018-11-29 14:31
ooh cool. I'll give it a poke, thanks @greg

greg
2018-11-29 14:31
`curl -fsSL "$1/machines/join-up.sh" | bash --`

greg
2018-11-29 14:32

bagricola
2018-11-29 14:34
holy cow

bagricola
2018-11-29 14:34
worked first time :smile:

greg
2018-11-29 14:47
:slightly_smiling_face:

bagricola
2018-11-29 14:54
is it possible to use template vars in subnet options? e.g. "{{.ProvisionerURL}}/machines/join-up.sh"

bagricola
2018-11-29 14:54
(have to send option 239 which is classed as the ZTP script)

bagricola
2018-11-29 14:54
url to the ZTP script rather*

greg
2018-11-29 14:54
it might work. :neutral_face:

greg
2018-11-29 14:57
does it default to next boot server if http... is not specified?

bagricola
2018-11-29 15:00
suspect not, there's no mention of next boot server anywhere in the doco i can find... and ZTP scripts require `CUMULUS-AUTOPROVISIONING` somewhere in the source to trigger the script run

bagricola
2018-11-29 15:00
so i think i have to wrap it anyway

greg
2018-11-29 15:11
@bagricola - I just checked. No, template vars don't work in options. You can use sprig and a couple of functions, but not the normal task template helpers.

bagricola
2018-11-29 15:16
gotcha

greg
2018-11-29 15:45
@bagricola - can you explain the script requirement?

greg
2018-11-29 15:46
or link

greg
2018-11-29 15:46
I can generally read.

bagricola
2018-11-29 15:46
haha


greg
2018-11-29 15:46
:slightly_smiling_face:

bagricola
2018-11-29 15:46
basically when CL first boots it looks for a ZTP script somewhere (usb, local or DHCP)

bagricola
2018-11-29 15:47
DHCP is option 239, and the script has to have `CUMULUS-AUTOPROVISIONING` in it (usually a comment)

bagricola
2018-11-29 15:47
those are the only requirements

bagricola
2018-11-29 15:47
so machines/join-up.sh is actually perfect, except for not containing the autoprovisioning line

bagricola
2018-11-29 15:49
the only other problem is having to provide an absolute URL to the script to run via DHCP, but don't think that's a significant issue as I have to specify the subnet settings manually anyway
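
A minimal Python sketch of the two requirements as described here: check a script for the `CUMULUS-AUTOPROVISIONING` marker, and generate a tiny wrapper that carries the marker and then runs the join script. The function names are hypothetical, the address in the usage is only an example, and the `curl ... | bash --` line is the one quoted earlier in the thread. Whether ZTP is happy with a script that keeps running is a separate question that this does not address.

```python
ZTP_MARKER = "CUMULUS-AUTOPROVISIONING"


def has_ztp_marker(script_text: str) -> bool:
    """True if the script contains the marker string ZTP is said to look for."""
    return ZTP_MARKER in script_text


def build_ztp_wrapper(drp_url: str) -> str:
    """Build a wrapper script that carries the marker and runs join-up.sh."""
    base = drp_url.rstrip("/")
    return "\n".join(
        [
            "#!/bin/bash",
            f"# {ZTP_MARKER}",  # a comment is enough, per the description above
            f'curl -fsSL "{base}/machines/join-up.sh" | bash --',
            "",
        ]
    )


if __name__ == "__main__":
    # Example address only; substitute the real DRP endpoint.
    print(build_ztp_wrapper("http://10.10.254.1:8091"))
```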

greg
2018-11-29 15:50
Yeah - it looks like a comment.

greg
2018-11-29 15:53
This might hang the ztp service or manual script.

greg
2018-11-29 15:53
but ...

bagricola
2018-11-29 15:55
ahh yes i guess ztp is expecting the script to exit eventually

greg
2018-11-29 15:55
Though - you could do that... in a task.

greg
2018-11-29 15:56
Install the runner as a server from the task library, start it, and then stop the current runner.

greg
2018-11-29 15:56
That is really strange, but okay.

greg
2018-11-29 15:56
or I can make another change as well. thinking about it too. Back in a second.

zehicle
2018-11-29 16:00
omg... adding a COMMENT allows Cumulus ZTP?! that's crazy

greg
2018-11-29 16:01
@bagricola - if you update to tip content, the join script now has the comment.

bagricola
2018-11-29 16:01
haha :smile:

bagricola
2018-11-29 16:01
well adding a comment and a DHCP option, but close enough

bagricola
2018-11-29 16:04
i tried to update to tip but something has broken authenticating downloads of our content pack from gitlab, so i'll have to fix that first

bagricola
2018-11-29 16:04
thanks though @greg I'll give it a shot soon :smile:

greg
2018-11-29 16:05
np - keep me informed.

bagricola
2018-11-29 16:53
interesting

bagricola
2018-11-29 16:53
`[82:5]Failed to render option 239: http://10.10.254.1:8091/machines/join-up.sh, strconv.Atoi: parsing "http://10.10.254.1:8091/machines/join-up.sh": invalid syntax`

bagricola
2018-11-29 16:56
ahh, default (unknown) option type is assumed to be a list of integers, I guess?

greg
2018-11-29 17:01
probably

bagricola
2018-11-29 17:41
yeah can't work around that :slightly_smiling_face: so close, haha

bagricola
2018-11-29 17:41
home time

greg
2018-11-29 17:42
yeah - I'm looking at it.

greg
2018-11-29 18:00
@bagricola - there is an answer that is not user friendly.

greg
2018-11-29 18:01
Take the `http://10.10.254.1:8091/machines/join-up.sh` and convert it to a comma-separated string of byte numbers.

greg
2018-11-29 18:10
I'm testing it.

greg
2018-11-29 18:13
This worked for me- painful but still

greg
2018-11-29 18:13

greg
2018-11-29 18:14
edit tmp to convert all spaces between items into `,`, then join the lines into a single line, and then remove the `,10` at the end

greg
2018-11-29 18:14
Then use that string as the value for the 239 option.
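
For anyone who would rather not hand-edit a temp file, the same conversion can be sketched in a few lines of Python. This is an equivalent of the manual steps above, not the command used in the thread, and the function names are hypothetical. Because the input string has no trailing newline, there is no `,10` to strip.

```python
def to_option_bytes(value: str) -> str:
    """Encode a string as comma-separated decimal byte values."""
    return ",".join(str(byte) for byte in value.encode("ascii"))


def from_option_bytes(encoded: str) -> str:
    """Decode comma-separated decimal byte values back into a string."""
    return bytes(int(part) for part in encoded.split(",")).decode("ascii")


if __name__ == "__main__":
    url = "http://10.10.254.1:8091/machines/join-up.sh"
    encoded = to_option_bytes(url)
    print(encoded)
    # Round-trip check so a typo in the URL is caught before it goes into DHCP.
    assert from_option_bytes(encoded) == url
```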

greg
2018-11-29 18:14
@bagricola - I'm working on a thing to make this better.

michael.harp
2018-11-29 20:22
Does drp support "fips" mode during install with RHEL? ie, the `fips=1` kernel option

greg
2018-11-29 20:23
not sure.

greg
2018-11-29 20:23
@michael.harp - is it just passing the option?

greg
2018-11-29 20:23
on the kernel line?

greg
2018-11-29 20:25
there is the `kernel-console` parameter that can be used to set extra kernel parameters (originally for console options).

michael.harp
2018-11-29 20:25
Not sure really, just heard about the requirement this morning and the RH doc is kinda amusing, https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/chap-federal_standards_and_regulations

michael.harp
2018-11-29 20:26
> Ensure that the system has plenty of entropy during the installation process by moving the mouse around or by pressing many keystrokes. The recommended amount of keystrokes is 256 and more. Less than 256 keystrokes might generate a non-unique key.

greg
2018-11-29 20:27
yeah - that is funny!

greg
2018-11-29 20:27
So, I think you could do `fips=1` in the `kernel-console` parameter during the redhat install and it would do what you want. Haven't tested it though.

michael.harp
2018-11-29 20:28
will give it a try but my guess is there will be tears

shane
2018-11-29 20:31
Ah yes - the "Federally approved algorithms" ... the ones they have the keys to already ....

greg
2018-11-29 20:42
@michael.harp - I just tried an install with kernel-console set to fips=1 on Centos and it seems to report that fips is now enabled in the post installed system. I don't know how real that is, but ...

michael.harp
2018-11-29 21:27
Nice, thanks for trying that

sramirez
2018-11-29 21:48
Hi All, I'm having issues installing ubuntu 14 from the supplied bootenv

sramirez
2018-11-29 21:49
I saw a while ago someone had an identical issue, and rolled back community contrib and content, but I can't seem to figure out how to roll back 'drp-community-content'

zehicle
2018-11-29 21:55
@sramirez there's a switch option on the UX to show all content layers from the content management page. You can also just transfer it again from library. Be sure to pick the version you need.

sramirez
2018-11-29 22:03
Thanks @zehicle I don't seem to have the option to roll back the drp-community-content version from the UX. am I missing something?

zehicle
2018-11-29 22:13
hmmm, you'll need to re-import it

florent.wagener
2018-11-30 02:08
systemd-networkd, IPAM is Orion/Netbox

zehicle
2018-11-30 04:41
That one content pack is special

bagricola
2018-11-30 11:26
so using that horrendous :slightly_smiling_face: `od` hack from above I got cumulus to at least attempt to boot from url

bagricola
2018-11-30 11:28
it fails however, because of `ZTP DHCP: URL Error <urlopen error [Errno 101] Network is unreachable` which is weird because... how can it have gotten option 239 from dhcp with an IP and then not have network access? :thinking_face: Not an issue with DRP tho so further debugging required my side

greg
2018-11-30 14:19
okay - so if you update to tip drp @bagricola - you can do this instead, `string:http://192.168.100.10:8091/machines/join-up.sh`

greg
2018-11-30 14:19
Also update tip content; I added more fixes to the join-up.sh script.

greg
2018-11-30 14:20
Make sure you used the proper ip in the od command.

john
2018-12-01 01:37
evening.. doing our first test with live systems on a real network. DRP is running inside docker so it sees itself with a 172 ip. Our dhcp is forwarded using dhcp helper. I was able to add the next server option so it does the tftp correctly. But then it goes and tries to pull the ipxe template from the 172 ip. Hoping this is a simple setting to tell the server ip.

john
2018-12-01 02:00
going to guess it's this `--static-ip= IP address to advertise for the static HTTP file server (default: 192.168.124.11)`

greg
2018-12-01 02:15
Yes and "--force-static-ip"

john
2018-12-01 02:59
does that work if dr-provision doesn't see the ip as one of the interfaces? This container is essentially nat'd with all the ports exposed. Or is it expected to be bridged?

greg
2018-12-01 02:59
yes - it is only for outbound messages

john
2018-12-01 03:01
ok, then it should work. i'll do some digging on my side.. most likely an issue in how i'm kicking off the docker container

john
2018-12-01 03:02
running dr-provision on the host works just fine, just the ip address of the static ip server being advertised (when in docker)

greg
2018-12-01 03:02
You may have to expose ports or something like that

greg
2018-12-01 03:02
Did you check the faq?

greg
2018-12-01 03:03
I think we usually recommend a `--net host` container

greg
2018-12-01 03:04
Others may run it differently. I know a couple of others around here run it as containers as well.

greg
2018-12-01 03:05
I'm not sure what they are doing.

john
2018-12-01 03:05
i'm seeing that here. Mine is not running with host networking (just exposed ports), so I will try that.

greg
2018-12-01 03:05
`tftp` has always been a challenge because of its ancient back port scheme

greg
2018-12-01 03:07
sorry I may have lied to you;

greg
2018-12-01 03:08
here are the options

greg
2018-12-01 03:08
```
--static-ip=      IP address to advertise for the static HTTP file server
--force-static    Force the system to always use the static IP.
```

greg
2018-12-01 03:08
That is all you should need.

greg
2018-12-01 03:08
If it was already working to the point in your screen shot, this should be all you need.

john
2018-12-01 03:11
cool. right, it seems like essentially ProvisionerUrl just needs to match

john
2018-12-01 03:11
chain {{.ProvisionerURL}}/${netX/mac}.ipxe && exit || goto chainip :chainip chain {{.ProvisionerURL}}/${netX/ip}.ipxe && exit || goto sledgehammer

greg
2018-12-01 03:11
yes - provisioner url is populated automagically based upon incoming interface.

greg
2018-12-01 03:12
`--static-ip=x.y.z.a` is used as the fall through if DRP can't figure out the interface the packet came in from.

greg
2018-12-01 03:12
This is really rare, or a timing issue in our cache system.

greg
2018-12-01 03:12
but it will be 100% wrong in a NATed docker container.

greg
2018-12-01 03:12
the cache that is.

greg
2018-12-01 03:13
so the `--force-static` says always use the `--static-ip=x.y.z.a` because you can't tell.
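
The behaviour described in the last few messages can be summarised as a small decision function. This Python sketch only restates that description; the function and parameter names are hypothetical and it is not DRP's actual code.

```python
from typing import Optional


def advertised_ip(
    static_ip: str,
    force_static: bool,
    incoming_interface_ip: Optional[str],
) -> str:
    """Pick the address used when building the provisioner URL."""
    # --force-static: never trust interface detection (e.g. NATed containers).
    if force_static:
        return static_ip
    # Normal case: use the address of the interface the request arrived on.
    if incoming_interface_ip is not None:
        return incoming_interface_ip
    # Fall through: the interface could not be determined, use --static-ip.
    return static_ip
```

In a NATed container the detected interface address is the container's own (a 172.x address in this thread), which is why forcing the static address is needed there.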

john
2018-12-01 03:19
giving it a shot, waiting on reboot

john
2018-12-01 03:21
hmm.. i saw a way to test render templates somewhere in the docs

zehicle
2018-12-01 03:21
It's a command line option on the cli

zehicle
2018-12-01 03:21
You have to be in the right bootenv

greg
2018-12-01 03:22
for the default.ipxe, you can do: `curl http://<drp ip>:8091/default.ipxe`

shane
2018-12-01 03:38
$faq


shane
2018-12-01 03:38
It's in there, look for "render" templates

john
2018-12-01 03:42
without --net host, i get the docker ip. but it does seem to work with --net host
```
localhogenj@metal35-r01:/tmp/drp$ sudo docker run -d --volume=drprovison_data:/provision/drp-data:rw -p 8091:8091 digitalrebar/provision --static-ip=10.130.194.5 --force-static
be3fe3f7b501b96734b19755ab8ba4728430ce0acdca6ea9f39e053f2afdc769
localhogenj@metal35-r01:/tmp/drp$ curl http://localhost:8091/default.ipxe
#!ipxe
chain http://172.17.0.7:8091/${netX/mac}.ipxe && exit || goto chainip
:chainip
chain http://172.17.0.7:8091/${netX/ip}.ipxe && exit || goto sledgehammer
:sledgehammer
chain http://172.17.0.7:8091/${builtin/buildarch}.ipxe
```
I get some interesting dynamics with `--net host`
```
localhogenj@metal35-r01:/tmp/drp$ sudo docker run -d --volume=drprovison_data:/provision/drp-data:rw --net host digitalrebar/provision --static-ip=10.30.194.5 --force-static
35c4c2915e932d205cf447af704d628452f169ff846b25321ab33524aa6f11b3
localhogenj@metal35-r01:/tmp/drp$ curl http://10.30.194.5:8091/default.ipxe
#!ipxe
chain http://10.30.194.5:8091/${netX/mac}.ipxe && exit || goto chainip
:chainip
chain http://10.30.194.5:8091/${netX/ip}.ipxe && exit || goto sledgehammer
:sledgehammer
chain http://10.30.194.5:8091/${builtin/buildarch}.ipxe
localhogenj@metal35-r01:/tmp/drp$ curl http://localhost:8091/default.ipxe
#!ipxe
chain http://[::1]:8091/${netX/mac}.ipxe && exit || goto chainip
:chainip
chain http://[::1]:8091/${netX/ip}.ipxe && exit || goto sledgehammer
:sledgehammer
chain http://[::1]:8091/${builtin/buildarch}.ipxe
localhogenj@metal35-r01:/tmp/drp$ curl http://10.25.196.69:8091/default.ipxe
#!ipxe
chain http://10.25.196.69:8091/${netX/mac}.ipxe && exit || goto chainip
:chainip
chain http://10.25.196.69:8091/${netX/ip}.ipxe && exit || goto sledgehammer
:sledgehammer
chain http://10.25.196.69:8091/${builtin/buildarch}.ipxe
```

greg
2018-12-01 03:44
do `ps auxwww | grep dr-provision`

greg
2018-12-01 03:44
are the options actually making it?

greg
2018-12-01 03:44
What is the version of that container?

john
2018-12-01 03:45
doesn't look like it
```
localhogenj@metal35-r01:/tmp/drp$ ps auxwww | grep dr-provision
root 17280 0.3 0.0 4292 756 ? Ss 03:44 0:00 /bin/sh -c dr-provision --base-root=/provision/drp-data --local-content= --default-content= --static-ip=10.30.194.5 --force-static
root 17314 19.7 0.0 122948 103612 ? Sl 03:44 0:01 dr-provision --base-root=/provision/drp-data --local-content= --default-content=
```

greg
2018-12-01 03:47
hmmm

greg
2018-12-01 03:49
try this, please: `sudo docker run --entrypoint=dr-provision digitalrebar/provision --version`

greg
2018-12-01 03:49
I think

john
2018-12-01 03:51
that works.
```
$ sudo docker run --entrypoint=dr-provision digitalrebar/provision --version
dr-provision2018/12/01 03:50:50.701841 Version: v3.11.0-tip-78-641cca862b5711f61009e77a40d586d67233bb30
```

greg
2018-12-01 03:51
hmm

greg
2018-12-01 03:51
without the `--entrypoint`

greg
2018-12-01 03:52
hmm

greg
2018-12-01 03:52
nvm

greg
2018-12-01 03:52
something is busted.

greg
2018-12-01 03:52
in the container startup.

greg
2018-12-01 03:52
not completely sure.

greg
2018-12-01 03:52
anyway, you can do this:

greg
2018-12-01 03:52
`sudo docker run --entrypoint=dr-provision digitalrebar/provision --base-root=/provision/drp-data --local-content= --default-content= --static-ip=10.30.194.5 --force-static`

john
2018-12-01 03:54
:+1: that works

john
2018-12-01 03:54
```
$ curl http://localhost:8091/default.ipxe
#!ipxe
chain http://10.30.194.5:8091/${netX/mac}.ipxe && exit || goto chainip
:chainip
chain || exit
```

john
2018-12-01 03:55
(well i have a shortened form without the base-root, but the static-ip bit works)

greg
2018-12-01 03:57
Dockerfile ENTRYPOINT is doing something strange?

greg
2018-12-01 03:57
to me

greg
2018-12-01 04:02
I think I see what I need to do to "fix" it.

john
2018-12-01 04:03
thanks for the help. The option was correct, just getting it through docker was the problem.

greg
2018-12-01 04:04
Yeah - the Dockerfile we use to build the container needs another element to let additional options in without the entrypoint part.
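
The `ps` output above suggests the container was starting dr-provision through `/bin/sh -c "<command>"` with the extra options appended after the command string. In that arrangement the shell treats the extra arguments as `$0`, `$1`, and so on, rather than as part of the command, so they never reach the program. The Python sketch below reproduces just that shell behaviour with `echo` standing in for dr-provision; the helper name is hypothetical, it assumes `/bin/sh` is available, and it is not necessarily the fix that went into the image.

```python
import subprocess
from typing import List


def run_sh_c(command: str, extra_args: List[str]) -> str:
    """Run `/bin/sh -c command extra_args...` and return its stdout."""
    result = subprocess.run(
        ["/bin/sh", "-c", command, *extra_args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    extras = ["--static-ip=10.30.194.5", "--force-static"]

    # Extras land in $0 and $1 and are silently dropped from the command.
    print(run_sh_c("echo dr-provision --base-root=/data", extras))

    # Forwarding "$@" passes them through; the first extra argument fills $0.
    print(run_sh_c('echo dr-provision --base-root=/data "$@"', ["sh", *extras]))
```

The first call prints only the fixed arguments, which matches the child process seen in `ps`; the second shows one way for a shell wrapper to forward additional options.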

greg
2018-12-01 04:04
You should be able to do: `sudo docker run digitalrebar/provision --version`

greg
2018-12-01 04:04
and get the version.

greg
2018-12-01 04:16
okay - much better - new container coming with fixes eventually - probably by morning.

greg
2018-12-01 04:40
@john - `docker pull digitalrebar/provision`

greg
2018-12-01 04:41
the options will now pass in correctly.

john
2018-12-01 04:44
Thanks Greg. I will check it out.