bagricola
2018-12-03 15:02
@greg `string:` changes to subnet options works btw, I just updated a CL switch serving a file over DHCP :tada:

jzimmer
2018-12-03 17:15
i uploaded a new boot iso via web page, where would this file be saved to once the upload is done?

shane
2018-12-03 17:16
in the `tftpboot/isos/` directory. In `isolated` mode this is under the `~/drp-data` base directory from where you did the install; in `production` mode it is (by default) under the `/var/lib/dr-provision` base directory

jzimmer
2018-12-03 17:20
ok, thanks

jzimmer
2018-12-03 17:23
found it :smile:

shane
2018-12-03 17:24
it should also be "exploded out" into the `tftpboot/<BOOTENV_NAME>` directory (eg `tftpboot/ubuntu-18.04/`)
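
A minimal Python sketch of the layout described in these messages, assuming the default base directories named above (`~/drp-data` for `isolated`, `/var/lib/dr-provision` for `production`). The function names are illustrative and not part of DRP, and a customized install may use a different base.

```python
from pathlib import PurePosixPath

# Default base directories as given in the chat; these are assumptions
# about a stock install and may not match a customized one.
DEFAULT_BASE = {
    "isolated": PurePosixPath("~/drp-data"),
    "production": PurePosixPath("/var/lib/dr-provision"),
}


def uploaded_iso_path(mode: str, iso_name: str) -> PurePosixPath:
    """Expected location of an ISO uploaded through the web page."""
    return DEFAULT_BASE[mode] / "tftpboot" / "isos" / iso_name


def exploded_iso_dir(mode: str, dir_name: str) -> PurePosixPath:
    """Expected directory the ISO contents are unpacked into."""
    return DEFAULT_BASE[mode] / "tftpboot" / dir_name


print(uploaded_iso_path("production", "example.iso"))
print(exploded_iso_dir("isolated", "ubuntu-18.04"))
```

The `~` is left unexpanded on purpose, since in `isolated` mode the base depends on where the install was run.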

brejoc
2018-12-03 17:54
has joined #community201812

shane
2018-12-03 17:54
@brejoc $welcome ...

2018-12-03 17:54
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

brejoc
2018-12-03 17:55
@shane Thanks!

zehicle
2018-12-04 22:25
Video from today's Meetup! We talked about API extensions and Cumulus ZTP https://youtu.be/QWV10HPOX_M

shane
2018-12-04 22:26
^^^^ and Adhoc Object creation/usage and scenarios for storing information in objects on DRP

josh.knarr
2018-12-04 22:26
I forgot I was subscribed to this slack. Anyone have any interest in kubernetes? I kinda want to make a project to extend the clusterAutoScaler to be able to speak DRP

zehicle
2018-12-04 22:27
we're doing a fair bit w/ it

zehicle
2018-12-04 22:27
you'll want to play w/ the RackN pooling plugin for that - it will allow single command create destroy.

shane
2018-12-04 22:28
doc on KRIB (Kubernetes Rebar Integrated Bootstrap) solution: https://provision.readthedocs.io/en/tip/doc/content-packages/krib.html


josh.knarr
2018-12-04 22:35
OK so it doesn't integrate with the clusterAutoscaler :confused:

josh.knarr
2018-12-04 22:35
but krib looks neat

shane
2018-12-04 22:36
having clusterAutoscaler integrated would be super cool - it shouldn't be too hard - to at least scale a KRIB built cluster - since all of the tooling and workflow would be in place to make the API calls to do it

shane
2018-12-04 22:37
since DRP is 100% API driven and it's a clean API - it should theoretically be "easy" :slightly_smiling_face: (but I'm not a software engineer so ... )

josh.knarr
2018-12-04 22:37
Ha

josh.knarr
2018-12-04 22:37
Yeah I took a quick look at it and it doesn't quite line up 1:1 but it should be pretty easy. I also am not a software guy.

josh.knarr
2018-12-04 22:37
Well, I'm reluctant devops I suppose. :wink:

zehicle
2018-12-04 22:38
the pooling plugin provides the 1 API call operation for that

zehicle
2018-12-04 22:39
we were looking at Loodse's machine controller: https://github.com/kubermatic/machine-controller/

zehicle
2018-12-04 22:39
which requires "create" "status" and "destroy" ops. Which map cleanly to the pooling plugin

shane
2018-12-04 22:39
Hmmmm ..... :slightly_smiling_face:

zehicle
2018-12-04 22:40
the difference & advantage w/ the pooling api is that you are basically starting workflows to create/destroy so they are easily extensible by the operator

zehicle
2018-12-04 22:41
otherwise, you have to code all that stuff into the caller like we did w/ the terraform provider (which we are eventually going to rewrite to use the pool API)

josh.knarr
2018-12-04 22:41
TF = ugh

zehicle
2018-12-04 22:41
yeah, that's why we moved the logic into the API

josh.knarr
2018-12-04 22:41
My use case is we're a rancher shop for on-prem for "data that can't leave the house" and aside of the total lack of documentation around the API it's pretty darn slick

zehicle
2018-12-04 22:41
which also improves visibility

josh.knarr
2018-12-04 22:42
so now we have to figure out how to make the on-prem stuff work like the cloud

zehicle
2018-12-04 22:42
that's pretty much the definition of rebar

josh.knarr
2018-12-04 22:43
yup

josh.knarr
2018-12-04 22:43
I'm dipping my toe in

shane
2018-12-04 22:43
^^^ thinks I just fell in love with @josh.knarr

josh.knarr
2018-12-04 22:54
I used to like TF but they totally screwed the pooch on the provisioner thing. I like how sparkleformation does it a lot better. It's not perfect, but it gets closer to the "load balancers should behave the same way across clouds" abstraction that was originally sold in terraform. :confused:

shane
2018-12-04 23:12
There is so much wrong w/ TF ... it's "ok" for a "fire and forget" for basic provisioning ... but lord help you if you actually want to use it in a production environment for ongoing operational control....

adam.lemanski
2018-12-05 03:20
hi, I finally got the chance to start with drp for our future kubernetes cluster. I've got 10 dell servers; one is taking care of the provisioning of the other 9 (3 masters, 6 workers planned). I'm facing some issues with a simple discovery flow. dhcp is configured with reservations by MAC, which works fine for the initial ipxe boot, but somehow sledgehammer seems to have connection issues. The weirdest thing for me is that if I use the debug shell to execute the wget to the provisioning host, it works :confused: any ideas? btw, how do I properly register an organisation? for now I just see `This Organization cannot manage licenses.`

2018-12-05 03:20
Time to feed the :bear:!

adam.lemanski
2018-12-05 03:26
I guess it has something to do with the `udhcp: no lease, forking to background`, race condition?

greg
2018-12-05 03:26
@adam.lemanski - do you have port-delay turned on for your switch ports?

adam.lemanski
2018-12-05 03:27
<--- not a network guru, but checking

greg
2018-12-05 03:28
Well if it is faster to test, then you can try this.

greg
2018-12-05 03:28
in the global profile, add a parameter - `kernel-console`

greg
2018-12-05 03:28
set its value to `provisioner.portdelay=30`

greg
2018-12-05 03:29
If you have cli access, you can do it like this: `drpcli profiles set global param kernel-console to provisioner.portdelay=30`

greg
2018-12-05 03:29
Then try and boot your server.

adam.lemanski
2018-12-05 03:30
added via cli, rebooting

greg
2018-12-05 03:30
This assumes that you aren't setting `kernel-console` somewhere else :slightly_smiling_face:

adam.lemanski
2018-12-05 03:30
fresh drp setup

adam.lemanski
2018-12-05 03:30
nothing custom so far

adam.lemanski
2018-12-05 03:31
hoping to use krib here :slightly_smiling_face:

zehicle
2018-12-05 03:31
@adam.lemanski don't worry about your organization just yet. You'll need it when you want a trial license for advanced features

adam.lemanski
2018-12-05 03:32
I would be happy to support this product by buying the dell support content pack etc

2018-12-05 03:32
Time to feed the :bear:!

greg
2018-12-05 03:33
@adam.lemanski - let's get you working first. We can talk about that off-line later. :slightly_smiling_face:

adam.lemanski
2018-12-05 03:34
looks like 30sec are not enough?

greg
2018-12-05 03:35
usually that should be fine.

greg
2018-12-05 03:35
hmm - though maybe not.

greg
2018-12-05 03:35
You can change it to `120` and see. That should definitely do it.

greg
2018-12-05 03:35
oh - you can also just exit.

greg
2018-12-05 03:36
It should loop and try again.

adam.lemanski
2018-12-05 03:36
testing with 120sec now

adam.lemanski
2018-12-05 03:46
sorry, got distracted by a colleague. still same issue

adam.lemanski
2018-12-05 03:47
which is very weird

greg
2018-12-05 03:47
Try to exit the ash shell

adam.lemanski
2018-12-05 03:47
exiting let it continue

adam.lemanski
2018-12-05 03:47
got the login prompt

adam.lemanski
2018-12-05 03:48
and the machine is available in the portal

greg
2018-12-05 03:49
okay hmmm ... Do you have ipv6 SLAAC enabled?

greg
2018-12-05 03:49
autoconf

adam.lemanski
2018-12-05 03:51
all my network devices use ipv4 only, no active ipv6 config at all

greg
2018-12-05 03:52
ok

greg
2018-12-05 03:52
thinking

adam.lemanski
2018-12-05 03:54
if it helps, I use:
• UniFi Switch 24 POE-250W
• UniFi Security Gateway 4P
• UniFi Cloud Key (Controller Software)

adam.lemanski
2018-12-05 03:55
on the gateway dhcp is off for this network that the server are part of, ports are restricted to use the networks that have no dhcp

greg
2018-12-05 03:55
What DHCP server are you using?

adam.lemanski
2018-12-05 03:55
drp

adam.lemanski
2018-12-05 03:56
drp is started as docker container with host network using a host path to store its data (your docker image)

greg
2018-12-05 03:56
I don't know that tool set.

greg
2018-12-05 03:56
oh

adam.lemanski
2018-12-05 03:57
drp = digital rebar provision

greg
2018-12-05 03:57
did you start it with `--force-static` and `--static-ip=<ip on machine>`

greg
2018-12-05 03:57
umm - that one I know about :wink:

greg
2018-12-05 03:58
I mean the UniFi stuff

greg
2018-12-05 03:58
oh - host networking should make that fine, I guess.

adam.lemanski
2018-12-05 03:58
no static parameter added to start drp

greg
2018-12-05 03:59
well if you are using a `--net host` container then you don't need them

adam.lemanski
2018-12-05 03:59
starting drp via ansible:
```
- name: start dr-provision container
  docker_container:
    name: dr-provision
    image: digitalrebar/provision
    network_mode: host
    volumes:
      - /data/dr-provision/drp-data:/provision/drp-data
```

adam.lemanski
2018-12-05 04:00
ports are wonderfully reachable, host networking works fine

greg
2018-12-05 04:02
Is DRP local to the machine or are the packets traversing the gateway?

greg
2018-12-05 04:03
should be local from the IPs I saw.

adam.lemanski
2018-12-05 04:03
direct

adam.lemanski
2018-12-05 04:04
pet1 = 172.16.80.91 master1 = 172.16.80.92 both on the same switch

adam.lemanski
2018-12-05 04:04
in a bigger network 172.16.80.0/20
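
Whether packets traverse the gateway comes down to whether both hosts sit in the same subnet. A quick check with Python's standard `ipaddress` module, using the addresses quoted in these messages (the helper name `same_subnet` is made up for this sketch):

```python
import ipaddress


def same_subnet(network: str, *hosts: str) -> bool:
    """Return True if every host address falls inside the given network."""
    net = ipaddress.ip_network(network)
    return all(ipaddress.ip_address(host) in net for host in hosts)


# pet1 (the DRP host) and master1 (the machine being booted)
print(same_subnet("172.16.80.0/20", "172.16.80.91", "172.16.80.92"))
```

A /20 starting at 172.16.80.0 spans 172.16.80.0 through 172.16.95.255, so both hosts are in it and traffic between them should stay on the switch.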

adam.lemanski
2018-12-05 04:04
(but mostly empty)

greg
2018-12-05 04:05
Is the machine dual homed or bonded nics?

adam.lemanski
2018-12-05 04:05
nope

adam.lemanski
2018-12-05 04:05
single nic

greg
2018-12-05 04:06
firewall rules on the docker host?

adam.lemanski
2018-12-05 04:07
not now, any to any is allowed

adam.lemanski
2018-12-05 04:07
minimal ubuntu 18.04.1 installation

greg
2018-12-05 04:08
shouldn't matter, but haven't tried that one recently.

greg
2018-12-05 04:08
mostly been using my mac and centos7 .

greg
2018-12-05 04:09
without docker.

greg
2018-12-05 04:09
hmm - check apparmor log??

greg
2018-12-05 04:10
another thing would be to check the DRP log, but I don't expect much there

adam.lemanski
2018-12-05 04:11
tailing via docker logs the whole time :confused: 0 related, but all log levels are set to `warn`

adam.lemanski
2018-12-05 04:12
apparmor has nothing since the last reboot

adam.lemanski
2018-12-05 04:12
my pet1 itself has ipv6 enabled

greg
2018-12-05 04:12
okay - something is fishy. Not sure what though. Shouldn't matter. It should only try ipv4 or ipv6. It appears to be doing IPv4.

adam.lemanski
2018-12-05 04:15
possible that I messed up something via the drp-community-content pack? I changed to tip since centos was outdated

adam.lemanski
2018-12-05 04:15
thinking of wiping the whole config of drp

adam.lemanski
2018-12-05 04:16
yesterday when I started I just tried the krib live cluster and it seemed to work, now I just want them to wait in discovery...

greg
2018-12-05 04:17
hmm - do you have drp tip?

greg
2018-12-05 04:17
Well - you need tip drp if you have tip content.

adam.lemanski
2018-12-05 04:17
latest docker image

greg
2018-12-05 04:17
okay - all good

greg
2018-12-05 04:18
Did this work before?

adam.lemanski
2018-12-05 04:18
krib live cluster worked (without dhcp reservations)

greg
2018-12-05 04:19
`drpcli bootenvs show discovery` see if it is available and validated.

greg
2018-12-05 04:19
hmmmmmmmmm.

adam.lemanski
2018-12-05 04:19
` "Validated": true`

adam.lemanski
2018-12-05 04:20
I think something is broken...if I click on any bootenv via the ui, it doesn't show any information about it

greg
2018-12-05 04:20
`drpcli profiles remove global param kernel-console`

greg
2018-12-05 04:21
switch your url from `http://portal.rackn.io` to `http://tip.rackn.io`

greg
2018-12-05 04:21
you may have to revalidate certs.

greg
2018-12-05 04:21
I'm pretty sure it isn't the portdelay now.

greg
2018-12-05 04:21
Can you show me your reservations?

adam.lemanski
2018-12-05 04:22
via tip I see the details

greg
2018-12-05 04:22
The ux needs to be updated to support the new arch support.

greg
2018-12-05 04:23
okay I feel like you are going to ask me some annoying questions in the future. But those can wait.

adam.lemanski
2018-12-05 04:23
haha

greg
2018-12-05 04:24
Did you clear the kernel-console variable?

adam.lemanski
2018-12-05 04:24
yes

greg
2018-12-05 04:24
remove the reservation for the machine you are booting and reboot it.

greg
2018-12-05 04:24
if that is safe to do

greg
2018-12-05 04:25
you can also check the leases and see that it matches

adam.lemanski
2018-12-05 04:25
removed reservation and lease

greg
2018-12-05 04:25
I meant for you to keep the lease, but that is okay.

adam.lemanski
2018-12-05 04:25
no one else in that network :slightly_smiling_face:

greg
2018-12-05 04:25
The lease would have made sure it gets the same address.

greg
2018-12-05 04:25
okay

adam.lemanski
2018-12-05 04:27
same issue

greg
2018-12-05 04:27
hmm - okay -

greg
2018-12-05 04:28
in the ux, go to `info & preferences`

adam.lemanski
2018-12-05 04:28
got the same ip

greg
2018-12-05 04:28
set the DHCP log to `debug` and `save`

greg
2018-12-05 04:29
then reboot the system and let's see what the timing of the messages is

adam.lemanski
2018-12-05 04:29
rebooting

adam.lemanski
2018-12-05 04:34

adam.lemanski
2018-12-05 04:35
I added two empty lines after the pxe boot, before sledgehammer starts

adam.lemanski
2018-12-05 04:36
looks like a 60sec timeout

greg
2018-12-05 04:56
64 seconds feels magical.

greg
2018-12-05 04:57
It is almost like something is holding off the packets.

greg
2018-12-05 04:57
I'm not sure. I'm off to bed.

adam.lemanski
2018-12-05 05:22
OK, I will play around more. Seems the bangkok timezone is not helping xD

shane
2018-12-05 05:28
Any chance you can run DRP in isolated mode on the host to eliminate the container piece?

adam.lemanski
2018-12-05 05:34
will try after I tried to wipe and start over

adam.lemanski
2018-12-05 07:41
tested using the binary directly without container, same issue

adam.lemanski
2018-12-05 08:25
something is really weird, with my setup or the content packages... obviously https://github.com/digitalrebar/provision-content/blob/tip/content/bootenvs/centos-7.yml#L9 contains all iso related information but if I take a look in the bootenv:

adam.lemanski
2018-12-05 08:25
same for centos-7.6.1810-install

greg
2018-12-05 14:17
@adam.lemanski - that is the state of the UX right now.

greg
2018-12-05 14:18
if you look at the json view, it should show all the arches.

b.quan
2018-12-05 16:46
I'm trying UEFI pxe boot with Mellanox ConnectX-5 nic, but ran into the following error:

b.quan
2018-12-05 16:47

b.quan
2018-12-05 16:47
Is it looking for a grub config file for uefi pxe booting?

greg
2018-12-05 16:54
it is looking for elilo.conf, which I don't think we use anymore. We found that elilo was having problems. soo... I think you should try `ipxe.efi` instead of `bootx64.efi`

greg
2018-12-05 16:54
We need to clean those docs - especially for `bootx64.efi`

greg
2018-12-05 16:54
@b.quan

b.quan
2018-12-05 17:01
@greg should I simply rerun the drpcli option 67 setting to use ipxe.efi?

greg
2018-12-05 17:04
yes

b.quan
2018-12-05 17:04
thanks, will try that

b.quan
2018-12-05 17:19
@greg That was it! Time to clean up the instructions above - could have saved me a whole day :slightly_smiling_face:

greg
2018-12-05 17:23
actually - not sure you need it at all anymore. Did you try it without?

greg
2018-12-05 17:23
option 67 set at all?

greg
2018-12-05 17:24
@b.quan - because the dhcp server does that by default now. I think.

b.quan
2018-12-05 17:26
I did not actually try without option 67 set, as I thought that was required from reading the doc. I'll give it a try too. @greg how to unset option 67?

greg
2018-12-05 17:27
`drpcli subnets set <subnetname> option 67 to null`

greg
2018-12-05 17:27
I think

b.quan
2018-12-05 17:31
cool

b.quan
2018-12-05 18:06
@greg yes it works without actually setting option 67

b.quan
2018-12-05 18:32
@greg after the server is provisioned with the stock hwe-ubuntu16 workflow with access-keys set, I tried to ssh to the server but it still asks for a password, which I don't have. Any idea why access-key does not work?

shane
2018-12-05 18:37
can you provide the output of the actual "`access-key`" param - feel free to snip a portion of the public key out for privacy, or DM it to me ...

b.quan
2018-12-05 18:41
sure, I'll DM to you

shane
2018-12-05 19:26
so - @b.quan that looks correct - a couple of things to check:
• if you can log in on the console - check that `/root/.ssh/authorized_keys` was written correctly with your key
• verify you're using the correct private key half when attempting to connect
• it's the `root` user account you should be connecting to on the newly built machine
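
The first of those checks can be sketched in Python. This is a hedged illustration, not a DRP tool: the function name and the sample key strings are hypothetical, and it compares only the key type and base64 blob, ignoring the trailing comment. It does not handle quoted `authorized_keys` options that contain spaces.

```python
from typing import Optional, Tuple

KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def _type_and_blob(line: str) -> Optional[Tuple[str, str]]:
    """Extract (key type, base64 blob) from a public key or authorized_keys line."""
    fields = line.split()
    # Look at every field that still has a successor (the blob follows the type).
    for index, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            return field, fields[index + 1]
    return None


def key_is_authorized(public_key: str, authorized_keys_text: str) -> bool:
    """True if the public key appears in the authorized_keys content."""
    wanted = _type_and_blob(public_key)
    if wanted is None:
        return False
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if _type_and_blob(line) == wanted:
            return True
    return False
```

On the target machine this would be fed the contents of `/root/.ssh/authorized_keys` and the public half of the key being used to connect.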

b.quan
2018-12-05 19:34
Thanks @shane I'll check these and get back to you if I'm still stuck

ben.le
2018-12-05 21:05
Gents, how to upgrade the latest drp-community-content on dr-provision server?

greg
2018-12-05 21:05
go to the content packages page in the UX, select tip or version for the content pack to update. Click upgrade

ben.le
2018-12-05 21:06
Thanks @greg

ben.le
2018-12-05 21:11
I got the following error message when upgrading the content

ben.le
2018-12-05 21:11
Content Upload Failed: PUT Unable to load root templates: template: :276: function "upper" not defined

greg
2018-12-05 21:14
If you are trying to move to tip content versions, you need tip DRP version

ben.le
2018-12-05 21:15
got it

b.quan
2018-12-05 21:55
@shane after I changed the user name to be "root" in the access-keys param, it worked

krishan.sharma
2018-12-06 01:22
has joined #community201812

shane
2018-12-06 01:27
@krishan.sharma $welcome :slightly_smiling_face:

2018-12-06 01:27
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

greg
2018-12-06 05:28
- Krib tip is updated to address the most recent Kubernetes security alert and support/handle the latest centos 7 update.

kamp.scott
2018-12-06 05:31
@greg kubernetes security advisory?


bencode14395980
2018-12-06 07:13
has joined #community201812

bagricola
2018-12-06 16:08
woo, ztp script execution works as well

christopher_wood
2018-12-06 16:11
Hello, doing well with this so far but can't figure out where the .Env things are created/stored: {{.Env.BootParams}}, {{.Env.Name}}, {{.Env.InstallUrl}}. For that last one I'm seeing:
Dynamic file error for /pxelinux.cfg/01-00-50-56-b9-07-db: template: :191:11: executing "default-pxelinux.tmpl" at <.BootParams>: error calling BootParams: template: machine:1:60: executing "machine" at <.Env.InstallUrl>: error calling InstallUrl: No install repository available
...so obviously not getting something right.
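
The file name in that error follows the usual PXELINUX convention: `01-` (the ARP hardware type for Ethernet) followed by the MAC address in lowercase hex with dashes. A small sketch for mapping between the two, which can help identify which machine an error belongs to (both function names are made up for this example):

```python
def pxelinux_cfg_name(mac: str) -> str:
    """Build the pxelinux.cfg file name for an Ethernet MAC address."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def mac_from_pxelinux_cfg_name(name: str) -> str:
    """Recover the MAC address from a pxelinux.cfg file name."""
    parts = name.lower().split("-")
    if len(parts) != 7 or parts[0] != "01":
        raise ValueError(f"not an Ethernet pxelinux.cfg name: {name!r}")
    return ":".join(parts[1:])


print(mac_from_pxelinux_cfg_name("01-00-50-56-b9-07-db"))
```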

shane
2018-12-06 16:12
@bencode14395980 $welcome

2018-12-06 16:12
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

shane
2018-12-06 16:14
@christopher_wood those template expansion pieces are "magic" - they are documented in the "Architecture Reference" pages, under "Provisioning Models": https://provision.readthedocs.io/en/tip/doc/arch/provision.html

greg
2018-12-06 16:15
The .Env are bootenv parts

christopher_wood
2018-12-06 16:15
Aha, reading.

shane
2018-12-06 16:15
is this for CentOS 7 you see the error

shane
2018-12-06 16:16
BTW - I don't think that doc will answer your question - but just give you some background on the pieces

christopher_wood
2018-12-06 16:16
Yes. It all worked out of the box with the provision-content repo, now I'm trying to do a custom build. I'm going to read the background first, that will likely illuminate this.

shane
2018-12-06 16:24
Do you mean use custom repos when you say a "custom build" ?

christopher_wood
2018-12-06 16:26
I branched provision-content and am trying to use a custom centos 7 kickstart because we're "special".

shane
2018-12-06 16:27
what are you trying to do in the Kickstart ? We generally recommend to use Workflow (Stages/Tasks/Templates) to accomplish customization - since the actions are a lot more transparent (fully logged) on the DRP endpoint side

shane
2018-12-06 16:28
anything that goes in to a kickstart/preseed will be hard to debug/troubleshoot when they break

shane
2018-12-06 16:29
also - you don't need to branch all of the drp-community-content - you should be able to accomplish surgical changes by cloning the appropriate parts

shane
2018-12-06 16:29
and we also specifically have a `select-kickseed` mechanism to inject a completely unique Kickstart into community content without having to clone

christopher_wood
2018-12-06 16:29
Custom yum repos and network installation, new root password, templated partitions, optional network interfaces, packages excluded, whole list of things.

christopher_wood
2018-12-06 16:30
Hm, such a newb. Going to read more about select-kickseed too.

shane
2018-12-06 16:31
all of those things can be accomplished via modifying Params which you put in a Profile (group of Params) and attach to a machine (or you can put params directly on Machine - but usually best to group things together in a profile)

christopher_wood
2018-12-06 16:31
That does sound more productive than where I'm currently going, yes.


shane
2018-12-06 16:34
etc...

christopher_wood
2018-12-06 16:34
Not meaning to sound snarky except in a self-directed fashion. I could swear I've read over this stuff and it obviously hasn't sunk right in yet.

shane
2018-12-06 16:34
the last link has (minimal - could be LOTS better) doc on the pieces and parts of drp-community-content content pack

shane
2018-12-06 16:35
yeah - we know - we need to do some clean up and re-org on the Doc - there's a lot there, it just isn't quite as newbie friendly as it could be

shane
2018-12-06 16:35
generally speaking - check the "Content Packages & Plugins" from the table of contents for clues (eg the "drp-community-content" pack)

shane
2018-12-06 16:35
and the Architecture Reference for more advanced stuff

christopher_wood
2018-12-06 16:36
It all makes sense once I understand it, and it's all quite well built. Need to bend my brain around it all.

christopher_wood
2018-12-06 16:37
*around it is all

ben.le
2018-12-06 16:58
when attempting to install debian 9; it displays "No kernel modules were found".

ben.le
2018-12-06 17:00
It also displays "This probably is due to a mismatch between the kernel used by this version of the ..."

greg
2018-12-06 17:00
Make sure you have the latest iso uploaded. It may be trying to talk to the internet to get the kernel.

ben.le
2018-12-06 17:02
I got the latest iso 9.6 version and the one matching the bootenv "version 9.5"; the issue still exists on both versions

greg
2018-12-06 17:04
did you change anything? Update content package? Update the iso?

ben.le
2018-12-06 17:05
I just updated the iso only

greg
2018-12-06 17:05
So, you have to update the bootenv or it won't be available.

greg
2018-12-06 17:06
Check the bootenv in the ux and see if it is available.

ben.le
2018-12-06 17:06
the new iso "9.5" version is matching with the bootenv

greg
2018-12-06 17:07
So, does the bootenv report as valid and available?

ben.le
2018-12-06 17:07
drpcli bootenvs show debian-9-install
{
  "Available": true,
  "BootParams": "priority=critical console-tools/archs=at console-setup/charmap=UTF-8 console-keymaps-at/keymap=us popularity-contest/participate=false passwd/root-login=false keyboard-configuration/xkb-keymap=us netcfg/get_domain=unassigned-domain console-setup/ask_detect=false debian-installer/locale=en_US.utf8 console-setup/layoutcode=us keyboard-configuration/layoutcode=us netcfg/dhcp_timeout=120 netcfg/choose_interface=auto url={{.Machine.Url}}/seed netcfg/get_hostname={{.Machine.Name}} root=/dev/ram rw quiet {{if .ParamExists \"kernel-console\"}}{{.Param \"kernel-console\"}}{{end}}",
  "Description": "Debian 9 install BootEnv",
  "Errors": [],
  "Initrds": [
    "initrd.gz"
  ],
  "Kernel": "linux",
  "Meta": {
    "color": "black",
    "feature-flags": "change-stage-v2",
    "icon": "linux",
    "title": "Digital Rebar Community Content"
  },
  "Name": "debian-9-install",
  "OS": {
    "Codename": "",
    "Family": "debian",
    "IsoFile": "debian-9-amd64-mini.iso",
    "IsoSha256": "cb0e8a529e2c04b06c5d108f72b22281153df15e26b2b900202ef30ac949e5dd",
    "IsoUrl": "http://mirrors.kernel.org/debian/dists/stretch/main/installer-amd64/current/images/netboot/mini.iso",
    "Name": "debian-9",
    "Version": "9.5"
  },
  "OnlyUnknown": false,
  "OptionalParams": [
    "part-scheme",
    "operating-system-disk",
    "provisioner-default-user",
    "provisioner-default-fullname",
    "provisioner-default-uid",
    "provisioner-default-password-hash",
    "kernel-console",
    "proxy-servers",
    "dns-domain",
    "local-repo",
    "proxy-servers",
    "ntp-servers",
    "select-kickseed"
  ],
  "ReadOnly": true,
  "RequiredParams": [],
  "Templates": [
    {
      "Contents": "",
      "ID": "default-pxelinux.tmpl",
      "Name": "pxelinux",
      "Path": "pxelinux.cfg/{{.Machine.HexAddress}}"
    },
    {
      "Contents": "",
      "ID": "default-ipxe.tmpl",
      "Name": "ipxe",
      "Path": "{{.Machine.Address}}.ipxe"
    },
    {
      "Contents": "",
      "ID": "default-pxelinux.tmpl",
      "Name": "pxelinux-mac",
      "Path": "pxelinux.cfg/{{.Machine.MacAddr \"pxelinux\"}}"
    },
    {
      "Contents": "",
      "ID": "default-ipxe.tmpl",
      "Name": "ipxe-mac",
      "Path": "{{.Machine.MacAddr \"ipxe\"}}.ipxe"
    },
    {
      "Contents": "",
      "ID": "select-kickseed.tmpl",
      "Name": "seed",
      "Path": "{{.Machine.Path}}/seed"
    },
    {
      "Contents": "",
      "ID": "net-post-install.sh.tmpl",
      "Name": "net-post-install.sh",
      "Path": "{{.Machine.Path}}/post-install.sh"
    }
  ],
  "Validated": true
}

ben.le
2018-12-06 17:09
isos/debian-9.5.0-amd64-netinst.iso

greg
2018-12-06 17:09
It needs to be named: debian-9-amd64-mini.iso

ben.le
2018-12-06 17:16
I just renamed the iso file and restarted the install, but the error is still the same

ben.le
2018-12-06 17:17
debian-9-amd64-mini.iso

shane
2018-12-06 17:18
The ISO must also have the same exact SHA sum hash - along with the name being the same - i.e. it must be what the BootEnv is looking for
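
Both conditions can be checked locally before uploading. The sketch below assumes the BootEnv JSON has already been loaded into a dict (for example from `drpcli bootenvs show` output) and uses the `OS.IsoFile` and `OS.IsoSha256` fields seen in that output; the function names are illustrative only.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large ISOs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iso_matches_bootenv(path: str, bootenv: dict) -> bool:
    """True only if both the file name and the SHA256 match the BootEnv."""
    os_info = bootenv["OS"]
    name_ok = os.path.basename(path) == os_info["IsoFile"]
    sha_ok = sha256_of(path) == os_info["IsoSha256"].lower()
    return name_ok and sha_ok
```

If the upstream mirror has replaced the image since the BootEnv was written, the name can match while the hash does not, and the check returns `False`.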

ben.le
2018-12-06 17:20
i have to run drpcli isodownload, right?

ben.le
2018-12-06 17:25
$ drpcli bootenvs uploadiso debian-9-install
BootEnv debian-9-install is already available, skipping download of iso ...

ben.le
2018-12-06 17:27
I just ran the uploadiso, but it's skipping download of iso.

christopher_wood
2018-12-06 17:55
@shane, thank you for the pointers! I got into anaconda after reading those, with the default centos-7 bootenv and my custom kickstart.

greg
2018-12-06 18:16
@ben.le - you may need to remove the debian-9 directory and reupload the iso to make sure it is correct. This would be from the tftpboot directory

ben.le
2018-12-06 18:24
after removing the debian-9 directory and restarting the dr-provision server, I ran:
drpcli bootenvs uploadiso debian-9-install
BootEnv debian-9-install is already available, skipping download of iso ...

greg
2018-12-06 18:28
yeah - but it should have reexploded the iso

ben.le
2018-12-06 18:30
that's what I thought

ben.le
2018-12-06 18:41
I tried to download the iso directly from this link http://mirrors.kernel.org/debian/dists/stretch/main/installer-amd64/current/images/netboot/mini.iso, but it seems like the file's SHA256 does not match the one in the bootenv json

christopher_wood
2018-12-06 18:45
Does it match with the SHA256 that Debian provides?

ben.le
2018-12-06 18:45
not at all

ben.le
2018-12-06 18:47
SHA256 of the one I downloaded:
sha256sum debian-9-amd64-mini.iso
53fbdc4469216d7cf80023d58925d37add5a81817b43b7a36cd53dd7f606816d debian-9-amd64-mini.iso

ben.le
2018-12-06 18:48
SHA256 from bootenv json cb0e8a529e2c04b06c5d108f72b22281153df15e26b2b900202ef30ac949e5dd

ben.le
2018-12-06 18:49
how to download the iso with the existing SHA256 in the bootenv json file?

greg
2018-12-06 18:53
Yes - because they updated to 9.6. Tip content has updated bootenvs.

greg
2018-12-06 18:53
You will need tip drp to get the tip content.

ben.le
2018-12-06 18:57
sorry, I'm not familiar with tip drp, what's the difference?

greg
2018-12-06 18:58
To get the most recent drp content which has the sha fixes, you will need the latest DRP endpoint software. We call the latest unreleased version, `tip`

ben.le
2018-12-06 18:59
thanks

andrew
2018-12-06 20:48
Could anyone point me toward some docs on how the custom-ipxe boot env works? I just get "booting kernel failed: Invalid argument" with or without params.

greg
2018-12-06 21:05
@andrew - What are you trying to do?

andrew
2018-12-06 21:08
pxe boot a 32bit machine

greg
2018-12-06 21:09
okay

greg
2018-12-06 21:09
so take that content minus the #!pxe line

andrew
2018-12-06 21:09
honestly I should probably throw this hardware in the garbage, I'm just playing. So no worries if what I'm trying to do is beyond the scope of what drp does.

greg
2018-12-06 21:10
put in the `custom-ipxe` parameter, global or on the machine.

greg
2018-12-06 21:11
use the cli. The UX can get confused with the multi-line strings. You need to have added it with the string editor icon thingee.

greg
2018-12-06 21:12
`drpcli profiles show adk-ipxe`

andrew
2018-12-06 21:12
Gotcha! When I use the string editor and do multi line the param gets deleted. I'll go into cli.

greg
2018-12-06 21:13
There should be a little icon that switches to a multi-line editor in the ux.

andrew
2018-12-06 21:13
This little guy?>

greg
2018-12-06 21:13
yes.

andrew
2018-12-06 21:13
Do i have to wrap the whole thing in quotes?

greg
2018-12-06 21:13
no

greg
2018-12-06 21:13
just put the contents in there.

greg
2018-12-06 21:14
then save and don't trust the ux. Check the cli for if it changed.

greg
2018-12-06 21:14
you should have `\n` in the text, I think.

andrew
2018-12-06 21:14
Checking cli

andrew
2018-12-06 21:14
When I save after using multi line the param disappears.

andrew
2018-12-06 21:15
Oh define each line with \n?

greg
2018-12-06 21:15
no

greg
2018-12-06 21:15
do this. Just a second need to move a cat

greg
2018-12-06 21:16
Create a tmp file with the ipxe script in it

greg
2018-12-06 21:16
do this: `drpcli profiles set global param custom-ipxe to tmp`

greg
2018-12-06 21:16
where `tmp` is your file name.

greg
2018-12-06 21:17
then do: `drpcli profiles show global`

greg
2018-12-06 21:17
That should show this:

greg
2018-12-06 21:17
``` "custom-ipxe": "set base http://mirror.centos.org/altarch/7/os/i386/images/pxeboot/\nprompt -k 0x197e -t 2000 Press F12 to install CentOS... || exit kernel ${base}/images/pxeboot/vmlinuz initrd=initrd.img repo=${base} initrd ${base}/images/pxeboot/initrd.img boot", ```
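
The `\n` in that output is just how JSON encodes a line break inside a string; the script itself is written as ordinary lines in the file. A short Python illustration with the standard `json` module, using a placeholder script (the URL and file names below are made up, not the ones from this thread):

```python
import json

# A placeholder multi-line iPXE script, written as normal lines.
ipxe_script = "\n".join([
    "set base http://mirror.example.invalid/pxeboot",
    "kernel ${base}/vmlinuz initrd=initrd.img",
    "initrd ${base}/initrd.img",
    "boot",
])

# Encoding as JSON turns each real newline into the two characters \n.
encoded = json.dumps({"custom-ipxe": ipxe_script}, indent=2)
print(encoded)

# Decoding restores the original multi-line text unchanged.
decoded = json.loads(encoded)["custom-ipxe"]
print(decoded)
```

So there is no need to type `\n` by hand when the script is supplied as a file; it only appears when the stored value is displayed as JSON.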

andrew
2018-12-06 21:17
yup!

greg
2018-12-06 21:18
so now - how are you trying to boot with that?

andrew
2018-12-06 21:19
So I'll go over to machines, apply the global profile to it, and set the boot env to custom-ipxe. Let's see what happens?

greg
2018-12-06 21:19
global is applied to all automatically.

andrew
2018-12-06 21:19
good to know

greg
2018-12-06 21:20
oh - you didn?t put it in global. you put it in adk-ipxe

greg
2018-12-06 21:20
so put that on the machine

andrew
2018-12-06 21:20
Actually I followed your instructions literally, so now I have a custom-ipxe param in global.

greg
2018-12-06 21:20
ok then all machines have it for now.

greg
2018-12-06 21:21
then yeah, force the machine's bootenv to that.

greg
2018-12-06 21:21
You may have to unset the machine's stage and workflow.

greg
2018-12-06 21:21
or create a stage with that bootenv and a workflow with that stage.

andrew
2018-12-06 21:22
Gotcha, yeah. Clearing that up now. One thing i wasn't sure on. Do i need to flag it as runnable for the settings in "bulk actions" to take effect?

greg
2018-12-06 21:23
no. The bulk actions page is just doing things in bulk to the machine. The settings are all driven independently because they are independent. :slightly_smiling_face:

andrew
2018-12-06 21:24
Sounds good.

greg
2018-12-06 21:24
Runnable applies to stopping a runner in a running system.

greg
2018-12-06 21:24
No - sounds rambly. :slightly_smiling_face: But I'm trying.

andrew
2018-12-06 21:24
I'm following and appreciate the help. I've been dabbling with this platform in my free time and loving it so far.

andrew
2018-12-06 21:25
Soon I'll need to provision over 90 raspberry pis and i'm hoping that I can utilize this in someway.

greg
2018-12-06 21:26
umm - okay. Raspberry Pis can be troubling, because they don't have a real network bootloader.

andrew
2018-12-06 21:26
even the newest model? pi 3 b+

greg
2018-12-06 21:26
well - until pi3 or whatever. Then maybe... a big if on the maybe

andrew
2018-12-06 21:27
Yeah, i was going for proof of concept on the 32bit machine. I figured if i could make THAT work i could pivot onto net booting the pis

greg
2018-12-06 21:27
you will be closer. You will need `tip` and the arch support in it.

andrew
2018-12-06 21:27
I'm running tip but didn't pull in the arch support.

andrew
2018-12-06 21:27
I might have missed that step.

greg
2018-12-06 21:28
If you have tip components, then you have arch aware pieces.

andrew
2018-12-06 21:30
No dice on that custom-ipxe btw. Same issue.

andrew
2018-12-06 21:30
No worries. I might be better off abandoning this hardware and just working toward the end game.

andrew
2018-12-06 21:31
fwiw, I know pxe works on those machines.

andrew
2018-12-06 21:32
Maybe I just need to make a 32bit boot env ? It seemed a little overwhelming due to my lack of understanding how kernels and boot parameters work.

andrew
2018-12-06 21:33
I did manage to clone the Centos-7 install and replace the iso with a 32bit one. That kind of worked. now i'm just rambling... Thanks for the help again. I'll keep playing.

ben.le
2018-12-06 21:34
i run into this error when deleting the stage content: `$ drpcli stages destroy debian-9-install` returns `Error: DELETE: readonly: debian-9-install`

greg
2018-12-06 21:42
@ben.le - it is in the read-only content and can't be deleted.

ben.le
2018-12-06 21:44
got it

ben.le
2018-12-06 21:44
thanks @greg

greg
2018-12-06 21:53
@andrew - I have something for you.

greg
2018-12-06 21:54
I looked closer and it didn't work for me either.

greg
2018-12-06 21:54
Sooo I had to do it this way.

greg
2018-12-06 21:54
`drpcli profiles show global --format=yaml > global.yaml`

greg
2018-12-06 21:55
then edit `global.yaml`

greg
2018-12-06 21:55
Add this in the parameters section

greg
2018-12-06 21:55
```
Params:
  custom-ipxe: |
    set base http://mirror.centos.org/altarch/7/os/i386/images/pxeboot/
    prompt -k 0x197e -t 2000 Press F12 to install CentOS... || exit
    kernel ${base}/images/pxeboot/vmlinuz initrd=initrd.img repo=${base}
    initrd ${base}/images/pxeboot/initrd.img
    boot
```

greg
2018-12-06 21:56
Then do: `drpcli profiles update global global.yaml`

greg
2018-12-06 21:56
multi-line strings are hard and shouldn't really be, but there we are.
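
The "edit `global.yaml`" step above can be partly scripted. A minimal sketch, assuming the iPXE script sits in a file named `tmp` as earlier in the thread; `yaml_block_param` is a hypothetical helper name, not a drpcli feature. It only prints the `Params:` fragment, and merging that into `global.yaml` remains a manual edit.

```shell
#!/bin/sh
# Hypothetical helper (not part of drpcli): print a multi-line script as a
# YAML block scalar under Params, ready to paste into global.yaml.
yaml_block_param() {
    # $1 = param name; the script text is read from stdin
    printf 'Params:\n  %s: |\n' "$1"
    # Indent every line so it sits inside the block scalar
    sed 's/^/    /'
}

# Example: yaml_block_param custom-ipxe < tmp
```

The `|` block scalar keeps each line break of the script, which is the part the one-line JSON string form tends to lose.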

andrew
2018-12-06 21:57
Alrighty, I'm on it. I also started thinking that this NIC might not support ipxe. Then i was trying to learn how undionly.kpxe functions.

vlowther
2018-12-06 22:04
@andrew Sorry, we only support 64 bit systems for x86 compatible systems.

andrew
2018-12-06 22:04
No worries

andrew
2018-12-06 22:05
Just getting a kick out of seeing what drp is capable of :smile:

greg
2018-12-06 22:05
umm - in this case, @vlowther why? Yeah - discovery won't work, but he's going direct to the 386 repo

vlowther
2018-12-06 22:06
well, the error message was from Sledgehammer. :slightly_smiling_face:

vlowther
2018-12-06 22:06
and we don't have an i686 compatible Sledgehammer.

andrew
2018-12-06 22:06
I understand from a fundamentals standpoint. I was just using that as a proof that pxe was working. Fully understandable that sledgehammer isn't going to have a 32bit

greg
2018-12-06 22:07
yeah- that was his boot proof test. He's manually created the node and forced its bootenv to the custom-ipxe that points out into the iverse.

greg
2018-12-06 22:07
Though, I'm with you @vlowther that this is a long challenging road to hoe.

andrew
2018-12-06 22:08
Yeah, I didn't want to be a bother at all.

greg
2018-12-06 22:11
@andrew - I'd give it another shot with the right file in place. In the end, @vlowther is right. This machine won't follow any of the reasonable DRP control patterns.

greg
2018-12-06 22:12
Heck, I bet we don't even build a drpcli that would work on it.

greg
2018-12-06 22:12
much less make one available to download.

shane
2018-12-06 22:25
yep - we stopped compiling 386 versions earlier in the year ... for all of the binaries

andrew
2018-12-06 22:28
arm64 is in the mix though , right?

andrew
2018-12-06 22:28
That's probably going to be more exciting for me anyhow :slightly_smiling_face:

greg
2018-12-06 22:29
yes though you may have to move some things around and find them, but yeah.

shane
2018-12-06 22:43
we do a couple of ARM arch builds - and ARM is constantly shuffling arch stuff around - so we (probably) may run in to issues w/ newer versions and need to compile something new (eg v5, v6, v7, v8 ... blah blah blah)

shane
2018-12-06 22:43
but we do have a couple of current ARM 32 and 64bit builds

shane
2018-12-06 22:43
sledgehammer is a different story ... that's a bit more limited right now

andrew
2018-12-06 23:24
Ayy, @greg I did it! with a custom bootenv. :smile:

greg
2018-12-06 23:56
Cool

adam.lemanski
2018-12-07 06:16
Is there somewhere a full list of what kind of `provisioner.*` options are available, like `provisioner.portdelay`? I would like to disable the debug shell after 10 failed downloads of the 2nd image, or at least change the number of retries.

greg
2018-12-07 14:12
There are only two currently.

greg
2018-12-07 14:13
`provisioner.portdelay` and `provisioner.routedelay`

greg
2018-12-07 14:14
portdelay is a sleep on the port before the link is marked up.

greg
2018-12-07 14:14
routedelay is a delay for IPv6 to wait before dhcpv6 is started to allow for router advertisements to show up.

greg
2018-12-07 14:15
@adam.lemanski - let me look at it. I'll try to add something for that today.

adam.lemanski
2018-12-07 14:19
Cool, I will have access to the hardware again next week Tuesday I guess.

dave.parker
2018-12-07 16:26
I've got a weird problem I don't quite understand. I have a DHCP scope on my drp server. A machine got an IP and was successfully built. Now it's just sitting there in the "complete" state. Some time later I tried to build another machine. It was issued a lease on that same IP that was used before, but that's fine since that first machine got a permanent IP and is not using it, and the old lease had expired. So far this is exactly what I would expect.

dave.parker
2018-12-07 16:27
However, when the new machine boots, it doesn't go to discovery, because apparently the server thinks it's the other machine and won't treat it as an unknown system.

dave.parker
2018-12-07 16:27
Can I fix this without deleting the first system?

dave.parker
2018-12-07 16:27
I tried just changing the IP in the machine definition but that didn't help.

greg
2018-12-07 16:34
hmm - this is an interesting issue.

greg
2018-12-07 16:34
You should change the old machine's Address in RackN to match its static IP. Then see if the new machine goes through discovery.

greg
2018-12-07 16:35
@dave.parker -

dave.parker
2018-12-07 16:37
Ok, let me try that.

dave.parker
2018-12-07 16:49
No, doesn't seem to work. It finds a boot file called <IPADDR>.ipxe and boots from that? Which apparently just directs it to boot local.

greg
2018-12-07 16:51
hmm that file exists after changing the Address field of the other machine.

greg
2018-12-07 16:52
actually. Hmmm

greg
2018-12-07 16:53
I bet we don't rebuild the bootenv templates on address change.

greg
2018-12-07 16:53
That is a bug. Change the workflow to local. It should be equivalent to your complete.

greg
2018-12-07 16:53
for the old machine and see if that file disappears.

dave.parker
2018-12-07 16:54
Ok, will do.

greg
2018-12-07 16:55
- tip has a sledgehammer update in tip.

greg
2018-12-07 16:55
```
This change does a couple of things.
1. Fixes centos 7.5 to 7.6. There were some oversights in the previous PR.
2. Updates sledgehammer to centos 7.6 for amd64. Arm rebuild will be required.
3. Sledgehammer now has two new kernel variables that can be used to alter start.

The two vars are:
provisioner.postportdelay which causes a wait in seconds after the link has been brought up, but hasn't started DHCP. Defaults to unset.
provisioner.wgetretrycount which is the number of times to retry the stage2 wget before dropping to ash. The default is 10.
```
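
For anyone curious how a boot script could pick up variables like these: a minimal sketch, assuming they arrive as `key=value` words on the kernel command line (`/proc/cmdline`). `kparam` is a hypothetical helper and this is not Sledgehammer's actual code; the default of 10 is the one stated in the change description.

```shell
#!/bin/sh
# Hypothetical helper: look up key=value on a kernel command line.
kparam() {
    # $1 = key, $2 = default value,
    # $3 = command line text (optional; falls back to /proc/cmdline)
    kp_key=$1
    kp_default=$2
    kp_line=${3-$(cat /proc/cmdline 2>/dev/null)}
    # Unquoted on purpose: split the command line on whitespace
    for kp_word in $kp_line; do
        case "$kp_word" in
            "$kp_key"=*)
                # Print everything after the first '='
                echo "${kp_word#*=}"
                return 0
                ;;
        esac
    done
    echo "$kp_default"
}

# Example: retries=$(kparam provisioner.wgetretrycount 10)
```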

greg
2018-12-07 16:55
@adam.lemanski - this one is for you.

dave.parker
2018-12-07 17:01
That didn't seem to work either. There is no "local" workflow that I could find so I cleared the workflow. Stage is "none" and bootenv is "local". Same issue.

dave.parker
2018-12-07 17:03
Is that an actual file I can just remove, or is it something that is being generated on demand?

greg
2018-12-07 17:06
Generated

greg
2018-12-07 17:06
Hmm

zehicle
2018-12-07 17:25
@dave.parker just curious - physical hardware or virtual?

samuel.mutel
2018-12-07 17:27
has joined #community201812

shane
2018-12-07 17:30
@samuel.mutel $welcome

2018-12-07 17:30
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

samuel.mutel
2018-12-07 17:33
hello

samuel.mutel
2018-12-07 17:34
i am based in Paris so it's late for me. Just connected to the channel to not lose the temporary link to slack.

samuel.mutel
2018-12-07 17:34
will contact you Monday

dave.parker
2018-12-07 17:47
Both machines are physical

greg
2018-12-07 17:48
Okay. I'll try to recreate here

greg
2018-12-07 18:09
@dave.parker - are you using DRP?

greg
2018-12-07 18:09
for DHCP

greg
2018-12-07 18:15
if possible, another option is to make sure the Addresses are correct on your machine then restart DRP. @dave.parker

greg
2018-12-07 18:15
still need to think about this.

dave.parker
2018-12-07 18:30
Yes, we're using DRP for DHCP but only for the build. Once the build is done we assign a permanent IP.

greg
2018-12-07 18:42
hmm - I'm trying to figure out how you got the address again. How tight is your Active IP range on your subnet?

greg
2018-12-07 18:42
I guess I can force it that way.

dave.parker
2018-12-07 18:49
So we use only about 10 or so addresses for DHCP. The intent is that systems only use them to build, then we assign them permanent IP addresses outside that range, and then the DHCP addresses are free to reuse.

greg
2018-12-07 18:49
ok

greg
2018-12-07 18:49
I'll keep trying.

dave.parker
2018-12-07 18:50
Thanks

hugo
2018-12-08 00:51
has joined #community201812

zehicle
2018-12-08 00:56
@hugo $welcome

2018-12-08 00:56
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

fernando.gonzalez
2018-12-09 17:16
has joined #community201812

fernando.gonzalez
2018-12-09 17:17
Hi there!

fernando.gonzalez
2018-12-09 17:18
im using DR-Provision for 2 days now and i think im in love...

fernando.gonzalez
2018-12-09 17:18
great tool!

fernando.gonzalez
2018-12-09 17:18
im using it to deploy OOK with KRIB+Openstack-HELM

fernando.gonzalez
2018-12-09 17:19
so far im able to deploy everything without problem :smile:

fernando.gonzalez
2018-12-09 17:19
but i have a stupid problem i can't manage to solve...

fernando.gonzalez
2018-12-09 17:20
i need to limit drp to listen for DHCP on only one interface

fernando.gonzalez
2018-12-09 17:20
but if i use --dhcp-ifs it still listens on 0.0.0.0:67

fernando.gonzalez
2018-12-09 17:21
not only on the interface i specified on the parameter (dhcp requests from other interfaces are rejected)

fernando.gonzalez
2018-12-09 17:31
and also, i have another question...

fernando.gonzalez
2018-12-09 17:32
when sledgehammer gets the gohai information for each host, how i can use that information on templates?

fernando.gonzalez
2018-12-09 17:32
like any other parameter?

zehicle
2018-12-09 18:13
For gohai, you can use {{.ParamAsJson to expand the gohai data then use jq to select. As an alternate, use the inventory stage in the task library to do that for you and read from inventory/ values.

fernando.gonzalez
2018-12-09 18:15
cool so it aggregates profile params + machine params

fernando.gonzalez
2018-12-09 18:16
with any precedence?

fernando.gonzalez
2018-12-09 18:16
profile params overwrite host params?

shane
2018-12-09 18:31
Order of precedence from least to most:
• global profile (and accompanying Params defined in it)
• Machine specific Profile (and accompanying Params)
• Params directly added on Machine
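
That ordering can be sketched as a lookup that takes the most specific value that is set. `resolve_param` is a hypothetical illustration, not how DRP implements it, and it assumes an empty string means "not set".

```shell
#!/bin/sh
# Hypothetical illustration of the precedence order, most specific first.
resolve_param() {
    # $1 = value set directly on the Machine
    # $2 = value from a Machine specific Profile
    # $3 = value from the global profile
    for rp_value in "$1" "$2" "$3"; do
        if [ -n "$rp_value" ]; then
            echo "$rp_value"
            return 0
        fi
    done
    return 1    # not set anywhere
}
```

So a Param added directly on a Machine wins over the same Param in any Profile, and the global profile is only the fallback.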

shane
2018-12-09 18:32
btw @fernando.gonzalez - $welcome :slightly_smiling_face:

2018-12-09 18:32
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-12-09 18:32
Check out the colordemo repo

shane
2018-12-09 18:34
Also - for your DHCP query - DRP will not generate or answer any DHCP packets on interfaces/subnets that it doesn't have a `Subnet` definition for - so ultimately - DRP only answers if an incoming DHCP packet matches a configured `Subnet` on the host - plus you can "enable/disable" subnets on the fly as needed
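
As an illustration of the "matches a configured `Subnet`" idea: a minimal sketch, assuming "matches" means the address falls inside the subnet's CIDR range. The function names are hypothetical and this is not DRP's code; its real matching logic may differ.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    ip_old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on '.' into $1..$4
    IFS=$ip_old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside network $2 (CIDR form, e.g. 10.0.0.0/24).
ip_in_cidr() {
    cidr_net=${2%/*}
    cidr_bits=${2#*/}
    cidr_mask=$(( (0xFFFFFFFF << (32 - $cidr_bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & cidr_mask )) -eq $(( $(ip_to_int "$cidr_net") & cidr_mask )) ]
}

# Example: ip_in_cidr 192.168.1.77 192.168.1.0/24 && echo "would be answered"
```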

fernando.gonzalez
2018-12-09 18:36
yep but my problem is dr-provision binds to wildcard 0.0.0.0:67

fernando.gonzalez
2018-12-09 18:37
it ignores requests from other interfaces but the problem is the bind

fernando.gonzalez
2018-12-09 18:37
i cant use dnsmasq on the same server

fernando.gonzalez
2018-12-09 18:39
for now i can use dr-provision to serve all requests because my environment is small

shane
2018-12-09 19:01
are there features in DNSMasq that are missing in DRP that you would use ?

fernando.gonzalez
2018-12-09 19:04
nop, i will try to explain my problem

fernando.gonzalez
2018-12-09 19:06
i have 6 servers with 2 1gbps ethernet (one of those is used to dr-provision communications) and 2 10gbps ethernet.

fernando.gonzalez
2018-12-09 19:06
also there is a small server with dr-provision

fernando.gonzalez
2018-12-09 19:07
i have 3 separated vlans (mgmt 1gbps, k8s 10g, ceph 10g)

fernando.gonzalez
2018-12-09 19:10
as i want to use calico and ceph on designated vlans i need someone to give them ip addresses, as i dont have more servers available my idea was to trunk all vlans to actual dr-provision server and use dr-provision on only mgmt vlan and dnsmasq on the other two

fernando.gonzalez
2018-12-09 19:11
with this configuration i dont need to worry about the NIC order on bios because dnsmasq served DHCP will not have BOOTP

fernando.gonzalez
2018-12-09 19:13
(im so bad writing english...)

fernando.gonzalez
2018-12-09 19:13
xD

greg
2018-12-09 21:20
@fernando.gonzalez we don't have a specific bind address option. I'll make sure but I'm pretty sure.

greg
2018-12-09 21:21
I'm with @shane. I'd love to know what dnsmasq is providing that DRP isn't, and I'll fix that

fernando.gonzalez
2018-12-09 23:11
@greg to "bind" dr-provision to an interface im using --dhcp-ifs. with this parameter any dhcp request not coming from that interface will be ignored, but i expected dr-provision to bind to the ip of that interface, not 0.0.0.0:67

fernando.gonzalez
2018-12-09 23:13
there is no special option that dnsmasq has and dr-provision lacks. i want to split the dhcp requests so that, if one machine has ipxe configured on two interfaces, dr-provision doesn't provide booting information to the "wrong" one

fernando.gonzalez
2018-12-09 23:14
(i have 3 vlans trunked to the machine where dr-provision lives)

greg
2018-12-09 23:16
I understand what you have. I don't understand why you want two tools to manage it. :slightly_smiling_face:

fernando.gonzalez
2018-12-09 23:16
can i setup dr-provision to ignore ipxe requests from one subnet?

fernando.gonzalez
2018-12-09 23:17
(or interface)

greg
2018-12-09 23:17
The `dhcp-ifs` only limits packets not the interfaces. This is partially to make it easier to work with DHCP relay agents.

fernando.gonzalez
2018-12-09 23:18
i need dhcp on 2 vlans and dhcp+ipxe on the other

greg
2018-12-09 23:20
Well - DHCP will hand out addresses to all of those. Whether the system boots from DHCP response is up to the machine.

greg
2018-12-09 23:20
Are you moving the machines post or do all the booting machines start from mgmt ip?

fernando.gonzalez
2018-12-09 23:21
no, the machines will stay on mgmt forever in case i need to recycle some nodes

fernando.gonzalez
2018-12-09 23:22
this problem only occurs on the lab environment

greg
2018-12-09 23:22
DRP DHCP doesn't have a non-bootfile option currently.

greg
2018-12-09 23:22
I guess I'm trying to understand your problem.

fernando.gonzalez
2018-12-09 23:22
on production i will have a router with dhcp :smile:

fernando.gonzalez
2018-12-09 23:24
dont worry so much, for this 6 servers i disabled ipxe on bios

greg
2018-12-09 23:24
I don't understand your problem. Is this a real or assumed problem?

greg
2018-12-09 23:25
We haven't really had this problem before.

fernando.gonzalez
2018-12-09 23:25
is a real problem

greg
2018-12-09 23:25
The machine will attempt to pxe from the mgmt interface, once installed, the DHCP server will tell it to boot the local disk.

greg
2018-12-09 23:25
The local disk will boot the os and the other interfaces will then DHCP cleanly.

fernando.gonzalez
2018-12-09 23:26
if i put 100 new servers on this network (3 vlans trunked to server with dr-provision) and some nics attached to a non mgmt vlan have ipxe configured

fernando.gonzalez
2018-12-09 23:27
dr-provision will boot that machines from the first NIC that request ipxe

fernando.gonzalez
2018-12-09 23:29
dr-provision doesnt allow to not boot a machine if dr-provision is providing dhcp

fernando.gonzalez
2018-12-09 23:30
xD

fernando.gonzalez
2018-12-09 23:30
pocahontas style

fernando.gonzalez
2018-12-09 23:30
xD

greg
2018-12-09 23:30
well - the point is the DRP DHCP server tracks the macs of the system and a single machine entry will be created across the nics.

greg
2018-12-09 23:31
To handle this case.

greg
2018-12-09 23:34
but regardless. DRP doesn't do what you want to do.

greg
2018-12-09 23:35
it could. binding is a little tough because we don't have a separate handler system currently. It operates on a single socket model. It isn't hard to change, but not something I want to just throw together.

greg
2018-12-09 23:36
Changing a subnet to not send a bootfile is easy to add a flag for. Would put it on subnet and reservations probably. That way a bootfile/nextserver is not sent for that subnet or reservation.

fernando.gonzalez
2018-12-09 23:40
i think is a good option, specially on lab environments where you put many pieces on few servers

adam.lemanski
2018-12-10 06:20
@greg I'm happy to report that `provisioner.postportdelay` & `provisioner.wgetretrycount` helped me to get in the sledgehammer/discovery stage. Thanks a lot. I'm still a bit curious why there is some timeout but a simple `provisioner.postportdelay=60` works and a `provisioner.postportdelay=30,provisioner.wgetretrycount=30` works too.

masirrobert
2018-12-10 12:58
has joined #community201812

greg
2018-12-10 13:15
My guess is that your switches have portdelay turned on. So, the link comes active and needs some settle time.

greg
2018-12-10 13:16
Depending upon the switch/security gateway, you need more or less time and the switch may stay in discovering state different times.

greg
2018-12-10 13:16
We had one user who needed the pre-`portdelay`, but you need the `postportdelay`. Now we have both.

greg
2018-12-10 13:18
The switch is probably doing spanning tree discovery and has the port up, but not switching. The `postportdelay` or `wgetretrycount` causes the system to wait for the link to stabilize before continuing. @adam.lemanski
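
The wait-then-retry behaviour described above can be sketched as a bounded retry with a pause between attempts. `retry` is a hypothetical helper, not Sledgehammer's code; the point is only that pausing and retrying gives the switch port time to start forwarding.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
retry() {
    # $1 = max attempts, $2 = seconds to sleep between attempts, rest = command
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_n=1
    until "$@"; do
        if [ "$retry_n" -ge "$retry_max" ]; then
            return 1    # give up; the caller decides what happens next
        fi
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}

# Example (the URL variable is a placeholder):
#   retry 30 2 wget -q -O stage2.img "$STAGE2_URL" || exec ash
```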

zehicle
2018-12-10 13:36
@masirrobert $welcome

2018-12-10 13:36
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

bagricola
2018-12-10 14:30
hmm, so I have successfully installed a CL image onto a switch using the cumulus ZTP functionality with DRP

bagricola
2018-12-10 14:31
however, when the switch reboots after installing, the ZTP script runs but... the machine still shows as "down" in the DRP ui, I suspect the join-up.sh script does not end up running drpcli processjobs if the machine already exists

bagricola
2018-12-10 14:41
yep confirmed, stuck in this loop:
```
while ! JSON="$(drpcli machines create "{\"Name\": \"$HOSTNAME\", \"Address\": \"$IP\", \"Arch\": \"$ARCH\", \"Meta\": {\"icon\":\"cloud\"}, \"HardwareAddrs\": $(get_macs)}")"; do
    echo "We could not create a node for ourself, trying again."
    sleep 5
done
```

shane
2018-12-10 15:08
@bagricola what does your ZTP install workflow look like? do you have a "complete" stage at the end? if the Workflow doesn't mark the machine (switch) complete, then the workflow will never be "finished"

shane
2018-12-10 15:08
also - you might see about adding a Stage in the workflow to install the `drpcli` as a resident Agent in the switch during install - so you can continue to feed new Workflow post-install to the Switch - eg configurations, management, power cycle events, etc...

bagricola
2018-12-10 15:09
yeah it's a bit weird because it doesn't really follow how e.g. a centos install or similar works

bagricola
2018-12-10 15:09
theres no concept of a post-install script which you'd usually use to start drpcli

shane
2018-12-10 15:10
the `join-up.sh` script is only for initial Machine object creation and registration - running it again is expected to fail - since it already exists

shane
2018-12-10 15:10
I suppose we could make it smarter so if you re-ran it - it didn't try and create, but just run a dissolvable agent

bagricola
2018-12-10 15:10
there is a complete stage, but it's never executed as the stage never progresses past "install"

bagricola
2018-12-10 15:11
so the install stage basically runs some commands to setup the image to install, then triggers a reboot (that completes the install stage), at which point the switch does its thing. When it reboots after install, it makes a DHCP request and uses the value of a DHCP option to run a script, which is the same thing it does during initial discovery

bagricola
2018-12-10 15:12
and yeah I thought that'd be the case about `join-up.sh`

shane
2018-12-10 15:13
we have some new hooks to deal with a reboot in the middle of the workflow - but I think we'd still need an Agent post-install to deal with that and clean up

shane
2018-12-10 15:13
@greg has some more info related to the reboot in the middle

shane
2018-12-10 15:14
@bagricola would like to get all of the pieces you've pulled together and add to ours so we can add Cumulus switch support to the content packs :slightly_smiling_face:

bagricola
2018-12-10 15:17
so the limitation i have (i guess) is that without modifying the script itself, i'd have to serve a different script post-install to the pre-install / discovery one via dhcp (at the moment I just have a static setting on the subnet to serve `join-up.sh`)

bagricola
2018-12-10 15:19
oh hmm... HMM maybe not

bagricola
2018-12-10 15:19
``` -z Zero Touch Provisioning: Stage a Cumulus ZTP script specified by the ztp path. This option can only be used in conjunction with the -i option. See the example below. ```

bagricola
2018-12-10 15:20
might be possible to stage a custom ZTP script which just runs drpcli processjobs using the existing token

bagricola
2018-12-10 15:25
if that's the case, then I should be able to set control.sh as the ZTP script

bagricola
2018-12-10 15:25
and if that executes post-install, it should start a throwaway runner that can continue!

bagricola
2018-12-10 15:25
(might need the #CUMULUS-AUTOPROVISIONING comment added to control.sh)

greg
2018-12-10 15:37
@bagricola and @shane give me a minute. There is magic here. We don't follow a normal stage install process. How did you trigger the onie-install, @bagricola ?

bagricola
2018-12-10 15:39
`join-up.sh` on initial boot (from dhcp), runs ephemeral drpcli processjobs. Runs discovery without changing the bootenv, then triggering install manually is a new stage (and task) that runs onie-install from drpcli

bagricola
2018-12-10 15:41
(it actually runs automatically after discovery using classification but that doesnt matter)

greg
2018-12-10 15:43
ok - cool - that is what I was doing. What does the stage/task do?

bagricola
2018-12-10 15:44
just runs onie-install twice, and then calls exit_reboot

bagricola
2018-12-10 15:44
i'll find the code, a sec

bagricola
2018-12-10 15:45
```
CUMULUS_IMAGE="{{ .ProvisionerURL }}/files/{{ .Param "cumulus/image" }}"
echo "Staging CL Image ${CUMULUS_IMAGE} for installation with postinstall control..."
onie-install -fi ${CUMULUS_IMAGE} -z {{ .ProvisionerURL }}/machines/$RS_UUID/control.sh
echo "Activating staged CL image and ZTP script..."
onie-install -fa
echo "Starting install by reboot"
exit_reboot
```

bagricola
2018-12-10 15:45
basically this

greg
2018-12-10 15:45
Because, in my head, it would do this:
```
CUMULUS_TARGET_RELEASE="{{.Param "cumulus/release"}}"
CUMULUS_CURRENT_RELEASE=$(cat /etc/lsb-release | grep RELEASE | cut -d "=" -f2)
IMAGE_SERVER_HOSTNAME={{.ProvisionerAddress}}
IMAGE_SERVER="http://"$IMAGE_SERVER_HOSTNAME"/files/"$CUMULUS_TARGET_RELEASE".bin"
if [ "$CUMULUS_TARGET_RELEASE" != "$CUMULUS_CURRENT_RELEASE" ]; then
    ping_until_reachable $IMAGE_SERVER_HOSTNAME
    /usr/cumulus/bin/onie-install -fa -i $IMAGE_SERVER
    drpcli machines workflow {{.Machine.UUID}} discovery
    exit_reboot
fi
exit 0
```
or something like this

greg
2018-12-10 15:48
more that ?

greg
2018-12-10 15:48
But still. similar.

greg
2018-12-10 15:48
Shouldn't it ztp on DHCP start-up of the second image?

bagricola
2018-12-10 15:49
it does, and grabs `join-up.sh` again

greg
2018-12-10 15:49
okay - it fails?

bagricola
2018-12-10 15:49
but that doesn't complete because the machine already exists in drp at that point

greg
2018-12-10 15:50
okay - so - what version of content are you using?

greg
2018-12-10 15:50
Because I changed it to do better.

greg
2018-12-10 15:50
For this purpose.

bagricola
2018-12-10 15:51
tip as of about half an hour ago

greg
2018-12-10 15:51
I don't believe you. :slightly_smiling_face:

greg
2018-12-10 15:51
okay

greg
2018-12-10 15:51
Are you setting hostname in DHCP?

bagricola
2018-12-10 15:52
nope, switch doesn't exist in DRP prior to booting, it's discovered during the discovery process from netbox and renamed

greg
2018-12-10 15:53
okay - that would do it

greg
2018-12-10 15:53
and it fails to create the node the second time.

bagricola
2018-12-10 15:53
yep, cos it already exists

greg
2018-12-10 15:54
not really, but really.

bagricola
2018-12-10 15:54
yeah :smile:

greg
2018-12-10 15:55
okay - I have a fix for you to try.

greg
2018-12-10 15:56
how are you at editing the community content.

bagricola
2018-12-10 15:56
hah, i can probably work it out

bagricola
2018-12-10 15:57
the ansible script i use to deploy everything pulls the latest content down as a yaml file and directly uploads it

bagricola
2018-12-10 15:57
so i'll just do the same manually

greg
2018-12-10 15:57
`drpcli contents show drp-community-content --format=yaml`

greg
2018-12-10 15:57
okay

bagricola
2018-12-10 15:58
`Version: v1.11.1-tip-75-86d3f86a17f6007a7801eee5f3211344d14a5880` version I just pulled using that cmd

greg
2018-12-10 15:58
that is what I have.

greg
2018-12-10 16:01
Here is the diff of what needs to happen.

greg
2018-12-10 16:02
basically, since DHCP isn't sending hostname - it can't find itself on the next startup. This moves the IP check and attempts to find self by Address. This is a little dangerous because we don't require that IP be unique in Machines.
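
The "find self by Address" idea can be sketched as a lookup before the create call. This is a minimal sketch, not the actual `join-up.sh` patch (the diff itself is not in this log); `find_or_create_machine` is a hypothetical name. It uses `drpcli machines list Address=<ip>` and `drpcli machines create`, both of which appear elsewhere in this channel, and assumes `[]` is printed when nothing matches.

```shell
#!/bin/sh
# Allow the CLI to be swapped out (for example with a stub when testing).
DRPCLI=${DRPCLI:-drpcli}

# Sketch: reuse the machine that already has this Address, else create one.
find_or_create_machine() {
    # $1 = IP address of this system, $2 = JSON body for a new machine
    fm_existing="$($DRPCLI machines list "Address=$1")" || return 1
    if [ -n "$fm_existing" ] && [ "$fm_existing" != "[]" ]; then
        # Note: this is a JSON list and may hold more than one machine,
        # because Address is not required to be unique.
        echo "$fm_existing"
        return 0
    fi
    $DRPCLI machines create "$2"
}
```

Since the lookup can return several machines, a real script would still need to decide which one is "self", for example by comparing MAC addresses.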

greg
2018-12-10 16:02
sigh - need to edit it - just a minute

bagricola
2018-12-10 16:11
okay, giving it a shot (I assumed you'd already edited it by the time i got back)

greg
2018-12-10 16:34
It is fixed above.

greg
2018-12-10 16:35
I suspect you have the right one.

john
2018-12-10 16:37
i have a question about .InstallRepos - I did some ubuntu installs and noticed there were no apt repos in any of the sources.list. This led me to the `range .InstallRepos` in the preseed, which led me to here: https://github.com/digitalrebar/provision-content/commit/03318234b5e17c99862e9f50e19e6834b026ecaf

john
2018-12-10 16:37
> This gets rid of all manual repo handling in the preseed (for Debian and Ubuntu) and the kickstart (for Centos, RHEL, etc). It replaces it with the repo-handling helpers that dr-provision has provided based on the package-repositories parameter.
> In particular, this PR removes the meaning the "local-repo" param used to have for Debian/Ubuntu installs, since what it was trying to do can easily be accomplished by setting up package-repositories to point to the mirror of your choice for any OS install repos.

john
2018-12-10 16:39
from that, I gather I have to define a "package-repositories" parameter with a list of repositories. In doing so, do those repositories need to be available during the install?

greg
2018-12-10 16:41
Yes if they are used with InstallSource true

bagricola
2018-12-10 17:24
@greg works with those patched changes :+1:

greg
2018-12-10 17:34
alrighty then!

greg
2018-12-10 17:34
I plan on building some stages/tasks for these actions.

greg
2018-12-10 17:35
Do you have switch actions you are taking that you wish to tell me about. :slightly_smiling_face:

greg
2018-12-10 17:35
Like add vlan to port or enable port or lacp bond ports, ...

greg
2018-12-10 17:35
I think you said you were doing ansible for this is there a link to that stuff you could share?

greg
2018-12-10 17:35
@bagricola - I'll push that in.

greg
2018-12-10 17:35
at some point here.

bagricola
2018-12-10 17:48
cool :slightly_smiling_face: yeah the actual switch config is all done in ansible, using callbacks (initial configuration + ongoing)

bagricola
2018-12-10 17:50
i have an ansible role that uses the inbuilt ansible `nclu` module to do various things, but I'm not sure how much of it is portable as it basically uses my equivalent of netwrangler to do port configs

bagricola
2018-12-10 17:51
but it manages SSH keys and blocks of "configuration" commands (that could do static port config for example)

bagricola
2018-12-10 17:54
I suspect the standard drp tasks for managing SSH and keys and such will work just fine in CL

greg
2018-12-10 17:56
I'm thinking a level higher about tasks/actions to build these on the fly based upon parameters (or external systems).

greg
2018-12-10 17:56
but yes - for the ansible case, the ssh-access task is probably all you need.

bagricola
2018-12-10 17:58
it would be nice to autoconfigure switches purely in DRP :slightly_smiling_face:

greg
2018-12-10 18:01
That is what I'm thinking through...

greg
2018-12-10 18:01
or trying to

bagricola
2018-12-10 18:04
yeah

bagricola
2018-12-10 18:05
its pretty cool that i'm basically there with being able to config these things automatically and it only took a bit of editing to the community config files... which is stuff I could've worked around had that not been an option

bagricola
2018-12-10 18:05
drp is awesome

dave.parker
2018-12-10 19:07
@greg Did you make any progress on duplicating the issue I was having on Friday?

greg
2018-12-10 19:51
Sorry - I haven't @dave.parker

greg
2018-12-10 20:25
@dave.parker - I have replicated it.

dave.parker
2018-12-10 20:26
Ok!

dave.parker
2018-12-10 20:26
Glad it's not just me.

dave.parker
2018-12-10 20:27
I tried restarting drp after changing the IP of the old server and still wasn't able to clear out that issue.

greg
2018-12-10 20:27
hmmm - let me try

greg
2018-12-10 20:28
Restarting after changing the IP address of the static machine, did fix it for me.

dave.parker
2018-12-10 20:28
Oh yeah?

dave.parker
2018-12-10 20:29
I'm on 3.1.1.0 and 1.5.0 if that makes any difference.

dave.parker
2018-12-10 20:29
Let me try the restart again and make sure it actually restarts.

greg
2018-12-10 20:30
so, changing the address on the machine that is now static, restarting drp, and then rebooting the new machine- works for me.

dave.parker
2018-12-10 20:31
Hrm.

dave.parker
2018-12-10 20:31
Ok, let me try again. I just did a stop/start on drp and verified it did indeed stop/start

dave.parker
2018-12-10 20:31
So let's see.

dave.parker
2018-12-10 20:36
Nope, same thing.

greg
2018-12-10 20:36
`drpcli machines list Address=<ip in question>`

dave.parker
2018-12-10 20:37
I get "unknown command "list""

dave.parker
2018-12-10 20:38
Ok, never mind.

dave.parker
2018-12-10 20:38
I had some other crap going on with my terminal.

dave.parker
2018-12-10 20:38
```
$ drpcli machines list Address=162.88.194.246
[]
```

greg
2018-12-10 20:39
`curl http://<drpip>:8091/162.88.194.246.ipxe`

dave.parker
2018-12-10 20:40
404 page not found

dave.parker
2018-12-10 20:40
So why is this not working?

dave.parker
2018-12-10 20:40
I don't get it.

greg
2018-12-10 20:40
Yeah - I don't know.

dave.parker
2018-12-10 20:40
What happens if I delete that lease?

greg
2018-12-10 20:41
The file is served from a machine and bootenv. Shouldn't matter about lease.

dave.parker
2018-12-10 20:41
Ok.

dave.parker
2018-12-10 20:41
Well, let me triple check the settings. Maybe I randomly don't have unknownbootenv set right...

greg
2018-12-10 20:41
The bug I'm seeing is that we aren't cleaning up the filespace when a machine's address changes.

dave.parker
2018-12-10 20:41
That would suck.

greg
2018-12-10 20:42
Restarting rebuilds all from scratch.

dave.parker
2018-12-10 20:42
Hrm.

dave.parker
2018-12-10 20:42
That all looks fine.

dave.parker
2018-12-10 20:43
Let me... I dunno. Try powering the machine off and back on. That's what support usually tells me to do...

greg
2018-12-10 20:43
Not drp - make sure you don't have reservations that would get in the way.

dave.parker
2018-12-10 20:43
Ok.

dave.parker
2018-12-10 20:44
I'll check reservations. I don't think we have any.

dave.parker
2018-12-10 20:44
No reservations.

dave.parker
2018-12-10 20:45
If the magic 30 second poweroff doesn't fix it (it won't, it never does) I guess I'll try deleting the lease just because?

dave.parker
2018-12-10 20:46
Oh I forgot to update all firmware to the latest, that's the other thing support always tells me to do that never works. :smile:

dave.parker
2018-12-10 20:46
Actually if I'm being honest I have had the firmware update actually fix problems. But never the 30 second poweroff.

dave.parker
2018-12-10 20:56
Ok, none of that worked. I deleted the expired lease and the good lease. It got a new IP and did the same thing. So something is borked here.

greg
2018-12-10 21:01
yes - something is strange

shane
2018-12-10 21:02
Are you sure you don't have another DHCP service on that network - or DHCP Relay / IP Helpers forwarding to a different DHCP instance ?

dave.parker
2018-12-10 21:04
...pretty sure? But I could be wrong.

dave.parker
2018-12-10 21:05
Although I do see someone built a machine named dhcp0... That sounds ominous.

dave.parker
2018-12-10 21:05
Let me try a machine that worked once before and see if it works now.

dave.parker
2018-12-10 21:06
I believe the dhcp0 server serves the management subnet. So it shouldn't be a problem. "shouldn't"

dave.parker
2018-12-10 21:15
Ok. The machine I just tried that worked before worked again. But I just noticed this is the first system I'm building in this rack, which is a different subnet from the others. So it could be something borked with that subnet.

dave.parker
2018-12-10 21:15
I'm going to try another system in that rack and see what happens.

greg
2018-12-10 21:15
okay - I'm working on a renderer fix.

dave.parker
2018-12-10 21:16
That'll tell me something. If it works, it tells me something is wonky with this system. If it doesn't work, I probably need to chat with network engineering about this subnet...

greg
2018-12-10 21:43
@dave.parker - I'm pushing a fix through the system on tip that will fix the deregistration problem. Thanks for finding that.

greg
2018-12-10 21:43
It will allow you to change the machine address and let other machines continue reusing the files without restarting the server.

greg
2018-12-10 21:43
You will need to make sure that as you assign static IPs, you update the Machine.Address field appropriately.

greg
2018-12-10 21:44
If you update to tip drp, you should also update to tip content packs and plugins.

greg
2018-12-10 21:44
I'll let you know when it has made it through the build/test system.

dave.parker
2018-12-10 21:45
Ok, cool!

dave.parker
2018-12-10 22:07
Ok. Second machine in the same rack is doing the same thing.

dave.parker
2018-12-10 22:07
Sooo I suspect this is an issue I need to chat with our neteng team about.

dave.parker
2018-12-10 22:07
Thanks for your help guys.

dave.parker
2018-12-10 22:07
At least I uncovered a legit bug. :smile:

ctrees
2018-12-10 22:55
Anyone going to KubeCon ?


zehicle
2018-12-10 23:02
sorry, missing it. we do have some very cool new K8s function

greg
2018-12-11 02:55
@dave.parker - tip now has your fix in it.

diego.oberlin
2018-12-11 09:05
Good Morning! Quick question: I'm trying to configure IPMI Plugin, and when entering 'ipmi/password' containing not only alphanumeric values i got the following error: ```"'ipmi/password': invalid val '***********': json: cannot unmarshal string into Go value of type models.SecureData". ``` Any ideas?

dave.parker
2018-12-11 15:52
Thank you!

greg
2018-12-11 16:16
@diego.oberlin - ipmi/password is a secure parameter. It needs to be set from the command line only currently.

greg
2018-12-11 16:18
Something like this:

greg
2018-12-11 16:18
```
Gregs-MBP-2:provision galthaus$ drpcli profiles set global param ipmi/password to fred
{
  "Key": "Q0utp4jbWhPn21vavttUd29mFyRKfK3o+i1tTXKRUno=",
  "Nonce": "4NG0lG6wyILShm2JZwcpiGdi37+VLKPF",
  "Payload": "7YTmNHCqlNlpJlPy9EHoIQ1KNpeobg=="
}
```

greg
2018-12-11 16:19
The data is locally encrypted so that it isn't stored clear text at rest.

tom.gillman
2018-12-11 17:28
sledgehammer is a RAM boot, correct? No touchy tha disky?

shane
2018-12-11 17:28
correct

shane
2018-12-11 17:29
it's safe to boot it on existing machines to do things like Firmware/BIOS updates, etc ... by design ...

shane
2018-12-11 17:31
just make sure you don't initially boot a workflow that does do something destructive to the host disks :slightly_smiling_face:

gevanoff
2018-12-11 23:31
has joined #community201812

gevanoff
2018-12-11 23:46
Hi. I'm trying to set up Digital Rebar to test it out. The environment I'm working in does some interesting things with DHCP, so I'm thinking I might have to do all of my testing in a local VirtualBox sandbox. Any pointers?


gevanoff
2018-12-11 23:51
Yep, I saw that. I was wondering about the details of "careful setup up[sic] of your network environment..."

gevanoff
2018-12-12 01:06
I'm also assuming that access to the RackN Portal UX is with the commercial product only.

shane
2018-12-12 01:08
The portal is available for open use. There is a hosted (on-prem) version that is commercial. There are specific capacity limits imposed on the publicly accessible portal use.

shane
2018-12-12 01:08
(BTW... $welcome @gevanoff)

2018-12-12 01:08
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

chris
2018-12-12 05:01
has joined #community201812

diego.oberlin
2018-12-12 07:03
Oh I see. Thanks!

shane
2018-12-12 16:44
@chris $welcome

2018-12-12 16:44
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

john
2018-12-12 17:12
something odd with drpcli or drp - i modified a template on disk and used drpcli to upload it. But the result that comes back is the "old" template

john
2018-12-12 17:12
pasting into web ui worked though

john
2018-12-12 17:16
or is upload a one-time thing? I thought it would work like PUT to replace

greg
2018-12-12 17:39
@john - They look similar

greg
2018-12-12 17:40
I usually test with some thing like this: `drpcli templates show virtualbox-other-nics.sh.tmpl | jq .Contents -r`

john
2018-12-12 17:40
if you look in the output, you can see "commented out automatically, native mgm"

john
2018-12-12 17:40
which isn't in my source

greg
2018-12-12 17:41
yeah - I see.

greg
2018-12-12 17:41
okay let me check.

greg
2018-12-12 17:41
Do my command anyway? Always follow up with a get to be sure.
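
The "follow up with a get" check can be scripted. This is only a sketch built on the `drpcli templates show ... | jq .Contents -r` command quoted above; the function name `template_matches` and its arguments are made up for illustration, and it assumes `drpcli` and `jq` are on the PATH with endpoint credentials already set in the environment.

```shell
# Hypothetical helper: compare the template stored in DRP with a local file.
# Usage: template_matches <template-id> <local-file>
# Returns 0 on match, 1 on mismatch, 2 if either side could not be read.
template_matches() {
  # Command substitution strips trailing newlines on both sides, so a
  # difference in the final newline alone does not count as a mismatch.
  stored=$(drpcli templates show "$1" | jq -r .Contents) || return 2
  wanted=$(cat "$2") || return 2
  [ "$stored" = "$wanted" ]
}
```

Something like `template_matches virtualbox-other-nics.sh.tmpl ./virtualbox-other-nics.sh.tmpl || echo "DRP still has the old template"` after each upload would have flagged the stale template right away.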

greg
2018-12-12 17:41
It may be that the cli is returning the old one.

greg
2018-12-12 17:42
it shouldn't

john
2018-12-12 17:42
DRP definitely still has the old one

greg
2018-12-12 17:43
yep busted.

greg
2018-12-12 17:43
looking at it.

zehicle
2018-12-12 17:49
@gevanoff there is only one version of DRP. RackN commercial model is to support the open and provide extensions like UX, content, plugins, etc.

zehicle
2018-12-12 17:52
The UX is available for open source users with sites up to 100 machines. Free registration unlocks some advanced features too. For paid licenses, we allow customers to run their own version on premises.

tom.gillman
2018-12-12 17:52
Why would DRP not hand out a lease for a MAC that it has a reservation for?

greg
2018-12-12 17:53
Off the top:

greg
2018-12-12 17:54
1. subnet containing address is off (maybe)

greg
2018-12-12 17:54
2. address is being offered and a ping reply is coming back from someone else.

greg
2018-12-12 17:55
3. dhclient isn't sending proper mac.

greg
2018-12-12 17:55
3 happens with multihomed hosts sometimes.

john
2018-12-12 17:56
re: dhcp: setting drp preferences to debug level for dhcp helps. also, for even more in-depth troubleshooting, installing "dhcpdump" and running that lets you see packets on the interface, even if drp is misconfigured or not even running

tom.gillman
2018-12-12 17:57
yeah, I'm seeing the dhcp request inbound. just no action. Oddly, the ones on either side of it work just fine.

greg
2018-12-12 17:57
dhcp debug and check logs. Thanks @john for the helpful comment.

greg
2018-12-12 17:58
Alright, @john - you can delete / create the template as a workaround. I'll have a fix in tip shortly.

john
2018-12-12 17:58
:+1:

greg
2018-12-12 17:58
Most of us use content packs and never hit this. :slightly_smiling_face:

john
2018-12-12 17:59
doing content packs as well, but I'm continuously iterating over a couple templates and got tired of copy/pasting into the web UI :slightly_smiling_face:

greg
2018-12-12 17:59
oh - ugh

greg
2018-12-12 17:59
use cli for content packs as well.

greg
2018-12-12 17:59
`drpcli contents bundle ...`

greg
2018-12-12 17:59
is your friend

greg
2018-12-12 18:00
But I understand what you are saying.

john
2018-12-12 18:00
yep yep, doing that :slightly_smiling_face:

john
2018-12-12 18:01
actually, I had a behavior change on that as well...

greg
2018-12-12 18:01
maybe

greg
2018-12-12 18:01
what's up on that one? Or open a github issue. That would be better.

john
2018-12-12 18:03
i'll try and reproduce it. I would do `contents upload pack.yaml` and I would get "already exists" (when before it would replace the existing one), so I had to switch to running `contents update packname - < pack.yaml`
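
For anyone hitting the same "already exists" response, the two commands mentioned here can be chained. A minimal sketch, assuming only the `contents upload` and `contents update` forms quoted in the message; `push_pack` and its arguments are illustrative names, not part of drpcli.

```shell
# Hypothetical helper: install a content pack, replacing it if upload is refused.
# Usage: push_pack <pack-name> <pack-file>
push_pack() {
  if drpcli contents upload "$2"; then
    return 0
  fi
  # Upload was refused (for example "already exists"): replace the pack in place.
  drpcli contents update "$1" - < "$2"
}
```

Note the fallback also fires when the upload fails for a real reason (a broken pack), in which case the update should fail too and its error message is the one to read.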

john
2018-12-12 18:03
you know what.. I just tried again and couldn't reproduce it

greg
2018-12-12 18:04
okay - it was broken for a few revs of tip (at one point, I think).

john
2018-12-12 18:08
i'll see if it happens again. if it does, it's probably a misleading error message about something in the content pack. that was my guess. Thank you.

greg
2018-12-12 18:09
Thanks for finding the bug

tom.gillman
2018-12-12 18:20
here at VS, we exercise software in ways the designers never intended!

john
2018-12-12 18:28
that's a nice way of saying "we are the better idiot" :slightly_smiling_face:

john
2018-12-12 18:42
q: is there a getting started guide to writing plugins or a good example plugin to look at?

greg
2018-12-12 18:44
no to the guide. we've been holding some of that back because it is dangerous, but incrementer is a runnable plugin as an example in the tree.

shane
2018-12-12 18:50
@john - what are you considering doing with plugins? there are a number of commercial plugins in the arsenal already ... curious what you're looking for

john
2018-12-12 18:51
stackstorm integration. we'd like to make calls to stackstorm without running wget/curl commands on the machine

shane
2018-12-12 18:51
sure - there are a couple of community members that have started work on stackstorm integrations - at least one of those is in github

john
2018-12-12 18:51
oh even better :slightly_smiling_face:

shane
2018-12-12 18:53
hmm - well - this one sort of stalled - and is a DRP <> StackStorm <> Device42 integration https://github.com/deusofnull/st2-digital-rebar

shane
2018-12-12 18:54
it's stalled out though - I'll see if I can dig up the other StackStorm integration I know of ...

john
2018-12-12 18:54
ahh.. that's the other way. we have a stackstorm pack to drive DRP with some similar actions

john
2018-12-12 18:55
like we have an action that creates a reservation and then waits for the corresponding lease to show up

shane
2018-12-12 18:55
an integration w/ plugin is a better model in a lot of respects - so actions are driven from the DRP Endpoint side, and not machine-side actions - which may have issues in some security models

shane
2018-12-12 18:55
(eg machines being provisioned don't get direct access to things like CMDB or automation engines... etc.)

gevanoff
2018-12-12 20:04
@zehicle Thanks for the clarification. I'm assessing DRP for a site of thousands of hosts-- looking to modernize the way we do provisioning. Just dipping my feet in the water now and I'm not sure what the licensing fees would be for RackN or what the appetite would be.

zehicle
2018-12-12 20:37
@gevanoff happy to talk 1x1 about it. We have tiers of licenses depending on which components you need ranging from very basic support and out of band management to full enterprise models with co-development.

bagricola
2018-12-13 11:55
when you load a DRP template in tasks view in the UX and it has a syntax error, i get something like `Error parsing inline templates: template: :3: illegal number syntax: "-"` which is... relatively useful for tracking down errors, however my content pack build process right now just uses drpcli contents bundle / create, which I assume just packages up and installs the content but doesn't actually do any syntax checking

bagricola
2018-12-13 11:55
if i wanted to precheck templates to work out if they're valid, is there a suitable way to do that with DRP or some other golang tool I can deploy?

shane
2018-12-13 11:58
@bagricola - right now the only way is to load the bundle in a DRP endpoint to validate all of the pieces - it shouldn't be too hard to fire up a `dr-provision` binary / service (maybe using alternate ports and w/ DHCP disabled) load content packs to check them - then mark them ok in your internal process

bagricola
2018-12-13 11:58
hmm... i am actually doing that

bagricola
2018-12-13 11:59
This is what I get out

bagricola
2018-12-13 11:59
I wonder if because it's missing stages from the community packs and such it just doesn't get far enough to check the template syntax?

shane
2018-12-13 12:00
yes - you'll want to load the `drp-community-content` and any other content packs (eg `task-library`) that you rely on as part of the test process
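
A rough sketch of that test step for a CI job: upload the packs the content relies on first, then the pack under test, and fail the job on the first error. The function name, the file names in the usage line, and the idea of a throwaway test endpoint URL are assumptions; only `drpcli -E` and `contents upload` come from this channel.

```shell
# Hypothetical CI check: upload packs in the order given, stop at the first failure.
# Usage: check_packs <endpoint-url> <pack-file>...
check_packs() {
  endpoint=$1
  shift
  for pack in "$@"; do
    if ! drpcli -E "$endpoint" contents upload "$pack"; then
      echo "content pack failed to load: $pack" >&2
      return 1
    fi
  done
}
```

For example `check_packs https://localhost:8092 drp-community-content.yaml task-library.yaml my-pack.yaml`, with dependencies listed before the pack that needs them.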

bagricola
2018-12-13 12:08
ok :+1:

bagricola
2018-12-13 12:08
is there any way in a content pack to describe that it requires other content packs?

bagricola
2018-12-13 12:09
one of the meta files perhaps?

bagricola
2018-12-13 12:09
I know about required features but not about something similar for content packs themselves..

bastiaan
2018-12-13 12:15
has joined #community201812

shane
2018-12-13 12:15
not baked in - we've talked about it - for a straightforward use case, it would be pretty simple to do - but the problem gets complicated to solve correctly if you end up with circular dependencies

shane
2018-12-13 12:15
@bastiaan... $welcome ... !

2018-12-13 12:15
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

bastiaan
2018-12-13 12:15
Thanks @shane!

bagricola
2018-12-13 12:16
@shane gotcha... I grab the content packs when deploying to production env as it's all done via ansible, but the test process for my own content pack uses gitlab CI so a bit different... shouldn't be too tricky though

bagricola
2018-12-13 12:22
God dammit... if I'd known this was possible earlier... I thought it only accepted a blob of JSON directly not a URL :smile:

shane
2018-12-13 12:30
:slightly_smiling_face: that was a feature request I put in - not only accept JSON or YAML, but file or http path :slightly_smiling_face:

bagricola
2018-12-13 12:34
yeah, that simplifies things a lot :smile: cheers

zehicle
2018-12-13 13:53
Wow! I did not know that either!!!

zehicle
2018-12-13 13:53
@bagricola use update content instead of create each time. It gives better errors

shane
2018-12-13 13:55
^^^ assuming of course ... that you already have the content pack installed :slightly_smiling_face:

bagricola
2018-12-13 16:41
well pretty much all my CL stuff works now, except part of my install process is to enable VRF on CL

bagricola
2018-12-13 16:41
and that breaks the drpcli runner as it runs in the default VRF, so some shenanigans involved in getting it to talk back to the drp server properly

bagricola
2018-12-13 16:41
but... good enough

bagricola
2018-12-13 16:41
:smile:

greg
2018-12-13 17:14
Can you explain that to me in email or direct message at some point?

greg
2018-12-13 17:14
Drpcli should reconnect

gevanoff
2018-12-13 23:47
Does anyone have a recommendation about how to configure the networking on VirtualBox to get a basic test setup running?

gevanoff
2018-12-14 00:45
I know this isn't the first time someone has asked. Or, at least, I hope it isn't


gevanoff
2018-12-14 01:06
Unfortunately, this shows how to configure DRP on the mac to then provision VirtualBox VMs.

zehicle
2018-12-14 01:07
what are you trying to accomplish?

zehicle
2018-12-14 01:07
you want to run DRP in a VM?

gevanoff
2018-12-14 01:08
That's what I was trying to do, thinking it would be the safest, easiest option

gevanoff
2018-12-14 01:08
Maybe it's not, though

zehicle
2018-12-14 01:08
it's not a problem

zehicle
2018-12-14 01:09
you need to create a VM for the DRP server that has internet access via NAT and then a second NIC that is for a host only network

zehicle
2018-12-14 01:09
I'd recommend a Centos machine as the DRP host

gevanoff
2018-12-14 01:10
Yeah, I tried that (on Debian). And when it came up there was no networking configured whatsoever

zehicle
2018-12-14 01:10
yeah, for that host machine you'd need to setup the networking

gevanoff
2018-12-14 01:10
Adding that second network adapter seems to have interfered with the configuration of the NAT interface

zehicle
2018-12-14 01:10
do DHCP on the NAT and then static on the host interface

zehicle
2018-12-14 01:11
that's why we start w/ DRP on the host - it's easier :slightly_smiling_face:

gevanoff
2018-12-14 01:11
hah, ok, and that did it

zehicle
2018-12-14 01:11
the DRP DHCP does not "leak" - you can set it to be on the vbox network only

gevanoff
2018-12-14 01:11
You know, I'll probably be good from here

zehicle
2018-12-14 01:12
I'm around if you need more help, just add my @ handle so I see it

gevanoff
2018-12-14 01:12
I just need to redo the DHCP configuration for the host-only network

zehicle
2018-12-14 01:21
we'll make sure any notes you have after the fact get back into the docs

gevanoff
2018-12-14 02:32
Short version of how to Make It Work In VirtualBox:
- In global tools, host network manager, create the vboxnet0 interface.
- On the VM to host DRP, create one network interface on Bridged Adapter (or possibly NAT) and a second on host-only adapter vboxnet0.
- On the VM to pxe boot, set the network adapter to host-only vboxnet0.
- Back on the DRP host VM, set the interface associated with the Bridged Adapter/NAT to use DHCP (for simplicity) and statically configure the host-only interface.
- Also on the DRP host VM, using `drpcli subnets create` create a subnet with the network configurations appropriate for vboxnet0.
- Set the target VM to boot from network and start it up. It should PXE boot.
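
The VirtualBox side of those steps can also be done from the command line. This is a sketch, not a tested recipe: the function name and VM names are placeholders, both VMs are assumed to exist already and be powered off, and it uses NAT rather than Bridged for the first NIC. Network config inside the DRP VM and the `drpcli subnets create` step still have to be done by hand.

```shell
# Hypothetical helper: wire up a DRP VM and a PXE target VM in VirtualBox.
# Usage: setup_vbox_lab <drp-vm-name> <target-vm-name>
setup_vbox_lab() {
  drp_vm=$1
  target_vm=$2
  # Creates the next free host-only interface (vboxnet0 on a fresh install).
  VBoxManage hostonlyif create || return 1
  # DRP host: NIC 1 on NAT for internet access, NIC 2 on the host-only network.
  VBoxManage modifyvm "$drp_vm" --nic1 nat --nic2 hostonly \
    --hostonlyadapter2 vboxnet0 || return 1
  # PXE target: single NIC on the host-only network, network boot first.
  VBoxManage modifyvm "$target_vm" --nic1 hostonly \
    --hostonlyadapter1 vboxnet0 --boot1 net
}
```

If a vboxnet0 already exists, `hostonlyif create` will make vboxnet1 instead, so check the name it prints before relying on the adapter settings.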

gevanoff
2018-12-14 03:06
Except then the pxe-booted VM doesn't have external network access and can't reach an archive mirror, soooooooo

greg
2018-12-14 03:15
Yes, I usually add a second nic to that VM as well and put it on the NAT network.

greg
2018-12-14 03:15
so, that I can add it as well.

greg
2018-12-14 03:16
Debian/ubuntu installs are annoying.

greg
2018-12-14 03:18
because you have to either change the `package-repositories` parameter to some value which I don't have or configure the second nic while the system boots which is silly. For centos, I add this to the kickstart -
```
%pre
dhclient --no-pid enp0s8
%end
```

greg
2018-12-14 03:18
Not sure the equivalent for preseed.

greg
2018-12-14 03:18
@gevanoff - this is why we recommend playing with centos, because it is self contained.

greg
2018-12-14 03:18
Much better than ubuntu.

gevanoff
2018-12-14 03:35
Welp, that's nice, but I work at a Debian shop.

zehicle
2018-12-14 03:36
then Centos would be like training wheels for you :slightly_smiling_face:

zehicle
2018-12-14 03:37
not a problem - just reduces factors on the first pass

gevanoff
2018-12-14 03:40
Maybe I should stop fussing with VirtualBox and figure out what I need to do to skip DHCP

zehicle
2018-12-14 03:41
starting with the most default config makes it easier to learn. lots of ways to customize from there.

gevanoff
2018-12-14 03:42
Well, I'm most of the way there

gevanoff
2018-12-14 03:43
The difficulty is really the net config

shane
2018-12-14 03:43
@gevanoff I use the DRP endpoint with routing turned on and NAT translation turned on to Route for the VMs through the endpoint

shane
2018-12-14 03:44
Give me a few minutes to get my example for you

gevanoff
2018-12-14 03:44
Makes sense. I thought about it but was lazy.

zehicle
2018-12-14 03:45
the net config is always the hardest part. we keep looking for ways to make it easier...

bastiaan
2018-12-14 07:33
I have a similar set-up running here on my Mac with virtualbox, using Vagrant. I can share if you like

bastiaan
2018-12-14 07:33
Just a sec, I'll drop it on Github


bastiaan
2018-12-14 08:16
Sandbox running?

bastiaan
2018-12-14 08:19
The only problem I have right now, is that the ip forwarding doesn't seem to work, since it can't reach the internet to download packages:

bastiaan
2018-12-14 08:25
enp0s3 (10.20 address) is the NAT i.f., enp0s8 (192.168.33 address) is the internal host-only i.f.

bastiaan
2018-12-14 10:25
Found the problem: I had the default gw set to 192.168.33.1, while it should be 192.168.33.10 (the ip of the drp host)
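
For reference, routing VMs out through the DRP host generally comes down to IP forwarding plus a masquerade rule. The sketch below is a generic Linux setup, not the example shane mentioned (which is not in this log); the function name is made up, and the interface names in the usage line are the ones from the messages above.

```shell
# Hypothetical helper, run as root on the DRP host.
# Usage: enable_vm_routing <nat-interface> <host-only-interface>
enable_vm_routing() {
  wan_if=$1
  lan_if=$2
  # Let the kernel forward packets between interfaces.
  sysctl -w net.ipv4.ip_forward=1 || return 1
  # Rewrite the source address of traffic leaving through the NAT interface.
  iptables -t nat -A POSTROUTING -o "$wan_if" -j MASQUERADE || return 1
  # Allow forwarding from the host-only side out, and replies back in.
  iptables -A FORWARD -i "$lan_if" -o "$wan_if" -j ACCEPT || return 1
  iptables -A FORWARD -i "$wan_if" -o "$lan_if" \
    -m state --state RELATED,ESTABLISHED -j ACCEPT
}
```

Called as `enable_vm_routing enp0s3 enp0s8`. As found above, the VMs also need their default gateway pointed at the DRP host's host-only address (192.168.33.10 in this setup), otherwise none of this is used. These settings do not survive a reboot unless persisted separately.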

samuel.mutel
2018-12-14 13:50
hello

samuel.mutel
2018-12-14 13:51
I am currently working with packer/terraform/containerpilot to build immutable images.

samuel.mutel
2018-12-14 13:52
Currently it is in VM on vsphere and aws. Some input are done through user_data.

samuel.mutel
2018-12-14 13:53
If I create some image with packer using squashfs for example. Is it possible to use digital rebar to provision physical server with this kind of image ?

greg
2018-12-14 14:21
@samuel.mutel - Yes. That is effectively what sledgehammer is. This would allow you to live boot. If you wish to "install" that to a disk, that is what the RackN image-deploy plugin can do.

john
2018-12-14 14:23
so once last week and this morning, the sledgehammer UI has reported the backend not available. Persists across browsers, but curl commands to the API work. dr-provision seems to continue working. Restarting dr-provision brings it back right away.

john
2018-12-14 14:24
It's in this state right now, and I found some log entries. Curious if there's anything else we can look at or try before restarting it

john
2018-12-14 14:26
actually.. while it basically responds, I am seeing that it's not really responding. any requests for things like a list of profiles or tasks is just kind of pending

samuel.mutel
2018-12-14 14:29
@greg If i understand correctly i can use sledgehammer to boot the machine from a live image (centos live cd for example), then download the squashfs image from http/ftp and finally install it on the disk and reboot the machine

greg
2018-12-14 14:30
@john - can you please open a github issue with that stack trace?

john
2018-12-14 14:31
sure thing.

greg
2018-12-14 14:31
Please include DRP version and content version.

greg
2018-12-14 14:32
@samuel.mutel - yes you can do that yourself or use the image-deploy plugin.

samuel.mutel
2018-12-14 14:34
With the image-deploy plugin, after copying the image to the disk, is-it possible to download a file and put it on this disk before rebooting. For the user_data ...

greg
2018-12-14 14:36
yes

samuel.mutel
2018-12-14 14:40
Cool. Thanks.


greg
2018-12-14 14:53
Thanks I'll look at it

shane
2018-12-14 14:54
@bastiaan - very cool - thank you for posting that - we'll take a look at it, and see about adding references to it in the Docs

greg
2018-12-14 15:00
@john - I have a fix for it. Thanks

john
2018-12-14 15:00
:+1:

john
2018-12-14 15:01
that is awesome response time :slightly_smiling_face:

bagricola
2018-12-14 15:16
hmm weird, I'm getting `join-up.sh` stuck in a loop trying to create a new machine again after CL install

bagricola
2018-12-14 15:17
when I copy the token out of the downloaded file and run `drpcli machines list Address=$IP` manually, I get a 403

shane
2018-12-14 15:18
check DRP user/pass pairs (auth) is correct for the CLI call

greg
2018-12-14 15:19
@bagricola - hmm let me check - what version content are you at.

bagricola
2018-12-14 15:19
tip deployed about 20 mins ago

bagricola
2018-12-14 15:19
hmm actually, maybe i was just debugging too slow

bagricola
2018-12-14 15:20
rebooted, copied RS_TOKEN / Endpoint out of the generated ZTP file

bagricola
2018-12-14 15:20
`root@cumulus:~# drpcli machines list Address=10.10.254.10`

bagricola
2018-12-14 15:20
this just returns `[]`

bagricola
2018-12-14 15:20
but the machine is in drp

bagricola
2018-12-14 15:21
```
root@net-test01.lon1:~# RS_KEY="..." drpcli -E https://localhost:8092 machines list | jq '.[].Address'
"10.10.254.10"
```

bagricola
2018-12-14 15:22
interesting, running the same command on the drp server returns the machine as you'd expect

greg
2018-12-14 15:22
Is this a known machine?

bagricola
2018-12-14 15:22
yep

bagricola
2018-12-14 15:23
join-up.sh was run on initial boot, triggered a CL image install using workflow

bagricola
2018-12-14 15:23
at the end of the install the switch reboots into CL, reruns `join-up.sh` from DHCP

greg
2018-12-14 15:23
sigh - this is a little annoying.

greg
2018-12-14 15:25
are you sending hostname in the DHCP message?

bagricola
2018-12-14 15:26
not certain, i need to dump to find out - but the machine *has* been renamed by this point

bagricola
2018-12-14 15:26
(in drp, *not* on the host)

bagricola
2018-12-14 15:27
and `join-up.sh` has the code switched as we tried a couple days ago so it does an IP search if it doesnt find the machine by name

greg
2018-12-14 15:27
Yes - that token only works if the machine is unknown.

greg
2018-12-14 15:27
hmmm

bagricola
2018-12-14 15:28
ahhhhhh

greg
2018-12-14 15:29
okay - one more time.

greg
2018-12-14 15:29
I have something I should have done all along.

greg
2018-12-14 15:29
We are broadening the use of join-up.sh - which is cool, but it was intended for things not really pxe booting. I didn't put the UUID discovery piece in that path.

bagricola
2018-12-14 15:30
yep

greg
2018-12-14 15:32
okay - new thing

zdunn
2018-12-14 15:35
Question (and a link is awesome as an answer): What are the auth options for digital rebar/rackn?

zdunn
2018-12-14 15:35
for context, we are attempting to consolidate a lot of our logins

zdunn
2018-12-14 15:35
and this is one that clearly has a lot of power

greg
2018-12-14 15:36
Not sure about the direction you mean. But, currently, DRP API is user/password or token authenticated using our own user objects.

greg
2018-12-14 15:36
The RackN SaaS is different and probably not what you are asking about.

zdunn
2018-12-14 15:37
DRP is one level for sure

zdunn
2018-12-14 15:37
basically, just about everything else we are using is tied into Google for SSO

greg
2018-12-14 15:37
RackN is looking at adding LDAP/AD support in the future. There could be other integrations.

greg
2018-12-14 15:38
We have leapt on these options because we believe that DRP will run in airgap modes and so SSO is not an immediate priority, but we are starting to see it show up.

greg
2018-12-14 15:39
There are two parts to this future integration (at least). One is the AuthN piece. The other is the AuthZ piece and how those SSO pieces drive tenants and roles.

greg
2018-12-14 15:39
I haven't looked at the Google SSO pieces. Do you have links for those so I can learn me some stuff.

greg
2018-12-14 15:40
@bagricola - update content and try again.

zdunn
2018-12-14 15:41
sure, Google is all just SAML

greg
2018-12-14 15:41
okay - our v2 product has okta integration and it was kinda SAML.

zdunn
2018-12-14 15:53
ok cool

bagricola
2018-12-14 15:56
@greg same thing I think, what'd you change? (just so I can make sure it actually got pulled in)

greg
2018-12-14 15:58
```
diff --git a/content/bootenvs/discovery.yml b/content/bootenvs/discovery.yml
index ec2b7cb..44ec684 100644
--- a/content/bootenvs/discovery.yml
+++ b/content/bootenvs/discovery.yml
@@ -209,11 +209,20 @@ Templates:

     set -x

-    # Code assumes provider has set HOSTNAME correctly!!
-    RS_UUID=$(drpcli machines list Name=$HOSTNAME | jq -r .[0].Uuid)
+    # Check just in case we pxe booted.
+    host_re='rs\.uuid=([^ ]+)'
+    if [[ $(cat /proc/cmdline) =~ $host_re ]]; then
+        RS_UUID="${BASH_REMATCH[1]}"
+    fi
+
+    # Check Hostname to find us
     if [[ $RS_UUID == null || $RS_UUID == "" ]]; then
-        RS_UUID=$(drpcli machines show Name:$HOSTNAME | jq -r .Uuid)
+        RS_UUID=$(drpcli machines list Name=$HOSTNAME | jq -r .[0].Uuid)
+        if [[ $RS_UUID == null || $RS_UUID == "" ]]; then
+            RS_UUID=$(drpcli machines show Name:$HOSTNAME | jq -r .Uuid)
+        fi
     fi

+    # If no uuid, check to see if we have one stored from before
     if [[ $RS_UUID == null || $RS_UUID == "" ]]; then
         # See if we have already been created based on dropping uuid file
```
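
The first step in that change, pulling `rs.uuid=` out of the kernel command line, can be tried on its own without PXE booting anything. The sketch below is a variant that uses `tr` and `sed` instead of the bash regex in the diff, so it also runs under plain `sh`; the function name `extract_rs_uuid` is made up for illustration.

```shell
# Print the value of rs.uuid=... from a kernel command line string,
# or print nothing if the argument is not present.
# Usage: extract_rs_uuid "<command line>"
extract_rs_uuid() {
  # One word per line, keep only the word starting with rs.uuid=,
  # strip the prefix, and take the first hit.
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^rs\.uuid=//p' | head -n 1
}
```

On a booted machine that would be `RS_UUID=$(extract_rs_uuid "$(cat /proc/cmdline)")`. An empty result means the hostname lookup is still needed, which is the case for a switch that never PXE booted.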

greg
2018-12-14 15:58
never mind, it won't work for you.

greg
2018-12-14 15:58
sigh

greg
2018-12-14 15:58
It works for other systems.

greg
2018-12-14 15:58
that actually pxe.

bagricola
2018-12-14 15:58
ahh... yeah :disappointed:

greg
2018-12-14 15:59
So - the problem is that your hostname doesn't match what DRP knows

bagricola
2018-12-14 15:59
yeah as the installer blats whatever the hostname was regardless

greg
2018-12-14 16:01
The only thing I can think of is that the Hostname needs to be sent in the DHCP exchange.

greg
2018-12-14 16:02
but that might not work either for the switch.

bagricola
2018-12-14 16:02
https://docs.cumulusnetworks.com/display/DOCS/Zero+Touch+Provisioning+-+ZTP there's a note in here that says "Make sure to disable the DHCP hostname override setting in your script"

bagricola
2018-12-14 16:02
and then says its already disabled for CL 3.5 and above :confused:

bagricola
2018-12-14 16:06
is DRP able to reply to DHCP requests with the hostname it knows about?

greg
2018-12-14 16:06
No - you can create a reservation and set the hostname there.

greg
2018-12-14 16:07
but the reservation and the machine name are not synchronized.

greg
2018-12-14 16:07
automatically

bagricola
2018-12-14 16:07
hmmmmm

bagricola
2018-12-14 16:07
so i have a step which pulls the machine name and such from netbox

bagricola
2018-12-14 16:08
if i can create the reservation at the same time, that might work

greg
2018-12-14 16:08
yes

bagricola
2018-12-14 16:08
i dont think synchronisation matters as the machine name should never change until next provision

greg
2018-12-14 16:08
The task can create reservations, I believe.

greg
2018-12-14 16:08
our IPMI configuration stage does it.

greg
2018-12-14 16:09
@john - drp tip has your fix in it

bagricola
2018-12-14 16:09
okay, thanks @greg I'll give it a shot

greg
2018-12-14 16:10
cool - let me know how it goes

bagricola
2018-12-14 16:34
ok... creating a reservation manually works, but when the install completes and reboots, it looks like DRP joins the machine back up and then restarts the install process

bagricola
2018-12-14 16:34
i need to somehow tell it that the install stage is complete :smile:

greg
2018-12-14 16:36
in your onie-install stage.

greg
2018-12-14 16:36
change workflows to a complete workflow.

greg
2018-12-14 16:37
after the install, but before the reboot.

greg
2018-12-14 16:37
or before the install if it reboots.

bagricola
2018-12-14 16:38
sigh, ignore my stupidity - i uploaded a 3.7.2 image as 3.7.3 to force the switch to run an install from the start

bagricola
2018-12-14 16:38
so it always fails the version check and always reinstalls :woman-facepalming:

bagricola
2018-12-14 16:39
if the version is correct it'll just skip over the version check, exit 0 and continue with config
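
The version gate described here can be sketched as below. This is only an illustration of the logic, not the actual onie-install task: `needs_install` is a hypothetical helper, and the version strings are the ones from the messages above.

```shell
#!/bin/sh
# Sketch of an idempotent version gate for an install stage.
# needs_install is a hypothetical helper, not part of drpcli or ONIE.

# Returns 0 (true) when the running version differs from the target.
needs_install() {
  [ "$1" != "$2" ]
}

# An image that is really 3.7.2 but was uploaded under the 3.7.3 name
# never satisfies the gate, so every boot triggers another install.
if needs_install "3.7.2" "3.7.3"; then
  echo "version mismatch: reinstall"
else
  echo "version ok: exit 0 and continue with config"
fi
```

With a correctly labelled image the two strings match after the first install, the gate falls through, and the workflow moves on instead of looping.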

bagricola
2018-12-14 16:41
lucky this is PoC hardware cos it's probably hammering the storage reinstalling this 20 times over :smile:

smedefind
2018-12-14 17:29
Is there any work being done to allow for google auth in the web UI?

smedefind
2018-12-14 17:29
or SSO in general

greg
2018-12-14 17:30
@zdunn asked about it above-^

smedefind
2018-12-14 17:30
damn, he beat me to it

greg
2018-12-14 17:31
:slightly_smiling_face:

zehicle
2018-12-14 17:46
@smedefind webui can do it via cognito. I'd need to collaborate to integrate but the code is there anyway

zehicle
2018-12-14 17:47
Webui auth is different than endpoint auth

greg
2018-12-14 18:02
@zehicle = yep covered above.

greg
2018-12-14 18:03
@smedefind - when Rob says WebUI via cognito, he is meaning the RackN SaaS auth piece. The DRP Endpoint API still uses the auth pieces I mention above. The WebUI works as a bridge between the two systems. This way DRP Endpoint doesn't phone home and is still airgappable.

bagricola
2018-12-14 18:15
@greg auto reservation step worked :smile: bit of shenanigans to look up the MAC for the dhcp-learned IP which I think I can probably improve, and the only other sticking point is getting DRP to run in the management VRF. I think I basically need to modify the service file that gets installed so it runs drp inside the VRF... but that's a problem for monday :slightly_smiling_face: Thanks for your help this week!

greg
2018-12-14 18:19
well - you can multi-home the DRP endpoint and it should work fine.

greg
2018-12-14 18:20
cool

dave.parker
2018-12-14 19:38
Here's an interesting thing I just discovered.

dave.parker
2018-12-14 19:40
Right now the install process I'm working on uses rebar to get a DHCP lease to do the install, then afterwards it sets the IP of the system to its permanent IP (which is not in the DHCP range) so it doesn't conflict with later systems trying to reuse that IP after the lease expires.

dave.parker
2018-12-14 19:42
But if I try to reinstall that system without re-discovering it, it doesn't work. Because when it goes through the reinstall the machine gets another DHCP IP, but its system profile has a different IP, so it never picks up that I want it to go back through the install process.

dave.parker
2018-12-14 19:42
I guess I have to unset the IP before I reinstall

greg
2018-12-14 19:45
Yes - DRP doesn't change the Address. The discovery process sets it.

greg
2018-12-14 19:45
One of the internal questions we've asked is whether sledgehammer should update the Address of the machine when it boots.

dave.parker
2018-12-14 19:45
I'm wondering why it doesn't go through discovery again then?

dave.parker
2018-12-14 19:46
It just acts like it's set to "local"

greg
2018-12-14 19:46
Because the mac addresses find an ipxe file and set it.

greg
2018-12-14 19:46
Does it get into sledgehammer?

dave.parker
2018-12-14 19:46
No.

dave.parker
2018-12-14 19:46
But this might be another IP address re-use issue.

greg
2018-12-14 19:46
likely

dave.parker
2018-12-14 19:46
With a system that didn't get changed.

dave.parker
2018-12-14 19:47
Anyway. I figured it was probably a design choice. But it messed with me for a bit just now. :slightly_smiling_face:

greg
2018-12-14 19:48
Yeah - at some point, we may want to really think through your use case and see if we can make it better.

zehicle
2018-12-14 20:04
have you considered creating a DHCP reservation w/ the intended IP address?

greg
2018-12-14 20:15
@dave.parker - that might be good. what @zehicle says

dave.parker
2018-12-14 20:31
Yeah, that could be a solution. I'll discuss it with the team.

john
2018-12-14 21:46
we have similar scenario. we have an external process (stackstorm) set the reservation, wait for the lease to appear, then update the ip address in the machine definition

john
2018-12-14 21:47
if you send down the right DRP credentials to the machine (instead of just the usual machine token), you could actually create the reservation and do all that on the machine itself. we just chose to move it upstream a bit

john
2018-12-14 21:51
this assumes you have previously loaded a profile named after the serial and it has a primary_ip parameter

greg
2018-12-14 21:51
The machine token does have the ability to create reservations (not list or delete).

john
2018-12-14 21:53
hmm.. I might have to revisit that. it always failed for us and i figured it was a restriction on the token

john
2018-12-14 21:53
(failed on creating the reservation... we since learned there is no reason to delete the lease)

greg
2018-12-14 21:54
Here is the go for token construction:

greg
2018-12-14 21:54
```
t, _ := NewClaim(r.Machine.Key(), grantor, ttl).
	AddRawClaim("machines", "*", r.Machine.Key()).
	AddRawClaim("params", "get", "*").
	AddRawClaim("stages", "get", "*").
	AddRawClaim("jobs", "create", r.Machine.Key()).
	AddRawClaim("jobs", "get", r.Machine.Key()).
	AddRawClaim("jobs", "update", r.Machine.Key()).
	AddRawClaim("jobs", "actions", r.Machine.Key()).
	AddRawClaim("jobs", "log", r.Machine.Key()).
	AddRawClaim("tasks", "get", "*").
	AddRawClaim("info", "get", "*").
	AddRawClaim("events", "post", "*").
	AddRawClaim("reservations", "create", "*").
	AddRawClaim("reservations", "*", models.Hexaddr(r.Machine.Address)).
	AddMachine(r.Machine.Key()).
	AddSecrets("", grantorSecret, r.Machine.Secret).
	Seal(r.rt.dt.tokenManager)
```

john
2018-12-14 21:56
well i am going to have to try that process again. it would be nice to get the machine on the right ip as part of its discovery process

greg
2018-12-14 21:57
it should work. You just can't search for a reservation and you can only edit your own.

greg
2018-12-14 21:57
Note the "key" of a reservation is the IP address, so it is a little strange.

john
2018-12-16 00:59
had to make sure the reservation didn't already exist

john
2018-12-16 00:59
```
echo "updating machine name to ${FQDN}"
drpcli machines update $RS_UUID "{\"Name\": \"${FQDN}\", \"Address\": \"${PRIMARY_IP}\"}" | jq '.Name + " " + .Address'

# see if ip reservation exists
drpcli reservations exists ${PRIMARY_IP}
result=$?
if [ "$result" == 0 ]; then
  echo "reservation for ${PRIMARY_IP} already exists"
else
  echo "creating reservation for ${PRIMARY_IP}"
  drpcli reservations create "{\"Addr\": \"${PRIMARY_IP}\", \"Options\": [{\"Code\": 12, \"Value\": \"${FQDN}\"}], \"Token\": \"${MAC}\", \"Strategy\": \"MAC\"}"
fi
```

john
2018-12-16 01:00
in my stackstorm code, i actually delete any instance of the reservation already existing and then create a new one.
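
For anyone adapting the script above, the reservation payload can be built in one place to keep the quoting manageable. A minimal sketch, assuming the same fields john's script sends (Addr, Token, Strategy, and DHCP option 12, which is the Host Name option); `reservation_json` is a hypothetical helper name, and the sample values are made up.

```shell
#!/bin/sh
# reservation_json is a hypothetical helper that prints the JSON body
# used for a MAC-based reservation carrying a hostname (DHCP option 12).
# Arguments: IP address, FQDN, MAC address.
# Note: no JSON escaping is done, so the inputs must not contain quotes.
reservation_json() {
  printf '{"Addr": "%s", "Options": [{"Code": 12, "Value": "%s"}], "Token": "%s", "Strategy": "MAC"}\n' \
    "$1" "$2" "$3"
}

# Made-up sample values; in the script above these would come from
# PRIMARY_IP, FQDN and MAC, and the result would be handed to
# "drpcli reservations create".
reservation_json "192.0.2.10" "leaf01.example.com" "52:54:00:12:34:56"
```

Because the reservation is keyed by IP address, the payload is fully determined by these three values, which is also why the existence check in the script only needs `${PRIMARY_IP}`.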

zehicle
2018-12-16 01:12
did not realize that the CLI had exists!

diego.oberlin
2018-12-17 08:32
Good Morning! I'm currently having an issue with CentOS 7.5 provisioning: Installation gets stuck at "Running post-installation scripts". Anyone has ever been through this before? Thanks!

diego.oberlin
2018-12-17 08:47
Another question: what about LDAP integration with DRP for users?

bagricola
2018-12-17 09:18
you need to find the job log, should at least give you a hint as to why it's getting stuck or hanging. If it shows nothing, then I assume the script is failing before DRP takes back over, if it's the text based installer you can switch to a new tmux (i think?) pane and look at anaconda.log which should be in the root dir

bastiaan
2018-12-17 12:24
Anyone have experience using krib? I'm trying to do a "demo" install of k8s using the krib content, but when executing the "krib-live-cluster" workflow it fails on the docker-install stage:
```
Error: Error writing to file /var/cache/yum/x86_64/7/base/gen/filelists_db.sqlite: [Errno 28] No space left on device
Command exited with status 1
Action docker-install.sh.tmpl finished
```

bastiaan
2018-12-17 12:25
I'm on DR 3.11.0, content all on tip.

bastiaan
2018-12-17 12:25
I have 3 Virtualbox VMs each with 20GB disk (SATA)..

zehicle
2018-12-17 12:41
Check your drive sizes again. 20 GB may be too small or not set as you think

bastiaan
2018-12-17 12:43
Thnx Rob! It looks right now that it mounts the entire first disk as `/docker`. Then it fails on the next stage when it tries to create something in `/var`. So that makes me wonder: if the entire harddisk is used for Docker, then where does the root-filesystem live?

bastiaan
2018-12-17 12:44
If it's purely in RAM, then I guess maybe they don't have enough memory...

greg
2018-12-17 13:27
Yes, how much memory and what type of install are you doing, live or install? @bastiaan

bastiaan
2018-12-17 13:29
@greg I tried doing the live with 3x 1GB

bastiaan
2018-12-17 13:30
now it's succeeding while trying install with a single 4GB box

bastiaan
2018-12-17 13:30
so I guess it's indeed a memory thing

greg
2018-12-17 13:33
Yes - if you are doing sledgehammer, 1.5G is close, but 2GB is better. With Krib, you need more still because you are running a full system. I usually do 3-4GB for those.

bastiaan
2018-12-17 13:35
Ah, I see. Thanks!!!

greg
2018-12-17 13:36
All that is assuming you are doing "live" systems. Which the demos do.

bastiaan
2018-12-17 13:36
I still didn't buy a new macbook, so I'm stuck with just 16GB... I guess that's why you guys prefer to use Packet :wink:

greg
2018-12-17 13:38
Yeah - on my 16GB mac, I run DRP in the Host OS (darwin) as an isolated install (yes, I homebrew the heck out of my system). Then I can usually run 3 or 4 VMs with 3GB RAM and 20GB disk images.

bastiaan
2018-12-17 13:47
Check... I'm running DRP as a VM as well... so that's pretty tight.. Thnx for the tip!

tom.gillman
2018-12-17 14:33
What's the advantage of standalone as opposed to containerized?

shane
2018-12-17 14:33
running it however you want to ...

shane
2018-12-17 14:37
if you have container deployment solution in your environment - then obviously the container version makes it easy to deploy as a single artifact

greg
2018-12-17 15:16
@tom.gillman - we have some people doing CICD pipeline testing already. They are using that pipeline to build a container for running DRP. Fits their model.

bagricola
2018-12-17 16:10
Janky, but this gets the runner working on CL after VRF enablement :slightly_smiling_face:

bagricola
2018-12-17 16:11
CL itself uses a different trick to put services into a vrf, by using a systemd generator - but it only uses unit files from `/lib/systemd/system` (https://github.com/CumulusNetworks/vrf/blob/a0dbc346abb043b8a3b2bea24df4aa64c9c6acd1/systemd/systemd-vrf-generator#L14) so doesn't affect the installed drpcli services :disappointed:

bagricola
2018-12-17 16:56
ahh, so close

bagricola
2018-12-17 16:56
it starts, but the machine is set to `Runnable: false`, I assume because it did an exit_reboot

bagricola
2018-12-17 16:57
should drpcli-init.service run after every boot or only once? :thinking_face:

greg
2018-12-17 17:04
On boot

greg
2018-12-17 17:05
@bagricola

bagricola
2018-12-17 17:07
gotcha, that'll be the bit i'm missing

bagricola
2018-12-17 17:07
so the usual systemctl enable then?

greg
2018-12-17 17:08
yes - that should do it

bagricola
2018-12-17 17:08
ahh need to modify all the Exec lines to make sure it uses the VRF to download the files

greg
2018-12-17 17:08
It is one of the things I've been thinking about doing to the join-up.sh script. Have it install the two services (for systemd) and then exit. Letting the services do their magic.

bagricola
2018-12-17 17:19
yeah it's annoying that the CL stuff only works on services in `/lib/systemd`, otherwise it'd be much easier to convert the services

bagricola
2018-12-17 17:19
don't really wanna put custom service files in /lib/ cos they'll probably get destroyed on upgrade

vlowther
2018-12-17 17:26
hmmm... cumulus ignores /etc/systemd/system ?

bagricola
2018-12-17 17:37
it doesn't ignore it, but it has a custom systemd generator (linked above) that modifies any service files listed in `/etc/vrf/systemd.conf` so that they automatically run inside the management vrf when it's enabled

bagricola
2018-12-17 17:38
but it only works on the service files in /lib, not the ones in /etc, which I assume is because it uses the /etc directory to output the modified ones

bagricola
2018-12-17 17:42
urgh, prepending the vrf command (e.g. `/usr/bin/vrf task exec %I /usr/local/bin/drpcli machines update "$RS_UUID" '{ "Runnable": true }'`) causes the drpcli command to fail parsing arguments :confused:

bagricola
2018-12-17 17:42
(`Error: drpcli machines update [id] [json] [flags] requires 2 arguments`)

greg
2018-12-17 20:56
- We've cut a release for v3.12.0 - here are the release notes - https://github.com/digitalrebar/provision/releases/tag/v3.12.0

greg
2018-12-17 20:57
If you update to v3.12.0 DRP, please update to v1.12.0 contents and v2.6.0 plugins. Thanks.

greg
2018-12-17 20:57
Also - there is a sledgehammer update.

zehicle
2018-12-17 21:47
last 2018 meetup tomorrow! https://www.meetup.com/digitalrebar/ I was thinking we'd talk 2019 roadmap items - both RackN and open Digital Rebar.

zehicle
2018-12-17 21:48
we'd also love some feedback about making trial / limited use licenses easier to get and potentially working with the Linux Foundation

ludovic.prevost
2018-12-18 14:31
has joined #community201812

ludovic.prevost
2018-12-18 14:31
morning

shane
2018-12-18 14:32
$welcome - @ludovic.prevost - and good morning

2018-12-18 14:32
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

bagricola
2018-12-18 15:29
just in case anyone is following along with my cumulus shenanigans, the correct fix for running drpcli in a VRF was `/bin/ip vrf exec %i drpcli...`

bagricola
2018-12-18 16:50
And I'm done, CL deploy + redeploy is complete :smile:

greg
2018-12-18 16:58
With a VRF!

bagricola
2018-12-18 16:59
yip

bagricola
2018-12-18 17:01
This is what I ended up with... probably a bit racey between the net commit and reboot but seems to work right now so...

tom.gillman
2018-12-18 17:05
I know I've asked this before, but how often are updated packages rolled into sledgehammer? For me in this particular instance, I'm trying to use `lsblk` to populate block device info into the machine record, and it would be really handy to have the `2.27.1` version, which has support for JSON output instead of having to format it myself. The sledgehammer version looks to be at `2.23.2`

greg
2018-12-18 17:06
We use stock Centos 7.X- we are now at centos 7.6

shane
2018-12-18 17:13
@tom.gillman - we typically only roll Sledgehammer updates when needed - there isn't a real release schedule/cadence for it. I agree that JSON output w/ any tooling is good.

greg
2018-12-18 17:13
got distracted. RackN will sometimes pull in new packages or replace tools for age reasons. the image-deploy plugin does this.

shane
2018-12-18 17:14
This is one of those areas that the redhat based server solutions are rather annoying with ancient tools/library versions unless you pull in external repos and override some of the tool versions

shane
2018-12-18 17:14
to Greg's point - we often add in version checks and upgrades in to workflow pieces if we have known versions we need

shane
2018-12-18 18:17
- v031 Meetup at 11am PST today (in 45 mins) details: https://www.meetup.com/digitalrebar/events/lchdhpyxqbxb

shane
2018-12-18 18:17
Controlling VM scheduling with Kubernetes and Digital Rebar Provision "KRIB" content ... 2018 Retrospective ... 2019 Roadmap !!

mpeter
2018-12-18 20:36
has joined #community201812

zehicle
2018-12-18 20:40
@mpeter $welcome

2018-12-18 20:40
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

mpeter
2018-12-18 20:40
thanks!

zehicle
2018-12-18 21:35
Posted video from meetup: https://youtu.be/GGiUInzNfuw

ben.le
2018-12-18 23:50
hey guys, i created a new bootenv and see it in the list. However, when i create a new machine; it displays "BootEnv labs-debian-9-install" is not available

ben.le
2018-12-18 23:52
$ drpcli bootenvs list | jq .[].Name |grep labs-debian
"labs-debian-8-install"
"labs-debian-9-install"

gevanoff
2018-12-18 23:53
How should I configure DRP if I want it only to act as a PXE host and not a DHCP server?

ben.le
2018-12-18 23:55
run dr-provision with option --disable-dhcp

ben.le
2018-12-18 23:55
/usr/local/bin/dr-provision --static-ip=10.29.123.42 --disable-dhcp

ben.le
2018-12-18 23:55
hope that helps!

ben.le
2018-12-19 00:01
I found the issue; missing the linux and initrd.gz files

gevanoff
2018-12-19 00:21
Thanks! I'll give it a try

adam.lemanski
2018-12-19 03:49
got a bit confused with the ipmi-commands plugin :confused: I thought that would be enough to trigger the ipmi-configure stage. added it to my very basic discovery flow and it is just not trying

adam.lemanski
2018-12-19 04:10
tried specifying the values directly to the machine and with username&password in global and address in the machine profile but I still can't trigger/list ipmi actions mentioned in: https://provision.readthedocs.io/en/latest/doc/content-packages/ipmi.html#actions . `drpcli -d machines actions ID` shows only the certs actions :confused:

adam.lemanski
2018-12-19 04:11
using tip portal, tip plugin

greg
2018-12-19 04:11
You set the configure parameters.

adam.lemanski
2018-12-19 04:11
both

greg
2018-12-19 04:11
You need to set the action parameters

greg
2018-12-19 04:12
well - do you try to run the ipmi-configure?

adam.lemanski
2018-12-19 04:12
configure didn't work for me, so at least I wanted to test the actions

greg
2018-12-19 04:12
Does the machine have the ipmi/address parameter set?

adam.lemanski
2018-12-19 04:12
ipmi/address is defined directly for the machine

greg
2018-12-19 04:13
Did you check the plugin pieces to see if it has errors?

greg
2018-12-19 04:14
You uploaded the plugin.

greg
2018-12-19 04:14
Did you create the ipmi plugin?

adam.lemanski
2018-12-19 04:15
nothing in the logs that the plugin was not successful

greg
2018-12-19 04:15
Under the plugins nav panel? Did you create the ipmi plugin there?

adam.lemanski
2018-12-19 04:16
lol thanks, that explains it xD

adam.lemanski
2018-12-19 04:16
looks like the other plugins are enabled by default?

adam.lemanski
2018-12-19 04:18
actions are now listed for the machine

greg
2018-12-19 04:19
yes - it is a historical aspect of ipmi that I can't easily change.

adam.lemanski
2018-12-19 04:30
awesome, configure and action working now for me :slightly_smiling_face:

bagricola
2018-12-19 11:51
hmm... is there any way to get the "version" of tip (dr-provision / drpcli rather than content) without downloading it?

bagricola
2018-12-19 11:53
f.ex I just spent 15 mins trying to work out why ZTP booting didn't work in a new DC, and it's because it had an old ver of DRP without the DHCP `string:` options. My ansible playbook for deploying DRP assumes that if the binary exists then it's up to date, so doesn't automatically update the DRP binary. What I'd like to do is check if the binary installed is the latest version, but from what I can tell right now I'd need to download the 145MB dr-provision.zip file first to run dr-provision --version to compare strings against the running version

zehicle
2018-12-19 12:36
Yes, the -# is the commits since the release
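
A rough sketch of pulling that commit count out of a version string. The exact shape of DRP's tip version strings is not shown in this thread, so this assumes a `git describe`-style string (`<tag>-<count>-g<hash>`); `commits_since_release` is a hypothetical helper and the sample strings are made up.

```shell
#!/bin/sh
# commits_since_release is a hypothetical helper. It assumes a
# "git describe"-style version string such as v3.12.0-45-g1a2b3c4 and
# prints the number of commits since the tagged release, or 0 when the
# string carries no commit count (a plain release tag).
commits_since_release() {
  n=$(printf '%s\n' "$1" |
    sed -n 's/.*-\([0-9][0-9]*\)-g[0-9a-f][0-9a-f]*$/\1/p')
  echo "${n:-0}"
}

# Made-up sample strings: an installed build and a newer tip build.
installed=$(commits_since_release "v3.12.0-12-g0a1b2c3")
latest=$(commits_since_release "v3.12.0-45-g1a2b3c4")

if [ "$installed" -lt "$latest" ]; then
  echo "installed build is behind tip by $((latest - installed)) commits"
fi
```

Comparing counts like this only makes sense when both strings sit on the same release tag; across different tags the tag itself has to be compared first.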

bagricola
2018-12-19 13:18
yeah, just have to find the commit from "latest" tip and compare i guess

bagricola
2018-12-19 13:18
also, nothing is ever easy, tried to use this ZTP stuff on some new switches in an actual DC

bagricola
2018-12-19 13:18
straight out the box

bagricola
2018-12-19 13:19
```
root@cumulus:/run/cumulus# date
Mon Jan 29 20:19:05 GMT 2001
```

bagricola
2018-12-19 13:19
everything fails because https doesn't work, the clock is so far out ntpd can't fix it and ntpdate isn't installed by default :woman-facepalming:
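
The failure mode here is that ntpd, by default, refuses to correct an offset larger than its panic threshold of 1000 seconds, so a clock stuck in 2001 has to be stepped before ntpd can take over. A minimal sketch of that check, assuming a trusted reference time is available from somewhere (how it is obtained is left open); `clock_needs_step` is a hypothetical helper.

```shell
#!/bin/sh
# clock_needs_step is a hypothetical helper; the default threshold
# mirrors ntpd's default panic threshold of 1000 seconds.
# Usage: clock_needs_step LOCAL_EPOCH REFERENCE_EPOCH [THRESHOLD_SECONDS]
clock_needs_step() {
  skew=$(( $2 - $1 ))
  if [ "$skew" -lt 0 ]; then
    skew=$(( -skew ))
  fi
  [ "$skew" -gt "${3:-1000}" ]
}

local_epoch=980799545      # 2001-01-29 20:19:05 UTC, the switch's clock
reference_epoch=1545225540 # 2018-12-19 13:19:00 UTC, a trusted reference

if clock_needs_step "$local_epoch" "$reference_epoch"; then
  # On a real switch this is where the clock would be stepped, for
  # example with GNU date: date -u -s "@$reference_epoch"
  echo "clock is off by more than the threshold: step it before starting ntpd"
else
  echo "clock is close enough: let ntpd slew it"
fi
```

A coarse step is enough for this purpose: once the clock is within the threshold, certificate validation and ntpd both start behaving again.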

zehicle
2018-12-19 14:04
Ouch

zehicle
2018-12-19 14:04
Interesting use case for insecure comms

zdunn
2018-12-19 14:10
@bagricola I am interested in trying to replicate what you've done with ZTP

zdunn
2018-12-19 14:10
has that all been on cumulus?

bagricola
2018-12-19 14:10
yep, CL based

bagricola
2018-12-19 14:10
testing on real hardware

zdunn
2018-12-19 14:11
Cool. We are an arista shop

zdunn
2018-12-19 14:11
They are always pushing their ZTP solution at us

bagricola
2018-12-19 14:19
i'm not sure how comparable the arista ZTP stuff is with the CL side of things, looks like it can run a script though which is basically what I've been doing with CL... where you might run into issues is that it appears that arista uses boot option 67 (bootfile-name), which I believe is also used internally by DRP when it serves a PXE image

bagricola
2018-12-19 14:19
that isn't a problem with CL because they use option 239, which doesn't conflict with anything DRP does to PXE boot a system

bagricola
2018-12-19 14:20
so for me it's as easy as adding option 239 on the boot subnet pointing at the `join-up.sh` script, normal servers ignore it and use whatever DRP provides

zdunn
2018-12-19 14:21
interesting

zdunn
2018-12-19 14:21
our arista configs are fairly static

zdunn
2018-12-19 14:22
but it would be nice to be able to be consistent with the rest of the stack

shane
2018-12-19 14:28
@zdunn - if you use separate subnets/IP space for your switches - you can set the bootfile via the Subnet config on the DRP endpoint side to point to the binary that Arista needs - stage that on DRP in the tftpboot directory

diego.oberlin
2018-12-19 14:31
Hi , my CentOS 7.5 provisioning workflow is stuck at task "centos-drp-only-repos", any ideas?

zdunn
2018-12-19 14:33
@shane that's what I am trying to work for

shane
2018-12-19 14:37
@bagricola I assume you saw the Cumulus posting on setting time? https://docs.cumulusnetworks.com/display/DOCS/Setting+Date+and+Time
```
cumulus@switch:~$ net add time ntp server 4.cumulusnetworks.pool.ntp.org iburst
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
```
`ntpd -q ntp.time.server.com` is also used to replace `ntpdate` functionality.
Barring any of that, it sounds like you need a scripted Date/Time step as part of your initial switch bring-up. If you can't get time via NTPD reliably, you can make a call to any number of public API endpoints to get the current time (eg: `wget -q -O - http://worldclockapi.com/api/json/utc/now | jq '.'`)

shane
2018-12-19 14:37
not sure if you've tried `iburst` to force the clock to sync faster - still may barf on too old time

bagricola
2018-12-19 14:38
@shane actually found that sprig has a `now` function, gonna see if I can set the time approximately based on that

bagricola
2018-12-19 14:38
I don't really want to go fully down the ntp route in DRP as that's already handled by ansible later in the provisioning process

bagricola
2018-12-19 14:39
and this all happens during discovery, so if the CL image is old then it gets reimaged about 5s later and all modifications are lost anyway :smile:

shane
2018-12-19 14:52
@diego.oberlin - do you have `local-repo` set on the Machine ?

shane
2018-12-19 14:53
and did you do the `drpcli bootenvs uploadiso centos-7.5.1804-install` - to provide the ISO contents on the DRP Endpoint to install against ?

bagricola
2018-12-19 15:10
hrm... so drp disables the `now` sprig function as misleading, but it looks like the `dateInZone` function defaults to `time.Now()` if no date is given :thinking_face:

diego.oberlin
2018-12-19 15:18
Oh, sorry, I just had done that with the centos-7-install only... my bad

shane
2018-12-19 15:47
@bagricola - yes - there are 3 or 4 Sprig functions intentionally disabled because we believe they can be either destructive unintentionally, or misleading

shane
2018-12-19 15:48
disabled ones are: _ago, now, env, or expandenv_

shane
2018-12-19 15:49
It's the same ISO - the `centos-7-install` bootenv refers to the same ISO (it's a "friendly helper" for current C7 version)

diego.oberlin
2018-12-19 15:55
hmm ok... anyway, I hadn't tried to provision centos before. I'm getting the following error now:
```
Log for Job: 5f5fbf56-50db-433b-b76d-427f85338581
Starting task xc-vm-centos-7:centos-7.5.1804-install:centos-drp-only-repos on test-drp
Starting command ./centos-drp-only-repos-centos-drp-only-repos.sh.tmpl
Command running
Loaded plugins: fastestmirror
Determining fastest mirrors
http://mirrors.edge.kernel.org/centos/7/atomic/x86_64/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: mirrors.edge.kernel.org; Unknown error"
Trying other mirror.
...
...
failure: repodata/repomd.xml from centos-7-everything-atomic: [Errno 256] No more mirrors to try.
http://mirrors.edge.kernel.org/centos/7/atomic/x86_64/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: mirrors.edge.kernel.org; Unknown error"
Command exited with status 1
Action centos-drp-only-repos.sh.tmpl finished
Task centos-drp-only-repos failed
Marked machine test-drp as not runnable
Updated job for xc-vm-centos-7:centos-7.5.1804-install:centos-drp-only-repos to failed
Task signalled that it failed
```

shane
2018-12-19 16:01
@diego.oberlin - we generally ask that you NOT use the "threads" function in Slack - it doesn't send proper notifications in channel, and it's extremely poorly implemented ... In this case - the errors indicate you don't have DNS resolution for the Repo mirrors - are you using DHCP on DRP -and handing out DNS in the DHCP options (Subnets configuration) ??

diego.oberlin
2018-12-19 16:23
Oh ok, sorry for that (thread) No, i'm not using DHCP on DRP

shane
2018-12-19 16:32
do you have the DNS servers specified in your DHCP options ?

shane
2018-12-19 16:33
(DHCP option 6)

ben.le
2018-12-19 21:53
I just ran into this error message when installing debian 9 with a custom stage and bootenv build

ben.le
2018-12-19 21:53
Dec 19 20:46:49 labs-provision dr-provision[11286]: [0:15]Static FS: Dynamic file error for /machines/ffa198b1-0891-453f-a6ab-117dd4117617/seed: template: :949:3: executing "net-seed.tmpl" at <.Install>: error calling Install: No idea how to handle repos for labs-debian-9-install

greg
2018-12-19 22:04
You need to make sure that you have an entry in the repo parameter with that name. Or leave the name in your bootenv to debian-9. @ben.le

ben.le
2018-12-19 22:07
$ drpcli bootenvs show labs-debian-9-install
{
  "Available": true,
  "BootParams": "priority=critical console-tools/archs=at console-setup/charmap=UTF-8 console-keymaps-at/keymap=us popularity-contest/participate=false passwd/root-login=false keyboard-configuration/xkb-keymap=us netcfg/get_domain=unassigned-domain console-setup/ask_detect=false debian-installer/locale=en_US.utf8 console-setup/layoutcode=us keyboard-configuration/layoutcode=us netcfg/dhcp_timeout=120 netcfg/choose_interface=auto url={{.Machine.Url}}/seed netcfg/get_hostname={{.Machine.Name}} root=/dev/ram rw quiet {{if .ParamExists \"kernel-console\"}}{{.Param \"kernel-console\"}}{{end}}",
  "Description": "Labs Debian 9 install BootEnv",
  "Errors": [],
  "Initrds": [
    "initrd.gz"
  ],
  "Kernel": "linux",
  "Meta": {
    "color": "black",
    "feature-flags": "change-stage-v2",
    "icon": "linux",
    "title": "FireEye Content"
  },
  "Name": "labs-debian-9-install",
  "OS": {
    "Codename": "",
    "Family": "debian",
    "IsoFile": "debian-9-6.0-amd64-netinst.iso",
    "IsoSha256": "",
    "IsoUrl": "http://labs-packages-mirror01.eng.fireeye.com/provision/debian-9.6.0-amd64-netinst.iso",
    "Name": "labs-debian-9-install",
    "Version": "9.6"
  },
  "OnlyUnknown": false,
  "OptionalParams": [
    "part-scheme",
    "operating-system-disk",
    "provisioner-default-user",
    "provisioner-default-fullname",
    "provisioner-default-uid",
    "provisioner-default-password-hash",
    "kernel-console",
    "proxy-servers",
    "dns-domain",
    "local-repo",
    "proxy-servers",
    "ntp-servers",
    "select-kickseed"
  ],
  "ReadOnly": true,
  "RequiredParams": [],
  "Templates": [
    {
      "Contents": "",
      "ID": "default-pxelinux.tmpl",
      "Name": "pxelinux",
      "Path": "pxelinux.cfg/{{.Machine.HexAddress}}"
    },
    {
      "Contents": "",
      "ID": "default-ipxe.tmpl",
      "Name": "ipxe",
      "Path": "{{.Machine.Address}}.ipxe"
    },
    {
      "Contents": "",
      "ID": "default-pxelinux.tmpl",
      "Name": "pxelinux-mac",
      "Path": "pxelinux.cfg/{{.Machine.MacAddr \"pxelinux\"}}"
    },
    {
      "Contents": "",
      "ID": "default-ipxe.tmpl",
      "Name": "ipxe-mac",
      "Path": "{{.Machine.MacAddr \"ipxe\"}}.ipxe"
    },
    {
      "Contents": "",
      "ID": "select-kickseed.tmpl",
      "Name": "seed",
      "Path": "{{.Machine.Path}}/seed"
    },
    {
      "Contents": "",
      "ID": "net-post-install.sh.tmpl",
      "Name": "net-post-install.sh",
      "Path": "{{.Machine.Path}}/post-install.sh"
    }
  ],
  "Validated": true
}

ben.le
2018-12-19 22:08
$ drpcli stages show labs-debian-9-install
{
  "Available": true,
  "BootEnv": "labs-debian-9-install",
  "Description": "Debian 9 install stage for FireEye Labs environment.",
  "Errors": [],
  "Meta": {
    "color": "yellow",
    "icon": "download",
    "title": "FireEye Content"
  },
  "Name": "labs-debian-9-install",
  "OptionalParams": [],
  "Profiles": [],
  "ReadOnly": true,
  "Reboot": false,
  "RequiredParams": [],
  "RunnerWait": true,
  "Tasks": [
    "enforce-public-key-authentication",
    "default-user-access"
  ],
  "Templates": [],
  "Validated": true
}

ben.le
2018-12-19 22:09
I have "select-kickseed" in the bootenv

greg
2018-12-19 22:20
Since you named the OS.Name "labs-debian-9-install", you need to make sure that the package-repositories has entries for that.

greg
2018-12-19 22:20
I think if you left OS.Name as debian-9-install, then it would reuse those entries.

ben.le
2018-12-19 22:22
the package-repositories were created when i ran drpcli bootenv uploadiso labs-debian-9-install

greg
2018-12-19 22:22
it is a parameter with defaults.

greg
2018-12-19 22:22
It can be overridden in global or profiles or machines.

greg
2018-12-19 22:22
If you are using your repos, you need to update that parameter.

ben.le
2018-12-19 22:24
if i used the debian-9-install bootenv of the drp-community-content then i got "No kernel modules were found. This probably is due to mismatch...."

greg
2018-12-19 22:24
actually, I think you are right.

ben.le
2018-12-19 22:24
I did update that parameter

greg
2018-12-19 22:24
You need to create a global package-repositories with your info.

ben.le
2018-12-19 22:31
here's my global package-repositories settings

ben.le
2018-12-19 22:31
for debian-9

ben.le
2018-12-19 22:31
- tag: "debian-9-install"
  os:
    - "debian-9"
  installSource: true
  url: "http://packages.eng.fireeye.com/debian"
  distribution: "stretch"
  components:
    - main
    - contrib
    - non-free
- tag: "debian-9-updates"
  os:
    - "debian-9"
  url: "http://packages.eng.fireeye.com/debian"
  distribution: "stretch-updates"
  components:
    - main
    - contrib
    - non-free
- tag: "debian-9-security"
  os:
    - "debian-9"
  url: "http://packages.eng.fireeye.com/debian-security"
  securitySource: true
  distribution: "stretch/updates"
  components:
    - main
    - contrib
    - non-free

greg
2018-12-19 22:31
Your names don't match

greg
2018-12-19 22:32
You need to create entries for lab-debian-9

greg
2018-12-19 22:32
or whatever you named it.

ben.le
2018-12-19 22:33
thank you for pointing that out

ben.le
2018-12-19 22:33
i will correct and try it again

greg
2018-12-19 22:33
the `os:` section can be added to

greg
2018-12-19 22:33
it is a list
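
The mismatch greg describes comes down to the `os:` list in each `package-repositories` entry: the bootenv's OS.Name has to appear in it. A minimal sketch of one entry carrying both names, written from shell so it can be checked before it is loaded into DRP. The scratch file and the `OS_NAME` variable are illustrative assumptions; the repository values are the ones ben.le posted, and `labs-debian-9-install` is the OS.Name greg quoted.

```shell
# Hypothetical helper variable: set it to whatever OS.Name the custom bootenv uses.
OS_NAME="${OS_NAME:-labs-debian-9-install}"

# Scratch file for the fragment (location is arbitrary).
repo_file="$(mktemp)"

# One package-repositories entry whose os: list names both the stock
# OS name and the custom one, so either bootenv can match it.
cat > "$repo_file" <<EOF
- tag: "debian-9-install"
  os:
    - "debian-9"
    - "${OS_NAME}"
  installSource: true
  url: "http://packages.eng.fireeye.com/debian"
  distribution: "stretch"
  components:
    - main
    - contrib
    - non-free
EOF

# Sanity check: confirm the custom OS name made it into the os: list.
if grep -q "\"${OS_NAME}\"" "$repo_file"; then
    echo "os list includes ${OS_NAME}"
fi
```

The updates and security entries would need the same extra `os:` line; loading the result into the global profile is left to the usual drpcli or UI workflow.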

ben.le
2018-12-19 22:38
do you have any global content on GitHub that I can compare with?

greg
2018-12-19 22:38
The default on the parameter.

ben.le
2018-12-19 22:40
Thanks @greg

adam.lemanski
2018-12-20 03:42
first, I'm really falling in love with KRIB :slightly_smiling_face: I'm trying to get it running with 3 masters but `krib-config` is failing for 2 of 3:
```
Node master2.metal is not yet ready.
NAME STATUS ROLES AGE VERSION
master1.metal Ready master 2m23s v1.12.3
master2.metal NotReady master 76s v1.12.3
master3.metal Ready master 74s v1.12.3
Node master3.metal is ready.
Node master1.metal is ready.
Node master2.metal is ready.
Node master3.metal is ready.
{ "color": "green", "feature-flags": "change-stage-v2", "icon": "anchor" }
Adding env=dev label to machine master3.metal
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Command exited with status 1
Action krib-config.sh.tmpl finished
Task krib-config failed
```
I could not find anything in the templates which wants to communicate to `localhost:8080`. Any hint?

adam.lemanski
2018-12-20 03:43
besides that it does not continue for the 2 masters:
```
NAME STATUS ROLES AGE VERSION
master1.metal Ready master 20m v1.12.3
master2.metal Ready master 19m v1.12.3
master3.metal Ready master 19m v1.12.3
worker1.metal Ready <none> 18m v1.12.3
worker2.metal Ready <none> 18m v1.12.3
worker3.metal Ready <none> 18m v1.12.3
worker4.metal Ready <none> 18m v1.12.3
worker5.metal Ready <none> 18m v1.12.3
worker6.metal Ready <none> 18m v1.12.3
```
some further info: `etcd/servers` & `krib/cluster-masters` are predefined in the profile

adam.lemanski
2018-12-20 08:23
added `--overwrite=true` to `my-krib-config.sh.tmpl` but the labels on the masters still don't look consistent:
```
kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master1.metal Ready master 6m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master1.metal,node-role.kubernetes.io/master=
master2.metal Ready master 5m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master2.metal,node-role.kubernetes.io/master=
master3.metal Ready master 5m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=master3.metal,node-role.kubernetes.io/master=
worker1.metal Ready <none> 4m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=worker1.metal
worker2.metal Ready <none> 4m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=worker2.metal
worker3.metal Ready <none> 4m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=worker3.metal
worker4.metal Ready <none> 4m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=worker4.metal
worker5.metal Ready <none> 4m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=worker5.metal
worker6.metal Ready <none> 4m v1.12.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,builder=krib,env=dev,kubernetes.io/hostname=worker6.metal
```
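
The `localhost:8080` error in the earlier output is what kubectl typically prints when it finds no kubeconfig and falls back to its default server address, so the labeling step may have run without cluster credentials on the two masters that are missing `builder=krib,env=dev`. A sketch of applying the labels with an explicit kubeconfig and `--overwrite`, as a dry run by default. The `KUBECTL` and `KUBECONFIG_PATH` variables and the `label_node` helper are hypothetical; `/etc/kubernetes/admin.conf` is the usual kubeadm location and is assumed here, not taken from the KRIB templates.

```shell
# Dry run by default: prints the commands. Set KUBECTL=kubectl to execute them.
KUBECTL="${KUBECTL:-echo kubectl}"
# Assumed kubeadm default path; adjust to wherever the admin config lives.
KUBECONFIG_PATH="${KUBECONFIG_PATH:-/etc/kubernetes/admin.conf}"

# label_node NODE LABEL... : apply labels, replacing any existing values.
label_node() {
    node=$1
    shift
    $KUBECTL --kubeconfig="$KUBECONFIG_PATH" label nodes "$node" "$@" --overwrite=true
}

# Apply the same two labels to every master from the listing above.
for master in master1.metal master2.metal master3.metal; do
    label_node "$master" builder=krib env=dev
done
```

Passing the kubeconfig explicitly means a missing credentials file fails with a clear file error instead of a connection refusal.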

adam.lemanski
2018-12-20 08:45
hmm seems I've got some calico issues now. most deployments fail with something like: ``` Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "da971c119ea96ff5899626ec968e84f5f9557083c0e0e26a927dcd94614f0c84" network for pod "rook-discover-spxb7": NetworkPlugin cni failed to set up pod "rook-discover-spxb7_rook-ceph-system" network: error adding host side routes for interface: calicf4b87eac40, error: route (Ifindex: 8, Dst: 192.168.32.65/32, Scope: 253) already exists for an interface other than 'calicf4b87eac40' Back-off restarting failed container ``` dashboard failed too but after deleting the failed pods it scaled up again and worked. for rook it does not work after deletion of the pods

greg
2018-12-20 14:41
@adam.lemanski - you appear to be trying everything at once. I'm not sure we've looked at Rook in a while.

andrew
2018-12-20 22:09
Fresh TIP DRP server here on ubuntu 18.04. Followed the guide up to the sledgehammer bootenv. On a baremetal machine the initial PXE boot works, but then it gets kicked to a shell before it can download root.squashfs. I can manually grab the file with wget after it bails, so communications are working fine. It just appears it's not waiting long enough for DHCP and then fails to get the file before it gets an IP. Any ideas on where I should poke around to get discovery working?

shane
2018-12-20 22:11
do you have "port fast" on the switch ports for the machine failing ? if not - the port may take 30 seconds or longer to enable due to Spanning Tree

andrew
2018-12-20 22:12
Curse you STP, Thanks for the idea. I'll go check the switch.

shane
2018-12-20 22:12
there is also a new feature in v3.12.0, released earlier this week, that may help

andrew
2018-12-20 22:12
fwiw :

andrew
2018-12-20 22:13
Pretty sure I'm running 3.12

shane
2018-12-20 22:17
see the `provision.*` kernel options in the v1.12.0 content pack that enables/supports the feature: https://github.com/digitalrebar/provision-content/releases/tag/v1.12.0

shane
2018-12-20 22:18
make sure you've updated content and sledgehammer along with v3.12.0 release version

greg
2018-12-20 22:25
@andrew - from a continuance of testing perspective, you can exit ash and it will attempt to re-wget.

andrew
2018-12-20 22:26
Oh! thanks.

andrew
2018-12-20 22:46
Sure enough the switch was slowing things down. Thanks again.

shane
2018-12-20 22:46
awesome - that's an easy fix for us :slightly_smiling_face:

adam.lemanski
2018-12-21 00:41
@greg nope, not everything at once, one workflow step after another. I'm just trying to get the krib-install workflow running with my ha-krib profile. `rook` is just an example; like I mentioned, I had the same issue with the krib-dashboard step before.

adam.lemanski
2018-12-21 00:42
I've got the basic setup working (kubeadm etc) by allowing overwriting of node labels, and now have issues creating containers using calico, with the above message

adam.lemanski
2018-12-21 00:45
I've run the provisioning more than 20 times to discover issues, and I try to fix them.

adam.lemanski
2018-12-21 02:39
new day, new krib provisioning :slightly_smiling_face: looks like the dashboard link was updated and I will try using calico 3.4. if successful you can expect a PR later

hannahervin
2018-12-21 03:55
has joined #community201812

zehicle
2018-12-21 03:56
@hannahervin $welcome

2018-12-21 03:56
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

shane
2018-12-21 03:58
@adam.lemanski - KRIB has special considerations if you are running it over and over, it's not very idempotent in that respect

shane
2018-12-21 03:59
Presumably you've found the clear profile routines for multiple runs

adam.lemanski
2018-12-21 03:59
noticed that already, always re-setting the hardware with another workflow

shane
2018-12-21 03:59
And we may have missed some of them for add-on pieces (like rook or calico bits)

shane
2018-12-21 04:00
The profile used to store the cluster state and info specifically needs to be cleaned between consecutive runs

adam.lemanski
2018-12-21 04:00
christmas party starting soon so I guess PR will have to wait but so far I think I will contribute some minor calico improvements

shane
2018-12-21 04:01
Excellent! Happy Party time!

adam.lemanski
2018-12-21 04:01
yeah, created a template profile from which I always clone

shane
2018-12-21 04:01
Very good.....!

adam.lemanski
2018-12-21 04:02
since I deal with real hardware all that stuff takes quite some time to test but I prefer testing on what will run later in production for now

adam.lemanski
2018-12-21 04:47
trying to add the possibility to set the cluster ip for the calico etcd via ` curl -gfsSL {{ .Param "provider/calico-etcd-config" }} | sed "s/clusterIP: 10.96.232.136/clusterIP: ${{ .Param "provider/calico-etcd-clusterip" }}/g" | kubectl apply -f -` but somehow it resulted in cutting off the first character/digit of the address :confused: any idea why?

adam.lemanski
2018-12-21 04:49
while `provider/calico-etcd-clusterip: "172.16.160.100"` & `provider/calico-etcd-config: https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/hosted/etcd.yaml` are set

adam.lemanski
2018-12-21 04:49
``` Starting calico networking... daemonset.extensions/calico-etcd created The Service "calico-etcd" is invalid: spec.clusterIP: Invalid value: "72.16.160.100": provided IP is not in the valid range. The range of valid IPs is 172.16.160.0/20 Command exited with status 1 Action my-krib-config.sh.tmpl finished Task my-krib-config failed ```

john
2018-12-21 17:13
i've pinned my setup on v3.12.0 and i picked the community content of v1.12.0 as latest. But the web ui is telling me that v1.11.0 is the newer content pack

john
2018-12-21 17:13
drp-community-content : Digital Rebar Provision Community Content
Version (Current): v1.12.0-0-07692141149597ca0535204c851e4ff964fc91ce
Version (New): v1.11.1-0-11b8fd04814bb8425d8eabce59c113e4e2c12c75

john
2018-12-21 17:14
is this a UI glitch or are there newer bits in v1.11.0?

greg
2018-12-21 17:18
SaaS needs updating. I forgot to change the recommended.

greg
2018-12-21 17:18
in the SaaS DB. I'll do that shortly.

john
2018-12-21 17:31
not anything urgent, just saw that and found it interesting

john
2018-12-21 17:33
I now have ansible installing the docker image, content pack, initial setup, and isos, pinned on those versions. the newest dockerfile and --net is working for me

greg
2018-12-21 19:05
should be updated now.

kraven
2018-12-22 22:15
has joined #community201812

kraven
2018-12-22 22:22
hello. trying to boot 3 dell workstations with dr-provision in docker and dhcp proxy. it's working exactly as expected and all 3 come up into sledgehammer wait.

2018-12-22 22:22
Time to feed the :bear:!

kraven
2018-12-22 22:23
I'm running into problems at the `drpcli machines update <UUID> '{ "Workflow": "centos7" }'` step. I see the machine reboot but this time when it gets into the PXE I get "PXE-E53: No boot filename received"

kraven
2018-12-22 22:24
then no boot device found, press any key to reboot the machine

kraven
2018-12-22 22:25
the machine won't ever pxe boot again until I go into rackn and delete the machine and then manually reboot it. Then it will come up into sledgehammer-wait again.

kraven
2018-12-22 22:26
I'm stuck there. I'm following the quickstart and configuring most things via drpcli as detailed in the guide

kraven
2018-12-22 22:40
I have static dhcp set for all 3 machines in my router. I see the requests in the provision log before it times out

kraven
2018-12-22 22:40
```
dr-provision2018/12/22 22:39:41.536879 [227:33]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:166 [227:33]No matching subnet, will respond to 0.0.0.0 from 10.0.0.6
dr-provision2018/12/22 22:39:56.692861 [228:34]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:567 [228:34]xid 0xd1808b: Another DHCP server may be on the network: <nil>
dr-provision2018/12/22 22:40:02.612311 [229:35]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:166 [229:35]No matching subnet, will respond to 0.0.0.0 from 10.0.0.6
dr-provision2018/12/22 22:40:02.613577 [230:36]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:534 [230:36]xid 0xe827e5: Ignoring request for DHCP server 10.0.0.254
dr-provision2018/12/22 22:40:02.612311 [229:35]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:166 [229:35]No matching subnet, will respond to 0.0.0.0 from 10.0.0.6
dr-provision2018/12/22 22:40:02.613577 [230:36]dhcp:dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:534 [230:36]xid 0xe827e5: Ignoring request for DHCP server 10.0.0.254
```

kraven
2018-12-22 22:41
10.0.0.6 is the br0 ip of the provision container. 10.0.0.254 is my router/dhcp server

kraven
2018-12-22 22:43
I get the first dhcp.go:166 4 times and then what I pasted above

kraven
2018-12-22 22:59
only other thing that is strange is one of the 3 machines always has a null gohai inventory but the other 2 are full

greg
2018-12-23 00:01
@kraven - how is your Subnet in DRP configured?

zehicle
2018-12-23 03:04
@kraven there may be something causing gohai to error on that machine. try running from the CLI `drpcli gohai`

kraven
2018-12-23 10:35
drp subnet is the same 10.0.0.0/24 as my router but in proxy mode. It works for the first boot fine and if I delete the machine so it must be something with the url that the pxe is trying to pull but I don't see anything more when I turn on --dhcp-debug=2

kraven
2018-12-23 10:40
I'll dig through the code and see where I need to turn on debug or add debug to see the http and tftp requests

kraven
2018-12-23 10:44
ahh prefs, still learning the product...

kraven
2018-12-23 10:51
that's more insightful. with the machine already discovered and I just manually reboot it I get this:

kraven
2018-12-23 10:51
5 times then the pxe client times out

kraven
2018-12-23 10:55
then I delete the machine in DRP and reboot it and I get the same thing once and then I see this and it boots:

kraven
2018-12-23 11:00
I'm going to try disabling my router dhcp and turn on full dhcp in DRP and see if I get different results

kraven
2018-12-23 11:06
okay, so second boot works when I switch to centos-install with DRP in full dhcp instead of proxy, so something is not going right in proxy mode

kraven
2018-12-23 11:07
router is asuswrt-merlin using dnsmasq

kraven
2018-12-23 11:48
this is cool https://github.com/solarkennedy/diyipmi. would be easy to modify to run on really cheap esp8266 wifi module to add serial-tty ipmi to any physical machine

zehicle
2018-12-23 12:30
@kraven proxy mode is not as widely used, so you may have found a bug. For those traces, please use the snippet feature in slack to improve readability

zehicle
2018-12-23 12:32
hmmm, or if you are, they are coming in full for me instead of slack hiding them

kraven
2018-12-23 12:58
I removed them for now so it doesn't ugly things up

kraven
2018-12-23 12:58
new to slack but I'll figure out snippet mode and add them

kraven
2018-12-23 13:00
so I did an ubuntu install and after it's done it reboots and then hits pxe again and times out like it was doing before with proxy dhcp

kraven
2018-12-23 13:01
if I go into the bios and set hdd as second boot option then it boots up after pxe times out and I can see in the DRP machine page as complete

kraven
2018-12-23 13:02
but I thought that pxe should still be working but tell the machine to local boot instead of timing out?

kraven
2018-12-23 13:06
maybe I'll wipe my data directory for drp and start over. I'm running similar to https://provision.readthedocs.io/en/latest/doc/faq-troubleshooting.html#example-docker-volume-usage except I have a br0 network with IP rather than host networking so I don't get any port clashes

kraven
2018-12-23 13:08
I did have a problem with that docker container because it didn't have the drp-community-content package added when it came up. I uploaded via drpcli and then went from there.

kraven
2018-12-23 13:50
nevermind there, I started with fresh volume and I didn't need to download drp-community-content.yaml this time

kraven
2018-12-23 13:53
still same proxy bug but I'm able to get it to never pxe boot now by starting with empty machine list and then adding a reservation for the machine mac address. once I remove the reservation it will boot.

kraven
2018-12-23 13:55
so I'm guessing when the machine gets discovered it's adding a reservation for that mac automatically


kraven
2018-12-23 14:22
is there a Dockerfile to build the dr-provision.zip?

shane
2018-12-23 14:27
@kraven - a Reservation is not automatically created by DRP for a machine - DRP doesn't know what you want to do for an IP management scheme - so re-assigning IPs after initial discovery, or creating/injecting appropriate Reservations is left up to the operator

shane
2018-12-23 14:28
the `tools/` directory contains the build scripts - `build.sh` is the main build script

shane
2018-12-23 14:29
and of course - there's the `Dockerfile` for the docker

kraven
2018-12-23 14:41
okay, I can prob find a go base image that I can build with. I don't normally install anything locally

kraven
2018-12-23 14:42
I was just guessing on the reservation since having the reservation and having the machine had the same result but thank you for clarifying

kraven
2018-12-23 14:48
@zehicle drpcli gohai from that one machine with a null one throws an index out of range error

zehicle
2018-12-23 14:49
That would explain the empty param.

kraven
2018-12-23 14:50
strange because it's exact same hardware/bios/etc... as another machine that is fine

zehicle
2018-12-23 14:51
Looks like something unexpected in DMI

kraven
2018-12-23 14:51
OptiPlex 9010

zehicle
2018-12-23 14:51
Every machine is a snowflake

kraven
2018-12-23 14:51
yeah

kraven
2018-12-23 14:53
office was going to send a bunch to be destroyed when they refreshed so they gave me a few. easier and more powerful for a test cluster than raspberry pi's

kraven
2018-12-23 15:00
end goal is k8s cluster with openebs or rancher longhorn persistent storage. also want to use metallb in arp mode for load balancer in front of the ingress

zehicle
2018-12-23 15:15
I'm assuming you've seen the KRIB stuff

kraven
2018-12-23 15:16
so for the dhcp proxy issue, it looks like this is what is happening: when the machine is not in the list, offerNetBoot gets set to true on line 232 of dhcp.go. The machine boots and gets added to the machine list. Then on reboot it tries pxe again, but it doesn't match line 223 because machine is not null any more; it has no address because it's a fake lease, so it matches line 254 and offerNetBoot gets set to false. Then when it gets to line 670 it returns NoPXE because offerNetBoot is false

kraven
2018-12-23 15:17
@zehicle yeah, that's what led me to the project in the first place. was trying to find a better way to bring up the cluster than matchbox/terraform/bootkube

zehicle
2018-12-23 15:40
We made a lot of improvements and integrations to dhcp lately. Proxy may not have enjoyed all the benefits

kraven
2018-12-23 15:56
I like the kexec quick bounce from discovery to coreos-live and back

greg
2018-12-23 16:01
@kraven - Can you open an issue in github for the DHCP Proxy issue, please? Most of us are out and about the next few days with Christmas/Holiday stuff. Don't want to forget this.

kraven
2018-12-23 16:02
sure, just making sure I wasn't doing it wrong :slightly_smiling_face:

greg
2018-12-23 16:03
Still not sure, but sounds like we have a bug. Need the notes and config. Thanks for playing with it.

rik
2018-12-24 04:46
has joined #community201812

zehicle
2018-12-24 13:27
@rik $welcome

2018-12-24 13:27
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

rik
2018-12-24 14:02
Thanks @zehicle. Been playing with Provisioner for a while and looking for the community discussion (since the repo+docs is a bit disjointed at times). I guess I've found it.

zehicle
2018-12-24 14:20
Happy for suggestions (and patches) on the docs! We do a lot of video and meetups that may help.

adam.lemanski
2018-12-25 02:35
merry christmas to everyone celebrating it. Just a small info about the issue of cutting off the first character for a param in a template... just one `$` too much: `.../clusterIP: ${{ .Param "provider/calico-etcd-clusterip" }}/g"` -> `.../clusterIP: {{ .Param "provider/calico-etcd-clusterip" }}/g"`

greg
2018-12-25 03:21
@adam.lemanski Merry Christmas to you as well. Yeah. Golang templates don't need the `$`. The braces are enough.
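
Why the stray `$` eats exactly one digit: once the template is rendered, the script contains `$172.16.160.100` inside double quotes, and the shell reads `$1` as the first positional parameter, which is empty here. A small sketch of that expansion, assuming bash-style handling of positional parameters and using the address from the thread as a literal stand-in for the rendered template output:

```shell
# Clear positional parameters, as in a script invoked without arguments.
set --

# After rendering, "${{ .Param ... }}" has become "$172.16.160.100".
# The shell expands "$1" (empty) and keeps the remaining "72.16.160.100".
with_dollar="clusterIP: $172.16.160.100"

# Without the stray "$" the rendered address is passed through untouched.
without_dollar="clusterIP: 172.16.160.100"

echo "$with_dollar"      # clusterIP: 72.16.160.100
echo "$without_dollar"   # clusterIP: 172.16.160.100
```

This matches the `Invalid value: "72.16.160.100"` in the error adam.lemanski posted.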

adam.lemanski
2018-12-25 03:22
yeah, I guess somebody sneaked to my notebook and wanted to sabotage me. :wink:

rik
2018-12-25 10:39
The current issue I'm having is with sledgehammer (I think). One of my (home lab) servers is not bringing networking up properly. In attempting to troubleshoot this, some things I noticed:
* the slack archive stops at September 2018
* I had trouble finding the sledgehammer root password (eventually found it in the `http://github.com/digitalrebar/sledgehammer` repo)
* the sledgehammer build doesn't seem to match what's in the github repo
* there's a comment in the github repo that says it's unused, but I can't find the real location. The `sledgehammer-builder` page in the docs says "unspecified" and nothing else.
* I couldn't find a way to boot sledgehammer with extra kernel args (I wanted to try `net.ifnames=0` since there was an error in the journal that pointed to an interface naming issue). I ended up hacking it into `kernel-console` in the `global` Profile.
* the current (1.12.0 & tip) sledgehammer uses version 219 of systemd but `/etc/systemd/network/20-bootif.network` uses `ClientIdentifier` which was introduced in version 220.
So far, I'm not sure if any of this has a bearing on why it's not bringing up networking... Happy to contribute to the docs, but at the moment I'm feeling a bit green.

zehicle
2018-12-25 13:03
hmmm, I had no idea that guest accounts had search history limited. We post the slack archive here http://rebar.digital/slack/html/index.html

zehicle
2018-12-25 13:06
also, sledgehammer build process changed in the last release when we added IPv6 and ARM. I'll let Greg talk to that. You are right that it's different.

andrew273
2018-12-25 13:39
has joined #community201812

kraven
2018-12-25 14:55
is there a source repo for virtualbox-ipmi plugin?

kraven
2018-12-25 15:01
or packet-ipmi?

zehicle
2018-12-25 15:33
@kraven those are RackN components that we offer free. we don't publish the source code at this time

zehicle
2018-12-25 15:35
for customers and partners, we do offer co-dev for plugins that includes source access

shane
2018-12-25 15:43
@andrew273 $welcome ...

2018-12-25 15:43
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

shane
2018-12-25 16:26
@rik some answers:
* I think we just haven't gotten around to posting the Slack archives since September - we tend to do it "quarterly-ish" (it's not automated)
* root/rebar1 is the PW for Sledgehammer - but by default only on the Console (see $faq for adding SSH keys, or changing `access-ssh-root-mode`)
* as Rob mentioned - we just switched to a new build tool for Sledgehammer (`sledgehammer-builder` workflow content)
* `sledgehammer-builder` repo hasn't been put in the open community yet - it will be soon
* for the moment - use `kernel-console` and overload it with whatever you want to pass to the Kernel - https://github.com/digitalrebar/provision-content/blob/master/content/params/kernel-console.yaml
* not sure on the `ClientIdentifier` mismatch - but "it's working" for us at the moment :slightly_smiling_face:
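
A sketch of what overloading `kernel-console` on the `global` profile could look like from the CLI, as a dry run by default. The `DRPCLI` variable and the `set_kernel_console` helper are hypothetical, the `profiles set ... param ... to` form is assumed from drpcli's usual param syntax and should be checked against the CLI help, and the value shown is illustrative only.

```shell
# Dry run by default: prints the command. Set DRPCLI=drpcli to execute it.
DRPCLI="${DRPCLI:-echo drpcli}"

# set_kernel_console PROFILE VALUE : store VALUE as the kernel-console param.
# The param value is passed as a JSON string, hence the embedded quotes.
set_kernel_console() {
    profile=$1
    value=$2
    $DRPCLI profiles set "$profile" param kernel-console to "\"$value\""
}

# Illustrative value only: keep whatever console settings the profile
# already carries and append the extra kernel argument to them.
set_kernel_console global "net.ifnames=0"
```

Because the param replaces the default wholesale, any console arguments that were in effect before need to be repeated in the new value.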


rik
2018-12-26 04:01
Thanks @shane and @zehicle. If I figure out that the networking issue is something sledgehammer- or DRP-related, I'll post back here.

zehicle
2018-12-26 04:05
Cool. We'd like to know either way. If you do a regular centos install, it may show you the issue.

shane
2018-12-26 04:15
Might be a NIC driver issue? Sledgehammer is based on stock centos 7.6

kraven
2018-12-26 13:13
are there any docs for creating a plugin provider?

kraven
2018-12-26 13:15
I was going to try to use ipmi and VirtualBMC with my libvirt vm's but I just realized it was licensed only when I tried to install it.

kraven
2018-12-26 13:19
So I was thinking maybe a simple plugin provider that would enable running commands on the machines via ssh

greg
2018-12-26 14:51
@kraven - we have one. kvm-test. We haven't productized it but @vlowther uses it for testing with KVM/libvirt on his setup.

kraven
2018-12-26 16:24
so for internal use only then?

vlowther
2018-12-26 16:53
Eh, more like "you need to be me to use it anyways".

kraven
2018-12-26 17:18
well I code so if there were an example or docs on how to create a plugin provider I should be able to hack something together to do what I'm wanting

zehicle
2018-12-26 17:56
there's an example under /cmd for the incrementer plugin in the provision repo


zehicle
2018-12-26 19:04
I'm about (next few days) to update the license flow so that any registered user can use plugins - it will not require creating an org. Makes it even more self service.

kraven
2018-12-26 23:05
thanks @zehicle. That's what I needed! I'll see what I can get going.

zehicle
2018-12-26 23:06
@kraven in the mean time, I can create an org for you so that you can get started now... I'll send you info via DM

adam.lemanski
2018-12-27 03:35
just created https://github.com/digitalrebar/provision-content/pull/181 , hope it helps to make krib ha more usable :slightly_smiling_face: works for me

kraven
2018-12-27 13:38
Thanks @adam.lemanski I was going through some video demos yesterday and tried to clone the ha example but the cluster didn't come up with 3 masters so hopefully this fixes that. I'll let you know very soon.

kraven
2018-12-27 13:39
looking at the templates it looks like the live cluster should work with coreos base? I've historically been using coreos so staying with that would be more comfortable for me.

kraven
2018-12-27 13:40
btw, I'm almost done with a metallb addition to krib. I'll send a pr once it's working

kraven
2018-12-27 13:42
doing it primarily to help me get more familiar with drp. it's pretty simple in l2 mode.

adam.lemanski
2018-12-27 13:55
Cool, hope it helps. I have no experience with coreos. Actively fighting with rook/ceph but now I am in my long weekend off. Continuing next week Thursday/Friday. Metallb or other ingress is on my todo list too.

adam.lemanski
2018-12-27 13:56
@kraven maybe I'm lucky and you have done it until then :)

kraven
2018-12-27 14:02
I fought with rook/ceph a little and then went to rancher longhorn or openebs. It's pretty easy with coreos because you can just enable a couple of iscsi oneshot services and set a config option in the ignition file, map a couple iscsi binaries into the kubelet container, and it "just works" with a simple helm chart install.

kraven
2018-12-27 14:03
I had it working with rancher rke but I wanted a bit more flexibility so went looking for alternative solutions

kraven
2018-12-27 14:06
drp is looking like exactly what I've been searching for so far. I like the templates and the matchbox + terraform combo of typhoon was a little clunky

adam.lemanski
2018-12-27 14:06
So far rook/ceph operator via helm integration in krib worked, just need further cleaning on krib side since somehow just 2 of 3 masters have the noschedule taint for masters

adam.lemanski
2018-12-27 14:06
So I've got an osd running on the master

adam.lemanski
2018-12-27 14:08
Looking for a simple storage solution (software) supporting disks and object storage, thought ceph is best matching so far

kraven
2018-12-27 14:08
rancher longhorn I like over openebs because it has a nice web gui where you can check the status of the pvc cluster and also backup and restore volumes from s3 or nfs

adam.lemanski
2018-12-27 14:09
I will take a look at it

adam.lemanski
2018-12-27 14:09
Thanks for the hint

kraven
2018-12-27 14:10
:slightly_smiling_face: I'll send pr's if I can get it working in krib properly

adam.lemanski
2018-12-27 14:10
So far drp is just awesome, simple and works as expected

kraven
2018-12-27 14:11
I also deploy minio somewhere so I can do my storage backups to my unraid nas

adam.lemanski
2018-12-27 14:13
Last time I checked minio it was quite limited but maybe I will evaluate more

kraven
2018-12-27 14:13
it's just s3 with a nice gui that you can attach to a nfs share if you want

kraven
2018-12-27 14:13
or raw disk

kraven
2018-12-27 14:14
has a nice helm chart that makes it quick and easy to spin up

kraven
2018-12-27 14:15
and I guess after getting ingress and storage as a service setup my next piece will be postgres as a service via patroni. watched a nice kubecon video on it this morning and it got me excited to try it

adam.lemanski
2018-12-27 14:32
Seems your agenda is mostly the same as mine. Thinking of cockroachdb but have to check with our devs about it.

bagricola
2018-12-27 14:35
whew, so just before christmas i bricked a cumulus switch by power cycling it while it was updating grub

bagricola
2018-12-27 14:35
turns out you can recover them by booting off usb, if you know the super secure bios password

bagricola
2018-12-27 14:35
"admin"

bagricola
2018-12-27 14:36
reflashed onie from usb, rebooted into onie installer mode and DRP + ansible took it from there, all fixed :smile:

kraven
2018-12-27 14:54
@adam.lemanski I like the idea of cockroach and may be able to switch to it in the future but most of my workloads are java/spring/jpa/hibernate so right now the quickest transition from standalone sqlserver is a postgres cluster

vlowther
2018-12-27 15:16
Setting up a postgres cluster is a good reason to look at cockroachdb. :-)

adam.lemanski
2018-12-27 15:18
Exactly

kraven
2018-12-27 15:25
maybe I need to look at it again. I thought it wasn't a direct replacement

vlowther
2018-12-27 15:44
It isn't, but setting up and babysitting a Postgres HA cluster is more or less a full time job of its own. And you are only getting active-passive with one master.

eric
2018-12-27 16:06
has joined #community201812

zehicle
2018-12-27 16:21
@eric $welcome

2018-12-27 16:21
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

eric
2018-12-27 16:41
Finally a reason to keep Slack open :slightly_smiling_face:

kraven
2018-12-27 17:36
okay, got metallb working and pr sent :slightly_smiling_face:


kraven
2018-12-27 17:42
what subnet or ip should I use for the krib/cluster-master-vip?

m.vandenhoogen
2018-12-27 19:30
has joined #community201812

kraven
2018-12-27 21:49
should krib/cluster-master-vip be removed by krib-dev-reset? it's not inserted by the install so not sure why it is cleared.

kraven
2018-12-27 21:54
and to answer my own prior question: krib/cluster-master-vip should be set to an IP not used by any cluster node and also not in the dhcp range
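
The two conditions kraven lists for `krib/cluster-master-vip` can be checked mechanically before the param is set. A minimal sketch under stated assumptions: the helper names are made up for illustration, and the addresses in the example call are placeholders on a 10.0.0.0/24 network, not values from anyone's actual setup.

```shell
# ip_to_int A.B.C.D : print the address as a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into four positional params
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# vip_is_safe VIP RANGE_START RANGE_END NODE_IP...
# Succeeds only if VIP is not a node address and lies outside the DHCP range.
vip_is_safe() {
    vip=$1
    range_start=$2
    range_end=$3
    shift 3
    for node_ip in "$@"; do
        if [ "$node_ip" = "$vip" ]; then
            return 1
        fi
    done
    vip_int=$(ip_to_int "$vip")
    if [ "$vip_int" -ge "$(ip_to_int "$range_start")" ] &&
       [ "$vip_int" -le "$(ip_to_int "$range_end")" ]; then
        return 1
    fi
    return 0
}

# Placeholder values: DHCP range .100-.200, three nodes at .11-.13.
if vip_is_safe 10.0.0.50 10.0.0.100 10.0.0.200 10.0.0.11 10.0.0.12 10.0.0.13; then
    echo "10.0.0.50 is usable as a VIP"
fi
```

The check only covers the addresses it is given; anything else statically assigned on the subnet still has to be ruled out by hand.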

zehicle
2018-12-27 23:00
@kraven the krib-dev-reset has been mainly used for single master clusters so far. there may be omissions for using it with a multi-master one

kraven
2018-12-28 11:54
@zehicle okay, I think krib/cluster-master-vip just needs to be removed from WIPE_PARAMS. I don't think a master count check is needed but that's another way to go. I'll send a PR.

kraven
2018-12-28 15:05
so starting from krib-live-cluster workflow with sledgehammer BootEnv how can I configure a workflow that would reboot the machine and come back up in discovery with sledgehammer BootEnv?

kraven
2018-12-28 15:10
I need the reboot otherwise when I go to krib-live-cluster again mount-local-disks fails with ``` mkfs.xfs: /dev/sda1 contains a mounted filesystem Command exited with status 1 ```

greg
2018-12-28 15:13
Clear the workflows, set the stage to local, then set the workflow back to krib-live-cluster.

greg
2018-12-28 15:14
This should cause the runner to reboot the machine, but if you change the workflow back to discovery during machine post, it will boot that env.
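
greg's three steps can be written down as a small script. This is a dry-run sketch: it reuses the `drpcli machines update <UUID> '{ ... }'` form quoted earlier in the channel, assumes the machine fields are named `Workflow` and `Stage`, and the `DRPCLI` variable, the `MACHINE_UUID` variable, and the `bounce_machine` helper are hypothetical.

```shell
# Dry run by default: prints the commands. Set DRPCLI=drpcli to execute them.
DRPCLI="${DRPCLI:-echo drpcli}"

# bounce_machine UUID WORKFLOW
# 1. clear the workflow, 2. set the stage to local, 3. set the workflow again.
bounce_machine() {
    uuid=$1
    workflow=$2
    $DRPCLI machines update "$uuid" '{ "Workflow": "" }'
    $DRPCLI machines update "$uuid" '{ "Stage": "local" }'
    $DRPCLI machines update "$uuid" "{ \"Workflow\": \"$workflow\" }"
}

# <UUID> is a placeholder; export MACHINE_UUID to target a real machine.
bounce_machine "${MACHINE_UUID:-<UUID>}" krib-live-cluster
```

Running it without `DRPCLI` set shows the exact calls first, which is a cheap way to confirm the JSON before touching a machine.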

kraven
2018-12-28 15:15
that worked. thank you

kraven
2018-12-28 15:24
@greg is krib-helm stage working for you in HA mode?

kraven
2018-12-28 15:25
mine is failing tiller install

greg
2018-12-28 15:26
I think I saw something like that recently, but I haven't tried.

greg
2018-12-28 15:26
I was trying something else for @zehicle at the time.

zehicle
2018-12-28 15:27
reviewing... I never tested the krib-helm in HA setups

zehicle
2018-12-28 15:28
the nuance for that stage is that it assumes the kube.config file is resident on the master

zehicle
2018-12-28 15:28
it may be a safer assumption to always get the conf from the profile

zehicle
2018-12-28 15:28
would be a smart thing to test for file exists and then retrieve if it's missing
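
A rough sketch of that "test for the file, retrieve it if missing" idea. `fetch_from_profile` is a hypothetical callable standing in for however the config would be pulled from the profile. It is not a DRP API.

```python
import os


def ensure_kubeconfig(path, fetch_from_profile):
    """Return path, writing the config first if the file is missing.

    fetch_from_profile is a hypothetical callable that returns the
    config text; it is only called when the file does not exist.
    """
    if not os.path.exists(path):
        content = fetch_from_profile()
        with open(path, "w") as f:
            f.write(content)
    return path
```

With this shape the stage behaves the same on the master, where the file is already present, and on any other node.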

zehicle
2018-12-28 15:29
I'm sure that tiller check test will fail without the credentials

zehicle
2018-12-28 15:30
@kraven ^^

kraven
2018-12-28 15:30
okay

kraven
2018-12-28 15:31
I'm testing again with non-ha to make sure it's working there still

greg
2018-12-28 15:57
@adam.lemanski @kraven - tip provision-content should have your changes in it now.

greg
2018-12-28 15:57
Thanks

kraven
2018-12-28 16:01
good deal

kraven
2018-12-28 16:29
@greg does this accurately show what you're going for with the krib ha + metallb setup? https://docs.platform9.com/assets/pmk-1276.png

kraven
2018-12-28 16:30
then tack on istio for ingress

greg
2018-12-28 16:37
I think so. Need to think about it some more.

kraven
2018-12-28 16:40
can't use metallb for the API because chicken/egg so need keepalived

kraven
2018-12-28 16:40
but then use metallb for any other services

greg
2018-12-28 16:44
Yes - I think that makes sense.

kraven
2018-12-28 18:29
@zehicle ruled out the kubectl admin.conf being the issue. If I look at the cluster there is a tiller pod but it's stuck in pending

kraven
2018-12-28 18:30
`0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.`

kraven
2018-12-28 21:07
fix is `kubectl taint nodes --all node-role.kubernetes.io/master-` because all I had in the cluster were 3 masters

kraven
2018-12-28 21:17
and that is needed because krib-kubeadm.cfg.tmpl adds NoSchedule taints to all masters
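
To illustrate why the pod stayed pending, here is a deliberately simplified model of the taint check. It assumes a toleration matches on key and effect only, whereas the real scheduler also looks at operator and value. All names are hypothetical.

```python
def tolerates(taint, tolerations):
    """Simplified: a taint is tolerated when a toleration has the same key
    and either the same effect or no effect specified."""
    for tol in tolerations:
        key_ok = tol.get("key") == taint["key"]
        effect_ok = tol.get("effect") in (None, taint["effect"])
        if key_ok and effect_ok:
            return True
    return False


def schedulable_nodes(nodes, tolerations):
    """Return names of nodes whose NoSchedule taints are all tolerated.

    nodes maps node name -> list of taint dicts with 'key' and 'effect'.
    """
    result = []
    for name, taints in nodes.items():
        blocking = [t for t in taints if t["effect"] == "NoSchedule"]
        if all(tolerates(t, tolerations) for t in blocking):
            result.append(name)
    return result


MASTER_TAINT = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}
```

With three nodes that all carry `MASTER_TAINT` and a pod with no tolerations, `schedulable_nodes` returns an empty list, which matches the "0/3 nodes are available" message above.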

kraven
2018-12-28 22:45
fixed with PR

kraven
2018-12-28 22:58
and it all came up this time with metallb and istio ingress

kraven
2018-12-29 14:03
morning!

kraven
2018-12-29 14:04
was looking at the kubernetes-dashboard and noticed I wasn't getting graphs even though all the heapster stuff gets installed properly via krib

kraven
2018-12-29 14:06
tracked it down: the read-only port 10255 was disabled in k8s 1.12.0

kraven
2018-12-29 14:06
can be enabled in the kubelet config but possible security risk

kraven
2018-12-29 14:07
to prevent the security risk need to add iptables rules to only allow the master node ips to talk to that port

kraven
2018-12-29 14:08
```
$ iptables -A INPUT -p tcp --dport 10255 -s YYY.YYY.YY.YY -j ACCEPT
$ iptables -A INPUT -p tcp --dport 10255 -j DROP
```

kraven
2018-12-29 14:08
ACCEPT line for each master IP
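
A small sketch that expands the pattern above into one ACCEPT rule per master followed by the final DROP. The function name and the example IPs are hypothetical. It only builds the command strings and does not run them.

```python
def readonly_port_rules(master_ips, port=10255):
    """Build iptables commands: one ACCEPT per master IP, then a final DROP.

    Order matters because rules appended with -A are evaluated top to bottom.
    """
    rules = [
        f"iptables -A INPUT -p tcp --dport {port} -s {ip} -j ACCEPT"
        for ip in master_ips
    ]
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


if __name__ == "__main__":
    for rule in readonly_port_rules(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(rule)
```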

greg
2018-12-29 14:35
That is on all nodes running kubelet?

kraven
2018-12-29 14:39
yes

kraven
2018-12-29 14:41
found an alternative that modifies the heapster config, but it gives heapster the cluster-admin role and I don't know if that's more or less secure


kraven
2018-12-29 14:42
translate works fine on there

kraven
2018-12-29 14:58
shouldn't be a concern once they switch off heapster https://github.com/kubernetes/dashboard/issues/2986

kraven
2018-12-29 15:09
well adding `readOnlyPort: 10255` to the kubeadm.cfg kubeletConfiguration did not work

kraven
2018-12-29 15:10
I dumped out the kubelet config from the running cluster and it's not there so I'll try the second method now

kraven
2018-12-29 15:28
option 2 worked

rik
2018-12-31 01:08
OK, @shane & @zehicle - I think I've gone as far as I can without being able to rebuild sledgehammer. Here's some data points:

rik
2018-12-31 01:11
When I boot with sledgehammer, `stage1.img` successfully gets an IP address on the interface via DHCP. Then it mounts `stage2.img` and runs `systemd`, etc. At some point after this, the interface config is lost (see below).

rik
2018-12-31 01:16
By default `systemd` attempts to rename the interfaces (in my case from `eth0` to `enp6s0`), however I can control that behaviour with kernel boot args. The interface renaming fails with the error `systemd-udevd[635]: Error changing net interface name 'eth0' to 'enp6s0': Device or resource busy`. Which means the interface is up when it's attempting to rename it. This might be normal behaviour (I'll try on a different host on which a previous version of sledgehammer has worked before), given the 2-stage boot approach.

rik
2018-12-31 01:25
The issue is that the interface now is UP, but not configured with an IP address [I can manually apply an IP address at this point although not very helpful in an automated workflow :wink:]. If I don't manually configure the interface, it will sit at this point for about 30 minutes (attempting to pull `start-up.sh`) and then there will be a `NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out` in the journal and at this point it will send out a new DHCP request, get an IP address, download `start-up.sh` and complete the discovery.

rik
2018-12-31 01:27
Changing the interface renaming (by using `biosdevname=1` to use the bios naming scheme or `net.ifnames=0` to turn off renaming entirely) changes the renaming error, but does not change the "30 minute delay" behaviour.

rik
2018-12-31 01:30
As mentioned before, I get the error `systemd-networkd[2954]: [/etc/systemd/network/20-bootif.network:8] Unknown lvalue 'ClientIdentifier' in section 'DHCP'` in the journal, since `systemd` is version 219 in the current sledgehammer and the `ClientIdentifier` option was not added until `systemd` version 220. Again, I'm not convinced this is causing the issue, since DHCP does eventually give me an IP address.
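
A sketch of the version gate implied here, assuming the first line of `systemctl --version` has the form `systemd 219`. The threshold of 220 is taken from the message above and the helper names are hypothetical.

```python
import re

# Threshold as stated in the discussion above.
MIN_CLIENT_IDENTIFIER_VERSION = 220


def parse_systemd_version(version_output):
    """Extract the integer version from text starting with 'systemd <N>'."""
    match = re.match(r"systemd (\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version output")
    return int(match.group(1))


def supports_client_identifier(version_output):
    """True if this systemd is new enough for the ClientIdentifier option."""
    return parse_systemd_version(version_output) >= MIN_CLIENT_IDENTIFIER_VERSION
```

A generator for the `.network` file could use this to leave the option out on older images instead of triggering the "Unknown lvalue" warning.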

rik
2018-12-31 01:53
I was going to try forcing a different device driver (as per @shane's suggestion), but it looks like the only spare Ethernet card I had lying around can't do network boot. It's currently using `r8169`.

rik
2018-12-31 02:03
Once I've been through the 30 minute wait for discovery, doing an _install_ works fine (other than my stupid typos in the `postinstall` script). Since install appears to be using a boot image with the same OS, the Ethernet card driver is the same (the `r8169` driver version is the same in `sledgehammer` and the `centos-7-install` environments). I haven't left a system installed for long before running another test, but networking seems to work fine on an installed system. So, if it is a driver problem, it's maybe something triggered by the `stage1.img` to `stage2.img` trick?

greg
2018-12-31 04:19
@rik - What DHCP options are you sending?

greg
2018-12-31 04:19
I've found that DNS can be a problem.

greg
2018-12-31 04:20
Making sure that options 3, 6, and 15 are correct can adjust this.
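
For reference, options 3, 6 and 15 are the router, DNS server and domain name options. A minimal sketch of a sanity check follows. It assumes the offered options have already been collected into a dict keyed by option code, and the helper name is hypothetical.

```python
# ISC dhcpd option names for the three option codes mentioned above.
REQUIRED_OPTIONS = {3: "routers", 6: "domain-name-servers", 15: "domain-name"}


def missing_options(offered):
    """Return the names of required options that are absent or empty.

    offered maps DHCP option code -> value.
    """
    return [
        name
        for code, name in sorted(REQUIRED_OPTIONS.items())
        if not offered.get(code)
    ]
```

This only confirms the options are present. Whether the values are correct for the network still has to be checked by hand.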

greg
2018-12-31 04:20
Also 30 minutes is a long time. I've never seen that kind of timeout.

rik
2018-12-31 04:29
I'm using an external DHCP, sending options 3, 6 and 15.

rik
2018-12-31 04:29
re: 30 minutes - you're telling me! good thing I've got other things to do while waiting for that timeout...

greg
2018-12-31 04:30
What DHCP server? If ISC DHCP Server, are you setting the flag to use the MAC address instead of the client identifier?

rik
2018-12-31 04:38
yes, ISC v4.4.1

rik
2018-12-31 04:38
and yes, matching on mac address

greg
2018-12-31 04:38
Are your machines dual-homed?

rik
2018-12-31 04:38
(I don't trust client-id)

rik
2018-12-31 04:39
The DHCP server is. I've tried the target server both with and without a second Ethernet - no change.

rik
2018-12-31 04:39
The DRP server is not dual-homed

greg
2018-12-31 04:39
The victims?

rik
2018-12-31 04:40
victim=target?

greg
2018-12-31 04:40
yes

greg
2018-12-31 04:44
Do you have something like this in your ISC config?
```
# Client control
deny duplicates;
one-lease-per-client on;
```

greg
2018-12-31 04:45
Hmm - what kind of machines are these?

rik
2018-12-31 05:06
I don't have those config options, but will try them when I'm back at the console.

rik
2018-12-31 05:06
The machines are a hodgepodge of home lab equipment. The DHCP server is running on a VM.

rik
2018-12-31 05:07
None of it is enterprise-grade.

rik
2018-12-31 08:31
OK, tried the DHCP config suggestions - no difference. I've captured a `tcpdump -vv` of the traffic to & from the target host if that's of use. I can't see any evidence of a second lease being provided. It looks reasonable to me, although I noticed for the first time that on some DHCP requests it gets `default.pxe` and sometimes `ipxe.ipxe`. Doesn't seem relevant to this issue though. Also, when you were asking about 'dual-homed', I should have mentioned that none of the systems have more than one interface on the same network.

kraven
2018-12-31 12:19
started working on nginx-ingress and cert-manager with krib

kraven
2018-12-31 12:24

kraven
2018-12-31 12:25
it's doable with some hackery but I'll wait for istio to add proper support

kraven
2018-12-31 12:26
if I really end up needing service mesh can always use nginx in front of istio

kraven
2018-12-31 14:56
so I see how to load a template in the task but what if I need to only load that template if a param exists? can I render the template to a file from within a .sh template?

kraven
2018-12-31 15:08
found it
```
echo "expanding template {{$template}} as {{$name}}.yaml"
cat > {{$name}}.yaml << EOF
{{ $render.CallTemplate $template $render }}
EOF
```

zehicle
2018-12-31 15:55
that was one of the first custom renders that we needed :slightly_smiling_face:


kraven
2018-12-31 16:06
hmm, how do I escape a golang template within a template?

kraven
2018-12-31 16:06
lol

kraven
2018-12-31 16:06
have this function in my sh.tmpl file
```
getIngressIp() {
  echo $(kubectl get svc nginx-ingress-controller \
    --namespace kube-system \
    -o=go-template --template='{{(index .status.loadBalancer.ingress 0 ).ip}}')
}
```

kraven
2018-12-31 16:11
```--template='{{`{{(index .status.loadBalancer.ingress 0 ).ip}}`}}'``` works
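
The trick above works because the outer template renders a Go raw string literal (backtick-quoted) verbatim, so the inner braces survive to the shell. A small sketch of a helper that applies the same wrapping. The function name is hypothetical.

```python
def escape_for_go_template(inner):
    """Wrap text so an outer Go template emits it unchanged.

    Uses the {{`...`}} raw-string form shown above. Raw strings cannot
    contain a backtick, so such input is rejected.
    """
    if "`" in inner:
        raise ValueError("raw string literals cannot contain a backtick")
    return "{{`" + inner + "`}}"


if __name__ == "__main__":
    print(escape_for_go_template("{{(index .status.loadBalancer.ingress 0 ).ip}}"))
```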

kraven
2018-12-31 16:24
do I need to do anything to get helm initialized in a job other than make sure my stage comes after the helm stage?

kraven
2018-12-31 16:25
calling helm install I'm getting ``Error: failed to download "stable/nginx-ingress" (hint: running `helm repo update` may help)``

kraven
2018-12-31 16:43
started everything off the same as krib-helm.sh.tmpl so not sure what I'm doing wrong other than not being in the same task as krib-helm-init.sh.tmpl