i.grischott
2018-01-01 21:00
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8M1FMZPF/image.png and commented: `drpcli prefs set unknownBootEnv discovery defaultBootEnv sledgehammer defaultStage discover` works normally..

shane
2018-01-01 21:02
@i.grischott yes there is....

shane
2018-01-01 21:03
@zehicle and @meshiest - I suspect that it's fallout from ui twiddling over the last week....

wdennis
2018-01-01 21:08
Happy New Year rebar-ites!

wdennis
2018-01-01 21:09
Maybe in 2018 we can think of a way to do automated UI testing?

i.grischott
2018-01-01 21:09
My testlab for testing cluster setup:

shane
2018-01-01 21:15
@wdennis - we'll gladly accept any checkins or code related to any and all testing efforts ...

wdennis
2018-01-01 21:23
@shane would it be hard in the UI for it to display the POST (etc) request that it is sending to the API server? (like in a 2-line bottom pane that could be opened/closed)

wdennis
2018-01-01 21:24
Might be a good way to audit what the UI is sending the API server (or is that all logged somewhere?)

i.grischott
2018-01-01 21:27
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8LSTDJUA/img_8644.jpg and commented: Testlab: 10 x DELL R610, each with 2x quad-port 1Gb NICs and 1x dual-port 10Gb. I separated them with 7 VLANs on 5 switches. There are different HDD sets on each server, some with SSDs, others with SSDs and large HDDs.. what do you think.. is it a good setup with VLANs? And is it possible to bond the NICs? How can I do this.. how can I set different disk layouts on the machines? Is there a best practice guide? SSDs for the controller.. My idea is to set up kubernetes on bare metal... and the next layer is openstack on top of it.. I think it could be useful to install Container Linux from CoreOS for the base system instead of sledgehammer or centos, because Container Linux is very secure and small and has a good update strategy with a dual-partition system.. what do you think about that.. there is a way to install ansible on it, maybe with ActivePython.. https://vadosware.io/post/installing-python-on-coreos-with-ansible/ but I don't know about the security.. Sorry for my noob questions.. the power is with you master yoda :slightly_smiling_face:

wdennis
2018-01-01 21:29
@i.grischott I want your lab :slightly_smiling_face:

i.grischott
2018-01-01 21:33
:slightly_smiling_face:

i.grischott
2018-01-01 21:33
I want your knowledge

i.grischott
2018-01-01 21:34
ähh wisdom :wink:

2018-01-01 21:39
@wdennis commented on @i.grischott's file https://rackn.slack.com/files/U7U02J6LX/F8LSTDJUA/img_8644.jpg: What are the "OpenNet" things at the top? Modular patch-bays, or ??

vlowther
2018-01-01 21:42
@wdennis Speaking of auditing, the thing I have been working on over the holidays is nearing completion -- an in-memory log buffer with threaded per-request logging.

i.grischott
2018-01-01 21:43
yep. open-net it's a vendor of patch-panels.. and similar..

vlowther
2018-01-01 21:44
The log levels have changed from numbers to meaningful names, and the log levels set via dr-provision args or pref updates (via the UX) can be overridden on a per-request basis.

wdennis
2018-01-01 21:55
@vlowther sounds good.

vlowther
2018-01-01 21:56
and there will shortly be an audit log level :slightly_smiling_face:

wdennis
2018-01-01 21:56
I just wonder how to test / regression-test the UX elements?

vlowther
2018-01-01 21:56
mutters something about selenium and saucelabs

zehicle
2018-01-01 23:23
@wdennis the browser already records all that - just open the dev panel and look at the network traffic.

ctrees
2018-01-02 12:55
@wdennis on the UX testing... in previous life-times I used a lot of selenium.... I'm good with starting with that and probably some sort of cucumber top-level-document.... ideally work that in with read-the-docs...

ctrees
2018-01-02 12:58
The issue I've had in the past is 'words'... the 'wording' and 're-use' become pretty darn pivotal... so most of the time the code base dies the death of no usage

ctrees
2018-01-02 13:06
I'm good with coding up some selenium / cucumber I'll attempt something this morning...

ctrees
2018-01-02 14:58
So... I'm taking some guesses... I'm going to assume nodejs as the backend for cucumber that drives webdriver... (there is a golang version... but I figure this is really for UX, so js is probably the choice for UX stuff)... I'll probably fire up Shane's 5min test with packet target...

ctrees
2018-01-02 15:02
Since DRP is so command line... I'm thinking the test should 'check the command line' and use that to go find in the UX... and sort of helps my mental map of the cli and reflective UX...

shane
2018-01-02 17:34
- our first meetup of 2018 starts in about 90 minutes ... hope to see you all there ... agenda: https://docs.google.com/document/d/1cQsuWdkHQU-uHh0S3N9RqgYhHGW8gi5tb4fbE98OWvc meetup page: https://www.meetup.com/digitalrebar/events/xmrktnyxcbdb/


i.grischott
2018-01-03 11:34
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8LR3R25N/image.png and commented: With TIP v3.5.0-tip-22-3c3877c4bb5389bc92215a1bfa5cba22e39e8076 the Save-Button works, but not on the Endpoint Management.

i.grischott
2018-01-03 11:35
what is the default root password on centos-7-install ? where can I set it?


ctrees
2018-01-03 14:11
7.3. Default Template Identity: These settings apply to TEMPLATES only, not the API. The default password for the default o/s templates is RocketSkates. The default user for the default ubuntu/debian templates is rocketskates.

ctrees
2018-01-03 14:12
So I think... user: rocketskates pw: RocketSkates

i.grischott
2018-01-03 14:12
thanks..

ctrees
2018-01-03 14:13
and the UX 'I think' has some refresh sync issues (which may be related to what you are seeing in the Save)

i.grischott
2018-01-03 14:15
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8MLYU5L5/image.png and commented: i try to deploy k8s.. but i don't know my next steps.. :tired_face:

ctrees
2018-01-03 14:15
I was attempting to clear out old endpoints and the save didn't remove the endpoints from the UX portal..

shane
2018-01-03 14:15
@i.grischott - can you please submit an issue, and tag it "provision-ux-bug" - at: https://github.com/digitalrebar/provision/issues

i.grischott
2018-01-03 14:15
ok i try..

shane
2018-01-03 14:16
@ctrees - same for you - ticket please, we have some additional UI help right now, so if you can submit soonest, the issue should be addressed quickly - before @meshiest goes back to University ... :slightly_smiling_face:

ctrees
2018-01-03 14:16
I'll attempt the same for save endpoint...

shane
2018-01-03 14:16
also - try for root user: rebar1

shane
2018-01-03 14:17
to change it - view the kickstart seed - there is a Param that overrides the default - along with the Param that specifies to enable allowing Root SSH access after install - we HIGHLY SUGGEST NOT doing that - but that's up to your local policy

shane
2018-01-03 14:17
our recommendation is to use an SSH key - and put that in place

shane
2018-01-03 14:18
to use SSH key - set the Param "access-key" with a List of public key halves to add

shane
2018-01-03 14:19
search Slack for `access-key` and `access-ssh-root-mode` both greg and I have posted some info here - I'll draft a proper Doc page on this

shane
2018-01-03 14:19
(today)

shane
2018-01-03 14:21
thx @i.grischott we got the issue submission

greg
2018-01-03 14:28
centos is root/RocketSkates

greg
2018-01-03 14:28
ubuntu is rocketskates/RocketSkates

greg
2018-01-03 14:28
The parameter: `provisioner-default-password-hash` can be used to set the user's password (as a hash)

greg
2018-01-03 14:29
the parameter: `provisioner-default-user` can be used to set the default user on ubuntu/debian

greg
2018-01-03 14:29
The parameter: `provisioner-default-uid` can be used to set the uid of the default user on ubuntu/debian

greg
2018-01-03 14:30
The password hash needs to be generated per google. :slightly_smiling_face:

shane
2018-01-03 14:44
```mkpasswd -m sha-512 'my_password' `mktemp -u XXXXXXXXXXXXXXXX` ```

i.grischott
2018-01-03 16:00
is there a user-guide or howto for the ux to setup a k8s cluster... i don't know which selections of actions i need in which order. ..

greg
2018-01-03 16:06
@i.grischott - My order is this.

greg
2018-01-03 16:06
1. create your k8s-cluster profile, e.g. my-k8s-cluster

greg
2018-01-03 16:07
2. add a parameter to that profile: `krib/cluster-profile` = `my-k8s-cluster`

greg
2018-01-03 16:14
3. Using workflow editor, add the following workflow to the `my-k8s-cluster` profile.

greg
2018-01-03 16:18
a. centos-7-install -> runner-service:Success
b. runner-service -> finish-install:Stop
c. finish-install -> docker-install:Success
d. docker-install -> krib-install:Success
e. krib-install -> complete:Success
f. discover -> sledgehammer-wait:Success

greg
2018-01-03 16:18
The last entry is to handle discovery if you reimage the servers.

greg
2018-01-03 16:19
4. Add the profile to all the machines you want in the cluster.

greg
2018-01-03 16:19
5. Change stage on all the machines to `centos-7-install`

greg
2018-01-03 16:19
6. Reboot all the machines in your cluster.

greg
2018-01-03 16:19
then wait for them to get to complete.
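The workflow from step 3 ends up on the profile as a `change-stage/map` Param (the same Param name appears in the profile dump later in this log). A minimal sketch of greg's a-f list as that map, plus a helper that walks it to check the order. The profile name `my-k8s-cluster` is the example from step 1; the helper names and the exact profile shape are assumptions for illustration, not confirmed DRP behavior.

```python
import json

# Stage transitions from greg's list: "current stage" -> "next stage:suffix".
STAGE_MAP = {
    "centos-7-install": "runner-service:Success",
    "runner-service": "finish-install:Stop",
    "finish-install": "docker-install:Success",
    "docker-install": "krib-install:Success",
    "krib-install": "complete:Success",
    # Last entry handles discovery if the servers are reimaged.
    "discover": "sledgehammer-wait:Success",
}


def build_profile(name: str) -> dict:
    """Return a profile-shaped dict (shape assumed, see lead-in)."""
    return {
        "Name": name,
        "Params": {
            "krib/cluster-profile": name,
            "change-stage/map": STAGE_MAP,
        },
    }


def walk(stage_map: dict, start: str) -> list:
    """Follow the map from `start` until a stage has no successor."""
    path = [start]
    while path[-1] in stage_map:
        nxt = stage_map[path[-1]].split(":", 1)[0]
        if nxt in path:  # guard against loops in a hand-edited map
            break
        path.append(nxt)
    return path


if __name__ == "__main__":
    print(json.dumps(build_profile("my-k8s-cluster"), indent=2))
    print(" -> ".join(walk(STAGE_MAP, "centos-7-install")))
```

Walking the map from `centos-7-install` is a quick way to confirm that every stage hands off to the next one and that the chain ends at `complete`.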

i.grischott
2018-01-03 16:28
thanks very much..

zehicle
2018-01-03 17:48
we have videos of this process - nothing really documented. if you make notes about your install, we'll try to get it into the docs (pull requests welcome of course too)

shane
2018-01-03 17:48
...I'm actually writing that documentation ... right ... now ...

ctrees
2018-01-03 18:15
what's the timeout of the auth token on the UX portal ?

zehicle
2018-01-03 18:18
@ctrees 60 minutes right now. we're working to get auto-renew working. hopefully today (by EOW latest)

zehicle
2018-01-03 18:20
the endpoint tokens are 8 hours by default and renew at 4 hours.
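The timing zehicle describes for endpoint tokens (8-hour lifetime, renew at 4 hours) can be sketched as a simple age check. This is a hypothetical illustration of the rule only, not DRP's actual code; the function and constant names are made up for the example.

```python
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=8)  # endpoint token default, per zehicle
RENEW_AFTER = timedelta(hours=4)     # renewal point, per zehicle


def token_state(issued_at: datetime, now: datetime) -> str:
    """Classify a token as 'fresh', 'renew', or 'expired' by its age."""
    age = now - issued_at
    if age >= TOKEN_LIFETIME:
        return "expired"
    if age >= RENEW_AFTER:
        return "renew"
    return "fresh"


if __name__ == "__main__":
    issued = datetime(2018, 1, 3, 10, 0)
    for hours in (1, 5, 9):
        print(hours, token_state(issued, issued + timedelta(hours=hours)))
```

A scripted UX test could use the same kind of check to decide when it has to log in again rather than waiting for a request to fail.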

ctrees
2018-01-03 18:24
I'm just attempting to script the UX login... having some successful failures :wink: but I think I need to add 'Given I am authenticated' function... you were right, you don't have good UX hooks... but there is a way in react to 'fix' that.... I don't see any 'RackN DSL' in the class tags... (I'm sort of shocked as all you guys seem to be very object and word use careful).... BTW... how do you test cli and api ? I saw only the one call in the .travis.yml but also see the code coverage call...

ctrees
2018-01-03 18:26
.... sorry too many questions in one blurb... it can wait till I push a test to explain my questions...

greg
2018-01-03 19:01
the `tools/test.sh` in the drp directory runs all the golang unit tests.

greg
2018-01-03 19:01
This tests the cli, api, and internal components.

greg
2018-01-03 19:01
It does pretty well generally. It doesn't test the UX.

ctrees
2018-01-03 19:22
thanks... since the UX is really just a reflection of the cli / api I bounce back and forth in a feature test... I about fired up a golang based function backend for the feature reg-ex... but the UX is so tied to js I went that way... BUT I'll dig into tools/test.sh and leverage what-ever is there for cli api

ctrees
2018-01-03 19:24
I need an excuse to learn go... this is the best excuse I've had all year :wink:

shane
2018-01-03 22:18
- I finished a first pass at the KRIB documentation - if anyone would like to take it for a spin and let me know if you run into any issues or questions ... would appreciate it ... http://provision.readthedocs.io/en/latest/doc/integrations/krib.html

zehicle
2018-01-04 00:32
@ctrees the org/endpoint update list should be working again.

ctrees
2018-01-04 15:21
So I've created a 'test user' and as I was doing both the sign up and sign in... I noticed the "client_id"... I take it that's tied to the session ? (aka it's the expire token that deals with backend on amazon stuff)...

ctrees
2018-01-04 15:24
...wait... I'm going to create an issue in github and just ping here...

ctrees
2018-01-04 15:44
@zehicle no happiness on a re-test https://github.com/digitalrebar/provision/issues/612

zehicle
2018-01-04 15:47
The front end change was in queue - just released it.

zehicle
2018-01-04 15:47
will take about 10 minutes

zehicle
2018-01-04 15:48
sadly the refresh token thing is fighting against AWS Cognito docs.

ctrees
2018-01-04 15:48
ok... I'll hit it later.. I just created a test user and getting scripts to work... so I should be able to regression Signup, Login and Logout (as that pretty generic).

zehicle
2018-01-04 15:49
awesome!

ctrees
2018-01-04 15:49
I'll dump details in issue 620 (just created)


ctrees
2018-01-04 17:54
@zehicle that re-direct client ID is session unique ?? correct ??


ctrees
2018-01-04 17:54
The client_id part

ctrees
2018-01-04 17:54
aka 'oauth-ish'

ctrees
2018-01-04 17:56
I think I get the reason... just messing with my head (and scripts) sorting out URL tracking stuff...

ctrees
2018-01-04 17:58
and the hover animations... are they sort of 'required' for event triggering or just for 'looks'

zehicle
2018-01-04 19:08
@ctrees yeah - it's a redirect that returns the JWT session token. we don't get the passwords at all, just the session token. we register the specific URLs that are allowed to do a redirect.

zehicle
2018-01-04 19:09
a future benefit is that we can use other SSOs to auth

ctrees
2018-01-04 19:21
I follow... what I was stumbling on is I need to pick up the token THROUGH the redirect BEFORE I attempt login... aka no real static 'login' page as I need to git the token first...

ctrees
2018-01-04 19:23
sorry... get token :wink: . Just need to add that to Page Object Login Function... ( it's not immutable ;-)

zehicle
2018-01-04 19:23
the token gets stored in the session, so you can recover it

zehicle
2018-01-04 19:24
that way it survives browser refresh
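Since the redirect hands back a JWT, a test script can read the token's expiry without any secret: a JWT is three base64url segments (header, payload, signature) joined by dots, and the payload is plain JSON. A minimal sketch using only the standard library. The sample token is built locally for illustration and is not a real Cognito token, and no signature verification is done here.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_claims(token: str) -> dict:
    """Return the (unverified) payload claims of a JWT."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def make_sample_token(claims: dict) -> str:
    """Build an unsigned, illustration-only token carrying `claims`."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([enc({"alg": "none", "typ": "JWT"}), enc(claims), "sig"])


if __name__ == "__main__":
    token = make_sample_token({"sub": "test-user", "exp": 1515096000})
    print(jwt_claims(token)["exp"])
```

Reading `exp` this way only tells the script when to re-authenticate; anything that trusts the token still has to verify the signature.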

ctrees
2018-01-04 19:32
yup... but I'm starting the test from 'state-less' so I'll need to pick up the token each time for the browser session. BTW... with all this, I'd suggest NOT doing 'auto refresh'; just make sure the browser session js dirties the cache well before the token expires AND set a modal in the browser session that forces a re-auth... that lets the user's setup deal with re-auth (aka don't attempt to help keep anything alive, just make sure the user sees they need to re-auth)... you've got a 're-auth' built-in button I see.. ( that or I'm confused about "auto-renew" )

zehicle
2018-01-04 19:33
Cognito is supposed to issue a 30 day refresh token so that we don't have to keep forcing a login. we're working on that right now

zehicle
2018-01-04 19:44
once we get that @ctrees it may be able to just work w/o login once you have the refresh token stored

ctrees
2018-01-04 22:11
catmini:testyourlogin msops$ npm test
> test-your-login@0.0.1 test /Users/msops/Code/testyourlogin
> wdio
------------------------------------------------------------------
[chrome #0-0] Session ID: 70b95f2434c4362c75f4012ac95a0922
[chrome #0-0] Spec: /Users/msops/Code/testyourlogin/test/login.spec.js
[chrome #0-0] Running: chrome
[chrome #0-0]
[chrome #0-0] Login Page
[chrome #0-0]   ✓ should look nice
[chrome #0-0]   ✓ should let you login with valid credentials
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 2 passing (16s)
[chrome #0-0]
catmini:testyourlogin msops$

greg
2018-01-04 22:11
nice

ctrees
2018-01-04 22:11
as ALWAYS... it's something simple that causes me days :wink:

ctrees
2018-01-04 22:12
that and remembering yet-another-language-i-had-forgotten


ctrees
2018-01-05 03:48
@zehicle was right... the react stuff does make finding elements (actually waiting for them to appear) a bit harder, but there are ways around it and the webdriver community is active... I'm just glad they are moving off ruby ( npm sure helps )

ctrees
2018-01-05 03:53
I still have to figure out expected failure UX indicators and 'how to bubble problems up to the user'

2018-01-05 14:49
@rackneng ok back to this.... any info anywhere in this KRIB install?

2018-01-05 14:49
anyone... someone... bueller ...

shane
2018-01-05 14:49
complete documentation is done


2018-01-05 14:50
@rackneng i can do this on VMs right ?

2018-01-05 14:50
as a test bed?

2018-01-05 14:57
another thing id love to see rebar support is bare metal XENServer and Triton installs

shane
2018-01-05 14:58
we already support XENServer - you only need to add the appropriate packages to install it on top of an existing BootEnv - that's a minor Stage change to add the workflow

shane
2018-01-05 14:59
you can use VMs - but managing the power actions of VMs only natively works in DRP via VirtualBox - other hypervisors will work - but you may need to manage the Power (on/off/reboot) and PXE (next boot) options yourself

zehicle
2018-01-05 15:31
@outbackdingo can you give some more details about what type of support you are looking for? Is this installing XENserver? power mgmt of vms? detecting vms?

2018-01-05 16:49
@rackneng well basically, we have both XenServer and Triton nodes deployed, and OpenStack ..... so in a way im workin to integrate into a more reasonable way of deploying things KRIBs on bare metal is just another path.... and thats just for my infrastruture, i wount even begin to think about how my clients can utilie this like most server resellers provide

2018-01-05 16:49
and sorry some of my million mile keyboard keys seems to have gone on holiday again

2018-01-05 16:52
my vision was to use rebar to do provisioning for clients through a web based system like all other server resellers

2018-01-05 16:52
but dont see that as short term feasible

zehicle
2018-01-05 17:10
@outbackdingo are you PXE booting the VMs on those platforms? the KRIB process relies on the CLI/API, not on PXE. It would be possible to use it from VM images if your images started the CLI and registered the node. It's a reasonable use case that would require some tweaks to facilitate.

zehicle
2018-01-05 17:10
Running VMs with DRP is something we are trying to understand better, and we would happily have a 1x1 design discussion about it.

zehicle
2018-01-05 17:11
(that goes for anyone in the community)

2018-01-05 17:16
well right now I use XOA to create a PXE VM, let it boot, rebar sees it then provisions it, then I use XOA to reboot it, then rebar gets it and installs X on it and done

2018-01-05 17:17
so thats been the process for me for VMs still using 2 tools... XOA and rebar......

2018-01-05 17:18
I don't care how it needs to be done now... I don't mind some pain.... however in the future I really think rebar could shine if it had a multi-tenant UI

2018-01-05 17:19
as for KRIB itself... yes I would like to spin up a 5 VM system to kick the tires. I already have Triton SDC deployed so I'd like to compare them

2018-01-05 17:19
and well SDC itself is problematic now since it also uses PXE to install new bare metal nodes... :) YAY

detiber
2018-01-05 19:30
has joined #community201801

zehicle
2018-01-05 20:12
welcome @detiber!

detiber
2018-01-05 20:14
@zehicle thanks! I'm looking forward to kicking the tires with dr-provision

shane
2018-01-05 20:15
@detiber let us know if you have any questions ... and welcome

zehicle
2018-01-05 20:20
when you get far enough - we've got 2 k8s strategies w/ dynamic Ansible for Kubespray and the KRIB runner for Kubeadm.

zehicle
2018-01-05 20:25
@detiber RE: non-amd64 hosts. we don't have default images or a sledgehammer for them. it IS possible to detect arch and provide the right boot if we had the images. (adding @carl who has interest in ARM)

thays
2018-01-05 21:24
has joined #community201801

spector
2018-01-05 22:00
Welcome @thays

thays
2018-01-05 22:06
@spector Thanks! Can't wait to get it fired up and start building. Best place to look for any existing Saltstack integrations?

spector
2018-01-05 22:07
I will let the experts give you any info they have. @shane is the most likely to know

shane
2018-01-05 22:08
@thays we don't have any existing saltstack integrations at the moment - it's something on my personal bucket list to get done

shane
2018-01-05 22:08
however - if you wanted to tackle it - you could look at the `Ansible Reference` content as an example

shane
2018-01-05 22:09
coupled with how we handle secrets management in the `Kubernetes` (formerly named `krib`) content

shane
2018-01-05 22:09
the `Kubernetes` stuff shows how we handle tokens (in Saltstack's case it'd be pub/priv keys)

thays
2018-01-05 22:10
Cool..thanks I will check it out.

wdennis
2018-01-06 02:55
@shane you around?

wdennis
2018-01-06 02:56
Trying to add my krib profile to a new node I want to add to the cluster, getting a UX fail...

shane
2018-01-06 02:56
Nope :wink:

wdennis
2018-01-06 02:57
What does "Profile (at 0) does not exist" mean?

wdennis
2018-01-06 02:59
The `k8s-cluster1` profile absolutely does exist...

wdennis
2018-01-06 03:18
The UX is doing strange things to my data... It's putting quotes around the IPMI strings

wdennis
2018-01-06 03:19
As seen in UX:
```
ipmi/address: "idrac-796MQW1"
ipmi/password: ### obfuscated text ###
ipmi/username: "root"
```

wdennis
2018-01-06 03:19
Via CLI:
```
"Params": {
  "ipmi/address": "\"idrac-796MQW1\"",
  "ipmi/password": "xxxxxx",
  "ipmi/username": "\"root\""
}
```

wdennis
2018-01-06 03:21
Why is that happening?

wdennis
2018-01-06 03:21
It's screwing up IPMI (actions fail)
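The CLI output shows that the stored values are themselves JSON-encoded strings (`"\"root\""`), i.e. the value was quoted twice. A small sketch for spotting and unwrapping such values in a `Params` map. It only handles the double-encoding pattern shown above, the helper names are made up for the example, and writing the cleaned values back to DRP is left out.

```python
import json


def unwrap(value):
    """If `value` is a string holding a JSON-encoded string, decode it once."""
    looks_quoted = (
        isinstance(value, str)
        and len(value) >= 2
        and value[0] == '"'
        and value[-1] == '"'
    )
    if not looks_quoted:
        return value
    try:
        inner = json.loads(value)
    except ValueError:
        return value  # not valid JSON after all, leave untouched
    return inner if isinstance(inner, str) else value


def clean_params(params: dict) -> dict:
    """Return a copy of `params` with doubly quoted strings unwrapped."""
    return {key: unwrap(val) for key, val in params.items()}


if __name__ == "__main__":
    params = {
        "ipmi/address": "\"idrac-796MQW1\"",
        "ipmi/password": "xxxxxx",
        "ipmi/username": "\"root\"",
    }
    print(json.dumps(clean_params(params), indent=2))
```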

wdennis
2018-01-06 03:23
:frustrated:

shane
2018-01-06 03:39
looking back through UX changes to see if something might have triggered that

shane
2018-01-06 03:39
how did you create the profile ?

wdennis
2018-01-06 03:40
Thru the UX back a week ago; have successfully deployed 4 machines (a K8s cluster) with it...

shane
2018-01-06 03:40
you running v3.5.0 on endpoint - any updates to endpoint or content in the last week ?

wdennis
2018-01-06 03:41
yes, and no updates

wdennis
2018-01-06 03:44
My KRIB profile:
```
{
  "Available": true,
  "Description": "",
  "Errors": [],
  "Meta": {
    "color": "",
    "icon": "",
    "title": ""
  },
  "Name": "k8s-cluster1",
  "Params": {
    "access-keys": {
      "root": "ssh-rsa xxxxxx will@Wills-MacBook-Air"
    },
    "access-ssh-root-mode": "yes",
    "change-stage/map": {
      "docker-install": "krib-install:Success",
      "finish-install": "docker-install:Success",
      "krib-install": "complete:Success",
      "runner-service": "finish-install:Stop",
      "ssh-access": "runner-service:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    },
    "krib/cluster-admin-conf": {
      "apiVersion": "v1",
      "clusters": [
        {
          "cluster": {
            "certificate-authority-data": "xxxxxx",
            "server": "https://192.168.1.114:6443"
          },
          "name": "kubernetes"
        }
      ],
      "contexts": [
        {
          "context": {
            "cluster": "kubernetes",
            "user": "kubernetes-admin"
          },
          "name": "kubernetes-admin@kubernetes"
        }
      ],
      "current-context": "kubernetes-admin@kubernetes",
      "kind": "Config",
      "preferences": {},
      "users": [
        {
          "name": "kubernetes-admin",
          "user": {
            "client-certificate-data": "xxxxxx",
            "client-key-data": "xxxxxx"
          }
        }
      ]
    },
    "krib/cluster-join-command": "kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:xxxxxx",
    "krib/cluster-master": "1bcd8472-6c20-47b3-b9ff-f32731905bf1",
    "krib/cluster-profile": "k8s-cluster1",
    "local-repo": false,
    "operating-system-disk": "sda",
    "provisioner-default-fullname": "xxxxxx",
    "provisioner-default-password-hash": "xxxxxx",
    "provisioner-default-user": "xxxxxx"
  },
  "ReadOnly": false,
  "Validated": true
}
```

wdennis
2018-01-06 03:44
(w/ obvious redactions)

wdennis
2018-01-06 03:48
Trying to IPMI powercycle the node, doing this: `$ drpcli -E https://192.168.1.148:8092 machines action 174c3987-22a4-43d4-9eb9-0247162e8628 powercycle`

wdennis
2018-01-06 03:48
Getting a response returned, but no powercycle happening...

wdennis
2018-01-06 03:50
This is the output from that command:
```
{
  "Command": "powercycle",
  "OptionalParams": [],
  "Provider": "ipmi",
  "RequiredParams": [
    "ipmi/username",
    "ipmi/password",
    "ipmi/address"
  ]
}
```
Also the return code is '0':
```
AirDennis:~ will$ echo $?
0
```

wdennis
2018-01-06 04:03
OK, confused now - looks like one of the powercycles did run; it did run thru the stage-map for this node, and is on krib-install now; however, it is trying to do the join, and I'm seeing many entries like this at the end of the (running) job log:
```
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Failed to connect to API Server "192.168.1.114:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "7dff02" is invalid for this cluster, can't connect
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Failed to connect to API Server "192.168.1.114:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "7dff02" is invalid for this cluster, can't connect
```

shane
2018-01-06 04:03
hmm - that is backend auth stuff with the Amazon Cognito pieces, I believe

shane
2018-01-06 04:04
oh wait

shane
2018-01-06 04:04
how long has this cluster been sitting ?

wdennis
2018-01-06 04:04
It's a k8s thing

shane
2018-01-06 04:04
It looks like the token is invalid as it's probably expired

shane
2018-01-06 04:05
to join the cluster

wdennis
2018-01-06 04:05
It's been up and running since 12/25

wdennis
2018-01-06 04:05
Ah, I see...

shane
2018-01-06 04:05
is 192.168.1.114 your endpoint ? and you set the API port to 6443 ?

wdennis
2018-01-06 04:05
It's the k8s master node

shane
2018-01-06 04:05
got it

wdennis
2018-01-06 04:06
So it's the k8s join token that's expired

wdennis
2018-01-06 04:08
So the join token stored in my profile is now invalid, I'm guessing...

shane
2018-01-06 04:08
yes

wdennis
2018-01-06 04:08
any way to refresh that?

shane
2018-01-06 04:08
I don't think we do anything with re-authing the tokens on our side - so that's going to be a k8s related issue

shane
2018-01-06 04:09
kubeadm has its rough edges

shane
2018-01-06 04:09
I'm at a friend's birthday party - so need to drop off - we can pursue this a bit further tmw

wdennis
2018-01-06 04:09
OK, have fun & thanks

ctrees
2018-01-06 04:30
so @wdennis was that " stuff just when you pulled things up in the UI editor ?

ctrees
2018-01-06 04:33
... I'm slowly learning react and re-learning what I forgot with selenium / webdriver in an attempt to UI test (and perform mechanical rob/shane tricks)

wdennis
2018-01-06 04:43
No, it was in the actual data (see above for CLI output)

wdennis
2018-01-06 04:45
God bless you @ctrees for taking a stab at UX testing

ctrees
2018-01-06 04:45
yea, I saw that... though you inverted the header.... as the screen shot had the escapes...

wdennis
2018-01-06 04:49
I have almost bailed on using the UX so many times due to bugs creeping in from constant dev

wdennis
2018-01-06 04:50
But it's too pretty :dancer::skin-tone-3::joy:

ctrees
2018-01-06 04:51
I got login working then I had to re-write ... and I've got motivation from work...

greg
2018-01-06 05:46
@wdennis - the kubeadm join token is valid for 24 hours by default, I think. So you can only grow your cluster for a day. It is lame. I think we could add a parameter to the master config to change that. It may be that a token can be reissued as well by running kubeadm. These would be good future enhancements.
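A kubeadm bootstrap token has the form `<6-char id>.<16-char secret>`, which is why the job log complains about token id `7dff02`. A hedged sketch of pulling the id out of the stored join command and estimating whether the token has outlived the 24-hour default greg mentions. The creation time used below is an assumption (the thread only says the cluster has been up since 12/25), the helper names are made up, and the authoritative answer still comes from the master node.

```python
import re
from datetime import datetime, timedelta

# "--token <id>.<secret>" as it appears in the stored join command.
TOKEN_RE = re.compile(r"--token\s+([a-z0-9]{6})\.([a-z0-9]{16})\b")
DEFAULT_TTL = timedelta(hours=24)  # kubeadm default, as greg recalls it


def token_id(join_command: str) -> str:
    """Return the public id part of the token in a `kubeadm join` command."""
    match = TOKEN_RE.search(join_command)
    if match is None:
        raise ValueError("no bootstrap token found in join command")
    return match.group(1)


def is_expired(created: datetime, now: datetime,
               ttl: timedelta = DEFAULT_TTL) -> bool:
    """True once the token is at least `ttl` old."""
    return now - created >= ttl


if __name__ == "__main__":
    cmd = ("kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 "
           "--discovery-token-ca-cert-hash sha256:xxxxxx")
    print(token_id(cmd))
    # Assumed creation date: the day the cluster came up.
    print(is_expired(datetime(2017, 12, 25), datetime(2018, 1, 6)))
```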

ctrees
2018-01-06 05:49
The SSL certificate used to load resources from https://qww9e4paf1.execute-api.us-west-2.amazonaws.com will be distrusted in M70. Once distrusted, users will be prevented from loading these resources. See https://g.co/chrome/symantecpkicerts for more information.

ctrees
2018-01-06 05:50
that came through the chrome browser dev console... as a warning....

wdennis
2018-01-06 14:26
Need to know how to terminate a running job (the `krib-install` stage)

wdennis
2018-01-06 14:27
Since it'll never succeed in finishing...

ctrees
2018-01-06 14:29
krib keeps the runner running.... can't you just put a stop or ?? into the queue and the runner picks it up ?

wdennis
2018-01-06 14:29
I have set "Runnable" to false on the node, but the drpcli process is still running `kubeadm` on the node

wdennis
2018-01-06 14:30
@ctrees That's what I don't know...

ctrees
2018-01-06 14:30
I'm just guessing as you've been playing with it way more than I... it's all just 'in theory' in my head...

wdennis
2018-01-06 14:30
I could -TERM kubeadm on the node, but not sure that's the right way to do it...

ctrees
2018-01-06 14:32
so in the krib demo... did he leave the runner going after kubeadm was installed ?

ctrees
2018-01-06 14:33
at the time, I was thinking OH this avoids ALL touches other than sledgehammer, but I was not sure how or if the runner migrated off sledgehammer

ctrees
2018-01-06 14:38
in my mind... there is a drp process ON the node IN sledgehammer that KNOWS it's a particular NODE, then there is the drp process that IS and knows it IS an endpoint and has runner queue info for the node

wdennis
2018-01-06 14:39
Yes, the runner (drpcli "service") continues to run on the nodes

wdennis
2018-01-06 14:39
The "runner" is not tied to SH

wdennis
2018-01-06 14:40
It's a service that runs on SH

wdennis
2018-01-06 14:40
But you can also install/keep it running on any other install

wdennis
2018-01-06 14:41
That's the difference (AFAIK) between the `complete` (keeps runner running) and `complete-nowait` (terminates runner) stages

ctrees
2018-01-06 14:42
ah...

wdennis
2018-01-06 14:42
The KRIB stage-map ends with the `complete` stage

ctrees
2018-01-06 14:42
but the endpoint is the 'queue repo' for the runner and the runner can dynamically update ?? right ??

wdennis
2018-01-06 14:43
Here's my new node's process tree:
```
root@k8s-ingress:~# pstree
systemd─┬─accounts-daemon─┬─{gdbus}
        │                 └─{gmain}
        ├─acpid
        ├─2*[agetty]
        ├─atd
        ├─cron
        ├─dbus-daemon
        ├─dhclient
        ├─dockerd─┬─containerd───10*[{containerd}]
        │         └─10*[{dockerd}]
        ├─drpcli─┬─bash───kubeadm───10*[{kubeadm}]
        │        └─11*[{drpcli}]
        ├─irqbalance
        ├─2*[iscsid]
        ├─lvmetad
        ├─lxcfs───3*[{lxcfs}]
        ├─mdadm
        ├─polkitd─┬─{gdbus}
        │         └─{gmain}
        ├─rsyslogd─┬─{in:imklog}
        │          ├─{in:imuxsock}
        │          └─{rs:main Q:Reg}
        ├─snapd───9*[{snapd}]
        ├─sshd───sshd───bash───pstree
        ├─systemd───(sd-pam)
        ├─systemd-journal
        ├─systemd-logind
        ├─systemd-timesyn───{sd-resolve}
        └─systemd-udevd
```

wdennis
2018-01-06 14:43
See the `drpcli` process? That's the "runner"

wdennis
2018-01-06 14:45
Yes, AFAIK, you can put more jobs into the "hopper" for the node on the DRP server-side, and the node's runner will dutifully pick them up and start executing them

wdennis
2018-01-06 14:46
I need to tell the runner somehow to terminate the `kubeadm` process... It's just looping endlessly with:

ctrees
2018-01-06 14:46
so back to your issue, you can't tell the state of the "hopper" on the node side, but you should be able to see the "hopper queue" on the endpoint side...

wdennis
2018-01-06 14:46
```
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Failed to connect to API Server "192.168.1.114:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "7dff02" is invalid for this cluster, can't connect
```

wdennis
2018-01-06 14:47
```
AirDennis:~ will$ drpcli jobs log 463014d6-60ef-43be-848c-406296037716 -E https://192.168.1.148:8092 | strings | grep Failed | wc -l
7858
```

wdennis
2018-01-06 14:48
:joy: actually :cry:

ctrees
2018-01-06 14:52
well... it's less than the newly discovered prime number :wink:

wdennis
2018-01-06 14:53
LOL

wdennis
2018-01-06 14:54
At least yet...

ctrees
2018-01-06 14:54
Oh... krib vs kubespray ? opinion ? I was liking the idea of handing off to ansible

ctrees
2018-01-06 14:55
but I sort of see the krib motivation too...

wdennis
2018-01-06 14:55
From what I gather from @zehicle and what I've been reading about k8s so far, `kubeadm` is supposed to be the blessed k8s orchestration tool

wdennis
2018-01-06 14:56
It does not handle HA controllers yet, as `kubespray` does, but I don't need that for what I'm doing (PoC cluster)

ctrees
2018-01-06 14:57
yea... but spray can install kubeadm unless I'm not following... I was wondering if security (no ssh) is an issue

wdennis
2018-01-06 14:57
So I went with KRIB as it's a DRP one-pass solution

wdennis
2018-01-06 14:58
And the "new hotness" :stuck_out_tongue_winking_eye:

ctrees
2018-01-06 14:59
I get that... and 'look ma, no-ssh'

wdennis
2018-01-06 14:59
And faster, AFAIK - Ansible is slow many times

wdennis
2018-01-06 15:02
Oh well, here goes nothing...

ctrees
2018-01-06 15:02
trying the hard boot option ?

wdennis
2018-01-06 15:02
```
root@k8s-ingress:~# ps auxww | grep kubeadm
root      4046  0.0  0.3 107164 59104 ?        Sl   03:37   0:14 kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2
root     12163  0.0  0.0  14224  1092 pts/0    S+   15:01   0:00 grep --color=auto kubeadm
root@k8s-ingress:~# kill -TERM 4046
root@k8s-ingress:~# ps auxww | grep kubeadm
root     12217  0.0  0.0  14224   932 pts/0    S+   15:02   0:00 grep --color=auto kubeadm
```

wdennis
2018-01-06 15:03
Now to see the job state...

wdennis
2018-01-06 15:06
Yup, "failed":
```
AirDennis:~ will$ drpcli jobs show 463014d6-60ef-43be-848c-406296037716 -E https://192.168.1.148:8092
{
  "Archived": false,
  "Available": true,
  "Current": true,
  "EndTime": "2018-01-06T05:45:16.709533598-05:00",
  "Errors": [],
  "ExitState": "failed",
  "Machine": "174c3987-22a4-43d4-9eb9-0247162e8628",
  "Meta": {},
  "Previous": "82f2c4cc-2df2-45ea-a33f-a4cc1a0769a5",
  "ReadOnly": false,
  "Stage": "krib-install",
  "StartTime": "2018-01-05T18:20:06.89678544-05:00",
  "State": "failed",
  "Task": "krib-install",
  "Uuid": "463014d6-60ef-43be-848c-406296037716",
  "Validated": true
}
```

ctrees
2018-01-06 15:08
looks as if the runner reported back in

ctrees
2018-01-06 15:09
do you know what "Previous" is ?

ctrees
2018-01-06 15:10
... I sure learn a lot when you're in pain...

wdennis
2018-01-06 15:10
I think "the job before this one"

ctrees
2018-01-06 15:11
OH... that makes sense...

wdennis
2018-01-06 15:11
Oh good, glad I can help :stuck_out_tongue_winking_eye:

wdennis
2018-01-06 15:11
It is the way of OpenSource(tm)

ctrees
2018-01-06 15:12
... and people with disorders...

ctrees
2018-01-06 15:12
anyway... did the runner take a new job ?

wdennis
2018-01-06 15:13
Nope, on a fail it terminates the job queue

ctrees
2018-01-06 15:13
OH...

wdennis
2018-01-06 15:13
There was only one more stage, which was `complete` anyways

wdennis
2018-01-06 15:13
The runner service (drpcli) is still running on the node

wdennis
2018-01-06 15:14
So that's where `complete` would leave it anyhow

ctrees
2018-01-06 15:15
Oh... that was my question: can you give the runner a new queue task? Or is a hard boot the only way to 're-cycle'... and do you know if the node joined the cluster (I assume not... right?)

wdennis
2018-01-06 15:16
a) Yes, I think I could give the runner new jobs (not sure how that would work on the DRP server-side)

ctrees
2018-01-06 15:16
aka... I like this problem as it answers a lot of 'process questions' in my head... mostly DR (disaster recovery)

wdennis
2018-01-06 15:17
b) The node did not join b/c kubeadm could not auth to the k8s API server with the expired token

ctrees
2018-01-06 15:17
to me... DRP -> Disaster Recovery Protocol :wink:

wdennis
2018-01-06 15:18
KRIB (per @greg's message to me above) only gets a 24h join token, and once it expires, there's currently no way for the DRP system to renew it and store the new one in the profile

wdennis
2018-01-06 15:18
Sadly, I did not know that before I started...

ctrees
2018-01-06 15:19
... yea... expiring and auto rotating tokens is the new hotness ya know...

wdennis
2018-01-06 15:20
Although it did prep my node for k8s... So that's cool

wdennis
2018-01-06 15:20
token rotation = a good idea

ctrees
2018-01-06 15:21
but if the runner failed on that node, the only option to do anything is hard reboot ?? right ??

ctrees
2018-01-06 15:22
then I get krib more... as the only way in is drp and/or kubeadm

wdennis
2018-01-06 15:22
Not the runner that fails, but the job

ctrees
2018-01-06 15:23
Yup, got that (in your debug instance)...

wdennis
2018-01-06 15:24
I'm not sure if they set up a `systemd` service for drpcli where it will auto-start it on boot...

wdennis
2018-01-06 15:26
Yes, they do:
```
root@k8s-ingress:~# systemctl list-unit-files --type=service | grep drpcli
drpcli-init.service                        enabled
drpcli.service                             enabled
```

ctrees
2018-01-06 15:27
... that's what I THOUGHT was going on... but they do that in the install dynamically via SH

wdennis
2018-01-06 15:28
Yeah, I think in the post-install shell script

ctrees
2018-01-06 15:28
cool stuff and all because your ssl expired :wink:

wdennis
2018-01-06 15:28
"No pain, no gain" I guess

ctrees
2018-01-06 15:29
... the UX testing has now 'distracted' me (in a good way) for 2 weeks...

ctrees
2018-01-06 15:32
I got it taking pretty pictures AND comparing those pictures to base image... (visual diff)

wdennis
2018-01-06 15:34
I was thinking that the way to test the UX would be to programmatically "work" the UX and see what resulting REST command gets sent to the DRP server

wdennis
2018-01-06 15:35
But I'm not a web dev guy so what do I know?

ctrees
2018-01-06 15:36
yup... THAT's exactly what I want to do

ctrees
2018-01-06 15:37
that, and issue drpcli commands and see the results in the UI

ctrees
2018-01-06 15:38
should be able to test the rest api the same way... end up with 5-6 language mappings

ctrees
2018-01-06 15:40
lang meaning: drpcli, drp-golang, drp-api, drp-css, drp-bdd, drp-feature

wdennis
2018-01-06 15:42
The real way (IMHO) would be to use TDD as the UX dev style

wdennis
2018-01-06 15:42
Write a test case, ensure it fails, devel the feature, ensure the test passes

ctrees
2018-01-06 15:43
they've got unit tests in golang and REST API tests; I hope to use that as the UX basis...

ctrees
2018-01-06 15:45
yea TDD, BDD all great ideas... but if you think about it... ansible is basically that for cli (and cli is what.... 40 years old)....

ctrees
2018-01-06 15:47
anyway... my 'GOAL' is to make a UX test script setup that you can use to file bugs and/or 'capture' demos... realistically the UI is really just good documentation for the cli :wink:


ctrees
2018-01-06 15:51
that was my first attempt.... have 3 others (playing with other tools / structures) most are investigations for 'real work' but using drp as proving grounds...

greg
2018-01-06 15:51
Okay jobs and stuff.

greg
2018-01-06 15:52
kubeadm hung in a loop causing the task to run forever.

greg
2018-01-06 15:52
This is a task design issue and kubeadm issue.

greg
2018-01-06 15:53
regardless, the runner doesn't have a watchdog control to kill or stop tasks. That is an interesting idea to think about.
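
One way a task script could guard itself against this kind of endless loop, without any change to the runner, is to put a deadline around the long-running command. This is only a sketch: `run_with_deadline` is a hypothetical helper, not part of drpcli or KRIB, and it assumes the `timeout` utility from coreutils is available on the node.

```shell
# Hypothetical helper (not part of drpcli or KRIB): run a command under a deadline.
# Assumes coreutils `timeout`, which sends TERM at the deadline and KILL
# 10 seconds later if the process ignores TERM.
run_with_deadline() {
  local deadline="$1"; shift
  local rc=0
  timeout -k 10 "$deadline" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # 124 is the exit status `timeout` reports when the deadline was hit.
    echo "command exceeded deadline of ${deadline}s; terminated" >&2
  fi
  return "$rc"
}

# Example (not executed here): give up on the join after 600 seconds.
# run_with_deadline 600 kubeadm join --token <token> <master>:6443 ...
```

With a non-zero exit status the task would fail instead of hanging, which is the state the runner already knows how to report.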

ctrees
2018-01-06 15:53
is the expire token thing aws ? or ?

greg
2018-01-06 15:54
Regardless, when @wdennis killed the kubeadm, the runner saw it as a failure, marked the node as not runnable (runnable == false), and went to sleep waiting for runnable to become true.

wdennis
2018-01-06 15:54
Or (it seems) an admin way to direct the runner to terminate the current job and move on (not sure that's always a good idea, but the admin can make that call I guess)

greg
2018-01-06 15:55
The job for that task in the machine's task list was marked failed and the task list index was left where it was.

greg
2018-01-06 15:55
The purpose of this is that remediation can occur, then the machine can be marked runnable and task processing will start where it left off.

wdennis
2018-01-06 15:55
Yes, b/c I went into the node and manually killed the process that drpcli had spawned

wdennis
2018-01-06 15:56
I did (via UX) tell the machine to stop the runner (Runnable = false) but that did not seem to work

greg
2018-01-06 15:56
kubeadm is running the hard loop inside itself apparently. We don't run the kubeadm calls in a loop.

wdennis
2018-01-06 15:57
Oh sure, I get that

greg
2018-01-06 15:57
That is only checked when drpcli gets a chance.

greg
2018-01-06 15:57
Remember, DRP doesn't touch machines. Machines touch DRP.

wdennis
2018-01-06 15:57
I thought it may "systemctl stop drpcli.service" or something

wdennis
2018-01-06 15:57
Pull, not push

greg
2018-01-06 15:57
Yes

wdennis
2018-01-06 15:58
But there should be a way to centrally control the runner on the machines, right?

greg
2018-01-06 15:58
yes and no.

wdennis
2018-01-06 15:59
"it depends" :stuck_out_tongue_winking_eye:

greg
2018-01-06 15:59
Yes - mark the machine runnable and it will eventually stop barring badness.

greg
2018-01-06 15:59
No - DRP has no way to force the stop if something hangs.

wdennis
2018-01-06 15:59
OK, not following that...

greg
2018-01-06 15:59
Well - who knew that kubeadm would loop forever on a bad token.

greg
2018-01-06 16:00
That is unknown badness.

wdennis
2018-01-06 16:00
"mark the machine runnable and it will eventually stop" seems backwards to me...

greg
2018-01-06 16:00
sorry - not runnable

ctrees
2018-01-06 16:00
DRPe - Endpoint, DRPn - Node (I thought the runner is the same golang bin but knows it's on a node ? correct ?)

wdennis
2018-01-06 16:00
Phew :slightly_smiling_face:

wdennis
2018-01-06 16:00
Yes, got you - once the runner kicks off a child process, it's hard to know what that process is actually doing...

ctrees
2018-01-06 16:01
... cool @wdennis file a kubeadm bug !

wdennis
2018-01-06 16:01
Except when you get a return code to the parent

greg
2018-01-06 16:03
@ctrees - DRPn is drpcli running `machines processjobs <uuid>`

greg
2018-01-06 16:04
Yes - @wdennis

wdennis
2018-01-06 16:06
What would happen if the runner service was stopped in the middle of it processing a job queue?

greg
2018-01-06 16:06
now, with some of the new websocket stuff, we can have the drpcli create an event stream watching the job and the machine. If the machine becomes not runnable or the job gets marked stopped, we could kill the process and close out things. There are some perils with this because of timing, but it could be made to work and isn't a bad idea. Should note it as a feature to deal with.

greg
2018-01-06 16:06
It would leave the job running in DRP.

wdennis
2018-01-06 16:07
i.e. you have this going on: `drpcli─┬─bash───kubeadm───10*[{kubeadm}]` and then you do a `systemctl stop drpcli.service` on the node

greg
2018-01-06 16:07
Once the runner was restarted and asked for a job (assuming the machine was runnable), DRP would mark the "open" job as failed, create a new job instance of the current task, and try again.

wdennis
2018-01-06 16:07
Ah, so that wouldn't work

greg
2018-01-06 16:08
well - it works fine for `node got powercycled accidentally` case. But in this case, no.

wdennis
2018-01-06 16:08
Yup, see that

greg
2018-01-06 16:09
give me a minute to look up something.

wdennis
2018-01-06 16:10
So there's no way of administratively telling the DRP system "hey, stop processing the current job, mark it as 'admin terminated', and move ahead with the next job"?

wdennis
2018-01-06 16:11
(would take coordination with the node's `drpcli` to restart it or something)

greg
2018-01-06 16:13
well.

wdennis
2018-01-06 16:13
Again, not sure that's a great idea, but if the admin knows the current job which for whatever reason is not proceeding as planned (like I saw from job log output), and that in their estimation could be safely skipped and move on proceeding with the next job in the queue, why shouldn't they have the power to do that?

greg
2018-01-06 16:13
You could do this, but not as a single action from DRP UX or CLI.

greg
2018-01-06 16:14
1. mark machine not runnable.

greg
2018-01-06 16:14
2. kill hung task / restart runner

greg
2018-01-06 16:14
3. Set currentTask to currentTask + 1 on the machine (I think this still works).

greg
2018-01-06 16:14
4. make machine runnable.

greg
2018-01-06 16:15
That would give you an "administrative" stop and move on to the next task.
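
The four steps could be scripted roughly as below. This is a hedged sketch, not a supported workflow: `pause_machine` and `skip_and_resume` are hypothetical names, the sketch assumes `drpcli machines show` prints JSON with a numeric `CurrentTask` field, and it assumes the server still accepts a direct update of `CurrentTask` (the thread itself only says "I think this still works"). Step 2, killing the hung process, still has to be done by hand on the node between the two calls.

```shell
# Hypothetical helpers for the four-step "administrative skip" described above.
# Assumption: `drpcli machines show <uuid>` prints JSON with a numeric "CurrentTask".
# Assumption: the server accepts a direct update of CurrentTask.

# Step 1: stop the runner from picking up more work.
pause_machine() {
  drpcli machines update "$1" '{ "Runnable": false }' >/dev/null
}

# Steps 3 and 4: bump the task index past the stuck task, then resume.
# Step 2 (kill the hung process on the node) must happen before this is called.
skip_and_resume() {
  local uuid="$1" current next
  current=$(drpcli machines show "$uuid" |
    sed -n 's/.*"CurrentTask": *\(-\{0,1\}[0-9][0-9]*\).*/\1/p')
  [ -n "$current" ] || { echo "could not read CurrentTask" >&2; return 1; }
  next=$((current + 1))
  drpcli machines update "$uuid" "{ \"CurrentTask\": $next }" >/dev/null || return 1
  drpcli machines update "$uuid" '{ "Runnable": true }' >/dev/null
}
```

Splitting the helper in two keeps the manual kill on the node as an explicit pause point rather than hiding it inside one command.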

wdennis
2018-01-06 16:15
Couldn?t that be orchestrated?

greg
2018-01-06 16:15
#2 is not implemented today.

greg
2018-01-06 16:16
but yes eventually.

wdennis
2018-01-06 16:16
I did do 1); I did 1st half of 2)

greg
2018-01-06 16:16
Okay remediating the expired token in krib.

greg
2018-01-06 16:16
Do you have your k8s admin node running still?

wdennis
2018-01-06 16:16
Yup

greg
2018-01-06 16:17
Can you do this for me from there? `kubeadm token list`

wdennis
2018-01-06 16:18
yup hold on --

greg
2018-01-06 16:18
ok

shane
2018-01-06 16:18
seems to me - you should be able to regenerate the token on Master - and manually inject that back in to the Profile, and then machines could join after that was completed

shane
2018-01-06 16:19
(regenerate token on k8s master)

greg
2018-01-06 16:19
yep - that is what I'm walking @wdennis through. Just collecting the commands.

wdennis
2018-01-06 16:19
```
root@testnode03:~# kubeadm token list
TOKEN     TTL       EXPIRES   USAGES    DESCRIPTION   EXTRA GROUPS
root@testnode03:~#
```

greg
2018-01-06 16:19
@wdennis - `kubeadm token create`

wdennis
2018-01-06 16:20
```
root@testnode03:~# kubeadm token create
866bc1.51f5919fd2dfd63e
root@testnode03:~#
```

greg
2018-01-06 16:20
On the DRP endpoint,

greg
2018-01-06 16:20
you can do this:

greg
2018-01-06 16:20
`drpcli profiles show <k8s cluster profile>`

wdennis
2018-01-06 16:21
Yes?

greg
2018-01-06 16:21
I need to see the join command in the `krib/cluster-join-command`

wdennis
2018-01-06 16:21
```"krib/cluster-join-command": "kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2",```

greg
2018-01-06 16:22
I believe you can do this now:

greg
2018-01-06 16:22
```drpcli profiles set <k8s-cluster-profile> param krib/cluster-join-command to "kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2"```

greg
2018-01-06 16:23
Note I replaced the old token (7dff02.8b4d92135c936919) with a new token (866bc1.51f5919fd2dfd63e)

greg
2018-01-06 16:23
You should then be able to mark the stopped node as Runnable and it should install and join.
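
The token swap in the join command can be done mechanically instead of by hand-editing the string. A minimal sketch, assuming the stored command uses the `--token <id>.<secret>` form shown above; `replace_join_token` is a hypothetical helper name:

```shell
# Hypothetical helper: swap the bootstrap token inside an existing join command.
# A kubeadm bootstrap token has the form <6 chars>.<16 chars>, all in [a-z0-9].
replace_join_token() {
  local join_cmd="$1" new_token="$2"
  printf '%s\n' "$join_cmd" |
    sed "s/--token [a-z0-9]\{6\}\.[a-z0-9]\{16\}/--token $new_token/"
}

# Example (not executed here):
#   new=$(kubeadm token create)          # on the k8s master
#   drpcli profiles set <k8s-cluster-profile> param krib/cluster-join-command \
#     to "$(replace_join_token "$old_cmd" "$new")"
```

Only the token changes; the `--discovery-token-ca-cert-hash` value is passed through untouched.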

greg
2018-01-06 16:24
Also, can you run a `kubeadm token list` again. I want to see the reported info.

wdennis
2018-01-06 16:26
```
root@testnode03:~# kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
866bc1.51f5919fd2dfd63e   23h   2018-01-07T16:19:56Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
root@testnode03:~#
```

wdennis
2018-01-06 16:27
I do not need to update the `--discovery-token-ca-cert-hash` right?

shane
2018-01-06 16:27
no the CA Cert Hash should be the same

wdennis
2018-01-06 16:27
Cool

shane
2018-01-06 16:28
unless the Cert has expired, too ... :slightly_smiling_face:

wdennis
2018-01-06 16:28
LOL

shane
2018-01-06 16:28
but in future when you create a token you should be able to set a longer expiry time with `--token-ttl` (if I'm reading the docs right)
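
For reference, a sketch of creating a longer-lived token. As far as I know, `--token-ttl` is the flag on `kubeadm init`, while `kubeadm token create` takes `--ttl` (default 24h, and `0` for a token that never expires); check this against the docs for your kubeadm version. `create_join_token` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: create a bootstrap token with an explicit lifetime.
# Assumption: `kubeadm token create` accepts --ttl <duration>; 0 means no expiry.
create_join_token() {
  local ttl="${1:-24h}"
  kubeadm token create --ttl "$ttl"
}

# Example (not executed here): a token valid for three days.
# create_join_token 72h
```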

wdennis
2018-01-06 16:30
```
AirDennis:~ will$ drpcli profiles set k8s-cluster1 param krib/cluster-join-command to "kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2" -E https://192.168.1.148:8092
"kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2"
```

wdennis
2018-01-06 16:30
Let's give it a go then!

wdennis
2018-01-06 16:31
How to use drpcli to set the machine runnable?

wdennis
2018-01-06 16:31
(Sorry, not trusting the UX right now...)

wdennis
2018-01-06 16:35
n/m, got it

greg
2018-01-06 16:35
`drpcli machines update <uuid> '{ "Runnable": true }'`

wdennis
2018-01-06 16:35
```
Running: kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Requesting info from "https://192.168.1.114:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.1.114:6443"
[discovery] Successfully established connection with API Server "192.168.1.114:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
Finished successfully
Command exited with status 0
Action krib-install.sh.tmpl finished
Task krib-install finished
Updated job 9d7d2564-26b4-439e-a049-c5b959b6da32 to finished
```

shane
2018-01-06 16:35
or: `drpcli machines set bff5513f-7f63-43c6-b744-5eefaa9716be param Runnable to true`

wdennis
2018-01-06 16:35
GOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAL!

wdennis
2018-01-06 16:36
```
root@testnode03:~# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k8s-ingress   Ready    <none>   1m    v1.9.1
testnode01    Ready    <none>   11d   v1.9.0
testnode02    Ready    <none>   11d   v1.9.0
testnode03    Ready    master   12d   v1.9.0
testnode04    Ready    <none>   11d   v1.9.0
```

wdennis
2018-01-06 16:37
Thanks fellas :slightly_smiling_face:

shane
2018-01-06 16:39
(scratch that method I posted - not right)

greg
2018-01-06 16:48
@wdennis - cool

wdennis
2018-01-06 16:57
Learned a lot, so all good... "all's well that ends."

wdennis
2018-01-06 16:59
Would it/could it be a thing that krib-install checks for join token validity and re-gens & stores new if no longer valid?

greg
2018-01-06 17:02
Well - this would be an example of a maintenance operation workflow.

greg
2018-01-06 17:02
A stage/task that would regen the token and update the profile from the k8s master. You would set the k8s-master machine to that stage, it would update the token and go back to wait.

wdennis
2018-01-06 17:07
I was thinking check token, and if invalid (missing) then regen & update

wdennis
2018-01-06 17:07
But yes, separate task I suppose
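
The "check token" half of that idea can be sketched against the `kubeadm token list` output shown earlier, where an expired token simply disappears from the listing. `token_is_listed` is a hypothetical helper, and the sketch assumes the token ID is the first column after a one-line header.

```shell
# Hypothetical helper: succeed only if the token appears in `kubeadm token list`
# output supplied on stdin (first column, skipping the header line).
token_is_listed() {
  awk -v t="$1" 'NR > 1 && $1 == t { found = 1 } END { exit !found }'
}

# Example (not executed here), run on the k8s master:
#   kubeadm token list | token_is_listed "$tok" || tok=$(kubeadm token create)
```

The regenerated token would still need to be written back into the profile's `krib/cluster-join-command` param, as was done by hand above.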

shane
2018-01-06 17:09
there are lots of OPS related ways to do this ... which isn't our business model ... certainly need to consider if/how we want to approach the Day 2 side of things - but ... for each Operations task we bake in to DRP ... the more people will want to bake in - then it becomes rigid in how it works - which is the problem with DRv2 - we specified too tightly how operations should be handled by how things were built

wdennis
2018-01-06 17:09
Put that in front of krib-install and "Bob's your uncle" (for at least my failure case)

shane
2018-01-06 17:09
KRIB is a demonstration workload - not our business model

shane
2018-01-06 17:10
similarly - you could ask us to fix HA control plane in kubeadm - so that KRIB can bake an HA cluster ... but we're not going to touch that with a 100 foot pole

wdennis
2018-01-06 17:11
Ok, not going that far...

shane
2018-01-06 17:11
my point is: it's a slippery slope to start going down ... each tiny iterative change "seems like a good idea" at the time ... :slightly_smiling_face:

wdennis
2018-01-06 17:12
And sometimes to be honest it does seem to me like DRP is a k8s deployment system...

shane
2018-01-06 17:12
I do agree that we might consider how to make cluster join with the token after expiry easier

wdennis
2018-01-06 17:12
I get where you guys are coming from...

shane
2018-01-06 17:12
but NOT for KRIB's sake - for the sake of a generic model for any cluster management tooling going forward - a pattern that makes sense for the larger ecosystem

shane
2018-01-06 17:13
we could simply change the KRIB content to generate a non-expiring token

shane
2018-01-06 17:13
voila ... job finished

wdennis
2018-01-06 17:13
There you go...

wdennis
2018-01-06 17:14
KRIB to me is an opinionated k8s cluster deployer

shane
2018-01-06 17:14
that is exactly the rub ... that I'm getting at

shane
2018-01-06 17:15
we don't want to be in the business of "opinionated installers"

shane
2018-01-06 17:15
our business is helping you get your installation path up and running

shane
2018-01-06 17:15
but realistically - we have to demonstrate how the system works - for others to understand and take it up

wdennis
2018-01-06 17:15
So for a guy like me just learning about k8s and wanting my own bare-metal cluster, I was trying to leverage your installer

shane
2018-01-06 17:15
and eventually extend it for their own operational models

wdennis
2018-01-06 17:16
Granted I'm ignorant at this point on kubeadm as well as many other k8s things...

wdennis
2018-01-06 17:18
So maybe more caveats from the RackN side ("just an example installer, not for production use" etc) may be warranted...

shane
2018-01-06 17:18
it is a good learning experience - for you - and for us - as we learn how to operate ... Operational things within DRP - we find areas to extend, fix, enhance, and make better

wdennis
2018-01-06 17:18
Agreed

shane
2018-01-06 17:18
that caveat exists for `kubeadm` itself ... :slightly_smiling_face:

wdennis
2018-01-06 17:18
Yeah, I'm learning that...

zehicle
2018-01-06 19:11
+1 "generic model for any cluster management tooling" < KRIB helps find patterns for immutable deploys that DRP should facilitate. I would love to see a collaborative approach where Kubeadm / Kubespray did things that leverage DRP node ready state and cluster/profile metadata to make the install & Day 2 easier

zehicle
2018-01-06 19:13
RE: Kubespray vs KRIB - it would be great to let Kubespray set up the control plane and then use Kubeadm join to attach the nodes. That pattern would work for hybrid cloud managed control too.

shane
2018-01-06 20:10
Nice ... someone forgot to sign the CentOS Repo pkgs for kubernetes 1.9.1 ??? ```Package efde37cfcd34c8232daafb0337b8ba5fda70100ab6988fca71ba30ce929311dd-kubelet-1.9.1-0.x86_64.rpm is not signed```

greg
2018-01-06 22:05
Yeah - I noticed that too. I turned off the checking, but it is wrong.

ctrees
2018-01-06 22:14
since you're on... I've got screen size check params... default check mobile (aka small screen)

ctrees
2018-01-06 22:15
what's the min screen width before you tell someone to get a real monitor :wink:

greg
2018-01-06 22:18
4k ultra HD

ctrees
2018-01-06 22:18
yea... cool... can you get rob to send me one for testing ?

greg
2018-01-06 22:19
Actually, I don't know if we've set one. There are known issues between @zehicle and me because he tests with the firefox debugger open on the right side, while I don't have it open until I need it.

ctrees
2018-01-06 22:19
I'm sure they are on sale at Fry's

greg
2018-01-06 22:19
I'd have to get one first.

ctrees
2018-01-06 22:20
wait... that's a trick... you don't EVER need one open...

ctrees
2018-01-06 22:20
viewports: [{ width: 1024, height: 768 }],

greg
2018-01-06 22:20
My guess would be something like 800x600, but 1024 would be better.

ctrees
2018-01-06 22:20
just going with the one... it's easy for 'some-one-who-cares-more' to run more ...

greg
2018-01-06 22:21
I, in general, can operate the system without needing the debugger window or cli.

ctrees
2018-01-06 22:21
for sure this thing will be able to auto-gen screen-shot-UX steps :wink:

greg
2018-01-06 22:21
cool

ctrees
2018-01-06 22:25
sorry for the spew... but...

ctrees
2018-01-06 22:25
catmini:drpfeature msops$ yarn run test:po
yarn run v1.3.2
$ yarn run wdio wdio.PageObjectTest.conf.js
$ /Users/msops/Code/drpfeature/node_modules/.bin/wdio wdio.PageObjectTest.conf.js
------------------------------------------------------------------
[chrome #0-0] Session ID: ca8bb510bc5543bba0d6fe9a6a5fbf5e
[chrome #0-0] Spec: /Users/msops/Code/drpfeature/src/pospecs/login.spec.js
[chrome #0-0] Running: chrome
[chrome #0-0]
[chrome #0-0] drp-ux auth form
[chrome #0-0]   - should deny access with wrong creds
[chrome #0-0]   ✓ should allow access with correct creds
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 1 passing (9s)
[chrome #0-0] 1 pending
[chrome #0-0]
:sparkles: Done in 13.29s.
catmini:drpfeature msops$ yarn run test:po
yarn run v1.3.2
$ yarn run wdio wdio.PageObjectTest.conf.js
$ /Users/msops/Code/drpfeature/node_modules/.bin/wdio wdio.PageObjectTest.conf.js
------------------------------------------------------------------
[chrome #0-0] Session ID: 6ef9a9164beef7c5670bb61db1c6dca5
[chrome #0-0] Spec: /Users/msops/Code/drpfeature/src/pospecs/login.spec.js
[chrome #0-0] Running: chrome
[chrome #0-0]
[chrome #0-0] drp-ux auth form
[chrome #0-0]   - should deny access with wrong creds
[chrome #0-0]   1) should allow access with correct creds
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 1 pending (10s)
[chrome #0-0] 1 failing
[chrome #0-0]
[chrome #0-0] 1) drp-ux auth form should allow access with correct creds:
[chrome #0-0] visCheck: System Management Check Fail: expected false to equal true
[chrome #0-0] AssertionError: visCheck: System Management Check Fail: expected false to equal true
[chrome #0-0]     at /Users/msops/Code/drpfeature/src/pageobjects/page.js:13:53
[chrome #0-0]     at Array.forEach (<anonymous>)
[chrome #0-0]     at Page.visCheck (/Users/msops/Code/drpfeature/src/pageobjects/page.js:12:13)
[chrome #0-0]     at Context.<anonymous> (/Users/msops/Code/drpfeature/src/pospecs/login.spec.js:52:20)
[chrome #0-0]     at new Promise (<anonymous>)
[chrome #0-0]     at new F (/Users/msops/Code/drpfeature/node_modules/core-js/library/modules/_export.js:35:28)
[chrome #0-0]
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
catmini:drpfeature msops$

ctrees
2018-01-06 22:26

ctrees
2018-01-06 22:28
purple is the css animation drift :wink:... but because I turned sensitivity up to '11' (spinal tap reference)

ctrees
2018-01-07 03:19
so... if you PAY for the aws 'stuff' esp the login... didn't they provide testing hooks ?.... there is something funny different about that darn pile of amazoncognito caca... I can't get focus now (or did someone change something)

ctrees
2018-01-07 03:41
Oh... hidden elements named the same but only rendered react .... I think... we need to talk a css style guide or 'something'...

ctrees
2018-01-07 03:43
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8Q0ZTM2A/screen_shot_2018-01-06_at_9.41.51_pm.png and commented: ... hacking around 4 now by guessing at the element #...

mclamb
2018-01-08 02:25
has joined #community201801



shane
2018-01-08 02:36
@mclamb welcome

mclamb
2018-01-08 02:37
Thanks, @shane! Saw Rob's talk at KubeCon, been generally interested in replacing MaaS with DRP lately...

shane
2018-01-08 02:37
@mclamb here is some documentation on the KRIB stuff - including links to the videos: http://provision.readthedocs.io/en/latest/doc/integrations/krib.html

mclamb
2018-01-08 02:38
One question I had was -- Are there two different products? Digital ReBar Provision (which is part of) Digital Rebar?

shane
2018-01-08 02:38
Digital Rebar Provision (ver 3) - is "the next version of Digital Rebar ver 2"

shane
2018-01-08 02:39
they are very different products and DRP is meant to succeed "Digital Rebar ver 2" (which is now EOL)

mclamb
2018-01-08 02:39
OK.. there seems to be some wording on the site about how DRP can be used "standalone" or as part of Digital Rebar

shane
2018-01-08 02:39
we're working on cleaning up confusing documentation relating to that - can you point to where you are referring ?

mclamb
2018-01-08 02:39
yeah lemme find it

shane
2018-01-08 02:39
thanks !


mclamb
2018-01-08 02:40
top of the page

mclamb
2018-01-08 02:40
"It is designed to stand alone or operate as part of the Digital Rebar management system."

shane
2018-01-08 02:42
ah yes - that's left over - I'll slap a change in right now to clean that up - thx for pointing that out

mclamb
2018-01-08 02:44
Given DRP is the product to use, I understand that its focus is bare metal provisioning. But you do have IPMI proxies for http://Packet.net (and Virtualbox?)... I was wondering if there are other cloud providers for which you do something similar? Digital Ocean, GCP, etc.?

shane
2018-01-08 02:45
ok - that doc has been cleaned up - in a couple minutes the changes should push to the RTD page

shane
2018-01-08 02:45
yes and yes

mclamb
2018-01-08 02:45
I have to deploy to on-prem bare metal for operations, but trying to use cloud for dev/test/etc. It would be nice to use DRP across all and maybe even use the same Terraform code (for DRP)... I guess it is not a huge deal if I have terraform for DRP, then Terraform for Cloud VM providers

shane
2018-01-08 02:45
we support via Plugins - both http://packet.net and virtualbox IPMI like power commands

shane
2018-01-08 02:46
as well as bare metal via IPMI as implemented by a hardware Baseboard Management Controller (BMC - eg. iDRAC, iLO, etc)

shane
2018-01-08 02:49
for the http://packet.net environment - you can see a basic example of spinning up a DRP Endpoint (provisioning server) and provisioning _N_ number of packet machines against that DRP Endpoint, in the example I wrote: https://github.com/digitalrebar/provision/tree/master/examples/pkt-demo

mclamb
2018-01-08 02:49
Cool thx

mclamb
2018-01-08 02:50
You might also consider changing the provision repo description too :wink:

shane
2018-01-08 02:51
you can check out an older YouTube vid on Mac OSX quickstart - which includes VirtualBox info - I haven't reviewed this video yet, but I think it should still be relevant - since the VirtualBox plugin hasn't changed: https://www.youtube.com/watch?v=uUWU-4ObGIY

mclamb
2018-01-08 02:51
Ok will definitely check those out

mclamb
2018-01-08 02:51
thanks for the tips

shane
2018-01-08 02:52
(all references to the UI in that video are no longer valid - but the info is still good - the new UI is vastly improved and much better)

shane
2018-01-08 02:53
this description: _"The Provisioner for DigitalRebar as a Stand Alone Golang Utility"_

shane
2018-01-08 02:53
??

mclamb
2018-01-08 02:53
Yeah... similar to the RTD change you just made, it suggests that it's a part of a larger piece?

shane
2018-01-08 02:53
only because I think you're connecting the two dots ... but I agree it could be worded much better

shane
2018-01-08 02:54
:slightly_smiling_face:

mclamb
2018-01-08 02:54
Further down in the repo it says "DR Provision is a APLv2 simple Golang executable that provides a simple yet complete API-driven DHCP/PXE/TFTP provisioning system" which seems more precise! :slightly_smiling_face:

mclamb
2018-01-08 02:54
Hah, yeah sorry to be so pedantic, but all cleared up for me now

shane
2018-01-08 02:55
no worries - fresh eyes and an outside perspective are refreshing - we sorta "pass over" some things and don't realize they're still in need of tweaking

shane
2018-01-08 02:56
well - it looks like my Corporate Overlord (@greg) doesn't trust me with "Settings" of the _digitalrebar_ repo - so I can't make that change ...

mclamb
2018-01-08 02:56
One last question before I run -- during the bare metal provisioning phase (in discover stage I presume?) I can configure networking and disks (e.g. create LACP bonds, VLANs, RAID devices, etc.)?

shane
2018-01-08 02:57
yes - but we don't have "content" that does that currently - you'd have to write a Stage - which ultimately would just implement the configuration you desire for a given environment

shane
2018-01-08 02:58
typically this would be a simple Bash shell - but you can do this any number of ways ... it would be run in the Sledgehammer image and once you advance a Machine through the Workflow stages, one of the stages would be your specific configuration for networking

shane
2018-01-08 02:59
one example would be to set up the Configuration Management tooling of your choice (e.g. Saltstack, Ansible, Puppet, etc.) and then run that CfgMgmt tooling to do the configuration

mclamb
2018-01-08 02:59
How would that network config stage then get "injected" into /etc/network/interfaces (or the like) once the final OS is installed?

mclamb
2018-01-08 02:59
Ahh... OK yeah, could just do it all via Ansible

shane
2018-01-08 02:59
the KRIB workflow that @zehicle demonstrated is an example of Stages that advance the Machine to a given end-state - your network config would be inserted in that chain of stages to "do its thing" for you

shane
2018-01-08 03:00
Sledgehammer (discovery image) is just a Live Boot linux distro - so it's running in-mem and does all the Machine prep - including implementing and handling all of the Workflow stages

mclamb
2018-01-08 03:01
ok will start playing around with it soon! thanks for the help

shane
2018-01-08 03:01
you bet, drop by if you need any help/pointers as you work through it

shane
2018-01-08 03:02
that KRIB doc I pointed you at I only just recently wrote - so any feedback on it is appreciated

greg
2018-01-08 03:03
@shane - your "corporate overload" says give it a shot now.

shane
2018-01-08 03:04
woot! you'll probably regret this day ... just sayin'

greg
2018-01-08 03:05
Already do

greg
2018-01-08 03:05
:slightly_smiling_face:

shane
2018-01-08 03:06
:stuck_out_tongue_winking_eye:

mclamb
2018-01-08 04:29
@shane Here's another reference in the docs to "the larger Digital Rebar system": http://provision.readthedocs.io/en/latest/doc/arch/server.html - Section 4.1

shane
2018-01-08 04:34
fixed

pmorris
2018-01-08 16:29
has joined #community201801

zehicle
2018-01-08 16:42
Hello @pmorris! welcome

pmorris
2018-01-08 16:43
Hello @zehicle! Thanks :smile:

2018-01-08 16:44
uhghhhh

shane
2018-01-08 16:44
uhggggggghhhhhhh!

2018-01-08 16:45
LOL come on im allowed to ughhhhh im prepping to work through this complex looking KRIB

2018-01-08 16:45
:)

2018-01-08 16:45
cuz when i yell... youll all go... ughhhh .... not again :)

shane
2018-01-08 16:47
it's not very complex - the videos that @zehicle did should help, in conjunction with the KRIB doc in our RTD site

shane
2018-01-08 16:47
I'd go so far as to say ... it's probably the absolute easiest Kubernetes install you'll find out there

joel
2018-01-08 16:48
has joined #community201801

shane
2018-01-08 16:49
@joel welcome

2018-01-08 16:54
@rackneng videos ???

shane
2018-01-08 16:55
see the KRIB documentation - it has 2 links to videos: http://provision.readthedocs.io/en/latest/doc/integrations/krib.html

shane
2018-01-08 16:55
the first one is probably the better one (the KubeCon presentation is longer and a bit broader than just KRIB)

2018-01-08 17:03
ok so question... is there anyway to get rebar to start at boot ???

2018-01-08 17:03
i dont see any "init" scripts

shane
2018-01-08 17:04
if you install NOT in the "--isolated" mode - we drop the correct startup scripts in place

shane
2018-01-08 17:04
if you did an --isolated install - you can still add the init scripts - I'd just suggest curl'ing down the install.sh script, and pull the init scripts out of there: ```curl -s get.rebar.digital/stable -o /tmp/install.sh```

2018-01-08 17:06
no i mean i have to login as root and run ./dr-provision --static-ip=148.251.24.11 --base-root=/home/dingo/drp-data --local-content="" --default-content="" &

shane
2018-01-08 17:06
yes

2018-01-08 17:06
everytime i boot the vm

shane
2018-01-08 17:07
you did the install in `--isolated` mode (non-production mode)

2018-01-08 17:07
oh ?

2018-01-08 17:07
hell

shane
2018-01-08 17:07
isolated mode does not install start up scripts

2018-01-08 17:07
can i change it ?

shane
2018-01-08 17:07
production mode does

2018-01-08 17:08
kind of afraid to mess with it now that its working

shane
2018-01-08 17:08
you can do one of two things: 1) reinstall in production mode (remove the `--isolated` flag during install) - you'll have to move your content back in to place 2) just add start up scripts that point to your current install location

shane
2018-01-08 17:08
in isolated mode, everything is self-contained in your `/home/dingo/drp-data` directory

2018-01-08 17:08
yupp

shane
2018-01-08 17:09
you could simply move this to a root folder location (stop dr-provision first) - for example:
```
pkill dr-provision
cp -r /home/dingo/drp-data /srv/drp
```

shane
2018-01-08 17:09
then add start up scripts to reference the install in your new location

shane
2018-01-08 17:11
the `install.sh` script I referenced above has all of the BASH syntax to add the init scripts - which ones you'll need depends on what Linux distro you're using

shane
2018-01-08 17:12
you can pull the init scripts from our github repo as well: https://github.com/digitalrebar/provision/tree/master/assets/startup
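For option 2 above (keep the isolated install and add a start-up script), here is a minimal sketch of what a systemd unit could look like, rendered from Python. The unit layout, the `render_unit` name and the example paths are assumptions for illustration only; the command-line flags are the ones from the manual start command quoted earlier, and the maintained start-up scripts are the ones in the `assets/startup` directory linked above.

```python
from textwrap import dedent


def render_unit(binary: str, base_root: str, static_ip: str) -> str:
    """Return the text of a hypothetical systemd unit for dr-provision."""
    # Same flags as the manual start command from the chat.
    exec_start = (
        f"{binary} --static-ip={static_ip} --base-root={base_root} "
        '--local-content="" --default-content=""'
    )
    return dedent(f"""\
        [Unit]
        Description=DigitalRebar Provision (isolated install)
        After=network-online.target
        Wants=network-online.target

        [Service]
        ExecStart={exec_start}
        Restart=on-failure

        [Install]
        WantedBy=multi-user.target
        """)


if __name__ == "__main__":
    # Example paths are placeholders; adjust to the actual install location.
    print(render_unit("/srv/drp/dr-provision", "/srv/drp/drp-data", "148.251.24.11"))
```

The printed text would typically be saved under `/etc/systemd/system/` and enabled with `systemctl enable`, but check it against the distro-specific scripts before relying on it.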

2018-01-08 17:36
uhhhohhh dr-provision2018/01/08 17:35:47.209450 dataTracker: Error loading data: Failed to load backing objects from cache: Unable to load machines: unexpected end of JSON input

2018-01-08 17:36
i broke sumthin

greg
2018-01-08 17:38
Did you update DRP to tip?

2018-01-08 17:38
nope same install as its been just rebooted it

greg
2018-01-08 17:39
change paths?

2018-01-08 17:39
nope

2018-01-08 17:39
guess ill blow it away and do it proper this time

2018-01-08 17:40
ughhh i have machines provisioned from it though

2018-01-08 17:40
crap

greg
2018-01-08 17:40
well - first step is to cd to database and look at machine files to make sure they are valid json

greg
2018-01-08 17:40
cd drp-data/digitalrebar/machines

greg
2018-01-08 17:41
(for isolated mode).
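A quick way to do that check across the whole directory is sketched below. It only assumes that the machine objects are stored as one `*.json` file each, as in the `drp-data/digitalrebar/machines` path above; the `find_bad_json` helper name is made up for this example. An empty file is the kind of thing that would explain an "unexpected end of JSON input" error.

```python
import json
from pathlib import Path


def find_bad_json(directory: str) -> dict:
    """Map each unreadable *.json file name in `directory` to the reason."""
    bad = {}
    for path in sorted(Path(directory).glob("*.json")):
        if path.stat().st_size == 0:
            # Zero-byte files cannot be parsed at all.
            bad[path.name] = "empty file"
            continue
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as err:
            bad[path.name] = f"invalid JSON: {err.msg}"
    return bad


if __name__ == "__main__":
    for name, reason in find_bad_json("drp-data/digitalrebar/machines").items():
        print(f"{name}: {reason}")
```

Back up the directory before moving or deleting anything the script reports.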

2018-01-08 17:41
k soon as i finished backing up the directory

2018-01-08 18:15
-rw-r--r-- 1 root root 0 Jan 7 07:12 8338b877-f417-485b-836e-d11feff9e860.json

2018-01-08 18:15
dat just aint right

shane
2018-01-08 18:17
what's not right about it?

shane
2018-01-08 18:17
that's the UUID of a Machine, with the .json extension - are the contents wrong ?

shane
2018-01-08 18:17
ah - zero size

2018-01-08 18:17
0 bytes ?

shane
2018-01-08 18:17
nope - it shouldn't be zero size

2018-01-08 18:17
prolly something i booted and shutoff

2018-01-08 18:18
since i have vms that are already built before rebar, but pxe anyway

2018-01-08 18:18
have to get them off pxe

2018-01-08 18:19
now.... back to trying to configure for KRIBs

dunger
2018-01-09 02:45
has joined #community201801

zehicle
2018-01-09 15:57
Hello @dunger ! good to see you

shane
2018-01-09 16:36
welcome @dunger

shane
2018-01-10 01:16
- last minute notice - but I'm presenting basics about Digital Rebar Provision on the Boise Idaho meetup group - anyone interested is welcome to join via Zoom conference, at: https://zoom.us/j/4084048118

shane
2018-01-10 01:16
starts at 5:30 pm PST

ctrees
2018-01-11 03:16
was attempting to look at the drp-api

ctrees
2018-01-11 03:17
catmini:drpfeature msops$ curl -X GET --header --insecure 'Accept: text/plain' 'https://192.168.1.200:8092/api/v3/isos'
curl: (3) Port number ended with ' '
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option.
HTTPS-proxy has similar options --proxy-cacert and --proxy-insecure.
catmini:drpfeature msops$

ctrees
2018-01-11 03:17
Docs say something about an Authorize button


ctrees
2018-01-11 03:19
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8S7B24FR/drp-api-swagger.png and commented: I sort of remember the Authorized button or put a token in somewhere... but forgot where how

greg
2018-01-11 03:27
@ctrees the `Accept` part needs a -H

greg
2018-01-11 03:27
You also need to add -H "Content-Type: application/json"

greg
2018-01-11 03:28
I do this: ```curl -k -u rocketskates:r0cketsk8ts -X POST https://127.0.0.1:8092/api/v3/machines/a2b8c3e8-d524-48d7-9ecd-83ad460836a2/actions/increment -H "Content-Type: application/json"```

greg
2018-01-11 03:29
The Accept header is nice but not required.
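The same request can be put together outside of curl. The sketch below uses only the Python standard library and reuses the endpoint and the default credentials from the messages above; the helper names `drp_request` and `insecure_context` are made up for illustration. It builds the request with basic auth plus the `Content-Type` and `Accept` headers, and shows the rough equivalent of `curl -k` for a self-signed certificate.

```python
import base64
import ssl
import urllib.request


def drp_request(url: str, user: str, password: str, method: str = "GET"):
    """Build (but do not send) a request with basic auth and JSON headers."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    req.add_header("Accept", "application/json")  # nice to have, not required
    return req


def insecure_context() -> ssl.SSLContext:
    """Like curl -k: skip certificate checks (self-signed test setups only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Usage against a running server (not executed here):
# req = drp_request("https://192.168.1.200:8092/api/v3/isos",
#                   "rocketskates", "r0cketsk8ts")
# with urllib.request.urlopen(req, context=insecure_context()) as resp:
#     print(resp.read().decode())
```

Skipping certificate verification is only reasonable on a lab network; with a properly signed certificate the default context should be used instead.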

zehicle
2018-01-11 04:59
You can also use the bearer token header.

ctrees
2018-01-11 14:41
So how are you testing the api currently ? Something in golang I take it


greg
2018-01-11 14:57
Yes

ctrees
2018-01-11 14:57
I see things like iso_test.go which looks like a test of the iso command... so I'm going to follow that pattern

ctrees
2018-01-11 14:57
aka I should be able to get the same data from cli, api, ux (and file system for that matter)

greg
2018-01-11 14:58
More in a second

ctrees
2018-01-11 15:00
half thinking about sticking in some BATS as bash seems to be your native go-to and it may help my brain wrap around the cli better while staying out of the golang

greg
2018-01-11 15:04
well - I used the cli for most things. Only if I need to really make sure the raw json is working or the cli is busted do I revert to curl. The UX does raw calls and gets json blobs "auto-morphed" into javascript objects. This is a little annoying, but passable.

greg
2018-01-11 15:04
There is one thing to realize about the golang tests that is really powerful, but tricky to non-golang systems.

greg
2018-01-11 15:05
The DRP server can be instantiated as an internal service inside other golang programs.

greg
2018-01-11 15:05
This is amazingly powerful for unit tests and integration tests.

ctrees
2018-01-11 15:06
yea I figured that nesting was also how the runner and that que stuff works

ctrees
2018-01-11 15:06
? right ?

greg
2018-01-11 15:06
The structures are better separated for that. The runner only needs the models, cli, and api directories.

greg
2018-01-11 15:07
But the code is shared.

greg
2018-01-11 15:08
Effectively, the unit tests run as two halves. The server side (models, backend, midlayer, and server directories mostly) and the api or cli side (models, api, and/or cli).

ctrees
2018-01-11 15:10
so does travis run the units then ? (as I can trace how they run from that eventually)

greg
2018-01-11 15:10
For example, in the terraform drp provider, I use this to test the terraform provider without having to spin up infrastructure. In the tests, I start a server, run terraform tests against the internal server, and validate that it drives the API correctly.

greg
2018-01-11 15:10
Yes.

greg
2018-01-11 15:10
There are scripts in the tree that travis calls.

greg
2018-01-11 15:11
On my mac, I have a golang build env at 1.9 setup.

greg
2018-01-11 15:11
I run: `ulimit -n 2560`

greg
2018-01-11 15:11
Then I can run:

greg
2018-01-11 15:11
`tools/build.sh`

greg
2018-01-11 15:11
This will build all the platforms and do swagger stuff. It will also attempt to add missing components like glide and swagger.

greg
2018-01-11 15:12
I think. Travis does this as well.

greg
2018-01-11 15:12
Then when this is done, I can run: `tools/test.sh`

greg
2018-01-11 15:12
This runs all the unit tests in all the directories.

greg
2018-01-11 15:12
It can take 5-10 minutes.

greg
2018-01-11 15:13
This runs go test with atomic verification and profiling.

greg
2018-01-11 15:13
You can also cd into a directory and run go test without all that.

ctrees
2018-01-11 15:14
yea... I started to follow the test.sh to look for tests and data to basically replicate out to ux

greg
2018-01-11 15:14
The main thing we don't test really well is the DHCP server.

greg
2018-01-11 15:14
Most of the tests build their own data. We start empty except for the "constants" (local stage, none stage, local bootenv, ignore bootenv).

greg
2018-01-11 15:14
global profile

greg
2018-01-11 15:16
@vlowther recently changed the style of the unit tests. They are more file based than internal string based. This is good. So in the cli directory, there is a data directory that contains expected files.

greg
2018-01-11 15:16
When you run the tests, the output files get built and diffed.

greg
2018-01-11 15:16
There is a script a dev can run to "fix-up" the expected files.

vlowther
2018-01-11 15:16
yeah, fixInteractive.sh

greg
2018-01-11 15:17
This is really useful when I change the cli usage text and have to change all the files. :slightly_smiling_face:
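The file-based pattern described here (generate output, diff it against a stored expected file, and have a way to refresh the expected files on purpose) can be sketched in a few lines. This is a generic illustration, not the project's Go test code or what `fixInteractive.sh` actually does; `check_golden` and its `update` flag are names invented for the example.

```python
import difflib
from pathlib import Path


def check_golden(actual: str, expected_path: str, update: bool = False) -> list:
    """Compare `actual` with the stored expected file and return diff lines.

    With update=True the expected file is rewritten instead (the "fix-up" step).
    """
    path = Path(expected_path)
    if update:
        path.write_text(actual)
        return []
    expected = path.read_text() if path.exists() else ""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

A test passes when the returned list is empty; any non-empty diff shows exactly which line of output changed, which is how very small changes get caught before they ship.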

ctrees
2018-01-11 15:17
Oh... good...

greg
2018-01-11 15:18
The implication to that is we can catch very minor changes and decide if we are okay with them floating out.

greg
2018-01-11 15:18
On the API and CLI side.

ctrees
2018-01-11 15:19
that's sort of what I was looking for... is use the test data generation / checking for stuff you guys have done as population for data checks in the UX (eventually)

ctrees
2018-01-11 15:20
What I decided (last night) is I should KISS for now and replicate a simple demo first and see if Shane/Rob/Isaac are willing to use... so I switched back to using data in quickstart

greg
2018-01-11 15:20
@ctrees - you are amazing and gutsy.

vlowther
2018-01-11 15:20
arrgh -- I was going to point you at an example, but Github is giving me the angry pink unicorn!

greg
2018-01-11 15:21
So - a couple of things.

vlowther
2018-01-11 15:21

ctrees
2018-01-11 15:22
no... I've been through this before and have a fascination with toilets and all UX is just 'shiny object' to distract people from the 'shit' they create :wink:

vlowther
2018-01-11 15:22
hah

ctrees
2018-01-11 15:22
woops... this is public I should retract that statement...

ctrees
2018-01-11 15:22
'flush'

greg
2018-01-11 15:23
I can see three issues with this. 1. you need a server to test against. DRP is lightweight, it can just run in most locations, so it is probably fine, but have to manage start and stop. 2. data to manipulate - some prestage content packs should be good for that. 3. UX ids to aid in manipulating content.

greg
2018-01-11 15:24
@ctrees - I view it the other way around. crappy UX until you can get the glory of pure API underneath. :slightly_smiling_face:

ctrees
2018-01-11 15:25
yup... #3 is why I need Rob/Isaac (I think right) or someone who LIKES shiny ( was thinking Shane ) to leverage more of the Experience of UX testing...

ctrees
2018-01-11 15:27
I'm right at that point now... which is why I keep saying 'RackN-DSL'... cause it's marketing buzz I think Rob can leverage but it'll help in tying CSS to the React 'auto-dynamic-gen'

greg
2018-01-11 15:28
So - I've let Rob and Isaac run with the UX, but have been trying to get testing off and on. We have limited resources and I have limited knowledge around the testing side.

greg
2018-01-11 15:29
What I want/need is to understand what is needed to make the testing easier to write and effective.

ctrees
2018-01-11 15:30
Oh.. the UX RackN-CSS-DSL will help...

greg
2018-01-11 15:30
I can easily solve items #1 and #2. What I need to understand ( and full disclosure: haven't looked at your tree yet (plugin rewrite is eating my time)) so don't know what it is trying to do for sure and how. I think I know, but ...

ctrees
2018-01-11 15:31
humm... maybe you should just run the test I've got... from the structure you'll grok what I'm getting at 'I THINK'... cause it's the same patterns you and victor are doing low level...

greg
2018-01-11 15:31
okay - that is what I want to understand.

greg
2018-01-11 15:32
but angry unicorn says NO!

ctrees
2018-01-11 15:32
it's pretty simple... in the UX things like "RackN Portal Login" on the button

ctrees
2018-01-11 15:33
should be associated with css class "rackn-login-redirect" and React does some duplication, so you have to figure out how to tag it unique

ctrees
2018-01-11 15:33
sort of the same with 'generating dynamic test data' vs what Victor has done ?? I think ??

ctrees
2018-01-11 15:35
.... I got a simple login working... which deals with aws cognito ? let me get the iso check feature test working and then if you and victor look at that... I think you'll both grok it pretty fast...

ctrees
2018-01-11 15:37
after that... hopefully the guys who CARE what the UX looks like (NOT ME, NOT YOU) for sure I'll ask wdennis :wink: cause he sort of put me on this path...

ctrees
2018-01-11 15:39
... it's a lot of work to keep the UX sane UNLESS you hook it to something like what I think vlowther did for golang unit

ctrees
2018-01-11 15:40
... what's angry unicorn ?

ctrees
2018-01-11 15:41
Oh

ctrees
2018-01-11 15:41

greg
2018-01-11 15:41
github was/is down.

greg
2018-01-11 15:41
okay - I'll look at it when it comes back up.

ctrees
2018-01-11 15:43
you can wait till I check-in the iso list feature test... that'll be a better example for you... the login test only deals with the aws 'poo'

greg
2018-01-11 15:45
ok

zehicle
2018-01-11 15:52
@ctrees we're working to change the redirect into an API call from the UX - which may make it easier to test

ctrees
2018-01-11 15:53
naw I'm through that, that's the test that is working now

ctrees
2018-01-11 15:54
plus when you go with the proxy redirects like in the KRIB demo... that basically is the same thing (you have to pick up auth state dynamically somewhere)

ctrees
2018-01-11 15:58
if you're messing with the UX CSS... I would LOVE to chat about that.. put a comment in Issue 627


ctrees
2018-01-11 16:01
The 'theme' pattern is a great place to put in a good 'RackN-DSL' css string pattern that would make feature -> pageobject -> rackn-dsl -> drpapi/drpcli data mapping automated sort of the same way swagger does for api generation

ctrees
2018-01-11 16:10
the biggest UX testing barrier is what you mentioned (that I was not aware of at the time) ReactJS generates the DOM dynamically AND it will generate multiple Identical Elements... which basically kills most common script UX testing selector techniques... I did figure a way around it BUT it'll make the test super fragile... there are React pattern techniques to prevent identical elements BUT you might as well fix the CSS templating and create rackn pageobjects as well...
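The selector problem can be shown without a browser. In the sketch below, two buttons share the `rackn-login-redirect` class mentioned earlier, so a class-based selector is ambiguous, while a per-element attribute is not. The `data-test-id` attribute and its values are hypothetical, chosen only for illustration; nothing here reflects how the actual UX is built.

```python
from collections import Counter
from html.parser import HTMLParser


class AttrCounter(HTMLParser):
    """Count how often each whitespace-separated value of one attribute occurs."""

    def __init__(self, attr: str):
        super().__init__()
        self.attr = attr
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attr and value:
                # A class attribute may hold several names; count each one.
                for token in value.split():
                    self.counts[token] += 1


def ambiguous_values(html: str, attr: str) -> list:
    """Return attribute values that match more than one element."""
    parser = AttrCounter(attr)
    parser.feed(html)
    return sorted(v for v, n in parser.counts.items() if n > 1)


PAGE = (
    '<div><button class="rackn-login-redirect" data-test-id="login-header">'
    "RackN Portal Login</button>"
    '<button class="rackn-login-redirect" data-test-id="login-body">'
    "RackN Portal Login</button></div>"
)

if __name__ == "__main__":
    print(ambiguous_values(PAGE, "class"))         # class matches two elements
    print(ambiguous_values(PAGE, "data-test-id"))  # nothing ambiguous
```

Running a check like this over rendered markup is one way to list which elements would need a unique identifier before selector-based tests can be stable.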

ctrees
2018-01-11 16:14
anyway... I should have a UX test to verify iso out today... I'll post here when it's ready... it should be less than 20 min 'distraction' to try and I HOPE that'll show enough information to evaluate if this pattern is worth supporting

greg
2018-01-11 16:14
cool - i'll try and learn more

ctrees
2018-01-11 16:14
... Oh... but the point of a good UX is to learn LESS :wink:

greg
2018-01-11 16:14
:slightly_smiling_face:

greg
2018-01-11 16:15
I don't ever get to learn less.

rcameron
2018-01-11 16:17
@rcameron has left the channel

wdennis
2018-01-11 20:25
Sometimes a good UX is a real timesaver and a lower-barrier entrypoint...

ctrees
2018-01-11 21:01
I agree... I just have to complain when I'm working ... BTW I do want you to attempt to run some of this... it's pretty fragile right now as I figure out what React is doing in the DOM...

ctrees
2018-01-11 21:02
I've attempted to push test into the browser as common practice since '96'....

ctrees
2018-01-11 21:04
but this UX does look nice :pray:

wdennis
2018-01-11 21:04
@greg Trying to UEFI-boot a new Dell R640 w/ DRP, getting this:

2018-01-11 21:04
Time to feed the :bear:!


vlowther
2018-01-11 21:10
@wdennis What is option 67 set to for that subnet?

vlowther
2018-01-11 21:15
UEFI requires a different bootloader, you cannot just use pxelinux like we do by default.

wdennis
2018-01-11 21:45
Thx @vlowther... what is the correct param?

vlowther
2018-01-11 21:49
There are a couple, depending on how much you like ipxe

vlowther
2018-01-11 21:50
I have them saved as pinned messages.

vlowther
2018-01-11 21:51
{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}ipxe.pxe{{else}}ipxe.efi{{end}} <-- if you like ipxe
{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}} <-- if you don't like ipxe
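Read as plain logic, both Option 67 templates make the same three-way decision based on DHCP option 77 (user class) and option 93 (client system architecture, where 0 is a legacy BIOS x86 client). The Python paraphrase below only illustrates that decision; the `boot_file` function and its `use_ipxe` switch are invented for the example, and the server evaluates the template itself, not code like this.

```python
def boot_file(options: dict, use_ipxe: bool = True) -> str:
    """Mirror the Option 67 template: pick a boot file from DHCP request options.

    `options` maps DHCP option codes to string values, like the template's
    `index . 77` and `index . 93` lookups.
    """
    if options.get(77) == "iPXE":
        # The request comes from iPXE itself: hand it the iPXE script.
        return "default.ipxe"
    if options.get(93) == "0":
        # Architecture 0: legacy BIOS PXE client.
        return "ipxe.pxe" if use_ipxe else "lpxelinux.0"
    # Everything else falls through to the EFI loader.
    return "ipxe.efi" if use_ipxe else "bootx64.efi"
```

For example, `boot_file({93: "7"})` falls through to `ipxe.efi`, which is the branch a UEFI machine should end up in.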


ctrees
2018-01-11 21:55
at the bottom is a summary and screencast... I'll do another where I walk through the code steps for the iso check ux test...

greg
2018-01-11 22:06
Very nice! @ctrees

greg
2018-01-11 22:07
I like the video

ctrees
2018-01-11 22:07
see the click links for the ADD

greg
2018-01-11 22:08
??

ctrees
2018-01-11 22:09
above the video on the rtdocs page are tcXX links... they will jump to the location in the video of the description

greg
2018-01-11 22:09
I just watched the whole thing, but those are helpful too.

ctrees
2018-01-11 22:09
I use it so I don't have to review all the video to remember 'wtf'

greg
2018-01-11 22:09
Very cool

ctrees
2018-01-11 22:12
well... the question I'll have is after I do the code review screencast... when you watch that one I hope you and @vlowther will be able to figure out if we can tie your patterns up to the ui AND if the feature file format will be useful

ctrees
2018-01-11 22:13
I think you'll see the pattern that the css will need... then it's ALOT like a swagger def file (IMHO)

greg
2018-01-11 22:17
@ctrees - I need to play with and learn more.

greg
2018-01-11 22:17
I then need to beat on people to use it.

greg
2018-01-11 22:17
I still don't completely understand that last part with css.

ctrees
2018-01-11 22:18
ok... well wait for my next screencast... I'll go through the details... then you can pull and run yourself

greg
2018-01-11 22:18
I also think I'm about to break you. We are fixing the login code paths to work with cognito. Or maybe a little.

ctrees
2018-01-11 22:18
it took me 3 weeks to get my brain around what React was doing

greg
2018-01-11 22:18
I mean correctly.

greg
2018-01-11 22:18
That part isn't changing.

meshiest
2018-01-11 22:18
the new login prevents the need to go to a separate page

greg
2018-01-11 22:18
:slightly_smiling_face:

ctrees
2018-01-11 22:19
OH... I KNOW it'll all break...

meshiest
2018-01-11 22:19
it's easier to automate than the last time

meshiest
2018-01-11 22:19
I can make it easier to create custom userpools for you too

meshiest
2018-01-11 22:19
so you can test on multiple types of credentials

greg
2018-01-11 22:20
well - that is an internal-ish thing for the SaaS side, but yeah.

ctrees
2018-01-11 22:20
that's why I don't want to put more effort into it unless the css thing gets hooked in

greg
2018-01-11 22:21
@meshiest - we can. While I'm very appreciative of @ctrees for contributing and pushing things, we should be making this more a part of our overall process. :slightly_smiling_face:

ctrees
2018-01-11 22:23
btw... I'm doing the abstraction for 'other reasons' and for sure is not worth the effort unless it's in a CI chain that you like...

ctrees
2018-01-12 00:09
crap... I recorded and pushed up BUT youtube downgraded the res.. or I foo-bar'd a control...

ctrees
2018-01-12 00:09
for what it's worth...


ctrees
2018-01-12 00:10
but it's hard to see what I'm talking about... and I am very hard to follow when listening, too...

ctrees
2018-01-12 00:11
and I see you (royal you aka RackN) changed the Portal...

ctrees
2018-01-12 00:15
AW... you sucked in that aws stuff into React ? (browser does not redirect now ?)

greg
2018-01-12 00:17
Yes , that is what @meshiest was referencing.

ctrees
2018-01-12 00:17
Yea... maybe a skype session ? or something would be quicker... cause you're right, it's not worth the effort unless you've got a code coordination strategy... my demo is really just pushing the same UX code coord to the docs also...

ctrees
2018-01-12 00:19
yup got that ... and thanks @meshiest I got your message... the code section you pointed out is for another test subject... that section will not work as is..

ctrees
2018-01-12 00:22
again... I THINK all the coordination can be done via attributes in the elements that are then used by REACT during its render...

ctrees
2018-01-12 00:52
This may help:


ctrees
2018-01-12 00:53
or something like:


greg
2018-01-12 02:30
cool links. We have some linting we do currently as part of travis.

ctrees
2018-01-12 03:23
well... just like the swagger issue, there is a TON of 'helpful' libs... but god forbid you have to support them :wink: (the fewer the better)

ctrees
2018-01-12 03:27
you got to use something to key i18n and css theme on... with those maps feature file should be easy to generate via script

zehicle
2018-01-12 03:28
@ctrees I was talking w/ @meshiest today about adding IDs to elements in the UX. It's not a problem - we need to know which elements need IDs ("all" is not a very helpful answer on the first pass) since React does not require them.

ctrees
2018-01-12 03:30
you're right 'all' is stupid :wink: that's where the i18n stuff comes in... if you have to have a language map that hits basically everything the user needs to identify

ctrees
2018-01-12 03:31
I attempted to start at the api tree... as I figure... well heck... all views in the SPA pull the data from that api...

ctrees
2018-01-12 03:31
and you've got a LOT of that in the text already...

ctrees
2018-01-12 03:32
if you had to change that text for i18n then easiest to relate the translation to the api as the 'hard reference'

zehicle
2018-01-12 03:32
COMMUNITY NOTE > today we updated the RackN Auth system to be contained within the app (no redirect). That change _should_ automatically retain your login for 30 days instead of 60 minutes.

ctrees
2018-01-12 03:34
I didn't dig to see if there is some sort of i18n hooks ... from the DOM side I didn't notice anything

greg
2018-01-12 03:39
@ctrees we don't have any of that in place in either system.

ctrees
2018-01-12 03:40
humm... maybe that's the place to start... a spreadsheet with all the GET calls to drp-cli list that the UI calls ... simple start to i18n too ?

ctrees
2018-01-12 03:42
just do KISS, one at a time, till a solid pattern emerges that you-all like ?

greg
2018-01-12 03:42
Well. The weird / powerful part is that it is mostly done as a single component for most things. So it is one get with parameterized inputs. The inputs could be i18n

zehicle
2018-01-12 03:46
@ctrees I don't understand why you want the API calls for UX testing

ctrees
2018-01-12 03:46
SO it really is just a big json blob... lets start with that... heck... I want that spreadsheet anyway... and that blob is probably where to start... just associate data with ui-id elements as in an i18n effort ?

ctrees
2018-01-12 03:46
the UI is just to show data from the CALLS ?? correct ??

zehicle
2018-01-12 03:48
the UI is getting API data, yes. Each screen will call multiple APIs to build the info

zehicle
2018-01-12 03:48
some more than others. PLUS there are two APIs - DRP and the SaaS

ctrees
2018-01-12 03:51
would you agree that UX testing is about making sure the Human has an Expected Experience ? aka isn't the API data the 'source of truth' for the UI ?

ctrees
2018-01-12 03:57
anyway... if I'm not making sense... give me a skype or a zoom

zehicle
2018-01-12 03:58
I see. makes sense. Was more thinking about render and logic issues, not data

zehicle
2018-01-12 04:00
technically, the dev > network display shows all the API calls the UX makes. we use that all the time in troubleshooting

wdennis
2018-01-12 04:28
@zehicle Is "dev > network" in the browser dev tools, or somewhere in the RackN Portal?

wdennis
2018-01-12 04:29
And, do you do cross-browser testing? (There's more than just Chrome out there...)

meshiest
2018-01-12 04:29
@wdennis dev network is the network view of dev tools

wdennis
2018-01-12 04:30
Where is it? (not a web dev, but would like to see REST calls the UX is making [or not] at times)

meshiest
2018-01-12 04:30
Right clicking on a page and inspecting element should allow you to open the dev tools

wdennis
2018-01-12 04:30
OK

meshiest
2018-01-12 04:30
There should be a tab with network on most modern browsers including edge

wdennis
2018-01-12 04:31
Aha, <ctrl>-click on a Mac :slightly_smiling_face:

wdennis
2018-01-12 04:35
Does not work on Safari v11.0.2 (12604.4.7.1.6)

wdennis
2018-01-12 04:36
Does work on Chrome (v63.0.3239.132)

pierre.romagne
2018-01-12 14:02
has joined #community201801

romain.lafontaine
2018-01-12 15:27
has joined #community201801

zehicle
2018-01-12 15:29
welcome @pierre.romagne

romain.lafontaine
2018-01-12 15:34
Hello there

shane
2018-01-12 15:50
@pierre.romagne and @romain.lafontaine - welcome

pierre.romagne
2018-01-12 15:57
\o - hey guys

chermack
2018-01-12 16:51
There is an UbiSoft Slack channel

shane
2018-01-13 00:02
- we hope you'll join us next Tuesday at 11am PST for our 9th online meetup. See the meetup page for agenda and RSVP info: https://www.meetup.com/digitalrebar/events/xmrktnyxcbvb/

ctrees
2018-01-13 23:06
So... nothing is answering a dhcp request... and I'm attempting to debug why...

ctrees
2018-01-13 23:06
catmini:drpisolated msops$ sudo ./dr-provision --static-ip=192.168.88.9 --base-root=/Users/msops/Code/drpfeature/drpisolated/drp-data --local-content="" --default-content=""

ctrees
2018-01-13 23:07
catmini:drpfeature msops$ sudo route -n add -net 255.255.255.255 192.168.88.9

ctrees
2018-01-13 23:07
(the macosx route thing)

ctrees
2018-01-13 23:07
added subnet...

ctrees
2018-01-13 23:08
added subnet via the UX

ctrees
2018-01-13 23:08
saw the request via dhcpdump

ctrees
2018-01-13 23:10

ctrees
2018-01-13 23:11
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8SGP4VH8/subnet_info_via_ux.txt and commented: Deleted message and replaced with snippet


shane
2018-01-14 18:01
@ctrees also make sure you add the `--static-ip=192.168.88.9` (assuming _88.9_ is your DRP instance IP) additionally - make __certain__ that the DHCP instance for your hypervisor (VirtualBox ?) is disabled on that subnet - and vbox often lies about the status - and you may need to completely restart vbox after disabling DHCP to make it actually disable (which might mean a reboot in vbox's case to be *certain*)

ctrees
2018-01-14 19:00
Thanks... all your assumptions are correct, but I have not rebooted (doing now)

ctrees
2018-01-14 19:02
I was experimenting with dhcp client (just letting another maclaptop boot in dhcp mode)... to see the traffic on the network (which I saw via wireshark)

ctrees
2018-01-14 19:03
the drp logs seemed to hand out an IP, but the maclaptop never 'used' it (aka used a self assigned IP)...

ctrees
2018-01-14 19:04
I did see BOOTP malform warning messages in wireshark (but not sure where they came from)

ctrees
2018-01-14 19:11
is the https error just the local redirect ? (login into portal with default user/pw)

ctrees
2018-01-14 19:11

ctrees
2018-01-14 19:13
THAT WORKS @shane THANKS!!

ctrees
2018-01-14 19:19
humm... to my 'surprise' that also fixed the maclaptop dhcp request also... I take it the vbox 'route' adjustments basically foo-bar's the dhcp server access (aka outbound packets) to the real network port too... somehow...

shane
2018-01-14 19:24
vbox == messy

ctrees
2018-01-14 20:39
macosx == messy

shane
2018-01-14 20:40
macosx + vbox == pulling_hair_out

zehicle
2018-01-15 00:46
seems like --static-ip solves most problems of DRP not answering network requests as expected. sorry I did not suggest it earlier when I saw the thread.

greg
2018-01-15 04:26
especially on a mac.

florent.wagener
2018-01-15 15:42
hi guys, I've set up a physical environment to test what I've done only in a virtual environment so far. So I am using a CentOS 7 on a Dell R620 as my drp server. As for now I am trying to do a basic discovery of a ProLiant DL360 Gen9 but I am facing an issue with it. I have set up a subnet as below:
```
{
  "ActiveEnd": "10.0.49.200",
  "ActiveLeaseTime": 60,
  "ActiveStart": "10.0.49.100",
  "Available": true,
  "Enabled": true,
  "Errors": [],
  "Meta": {},
  "Name": "bond0",
  "NextServer": "10.0.49.254",
  "OnlyReservations": false,
  "Options": [
    { "Code": 1, "Value": "255.255.255.0" },
    { "Code": 3, "Value": "10.0.49.1" },
    { "Code": 6, "Value": "10.0.0.9" },
    { "Code": 15, "Value": "http://example.com" },
    { "Code": 28, "Value": "10.0.49.255" },
    { "Code": 67, "Value": "{{if (eq (index . 77) \"iPXE\") }}default.ipxe{{else if (eq (index . 93) \"0\")}}ipxe.pxe{{else}}ipxe.efi{{end}}" }
  ],
  "Pickers": [ "hint", "nextFree", "mostExpired" ],
  "Proxy": false,
  "ReadOnly": false,
  "ReservedLeaseTime": 7200,
  "Strategy": "MAC",
  "Subnet": "10.0.49.254/24",
  "Validated": true
}
```
Unfortunately when loading sledgehammer, I got a kernel panic:
```
VFS: Cannot open root device "live:/sledgehammer.iso" or unknown-block(0,0): error -19
Please append a correct "root=" boot option; here are the available partition:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-693.2.2.el7.x86_64 #1
```
followed by the Call Trace
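Before digging into the boot loader, a subnet definition like the one above can be sanity-checked offline. The sketch below uses the field names from that JSON (`Subnet`, `ActiveStart`, `ActiveEnd`, `Options`), but the checks themselves are assumptions about what is worth verifying, not DRP's own validation, and `check_subnet` is a made-up helper name. It will not explain a kernel panic; it only rules out address-range mistakes.

```python
import ipaddress


def check_subnet(subnet: dict) -> list:
    """Return human-readable problems found in a subnet definition."""
    # "10.0.49.254/24" is an interface address; derive the network from it.
    net = ipaddress.ip_interface(subnet["Subnet"]).network
    start = ipaddress.ip_address(subnet["ActiveStart"])
    end = ipaddress.ip_address(subnet["ActiveEnd"])
    problems = []
    for label, addr in (("ActiveStart", start), ("ActiveEnd", end)):
        if addr not in net:
            problems.append(f"{label} {addr} is outside {net}")
    if start > end:
        problems.append("ActiveStart is after ActiveEnd")
    options = {o["Code"]: o["Value"] for o in subnet.get("Options", [])}
    if 67 not in options:
        problems.append("no Option 67 (boot file name) set")
    return problems
```

Loading the subnet JSON with `json.load` and passing the resulting dict to `check_subnet` returns an empty list when none of these particular problems are present.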

2018-01-15 15:42
Time to feed the :bear:!

florent.wagener
2018-01-15 15:42
any idea what could cause that ?

florent.wagener
2018-01-15 15:44
If I use `{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}}` the system is just looping in elilo
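
For anyone following along, the option 67 template quoted above boils down to a three-way choice. A minimal Python sketch of that decision, assuming the DHCP options arrive as a dict keyed by option code (the `select_bootfile` name and the dict shape are illustrative only, not part of DRP):

```python
def select_bootfile(options, legacy="lpxelinux.0", uefi="bootx64.efi"):
    """Mirror the option 67 template: pick a boot file from DHCP options.

    `options` maps DHCP option codes to string values (hypothetical shape).
    """
    # Option 77 (user class) is "iPXE" once iPXE itself makes the request.
    if options.get(77) == "iPXE":
        return "default.ipxe"
    # Option 93 (client architecture) "0" is what the template treats as legacy BIOS.
    if options.get(93) == "0":
        return legacy
    # Everything else falls through to the UEFI boot file.
    return uefi
```

Passing `legacy="ipxe.pxe", uefi="ipxe.efi"` reproduces the file names from the subnet definition instead of the elilo variant.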

greg
2018-01-15 15:45
What is the system set to boot with (legacy BIOS or uefi?)

florent.wagener
2018-01-15 15:45
@greg UEFI

greg
2018-01-15 15:46
okay - checking some things

shane
2018-01-15 15:48
(too late, I saw that)

greg
2018-01-15 15:48
nothing obvious

vlowther
2018-01-15 15:49
will test with our local gear

florent.wagener
2018-01-15 15:49
I'm gonna try on a different server to see if it's not a driver issue or something...

shane
2018-01-15 15:50
what kind of NIC do you have in that server ?

florent.wagener
2018-01-15 15:52
So I am using `HP Ethernet 10Gb 2-port 561FLR-T Adapter`

greg
2018-01-15 15:54
@florent.wagener - give us a second. We don't test UEFI every day. We are resetting some stuff to get there. I may be a few minutes.

florent.wagener
2018-01-15 15:55
@greg thanks, no problem :slightly_smiling_face:

florent.wagener
2018-01-15 16:24
So I just tried on a Dell PowerEdge R620 without UEFI configured and it worked like a charm. However when switching to UEFI the boot failed: `PXE-E23 Client received TFTP error from server`

2018-01-15 16:24
Time to feed the :bear:!

greg
2018-01-15 16:36
Yeah - we are looking into it. We think we broke something in the latest DHCP/TFTP changes for UEFI. Should have a fix in a bit.

florent.wagener
2018-01-15 16:46
I suppose I will have to update the solution ? If so that's cool, a new test for me to do :slightly_smiling_face:

greg
2018-01-15 16:47
We are finding issues with UEFI. Once we get a fix, you should be able to update the DRP and maybe the default content, though it is looking like DRP. Then both should work.

florent.wagener
2018-01-15 17:24
alright !

florent.wagener
2018-01-15 19:27
Looks like I killed the channel :slightly_smiling_face:

shane
2018-01-15 19:28
why! why! Why did you have to go and kill the channel !! ?? ( :sob: )

zehicle
2018-01-15 19:49
you'll have to say something scarier than UEFI... something like ITIL

zehicle
2018-01-15 19:49
or GDPR

florent.wagener
2018-01-15 19:57
agile !

viktor.ekmark
2018-01-15 20:47
has joined #community201801

vlowther
2018-01-15 20:59
@florent.wagener I duplicated your issue -- there seem to be some behavioural differences between how the kernel and initrds are being loaded with ipxe in EFI mode and ipxe in legacy PXE mode -- with the subnet configured to use ipxe both ways, I can boot a qemu VM running with SeaBIOS into sledgehammer just fine, where a qemu VM running tianocore crashes in the way you describe.

florent.wagener
2018-01-15 21:03
@vlowther Great, any fix in head ?

vlowther
2018-01-15 21:03
Not yet.

vlowther
2018-01-15 21:04
I have duplicated it (after stumbling across all sorts of nifty ways in which UEFI almost but not quite implements pxe sanely)

shane
2018-01-15 21:06
@viktor.ekmark welcome

marc.heckmann
2018-01-15 21:48
For the record, elilo didn't work any better w/ UEFI, ipxe actually got further. elilo wouldn't even boot into the kernel: It complained about an error on line 6 of the template

greg
2018-01-15 21:50
we are finding that elilo just may not work at all anymore. @marc.heckmann

greg
2018-01-15 21:50
by we, I mean @vlowther

marc.heckmann
2018-01-15 21:51
I recall w/ our old cobbler based solution that the bootx64.efi from CentOS 6 worked, but not the one from CentOS 7

marc.heckmann
2018-01-15 21:51
I didn't dig any deeper as to why unfortunately

marc.heckmann
2018-01-15 21:52
Something like that anyway, it's been a while

vlowther
2018-01-15 21:52
so, the tl;dr for ipxe is that when booting via legacy BIOS, you do not have to pass an initrd= argument to the command line -- ipxe does that natively. For whatever reason, it does not do that automatically when booting via UEFI.
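
To make the difference concrete, here is a rough Python sketch of a template helper that only adds `initrd=` for UEFI boots. The function name, the argument list and the `initrd.img` file name are all made up for illustration; this is not how DRP renders its ipxe templates.

```python
def kernel_cmdline(base_args, initrd_name, uefi):
    """Build a kernel command line, adding initrd= only for UEFI boots."""
    args = list(base_args)
    if uefi:
        # Per the message above, ipxe does not supply this automatically
        # under UEFI, so it has to be passed explicitly.
        args.insert(0, f"initrd={initrd_name}")
    return " ".join(args)


# Hypothetical usage: same base arguments, two firmware modes.
print(kernel_cmdline(["console=tty0"], "initrd.img", uefi=False))
print(kernel_cmdline(["console=tty0"], "initrd.img", uefi=True))
```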

vlowther
2018-01-15 21:53
It is rather annoying.

marc.heckmann
2018-01-15 21:57
ok, thanks, good to know

vlowther
2018-01-15 22:02
I am looking to see if I can make elilo work

vlowther
2018-01-15 22:02
the issue with it is that it does not have a native IPAPPEND feature (like pxelinux), nor can it fake it (like we can with ipxe)

vlowther
2018-01-15 22:03
We need that to know which nic we booted from, and therefore which nic to DHCP on to fetch the second-stage bootloader.
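
For context, pxelinux's `IPAPPEND 2` adds a `BOOTIF=01-aa-bb-cc-dd-ee-ff` argument to the kernel command line, where the leading `01` is the Ethernet hardware type. A small Python sketch of how a second-stage script could recover the boot NIC's MAC from that (the function name is illustrative, and this is not DRP's own code):

```python
def boot_mac_from_cmdline(cmdline):
    """Return the boot NIC's MAC from a BOOTIF= argument, or None if absent."""
    for arg in cmdline.split():
        if arg.startswith("BOOTIF="):
            parts = arg[len("BOOTIF="):].split("-")
            # First field is the hardware type (01 = Ethernet); the rest is the MAC.
            if len(parts) == 7:
                return ":".join(parts[1:]).lower()
    return None
```

The MAC can then be matched against the interfaces the kernel found to decide which one to DHCP on.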

marc.heckmann
2018-01-15 22:05
ok, I'll have to look at exactly what we did w/ Cobbler, 'cause I'm pretty sure we used IPAPPEND on UEFI.

marc.heckmann
2018-01-15 22:17
So after investigation, it turns out that the `bootx64.efi` that we're using w/ Cobbler is actually Grub. It supports the `macappend 2` statement. Any reason why Grub couldn't be used w/ DRP?

vlowther
2018-01-15 22:29
grub1 or grub2?

vlowther
2018-01-15 22:30
It has been a few years since I have tried them for booting UEFI systems over the network.

vlowther
2018-01-15 22:33
With the arch we were using at the time, they were flakier than elilo

vlowther
2018-01-15 22:34
they tended to have issues relocating kernels and initrds when they got too big

vlowther
2018-01-15 22:34
where elilo did not.

vlowther
2018-01-15 22:43
Otherwise, there is nothing in principle preventing the use of grub2

vlowther
2018-01-15 22:43
it is just another binary to include and another set of templates to expand.

vlowther
2018-01-15 22:44
I am more tempted to just standardize on ipxe, though.

marc.heckmann
2018-01-15 22:44
`GNU GRUB 0.97` is what I'm seeing in the `strings` output

marc.heckmann
2018-01-15 22:46
Like I said, whatever was shipping w/ CentOS 7 wasn't really working for us. Not sure if that was GRUB 2 or not

vlowther
2018-01-15 22:46
ah, the old version of grub that is no longer maintained or supported by upstream.

marc.heckmann
2018-01-15 22:46
I guess that's it


vlowther
2018-01-15 22:47
I liked it, but then grub2 came along which was much more modular. And brittle. And flaky.

marc.heckmann
2018-01-15 22:49
In any case, it's sounding like standardizing on ipxe is the better way to go

ctrees
2018-01-16 01:12
In http://provision.readthedocs.io/en/latest/doc/integrations/krib.html#configure-with-the-ux there is: centos-7-install -> runner-service:Success. But in my endpoint UI I can't find "runner-service". I loaded krib content.

greg
2018-01-16 01:31
You need task-library

ctrees
2018-01-16 01:39
ok... thanks

shane
2018-01-16 02:10
@ctrees my bust - I forgot to document that piece :flogs_self:

ctrees
2018-01-16 02:11
np... cause without your doc I wouldn't have gotten this far...

ctrees
2018-01-16 02:12
but I did something... dr-provision2018/01/16 02:08:41.443797 [4555:23165]frontend [audit]: /home/travis/gopath/src/github.com/digitalrebar/provision/frontend/frontend.go:636 [4555:23165]Authenticated rocketskates http2: server: error reading preface from client 192.168.88.9:62336: read tcp 192.168.88.9:8092->192.168.88.9:62336: read: connection reset by peer

shane
2018-01-16 02:12
did you use `tip` version ?

ctrees
2018-01-16 02:13
not sure... checking

shane
2018-01-16 02:14
`drpcli info get`

shane
2018-01-16 02:14
or `dr-provision --version`

ctrees
2018-01-16 02:16
catmini:drpisolated msops$ ./dr-provision --version
dr-provision2018/01/16 02:16:37.366528 Version: v3.5.0-tip-49-6aea7e647d6cb992e22a141ce1411a3b3af73095

shane
2018-01-16 02:16
yeah - you're running tip

shane
2018-01-16 02:17
you're 49 commits ahead of v3.5.0 stable
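
The "49 commits ahead" reading comes straight out of the version string. A quick Python sketch of pulling it apart, assuming the `v<release>-[tip-]<commits>-<sha>` layout seen in this thread (`v3.5.0-tip-49-...` and the content file `krib-v1.4.0-0-...`); other layouts may exist, and the function name is just for illustration:

```python
import re

# Assumed layout: v<major.minor.patch>-[tip-]<commits ahead>-<git sha>
_VERSION_RE = re.compile(r"v(\d+\.\d+\.\d+)-(?:tip-)?(\d+)-([0-9a-f]+)")


def parse_drp_version(version):
    """Split a version string into (base release, commits ahead, commit sha)."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    base, ahead, commit = match.groups()
    return base, int(ahead), commit
```

A non-zero middle number means the binary was built from commits after the tagged release.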

shane
2018-01-16 02:17
back rev to 3.5.0 stable please

ctrees
2018-01-16 02:17
I was attempting to force too...

ctrees
2018-01-16 02:18
you don't happen to have the git checkout string in your head :wink:

shane
2018-01-16 02:20
```
pkill dr-provision
cd <wherever_your_install_path_is>
curl -s get.rebar.digital/stable | bash -s -- install --force --isolated
```
(assuming isolated mode, otherwise, drop that flag)

shane
2018-01-16 02:20
8d49a776c3d7b40d2af07a356e7b33d2e2b99ca2

shane
2018-01-16 02:21
hmm - that's 3.4.1 version - not sure why stable isn't pulling v3.5.0 - checking on that

ctrees
2018-01-16 02:22
I'll just switch to the 3.5.0 tag

shane
2018-01-16 02:22
3.4.1 should be stable for KRIB

ctrees
2018-01-16 02:25
oh... I forgot... I did the install NOT a clone... guess I'm doing it your way

shane
2018-01-16 02:26
ah - my fault w/ versions ... I had a stupid startup script launching /usr/local/bin/dr-provision instead of my local isolated install binary ... :smacks_head:

shane
2018-01-16 02:28
so - there you go: `8dd3ac9c62a2555d315e07f5a190f2230e3a7ca7`

wdennis
2018-01-16 02:53
Just dropping by to say that KRIB is great stuff... Having loads of fun learning k8s on metal!

ctrees
2018-01-16 02:56
what are the things you are loading on k8s to learn ? mainly curious ? I'm headed toward a blender render cluster thing right now....

wdennis
2018-01-16 03:02
Mostly just learning the base system... How to deploy apps, publish them, do ingress... how to troubleshoot the problems I encounter (create)

wdennis
2018-01-16 03:03

wdennis
2018-01-16 03:04
^^^ my architecture

wdennis
2018-01-16 03:05

wdennis
2018-01-16 03:08
Learned some stuff about KRIB too - like adding a node after 24h is a bit painful :wink: (join token expires)

ctrees
2018-01-16 03:14
yup... I read that :wink: and they have a new login method ...

ctrees
2018-01-16 03:15
so @shane I get : Control Workflow CANNOT ACCESS: Updated Version Required! Workflow allows users to define automatic transitions between Stages on a Profile basis. Workflows require use of Stages and Tasks. In the Workflow View of UI....

ctrees
2018-01-16 03:18
which I didn't get with tip... AND I think my error on tip may have been something stuck in the UI (as when I switched, the old UI was 'stuck' the same way... ended up clearing browser).

greg
2018-01-16 03:20
log into the SaaS and see if that clears it up.

ctrees
2018-01-16 03:20
ok..

ctrees
2018-01-16 03:23
I was logged in... but I loaded the task-library and krib "Content" and workflow is working... thanks

ctrees
2018-01-16 14:53
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8TL685AR/drp-krib-test.png and commented: Was attempting the krib demo... got nodes into sledgehammer-wait but when I put into k8s-cluster-install... they seemed to not make it past pxe ?? (when I rebooted the node)

ctrees
2018-01-16 14:59
From what I can tell.. no ks file is generated in drp-data/tftpboot/machines BUT that's what the machine was looking for... (I rebooted and took some snaps)

ctrees
2018-01-16 15:01
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8TLC5469/drp-krib-test-reboot.png and commented: Begin of reboot

ctrees
2018-01-16 15:02
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8TLCMSQ5/drp-krib-test-no-ks.png and commented: looking for ks

greg
2018-01-16 15:02
What bootenv is the machine in?

ctrees
2018-01-16 15:04

greg
2018-01-16 15:05
Are those stages and bootenvs available?

ctrees
2018-01-16 15:07
and I've got no job-logs... ... OH... it's IN the UI, but not in the folder :wink:

ctrees
2018-01-16 15:08
I was going to ask about that... same as templates... I don't see them in the digitalrebar/templates but they are in the UI... I take it they are held in the saas-content/*.yaml ?

greg
2018-01-16 15:09
yes.

greg
2018-01-16 15:09
they are read-only in memory content.

ctrees
2018-01-16 15:11
ahw... the other (aka tip) I was getting bin log in digitalrebar/templates

ctrees
2018-01-16 15:12
woops drp-data/job-logs

ctrees
2018-01-16 15:13
and json in drp-data/digital-rebar/jobs/*.json

greg
2018-01-16 15:13
that is where it should go

greg
2018-01-16 15:14
jobs are "ephemeral"

greg
2018-01-16 15:14
so they get stored in the "database"

ctrees
2018-01-16 15:14
ok... so know thing between 3.5.0 and tip... and I guess I need to 'download' all the stage / tasks so the ks will generate ?

greg
2018-01-16 15:15
you need to make sure that the content package is in.

ctrees
2018-01-16 15:16
catmini:drpfeature msops$ ls -alu drpisolated/drp-data/saas-content/
total 208
drwxr-xr-x+ 5 msops staff 160 Jan 16 09:16 .
drwxr-xr-x+ 9 msops staff 288 Jan 16 09:16 ..
-rw-r--r--+ 1 msops staff 71999 Jan 15 22:04 default.yaml
-rw-r--r--+ 1 root staff 14259 Jan 15 21:20 krib-v1.4.0-0-23e369560a623da0e69a08b925c2815343c9d987.yaml
-rw-r--r--+ 1 root staff 14161 Jan 15 21:20 task-library-v1.4.0-0-23e369560a623da0e69a08b925c2815343c9d987.yaml
catmini:drpfeature msops$

ctrees
2018-01-16 15:32
so does the ks get 'formed' by drp-data/digitalrebar/machines/f5335...248.json ? (seems that's where the node got stuck) but I see no job generated

shane
2018-01-16 15:36
@ctrees - yes, all content is dynamically rendered on request by DRP - and served from a read-only in-memory layer - you won't see the created elements on the DRP endpoint's filesystem

shane
2018-01-16 15:37
any templates will be instantiated, template pieces filled in, and made available to the Machine at request time

2018-01-16 16:18
hrmmm

greg
2018-01-16 16:19
@bsdwatch?

2018-01-16 16:19
signed in with wrong github account

2018-01-16 16:19
grrrrr

2018-01-16 16:21
no way back i guess

2018-01-16 16:27
there we go

2018-01-16 16:28
ok anyway.... fixed that.... now...

greg
2018-01-16 16:39
FYI - I'm going to start on a release for DRP. 3.6. With the completion of the UEFI boot fixes, I'm going to start the process. Tip will be pre-release shortly.

vlowther
2018-01-16 16:40
@ctrees Dynamic rendered templates can contain sensitive information, so they cannot be browsed. If you know exactly what the generated filename is, you can pull it directly with http or tftp.

vlowther
2018-01-16 16:41
It is totally not because I was too lazy to implement a full filesystem overlay with directory merging.

ctrees
2018-01-16 16:46
[GIN] 2018/01/16 - 10:41:48 | 200 | 10.291µs | 192.168.88.9 | OPTIONS /api/v3/machines/db1dcb0f-d0b6-4afb-9da9-e62b62a68e24
[GIN] 2018/01/16 - 10:41:48 | 422 | 2.666013ms | 192.168.88.9 | PATCH /api/v3/machines/db1dcb0f-d0b6-4afb-9da9-e62b62a68e24

ctrees
2018-01-16 16:47
something I'm doing in the UI ... I think... when I attempt run the krib

shane
2018-01-16 16:48

shane
2018-01-16 16:49
also - you have your DRP endpoint (192.168.88.9) set as your default GW - presumably you've enabled `ip_forwarding` through your endpoint ?

ctrees
2018-01-16 16:51
now that is starting to make sense... I used the auto-gen one this time...

ctrees
2018-01-16 16:59
so anaconda fails to fetch kickstart but I can browse and get it ?

shane
2018-01-16 17:35
@ctrees did you modify the bootenv and associated templates in any way ?

ctrees
2018-01-16 17:43
... I was attempting to follow that key paragraph in your doc...


ctrees
2018-01-16 17:44
then I go back to the rob video...

ctrees
2018-01-16 17:46
I'll push what I've attempted... give me a sec... (I think you may be right... I've got a local subnet route problem... probably the vm can't talk the same way to the api as the browser... I have bridge adaptor but who knows WTF vbox / mac is doing...)


shane
2018-01-16 18:59
our meetup is starting in a few minutes - the agenda is here: https://docs.google.com/document/d/1b72e1dIAJgsfvJbJUpBG9Jmhq6WC1f5SYAOlkcwT0KQ


wdennis
2018-01-16 19:00
@shane Zoom link?

shane
2018-01-16 19:00

shane
2018-01-16 19:01
it's always posted in the http://meetup.com posting

shane
2018-01-16 19:53

2018-01-16 19:56
thanks for the link

shane
2018-01-16 19:57
you betcha

shane
2018-01-16 19:58
any feedback appreciated - if you have questions please drop them here - @wdennis is now a PRO at KRIB - and @ctrees is getting there too :slightly_smiling_face:

wdennis
2018-01-16 20:00
We need to document that "renew the join token in the profile" stuff (enabling add-on nodes well past initial cluster bringup)

vlowther
2018-01-16 20:07
victor@m4700:~/gocode/src/github.com/digitalrebar/provision/pacman (master) $ drpcli profiles get global param package-repositories
[
  {
    "installSource": true,
    "os": [ "centos-7" ],
    "tag": "centos-7-install",
    "url": "http://192.168.124.11:3002"
  },
  {
    "installSource": true,
    "os": [ "sledgehammer/f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15" ],
    "tag": "sledgehammer",
    "url": "http://192.168.124.11:3001"
  }
]
victor@m4700:~/gocode/src/github.com/digitalrebar/provision/pacman (master)

vlowther
2018-01-16 20:07
That is the package-repositories attrib I used for the no-local-repos demo

vlowther
2018-01-16 20:07
One thing I did not call out is that all the files I used for the PXE process were also in the remote repos

vlowther
2018-01-16 20:09
and that dr-provision transparently proxied all the required TFTP requests for the kernel and initrd to the remote repos sledgehammer and the centos-7 install were configured to use.

ctrees
2018-01-16 20:50

ctrees
2018-01-16 20:50
ssh-access: user1: ssh <user_1_key> user@krib user2: ssh <user_2_key> user@krib

shane
2018-01-16 20:50
that's a standard ssh key injection process - we've documented that in a video on youtube

shane
2018-01-16 20:51

ctrees
2018-01-16 20:51
I guess my question was... is that setup ASSUMED before the krib doc? (that may be what I was missing)

shane
2018-01-16 20:52
it's not really needed - as we can operate w/out the keys - we don't use SSH

shane
2018-01-16 20:52
but it's for your convenience to be able to log in to the Kube master if necessary

shane
2018-01-16 20:52
again - you don't need to SSH to kube master - you can pull the profile config from DRP - and use a remote `kubectl` tool to manage the cluster

shane
2018-01-16 20:53
this allows you to build a cluster w/ zero login needed for a more secure environment

ctrees
2018-01-16 20:53
well then I'm sort of out of ideas why the node can't get to the ks

ctrees
2018-01-16 20:55
anaconda says it can't get the file (and hangs) but curl can.. (and browser)...

shane
2018-01-16 20:56
is your curl test a different system from the Machine you're trying to provision ?

shane
2018-01-16 20:56
it might be a Machine --> DRP Endpoint connection problem

ctrees
2018-01-16 20:56
I don't actually know cause I can't get into that machine :wink:

shane
2018-01-16 20:56
is this virtualbox, physical infra, etc ?

ctrees
2018-01-16 20:57
vbox... but I think first I'm going through the ssh stuff... I need to understand that anyway...

shane
2018-01-16 20:57
are you able to do just a simple c7 install to a VM in vbox using this network setup ?

ctrees
2018-01-16 20:58
yes... but I didn't know what users it setup... aka ssh as I see the task fly by

shane
2018-01-16 20:59
that's "documented" in the kickstart how it gets built up with user creds

shane
2018-01-16 20:59
the root user in c7 case should have pw of "RocketSkates" (I believe)

ctrees
2018-01-16 20:59
ok... thanks... I'll peek at that...

shane
2018-01-16 20:59
accessible from the "console" of the VM

ctrees
2018-01-16 21:00
(aka good idea I should have thought of)

shane
2018-01-16 22:19
- anyone local to the SF Bay Area - I'll be presenting Digital Rebar on Thursday evening at the BayLISA meetup group. More details:


shane
2018-01-16 22:43
Also - here's the replay video from today's meetup - which was crammed full of ooey gooey goodness ... Did Victor avoid the Demo Gods' wrath by presenting on THREE topics - combining TWO pieces of brand spanking new functionality into ONE demo? You're going to have to watch to find out ... :slightly_smiling_face: https://youtu.be/dtCKxueGEic

ctrees
2018-01-16 23:00
I'm guessing I have a network problem... I've done most things (other than fire up wireshark and look at all the traffic)... but I'm pretty sure @shane was correct... my machine -> endpoint does not talk... yet it gets through ipxe... AND (I think) sledgehammer... but fails on anaconda

ctrees
2018-01-16 23:00
dr-provision2018/01/16 22:56:11.917015 Found our lease for strat: MAC token 08:00:27:66:d8:66, will use it
dr-provision2018/01/16 22:56:11.918219 Received option: OptionDHCPMessageType: 3
dr-provision2018/01/16 22:56:11.918250 Received option: OptionParameterRequestList:
dr-provision2018/01/16 22:56:11.918272 Received option: OptionVendorClassIdentifier: anaconda-Linux 3.10.0-693.el7.x86_64 x86_64
dr-provision2018/01/16 22:56:11.918681 xid 0x4f5f8353: Request handing out: 192.168.88.10 to 08:00:27:66:d8:66 via 192.168.88.9

ctrees
2018-01-16 23:01
now I'm attempting to figure out what 'in networking' is different when anaconda fetches the kickstart...

zehicle
2018-01-17 15:05
I've added some keywords to slack that will bring up links when we type them: FAQ, KRIB, meetup, and issue.


zehicle
2018-01-17 15:05
FAQ


zehicle
2018-01-17 15:10
and :cloudia:

zehicle
2018-01-17 15:17
please let us know if we need other quick links.

shane
2018-01-17 15:19
show me the quickstart

2018-01-17 15:19

2018-01-17 15:47
ok how do we set reservations for existing systems / vms





2018-01-17 15:52
ahhh the mac is the token... got it

shane
2018-01-17 16:10
@shane set the channel topic: For SlackBot command help, type: "slackbot help"

2018-01-17 16:10
Available Commands: FAQ, $FAQ, $faq, $KRIB, $krib, $meetup, $Meetup, $issue, $Issue, $issues, $Issues, $quickstart, $QuickStart

shane
2018-01-17 16:11
$quickstart

2018-01-17 16:11

2018-01-17 16:24
$KRIB

2018-01-17 16:24
pffftttt

2018-01-17 16:25
ignore that

shane
2018-01-17 16:25
Hmm - might be because you're not a Slack user and coming in through a sameroom

shane
2018-01-17 16:25
$KRIB


2018-01-17 16:27
i am a slack user, just not logged into it now

shane
2018-01-17 16:27
doesn't count unless you're logged in ... :slightly_smiling_face:

2018-01-17 16:28
LOL

kamp.scott
2018-01-17 16:47
$KRIB


kamp.scott
2018-01-17 16:47
hah

zehicle
2018-01-17 17:02
welcome @kamp.scott!

wdennis
2018-01-17 17:13
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F8UAC0HUJ/derpy.jpg and commented: We need a "D[e]RP" emoji...

kamp.scott
2018-01-17 18:06
@zehicle thanks

kamp.scott
2018-01-17 18:07
So KRIBs on 5 large vms?

kamp.scott
2018-01-17 18:07
Spun up via rebar

kamp.scott
2018-01-17 18:08
Curious also why Debian and Ubuntu vms installed via rebar have no console access

kamp.scott
2018-01-17 18:09
I think I might go redeploy a production rebar first

greg
2018-01-17 18:28
Because the console= kernel param is not correct for your VM environment. I would guess. You can set the parameter `kernel-console` to `console=ttyS1,115200` or whatever it needs to be for your environment.

greg
2018-01-17 18:28
That can be set in the global profile.

greg
2018-01-17 18:28
The above is what we use for http://packet.net (as an example).

greg
2018-01-17 18:37
@kamp.scott - sorry forgot to note you

kamp.scott
2018-01-17 18:39
@greg I'll try it thanks

florent.wagener
2018-01-17 18:49

vlowther
2018-01-17 18:55
Cool. I have further cleanups incoming -- the T320 I keep around is significantly pickier than qemu with tianocore.

florent.wagener
2018-01-17 19:05
@vlowther aaaah my bad, I didn't see that I was using Legacy BIOS :smile: Let me try again with UEFI !

vlowther
2018-01-17 19:11
ok -- if you could capture a tcpdump of the DHCP traffic while that is happening and send that to me that would help.

vlowther
2018-01-17 19:11
or a dhcpdump, if you have that tool lying around handy. :slightly_smiling_face:

vlowther
2018-01-17 19:12
has been staring at dhcp packets for the last couple of days trying to bang all this stuff out.

florent.wagener
2018-01-17 19:13

florent.wagener
2018-01-17 19:13
let me check :slightly_smiling_face:

vlowther
2018-01-17 19:16
yeah, that looks familiar.

vlowther
2018-01-17 19:16
I will have a patch out shortly.

kamp.scott
2018-01-17 19:21
so curious now, can i go from standalone to a production install without re-installing ?

florent.wagener
2018-01-17 19:21
@florent.wagener uploaded a file: https://rackn.slack.com/files/U8FAN7PLK/F8UCYJV8S/tcpdump_uefi_boot_failure.txt and commented: @vlowther here's the tcpdump of 2 boot sequences.

shane
2018-01-17 19:43
@kamp.scott - not really a "supported" path for that. However, you should be able to do it with the following hackery ...
1. back up your current `drp-data` directory (eg `tar -czvf /root/drp-isolated-backup.tgz drp-data/`)
2. stop the service with `pkill dr-provision`
3. perform a fresh install on the same host, without the `--isolated` flag
4. follow the start up scripts setup - BUT do NOT start the `dr-provision` service at this point
5. copy the `drp-data/*` directories recursively to `/var/lib/dr-provision` (eg: `unalias cp; cp -ra drp-data/* /var/lib/dr-provision/`)
6. make sure your start up scripts are in place for your production mode (eg: `/etc/systemd/system/dr-provision.service`)
7. start the new production version with `systemctl start dr-provision.service`
8. verify everything is running fine
9. delete the `drp-data` directory (suggest retaining the backup copy for later just in case)
10. YMMV ... buyer beware ... I didn't fully test this process ... don't run with scissors, sharp objects may poke you, etc...

kamp.scott
2018-01-17 19:45
@shane hrmmmm might just consider a reinstall....

shane
2018-01-17 19:46
this is a "reinstall" - just pulling over any content and configurations and machine data from previous provisioning activities

kamp.scott
2018-01-17 20:19
so has the debian / ubuntu disk partitions issue been fixed in the newest version ?

kamp.scott
2018-01-17 20:20
since in a vm they are xvdX and not sdX

shane
2018-01-17 20:21
change operating system disk

shane
2018-01-17 20:22
did you try that ?

kamp.scott
2018-01-17 20:30
@greg yes that works for VMs but not for bare metal

kamp.scott
2018-01-17 20:30
and yes some of my keys on the keyboard are on holiday

shane
2018-01-17 20:31
bare metal most likely has a different device name than a VM - can you please verify the device names in a bare metal install ?

kamp.scott
2018-01-17 20:32
@shane linux is sda sdb sdc on baremetal

kamp.scott
2018-01-17 20:33
hence my issue if i use rebar for a VM it fails... unless globally is set to xvda

kamp.scott
2018-01-17 20:36
@shane so just .... curl -fsSL get.rebar.digital/stable | bash -s -- install

shane
2018-01-17 20:44
ah - so what you're probably doing is applying the "operating-system-disk" in the global profile - and trying to provision BOTH VMs and bare metal - that won't work

shane
2018-01-17 20:45
you need to apply the Param `operating-system-disk` to individual machines. One way to do that is create a Profile - say "virtual-machines" and add the Param to that Profile - make any customizations in that Profile - then add that profile to the VMs

shane
2018-01-17 20:45
similarly - create a Profile named "bare-metal" and add any customizations specific to the bare metal hosts in that Profile - now apply that profile to the bare metal Machines

shane
2018-01-17 20:45
(remember to remove the global Param `operating-system-disk` when doing it this way)
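
A toy Python model of why the per-class Profile approach works, assuming a Param is looked up on the Machine first, then on its Profiles in order, then on the global Profile. That lookup order and every name below are a simplification for illustration, not DRP's actual implementation:

```python
def resolve_param(name, machine_params, machine_profiles, profiles):
    """Look up a Param: machine first, then its profiles in order, then global."""
    if name in machine_params:
        return machine_params[name]
    for profile_name in machine_profiles:
        params = profiles.get(profile_name, {})
        if name in params:
            return params[name]
    # Fall back to the global profile; None if nothing sets the Param.
    return profiles.get("global", {}).get(name)


# Hypothetical profiles matching the advice above (global Param removed).
profiles = {
    "virtual-machines": {"operating-system-disk": "xvda"},
    "bare-metal": {"operating-system-disk": "sda"},
    "global": {},
}
print(resolve_param("operating-system-disk", {}, ["virtual-machines"], profiles))
print(resolve_param("operating-system-disk", {}, ["bare-metal"], profiles))
```

With the Param set only in the global Profile, both classes of machine would resolve to the same disk, which is the failure described above.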

kamp.scott
2018-01-17 20:46
@shane you guys provision vms and baremetal don't you ?

shane
2018-01-17 20:47
yes - but our focus is on bare metal - VMs are for testing in our world (they have their place) - but our core value is managing bare metal systems

shane
2018-01-17 20:47
as you've tested - VMs work too

kamp.scott
2018-01-17 20:47
so is mine, however there's just some things i can do in docker in a vm

shane
2018-01-17 20:47
but you need to provide customizations to different classes of machines - that's what Profiles and Params give you

shane
2018-01-17 20:48
you just have to apply them to the right thing to do the different things correctly

kamp.scott
2018-01-17 20:48
yeah... rebar's starting to make me feel a bit stupid

kamp.scott
2018-01-17 20:48
have to get my head around it better

kamp.scott
2018-01-17 20:49
omg this$%R$^%^& keyboard

kamp.scott
2018-01-17 20:52
which is why i considered a full reinstall. i've since realized if you shut down rebar, any vm launched with it loses its ip in time


kamp.scott
2018-01-17 21:07
@shane nope fail..... doesn't seem that works.... subnets, isos, machines, workflow all missing

ctrees
2018-01-17 21:07
I was attempting to assign the discovered machine...

ctrees
2018-01-17 21:07
[drpops@drpe drpisolated]$ ./drpcli machines list | jq '.[].Uuid'
"4678729e-5147-43f6-a569-93b7668b8a40"
[drpops@drpe drpisolated]$ ./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install
Error: ValidationError: machines/4678729e-5147-43f6-a569-93b7668b8a40: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install
[drpops@drpe drpisolated]$ ./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install --force
Error: ValidationError: machines/4678729e-5147-43f6-a569-93b7668b8a40: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install
[drpops@drpe drpisolated]$

ctrees
2018-01-17 21:08
woops... should have used snippet... but I was attempting to force the new boot env and could not...

florent.wagener
2018-01-17 21:08
dumb question, is there a default root password to login into sledgehammer ?

shane
2018-01-17 21:08
depends - what type of beer are you talking about ?

shane
2018-01-17 21:08
good beer = password


florent.wagener
2018-01-17 21:09
@shane I only drink belgian beer :slightly_smiling_face:

ctrees
2018-01-17 21:09
I think I got the command line right...

shane
2018-01-17 21:09
ok - that'll worky !

shane
2018-01-17 21:09
sledgehammer credentials: root / rebar1

florent.wagener
2018-01-17 21:09
thanks !

shane
2018-01-17 21:10
but - only at console - unless you've added `access-keys` to inject your own user SSH keys

kamp.scott
2018-01-17 21:10
@shane nevermind, seems ok now. i copied to drp-provision not dr-provision. fixed

florent.wagener
2018-01-17 21:10
@shane of course.

shane
2018-01-17 21:11
:slightly_smiling_face:

shane
2018-01-17 21:11
I still expect beer ... just sayin'

ctrees
2018-01-17 21:15
./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install --force
Error: ValidationError: machines/4678729e-5147-43f6-a569-93b7668b8a40: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install
[drpops@drpe drpisolated]$

ctrees
2018-01-17 21:16
same result when: ./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install -f

ctrees
2018-01-17 22:14
just was pinged about this: https://github.com/google/netboot

zehicle
2018-01-17 22:29
we monitor that - if you look at the commit history, it's not actively maintained there. Also, very narrow function.

vlowther
2018-01-17 22:32
@florent.wagener https://github.com/digitalrebar/provision/pull/641 should make your R620 box work in uefi mode. It works for my T320.

ctrees
2018-01-17 22:36
Oh... I've been pushing people to look at drp... and this is just them ping'n back...

kamp.scott
2018-01-17 23:19
so.... when you crash a rebar server, say total loss, you literally lose access to all your VMs ? as there's no dhcp running... is this correct ?

kamp.scott
2018-01-17 23:20
scenario: rebar server failed... now i can no longer access the vms that got dhcp from the rebar box

shane
2018-01-17 23:28
@kamp.scott - nope

shane
2018-01-17 23:30
IF DRP server is completely down - then you can not answer new DHCP queries. Existing DHCP leases remain until the renewal period in the DHCP server expires - then you'll lose IP access. This is DHCP - not Digital Rebar. So - if you are heavily leveraging DRP for DHCP services - then you might be wise to increase the DHCP lease period - so DHCP assignments live a lot longer than you expect any outage to be.
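
A back-of-the-envelope Python sketch of the lease arithmetic, using the `ActiveLeaseTime: 60` and `ReservedLeaseTime: 7200` values (in seconds) from the subnet posted earlier in this channel. It assumes the client last renewed at the moment the outage starts and cannot renew during it; the function names are illustrative:

```python
from datetime import datetime, timedelta


def lease_expiry(granted_at, lease_seconds):
    """Time at which a lease granted at `granted_at` runs out."""
    return granted_at + timedelta(seconds=lease_seconds)


def survives_outage(granted_at, lease_seconds, outage_start, outage_minutes):
    """True if the lease is still valid when the DHCP server comes back."""
    outage_end = outage_start + timedelta(minutes=outage_minutes)
    return lease_expiry(granted_at, lease_seconds) >= outage_end


# Example: a 30 minute outage starting right after the last renewal.
start = datetime(2018, 1, 17, 23, 0, 0)
print(survives_outage(start, 60, start, 30))    # 60 second lease
print(survives_outage(start, 7200, start, 30))  # 2 hour lease
```

The longer the lease relative to the worst outage you expect, the less likely machines are to drop off the network before DHCP is back.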

shane
2018-01-17 23:31
Since the DHCP leases are maintained as simple JSON files in the filesystem layer - then it's child's play to keep backups of your DHCP lease assignments - and bring them up in a new DRP instance if for some reason you had a catastrophic failure of your DHCP/DRP based server

shane
2018-01-17 23:32
another solution is to look at a distributed Key/Value store for the backend filesystem layer - currently we support HashiCorp Consul - this is for extreme high availability scenarios - where you have a cluster of Consul servers storing the Key/Value data for the DRP service - including the DHCP leases

kamp.scott
2018-01-17 23:32
right

shane
2018-01-17 23:33
remember - this is basic DHCP stuff - not specific to DRP - however, we do provide a lot of VERY easy mechanisms to manage higher availability than most other DHCP and provisioning servers have - via easy manipulation of the Lease data information, and via the Key/Value based distributed storage mechanism

kamp.scott
2018-01-17 23:51
@shane is there a doc on the consul config ?

greg
2018-01-17 23:54
@florent.wagener - tip now has better UEFI support. Give it a shot please.

florent.wagener
2018-01-17 23:55
@greg thanks I'll test that tomorrow morning :slightly_smiling_face:

greg
2018-01-18 00:01
awesome

florent.wagener
2018-01-18 00:43
btw, what's the best way to upgrade drp? Right now I'm running a clone from the master branch of the github repo.

shane
2018-01-18 00:54
@kamp.scott the Consul K/V piece isn't a community feature, it's a RackN enterprise support feature

shane
2018-01-18 00:54
@florent.wagener - pretty simple in principle - kill `dr-provision` service, replace the binary with the new one, start `dr-provision`

shane
2018-01-18 00:55
this is basically all the `install.sh` script does with the `--upgrade` flag enabled

shane
2018-01-18 00:55
(`curl -s get.rebar.digital/stable | bash -s -- install --upgrade --force --version=tip --isolated`) (does Isolated install of TIP content)

kamp.scott
2018-01-18 01:04
drpcli -f machines bootenv 9102b704-03ea-40cd-becd-0a65f1d09651 ubuntu-16.04-install
Error: ValidationError: machines/9102b704-03ea-40cd-becd-0a65f1d09651: Can not change bootenv while in a stage unless forced. old: sledgehammer new ubuntu-16.04-install

kamp.scott
2018-01-18 01:04
GGGGRRRRRRRRR

shane
2018-01-18 01:20
@ctrees and @kamp.scott - please note that a Stage and a BootEnv are two different things - Even though the names are the same (for convenience - a Stage of "ubuntu-16.04-install" bears the same name as the BootEnv that it implements)

shane
2018-01-18 01:25
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F8UFHRY67/stage_-vs-_bootenv.sh and commented: here's an example of the problem you're running in to

shane
2018-01-18 01:26
note the change from `bootenv` to `stage`
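
A small sketch of the `bootenv` vs `stage` distinction. It assumes, based only on the validation error quoted earlier, that a machine which is currently in a Stage should be moved with `drpcli machines stage`, and that `drpcli machines bootenv` is for machines with no Stage set. The helper name and the example Stage value are hypothetical.

```python
def change_command(machine: dict, target: str) -> list:
    """Return the drpcli argv for moving `machine` to `target`.

    Assumption: if the machine is already in a Stage, change the Stage
    (the Stage implements the BootEnv of the same name); otherwise
    change the BootEnv directly.
    """
    verb = "stage" if machine.get("Stage") else "bootenv"
    return ["drpcli", "machines", verb, machine["Uuid"], target]


if __name__ == "__main__":
    in_stage = {"Uuid": "9102b704-03ea-40cd-becd-0a65f1d09651", "Stage": "discover"}
    no_stage = {"Uuid": "9102b704-03ea-40cd-becd-0a65f1d09651", "Stage": ""}
    print(" ".join(change_command(in_stage, "ubuntu-16.04-install")))
    print(" ".join(change_command(no_stage, "ubuntu-16.04-install")))
```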

florent.wagener
2018-01-18 01:56
@shane thanks !

shane
2018-01-18 01:57
@florent.wagener - no problem - let me know if you bump in to issues or have questions

florent.wagener
2018-01-18 01:57
Will do. Time to disconnect for me now :slightly_smiling_face: Talk to you tomorrow :slightly_smiling_face:

shane
2018-01-18 01:58
cheers

andreas.holmsten
2018-01-18 09:02
has joined #community201801

kamp.scott
2018-01-18 12:48
sorry im here now .... ok still not getting these reservations. ive added a MAC and ip for the machines but they still dhcp a new ip from the pool and boot into sledgehammer instead of off disk

wdennis
2018-01-18 13:00
@kamp.scott How important is it to you to preserve the current machine records in DRP?

wdennis
2018-01-18 13:01
And, I'm assuming you are providing DHCP to your provisioning network via DRP?

kamp.scott
2018-01-18 13:01
@wdennis i have 10 machines already prebuilt / booting from hard disk... i dont want drp to do anything to them

wdennis
2018-01-18 13:03
It seems that the machine records are tied to the IP address, not the MAC... If the IP changes, DRP would see it as a new machine (my understanding)

wdennis
2018-01-18 13:04
To wit:
```
$ drpcli -E https://192.168.1.148:8092 machines show 174c3987-22a4-43d4-9eb9-0247162e8628 | jq 'del(.Params."gohai-inventory")'
{
  "Address": "192.168.1.102",
  "Available": true,
  "BootEnv": "local",
  "CurrentJob": "9d7d2564-26b4-439e-a049-c5b959b6da32",
  "CurrentTask": 0,
  "Description": "Dell PowerEdge R310",
  "Errors": [],
  "Meta": {
    "feature-flags": "change-stage-v2"
  },
  "Name": "k8s-ingress",
  "OS": "ubuntu-16.04",
  "Params": {
    "ipmi/address": "idrac-796MQW1",
    "ipmi/password": "********",
    "ipmi/username": "root"
  },
  "Profile": {
    "Available": false,
    "Description": "",
    "Errors": null,
    "Meta": null,
    "Name": "",
    "Params": null,
    "ReadOnly": false,
    "Validated": false
  },
  "Profiles": [
    "k8s-cluster1"
  ],
  "ReadOnly": false,
  "Runnable": true,
  "Secret": "*********",
  "Stage": "complete",
  "Tasks": [],
  "Uuid": "174c3987-22a4-43d4-9eb9-0247162e8628",
  "Validated": true
}
```

2018-01-18 13:04
Time to feed the :bear:!

wdennis
2018-01-18 13:04
Notice there is only an "Address" attribute, and no "MAC" attribute

kamp.scott
2018-01-18 13:08
@wdennis there is address / token / strategy which translates to 2xx.3xx.4xx.4xx / ba:25:96:29:71:1f / MAC

kamp.scott
2018-01-18 13:08
at least thats whats in the ui

kamp.scott
2018-01-18 13:09

kamp.scott
2018-01-18 13:10
drpcli reservations create '{ "Addr": "1.1.1.1", "Token": "08:00:27:33:77:de", "Strategy": "MAC" }'

kamp.scott
2018-01-18 13:11
now... seemingly my reservations ip are outside the configured subnet scope

kamp.scott
2018-01-18 13:11
148.251.24.7/27 148.251.24.11 148.251.24.29 6000 72000 is my subnet

kamp.scott
2018-01-18 13:13
my reservation is 148.251.24.4 da:45:cf:f6:6b:11 MAC

kamp.scott
2018-01-18 13:13
be smarter if we could simply "ignore" the MAC - do nothing

wdennis
2018-01-18 13:15
@kamp.scott You can't ignore the MAC - that's what DHCP uses to know what machine gets what IP

wdennis
2018-01-18 13:16
And reservations should be outside of the dynamic address pool
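
A quick sketch for checking that rule against the numbers posted above. It assumes the subnet line reads as subnet CIDR, first pool address, last pool address (the two trailing numbers are ignored here); that reading of the UI columns is a guess, and the function name is made up.

```python
import ipaddress


def check_reservation(addr: str, subnet_cidr: str,
                      pool_first: str, pool_last: str) -> dict:
    """Report whether a reserved address sits inside the subnet
    and whether it collides with the dynamic pool."""
    ip = ipaddress.ip_address(addr)
    # strict=False lets us pass a host address with a prefix, e.g. .7/27
    network = ipaddress.ip_network(subnet_cidr, strict=False)
    first = ipaddress.ip_address(pool_first)
    last = ipaddress.ip_address(pool_last)
    return {
        "in_subnet": ip in network,
        "in_dynamic_pool": first <= ip <= last,
    }


if __name__ == "__main__":
    result = check_reservation("148.251.24.4", "148.251.24.7/27",
                               "148.251.24.11", "148.251.24.29")
    print(result)  # {'in_subnet': True, 'in_dynamic_pool': False}
```

Under that reading, 148.251.24.4 is inside the /27 (148.251.24.0 through 148.251.24.31) and outside the pool, which is the layout recommended above.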

kamp.scott
2018-01-18 13:17
but its ignoring the reservation

wdennis
2018-01-18 13:17
Does it already have a dynamic lease in the "db" for that MAC?

kamp.scott
2018-01-18 13:17
it boots into sledgehammer

wdennis
2018-01-18 13:18
You'd have to remove the dynamic lease record first; the DHCP server would already "know" what IP addr goes with that MAC

wdennis
2018-01-18 13:22
Do this - run this command and post back what it says: `drpcli machines list -E https://<your-drp-ip>:8092 | jq '.[] | .Name + ", " + .Address + ", " + .BootEnv'` For instance, on my DRP system, I get:
```
"testinstall, 192.168.1.125, local"
"k8s-ingress, 192.168.1.102, local"
"testnode03, 192.168.1.114, local"
"testnode04, 192.168.1.132, local"
"testnode02, 192.168.1.110, local"
"testnode01, 192.168.1.123, local"
"is-ef-n1, 192.168.1.112, sledgehammer"
```

wdennis
2018-01-18 13:23
(you don't need the `-E https://<your-drp-ip>:8092` part if you are running `drpcli` off the DRP host itself - that's for a remote `drpcli`)
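
For anyone without `jq` handy, a rough Python equivalent of that summary. It assumes the output of `drpcli machines list` is a JSON array of objects with `Name`, `Address` and `BootEnv` fields, as in the machine record shown earlier; the sample data is two rows from the example output.

```python
import json


def summarize(machines_json: str) -> list:
    """Turn a JSON array of machine objects into 'Name, Address, BootEnv' strings."""
    machines = json.loads(machines_json)
    return [", ".join((m["Name"], m["Address"], m["BootEnv"])) for m in machines]


if __name__ == "__main__":
    sample = json.dumps([
        {"Name": "k8s-ingress", "Address": "192.168.1.102", "BootEnv": "local"},
        {"Name": "is-ef-n1", "Address": "192.168.1.112", "BootEnv": "sledgehammer"},
    ])
    for line in summarize(sample):
        print(line)
```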

vlowther
2018-01-18 14:12
I will take a look at that.

wdennis
2018-01-18 14:12
@vlowther at what?

vlowther
2018-01-18 14:12
precreated reservations and machines not behaving the way you expect.

wdennis
2018-01-18 14:13
ah

wdennis
2018-01-18 14:14
If there exists a dyn lease record for a MAC, and then one enters a static lease record for that same MAC, which one "wins"?

vlowther
2018-01-18 14:14
@kamp.scott Is that example machine you posted above one that you precreated and expect it to boot into Sledgehammer, or something else?

shane
2018-01-18 14:14
@andreas.holmsten welcome

ctrees
2018-01-18 14:16
I'm working through the same sort of thing except with an array of ILO devices... Wondering about 'missing mac'

wdennis
2018-01-18 14:16
Or does (should) the DRP system detect a prior MAC entry, and remove it in favor of the new one? (not sure how this works w/ DRP DHCP server)

ctrees
2018-01-18 14:16

wdennis
2018-01-18 14:17
@ctrees Do you have any statics mapped?

ctrees
2018-01-18 14:18
mapped in drp ? or ?

wdennis
2018-01-18 14:18
yes, in DRP

wdennis
2018-01-18 14:18
Or are you using outboard DHCP?

greg
2018-01-18 14:19
@ctrees - Add `.State` to your jq magic.

ctrees
2018-01-18 14:19
what I did was put statics outside the DHCP

ctrees
2018-01-18 14:20

greg
2018-01-18 14:21
So, `INVALID` means that DRP decided they weren't safe to use. Either something responded to ping or they were NAKed by the client.

greg
2018-01-18 14:21
`ACK` means what it sounds like

greg
2018-01-18 14:22
`OFFER` means pending.
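
A small sketch that groups leases by the three states just described. It assumes each lease object carries `State` and `Addr` fields; the wording in `STATE_MEANING` paraphrases the explanations above, and the sample addresses are invented.

```python
from collections import defaultdict

# Paraphrase of the state descriptions given in the thread.
STATE_MEANING = {
    "ACK": "lease acknowledged and in use",
    "OFFER": "offer sent, still pending",
    "INVALID": "judged unsafe: the address answered a ping or the client NAKed it",
}


def group_by_state(leases: list) -> dict:
    """Map each lease State to the list of addresses currently in that state."""
    grouped = defaultdict(list)
    for lease in leases:
        grouped[lease["State"]].append(lease["Addr"])
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    leases = [
        {"Addr": "192.0.2.5", "State": "ACK"},
        {"Addr": "192.0.2.6", "State": "INVALID"},
        {"Addr": "192.0.2.7", "State": "OFFER"},
        {"Addr": "192.0.2.8", "State": "INVALID"},
    ]
    for state, addrs in group_by_state(leases).items():
        print(f"{state} ({STATE_MEANING.get(state, 'unknown')}): {addrs}")
```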

ctrees
2018-01-18 14:24
hum... so with @vlowther websocket log alert... I could just listen to 'traffic' on a 'dirty toilet' and 'clean-up' a network... hum...

greg
2018-01-18 14:24
umm - not perhaps how I would have put it, but yes.

ctrees
2018-01-18 14:31
how does something make a request without a mac ?

wdennis
2018-01-18 14:31
@ctrees How would that work?

ctrees
2018-01-18 14:32
you mean the toilet thing?

wdennis
2018-01-18 14:32
No, "make a request without a mac"

ctrees
2018-01-18 14:33
don't know... that's what I'm curious about... if it's INVALID... I'd like to figure out where it came from... and would think a mac would be involved

ctrees
2018-01-18 14:34
for drp to even see it... it'd have to be in at least arp ?

wdennis
2018-01-18 14:35
ARP is MAC (layer 2) to IP (layer 3) mapping


ctrees
2018-01-18 14:36
thanks

kamp.scott
2018-01-18 14:37
@vlowther the machines were all created statically addressed before rebar was deployed. now with rebar, if we reboot one it drops it into sledgehammer

kamp.scott
2018-01-18 14:38
so i shut down rebar and rebooted the static machine to its orig state then started rebar again and created a "reservation" however rebar still tries to install it

wdennis
2018-01-18 14:40
@kamp.scott Do you have a DHCP server that is separate from DRP on your deployment network?

kamp.scott
2018-01-18 14:40
no the only thing doing dhcp is rebar. the machines in question have a "static" ip

vlowther
2018-01-18 14:41
ok

wdennis
2018-01-18 14:41
So, they do NOT use DHCP to address themselves?

vlowther
2018-01-18 14:41
Can you give me an example of what JSON you are passing in to create the machine and the reservation?

kamp.scott
2018-01-18 14:42
@vlowther i did it in the ui

vlowther
2018-01-18 14:42
DRP uses ping to see if it can issue a lease, not ARP.

kamp.scott
2018-01-18 14:42
just ip mac address and MAC

wdennis
2018-01-18 14:42
Ah I see - perhaps he's creating machine rec's manually

kamp.scott
2018-01-18 14:42
well they pxe boot

kamp.scott
2018-01-18 14:43
@vlowther want to see the ui ?

vlowther
2018-01-18 14:43
we use ping because arp is limited to the local subnet, and we can handle remote subnets.

vlowther
2018-01-18 14:44
@kamp.scott I am more interested in what the JSON for the machine winds up looking like.

kamp.scott
2018-01-18 14:44
these 10 ips are in the same subnet which is why i created a reservation for them thinking they would just get the same ip and boot from disk

kamp.scott
2018-01-18 14:45
i guess we need rebar to think its already done its job for these ips :slightly_smiling_face:

vlowther
2018-01-18 14:45
right, which is why I need to see what the machine JSON looks like from the CLI

wdennis
2018-01-18 14:45
Q: does PXE always use DHCP? Or can you somehow assign static to the PXE-boot bios for the NIC?

vlowther
2018-01-18 14:46
PXE is a pseudo-standard layered on top of DHCP

wdennis
2018-01-18 14:46
I've never done PXE without DHCP being involved...

wdennis
2018-01-18 14:46
lol "pseudo-standard"

vlowther
2018-01-18 14:47
If it isn't an RFC, ANSI, ISO, or similar standard, it is a pseudo-standard. :slightly_smiling_face:

wdennis
2018-01-18 14:47
Welcome to your hell :wink:

kamp.scott
2018-01-18 14:47
@vlowther ok how do i get that data for you from the cli?

kamp.scott
2018-01-18 14:48
drpcli leases list [ { "Addr": "148.251.24.5", "Available": true, "Errors": [], "ExpireTime": "2018-01-18T10:01:58.851340004-05:00", "Meta": {}, "ReadOnly": false, "State": "ACK", "Strategy": "MAC", "Token": "ba:25:96:29:71:1f", "Validated": true },

vlowther
2018-01-18 14:49
drpcli machines get Name:<machine name>

kamp.scott
2018-01-18 14:49
drpcli reservations list [ { "Addr": "148.251.24.4", "Available": true, "Errors": [], "Meta": {}, "NextServer": "", "Options": [], "ReadOnly": false, "Strategy": "MAC", "Token": "da:45:cf:f6:6b:11", "Validated": true }, { "Addr": "148.251.24.5", "Available": true, "Errors": [], "Meta": {}, "NextServer": "", "Options": [], "ReadOnly": false, "Strategy": "MAC", "Token": "ba:25:96:29:71:1f", "Validated": true } ]

vlowther
2018-01-18 14:49
will get a single name.

vlowther
2018-01-18 14:50
@kamp.scott PM that to me?

shane
2018-01-18 14:50
(please use "Snippet" -- the plus symbol to left of input box -- in Slack for code like the above)

shane
2018-01-18 14:50
(it helps make long info collapsible and makes the channel more readable)

vlowther
2018-01-18 14:50
yeah, that too. :slightly_smiling_face:

wdennis
2018-01-18 14:51
And remember to filter the gohai stuff out -- `drpcli machines list | jq '.[] | del(.Params."gohai-inventory")'`

vlowther
2018-01-18 14:51
aw, but I worked so hard to write that. :slightly_smiling_face:

wdennis
2018-01-18 14:52
Dude - it's awesome, but oh the output lines :joy:

kamp.scott
2018-01-18 14:52
jeeeez how do you pm someone in slack

shane
2018-01-18 14:53
scroll down on left panel

shane
2018-01-18 14:53
find Direct Messages - click on + to pop up selection panel

shane
2018-01-18 14:53
or - you can use `/msg` command

shane
2018-01-18 14:54
or `/dm` command

kamp.scott
2018-01-18 14:54
yupp got it

kamp.scott
2018-01-18 15:02
so anyway... not that i want to be rebooting our mail server often but it did just happen. lucky for us it wasnt totally automated or wed have a fresh install

kamp.scott
2018-01-18 15:02
and no more mailserver

zehicle
2018-01-18 16:34
@kamp.scott I'd suggest not having install-os as a default if you are mixing production servers into your process. Discovery is a safer default since it's not destructive.

kamp.scott
2018-01-18 17:14
welp @vlowther solved my existing systems issue nicely

vlowther
2018-01-18 17:19
tl;dr: to keep DRP from messing with a machine, create a reservation for it and create a machine with the reserved IP, stage none and bootenv local

vlowther
2018-01-18 17:21
That will make sure it gets a consistent IP, that it has no tasks to run, and that the PXE files we write for it will have it boot to the local disk.
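
A sketch of the two JSON objects that recipe implies. The reservation fields match the `drpcli reservations create` example earlier in the thread; the machine fields (`Name`, `Address`, `Stage`, `BootEnv`) are taken from the machine record shown earlier, and feeding the second object to a `drpcli machines create` call is an assumption by analogy, not something shown here. The machine name is made up.

```python
import json


def protect_machine(name: str, ip: str, mac: str) -> tuple:
    """Build the reservation and machine objects that tell DRP to leave a box alone."""
    reservation = {"Addr": ip, "Token": mac, "Strategy": "MAC"}
    machine = {
        "Name": name,
        "Address": ip,       # same IP as the reservation
        "Stage": "none",     # no tasks to run
        "BootEnv": "local",  # PXE files point it at the local disk
    }
    return reservation, machine


if __name__ == "__main__":
    reservation, machine = protect_machine(
        "mailserver", "148.251.24.4", "da:45:cf:f6:6b:11")
    print(json.dumps(reservation))
    print(json.dumps(machine))
```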

kamp.scott
2018-01-18 17:39
ok... cool all machines created.... now onto configuring a workflow....

kamp.scott
2018-01-18 17:39
or should i tackle kribs for my infrastructure first :slightly_smiling_face:

shane
2018-01-18 18:43
KRIBs is just a workflow - nothing more

kamp.scott
2018-01-18 19:16
seems the work flow..... ive got 3 systems that did the centos-install and rebooted, now they just seem to be sitting here

kamp.scott
2018-01-18 19:18
static 4a4900b9-4116-4057-aee0-84219ad6d12b 148.251.24.13 local local
static 7b17aa61-2b44-4d1a-8333-d3ea105bc1d1 148.251.24.14 local local
static 8580bec6-c2d2-420e-bc10-42bf87c25c89 148.251.24.15 local local

shane
2018-01-18 19:29
"sitting where"

shane
2018-01-18 19:30
you have those 3 systems set to boot to local disk - not to do any provisioning with "local/local"

kamp.scott
2018-01-18 19:31
drpcli profiles show k8s-cluster | jq -r '.Params."krib/cluster-master"'
null

kamp.scott
2018-01-18 19:33
@shane i followed the kribs guide

kamp.scott
2018-01-18 19:34
for install-to-local-disk mode:
centos-7-install -> runner-service:Success
runner-service -> finish-install:Stop
finish-install -> docker-install:Success
docker-install -> krib-install:Success
krib-install -> complete:Success
discover -> sledgehammer-wait:Success
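
To make that stage map easier to follow, here is a rough sketch that parses the `stage -> next:Word` entries and walks the chain from a starting stage. The parser and the function names are illustrative only; the word after the colon is kept as-is without interpreting it.

```python
def parse_stage_map(lines: list) -> dict:
    """Parse 'current -> next:Word' entries into {current: (next, word)}."""
    stage_map = {}
    for line in lines:
        current, _, rest = line.partition("->")
        next_stage, _, word = rest.strip().partition(":")
        stage_map[current.strip()] = (next_stage.strip(), word.strip())
    return stage_map


def walk(stage_map: dict, start: str) -> list:
    """Follow the map from `start` until a stage has no successor."""
    path = [start]
    while path[-1] in stage_map:
        next_stage, _ = stage_map[path[-1]]
        if next_stage in path:  # guard against accidental loops
            break
        path.append(next_stage)
    return path


if __name__ == "__main__":
    entries = [
        "centos-7-install -> runner-service:Success",
        "runner-service -> finish-install:Stop",
        "finish-install -> docker-install:Success",
        "docker-install -> krib-install:Success",
        "krib-install-> complete:Success",
        "discover->sledgehammer-wait:Success",
    ]
    print(" -> ".join(walk(parse_stage_map(entries), "centos-7-install")))
```

Walking from `centos-7-install` ends at `complete`, so a machine has to be placed in the first stage of the chain for anything to happen.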

shane
2018-01-18 19:35
please show me the command you used to display your 3 machines above

kamp.scott
2018-01-18 19:39
@shane that was pasted from the ui under machines

shane
2018-01-18 19:40
right - so "local" stage and "local" bootenv tells Digital Rebar Provision to ignore the Machines and have them just boot to the locally installed operating system

shane
2018-01-18 19:40
you can not perform any further workflow or provisioning

shane
2018-01-18 19:40
you must change the machines Stage to start it on the work flow for KRIB

kamp.scott
2018-01-18 19:40
i did that

kamp.scott
2018-01-18 19:40
i edited each and added the profile

kamp.scott
2018-01-18 19:41
then rebooted per the instructions

kamp.scott
2018-01-18 19:41
they then installed centos-7 and rebooted then nothing more


shane
2018-01-18 19:41
the last few paragraphs of that section

shane
2018-01-18 19:41
you need to change the Stage of the Machines to start them on the KRIB install process

shane
2018-01-18 19:42
starting from: _"Change stage on the Machines to initiate the Workflow transition."_

shane
2018-01-18 19:44

ctrees
2018-01-18 19:45
and remember what @shane told us last night... please note that a Stage and a BootEnv are two different things (see 7:20PM shane)

kamp.scott
2018-01-18 19:49
ok maybe im an idiot but ive done what you both are telling me to do

ctrees
2018-01-18 19:50

ctrees
2018-01-18 19:51
or I'm still confused on when you 'should could' Stage vs BootEnv

shane
2018-01-18 19:52
can you please provide a screenshot of UI Machines panel - or run this `drpcli` command: `drpcli machines list | jq -r '.[] | "\(.Name) : \(.Stage) : \(.BootEnv)"'`

shane
2018-01-18 19:53
@ctrees see the Note - this is not using Workflow/Stages to move a machine through provisioning process

shane
2018-01-18 19:54
using a workflow and stages is different

shane
2018-01-18 19:54
that process highlights manually moving a machine through provisioning steps

ctrees
2018-01-18 19:59
yea... but the instructions on the quickstart did not work 4me (I had to use stage, not bootenv... as the machine was sitting in sledgehammer... I think..) and still wrapping my head around it... I get now that workflow need to plan for expected future events... I was going back over rob and greg video of some of that to see what I was missing...

kamp.scott
2018-01-18 19:59
drpcli machines list | jq -r '.[] | "\(.Name) : \(.Stage) : \(.BootEnv)"'
http://static.18.24.251.148.clients.your-server.de : centos-7-install : centos-7-install
http://static.16.24.251.148.clients.your-server.de : centos-7-install : centos-7-install
http://static.17.24.251.148.clients.your-server.de : centos-7-install : centos-7-install

kamp.scott
2018-01-18 19:59
ok this is 3 new machines

kamp.scott
2018-01-18 20:00
my_machines stage ssh-access
+ drpcli machines stage 86064587-bd6f-4999-ad0b-772ee5ed12c5 ssh-access
Error: ValidationError: machines/86064587-bd6f-4999-ad0b-772ee5ed12c5: Can not change stages with pending tasks unless forced
+ set +x
+ drpcli machines stage 1d064d6f-cf06-4a11-a1d6-7b23766ba5a0 ssh-access
Error: ValidationError: machines/1d064d6f-cf06-4a11-a1d6-7b23766ba5a0: Can not change stages with pending tasks unless forced
+ set +x
+ drpcli machines stage d72597cb-390a-4f96-8cf8-0b72dca2364b ssh-access
Error: ValidationError: machines/d72597cb-390a-4f96-8cf8-0b72dca2364b: Can not change stages with pending tasks unless forced
+ set +x

kamp.scott
2018-01-18 20:01
jeeezzz....

kamp.scott
2018-01-18 20:12
my_machines action powercycle
+ drpcli machines action 86064587-bd6f-4999-ad0b-772ee5ed12c5 powercycle
Error: GET: machines/86064587-bd6f-4999-ad0b-772ee5ed12c5: Action powercycle: Not Found
+ set +x
+ drpcli machines action 1d064d6f-cf06-4a11-a1d6-7b23766ba5a0 powercycle
Error: GET: machines/1d064d6f-cf06-4a11-a1d6-7b23766ba5a0: Action powercycle: Not Found
+ set +x
+ drpcli machines action d72597cb-390a-4f96-8cf8-0b72dca2364b powercycle
Error: GET: machines/d72597cb-390a-4f96-8cf8-0b72dca2364b: Action powercycle: Not Found
+ set +x

kamp.scott
2018-01-18 20:12
doesnt even work

shane
2018-01-18 20:13
did you install an IPMI plugin to support power actions ?

shane
2018-01-18 20:13
if not you can't power cycle

shane
2018-01-18 20:14
I failed to mention that in the Doc - you need a Plugin to implement the IPMI power actions

shane
2018-01-18 20:15
actually - I did mention it - briefly

kamp.scott
2018-01-18 20:15
i rebooted them from console

kamp.scott
2018-01-18 20:15
they're installing centos-7 again

kamp.scott
2018-01-18 20:16
then theyll probably just reboot like before and do nothing

kamp.scott
2018-01-18 20:16
centos-7-install Start
runner-service Success
finish-install Stop
docker-install Success
krib-install Success
complete Success
discover Start
sledgehammer-wait Success

kamp.scott
2018-01-18 20:16
thats the work flow for the profile

greg
2018-01-18 20:16
What profile did you create this in?

kamp.scott
2018-01-18 20:16
k8s

kamp.scott
2018-01-18 20:17
i added the profile

greg
2018-01-18 20:17
Is that the only profile on the machines?

kamp.scott
2018-01-18 20:17
well there is default

greg
2018-01-18 20:17
default?

kamp.scott
2018-01-18 20:19

kamp.scott
2018-01-18 20:20
each machine has the k8s-cluster profile

greg
2018-01-18 20:20
`drpcli profiles show k8s-cluster`

kamp.scott
2018-01-18 20:20
and sure enough they are just sitting there again

kamp.scott
2018-01-18 20:22

kamp.scott
2018-01-18 20:22
oh wait... its in docker install now

kamp.scott
2018-01-18 20:22
so something changed

greg
2018-01-18 20:23
The profile shows progress being made.

shane
2018-01-18 20:27
I updated the KRIB doc (in `latest`) to highlight the IPMI actions and Plugin Provider status

shane
2018-01-18 20:30
@kamp.scott - I wouldn't recommend leaving those pastebins up - you have sensitive tokens and cluster admin related information in those

kamp.scott
2018-01-18 20:34
hrmmm seems hung...2 are still on docker-install

kamp.scott
2018-01-18 20:35
ill leave it be for a while

greg
2018-01-18 20:35
Or you could check the jobs system and see what they say.

greg
2018-01-18 20:35
For example, the jobs for each machine running docker-install. There should be a job with a log for that.

kamp.scott
2018-01-18 22:30
nope seems its hung up on 2 nodes in krib-install

shane
2018-01-18 22:35
check the job log and see why

kamp.scott
2018-01-18 23:13
@shane i would if the ui would load it

kamp.scott
2018-01-18 23:13
anyway to do it from cli ?

kamp.scott
2018-01-18 23:14
2018-01-18T17:44:12 n/a 5ec554d0-0b5d-4bf9-82d9-296ad30af345 e99c4d85-73ca-461e-a599-fc8463a00a7a krib-install krib-install

shane
2018-01-18 23:15
`drpcli jobs list` for all jobs, or `drpcli jobs list | jq ".[] | select(.Machine==\"$UUID\")"` for a single machine's jobs

shane
2018-01-18 23:15
(where $UUID is a variable holding the UUID of the machine you'd like to inspect jobs for)
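
The same filter as a short Python sketch, for cases where the `jq` quoting gets awkward. It relies only on the `Machine` field used in the `jq` expression above; the `Uuid` field and the sample values are assumptions for illustration.

```python
def jobs_for_machine(jobs: list, machine_uuid: str) -> list:
    """Keep only the jobs whose Machine field matches the given machine UUID."""
    return [job for job in jobs if job.get("Machine") == machine_uuid]


if __name__ == "__main__":
    # Hypothetical job records; real ones carry many more fields.
    jobs = [
        {"Uuid": "job-1", "Machine": "machine-a"},
        {"Uuid": "job-2", "Machine": "machine-b"},
        {"Uuid": "job-3", "Machine": "machine-a"},
    ]
    for job in jobs_for_machine(jobs, "machine-a"):
        print(job["Uuid"])
```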

kamp.scott
2018-01-19 09:29
ok well seems some cli things did get my KRIBs installed though i cant seem to ssh into it

kamp.scott
2018-01-19 09:31
"ssh-access": { "user1": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHr/fzI3B7dQ6KEXyPVjA0iXiPwyyAFN2/NwTeBySp290kp4wMKMUQo0cZs8hxZRhJv51zIGGcT46CyASOy9R7vHJJwP+RYVA4LuGKhbFvI4nB3BdCF2M+Rbsc+RR7X4NIVdsMIbbCnYKBWrk4cb8NgXLicns/pH5gL1ZFG2Zecu8H0m7JYyuRNixVJRu4Gk5iGZwGfqyL5iOvQhuD5FpCmQXYoU3CGSALFzRh8DfFDA9ZhdjfR2b/x9feeBdTjLB8kEa0YmqBgPwsW1r8GiV0pRvW8ROEx6RJCRhaUGcg2aE+Re6s+h6IiHPv59TzQjwWNoxDKhSj+WjPg3Jhh+PZ dingo@new-host-2" }

kamp.scott
2018-01-19 09:31
is the key config from the yaml

kamp.scott
2018-01-19 09:32
i tried both as root and as user1

kamp.scott
2018-01-19 09:32
keeps asking me for a password

ctrees
2018-01-19 13:19
I was just messing with this... and found this helpful:


ctrees
2018-01-19 13:22
Have not done the krib install... but was about to ask about this...

ctrees
2018-01-19 13:23
RackN-Portal -> Profiles -> root-access-example

ctrees
2018-01-19 13:24
access-keys: { "greg": "ssh-rsa blahblah... galthaus@Gregs-MacBook-Pro.local" }

ctrees
2018-01-19 13:25
access-ssh-root-mode: "without-password"

ctrees
2018-01-19 13:30
I was looking at how the "user": "ssh-rsa ... user@machine" association is done... but in your case I 'THINK' you just need the access-ssh-root-mode: "without-password" ? part ?


ctrees
2018-01-19 13:34
access_keys Map of strings The key is the name of the public key. The value is the public key. All keys are placed in the .authorized_keys file of root.

kamp.scott
2018-01-19 14:32
ssh-access: { "user1": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHr/fzI3B7dQ6KEXyPVjA0iXiPwyyAFN2/NwTeBySp290kp4wMKMUQo0cZs8hxZRhJv51zIGGcT46CyASOy9R7vHJJwP+RYVA4LuGKhbFvI4nB3BdCF2M+Rbsc+RR7X4NIVdsMIbbCnYKBWrk4cb8NgXLicns/pH5gL1ZFG2Zecu8H0m7JYyuRNixVJRu4Gk5iGZwGfqyL5iOvQhuD5FpCmQXYoU3CGSALFzRh8DfFDA9ZhdjfR2b/x9feeBdTjLB8kEa0YmqBgPwsW1r8GiV0pRvW8ROEx6RJCRhaUGcg2aE+Re6s+h6IiHPv59TzQjwWNoxDKhSj+WjPg3Jhh+PZ dingo@new-host-2" }

kamp.scott
2018-01-19 14:33
is whats in my krib profile

kamp.scott
2018-01-19 14:33
and doesnt seem to work

kamp.scott
2018-01-19 14:35

ctrees
2018-01-19 14:35
Well.. the krib file may assume that the global profile has the access-ssh-root-mode: "without-password"

ctrees
2018-01-19 14:42
the 'keep asking for pass' seems like -> access-ssh-root-mode: "without-password" is missing...

kamp.scott
2018-01-19 14:52
@ctrees i get it but i think if that was the case it would be in the K8s profile... right?

ctrees
2018-01-19 14:56
Well.. when they did the demo, they were using a packet profile (I THINK)... and using packet the ssh key setup is done in that profile...

ctrees
2018-01-19 14:57
aka... it's a boot-strap-env thing ?

ctrees
2018-01-19 14:59
BTW... I'm sure shane can get you some packet 'free time' setup a packet endpoint and I bet krib works 'out-of-box' and he has a terraform example for that in the repo...

kamp.scott
2018-01-19 14:59
ihave my own servers

greg
2018-01-19 15:25
@kamp.scott - it is NOT `ssh-access`. It is `access-keys`. Check the params in the UX. You will see the keys to set.

greg
2018-01-19 15:26
Updating docs to change it from `access_keys` to `access-keys`
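
A minimal sketch of that fix applied to a profile's params, assuming the params are a plain JSON object and the only problem is the key name. The function name is invented, and the key value below is a placeholder rather than a real public key.

```python
def fix_access_keys(params: dict) -> dict:
    """Return a copy of the profile params with 'ssh-access' renamed to 'access-keys'."""
    fixed = dict(params)
    if "ssh-access" in fixed and "access-keys" not in fixed:
        fixed["access-keys"] = fixed.pop("ssh-access")
    return fixed


if __name__ == "__main__":
    params = {"ssh-access": {"user1": "ssh-rsa AAAA... user@host"}}
    print(fix_access_keys(params))
```

Params that already use `access-keys` pass through unchanged.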

kamp.scott
2018-01-19 15:26
@greg then the profile is wrong in the docs on KRIB ?

greg
2018-01-19 15:27
Yes - krib is wrong and needs to be updated.

kamp.scott
2018-01-19 15:29
@greg so i need to reinstall the whole cluster ?

greg
2018-01-19 15:31
probably safest.

kamp.scott
2018-01-19 15:32
ughhh

kamp.scott
2018-01-19 15:32
:

greg
2018-01-19 15:33
Basically, I'm not sure I can walk you through the running of a single stage to set the keys.

greg
2018-01-19 15:34
For example, I think you could set the machines to the ssh-access stage.

greg
2018-01-19 15:34
It will rerun the ssh-access task.

greg
2018-01-19 15:34
Assuming you've edited the profile to have the correct key.

greg
2018-01-19 15:34
then you can set the stage back to complete.

greg
2018-01-19 15:36
That should work, but I don't know if you have machines in the correct state, if you have the job knowledge to verify that the ssh-access task ran, and when to decide it was successful.

greg
2018-01-19 15:37
Okay - here are the steps to do (thinking about them):

greg
2018-01-19 15:41
1. Run this: `drpcli machines list | jq -r '.[] | "\(.Name), \(.Runnable), \(.Stage), \(.CurrentTask)"'`

greg
2018-01-19 15:41
The machines you want to manipulate should have "<name>, true, complete, 0"

greg
2018-01-19 15:42
2. Edit the profile to fix the `ssh-access` to `access-keys`

greg
2018-01-19 15:42
3. For each machine, set the stage to `ssh-access`

greg
2018-01-19 15:44
4. Run the command from #1 until the stage shows `ssh-access` and currentTask shows `1`

greg
2018-01-19 15:44
5. Once that is done, you should be able to ssh into the boxes.

greg
2018-01-19 15:45
6. For each machine, set the stage to `complete`
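
The two checks in the steps above (before step 3 and at step 4) can be sketched as simple predicates over a machine record. This assumes the `Runnable`, `Stage` and `CurrentTask` fields printed by the command in step 1; the function names are made up.

```python
def ready_to_modify(machine: dict) -> bool:
    """Step 1 check: runnable, in the 'complete' stage, with CurrentTask at 0."""
    return (machine["Runnable"] is True
            and machine["Stage"] == "complete"
            and machine["CurrentTask"] == 0)


def ssh_access_done(machine: dict) -> bool:
    """Step 4 check: the machine is in 'ssh-access' and CurrentTask has reached 1."""
    return machine["Stage"] == "ssh-access" and machine["CurrentTask"] == 1


if __name__ == "__main__":
    before = {"Name": "node1", "Runnable": True, "Stage": "complete", "CurrentTask": 0}
    after = {"Name": "node1", "Runnable": True, "Stage": "ssh-access", "CurrentTask": 1}
    print(ready_to_modify(before), ssh_access_done(before))
    print(ready_to_modify(after), ssh_access_done(after))
```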

greg
2018-01-19 15:45
@kamp.scott that may get you back to ssh-access.

kamp.scott
2018-01-19 16:09
ok i reinstalled after changing the ssh-access to access-keys

kamp.scott
2018-01-19 16:09
im in now

kamp.scott
2018-01-19 16:14
@greg is there no kubernetes ui exposed on the master public ip ?

shane
2018-01-19 16:16
please see the Video in the KRIB documentation link

shane
2018-01-19 16:16
it's discussed there

kamp.scott
2018-01-19 16:37
no not really. i dont want to use a local proxy, i want to expose the ui on the public side

ctrees
2018-01-19 16:38
oh... sorry..

kamp.scott
2018-01-19 16:39
no idea why they would do it this way....whats the logic

kamp.scott
2018-01-19 16:40
aside from that i dont have kubectl installed on my local laptop

kamp.scott
2018-01-19 16:43
@shane so no joy here i guess

kamp.scott
2018-01-19 17:54
$KRIB


kamp.scott
2018-01-19 18:00
127.0.0.1 sent an invalid response. ERR_SSL_PROTOCOL_ERROR

kamp.scott
2018-01-19 18:00
jesus

greg
2018-01-19 18:02
What command did you run?


kamp.scott
2018-01-19 18:15
ok from my laptop

kamp.scott
2018-01-19 18:16
export KUBECONFIG=`pwd`/admin.conf
kubectl get nodes
2018-01-19 13:15:33.601486 I | proto: duplicate proto type registered: google.protobuf.Any
2018-01-19 13:15:33.601637 I | proto: duplicate proto type registered: google.protobuf.Duration
2018-01-19 13:15:33.601688 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAME STATUS AGE VERSION
http://static.27.24.251.148.clients.your-server.de Ready 2h v1.9.2
http://static.28.24.251.148.clients.your-server.de Ready 2h v1.9.2
http://static.8.24.251.148.clients.your-server.de Ready 2h v1.9.2

kamp.scott
2018-01-19 18:16
then

kamp.scott
2018-01-19 18:16
kubectl proxy
2018-01-19 13:16:29.223153 I | proto: duplicate proto type registered: google.protobuf.Any
2018-01-19 13:16:29.223292 I | proto: duplicate proto type registered: google.protobuf.Duration
2018-01-19 13:16:29.223334 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Starting to serve on 127.0.0.1:8001

kamp.scott
2018-01-19 18:17
then in a browser

kamp.scott
2018-01-19 18:17

kamp.scott
2018-01-19 18:17
This site can't provide a secure connection 127.0.0.1 sent an invalid response. ERR_SSL_PROTOCOL_ERROR

greg
2018-01-19 18:17
Sorry - k8s api doesn't want ssl, I think.

kamp.scott
2018-01-19 18:17
fix the docs :slightly_smiling_face:

shane
2018-01-19 18:17
It's not a Digital Rebar problem

shane
2018-01-19 18:18
it's a Kubernetes thing - please go read the Kubernetes documentation to understand how to use it

kamp.scott
2018-01-19 18:18
24.3.8.4. Use Kubernetes Dashboard via Proxy

kamp.scott
2018-01-19 18:18
@shane i followed the doc

kamp.scott
2018-01-19 18:18
can i just expose the web ui on the public interface ?

greg
2018-01-19 18:19
Did changing https to http address your problem, @kamp.scott

greg
2018-01-19 18:19
I doubt it. That is a k8s issue. Learn about k8s.

shane
2018-01-19 18:21
excellent Kubernetes doc on the UI and how to interact with it: https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/

greg
2018-01-19 18:21
Actually, it should be https.

greg
2018-01-19 18:21
according to that doc. So, I don't know.
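
One possible reading of the http/https confusion, offered as a sketch rather than a confirmed diagnosis: `kubectl proxy` itself listens with plain HTTP on 127.0.0.1:8001, and in the dashboard URL from the Kubernetes docs of that period the `https:` appears inside the service path segment, not as the browser scheme. The namespace and service name below are the upstream defaults and are assumptions about what KRIB deploys.

```python
def dashboard_proxy_url(host: str = "127.0.0.1", port: int = 8001,
                        namespace: str = "kube-system",
                        service: str = "kubernetes-dashboard",
                        service_scheme: str = "https") -> str:
    """Build the dashboard URL as served through a local `kubectl proxy`.

    The outer scheme is http because the local proxy does not speak TLS;
    service_scheme only tells the API server how to reach the service.
    """
    return (f"http://{host}:{port}/api/v1/namespaces/{namespace}"
            f"/services/{service_scheme}:{service}:/proxy/")


if __name__ == "__main__":
    print(dashboard_proxy_url())
```

Pointing a browser at `https://127.0.0.1:8001/...` would be consistent with the ERR_SSL_PROTOCOL_ERROR reported earlier.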

shane
2018-01-19 18:22
yes - Kubernetes uses Certs by default

shane
2018-01-19 18:22
@kamp.scott - KRIB is a demonstration pattern of how to create Immutable or Installed Kubernetes clusters using Digital Rebar content

shane
2018-01-19 18:23
we are not a Kubernetes shop - and we don't have expertise in it - we just hack around enough to get it up and running

kamp.scott
2018-01-19 18:23
@shane well thats nice and the work is useful, however to be properly used the ui should be exposed on the master ip :slightly_smiling_face:

kamp.scott
2018-01-19 18:23
ill see if i can work to expose it myself

shane
2018-01-19 18:24
we disagree fundamentally - we prefer using a CLI here ... :slightly_smiling_face:

kamp.scott
2018-01-19 18:24
@shane we arent serving just "us" i have clients

kamp.scott
2018-01-19 18:24
i was seeing if in fact it was viable for a client to spin up a cluster

greg
2018-01-19 18:25
So, @kamp.scott - how does k8s handle multitenancy?

wdennis
2018-01-19 18:25
@Dingo Actually, in Kubernetes, everything runs on internal private IP space... Nothing gets exposed "externally" unless an admin exposes it...

kamp.scott
2018-01-19 18:26
@greg via calico or hypernetes

kamp.scott
2018-01-19 18:27
same with DC/OS

kamp.scott
2018-01-19 18:27
and mesosphere

greg
2018-01-19 18:27
hmm - then you may want to start working on a deployment workload for hypernetes. Looks interesting.

whitlow.john
2018-01-19 18:27
has joined #community201801

kamp.scott
2018-01-19 18:28
@greg yeah as i learn more i can contribute more

shane
2018-01-19 18:28
@whitlow.john - welcome

kamp.scott
2018-01-19 18:28
both in knowledge and profiles

greg
2018-01-19 18:28
The default k8s doesn't. It doesn't understand multitenancy. It expects external RBAC.

wdennis
2018-01-19 18:29
@Dingo DRP "KRIB" is just a "means to an end" to get nodes to 1) get a requisite OS installed; 2) install k8s pre-reqs (i.e. relevant container runtime system, such as Docker); and 3) run the "kubeadm" k8s deployment tool and pass in relevant variables from the DRP profile

kamp.scott
2018-01-19 18:30
@greg yeah i get that, i however was planning to migrate services to it that require external exposure

kamp.scott
2018-01-19 18:30
@wdennis don't worry i'm not knocking it, i was just surprised it was exposed only internally

kamp.scott
2018-01-19 18:30
we could mention that in the docs also

greg
2018-01-19 18:30
Much like @wdennis was pointing out. Services are constructed internally. k8s in AWS is nice because it automatically integrates with ELBs so that ports can be exposed and routed through the proxies. In physical envs, this is a snowflake feature. You could configure node ports (Did you change that on your ui yaml definition file?) or add lb services that hook into a provider mechanism.

wdennis
2018-01-19 18:30
It's strictly a k8s thing

kamp.scott
2018-01-19 18:31
right, like Traefik

wdennis
2018-01-19 18:31
No failing of DRP/KRIB

kamp.scott
2018-01-19 18:31
auto web proxying for a cluster

kamp.scott
2018-01-19 18:31
@wdennis nope, just different than i'm used to is all

greg
2018-01-19 18:32
We chose the result of running kubeadm with defaults, like the k8s community starts with. We didn't want to add FUD. Just simple straight usage. You are a truly advanced user. You are beyond KRIB.

wdennis
2018-01-19 18:33
I myself am working on a Træfik proxy for my cluster; the problem is I want to do a two-interface one, which I am not sure is supported...

kamp.scott
2018-01-19 18:34
as a side note is there a coreos drpcli bootenvs uploadiso for it ?

kamp.scott
2018-01-19 18:36
@wdennis should be feasible

wdennis
2018-01-19 18:37
@kamp.scott if you figure it out, tell me how... the Træfik guys themselves don't know if it's doable... (join Træfik Slack, read #kubernetes channel)

wdennis
2018-01-19 18:38
But anyways, that's not DRP-related, so let's not discuss that here...

zehicle
2018-01-19 19:10
IMHO, KRIB is showing how you can join nodes to a cluster immutably. That's what we hear people asking for - reboot, run in memory, join cluster. The fact that it builds the API server is a necessary bonus. Ideally, you could use Kubespray to build the cluster since it has some real operations choices.

ctrees
2018-01-19 21:12
So I was attempting the centos-7-install and it seemed to get to Stage: local BootEnv: local but does not come up... I 'suspect' the actual HD... I looked into the Job logs but did not see anything...

ctrees
2018-01-19 21:13
again... I suspect maybe a bad local disk drive

greg
2018-01-19 21:18
getting to local usually means that the kickstart post-install finished. Last step is to set the local bootenv/stage. From there, os install should finish and reboot. May have to check the console to see what is happening.

ctrees
2018-01-19 21:20
yea it went to boot local and is just sitting there...

ctrees
2018-01-19 21:20
like it doesnt have a boot sector...

ctrees
2018-01-19 21:23
and yup... I think drp did everything... I think it's something on the drive as these HP SAS I am using I'm not sure of... but was looking for some discover feed back about the state of the drive... or if that's possible... otherwise I better go validate the drives are bootable

shane
2018-01-19 21:23
did you maybe incidentally apply a `kernel-console` change (eg to `TTYS1` or similar) to global profile (or applied profile to machine) ??

greg
2018-01-19 21:23
okay - so - that leads me to believe that we installed to sda (but that may not be in the boot list).

ctrees
2018-01-19 21:25
The only global profile with an ssh key

shane
2018-01-19 21:25
cool - greg is probably on right track then

ctrees
2018-01-19 21:25
yea I think so... smells like the boot list...

shane
2018-01-19 21:25
if the console is switched to an invalid one, as soon as it hits initrd - the console "goes blank", and then if there's a boot problem - you'll never see it

shane
2018-01-19 21:26
but once the OS comes up successfully on boot medium - you should get a console back as soon as getty (or similar) spawns them

ctrees
2018-01-19 21:26
ok... that's good to know...

shane
2018-01-19 21:26
*if* it doesn't come up successfully - you'll never know which/what/why ... :disappointed:

ctrees
2018-01-19 21:27
I'll go dig in the bios abit...

kamp.scott
2018-01-21 00:04
Kubeconfig Please select the kubeconfig file that you have created to configure access to the cluster. To find out more about how to configure and use kubeconfig file, please refer to the Configure Access to Multiple Clusters section. Token Every Service Account has a Secret with valid Bearer Token that can be used to log in to Dashboard. To find out more about how to configure and use Bearer Tokens, please refer to the Authentication section.

kamp.scott
2018-01-21 00:04
ok going deeper... seems kubeconfig wants the config file i'm using, however it says it is not valid?

kamp.scott
2018-01-21 00:04
i know.. dont tell me... its a kubernetes issue right...

zehicle
2018-01-21 00:11
if the system generated a kubeconfig file then it should be working. check out that file

kamp.scott
2018-01-21 00:20
i have the admin.conf as specified in the guide. i can run kubectl commands.... i just cant seem to login to the gui

kamp.scott
2018-01-21 00:22
basically because we really have no idea how the deployed cluster is configured, i tried to deploy an app and am getting this

kamp.scott
2018-01-21 00:22
Unable to mount volumes for pod "mailserver-6d874d4c67-nm94x_default(790fcb47-fe3b-11e7-8bb5-0a90a721fa77)": timeout expired waiting for volumes to attach/mount for pod "default"/"mailserver-6d874d4c67-nm94x". list of unattached/unmounted volumes=[mailserver-claim0]

kamp.scott
2018-01-21 00:23
i know ... it's a kubernetes thing but we don't have a clue how it's configured or deployed or how to login

kamp.scott
2018-01-21 00:24
and yes i am a bit new to kubernetes; mesosphere, Triton and DC/OS i am familiar with

kamp.scott
2018-01-21 00:26
starts sounding like a whiney #^%& b1tch

zehicle
2018-01-21 00:39
to login to the Kube UI, use kubectl [configfile parm] proxy

zehicle
2018-01-21 00:40
that will give you a local proxy

zehicle
2018-01-21 00:40

zehicle
2018-01-21 00:40
assuming that's your proxy url

zehicle
2018-01-21 00:41
the configuration is default Kubeadm. you can refer to those docs for kube workflow
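
The `kubectl [configfile parm] proxy` step above can be wrapped in a small helper. This is only a sketch: `--kubeconfig` and `--port` are standard kubectl flags and 8001 is the default proxy port, but the `admin.conf` path is just the file name mentioned in this thread and the helper name is made up for illustration.

```python
import shlex

def kubectl_proxy_cmd(kubeconfig, port=8001):
    """Build the argv for a local `kubectl proxy` using a given kubeconfig."""
    return ["kubectl", "--kubeconfig", kubeconfig, "proxy", "--port", str(port)]

# admin.conf is the file name from the thread; adjust the path for your setup.
cmd = kubectl_proxy_cmd("admin.conf")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually start the proxy: subprocess.run(cmd, check=True)
```

The proxy only listens on localhost by default, which matches the point made earlier that nothing is exposed externally unless an admin exposes it.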

kamp.scott
2018-01-21 00:53
lastly.. what was the drpcli command to change the root access-key

kamp.scott
2018-01-21 00:53
@zehicle ^^

kamp.scott
2018-01-21 00:53
and thanks


zehicle
2018-01-21 18:38

ctrees
2018-01-22 14:47
template expansions follow go templates ? correct ? https://golang.org/pkg/text/template/

shane
2018-01-22 14:47
@ctrees yes

shane
2018-01-22 14:48
our Data Arch docs have a bit of info on the various golang template pieces


ctrees
2018-01-22 14:48

shane
2018-01-22 14:51
presumably, yes - but I'm not certain if our YAML handling follows that exact spec - have to rely on @greg or @vlowther to answer that

ctrees
2018-01-22 14:52
yea I was reading that and saw reference to the go docs in http://provision.readthedocs.io/en/latest/doc/arch.html

greg
2018-01-22 14:52
Close enough.

greg
2018-01-22 14:52
We use this go package: http://github.com/ghodss/yaml

greg
2018-01-22 14:53
it seems to keep pretty close to the spec. A small nit is that if you want yaml pretty printed cleanly, you need to make sure that you don't have white space at the end of lines.

ctrees
2018-01-22 14:53
Oh thanks @greg my fri foo-bar was the HP's have that built-in raid h/w

vlowther
2018-01-22 14:54
@ctrees: We only use the bits of YAML that can be represented in JSON.

vlowther
2018-01-22 14:54
So no integer map keys and the like.

vlowther
2018-01-22 14:54
Our YAML support is basically there to allow a more human-friendly alternative to JSON.
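
A quick way to see what "only the bits of YAML that can be represented in JSON" means in practice is to round-trip a value through JSON. The sketch below uses only Python's standard `json` module, not the ghodss/yaml package DRP uses; the sample dictionaries are made up for illustration.

```python
import json

def json_safe(value):
    """Return True if value survives a JSON round trip unchanged."""
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        return False

string_keys = {"Name": "example", "Params": {"a": 1}}
integer_keys = {1: "one", 2: "two"}   # legal in YAML, not in JSON

print(json_safe(string_keys))
print(json_safe(integer_keys))
```

JSON object keys are always strings, so the integer keys come back as `"1"` and `"2"` and the round trip no longer matches the original.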

greg
2018-01-22 14:54
oh - yeah that- I just assume it. :slightly_smiling_face:

ctrees
2018-01-22 14:55
... yea I'm sort of sad that yaml seems to have gotten 'accepted' as the default for SDN too... I'd rather look at JSON blobs and not count white spaces

vlowther
2018-01-22 14:55
since YAML has actual support for inlining large chunks of text, which is a thing we use rather a lot. :slightly_smiling_face:

vlowther
2018-01-22 14:56
There is that, but JSON specifically sucks when you are inlining shell scripts

ctrees
2018-01-22 14:57
the "Contents: |+" .... was what started my ... ok, what parse stuff quest..

shane
2018-01-22 14:58
I haven't worked with it much - but what I've seen of TOML - I like it - it's a bit like YAML - but doesn't care about spaces/tabs as much

vlowther
2018-01-22 14:59
The |+ thing basically means "keep the indented lines that follow literally, newlines included, and keep any trailing blank lines" (it's > that folds line breaks into single spaces)

ctrees
2018-01-22 15:01
yea... but I didn't know if there is a 'stop' to that after... I assume it's until the next same yaml tree level (aka \n_ws_ws_ws match)

ctrees
2018-01-22 15:01
aka yaml rule

vlowther
2018-01-22 15:10
Yep.
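
For reference, in the YAML spec `|` starts a literal block (newlines kept), `>` starts a folded block, and a trailing `+` or `-` only controls what happens to trailing newlines; the block itself ends at the first line indented less than its content. The helper below is a hand-written illustration of the three chomping modes on an already-dedented string. It is not a YAML parser and not how DRP processes templates.

```python
def chomp(text, indicator=""):
    """Apply a YAML block-scalar chomping indicator to dedented block text."""
    body = text.rstrip("\n")
    if indicator == "-":          # strip: drop every trailing newline
        return body
    if indicator == "+":          # keep: preserve all trailing newlines
        return text
    # clip (the default): exactly one trailing newline, none for empty content
    return body + "\n" if body else ""

script = "echo hi\n\n\n"          # script body followed by two blank lines
print(repr(chomp(script, "+")))   # what "Contents: |+" would carry
print(repr(chomp(script)))        # what "Contents: |" would carry
print(repr(chomp(script, "-")))   # what "Contents: |-" would carry
```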

ctrees
2018-01-22 15:10
@vlowther is right... that yaml inlining does make it easier to read

shane
2018-01-22 15:11
when I'm looking at templates with inline scripts - I always use the YAML output formatter (eg `drpcli ... <something> ... --format=yaml`)

greg
2018-01-22 21:11
Hi All. v3.6.0 is building and should be out shortly. Stable has been moved. Content updates as well.

greg
2018-01-22 21:13
- thanks!

zehicle
2018-01-22 21:13
well done! :cake:

ctrees
2018-01-22 21:15
I'll pull and run right after my current round of testing reboots...

greg
2018-01-22 21:15
Okay - wait for my all clear. The trees are updated, but the builds aren't quite done yet.

greg
2018-01-22 21:16
I just didn't want people to be caught off guard.

ctrees
2018-01-22 21:17
well.. my cycle tends to take over a day still... I'm not at that '5-min' shake-n bake Shane can do :wink:

shane
2018-01-22 21:20
@wdennis the new content update includes the "kickseed" capabilities as well

wdennis
2018-01-22 21:22
:tada:woo-hoo!:confetti_ball:

kamp.scott
2018-01-22 21:23
ok just noticed another "anomaly" where when installing centos VM under XenServer with 120GB disk rebar only installs a 10GB disk partition

kamp.scott
2018-01-22 21:23
and yes is also seen as xvda

greg
2018-01-22 21:25
We do this:

greg
2018-01-22 21:25
```logvol / --fstype ext4 --name=lv_root --vgname={{.Machine.ShortName}} --size=1 --grow --maxsize=10240 ```

greg
2018-01-22 21:26
Which may be capped at 10GB. I think grow grows it. Maybe not.
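
As far as the kickstart documentation describes it, `--grow` lets the logical volume expand into free space and `--maxsize` caps that growth in MiB, so `--maxsize=10240` would stop the root volume at 10 GiB no matter how large the disk is. The arithmetic can be sketched like this; the function is a simplified model for illustration, not Anaconda's actual allocation logic, and it ignores space taken by other partitions.

```python
def logvol_size_mib(free_mib, size, grow=False, maxsize=None):
    """Estimate the final size of a kickstart logvol, in MiB."""
    if size > free_mib:
        raise ValueError("not enough free space in the volume group")
    if not grow:
        return size                      # fixed-size volume
    if maxsize is None:
        return free_mib                  # grow into everything available
    return max(size, min(free_mib, maxsize))

disk_mib = 120 * 1024                    # the 120GB disk from the report above
print(logvol_size_mib(disk_mib, size=1, grow=True, maxsize=10240))
print(logvol_size_mib(disk_mib, size=1, grow=True))
```

Under this model the 10GB partition seen on the 120GB XenServer disk is the expected result of the template line, and dropping or raising `--maxsize` would be the change to try.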

greg
2018-01-22 21:52
- builds are done. 3.6.0 is out

ctrees
2018-01-22 23:23
I'm having an ssh key issue... I set the access-ssh in global:
access-keys: { "drpops": "ssh-rsa AAA...blahblah...U9n31 drpops@drpe.drpfeature.test" }
access-ssh-root-mode: "without-password"
Checked jobs on the machine... it looks like it ran the template... but it keeps asking for password...

ctrees
2018-01-22 23:24
is there a default root login on head so I can go figure out what I did ?

greg
2018-01-22 23:24
are you logging in as root or drpops?

greg
2018-01-22 23:24
It sets the root keys.

ctrees
2018-01-22 23:24
drpops

ctrees
2018-01-22 23:25
which brings up another question... in rob's demo the key is "user", I was debating if I should have that or ?? wasn't sure if the keyname matters ?

ctrees
2018-01-22 23:26
the key of the ssh-rsa string...

greg
2018-01-22 23:26
The keyname is just an id for you to recognize the key.

ctrees
2018-01-22 23:27
[drpops@drpe drpisolated]$ ssh root@192.168.88.102 [root@de4-11-5b-d0-83-78 ~]#

ctrees
2018-01-22 23:27
that worked... so what triggers 'root' vs 'user'

greg
2018-01-22 23:27
not sure. in ubuntu it is rebar. I think.

greg
2018-01-22 23:28
There is a parameter that changes it for ubuntu/debian.

greg
2018-01-22 23:28
centos only sets rootpw.

greg
2018-01-22 23:28
wait a minute, I'm wrong.

greg
2018-01-22 23:28
It is always root ssh keys.

shane
2018-01-22 23:28
the Ubuntu seed sets a "Default User" account - while CentOS only twiddles a "root" account

ctrees
2018-01-22 23:28
... yea I got you thinking too many things..

shane
2018-01-22 23:28
as part of standard install process

ctrees
2018-01-22 23:29
I got the 'root' part... thinking now of adding additional 'users'.... I'll go read docs more...

shane
2018-01-22 23:29
in the ubuntu seed - you'll see:
```
# Default User Setup
d-i passwd/make-user boolean true
d-i passwd/user-uid string {{if .ParamExists "provisioner-default-uid"}}{{.Param "provisioner-default-uid"}}{{else}}1000{{end}}
d-i passwd/user-fullname string {{if .ParamExists "provisioner-default-fullname"}}{{.Param "provisioner-default-fullname"}}{{else if .ParamExists "provisioner-default-user"}}{{.Param "provisioner-default-user"}}{{else}}Rocket Skates{{end}}
d-i passwd/username string {{if .ParamExists "provisioner-default-user"}}{{.Param "provisioner-default-user"}}{{else}}rocketskates{{end}}
d-i passwd/user-password-crypted password {{if .ParamExists "provisioner-default-password-hash"}}{{.Param "provisioner-default-password-hash"}}{{else}}$6$drprocksdrprocks$upAIK9ynEEdFmaxJ5j0QRvwmIu2ruJa1A1XB7GZjrnYYXXyNr4qF9FttxMda2j.cmh.TSiLgn4B/7z0iSHkDC1{{end}}
d-i user-setup/allow-password-weak boolean true
d-i user-setup/encrypt-home boolean false
```

shane
2018-01-22 23:30
this drops in a "Default User" with name `rocketskates` - overriding the built-in default of `ubuntu`

shane
2018-01-22 23:30
you can change it by setting the Param `provisioner-default-user`

shane
2018-01-22 23:31
post-provisioning to add more users should be done as a Unique stage to your environment

ctrees
2018-01-22 23:31
so access-keys is JUST for root ?

shane
2018-01-22 23:31
or - we'd suggest other Configuration Management tools as a better approach

shane
2018-01-22 23:32
you shouldn't bake user logic in to your Kickseeds (kickstarts/preseeds)

ctrees
2018-01-22 23:32
ok... I get that...

ctrees
2018-01-22 23:34
I just went through both krib and kubespray and sort of wondering ansible passoff OR ... I take it rob is thinking 'kubectl' (no ssh) but... blah blah... slowly getting head around so much flexibility...

shane
2018-01-22 23:35
if you look at `drpcli templates show access-keys.sh.tmpl --format=yaml` you'll see the `access-ssh-root-mode` does indeed relate to "root" user only policy and the interesting bit:
```
{{if .ParamExists "access-keys"}}
echo "Putting ssh access keys for root in place"
mkdir -p /root/.ssh
cat >>/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN Access Keys GENERATED CONTENT
{{range $key := .Param "access-keys"}}
{{$key}}
{{end}}
### END Access Keys GENERATED CONTENT
EOFSSHACCESS
chmod 600 /root/.ssh/authorized_keys
{{end}}
```

shane
2018-01-22 23:35
which you see is only hacking the `/root/.ssh/authorized_keys` file

shane
2018-01-22 23:36
it'd be pretty trivial to copy-cat this and apply to the default user specified via the `provisioner-default-user` (or fallback to `rocketskates` username if not defined) to set this user SSH keys
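
To make the behaviour of that template concrete, here is a rough Python model of what it renders for a given `access-keys` map, plus the "copy-cat" idea of pointing it at another user. The key name and key string are placeholders, and the `/home/<user>` layout is an assumption; the real logic lives in `access-keys.sh.tmpl`, not in this sketch.

```python
def authorized_keys_block(access_keys):
    """Mimic the generated block: one line per key value, names are not written."""
    lines = ["### BEGIN Access Keys GENERATED CONTENT"]
    # Go templates visit map entries in sorted key order and emit the values.
    for name in sorted(access_keys):
        lines.append(access_keys[name])
    lines.append("### END Access Keys GENERATED CONTENT")
    return "\n".join(lines) + "\n"

def authorized_keys_path(user="root"):
    """Where the block would be appended; /home/<user> is an assumed layout."""
    if user == "root":
        return "/root/.ssh/authorized_keys"
    return f"/home/{user}/.ssh/authorized_keys"

keys = {"ops-laptop": "ssh-rsa AAAAexample user@host"}   # placeholder key
print(authorized_keys_path("rocketskates"))
print(authorized_keys_block(keys), end="")
```

The key name never reaches the file, which is why it is only an id for recognizing the key.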

shane
2018-01-22 23:36
that would be the preferred model on the Ubuntu side since that's their "security" model ... never mind that the user has full `sudo` access

shane
2018-01-22 23:37
but in theory that's a bit more protected since the default `sudo` access does require user password to authenticate `sudo` usage

ctrees
2018-01-22 23:40
yea and I notice the use of yaml to create yaml to ingest yaml elsewhere... I'm sure you guys all have 'spinning tops' (cheap inception joke)

ctrees
2018-01-22 23:41
that threw me till @vlowther explained | vs |+ (I think.... still just theory in my head)

kamp.scott
2018-01-23 01:19
can we deploy rancheros with rebar ?

shane
2018-01-23 01:23
you surely can! You'll need to do some work to create a BootEnv to do this - we don't have a stock one

shane
2018-01-23 01:23
details on how to make PXE boot w/ RancherOS is available at: http://rancher.com/docs/os/v1.1/en/running-rancheros/server/pxe/

shane
2018-01-23 03:15
ok @kamp.scott - if you are willing to experiment and hack around a bit - here is a basic RancherOS set of bootenv/stages that works.
WARNING WARNING: THIS WILL NOT WORK ON YOUR METAL
* this was tested by "borrowing" the http://Packet.net provisioning script - to provision this (successfully) in http://Packet.net environment
* you WILL HAVE TO MODIFY the `bootenvs/rancheros.yml` kernel options to get a different config file you define
* you WILL HAVE TO MODIFY the `rancher-packet-provisioning-script.sh` to work for your metal and environment
WARNING: I've never used Rancher before - but I was able to boot this against DRP endpoint in http://packet.net without any problems


shane
2018-01-23 03:17
_IF_ you had a http://packet.net account you could test this and it'd work with the following commands/notes:
* for http://packet.net need `console=ttyS1,115200n8` parameter applied to machine
* untar the above TGZ
* create bootenv: `drpcli bootenvs create -< bootenvs/rancheros.yml`
* create stage: `drpcli stages create -< stages/rancheros.yml`
* add ISO image: `drpcli bootenvs uploadiso rancheros-latest-install`

shane
2018-01-23 03:22
sort out the customizations in the provisioning steps you need for your bare metal then set a machine to the `rancheros-latest-install` stage and off you go

ctrees
2018-01-23 04:02
[drpops@drpe testansible]$ RS_PROFILE=mycluster ./inventory.py | jq
Traceback (most recent call last):
  File "./inventory.py", line 17, in <module>
    import requests, argparse, json, urllib3, os
ImportError: No module named requests
[drpops@drpe testansible]$

ctrees
2018-01-23 04:03
somehow I'm messing up the dynamic inventory... I noticed rob had a ln to the inventory... is that needed (aka how stand-alone is inventory.py)

ctrees
2018-01-23 04:05
[drpops@drpe testansible]$ pip --version pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7)

greg
2018-01-23 04:06
I think you need to do `pip install requests`
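
A small pre-flight check can list which of the modules from that import line are missing before running the inventory script. This sketch assumes Python 3 (the session above is Python 2.7, where `importlib.util.find_spec` is not available), and the helper name is made up for illustration.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Modules imported on line 17 of inventory.py, per the traceback above.
needed = ["requests", "argparse", "json", "urllib3", "os"]
for name in missing_modules(needed):
    print(f"missing: {name} (try: pip install {name})")
```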

ctrees
2018-01-23 04:09
ok thanks..

ctrees
2018-01-23 04:12
that did it.. and a pip upgrade to 9... thanks again

kamp.scott
2018-01-23 07:39
@shane cool thanks ill give it a shot on XenServer

zehicle
2018-01-23 14:18
The ln let me run it from kubespray without a long path

greg
2018-01-23 17:50
- hi all - Some issues were found with gohai on UEFI enabled systems. We've updated gohai, sledgehammer, and community content in tip. Additionally, we've added the gohai function into drpcli, so you can now do `drpcli gohai` and it will attempt to inventory the system. This really only works on linux. The community content has also been updated to prefer `drpcli gohai` over gohai when it is available. This means that for future gohai updates we will not need to rev sledgehammer, but instead rev drpcli. What this means to you: if you update to tip community content, you will need to run `drpcli bootenvs uploadiso sledgehammer` before booting new machines.

greg
2018-01-23 17:51
Oh - for reference, the content has been updated to allow for not updating DRP to tip while still using the tip content. If you choose to do that.

ctrees
2018-01-23 18:25
So... I took a shot and attempted to create a clone of the kubespray yaml to create another type of feed for ansible ( calling it blender2cld ) this is in prep for my visit to the animation studio

ctrees
2018-01-23 18:26
is there a 'trick' to get dr-provision to use ? (I basically just stuffed the updated yaml into saas-content )

ctrees
2018-01-23 18:28
basically the studio has an old grid of mine that they fire up when they need animation rendering so I was going to test it out there while I'm down doing other maintenance

ctrees
2018-01-23 21:04
... got it... important to maintain unique names... even read it in the arch docs...

daniel.bernier
2018-01-24 20:42
has anyone used DRP for ONIE based installs ?

greg
2018-01-24 20:52
@daniel.bernier - umm - hmm - kinda. We've talked about it in the past with some switch vendors. Started to show a path with DRv2. DRP should be similar to set up for it. Just haven't tried. You interested? Anything in particular you are trying to boot/install with ONIE?

daniel.bernier
2018-01-24 20:55
ONL

shane
2018-01-24 20:58

greg
2018-01-24 20:59
In general, it has never been much different than sledgehammer.

greg
2018-01-24 21:09
@daniel.bernier - okay - so - in first quick glance, you could have the ONIE device DHCP. Drp would provide an address and options.

daniel.bernier
2018-01-24 21:12
Yup already getting ips from DRP

greg
2018-01-24 21:12
The DRP config needs to set option 114 (default-url) to "http://DRP_IP:8091/files/NOS_image"

greg
2018-01-24 21:13
then put `NOS_image` in the files directory of DRP.

greg
2018-01-24 21:13
```drpcli files upload NOS_image as NOS_image```

greg
2018-01-24 21:14
if you already have a webserver , you can point it at that instead.
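
The value for option 114 is just the file URL on the DRP static file server, so it can be derived from the endpoint address and the uploaded file name. In the sketch below the IP address is a placeholder and the `Code`/`Value` dictionary shape is an assumption about how the option would be listed; check the subnet or reservation object on your endpoint for the exact field names.

```python
def onie_default_url(drp_ip, image_name, port=8091):
    """URL an ONIE switch would fetch when the image sits in DRP's files area."""
    return f"http://{drp_ip}:{port}/files/{image_name}"

def option_114(drp_ip, image_name):
    """DHCP option 114 (default-url) as a code/value pair (assumed shape)."""
    return {"Code": 114, "Value": onie_default_url(drp_ip, image_name)}

print(option_114("10.0.0.1", "NOS_image"))
```

The same pair could go on the subnet or on a per-switch reservation, which are the two placements discussed in this thread.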


greg
2018-01-24 21:16
You should be able to check leases to find your switch and ssh into.

greg
2018-01-24 21:17
There is a lot of advanced config stuff that our DHCP server can do (kinda like ISC's) to push it.

greg
2018-01-24 21:17
The option could be set on the subnet and would likely be ignored by all things except switches.

greg
2018-01-24 21:18
Or create a reservation for that specific switch with option 114 for its need.

greg
2018-01-24 21:18
An advanced config option would be to put vendor string decomposition into the option 114 and select the right file for the right vendor type, but that is pretty hardcore.

greg
2018-01-24 21:19
@daniel.bernier - make sense?

daniel.bernier
2018-01-24 21:21
@greg totally

daniel.bernier
2018-01-24 21:21
Will try it in a bit

daniel.bernier
2018-01-24 21:21
Already statically reserves but prefer the whole subnet approach

daniel.bernier
2018-01-24 21:22
Next question will be around gohai inventory :-) but that will wait for tomorrow

greg
2018-01-24 21:23
on the switch?

greg
2018-01-24 21:25
Though, it might work for just doing TFTP/http waterfall process as well.

greg
2018-01-24 21:28

daniel.bernier
2018-01-24 23:00
No, gohai for existing servers

zehicle
2018-01-24 23:49
w/ the new v1.6 DRPCLI gohai command - you may be able to run it that way. would that work?

2018-01-25 14:40
Hey guys, I'm starting to test digitalrebar/provision and seems that the docs are out of sync (Compared to doc folder in GitHub repo).

shane
2018-01-25 14:51
@amontalban - please use the `latest` version - it is the most up to date in relation to the current version (v3.6.0)


2018-01-25 14:53
Great thanks :+1:

shane
2018-01-25 14:56
no problem - if you bump in to any issues or obvious errors in doc - please let me know .... we've been working on cleaning them up and enhancing them

2018-01-25 14:57
Awesome, will do

shane
2018-01-25 14:57
we'd also be happy to send you a Slack invite so you can use the native Slack app to communicate with us

2018-01-25 14:58
Sure, do you need my email or something?

shane
2018-01-25 14:59
yes - feel free to email me directly, if you'd rather not post it here ()

2018-01-25 14:59
:+1:

shane
2018-01-25 14:59
not sure if you can direct message me via the sameroom integration ... ?

2018-01-25 15:00
Seems so

shane
2018-01-25 15:01
(I didn't receive a D.M. )

2018-01-25 15:02
Alright, it's amontalban AT perceptyx DOT com

shane
2018-01-25 15:03
sent

amontalban
2018-01-25 15:04
has joined #community201801

shane
2018-01-25 15:04
welcome (officially...!) @amontalban

amontalban
2018-01-25 15:05
Thanks :slightly_smiling_face:

zehicle
2018-01-25 16:07
$welcome

amontalban
2018-01-25 17:35
Guys, I'm trying to have a machine use a custom bootenv (Want to install FreeBSD)

amontalban
2018-01-25 17:35
I have set the BootEnv for the machine, but for some reason it still loads sledgehammer

amontalban
2018-01-25 17:35
Any pointer?

amontalban
2018-01-25 17:44
NVM, I think I know what's going on

lae
2018-01-25 18:16
@greg upon upgrading from 3.4.1 to 3.5.0 (and 3.6.0) I'm running into what seems like it might be the same issue with there being null values in our machine's metadata

shane
2018-01-25 18:17
hmm ... @lae do you have hand built content that this is occurring in ?

shane
2018-01-25 18:17
I haven't seen the issue w/ upgrades to v3.5.0 or v3.6.0 - with Digital Rebar or RackN content

shane
2018-01-25 18:18
I don't think I've had hand hacked content that I tried upgrading around ...


shane
2018-01-25 18:18
are you generating the machine objects in advance of machines showing up - or is this generated by DRP on new machines ?

lae
2018-01-25 18:18
I removed machine definitions and it starts up fine

lae
2018-01-25 18:18
if I re-add one of them, it fails to start

lae
2018-01-25 18:19
we're usually creating machine objects with drpcli

shane
2018-01-25 18:19
def. sounds similar to the `null` issue

shane
2018-01-25 18:19
you do _have_ to create them with a value for required fields, not let it `null` out

shane
2018-01-25 18:19
but - we shouldn't let you create a machine if a required field is missing ... ?

lae
2018-01-25 18:20
are Meta and Params supposed to be required fields?

lae
2018-01-25 18:20
not sure what Meta would be set to

greg
2018-01-25 18:20
Can you send me one? They are now, but the code should have migrated them.

lae
2018-01-25 18:20
There's one at the bottom of that paste I linked

greg
2018-01-25 18:20
Params and Meta can be "{}" to start.

greg
2018-01-25 18:20
missed that.

greg
2018-01-25 18:27
Okay - I know what it is.

greg
2018-01-25 18:27
We thought we were already doing this.

lae
2018-01-25 18:33
Should I go ahead and string replace the null values with [] (by the looks of it I see null values in Meta/Errors/Params/Profiles spread across all of the machines) or just wait for an update that'll take care of validating/cleaning it up?

lae
2018-01-25 18:38
(gonna head to sleep since it's almost 4am, I'll check back later)

shane
2018-01-25 18:38
lazy bones ...

shane
2018-01-25 18:38
yes - string replace w/ `[]` should fix it

greg
2018-01-25 18:47
no - `{}`

greg
2018-01-25 18:47
well.

greg
2018-01-25 18:47
Meta, Params are `{}`

shane
2018-01-25 18:47
depends - array -vs- object

greg
2018-01-25 18:47
Errors would be `[]`
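
Putting the array-versus-object point together: a blanket string replace with `[]` would give Meta and Params the wrong type, so a per-field fix is safer. The sketch below works on a machine object already loaded as a dictionary. Treating Profiles as a list is an assumption (only Meta, Params and Errors were confirmed above), and the sample machine is made up.

```python
# Empty value per field: objects for Meta/Params, arrays for Errors/Profiles.
EMPTY_DEFAULTS = {"Meta": {}, "Params": {}, "Errors": [], "Profiles": []}

def fill_nulls(machine):
    """Return a copy of a machine dict with null fields replaced by empties."""
    fixed = dict(machine)
    for field, empty in EMPTY_DEFAULTS.items():
        if field in fixed and fixed[field] is None:
            fixed[field] = type(empty)()   # fresh {} or [] for each machine
    return fixed

broken = {"Name": "node1", "Meta": None, "Params": None, "Errors": None, "Profiles": None}
print(fill_nulls(broken))
```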

greg
2018-01-25 18:53
@lae - I'm fixing. It may be a few hours.

amontalban
2018-01-25 19:13
Guys, how can I validate the generated pxelinux file for a machine?

amontalban
2018-01-25 19:14
(My drp-data/tftpboot/pxelinux.cfg is empty)

greg
2018-01-25 19:15
it will be. It is a virtual filesystem file.

greg
2018-01-25 19:16
you will need to curl them from their expected location

greg
2018-01-25 19:16
`http://<ip>:8091/pxelinux.cfg/default`

greg
2018-01-25 19:17
That would be the one served by discovery bootenv.

greg
2018-01-25 19:17
If the machine has the bootenv set, you can go to the expanded url from the template list in the bootenv.
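
Because the file is rendered on request rather than stored on disk, checking it means fetching it over HTTP. A minimal sketch with Python's standard library follows; the IP address is a placeholder, port 8091 and the `pxelinux.cfg/default` path are the ones given above, and the function names are made up for illustration.

```python
from urllib.request import urlopen

def pxelinux_url(host, name="default", port=8091):
    """URL of a rendered pxelinux config on the DRP static file server."""
    return f"http://{host}:{port}/pxelinux.cfg/{name}"

def fetch_rendered(host, name="default", port=8091, timeout=5):
    """Download the rendered config text (needs a reachable DRP endpoint)."""
    with urlopen(pxelinux_url(host, name, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

print(pxelinux_url("192.168.88.10"))
# print(fetch_rendered("192.168.88.10"))  # uncomment against a live endpoint
```

For a machine with its own bootenv, pass the expanded file name from the bootenv's template list as `name` instead of `default`.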

amontalban
2018-01-25 19:17
Ah alright, thanks!

amontalban
2018-01-25 19:30
BTW, would be great to have the `memdisk` file from syslinux out of the box

greg
2018-01-25 19:31
Issue, please! @amontalban

amontalban
2018-01-25 19:31
Sure, I'm trying to boot FreeBSD so once I get that will do a PR if possible :slightly_smiling_face:

shane
2018-01-25 19:32
Or better: Pull Request !! :slightly_smiling_face:

greg
2018-01-25 19:36
@lae - I think I have a fix for you in tip. It will be there in about 30 minutes.

detiber
2018-01-25 20:51
@vlowther @greg I'm hitting some issues with pxe booting with uefi, and it looks like the situation is worse post 3.5 for me, since <= 3.5 my host would attempt to use binl on 4011, and > 3.5 it just keeps resending dhcp discover on 68

detiber
2018-01-25 20:52
I think something like the uefi workflow from https://github.com/google/netboot/blob/master/pixiecore/README.booting.md might be needed for hardware like mine

detiber
2018-01-25 20:54
As an aside, the api for the dhcp library used by pixiecore looks a lot cleaner than the one that is currently being used, but it isn't a drop in replacement :slightly_smiling_face:

greg
2018-01-25 20:56
Yeah. I thought about it. You mean the krakow vs pixiecore parts, I assume.

detiber
2018-01-25 20:56
@greg indeed, I started to try and take a hack at swapping out the libraries, but ended up in a rabbit hole that I wasn't prepared for :slightly_smiling_face:

greg
2018-01-25 20:58
I had the pixie UEFI path, but when victor tested it on our UEFI system it didn't work, and he altered it to work in our lab. What hardware are you running? This is really hard on a phone with autocorrect.

detiber
2018-01-25 20:59
I haven't fully diagnosed what is going on yet, but it appears that I have at least 2 boxes that work with pixiecore but not dr-provision. One is a LivaX (http://www.ecs.com.tw/ECSWebSite/Product/Product_LIVA.aspx?DetailID=1593&LanID=0) and the other is a frankenbox using this MB: https://www.asus.com/us/Motherboards/SABERTOOTH_X79/

greg
2018-01-25 21:01
Okay. Was 3.5 working?

detiber
2018-01-25 21:02
No, but 3.5 seemed to attempt to boot using binl on port 4011, but still failed. With newer builds it just keeps resending dhcp discover packets on 68

vlowther
2018-01-25 22:15
Hm.

vlowther
2018-01-25 22:16
Do you have a pxe stack that works with that gear?

vlowther
2018-01-25 22:17
It would be good to get a packet trace of all the ports involved.

vlowther
2018-01-25 22:19
I have also been working on a DHCP stack refactor, so that would be a good Branch to test with.

vlowther
2018-01-25 22:21
https://github.com/digitalrebar/provision/pull/649 would be a good branch to test with.

lae
2018-01-26 06:04
fix in tip appears to work - although I do still see the null values in the machine objects

lae
2018-01-26 06:04
@greg

greg
2018-01-26 14:14
Yeah it is fixed on load. As the machines get saved. They will change over time.

greg
2018-01-26 14:14
@lae

lae
2018-01-26 14:19
kk

amontalban
2018-01-26 16:51
Hey guys, why are machines indexed by random UUID and not by system UUID (like the one inside gohai-inventory)?

shane
2018-01-26 16:52
not every operating system or hardware vendor provides a reliable UUID to use

shane
2018-01-26 16:52
we need to ensure consistency across all hardware and operating system platforms

shane
2018-01-26 16:53
gohai inventory is showing you the hardware/bios generated system UUID

shane
2018-01-26 16:53
the Digital Rebar Provision ID is guaranteed to be unique across all hardware/OS types we come across

amontalban
2018-01-26 16:54
I see, thanks for the explanation :+1:

shane
2018-01-26 16:54
no problem

vlowther
2018-01-26 17:03
ya, I have seen cases where the system UUID is all zeros, or some crazy stuff like 1234-56-7-8901234, or "ToBe Filled In by O.E.M", among other things.

vlowther
2018-01-26 17:04
So for now we just don't rely on it.
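
The bad values listed above are easy to reproduce in a quick check, which shows why a generated ID is the safer index. This is only an illustration of the problem, not what dr-provision does internally; the function names are made up, and the "good" sample UUID in the comment is an arbitrary well-formed value.

```python
import re
import uuid

_UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")

def usable_system_uuid(value):
    """True only for a well-formed UUID string that is not all zeros."""
    text = value.strip()
    if not _UUID_RE.match(text):
        return False
    return uuid.UUID(text).int != 0

def new_machine_id():
    """A random ID that does not depend on what the firmware reports."""
    return str(uuid.uuid4())

for candidate in ["00000000-0000-0000-0000-000000000000",
                  "1234-56-7-8901234",
                  "ToBe Filled In by O.E.M"]:
    print(candidate, usable_system_uuid(candidate))
```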

amontalban
2018-01-26 17:12
Guys, don't hate me but I'm getting a panic error, what information is needed for opening an issue besides the log itself? (It's inside TFTP server)

amontalban
2018-01-26 17:19
Mmm, might be fixed by @vlowther here https://github.com/digitalrebar/provision/pull/662

vlowther
2018-01-26 17:20
No, that will just make the panic contain the information I need to debug it. :grinning:

amontalban
2018-01-26 17:21
Ah :slightly_smiling_face:

vlowther
2018-01-26 17:21
So if you are getting that panic after updating to the latest tip, pm me the panic message.

amontalban
2018-01-26 17:21
Awesome, thanks

amontalban
2018-01-26 17:22
First I have to check how to update to master

shane
2018-01-26 17:52
@amontalban we don't cut compiled releases against `master` - we have `tip` which sits _slightly_ behind master - we do release compiled versions for `tip`. Once we've done basic sanity checking and minimal testing - we set the Build system to point `tip` at a specific commit.

shane
2018-01-26 17:53
So ... until PR 662 is included in `tip` - you can't get binaries from us

shane
2018-01-26 17:53
...you can... if you are enterprising enough ... compile your own binaries from `master`

shane
2018-01-26 17:54

shane
2018-01-26 17:55
(you need to have Go 1.9.0 or newer setup and working first, etc... )

amontalban
2018-01-26 17:55
Awesome, thanks :+1:

vlowther
2018-01-26 18:30
Tip should have that PR now, BTW.

vlowther
2018-01-26 18:41
@amontalban Did that help?

amontalban
2018-01-26 18:45
Still setting up everything again

amontalban
2018-01-26 18:47
Ok, it crashed again. Should I reach you over PM?

vlowther
2018-01-26 19:01
PM me the stack trace.

detiber
2018-01-26 19:02
@vlowther pixiecore works on my hardware, but I'm trying to avoid writing the workflow stuff myself and would rather use dr-provision. I'll work on getting some packet traces together using both dr-provision and pixiecore later today

vlowther
2018-01-26 19:03
Cool. tcpdump in as much detail as you can, raw DHCP packets if you can get them would be appreciated.

vlowther
2018-01-26 19:04
The current tip is known to work on Dell T320 and R720 gear, and one of our customers has reported that recent HP and Supermicro gear works as well.

2018-01-26 19:04
Time to feed the :bear:!

wdennis
2018-01-26 20:04
@zehicle Having a problem with the UX's "Org. Name & Endpoints" screen; trying to remove a duplicate endpoint (I have two that are the same system for some reason) and when I click the Remove button on the one I want deleted, and then click the bottommost Save button, nothing happens...

wdennis
2018-01-26 20:05
(i.e. I can't delete the endpoint)

wdennis
2018-01-26 20:05
Tried with Safari 11.0.3, and Chrome 64.0.3282.x

wdennis
2018-01-26 20:12
Another UX issue...

wdennis
2018-01-26 20:13
When I log into my newly-upgraded DRP 3.6 system, I see the upgrade notif's as so:


wdennis
2018-01-26 20:14
But when I go to "Content" to upgrade them, the "Update" buttons are not available...


zehicle
2018-01-26 20:15
Checking.... @wdennis

zehicle
2018-01-26 20:20
@wdennis for the endpoint delete thing... you had a very early account with the old naming convention. should be fixed now

wdennis
2018-01-26 20:21
Oh, OK - I was trying to get rid of the 1st one...

wdennis
2018-01-26 20:23
@shane What is the new param I can use in a profile to specify a custom preseed template?

wdennis
2018-01-26 20:24
Not seeing anything when I drop the "Choose undefined..." box

greg
2018-01-26 20:24
Check the params in the UX. I think it is kickseed or something like that

wdennis
2018-01-26 20:25
@greg Does that come in with the newer `drp-community-content` that I can't seem to be able to update to?

greg
2018-01-26 20:26
yes - it is part of the community content update.

wdennis
2018-01-26 20:27
OK, that's why...

wdennis
2018-01-26 20:27
Is there a way to update the content via `drpcli`?

wdennis
2018-01-26 20:29
Is it `drpcli contents update [id] [json] [flags]`? If so, how to get remote content?

zehicle
2018-01-26 20:34
@wdennis I've duplicated the content screen issue - looking at the problem

wdennis
2018-01-26 20:46
@zehicle Thx

zehicle
2018-01-26 20:47
FWIW - this issue exposes that we can detect both minor (intra version) and major (extra version) changes. In this case, the content page says that no patches (intra) are needed and is overlooking the upgrade.

wdennis
2018-01-26 20:49
@zehicle But in this case, that's wrong, correct? There is a major update...

zehicle
2018-01-26 20:49
yes. working to add buttons for both cases

wdennis
2018-01-26 20:50
Also, verified that the endpoint save is working - renamed the one endpoint I had left, and it did save. However, no feedback when the "Save" button is clicked - could that be added somehow?

shane
2018-01-26 20:50
@wdennis - the Param you're looking for is indeed `kickseed` - it can be used interchangeably for Kickstart definitions or Preseed (hence its munged name)

wdennis
2018-01-26 20:50
@shane thx, can't wait to use it!



wdennis
2018-01-26 20:51
(Actually, I do have a system to install, so hope I don't have to wait too long :wink: )

shane
2018-01-26 20:51
(and, I know about the spelling error in the note box ... )

wdennis
2018-01-26 20:52
Nice -- added `jq` usage examples! That's great

wdennis
2018-01-26 20:54
@shane one other thing -- I think it is now possible to "nest" templates, for instance a disk-partitioning template in the bigger kickseed template... Is that documented somewhere?

wdennis
2018-01-26 20:54
I think @lae pioneered that...

shane
2018-01-26 20:55
there is a constant stream of updates in the background, to the Latest doc :slightly_smiling_face:

shane
2018-01-26 20:55
I'm not sure if that's added to Doc yet - there were significant changes to the Architecture (data) docs - @vlowther worked up a lot of new info there

shane
2018-01-26 20:56
all of this stuff got a lot of updates: http://provision.readthedocs.io/en/latest/doc/arch.html

shane
2018-01-26 21:01
@wdennis I think this might be it? `{{template "something.tmpl" .}}` Will call another template named `something.tmpl` and expand it inline

shane
2018-01-26 21:01
that's the golang template pattern you'd use

wdennis
2018-01-26 21:04
OK, cool

greg
2018-01-26 21:07
yes - template or .CallTemplate can be nested.

wdennis
2018-01-26 21:10
I actually see this in the custom seed template I made some time ago:

wdennis
2018-01-26 21:11

wdennis
2018-01-26 21:20
@greg That's what you are talking about?

greg
2018-01-26 21:22
yes

greg
2018-01-26 21:23
`template` only takes a hard-coded string; that's the golang text/template definition.

greg
2018-01-26 21:23
`.CallTemplate` takes anything that is a string. So, it can be a parameter or an expression that evaluates to a string.

wdennis
2018-01-27 02:30
Looks like the param is actually named `select-kickseed` - I can see it in the params list when I "preview" the updated Community Content

wdennis
2018-01-27 02:31
So close, yet so far away...

greg
2018-01-27 17:22
- when people get a chance, can they PM me if they are using plugins?

kamp.scott
2018-01-27 17:24
Does KRIBs count

greg
2018-01-27 17:24
no - those are content packages.

greg
2018-01-27 17:25
this would be: incrementer, ipmi, packet-ipmi, virtualbox-ipmi, slack

greg
2018-01-27 17:25
or one you created. If you created a plugin, lord, please talk to me now. :slightly_smiling_face:

wdennis
2018-01-27 18:52
@greg ipmi (bare metal)

greg
2018-01-28 03:21
- tip has been updated with new plugins support.

greg
2018-01-28 03:22
This means that you will need to update your plugins immediately after updating to tip.

greg
2018-01-28 03:22
I think this really only affects @wdennis

greg
2018-01-28 03:22
Just use the SaaS tip. I think.

wdennis
2018-01-28 03:23
@greg Not running tip, but stable - v3.6.0-0-0e5ccf678a3e5b5fdb10f86261247cd28c858ac0

greg
2018-01-28 03:23
I have not pushed this to a release.

wdennis
2018-01-28 03:24
Waiting for @zehicle to fix issue in UX

greg
2018-01-28 03:24
Okay - so you are fine. Don't update plugins (to tip) without updating DRP.

wdennis
2018-01-28 03:25
ACK

zehicle
2018-01-28 03:30
working on it...

wdennis
2018-01-28 03:44
If there was another way to get updated content for v3.6.0, I wouldn't be so worried about it - want to be able to use a custom kickseed to roll out machines; need the new param to do that...

wdennis
2018-01-28 03:45
So if there's another way, let me know

shane
2018-01-28 18:20
@wdennis does this FAQ answer your question? It outlines how to use the CLI to download and apply Content upgrade... http://provision.readthedocs.io/en/latest/doc/faq-troubleshooting.html#update-community-content-via-command-line

zehicle
2018-01-29 05:41
Content upgrade by version is being tested internally. Hopefully available for advanced users tomorrow.

zehicle
2018-01-29 05:43
We'll talk about the new RackN UX stages in the community call on Tuesday. The short version is that we're moving to release stages where we do not roll new UX code into full production in a single step. https://portal.rackn.io will have the most stable version of the UX. New features will surface in https://latest.rackn.io for earlier testing.

wdennis
2018-01-29 14:48
Thx @zehicle!! Sounds like a better plan for UX changes.

wdennis
2018-01-29 14:50
@shane are you available?

shane
2018-01-29 14:50
what's up ?


shane
2018-01-29 14:51
fire away

wdennis
2018-01-29 14:52
1) I think there's a missing `drpcli` at the example shown under "View our currently installed Content version:"

shane
2018-01-29 14:52
ah - yep - when I cut-n-pasted the command, I removed my Prompt string ... one too many "dw" commands :disappointed:

wdennis
2018-01-29 14:53
2) Why the `export VER=..."xxx"` before the `curl`?

shane
2018-01-29 14:53
when I do doc - I try to separate out "stuff" that is dynamic as a variable - it makes cut-n-paste of commands easier if you're following along, IMO

shane
2018-01-29 14:54
you can "set" the variable - and cut-n-paste the command w/out change

wdennis
2018-01-29 14:54
3) Under that, `No update the content.` --> `Now update the content.`

shane
2018-01-29 14:54
it also allows re-use across different use tests by tweaking the Var - also if it gets embedded in a script - those dynamic pieces often change, so you want to separate it as a Var/Param

wdennis
2018-01-29 14:55
So I could (should) do: `export VER="stable"` if I'm running v3.6.0 stable?

shane
2018-01-29 14:55
yep

wdennis
2018-01-29 14:56
Thx

shane
2018-01-29 14:56
we also have some catalog changes that emit all version info as well - that will be out soon

shane
2018-01-29 14:56
we'll talk about that tmw at meetup

wdennis
2018-01-29 14:57

shane
2018-01-29 14:57
yep :slightly_smiling_face:

wdennis
2018-01-29 14:58
OK, let's give it a try...

shane
2018-01-29 14:58
ok - let me know if any other changes needed - I have those changes staged and ready to push

wdennis
2018-01-29 15:03
When it says `NOTE that content that is marked Writable may need to be destroyed, and recreated if it's currently in use on other objects. For Read Only content you can safely update the content.` - does that mean if the object attribute `ReadOnly:` is set to `false` then that == "content that is marked Writable"?

shane
2018-01-29 15:04
yep

wdennis
2018-01-29 15:05
So if I just try `drpcli contents update drp-community-content -< drp-cc.yaml` will it error out if one of the objects is currently in use by something? Or will it update?

shane
2018-01-29 15:06
Community Content is readonly - so you can safely update it - that is by design

shane
2018-01-29 15:06
:slightly_smiling_face:

shane
2018-01-29 15:06
The whole purpose of our layered filesystem model is to have a layer as ReadOnly that can be safely updated

wdennis
2018-01-29 15:06
OK

shane
2018-01-29 15:07
I've amended the note to be a little more clear with that *field* value of `ReadOnly`

shane
2018-01-29 15:09
also remember - you can stop DRP, backup the `/var/lib/dr-provision` or `<wherever>/drp-data` directory, and restart - then apply content changes ... ultimately content packs are stored in the `saas-content` directory under each location ...

wdennis
2018-01-29 15:10
I'm doing the `drpcli contents update drp-community-content -< drp-cc.yaml` step but it's just hanging...

shane
2018-01-29 15:11
any messages in your dr-provision log ? that should return quickly - when I tested it saturday it only took a second or so

shane
2018-01-29 15:12
if you have an ISO upload or something happening at the same time - the content layer might be locked from writing (to disk) - so it'll block waiting for that to finish

wdennis
2018-01-29 15:13
This is last few lines of log:

shane
2018-01-29 15:13
so - I just ran those steps - and it ran smoothly for me

shane
2018-01-29 15:14
I'm using a stable DRP endpoint (v3.6.0)

wdennis
2018-01-29 15:14

shane
2018-01-29 15:14
with CC to v1.5.0 update

shane
2018-01-29 15:14
those are the basic annoying audit log messages

shane
2018-01-29 15:15
are you running drpcli from your endpoint - or using an admin laptop/system to point at remote endpoint ?

wdennis
2018-01-29 15:15
drpcli on endpoint

shane
2018-01-29 15:15
do you have RS_ENDPOINT variable set to something else ? or a different user/pass pair?

wdennis
2018-01-29 15:17
no, `RS_KEY` set to default , no `RS_ENDPOINT` set

wdennis
2018-01-29 15:19
OK; stopped/started DRP, and then did following:


shane
2018-01-29 15:20
there you go

shane
2018-01-29 15:20
that's success

wdennis
2018-01-29 15:21
Looks like prior attempt did update at least some things; ver was `"v1.1.0-17-7040582223c11766fcb741ac9436f17c486e271b"` before

shane
2018-01-29 15:21
did you try any other drpcli commands while it was "paused" (from another shell session, etc) ??

shane
2018-01-29 15:21
eg. was DRP responding to other commands ?

shane
2018-01-29 15:22
you'll need to update your sledgehammer too (`drpcli bootenvs uploadiso sledgehammer`)

shane
2018-01-29 15:23
most of those warnings are just validation checks - since you don't have those BootEnvs installed, it'll chuck a Warning

wdennis
2018-01-29 15:23
Yes, drpcli was responding properly before I tried the update cmd

shane
2018-01-29 15:23
ok

wdennis
2018-01-29 15:23
After the failed update, though, it was not

wdennis
2018-01-29 15:23
That's why I stopped/restarted DRP (running isolated)

shane
2018-01-29 15:23
that was what I wanted to know

wdennis
2018-01-29 15:25
So there's a newer SH needed for v1.5.0 community-content?

shane
2018-01-29 15:25
yep - you'll see the Warnings related to SH - errors validating since it's missing

shane
2018-01-29 15:26
assuming you have defaultBootEnv/defaultStage/unknownBootEnv using sledgehammer pieces, those won't work until it's updated

wdennis
2018-01-29 15:26
In the UX Bootenvs screen, SH has the "blue check of happiness"

wdennis
2018-01-29 15:27
`sledgehammer-f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15.tar`

shane
2018-01-29 15:27
hmm - I would suggest updating sledgehammer, as we don't test new content w/ old sledgehammer - if content relies on changes in sledgehammer, that can cause unintended side effects

wdennis
2018-01-29 15:28
So, I have to d/l tar file, then do `drpcli bootenvs uploadiso sledgehammer`? Or does that cmd do the d/l?

shane
2018-01-29 15:28
if you have the f5ff ... I think that's latest - checking

wdennis
2018-01-29 15:28
Thx

shane
2018-01-29 15:29
your errors indicated you don't have the TAR ball for it, so it failed to explode it

shane
2018-01-29 15:29
``` "Explode ISO: iso does not exist: /isos/sledgehammer-f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15.tar\n",```

shane
2018-01-29 15:30
did you explode it, then remove the tarball ? It'd be in the `tftpboot/isos/` directory

wdennis
2018-01-29 15:30
huhwat


shane
2018-01-29 15:32
huh - odd it'd chuck that warning then ...

shane
2018-01-29 15:32
I'll open an issue about that

wdennis
2018-01-29 15:32
OK, thx

shane
2018-01-29 15:33
everything else look good in the doc ? (I did fix another typo "brief example o how to" ... )

shane
2018-01-29 15:33
otherwise, I'll push the update

wdennis
2018-01-29 15:34
Don't see it in `v: tip` - is that where I can proof?

shane
2018-01-29 15:35
hmm - that might be old warnings from previously - versus newly generated warnings

shane
2018-01-29 15:35
I haven't pushed the change yet

shane
2018-01-29 15:35
it'd be in "latest" doc when I make the push - `tip` provision won't be updated until we move the commit pointer ...

shane
2018-01-29 15:36
I can push the branch, and you can review the branch on github before I merge it

wdennis
2018-01-29 15:36
OK


wdennis
2018-01-29 15:42
Not sure of syntax, but is there a space needed with the `-<` in ` $ drpcli contents update drp-community-content -< drp-cc.yaml`?

wdennis
2018-01-29 15:42
Like `- <`

shane
2018-01-29 15:43
nope not needed


wdennis
2018-01-29 15:46
OK, looks good to me then

wdennis
2018-01-29 15:50
@shane One more quick q: The value of `select-kickseed` should be a template ID, right? (such as, `necla-ubu-seed.tmpl`)

shane
2018-01-29 15:58
It's just a Parameter - so you can use it or set it wherever you want essentially. You can apply the Param directly to a machine: `drpcli machines set "09ae3ae4-095f-40e8-a544-6ac0aa336a30" param select-kickseed to "my-wondrous-kickstart.ks"` or add it to a Profile, and apply that profile to a Machine

shane
2018-01-29 15:59
ultimately - this can point to a template

shane
2018-01-29 15:59
(the value of the Param)

shane
2018-01-29 16:00
(yes, a Template name is the ID you'd use to reference)

wdennis
2018-01-29 16:01
Cool, thx

wdennis
2018-01-29 16:16
OK, so, problems...

wdennis
2018-01-29 16:17
Here's my machine object (target of custom kickseed install):


wdennis
2018-01-29 16:19
Seeing this at PXE install:


shane
2018-01-29 16:24
can you render the template/seed via the HTTP server ?

wdennis
2018-01-29 16:24
No

wdennis
2018-01-29 16:24
Comes back blank

shane
2018-01-29 16:24
ok - I can give you a hand in a bit - in a mtg now, and then I have to get some breakfast

wdennis
2018-01-29 16:24
K

wdennis
2018-01-29 16:29
Oh wait, maybe my bad...

wdennis
2018-01-29 16:29
Seeing this in logs:

wdennis
2018-01-29 16:30

wdennis
2018-01-29 16:34
Guess I named it wrong:

wdennis
2018-01-29 16:34
```
[dradmin@dr-admin drp]$ drpcli templates show part-scheme-separate_home.tmpl
{
  "Available": true,
  "Contents": "{{if .ParamExists \"operating-system-disk\" -}}\nd-i partman-auto/disk string /dev/{{.Param \"operating-system-disk\"}}\nd-i grub-installer/choose_bootdev select /dev/{{.Param \"operating-system-disk\"}}\nd-i grub-installer/bootdev string /dev/{{.Param \"operating-system-disk\"}}\n{{else -}}\nd-i partman-auto/disk string /dev/sda\nd-i grub-installer/choose_bootdev select /dev/sda\nd-i grub-installer/bootdev string /dev/sda\n{{end -}}\nd-i partman-auto/method string lvm\nd-i partman-auto-lvm/guided_size string max\nd-i partman-auto-lvm/new_vg_name string {{.Machine.ShortName}}\nd-i partman-auto/choose_recipe select custom_lvm\nd-i partman/auto expert_recipe string \\\n custom_lvm:: \\\n 500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \\\n 128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \\\n 10240 20 1228800 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ {{.Machine.ShortName}} } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \\\n 10240 100 10240000000 ext4 $lvmok{ } mountpoint{ /home } lv_name{ home } in_vg{ {{.Machine.ShortName}} } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \\\n 50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ {{.Machine.ShortName}} } method{ swap } format{ } .\nd-i grub-installer/only_debian boolean true\n",
  "Description": "",
  "Errors": [],
  "ID": "part-scheme-separate_home.tmpl",
  "Meta": {},
  "ReadOnly": false,
  "Validated": true
}
```

wdennis
2018-01-29 16:36
The default one is named `part-scheme-default.tmpl` so I guessed my custom one s/b named `part-scheme-XXXXX.tmpl`

wdennis
2018-01-29 16:37
So they need to be named `part-seed-XXXXX.tmpl` right?

wdennis
2018-01-29 16:53
Naming is hard...

greg
2018-01-29 17:13
When using the `select-kickseed` variable, its value is the full name of the template.

greg
2018-01-29 17:13
```
{{$selectKickSeed := (printf "%s" (.Param "select-kickseed")) -}}
{{.CallTemplate $selectKickSeed .}}
```

greg
2018-01-29 17:15
For using the ubuntu/debian `part-scheme` variable, you need to have a template with the full name `part-seed-<var>.tmpl` where `<var>` is the value of the `part-scheme` variable.
```
{{$templateName := (printf "part-seed-%s.tmpl" (.Param "part-scheme")) -}}
{{.CallTemplate $templateName .}}
```
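
The name lookup in that `printf` can be checked in isolation with a few lines of Go. This is only a sketch of the string construction; the `available` map is a made-up stand-in for the templates loaded on an endpoint.

```go
package main

import "fmt"

// partSeedTemplateName builds the template name the same way the printf
// in the quoted template does: "part-seed-" + value + ".tmpl".
func partSeedTemplateName(partScheme string) string {
	return fmt.Sprintf("part-seed-%s.tmpl", partScheme)
}

func main() {
	// Hypothetical set of templates present on the endpoint.
	available := map[string]bool{
		"part-seed-default.tmpl":         true,
		"part-scheme-separate_home.tmpl": true, // named after the param, not the prefix
	}

	want := partSeedTemplateName("separate_home")
	fmt.Println("template looked up:", want)
	fmt.Println("found:", available[want])
}
```

A template named after the `part-scheme` param itself is never looked up, because the prefix used in the constructed name is `part-seed-`.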

greg
2018-01-29 17:15
@wdennis - okay?

wdennis
2018-01-29 17:33
OK now, got by that problem by cloning `part-scheme-separate_home.tmpl` to `part-seed-separate_home.tmpl`

wdennis
2018-01-29 17:34
But did you guys come up with the `{{$templateName := (printf "part-seed-%s.tmpl" (.Param "part-scheme")) -}}` line?

wdennis
2018-01-29 17:35
I would have guessed (and did!) that `printf "part-seed-%s.tmpl"` would have been `printf "part-scheme-%s.tmpl"` to fit with the pattern of the default one

shane
2018-01-29 17:41
the Nested Templates FAQ section contains that notation



wdennis
2018-01-29 18:01
"RTFM" :joy:

wdennis
2018-01-29 18:02
Well, anyways, with that fixed, it works!

shane
2018-01-29 18:03
xclnt

wdennis
2018-01-29 18:03
However, now I have preseed problems :disappointed:

wdennis
2018-01-29 18:04
a) had `d-i passwd/make-user boolean false` in preseed, but installer stopped and made me set up local user...

wdennis
2018-01-29 18:05
b) my custom disk partitioning scheme didn't work as expected...

wdennis
2018-01-29 18:07
(I know these aren't DRP/RackN problems)

wdennis
2018-01-29 18:23
Here *is* a DRP problem, though:

wdennis
2018-01-29 18:24
I have the `ipmi` plugin set up and (was) working; I have been trying to powercycle a machine like so:


wdennis
2018-01-29 18:25
Output looks OK to me, but - no powercycle...

wdennis
2018-01-29 18:25
If I do it "manually" via:

greg
2018-01-29 18:26
What version of DRP? What version of ipmi plugin? If you are at DRP tip, then you need to update the plugin provider or it won't get loaded.

wdennis
2018-01-29 18:26

wdennis
2018-01-29 18:26
it does work...

greg
2018-01-29 18:26
In the plugin view, does it show the ipmi plugin available?

wdennis
2018-01-29 18:27
On DRP 3.6 stable, I thought did NOT need to update plugins for that...

greg
2018-01-29 18:27
In the plugin-provider view, does it show ipmi installed?

greg
2018-01-29 18:27
You do not.

wdennis
2018-01-29 18:27
Yes, In UX "Plugins", `ipmi` is showing w/ blue checkmark

greg
2018-01-29 18:28
```drpcli machines action 00c933b9-b044-45e9-9c2e-d05abdd8c9c4 powercycle```

greg
2018-01-29 18:28
Shows you how to call it and what is required.

greg
2018-01-29 18:29
```drpcli machines runaction 00c933b9-b044-45e9-9c2e-d05abdd8c9c4 powercycle```

greg
2018-01-29 18:29
actually runs the action

wdennis
2018-01-29 18:32
And the student was enlightened

wdennis
2018-01-29 20:23
@zehicle More UX wonkiness: When I do an ipmi action on a host, it's throwing an error (but the ipmi does work):

wdennis
2018-01-29 20:24

vlowther
2018-01-29 20:25
welp, which machine wants the profile named ''?

wdennis
2018-01-29 20:33
@vlowther All my machines have a Profile block as so:
```
"Profile": {
  "Available": false,
  "Description": "",
  "Errors": [],
  "Meta": {},
  "Name": "",
  "Params": null,
  "ReadOnly": false,
  "Validated": false
},
```

greg
2018-01-29 20:34
yes - they do. The question is what do the machines Profiles array have in it.

wdennis
2018-01-29 20:34
Ah, let's get that then...

wdennis
2018-01-29 20:36
They all have something there...
```
[dradmin@dr-admin drp]$ drpcli machines list | jq '.[].Profiles'
[ "necla-ubuntu-default" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
```

wdennis
2018-01-29 20:37
The one that I got the screenshot error on was the one with `necla-ubuntu-default`

greg
2018-01-29 20:37
Do those exist?

wdennis
2018-01-29 20:37
yes they do

greg
2018-01-29 20:38
```drpcli profiles show necla-ubuntu-default``` returns something?

wdennis
2018-01-29 20:39
Yes, it exists

wdennis
2018-01-29 20:40
It's my "normal params" bag

greg
2018-01-29 20:42
okay - cool - it looks like it may be a UX bug

wdennis
2018-01-29 20:47
You folks know where's the best place to get help on preseed file settings? (specifically, `d-i partman/auto expert_recipe`)

vlowther
2018-01-29 20:48
I usually google for "debian preseed" and take my chances

wdennis
2018-01-29 20:48
Been there, done that :slightly_smiling_face:

vlowther
2018-01-29 20:48
yeah, that is as good as the docs get without reading you some Perl.

wdennis
2018-01-29 20:49
It's not taking my `/home` fs lvol spec...
```
d-i partman/auto expert_recipe string \
  custom_lvm:: \
  500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \
  128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \
  10240 20 1228800 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ testinstall } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
  10240 20 10240000000 ext4 $lvmok{ } mountpoint{ /home } lv_name{ home } in_vg{ testinstall } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
  50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ testinstall } method{ swap } format{ } .
```

vlowther
2018-01-29 20:50
yeah, I learned enough of that to get our default layout. That was years ago.

wdennis
2018-01-29 20:50
(and blissfully forgot it immediately thereafter :joy: )

vlowther
2018-01-29 20:51
Basically.

wdennis
2018-01-29 20:51
kickstart is _so_ much saner...

vlowther
2018-01-29 20:51
Though not with the urgency with which i forgot how to write Sendmail configuration files.

wdennis
2018-01-29 20:51
LOL

vlowther
2018-01-29 20:51
Before they had niceties like M4.

shane
2018-01-29 20:52
ewww ... sendmail .... I spent 2 weeks in classes with Eric Allman learning Sendmail from him ...

wdennis
2018-01-29 20:52
Well, looks like Debian IRC here I come...

vlowther
2018-01-29 20:53
Basically once I learned that the postfix mail server existed I promptly erased all traces of sendmail from my domain.

wdennis
2018-01-29 20:54
hear hear

shane
2018-01-29 20:56
the first open source commits I ever made were for Sendmail - to expand the queue from a single flat directory to a tiered N level structure for storing queued messages in ... since we were sending over 1 billion messages a quarter via 100 node sendmail cluster .... we constantly locked the machines up on triple/quadruple lookups in a single directory entry

zehicle
2018-01-29 20:56
Updates to Content UX should be live.

wdennis
2018-01-29 21:09
@shane nice

wdennis
2018-01-29 21:09
gives @shane UNIX War Medal

2018-01-29 22:20
hey, just checking out Digital Rebar as a potential replacement for our existing Foreman deployments. curious about the plugin system and how I can execute some post-provision actions, my google-fu is failing me and I can't find anything in the docs... can someone point me in the right?

2018-01-29 22:22
*right direction

shane
2018-01-29 22:22
well ... we're on the cusp of releasing an all-new plugin system that makes the current one completely obsolete ...

shane
2018-01-29 22:22
what are you trying to do specifically ? it may not require a plugin ... depending on what you want to do

2018-01-29 22:23
basically want to invoke a webhook on a remote system that we use for the rest of our automation

shane
2018-01-29 22:24
outgoing webhook during provisioning process ?

2018-01-29 22:24
tell that automation system 'hey a new physical node has just been provisioned, go do all the things, here is all the info you need'

shane
2018-01-29 22:25
you can do that fairly easily by adding in a Workflow Stage - which can trigger that webhook for you

shane
2018-01-29 22:25
that would (likely) be a post provisioning Stage, once you're all done, fire off the "phone home" sort of webhook

shane
2018-01-29 22:25
no need for plugin for that action

2018-01-29 22:26
yeah that's exactly what I'm looking for

2018-01-29 22:26
is there any examples of that? the o

2018-01-29 22:27
the only thing I can find about workflows in the docs is here http://provision.readthedocs.io/en/latest/doc/workflows.html

shane
2018-01-29 22:28
once you get Digital Rebar Provision up and running - there are several live examples you can look at ...

shane
2018-01-29 22:28
basically you'd write a Template that "does something" ... in "some language/script" you want on the provisioned OS ... you'd then add that as a Task to a Stage ...

shane
2018-01-29 22:29
that Stage becomes one of the last pieces in your Workflow

shane
2018-01-29 22:29
these are all keywords for you to take a look at our existing stuff in the Community Content

shane
2018-01-29 22:29
our doc isn't up to date w/ workflow and stages yet

2018-01-29 22:34
awesome I'll throw it on a VM tomorrow and check it out!

2018-01-29 22:34
thanks!

daniel.bernier
2018-01-29 22:47
@did the option 114 inside DRP but for some reason ONIE does not kick in for it

shane
2018-01-30 01:16
- Join us tomorrow (Tuesday Jan 30th) at 11:00 am PST for v010 of our Online Meetup. Topics: Versioned UX Endpoints, UX Content Versions, New Plugin System. Agenda: https://docs.google.com/document/d/1qe6ycKL2nJpNI9uJ0c1v5kyMWHXzfFyEm9d9x2Ptvfk



amontalban
2018-01-30 01:34
Nice, will try to join :+1:

zehicle
2018-01-30 02:13
note: @nmaludy that the stages mean that the nodes can make the webcall. If you restrict the webcall to the admin network then a plugin would be the choice. Stages are a simple way to do it and a good starting point. We can setup a call to discuss options if you'd like. You can also request a slack account (http://rackn.com/support/slack).

nmaludy
2018-01-30 11:30
has joined #community201801

romain.lafontaine
2018-01-30 13:29
I'll try to stay as much as possible, interested in the plugin system and the 3.7.0 ^^

zehicle
2018-01-30 13:59
Welcome @nmaludy

nmaludy
2018-01-30 14:10
@zehicle thanks!

shane
2018-01-30 14:54
- today's meetup we'd like to collect use case info around the new Plugin System ... if you think you have any interest in extending Digital Rebar through the plugins - come talk to us today, we'd love your perspective ... (see meetup link above for joining details)

zehicle
2018-01-30 16:05
That includes the collect inventory use cases that have been coming up lately.

romain.lafontaine
2018-01-30 16:05
Just to be sure, what's the timezone attached to the meetup 11AM ?

shane
2018-01-30 16:13
11am PST

romain.lafontaine
2018-01-30 16:14
Damn...

zehicle
2018-01-30 16:46
If you join the meetup or , both will make sure you get the invites on your calendar https://www.meetup.com/digitalrebar

romain.lafontaine
2018-01-30 16:46
I'll, I was guessing that the meetup page shows local time... I'm Tetris-ing my calendar

zehicle
2018-01-30 18:57
loves Tetris as a verb! well done

lae
2018-01-30 19:49
ah I was wondering where this came from lol https://pbs.twimg.com/media/DU0BscnVQAE7yqP.jpg

lae
2018-01-30 19:52
actually never mind, I'm not in either :thinking_face:

wdennis
2018-01-30 19:59
All the things batteries have died... good meetup tho!

wdennis
2018-01-30 20:00
And sorry (not sorry :) I'm such a UX nudge

shane
2018-01-30 20:03
:slightly_smiling_face:

vlowther
2018-01-30 23:24
Sorry I missed it, but my kids have cleaner teeth now!

wdennis
2018-01-30 23:26
We missed you, epic in-mem fs layers discussion

wdennis
2018-01-31 00:57
Freekin' preseed `partman/auto expert_recipe` wrestling...

richard.burrows
2018-01-31 02:24
has joined #community201801

wdennis
2018-01-31 03:25
@greg You still around?

wdennis
2018-01-31 03:47
Actually, anyone that knows anything about Debian (Ubuntu) preseed `partman/auto expert_recipe`? No matter what I've tried, I always get the same (wrong) result...

greg
2018-01-31 04:08
Not really. Can help some tomorrow maybe

wdennis
2018-01-31 04:10
OK, I'm out as well, defeated for the night... Tomorrow's another day

wdennis
2018-01-31 17:55
And now it's another day.. time for more preseed wrasslin'

wdennis
2018-01-31 19:36
Oh my lord, this is harder than it should be...

wdennis
2018-01-31 20:49
So, I have the following in my machine's preseed:


wdennis
2018-01-31 20:50
However, when the install completes, I'm getting this...


wdennis
2018-01-31 20:53
Why u make no /home LV???????

wdennis
2018-01-31 20:54
Anyone out there in DRP with a clue, I'm buying...

greg
2018-01-31 21:04
@wdennis - I think your root partition is min size 10G and max size 100G, with a priority of 102401. Your home drive is min size 10G and max size lots, priority 102433. I think that means it will make the root full size and then have no room for /home.

greg
2018-01-31 21:04
It appears you have 100GB drive.

greg
2018-01-31 21:04
You may want to try and change 102400 on the / part to 51200 and see if you get a home drive.

greg
2018-01-31 21:04
Just for a test.

wdennis
2018-01-31 21:05
It's a 1TB (1000GB) drive...

greg
2018-01-31 21:05
hmm

greg
2018-01-31 21:06
not sure.

greg
2018-01-31 21:06
but I just read the priority should be between the min and max

wdennis
2018-01-31 21:06
Yes, it's basically a weighting value from what I understand
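
For readers following the min / priority / max discussion above, here is a rough Python sketch of how the weighting appears to work, based on the size-distribution loop described in partman-auto's recipe documentation as best recalled. It has not been checked against the installer source. The names `Part` and `distribute` are made up for illustration, the 1,000,000 MB disk and the huge /home maximum are placeholder values, and clamping the weight at zero is a choice made here, not something taken from the installer. It ignores LVM volume groups, swap and rounding to cylinders, so it cannot by itself explain a missing /home LV.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    min_mb: int      # minimal size
    priority: int    # weighting value, normally between min and max
    max_mb: int      # maximal size


def distribute(parts, free_mb):
    """Share free_mb among parts, weighted by (priority - min), capped at max."""
    sizes = [p.min_mb for p in parts]
    # Assumption: clamp the weight at 0 so a priority below min never shrinks a part.
    factors = [max(0, p.priority - p.min_mb) for p in parts]
    factsum = sum(factors)
    if factsum == 0 or sum(sizes) >= free_mb:
        # Nothing to share out: every part stays at its minimum.
        return {p.name: s for p, s in zip(parts, sizes)}

    changed = True
    while changed:
        changed = False
        # Space still unassigned at the start of this pass.
        leftover = free_mb - sum(sizes)
        for i, p in enumerate(parts):
            grown = sizes[i] + leftover * factors[i] // factsum
            grown = min(grown, p.max_mb)
            if grown != sizes[i]:
                sizes[i] = grown
                changed = True
    return {p.name: s for p, s in zip(parts, sizes)}


if __name__ == "__main__":
    # Numbers quoted in the chat (in MB); the disk size and /home max are placeholders.
    layout = [
        Part("root", 10240, 102401, 102400),
        Part("home", 10240, 102433, 10**9),
    ]
    print(distribute(layout, 1_000_000))
```

In this model a part that hits its maximum simply stops growing and later passes hand the remaining space to the others, which is why the priority acts as a weight and not as a size.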

wdennis
2018-01-31 21:07
I have no idea where to get help on this... #debian in IRC yielded nothing...

greg
2018-01-31 21:07
I'd try changing the 102401 priority to 102399.

wdennis
2018-01-31 21:08
There's random blog posts, etc. but none with any magic for me

vlowther
2018-01-31 21:08
da

vlowther
2018-01-31 21:09
That is the problem with preseeds

vlowther
2018-01-31 21:09
Horrible docs and horrible perl when the docs are not good enough.

wdennis
2018-01-31 21:09
I made a test server, installed by hand, with the correct partitioning, so I know it can be done; I just don't know how via preseed...

wdennis
2018-01-31 21:10
Does anyone know if Ubuntu changed the Debian preseed logic, or if they stick strictly to Debian's?

wdennis
2018-01-31 21:10
(That would help with where to ask...)

greg
2018-01-31 21:12
not sure. You may also need to make sure root is first and then home

greg
2018-01-31 21:12
in the list.

wdennis
2018-01-31 21:13
I did play with that... It was that way, then I thought to flip them and see if that did anything different

wdennis
2018-01-31 21:14
I've done like 10 installs on the server w/ different partman params...

wdennis
2018-01-31 21:15
I'm thinking that when (if ever) I understand this and get it working, it would be cool to have a community library (content pack) of these disk partitioning layouts (both preseed and kickstart) to choose from

wdennis
2018-01-31 21:21
@vlowther Do you know where to read the source code that handles the preseed processing?

wdennis
2018-01-31 21:22
(yes, I've sunk to that level :disappointed:)

vlowther
2018-01-31 21:36
uh

vlowther
2018-01-31 21:36
Spread out across the repos in https://anonscm.debian.org/cgit/d-i/

vlowther
2018-01-31 21:36
$DEITY help us all

wdennis
2018-01-31 21:41
Where is your $DEITY now???

vlowther
2018-01-31 21:42
That is some heavy theology that we should probably avoid. :slightly_smiling_face:

wdennis
2018-01-31 21:50
Well, all I know is I'm in $PARTMAN_HELL :rage::imp::tired_face:

wdennis
2018-02-01 01:14
OK, out of desperation, posted a question on Ubuntu Launchpad... https://answers.launchpad.net/ubuntu/+question/663937

wdennis
2018-02-01 01:15
Beginning to think the LVM partitioner only supports root and swap LVs

shane
2018-02-01 01:44
wonders if chucking partman out the window is a better idea and just using `parted`, or other tools as part of a `d-i preseed/late_command string...` method

zehicle
2018-02-01 02:46
w00t new UX feature rolling through testing allows you to set Machine icon & color

wdennis
2018-02-01 06:45
Now investigating this: https://launchpad.net/kickseed

lae
2018-02-01 06:54
kickstart to preseed has several limitations

lae
2018-02-01 06:54
last i looked

lae
2018-02-01 06:55
also i'll take a closer look at your preseed later, heading out atm