Monday, April 30, 2018

[ansible] ansible-vault command reference






Common commands

ansible-vault is Ansible's built-in mechanism for encrypting and decrypting files; at the moment it can only operate on whole files.

  • Encrypt a file: ansible-vault encrypt group_vars/all
  • Decrypt a file: ansible-vault decrypt group_vars/all
  • View a file: ansible-vault view group_vars/all (similar to less group_vars/all)
  • Edit a file: ansible-vault edit group_vars/all (similar to vim group_vars/all)
  • Change the password: ansible-vault rekey group_vars/all
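An encrypted vars file is decrypted transparently at playbook run time, so nothing else in the playbook has to change. A hypothetical sketch (site.yml and db_password are made-up names for illustration):

```yaml
# site.yml (hypothetical): group_vars/all was encrypted with ansible-vault,
# but can still be referenced like any plain vars file
- hosts: all
  vars_files:
    - group_vars/all        # vault-encrypted
  tasks:
    - debug:
        var: db_password    # assumed to be defined inside group_vars/all
```

Run it with ansible-playbook site.yml --ask-vault-pass, or point --vault-password-file at a file that holds the vault password.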

Friday, April 20, 2018

[OpenStack][Neutron] Name resolution for Nova instances

The IP configuration a Nova instance receives over DHCP is handled by neutron-dhcp-agent, which is implemented internally with dnsmasq. Domain name resolution for Nova instances can therefore be set up in the following cases:


[Case 1: use a designated nameserver on a specific subnet]

  • Pass it when creating the subnet:
    neutron subnet-create --dns-nameserver DNS_RESOLVER
  • Or update an existing subnet:
    neutron subnet-update --dns-nameserver DNS_RESOLVER SUBNET_ID_OR_NAME


[Case 2: all subnets share the same nameservers]
  • Edit dhcp_agent.ini:
[DEFAULT]
dnsmasq_dns_servers = DNS_RESOLVER

[Case 3: all subnets use the host's DNS settings]

  • Edit dhcp_agent.ini:

[DEFAULT]
dnsmasq_local_resolv = True


https://docs.openstack.org/newton/networking-guide/config-dns-res.html

Wednesday, April 18, 2018

[Docker][Ubuntu] Running the Docker engine through an HTTP proxy on Ubuntu

[Ubuntu 16.04]
Services are managed by systemd in this release, so the proxy has to be configured as follows:
  1. sudo mkdir -p /etc/systemd/system/docker.service.d
  2. sudo vim /etc/systemd/system/docker.service.d/http-proxy.conf
  3. Add your proxy settings to the file:
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"
  4. sudo systemctl daemon-reload
  5. sudo systemctl restart docker

[Ubuntu 14.04]
  1. sudo vim /etc/default/docker # edit your docker runtime context file
DOCKER_OPTS="--log-opt max-size=500m --insecure-registry {docker-registry} --dns 8.8.8.8" 
export http_proxy="http://{http-proxy}:8080"
export https_proxy="https://{http-proxy}:8080"
export no_proxy="localhost,127.0.0.1"
  2. sudo service docker restart




[OpenStack][GPU] GPU support in OpenStack

Current approaches to GPU support in OpenStack:

  • PCI passthrough
    – Nova VM-based compute (e.g. Libvirt+KVM) with PCI passthrough
  • Ironic
    – provision GPU-equipped compute nodes through Ironic
  • CPU pinning and NUMA
  • vGPU
  • RDMA (Remote Direct Memory Access)
    – a technique that lets the NIC on one host directly access memory on another host

vGPU

If the CPU supports Intel GVT-g, fully virtualized vGPUs are available. Performance is slightly below PCI passthrough, but a single GPU can be shared among multiple VMs (up to 15).



[How to enable vGPU]
Nova
1. Enable the vGPU type on nova-compute:
[devices]
enabled_vgpu_types = nvidia-35
2. On the Nova controller, configure a flavor to request one virtual GPU:
$ openstack flavor set vgpu_1 --property "resources:VGPU=1"

RDMA(Remote Direct Memory Access) 



GPU on K8S

  • nvidia-docker is supported starting with K8S 1.10
  • a pod consumes an entire GPU card; the card cannot be subdivided



[References]

[OpenStack][kolla] Kolla highlights in the Queens/Rocky releases

A quick summary of Kolla's highlights in Queens/Rocky:

Lighter-weight image builds

  • Starting in Queens, Docker image builds support squashing layers, merging multiple image layers into one
  • Docker multi-stage builds are expected to land in Rocky
  • Together these techniques let builds lean more on layering and reuse, trimming image build and transfer times.

Noticeably better Ceph support

Rolling update

  • Some services now support upgrades with minimal downtime
  • Keystone and Cinder were finished in Queens; the remaining services are being worked on for Rocky

Developer mode

  • Queens officially supports a "developer mode" deployment (pieces of these features have existed since Pike)
  • Setting *_dev_mode=true bind-mounts each project's source tree directly into the corresponding container path,
    so developers can edit the code in place.

Healthcheck and monitoring support

  • Kolla has experimented with several monitoring solutions before, without great results
  • As Prometheus matures, the Kolla community plans to provide Prometheus monitoring in Rocky.
    The current design heads toward Prometheus + Alertmanager + Gnocchi:
    Prometheus collects the metrics, Alertmanager handles alerting, and Gnocchi stores the monitoring data.

Database backup jobs

  • planned for the Queens/Rocky timeframe

Vitrage support

  • Vitrage deployment was added in Queens.
  • Vitrage is OpenStack's RCA (Root-Cause Analysis) project: it takes in OpenStack alarms, events, and so on,
    analyzes them centrally, and presents reports on a dashboard to make operating OpenStack easier.

Blazar support

  • Queens adds Blazar, a resource-reservation service: users can request a reservation of resources for a period of time,
    for later use.

Further details:

[OVS] OVS debugging skills




[OVS Debugging Skills]
  1. Check the basic connectivity state first: sudo ovs-vsctl show
  2. Use tcpdump or a similar tool to confirm the packets are actually being sent
  3. Use ovs-ofctl dump-flows {bridge} or ovs-dpctl dump-flows to check whether the packets hit any flows


Tuesday, April 17, 2018

[Linux] How to kill the session occupying a port


The command to kill the process occupying a port

lsof

sudo lsof -t -i {protocol}:{port} | xargs sudo kill -9  # {protocol} may be omitted; -t prints bare PIDs so xargs/kill receive clean input

For example: sudo lsof -t -i tcp:8080
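A minimal end-to-end sketch (assumes python3, lsof, and GNU xargs are available; port 8123 is an arbitrary choice): start a throwaway listener, then kill whatever holds the port.

```shell
# Start a dummy listener, then kill the PID(s) lsof reports for the port.
if command -v lsof >/dev/null 2>&1; then
  python3 -m http.server 8123 >/dev/null 2>&1 &
  sleep 1
  lsof -t -i tcp:8123 | xargs -r kill -9   # -r: do nothing when no PID is found (GNU xargs)
  echo "port 8123 cleared"
else
  echo "lsof not installed; skipping demo"
fi
```

Without -t, lsof prints full table rows and kill would receive garbage instead of PIDs.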

Monday, April 16, 2018

[ManageIQ] A quick look at ManageIQ

[Overview]
ManageIQ, maintained by Red Hat, is a graphical management tool that can integrate OpenStack, VMware, oVirt, and K8S

[Installation]

Docker

1. pull manageIQ container
     $ sudo docker pull manageiq/manageiq:gaprindashvili-2

2. start service
     $ sudo docker run --privileged -d -p 8443:443 manageiq/manageiq:gaprindashvili-2

3. Login service
     default login:  admin/smartvm


[Querying the DB]
ManageIQ uses PostgreSQL; there are two ways to inspect the database:

  1. The ops/explorer page in the GUI has a Database tab
  2. Through the container:
    • sudo docker exec -it manageiq bash # enter the manageiq container
    • su -l postgres # switch to the postgres user
    • psql
    • \c vmdb_production # connect to vmdb (the ManageIQ database)
    • \d # list the tables
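Once inside psql, a query along these lines can poke at the inventory (a sketch; the vms table and its columns are assumptions based on ManageIQ's schema):

```sql
-- hypothetical: list the first few VMs ManageIQ has discovered
\c vmdb_production
SELECT id, name, vendor FROM vms LIMIT 10;
```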


Installation demo screenshots



[java][gradle] Common Gradle commands





Common commands

Init Project

  • gradle init
  • gradle init --type java-library

Build Project

  • gradle clean build

Show all supported gradle tasks

  • gradle tasks

Static Code Analysis (YOU MUST DO THIS)

Unit Test

  • gradle test
  • gradle build -x test // skip tests
    the results can be inspected under the build/reports/ folder after the run

Integration Test

  • gradle integrationTest
  • gradle build -x integrationTest // skip integration tests
    the results can be inspected under the build/reports/ folder after the run

Liquibase

  • gradle update // run DB migrations up to the latest changeset
  • gradle validate // checks the changelog for errors
  • gradle dropAll // drops all database objects owned by the user
  • gradle rollback // rolls the database back to the state it was in when the tag was applied
NOTE
  • SQL file contents must use Unix line endings (\n), otherwise Liquibase may fail to read them
  • If Liquibase is bootstrapped through the Spring framework, add the following bean to the Spring configuration (e.g. configs.xml); see src/main/resources/liquibase-springintegration.xml under the root project for a reference:
  • <bean id="liquibase" class="liquibase.integration.spring.SpringLiquibase">
    <property name="dataSource" ref="dataSource" />
    <property name="changeLog" value="classpath:db/migration/changelog-master.yml" />
    <property name="dropFirst" value="true" />
    </bean>
  • Put the SQL files in the src/main/resources/db/migration folder
  • The Liquibase changelog (src/main/resources/db/changelog-master.xml) must declare the following:
    databaseChangeLog:
    - preConditions:
      - runningAs:
          dbms: postgresql
          username: postgres
    - changeSet:
        id: 01_00_c_global_schema_20151005
        author: steed
        changes:
        - sqlFile:
            path: ../migrate_sql/01_00_c_global_schema_20151005.sql
            relativeToChangelogFile: true
        rollback:
        - sqlFile:
            path: ../migrate_sql/01_00_c_global_schema_20151005_rollback.sql
            relativeToChangelogFile: true
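For the record, the update/validate/dropAll/rollback tasks above come from the Liquibase Gradle plugin; a minimal build.gradle activity block might look like this (the plugin version, JDBC URL, and credentials are illustrative assumptions):

```groovy
// hypothetical sketch of wiring the liquibase-gradle plugin
plugins {
    id 'org.liquibase.gradle' version '2.0.4'
}

liquibase {
    activities {
        main {
            changeLogFile 'src/main/resources/db/migration/changelog-master.yml'
            url          'jdbc:postgresql://localhost:5432/mydb'   // assumed target DB
            username     'postgres'
            password     'postgres'
        }
    }
}
```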

Packaging Jar/War

  • gradle jar
  • gradle war

Show Project Layout

  • gradle projects

Show Project dependencies


  • gradle dependencies

[OpenStack][DevStack] A pip version-number pitfall in DevStack


While installing the Pike release of DevStack recently, the stack simply refused to come up. The logs showed pip being upgraded to the freshly released pip 10 from pypi.org, which led me to the following interesting code fragment:

pip_version=$(python -c "import pip; \
print(pip.__version__.strip('.')[0])")

https://github.com/openstack-dev/devstack/blob/stable/pike/inc/python#L336

pip.__version__.strip('.') only strips dots from the ends of the version string, so taking [0] afterwards returns its first character, e.g. '9' from '9.x.y'. For versions below 10 that happens to work, but with pip 10 it yields '1', and the whole installation blows up. Here is my test session:


>>> import pip
>>> pip.__version__
'9.0.3'
>>> pip.__version__.strip('.')
'9.0.3'
>>> pip.__version__.strip('.')[0]
'9'
>>> x='10.0.0'
>>> x[0]
'1'
>>> pip.__version__.split('.')[0]
'9'
>>> x.split('.')[0]
'10'
>>>  
So split('.') is what should be used to extract the major version (strip() and split() are easy to mix up <_._>). I was weighing whether to report a bug, but the community has already fixed it in a newer revision; see:

  • https://github.com/openstack-dev/devstack/commit/f99d1771ba1882dfbb69186212a197edae3ef02c
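The fix boils down to a tiny helper: split on dots and cast to int so the comparison is numeric, not lexical. A minimal sketch (the function name is mine):

```python
def pip_major_version(version):
    """Extract the major version from a pip version string, e.g. '10.0.0' -> 10."""
    return int(version.split('.')[0])

print(pip_major_version('9.0.3'))    # -> 9
print(pip_major_version('10.0.0'))   # -> 10
```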

[OpenStack][Kolla] Building and deploying OpenStack with Kolla

[Kolla overview]
          Kolla started out as part of the TripleO project, but unlike TripleO it deploys OpenStack on Docker containers. Since the Ocata release, Kolla has been split into two parts, kolla and kolla-ansible/kolla-k8s:
  • kolla: builds production-ready images
  • kolla-ansible/kolla-k8s: deploys containerized OpenStack


[Generating kolla-build.conf and configuring the image build]
  1. Install tox: sudo pip install tox
  2. Change into the kolla project root: cd kolla/
  3. Generate kolla-build.conf: tox -e genconfig  # written to etc/ by default
  4. Edit kolla-build.conf:
    • To build from source:
      [heat-base]
      type = git
      location = https://github.com/openstack/heat.git
      reference = stable/pike
    • To build the Dockerfiles through a proxy:
      include_header = ./.header  # path of the custom header file
      include_footer = ./.footer  # path of the custom footer file

      [.header contents]
      ARG http_proxy=http://{proxy}:8080
      ARG https_proxy=https://{https-proxy}:8080
      ARG no_proxy=localhost

      [.footer contents]
      ARG http_proxy=""
      ARG https_proxy=""
      ARG no_proxy=""
     
      [Building the Kolla images]
      1. Option 1: python tools/build.py -b ubuntu {{service}}
        • e.g. sudo -E tools/build.py -b ubuntu glance neutron
      2. Option 2: kolla-build -b ubuntu {{service}}
        • the kolla-build command can be installed with sudo pip install kolla/


      [Kolla inventory]
      • network_interface - While it is not used on its own, this provides the required default for other interfaces below.
      • api_interface - This interface is used for the management network. The management network is the network OpenStack services uses to communicate to each other and the databases. There are known security risks here, so it’s recommended to make this network internal, not accessible from outside. Defaults to network_interface.
      • kolla_external_vip_interface - This interface is public-facing one. It’s used when you want HAProxy public endpoints to be exposed in different network than internal ones. It is mandatory to set this option when kolla_enable_tls_external is set to yes. Defaults to network_interface.
      • storage_interface - This is the interface that is used by virtual machines to communicate to Ceph. This can be heavily utilized so it’s recommended to put this network on 10Gig networking. Defaults to network_interface.
      • cluster_interface - This is another interface used by Ceph. It’s used for data replication. It can be heavily utilized also and if it becomes a bottleneck it can affect data consistency and performance of whole cluster. Defaults to network_interface.
      • tunnel_interface - This interface is used by Neutron for vm-to-vm traffic over tunneled networks (like VxLan). Defaults to network_interface.
      • neutron_external_interface - This interface is required by Neutron. Neutron will put br-ex on it. It will be used for flat networking as well as tagged vlan networks. Has to be set separately.
      • dns_interface - This interface is required by Designate and Bind9. Is used by public facing DNS requests and queries to bind9 and designate mDNS services. Defaults to network_interface.
      • bifrost_network_interface - This interface is required by Bifrost. Is used to provision bare metal cloud hosts, require L2 connectivity with the bare metal cloud hosts in order to provide DHCP leases with PXE boot options. Defaults to network_interface.
      • Note that if an interface is an OVS bridge, the interface name must be written br_ethX instead of br-ethX (note the underscore); you can check what ansible sees with ansible -m setup -i {inventory} all
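To make the interface variables above concrete, a hypothetical globals.yml fragment might look like this (NIC names and the VIP address are made up):

```yaml
# hypothetical globals.yml sketch
network_interface: "eth0"             # default for all interfaces below
api_interface: "{{ network_interface }}"
tunnel_interface: "{{ network_interface }}"
neutron_external_interface: "eth1"    # must be set separately
kolla_internal_vip_address: "10.10.10.254"
```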



        [Kolla deployment]
              Since kolla-ansible is the project most of the community uses today, the deployment commands below are for kolla-ansible.
        1. Check the inventory
        2. Check whether the services' ports are already in use: tools/kolla-ansible prechecks
        3. Deploy: sudo -E tools/kolla-ansible -i {{inventory for the target environment}} --configdir {{config directory}} --passwords {{passwords file for the environment}} deploy
        4. Neutron network settings are adjusted in ansible/group_vars/all inside the kolla-ansible project

        5. A kolla-ansible service role generally deploys through register => config => bootstrap => restart service
          • register: register the endpoints
          • config: generate the configuration
          • bootstrap: migrate the DB
          • restart service
          [Directory structure of each kolla-ansible role]
          • bootstrap: creates the service's database, i.e. runs *-manage db sync
          • bootstrap-service: starts the service container, but in the foreground
          • handlers/main.yml: declares the services to run as background daemons; the main config files and start parameters are defined here
          • kolla ships its own modules, kolla_toolbox and kolla_image, under ansible/library and loads them dynamically
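As an illustration of the handlers/main.yml pattern, a stripped-down handler might look like this (the kolla_docker module is Kolla's own; the service and variable names here are illustrative):

```yaml
# hypothetical handlers/main.yml sketch for a kolla-ansible role
- name: Restart glance-api container
  kolla_docker:
    action: "recreate_or_restart_container"
    name: "glance_api"
    image: "{{ glance_api_image_full }}"
    volumes: "{{ glance_api_default_volumes }}"
```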

          [Actions supported by kolla-ansible]
          • deploy
          • precheck
          • destroy
          • reconfigure
          • pull
          [Redeploying with kolla-ansible]
          • Manually:
            1. sudo docker ps -qa | xargs sudo docker rm -f  # remove all containers
            2. sudo docker volume ls -q | xargs sudo docker volume rm  # wipe all docker volume data
            3. sudo service docker restart  # restart the docker service
          [kolla-ansible database recovery]

          • When MariaDB's Galera complains about broken connections, run tools/kolla-ansible -i {inventory} --configdir {configdir} --passwords {password} mariadb_recovery

          [Evolution of the kolla-ansible playbooks]
          • Starting with the Ocata release, each role's start tasks are handed off to handlers, which deploy.yml triggers via meta: flush_handlers


          [Kolla HA]
          [Kolla and OVS]

          • OVS bridge setup: https://goo.gl/MHGSWG

            - name: Ensuring OVS bridge is properly setup
              command: docker exec openvswitch_db /usr/local/bin/kolla_ensure_openvswitch_configured {{ item.0 }} {{ item.1 }}
              register: status
              changed_when: status.stdout.find('changed') != -1
              when:
                - inventory_hostname in groups["network"]
                  or (inventory_hostname in groups["compute"] and computes_need_external_bridge | bool)
              with_together:
                - "{{ neutron_bridge_name.split(',') }}"
                - "{{ neutron_external_interface.split(',') }}"



          [Kolla community development]


          [Common Kolla problems]
          1. Hostname has to resolve to IP address of api_interface
          TASK [prechecks : fail] ********************************************************************************************************************************************************
          task path: /home/ubuntu/kolla/ansible/roles/prechecks/tasks/port_checks.yml:448
          [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: '{{ hostvars[item['item']]['ansible_' +
          hostvars[item['item']]['api_interface']]['ipv4']['address'] }}' not in '{{ item.stdout }}'

          fatal: [10.144.192.36]: FAILED! => {
              "msg": "The conditional check ''{{ hostvars[item['item']]['ansible_' + hostvars[item['item']]['api_interface']]['ipv4']['address'] }}' not in '{{ item.stdout }}'' failed. The error was: Invalid conditional detected: EOL while scanning string literal (<unknown>, line 1)\n\nThe error appears to have been in '/home/ubuntu/kolla/ansible/roles/prechecks/tasks/port_checks.yml': line 448, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- fail: msg=\"Hostname has to resolve to IP address of api_interface\"\n  ^ here\n"
          [ANS]
                 This commonly shows up in Kolla all-in-one deployments. Edit /etc/hosts on the remote host and make sure the IP bound to the api_interface NIC (e.g. eth0) and the hostname resolve to each other.

          1. Confirm that both the 127.0.0.1 localhost line and a <mgmt ip> <hostname> line exist; if it still fails, try removing any 127.0.1.1 localhost entry
          2. If an ansible error appears, check that the ansible version matches requirements: before Newton, ansible >= 2.1, <= 2.2 is recommended (our lab environment runs 2.1.6). A Kolla reviewer also suggested ansible < 2.4 for the Pike and Queens branches of kolla, which currently has the best coverage
          2. Ansible provisioning error caused by a group missing from the inventory
          TASK [nova : Ensuring config directories exist]
          fatal: [10.144.192.36]: FAILED! => 
          {"msg": "The conditional check 'inventory_hostname in groups[item.value.group]' failed. 
          The error was: error while evaluating conditional 
          (inventory_hostname in groups[item.value.group]): Unable to look up a name or 
          access an attribute in template string 
          ({% if inventory_hostname in groups[item.value.group] %} True {% else %} False {% endif %}).\n
          Make sure your variable name does not contain invalid characters like '-': 
          argument of type 'StrictUndefined' is not iterable\n
          \nThe error appears to have been in '/home/ubuntu/kolla-ansible/ansible/roles/nova/tasks/config.yml': 
          line 13, column 3, but may\n
          be elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:
          \n\n\n- name: Ensuring config directories exist\n  ^ here\n"}
              to retry, use: --limit @/home/ubuntu/kolla-ansible/ansible/site.retry
          [ANS] Check whether any group is missing from the inventory; failing that, consider whether the ansible version is the problem


          [References]

          [OpenStack][Ceilometer] event.sample in the Pike release of ceilometer



          1. The ceilometer collector has been removed and is no longer supported as of Pike. The collector lagged when pushing data to the backend, so for better performance the dispatchers are now used directly.

          2. The ceilometer database (MongoDB) was dropped after Ocata in favor of the Gnocchi backend; the pipeline configuration lives in pipeline.yml.

          [Linux] Safely cleaning up a full /boot partition

          A simple, reasonably safe way to purge old kernels (still, check that it suits your situation first):

          sudo dpkg --list 'linux-image*'| awk '{ if ($1=="ii") print $2}'| grep -v $(uname -r) | while read -r line; do sudo apt-get -y purge $line;done;sudo apt-get autoremove; sudo update-grub

          [Explanation] The command inverts the selection so that every installed kernel package except the one currently running gets purged, then autoremoves the now-unneeded packages and runs update-grub to refresh the boot menu. Make sure you are already booted into the newest kernel before running it.

                  https://gist.github.com/ipbastola/2760cfc28be62a5ee10036851c654600#case-ii-cant-use-apt-ie-boot-is-100-full
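The filtering step can be sanity-checked without touching dpkg; with a fake package list, grep -v $(uname -r) behaves like this (the package names below are made up):

```shell
# Simulate the selection: keep every kernel package except the "running" one
current="4.4.0-119-generic"
printf '%s\n' \
  linux-image-4.4.0-112-generic \
  linux-image-4.4.0-119-generic \
  linux-image-4.4.0-116-generic \
  | grep -v "$current"
# -> prints only the 112 and 116 packages
```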

          [OpenStack][Qinling] An introduction to "Qinling", OpenStack's Function-as-a-Service (serverless) project

          At the OpenStack Summit in Sydney last year (2017) I saw that the community was already working on the currently hot serverless area. It is still at an early stage, but worth playing with.


          [Qinling project overview]

                  The goal of the Qinling project is to become OpenStack's Function-as-a-Service, providing a platform for serverless functions (in the style of AWS Lambda, Google Cloud Functions, …)

                  The project was started by Catalyst IT and was presented at the Sydney summit; see the video links below [1][2]





          [Qinling project features]
          • supports multiple COEs (e.g. K8S / Swarm …)
          • supports different storage backends: local / swift / s3

          [Qinling installation]

          DevStack

          [[local|localrc]]
          RECLONE=True
          enable_plugin qinling https://github.com/openstack/qinling
          
          LIBS_FROM_GIT=python-qinlingclient
          DATABASE_PASSWORD=password
          ADMIN_PASSWORD=password
          SERVICE_PASSWORD=password
          SERVICE_TOKEN=password
          RABBIT_PASSWORD=password
          LOGFILE=$DEST/logs/stack.sh.log
          LOG_COLOR=False
          LOGDAYS=1
          
          ENABLED_SERVICES=rabbit,mysql,key,tempest



          [Qinling references]
          [1] Make your application Serverless: https://www.youtube.com/watch?v=NmCmOfRBlIU

          [2] (demo) Qinling - Function as a Service in OpenStack: https://www.youtube.com/watch?v=K2SiMZllN_A
          [3] Qinling source: https://github.com/openstack/qinling
          [4] http://qinling.readthedocs.io/en/latest/




          [OpenStack][Ironic] Ironic highlights in the Queens release

          1. Ironic rescue mode
          Starting with Queens, Ironic supports rescue mode: users can log in with a rescue password to troubleshoot an instance. (Nova instances have long had a rescue mode, so Ironic catches up with it in Queens.)

          2. Traits API support
          Nova now has the Placement service (a mechanism for registering, tracking, and scheduling compute resources). Starting with Queens, Ironic adds a traits API so that trait information can be registered with Nova's Placement API.
          Placement collects the various physical resources, exposes detailed information about them, and manages allocation and availability-zone associations, so the new traits API gives Nova the ability to account for and schedule Ironic resources.

          3. Neutron networking-baremetal
          Neutron has also started supporting bare metal. A first look shows fixes for incorrect port states, plus some routed-network functionality.