2023/02/28

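Started getting ntopng alerts a little after 3 a.m. today.

Logged into the host and found the process was gone.

Restarting it did not help.

Checked the log.

Sure enough, ntopng had been updated,

and it would not start after the update.

Then checked the ntopng log

and found that this update requires libbpf.so.0: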


Feb 28 08:09:17 W-ntopng-ubuntu-2004 ntopng[3247]: /usr/bin/ntopng: error while loading shared libraries: libbpf.so.0: cannot open shared object file: No such file or directory

Feb 28 08:09:22 W-ntopng-ubuntu-2004 ntopng[3272]: /usr/bin/ntopng: error while loading shared libraries: libbpf.so.0: cannot open shared object file: No such file or directory

Feb 28 08:09:28 W-ntopng-ubuntu-2004 ntopng[3294]: /usr/bin/ntopng: error while loading shared libraries: libbpf.so.0: cannot open shared object file: No such file or directory

 

apt install libbpf0
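
A quick way to confirm this kind of failure, before or after installing the package, is to ask the dynamic linker which libraries it cannot resolve. A minimal sketch: the helper name missing_libs is made up for illustration, and it only filters the text that ldd prints.

```shell
#!/bin/sh
# missing_libs: read `ldd` output on stdin and print the name of every
# shared library the dynamic linker could not resolve.
# (Hypothetical helper name; it only filters text, it does not run ldd.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage, assuming the binary path from the log above:

ldd /usr/bin/ntopng | missing_libs

No output means every library was found.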

Working normally now. Will keep an eye on it.

2023/02/22

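Yesterday a friend asked me why a Proxmox guest would not boot.

He did not know what the problem was.

Logged in to take a look.

Found one guest with several disks attached,

each of them sized at 2T,

and with several snapshots taken on top.

As a result the guest's disks had actually ballooned to 5T or more

and used up all the space.

That is why it could not boot.

Because the disks are in qcow2 format,

the snapshots grow inside the original image file.

What puzzles me:

with that many 2T disk files on one guest,

doesn't the user notice poor performance?

The fix was to delete the old snapshots.

Will keep an eye on it.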
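
Since the root cause was storage quietly filling up, a small check on the df output would have flagged it earlier. A minimal sketch: the helper name full_filesystems and the 90 percent threshold are assumptions for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# full_filesystems LIMIT: read `df -P` output on stdin and print every
# mount point whose usage is at or above LIMIT percent.
# (Hypothetical helper; assumes mount points without spaces.)
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "97%" -> "97"
        if (use + 0 >= limit + 0) print $6
    }'
}
```

Usage:

df -P | full_filesystems 90

To see which snapshots live inside a given qcow2 image, qemu-img snapshot -l followed by the image path lists them.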

2023/02/13

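This morning when I tried to get into the ntop management UI,

the login failed after entering the username and password.

Logged into the OS and found the disk was full.

Then checked the log.

It was full of messages like the following, which had filled up the disk: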

Feb 13 08:14:29 W-ntopng ntopng[286]: 13/Feb/2023 08:14:29 [SQLiteAlertStore.cpp:151] ERROR: SQL Error: database disk image is malformed

Feb 13 08:14:29 W-ntopng ntopng[286]: INSERT INTO flow_alerts (alert_id, interface_id, tstamp, tstamp_end, severity, ip_version, cli_ip, srv_ip, cli_port, srv_port, vlan_id, is_cli_attacker, is_cli_victim, is_srv_attacker, is_srv_victim, proto, l7_proto, l7_master_proto, l7_cat, cli_name, srv_name, cli_country, srv_country, cli_blacklisted, srv_blacklisted, cli_location, srv_location, cli2srv_bytes, srv2cli_bytes, cli2srv_pkts, srv2cli_pkts, first_seen, community_id, score, flow_risk_bitmap, alerts_map, cli_host_pool_id, srv_host_pool_id, cli_network, srv_network, probe_ip, input_snmp, output_snmp, json, info) VALUES (26, 3, 1676247257, 1676247266, 3, 4, '192.168.40.66', '192.168.0.65', 44983, 80, 0, 0, 0, 0, 0, 6, 7, 0, 5, '', '', '', '', 0, 0, 0, 0, 126, 120, 2, 2, 1676247257, '1:rj5vzKw7WQX8TONTQ++bh3BkBh8=', 10, 70368744177664, X'04000000', 0, 0, 65535, 65535, '0.0.0.0', 0, 0, '{"ntopng.key":12345678,"hash_entry_id":23456789,"alert_generation": {"script_key":"ndpi_unidirectional_traffic","subdir":"flow","flow_risk_info":"{\"46\":\"No client to server traffic\"}"},"proto": {"http": {},"confidence":0}}', '');


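Googled it: the SQLite database got corrupted by the power outage.

Sure enough.

One power outage and the problems pile up.

There do seem to be ways to recover a SQLite database.

Never mind.

Just restored the backup from the night before the incident.

After the restore it is working normally.

Will keep an eye on it.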
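
For reference, the sqlite3 command-line tool can report whether a database file is damaged through PRAGMA integrity_check, which prints the single line "ok" for a healthy file. A minimal sketch of wrapping that result: the helper name integrity_ok and the database path in the usage line are hypothetical.

```shell
#!/bin/sh
# integrity_ok: read the output of `PRAGMA integrity_check;` on stdin and
# succeed only if it is exactly the single line "ok".
# (Hypothetical helper; empty output, e.g. when sqlite3 itself fails,
# counts as not ok.)
integrity_ok() {
    [ "$(cat)" = "ok" ]
}
```

Usage, with a placeholder path:

sqlite3 /path/to/alert_store.db 'PRAGMA integrity_check;' | integrity_ok || echo "database is damaged"

Recent sqlite3 versions also offer a .recover command that dumps whatever is still readable, which can be piped into a fresh database file. Restoring a known good backup, as done here, is usually the simpler route.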


https://blog.csdn.net/wolfking0608/article/details/71076588 


2023/02/11

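This afternoon the server room lost power.
After one of the graylog boxes came back up,
all three services were running,
but in the management UI the logs were all stuck.
An hour later they still had not been processed.
Figured elasticsearch had a problem.
Checked the log: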

[2023-02-11T20:27:56,520][WARN ][o.e.c.r.a.AllocationService] [localhost.localdomain] failing shard [failed shard, shard [graylog_666][2], node[0l7asmrIRFeIxc3FyAB14Q], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=yqeR9a7CSUC4ZIIz-a07Gw], unassigned_info[[reason=ALLOCATION_FAILED], at[2023-02-11T12:27:55.997Z], failed_attempts[4], failed_nodes[[0l7asmrIRFeIxc3FyAB14Q]], delayed=false, details[failed shard on node [0l7asmrIRFeIxc3FyAB14Q]: failed recovery, failure RecoveryFailedException[[graylog_666][2]: Recovery failed on {localhost.localdomain}{0l7asmrIRFeIxc3FyAB14Q}{AHesmcGhQvGWAw7Gxl2V6A}{10.10.0.1}{10.10.01:9300}{dimr}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[/mnt/elasticsearch/nodes/0/indices/soJ39cmwT5-UlEyVIPvfAg/2/index/_x63f.fdt]; ], allocation_status[deciders_throttled]], message [failed recovery], failure [RecoveryFailedException[[graylog_666][2]: Recovery failed on {localhost.localdomain}{0l7asmrIRFeIxc3FyAB14Q}{AHesmcGhQvGWAw7Gxl2V6A}{10.10.0.1}{10.10.0.1:9300}{dimr}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[/mnt/elasticsearch/nodes/0/indices/soJ39cmwT5-UlEyVIPvfAg/2/index/_x63f.fdt]; ], markAsStale [true]]
org.elasticsearch.indices.recovery.RecoveryFailedException: [graylog_666][2]: Recovery failed on {localhost.localdomain}{0l7asmrIRFeIxc3FyAB14Q}{AHesmcGhQvGWAw7Gxl2V6A}{10.10.0.1}{10.10.0.1:9300}{dimr}

Sure enough.

Manually rotated the active write index.

Messages are being processed again.
Will keep an eye on it.
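
A failed shard like the one in the log above also shows up in Elasticsearch's _cat/shards output, which is quicker to scan than the server log. A minimal sketch: the helper name unhealthy_shards is made up, and the localhost:9200 address in the usage line is an assumption about where the HTTP API listens.

```shell
#!/bin/sh
# unhealthy_shards: read `_cat/shards` output on stdin (default columns:
# index, shard, prirep, state, ...) and print index, shard number and
# state for every shard that is not STARTED.
# (Hypothetical helper; it only filters text.)
unhealthy_shards() {
    awk '$4 != "STARTED" { print $1, $2, $4 }'
}
```

Usage:

curl -s 'http://localhost:9200/_cat/shards' | unhealthy_shards

No output means every shard is started.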

2023/02/08

nftables has been around for a long time, but here are some notes anyway.

Install
dnf install nftables

Flush everything
nft flush ruleset

Create a new table
nft add table inet filter

Add a chain with a default policy
nft add chain inet filter INPUT { type filter hook input priority 0 \; counter \; policy accept \; }

Insert a new rule into the chain

nft insert rule inet filter INPUT ip saddr 192.168.12.85 tcp dport 22 drop

List the rules
nft list table inet filter

List the rules with their handle numbers, which are needed for deletion
nft -an list table inet filter

List the rules of all tables
nft -an list ruleset

Delete a rule
nft delete rule inet filter INPUT handle 2
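
Looking up the handle by eye gets tedious with many rules. A minimal sketch of pulling it out of the listing: the helper name rule_handles is made up, and it matches the rule text as a fixed string.

```shell
#!/bin/sh
# rule_handles PATTERN: read `nft -a list ...` output on stdin and print
# the handle number of every line containing PATTERN (fixed string).
# (Hypothetical helper; it only filters text, it does not call nft.)
rule_handles() {
    grep -F -- "$1" | sed -n 's/.*# handle \([0-9][0-9]*\)$/\1/p'
}
```

Usage:

nft -an list table inet filter | rule_handles 'ip saddr 192.168.12.85'

The number printed is what goes after "handle" in the delete command.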

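Rules added with nft look incomplete when viewed through iptables,
but they do take effect.


After running nft flush ruleset,
nft -an list ruleset shows nothing.
But if you then run iptables -L
and afterwards
run nft -an list ruleset again,
you will see the following ruleset
(presumably because iptables on this host is the nft-backed variant and creates its base chains on first use):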

table ip filter { # handle 3
        chain INPUT { # handle 1
                type filter hook input priority 0; policy accept;
        }

        chain FORWARD { # handle 2
                type filter hook forward priority 0; policy accept;
        }

        chain OUTPUT { # handle 3
                type filter hook output priority 0; policy accept;
        }
}


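By default everything entered above is lost on reboot. To have the rules loaded automatically at boot:

nft list ruleset >> /etc/sysconfig/nftables.conf


Alternatively, export the ruleset to a file first
nft list ruleset > /etc/nftables/nft_policy.nft


then add this line to /etc/sysconfig/nftables.conf

include "/etc/nftables/nft_policy.nft"

It gets loaded after a reboot.


If you are sure iptables is no longer needed,
remove it
dnf remove iptables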



2023/02/07

sshpass

Example

sshpass -p passwd ssh root@10.0.0.1 date
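
With -p the password is visible in the process list and in shell history. sshpass also accepts -e, which reads the password from the SSHPASS environment variable instead. A minimal sketch of a wrapper around that: the function name remote_run is made up for illustration, and the root login mirrors the example above.

```shell
#!/bin/sh
# remote_run HOST COMMAND...: run COMMAND on HOST as root through sshpass,
# taking the password from the SSHPASS environment variable (sshpass -e)
# so it does not appear on the command line.
# (Hypothetical wrapper name.)
remote_run() {
    target="root@$1"
    shift
    sshpass -e ssh "$target" "$@"
}
```

Usage, after setting SSHPASS in the environment:

remote_run 10.0.0.1 date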


2023/02/04

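While checking pbs today, the GC (garbage collection) warnings showed up in the log again.
Went back to look at the previous day's backup:
it is shown as completed with no errors.
No idea why the backup completes without errors and GC then reports errors like these: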

2023-02-02T00:00:22+08:00: WARN: warning: unable to access non-existent chunk 4d9f87572f2ff8d9f324aef1263e1ab47181a764aac801918b6dd5567fdfdde9, required by "/mnt/nfs418/ct/112/2023-01-31T16:00:47Z/catalog.pcat1.didx"
2023-02-02T00:00:22+08:00: WARN: warning: unable to access non-existent chunk e82ad3ac9b4b29c55420a44c29029c1a69ebd2cae156994c7e6a4f6a3b44524d, required by "/mnt/nfs418/ct/112/2023-01-31T16:00:47Z/catalog.pcat1.didx"
2023-02-02T00:00:22+08:00: WARN: warning: unable to access non-existent chunk 7eeadcfafebe86f0244ab4b07167644784be8485da119208d91e078efb48a7de, required by "/mnt/nfs418/ct/112/2023-01-31T16:00:47Z/root.pxar.didx"

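And here is the real problem:
every backup after that is also reported as completed with nothing abnormal,
but GC keeps failing.
And once GC reports errors, that backup cannot be restored.
At the moment pbs cannot send a mail alert on its own when GC has errors.
So the workaround is to write a program that checks every day whether GC reported errors.
If it did,
the backups related to the errors have to be deleted
so that the following backups come out good.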
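
The daily check described above could start from something like this. A minimal sketch: it assumes the GC task log has the same wording as the WARN lines quoted earlier, and the function name snapshots_with_missing_chunks and the gc.log file in the usage line are made up for illustration.

```shell
#!/bin/sh
# snapshots_with_missing_chunks: read a PBS garbage-collection log on
# stdin and print, once each, every snapshot directory that references a
# chunk reported as non-existent.
# (Hypothetical helper; relies on the exact warning text shown above.)
snapshots_with_missing_chunks() {
    sed -n 's|.*unable to access non-existent chunk [0-9a-f]*, required by "\(.*\)/[^/]*"$|\1|p' |
        sort -u
}
```

Usage, with the GC task log saved to a placeholder file:

snapshots_with_missing_chunks < gc.log

Any output means GC found broken snapshots. Run from cron, that output could be mailed out, and the listed snapshot directories are the ones to remove.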

2023/02/03

atftp

sudo apt install atftpd
sudo mkdir /tftp_data
sudo chmod -R 777 /tftp_data
sudo atftpd --daemon --port 69 /tftp_data