JP Vossen via plug on 12 Jun 2024 14:14:28 -0700
[PLUG] Solved: root "pkill: killing pid * failed: Operation not permitted"
On 6/10/24 09:14 PM, JP Vossen wrote:
On 6/7/24 04:07 PM, JP Vossen wrote:

What could cause "pkill: killing pid * failed: Operation not permitted" *when run by root*?

After patching and reboots the other day I started getting daily Anacron emails from Logrotate on most (but not all) of 50+ VMs saying:

```
/etc/cron.daily/logrotate: pkill: killing pid NNN failed: Operation not permitted
```

The culprit is the (quite horrible, but mandatory) Crowdstrike `falcon-agent` service, running from the stock vendor RPM that has not changed since April, and we've had patching reboots since then.

The really confusing thing is that *most* of them are doing this, but *not* all, and I can't find any differences! The 50+ VMs are a mix of (quite horrible, but mandatory) Oracle Linux 7.9 (EoL soon, thus migrating) and 8.10, but the problem doesn't follow the distro. Also, a few of the ones that complained on Wed did not complain on Thu, so they "fixed" themselves?
...

Solved! The difference I was unable to find is vendor-side:

BAD:  /opt/CrowdStrike/falcon-sensor -> falcon-sensor16703
Good: /opt/CrowdStrike/falcon-sensor -> falcon-sensor16604

So the installed RPM version is a lie, and I never thought to look into `/opt/CrowdStrike/`, which is a mess, by the way. It's really annoying that the vendor decided that slip-streaming an agent update without updating the RPM was OK, especially since that caused this bug.

Also, it turns out that CrowdStrike `falcon-sensor` is a root-kit. I understand why it has anti-tampering features, but that's infuriating from an administrative perspective. Fortunately, it's not a very _good_ root-kit.

Ironically, for the misbehaving (newer falcon-sensor16703) VMs, this Just Worked:

## On VM: `yum erase falcon-sensor`
## Ansible: `crowdstrike_deploy.yml --limit '...'`

Really annoyingly, on the 5 or so servers (older falcon-sensor16604) that were NOT misbehaving, that process failed because of the root-kit behaviors. `yum erase falcon-sensor` removed the RPM metadata and such, but was unable to remove `/opt/CrowdStrike/`. But it did remove `/usr/lib/systemd/system/falcon-sensor.service`, so you see where this is going, right? Of course the reinstall failed with an obscure message:

```
...
Error unpacking rpm package falcon-sensor-7.01.0-15604.el7.x86_64
error: unpacking of archive failed on file /opt/CrowdStrike/KernelModuleArchive;6669fed4: cpio: symlink
...
```

What that really means is that even root can't `rm` files inside `/opt/CrowdStrike/`. I'm not sure how they did that, but I suspect eBPF features, as Will mentioned on the PLUG N call. (See https://www.reddit.com/r/crowdstrike/comments/187p03v/linux_sensor_tamper_protection/.)

But once I rebooted, since the unit file was missing (I assume), the root-kit, err, I mean `falcon-sensor` didn't run, so `rm -rf /opt/CrowdStrike/` worked and *then* I could re-install via Ansible!
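
For anyone chasing the same symptom, here is a minimal sketch of how the symlink difference above could be checked on a host. The default path and the `falcon-sensorNNNNN` naming are taken from the listing above; the function name `sensor_build` is made up for illustration, and this is not anything the vendor ships.

```shell
#!/usr/bin/env bash
# Sketch only: report which sensor build the falcon-sensor symlink points at.
# Assumption: the symlink target is named "falcon-sensor<build>", as in the
# BAD/Good listing above. sensor_build is a hypothetical helper name.

sensor_build() {
    local link="${1:-/opt/CrowdStrike/falcon-sensor}"
    local target

    # readlink exits non-zero if the path is missing or is not a symlink.
    target=$(readlink "$link") || return 1

    # Drop any directory part, then the "falcon-sensor" prefix:
    # falcon-sensor16703 -> 16703
    target=${target##*/}
    printf '%s\n' "${target#falcon-sensor}"
}

# Example (left commented out): hostname, symlink build, and what RPM claims.
# printf '%s\t%s\t%s\n' "$(hostname)" "$(sensor_build)" "$(rpm -q falcon-sensor)"
```

Run on every host (an ssh loop or Ansible would both do), the symlink build can be lined up against what `rpm -q` reports, so a host with a different build stands out even when the RPM version is identical everywhere.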
I probably could have re-installed without the `rm` but a clean install has a *lot less crap* in `/opt/CrowdStrike/`.

So thanks for t-shooting everyone! And thanks to Walt for some (more) cool Perl one-liners.

Later,
JP
--
-------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug