JP Vossen via plug on 12 Jun 2024 14:14:28 -0700



[PLUG] Solved: root "pkill: killing pid * failed: Operation not permitted"


On 6/10/24 09:14 PM, JP Vossen wrote:
On 6/7/24 04:07 PM, JP Vossen wrote:
What could cause "pkill: killing pid * failed: Operation not permitted" *when run by root*?

After patching and reboots the other day I started getting daily Anacron emails from Logrotate on most (but not all) of 50+ VMs saying:
```
/etc/cron.daily/logrotate:
pkill: killing pid NNN failed: Operation not permitted
```

The culprit is the (quite horrible, but mandatory) CrowdStrike `falcon-sensor` service, running from the stock vendor RPM.  That RPM has not changed since April, and we've had patching reboots since then without this error.

The really confusing thing is that *most* of them are doing this, but *not* all, and I can't find any differences!  The 50+ VMs are a mix of (quite horrible, but mandatory) Oracle Linux 7.9 (EoL soon, thus migrating) and 8.10, but the problem doesn't follow the distro.  Also, a few of the ones that complained on Wed did not complain on Thu, so they "fixed" themselves?

...

Solved!

The difference I was unable to find is vendor-side:
	BAD:	/opt/CrowdStrike/falcon-sensor -> falcon-sensor16703
	Good:	/opt/CrowdStrike/falcon-sensor -> falcon-sensor16604

So the installed RPM version is a lie, and I never thought to look into `/opt/CrowdStrike/`, which is a mess, by the way.
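
For anyone comparing hosts, here is a minimal sketch of the check that would have exposed the difference: print what the `falcon-sensor` symlink actually points at, instead of trusting the RPM version.  The function name `sensor_build` and its optional directory argument are my own invention for illustration; only the `/opt/CrowdStrike/falcon-sensor` path comes from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the target of the falcon-sensor symlink.
# The directory argument defaults to /opt/CrowdStrike and is only
# overridable so the function can be tried against a scratch directory.
sensor_build() {
    local cs_dir="${1:-/opt/CrowdStrike}"
    local link="$cs_dir/falcon-sensor"
    if [ -L "$link" ]; then
        # readlink prints the symlink target, e.g. falcon-sensor16703
        readlink "$link"
    else
        echo "no-symlink"
    fi
}
```

Running `sensor_build` next to `rpm -q falcon-sensor` on each VM should show whether the two disagree.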

It's really annoying that the vendor decided that slip-streaming agent updates without updating the RPM was OK, especially since that caused this bug.

Also, it turns out that CrowdStrike `falcon-sensor` is a root-kit.  I understand why it has anti-tampering features, but that's infuriating from an administrative perspective.  Fortunately, it's not a very _good_ root-kit.

Ironically, for the misbehaving (newer falcon-sensor16703) VMs, this Just Worked:
## On VM: `yum erase falcon-sensor`
## Ansible: `crowdstrike_deploy.yml --limit '...'`

Really annoyingly, on the 5 or so servers (older falcon-sensor16604) that were NOT misbehaving, that process failed because of the root-kit behaviors.  `yum erase falcon-sensor` removed the RPM metadata and such, but was unable to remove `/opt/CrowdStrike/`.  But it did remove `/usr/lib/systemd/system/falcon-sensor.service`, so you see where this is going, right?
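
If you need to spot that half-removed state across a fleet, a rough sketch follows.  It only looks at the two paths named above; the function name `falcon_state`, the state labels, and the optional root-prefix argument are hypothetical and exist purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a host by which falcon-sensor pieces remain.
# An optional root prefix lets the check run against a scratch tree.
falcon_state() {
    local root="${1:-}"
    local dir="$root/opt/CrowdStrike"
    local unit="$root/usr/lib/systemd/system/falcon-sensor.service"
    if [ -d "$dir" ] && [ -f "$unit" ]; then
        echo "installed"      # directory and unit file both present
    elif [ -d "$dir" ]; then
        echo "orphaned"       # unit file gone, /opt/CrowdStrike left behind
    elif [ -f "$unit" ]; then
        echo "unit-only"      # unit file present without the directory
    else
        echo "clean"          # nothing left
    fi
}
```

A host reporting `orphaned` is in the state described above: the unit file is gone but `/opt/CrowdStrike/` is still there.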

Of course reinstall failed with an obscure message:
```
...
Error unpacking rpm package falcon-sensor-7.01.0-15604.el7.x86_64
error: unpacking of archive failed on file /opt/CrowdStrike/KernelModuleArchive;6669fed4: cpio: symlink
...
```

What that really means is that even root can't `rm` files inside `/opt/CrowdStrike/`.  I'm not sure how they did that, but I suspect eBPF features, as Will mentioned on the PLUG N call.  (See https://www.reddit.com/r/crowdstrike/comments/187p03v/linux_sensor_tamper_protection/.)

But once I rebooted, since the unit file was missing (I assume), the root-kit, err, I mean `falcon-sensor` didn't run, so `rm -rf /opt/CrowdStrike/` worked and *then* I could re-install via Ansible!  I probably could have re-installed without the `rm` but a clean install has a *lot less crap* in `/opt/CrowdStrike/`.
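
The post-reboot cleanup can be sketched as a guarded `rm`.  This is not what I actually ran, just an illustration: the function name `purge_if_stopped` is made up, and the process name `falcon-sensor` passed to `pgrep` is an assumption, so check what the sensor is really called on your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a leftover directory only when the named
# process is not running.  The default process name is an assumption.
purge_if_stopped() {
    local dir="${1:?directory required}"
    local proc="${2:-falcon-sensor}"
    # pgrep -x matches the process name exactly
    if pgrep -x "$proc" >/dev/null 2>&1; then
        echo "refusing: $proc is still running, reboot first" >&2
        return 1
    fi
    rm -rf -- "$dir"
}
```

After the reboot, `purge_if_stopped /opt/CrowdStrike` followed by the usual Ansible run would mirror the steps above.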

So thanks for the troubleshooting help, everyone!

And thanks to Walt for some (more) cool Perl one-liners.

Later,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug