Martin Dellwo on 16 Mar 2018 08:45:00 -0700



Re: [PLUG] AWS Scripting using awscli and bash


I had a similar situation where I had to clean up 46M duplicate object versions. Because they are versioned, you can’t simply do it recursively with ‘aws s3’ commands; you have to use the lower-level ‘aws s3api’ commands and explicitly delete each object version, either one at a time or in batches of up to 1000. And to do that, you first need the inventory. All of which takes time. A lot of time, at that scale. You have to subdivide and parallelize the problem. I ran something like 80 scripts in parallel for a week.
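
For anyone who wants to see the shape of the batch approach, here is a minimal sketch of one batch, not the scripts I actually ran. It assumes the work is subdivided by key prefix; the function name, bucket and prefix are made up for illustration. It lists at most 1000 entries, uses a JMESPath query to keep only the noncurrent versions in the structure ‘delete-objects’ expects, and sends them in one call.

```shell
# Sketch: delete one batch (at most 1000) of noncurrent versions under a prefix.
# The function name, bucket, and prefix are illustrative, not from the thread.
delete_noncurrent_batch() {
    bucket=$1
    prefix=$2
    payload_file=$(mktemp) || return 1

    # JMESPath shapes the listing into the payload that delete-objects expects.
    aws s3api list-object-versions \
        --bucket "$bucket" --prefix "$prefix" --max-items 1000 \
        --query '{Objects: Versions[?IsLatest==`false`].{Key: Key, VersionId: VersionId}, Quiet: `true`}' \
        --output json > "$payload_file" || { rm -f "$payload_file"; return 1; }

    # No "VersionId" in the payload means this page held no noncurrent versions.
    if ! grep -q '"VersionId"' "$payload_file"; then
        rm -f "$payload_file"
        echo "no noncurrent versions listed under: $prefix"
        return 0
    fi

    # One request removes the whole batch (delete-objects accepts up to 1000 keys).
    aws s3api delete-objects --bucket "$bucket" --delete "file://$payload_file"
    status=$?
    rm -f "$payload_file"
    return $status
}
```

Something like ‘delete_noncurrent_batch my-bucket images/2018/’ would be run once per prefix, with many prefixes going in parallel. This only handles a single page of the listing; a real cleanup would also have to page through the rest (‘--starting-token’) and deal with delete markers.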

But in the end, it was a waste of effort. Create a lifecycle rule on the bucket to expire noncurrent versions more than one day old, and voila, they will be gone in 24 hrs. The user does say they need to retain the original data, though. They could copy that data elsewhere first, or tweak the lifecycle rules somehow.
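
A rough sketch of such a rule, for illustration only. The rule ID, the function names and the empty prefix (meaning the whole bucket) are my assumptions here, not something I have on file. Note that ‘put-bucket-lifecycle-configuration’ replaces whatever lifecycle configuration the bucket already has, so any existing rules need to be merged in first.

```shell
# Print a lifecycle configuration that expires noncurrent versions after N days.
# The rule ID and the empty prefix (whole bucket) are illustrative choices.
lifecycle_json() {
    days=${1:-1}
    cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": $days}
    }
  ]
}
EOF
}

# Apply the rule to a bucket. This REPLACES the bucket's existing lifecycle rules.
apply_noncurrent_expiry() {
    bucket=$1
    days=${2:-1}
    config_file=$(mktemp) || return 1
    lifecycle_json "$days" > "$config_file"
    aws s3api put-bucket-lifecycle-configuration \
        --bucket "$bucket" --lifecycle-configuration "file://$config_file"
    status=$?
    rm -f "$config_file"
    return $status
}
```

Running ‘lifecycle_json 1’ on its own prints the JSON so it can be reviewed before ‘apply_noncurrent_expiry my-bucket 1’ touches anything. The rule leaves the current version of each object alone, which is only part of what "retain the original data" would need.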

Yeah, I know, it wasn’t you, it was some random Reddit user.

The AWS CLI is incredibly powerful though, especially with JMESPath querying of output (‘--query’) and the ‘wait’ subcommands.
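
A small sketch of the two together, assuming an EC2 launch as the example. The function name and its arguments are hypothetical placeholders. ‘--query’ pulls the new instance ID out of the response, and ‘wait’ blocks until the instance is actually running, so the next step in a script can build on it.

```shell
# Launch one instance, capture its ID with --query, and block until it is running.
# The function name and its two arguments are placeholders for illustration.
launch_and_wait() {
    ami=$1
    instance_type=$2

    # --query extracts just the instance ID; --output text drops the JSON quoting.
    instance_id=$(aws ec2 run-instances \
        --image-id "$ami" --instance-type "$instance_type" \
        --query 'Instances[0].InstanceId' --output text) || return 1

    # The waiter polls EC2 until the instance reaches the running state.
    aws ec2 wait instance-running --instance-ids "$instance_id" || return 1

    # Hand the ID back so later steps can use it.
    echo "$instance_id"
}
```

Usage would be along the lines of ‘id=$(launch_and_wait "$AMI_ID" t2.micro)’, with the AMI ID supplied by the caller.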

Martin Dellwo
484-437-3662

On Mar 16, 2018, at 8:54 AM, Andy Wojnarek <andy.wojnarek@theatsgroup.com> wrote:

Great read, thanks for sharing.

In my presentation, I have “unlimited scale” in the pros column, but maybe I should have included it in the cons column too. I saw this on Reddit:

s3 fiasco - 100+ million duplicated files in a versioned bucket
“So, we built a lambda that handles image compression when images are dumped in a bucket, but didn't handle tagging properly and it ran in a loop for a few hours and stacked a painful amount of duplicate objects. The quantity is so massive at this point, everything I've tried with the command line tools/boto has just crashed - and the bucket is versioned so I'd have to loop through and delete every version of every file. Anyone have any great ideas? The data prior to the run date needs to be retained.”

Uh oh…. Pretty easy to generate enough usage to bankrupt a project if you aren’t careful…. With an on-prem solution, you can only scale within your limits of what you’ve purchased.

--
Andy

On 3/14/18, 7:39 PM, "plug on behalf of JP Vossen" <plug-bounces@lists.phillylinux.org on behalf of jp@jpsdomain.org> wrote:

   $20
   https://www.amazon.com/AWS-Scripted-Automate-Deployment-Resilient/dp/1503137775/
   but see http://www.quickstepapps.com/ and download the code from
   http://www.quickstepapps.com/wp-content/uploads/2014/10/awsscripted.zip.

   I first read this book at the end of 2016 but I had no context for most
   of it.  I just re-read it today, in light of Andy's certification preso
   and some stuff at $WORK, and I thought it might be of interest.

   This is a book that automates 100% of deploying "a resilient Web
   Application" in AWS, using the `aws` cli and bash from a Mac laptop.
   The author uses a Mac because it has bash and Windows sucks, but it's
   not clear why the thought of using an actual Linux desktop never seems
   to have occurred to him.  (Shut up Walt.  :-)

   It's in...bash.  That's cool because...hey, bash!  But in a way it bugs
   me because Ansible is really a lot more suitable to a lot of what he's
   doing.  There's also a lot in the author's bash code that I don't like,
   and there are some questionable security practices and pretty awful uses
   of `expect`.

   BUT...it's bash.  It's easy to see what is going on, and the examples of
   doing really practical things are REALLY, REALLY nice.  It seems there
   are a lot of things you have to do in a certain order, and you need to
   build on previous steps using IDs of things you just created.  And
   that's all in here, in code.

   I haven't tried any of the code yet because my API access isn't sorted
   out, but I have no reason to believe the code will not work.

   There's enough "book" there to get you going, but most of the later
   chapters are a couple of paragraphs and then just the verbatim code...
   The AWS code is interesting, but it gets into a lot of PHP, Apache and
   other stuff I don't care about.  Still, the practical example is a fine
   idea and I like it.

   Enjoy,
   JP
   --  -------------------------------------------------------------------
   JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
   ___________________________________________________________________________
   Philadelphia Linux Users Group         --        http://www.phillylinux.org
   Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
   General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug

