JP Vossen via plug on 14 Nov 2020 11:32:58 -0800



[PLUG] OT: Announcing AWS Glue DataBrew


Strictly speaking this is OT, but I guarantee most, if not all, of us on this list have had to clean up messy data, and this is a very interesting thing to know about.  I know I see a problem like this at least every few weeks, and I end up solving it with some combination of Unix text utils, Perl, or spreadsheet sort and filter functions.

I'm very curious about how the underlying technology works.  There are probably competing tools I'm not aware of as well, maybe like Andy's "FOSSCON Analytics on Watson AI as a Service" or the Wolfram Alpha stuff?  And CJ alluded to data cleanup in "Data Analysis in Linux" in https://www.cjfearnley.com/Data.Analysis.PLUG.July2013.pdf as well.

Anyway:

Announcing AWS Glue DataBrew – A Visual Data Preparation Tool That Helps You Clean and Normalize Data Faster: https://aws.amazon.com/blogs/aws/announcing-aws-glue-databrew-a-visual-data-preparation-tool-that-helps-you-clean-and-normalize-data-faster/

	"To be able to run analytics, build reports, or apply machine learning, you need to be sure the data you’re using is clean and in the right format. That’s the data preparation step that requires data analysts and data scientists to write custom code and do many manual activities. First, you need to look at the data, understand which possible values are present, and build some simple visualizations to understand if there are correlations between the columns. Then, you need to check for strange values outside of what you’re expecting, such as weather temperature above 200℉ (93℃) or speed of a truck above 200 mph (322 km/h), or for data that is missing. Many algorithms need values to be rescaled to a specific range, for example between 0 and 1, or normalized around the mean. Text fields need to be set to a standard format, and may require advanced transformations such as stemming."
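
For a sense of what those steps look like when done by hand, here is a minimal Python sketch using only the standard library. It is not how DataBrew works internally (I don't know that); it just illustrates the checks the quote describes: flagging missing or out-of-range values, rescaling to the 0..1 range, and normalizing around the mean. The function names, the sample temperatures, and the -80..200 F bounds are all made up for illustration.

```python
import statistics

def find_bad_rows(values, low, high):
    """Return indexes of values that are missing (None) or outside [low, high]."""
    return [i for i, v in enumerate(values)
            if v is None or not (low <= v <= high)]

def min_max_rescale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Normalize around the mean: subtract the mean, divide by the std deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]

# Hypothetical temperature readings in Fahrenheit: one missing, one absurd.
temps_f = [68.0, 72.5, None, 451.0, 70.0]

bad = find_bad_rows(temps_f, low=-80.0, high=200.0)
clean = [v for i, v in enumerate(temps_f) if i not in bad]

scaled = min_max_rescale(clean)
normalized = z_score(clean)
```

Dropping the bad rows is only one option; depending on the data you might instead fill in a default or interpolate, which is exactly the kind of judgment call that makes this tedious to do by hand.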

Later,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug