Andy Wojnarek via plug on 19 Nov 2020 08:06:54 -0800



Re: [PLUG] OT: Announcing AWS Glue DataBrew


There definitely is a market for this. There are plenty of 'big-box' solutions that do exactly this - all locked down in their own ways: inflexibility, cost, complexity, etc.

I think one of the best examples of this is Splunk - it's extremely flexible, but it can be rather expensive and it falls short in many ways. Splunk is still my favorite solution for this, the biggest drawback being that you cannot expose it to the public.

What I'm looking for is a tool like this with the following capabilities:

* Flexibility in data
* Inexpensive
* Public promotion (i.e., a public read-only link that can be viewed anonymously)
* Easy to use
* Easy to maintain

IBM Watson for Analytics was ahead of the game on this - but, in true fashion, it either went away or was transformed into something not as useful.

I'll check this out, because I have a few projects where I need all of this, and being SaaS and inexpensive would be ideal.

--
Andy

On 11/14/20, 2:33 PM, "plug on behalf of JP Vossen via plug" <plug-bounces@lists.phillylinux.org on behalf of plug@lists.phillylinux.org> wrote:

    Strictly speaking this is OT, but I guarantee most, if not all, of us on this list have had this problem, and this is a very interesting thing to know about.  I know I see a problem like this at least every few weeks, and I end up solving it with some combination of Unix text utils, Perl, or spreadsheet sort and filter functions.
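
    For example, the kind of ad-hoc cleanup described above often boils down to something like this rough sketch - a plain-Python stand-in for a cut/sort/Perl pipeline, where the file name "data.csv" and the "name" column are made up for illustration:

    import csv

    # Drop rows with missing fields, trim stray whitespace, and sort by one
    # column -- the sort of thing otherwise done with cut/sort/uniq or a
    # short Perl one-liner.
    with open("data.csv", newline="") as f:          # hypothetical input file
        rows = list(csv.DictReader(f))

    cleaned = [
        {k: v.strip() for k, v in row.items()}       # trim stray whitespace
        for row in rows
        if all(v.strip() for v in row.values())      # skip rows with blank fields
    ]
    cleaned.sort(key=lambda r: r["name"])            # "name" is a made-up column

    # Assumes at least one clean row survived the filter above.
    with open("clean.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=cleaned[0].keys())
        writer.writeheader()
        writer.writerows(cleaned)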

    I'm very curious about how the underlying technology works.  There are probably competing tools I'm not aware of as well, maybe like Andy's "FOSSCON Analytics on Watson AI as a Service" or the Wolfram Alpha stuff?  And CJ alluded to data cleanup in "Data Analysis in Linux" in https://www.cjfearnley.com/Data.Analysis.PLUG.July2013.pdf as well.

    Anyway:

    Announcing AWS Glue DataBrew – A Visual Data Preparation Tool That Helps You Clean and Normalize Data Faster: https://aws.amazon.com/blogs/aws/announcing-aws-glue-databrew-a-visual-data-preparation-tool-that-helps-you-clean-and-normalize-data-faster/

    	"To be able to run analytics, build reports, or apply machine learning, you need to be sure the data you’re using is clean and in the right format. That’s the data preparation step that requires data analysts and data scientists to write custom code and do many manual activities. First, you need to look at the data, understand which possible values are present, and build some simple visualizations to understand if there are correlations between the columns. Then, you need to check for strange values outside of what you’re expecting, such as weather temperature above 200℉ (93℃) or speed of a truck above 200 mph (322 km/h), or for data that is missing. Many algorithms need values to be rescaled to a specific range, for example between 0 and 1, or normalized around the mean. Text fields need to be set to a standard format, and may require advanced transformations such as stemming."

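    As a rough illustration of the sanity checks and rescaling steps that excerpt describes - a minimal sketch only, not DataBrew's actual API; the Fahrenheit readings and the 200F threshold are just example values:

    # Flag out-of-range values, then min-max rescale the rest to [0, 1]
    # and center them around the mean.
    values = [72.0, 68.5, 210.0, 75.2, 71.1]        # made-up temperature readings (F)

    suspect = [v for v in values if v > 200]        # e.g. temperatures above 200F
    plausible = [v for v in values if v <= 200]

    lo, hi = min(plausible), max(plausible)
    rescaled = [(v - lo) / (hi - lo) for v in plausible]   # now in [0, 1]

    mean = sum(plausible) / len(plausible)
    centered = [v - mean for v in plausible]               # normalized around the mean

    print("suspect:", suspect)
    print("rescaled:", rescaled)
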
    Later,
    JP
    --  -------------------------------------------------------------------
    JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug