


This Domino Field Note provides highlights and excerpted slides from Amanda Casari’s “Feature Engineering for Machine Learning” talk at QCon Sao Paulo. Casari is the Principal Product Manager + Data Scientist at Concur Labs and a co-author of the book Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. The full video of the talk is publicly available, and special thanks go to Amanda for giving Domino permission to excerpt the talk’s slides in this Domino Field Note.

In the talk, Casari provides a definition of feature engineering, a framework for thinking about machine learning, and techniques including converting raw data into vectors, visualizing data in a feature space, binarization, and binning (quantization). This Domino Field Note provides distilled highlights from the talk on these topics. For additional depth, including coverage of feature scaling as well as techniques for text data such as bag of words, frequency-based filtering, and chunking parts of speech, the deck and full session video are publicly available.

Why Consider Feature Engineering

Casari defines a “feature” as a numeric representation and “feature engineering” as “the act of extracting those features from raw data and then transforming them into something that we can use for a machine learning model”. She indicates that “the right features can only be defined in the context of both the model and the data”. This is important because data scientists, with feature engineering, “can make more educated choices and understanding process and then hopefully that will save time” as well as obtaining “more transparency into what that outcome will be and being able to have that interpretability aspect”. Casari also indicates that feature engineering enables data scientists to be “more thoughtful in thinking about the data and the features that we build…. from that, how can we maximize those features as opposed to just trying to collect more data ….to go from raw data to features in order to think about the models and the problem trying to solve”.

Casari also provided a feature engineering framework in her talk. The intention is to provide a way for data scientists to think about the process and techniques. The four aspects of the framework include:

Casari proposes to frame the problem in a way where machine learning has the potential to be useful. Data scientists can then model the problem mathematically, based on how it has been framed through a machine learning lens. This potentially enables people to determine the problem and whether “we can use the software we have now or that we create to solve that so in order to do that”.
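To make the idea of a feature as a numeric representation concrete, the following minimal Python sketch applies two of the techniques named in this note, binarization and binning (quantization), to raw counts. The play counts, the function names `binarize`, `magnitude_bin`, and `to_feature_vectors`, and the power-of-ten bin edges are assumptions chosen for illustration; they are not taken from Casari's slides.

```python
# Hypothetical raw data: how many times one user played each song.
raw_play_counts = {"song_a": 0, "song_b": 1, "song_c": 37, "song_d": 1204}


def binarize(count, threshold=1):
    """Binarization: 1 if the count reaches the threshold, else 0."""
    return 1 if count >= threshold else 0


def magnitude_bin(count):
    """Binning (quantization) by order of magnitude.

    Assumed bin edges: 0-9 -> 0, 10-99 -> 1, 100-999 -> 2, and so on.
    Counting decimal digits avoids taking the logarithm of zero.
    """
    if not isinstance(count, int) or count < 0:
        raise ValueError("count must be a non-negative integer")
    return len(str(count)) - 1


def to_feature_vectors(counts):
    """Turn raw counts into [binarized, binned] vectors, ordered by song name."""
    return [
        [binarize(counts[name]), magnitude_bin(counts[name])]
        for name in sorted(counts)
    ]


features = to_feature_vectors(raw_play_counts)
for name, vector in zip(sorted(raw_play_counts), features):
    print(name, vector)
```

Each raw count becomes a small numeric vector that a model can consume. Whether a binary flag or a magnitude bin is the more useful feature depends on the model and the data, which echoes Casari's point that the right features can only be defined in that context.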
