Raytheon is seeking experienced data scientist/applications developer SMEs to create, leverage, and apply data science approaches to successfully implement information management solutions.
Duties:
Identify and implement methods for duplicate detection, document categorization, entity and information extraction using natural language processing and machine learning.
Assist with the design and implementation of visualizations and reports for business intelligence metrics.
Understand and implement methodologies that are consistent with standard techniques in the data science field.
Propose, implement, and evaluate content analytic strategies for characterizing and categorizing large data sets of unstructured files and messages using COTS/GOTS/Open Source tools.
Develop custom software as required by Sponsor to characterize and categorize large datasets of unstructured files and messages.
Oversee construction of annotated data sets for training and evaluation of prototype tools.
Serve as a Subject Matter Expert (SME) in discussions with analytic tool developers and enterprise IT management.
Partner with information management SMEs to define and refine framework, strategies, and actions for collecting and analyzing unstructured file metadata and content stored in Sponsor's automated systems (e.g. email repositories, databases, shared drives).
Implement collection and analysis actions, such as ingesting, indexing, normalizing, and structuring file content and metadata in preparation for analysis using tools in the big data environment (GOTS, COTS, and open source tools including but not limited to Hadoop, Hive, Tableau, Spark, Visual Studio, and other emerging technologies).
Partner with information management SMEs to determine baseline, analyze patterns and characteristics in file content and metadata, and construct visualizations to share lessons learned.
Lead and/or contribute to discussions with Sponsor and Sponsor partners on collection and analysis framework, strategies, processes, and methodologies.
Build relationships with stakeholders to negotiate access, security, and storage needs for the unstructured file objects and the features created during the collection and analysis process.
Provide recommendations and training to Sponsor and Sponsor partners on techniques and tools in the big data environment.
Write MapReduce jobs, Hive queries, and Python, Java, Scala, and R programs as appropriate to perform various tasks related to machine learning and data science activities, including data cleanup, data transformation, data mashing, and algorithm parallelization.
Implement algorithms from various sources (academia, federal labs, or other Government Agencies) as parallelized MapReduce jobs.
Analyze and correlate large amounts of data.
Run machine learning workflows from various platforms (such as Apache NiFi and Jupyter Notebook) on large amounts of data.
Administer, configure, and optimize a distributed cluster ecosystem such as Hadoop or Spark.
Demonstrated on-the-job experience integrating and analyzing large data sets using GOTS, COTS, and other big-data technologies, such as Tableau, Visual Studio, Hadoop, and HBase.
Demonstrated on-the-job experience with XML, JSON, or other Customer standards for data transfer and metadata management.
Demonstrated on-the-job experience with Hadoop and Spark.
Demonstrated on-the-job experience with Java, Python, and Bash scripting.
Demonstrated on-the-job experience proposing, implementing and evaluating content analytic strategies for characterizing and categorizing large data sets.
Demonstrated on-the-job experience designing and implementing metadata measurement and trend extraction on large data sets.
Ability to communicate technical concepts to a non-technical audience.
Desired Skills/Experience:
Familiarity with Scala and/or R
Experience with AWS
Familiarity with Linux/Windows
Systems administration for AWS, Hadoop, Spark, and databases such as Oracle and MySQL
Familiarity with scikit-learn, Gensim, NLTK, and spaCy, and the application of these tools to Natural Language Processing
Familiarity with Theano, TensorFlow, Torch, Keras, MXNet, and Deeplearning4j, and the application of these tools to Natural Language Processing
Familiarity with classification and clustering algorithms such as LightGBM, XGBoost, Random Forest, Support Vector Machine, K-means, and t-SNE
Raytheon is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, creed, sex, sexual orientation, gender identity, national origin, disability, or protected Veteran status.
Raytheon Intelligence, Information and Services (IIS) is a leader in intelligence, surveillance and reconnaissance; advanced cyber solutions; weather and environmental solutions; information-based solutions for law enforcement and homeland security; and training, logistics, engineering, product support, and operational support services and solutions for the mission support, homeland security, space, civil aviation, counter-proliferation and counter-terrorism markets. IIS, which operates at nearly 551 sites in 80 countries, is headquartered in Dulles, VA, and generated $6 billion in 2014 revenues. As a global business, our leaders must have the ability to understand, embrace and operate in a multicultural world, both in the marketplace and in the workplace. We strive to hire individuals who reflect our communities and proactively embrace diversity and inclusion in order to advance our culture, develop our employees and leaders, and grow our market share with our clients.
Security Clearance: TS/SCI with Poly - Current
Relocation Eligible: No
Engineering Technology, Systems Engineering, Warfighter Support Services, Engineering, All