BIG DATA RESOURCES
Big Data is a term used to describe very large data sets and the technologies and practices for handling them. Advances in data analysis present opportunities to improve decision-making in critical areas such as health care, employment, economic productivity, crime, security, and natural disaster and resource management. Sources of Big Data are many and varied. They include web data, sensors, cell towers, census and other government data, social media, and transactional data. Big Data discussions often concern one or more of the 5 Vs: Volume, Velocity, Variety, Variability and Value. Cloud-based storage has facilitated data collection and mining. However, the integration of big data with cloud storage also presents privacy challenges and security threats.
Articles on Big Data:
Well-known privacy thought leaders Omer Tene and Jules Polonetsky discuss the benefits and concerns of Big Data. Big Data creates enormous value for the global economy, driving innovation, productivity, efficiency, and growth. At the same time, the "data deluge" presents privacy concerns that could stir a regulatory backlash, dampening the data economy and stifling innovation. To strike a balance between beneficial uses of data and the protection of individual privacy, policymakers must address some of the most fundamental concepts of privacy law, including the definition of "personally identifiable information," the role of consent, and the principles of purpose limitation and data minimization.
Our data-flooded era is one of technological progress, with tides rising at a pace never seen before. Our roles, rights, and responsibilities are being reorganized, and new ethical questions are being posed.
This paper discusses the unique security and privacy challenges presented by big data, arguing that traditional security approaches designed for static (non-streaming) data are inadequate. Its goal is to raise awareness of the importance of fortifying big data infrastructure.
The Electronic Privacy Information Center’s (EPIC) comments to the Office of Science and Technology Policy (OSTP) regarding current privacy risks presented by “Big Data” and opaque algorithmic profiling.
Resources on Big Data:
This introductory ethics module for data science courses includes a reading, homework assignments, and case studies, all designed to spark a conversation about ethical issues that students will face in their role as data practitioners. No training in ethical theory, applied ethics, or philosophy is required for either the instructor or the students as they tackle these materials.
This site offers free resources for learning data science, including tutorials on analytical languages such as SQL, Python, and R; how-to posts on common tasks such as A/B testing; and career advice.
Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS. Amazon Macie recognizes sensitive data such as personally identifiable information (PII) or intellectual property, and provides dashboards and alerts that give visibility into how this data is being accessed or moved.
Websites addressing Big Data:
EPIC is a public interest research center in Washington, DC. EPIC was established in 1994 to focus public attention on emerging privacy and civil liberties issues and to protect privacy, freedom of expression, and democratic values in the information age.
Congress established the White House Office of Science and Technology Policy to provide the President and others within the Executive Office of the President with advice on the scientific, engineering, and technological aspects of the economy, national security, homeland security, health, foreign relations, the environment, and the technological recovery and use of resources, among other topics.
Future of Privacy Forum (FPF)
Future of Privacy Forum is a nonprofit organization that serves as a catalyst for privacy leadership and scholarship, advancing principled data practices in support of emerging technologies.