Lecture 4-2
Legal/Ethical
considerations in CUS

Esteban Moro

Network Science Institute | Northeastern University

NETS 7983 Computational Urban Science

2025-02-04

Welcome!

This week:

Introduction to the Legal and Ethical considerations in Computational Urban Science

Aims

  • Understand the legal and ethical implications of using large-scale datasets in urban science research.
  • Learn about the current state of the art in data privacy and data protection laws.
  • Discuss the ethical implications of using private sensitive data in urban science research.

Ethical Challenges in Computational Urban Science

  • Large-scale datasets used in CUS are different:
    • The scale, scope, and granularity of data collected in urban science research are unprecedented.
    • Data is typically secondary: produced for a specific purpose other than research.
    • Most of the time, researchers do not interact with human subjects (see, however, the famous emotional-contagion experiments on social media).
    • Because the data comes from private companies, using it for research has legal and ethical implications.
    • These massive datasets are subject to inconsistent, overlapping, and sometimes non-existent regulation.

Ethical Challenges in Computational Urban Science

  • Large-scale datasets used in CUS are different:
    • Secondary data is often collected without participants’ explicit consent to its use for research.
    • Given the scale of the data, it is often impossible to anonymize it effectively, even if users do not provide their names or other personal information.
    • Data ownership: who owns the data? The user, the company, or the researcher?
    • Power imbalance: companies have the data, and researchers need it.

Ethical Challenges in Computational Urban Science

  • This creates a series of ethical challenges that need to be addressed by researchers and institutions working in CUS.

  • How do we navigate them?

  • Most ethical frameworks are based on principles (the Belmont and Menlo Reports; see also the Common Rule):

    • Respect for persons, autonomy, agency, voluntariness, and informed consent.
    • Beneficence, minimizing the risks and maximizing the benefits of our research.
    • Justice, fairness, and equity in the distribution of the benefits and risks of our research.
    • Respect for Law and Public Interest, ensuring that our research is legal and serves the public interest.

Ethical Challenges in Computational Urban Science

Each of these principles has challenges in the context of CUS:

Respect for persons:

  • Obtaining informed consent is rarely feasible for secondary datasets.
  • Researchers must balance this limitation with the need to protect participants:
    • How do we ensure that users can opt out of research?
    • How do we ensure that users can access and correct their data?
    • How do we ensure that users understand how their data is being used?

Ethical Challenges in Computational Urban Science

Each of these principles has challenges in the context of CUS:

Beneficence:

  • Even if secondary data is publicly available (e.g., social media posts), using it for research can feel intrusive to some users and communities.
  • We have to weigh the risk/benefit profile of our study not only for users and participants, but also for our urban communities, data providers, and our institutions.
  • Many secondary datasets are de-identified, but advances in computational techniques make re-identification a real risk, especially when datasets are combined.

Ethical Challenges in Computational Urban Science

Justice:

  • Many secondary datasets are not representative of the population, which can lead to biased results and unfair treatment of some groups.
  • Our research can then perpetuate existing biases and inequalities.
  • Most problems in CUS affect vulnerable populations. How do we ensure that our research does not harm them? For example, how do we ensure that our research on gentrification does not lead to more gentrification, or that a study of the impact of transportation policies does not lead to the exclusion of some groups from public services?

Ethical Challenges in Computational Urban Science

Respect for Law and Public Interest

  • Many laws and regulations are not designed to address the scale and scope of data used in CUS.
  • However, in recent years, new data protection laws have been passed (GDPR, CCPA, CPRA, etc.), joining older sector-specific ones such as HIPAA.
  • They propose a series of principles that can guide our research, but they are not always clear on how to implement them in practice. For example, the GDPR is based on the following principles:
    • Lawfulness, fairness, and transparency
    • Purpose limitation
    • Data minimization
    • Accuracy
    • Storage limitation
    • Integrity and confidentiality
  • But, for example, what does data minimization mean in practice for a researcher working with large-scale datasets?
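
For example, here is a minimal Python sketch of what data minimization could look like for a mobility study (all column names, coordinates, and resolutions are hypothetical): keep only the fields the analysis needs and coarsen the rest.

    import pandas as pd

    # Hypothetical raw location pings; in practice these come from the provider.
    pings = pd.DataFrame({
        "device_id": ["a1", "a1", "b2"],
        "lat": [42.339801, 42.340112, 42.361145],
        "lon": [-71.089004, -71.088712, -71.057083],
        "timestamp": ["2025-02-04 08:15:03", "2025-02-04 08:47:55", "2025-02-04 09:02:10"],
        "os_version": ["17.2", "17.2", "14.0"],  # not needed for the study: drop it
    })

    minimized = (
        pings[["lat", "lon", "timestamp"]]  # keep only what the analysis needs
        .assign(
            lat=lambda d: d["lat"].round(2),   # ~1 km spatial resolution
            lon=lambda d: d["lon"].round(2),
            hour=lambda d: pd.to_datetime(d["timestamp"]).dt.floor("h"),
        )
        .drop(columns=["timestamp"])           # second-level timestamps not needed
    )
    # If the study only needs visit counts, aggregate away individual rows entirely:
    visits = minimized.groupby(["lat", "lon", "hour"]).size().rename("n_visits")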

Areas of difficulty

The use of large-scale datasets has created a series of areas of difficulty in CUS:

  • Informed consent: make sure users have given some form of consent that covers research use.
    • Clicking “I accept” on apps might not be enough.
  • Privacy: how do we ensure that our data is not re-identifiable?
    • Anonymization is not enough, especially when datasets are combined or contain large amounts of information.
    • Differential privacy is a promising approach, but it is not always feasible.
    • Aggregation is another approach, but it can lead to loss of information.
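
As an illustration of the differential privacy approach, here is a minimal sketch of the Laplace mechanism for releasing private counts, assuming each individual contributes to at most one count (sensitivity 1); the counts and epsilon are illustrative.

    import numpy as np

    def dp_counts(counts, epsilon, sensitivity=1.0, rng=None):
        """Release counts with Laplace(sensitivity / epsilon) noise added to each."""
        if rng is None:
            rng = np.random.default_rng()
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(counts))
        # Rounding and clipping are post-processing, so the privacy guarantee holds.
        return np.maximum(np.round(counts + noise), 0).astype(int)

    true_counts = np.array([120, 43, 7, 0, 310])    # e.g., visitors per census tract
    released = dp_counts(true_counts, epsilon=1.0)  # smaller epsilon: more privacy, more noise

Note the trade-off mentioned above: the noise that protects individuals distorts small counts the most, which is why differential privacy is often combined with aggregation.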

Areas of difficulty

  • Data protection and sharing:
    • All data is potentially re-identifiable and sensitive.
    • We need a data protection plan when using large-scale data, specifying how the data will be stored, protected, and shared responsibly.
    • Walled garden approach: data is shared only with a small group of researchers, overseen by an IRB, and only for a specific purpose. This can limit the impact of our research, but it can also protect the data from misuse.
    • Most proposals and IRB approvals require a data protection plan.

Practical Strategies for Ethical Research with Secondary Data

  • Data Minimization: Use only the data necessary to achieve research objectives, reducing risk exposure.

  • Anonymization: Employ techniques to preserve individual privacy, such as differential privacy or aggregation.

  • Transparent Practices:

    • Be clear about how the data was obtained and used, especially in research publications.
    • Publicly share ethical considerations and safeguards (data protection plan).
  • Ethical Review Processes:

    • Seek approval from Institutional Review Boards (IRBs) but recognize their limitations in addressing the unique challenges of secondary data.
    • Proactively evaluate ethical implications at every stage of the research process.

Practical Tips

  • Is the data we are using classified as “Human Subjects” by the federal government?
    • Human subjects data is information obtained through interaction or intervention with a living individual, or identifiable private information about that individual, used for research purposes.
    • If so, we need to have IRB approval.
    • If not, we can obtain an exemption (or a “not human subjects research” determination), but we must still follow ethical guidelines and best practices.
    • For example:
      • Aggregated LBS mobility data might not be considered human subjects data because it is not identifiable, it is secondary data, and it was not obtained through interaction with individuals.
    • Always check with your institution’s IRB.

Practical Tips

  • Considering working with a data provider?
    • Minimize retrieving individual-level data from the provider. If possible, work on their premises, through a VPN, or on their platform.
    • Storing individual data at your institution might require a data protection plan, IRB approval, and a data-sharing agreement.
  • Working with a dataset that might be re-identifiable?
    • Use differential privacy techniques to limit what the released data can reveal about any individual.
    • Use aggregation techniques to reduce the granularity of the data.
    • Keep the “Census model” in mind: use k-anonymity aggregation and differential privacy on small areas or groups.
    • Use encryption techniques to protect the data.
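
A minimal Python sketch of this “Census model” (hypothetical data; assumes each person appears in at most one cell, so the counting query has sensitivity 1): aggregate to small areas, suppress cells below k, then add Laplace noise.

    import numpy as np
    import pandas as pd

    K, EPSILON = 10, 1.0
    rng = np.random.default_rng(0)

    # Hypothetical individual-level records: one row per person, with home tract.
    people = pd.DataFrame({"tract": rng.choice(["A", "B", "C", "D"], size=200)})

    # 1. Aggregate individuals to small areas.
    cells = people.groupby("tract").size().rename("n").reset_index()

    # 2. k-anonymity: suppress cells with fewer than K individuals.
    cells = cells[cells["n"] >= K].copy()

    # 3. Differential privacy: add Laplace(1 / epsilon) noise to surviving counts.
    noise = rng.laplace(0.0, 1.0 / EPSILON, size=len(cells))
    cells["n_released"] = np.maximum(np.round(cells["n"] + noise), 0).astype(int)

    print(cells[["tract", "n_released"]])  # only coarse, noisy counts are released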

Practical Tips

  • Not sure about how to proceed?
    • You are not alone! Consult with other researchers using similar data.
    • Always consult your institution’s data protection officer or legal counsel when in doubt.
  • Remember: the goal is to protect the data, the users, and the communities we study.

Conclusions

  • The use of large-scale datasets in CUS research has created a series of ethical challenges that need to be addressed by researchers and institutions.

  • Most ethical frameworks are based on principles of respect for persons, beneficence, justice, and respect for law and public interest.

  • Practical strategies for ethical research with secondary data include data minimization, anonymization, transparent practices, and ethical review processes.

  • Adopt a principles-based approach to ethical research with secondary data.

  • Always consult your institution’s IRB and data protection officer when in doubt.

More reading

References

[1] M. J. Salganik, Bit by Bit: Social Research in the Digital Age. Princeton University Press, 2019.
[2] M. Zook et al., “Ten simple rules for responsible big data research,” PLoS Computational Biology, vol. 13, no. 3, p. e1005399, 2017, doi: 10.1371/journal.pcbi.1005399.
[3] M. A. Moreno, N. Goniu, P. S. Moreno, and D. Diekema, “Ethics of Social Media Research: Common Concerns and Practical Considerations,” Cyberpsychology, Behavior, and Social Networking, vol. 16, no. 9, pp. 708–713, Sep. 2013, doi: 10.1089/cyber.2012.0334.