Social Media / Privacy / Mastodon Mining

Dr Ben Britton
7 min read · Nov 23, 2022

Hopefully this is a quick one, reflecting on something that happened over on https://sciencemastodon.com/explore today and that highlights some of the privacy and safety risks of the Fediverse.

The administrator of ScienceMastodon posted a network graph to show the connections of users on the instance. I’ve made sure it’s blurred in this screenshot of the post because, as you will see later on, there are challenging ethics/privacy/consent issues to consider.

I’m sharing all of this to highlight some concerns moving forward. On balance, I have big concerns about the ‘running concern’ of Twitter, as well as issues regarding safety and moderation in the evolution of the Fediverse (i.e. Mastodon).

Toot containing the network graph that highlights user membership of science mastodon, and the message of excitement about what it contains from the instance admin
Here we have Prof Charles Seife posting a toot showing the connections of users on ScienceMastodon, as revealed with a (beautiful) network graph, and my reply-toot querying the consent and ethics considerations. In the original graphic, all the names were clearly visible (and there was a high resolution PDF file that you can zoom in on and see every single node).

Please note that I am not the first person to write about the ethics/privacy/consent issues around mining Mastodon for data (e.g. this great piece on ‘home invasion’ associated with the Twitter migration; and this piece about data-mining, consent, and scientific use), or about the wild west of instance moderation/administration (see this toot thread on moderation-at-scale issues, and how the system has to change). Furthermore, I admit that I used to say “it doesn’t matter what instance you join, you can move later…”. As this little story shows, it really does matter. I can also admit that I am only a recent starter on Mastodon, so I am learning the ropes (quickly, but also still learning…).

Back to our story…

I saw that a friend had reposted the wonderfully beautiful graph above. It is pretty, and it elegantly sorts users broadly into disciplines (e.g. pink = high energy physics), but this graph was generated with everyone’s name shown in detail, along with how they are connected. Replies to the toot said that it was pretty, cool, interesting, etc., and people were sharing how connections were absent, or could have been there, in certain disciplines.

The problem — data, ethics and informed consent

Let us consider two scenarios, and the (simple) ethics of this.

Scenario 1 — taking photos in public

I stand at a train station. I take a series of photographs, because I think it’s a vibrant artistic moment to see individuals living their lives and going about their business at a major transit terminal.

A photo of grand central station in New York City, showing people walking around and enjoying the space. (Photo from unsplash).

Now imagine that I do this at the same time each day, and take the same photo, and put this into a computer to look for the same people in this space.

Let’s imagine I take this up a notch, and cross-reference the visual matching against profile photos from a social media website.

The act of taking a single photo — that’s legal and totally allowed in most public spaces.

The act of taking multiple photos, and processing that information to reveal a pattern — that has human-participant and ethical implications.

Scenario 2 — taking photos in a gay bar

Let’s reconsider this same scenario, which already had ethical implications, and take photos in a space where members of a marginalised community congregate. Here the photograph, of people in a gay bar, reveals and discloses affiliation with a protected characteristic, and people with that characteristic have been subjected to significant harm and violence. This timely note is shared as I write this post in the week following events at Club Q (n.b. content warning: queerphobia and gun violence in the link).

We can clearly and demonstrably show the ethical implications of these photos, and how they are used/shared in public. But you might ask, what does this mean regarding the original post?

Back to the piece at hand — the online gay-/queer-dar of network connections

One way that queer people can identify other queer people is through analysis of our networks on social media. It takes me one click to see who follows whom, and our mutuals, on social media, especially when we have an overlapping interest like queer-nerd-science.

This means that a network analysis of my queer connections in STEM will reveal a network of people who may also be queer, even if they have not disclosed this information in their profile. Furthermore, we shouldn’t make assumptions: there is a strong norm against revealing someone’s SOGI (sexual orientation and gender identity) online without their consent, as that disclosure should happen in their own time (for context, see how Rebel Wilson was outed by a columnist without consent).

This means that a network analysis with names in plain text, especially when there are marginalised members in a group (such as queer members of a STEM community), can reveal insight into invisible and protected characteristics, beyond the intent of the initial study.
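To make that mechanism concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the accounts and follow edges are made up, "group X" is a stand-in for any characteristic some people choose to disclose, and the `exposure_scores` helper is hypothetical rather than anything used for the original graph. It only shows how a published, named graph lets a reader score people who disclosed nothing themselves.

```python
from collections import defaultdict

# Entirely synthetic follow edges between made-up accounts (no real users).
edges = [
    ("ana", "bo"), ("ana", "cy"), ("bo", "cy"), ("cy", "dee"),
    ("dee", "eli"), ("eli", "fay"), ("fay", "gus"), ("eli", "gus"),
]

# Accounts that have *chosen* to disclose membership of a hypothetical group X.
disclosed = {"ana", "bo"}


def neighbours(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def exposure_scores(edges, disclosed):
    """For each non-disclosing account, the fraction of its connections that disclosed."""
    graph = neighbours(edges)
    scores = {}
    for user, friends in graph.items():
        if user in disclosed:
            continue  # they made their own choice; nothing is inferred here
        scores[user] = len(friends & disclosed) / len(friends)
    return scores


scores = exposure_scores(edges, disclosed)
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.2f}")
```

In this toy data, "cy" comes out with the highest score (two of three connections disclosed) despite never disclosing anything, which is exactly the kind of unintended inference a named network graph makes easy.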

So this network analysis is a study of people (mostly; there were/are a few journals on that instance, notably Science and Nature) and, as we’ve explored, there are specific privacy and ethics issues here. In Europe, there is GDPR that regulates personal data (online connections between Europeans are personal data); in Canada, we have FIPPA that regulates the processing and storage of personal data about Canadians; and the US is a bit of a mess, with a range of laws that vary state-by-state.

Now, I am not a lawyer… but from my privacy training (as part of my job) and my ethics training (as part of my job), I know that I should consult ethics regulations and laws prior to studying humans in any context (even with some of these simplistic analysis tools like mining social media). Furthermore, if I were in a position of power (such as an administrator of a social media server), I would be held to higher expectations with regard to a reasonable understanding of the rules and ethics that bind me through my profession, my employer, and the place I live and work.

Interestingly, New York University, where the administrator of ScienceMastodon is currently employed, does have ethics rules that are based upon informed consent. In the broadest (and simplest) sense, informed consent says that if you want to study human participants, you should tell them you will collect data, what you will do with it, why you will do this, and what they can do if they want to opt out. Typically, for an ethical and reasonable company, this would be found in a subclause of the End User License Agreement or the Terms of Service/Use. Alternatively, for an academic study (even if it is preliminary) you would seek ethics approval, and this would result in something like a consent form on the front of your Qualtrics survey (for instance) that would provide details.

If we pop over to ScienceMastodon’s about page, we can look at the description of the instance (fetched 22/Nov/2022):

This is supposed to be a comfortable place for scientists and journalists to be. Think of it as a cocktail party; behaviors that are unacceptable in that context are unacceptable here.

Patreon for those who wish to support: https://www.patreon.com/sciencemastodon

DMCA agent: Charles Seife (@cgseife, or cgseife at gmail)

We can also look at the Server Rules (fetched 22/Nov/2022):

1. Users may not be anonymous, at least while this site is in its infancy.

2. Users should be scientists, journalists, or scientific/journalistic institutions or publications.

3. Please be as kind, polite, and understanding as you would be in a cocktail party of friends and strangers. Harassment, racism, sexism, antisemitism, homophobia, transphobia, or bullying will get you shown to the door.

4. No trolling — combative, out-of-left-field responses to posts seeking argumentation rather than genuine discussion are not welcome.

5. Avoid spreading misinformation. Obviously this can be difficult to delineate, and there are many gray areas, but repeated and/or deliberate spread of misinformation is not allowed.

6. Relentlessly pushing a theory or idea — in its extreme, crankery — is not welcome. The line is blurry, but it can be crossed. Again, think cocktail party… feel free to discuss ideas, but if people are likely to start sidling away, it’s an issue.

7. The Content Warning flag should be used for NSFW content; NSFW media should be marked as sensitive.

8. Users must be 18 or older.

Note that none of these rules deal with privacy, consent, or opting in to content scraping or analysis of the kind that the instance admin carried out in this case. This is despite NYU, with which the instance admin shows a clear affiliation in their profile, having an informed consent ethics policy with regard to the study of human subjects.

Journalist, NYU professor, debunker of ‘alternative facts.’ Author of several books, most recently a biography of Stephen Hawking. Pronouns: he/him/his.

After this discussion and critique went around a little bit, the original post was deleted, and several hours later the OP posted a comment:

toot that says: I’ve taken down my earlier plot — while I disagree with those who say it was violative, I am removing out of deference to those who think it does harm.
 
 I’ll point out that all the data used were (and are) publicly available — in my view, there’s a lot of grappling to do with the nature of Mastodon’s status as (or desire to keep it from being) a public forum.
Toot from Charles Seife that states he does not see any issues with this public data (from https://sciencemastodon.com/@cgseife/109389344977750941).

Ultimately, I remain concerned about the administration of the ScienceMastodon instance and the privacy of users on that server. I am also concerned that the major outlets of Science and Nature have chosen to make this their local instance, as it is demonstrably unsafe for LGBTQ+ people and therefore excludes many people who could benefit from the advantages of the ‘local instance’ here.

If you found this interesting, give it a clap so it can rise to prominence with the Medium algorithm.

You can find @bmatb over on Mastodon, and while it lasts, on Twitter too. Hopefully I’ll still be on Mastodon.Social when you read this.

p.s. You may take issue with me screenshotting toots without permission, and I am aware of the ethical concerns around this critique. All screenshots here are of my own content and the instance admin’s. The instance admin has been deleting some of their toots when they are critiqued.

Dr Ben Britton

Atomic sorcerer, based at UBC (Canada). Plays with metals. Discusses academic life. Swooshes down ski slopes. Pegs it round parks. (Views my own)